

Informetric Mapping of “Big Data” in FI-WARE

Elio Villaseñor
Fund of Information and Documentation for the Industry

Avenida San Fernando 37, Toriello Guerra, Tlalpan, 14050, Mexico City 00 52 55 5624 2800

[email protected]

Hugo Estrada
Fund of Information and Documentation for the Industry

Avenida San Fernando 37, Toriello Guerra, Tlalpan, 14050, Mexico City 00 52 55 5624 2800

[email protected]

ABSTRACT
Today, governmental entities are embracing new trends in information technology. One of the technological developments generating the most excitement is Big Data, because it makes it possible to analyze the huge amount of information produced by governments and supports decision making. Meanwhile, the Future Internet platform of the European Community (FI-WARE) is one of the most prominent initiatives worldwide and has attracted growing interest from governments. The platform is based on a set of Generic Enablers (GE) for various applications, including Big Data. FI-WARE is a platform under construction, and knowing how its development is proceeding is essential to joining this monumental effort and taking advantage of its benefits. This document presents the results of applying text and data mining techniques, together with informetric mapping, to understand the development of the Big Data technology present in FI-WARE.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data mining.

General Terms
Documentation, Experimentation, Security, Human Factors, Standardization

Keywords
Big Data, FI-WARE, Generic Enablers, Informetric Analysis.

1. INTRODUCTION
Today, governments are increasingly using information technologies to improve the services they offer to citizens. One high-impact technology is Big Data, because of its ability to analyze government data and turn it into information that is useful to citizens. Governments produce a huge amount of information that in most cases is not used properly for decision making. The term Big Data is relatively new, but its impact and the disruptive capacity of the technologies associated with it are already a reality. The European Community is giving a big boost to both Big Data and the technologies associated with the Internet of the Future, which, besides Big Data, include the Internet of Things, Cloud Computing, Digital Media and Augmented Reality.

This work presents an informetric analysis that takes as input the web pages describing the Generic Enablers related to Big Data. It produces a network analysis of the GE authors, a clustering of the GE according to the concepts extracted from their textual fields, and finally the trends in GE development activity.

2. FI-WARE and Generic Enablers
Research on the Future Internet (FI) has become a strategic priority around the world, with major initiatives in the United States, Korea, Japan, China and the European Community. These initiatives have led to development platforms that allow large-scale validation of new concepts, technologies, business models, applications and FI services [1]. One of the most solid proposals in this area is the Future Internet Public Private Partnership (FI-PPP), a public-private cooperation program in the field of FI technologies funded by the European Commission. It involves more than 152 European companies and organizations and, with a budget of 600 million euros, has brought together the most relevant actors in research and the ICT industry in order to position Europe as a leader. The objective of FI-PPP is the conceptualization, design and construction of open-source software platforms that enable the development of services in areas that are not necessarily purely ICT, such as energy, logistics, software platforms, transportation, smart cities, environmental data, content, mobility, security and, in the future, health and tourism.

3. Informetric Analysis and Mapping of FI-WARE Wiki Web Pages
Tools now exist that support the semi-automatic bulk retrieval and analysis of large sets of digital documents, including web pages. From this analysis it is possible to establish and quantify relations between entities such as documents, authors and semantic entities, and to analyze their behavior over time. Furthermore, visualization tools make it possible to build visual representations of the results of informetric analysis (informetric mapping). This section presents the informetric mapping of a set of web pages, retrieved from the FI-WARE Wiki portal, that are related to Big Data.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). dg.o '14, Jun 18-21 2014, Aguascalientes, Mexico ACM 978-1-4503-2901-9/14/06. http://dx.doi.org/10.1145/2612733.2619954

3.1 Retrieval and Preprocessing of Generic Enablers Web Pages
The collaborative platform that tracks the GE to be developed and integrated into FI-WARE is hosted at http://forge.fi-ware.org. It has a search engine that retrieves different types of web pages, including the cover pages of the GE. The search query "Big Data" retrieves 50 GE cover pages, each of which links to a short description and an update history. Using OpenRefine [2], the HTML code of all these web pages was downloaded and the following information was extracted: GE name, developer code name and update history. In addition, the dataTXT-NEX API [3] was used to extract named entities from the textual fields Goal, Description and Rationale. This information was processed with the Excel macro Toolinf [4] to calculate different informetric indicators.
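To make the shape of the extracted data concrete, the following is a minimal sketch in Python, not the OpenRefine and Toolinf workflow used in this work. It assumes each GE cover page has already been reduced to a record with the fields named above; the class name GERecord, the function indicators and the sample values are all hypothetical, and the three counts are only examples of simple informetric indicators.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class GERecord:
    """One GE cover page after extraction (hypothetical field names)."""
    name: str
    developers: List[str]
    update_years: List[int]
    entities: List[str] = field(default_factory=list)


def indicators(records):
    """Return GE count per developer, update count per GE, and entity frequency."""
    # Count each developer and each entity at most once per GE.
    ge_per_developer = Counter(d for r in records for d in set(r.developers))
    updates_per_ge = {r.name: len(r.update_years) for r in records}
    entity_frequency = Counter(e for r in records for e in set(r.entities))
    return ge_per_developer, updates_per_ge, entity_frequency


# Hypothetical sample data, not taken from the FI-WARE wiki.
sample = [
    GERecord("GE-A", ["dev1", "dev2"], [2012, 2013, 2013], ["Hadoop", "MapReduce"]),
    GERecord("GE-B", ["dev2"], [2013], ["Hadoop", "Streaming"]),
]
ge_per_dev, updates, entity_freq = indicators(sample)
print(ge_per_dev["dev2"], updates["GE-A"], entity_freq["Hadoop"])
```

Keeping developers, update years and entities in one record per GE means the network, clustering and trend analyses can all be derived from the same table.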

3.2 Development Groups
Using the developers' code names, the co-authorship network is analyzed to establish development groups; this information is supplemented with the names of the GE to construct the graph shown in Figure 1 (the graphs in Figures 1 and 2 were made with the NodeXL Excel template [5]). As can be seen, these groups are isolated from one another, so we can infer that each of the 11 groups specializes in certain types of GE. The graph clearly shows how the development groups are structured and which GE they are developing. The colors of the nodes indicate membership in a class obtained by characterizing the GE semantically and applying a clustering method. The procedure for obtaining these classes is detailed in the following section.

Figure 1: Network of Developers (Spheres) and GE (Triangles).
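The isolation of the development groups can be checked programmatically. The sketch below is an illustration rather than the NodeXL procedure used for Figure 1: it assumes a mapping from GE names to developer code names, builds the developer-GE graph, and returns its connected components, each of which corresponds to one isolated group. The function name development_groups and the sample mapping are hypothetical.

```python
from collections import defaultdict


def development_groups(ge_developers):
    """Return the connected components of the bipartite developer-GE graph."""
    adjacency = defaultdict(set)
    for ge, developers in ge_developers.items():
        ge_node = ("GE", ge)
        adjacency[ge_node]  # register GE that have no listed developer
        for dev in developers:
            dev_node = ("DEV", dev)
            adjacency[ge_node].add(dev_node)
            adjacency[dev_node].add(ge_node)

    seen, groups = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(component)
    return groups


# Hypothetical sample: dev2 links GE-A and GE-B, while GE-C stands alone.
sample = {"GE-A": ["dev1", "dev2"], "GE-B": ["dev2"], "GE-C": ["dev3"]}
groups = development_groups(sample)
print(len(groups))
```

Two GE fall into the same group whenever a chain of shared developers connects them, which is the sense in which the groups in Figure 1 are isolated.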

3.3 Semantic Characterization and Trend Analysis
Using the extracted named entities, the terms that identify each GE are used to characterize it semantically. This characterization makes it possible to establish relations between developments even when they are not carried out by the same group. Applying the Clauset-Newman-Moore clustering algorithm [6] identified six semantically characterized clusters. Figure 2 maps these clusters, showing the GE together with the most frequent terms occurring in the text snippets of their web pages. Based on their composition, the clusters can be characterized as follows: (G1) Hadoop Technology, (G2) Big Data Streaming and Multimedia, (G3) Users and Infrastructure Management, (G4) Software Development and Deployment, (G5) Network Monitoring and Improvement and (G6) Integration with Cloud Chapter. Using the update history of the GE, n-grams are built for each cluster. These indicate the level of activity of each class of GE in terms of the number of updates per year (see Figure 3).

Figure 2: Semantic Clusters of GE.
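The idea behind the clustering step can be sketched as follows. This is a naive illustration of greedy modularity agglomeration, the principle underlying the Clauset-Newman-Moore algorithm [6], without the efficient data structures of the original method and not the NodeXL implementation used for Figure 2. It assumes the GE and their terms are given as an undirected edge list; the function name and the sample edges are hypothetical.

```python
def greedy_modularity_clusters(edges):
    """Merge communities greedily while a merge still increases modularity."""
    m = len(edges)
    community = {}  # node -> label of its current community
    degree = {}
    for u, v in edges:
        for node in (u, v):
            community.setdefault(node, node)
            degree[node] = degree.get(node, 0) + 1

    while True:
        # a[c]: fraction of edge ends attached to community c.
        a = {}
        for node, c in community.items():
            a[c] = a.get(c, 0.0) + degree[node] / (2 * m)
        # e[(c1, c2)]: half the fraction of edges joining c1 and c2.
        e = {}
        for u, v in edges:
            cu, cv = community[u], community[v]
            if cu != cv:
                key = tuple(sorted((cu, cv)))
                e[key] = e.get(key, 0.0) + 1 / (2 * m)
        # Modularity gain of merging two connected communities.
        best_gain, best_pair = 0.0, None
        for (c1, c2), e_ij in e.items():
            gain = 2 * (e_ij - a[c1] * a[c2])
            if gain > best_gain:
                best_gain, best_pair = gain, (c1, c2)
        if best_pair is None:
            break  # no merge improves modularity any further
        keep, drop = best_pair
        for node in list(community):
            if community[node] == drop:
                community[node] = keep

    clusters = {}
    for node, c in community.items():
        clusters.setdefault(c, set()).add(node)
    return list(clusters.values())


# Hypothetical GE-term edges: two dense groups joined by one weak link.
edges = [
    ("GE1", "hadoop"), ("GE1", "mapreduce"),
    ("GE2", "hadoop"), ("GE2", "mapreduce"),
    ("GE3", "streaming"), ("GE3", "video"),
    ("GE4", "streaming"), ("GE4", "video"),
    ("GE2", "streaming"),
]
clusters = greedy_modularity_clusters(edges)
print(sorted(sorted(c) for c in clusters))
```

Each round recomputes every candidate merge, which is adequate for a graph of a few dozen GE and terms but far slower than the original algorithm on large networks.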

Figure 3: Trends of activity in the development of GE classes.
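The activity measure behind Figure 3, the number of updates per year for each class of GE, can be sketched as a simple count. The example below assumes a mapping from each GE to its cluster label and a mapping from each GE to the years of its recorded updates; the function name updates_per_year and the sample values are hypothetical and do not reproduce the data of this study.

```python
from collections import Counter, defaultdict


def updates_per_year(ge_cluster, ge_update_years):
    """Count updates per year for each cluster label."""
    activity = defaultdict(Counter)
    for ge, years in ge_update_years.items():
        activity[ge_cluster[ge]].update(years)
    # Sort the years so each series reads chronologically.
    return {c: dict(sorted(counts.items())) for c, counts in activity.items()}


# Hypothetical sample data, not taken from the FI-WARE wiki.
ge_cluster = {"GE-A": "G1", "GE-B": "G1", "GE-C": "G2"}
ge_update_years = {
    "GE-A": [2012, 2013, 2013],
    "GE-B": [2013, 2014],
    "GE-C": [2012],
}
trend = updates_per_year(ge_cluster, ge_update_years)
print(trend)
```

Plotting one series per cluster from this dictionary gives a view comparable in form to Figure 3.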

4. CONCLUSIONS AND FUTURE WORK
These analyses found that:

• The development is carried out by isolated groups specializing in different aspects.

• In the first year, the main efforts focused on developing methods related to Big Data Streaming and Multimedia.

• Current efforts are directed toward developing Hadoop and related technology components, as well as Users and Infrastructure Management.

• The GE with emerging activity are related to software development and deployment.

• Network Monitoring and Improvement and Integration with Cloud Chapter show less activity than the rest.

As future work, we propose to apply this approach to all the GE of the FI-WARE platform.

5. REFERENCES
[1] XIPI, «XIPI. Shining a light on infrastructures for the benefit of the Future Internet community». [Online]. Available at: http://www.xipi.eu/es [Last access: 25-Apr-2014].

[2] Downloaded from: http://openrefine.org/

[3] Downloaded from: https://dandelion.eu

[4] Guzmán M.V., Villaseñor E.A., Carrillo H., Jimenez J.L. 2005. ViBlioSOM: Aplicaciones en MEDLINE, Ediciones Finlay, La Habana, Cuba.

[5] Downloaded from: http://nodexl.codeplex.com/

[6] Clauset, A., Newman, M. E., & Moore, C. 2004. Finding community structure in very large networks. Physical Review E, 70(6).