Page 1: [IEEE 2011 International Conference on Multimedia Technology (ICMT) - Hangzhou, China (2011.07.26-2011.07.28)] 2011 International Conference on Multimedia Technology - Searching metadata

Searching Metadata in Spatial Data Infrastructures for Urban Planning at Local Level

Odilon Corrêa Silva Departamento de Informática

Centro Federal de Educação Tecnológica de Minas Gerais Leopoldina, Brazil

[email protected]

Thiago Silva Miranda Departamento de Informática

Centro Universitário do Leste de Minas Gerais Coronel Fabriciano, Brazil

[email protected]

Jugurta Lisboa-Filho, Alcione de Paiva Oliveira Departamento de Informática

Universidade Federal de Viçosa Viçosa, Brazil

[email protected], [email protected]

Abstract – Geographical information is a key resource for urban planning and management of public law. Spatial Data Infrastructures (SDI) enable cooperation and sharing of geospatial data, but the open and distributed character of these infrastructures hampers citizens' interaction with, and contribution to, those data. This paper presents a proposal supported by terminological ontologies that seeks to minimize some semantic problems in searching for information in the metadata catalogs of an SDI at a local level.

Keywords- Spatial Data Infrastructure, Urban planning, Metadata, Terminological ontologies

I. INTRODUCTION

Maintenance, management, planning and political decisions that directly affect citizens' lives must be carried out continuously by the municipal public administration [1]. Adequate tools must be provided so that government agencies can play their role efficiently and focus on the population's needs. To this end, one must know the reality of the municipality, identify the areas in greatest need and the possible forms of intervention, assess the impacts and monitor the results.

For all those tasks, Geographical Information (GI) is essential. Urban planning of a region requires, first of all, knowledge of the register of built-up areas, the transportation network, water and sewage systems, public services and preservation areas, among others. The acquisition and updating, by mapping agencies, of the geographical data used in urban planning is expensive, since it requires substantial financial resources and professionals specialized in producing maps and spatial data. At a local level the situation is similar, since municipalities are also responsible for maintaining and evolving their own geographical base [2].

To minimize those problems in the municipal scenario, we sought an alternative for developing and maintaining the geographic collection that relies on voluntary participation, which characterizes the concept of Volunteered Geographic Information (VGI) [3]. Accordingly, the geographical data and metadata available in a municipal SDI are expected to be accessible and updated voluntarily by its citizens, allowing the community to contribute information about the city's infrastructure.

As a result, SDIs will have a larger audience, and differences in semantic understanding may hamper users' interaction, since world views influence, for example, how metadata are queried. In general, searches in an SDI rely mainly on queries that use keywords, coordinates, or thematic or temporal classification. This approach presents several difficulties; for example, which keywords are suitable?

This paper presents a proposal for a search engine that aims to help the user understand the context to be searched. The engine, backed by terminological ontologies, topic maps and visual techniques, is part of an Information Retrieval System (IRS) for SDI metadata catalogs. A functional prototype was developed to validate the system.

II. SDI AND KNOWLEDGE REPRESENTATION

The current focus on the concept of SDI is a natural consequence of the evolution of the web and its architecture, in light of the guidelines drawn up by its standards body, the World Wide Web Consortium (W3C) [5]. Therefore, semantic understanding must be a requirement for an SDI that intends to minimize the problems of Information Retrieval (IR).

Among the components of an SDI, the metadata catalog is considered key: it makes it possible to relate data by means of a classification that is common to the many types of data available. In addition, ontologies can be used to improve the integration and sharing of these data.

An ontology is an explicit and formal specification of a shared conceptualization of a domain of interest [6]. Ontologies can be classified into three types: formal ontologies, prototype-based ontologies and terminological ontologies [7]. The last type is described further below, as it is the one used in this paper.

978-1-61284-774-0/11/$26.00 ©2011 IEEE


Terminological ontologies do not need to be fully specified by axioms and definitions like formal ontologies; they are partially specified by relations such as subtype/supertype or part/whole, which determine the relative position of the concepts with regard to the others but do not define them completely (Figure 1). The difference between a terminological and a formal ontology is one of degree rather than of content: the more axioms are added to a terminological ontology, the closer it gets to being a formal ontology [7].

Figure 1. Example of Terminological Ontology

The representation and management of knowledge seeks to organize and optimize repositories so that the user is able to retrieve information effectively. According to Rath [8], speaking about knowledge structure is speaking about topic maps. Topic maps is an international standard (ISO 13250) for describing knowledge structures and formalizing their association with information resources [9]. Thus, topic maps can be seen as a paradigm that enables the organization and retrieval of information stored in an SDI.

Ahmed [10] presents a design pattern for topic maps. This pattern can be used for representing terminological ontologies in topic maps, and can be applied, for instance, in the creation of a mechanism that enables processing and indexing of the terms of a terminological ontology and the keywords of geospatial metadata, generating topic maps. Figure 2 illustrates this approach, which backed the first specifications of the IRS for metadata catalogs proposed in this paper.

Figure 2. Elements of Topic maps, Terminological Ontology and Metadata

III. INFORMATION RETRIEVAL

The search for information starts with the user's problem. The gap between the knowledge the user has about the problem or topic and what the user needs for solving the problem constitutes the need for information [11].

Visual search techniques facilitate the user's perception of the environment [12]. For the authors, the navigation feature is essential, since it helps users build their own representation of the environment. The hyperbolic tree is one example of a technology for visualization and navigation based on the focus+context technique. This technique can be used for the graphic representation of topic maps, which makes it easier for users to visualize and navigate the informational content without needing to know exactly the term desired. When the hyperbolic tree is used to represent topic maps, the topic positioned by the user at the focus appears bigger. During navigation, the number of peripheral topics that emerge increases exponentially as the topic map is moved. Due to these features, hyperbolic navigation, along with terminological ontologies and topic maps, is a good alternative for the visualization and retrieval of big sets of data, e.g. metadata catalogs in an SDI.

IV. IRS SPECIFICATION

The purpose of this study is to provide alternatives for the difficulties in retrieving information from metadata catalogs in an SDI. It is expected not only to minimize part of these problems, but also to associate more semantics with the operations of querying metadata catalogs. To address the initial needs, a technical feasibility study was carried out, guided by existing patterns, methods, concepts and technologies. The specification of the application is aimed at the needs of the local SDI. However, its specification and development have been conducted so that it can be applied in SDIs of higher levels (regional and global).

The services provided by the application are based on generating, consulting and assessing a database. In order to implement the services, the components of the IRS's architecture were specified. The system's architecture is composed of three modules that provide the services by means of interface classes, two applications (one for administration and another for query), a persistence module and three data repositories. The components and their relationships are illustrated in Figure 3 and described as follows:

Figure 3. Diagram of the system components

• Resource base: repository that stores the resource collected from the SDI’s metadata catalog. The resource consists of the identification of the metadata and part of its content;


• Semantic base: repository that stores the semantic content. Its structure follows the specification of the framework for representing terminological ontologies [13];

• Indexed base: repository that stores the result of the indexing process between the semantic and resource bases. Its structure follows the specifications of ISO 13250 [9] for building topic maps;

• Indexed base generation module: responsible for implementing the interface with the administration application and managing the whole process of generating the indexed base;

• Indexed base query module: responsible for implementing the interface with the query application and managing the process of querying the database;

• Indexed base assessment module: responsible for implementing the interface with the administration application and managing the process of assessing the indexed base;

• Administration application: exposes the features of the generation and assessment modules in a form the user can understand and therefore use to interact with the system;

• Query application: presents the indexed base as a hyperbolic tree, in which the user can navigate, choose terms of interest and, finally, run the search.
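The repositories listed above can be pictured with a few plain data types. The following is a minimal sketch, not the prototype's actual schema: the names `ResourceBase`, `SemanticBase`, `Topic`, `Association` and `IndexedBase` and all their fields are assumptions made for illustration. Only the split of the indexed base into topics, associations and occurrences follows the general topic maps model.

```python
from dataclasses import dataclass, field

# Hypothetical shapes; the paper does not publish its schema.
ResourceBase = dict[int, list[str]]             # metadata id -> keywords
SemanticBase = dict[str, dict[str, list[str]]]  # term -> relation code -> terms


@dataclass
class Topic:
    """An ontology term plus the metadata records in which it occurs."""
    name: str
    occurrences: list[int] = field(default_factory=list)

    @property
    def indexed(self) -> bool:
        # A topic counts as indexed when at least one record uses it.
        return bool(self.occurrences)


@dataclass
class Association:
    """A typed link between two topics, e.g. ("TR", "Bus Station", "Bus Stops")."""
    kind: str
    source: str
    target: str


@dataclass
class IndexedBase:
    topics: dict[str, Topic] = field(default_factory=dict)
    associations: list[Association] = field(default_factory=list)


base = IndexedBase()
base.topics["Bus Station"] = Topic("Bus Station", occurrences=[4])
base.topics["Airport"] = Topic("Airport")  # no record uses this term
base.associations.append(Association("TR", "Bus Station", "Bus Stops"))
```

In this sketch the `indexed` property is what a query application could use to decide how to render a term, without running a search first.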

V. IMPLEMENTATION AND FUNCTIONING OF THE IRS

A prototype of the proposed IRS was implemented in the Java language. The PostgreSQL Relational Database Management System (RDBMS) was used to store part of the data necessary for the functioning of the system. The content of XML documents is accessed and manipulated through the DOM API. The hyperbolic tree is generated with the Java Treebolic package.
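The focus+context effect of a hyperbolic tree can be illustrated with the Poincaré disk model, in which a point at hyperbolic distance d from the centre is drawn at Euclidean radius tanh(d/2). The sketch below only illustrates that geometry. It is written in Python rather than Java, it does not use or reproduce the Treebolic API, and the function name `disk_radius` and the spacing of one hyperbolic unit per tree level are assumptions.

```python
import math


def disk_radius(hyperbolic_distance: float) -> float:
    """Euclidean radius (between 0 and 1) in the Poincaré disk for a point at
    the given hyperbolic distance from the focus, which sits at the centre."""
    return math.tanh(hyperbolic_distance / 2.0)


# Assumption: each tree level lies one hyperbolic unit further from the focus.
radii = [disk_radius(level) for level in range(6)]

# Screen space left for each successive level. It shrinks quickly, so the
# focused topic looks large while peripheral topics crowd toward the rim.
ring_widths = [outer - inner for inner, outer in zip(radii, radii[1:])]

for level, width in enumerate(ring_widths, start=1):
    print(f"level {level}: ring width {width:.3f}")
```

Moving another topic to the centre amounts to recomputing the distances from the new focus, which is why topics near the rim expand as the user navigates toward them.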

The functioning of the IRS is divided into three steps, as seen in Figure 4. The first step is to import the metadata catalog and the terminological ontology into the resource and semantic bases, respectively. Next, the routine that generates the indexed base is executed. Lastly, the indexed base becomes available for query. The tests conducted with the proposed system solved relatively simple problems, yet they are sufficient to illustrate the possibilities of using it in real scenarios.

Before generating the indexed base, it is necessary to import the metadata and the ontology into the resource and semantic bases, respectively. Table 1 presents an example of the resource base content after the metadata import, where the Items column holds the identifier of the metadata in the IRS’s catalog and the Resources column holds its keywords.

Figure 4. Steps of the IRS

TABLE I. RESOURCE BASE CONTENT

Items | Resources
1 | "Subway Station", "Viçosa", "Brazil"
2 | "Sanitation", "Viçosa", "Brazil"
3 | "Health", "Emergency Medical Transportation", "Viçosa", "Brazil"
4 | "Bus Station", "Viçosa", "Brazil"
5 | "Rodoviária", "Viçosa", "Brazil"
6 | "Bus Stops", "Viçosa", "Brazil"
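The import step that fills the resource base can be sketched with a DOM parser. This is a minimal illustration under stated assumptions, not the prototype's code: it uses Python's `xml.dom.minidom` instead of the Java DOM API, and the `catalog`, `record` and `keyword` element names and the `id` attribute are hypothetical, since the paper does not show the catalog's XML and real metadata schemas are considerably richer.

```python
from xml.dom.minidom import parseString

# Hypothetical, simplified catalog export; real metadata schemas differ.
CATALOG_XML = """
<catalog>
  <record id="4"><keyword>Bus Station</keyword><keyword>Viçosa</keyword><keyword>Brazil</keyword></record>
  <record id="5"><keyword>Rodoviária</keyword><keyword>Viçosa</keyword><keyword>Brazil</keyword></record>
</catalog>
"""


def import_resource_base(xml_text: str) -> dict[int, list[str]]:
    """Map each record identifier to its list of keywords."""
    document = parseString(xml_text.strip())
    resource_base: dict[int, list[str]] = {}
    for record in document.getElementsByTagName("record"):
        item_id = int(record.getAttribute("id"))
        keywords = [
            node.firstChild.data.strip()
            for node in record.getElementsByTagName("keyword")
            if node.firstChild is not None  # skip empty <keyword/> elements
        ]
        resource_base[item_id] = keywords
    return resource_base


resource_base = import_resource_base(CATALOG_XML)
```

The result has the same shape as the rows of Table I: an identifier paired with the keywords of that metadata record.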

Table 2 illustrates how the terms and their ontology attributes are stored in the semantic base. The types of attributes are identified by: EQ – equivalent term; TE – specific term; TG – generic term; TR – related term; USE – term used by.

TABLE II. SEMANTIC BASE CONTENT

Terms | Attributes
Transportation | "TG: Configurational Urban Systems", "TE: Telephony", "TE: Sanitation", "TE: Lighting", "TE: Health", "TE: Bus Station", "TE: Bus Stops", "EQ: Transporte"
Bus Station | "TR: Bus Stops", "TG: Transportation", "EQ: Rodoviária"
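The attribute strings of Table II pair a relation code with a term, so they can be grouped by code before being used. The sketch below shows one possible way to do that; the function name `parse_attributes` and the choice to reject unknown codes are assumptions, and only the five codes and the sample row come from the text.

```python
# Relation codes as listed in the text.
RELATION_NAMES = {
    "EQ": "equivalent term",
    "TE": "specific term",
    "TG": "generic term",
    "TR": "related term",
    "USE": "term used by",
}


def parse_attributes(attributes: list[str]) -> dict[str, list[str]]:
    """Group strings such as "TG: Transportation" by their relation code."""
    grouped: dict[str, list[str]] = {}
    for attribute in attributes:
        code, _, term = attribute.partition(":")
        code, term = code.strip(), term.strip()
        if code not in RELATION_NAMES or not term:
            raise ValueError(f"unrecognised attribute: {attribute!r}")
        grouped.setdefault(code, []).append(term)
    return grouped


# The "Bus Station" row of Table II.
bus_station = parse_attributes(
    ["TR: Bus Stops", "TG: Transportation", "EQ: Rodoviária"]
)
```

Grouping by code makes it straightforward to pick out, for example, only the equivalent terms or only the related terms of a given topic.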

The second stage consists of generating the indexed base. This procedure extracts the terms and relationships from the semantic base (ontology) and the contents from the resource base (metadata catalog), and generates the indexed base (topic maps) with terms, relationships and occurrences linking the terms to the resource contents. Finally, the indexed base is available for queries and analyses. Queries were submitted both to the IRS and to the standard interface of the SDI, which enabled a comparison between the two mechanisms. Figure 5 illustrates the prototype’s query interface.
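The generation of the indexed base can be sketched as a join between the two other repositories. The code below is an illustration under assumptions rather than the prototype's routine: occurrences are found by exact keyword match, equivalent terms are kept as separate topics, and the function name `generate_indexed_base` is hypothetical. The data are Table I and the "Bus Station" row of Table II.

```python
# Table I (resource base): metadata identifier -> keywords.
RESOURCE_BASE = {
    1: ["Subway Station", "Viçosa", "Brazil"],
    2: ["Sanitation", "Viçosa", "Brazil"],
    3: ["Health", "Emergency Medical Transportation", "Viçosa", "Brazil"],
    4: ["Bus Station", "Viçosa", "Brazil"],
    5: ["Rodoviária", "Viçosa", "Brazil"],
    6: ["Bus Stops", "Viçosa", "Brazil"],
}

# "Bus Station" row of Table II (semantic base).
SEMANTIC_BASE = {
    "Bus Station": {"TR": ["Bus Stops"], "TG": ["Transportation"], "EQ": ["Rodoviária"]},
}


def generate_indexed_base(semantic_base, resource_base):
    """Return (occurrences, associations) for a small topic-map-like index."""
    # Every term named in the ontology becomes a topic.
    terms = set(semantic_base)
    for relations in semantic_base.values():
        for related_terms in relations.values():
            terms.update(related_terms)

    # Occurrence: a metadata record whose keywords include the term exactly.
    occurrences = {
        term: sorted(item for item, keywords in resource_base.items() if term in keywords)
        for term in terms
    }
    # Association: one (code, source, target) triple per ontology relation.
    associations = [
        (code, source, target)
        for source, relations in semantic_base.items()
        for code, targets in relations.items()
        for target in targets
    ]
    return occurrences, associations


occurrences, associations = generate_indexed_base(SEMANTIC_BASE, RESOURCE_BASE)
indexed_terms = sorted(term for term, items in occurrences.items() if items)
```

With exact matching, "Transportation" ends up with no occurrences even though Item 3 carries the keyword "Emergency Medical Transportation", so in this sketch it is a topic without indexed resources.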

A useful piece of information that can be shown to the user is the description of an ontology term. The prototype’s query interface displays this information when the user places the mouse cursor on the term. Another characteristic can be seen in the indexed terms (e.g. “Bus Station” and “Subway Station”), which are shown in a different color from non-indexed terms (e.g. “Airport”). This feature makes indexed terms easier to identify and avoids unnecessary queries. The link between related terms (e.g. “Bus Station” and “Bus Stops”) is dotted, which sets it apart from the other relationships.

The first test simulates a situation in which only the term “Bus Station” is given, with no scope criterion. The search returns two resources (Items 4 and 5 from Table 1). Comparing the result with the contents of Table 1 and Table 2 shows that the multilanguage factor was taken into account in the search: Item 5 is found through the equivalent term “Rodoviária”. The same term, given as a parameter to the “keywords” option of the SDI interface, returns only one identifier (Item 4 from Table 1).

In the second test, the same term was used, with the scope criterion set to the option that refers to its related terms. The search returns three resources (Items 4, 5 and 6 from Table 1). Comparing the result with the contents of Table 1 and Table 2 shows that the scope criterion is used in the search. Since the term was not altered, the result shown on the SDI interface is the same as in the first test.
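The first two tests can be reproduced with a small scoped lookup. This is a sketch under assumptions, not the prototype's query module: the function name `search`, the `scope` parameter and the rule that equivalent (EQ) terms are always included are hypothetical choices that happen to match the reported results. The data come from Table I and the "Bus Station" row of Table II.

```python
# Occurrences of each term in Table I (term -> metadata identifiers).
OCCURRENCES = {"Bus Station": [4], "Rodoviária": [5], "Bus Stops": [6]}

# "Bus Station" row of Table II.
RELATIONS = {
    "Bus Station": {"TR": ["Bus Stops"], "TG": ["Transportation"], "EQ": ["Rodoviária"]},
}


def search(term, scope=()):
    """Items for `term`, its equivalents, and terms reached via `scope` codes."""
    relations = RELATIONS.get(term, {})
    # Assumption: equivalent terms are always searched (multilanguage factor).
    candidates = [term, *relations.get("EQ", [])]
    for code in scope:
        candidates.extend(relations.get(code, []))
    items = set()
    for candidate in candidates:
        items.update(OCCURRENCES.get(candidate, []))
    return sorted(items)


first_test = search("Bus Station")                   # no scope criterion
second_test = search("Bus Station", scope=("TR",))   # related terms in scope
```

Here `first_test` holds Items 4 and 5 and `second_test` holds Items 4, 5 and 6, in line with the results described in the text.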

The third test simulates a situation in which the term given (“Transportation”) is not indexed, and the scope criterion refers to its specific terms. The search returns four resources (Items 1, 4, 5 and 6 from Table 1). Comparing the result with the contents of Tables 1 and 2 shows that, in addition to the scope criterion, the multilanguage factor was used in the search. Following the procedure of the previous tests, the term was entered in the “keywords” option of the other interface and returned only one resource (Item 3 from Table 1). It was not possible to analyze the implementation of the search engine behind this SDI interface, but by analogy with other tests performed, it is believed that the engine recognized the occurrence of the term “Transportation” in the keyword “Emergency Medical Transportation”.
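The behaviour attributed to the SDI's keyword option is consistent with a plain substring match. The sketch below is a guess at that behaviour for comparison purposes only, since the implementation of the SDI search engine could not be inspected: the function name `keyword_search` and the case-insensitive matching are assumptions. The data are Table I.

```python
# Table I (resource base): metadata identifier -> keywords.
RESOURCE_BASE = {
    1: ["Subway Station", "Viçosa", "Brazil"],
    2: ["Sanitation", "Viçosa", "Brazil"],
    3: ["Health", "Emergency Medical Transportation", "Viçosa", "Brazil"],
    4: ["Bus Station", "Viçosa", "Brazil"],
    5: ["Rodoviária", "Viçosa", "Brazil"],
    6: ["Bus Stops", "Viçosa", "Brazil"],
}


def keyword_search(query):
    """Case-insensitive substring match against each record's keywords."""
    needle = query.casefold()
    return sorted(
        item
        for item, keywords in RESOURCE_BASE.items()
        if any(needle in keyword.casefold() for keyword in keywords)
    )


bus_station_hits = keyword_search("Bus Station")
transportation_hits = keyword_search("Transportation")
```

Under this assumption "Bus Station" matches only Item 4 and "Transportation" matches only Item 3, the same single results reported for the SDI interface. A purely lexical match finds neither the equivalent term "Rodoviária" nor any related or specific term.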

Figure 5. Query Interface

VI. CONCLUSION

From the studies and research performed, an IRS for metadata catalogs was specified. The paper described a search engine that uses terminological ontologies, topic maps and visual techniques. These artifacts contribute to the user's understanding of the context to be consulted and influence the search engine. In addition, they can facilitate citizens' voluntary interaction with the geographical data, allowing the community to contribute to urban planning by providing information about the city’s infrastructure.

The assessment of the IRS seeks to show solutions for some semantic questions, which distinguish this proposal from the regular search engine of an SDI. It is not an objective to discuss criteria related to the usability of the interfaces presented, which is outside the scope of this paper and would demand research into the techniques of the Human-Computer Interaction discipline. What set the two interfaces apart, and what was taken into account in the development of the IRS, were the features that help the user understand the context to be consulted.

ACKNOWLEDGMENT

This project was partially funded by FAPEMIG.

REFERENCES

[1] F. Carrera, J. Ferreira Jr. The future of spatial data infrastructures: capacity-building for the emergence of municipal SDIs. International Journal of Spatial Data Infrastructures Research, v. 2, p. 54-73, 2007.

[2] M. F. Goodchild. Citizens as voluntary sensors: spatial data infrastructure in the world of Web 2.0. International Journal of Spatial Data Infrastructures Research, v. 2, p. 24-32, 2007.

[3] M. F. Goodchild. Commentary: whither VGI? GeoJournal, v. 72, n. 3-4, p. 239-244, 2008a.

[4] H. Hochmair. Ontology matching for spatial data retrieval from Internet portals. In: GEOSPATIAL SEMANTICS – LNCC 3799, 2005, Mexico City. Proceedings... Mexico City, Mexico, 2005. p. 166-182.

[5] C. A. Davis Jr., L. L. Alves. Local spatial data infrastructures based on a service-oriented architecture. In: BRAZILIAN SYMPOSIUM ON GEOINFORMATICS. Proceedings... [S.l. : s.n.], 2005. p. 30-45.

[6] T. Gruber. Towards principles for the design of ontologies used for knowledge sharing. Presented at the Padua workshop on Formal Ontology, March 1993, later published in International Journal of Human-Computer Studies, Vol. 43, Issues 4-5, November 1995.

[7] J. Sowa. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, 2000. ISBN 0-534-94965-7.

[8] H. H. Rath. The topic maps handbook. Gütersloh, Germany: Empolis, 2003. Savoy, J.; Picard, J. Retrieval effectiveness on the Web. In: Information Processing and Management, v. 37, n. 4, p. 543-569, 1998.

[9] ISO 13250. Topic maps. 2. ed. [S.l.]: International Organization for Standardization (ISO), 2002.

[10] L. M. Garshol. Metadata? Thesauri? Taxonomies? Topic maps! Making sense of it all. J. of Information Science, v. 30, n. 4, p. 378-391, 2004.

[11] C. Kuhlthau. Inside the search process: information seeking from the user´s perspective. Journal of the American Society for Information Science, v. 42, n. 5, p. 361-371, 1990.

[12] B. L. Grand, M. Soto. Visualisation of the semantic Web: topic maps visualisation. In: International Conference on Information Visualisation (IV’02), 6., p. 344-349, 2002.

[13] J. L. Miguel. Contributions to the problem of knowledge management in Spatial Data Infrastructures. 2009. 215 f. Ph.D. Dissertation (Computer Science and Systems Engineering Department) – Universidad de Zaragoza, Spain.
