
IAALD AFITA WCCA2008 WORLD CONFERENCE ON AGRICULTURAL INFORMATION AND IT

Development of Ontological Information for Agriculture in Thailand

Masahiko Nagai 1, Teerayut Horanont 1, Thepchai Supnithi 2, Asanee Kawtrakul 3, Kulapramote Prathumchai 4, Ryosuke Shibasaki 1

1 The University of Tokyo, Japan, [email protected]
2 National Electronics and Computer Technology Center, Thailand
3 Kasetsart University, Thailand
4 Asian Institute of Technology, Thailand

Abstract

Large volumes of data of many different types are available for agricultural study. Different organizations and countries use different data models and names for these data. Many standardization organizations are working toward syntactic interoperability, but semantic interoperability should be considered at the same time. In this study, lexicographic ontology and geographic ontology are applied as reference information for data interoperability. To collect the ontological information, a system is developed using Semantic MediaWiki, which allows semantic data to be encoded within wiki pages. Semantic MediaWiki suits ontology development because it is simple to set up and easy to browse and modify. Ontological information for agriculture is registered and updated through this system. The constructed knowledge of terminologies and their relations is used for data interoperability, and it should be maintained sustainably as the ontology is updated. The ontological information must also be reliable, so collaboration with scientific societies is necessary. It is very important to provide more sophisticated and more user-friendly tools for sharing ontological information.

Keywords: ontology, data interoperability, semantic network dictionary

Introduction

“Ontology” was originally a philosophical term for the branch of metaphysics that deals with the nature of being. In the context of knowledge sharing, however, “ontology” now means a specification of a conceptualization. That is, an ontology is a description of the concepts and relationships that can exist for a community or a particular field. This definition is consistent with the usage of ontology as a set of concept definitions, but is more general (Smith, 2003). To integrate or share information, ontological information is collected, managed, and compared; for example, data dictionaries, classification schemata, terminologies, thesauri, and their relations are handled. Data sharing and data services make better use of data by supporting data retrieval, metadata design, and information mining.

A semantic network dictionary is proposed as an ontology dictionary for agriculture. Dictionaries and data models are added to the system, a “knowledge writing tool” is developed for experts, and semantic relations are extracted from authoritative documents with natural language processing techniques. Generally, an ontology is applied to a strict and well-defined purpose, such as a task ontology (Kitamura et al., 2004), but in this study the ontology is not restricted in this way and is applied as reference information for data interoperability. Ontological information is classified into two groups, lexicographic ontology and geographic ontology, as shown in Fig. 1.

Data are diverse, and each dataset has its own definition of data. Those definitions are described as schemata, for example a land use data schema or a climate data schema. Different data schemata use different data names. By referring to the lexicographic ontology, associations between data can be estimated and sometimes successfully established. As long as interoperability concerns the definition of the data itself, the lexicographic ontology is applied. If instead it concerns the location of the observation site, a dictionary of geographic information, such as spatial coordinates and place names, is needed to establish associations between data; this kind of ontology is called geographic ontology.
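As a rough illustration of how a lexicographic ontology can associate differently named data across schemata, the following minimal sketch matches data names through shared synonym sets. The synonym sets, schema field names, and function names are all assumptions made for this example; they are not taken from the system described in this paper.

```python
# Hypothetical synonym sets standing in for a lexicographic ontology.
synonym_sets = [
    {"precipitation", "rainfall", "rain"},
    {"land use", "landuse", "land utilization"},
]


def canonical(name):
    """Map a data name to a fixed representative of its synonym set."""
    key = name.strip().lower()
    for group in synonym_sets:
        if key in group:
            return min(group)  # any fixed choice works; min() is deterministic
    return key


def associate(schema_a, schema_b):
    """Pair data names from two schemata that share a canonical form."""
    index = {canonical(b): b for b in schema_b}
    return {a: index[canonical(a)] for a in schema_a if canonical(a) in index}


# Made-up data names for two climate data schemata.
climate_a = ["Rainfall", "Temperature"]
climate_b = ["precipitation", "wind speed"]
matches = associate(climate_a, climate_b)
print(matches)
```

Names without a counterpart are simply left unmatched, which mirrors the point above that associations can be estimated but are only sometimes established.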

Agricultural information has strong local characteristics arising from climate, culture, history, language, and so on. It is therefore very important to collect local agricultural information. In this study, a semantic network dictionary system is developed for information sharing using Semantic MediaWiki. It helps to gather local information and associate it with existing global ontologies such as AGROVOC, developed by the Food and Agriculture Organization of the United Nations (FAO).

[Figure 1: data integration by ontological information. Data schemata (socio-economic, population, health, climate, hydrology, land use, agriculture, crop yield) and a land use model are linked through the lexicographic ontology and the geographic ontology. An inset metadata diagram shows MetadataElement with CoreElement (MD_Metadata), ProfessionalElement, and OrganizationalElement.]

Fig. 1. Ontological information

Semantic Network Dictionary

In a semantic network dictionary, a term is expressed not only by its definition but also by its relations to other terms, such as synonym, is-a, and part-of. Entry words are handled as nodes, and relations between terms are handled as links. The semantic network dictionary and its usage have three characteristic requirements: reliability, simple structure, and easy browsing and modification.
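The node-and-link structure described above can be sketched as a small data structure. This is only an illustrative sketch: the class name, method names, and the sample entry and definition are assumptions for the example, not the implementation used in this study.

```python
from collections import defaultdict


class SemanticNetworkDictionary:
    """Entry words are nodes; typed relations between them are links."""

    def __init__(self):
        self.definitions = {}           # term -> definition text
        self.links = defaultdict(set)   # term -> {(relation, other term)}

    def add_term(self, term, definition=""):
        self.definitions[term] = definition

    def add_relation(self, source, relation, target):
        # Register both endpoints so that every link joins two known nodes.
        self.definitions.setdefault(source, "")
        self.definitions.setdefault(target, "")
        self.links[source].add((relation, target))

    def related(self, term, relation):
        """Return the terms linked from `term` by `relation`, sorted."""
        return sorted(t for r, t in self.links[term] if r == relation)


snd = SemanticNetworkDictionary()
# Made-up entry and definition, for illustration only.
snd.add_term("paddy field", "Flooded field used for growing rice.")
snd.add_relation("paddy field", "is-a", "agricultural land")
snd.add_relation("paddy field", "synonym", "rice field")
print(snd.related("paddy field", "is-a"))
```

Keeping the relation type on each link, rather than one table per relation, means new relation types can be added without changing the structure.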


First, the semantic network dictionary must be reliable, because users handle data by referring to its ontological information. If reliability is low, data interoperability is not achieved. To ensure reliability, reliable data sources should be used and the documentation of the data must be clear. In this study, reliable information is collected in collaboration with scientific societies and research institutes. Specialists provide lists of technical terms and the associations between those terms as ontological information. The data documentation is also made reliable by recording the authors and titles of the references. Specialists not only supply technical terms but also edit them to keep the dictionary up to date.

Secondly, the basic structure of the semantic network dictionary should be simple. The dictionary handles not only technical terms but also their relations. It must therefore be easy to obtain a large amount of data from various sources, which also saves labor in data construction. This is one of the key points in collecting ontological information. In this study, ontological information is collected in XML format.

Thirdly, the purpose of the semantic network dictionary is to support data interoperability, so it must be possible to refer to trans-disciplinary fields and to link to existing data or systems. The structure of the dictionary is simply a network of technical terms, so browsing is as simple as following hyperlinks in a web browser. It is also easy to add or edit links and nodes, to cut out certain parts of the dictionary, and to dump it in XML format.
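The paper does not specify the XML layout used for collection and dumping. The following minimal sketch assumes a simple layout (the element and attribute names `dictionary`, `entry`, `definition`, `relation`, `type`, and `target` are hypothetical) and shows that nodes and links can be written out and read back with Python's standard library.

```python
import xml.etree.ElementTree as ET

# Made-up entries: (term, definition, [(relation, target), ...]).
entries = [
    ("paddy field", "Flooded field used for growing rice.",
     [("is-a", "agricultural land"), ("synonym", "rice field")]),
    ("agricultural land", "Land used for crops or livestock.", []),
]


def dump_to_xml(entries):
    """Serialize nodes and links to an XML string (layout is assumed)."""
    root = ET.Element("dictionary")
    for term, definition, relations in entries:
        entry = ET.SubElement(root, "entry", name=term)
        ET.SubElement(entry, "definition").text = definition
        for relation, target in relations:
            ET.SubElement(entry, "relation", type=relation, target=target)
    return ET.tostring(root, encoding="unicode")


def load_from_xml(xml_text):
    """Read the same layout back into the tuple form used above."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("name"),
         e.findtext("definition", default=""),
         [(r.get("type"), r.get("target")) for r in e.findall("relation")])
        for e in root.findall("entry")
    ]


xml_text = dump_to_xml(entries)
print(xml_text)
```

Cutting out part of the dictionary then amounts to filtering `entries` before calling `dump_to_xml`.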

Semantic MediaWiki

To handle ontological information, the semantic network dictionary is built on Semantic MediaWiki (Leuf and Cunningham, 2001). Semantic MediaWiki allows users to freely create and edit content using any web browser. It is a feature-rich wiki implementation, as shown in Fig. 2. Semantic MediaWiki handles hyperlinks and has a simple text syntax for creating new pages and cross-links between terms. Content, including relations between terms, is marked up with tags, and adding relations by writing tags is not easy. Therefore, a table-like editor is developed in this study. It displays not only the definition but also the relations of a term in a table, as shown in Fig. 2, so that relations can be modified through the table without writing tags. A data management system is also developed to maintain the system users and entry words. This function is necessary to keep the data reliable. The data management system controls user access and working groups, and grants permission to add to or edit the semantic network dictionary.
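To illustrate why a table is easier to edit than tags, the sketch below converts table rows into wiki text and back. It relies on Semantic MediaWiki's inline annotation form `[[property::value]]`; the property names, the page layout, and both function names are assumptions for this example and do not describe the table editor developed in this study.

```python
import re


def rows_to_wikitext(definition, rows):
    """Render a definition plus (relation, target) rows as annotated wiki text."""
    lines = [definition, ""]
    for relation, target in rows:
        # [[property::value]] is Semantic MediaWiki's inline annotation form.
        lines.append(f"* {relation}: [[{relation}::{target}]]")
    return "\n".join(lines)


def wikitext_to_rows(text):
    """Recover (relation, target) rows from annotated wiki text."""
    return re.findall(r"\[\[([^:\]]+)::([^\]]+)\]\]", text)


# Made-up rows, as a user might enter them in a table.
rows = [("is-a", "agricultural land"), ("synonym", "rice field")]
page_text = rows_to_wikitext("Flooded field used for growing rice.", rows)
print(page_text)
```

With a round trip like this, the user only ever sees the rows, while the wiki page keeps the tagged form.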

Fig. 2. Semantic MediaWiki and Table editor


In this study, dictionaries and data schemata are collected to examine the semantic network dictionary. In addition to agricultural information, the collected trans-disciplinary dictionaries cover biology, civil engineering, earth science, soil science, meteorology, health science, and remote sensing. Moreover, landuse classification schemata are collected as a geographic ontology. These scientific fields are strongly related to agriculture and are necessary for understanding agricultural information.

Reverse Dictionary

The constructed ontological information is used for the reverse dictionary. The reverse dictionary finds the term for a concept from the definitions and associations of terms. It is based on GETA, a set of tools developed by the National Institute of Informatics, Japan, for manipulating large, high-dimensional sparse matrices for text retrieval. GETA is an engine for associative calculation such as similarity measurement (Takano et al., 2000). For example, if a user wants to know about “rice disease caused by insect”, the reverse dictionary returns a list of terms with similarity scores, such as “Orange Leaf Disease”, “Yellow Dwarf Disease”, “Gall Dwarf Disease”, “Grassy Stunt Disease”, and so on. The reverse dictionary supports data retrieval and information mining by calculating the similarity of items. It is also linked with an existing translation web service called LEXTRON, developed by the National Electronics and Computer Technology Center, Thailand. The agricultural ontological information is developed in Thai, but it can be searched with English keywords, as shown in Fig. 3.
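The idea of a reverse lookup can be sketched with a plain bag-of-words cosine similarity. This is not GETA and does not use its interface; it is a stand-in technique chosen for brevity. The entry names and definitions are made up for illustration, and the function names are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Lower-case bag of words; a real system would add stemming and weighting."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def reverse_lookup(query, definitions, top_n=3):
    """Rank entry words by similarity between the query and each definition."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), term) for term, text in definitions.items()]
    ranked = sorted(scored, reverse=True)[:top_n]
    return [(term, round(score, 3)) for score, term in ranked]


# Made-up definitions for illustration only; not taken from the dictionary.
definitions = {
    "Term A": "rice disease caused by insect vector",
    "Term B": "soil moisture measured by sensor",
    "Term C": "rice variety grown in lowland",
}
results = reverse_lookup("rice disease caused by insect", definitions)
print(results)
```

The query is a description rather than a name, and the result is a ranked list of entry words, which is the direction of lookup that makes the dictionary "reverse".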

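[Figure 3: foreign scientists enter English keywords (rice, disease, gray, bacteria) at the portal site of the University of Tokyo. The keywords are translated into Thai (ข้าว, โรค, เทา, เชื้อ) and passed to the reverse dictionary on MediaWiki, and the resulting term (โรคไหม้) is returned as “Rice Blast”.]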

Fig. 3. Reverse dictionary

KeyGraph Viewer

Graph representation is useful for comparing associations among different terms, as shown in Fig. 4. As an example, the landuse classification schemata of Thailand and Indonesia are compared. The term “water body” appears in both countries' classification schemata. At first glance the two “water body” classes look the same, but their level in the hierarchy differs between the schemata. In the Indonesian landuse classification, “water body” does not include water courses because it is a second-level class of the classification schema. In Thailand, however, “water body” includes all water-related land types because it is a top-level class of the classification schema. Consequently, graph representation provides a clear distinction between the two terms and schemata.
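The difference in hierarchy level can also be checked programmatically. The sketch below stores each classification schema as a child-to-parent mapping. Only “water body” and “water course” come from the text; the parent classes “top class X” and “top class Y” are placeholders, and the placement of “water course” in the Indonesian schema is an assumption.

```python
def depth(parent_of, term):
    """Number of ancestors above `term`; 0 means it is a top class."""
    level = 0
    while term in parent_of:
        term = parent_of[term]
        level += 1
    return level


def is_under(parent_of, term, ancestor):
    """True if `ancestor` appears on the path from `term` to the top class."""
    while term in parent_of:
        term = parent_of[term]
        if term == ancestor:
            return True
    return False


# Child -> parent mappings. Placeholder class names are hypothetical.
thailand = {"water course": "water body"}        # "water body" is a top class
indonesia = {"water body": "top class X",        # "water body" is a second class
             "water course": "top class Y"}

for name, schema in [("Thailand", thailand), ("Indonesia", indonesia)]:
    print(name, depth(schema, "water body"),
          is_under(schema, "water course", "water body"))
```

Two classes with the same name but different depths or different descendants are a signal that the terms should not be treated as equivalent without review.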

[Figure 4 panels: Landuse in Thailand; Landuse in Indonesia]

Fig. 4. Reverse dictionary

Discussion

In conclusion, many standardization organizations are working on the syntactic level of data interoperability, but semantic interoperability must be considered at the same time, given the heterogeneous conditions and diversified datasets of local agricultural information. Ontological information for the lexicographic ontology and the geographic ontology is developed with the proposed system. Collaboration with local scientists is very important for agriculture; it is therefore necessary to develop effective and simple tools for collecting reliable ontological information. The semantic network dictionary, based on Semantic MediaWiki, is developed to register and update ontological information. It should serve as a tool that supports scientists and specialists in their ontology development. To invite contributions from a user community of various local scientists, it is necessary to provide more sophisticated and user-friendly tools and systems for the sustainable development of ontological information.

References

Smith, B., 2003, Preprint version of chapter “Ontology”, in L. Floridi (ed.), Blackwell Guide to the Philosophy of Computing and Information, Oxford: Blackwell, pp. 155–166.

Kitamura, Y., Kashiwase, M., Fuse, M., and Mizoguchi, R., 2004, Deployment of an ontological framework of functional design knowledge, Advanced Engineering Informatics, Volume 18, Issue 2, pp. 115-127.

Leuf, B. and Cunningham, W., 2001, The Wiki Way: Quick Collaboration on the Web. Addison-Wesley, USA.

Takano, A., Niwa, Y., Nishioka, S., Iwayama, M., Hisamitsu, T., Imaichi, O., Sakurai, H., 2000, Information Access based on Associative Calculation, In Lecture Notes in Computer Science LNCS:1963, Springer.
