A Description Method for Scientific Data Based on KOS
Upload: aims-agricultural-information-management-standards-fao-of-the-un
Post on 13-Jan-2015
DESCRIPTION
Presentation held by Wei Sun and Xuefu Zhang at the Agricultural Ontology Service (AOS) Workshop 2012 in Kuching, Sarawak, Malaysia, September 3-4, 2012.
TRANSCRIPT
A DESCRIPTION METHOD FOR SCIENTIFIC DATA BASED ON KOS
Wei Sun, Xuefu Zhang
AII, CAAS
OUTLINE
Background
Research Objectives
Description Schema for Scientific Data
Process for Establishing the Scientific Data Description Schema
Key Technology for Constructing the Scientific Data Description Scheme
Empirical analysis
Conclusion
BACKGROUND
Current related research has not described scientific data in sufficient depth, which in turn limits the efficiency of integrated knowledge discovery based on scientific data alone or on scientific data combined with other resources.
Resource description is an important part of Knowledge Organization. In general, KOS can improve the organization of scientific data and refine its description granularity at the semantic level.
RESEARCH OBJECTIVES
Based on the above considerations, the paper proposes an integrated description method for scientific data based on the knowledge organization system (KOS), drawing on the faceted classification and ontology in KOS.
Taking agricultural scientific data as an example, the paper verifies the feasibility of the method, which lays a resource base for further improving the efficiency of integrated knowledge discovery based on scientific data.
DESCRIPTION SCHEMA FOR SCIENTIFIC DATA
Faceted Classification Description for Scientific Data
Concept Mapping for Scientific Data Entity
DESCRIPTION SCHEMA FOR SCIENTIFIC DATA
Fig. 1. Description schema for scientific data: (a) faceted classification description structure and (b) domain ontology structure, linked by concept mapping. Labels in panel (a): dummy root node, facet, term, F, mF1, sF1, sF2, T1 to T5, term space of sF2.
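The structure in panel (a) of Fig. 1 can be pictured as a small tree of facets, each with its own term space. The sketch below is only an illustration under stated assumptions: the class name `Facet`, the reading of `F` as the dummy root node, the assignment of T1 to T5 to the two sub-facets, and the ontology concept names in `concept_mapping` are all hypothetical and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Facet:
    """A node in a faceted classification tree (illustrative names only)."""

    name: str
    sub_facets: list[Facet] = field(default_factory=list)
    terms: list[str] = field(default_factory=list)  # term space of this facet

    def term_space(self) -> list[str]:
        """Collect the terms of this facet and of all its sub-facets."""
        collected = list(self.terms)
        for sub in self.sub_facets:
            collected.extend(sub.term_space())
        return collected


# Assumption: which terms belong to which sub-facet is not stated in the source.
sf1 = Facet("sF1", terms=["T1", "T2", "T3"])
sf2 = Facet("sF2", terms=["T4", "T5"])
mf1 = Facet("mF1", sub_facets=[sf1, sf2])
root = Facet("F", sub_facets=[mf1])  # assumed to play the dummy root role

# Hypothetical concept mapping from description terms to ontology concepts.
concept_mapping = {"T4": "onto:ConceptA", "T5": "onto:ConceptB"}
```

Calling `root.term_space()` walks the whole tree, while `sf2.term_space()` returns only the terms of that sub-facet.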
PROCESS FOR ESTABLISHING THE SCIENTIFIC DATA DESCRIPTION SCHEMA
Seven steps
“confirming the scope of the description”
“clarifying the purpose of the description schema”
“selection and construction of the facets”
“term extraction and construction of the term space”
“indexing of facets, terms and scientific data entities”
“concept mapping from the description structure for scientific data to the domain ontology”
“maintenance of the scientific data description based on the faceted classification”
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅰ)
Selection and Construction of the Facets
Facet Analysis.
Facet Reduction.
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅱ)
Term Extraction
Term Extraction Based on Numeric Attribute Variable.
Term Extraction Based on Text-type Attribute Variable.
Term Extraction Based on Mixed-type Attribute Variable.
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅱ)
Construction of the Term Space (Linguistic Value Mapping of the Term Space)
Fig. 2. Attribute division mode and linguistic value mapping for different attributes. Labels in the figure: attribute variable, numeric attribute, discrete numeric attribute, continuous numeric attribute, wide threshold attribute, narrow threshold attribute, text-type attribute, conceptual attribute, mixed-type attribute (numeric attribute, text-type attribute). Mapping modes: mapping after discretization, mapping directly.
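The two mapping modes named in Fig. 2 can be illustrated with a minimal sketch. It assumes that continuous numeric attributes are the ones mapped after discretization and that text-type values are mapped directly; the flattened figure does not confirm this assignment. The cut points, the linguistic values, and both function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut points and linguistic values for one continuous numeric
# attribute; none of these numbers or labels come from the source.
CUT_POINTS = [2.0, 4.0, 6.0]
LINGUISTIC_VALUES = ["low", "medium", "high", "very high"]


def map_after_discretization(value: float) -> str:
    """Find the interval that contains `value` and return its linguistic value."""
    return LINGUISTIC_VALUES[bisect_right(CUT_POINTS, value)]


def map_directly(value: str) -> str:
    """Use a text-type value as the term after whitespace and case normalization."""
    return " ".join(value.split()).lower()
```

With these cut points, a value below 2.0 falls into the first interval and a value of 6.0 or more falls into the last one; a value equal to a cut point is assigned to the interval that starts there.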
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)
Index for Scientific Data Entity
Rule 1. If the attribute variable extracted from any field is a concept attribute or a narrow threshold attribute, then exact retrieval is conducted on the concept attribute (or narrow threshold attribute), or on the combination of that attribute and its facet. The retrieval result is regarded as the index words for the scientific data entity d.
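Rule 1 amounts to an exact-match lookup in the term space, optionally restricted to one facet. The following is a minimal sketch of that idea, not the authors' implementation; the facet names, the terms, the `facet/term` form of the index words, and the function name `index_exact` are all assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical term space: facet name -> terms under that facet.
TERM_SPACE = {
    "pest": ["aphid", "diamondback moth"],
    "host crop": ["cabbage", "spinach"],
}


def index_exact(value: str, facet: str | None = None) -> list[str]:
    """Return index words for `value` by exact match, optionally within one facet."""
    facets = [facet] if facet is not None else list(TERM_SPACE)
    key = value.strip().lower()
    return [
        f"{name}/{term}"
        for name in facets
        for term in TERM_SPACE.get(name, [])
        if term.lower() == key
    ]
```

Passing a facet corresponds to retrieval on the combination of the attribute and its facet; omitting it searches every facet.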
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)
Index for Scientific Data Entity
Rule 2. If the attribute variable extracted from any field is a wide threshold attribute variable or a non-interval continuous attribute, then the attribute value should be mapped to an interval value of the faceted classification structure. The interval value it maps to is regarded as an index for the scientific data entity d.
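Rule 2 can be read as a point-in-interval lookup. The sketch below assumes half-open intervals `[lower, upper)` and invented interval labels; the slides do not specify how interval boundaries are handled, and the function name `index_by_interval` is hypothetical.

```python
from __future__ import annotations

# Hypothetical interval values of one facet: (lower, upper, label),
# treated here as half-open intervals [lower, upper).
INTERVALS = [
    (0.0, 10.0, "0-10"),
    (10.0, 20.0, "10-20"),
    (20.0, 30.0, "20-30"),
]


def index_by_interval(value: float) -> str | None:
    """Return the label of the interval containing `value`, or None if none does."""
    for lower, upper, label in INTERVALS:
        if lower <= value < upper:
            return label
    return None
```

A value outside every interval yields `None`, which a caller could treat as a signal that the term space needs a new interval.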
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)
Index for Scientific Data Entity
Rule 3. If the attribute variable extracted from any field is a continuous interval attribute, then the midpoint value of the interval attribute should be mapped to an interval value in the faceted classification structure. The interval value it maps to is regarded as an index for the scientific data entity d.
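Rule 3 reduces an interval-valued attribute to its midpoint and then looks that midpoint up among the facet's interval values. A minimal sketch, assuming half-open facet intervals and hypothetical labels and function names:

```python
from __future__ import annotations

# Hypothetical interval values of one facet: (lower, upper, label),
# treated here as half-open intervals [lower, upper).
FACET_INTERVALS = [
    (0.0, 10.0, "0-10"),
    (10.0, 20.0, "10-20"),
    (20.0, 30.0, "20-30"),
]


def index_interval_attribute(low: float, high: float) -> str | None:
    """Map the midpoint of [low, high] to the facet interval that contains it."""
    if low > high:
        raise ValueError("low must not exceed high")
    midpoint = (low + high) / 2.0
    for lower, upper, label in FACET_INTERVALS:
        if lower <= midpoint < upper:
            return label
    return None
```

For example, an attribute recorded as the interval 12 to 26 has midpoint 19 and is indexed under the "10-20" interval in this sketch, even though the attribute itself spans two facet intervals.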
KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅳ)
Concept Mapping for Scientific Data Entity
Upward Matching Principle for Scientific Data Description Term.
Downward Matching Principle for Term in Domain Ontology.
EMPIRICAL ANALYSIS
Data Resource
No. | Data class | Name of the database | Name of the sub-database affiliated to the database | Time obtained
1 | Agricultural resource and environmental science | Database for leaf vegetable pests in China | Sub-database of the database for national agricultural pests | 2009.10
2 | Agricultural resource and environmental science | Database for the water demands of the reference crop | Sub-database of the database for national irrigation experiment | 2009.10
3 | Agricultural science base | Database for the dynamic development of China’s agricultural science | Database for the dynamic development of agricultural science | 2009.10
Table 1. List of the scientific data resources
EMPIRICAL ANALYSIS
Result Demonstration for Method
Fig. 3. Example of faceted logic structure description document for scientific data. Callouts in the figure: a facet, separator between each level of facets, identifier of the facet’s specific location.
EMPIRICAL ANALYSIS
Result Demonstration for Method
Fig. 4. Term logic structure description document after linguistic value mapping. Callouts in the figure: a term, separator between each level of terms, identifier of the term’s specific location.
EMPIRICAL ANALYSIS
Result Demonstration for Method
Fig. 5. Screenshot of the index result of a scientific data entity. Callouts in the figure: the name of a scientific data entity, descriptor, a term.
EMPIRICAL ANALYSIS
Mapping match classification based on the matching degree | Mapping match classification based on the conceptual level | Mapping intensity | Matching number
Neighbor matching | Complete matching | Strong | 2473
Neighbor matching | Incomplete matching | Relatively strong | 3050
Distant matching | Complete matching | Relatively weak | 231
Distant matching | Incomplete matching | Weak | 788
Table 2. Concept mapping statistics among different types of scientific data entities
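The counts in Table 2 sum to 6542 mappings. The short calculation below shows how the share of each mapping class, and of neighbor matching overall, can be derived from those counts. The variable names are illustrative, and the percentages are computed from the table rather than reported in the slides.

```python
# Matching numbers copied from Table 2, keyed by the two classifications.
counts = {
    ("neighbor", "complete"): 2473,
    ("neighbor", "incomplete"): 3050,
    ("distant", "complete"): 231,
    ("distant", "incomplete"): 788,
}

total = sum(counts.values())

# Percentage share of each class, rounded to one decimal place.
shares = {key: round(100 * n / total, 1) for key, n in counts.items()}

# Combined count for the two neighbor-matching rows.
neighbor_total = sum(n for (degree, _), n in counts.items() if degree == "neighbor")

for key, share in shares.items():
    print(key, f"{share}%")
print("neighbor matching overall:", f"{100 * neighbor_total / total:.1f}%")
```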
CONCLUSION
The integrated description method for scientific data put forward in the paper has improved the description efficiency for scientific data, in terms of both description granularity and the closeness of the semantic relations between entities.
Limited by time, length and experimental conditions, this research still has certain shortcomings.
(i) The method has only been tested on agricultural scientific data. It should be further improved and verified on other fields or wider data sources.
(ii) Semantic Web technologies have not yet been adopted to standardize the description result. In the next step, ontology will be introduced and the description result will be standardized using RDF.
Thanks!