A DESCRIPTION METHOD FOR SCIENTIFIC DATA BASED ON KOS Wei Sun, Xuefu Zhang AII, CAAS


DESCRIPTION

Presentation given by Wei Sun and Xuefu Zhang at the Agricultural Ontology Service (AOS) Workshop 2012 in Kuching, Sarawak, Malaysia, September 3-4, 2012

TRANSCRIPT

Page 1: A Description Method for Scientific Data Based on KOS

A DESCRIPTION METHOD FOR SCIENTIFIC DATA BASED ON KOS

Wei Sun, Xuefu Zhang

AII, CAAS

Page 2: A Description Method for Scientific Data Based on KOS

OUTLINE

Background

Research Objectives

Description Schema for Scientific Data

Process for establishing the scientific data description schema

Key Technology for Constructing the Scientific Data Description Scheme

Empirical analysis

Conclusion

Page 3: A Description Method for Scientific Data Based on KOS

BACKGROUND

Current research does not describe scientific data in sufficient depth, which reduces the efficiency of integrated knowledge discovery based on scientific data alone or on scientific data combined with other resources.

Resource description is an important part of knowledge organization. In general, a KOS can improve the semantic organization and description granularity of scientific data.

Page 4: A Description Method for Scientific Data Based on KOS

RESEARCH OBJECTIVES

Based on these considerations, the paper proposes an integrated description method for scientific data based on the knowledge organization system (KOS), drawing on faceted classification and ontology in KOS.

Taking agricultural scientific data as an example, the paper verifies the feasibility of the method, which lays a resource base for further improving the efficiency of integrated knowledge discovery based on scientific data.

Page 5: A Description Method for Scientific Data Based on KOS

DESCRIPTION SCHEMA FOR SCIENTIFIC DATA

Faceted Classification Description for Scientific Data

Concept Mapping for Scientific Data Entity

Page 6: A Description Method for Scientific Data Based on KOS

DESCRIPTION SCHEMA FOR SCIENTIFIC DATA

Fig. 1. Description schema for scientific data: (a) Faceted Classification Description Structure, (b) Domain Ontology Structure. Labels in the figure: Concept Mapping; F, mF1, sF1, sF2; T1 to T5; Term Space of sF2; Dummy Root Node; Facet; Term.
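The structure named in Fig. 1 can be pictured as a small tree plus a lookup table. The following Python sketch is only an illustration under assumptions: the class name Facet, the placement of sF1 and sF2 under mF1, the assignment of T1 to T5 to the two sub-facets, and the ontology identifiers are all hypothetical and are not given in the slides.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Facet:
    """A node of the faceted classification; a facet may own a term space."""
    name: str
    sub_facets: list[Facet] = field(default_factory=list)
    terms: list[str] = field(default_factory=list)

    def paths(self, prefix=()):
        """Yield (facet path, term) for every term at or below this node."""
        here = prefix + (self.name,)
        for term in self.terms:
            yield here, term
        for sub in self.sub_facets:
            yield from sub.paths(here)


# Assumed layout: F is taken as the dummy root, mF1 as a facet with two sub-facets.
s_f1 = Facet("sF1", terms=["T1", "T2", "T3"])
s_f2 = Facet("sF2", terms=["T4", "T5"])
root = Facet("F", sub_facets=[Facet("mF1", sub_facets=[s_f1, s_f2])])

# Concept mapping: description term -> domain ontology concept (hypothetical IDs).
concept_mapping = {"T4": "onto:ConceptA", "T5": "onto:ConceptB"}

# Term space of sF2, collected by walking the tree from the dummy root.
term_space_sf2 = [term for path, term in root.paths() if path[-1] == "sF2"]
```

Keeping the concept mapping in a separate table leaves the faceted structure independent of any particular domain ontology, which is one possible reading of the two-part schema.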

Page 7: A Description Method for Scientific Data Based on KOS

PROCESS FOR ESTABLISHING THE SCIENTIFIC DATA DESCRIPTION SCHEMA

Seven steps

“confirming the scope of description”

“making clear purpose of the description schema”

“selection and construction of the facets”

“term extraction and construction of the term space”

“index for facets, terms and scientific data entities”

“concept mapping description structure for scientific data to domain ontology”

“maintenance for the scientific data description based on the faceted classification”

Page 8: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅰ)

Selection and Construction of the Facets

Facet Analysis.

Facet Reduction.

Page 9: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅱ)

Term Extraction

Term Extraction Based on Numeric Attribute Variable.

Term Extraction Based on Text-type Attribute Variable.

Term Extraction Based on Mixed-type Attribute Variable.

Page 10: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅱ)

Construction of the Term Space (Linguistic Value Mapping of the Term Space)

Fig. 2. Attribute division mode and linguistic value mapping for different attributes. Labels in the figure: Attribute Variable; Numeric Attribute; Discrete Numeric Attribute; Continuous Numeric Attribute; Text-type Attribute; Conceptual Attribute; Wide Threshold Attribute; Narrow Threshold Attribute; Mixed-type Attribute (Numeric Attribute, Text-type Attribute); Mapping After Discretization; Mapping Directly.
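The two mapping modes named in Fig. 2 can be illustrated with a short Python sketch. It assumes, without confirmation from the slides, that numeric values are the ones mapped after discretization and that text values are mapped directly; the equal-width binning, the function names and the labels "low", "medium", "high" are hypothetical choices made only for the example.

```python
def discretize(value, low, high, labels):
    """Map a numeric value to a linguistic label using equal-width bins.

    Equal-width binning over [low, high] is an assumption of this sketch;
    the slides do not state which discretization is used.
    """
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    width = (high - low) / len(labels)
    # The upper bound belongs to the last bin, hence the min().
    index = min(int((value - low) / width), len(labels) - 1)
    return labels[index]


def linguistic_value(value, low=0.0, high=1.0, labels=("low", "medium", "high")):
    """Map text values directly; map numeric values after discretization."""
    if isinstance(value, str):
        return value.strip().lower()  # direct mapping, normalised spelling only
    return discretize(float(value), low, high, labels)


# Hypothetical attribute values, not taken from the databases in the talk.
temperature_term = linguistic_value(12.5, low=0.0, high=30.0)
pest_term = linguistic_value("  Aphid ")
```

Here 12.5 falls in the second of three bins of width 10, so it maps to "medium", while the text value is only normalised and kept as its own term.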

Page 11: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)

Index for Scientific Data Entity

Rule 1. If the attribute variable extracted from a field is a conceptual attribute or a narrow threshold attribute, exact retrieval is performed on the attribute value, or on the combination of the attribute value and its facet. The retrieval result is taken as the index terms for the scientific data entity d.
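Rule 1 amounts to an exact lookup. The Python sketch below is one possible reading, assuming the term space is held as a mapping from facet name to a set of terms; the function name index_exact, the lookup order (facet plus value first, then value alone) and the sample facets and terms are hypothetical.

```python
def index_exact(value, facet, term_space):
    """Return index terms for a conceptual or narrow threshold attribute value.

    term_space maps a facet name to the set of terms in that facet.
    The combination (facet, value) is tried first; if it is not found,
    the value alone is looked up in every facet.
    """
    key = value.strip().lower()
    if key in term_space.get(facet, set()):
        return [(facet, key)]
    return sorted((f, key) for f, terms in term_space.items() if key in terms)


# Hypothetical term space for illustration only.
term_space = {
    "pest": {"aphid", "cabbage moth"},
    "host crop": {"cabbage", "spinach"},
}

by_facet = index_exact("Aphid ", "pest", term_space)
by_value = index_exact("spinach", "pest", term_space)
no_match = index_exact("rice", "pest", term_space)
```

An empty result means the value has no exact match in the term space, so no index term is assigned under this rule.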

Page 12: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)

Index for Scientific Data Entity

Rule 2. If the attribute variable extracted from a field is a wide threshold attribute or a non-interval continuous attribute, the attribute value is mapped to the interval values of the faceted classification structure. The interval value matched in the faceted classification structure is taken as the index for the scientific data entity d.

Page 13: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅲ)

Index for Scientific Data Entity

Rule 3. If the attribute variable extracted from a field is a continuous interval attribute, the midpoint of the interval is mapped to the interval values in the faceted classification structure. The interval value matched in the faceted classification structure is taken as the index for the scientific data entity d.
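Rules 2 and 3 both end in the same step, finding the facet interval that contains a number; Rule 3 only adds the midpoint computation first. The Python sketch below shows that shared step under assumptions: the interval boundaries, the half-open convention [lo, hi) with a closed last interval, and the function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical interval boundaries of one facet: [0, 10), [10, 20), [20, 30].
BOUNDS = [0.0, 10.0, 20.0, 30.0]


def interval_for(value, bounds=BOUNDS):
    """Return the (lo, hi) facet interval containing value, or None."""
    if not bounds[0] <= value <= bounds[-1]:
        return None
    # bisect_right finds the first boundary greater than value;
    # the min() keeps the upper bound inside the last interval.
    i = min(bisect_right(bounds, value) - 1, len(bounds) - 2)
    return (bounds[i], bounds[i + 1])


def index_point(value, bounds=BOUNDS):
    """Rule 2: a single continuous value is mapped to its interval."""
    return interval_for(value, bounds)


def index_interval(lo, hi, bounds=BOUNDS):
    """Rule 3: an interval attribute is mapped through its midpoint."""
    return interval_for((lo + hi) / 2, bounds)


point_index = index_point(10.0)
interval_index = index_interval(8.0, 16.0)
```

With these boundaries the value 10.0 is indexed by (10.0, 20.0), and the interval from 8 to 16 has midpoint 12 and receives the same index even though it straddles a boundary.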

Page 14: A Description Method for Scientific Data Based on KOS

KEY TECHNOLOGY FOR CONSTRUCTING THE SCIENTIFIC DATA DESCRIPTION SCHEME (Ⅳ)

Concept Mapping for Scientific Data Entity

Upward Matching Principle for Scientific Data Description Term.

Downward Matching Principle for Term in Domain Ontology.

Page 15: A Description Method for Scientific Data Based on KOS

EMPIRICAL ANALYSIS

Data Resource

No. | Data class | Name of the database | Name of the sub-database affiliated to the database | Obtained time
1 | Agricultural resource and environmental science | Database for leaf vegetable pests in China | Sub-database of the database for national agricultural pests | 2009.10
2 | Agricultural resource and environmental science | Database for the water demands of the reference crop | Sub-database of the database for national irrigation experiment | 2009.10
3 | Agricultural science base | Database for the dynamic development of China’s agricultural science | Database for the dynamic development of agricultural science | 2009.10

Table 1. Name list of the scientific data resources

Page 16: A Description Method for Scientific Data Based on KOS

EMPIRICAL ANALYSIS

Result Demonstration for Method

Fig. 3. Example of faceted logic structure description document for scientific data. Callouts in the figure: Separator between Each Level of Facets; Identifier of Facet’s Specific Location; A Facet.

Page 17: A Description Method for Scientific Data Based on KOS

EMPIRICAL ANALYSIS

Result Demonstration for Method

Fig. 4. Term logic structure description document after linguistic value mapping. Callouts in the figure: Separator between Each Level of Terms; Identifier of Term’s Specific Location; Term.

Page 18: A Description Method for Scientific Data Based on KOS

EMPIRICAL ANALYSIS

Result Demonstration for Method

Fig. 5. Screenshot for the index result of a scientific data entity. Callouts in the figure: The Name of a Scientific Data Entity; Descriptor; A Term.

Page 19: A Description Method for Scientific Data Based on KOS

EMPIRICAL ANALYSIS

Mapping match classification based on the matching degree | Mapping match classification based on the conceptual level | Mapping intensity | Matching number
Neighbor matching | Complete matching | Strong | 2473
Neighbor matching | Incomplete matching | Relatively strong | 3050
Distant matching | Complete matching | Relatively weak | 231
Distant matching | Incomplete matching | Weak | 788

Table 2. Statistical table for the concept mapping among different types of scientific data entity
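The counts in Table 2 can be turned into shares with simple arithmetic. The Python sketch below uses only the four matching numbers from the table; the total and the percentages are derived here and are not figures stated in the slides.

```python
# Matching numbers from Table 2, keyed by (first column, second column).
counts = {
    ("Neighbor matching", "Complete matching"): 2473,
    ("Neighbor matching", "Incomplete matching"): 3050,
    ("Distant matching", "Complete matching"): 231,
    ("Distant matching", "Incomplete matching"): 788,
}

total = sum(counts.values())

# Share of each row in percent, rounded to one decimal place.
shares = {key: round(100 * n / total, 1) for key, n in counts.items()}

# All neighbor matches, i.e. the "strong" and "relatively strong" rows together.
neighbor_total = sum(n for (first, _), n in counts.items()
                     if first == "Neighbor matching")
neighbor_share = round(100 * neighbor_total / total, 1)
```

On these numbers the four rows sum to 6542 matches, and the two neighbor-matching rows account for 5523 of them, about 84.4 percent.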

Page 20: A Description Method for Scientific Data Based on KOS

CONCLUSION

The integrated description method for scientific data proposed in the paper improves the description of scientific data in terms of both description granularity and the closeness of the semantic relations between entities.

Limited by time, length and experimental conditions, this research still has shortcomings.

(i) The method has been tested only on agricultural scientific data. It should be further improved and verified on other fields or wider data sources.

(ii) Semantic Web technologies have not yet been used to standardize the description result. In the next step, ontology will be introduced and the description result will be standardized using RDF.

Page 21: A Description Method for Scientific Data Based on KOS

Thanks!