semantic web and retrieval of scientific data semantics

Semantic Web and Retrieval of Scientific Data Semantics. Goran Soldar, University of Brighton, UK; Dan Smith, University of East Anglia, UK.

Post on 05-Jan-2016

TRANSCRIPT

Page 1: Semantic Web and Retrieval of Scientific Data Semantics


Semantic Web and Retrieval of Scientific Data Semantics

Goran Soldar, University of Brighton, UK

Dan Smith, University of East Anglia, UK

Page 2: Semantic Web and Retrieval of Scientific Data Semantics

Introduction

Semantic Web
- Introduced by Tim Berners-Lee
- Data and resources described, interchanged, and processed
- Machine understanding of heterogeneous data

Most search engines on the Web are oriented towards human use
- Finding and processing scientific data on the Web is a time-consuming process

Example
- Search: Web pages containing the word "temperature"
- Search engine: Google
- Search domain: www.cru.uea.ac.uk
- Results: 773 web pages

Page 3: Semantic Web and Retrieval of Scientific Data Semantics

Introduction

Inefficiency of traditional search
- Humans have to browse through web pages
- No guarantee that the wanted information will be found

Preferred approach
- Describe the semantics of data using the RDF/XML format
- Store the data in a DBMS
- Automatically retrieve the desired information based on users' requests
- Enable client machines to learn the semantics of data described in RDF format

Page 4: Semantic Web and Retrieval of Scientific Data Semantics


Introduction

Objectives of this work

To address the problem of extracting semantics from data files within the meteorology domain.

To build the ontology for the meteorology domain.

To create semantic cases with RDF Model/RDF Schema.

To employ the DB2 DBMS as the data repository.

To enhance a standard DBMS with an RDF Triples Engine.

To manage the RDF graph structure with the RDF Triples Engine.

Page 5: Semantic Web and Retrieval of Scientific Data Semantics


RDF and Domain Ontology

RDF is a framework for describing metadata.

It enables interoperability between machines by interchanging information about information resources.

It is represented with a directed labeled graph.

Figure: RDF structure. Resource (Subject): File; Property (Predicate): Name; Value (Object): ltgrid.dat
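The RDF structure above can be written down as a small data structure. This is only an illustrative sketch in Python; the `Triple` type and its field names are assumptions made for the example and do not come from the slides.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One RDF statement, i.e. one labeled edge of the graph."""
    subject: str    # the resource being described
    predicate: str  # the property of that resource
    obj: str        # the value of the property

# The statement from the slide: resource "File" has property "Name" with value "ltgrid.dat".
statement = Triple(subject="File", predicate="Name", obj="ltgrid.dat")

# A directed labeled graph is then a collection of such edges.
graph = [statement]
for s, p, o in graph:
    print(f"{s} --{p}--> {o}")
```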

Page 6: Semantic Web and Retrieval of Scientific Data Semantics

RDF and Domain Ontology

Specific domains represented with RDF
- Our focus: the Meteorology domain
- The concepts, semantics and the relations between the concepts defined with RDF Schema
- Ontology: an explicit specification of an information domain

RDF Schema
- Uses the syntax of RDF Model
- Corresponds to XML's DTD or XML Schema
- RDF Schema is a basis for RDF instances

Page 7: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

Three phases of modelling
- Development of the vocabulary (ontology)
- Design of semantic cases to capture resource description
- Creation of semantic case instances

The vocabulary comprises the main concepts of the domain, represented by classes and properties

RDF Schema uses RDF Model encoding syntax

rdf:type distinguishes RDF classes from properties

rdfs:subClassOf allows expression of inheritance relationships between RDF classes
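How rdf:type and rdfs:subClassOf work together can be sketched with a toy schema held as triples. The class cru:Pressure and the subclass link are hypothetical, invented for this example; only cru:Height and cru:FormatType appear in the slides, and the real Class.rdf and Property.rdf files are not reproduced here.

```python
# (subject, predicate, object) triples for a toy schema.
# cru:Pressure and the subClassOf link are hypothetical.
SCHEMA = [
    ("cru:Pressure", "rdf:type", "rdfs:Class"),
    ("cru:Height", "rdf:type", "rdfs:Class"),
    ("cru:Height", "rdfs:subClassOf", "cru:Pressure"),
    ("cru:FormatType", "rdf:type", "rdf:Property"),
]

def kind(name, schema=SCHEMA):
    """Return the rdf:type of a schema term, or None if it is not declared."""
    for s, p, o in schema:
        if s == name and p == "rdf:type":
            return o
    return None

def is_subclass_of(child, ancestor, schema=SCHEMA):
    """Follow rdfs:subClassOf edges transitively from child towards ancestor."""
    seen, todo = set(), [child]
    while todo:
        current = todo.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in schema
                    if s == current and p == "rdfs:subClassOf")
    return False

print(kind("cru:Height"), kind("cru:FormatType"))
print(is_subclass_of("cru:Height", "cru:Pressure"))
```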

Page 8: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

The Meteorology domain at cru.sys.uea.ac.uk
- Contains about 1000 data files
- Comprises 9 meteorological topics (sub-domains)
- Has all sub-domains designed as RDF classes
- Has all concepts and elements defined in its namespace

The ontology is defined in two RDF files
- Class.rdf
- Property.rdf

Semantic cases are based on the existing vocabulary
- Simple semantic cases are designed first
- Complex cases are combinations of simple ones

Page 9: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

Our prototype model
- Describes 100 data sets
- Contains 4 semantic cases

The semantic cases
- HeaderCase: URL, FormatType, DataParameter, Comment, Domain
- SizeCase: Compression, FileSize, Value
- ObservationCase: Frequency, TimePeriod, Value
- PeriodCase: TimeRange, TimePeriod, Value

Page 10: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:URL>http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h</cru:URL>
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
    <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
    <rdfs:domain>cru:Height</rdfs:domain>
  </rdf:Description>

RDF Instance of HeaderCase for a data file

Page 11: Semantic Web and Retrieval of Scientific Data Semantics

From RDF to Relational Model

Our prototype model comprises 12 RDF files
- One holds semantic case descriptions
- Two hold RDF Schema descriptions
- Nine contain RDF instances of semantic cases

Management of RDF-described data
- W3C does not recommend any method for manipulating RDF triples
- RDF structure is similar to XML
- XML comes with APIs for data manipulation (SAX, DOM); RDF does not

Page 12: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

Figure: Mapping RDF model for Meteorology into RDBMS. Components shown: CRU Meteorological Domain, Ontology, Semantic Cases, SiRPAC, RDF Triples Model, RDF Triple Engine, DB2.

We utilise the RDF triple structure to manipulate the data
- XML parsers check the syntax of the RDF
- RDF parsers convert it into triples
- RDF tags are removed
- Triples are converted into the relational model
- Triples are stored in the DB2 DBMS
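The path from RDF/XML to relational rows can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the authors' implementation: Python's `xml.etree.ElementTree` plays the role of the XML and RDF parsers (the slides name SiRPAC), an in-memory SQLite database plays the role of DB2, and the `cru` namespace URI is a placeholder. Only flat descriptions with literal values are handled.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
CRU_NS = "http://example.org/cru#"  # placeholder, not the real CRU namespace

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter>GeopotentialHeight_AtPressure</cru:DataParameter>
    <rdfs:comment>6-Hourly GeopotentialHeight at 1000mb</rdfs:comment>
  </rdf:Description>
</rdf:RDF>"""

PREFIXES = {RDF_NS: "rdf", RDFS_NS: "rdfs", CRU_NS: "cru"}

def qname(tag):
    """Turn ElementTree's '{uri}local' form back into 'prefix:local'."""
    uri, local = tag[1:].split("}", 1)
    return f"{PREFIXES[uri]}:{local}"

def to_triples(xml_text):
    """Syntax-check the XML, then emit (property, resource, value) triples."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        resource = desc.get("about")
        for prop in desc:
            triples.append((qname(prop.tag), resource, (prop.text or "").strip()))
    return triples

def store(triples):
    """Load the triples into a three-column relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MetInstance (property TEXT, resource TEXT, value TEXT)")
    conn.executemany("INSERT INTO MetInstance VALUES (?, ?, ?)", triples)
    return conn

triples = to_triples(DOC)
conn = store(triples)
for row in conn.execute("SELECT property, value FROM MetInstance"):
    print(row)
```

The table name MetInstance and its three columns follow the relational table shown later in the slides; everything else is illustrative.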

Page 13: Semantic Web and Retrieval of Scientific Data Semantics

Modelling RDF Model for Meteorology

Figure: RDF architecture for retrieving semantic information. Components shown: Users; Applications; Web Interface; Web Server; Java Servlet Engine; Semantics Retrieval Language (SRL); Semantics Support Server; RDF Triple Engine; RDF Model; RDF Schema; SiRPAC; JDBC; DB2; Distributed Data and Information Sources for Meteorology (html, dbms, ascii file). Connections labelled: HTTP Requests, SRL Requests, TCP/IP.

Page 14: Semantic Web and Retrieval of Scientific Data Semantics

Retrieval of Semantic Information

RDF Triple Engine (RTE) is responsible for manipulating triples and executing semantic queries
- Based on a client/server architecture with specialised RDF servers

Records in the DBMS have a graph structure
- They are not semantically atomic

Additional query processing added to the RTE
- The RTE is aware of the graph structure of triples
- It is able to produce results that reconstruct the graph structure and present them in the format specified by users

Page 15: Semantic Web and Retrieval of Scientific Data Semantics

Retrieval of Semantic Information

Relational structure of the RDF graph

Property   Resource     Value
frequency  temperature  daily
domain     temperature  weather
recorded   temperature  file
name       file         ltgrid.dat
url        file         www.cru.uea.ac.uk
size       file         size_id
value      size_id      40
unit       size_id      Kb

Figure: RDF graph for the Weather domain. Nodes: temperature, weather, daily, file, ltgrid.dat, www.cru.uea.ac.uk, size_id, 40 Kb. Edge labels: domain, frequency, recorded, name, url, size, value, unit.
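Because a value in one row can itself be the resource of other rows, a flat row lookup does not give the full description of a resource. The following Python sketch shows one way the graph could be rebuilt from the rows of the Weather example; the function name `describe` and the nested-dictionary output are assumptions for illustration, not the RTE's actual interface.

```python
# (property, resource, value) rows from the Weather example.
TRIPLES = [
    ("frequency", "temperature", "daily"),
    ("domain", "temperature", "weather"),
    ("recorded", "temperature", "file"),
    ("name", "file", "ltgrid.dat"),
    ("url", "file", "www.cru.uea.ac.uk"),
    ("size", "file", "size_id"),
    ("value", "size_id", "40"),
    ("unit", "size_id", "Kb"),
]

def describe(resource, triples, seen=frozenset()):
    """Rebuild the sub-graph rooted at `resource` as nested dictionaries.

    A value that also occurs in the Resource column is non-atomic and is
    expanded recursively; `seen` guards against cycles.
    """
    resources = {res for _, res, _ in triples}
    result = {}
    for prop, res, value in triples:
        if res != resource:
            continue
        if value in resources and value not in seen:
            result[prop] = {value: describe(value, triples, seen | {resource})}
        else:
            result[prop] = value
    return result

print(describe("file", TRIPLES))
```

Here the row ("size", "file", "size_id") is expanded into the value and unit of size_id, which is the kind of graph awareness the RTE adds on top of plain SQL.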

Page 16: Semantic Web and Retrieval of Scientific Data Semantics

Retrieval of Semantic Information

Property           Resource                          Value
cru:URL            hgt.1958.1000.6h.w1.53x21.dat.gz  http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h
cru:FormatType     hgt.1958.1000.6h.w1.53x21.dat.gz  ASCII
cru:DataParameter  hgt.1958.1000.6h.w1.53x21.dat.gz  GeopotentialHeight_AtPressure
rdfs:comment       hgt.1958.1000.6h.w1.53x21.dat.gz  6-Hourly GeopotentialHeight at 1000mb
rdfs:domain        hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height
rdf:type           cru:Height#genid2                 rdf:Seq
rdf:_1             cru:Height#genid2                 Compressed
rdf:_2             cru:Height#genid2                 Kilobyte
rdf:_3             cru:Height#genid2                 2593
cru:size           hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid2
rdf:type           cru:Height#genid3                 rdf:Seq
rdf:_1             cru:Height#genid3                 Frequency
rdf:_2             cru:Height#genid3                 Hour
rdf:_3             cru:Height#genid3                 6
cru:observation    hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid3
rdf:type           cru:Height#genid4                 rdf:Seq
rdf:_1             cru:Height#genid4                 TimeRange
rdf:_2             cru:Height#genid4                 Year
rdf:_3             cru:Height#genid4                 1958
cru:period         hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid4

RDF instance "MetInstance" converted into a relational table

Page 17: Semantic Web and Retrieval of Scientific Data Semantics

Retrieval of Semantic Information

RTE relies on the SQL query processor to extract relevant triples

Semantics Retrieval Language (SRL) prototype developed
- SQL-like syntax

Example

  DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";

Processing of the above SRL query

Step 1: Transform the query into a standard SQL statement and submit it to DB2

  SELECT * FROM MetInstance WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz';
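Step 1 can be sketched as a small translator. The SRL grammar is not given in the slides, so the regular expression below covers only the single DESCRIBE RESOURCE form shown in the example and is an assumption; an in-memory SQLite database stands in for DB2, and the function name `srl_to_sql` is hypothetical. The resource name is passed as a bound parameter rather than pasted into the SQL text.

```python
import re
import sqlite3

# Matches: DESCRIBE RESOURCE "<name>";   (the only SRL form shown in the slides)
_DESCRIBE = re.compile(r'^\s*DESCRIBE\s+RESOURCE\s+"([^"]+)"\s*;?\s*$', re.IGNORECASE)

def srl_to_sql(srl):
    """Translate an SRL DESCRIBE query into (sql, parameters)."""
    match = _DESCRIBE.match(srl)
    if match is None:
        raise ValueError(f"unsupported SRL query: {srl!r}")
    return "SELECT * FROM MetInstance WHERE resource = ?", (match.group(1),)

# Tiny stand-in for the MetInstance table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MetInstance (property TEXT, resource TEXT, value TEXT)")
conn.executemany("INSERT INTO MetInstance VALUES (?, ?, ?)", [
    ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
    ("rdf:_3", "cru:Height#genid2", "2593"),
])

sql, params = srl_to_sql('DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";')
rows = conn.execute(sql, params).fetchall()
print(rows)
```

This query returns only the rows whose resource is the file itself; rows belonging to nested nodes such as cru:Height#genid2 need follow-up lookups.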

Page 18: Semantic Web and Retrieval of Scientific Data Semantics

Retrieval of Semantic Information

Step 2: RTE applies the following rules to generate XML as the output:

1. Extract namespace prefixes and generate the XML namespace node.

2. For every (real) atomic value, create an XML element named after its Property.

3. For every non-atomic value, create XML nodes as sub-elements of the resource where it appears as a value.

4. If the node type is a Seq container, ensure that all its elements are ordered.
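The four rules can be illustrated with a short Python sketch that turns relational rows back into XML. It is one possible reading of the rules, not the RTE's code: the function name `triples_to_xml`, the prefix table, and the `cru` namespace URI are assumptions, and Python's `xml.etree.ElementTree` is used for serialisation.

```python
import re
import xml.etree.ElementTree as ET

# Prefix-to-URI table. The rdf and rdfs URIs are the W3C ones;
# the cru URI is a placeholder invented for this sketch.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "cru": "http://example.org/cru#",
}

def triples_to_xml(rows, resource):
    """rows: (property, resource, value) tuples as stored in the table."""
    by_resource = {}
    for prop, res, value in rows:
        by_resource.setdefault(res, []).append((prop, value))

    # Rule 1: collect the prefixes in use and declare them on the root node.
    prefixes = sorted({prop.split(":", 1)[0] for prop, _, _ in rows} | {"rdf"})
    root = ET.Element("rdf:RDF", {f"xmlns:{p}": NAMESPACES[p] for p in prefixes})
    desc = ET.SubElement(root, "rdf:Description", {"about": resource})

    def seq_order(item):
        # Rule 4: container members rdf:_1, rdf:_2, ... sort by their index.
        m = re.fullmatch(r"rdf:_(\d+)", item[0])
        return (1, int(m.group(1))) if m else (0, 0)

    def fill(parent, res, nested=False):
        for prop, value in sorted(by_resource.get(res, []), key=seq_order):
            if nested and prop == "rdf:type":
                continue  # already used to name the container node
            child = ET.SubElement(parent, prop)
            if value in by_resource:
                # Rule 3: non-atomic value becomes a nested node.
                node_type = dict(by_resource[value]).get("rdf:type", "rdf:Description")
                fill(ET.SubElement(child, node_type), value, nested=True)
            else:
                # Rule 2: atomic value becomes the text of the property element.
                child.text = value

    fill(desc, resource)
    return ET.tostring(root, encoding="unicode")

rows = [
    ("cru:FormatType", "hgt.1958.1000.6h.w1.53x21.dat.gz", "ASCII"),
    ("rdf:_2", "cru:Height#genid2", "Kilobyte"),
    ("rdf:type", "cru:Height#genid2", "rdf:Seq"),
    ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ("rdf:_3", "cru:Height#genid2", "2593"),
    ("cru:size", "hgt.1958.1000.6h.w1.53x21.dat.gz", "cru:Height#genid2"),
]
xml_out = triples_to_xml(rows, "hgt.1958.1000.6h.w1.53x21.dat.gz")
print(xml_out)
```

The sample rows are deliberately shuffled so that the ordering rule for Seq containers has something to do.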

Page 19: Semantic Web and Retrieval of Scientific Data Semantics


Conclusion

The RTE-DBS approach enables querying and retrieval of semantic information from scientific data files available on the Web

The retrieved information can be further processed by a machine or used by humans

Future work will build a user interface into the RTE for maintaining individual triples while preventing the removal of triples that are nodes

A method for identifying the data semantics of data sets, based on reasoning over semantic cases, will be developed