1
Semantic Web and Retrieval of Scientific Data Semantics
Goran Soldar
University of Brighton, UK
Dan Smith
University of East Anglia, UK
2
Introduction
Semantic Web
  Introduced by Tim Berners-Lee
  Data and resources described, interchanged, and processed
  Machine understanding of heterogeneous data
Most search engines on the Web are oriented towards human use
Finding and processing scientific data on the Web is a time-consuming process
Example
  Search: Web pages containing the word "temperature"
  Search engine: Google
  Search domain: www.cru.uea.ac.uk
  Results: 773 web pages
3
Introduction
Inefficiency of the traditional search
  Humans have to browse through web pages
  No guarantee that the wanted information will be found
Preferred approach
  Describe the semantics of data using the RDF/XML format
  Store the data in a DBMS
  Automatically retrieve the desired information based on users' requests
  Enable client machines to learn the semantics of data described in RDF format
4
Introduction
Objectives of this work
To address the problem of extracting semantics from data files within the meteorology domain.
To build the ontology for the meteorology domain.
To create semantic cases with RDF Model/RDF Schema.
To employ the DB2 DBMS as the data repository.
To enhance a standard DBMS with an RDF Triples Engine.
To manage the RDF graph structure with the RDF Triples Engine.
5
RDF and Domain Ontology
RDF is a framework for describing metadata.
It enables interoperability between machines by interchanging information about information resources
It is represented with a Directed Labeled Graph
[Figure: RDF structure. A Resource (Subject) "File" is linked by the Property (Predicate) "Name" to the Value (Object) "ltgrid.dat".]
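The single statement in the figure can be sketched as a tuple, and a set of such tuples as a directed labeled graph. This is only a minimal illustration: the function name `build_graph` and the adjacency-map layout are assumptions made for this sketch, not part of the original work.

```python
from collections import defaultdict

# One RDF statement: (subject/resource, predicate/property, object/value).
triples = [("File", "Name", "ltgrid.dat")]


def build_graph(statements):
    """Return an adjacency map: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        # Each edge is directed from the subject to the object
        # and labeled with the predicate.
        print(f"{subject} --{predicate}--> {obj}")
```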
6
RDF and Domain Ontology
Specific domains represented with RDF
  Our focus: the Meteorology domain
  The concepts, semantics and the relations between the concepts are defined with RDF Schema
Ontology: an explicit specification of an information domain
RDF Schema:
  Uses the syntax of RDF Model
  Corresponds to XML's DTD or XML Schema
  Is a basis for RDF instances
7
Modelling RDF Model for Meteorology
Three phases of modelling
  Development of the vocabulary (ontology)
  Design of semantic cases to capture resource description
  Creation of semantic case instances
The vocabulary comprises the main concepts of the domain, represented by classes and properties
RDF Schema uses RDF Model encoding syntax
rdf:type separates RDF classes from properties
rdfs:subClassOf allows expression of inheritance-relationship between RDF classes
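As a rough illustration of how rdf:type and rdfs:subClassOf work together, the sketch below stores a tiny schema as triples and walks the inheritance chain. Only cru:Height and cru:FormatType appear in the slides; cru:Pressure, cru:Meteorology and the hierarchy between them are hypothetical names invented for this example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# (subject, predicate, object) schema statements.
# cru:Pressure and cru:Meteorology are hypothetical classes.
schema = [
    ("cru:Meteorology", RDF_TYPE, "rdfs:Class"),
    ("cru:Pressure", RDF_TYPE, "rdfs:Class"),
    ("cru:Height", RDF_TYPE, "rdfs:Class"),
    ("cru:Pressure", SUBCLASS_OF, "cru:Meteorology"),
    ("cru:Height", SUBCLASS_OF, "cru:Pressure"),
    ("cru:FormatType", RDF_TYPE, "rdf:Property"),
]


def declared(statements, kind):
    """Names whose rdf:type is `kind` (e.g. rdfs:Class or rdf:Property)."""
    return {s for s, p, o in statements if p == RDF_TYPE and o == kind}


def superclasses(statements, cls):
    """All direct and inherited superclasses of `cls`."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in statements:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


print(sorted(declared(schema, "rdfs:Class")))
print(sorted(declared(schema, "rdf:Property")))
print(sorted(superclasses(schema, "cru:Height")))
```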
8
Modelling RDF Model for Meteorology
The Meteorology domain at cru.sys.uea.ac.uk:
  Contains about 1000 data files
  Is made up of 9 meteorological topics (sub-domains)
  Has all sub-domains designed as RDF classes
  Has all concepts and elements defined in its namespace
The ontology is defined in two RDF files:
  Class.rdf
  Property.rdf
Semantic cases are based on the existing vocabulary
  Simple semantic cases are designed first
  Complex cases are combinations of simple ones
9
Modelling RDF Model for Meteorology
Our prototype model:
  Describes 100 data sets
  Contains 4 semantic cases
The semantic cases
  HeaderCase: URL, FormatType, DataParameter, Comment, Domain
  SizeCase: Compression, FileSize, Value
  ObservationCase: Frequency, TimePeriod, Value
  PeriodCase: TimeRange, TimePeriod, Value
10
Modelling RDF Model for Meteorology
<rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
  <cru:URL> http://www.cru.uea.ac.uk/cru/pressure/hgt/hgt1000_6h </cru:URL>
  <cru:FormatType>ASCII</cru:FormatType>
  <cru:DataParameter> GeopotentialHeight_AtPressure </cru:DataParameter>
  <rdfs:comment> 6-Hourly GeopotentialHeight at 1000mb </rdfs:comment>
  <rdfs:domain>cru:Height</rdfs:domain>
</rdf:Description>
RDF Instance of HeaderCase for a data file
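To show how a HeaderCase description of this kind breaks down into (Property, Resource, Value) triples, here is a minimal sketch using Python's standard XML parser. It is a simplification, not a full RDF parser: it handles only flat rdf:Description elements, and the cru namespace URI, the wrapping rdf:RDF element and the helper names `qname` and `to_triples` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
CRU_NS = "http://www.cru.uea.ac.uk/schema#"  # hypothetical namespace URI

PREFIXES = {RDF_NS: "rdf", RDFS_NS: "rdfs", CRU_NS: "cru"}

# The slide's description, wrapped in an rdf:RDF root so that the
# namespace prefixes are declared (the wrapper is an assumption).
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:cru="{CRU_NS}">
  <rdf:Description about="hgt.1958.1000.6h.w1.53x21.dat.gz">
    <cru:FormatType>ASCII</cru:FormatType>
    <cru:DataParameter> GeopotentialHeight_AtPressure </cru:DataParameter>
    <rdfs:comment> 6-Hourly GeopotentialHeight at 1000mb </rdfs:comment>
    <rdfs:domain>cru:Height</rdfs:domain>
  </rdf:Description>
</rdf:RDF>"""


def qname(tag):
    """Turn ElementTree's '{uri}local' tag into 'prefix:local'."""
    uri, local = tag[1:].split("}", 1)
    return f"{PREFIXES[uri]}:{local}"


def to_triples(xml_text):
    """Return (property, resource, value) tuples for each description."""
    root = ET.fromstring(xml_text)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        resource = description.get("about")
        for child in description:
            value = (child.text or "").strip()
            result.append((qname(child.tag), resource, value))
    return result


triples = to_triples(DOCUMENT)
for triple in triples:
    print(triple)
```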
11
From RDF to Relational Model
Our prototype model:
  Comprises 12 RDF files
    One holds semantic case descriptions
    Two hold RDF Schema descriptions
    Nine contain RDF instances of semantic cases
Management of RDF-described data
  W3C does not recommend any method for manipulating RDF triples
  RDF structure is similar to XML
  XML comes with APIs for data manipulation (SAX, DOM); RDF does not
12
Mapping RDF model for Meteorology into RDBMS
[Figure: mapping the RDF model into the RDBMS. Labelled components: CRU Meteorological Domain, Ontology, Semantic Cases, SiRPAC, RDF Triples Model, RDF Triple Engine, DB2.]
We utilise the RDF triple structure to manipulate the data
An XML parser checks the syntax of the RDF
An RDF parser converts it into triples
RDF tags are removed
Triples are converted into the relational model
Triples are stored in the DB2 DBMS
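The last two steps, converting triples to the relational model and storing them, might look roughly like the sketch below. It uses Python's built-in sqlite3 module purely as a stand-in for DB2 so that the example is runnable. The table name MetInstance and the three column names follow the table headings used elsewhere in these slides, while the column types and the `store_triples` helper are assumptions.

```python
import sqlite3

FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"

# Triples as produced by an RDF parser: (property, resource, value).
triples = [
    ("cru:FormatType", FILE, "ASCII"),
    ("cru:DataParameter", FILE, "GeopotentialHeight_AtPressure"),
    ("rdfs:domain", FILE, "cru:Height"),
]


def store_triples(connection, rows):
    """Create the triple table if needed and insert one row per triple."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS MetInstance ("
        "PROPERTY TEXT NOT NULL, RESOURCE TEXT NOT NULL, VALUE TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT INTO MetInstance (PROPERTY, RESOURCE, VALUE) VALUES (?, ?, ?)",
        rows,
    )
    connection.commit()


connection = sqlite3.connect(":memory:")  # in-memory stand-in for DB2
store_triples(connection, triples)
count = connection.execute("SELECT COUNT(*) FROM MetInstance").fetchone()[0]
print(count)
```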
13
Modelling RDF Model for Meteorology
[Figure: RDF architecture for retrieving semantic information. Labelled components: Users; Applications; Web Interface; Web Server; Java Servlet Engine; Semantics Support Server; Semantics Retrieval Language (SRL); RDF Triple Engine; RDF Model; RDF Schema; SiRPAC; JDBC; DB2; Distributed Data and Information Sources for Meteorology (html, dbms, ascii file). Labelled connections: TCP/IP, HTTP Requests, SRL Requests.]
14
Retrieval of Semantic Information
The RDF Triple Engine (RTE) is responsible for manipulating triples and executing semantic queries
Based on a Client/Server architecture with specialised RDF servers
Records in the DBMS have a graph structure
  Not semantically atomic
Additional query processing added to RTE
  RTE is aware of the graph structure of triples
  Able to produce results that reconstruct the graph structure and present them in the format specified by users
15
Retrieval of Semantic Information
[Figure: RDF graph for the Weather domain. Nodes: temperature, weather, daily, file, ltgrid.dat, www.cru.uea.ac.uk, size_id, 40, Kb. Edge labels: domain, frequency, recorded, name, url, size, value, unit.]
Relational structure of the RDF graph
Property   Resource     Value
frequency  temperature  daily
domain     temperature  weather
recorded   temperature  file
name       file         ltgrid.dat
url        file         www.cru.uea.ac.uk
size       file         size_id
value      size_id      40
unit       size_id      Kb
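Because a value such as size_id is itself a resource with further properties, a flat row lookup does not return the full description of a file. The sketch below shows one way the graph could be rebuilt from the rows of the Weather table. The recursive `expand` function and the nested-dictionary output are assumptions made for illustration, not the RTE's actual algorithm.

```python
# Rows of the relational table: (property, resource, value).
ROWS = [
    ("frequency", "temperature", "daily"),
    ("domain", "temperature", "weather"),
    ("recorded", "temperature", "file"),
    ("name", "file", "ltgrid.dat"),
    ("url", "file", "www.cru.uea.ac.uk"),
    ("size", "file", "size_id"),
    ("value", "size_id", "40"),
    ("unit", "size_id", "Kb"),
]


def expand(rows, resource, seen=frozenset()):
    """Describe `resource`, following values that are resources themselves."""
    resources = {res for _, res, _ in rows}
    visited = seen | {resource}  # guards against cycles in the graph
    description = {}
    for prop, res, value in rows:
        if res != resource:
            continue
        if value in resources and value not in visited:
            # Non-atomic value: it has its own outgoing edges.
            description[prop] = expand(rows, value, visited)
        else:
            # Atomic value: a leaf of the graph.
            description[prop] = value
    return description


print(expand(ROWS, "file"))
```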
16
Retrieval of Semantic Information
Property           Resource                          Value
cru:URL            hgt.1958.1000.6h.w1.53x21.dat.gz  http://www.cru.uea.ac.uk/cru/data/ncep/window1/6hourly/pressure/hgt/hgt1000_6h
cru:FormatType     hgt.1958.1000.6h.w1.53x21.dat.gz  ASCII
cru:DataParameter  hgt.1958.1000.6h.w1.53x21.dat.gz  GeopotentialHeight_AtPressure
rdfs:comment       hgt.1958.1000.6h.w1.53x21.dat.gz  6-Hourly GeopotentialHeight at 1000mb
rdfs:domain        hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height
rdf:type           cru:Height#genid2                 rdf:Seq
rdf:_1             cru:Height#genid2                 Compressed
rdf:_2             cru:Height#genid2                 Kilobyte
rdf:_3             cru:Height#genid2                 2593
cru:size           hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid2
rdf:type           cru:Height#genid3                 rdf:Seq
rdf:_1             cru:Height#genid3                 Frequency
rdf:_2             cru:Height#genid3                 Hour
rdf:_3             cru:Height#genid3                 6
cru:observation    hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid3
rdf:type           cru:Height#genid4                 rdf:Seq
rdf:_1             cru:Height#genid4                 TimeRange
rdf:_2             cru:Height#genid4                 Year
rdf:_3             cru:Height#genid4                 1958
cru:period         hgt.1958.1000.6h.w1.53x21.dat.gz  cru:Height#genid4
RDF instance “MetInstance” converted into a relational table
17
Retrieval of Semantic Information
RTE relies on the SQL query processor to extract relevant triples
Semantics Retrieval Language (SRL) prototype developed
  SQL-like syntax
Example
  DESCRIBE RESOURCE "hgt.1958.1000.6h.w1.53x21.dat.gz";
Processing of the above SRL query
Step 1: Transform the query into a standard SQL statement and submit it to DB2
  SELECT * FROM MetInstance WHERE RESOURCE = 'hgt.1958.1000.6h.w1.53x21.dat.gz';
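Step 1 could be sketched as follows. The slides show only the DESCRIBE RESOURCE form of SRL, so the regular expression below is an assumed grammar for that single statement, and `srl_to_sql` is a hypothetical helper. The sketch binds the resource name as a query parameter instead of pasting it into the SQL text, and uses an in-memory sqlite3 database as a stand-in for DB2.

```python
import re
import sqlite3

# Assumed grammar for the one SRL statement shown in the slides.
SRL_DESCRIBE = re.compile(
    r'^\s*DESCRIBE\s+RESOURCE\s+"([^"]+)"\s*;\s*$', re.IGNORECASE
)


def srl_to_sql(srl_query):
    """Translate 'DESCRIBE RESOURCE "<name>";' into (sql, parameters)."""
    match = SRL_DESCRIBE.match(srl_query)
    if match is None:
        raise ValueError(f"unsupported SRL query: {srl_query!r}")
    return "SELECT * FROM MetInstance WHERE RESOURCE = ?", (match.group(1),)


FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"
connection = sqlite3.connect(":memory:")  # stand-in for DB2
connection.execute("CREATE TABLE MetInstance (PROPERTY TEXT, RESOURCE TEXT, VALUE TEXT)")
connection.executemany(
    "INSERT INTO MetInstance VALUES (?, ?, ?)",
    [
        ("cru:FormatType", FILE, "ASCII"),
        ("rdf:_1", "cru:Height#genid2", "Compressed"),
    ],
)

sql, parameters = srl_to_sql(f'DESCRIBE RESOURCE "{FILE}";')
rows = connection.execute(sql, parameters).fetchall()
print(rows)
```

Note that this query returns only rows whose RESOURCE column matches the file name. Rows belonging to generated nodes such as cru:Height#genid2 need further lookups, which is where the RTE's awareness of the graph structure comes in.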
18
Retrieval of Semantic Information
Step 2: RTE applies the following rules to generate XML as the output:
1. Extract namespace prefixes and generate the XML namespace node.
2. For every (real) atomic value, create an XML element, using the Property values as the element names.
3. For every non-atomic value, create XML nodes as sub-elements of the resource where it appears as a value.
4. If the node type is a Seq container, ensure that its elements are kept in order.
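The four rules can be sketched as below for a few rows of the MetInstance table. The slides do not give the exact output format, so the element layout (rdf:RDF root, rdf:Description, rdf:Seq with rdf:li members) is an assumption, as are the cru namespace URI and the helper names `fill` and `build_xml`. The sketch also assumes the triples contain no cycles.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "cru": "http://www.cru.uea.ac.uk/schema#",  # hypothetical URI
}
FILE = "hgt.1958.1000.6h.w1.53x21.dat.gz"
NODE = "cru:Height#genid2"

# (property, resource, value) rows; the Seq members are deliberately unordered.
ROWS = [
    ("cru:FormatType", FILE, "ASCII"),
    ("cru:size", FILE, NODE),
    ("rdf:_2", NODE, "Kilobyte"),
    ("rdf:type", NODE, "rdf:Seq"),
    ("rdf:_3", NODE, "2593"),
    ("rdf:_1", NODE, "Compressed"),
]


def fill(parent, resource, by_resource):
    """Append the description of `resource` under the XML node `parent`."""
    entries = by_resource.get(resource, [])
    if ("rdf:type", "rdf:Seq") in entries:
        # Rule 4: members of a Seq container are emitted in rdf:_n order.
        members = [(p, v) for p, v in entries if p.startswith("rdf:_")]
        members.sort(key=lambda entry: int(entry[0][len("rdf:_"):]))
        container = ET.SubElement(parent, "rdf:Seq")
        for _, value in members:
            ET.SubElement(container, "rdf:li").text = value
        return
    for prop, value in entries:
        element = ET.SubElement(parent, prop)
        if value in by_resource:
            fill(element, value, by_resource)  # Rule 3: nest non-atomic values
        else:
            element.text = value  # Rule 2: atomic value becomes element text


def build_xml(rows, resource):
    by_resource = {}
    for prop, res, value in rows:
        by_resource.setdefault(res, []).append((prop, value))
    root = ET.Element("rdf:RDF")
    # Rule 1: declare every namespace prefix used by the properties.
    for prefix in sorted({prop.split(":")[0] for prop, _, _ in rows} | {"rdf"}):
        root.set(f"xmlns:{prefix}", NAMESPACES[prefix])
    description = ET.SubElement(root, "rdf:Description", {"about": resource})
    fill(description, resource, by_resource)
    return root


xml_text = ET.tostring(build_xml(ROWS, FILE), encoding="unicode")
print(xml_text)
```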
19
Conclusion
The RTE-DBS approach enables querying and retrieval of semantic information from scientific data files available on the Web
The retrieved information can be further processed by a machine or used by humans
Future work will build a user interface into RTE for maintaining individual triples and preventing the removal of triples that are nodes
A method for identifying the semantics of data sets, based on reasoning over semantic cases, will be developed