SemantEco Annotator for Linked Data Generation and Generalized Semantic Mapping Session: Technologies, Reasoning, and Annotation Methods of the Semantics for Biodiversity Symposium at the 2013 TDWG conference Patrice Seyed, Katherine Chastain, Brendan Ashby, Evan Patton, Tim Lebo, and Deborah McGuinness (presented by Cynthia Parr)

Upload: joy-hilary-barker

Post on 29-Dec-2015


Page 1

SemantEco Annotator for Linked Data Generation and Generalized Semantic Mapping

Session: Technologies, Reasoning, and Annotation Methods of the Semantics for Biodiversity Symposium at the 2013 TDWG conference

Patrice Seyed, Katherine Chastain, Brendan Ashby, Evan Patton, Tim Lebo, and Deborah McGuinness

(presented by Cynthia Parr)

Page 2

Introduction

• Challenges of enabling search and discovery of scientific data.
• Semantic web technologies + Linked Data (LD): a medium for meeting these challenges
• 2 major obstacles:
  1. Translating tabular data and domain knowledge sources into a linked data format remains difficult with existing tools.
  2. Building an IT infrastructure that relies heavily on linked data can be perceived as a risky proposition due to the immaturity of current LD management tools.

Page 3

SemantEco Annotator

• Mitigates both obstacles
• To address #1, plays the role of a translator
  – Converting tabular data into RDF
  – Leveraging OWL ontologies and vocabularies
  – Resulting enriched RDF data can be used immediately within RDF stores / hosted as LD.
• To address #2, plays the role of a semantic mapper
  – Column headers -> OWL properties
  – Column value typing -> OWL classes or datatypes
  – Mappings are serialized as RDF and can be used for
    • RDF/XML to XML Schema via XSLT for use in non-linked data environments (e.g., SBC-LTER)
      – Clarifying or extending the schema of their data
      – Enabling optimized semantic search
    • RDF-based annotation (e.g., Open Annotation Model)
      – Services both LD and non-LD IT environments
• Provides the architects of non-LD environments the ability to “future proof” and migrate to LD at their own pace.
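As a rough illustration of the two roles above, the following Python sketch maps column headers to properties and column values to datatypes, then emits N-Triples lines. The mapping table, the `http://example.org/` namespaces, and the helper `rows_to_ntriples` are hypothetical names for illustration and are not the Annotator's actual implementation.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical mapping: column header -> (property IRI, datatype IRI).
MAPPING = {
    "Lake Name": ("http://example.org/vocab/lakeName", XSD + "string"),
    "NO3 (mg/L)": ("http://example.org/vocab/no3_mg_l", XSD + "decimal"),
}

def rows_to_ntriples(csv_text, base="http://example.org/data/row_"):
    """Turn each CSV row into one subject with one triple per mapped column."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}{row_number}>"
        for header, value in row.items():
            if header not in MAPPING:
                continue  # unmapped columns are skipped in this sketch
            prop, datatype = MAPPING[header]
            # Escape backslashes and double quotes for the N-Triples literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{prop}> "{escaped}"^^<{datatype}> .')
    return triples

SAMPLE = "Lake Name,NO3 (mg/L),Notes\nBig Moose,1.3,clear\n"
for line in rows_to_ntriples(SAMPLE):
    print(line)
```

The "Notes" column has no entry in the mapping, so it produces no triples; a real tool would presumably let the user decide how to treat such columns.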

Page 4

SemantEco Annotator

• A web application: the user visits it in a web browser and loads a CSV file
• The ontology selector menu lets the user select hard-coded ontologies (e.g., OBO-E, SWEET, ENVO) or enter a URL that resolves to an RDF graph for vocabulary selection.
• Provides advanced manipulation features such as column-based translation and aggregating columns along implicit entity representations
• Recently used to convert eBird data + eBird taxonomy into RDF, which is now available in our SemantEco Discovery and Search Portal alongside water quality data, enabling a researcher to identify potential trends between water quality and organism counts.

Page 5

SemantEco Annotator Screenshot

Page 6

SemantEco Annotator Screenshot

Page 7

Mappings Example (in RDF/Turtle)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ov: <http://open.vocab.org/terms/> .
@prefix conversion: <http://purl.org/twc/vocab/conversion/> .
@prefix geonames: <http://www.mindswap.org/2003/owl/geo/geoFeatures20040307.owl#> .

_:c1 rdf:type conversion:EnhancementConversionProcess ;
    conversion:enhance _:c2, _:c3 .

_:c2 ov:csvCol <http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col3> ;
    ov:csvHeader "Lake Name"@en ;
    conversion:label "Lake Name"@en ;
    conversion:range rdfs:Resource ;
    conversion:equivalent_property prov:atLocation ;
    conversion:range_name "Lake"@en .

_:c3 conversion:class_name "Lake"@en ;
    conversion:subclass_of geonames:GeographicFeature .
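A minimal sketch of how one column's enhancement could be held in memory and rendered in the shape of the mapping example. The `ColumnEnhancement` dataclass and its `to_turtle` method are hypothetical names for illustration; only the `ov:` and `conversion:` terms come from the slide, and prefix declarations are assumed to be emitted elsewhere.

```python
from dataclasses import dataclass

@dataclass
class ColumnEnhancement:
    """One column's mapping: header, equivalent property, and range class name."""
    csv_col: str              # IRI of the column, e.g. ...input/1.csv#col3
    header: str               # assumed to contain no double quotes in this sketch
    equivalent_property: str  # prefixed name such as prov:atLocation
    range_name: str

    def to_turtle(self, node="_:c2"):
        """Render the enhancement as a Turtle statement about a blank node."""
        lines = [
            f"{node} ov:csvCol <{self.csv_col}> ;",
            f'    ov:csvHeader "{self.header}"@en ;',
            f'    conversion:label "{self.header}"@en ;',
            "    conversion:range rdfs:Resource ;",
            f"    conversion:equivalent_property {self.equivalent_property} ;",
            f'    conversion:range_name "{self.range_name}"@en .',
        ]
        return "\n".join(lines)

lake = ColumnEnhancement(
    csv_col="http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col3",
    header="Lake Name",
    equivalent_property="prov:atLocation",
    range_name="Lake",
)
print(lake.to_turtle())
```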

Page 8

Translated Data (in RDF/Turtle)

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ov: <http://open.vocab.org/terms/> .
# The b: and e1: prefix declarations are not shown on the slide.

b:thing_2 void:inDataset <http://purl.org/twc/semantgeo/source/a/dataset/b/version/2> ;
    prov:atLocation <http://purl.org/twc/semantgeo/source/a/dataset/b/typed/Big_Moose> ;
    e1:accession_code_sample "9446846" ;
    e1:date "30-Jun-94" ;
    e1:z_max_m "22.8" ;
    e1:sample_z_m "6" ;
    e1:nh4_mg_l "0.03" ;
    e1:no3_mg_l "1.3" ;
    ov:csvRow "2"^^xsd:integer .

<http://purl.org/twc/semantgeo/source/a/dataset/b/typed/Big_Moose> a prov:Entity .
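The translated data links the row to a typed resource whose URI ends in `Big_Moose`. The slide shows only the result, not the minting rule, so the whitespace-to-underscore rule in this Python sketch is an assumption, and `typed_uri` and `location_triples` are hypothetical helper names.

```python
from urllib.parse import quote

DATASET = "http://purl.org/twc/semantgeo/source/a/dataset/b"

def typed_uri(value, dataset=DATASET):
    """Mint a resource URI for a cell value (assumed rule: whitespace -> underscore)."""
    local_name = "_".join(value.split())
    # Percent-encode anything that is not safe inside a URI path segment.
    return f"{dataset}/typed/{quote(local_name, safe='')}"

def location_triples(subject, lake_name):
    """Emit the two statements that link a row to its typed lake resource."""
    lake = typed_uri(lake_name)
    return [
        f"{subject} prov:atLocation <{lake}> .",
        f"<{lake}> a prov:Entity .",
    ]

for triple in location_triples("b:thing_2", "Big Moose"):
    print(triple)
```

Minting one URI per distinct cell value means that every row mentioning the same lake points at the same resource, which is what lets the column act as an object property rather than a plain string.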

Page 9

SPARQL CONSTRUCT (to refactor Mapping as an Annotation)

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ov: <http://open.vocab.org/terms/>
PREFIX conversion: <http://purl.org/twc/vocab/conversion/>
PREFIX oa: <http://www.w3.org/ns/oa#>

CONSTRUCT {
  # Open Annotation style (http://www.openannotation.org/spec/core/)
  _:x a oa:Annotation ;
      oa:hasTarget ?colNum ;
      oa:hasBody ?property ;
      oa:hasBody ?typing ;
      oa:motivatedBy oa:tagging .
  ?property a owl:ObjectProperty .
  ?typing a owl:Class .
}
WHERE {
  ?cp rdf:type conversion:EnhancementConversionProcess ;
      conversion:enhance ?en1, ?en2 .
  ?en1 ov:csvCol ?colNum ;
       ov:csvHeader ?colHeader ;
       conversion:range_name ?className ;
       conversion:equivalent_property ?property .
  ?en2 conversion:class_name ?className ;
       conversion:subclass_of ?typing .
}

(?colNum follows the convention http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col1)
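To make the join in the WHERE clause concrete, here is a plain-Python emulation over an in-memory list of triples. It is not a SPARQL engine: the `refactor_as_annotations` function and the tuple representation are illustrative assumptions, the output uses the `oa:hasTarget` and `oa:hasBody` property names from the Open Annotation vocabulary, and the `owl:ObjectProperty` / `owl:Class` typing statements of the CONSTRUCT template are left out for brevity.

```python
def refactor_as_annotations(triples):
    """Join each column enhancement to its class description via the shared class name."""
    def objects(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    annotations = []
    processes = [s for s, p, o in triples
                 if p == "rdf:type" and o == "conversion:EnhancementConversionProcess"]
    for cp in processes:
        enhanced = objects(cp, "conversion:enhance")
        for en1 in enhanced:
            if not objects(en1, "ov:csvHeader"):
                continue  # the pattern requires a header on the column node
            for col in objects(en1, "ov:csvCol"):
                for prop in objects(en1, "conversion:equivalent_property"):
                    for name in objects(en1, "conversion:range_name"):
                        for en2 in enhanced:
                            # ?className must match between the two nodes.
                            if name not in objects(en2, "conversion:class_name"):
                                continue
                            for typing in objects(en2, "conversion:subclass_of"):
                                annotations.append({
                                    "oa:hasTarget": col,
                                    "oa:hasBody": [prop, typing],
                                    "oa:motivatedBy": "oa:tagging",
                                })
    return annotations

# The Lake Name mapping, written as (subject, predicate, object) tuples.
MAPPING = [
    ("_:c1", "rdf:type", "conversion:EnhancementConversionProcess"),
    ("_:c1", "conversion:enhance", "_:c2"),
    ("_:c1", "conversion:enhance", "_:c3"),
    ("_:c2", "ov:csvCol",
     "<http://base.org/source/SSS/dataset/DDD/version/VVV/input/1.csv#col3>"),
    ("_:c2", "ov:csvHeader", '"Lake Name"@en'),
    ("_:c2", "conversion:equivalent_property", "prov:atLocation"),
    ("_:c2", "conversion:range_name", '"Lake"@en'),
    ("_:c3", "conversion:class_name", '"Lake"@en'),
    ("_:c3", "conversion:subclass_of", "geonames:GeographicFeature"),
]
print(refactor_as_annotations(MAPPING))
```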

Page 10

• “Okay, so I get annotations out, and I can do whatever I want with that -- but what could I possibly want?”
  1. We've already done it for translating tabular data to RDF linked data.
  2. The current RDF output of the annotator can be mapped to other forms (Open Annotation Model)
  3. Annotate “legacy” stuff (e.g., XML) to facilitate semantic mappings among them.
  4. Can extend it to annotating images, etc., as well

Page 11

Future Work

• Automatic mappings directed to a particular graph closed under a predicate/object pair; use of OWL domain and range restriction axioms to guide the user in vocabulary selection decisions

• Use of OWL class definitions to enable a top-down approach for modeling their data

• Ontology extraction to complement and enable reasoning alongside the generated RDF

• Architecting a platform for better management of linked data, within which the Annotator plays a vital role.
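One way range axioms could guide vocabulary selection is to infer a datatype from a column's values and offer only properties whose declared range is compatible. The sketch below is a speculative Python illustration of that idea, not a description of planned Annotator behavior; the `ex:` properties, their ranges, and both function names are hypothetical.

```python
import re

XSD_INTEGER = "xsd:integer"
XSD_DECIMAL = "xsd:decimal"
XSD_STRING = "xsd:string"

def infer_datatype(values):
    """Guess the narrowest XSD datatype that fits every non-empty value."""
    cells = [v.strip() for v in values if v.strip()]
    if cells and all(re.fullmatch(r"[+-]?\d+", c) for c in cells):
        return XSD_INTEGER
    if cells and all(re.fullmatch(r"[+-]?(\d+\.?\d*|\.\d+)", c) for c in cells):
        return XSD_DECIMAL
    return XSD_STRING

# Hypothetical candidate properties with their declared rdfs:range.
CANDIDATES = {
    "ex:sampleDepth": XSD_DECIMAL,
    "ex:siteLabel": XSD_STRING,
    "ex:organismCount": XSD_INTEGER,
}

# xsd:integer is derived from xsd:decimal, so integer columns fit both ranges.
COMPATIBLE = {
    XSD_INTEGER: {XSD_INTEGER, XSD_DECIMAL},
    XSD_DECIMAL: {XSD_DECIMAL},
    XSD_STRING: {XSD_STRING},
}

def suggest_properties(values, candidates=CANDIDATES):
    """Return the candidate properties whose range accepts the column's values."""
    datatype = infer_datatype(values)
    return sorted(p for p, rng in candidates.items() if rng in COMPATIBLE[datatype])

print(suggest_properties(["22.8", "6"]))
print(suggest_properties(["6", "12"]))
print(suggest_properties(["Big Moose"]))
```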

Page 12

SemantEco Annotator Quick Look (YouTube Video)

http://bit.ly/17VEfSp

4:40 minute duration