CSHALS 2010 W3C Semantic Web Tutorial
DESCRIPTION
These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 . A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .
TRANSCRIPT
The Semantic Web Landscape
A Practical Introduction
Lee Feigenbaum
VP Technology & Standards, Cambridge Semantics
Co-chair, W3C SPARQL Working Group
For CSHALS 2010 Tutorial Attendees
February 24, 2010
The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question:
A Motivating Example: Drug Discovery
Find me genes involved in signal transduction that are related to pyramidal neurons.
General search
223,000 hits, 0 results
Domain-limited search
2,580 potential results
Specific databases
Too many silos!
A Semantic Web Approach
Integrate disparate databases…
MeSH, PubMed, Entrez Gene, Gene Ontology…
A Semantic Web Approach (cont’d)
…so that one query…
A Semantic Web Approach (cont’d)
…(trivially) spans several databases…
A Semantic Web Approach (cont’d)
…to deliver targeted results…
1. Agreement on common terms and relationships
2. Incremental, flexible data structure
3. Good-enough modeling
4. Query interface tailored to the data model
What’s the trick?
WHAT IS THE SEMANTIC WEB?
Names
Semantic Web
Web of Data
Giant Global Graph
Data Web
Web 3.0
Linked Data Web
Semantic Data Web
Branding
“The Semantic Web” a.k.a. “Linked Open Data”
Augments the World Wide Web
Represents the Web’s information in a machine-readable fashion
Enables…
…targeted search
…data browsing
…automated agents
What is it & why do we care? (1)
World Wide Web : Web pages :: The Semantic Web : Data
“Semantic Web technologies”
A family of technology standards that ‘play nice together’, including:
Flexible data model
Expressive ontology language
Distributed query language
Drive Web sites, enterprise applications
What is it & why do we care? (2)
The technologies enable us to build applications and solutions that were not possible, practical, or feasible traditionally.
A common set of technologies:
…enables diverse uses
…encourages interoperability
A coherent set of technologies:
…encourages incremental application
…provides a substantial base for innovation
A standard set of technologies:
…reduces proprietary vendor lock-in
…encourages many choices for tool sets
A Common & Coherent Set of Technology Standards
The (In)Famous Layer Cake
Semantic Web Technology Timeline
[Timeline figure; year labels: 1999, 2001, 2004, 2007, 2008, 2010; other labels: RIF, HCLS]
As technologies & tools have evolved, Semantic Web advocates have progressed through stages:
2010: Where we are
Report on… | Execute on…
Semantic Web vision | Initial experiments
Experiments | Technology standards
Technology standards | Software packages
Software packages | Proofs of concept
Proofs of concept | Production implementations
2010: Where we’re not
Semantic Web technologies are not a ‘magic crank’ for discovering new drugs (or solving other problems, for that matter)!
Image from Trey Ideker via Enoch Huang
2010: Where we’re not (cont’d)
The Semantic Web still suffers from confusing and conflicting messaging, each of which asserts it’s “correct”.
XML vs. RDF?
“Ontology” vs. “ontology”?
Semantic Web vs. Linked Data?
Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. …
2010: Where we’re not (cont’d)
People with appropriate skill sets for designing & building Semantic Web solutions are not widely available.
2010: Where we’re not (cont’d)
We don’t yet have standard solutions for privacy, trust, probability, and other elements of the Semantic Web vision.
What do Semantic Web solutions look like?
RDF is…
Resource Description Framework
RDF is…
The data model of the Semantic Web.
RDF is…
A schema-less data model that features unambiguous identifiers and named relations between pairs of resources.
RDF graphs are collections of triples
Triples are made up of a subject, a predicate, and an object
Resources and relationships are named with URIs
RDF is…
A labeled, directed graph of relations between resources and literal values.
subject -- predicate --> object
“Lee Feigenbaum works for Cambridge Semantics”
“Lee Feigenbaum was born in 1978”
“Cambridge Semantics is headquartered in Massachusetts”
Example RDF triples
Lee Feigenbaum -- works for --> Cambridge Semantics
Lee Feigenbaum -- born in --> 1978
Cambridge Semantics -- headquartered --> Massachusetts
Triples connect to form graphs
[Graph figure; nodes: Lee Feigenbaum, Cambridge Semantics, 1978, Massachusetts, Boston; edge labels: works for, born in, headquartered, lives in, capital]
The graph data structure makes merging data with shared identifiers trivial
Triples act as a least common denominator for expressing data
URIs for naming remove ambiguity
…the same identifier means the same thing
Why RDF? What’s different here?
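The merging claim above can be sketched in a few lines of Python. This is a minimal illustration rather than a real RDF library: triples are plain tuples, a graph is a set of triples, and all URIs are made-up examples. It also ignores blank nodes, which a real merge has to handle.

```python
# A graph is a set of (subject, predicate, object) tuples.
# EX and every URI built from it are hypothetical examples.
EX = "http://example.org/myvocab/"

hr_graph = {
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
    (EX + "LeeFeigenbaum", EX + "birthYear", 1978),
}

company_graph = {
    (EX + "CambridgeSemantics", EX + "headquarters", EX + "Massachusetts"),
    # Same fact as in hr_graph; the union keeps a single copy.
    (EX + "LeeFeigenbaum", EX + "employer", EX + "CambridgeSemantics"),
}


def merge(*graphs):
    """Merge graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, resource):
    """Return every triple that mentions the resource as subject or object."""
    return {t for t in graph if t[0] == resource or t[2] == resource}


merged = merge(hr_graph, company_graph)
about_company = describe(merged, EX + "CambridgeSemantics")
```

Because both sources use the same identifier for Cambridge Semantics, `describe` finds facts from both without any join logic.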
Why RDF? Incremental Integration
Flexible Graph Model
URIs for naming
Agile, Incremental Integration
[Figure labels: Relational Database, RDF]
RDF is the model, for which there are several concrete syntaxes:
RDF/XML – standard, complex XML syntax
Turtle – common, textual, triples-oriented syntax
N3 – more expressive superset of Turtle
N-Triples – textual, line-oriented, useful for streaming
What does RDF look like?
When writing RDF by hand and in many guides, examples, and discussions these days, you’ll see Turtle most often.
Write a triple by writing its parts separated by spaces (subject predicate object)
A Bit of Turtle
@prefix ex: <http://example.org/myvocab/> .
@prefix geo: <http://geonames.example/> .

ex:LeeFeigenbaum ex:employer ex:CambridgeSemantics .
ex:LeeFeigenbaum ex:birthYear 1978 .
ex:CambridgeSemantics ex:headquarters geo:BostonMA .
geo:BostonMA ex:population 574000 .
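To make the role of the @prefix lines concrete, here is a small Python sketch that expands the prefixed names from the Turtle above into full URIs and writes each triple as one N-Triples line. The helper names `expand` and `to_ntriples` are invented for this illustration, and only integer literals are handled.

```python
# Prefix table copied from the Turtle example above.
PREFIXES = {
    "ex": "http://example.org/myvocab/",
    "geo": "http://geonames.example/",
}
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as ex:employer into a full URI string."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError(f"unknown prefix in {name!r}")


def to_ntriples(subject, predicate, obj):
    """Serialize one triple as an N-Triples line (integers and URIs only)."""
    if isinstance(obj, int):
        obj_text = f'"{obj}"^^<{XSD_INTEGER}>'
    else:
        obj_text = f"<{expand(obj)}>"
    return f"<{expand(subject)}> <{expand(predicate)}> {obj_text} ."


line = to_ntriples("ex:LeeFeigenbaum", "ex:birthYear", 1978)
```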
SPARQL is…
SPARQL Protocol And RDF Query Language
SPARQL is…
The query language of the Semantic Web.
SPARQL is…
A SQL-like language for querying sets of RDF graphs.
SPARQL is…
A simple protocol for issuing queries and receiving results over HTTP. So…
Every SPARQL client works with every SPARQL server!
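As a rough sketch of what "queries over HTTP" means, the Python below builds (but does not send) a GET request that carries the query text in a `query` parameter and asks for JSON results. The endpoint URL is a placeholder, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build an HTTP GET request for a SPARQL endpoint without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask the server for SPARQL results in JSON.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# "http://sparql.example/query" is a hypothetical endpoint.
request = build_sparql_request(
    "http://sparql.example/query",
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
)
```

Passing the request to `urllib.request.urlopen` would issue the query against a real endpoint.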
SPARQL lets us:
Pull information from structured and semi-structured data.
Explore data by discovering unknown relationships.
Query and search an integrated view of disparate data sources.
Glue separate software applications together by transforming data from one vocabulary to another.
Why SPARQL?
What automobiles get more than 25 miles per gallon, fit within my department’s budget, and can be purchased at a dealer located within 10 miles of one of my employees?
SELECT ?automobile
WHERE {
  ?automobile a ex:Car ;
              epa:mpg ?mpg ;
              ex:dealer ?dealer .
  ?employee a ex:Employee ;
            geo:loc ?loc .
  ?dealer geo:loc ?dealerloc .
  FILTER(?mpg > 25 && geo:dist(?loc, ?dealerloc) <= 10) .
}
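The query above works by matching triple patterns containing variables against the data and then filtering the matches. The Python below is a toy version of that idea, not a SPARQL engine: it covers only the car and mileage part of the question, and the sample data is invented.

```python
def is_var(term):
    """Variables are strings that start with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph, patterns, keep=lambda binding: True):
    """Match all patterns against the graph, then apply a filter."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            for extended in [match_pattern(pattern, triple, binding)]
            if extended is not None
        ]
    return [b for b in bindings if keep(b)]


# Made-up data; the prefixed names are plain strings here.
graph = {
    ("ex:civic", "rdf:type", "ex:Car"),
    ("ex:civic", "epa:mpg", 36),
    ("ex:tank", "rdf:type", "ex:Car"),
    ("ex:tank", "epa:mpg", 12),
}

results = query(
    graph,
    [("?automobile", "rdf:type", "ex:Car"), ("?automobile", "epa:mpg", "?mpg")],
    keep=lambda b: b["?mpg"] > 25,
)
```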
[Architecture figure; labels: Web dashboard, SPARQL query, SPARQL Query Engine; data sources: Employee Directory, ERP / Budget System, Web (Dealer 1, Dealer 2, Dealer 3), EPA Fuel Efficiency Spreadsheet]
bio2rdf.org – querying life sciences data
3 pieces of the Semantic Web technology stack are about describing a domain well enough to capture (some of) the meaning of resources and relationships in the domain
RDF Schema
OWL
RIF
From the explicit to the inferred
Apply knowledge to data to get more data.
RDFS is…
RDF Schema
Elements of:
Vocabulary (defining terms)
I define a relationship called “prescribed dose.”
Schema (defining types)
“prescribed dose” relates “treatments” to “dosages”
(my prescribed dose is 2mg; therefore 2mg is a dosage)
Taxonomy (defining hierarchies)
Any “doctor” is a “medical professional”
(therefore Dr. Brown is a medical professional)
RDF Schema is…
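The two inferences in parentheses above can be reproduced with a small fixed-point loop. This Python sketch implements only three RDFS-style rules (subclass, domain, range) over tuples. The `ex:` names are invented, and the dosage is modeled as a resource rather than a literal to keep the example short.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply subclass, domain, and range rules until nothing new appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # x type C, C subClassOf D  =>  x type D
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # P domain C, x P y  =>  x type C
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                # P range C, x P y  =>  y type C
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# Hypothetical data mirroring the slide's examples.
data = {
    ("ex:DrBrown", TYPE, "ex:Doctor"),
    ("ex:Doctor", SUBCLASS, "ex:MedicalProfessional"),
    ("ex:prescribedDose", DOMAIN, "ex:Treatment"),
    ("ex:prescribedDose", RANGE, "ex:Dosage"),
    ("ex:myTreatment", "ex:prescribedDose", "ex:2mg"),
}

closure = rdfs_closure(data)
```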
OWL is…
Web Ontology Language
Elements of ontology
Same/different identity
“author” and “auteur” are the same relation
two resources with the same “ISBN” are the same “book”
More expressive type definitions
A “cycle” is a “vehicle” with at least one “wheel”
A “bicycle” is a “cycle” with exactly two “wheels”
More expressive relation definitions
“sibling” is a symmetric predicate
the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
OWL is…
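Two of the ideas above, symmetric relations and identity through a shared key such as an ISBN, can be imitated in plain Python. This is only a sketch of the kind of conclusion an OWL reasoner draws, not OWL itself. The function names, resource names, and ISBN values are all invented.

```python
def apply_symmetry(triples, symmetric_predicates):
    """For each symmetric predicate, add the reversed triple."""
    reversed_triples = {
        (o, p, s) for s, p, o in triples if p in symmetric_predicates
    }
    return set(triples) | reversed_triples


def same_by_key(triples, key_predicate):
    """Group subjects that share a value for an identifying predicate."""
    by_value = {}
    for s, p, o in triples:
        if p == key_predicate:
            by_value.setdefault(o, set()).add(s)
    # Only groups with more than one member say anything about identity.
    return [subjects for subjects in by_value.values() if len(subjects) > 1]


# Hypothetical data; the ISBN strings are placeholders.
data = {
    ("ex:alice", "ex:sibling", "ex:bob"),
    ("lib:book1", "ex:isbn", "isbn-A"),
    ("shop:item9", "ex:isbn", "isbn-A"),
    ("shop:item10", "ex:isbn", "isbn-B"),
}

with_symmetry = apply_symmetry(data, {"ex:sibling"})
same_books = same_by_key(data, "ex:isbn")
```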
A class is a (named) collection of things with similar attributes
OWL: Rich Class Definitions
RIF is…
Rule Interchange Format
Standard representation for exchanging sets of logical and business rules
Logical rules
A buyer buys an item from a seller if the seller sells the item to the buyer
A customer becomes a "Gold" customer as soon as his cumulative purchases during the current year top $5000
Production rules
Customers that become "Gold" customers must be notified immediately, and a golden customer card will be printed and sent to them within one week
For shopping carts worth more than $1000, "Gold" customers receive an additional discount of 10% of the total amount
RIF is…
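RIF is an interchange format, so the rules above would be exchanged in RIF's own syntax rather than written as code. Purely to show the logic of the "Gold" customer rules, here is a Python sketch; the function and constant names are invented, and "top $5000" is read as strictly greater than 5000.

```python
GOLD_PURCHASE_THRESHOLD = 5000   # cumulative purchases this year, in dollars
CART_DISCOUNT_THRESHOLD = 1000   # cart value above which the discount applies
GOLD_DISCOUNT_PERCENT = 10


def is_gold(purchases_this_year):
    """A customer is Gold once cumulative purchases exceed the threshold."""
    return sum(purchases_this_year) > GOLD_PURCHASE_THRESHOLD


def gold_discount(cart_total, gold):
    """Gold customers get 10% off carts worth more than the cart threshold."""
    if gold and cart_total > CART_DISCOUNT_THRESHOLD:
        return cart_total * GOLD_DISCOUNT_PERCENT / 100
    return 0


customer_is_gold = is_gold([3000, 2500])
discount = gold_discount(1200, customer_is_gold)
```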
Fantasy Land Architecture
[Diagram labels: Ontology / Schema; six Custom UI boxes]
Reality
[Diagram labels: Internet, Oracle RDB, DB2 XML, LDAP Directory; six Custom UI boxes]
GRDDL is…
Gleaning Resource Descriptions from Dialects of Language
GRDDL is…
A method for authoritatively getting RDF data from XML and XHTML documents.
GRDDL is…
A mechanism for authoritatively deriving RDF data from families of XML and XHTML documents.
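GRDDL points from an XML document to a transformation (typically XSLT) that yields RDF. The sketch below swaps in plain Python for that transformation, just to show the XML-to-triples step. The XML vocabulary, base URIs, and function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML source document.
XML_DOC = """<people>
  <person id="lee">
    <name>Lee Feigenbaum</name>
    <employer>Cambridge Semantics</employer>
  </person>
</people>"""

BASE = "http://example.org/people/"      # made-up base for subject URIs
VOCAB = "http://example.org/myvocab/"    # made-up base for predicate URIs


def xml_to_triples(xml_text):
    """Turn each child element of a <person> into one triple."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.findall("person"):
        subject = BASE + person.get("id")
        for child in person:
            triples.append((subject, VOCAB + child.tag, child.text))
    return triples


triples = xml_to_triples(XML_DOC)
```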
RDB2RDF is…
Relational Database to RDF
RDB2RDF is…
A W3C Working Group to define a standard way to map from relational databases to RDF (and SPARQL).
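One simple way to picture a relational-to-RDF mapping is "one row becomes one subject, one column becomes one predicate". The Python sketch below does that against an in-memory SQLite table. It is an illustrative assumption, not the Working Group's mapping, and the URI scheme and table are invented.

```python
import sqlite3


def rows_to_triples(connection, table, key_column, base_uri):
    """Map each row to a subject URI and each non-key column to a triple."""
    # The table name is interpolated directly, so it must be trusted input.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base_uri}{table}/{record[key_column]}"
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{base_uri}{table}#{column}", value))
    return triples


# Hypothetical table and data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Lee Feigenbaum', 1978)")

triples = rows_to_triples(conn, "employee", "id", "http://example.org/db/")
```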
A simple set of 4 guidelines for publishing RDF data on the Web (over HTTP)
Developed by Tim Berners-Lee in 2006
1. Use URIs as names for things
   • Globally unique identity
2. Use HTTP URIs
   • Everyone has a Web browser/client
3. When someone looks up a URI, provide useful information
   • …in the form of RDF data
4. Include links to other URIs
   • Foster discovery of additional information
Linked Data is…
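Guidelines 2 through 4 can be sketched from the client side: look up an HTTP URI while asking for RDF, then collect the URIs in the returned data as links to follow. The Python below only builds the request and does not contact any server. The URIs are placeholders, and `application/rdf+xml` is chosen as one plausible RDF media type.

```python
from urllib.request import Request

HTTP_SCHEMES = ("http://", "https://")


def linked_data_request(uri, media_type="application/rdf+xml"):
    """Build a lookup request for a thing's URI, asking for RDF."""
    if not uri.startswith(HTTP_SCHEMES):
        raise ValueError("Linked Data names should be HTTP URIs")
    return Request(uri, headers={"Accept": media_type})


def outgoing_links(triples):
    """Collect object URIs that a client could look up next."""
    return sorted(
        {o for _, _, o in triples if isinstance(o, str) and o.startswith(HTTP_SCHEMES)}
    )


# Hypothetical URI and data.
request = linked_data_request("http://example.org/id/LeeFeigenbaum")
links = outgoing_links([
    ("http://example.org/id/LeeFeigenbaum",
     "http://example.org/myvocab/employer",
     "http://example.org/id/CambridgeSemantics"),
    ("http://example.org/id/LeeFeigenbaum",
     "http://example.org/myvocab/birthYear",
     1978),
])
```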
The LOD “cloud”, March 2009
Application-specific portions of the cloud
Notably, bio-related data sets (in light purple), some by the W3C “Linking Open Drug Data” task force
RDFa is…
RDF in Attributes
RDFa is…
A collection of HTML attributes that allow RDF to be embedded directly in Web pages.
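As a rough picture of how data sits in attributes, the Python sketch below reads three RDFa attributes (`about`, `property`, `content`) from an HTML fragment. It is far from a conforming RDFa processor: subject scoping is simplified, prefixes are not resolved, and the page fragment is invented.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Collect (subject, property, value) triples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent `about` value seen
        self.pending = None   # property waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
        if "property" in attributes:
            if "content" in attributes:
                # An explicit machine-readable value overrides the visible text.
                self.triples.append(
                    (self.subject, attributes["property"], attributes["content"])
                )
            else:
                self.pending = attributes["property"]

    def handle_data(self, data):
        if self.pending is not None:
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


# Hypothetical page fragment.
PAGE = (
    '<div about="http://example.org/people/lee">'
    '<span property="ex:name">Lee Feigenbaum</span> was born in '
    '<span property="ex:birthYear" content="1978">the late seventies</span>.'
    "</div>"
)

parser = RDFaSketch()
parser.feed(PAGE)
parser.close()
```

The visible text and the extracted data come from the same markup, which is the "Don't Repeat Yourself" point.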
Don’t Repeat Yourself (DRY)
In-context metadata (copy & paste)
Authoritative (no screen scraping)
Why RDFa?
RDFa in action
SEMANTIC WEB LANDSCAPE TODAY
Semantic Web Tools
In 2010, there is a wide variety of open-source and commercial Semantic Web tools available.
Triple stores
Built on relational database
Native RDF store
Development libraries
Full-featured application servers
Types of RDF Tools
Most RDF tools contain some elements of each of these.
Community-maintained lists
http://esw.w3.org/topic/SemanticWebTools
Emphasis on large triple stores
http://esw.w3.org/topic/LargeTripleStores
Michael Bergman’s Sweet Tools searchable list:
http://www.mkbergman.com/?page_id=325
Finding RDF Tools
Query engines
Things that can run queries
Most RDF stores provide a SPARQL engine
Query rewriters
E.g. to query relational databases (more later)
Endpoints
Things that accept queries on the Web and return results
Client libraries
Things that make it easy to ask queries
Types of SPARQL Tools
Community-maintained list of query engines
http://esw.w3.org/topic/SparqlImplementations
Publicly accessible SPARQL endpoints
http://esw.w3.org/topic/SparqlEndpoints
Michael Bergman’s Sweet Tools searchable list:
http://www.mkbergman.com/?page_id=325
Finding SPARQL Tools
Editors/environments
Oiled, Protégé, Swoop, TopBraid, Ontotrack, …
Reasoning systems
Cerebra, FaCT++, Kaon2, Pellet, Racer, CEL, …
Developing Tools and Infrastructure
Visualizing and Publishing Vocabularies
Reusable, public ontologies
Measurement Units Ontology
The Event Ontology
FOAF
Community-maintained list:
http://esw.w3.org/topic/GrddlImplementations
GRDDL tools
Most GRDDL tools are adapters to existing RDF stores or SPARQL engines to allow loading or querying data from XML and XHTML sources.
What about… everything else?
Standards don’t yet exist, but many tools exist to derive RDF and/or run SPARQL queries against other sources of data.
LDAP Directories
Squirrel RDF
http://jena.sourceforge.net/SquirrelRDF/
Excel spreadsheets
Anzo for Excel
http://www.cambridgesemantics.com/products/anzo_for_excel
Web-based data sources
Virtuoso Sponger Cartridges
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger
Unstructured Text
Calais
http://www.opencalais.com/
Unstructured Text
Zemanta Web Service
http://developer.zemanta.com/
On the Web
Google, Yahoo!
Best Buy
NY Times
US Government
UK Government
Where is it being used?
Industries
Oil & Gas (integration, classification)
Finance (structured data, ontologies, XBRL)
Publishing (metadata)
Government (structured data, metadata, classification)
Libraries & museums (metadata, classification)
IT (rapid application development & evolution)
Where is it being used?
Health Care
Cleveland Clinic
Clinical research
Data integration, classification (= better search)
UT School of Health
Public health surveillance
SAPPHIRE: classification, ontology-driven development
Various
Clinical Decision Support
Agile, rule-driven, scalable in the face of change
Where is it being used?
Life Sciences
Agile knowledgebases at Pfizer
Target assessment at Eli Lilly
Integrated information links at Novartis
Astra Zeneca, J&J, UCB, …
Where is it being used?
CSHALS chronicles many of these uses and many more.
TAKE-AWAY ADVICE
These are horizontal, enabling technologies.
But they apply particularly well to problems with these characteristics:
Heterogeneous data from multiple sources
Increasing reliance on connections within this data
Rapidly changing information needs
Significant early-mover advantage
Large amounts of data that would benefit from classification
Why are Semantic Web technologies appropriate for the life sciences?
Many tactical and strategic challenges in the life sciences industry feature these traits.
Getting Started with Semantic Web technologies
Don’t boil the ocean.
Getting Started with Semantic Web technologies
Goal: quick tactical wins on the path to large strategic value
Be sure to consider the operational ramifications
Who does what differently?
Ideal Semantic Web projects/applications have an incremental path towards broad deployment that generates demonstrable value along the way
Look beyond the core Semantic Web capabilities and consider:
integration with existing enterprise systems
development & extension models
deployment, logging, maintenance, backup
tooling
user experience
Choose practical, enterprise-ready tools
If you choose to build new components and assemble existing components together, it’s quite likely you’ll end up reinventing the wheel.
What level of expertise is necessary?
Technologies only?
Technologies + API?
Technologies + tooling?
Tooling only?
…
How will we acquire the expertise?
In-house (and if so, how?)
Vendor services
3rd-party services
Open-source community
Plan for Acquiring Expertise