CSHALS 2010 W3C Semantic Web Tutorial

The Semantic Web Landscape: A Practical Introduction. Lee Feigenbaum, VP Technology & Standards, Cambridge Semantics; Co-chair, W3C SPARQL Working Group. For CSHALS 2010 Tutorial Attendees, February 24, 2010.

Upload: leefeigenbaum

Post on 07-May-2015


DESCRIPTION

These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 . A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .

TRANSCRIPT

Page 1: CSHALS 2010 W3C Semantic Web Tutorial

The Semantic Web Landscape
A Practical Introduction

Lee Feigenbaum
VP Technology & Standards, Cambridge Semantics

Co-chair, W3C SPARQL Working Group

For CSHALS 2010 Tutorial Attendees
February 24, 2010

Page 2: CSHALS 2010 W3C Semantic Web Tutorial

A Motivating Example: Drug Discovery

The W3C HCLS interest group set out to use Semantic Web technologies to receive precise answers to a complex question:

Find me genes involved in signal transduction that are related to pyramidal neurons.

Page 3: CSHALS 2010 W3C Semantic Web Tutorial

General search

223,000 hits, 0 results

Page 4: CSHALS 2010 W3C Semantic Web Tutorial

Domain-limited search

2,580 potential results

Page 5: CSHALS 2010 W3C Semantic Web Tutorial

Specific databases

Too many silos!

Page 6: CSHALS 2010 W3C Semantic Web Tutorial

A Semantic Web Approach

Integrate disparate databases…

• MeSH
• PubMed
• Entrez Gene
• Gene Ontology
• …

Page 7: CSHALS 2010 W3C Semantic Web Tutorial

A Semantic Web Approach (cont’d)

…so that one query…

Page 8: CSHALS 2010 W3C Semantic Web Tutorial

A Semantic Web Approach (cont’d)

…(trivially) spans several databases…

Page 9: CSHALS 2010 W3C Semantic Web Tutorial

A Semantic Web Approach (cont’d)

…to deliver targeted results…

Page 10: CSHALS 2010 W3C Semantic Web Tutorial

What’s the trick?

1. Agreement on common terms and relationships
2. Incremental, flexible data structure
3. Good-enough modeling
4. Query interface tailored to the data model

Page 11: CSHALS 2010 W3C Semantic Web Tutorial

WHAT IS THE SEMANTIC WEB?

Page 12: CSHALS 2010 W3C Semantic Web Tutorial

Names

Page 13: CSHALS 2010 W3C Semantic Web Tutorial

Branding

• Semantic Web
• Web of Data
• Giant Global Graph
• Data Web
• Web 3.0
• Linked Data Web
• Semantic Data Web

Page 14: CSHALS 2010 W3C Semantic Web Tutorial

What is it & why do we care? (1)

“The Semantic Web” a.k.a. “Linked Open Data”
• Augments the World Wide Web
• Represents the Web’s information in a machine-readable fashion
• Enables…
    • …targeted search
    • …data browsing
    • …automated agents

World Wide Web : Web pages :: The Semantic Web : Data

Page 15: CSHALS 2010 W3C Semantic Web Tutorial

What is it & why do we care? (2)

“Semantic Web technologies”
• A family of technology standards that ‘play nice together’, including:
    • Flexible data model
    • Expressive ontology language
    • Distributed query language
• Drive Web sites, enterprise applications

The technologies enable us to build applications and solutions that were not possible, practical, or feasible traditionally.

Page 16: CSHALS 2010 W3C Semantic Web Tutorial

A Common & Coherent Set of Technology Standards

A common set of technologies:
• …enables diverse uses
• …encourages interoperability

A coherent set of technologies:
• …encourages incremental application
• …provides a substantial base for innovation

A standard set of technologies:
• …reduces proprietary vendor lock-in
• …encourages many choices for tool sets

Page 17: CSHALS 2010 W3C Semantic Web Tutorial

The (In)Famous Layer Cake

Page 18: CSHALS 2010 W3C Semantic Web Tutorial

Semantic Web Technology Timeline

[Timeline figure marking 1999, 2001, 2004, 2007, 2008, and 2010; surviving labels: RIF, HCLS]

Page 19: CSHALS 2010 W3C Semantic Web Tutorial

2010: Where we are

As technologies & tools have evolved, Semantic Web advocates have progressed through stages:

Report on…               Execute on…
Semantic Web vision      Initial experiments
Experiments              Technology standards
Technology standards     Software packages
Software packages        Proofs of concept
Proofs of concept        Production implementations

Page 20: CSHALS 2010 W3C Semantic Web Tutorial

2010: Where we’re not

Semantic Web technologies are not a ‘magic crank’ for discovering new drugs (or solving other problems, for that matter)!

Image from Trey Ideker via Enoch Huang

Page 21: CSHALS 2010 W3C Semantic Web Tutorial

2010: Where we’re not (cont’d)

The Semantic Web still suffers from confusing and conflicting messaging, each of which asserts it’s “correct”.

• XML vs. RDF?
• “Ontology” vs. “ontology”?
• Semantic Web vs. Linked Data?
• Data integration vs. reasoning vs. KBs vs. search vs. app. development vs. …

Page 22: CSHALS 2010 W3C Semantic Web Tutorial

2010: Where we’re not (cont’d)

People with appropriate skill sets for designing & building Semantic Web solutions are not widely available.

Page 23: CSHALS 2010 W3C Semantic Web Tutorial

2010: Where we’re not (cont’d)

We don’t yet have standard solutions for privacy, trust, probability, and other elements of the Semantic Web vision.

Page 24: CSHALS 2010 W3C Semantic Web Tutorial

What do Semantic Web solutions look like?

Page 25: CSHALS 2010 W3C Semantic Web Tutorial

RDF is…

Resource Description Framework

Page 26: CSHALS 2010 W3C Semantic Web Tutorial

RDF is…

The data model of the Semantic Web.

Page 27: CSHALS 2010 W3C Semantic Web Tutorial

RDF is…

A schema-less data model that features unambiguous identifiers and named relations between pairs of resources.

Page 28: CSHALS 2010 W3C Semantic Web Tutorial

RDF is…

A labeled, directed graph of relations between resources and literal values.

• RDF graphs are collections of triples
• Triples are made up of a subject, a predicate, and an object
• Resources and relationships are named with URIs

[Diagram: subject → predicate → object]

Page 29: CSHALS 2010 W3C Semantic Web Tutorial

Example RDF triples

“Lee Feigenbaum works for Cambridge Semantics”

“Lee Feigenbaum was born in 1978”

“Cambridge Semantics is headquartered in Massachusetts”

[Diagram: each sentence drawn as subject → predicate → object]
Lee Feigenbaum → works for → Cambridge Semantics
Lee Feigenbaum → born in → 1978
Cambridge Semantics → headquartered → Massachusetts
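
A minimal sketch, not from the original slides, of the three example triples as plain Python data. The `Triple` type and the `objects` helper are hypothetical names chosen for illustration, and human-readable strings stand in for the URIs that real RDF would use.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: Union[str, int]


# An RDF graph is simply a set of triples.
# Assumption: readable labels are used here in place of URIs.
graph = {
    Triple("Lee Feigenbaum", "works for", "Cambridge Semantics"),
    Triple("Lee Feigenbaum", "born in", 1978),
    Triple("Cambridge Semantics", "headquartered in", "Massachusetts"),
}


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {t.object for t in graph
            if t.subject == subject and t.predicate == predicate}


print(objects(graph, "Lee Feigenbaum", "born in"))
```

Because the graph is a set, adding the same statement twice has no effect, which mirrors how RDF graphs treat duplicate triples.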

Page 30: CSHALS 2010 W3C Semantic Web Tutorial

Triples connect to form graphs

[Graph diagram. Nodes: Lee Feigenbaum, Cambridge Semantics, 1978, Massachusetts, Boston. Edge labels: works for, born in, headquartered, lives in, capital.]

Page 31: CSHALS 2010 W3C Semantic Web Tutorial

Why RDF? What’s different here?

• The graph data structure makes merging data with shared identifiers trivial
• Triples act as a least common denominator for expressing data
• URIs for naming remove ambiguity
    • …the same identifier means the same thing
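
To illustrate why shared identifiers make merging trivial, here is a small sketch (not from the original slides). The two source names, the `follow` helper, and the prefixed identifiers are assumptions made for the example; merging is nothing more than a set union.

```python
# Two independent sources that happen to use the same identifiers.
hr_system = {
    ("ex:LeeFeigenbaum", "ex:employer", "ex:CambridgeSemantics"),
    ("ex:LeeFeigenbaum", "ex:birthYear", 1978),
}
company_registry = {
    ("ex:CambridgeSemantics", "ex:headquarters", "geo:BostonMA"),
    ("ex:LeeFeigenbaum", "ex:employer", "ex:CambridgeSemantics"),
}

# Merging two graphs is a set union; the duplicate statement collapses.
merged = hr_system | company_registry


def follow(graph, start, *predicates):
    """Walk a chain of predicates from `start`; return the values reached."""
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in graph
                   if s in current and p == predicate}
    return current


# Neither source alone can answer "where is Lee's employer headquartered?"
print(follow(hr_system, "ex:LeeFeigenbaum", "ex:employer", "ex:headquarters"))
print(follow(merged, "ex:LeeFeigenbaum", "ex:employer", "ex:headquarters"))
```

The question spans both sources, yet no join keys or schema mapping were needed beyond agreeing on the identifiers.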

Page 32: CSHALS 2010 W3C Semantic Web Tutorial

Why RDF? Incremental Integration

[Diagram; surviving labels: Relational Database, RDF, Flexible Graph Model, URIs for naming, Agile, Incremental Integration]

Page 33: CSHALS 2010 W3C Semantic Web Tutorial

What does RDF look like?

RDF is the model, for which there are several concrete syntaxes:

• RDF/XML – standard, complex XML syntax
• Turtle – common, textual, triples-oriented syntax
• N3 – more expressive superset of Turtle
• N-Triples – textual, line-oriented, useful for streaming

When writing RDF by hand and in many guides, examples, and discussions these days, you’ll see Turtle most often.

Page 34: CSHALS 2010 W3C Semantic Web Tutorial

A Bit of Turtle

Write a triple by writing its parts separated by spaces (subject predicate object)

@prefix ex: <http://example.org/myvocab/> .
@prefix geo: <http://geonames.example/> .

ex:LeeFeigenbaum ex:employer ex:CambridgeSemantics .
ex:LeeFeigenbaum ex:birthYear 1978 .
ex:CambridgeSemantics ex:headquarters geo:BostonMA .
geo:BostonMA ex:population 574000 .
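
As a rough illustration of how the Turtle prefixes relate to the line-oriented N-Triples syntax, the sketch below (not from the original slides) expands the prefixed names into full URIs and prints one statement per line. The function names are hypothetical, and only the two cases in the example are handled: prefixed names and integers.

```python
# Prefixes copied from the Turtle example.
PREFIXES = {
    "ex": "http://example.org/myvocab/",
    "geo": "http://geonames.example/",
}
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# The four statements from the Turtle example, in order.
TRIPLES = [
    ("ex:LeeFeigenbaum", "ex:employer", "ex:CambridgeSemantics"),
    ("ex:LeeFeigenbaum", "ex:birthYear", 1978),
    ("ex:CambridgeSemantics", "ex:headquarters", "geo:BostonMA"),
    ("geo:BostonMA", "ex:population", 574000),
]


def expand(name):
    """Turn a prefixed name such as ex:employer into <full-URI> form."""
    prefix, local = name.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def term(value):
    """Serialize an object: integers become typed literals, names become URIs."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    return expand(value)


def to_ntriples(triples):
    """Emit one 'subject predicate object .' line per triple."""
    return "\n".join(f"{expand(s)} {expand(p)} {term(o)} ."
                     for s, p, o in triples)


print(to_ntriples(TRIPLES))
```

Every line of the output is self-contained, which is what makes N-Triples convenient for streaming, at the cost of repeating the full URIs that Turtle abbreviates.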

Page 35: CSHALS 2010 W3C Semantic Web Tutorial

SPARQL is…

SPARQL Protocol And RDF Query Language

Page 36: CSHALS 2010 W3C Semantic Web Tutorial

SPARQL is…

The query language of the Semantic Web.

Page 37: CSHALS 2010 W3C Semantic Web Tutorial

SPARQL is…

A SQL-like language for querying sets of RDF graphs.

Page 38: CSHALS 2010 W3C Semantic Web Tutorial

SPARQL is…

A simple protocol for issuing queries and receiving results over HTTP. So…

Every SPARQL client works with every SPARQL server!
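
A hedged sketch (not from the original slides) of what "a simple protocol over HTTP" means in practice: the query text travels as a `query` parameter on an ordinary HTTP GET. The endpoint URL is a made-up placeholder, and the request is only constructed here, never sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; substitute a real SPARQL endpoint to try it.
ENDPOINT = "http://sparql.example/query"
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)

# Round-trip check: the server side can recover the exact query text.
decoded = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(decoded)
```

Passing the request to `urllib.request.urlopen` would perform the call. Since every conforming endpoint accepts the same parameter, the same client code should work against any of them.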

Page 39: CSHALS 2010 W3C Semantic Web Tutorial

Why SPARQL?

SPARQL lets us:
• Pull information from structured and semi-structured data.
• Explore data by discovering unknown relationships.
• Query and search an integrated view of disparate data sources.
• Glue separate software applications together by transforming data from one vocabulary to another.

Page 40: CSHALS 2010 W3C Semantic Web Tutorial

What automobiles get more than 25 miles per gallon, fit within my department’s budget, and can be purchased at a dealer located within 10 miles of one of my employees?

SELECT ?automobile
WHERE {
  ?automobile a ex:Car ;
              epa:mpg ?mpg ;
              ex:dealer ?dealer .
  ?employee a ex:Employee ;
            geo:loc ?loc .
  ?dealer geo:loc ?dealerloc .
  FILTER(?mpg > 25 && geo:dist(?loc, ?dealerloc) <= 10) .
}

[Diagram: a Web dashboard sends a SPARQL query to a SPARQL Query Engine, which draws on an Employee Directory, an ERP / Budget System, Web data from Dealer 1, Dealer 2, and Dealer 3, and an EPA Fuel Efficiency Spreadsheet]
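
To make the idea of triple patterns concrete, here is a toy matcher (not from the original slides, and far simpler than a real SPARQL engine). Terms beginning with `?` are variables; the data, the `query` function, and the cut-down two-pattern query are all assumptions for illustration, with a plain Python comparison standing in for FILTER.

```python
def is_var(term):
    """Variables are strings that start with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; None if impossible."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None  # variable already bound to something else
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def query(graph, patterns):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = {
    ("ex:car1", "rdf:type", "ex:Car"),
    ("ex:car1", "epa:mpg", 31),
    ("ex:car2", "rdf:type", "ex:Car"),
    ("ex:car2", "epa:mpg", 19),
}
patterns = [
    ("?automobile", "rdf:type", "ex:Car"),
    ("?automobile", "epa:mpg", "?mpg"),
]

# The comparison below plays the role of FILTER(?mpg > 25).
efficient = sorted(b["?automobile"] for b in query(graph, patterns)
                   if b["?mpg"] > 25)
print(efficient)
```

Reusing `?automobile` in both patterns is what joins them, the same way the shared variable names tie the patterns together in the query on the slide.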

Page 41: CSHALS 2010 W3C Semantic Web Tutorial

bio2rdf.org – querying life sciences data

Page 42: CSHALS 2010 W3C Semantic Web Tutorial

bio2rdf.org – querying life sciences data

Page 43: CSHALS 2010 W3C Semantic Web Tutorial

From the explicit to the inferred

Three pieces of the Semantic Web technology stack are about describing a domain well enough to capture (some of) the meaning of resources and relationships in the domain:

• RDF Schema
• OWL
• RIF

Apply knowledge to data to get more data.

Page 44: CSHALS 2010 W3C Semantic Web Tutorial

RDFS is…

RDF Schema

Page 45: CSHALS 2010 W3C Semantic Web Tutorial

RDF Schema is…

Elements of:
• Vocabulary (defining terms)
    • I define a relationship called “prescribed dose.”
• Schema (defining types)
    • “prescribed dose” relates “treatments” to “dosages”
    • (my prescribed dose is 2mg; therefore 2mg is a dosage)
• Taxonomy (defining hierarchies)
    • Any “doctor” is a “medical professional”
    • (therefore Dr. Brown is a medical professional)
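
The two inferences in parentheses can be sketched in a few lines of Python (not from the original slides). The dictionaries stand in for `rdfs:domain`, `rdfs:range`, and `rdfs:subClassOf` statements, and every `ex:` name is hypothetical. The dose is modeled as a resource (`ex:dose2mg`) rather than a literal so that it can carry a type.

```python
# Schema knowledge, standing in for rdfs:domain / rdfs:range / rdfs:subClassOf.
DOMAIN = {"ex:prescribedDose": "ex:Treatment"}
RANGE = {"ex:prescribedDose": "ex:Dosage"}
SUBCLASS_OF = {"ex:Doctor": "ex:MedicalProfessional"}


def infer(graph):
    """Apply the three rules repeatedly until no new triples appear."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in DOMAIN:
                new.add((s, "rdf:type", DOMAIN[p]))
            if p in RANGE:
                new.add((o, "rdf:type", RANGE[p]))
            if p == "rdf:type" and o in SUBCLASS_OF:
                new.add((s, "rdf:type", SUBCLASS_OF[o]))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


explicit = {
    ("ex:myTreatment", "ex:prescribedDose", "ex:dose2mg"),
    ("ex:DrBrown", "rdf:type", "ex:Doctor"),
}
for triple in sorted(infer(explicit) - explicit):
    print(triple)
```

Two explicit statements plus the schema yield three additional ones, a small instance of applying knowledge to data to get more data.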

Page 46: CSHALS 2010 W3C Semantic Web Tutorial

OWL is…

Web Ontology Language

Page 47: CSHALS 2010 W3C Semantic Web Tutorial

OWL is…

Elements of ontology
• Same/different identity
    • “author” and “auteur” are the same relation
    • two resources with the same “ISBN” are the same “book”
• More expressive type definitions
    • A “cycle” is a “vehicle” with at least one “wheel”
    • A “bicycle” is a “cycle” with exactly two “wheels”
• More expressive relation definitions
    • “sibling” is a symmetric predicate
    • the value of the “favorite dwarf” relation must be one of “happy”, “sleepy”, “sneezy”, “grumpy”, “dopey”, “bashful”, “doc”
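
Two of these ideas, the symmetric “sibling” predicate and identity through a shared “ISBN”, can be approximated with the sketch below. It is not from the original slides and is not an OWL reasoner: the property sets and all `ex:` names are assumptions, and the ISBN rule is written in the spirit of an identifying key.

```python
SYMMETRIC = {"ex:sibling"}       # properties that hold in both directions
IDENTIFYING = {"ex:isbn"}        # properties whose value identifies a resource


def infer(graph):
    """Add symmetric counterparts and owl:sameAs links for shared keys."""
    inferred = set(graph)

    # Symmetric property: (a sibling b) implies (b sibling a).
    for s, p, o in graph:
        if p in SYMMETRIC:
            inferred.add((o, p, s))

    # Shared identifying value: the resources denote the same thing.
    holders = {}
    for s, p, o in graph:
        if p in IDENTIFYING:
            holders.setdefault((p, o), set()).add(s)
    for same in holders.values():
        for a in same:
            for b in same:
                if a != b:
                    inferred.add((a, "owl:sameAs", b))
    return inferred


explicit = {
    ("ex:alice", "ex:sibling", "ex:bob"),
    ("ex:bookA", "ex:isbn", "isbn-123"),
    ("ex:bookB", "ex:isbn", "isbn-123"),
}
for triple in sorted(infer(explicit) - explicit):
    print(triple)
```

A real reasoner would go further, for example by copying every statement about `ex:bookA` onto `ex:bookB` once they are known to be the same.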

Page 48: CSHALS 2010 W3C Semantic Web Tutorial

OWL: Rich Class Definitions

A class is a (named) collection of things with similar attributes

Page 49: CSHALS 2010 W3C Semantic Web Tutorial

OWL: Rich Class Definitions

A class is a (named) collection of things with similar attributes

Page 50: CSHALS 2010 W3C Semantic Web Tutorial

OWL: Rich Class Definitions

A class is a (named) collection of things with similar attributes

Page 51: CSHALS 2010 W3C Semantic Web Tutorial

OWL: Rich Class Definitions

Page 52: CSHALS 2010 W3C Semantic Web Tutorial

RIF is…

Rule Interchange Format

Page 53: CSHALS 2010 W3C Semantic Web Tutorial

RIF is…

Standard representation for exchanging sets of logical and business rules
• Logical rules
    • A buyer buys an item from a seller if the seller sells the item to the buyer
    • A customer becomes a "Gold" customer as soon as his cumulative purchases during the current year top $5000
• Production rules
    • Customers that become "Gold" customers must be notified immediately, and a golden customer card will be printed and sent to them within one week
    • For shopping carts worth more than $1000, "Gold" customers receive an additional discount of 10% of the total amount
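
RIF itself is an interchange format, so the following is only a Python stand-in (not from the original slides) showing the logic of two of the example rules: the "Gold" threshold and the cart discount. Function names and the use of whole-dollar integers are assumptions made to keep the arithmetic exact.

```python
GOLD_THRESHOLD = 5000        # cumulative purchases must top this amount
LARGE_CART = 1000            # carts worth more than this earn the discount
GOLD_DISCOUNT_PERCENT = 10


def is_gold(purchases_this_year):
    """Logical rule: Gold once cumulative purchases top $5000."""
    return sum(purchases_this_year) > GOLD_THRESHOLD


def amount_due(cart_total, gold):
    """Production-style rule: Gold customers get 10% off carts over $1000."""
    if gold and cart_total > LARGE_CART:
        return cart_total - cart_total * GOLD_DISCOUNT_PERCENT // 100
    return cart_total


purchases = [3000, 2500]
gold = is_gold(purchases)
print(gold, amount_due(2000, gold))
```

Expressing such rules in a standard format rather than in application code is what would let them be exchanged between different rule engines.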

Page 54: CSHALS 2010 W3C Semantic Web Tutorial

Fantasy Land Architecture

[Architecture diagram; surviving labels: “Ontology / Schema +” and six boxes labeled “Custom UI”]

Page 55: CSHALS 2010 W3C Semantic Web Tutorial

Reality

[Architecture diagram; surviving labels: Internet, Oracle RDB, DB2 XML, LDAP Directory, and six boxes labeled “Custom UI”]

Page 56: CSHALS 2010 W3C Semantic Web Tutorial

GRDDL is…

Gleaning Resource Descriptions from Dialects of Languages

Page 57: CSHALS 2010 W3C Semantic Web Tutorial

GRDDL is…

A method for authoritatively getting RDF data from XML and XHTML documents.

Page 58: CSHALS 2010 W3C Semantic Web Tutorial

GRDDL is…

A mechanism for authoritatively deriving RDF data from families of XML and XHTML documents.
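
A loose sketch of the GRDDL idea (not from the original slides): the XML document itself names the transformation that turns it into RDF, which is what makes the result authoritative. Real GRDDL transformations are typically XSLT fetched from the linked URL; here a Python function registered under a made-up URL stands in for that step, and the `people` vocabulary is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the grddl:transformation attribute.
GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def people_to_triples(root):
    """Stand-in for an XSLT transformation: one name triple per <person>."""
    return [(person.get("id"), "ex:name", person.findtext("name"))
            for person in root.findall("person")]


# Assumption: a local registry replaces fetching the transformation by URL.
TRANSFORMS = {"http://example.org/xform/people.xsl": people_to_triples}


def glean(xml_text):
    """Run every transformation the document declares for itself."""
    root = ET.fromstring(xml_text)
    links = root.get(f"{{{GRDDL_NS}}}transformation", "").split()
    triples = []
    for link in links:
        triples.extend(TRANSFORMS[link](root))
    return triples


DOC = """<people xmlns:grddl="http://www.w3.org/2003/g/data-view#"
        grddl:transformation="http://example.org/xform/people.xsl">
  <person id="http://example.org/people/lee"><name>Lee Feigenbaum</name></person>
</people>"""

print(glean(DOC))
```

A document that declares no transformation yields no triples, since the consumer never guesses at a mapping the publisher did not endorse.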

Page 59: CSHALS 2010 W3C Semantic Web Tutorial

RDB2RDF is…

Relational Database to RDF

Page 60: CSHALS 2010 W3C Semantic Web Tutorial

RDB2RDF is…

A W3C Working Group to define a standard way to map from relational databases to RDF (and SPARQL).
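
One naive way to picture such a mapping, offered only as an illustration and not as the Working Group's eventual standard: each row becomes a subject URI, each column a predicate, and each cell an object. The table, the base URI, and the `rows_to_triples` function are all hypothetical.

```python
import sqlite3


def rows_to_triples(conn, table, key, base):
    """Map each row to triples: row -> subject, column -> predicate.

    Assumption: `table` and `key` are trusted names, since they are
    interpolated into the SQL text and the generated URIs.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = set()
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:  # NULLs yield no triple
                triples.add((subject, f"{base}{table}#{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, birth_year INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Lee Feigenbaum', 1978)")

triples = rows_to_triples(conn, "employee", "id", "http://example.org/db/")
for triple in sorted(triples, key=str):
    print(triple)
```

Foreign keys would map naturally to triples whose object is another row's URI, which this sketch leaves out.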

Page 61: CSHALS 2010 W3C Semantic Web Tutorial

Linked Data is…

A simple set of 4 guidelines for publishing RDF data on the Web (over HTTP)

Developed by Tim Berners-Lee in 2006

1. Use URIs as names for things
   • Globally unique identity
2. Use HTTP URIs
   • Everyone has a Web browser/client
3. When someone looks up a URI, provide useful information
   • …in the form of RDF data
4. Include links to other URIs
   • Foster discovery of additional information
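
Guidelines 3 and 4 can be illustrated without any network access by the sketch below (not from the original slides). A dictionary plays the role of the Web, `look_up` plays the role of an HTTP GET that returns RDF, and the crawler discovers more data by following the URIs it finds. All URIs and names are made up.

```python
LEE = "http://example.org/people/lee"
CS = "http://example.org/orgs/cambridge-semantics"
BOSTON = "http://example.org/places/boston"

# Assumption: this dictionary simulates what each URI returns when looked up.
WEB = {
    LEE: {(LEE, "ex:employer", CS)},
    CS: {(CS, "ex:headquarters", BOSTON)},
    BOSTON: {(BOSTON, "ex:population", 574000)},
}


def look_up(uri):
    """Stand-in for dereferencing an HTTP URI and parsing the RDF returned."""
    return WEB.get(uri, set())


def crawl(start, max_hops):
    """Collect triples by following links up to `max_hops` away from `start`."""
    seen, frontier, graph = set(), [start], set()
    for _ in range(max_hops + 1):
        next_frontier = []
        for uri in frontier:
            if uri in seen:
                continue
            seen.add(uri)
            for triple in look_up(uri):
                graph.add(triple)
                obj = triple[2]
                if isinstance(obj, str) and obj.startswith("http://"):
                    next_frontier.append(obj)  # a link worth following
        frontier = next_frontier
    return graph


print(len(crawl(LEE, 0)), len(crawl(LEE, 2)))
```

Starting from a single URI, two hops are enough to learn the population of the city where the person's employer is headquartered.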

Page 62: CSHALS 2010 W3C Semantic Web Tutorial

The LOD “cloud”, March 2009

Page 63: CSHALS 2010 W3C Semantic Web Tutorial

Application-specific portions of the cloud
• Notably, bio-related data sets (in light purple)
    • some by the W3C “Linking Open Drug Data” task force

Page 64: CSHALS 2010 W3C Semantic Web Tutorial

RDFa is…

RDF in Attributes

Page 65: CSHALS 2010 W3C Semantic Web Tutorial

RDFa is…

A collection of HTML attributes that allow RDF to be embedded directly in Web pages.

Page 66: CSHALS 2010 W3C Semantic Web Tutorial

Why RDFa?

• Don’t Repeat Yourself (DRY)
• In-context metadata (copy & paste)
• Authoritative (no screen scraping)
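
The sketch below (not from the original slides) shows the flavor of RDFa: the same markup a person reads also carries triples in its attributes. It handles only the `about`, `property`, and `content` attributes on a flat page, ignores prefix declarations and nesting, and is nowhere near a conforming RDFa processor. The class name and sample page are invented.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a few RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None   # set by the most recent about="..."
        self.pending = None   # property waiting for its element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            if "content" in attrs:
                # An explicit content value wins over the visible text.
                self.triples.append(
                    (self.subject, attrs["property"], attrs["content"]))
            else:
                self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None


PAGE = """<div about="http://example.org/people/lee">
  <span property="foaf:name">Lee Feigenbaum</span> was born in
  <span property="ex:birthYear" content="1978">nineteen seventy-eight</span>.
</div>"""

extractor = TinyRDFaExtractor()
extractor.feed(PAGE)
print(extractor.triples)
```

The name appears once in the page and serves both readers and machines, which is the DRY point above; the `content` attribute shows how a machine-friendly value can sit behind human-friendly text.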

Page 67: CSHALS 2010 W3C Semantic Web Tutorial

RDFa in action

Page 68: CSHALS 2010 W3C Semantic Web Tutorial

SEMANTIC WEB LANDSCAPE TODAY

Page 69: CSHALS 2010 W3C Semantic Web Tutorial

Semantic Web Tools

In 2010, there is a wide variety of open-source and commercial Semantic Web tools available.

Page 70: CSHALS 2010 W3C Semantic Web Tutorial

Types of RDF Tools

• Triple stores
    • Built on relational database
    • Native RDF store
• Development libraries
• Full-featured application servers

Most RDF tools contain some elements of each of these.

Page 71: CSHALS 2010 W3C Semantic Web Tutorial

Finding RDF Tools

• Community-maintained lists
    http://esw.w3.org/topic/SemanticWebTools
• Emphasis on large triple stores
    http://esw.w3.org/topic/LargeTripleStores
• Michael Bergman’s Sweet Tools searchable list:
    http://www.mkbergman.com/?page_id=325

Page 72: CSHALS 2010 W3C Semantic Web Tutorial

Types of SPARQL Tools

• Query engines
    • Things that can run queries
    • Most RDF stores provide a SPARQL engine
• Query rewriters
    • E.g. to query relational databases (more later)
• Endpoints
    • Things that accept queries on the Web and return results
• Client libraries
    • Things that make it easy to ask queries

Page 73: CSHALS 2010 W3C Semantic Web Tutorial

Finding SPARQL Tools

• Community-maintained list of query engines
    http://esw.w3.org/topic/SparqlImplementations
• Publicly accessible SPARQL endpoints
    http://esw.w3.org/topic/SparqlEndpoints
• Michael Bergman’s Sweet Tools searchable list:
    http://www.mkbergman.com/?page_id=325

Page 74: CSHALS 2010 W3C Semantic Web Tutorial

Developing Tools and Infrastructure

• Editors/environments
    • Oiled, Protégé, Swoop, TopBraid, Ontotrack, …

Page 75: CSHALS 2010 W3C Semantic Web Tutorial

Developing Tools and Infrastructure

• Editors/environments
    • Oiled, Protégé, Swoop, TopBraid, Ontotrack, …
• Reasoning systems
    • Cerebra, FaCT++, Kaon2, Pellet, Racer, CEL, …

[Logos: Pellet, KAON2, CEL]

Page 76: CSHALS 2010 W3C Semantic Web Tutorial

Visualizing and Publishing Vocabularies

Page 77: CSHALS 2010 W3C Semantic Web Tutorial

Reusable, public ontologies

• Measurement Units Ontology
• The Event Ontology
• FOAF

Page 78: CSHALS 2010 W3C Semantic Web Tutorial

GRDDL tools

• Community-maintained list:
    http://esw.w3.org/topic/GrddlImplementations

Most GRDDL tools are adapters to existing RDF stores or SPARQL engines to allow loading or querying data from XML and XHTML sources.

Page 79: CSHALS 2010 W3C Semantic Web Tutorial

What about… everything else?

Standards don’t yet exist, but many tools exist to derive RDF and/or run SPARQL queries against other sources of data.

Page 80: CSHALS 2010 W3C Semantic Web Tutorial

LDAP Directories

Squirrel RDF
http://jena.sourceforge.net/SquirrelRDF/

Page 81: CSHALS 2010 W3C Semantic Web Tutorial

Excel spreadsheets

Anzo for Excel
http://www.cambridgesemantics.com/products/anzo_for_excel

Page 82: CSHALS 2010 W3C Semantic Web Tutorial

Web-based data sources

Virtuoso Sponger Cartridges
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger

Page 83: CSHALS 2010 W3C Semantic Web Tutorial

Unstructured Text

Calais
http://www.opencalais.com/

Page 84: CSHALS 2010 W3C Semantic Web Tutorial

Unstructured Text

Zemanta Web Service
http://developer.zemanta.com/

Page 85: CSHALS 2010 W3C Semantic Web Tutorial

Where is it being used?

On the Web
• Google, Yahoo!
• Best Buy
• NY Times
• US Government
• UK Government

Page 86: CSHALS 2010 W3C Semantic Web Tutorial

Where is it being used?

Industries
• Oil & Gas (integration, classification)
• Finance (structured data, ontologies, XBRL)
• Publishing (metadata)
• Government (structured data, metadata, classification)
• Libraries & museums (metadata, classification)
• IT (rapid application development & evolution)

Page 87: CSHALS 2010 W3C Semantic Web Tutorial

Where is it being used?

Health Care
• Cleveland Clinic
    • Clinical research
    • Data integration, classification (= better search)
• UT School of Health
    • Public health surveillance
    • SAPPHIRE: classification, ontology-driven development
• Various
    • Clinical Decision Support
    • Agile, rule-driven, scalable in the face of change

Page 88: CSHALS 2010 W3C Semantic Web Tutorial

Where is it being used?

Life Sciences
• Agile knowledgebases at Pfizer
• Target assessment at Eli Lilly
• Integrated information links at Novartis
• Astra Zeneca, J&J, UCB, …

CSHALS chronicles many of these uses and many more.

Page 89: CSHALS 2010 W3C Semantic Web Tutorial

TAKE-AWAY ADVICE

Page 90: CSHALS 2010 W3C Semantic Web Tutorial

Why are Semantic Web technologies appropriate for the life sciences?

These are horizontal, enabling technologies. But they apply particularly well to problems with these characteristics:

• Heterogeneous data from multiple sources
• Increasing reliance on connections within this data
• Rapidly changing information needs
• Significant early-mover advantage
• Large amounts of data that would benefit from classification

Many tactical and strategic challenges in the life sciences industry feature these traits.

Page 91: CSHALS 2010 W3C Semantic Web Tutorial

Getting Started with Semantic Web technologies

Don’t boil the ocean.

Page 92: CSHALS 2010 W3C Semantic Web Tutorial

Getting Started with Semantic Web technologies

• Goal: quick tactical wins on the path to large strategic value
• Be sure to consider the operational ramifications
    • Who does what differently?

Ideal Semantic Web projects/applications have an incremental path towards broad deployment that generates demonstrable value along the way

Page 93: CSHALS 2010 W3C Semantic Web Tutorial

Choose practical, enterprise-ready tools

Look beyond the core Semantic Web capabilities and consider:

• integration with existing enterprise systems
• development & extension models
• deployment, logging, maintenance, backup
• tooling
• user experience

If you choose to build new components and assemble existing components together, it’s quite likely you’ll end up reinventing the wheel.

Page 94: CSHALS 2010 W3C Semantic Web Tutorial

Plan for Acquiring Expertise

• What level of expertise is necessary?
    • Technologies only?
    • Technologies + API?
    • Technologies + tooling?
    • Tooling only?
    • …
• How will we acquire the expertise?
    • In-house (and if so, how?)
    • Vendor services
    • 3rd-party services
    • Open-source community

Page 95: CSHALS 2010 W3C Semantic Web Tutorial

Thanks & Discussion

I’m always happy to field questions & engage in discussion:

[email protected]