
Ontology and the Semantic Web

Barry Smith

August 26, 2013


Ontologies

• are computer-tractable representations of types in specific areas of reality

• are more or less general (upper and lower ontologies)
  – upper = organizing ontologies
  – lower = domain ontologies

Foundational Model of Anatomy (FMA)

[Figure: fragment of the FMA showing anatomical types linked by is_a and part_of relations. Types shown: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura.]
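
A minimal Python sketch of how is_a and part_of links between types might be stored and traversed. The term names come from the slide, but the specific edges below are assumptions chosen for illustration; the diagram's actual links are not recoverable from this transcript, and the helper name `ancestors` is hypothetical.

```python
# Assumed is_a edges (child -> parent); illustrative only, not copied from the FMA.
IS_A = {
    "Pleural Sac": "Serous Sac",
    "Serous Sac": "Organ",
    "Organ": "Anatomical Structure",
    "Pleural Cavity": "Serous Sac Cavity",
    "Serous Sac Cavity": "Organ Cavity",
    "Organ Cavity": "Anatomical Space",
}

# Assumed part_of edges (part -> whole); illustrative only.
PART_OF = {
    "Parietal Pleura": "Pleura (Wall of Sac)",
    "Visceral Pleura": "Pleura (Wall of Sac)",
    "Pleura (Wall of Sac)": "Pleural Sac",
}


def ancestors(term, relation):
    """Follow one relation upward and return every term reached, nearest first."""
    found = []
    while term in relation:
        term = relation[term]
        found.append(term)
    return found


print(ancestors("Pleural Sac", IS_A))
print(ancestors("Parietal Pleura", PART_OF))
```

Keeping the two relations in separate maps means a query can follow is_a only, part_of only, or both, depending on what the question needs.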


ontologies = standardized labels designed for use in annotations

to make the data cognitively accessible to human beings

and algorithmically accessible to computers


Ontologies facilitate retrieval of data

by allowing grouping of annotations

Annotations per term: brain 20, hindbrain 15, rhombomere 10

Query brain without ontology: 20
Query brain with ontology: 45
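
The arithmetic behind these numbers can be sketched in a few lines of Python. The counts are the ones on the slide; the hierarchy (rhombomere under hindbrain, hindbrain under brain) and the function names are assumptions made for this illustration.

```python
# Annotation counts as given on the slide.
ANNOTATION_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Assumed hierarchy: each term points to the broader term it falls under.
BROADER = {"rhombomere": "hindbrain", "hindbrain": "brain"}


def falls_under(term, target):
    """True if term is the target or lies below it in the hierarchy."""
    while term is not None:
        if term == target:
            return True
        term = BROADER.get(term)
    return False


def count_hits(query, use_ontology):
    """Count annotations retrieved for a query term."""
    if not use_ontology:
        # Plain string match: only data tagged with exactly this term.
        return ANNOTATION_COUNTS.get(query, 0)
    # Ontology-aware match: also include data tagged with narrower terms.
    return sum(n for term, n in ANNOTATION_COUNTS.items() if falls_under(term, query))


print(count_hits("brain", use_ontology=False))  # 20
print(count_hits("brain", use_ontology=True))   # 45
```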


ontologies = high quality controlled structured vocabularies used for the annotation (description, tagging) of data, images, emails, documents, …


Ontology’s greatest successes around net-centricity

• You build a site
• Others discover the site and they link to it
• The more they link, the more well known the page becomes (Google …)
• Your data becomes discoverable
• Your data becomes more easily discoverable the more you use common vocabularies


The roots of Semantic Technology

1. Each group creates a controlled vocabulary of the terms commonly used in its domain, and creates an ontology out of these terms using OWL (Web Ontology Language) syntax
2. Binds this ontology to its data and makes these data available on the Web
3. The ontologies are linked e.g. through their use of some common terms
4. These links create links among all the datasets, thereby creating a ‘web of data’
5. We can all share the same tags: they are called internet addresses
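
A rough Python sketch of the linking step, assuming two groups publish their data as (subject, predicate, object) triples and happen to tag with the same internet address. Every example.org address, the dataset contents, and the function name `linked_through` are invented for illustration.

```python
# A shared tag: one internet address used by both groups (hypothetical).
SHARED_TERM = "http://example.org/ontology/Brain"

# Two independently published datasets (hypothetical contents).
dataset_a = [
    ("http://example.org/lab-a/image/17", "http://example.org/ontology/depicts", SHARED_TERM),
]
dataset_b = [
    ("http://example.org/lab-b/sample/4", "http://example.org/ontology/derivedFrom", SHARED_TERM),
]


def linked_through(term, *datasets):
    """Return every subject, from any dataset, that is tagged with the term."""
    return sorted(s for ds in datasets for (s, _p, o) in ds if o == term)


# Neither group referred to the other, yet their items meet at the shared term.
print(linked_through(SHARED_TERM, dataset_a, dataset_b))
```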


Audio Features Ontology


Where we stand today

• increasing availability of semantically enhanced data and semantic software
• increasing use of OWL (Web Ontology Language) in attempts to create useful integration of on-line data and information
• “Linked Open Data” the New Big Thing

[Figure: as of September 2010]

The problem: the more this sort of Semantic Technology is successful, the more it fails

The original idea was to break down silos via common controlled vocabularies for the tagging of data

The very success of the approach leads to the creation of ever new controlled vocabularies – semantic silos – as ever more ontologies are created in ad hoc ways

Every organization and sub-organization now wants to have its own “ontology”

The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization


Divided we fail


United we also fail


The problem: many, many silos

• DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services

• these systems are poorly integrated
• deliver redundant capabilities
• make data hard to access, foster error and waste
• prevent secondary uses of data

https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data


what is missing here


Syntactic and semantic interoperability

• Syntactic interoperability = systems can exchange messages (realized by XML).

• Semantic interoperability = messages are interpreted in the same way by senders and receivers.

• In UCore, meanings are specified via natural-language strings.

• Experience shows that this is not a viable route to achieving semantic interoperability.
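
The gap between the two kinds of interoperability can be illustrated with a small Python sketch. The two XML messages, the `term` attribute, and the example.org identifiers are all hypothetical; they are not taken from UCore.

```python
import xml.etree.ElementTree as ET

# Two hypothetical messages. Both are well-formed XML, so any receiver can
# parse them: that is syntactic interoperability.
msg_from_a = '<report><type term="http://example.org/onto/Vehicle">tank</type></report>'
msg_from_b = '<report><type term="http://example.org/onto/Container">tank</type></report>'


def label_and_term(message):
    """Return the natural-language label and the term identifier of a report."""
    node = ET.fromstring(message).find("type")
    return node.text, node.get("term")


label_a, term_a = label_and_term(msg_from_a)
label_b, term_b = label_and_term(msg_from_b)

# Comparing the strings suggests agreement; comparing identifiers shows that
# sender and receiver mean different things by the same word.
same_string = label_a == label_b
same_meaning = term_a == term_b
print(same_string, same_meaning)  # True False
```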


How to avoid the problem of semantic silos

Distributed Development of a Shared Semantic Resource

Pilot testing to demonstrate feasibility for I2WD


An alternative solution: Semantic Enhancement

A distributed incremental strategy of coordinated annotation

– data remain in their original state (are treated at ‘arm’s length’)
– ‘tagged’ using interoperable ontologies created in tandem
– allows flexible response to new needs, adjustable in real time
– can be as complete as needed, lossless, long-lasting because flexible and responsive
– big bang for buck: measurable benefit even from first small investments

The strategy works only to the degree that it rests on shared governance and training
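
One way to picture the ‘arm’s length’ idea in Python, as a sketch under stated assumptions: source records are never edited, and annotations sit in a separate layer that only points at record identifiers. The record contents, term identifiers, and function names are hypothetical.

```python
# Source records stay exactly as delivered; nothing below modifies them.
SOURCE_RECORDS = {
    "rec-1": {"txt": "hindbrain section, slide 3"},
    "rec-2": {"txt": "whole brain scan"},
}

# The annotation layer: (record id, ontology term) pairs kept apart from the data.
annotations = []


def annotate(record_id, term):
    """Tag a record with an ontology term without touching the record itself."""
    if record_id not in SOURCE_RECORDS:
        raise KeyError(record_id)
    annotations.append((record_id, term))


def records_tagged(term):
    """Return the untouched source records that carry the given tag."""
    return [SOURCE_RECORDS[rid] for rid, t in annotations if t == term]
```

Because tags accumulate in their own layer, a new need can be met by adding annotations, and even a handful of them is immediately queryable.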

compare: legends for maps

common legends allow (cross-border) integration


The Gene Ontology

[Figure: MouseEcotope, GlyProt, DiabetInGene, GluChem; example Gene Ontology terms shown: sphingolipid transporter activity, Holliday junction helicase complex]


Common legends

• help human beings use and understand complex representations of reality
• help human beings create useful complex representations of reality
• help computers process complex representations of reality
• help glue data together

But common legends serve these purposes only if the legends are developed in a coordinated, non-redundant fashion


International System of Units


The Open Biomedical Ontologies (OBO) Foundry

[Table: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT, and OCCURRENT)]

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)


rationale of OBO Foundry coverage

[Table: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT, and OCCURRENT)]

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)
– occurrent: Cellular Process (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)


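Population-level ontologies

[Table: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT, and OCCURRENT), with a population-level row added]

COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
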

Environment Ontology

[Table: ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT, and OCCURRENT), with an added band labeled "environments"]

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)


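[Table: OBO Foundry ontologies arranged by GRANULARITY (rows) and RELATION TO TIME (columns: CONTINUANT, split into INDEPENDENT and DEPENDENT, and OCCURRENT), with an added band labeled ENVIRONMENT. Source: http://obofoundry.org]

COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– dependent continuant: Population Phenotype
– occurrent: Population Process

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cell Component (FMA, GO)
– dependent continuant: Cellular Function (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)
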

[Table: independent continuants by GRANULARITY (rows), each paired with its entry in the ENVIRONMENT band. Source: http://obofoundry.org]

COMPLEX OF ORGANISMS
– independent continuant: Family, Community, Deme, Population
– environment: Environment of population

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– environment: Environment of single organism

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cell Component (FMA, GO)
– environment: Environment of cell

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– environment: Molecular environment


The OBO Foundry is based on the idea of annotation = semantic enhancement of data across all of biology

$200 million spent so far on using the GO to annotate (tag) biomedical research data through manual effort of PhD biologists


OBO Foundry approach extended into other domains

NIF Standard: Neuroscience Information Framework
ISF Ontologies: Integrated Semantic Framework
OGMS and Extensions: Ontology for General Medical Science
IDO Consortium: Infectious Disease Ontology
cROP: Common Reference Ontologies for Plants


What these annotations do

• make data retrievable even by those not involved in their creation

• allow integration of data deriving from heterogeneous sources

• break down the walls of roach motels

Benefits of the Approach

• Does not interfere with the source content
• Enables content to evolve in a cumulative fashion as it accommodates new kinds of data
• Does not depend on the data resources and can be developed independently from them in an incremental and distributed fashion
• Provides a more consistent, homogeneous, and well-articulated presentation of the content which originates in multiple internally inconsistent and heterogeneous systems


Benefits of the Approach

• Makes management and exploitation of the content more cost-effective
• Allows graceful integration with other government initiatives and brings the system closer to the federally mandated net-centric data strategy
• Creates incrementally an integrated content that is effectively searchable and that provides content to which more powerful analytics can be applied


Building the Shared Semantic Resource

• Methodology of distributed incremental development

• Training
• Governance
• Common Architecture of Ontologies to support consistency, non-redundancy, modularity
  – Upper Level Ontology (BFO)
  – Mid-Level Ontologies
  – Low Level Ontologies


Goal: To realize Horizontal Integration (HI) of intelligence data

HI =Def. the ability to exploit multiple data sources as if they are one

Problem: the data coming onstream are out of our control

Any strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need


I2WD Strategy

Create an agile strategy for building ontologies within a Shared Semantic Resource (SSR) and apply and extend these ontologies to annotate new source data as they come onstream

– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?


Why do large-scale ontology projects fail?

• focus on vocabularies, lexicons, with no logical structure, no attention to life cycle

• failure of housekeeping yields redundancy and therefore forking

• the same data is annotated in different ways by users of different ontology fragments

• data is siloed as before
  – HOW TO BUILD THE NEEDED LOGIC INTO THE ARCHITECTURE OF THE ONTOLOGIES?


Examples of Principles

• All terms in all ontologies should be singular nouns
• Same relations between terms should be reused in every ontology
• Reference ontologies should be based on single inheritance
• All definitions should be of the form
    an S = Def. a G which Ds
  where ‘G’ (for: genus) is the parent term of S (for: species) in the corresponding reference ontology
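
Two of these principles lend themselves to automatic checking. The Python sketch below assumes a toy reference ontology; its terms, its definitions, and the function names are hypothetical and only illustrate the single-inheritance rule and the "an S = Def. a G which Ds" form.

```python
import re

# Hypothetical is_a edges (child, parent) of a toy reference ontology.
IS_A_EDGES = [("neuron", "cell"), ("cell", "anatomical structure")]

# Hypothetical definitions written in the genus-differentia form.
DEFINITIONS = {
    "neuron": "a neuron = Def. a cell which conducts nerve impulses",
    "cell": "a cell = Def. an anatomical structure which is bounded by a membrane",
}

# "an S = Def. a G which Ds": species, genus, and differentia as named groups.
DEFINITION_FORM = re.compile(
    r"^an? (?P<species>.+?) = ?Def\. an? (?P<genus>.+?) which (?P<differentia>.+)$"
)


def has_single_inheritance(edges):
    """True if no term appears as a child in more than one is_a edge."""
    children = [child for child, _parent in edges]
    return len(children) == len(set(children))


def definition_is_well_formed(term, definition, edges):
    """True if the definition has the required form and its genus is the term's parent."""
    match = DEFINITION_FORM.match(definition)
    if match is None:
        return False
    parents = [parent for child, parent in edges if child == term]
    return match.group("species") == term and parents == [match.group("genus")]
```

A check of this kind could run whenever a term is added, so that definitions whose genus drifts away from the is_a hierarchy are caught early.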

Extension Strategy + Modular Organization

[Figure: ontology modules arranged in three levels]

top level: Basic Formal Ontology (BFO)

mid-level: Information Artifact Ontology (IAO); Ontology for Biomedical Investigations (OBI); Spatial Ontology (BSPO)

domain level: Anatomy Ontology (FMA*, CARO); Environment Ontology (EnvO); Infectious Disease Ontology (IDO*); Biological Process Ontology (GO*); Cell Ontology (CL); Cellular Component Ontology (FMA*, GO*); Phenotypic Quality Ontology (PaTO); Subcellular Anatomy Ontology (SAO); Sequence Ontology (SO*); Molecular Function (GO*); Protein Ontology (PRO*)


Ontologies are built as orthogonal modules which form an incrementally evolving network

• scientists are motivated to commit to developing ontologies because they will need in their own work ontologies that fit into this network

• users are motivated by the assurance that the ontologies they turn to are maintained by experts


More benefits of orthogonality

• helps those new to ontology to find what they need
• to find models of good practice
• ensures mutual consistency of ontologies (trivially)
• and thereby ensures additivity of annotations
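
Orthogonality can be read as the requirement that no term is claimed by more than one ontology module. The Python sketch below checks that property; the module names, their terms, and the function `overlapping_terms` are hypothetical.

```python
from collections import defaultdict

# Hypothetical modules and the terms each one defines.
MODULES = {
    "cell": {"neuron", "epithelial cell"},
    "process": {"cell division", "neuron"},
}


def overlapping_terms(modules):
    """Map each term defined in more than one module to the modules that claim it."""
    owners = defaultdict(list)
    for name, terms in sorted(modules.items()):
        for term in terms:
            owners[term].append(name)
    return {term: names for term, names in owners.items() if len(names) > 1}


# An empty result would mean the modules are orthogonal in this sense.
print(overlapping_terms(MODULES))
```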


More benefits of orthogonality

• No need to reinvent the wheel for each new domain
• Can profit from storehouse of lessons learned
• Can more easily reuse what is made by others
• Can more easily reuse training
• Can more easily inspect and criticize results of others’ work
• Leads to innovations (e.g. Mireot, Ontofox) in strategies for combining ontologies