Introduction to Ontology Barry Smith August 11, 2012

Page 1: Introduction to Ontology Barry Smith August 11, 2012

Introduction to Ontology

Barry Smith

August 11, 2012

Page 2: Introduction to Ontology Barry Smith August 11, 2012

The problem of (big) data

Page 3: Introduction to Ontology Barry Smith August 11, 2012

Some questions

• How to find data?
• How to understand data when you find it?
• How to use data when you find it?
• How to integrate with other data?
• How to label the data you are collecting?
• How to build a set of labels for a new domain that will integrate well with labels used in neighboring domains?

Big problem: nearly all of this data is siloed

Page 4: Introduction to Ontology Barry Smith August 11, 2012

Sources

• Examples of databases containing person data and data pertaining to skills

PersonID  SkillID
111       222

SkillID  Name  Description
222      Java  Programming

ID   SkillDescr
333  SQL

EmplID  SkillName
444     Java
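
To make the silo problem concrete, here is a minimal Python sketch of what it takes to ask one question ("who has Java skills?") of the sample tables above. The grouping of the tables into db1, db2 and db3 and the function name `people_with_skill` are assumptions made for illustration; the rows are the ones shown on the slide.

```python
# Rows copied from the sample tables; the grouping into db1/db2/db3 is assumed.
db1_person_skill = [{"PersonID": 111, "SkillID": 222}]
db1_skill = [{"SkillID": 222, "Name": "Java", "Description": "Programming"}]
db2 = [{"ID": 333, "SkillDescr": "SQL"}]
db3 = [{"EmplID": 444, "SkillName": "Java"}]


def people_with_skill(skill):
    """Collect (source, person id) pairs; every silo needs its own access logic."""
    found = []
    # db1 stores skills in a separate table, so a join on SkillID is needed.
    skill_ids = {s["SkillID"] for s in db1_skill if s["Name"] == skill}
    found += [("db1", r["PersonID"]) for r in db1_person_skill
              if r["SkillID"] in skill_ids]
    # db2 and db3 hold the same kind of data under different column names.
    found += [("db2", r["ID"]) for r in db2 if r["SkillDescr"] == skill]
    found += [("db3", r["EmplID"]) for r in db3 if r["SkillName"] == skill]
    return found


print(people_with_skill("Java"))
```

Each new source adds another hand-written branch to the function, which is the cost that shared labels are meant to remove.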

Page 5: Introduction to Ontology Barry Smith August 11, 2012

The problem: many, many silos

• DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services

• these systems are poorly integrated
• deliver redundant capabilities
• make data hard to access, foster error and waste
• prevent secondary uses of data

https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data

Page 6: Introduction to Ontology Barry Smith August 11, 2012

Page 7: Introduction to Ontology Barry Smith August 11, 2012

One road to a solution: Exploit the network effects of the Web

• You build a site
• Others discover the site and they link to it
• The more they link to it, the more important and well known the page becomes (this is what Google exploits)
• Your page becomes important, and others begin to rely on it
• Many people link to the data, use it
• New ‘secondary uses’ of the data are discovered

With thanks to Ivan Herman

Page 8: Introduction to Ontology Barry Smith August 11, 2012

Unfortunately the Web is ruled by anarchy. However much we try to link web content together à la Google, we will still be left with many, many silos.

Photo credit “nepatterson”, Flickr

Page 9: Introduction to Ontology Barry Smith August 11, 2012

To avoid silos, data must be available on the Web in a standard way.

Use “ontologies” to capture common meanings with logical definitions that are understandable to both humans and computers, using a common language such as OWL (Web Ontology Language).

The idea of the Semantic Web

Page 10: Introduction to Ontology Barry Smith August 11, 2012

Annotate data using ontologies

Source Term           Ontology Label
Db1.Name              SE.Skill
Db2.SkillDescr        SE.ComputerSkill
Db3.SkillName         SE.ProgrammingSkill
Db1.PersonID          SE.PersonID
Db2.ID                SE.PersonID
Db3.EmplID            SE.PersonID
SE.ComputerSkill      SE.Skill
SE.ProgrammingSkill   SE.ComputerSkill

Inconsistent and idiosyncratic terms used in source data are associated with single preferred labels from ontologies
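
A rough Python sketch of how such annotations could be used: once every source term carries an ontology label, one lookup can find all columns that hold skill data, including those annotated with a more specific label. The mapping is taken from the table on this slide; the function names `ancestors` and `columns_for` are hypothetical, and reading the last two rows of the table as subtype links is an assumption.

```python
# Source term -> ontology label, as listed in the annotation table.
ANNOTATIONS = {
    "Db1.Name": "SE.Skill",
    "Db2.SkillDescr": "SE.ComputerSkill",
    "Db3.SkillName": "SE.ProgrammingSkill",
    "Db1.PersonID": "SE.PersonID",
    "Db2.ID": "SE.PersonID",
    "Db3.EmplID": "SE.PersonID",
}

# Label -> parent label (assumed reading of the last two table rows).
IS_A = {
    "SE.ComputerSkill": "SE.Skill",
    "SE.ProgrammingSkill": "SE.ComputerSkill",
}


def ancestors(label):
    """Return the label followed by all of its parents, nearest first."""
    chain = [label]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def columns_for(label):
    """Find every source column annotated with `label` or one of its subtypes."""
    return sorted(col for col, ann in ANNOTATIONS.items()
                  if label in ancestors(ann))


print(columns_for("SE.Skill"))
print(columns_for("SE.PersonID"))
```

The source databases keep their own column names; only the annotation layer is shared.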

Page 11: Introduction to Ontology Barry Smith August 11, 2012

Where we stand today

• HTML demonstrated the power of the Web to allow sharing of information
• increasing availability of semantically enhanced data
• increasing power of semantic software to allow automatic reasoning over online information
• increasing use of OWL in attempts to break down silos and create useful integration of online data and information

Page 12: Introduction to Ontology Barry Smith August 11, 2012

Linked Open Data as of September 2010

Page 13: Introduction to Ontology Barry Smith August 11, 2012

Ontology success stories, and some reasons for failure

unfortunately this data is not really linked

Page 15: Introduction to Ontology Barry Smith August 11, 2012

The result: the more Semantic Technology is successful, the more it fails to achieve its goals

The very success of the approach leads to the creation of ever new controlled vocabularies, semantic silos – because multiple ontologies are being created in ad hoc ways

The Semantic Web framework as currently conceived yields minimal standardization

Creates semantic silos

Page 16: Introduction to Ontology Barry Smith August 11, 2012

Basic Formal Ontology (BFO)

top-level architecture used in over 120 ontology projects worldwide

Next tutorial in this series: August 18-19, 2012

http://ncorwiki.buffalo.edu/index.php/Basic_Formal_Ontology_2.0

Page 17: Introduction to Ontology Barry Smith August 11, 2012

People will tell you, all you need is …

XML gives you: processable tagging + syntactic interoperability

RDF gives you: net-centricity (URIs for unique and consistent naming), linked data

OWL (Web Ontology Language) gives you: RDF + semantic interoperability, richer logic

Page 18: Introduction to Ontology Barry Smith August 11, 2012

Levels of coordination

But these are just tools:

• they do not rule out stovepipes
• they do not prevent redundant efforts
• they do not imply high quality ontologies of the sort that will support reasoning

Even if we all speak Irish, this does not mean that we all understand each other

Page 19: Introduction to Ontology Barry Smith August 11, 2012

Warning 1.

• OWL implementation is not enough
• the issues we face are not only logical, but also sociological
• they are the same issues already endemic in the database world
  – database architecture is inflexible
  – database systems, once distributed, degrade very quickly; create stovepipes, forking, silos …
• How to ensure coordinated ontology development over time?

Page 20: Introduction to Ontology Barry Smith August 11, 2012

Suggested principles for an ontologist’s code of ethics

1. I hereby swear that I will reuse existing ontology content wherever possible

2. I hereby swear that whenever I reuse terms from an existing ontology, I will keep their original source IDs

3. I hereby swear that before releasing an ontology I will aggressively test it in multiple independent real-world applications

4. I hereby swear that before committing a new term and definition to an ontology I will always think first

Page 21: Introduction to Ontology Barry Smith August 11, 2012

Some governance principles

• Information sharing: to avoid ontology redundancy and inconsistency, there must be sharing of information at every stage
• Collaborative development: where ontology development needs overlap, the communities involved must either develop shared resources or agree to a division of labor
• Leverage of existing resources: ontology development should wherever possible involve reuse of existing ontologies
• Guiding role of subject-matter experts, who should be involved in the construction and maintenance of all domain ontology content

Page 22: Introduction to Ontology Barry Smith August 11, 2012

Warning 2.

Ontology is a multi-disciplinary enterprise, in which the same terms are used in conflicting ways by different communities of ontologists

• universal, type, kind, class
• instance
• concept, model
• representation
• datum

Page 23: Introduction to Ontology Barry Smith August 11, 2012

The ontology spectrum (data focus)

glossary: A simple list of terms and their definitions.

data dictionary: Terms, definitions, naming conventions and representations of the data elements in a computer system.

data model (e.g. JC3IEDM): Terms, definitions, naming conventions, representations and the beginning of specification of the relationships between data elements.

taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element".

ontology: A complete, machine-readable specification of a conceptualization = conceptual data model

Page 24: Introduction to Ontology Barry Smith August 11, 2012

The ontology spectrum (reality focus)

glossary: A simple list of terms and their definitions.

controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency.

taxonomy: A controlled vocabulary in which the terms form a hierarchical representation of the types and subtypes of entities in a given domain.

The hierarchy is organized by the is_a (subtype) relation

ontology: A controlled vocabulary organized by is_a and by further formally defined relations, for example part_of.

Page 25: Introduction to Ontology Barry Smith August 11, 2012

[Figure: Foundational Model of Anatomy (FMA). Diagram of is_a and part_of relations among the nodes Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Component, Organ Subdivision, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Pleural Cavity, Interlobar recess]

Page 26: Introduction to Ontology Barry Smith August 11, 2012

In graph-theoretical terms:

Ontology Components:

• alphanumeric IDs form nodes of the graph
• each node is associated with some single term (preferred label)
• relationships between nodes, such as is_a, form the edges of the graph
• definitions and synonyms are associated with each node
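
The components listed above can be sketched as a small labeled graph in Python. This is only an illustration under stated assumptions: the class names `Node` and `OntologyGraph`, the `EX:` identifiers and the sample definition are invented here, and the two sample edges reuse the lung examples given later in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                     # alphanumeric ID (hypothetical values below)
    label: str                       # the single preferred term
    definition: str = ""
    synonyms: list = field(default_factory=list)


@dataclass
class OntologyGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: set = field(default_factory=set)     # (subject_id, relation, object_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, subject_id, relation, object_id):
        # Edges may only connect nodes that are already in the graph.
        if subject_id not in self.nodes or object_id not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.edges.add((subject_id, relation, object_id))

    def related(self, subject_id, relation):
        """Preferred labels of the nodes reached from subject_id via relation."""
        return sorted(self.nodes[o].label for s, r, o in self.edges
                      if s == subject_id and r == relation)


g = OntologyGraph()
g.add_node(Node("EX:0001", "anatomical structure"))
g.add_node(Node("EX:0002", "lung", definition="(placeholder definition)"))
g.add_node(Node("EX:0003", "lobe of lung", synonyms=["pulmonary lobe"]))
g.add_edge("EX:0002", "is_a", "EX:0001")
g.add_edge("EX:0003", "part_of", "EX:0002")

print(g.related("EX:0002", "is_a"))
print(g.related("EX:0003", "part_of"))
```

Keying everything on the ID rather than the label means a preferred term can change without breaking the edges that refer to it.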

Page 27: Introduction to Ontology Barry Smith August 11, 2012

Entity =def.

anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software

Page 28: Introduction to Ontology Barry Smith August 11, 2012

A 515287 DC3300 Dust Collector Fan

B 521683 Gilmer Belt

C 521682 Motor Drive Belt

instances

universals

Page 29: Introduction to Ontology Barry Smith August 11, 2012

Catalog vs. inventory

Ontology vs. list of items in your warehouse
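
One way to picture the distinction in code, as a sketch only: the catalog has one entry per type of part (a universal), while the inventory has one entry per physical item (an instance). The part numbers and names are the three entries from the parts list shown earlier; the serial numbers and the function name `stock_by_type` are invented for illustration.

```python
from collections import Counter

# Catalog: one entry per type of part.
CATALOG = {
    "515287": "DC3300 Dust Collector Fan",
    "521683": "Gilmer Belt",
    "521682": "Motor Drive Belt",
}

# Inventory: one entry per physical item; serial numbers are hypothetical.
INVENTORY = [
    {"serial": "SN-001", "part_no": "521683"},
    {"serial": "SN-002", "part_no": "521683"},
    {"serial": "SN-003", "part_no": "515287"},
]


def stock_by_type(catalog, inventory):
    """Count items on hand for every catalog entry, including those with none."""
    counts = Counter(item["part_no"] for item in inventory)
    return {name: counts[part_no] for part_no, name in catalog.items()}


print(stock_by_type(CATALOG, INVENTORY))
```

A type can stay in the catalog with no items in stock, which is why the two lists cannot be merged into one.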

Page 30: Introduction to Ontology Barry Smith August 11, 2012

Warning 3.

Do not confuse things with words and ideas

• Level 1: the entities in reality, both instances and universals

• Level 2: cognitive representations of this reality on the part of scientists ...

• Level 3: publicly accessible concretizations of these cognitive representations in textual and graphical artifacts

Page 31: Introduction to Ontology Barry Smith August 11, 2012

Ontology development

starts with: Level 2 = the cognitive representations of practitioners or researchers in the relevant domain

results in: Level 3 representational artifacts (comparable to maps, science texts, dictionaries)

Page 32: Introduction to Ontology Barry Smith August 11, 2012

Domain =def.

a portion of reality that forms the subject-matter of a single science or technology or mode of study;

• proteomics
• HIV
• demographics
• ...

Page 33: Introduction to Ontology Barry Smith August 11, 2012

Representation =def.

an image, idea, map, picture, name or description ... of some entity or entities

two kinds of representation:

analogue (photographs)

digital/composite/syntactically structured

Page 34: Introduction to Ontology Barry Smith August 11, 2012

Class =def.

a maximal collection of particulars referred to by a general term

the class A =def. the collection of all particular A’s

where ‘A’ is a general term (e.g. ‘brother of Elvis fan’, ‘cell’)

Classes are on the same level as the instances which they contain

Page 35: Introduction to Ontology Barry Smith August 11, 2012

(Scientific) Ontology =def.

a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent

1. universals in reality

2. those relations between these universals which obtain universally (= for all instances)

lung is_a anatomical structure

lobe of lung part_of lung
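
A small Python sketch of what "obtains universally (= for all instances)" could mean in practice for a statement such as lobe of lung part_of lung. It assumes the common all-some reading (every instance of the first universal is part of some instance of the second), which the slide does not spell out; the instance names and the function name `holds_for_all_instances` are made up for illustration.

```python
# Hypothetical instance data: which universal each particular instantiates,
# and which particular is part of which.
INSTANCE_OF = {
    "lobe_1": "lobe of lung",
    "lobe_2": "lobe of lung",
    "lung_1": "lung",
}
PART_OF = {("lobe_1", "lung_1"), ("lobe_2", "lung_1")}


def holds_for_all_instances(sub, obj, instance_of, part_of):
    """True if every instance of `sub` is part_of some instance of `obj`."""
    subs = [x for x, u in instance_of.items() if u == sub]
    objs = {y for y, u in instance_of.items() if u == obj}
    return all(any((x, y) in part_of for y in objs) for x in subs)


print(holds_for_all_instances("lobe of lung", "lung", INSTANCE_OF, PART_OF))

# One lobe with no recorded lung is enough to break the universal claim.
with_orphan = dict(INSTANCE_OF, lobe_3="lobe of lung")
print(holds_for_all_instances("lobe of lung", "lung", with_orphan, PART_OF))
```

Note that the check passes trivially when the first universal has no recorded instances, so it can only refute a claim against the data at hand, not establish it.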

Page 36: Introduction to Ontology Barry Smith August 11, 2012

Ontology (science)

the science of the kinds and structures of objects, properties, events, processes and relations in every domain of reality
