feasting on brains! from web services to web 2.0 to the semantic web and back again…

Post on 03-Feb-2016

DESCRIPTION

Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again… A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences Mark Wilkinson (markw@illuminae.com) Assistant Professor, Medical Genetics University of British Columbia - PowerPoint PPT Presentation

TRANSCRIPT

Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again…

A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences

Mark Wilkinson (markw@illuminae.com)
Assistant Professor, Medical Genetics
University of British Columbia
Heart and Lung Research Institute at St. Paul’s Hospital

Benjamin Good (He’s a “Creep”!)

approach

“Bioinformatics” is a broad field and suffers SEVERE interoperability problems

Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?

As a group, the brains of all bioinformaticians contain all (known) bioinformatics

“Bioinformaticians” tend to be specialists in a particular domain of computational analysis

“Human Computation” (Luis von Ahn)

Ontology Spectrum

[Figure: a spectrum of increasing formality: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Selected Logical Constraints (disjointness, inverse, …); General Logical constraints.]

Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

An ontology is a representation of knowledge

[Figure: a class hierarchy Animal, Mammal, Primate, then Lemur, Human and Zombie, linked by is_a; “has” links to Hair; “eats” links to Brains, Chips and Shoots.]

Classes, instances, properties, relationships

[Figure: the same diagram annotated to point out its classes (Animal, Mammal, Primate, …), instances, properties and relations (has, eats, is_a).]

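A minimal Python sketch of the zombie diagram, assuming the ontology is stored as subject-predicate-object triples. The individual edges (for example that Zombie eats Brains, or that Hair attaches at Mammal) are assumptions read off the flattened figure, and the helper names `ancestors` and `inherited` are illustrative only.

```python
# Ontology as (subject, predicate, object) triples.
# ASSUMPTION: which node each edge attaches to is a guess from the slide.
TRIPLES = [
    ("Mammal", "is_a", "Animal"),
    ("Primate", "is_a", "Mammal"),
    ("Lemur", "is_a", "Primate"),
    ("Human", "is_a", "Primate"),
    ("Zombie", "is_a", "Primate"),
    ("Mammal", "has", "Hair"),
    ("Lemur", "eats", "Shoots"),
    ("Human", "eats", "Chips"),
    ("Zombie", "eats", "Brains"),
]


def ancestors(cls, triples=TRIPLES):
    """Return the is_a chain from cls up to the root, nearest parent first."""
    parents = {s: o for s, p, o in triples if p == "is_a"}
    chain = []
    while cls in parents:
        cls = parents[cls]
        chain.append(cls)
    return chain


def inherited(cls, predicate, triples=TRIPLES):
    """Collect objects of `predicate` asserted on cls or on any ancestor."""
    lineage = {cls, *ancestors(cls, triples)}
    return {o for s, p, o in triples if p == predicate and s in lineage}


if __name__ == "__main__":
    print(ancestors("Zombie"))
    print(inherited("Lemur", "has"))
```

Because `has Hair` is asserted once on Mammal, every class below it picks the property up through the is_a chain, which is the kind of inference a formal ontology buys you over a flat list of terms.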

Web Service?

A software tool that is accessible over the Web

Web Services are intended to be accessed by machines, not people.

Interoperability?

The ability of two Web Services to exchange information, and use that information correctly

This generally requires Semantics in the form of Ontologies…

BioMoby: Eating brains to enable Web Service Interoperability

Mmmm… Brains!!

What does BioMoby do?

• Create an ontology of bioinformatics data-types
• Define an ontology of bioinformatics operations
• Open these ontologies for community input
• Define Web Services vis-à-vis these two ontologies

• A Machine can find an appropriate service
• A Machine can execute that service unattended
• Ontology is community-extensible
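How a machine might "find an appropriate service" can be sketched in Python as a lookup over a registry that describes each service in terms of the data-type ontology. This is not the MOBY Central API: the data types, service names and the `find_services` function below are all hypothetical, chosen only to show the idea that a service consuming a parent type also accepts its child types.

```python
# HYPOTHETICAL data-type ontology (child -> parent); not the real BioMoby one.
DATATYPE_PARENT = {
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": "Object",
}

# HYPOTHETICAL registry entries: each service declares what it consumes
# and which kind of operation it performs.
SERVICES = [
    {"name": "alignSequences", "consumes": "Sequence", "operation": "Alignment"},
    {"name": "designPrimers", "consumes": "DNASequence", "operation": "PrimerDesign"},
    {"name": "getProteinDomains", "consumes": "ProteinSequence", "operation": "Annotation"},
]


def lineage(datatype):
    """Return the data type followed by all of its ancestors."""
    out = [datatype]
    while datatype in DATATYPE_PARENT:
        datatype = DATATYPE_PARENT[datatype]
        out.append(datatype)
    return out


def find_services(datatype, operation=None):
    """Names of services that accept `datatype` (or one of its ancestors)."""
    accepted = set(lineage(datatype))
    return [
        s["name"]
        for s in SERVICES
        if s["consumes"] in accepted
        and (operation is None or s["operation"] == operation)
    ]


if __name__ == "__main__":
    print(find_services("DNASequence"))
```

Holding a DNASequence, a client discovers both the primer-design service and the generic alignment service without the two providers ever having coordinated.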

The BioMoby Plan

[Figure: MOBY Central sits between clients and the MOBY hosts & services. Data types shown include Gene names, Sequence, Alignment, Express., Protein, Alleles…; services shown include Align, Phylogeny, Primers.]

Overview of BioMoby Semantic Interoperability

Why couldn’t we do this before?

Interoperability is HARD!

Interoperability through Human Computation

BioMoby Data Type Ontology: An explicit list of all biological data-types, and the relationships between them.

Ontology built, brain by brain, by informaticians!

We achieve interoperability simply because informaticians donate their brain-power

HUMAN COMPUTATION

A portion of the BioMoby Ontology

…built from the brains of the community!

…so what can I do with it?

Analytical workflow Discovery

No explicit coordination between providers

Run-time discovery of appropriate tools

Automated execution of those tools

The machine “understands” the data you have in-hand, and assists you in choosing the next step in your analysis.

Interoperability through Human Computation

Individuals contributed their knowledge about bioinformatics data-types to a central ontology

Their combined knowledge enabled the construction of an interoperable framework

Who uses BioMoby?

Usage Statistics

15 Nations

> 60 independent institutions

>1600 interoperable Bioinformatics Resources

~500,000 requests for “brokering” each month

What have we learned?

We can consume the brains of a large community… …to generate something complex, yet organized

Open Kimono

The BioMoby ontology is actually quite messy…

…communal brains can build useful ontologies, but the problem is…

Ontologies are HARD!

How are ontologies usually constructed?

By small, hard-working, dedicated groups with lots of money!

• Gene Ontology & code
– Curated: ~5 full-time staff
– ~$25 Million (Lewis, S., personal communication)

• NCI Metathesaurus & code
– Curated: ~12 full-time staff
– ~$15 Million (Peter K., estimate)

• Health Level 7 (HL7)
– Curated
– $Lots… Some claim as much as $15 Billion (Smith, Barry, KBB Workshop, Montreal, 2005)

To build the global Semantic Web for Systems Biology we need to encode knowledge from EVERY domain of biology – from barley root apex structure and function, to HIV clinical-trials outcomes… and this knowledge is constantly changing!

At >$15M each, can we afford the Semantic Web???

iCAPTURer experiment

Mmmm… Need MORE Brains!!

Dr. Bruce McManus with a human heart in his hands

He knows his hearts…

…but he doesn’t know how to build an ontology

What we need

The Problem

The Solution?

So… how do we do it?

Remember what we learned from Moby…

…communities CAN build ontologies!

Building Systems Biology Ontologies through Human Computation

iCAPTURer

Benjamin Good
Ph.D. Student, UBC Bioinformatics

Genome BC Better Biomarkers in Transplantation project, St. Paul’s Hospital iCAPTURE Centre

Old Way

• A knowledge engineer (KE) drills the brain of one or a very few experts.
• Painful, expensive, and time-consuming…

New Way? – the iCAPTURer

• KE creates a clever interface
• No direct interaction with expert
• Thousands of experts
• Cheap Cheap Cheap!

iCAPTURer 1.0

Go to a scientific conference

Text-mine conference abstracts

Auto-Extract concepts

Put concepts into a series of question “templates”

A web interface presents questions about these concepts to conference attendees

Give points for every question they answer

Give a prize to the highest point winner

Results

Is _____ a meaningful term?
– Yes, No, I don’t know buttons

What is a synonym for ______
– Text entry box

Where does _____ fit in the following tree of related terms?
– Clickable tree

Knowledge Points Captured

464, 340 and 207 (1011 total)
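The template-and-points mechanism of iCAPTURer 1.0 can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's code: the template wording is taken from the slides, but the `make_questions` and `Scoreboard` names and the one-point-per-answer rule are invented here.

```python
from collections import Counter

# Question templates, wording taken from the slides.
TEMPLATES = {
    "meaningful": "Is {concept} a meaningful term?",
    "synonym": "What is a synonym for {concept}?",
}


def make_questions(concepts):
    """Fill every template with every text-mined concept."""
    return [
        (kind, text.format(concept=concept))
        for concept in concepts
        for kind, text in TEMPLATES.items()
    ]


class Scoreboard:
    """Tracks points per attendee. ASSUMPTION: one point per answer."""

    def __init__(self):
        self.points = Counter()

    def record_answer(self, attendee, points=1):
        self.points[attendee] += points

    def winner(self):
        """Attendee with the most points, or None if nobody has answered."""
        if not self.points:
            return None
        return max(self.points, key=self.points.get)


if __name__ == "__main__":
    for kind, question in make_questions(["STEMI", "cardiac myocyte"]):
        print(kind, "|", question)
```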

Observations

Yes/No questions work well

Text entry is less effective

Adding to a tree is a disaster!

Competition is a great motivator for human computation!

COST?

< $15,000,000

iCAPTURer 1.5

Start with hypothetical concept tree

Put concept-concept relations into a series of true/false questions

Make a web interface to ask questions

If a relationship is false, then re-start at the root of the concept tree

Give points for every question they answer

Give a prize to the highest point winner
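One possible reading of the iCAPTURer 1.5 loop, sketched in Python: ask whether the hypothesized parent is right, and if the expert says no, restart at the root and walk down the tree with further true/false questions. The descent strategy, the `place` function and the toy tree are assumptions made for illustration; the slides only state that a false relationship triggers a restart at the root.

```python
def question_text(concept, parent):
    """Phrase a relation as a chatterbot-style true/false question."""
    return f"I've heard that a {concept} is a type of {parent}. Is this true?"


def place(concept, hypothesized_parent, children, root, ask):
    """Find a parent for `concept` using only true/false answers.

    children: dict mapping a node to the list of its child nodes (a tree).
    ask(concept, parent): returns True if the expert agrees that
    `concept` is a type of `parent`.
    """
    if ask(concept, hypothesized_parent):
        return hypothesized_parent
    # The hypothesis was rejected: restart at the root and descend.
    node = root
    while True:
        for child in children.get(node, []):
            if child != concept and ask(concept, child):
                node = child
                break
        else:
            # No child was accepted, so `node` is the most specific parent.
            return node


if __name__ == "__main__":
    tree = {"cell": ["neuron", "cardiac cell"]}
    truth = {("cardiac myocyte", "cell"), ("cardiac myocyte", "cardiac cell")}
    answer = lambda concept, parent: (concept, parent) in truth
    print(question_text("cardiac myocyte", "neuron"))
    print(place("cardiac myocyte", "neuron", tree, "cell", answer))
```

In a live deployment `ask` would present `question_text(...)` to an attendee and award a point for the answer; here it is a lookup so the sketch runs unattended.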

“Chatterbot”

“I’ve heard that a cardiac myocyte is a type of cardiac cell. Is this true?”

“I’ve heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?”

“How do you feel about your mother?”

Results

Knowledge capture in 3 days

>11,000 Concepts

COST

$0

Full details of this experiment are available in: Proceedings of the Pacific Symposium on Biocomputing, 2006

Ontology Quality?

Potential Ontology Evaluation Metrics

• Domain independent
– philosophical desiderata: Manual, subjective
– graphical structure: Auto, questionable value
– satisfiability: Auto, useful, not enough

• Domain specific
– “Fit” to text: Auto, dependent on NLP
– Similarity to a gold standard: Auto/Manual; gold standard must exist!
– Task-based: Optimal! Auto/Manual, but not generalizable

“Good”???

What do we mean by “Good”?

Ontology construction is “motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances” - Barry Smith

Reality should be the benchmark for the “goodness” of an ontology

ontology evaluation based on referents in reality

Chosen Philosophical Principle
“Epistemology Precedes Ontology”

• A Class should refer to an invariant pattern of properties common among all its instances
– Mammals have mammary glands and hair
– Humans are an instance of the class Mammal

• Therefore…
– If class-instances are mapped into an ontology
– Each instance has “properties” or “qualities”
– These properties or qualities SHOULD segregate into different classes if the ontology is any good

Philosophical Desiderata

• Non-vagueness
– at least one instance can exist with the Class pattern
– Vague class: “mammalian cell wall”

• Non-ambiguity
– no more than one common pattern per Class
– Ambiguous class: “cell” (e.g. cell phone, jail cell)

• Non-redundancy
– within the same level of granularity, no other class refers to same common properties
– Redundant classes: “human”, “homo sapiens”

Cimino, J, 1998

Realist Evaluation: Step 1
Table of Instance-Properties

Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...

(Test one class at a time)

[Figure: instances I.1, I.2, I.3 and I.4 drawn against classes A, B and C.]

Realist Evaluation: Step 2
Machine Learning

Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...

Pattern: If char1 = Y Then Class X

Class B score for this pattern: 100%
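To make Step 2 concrete, here is a small pure-Python rule learner in the spirit of OneR (one rule on one property), run on the four-row table. It is a sketch and not the WEKA implementation used in the talk; the function name `one_r` and the use of plain accuracy as the "class score" are assumptions.

```python
from collections import Counter

# The instance-property table from the slide.
ROWS = [
    {"instance": "I.1", "Char1": "Y", "Char2": "N", "in_class_B": "Y"},
    {"instance": "I.2", "Char1": "Y", "Char2": "Y", "in_class_B": "Y"},
    {"instance": "I.3", "Char1": "N", "Char2": "N", "in_class_B": "N"},
    {"instance": "I.4", "Char1": "N", "Char2": "Y", "in_class_B": "N"},
]


def one_r(rows, features, target):
    """Pick the single feature whose values best predict the target.

    Returns (feature, rule, score) where rule maps each feature value to
    the majority target value and score is the fraction of rows the rule
    classifies correctly.
    """
    best = None
    for feature in features:
        counts = {}
        for row in rows:
            counts.setdefault(row[feature], Counter())[row[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        score = correct / len(rows)
        if best is None or score > best[2]:
            best = (feature, rule, score)
    return best


if __name__ == "__main__":
    feature, rule, score = one_r(ROWS, ["Char1", "Char2"], "in_class_B")
    print(feature, rule, score)
```

On this table Char1 alone separates members of class B from non-members, so the learner recovers the slide's pattern ("if Char1 = Y then in the class") with a score of 1.0, while Char2 would only reach 0.5.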

WEKA

Produced by Waikato University in New Zealand

An open source library containing implementations of hundreds of machine learning algorithms (rule learners, LDA, SVM, neural networks...)

Realist Evaluation

Instance  Char1  Char2  Class 1?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...

Class Score for Each Class

[Figure: example class scores of 0.35, 0.1 and 0.92.]

Realist Evaluation - positive control

1. Identify an ontology that already has logical constraints on the properties of its classes.

2. Assemble instances that have those properties

3. Classify the instances with a reasoner

4. Remove class restrictions from the ontology, but keep instances assigned to their classes

5. Look for patterns of instance properties

6. If successful, patterns should be detected

7. The higher the pattern score, the “gooder” the ontology is

Positive Control: Phosphabase

• An ontology describing different classes of phosphatase enzymes.

• Given the domain composition of a protein, phosphatase class can be inferred automatically.

Wolstencroft et al. (2006) Protein classification using ontology classification. Bioinformatics, Vol. 22 no. 14, pages 530–538

Remove the Logical Rules

• Remove the defining rules for each class

• Maintain the classified instances

• Execute the realist evaluation

• Can we re-discover the patterns that the logical class-rules used to dictate?

Realist Evaluation Positive Control

• 25 classes from Phosphabase tested on 700 simulated protein instances

• For 21 classes, the pattern was correctly identified for 100% of instances

• For the 4 others, patterns were identified covering 99, 92, 82 and 82% of instances respectively.

Realist Evaluation Positive Control

• So the Phosphabase ontology is “good”

• We can detect strong patterns of properties in its instances that follow the philosophical desiderata

• This is unsurprising, since we knew that it was “good” in the first place…

Evaluation of Gene Ontology is ongoing…

Interesting side effect…

Class-defining rules are generated by the realist evaluation

Most existing bio-ontologies lack formal class-definitions

This evaluation could be used to create such rules, yielding automatic classifiers

Can also detect what TYPE of property is best “classified” by current bio-ontologies

Is Realist Evaluation a Valid metric?

The realist evaluation measures the success of an ontology in classifying a specific set of properties

We claim that this is a metric relating to the quality of that ontology

Is this metric any better than other metrics like graph complexity, or fit-to-text?

Evaluating metrics

OntoLoki – Making mischief with Ontologies

1. Take an ontology that we claim is “good”

2. Make it worse by mischievously adding changes

3. Measure the degree of “mischief”

4. Run the evaluation metric of interest

5. Metric score should correlate with the amount of mischief added
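The OntoLoki idea can be tried on toy data in Python: start from a clean assignment of instances to classes, deliberately corrupt a known number of assignments, re-run the metric, and check that the score tracks the damage. Everything here is an assumption made for illustration: the "mischief" is limited to flipping class labels (the talk perturbs the ontology itself), the metric is the single-rule accuracy, and the function names are invented.

```python
import random
from collections import Counter


def make_instances(n_per_class=50):
    """A 'good' toy ontology: Char1 perfectly predicts the class."""
    return ([{"Char1": "Y", "cls": "B"} for _ in range(n_per_class)]
            + [{"Char1": "N", "cls": "C"} for _ in range(n_per_class)])


def add_mischief(instances, k, rng):
    """Return a copy with the class of k randomly chosen instances flipped."""
    noisy = [dict(inst) for inst in instances]
    for idx in rng.sample(range(len(noisy)), k):
        noisy[idx]["cls"] = "C" if noisy[idx]["cls"] == "B" else "B"
    return noisy


def metric(instances):
    """Accuracy of the best 'Char1 value -> majority class' rule."""
    by_value = {}
    for inst in instances:
        by_value.setdefault(inst["Char1"], Counter())[inst["cls"]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(instances)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def run(levels=(0, 5, 10, 15, 20, 25), seed=0):
    """Score the toy ontology at each level of added mischief."""
    rng = random.Random(seed)
    clean = make_instances()
    scores = [metric(add_mischief(clean, k, rng)) for k in levels]
    return list(levels), scores


if __name__ == "__main__":
    levels, scores = run()
    print(scores, pearson(levels, scores))
```

A metric that passes this test falls steadily as mischief rises (a strongly negative correlation); a metric whose score wanders independently of the damage would be the "bad metric" case.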

Comparison of ontology quality metrics

[Figure: two panels, “Good Metric” and “Bad Metric”. x axis: Amount of noise added (ontology quality decreasing); y axis: Measured Ontology Quality.]

Is Realist Evaluation a good metric?

Let’s OntoLoki it to find out!

OntoLoki test of Realist Evaluation Metric

[Figure: line plot. x axis: Noise Added (a measure of nodes affected), 0 to 0.25; y axis: Average Class Score, -0.1 to 0.6. Series: OneR_avg_mean_KBi, Chi25_Jrip_avg_mean_KBi, Jrip_avg_mean_KBi, ZeroR_avg_mean_KBi. Annotation: Good Metric!]

Conclusion

Human computation can collect significant amounts of knowledge in an organized way

OntoLoki seems to be effective at evaluating the evaluation metrics

Realist evaluation is an interesting new metric for testing ontologies

Subjective iCAPTURer Observations

Humans had an EXTREMELY difficult time classifying concepts into pre-existing categories

Humans had an EXTREMELY difficult time defining new categories and placing them into the existing classification system

Classification is HARD!

Abandoning Classification (briefly…)

An ontology is a representation of knowledge

[Figure: a class hierarchy Animal, Mammal, Primate, then Lemur, Human and Gorilla, linked by is_a; “has” links to Hair; “has_size” links to Big, Medium and Small.]

Classes, instances, properties, relationships

AN ontology is ONE representation of knowledge

[Figure: Ontology of Anatomy: Animal, Mammal, Primate, then Lemur, Human and Gorilla (is_a), with “has” Hair and “has_size” Big, Medium, Small.]

[Figure: Ontology of Habitat: Animal, African_animal, Southern_African_animal (is_a), with Aquatic, plains and mountain, and a “lives” link to Africa.]

Also might want… Odour, # digits, bone density, friendliness, cuteness..

Clay Shirky: Ontology is Overrated…

• Attempts to predict the future
– “Soviet Union” used to be a category in the Library of Congress

• Attempts mind-reading
– Size, location, odour.. Authors must predict what users are interested in

• Great minds don’t think alike..
– No two people are likely to create the same ontology

http://www.shirky.com/writings/ontology_overrated.html

Categories
Properties

Mass Collaborative Tagging

BRAINS!! MORE BRAINS!!

Mass Open Social Tagging

A rapidly growing trend on the Web

Unstructured

Mass-collaboration

Anyone can say anything about anything using any words they wish

Connotea: Scientific Tagging
(Connotea is a product of Nature Publishing Group)

Connotea Growth

Tagging is EASY!

The Tagged World

Tagging is easy!

Tagging costs nothing

Tagging empowers all viewpoints

Tagging is happening!!!!!!

Lexical Comparison of Tagging with Formal Indexing Systems and Ontologies

[Figures: one radar chart per term set. Ontologies: FMA, “FMA Preflabels (11)”; GO Molecular Function, “GO_MF (15)”; GO Biological Process, “GO_BP (13)”. Tagging systems: Bibsonomy, “Bibsonomy (20)”; CiteULike, “CiteUlike (22)”; Connotea, “Connotea (21)”. Axes on every chart: % OLP uniterms, % OLP duplets, OLP flexibility, % containedByAnother, Standard Deviation - Term Length, Skewness - Term Length, complements, compositions.]

Ontologies and Folksonomies are fundamentally different!

[Figure: the radar charts for “GO_MF (15)” and “Bibsonomy (20)” shown side by side, with the same axes as before.]

Problem??

Folksonomies and ontologies are fundamentally different!

It may not be possible to derive one from the other accurately

Nevertheless, we would like to take advantage of tagging behaviour while gaining the power of controlled vocabularies/Ontologies

E.D.: The Entity Describer

User types in all tags

Type-ahead displays previously used tags

Connotea tagging

Connotea + E.D. Tagging

Leveraging Tagging?

“Tagging” effectively assigns properties to entities

ED Tagging constrains those properties to a controlled vocabulary or ontology

Can we discover patterns in those properties that indicate a “natural” classification system?

Can a “realist-evaluation” generate logical rules that define classes based on patterns of tags?
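A rough Python sketch of what constraining tags to a controlled vocabulary could look like: type-ahead suggestions come only from the vocabulary, and free-text tags outside it are rejected. This is not the Entity Describer's code; the vocabulary, the `suggest` and `accept_tag` names, and the case-insensitive prefix matching are assumptions.

```python
import bisect

# HYPOTHETICAL controlled vocabulary, stored lower-case and sorted so that
# prefix matches form one contiguous run.
VOCAB = sorted([
    "cardiac cell",
    "cardiac myocyte",
    "cell",
    "myocardial infarction",
])


def suggest(prefix, vocab=VOCAB, limit=5):
    """Return up to `limit` vocabulary terms that start with `prefix`."""
    prefix = prefix.lower()
    i = bisect.bisect_left(vocab, prefix)
    matches = []
    while i < len(vocab) and vocab[i].startswith(prefix) and len(matches) < limit:
        matches.append(vocab[i])
        i += 1
    return matches


def accept_tag(tag, vocab=VOCAB):
    """Return the normalized term if it is in the vocabulary, else None."""
    tag = tag.strip().lower()
    i = bisect.bisect_left(vocab, tag)
    if i < len(vocab) and vocab[i] == tag:
        return tag
    return None


if __name__ == "__main__":
    print(suggest("card"))
    print(accept_tag("Cardiac Myocyte"), accept_tag("heart thingy"))
```

Tags accepted this way are effectively property assignments drawn from a shared vocabulary, which is what would let a later pattern search treat them like the Char1/Char2 columns of the realist evaluation.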

Final Thoughts

Ontologies are important, but hard to build

iCAPTURer: formal, template-based, cost-free consumption of biologists’ brains seems to work!

Informal annotation (tagging) is cheap, easy, and scalable, and is HAPPENING

Can we leverage tagging to create ontology-like structures? Maybe… Maybe not!

My journey back to Web Services

Why do I care about Web Services so passionately?

The Deep Web

All the data and knowledge only accessible through Web Forms

Estimated to be orders of magnitude greater than the “surface Web”

- 91,000 Terabytes in the Deep Web
- 167 Terabytes in the Surface Web

Much of the Deep Web CANNOT be represented on the Semantic Web since it DOES NOT EXIST until the Web Form is accessed

Moby 2.0 and CardioSHARE

Merging the Deep Web and the Semantic Web

What Web Services do

Sequence Data → BLAST SERVICE → Blast Hit

What BioMoby does

Want Blast: Sequence Data → MOBY BLAST SERVICE → Blast Hit

The implied relationship between input and output

Sequence Data givesBlastResult Blast Hit

Not “Biologically” Meaningful

The implied biological relationship between input and output

Sequence Data hasHomologyTo Blast Hit

…looks a lot like the RDF statement…

URI hasHomologyTo URI

To merge Web Services and the Semantic Web… …Simply assert the relationship and let Moby do the rest!

Start with a partial Triple

[Figure: a URI with rdf:type Sequence and a hasHomologyTo property whose object is not yet filled in.]

What Moby 2.0 Does

[Figure: the partial triple (a URI of rdf:type Sequence, with hasHomologyTo) is handed to the Moby 2.0 Predicate to Web Service Translator. The translator notes that the hasHomologyTo property is provided by BLAST services, so it needs a BLAST Service consuming rdf:type Sequence; the MOBY BLAST SERVICE then supplies the Blast Hit.]

Moby 2.0 Query

FIND SERVICES THAT
Consume Sequence Data
Provide hasHomologyTo Property
Attached to other Sequence Data

Moby 2.0 extends SPARQL

SPARQL queries contain concepts and relationships of interest

Map RDF predicates onto Moby services capable of generating them

Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
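The registry query can be sketched in Python as a lookup keyed on the subject's type and the requested predicate, followed by a service call that fills in the missing object of the triple. This is an illustration only: the registry entries, service names, and the `services_for` and `resolve` functions are hypothetical and are not the Moby 2.0 API.

```python
# HYPOTHETICAL registry: which service consumes which type and
# generates which predicate.
REGISTRY = [
    {"service": "MOBY_BLAST", "consumes": "Sequence",
     "predicate": "hasHomologyTo", "produces": "Sequence"},
    {"service": "MOBY_ANNOTATE", "consumes": "Protein",
     "predicate": "hasFunctionalAnnotation", "produces": "Annotation"},
]


def services_for(subject_type, predicate, registry=REGISTRY):
    """What service consumes [subject type] and generates [predicate]?"""
    return [
        entry["service"]
        for entry in registry
        if entry["consumes"] == subject_type and entry["predicate"] == predicate
    ]


def resolve(partial_triple, subject_types, run_service):
    """Complete (subject, predicate, ?) by running the matching services.

    subject_types: dict mapping a subject URI to its rdf:type.
    run_service(name, subject): returns an iterable of object URIs; in a
    real system this would be the remote Web Service call.
    """
    subject, predicate = partial_triple
    triples = []
    for name in services_for(subject_types[subject], predicate):
        for obj in run_service(name, subject):
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    types = {"urn:example:seq1": "Sequence"}
    fake_blast = lambda name, subject: ["urn:example:seq2"]
    print(resolve(("urn:example:seq1", "hasHomologyTo"), types, fake_blast))
```

The triples come into existence only when the query asks for them, which is how Deep Web data that "does not exist until the Web Form is accessed" can still answer a Semantic Web query.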

But wait, there’s more!

CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query

Subject Predicate: Look up and execute Moby service. Consumes proteins and generates Functional annotation property

Subject Predicate: Look up and execute Moby service. Consumes STK’s and Provides inhibitor property

Evaluate Query Expression

This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0

What do Moby 2.0 and CardioSHARE achieve?

Makes the Deep Web transparently accessible as if it were a Semantic Web Resource

Allows SPARQL to do truly semantic queries!

Reduces the requirement of Biologists to know how/where to get their data of interest

Simplifies construction of complex analytical pipelines by automating much of the discovery/execution tasking

Fin
