feasting on brains! from web services to web 2.0 to the semantic web and back again…

Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again…

A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences
Mark Wilkinson (markw@illuminae.com)
Assistant Professor, Medical Genetics
University of British Columbia
Heart and Lung Research Institute at St. Paul’s Hospital

Benjamin Good (He’s a “Creep”!)

approach

“Bioinformatics” is a broad field and suffers SEVERE interoperability problems

Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?

As a group, the brains of all bioinformaticians contain all (known) bioinformatics

“Bioinformaticians” tend to be specialists in a particular domain of computational analysis

“Human Computation” (Luis von Ahn)

Ontology Spectrum

[Figure: a spectrum of ontology types, from least to most expressive]
• Catalog/ID
• Terms/glossary
• Thesauri (“narrower term” relation)
• Informal is-a
• Formal is-a
• Formal instance
• Frames (properties)
• Value Restrs.
• Selected Logical Constraints (disjointness, inverse, …)
• General Logical constraints

Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness.
Description in: www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html

An ontology is a representation of knowledge

[Diagram, built up over three slides: a class hierarchy running Animal, Mammal, Primate, then Lemur, Human, Zombie; further nodes Brains, Chips, Shoots and Hair. The labels point out the classes, instances, properties and relations in the diagram.]
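As a rough illustration of what “classes, properties and relations” mean in practice, here is a minimal Python sketch of the diagram's is-a hierarchy. It assumes that Lemur, Human and Zombie all sit directly under Primate, as the layout suggests; the property name `has_hair` and the helper functions are invented for this example and are not part of any ontology toolkit.

```python
# Toy is-a hierarchy from the diagram: each class maps to its parent class.
# Assumption: Lemur, Human and Zombie are all direct subclasses of Primate.
IS_A = {
    "Mammal": "Animal",
    "Primate": "Mammal",
    "Lemur": "Primate",
    "Human": "Primate",
    "Zombie": "Primate",
}

# Hypothetical property attached to a class (the slide shows "Hair").
CLASS_PROPERTIES = {"Mammal": {"has_hair"}}


def ancestors(cls):
    """Return every superclass of cls, nearest first."""
    found = []
    while cls in IS_A:
        cls = IS_A[cls]
        found.append(cls)
    return found


def is_a(cls, candidate):
    """True if cls is candidate itself or one of its subclasses."""
    return cls == candidate or candidate in ancestors(cls)


def inherited_properties(cls):
    """Collect the properties declared on cls and on all of its superclasses."""
    props = set()
    for c in [cls] + ancestors(cls):
        props |= CLASS_PROPERTIES.get(c, set())
    return props


print(ancestors("Zombie"))
print(is_a("Lemur", "Mammal"))
print(inherited_properties("Human"))
```

The point of the sketch is only that a machine can answer “is a Lemur a Mammal?” by following relations, without anyone having stated that fact directly.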

Web Service?

A software tool that is accessible over the Web

Web Services are intended to be accessed by machines, not people.

Interoperability?

The ability of two Web Services to exchange information, and use that information correctly

This generally requires Semantics in the form of Ontologies…

BioMoby
Eating brains to enable Web Service Interoperability

Mmmm… Brains!!

What does BioMoby do?

• Create an ontology of bioinformatics data-types
• Define an ontology of bioinformatics operations
• Open these ontologies for community input
• Define Web Services vis-à-vis these two ontologies

• A Machine can find an appropriate service
• A Machine can execute that service unattended
• Ontology is community-extensible
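The idea that a machine can find an appropriate service once services are described against a shared data-type ontology can be sketched as follows. This is not the MOBY Central API: the `Service` record, the data-type names and the registry contents are all hypothetical, chosen only to show how ontology-aware matching might work.

```python
from dataclasses import dataclass

# Hypothetical fragment of a data-type ontology: child type -> parent type.
DATATYPE_PARENT = {
    "DNASequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": "Object",
    "Alignment": "Object",
}


@dataclass(frozen=True)
class Service:
    name: str
    operation: str    # term from a (hypothetical) operations ontology
    input_type: str   # term from the data-type ontology
    output_type: str


def lineage(datatype):
    """The data type followed by all of its ancestors."""
    chain = [datatype]
    while datatype in DATATYPE_PARENT:
        datatype = DATATYPE_PARENT[datatype]
        chain.append(datatype)
    return chain


def find_services(registry, datatype):
    """Services whose declared input is the data type or one of its ancestors."""
    accepted = set(lineage(datatype))
    return [s for s in registry if s.input_type in accepted]


# Hypothetical registry contents.
REGISTRY = [
    Service("align_sequences", "Alignment", "Sequence", "Alignment"),
    Service("translate_dna", "Translation", "DNASequence", "ProteinSequence"),
    Service("build_phylogeny", "Phylogeny", "Alignment", "Tree"),
]

names = [s.name for s in find_services(REGISTRY, "DNASequence")]
print(names)
```

Because `DNASequence` is declared as a kind of `Sequence`, a service registered for generic sequences is found without its provider ever having heard of the DNA-specific type.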

The BioMoby Plan

[Diagram labels: Gene names; MOBY Central; MOBY hosts & services; data such as Sequence, Alignment, Express., Protein, Alleles…; services such as Align, Phylogeny, Primers]

Overview of BioMoby Semantic Interoperability

Why couldn’t we do this before?

Interoperability is HARD!

Interoperability through Human Computation

BioMoby Data Type Ontology: An explicit list of all biological data-types, and the relationships between them.

Ontology built, brain by brain, by informaticians!

We achieve interoperability simply because informaticians donate their brain-power

HUMAN COMPUTATION

A portion of the BioMoby Ontology

…built from the brains of the community!

…so what can I do with it?

Analytical workflow Discovery

No explicit coordination between providers

Run-time discovery of appropriate tools

Automated execution of those tools

The machine “understands” the data you have in-hand, and assists you in choosing the next step in your analysis.
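Run-time workflow discovery of this kind can be pictured as a search over services keyed by the data type in hand. The sketch below is an assumption-laden toy, not BioMoby client code: the service names and data types are hypothetical, and a plain breadth-first search stands in for whatever a real client does.

```python
from collections import deque

# Hypothetical services as (name, input data type, output data type).
SERVICES = [
    ("gene_name_to_sequence", "GeneName", "Sequence"),
    ("align", "Sequence", "Alignment"),
    ("design_primers", "Sequence", "Primers"),
    ("phylogeny", "Alignment", "Tree"),
]


def next_steps(datatype):
    """Services that can consume the data currently in hand."""
    return [s for s in SERVICES if s[1] == datatype]


def discover_workflow(start, goal):
    """Breadth-first search for the shortest chain of services from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        datatype, path = queue.popleft()
        if datatype == goal:
            return path
        for name, _input, output in next_steps(datatype):
            if output not in seen:
                seen.add(output)
                queue.append((output, path + [name]))
    return None  # no chain of registered services reaches the goal


workflow = discover_workflow("GeneName", "Tree")
print(workflow)
```

`next_steps` corresponds to the interactive case, where the user picks the next tool; `discover_workflow` shows that the same information is enough to chain tools from providers who never coordinated with each other.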

Interoperability through Human Computation

Individuals contributed their knowledge about bioinformatics data-types to a central ontology

Their combined knowledge enabled the construction of an interoperable framework

Who uses BioMoby?

Usage Statistics

15 Nations

> 60 independent institutions

>1600 interoperable Bioinformatics Resources

~500,000 requests for “brokering” each month

What have we learned?

We can consume the brains of a large community…

…to generate something complex, yet organized

Open Kimono

The BioMoby ontology is actually quite messy…

…communal brains can build useful ontologies,

but the problem is…

Ontologies are HARD!

How are ontologies usually constructed?

By small, hard-working, dedicated groups with lots of money!

• Gene Ontology & code
  – Curated: ~5 full-time staff
  – ~$25 Million (Lewis, S., personal communication)
• NCI Metathesaurus & code
  – Curated: ~12 full-time staff
  – ~$15 Million (Peter K., estimate)
• Health Level 7 (HL7)
  – Curated
  – $Lots… Some claim as much as $15 Billion (Smith, Barry, KBB Workshop, Montreal, 2005)

To build the global Semantic Web for Systems Biology we need to encode knowledge from EVERY domain of biology – from barley root apex structure and function, to HIV clinical-trials outcomes… and this knowledge is constantly changing!

At >$15M each, can we afford the Semantic Web???

iCAPTURer experiment

Mmmm… Need MORE Brains!!

Dr. Bruce McManus with a human heart in his hands

He knows his hearts…

…but he doesn’t know how to build an ontology

What we need

The Problem

The Solution?

So… how do we do it?

Remember what we learned from Moby…

…communities CAN build ontologies!

Building Systems Biology Ontologies through Human Computation

iCAPTURer

Benjamin Good
Ph.D. Student, UBC Bioinformatics

Genome BC Better Biomarkers in Transplantation project, St. Paul’s Hospital iCAPTURE Centre

Old Way

• A knowledge engineer (KE) drills the brain of one or a very few experts.
• Painful, expensive, and time-consuming…

New Way? – the iCAPTURer

• KE creates a clever interface
• No direct interaction with expert
• Thousands of experts
• Cheap Cheap Cheap!

iCAPTURer 1.0

Go to a scientific conference

Text-mine conference abstracts

Auto-Extract concepts

Put concepts into a series of question “templates”

A web interface presents questions about these concepts to conference attendees

Give points for every question they answer

Give a prize to the highest point winner

Results

Is _____ a meaningful term?
– Yes, No, I don’t know buttons

What is a synonym for ______?
– Text entry box

Where does _____ fit in the following tree of related terms?
– Clickable tree
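The template-and-points mechanism of iCAPTURer 1.0 can be sketched in a few lines. This is not the iCAPTURer code: the function names, the one-point-per-answer scoring and the sample attendees are assumptions made for illustration, and the two concepts are simply terms that appear elsewhere in this talk.

```python
from collections import Counter

# The three question styles listed above, with a slot for the concept.
TEMPLATES = {
    "meaningful": "Is {concept} a meaningful term?",
    "synonym": "What is a synonym for {concept}?",
    "placement": "Where does {concept} fit in the following tree of related terms?",
}


def make_questions(concepts):
    """Cross every extracted concept with every question template."""
    return [
        {"concept": c, "kind": kind, "text": text.format(concept=c)}
        for c in concepts
        for kind, text in TEMPLATES.items()
    ]


def leaderboard(answers, points_per_answer=1):
    """Tally points per attendee; answers are (attendee, question, response) tuples."""
    scores = Counter()
    for attendee, _question, _response in answers:
        scores[attendee] += points_per_answer
    return scores.most_common()


questions = make_questions(["cardiac myocyte", "STEMI"])

# Hypothetical answers from two hypothetical attendees.
answers = [
    ("alice", questions[0], "yes"),
    ("bob", questions[0], "yes"),
    ("alice", questions[3], "i don't know"),
]
board = leaderboard(answers)
print(questions[0]["text"])
print(board)
```

The attendee at the top of `board` would win the prize; every stored answer is a captured piece of knowledge regardless of who wins.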

Knowledge Points Captured

1011 total

Observations

Yes/No questions work well

Text entry is less effective

Adding to a tree is a disaster!

Competition is a great motivator for human computation!

< $15,000,000

iCAPTURer 1.5

Start with hypothetical concept tree

Put concept-concept relations into a series of true/false questions

Make a web interface to ask questions

If a relationship is false, then re-start at the root of the concept tree

Give points for every question they answer

Give a prize to the highest point winner

“Chatterbot”

“I’ve heard that a cardiac myocyte is a type of cardiac cell. Is this true?”

“I’ve heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?”

“How do you feel about your mother?”
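One possible reading of the iCAPTURer 1.5 loop (true/false questions over a hypothetical concept tree, going back to the root after a “false”) is sketched below. The tree contents, the simulated expert and the function names are all invented for this example. In particular, modelling “restart at the root” as a fresh search from the root that never descends below a rejected link is an assumption, not a description of the actual system.

```python
# Hypothetical concept tree, as text mining might propose it: parent -> children.
TREE = {
    "cell": ["cardiac cell", "STEMI"],
    "cardiac cell": ["cardiac myocyte"],
    "STEMI": ["ST segment"],
}

# Links the simulated expert will confirm: (child, parent).
TRUE_LINKS = {("cardiac cell", "cell"), ("cardiac myocyte", "cardiac cell")}


def next_unasked(tree, root, votes):
    """First unanswered link reachable from the root through confirmed links."""
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            link = (child, parent)
            if link not in votes:
                return link
            if votes[link]:
                stack.append(child)  # only descend below confirmed links
    return None


def run_session(tree, root, answer):
    """Ask true/false questions until no reachable link is left unanswered."""
    votes = {}
    while (link := next_unasked(tree, root, votes)) is not None:
        child, parent = link
        question = f"I've heard that a {child} is a type of {parent}. Is this true?"
        votes[link] = answer(question, link)
    return votes


def simulated_expert(question, link):
    """Stand-in for a human clicking true or false in the web interface."""
    return link in TRUE_LINKS


votes = run_session(TREE, "cell", simulated_expert)
points = len(votes)  # one point per answered question
print(votes, points)
```

In this toy run the link below the rejected “STEMI is a type of cell” relation is never asked, so a wrong branch costs the expert one question rather than a walk through its whole subtree.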

Results

Knowledge capture in 3 days

>11,000 Concepts

Full details of this experiment are available in: Proceedings of the Pacific Symposium on Biocomputing, 2006

Ontology Quality?

Potential Ontology Evaluation Metrics

• Domain independent
  – philosophical desiderata: Manual, subjective
  – graphical structure: Auto, questionable value
  – satisfiability: Auto, useful, not enough
• Domain specific
  – “Fit” to text: Auto, dependent on NLP
  – Similarity to a gold standard: Auto/Manual; gold standard must exist!
  – Task-based: Optimal! Auto/Manual, but not generalizable

“Good”???

What do we mean by “Good”?

Ontology construction is “motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances” - Barry Smith

Reality should be the benchmark for the “goodness” of an ontology

Ontology evaluation based on referents in reality

Chosen Philosophical Principle
“Epistemology Precedes Ontology”

• A Class should refer to an invariant pattern of properties common among all its instances
  – Mammals have mammary glands and hair
  – Humans are an instance of the class Mammal
• Therefore…
  – If class-instances are mapped into an ontology
  – Each instance has “properties” or “qualities”
  – These properties or qualities SHOULD segregate into different classes if the ontology is any good

Philosophical Desiderata

• Non-vagueness
  – at least one instance can exist with the Class pattern
  – Vague class: “mammalian cell wall”
• Non-ambiguity
  – no more than one common pattern per Class
  – Ambiguous class: “cell” (e.g. cell phone, jail cell)
• Non-redundancy
  – within the same level of granularity, no other class refers to same common properties
  – Redundant classes: “human”, “homo sapiens”

Cimino, J, 1998

Realist Evaluation: Step 1
Table of Instance-Properties

Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...

(Test one class at a time)

[Diagram labels: A; I.1, I.2; I.3, I.4]

Realist Evaluation: Step 2
Machine Learning

Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...

Pattern: If Char1 = Y, then Class X

Class B score for this pattern

Weka: produced by the University of Waikato in New Zealand

An open source library containing implementations of hundreds of machine learning algorithms (rule learners, LDA, SVM, neural networks...)
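To make the “learn a pattern, then score it” step concrete, here is a small pure-Python sketch of a one-rule learner run on the four-row table from the slides. It is not Weka and does not reproduce its output: the function name `one_rule` is invented, and the score is plain accuracy on the same four rows, which is a simplifying assumption rather than the measure used in the actual evaluation.

```python
from collections import Counter, defaultdict

# The instance-property table from the slides: (Char1, Char2, member of Class B?).
TABLE = {
    "I.1": ("Y", "N", "Y"),
    "I.2": ("Y", "Y", "Y"),
    "I.3": ("N", "N", "N"),
    "I.4": ("N", "Y", "N"),
}
FEATURES = ("Char1", "Char2")


def one_rule(table):
    """Pick the single property whose values best predict class membership."""
    best = None
    for index, feature in enumerate(FEATURES):
        # Count class labels seen for each value of this property.
        by_value = defaultdict(Counter)
        for row in table.values():
            by_value[row[index]][row[-1]] += 1
        # For each property value, predict the majority class label.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        correct = sum(rule[row[index]] == row[-1] for row in table.values())
        score = correct / len(table)
        if best is None or score > best[2]:
            best = (feature, rule, score)
    return best


feature, rule, score = one_rule(TABLE)
print(feature, rule, score)
```

On this table the learner recovers the pattern shown on the slide (Char1 = Y predicts membership) with a perfect score, while Char2 predicts no better than chance.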

Realist Evaluation

[Figure: the instance-property table (Instance, Char1, Char2, Class) is used to produce a Class Score for each Class; example scores shown: 0.1, 0.92]

Realist Evaluation - positive control

1. Identify an ontology that already has logical constraints on the properties of its classes.

2. Assemble instances that have those properties

3. Classify the instances with a reasoner

4. Remove class restrictions from the ontology, but keep instances assigned to their classes

5. Look for patterns of instance properties

6. If successful, patterns should be detected

7. The higher the pattern score, the “gooder” the ontology is
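The positive-control procedure above can be walked through on toy data. Everything in this sketch is hypothetical: the class names, the domain names, the four simulated instances, and the use of simple set containment in place of a real reasoner. Pattern discovery is reduced to intersecting the property sets of a class's members, which is a deliberately naive stand-in for machine learning.

```python
# Step 1 (hypothetical): class definitions as the domains an instance must contain.
DEFINITIONS = {
    "ClassA": {"domain_1"},
    "ClassB": {"domain_1", "domain_2"},
}

# Step 2 (hypothetical): simulated instances, each described by a set of domains.
INSTANCES = {
    "p1": {"domain_1", "domain_3"},
    "p2": {"domain_1", "domain_2"},
    "p3": {"domain_1", "domain_2", "domain_3"},
    "p4": {"domain_3"},
}


def classify(instances, definitions):
    """Step 3: stand-in for a reasoner; an instance joins every class it satisfies."""
    return {
        cls: {name for name, props in instances.items() if required <= props}
        for cls, required in definitions.items()
    }


def rediscover(instances, members):
    """Step 5: the pattern is whatever properties all members of a class share."""
    if not members:
        return set()
    return set.intersection(*(instances[m] for m in members))


def pattern_score(instances, members, pattern):
    """Step 6: fraction of instances where 'has the pattern' matches membership."""
    agree = sum((pattern <= props) == (name in members) for name, props in instances.items())
    return agree / len(instances)


membership = classify(INSTANCES, DEFINITIONS)
# Step 4: from here on only the membership is used; DEFINITIONS is treated as deleted.
recovered = {cls: rediscover(INSTANCES, m) for cls, m in membership.items()}
scores = {cls: pattern_score(INSTANCES, membership[cls], recovered[cls]) for cls in membership}
print(recovered, scores)
```

In this toy case the recovered patterns equal the deleted definitions and every class scores 1.0, which is the outcome a positive control is meant to produce.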

Positive Control: Phosphabase

• An ontology describing different classes of phosphatase enzymes.

• Given the domain composition of a protein, phosphatase class can be inferred automatically.

Wolstencroft et al. (2006) Protein classification using ontology classification. Bioinformatics, Vol. 22 no. 14, pages 530–538

Remove the Logical Rules

• Remove the defining rules for each class

• Maintain the classified instances

• Execute the realist evaluation

• Can we re-discover the patterns that the logical class-rules used to dictate?

Realist Evaluation Positive Control

• 25 classes from Phosphabase tested on 700 simulated protein instances

• For 21 classes, the pattern was correctly identified for 100% of instances

• For the 4 others, the patterns identified covered 99%, 92%, 82%, and 82% of instances respectively.

Realist Evaluation Positive Control

• So the Phosphabase ontology is “good”

• We can detect strong patterns of properties in its instances that follow the philosophical desiderata

• This is unsurprising, since we knew that it was “good” in the first place…

Evaluation of Gene Ontology is ongoing…

Interesting side effect…

Class-defining rules are generated by the realist evaluation

Most existing bio-ontologies lack formal class-definitions

This evaluation could be used to create such rules, yielding automatic classifiers

Can also detect what TYPE of property is best “classified” by current bio-ontologies

Is Realist Evaluation a Valid metric?

The realist evaluation measures the success of an ontology in classifying a specific set of properties

We claim that this is a metric relating to the quality of that ontology

Is this metric any better than other metrics like graph complexity, or fit-to-text?

Evaluating metrics

OntoLoki – Making mischief with Ontologies

1. Take an ontology that we claim is “good”

2. Make it worse by mischievously adding changes

3. Measure the degree of “mischief”

4. Run the evaluation metric of interest

5. Metric score should correlate with the amount of mischief added
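The OntoLoki procedure can be imitated on synthetic data. The sketch below is not the OntoLoki implementation: the “ontology” is just a list of instances whose single property agrees with their class, “mischief” is simulated by reassigning a fraction of instances to the wrong class, and the metric under test is the accuracy of the obvious rule. All function names and the sample size are assumptions.

```python
import random


def make_assignments(n=200):
    """A 'good' toy ontology: property value and class label always agree."""
    return [{"prop": i % 2, "cls": i % 2} for i in range(n)]


def add_mischief(rows, fraction, rng):
    """Steps 2 and 3: reassign a known fraction of instances to the wrong class."""
    noisy = [dict(r) for r in rows]
    for row in rng.sample(noisy, int(round(fraction * len(noisy)))):
        row["cls"] = 1 - row["cls"]
    return noisy


def metric(rows):
    """Step 4: the metric under test, here accuracy of 'class = property value'."""
    return sum(r["prop"] == r["cls"] for r in rows) / len(rows)


def pearson(xs, ys):
    """Step 5: Pearson correlation between noise level and metric score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
good = make_assignments()
noise_levels = [0.0, 0.05, 0.1, 0.15, 0.2, 0.25]
scores = [metric(add_mischief(good, f, rng)) for f in noise_levels]
correlation = pearson(noise_levels, scores)
print(scores, correlation)
```

A metric that responds to quality should give a strongly negative correlation here, since its score should fall as mischief rises; a metric whose score stays flat or moves erratically as noise is added would fail the same test.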

Comparison of ontology quality metrics

[Plot: x-axis: Amount of noise added (ontology quality decreasing); curves labelled “Good Metric” and “Bad Metric”]

Is Realist Evaluation a good metric?

Let’s OntoLoki it to find out!

OntoLoki test of Realist Evaluation Metric

[Plot: x-axis: Noise Added (a measure of nodes affected), 0 to 0.25; series: OneR_avg_mean_KBi, Chi25_Jrip_avg_mean_KBi, Jrip_avg_mean_KBi, ZeroR_avg_mean_KBi]

Good Metric!

Conclusion

Human computation can collect significant amounts of knowledge in an organized way

OntoLoki seems to be effective at evaluating the evaluation metrics

Realist evaluation is an interesting new metric for testing ontologies

Subjective iCAPTURer Observations

Humans had an EXTREMELY difficult time classifying concepts into pre-existing
