Semantics for Scientific Experiments and the Web– the implicit, the formal and the powerful Amit Sheth Large Scale Distributed Information Systems (LSDIS ) lab, Univ. of Georgia November 4, 2005 BISCSE 2005: Berkeley Initiative in Soft Computing Special Event Acknowledgements: Christopher Thomas, Satya Sanket Sahoo, William York NIH Integrated Technology Resource for Biomedical Glycomics


Page 1:

Semantics for Scientific Experiments and the Web: the implicit, the formal and the powerful

Amit Sheth

Large Scale Distributed Information Systems (LSDIS) lab, Univ. of Georgia

November 4, 2005, BISCSE 2005: Berkeley Initiative in Soft Computing Special Event

Acknowledgements: Christopher Thomas, Satya Sanket Sahoo, William York

NIH Integrated Technology Resource for Biomedical Glycomics

Page 2:

What can Semantic Web do?

• Self Describing
• Machine & Human Readable
• Issued by a Trusted Authority
• Easy to Understand
• Convertible
• Can be Secured

The Semantic Web: XML, RDF & Ontology

Adapted from William Ruh (CISCO)

Page 3:

What can SW do for me?

Page 4:

Semantic Web introduction

[Figure, after Tim Berners-Lee; labels: OWL; SWRL, RuleML]

Key themes:
• Machine processable data -> Automation
• Currently, KR (ontology) and reasoning are predominantly based on DL (crisp logic).

Page 5:

Technologies for SW: From XML to OWL

• XML: surface syntax for structured documents; imposes no semantic constraints on the meaning of these documents.
• XML Schema: a language for restricting the structure of XML documents.
• RDF: a data model for objects ("resources") and relations between them; provides a simple semantics for this data model; these data models can be represented in an XML syntax.
• RDF Schema: a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization hierarchies of such properties and classes.
• OWL: adds more vocabulary for describing properties and classes:
  – relations between classes (e.g. disjointness)
  – cardinality (e.g. "exactly one")
  – equality, richer typing of properties
  – characteristics of properties (e.g. symmetry), and enumerated classes

[Figure annotations: an "Expressive Power" axis; regions labeled "NO SEMANTICS" and "SEMANTICS"]

Relationships as first class objects: key to Semantics

http://en.wikipedia.org/wiki/Semantic_web#Components_of_the_Semantic_Web
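To make the step from structure to semantics concrete, here is a minimal sketch in plain Python (no RDF library). Triples are (subject, predicate, object) tuples, and a small function applies the generalization-hierarchy semantics of RDF Schema to infer additional types. All `ex:` resource names are made up for illustration; only `rdf:type` and `rdfs:subClassOf` mirror the real vocabulary.

```python
# Hypothetical resources; only rdf:type and rdfs:subClassOf are real vocabulary terms.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("ex:N_Glycan", SUBCLASS, "ex:Glycan"),
    ("ex:Glycan", SUBCLASS, "ex:Molecule"),
    ("ex:glycan_1", TYPE, "ex:N_Glycan"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls via rdfs:subClassOf (transitive)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(resource, graph):
    """Asserted types plus the types implied by the class hierarchy."""
    asserted = {o for s, p, o in graph if s == resource and p == TYPE}
    implied = set()
    for cls in asserted:
        implied |= superclasses(cls, graph)
    return asserted | implied


print(sorted(inferred_types("ex:glycan_1", triples)))
```

The relationship `rdfs:subClassOf` is itself data in the graph, which is what lets a program derive that `ex:glycan_1` is also an `ex:Glycan` and an `ex:Molecule` without that being stated.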

Page 6:

Lotfi Zadeh: World Knowledge …

“It is beyond question that, in recent years, very impressive progress has been made through the use of such tools. But, a view which is advanced in the following is that bivalent-logic-based methods have intrinsically limited capability to address complex problems which arise in deduction from information which is pervasively ill-structured, uncertain and imprecise.”

“WORLD KNOWLEDGE AND FUZZY LOGIC”

Page 7:

Central thesis
• Machines do well with “formal semantics”
• We need ways to deal with raw data and unorganized information, real world phenomena involving complex relationships, the complex knowledge humans have, and the way machines deal with (reason with) knowledge
• We need to support “implicit semantics” and “powerful semantics”, which go beyond the prevalent DL-centric and “bivalent semantics” based approaches to the Semantic Web
• Approach: Extending the SW vision …

Page 8:

The Semantic Web
• Capturing “real world semantics” is a major step towards making the vision come true.
• These semantics are captured in ontologies
• Ontologies are meant to express or capture
  – Agreement
  – Knowledge
• Ontology is in turn the centerpiece that enables
  – resolution of semantic heterogeneity
  – semantic integration
  – semantically correlating/associating objects and documents
• Current choice for ontology representation is primarily Description Logics

Page 9:

Ontologies – many questions remain
• How do we design ontologies with the constituent concepts/classes and relationships?
• How do we capture knowledge to populate ontologies?
• Certain knowledge at time t is captured, but the real world changes
• Imprecision, uncertainties and inconsistencies
  – What about things of which we know that we don’t know?
  – What about things that are “in the eye of the beholder”?
• Need more powerful semantics – probabilistic, …

Page 10:

Dimensions of expressiveness (tentative)

[Figure: three axes. Valence: bivalent, multivalent (discrete, continuous). Degree of Agreement: Informal, Semi-Formal, Formal. Expressiveness labels: RDF, RDFS, OWL, FOL, Higher Order Logic. Regions marked "Current Semantic Web Focus" and "Future research".]

Page 11:

• Implicit semantics: … refers to what is implicit in data and that is not represented explicitly in any machine processable syntax.

• Formal semantics: … represented in some well-formed syntactic form (governed by syntax rules). Have usually involved limiting expressiveness to allow for acceptable computational characteristics.

• Powerful semantics: … involves representing and utilizing more powerful knowledge that is imprecise, uncertain, partially true, and approximate … . Soft computing has explored these types of powerful semantics.

Sheth, A. et al. (2005). Semantics for the Semantic Web: The Implicit, the Formal and the Powerful. Intl. Journal on Semantic Web and Information Systems 1(1), 1-18.

Page 12:

The world is informal

Even more than humans, machines have a hard time understanding the “real world”

The solution:

<joint meaning>
<meaning of>Formal</meaning of>
<meaning of>Semantics</meaning of>
</joint meaning>

Page 13:

Implicit semantics

• Most knowledge is available in the form of
  – Natural language -> NLP
  – Unstructured text -> statistical
• Needs to be extracted as machine processable semantics / (formal) representation
• Soft computing (“computing with words”) could play a role here

Page 14:

The world can be incomprehensible

Sometimes we only see a small part of the picture

We need the help of machines to exploit the implicit semantics

We need to be able to see the big picture

Page 15:

What are implicit semantics?
• Every collection of data or repositories contains hidden information
• We need to look at the data from the right angle
• We need to ask the right questions
• We need the tools that can ask these questions and extract the information we need
• We need to translate part of what is conveyed by informal semantics into formal semantics, since machines have a much easier time dealing with formal semantics, and we could gain automation

Page 16:

How can we get to implicit semantics?
• Co-occurrence of documents or terms in the same cluster
• A document linked to another document via a hyperlink
• Automatic classification of a document to broadly indicate what a document is about with respect to a chosen taxonomy
• Use the implied semantics of a cluster to disambiguate (does the word “palm” in a document refer to a palm tree, the palm of your hand or a palmtop computer?)
• Evidence of related concepts to disambiguate
• Bioinformatics applications that exploit patterns like sequence alignment, secondary and tertiary protein structure analysis, etc.
• Techniques and Technologies: Text Classification/categorization, Clustering, NLP, Pattern recognition, …
• Soft computing (“computing with words”)?
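The “palm” example can be sketched as a simple overlap count between the words around the ambiguous term and cue words for each sense. This is only an illustration of using related concepts as evidence; the sense inventory and cue words below are hypothetical, and real systems would use the clustering, classification and NLP techniques listed above.

```python
# Hypothetical sense inventory: each sense of "palm" is described by cue words.
SENSES = {
    "palm tree": {"tree", "coconut", "beach", "leaves", "tropical"},
    "palm of hand": {"hand", "finger", "wrist", "skin", "grip"},
    "palmtop computer": {"computer", "device", "stylus", "software", "battery"},
}


def disambiguate(context_words, senses=SENSES):
    """Pick the sense whose cue words overlap most with the context.

    Returns (sense, overlap_count); sense is None when nothing overlaps.
    """
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(context & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


doc = "The new palm device ships with a stylus and updated software".split()
print(disambiguate(doc))
```

Nothing in the document states which sense is meant; the sense is implicit in the co-occurring terms, and the function turns that implicit evidence into an explicit, machine processable label.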

Page 17:

Automatic Semantic Annotation of Text: Entity and Relationship Extraction

KB, statistical and linguistic techniques

Page 18:

Discovering complex relationships

Page 19:

Discovering complex relationships

Page 20:

William Woods

– “Over time, many people have responded to the need for increased rigor in knowledge representation by turning to first-order logic as a semantic criterion. This is distressing, since it is already clear that first-order logic is insufficient to deal with many semantic problems inherent in understanding natural language as well as the semantic requirements of a reasoning system for an intelligent agent using knowledge to interact with the world.” [KR2004 keynote]

Page 21:

The world is complex

• Sometimes our perception plays tricks on us

• Sometimes our beliefs are inconsistent

• Sometimes we cannot draw clear boundaries

• We need to express these uncertainties

We need more Powerful Semantics

Page 22:

Examples

Complex relationships
• Uncertainty:
  – Glycan binding sites
  – Glycan composition
• Functions:
  – Sea level rising related to global warming
  – Earthquakes and nuclear tests
• Question-Answering

Page 23:

Bioinformatics Apps & Ontologies
• GlycO: A domain ontology for glycan structures, glycan functions and enzymes (embodying knowledge of the structure and metabolisms of glycans)
  – Contains 600+ classes and 100+ properties; describes structural features of glycans; unique population strategy
  – URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
• ProPreO: A comprehensive process ontology modeling experimental proteomics
  – Contains 330 classes, 40,000+ instances
  – Models three phases of experimental proteomics: separation techniques, mass spectrometry, and data analysis
  – URL: http://lsdis.cs.uga.edu/projects/glycomics/propreo
• Automatic semantic annotation of high throughput experimental data (in progress)
• Semantic Web Process with WSDL-S for semantic annotations of Web Services
  – http://lsdis.cs.uga.edu -> Glycomics project (funded by NCRR)

Page 24:

GlycO

Page 25:

Example 1: Mass spectrometry analysis

Manual annotation of mouse kidney spectrum by a human expert. For clarity, only 19 of the major peaks have been annotated.

Goldberg, et al, Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875

Page 26:

Mass Spectrometry Experiment

Each m/z value in mass spec diagrams can stand for many different structures (uncertainty with respect to the structure that corresponds to a peak)

• Different linkage
• Different bond
• Different isobaric structures

Page 27:

Very subtle differences

• Peak at 1219.1
• Same molecular composition
• One diverging link
• Found in different organisms
• Background knowledge (found in honeybee venom or bovine cells) can resolve the uncertainty

These are core-fucosylated high-mannose glycans

CBank: 16155 (Honeybee venom)

CBank: 16154 (Bovine)
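As a rough sketch of how background knowledge can resolve this kind of ambiguity, the two candidates for the peak can be filtered by the organism the sample came from. The `Candidate` class and `resolve` function are hypothetical names introduced for illustration; only the CBank identifiers and organisms come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    cbank_id: int   # identifier as shown on the slide
    organism: str   # where the structure has been found


# Both candidates explain the same peak at 1219.1 (same composition).
candidates = [
    Candidate(cbank_id=16155, organism="honeybee venom"),
    Candidate(cbank_id=16154, organism="bovine"),
]


def resolve(cands, sample_organism):
    """Keep only candidates whose known source matches the sample's source."""
    return [c for c in cands if c.organism == sample_organism]


print(resolve(candidates, "bovine"))
```

When the filter leaves exactly one candidate, the uncertainty is resolved; when it leaves several, as on the next slide, the remaining uncertainty has to be represented explicitly.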

Page 28:

Even in the same organism
• Both glycans found in bovine cells
• Both have a mass of 3425.11
• Same composition
• Different linkage
• Since expression levels of different genes can be measured in the cell, we can get the probability of each structure in the sample

Different enzymes lead to these linkages

CBank: 21821

CBank: 21982
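One simple way to turn measured expression levels into structure probabilities is to normalize them. The sketch below assumes, as a simplification that is not stated on the slide, that the probability of a structure is proportional to the expression level of the enzyme that produces its linkage. The numeric levels are hypothetical.

```python
def structure_probabilities(expression_by_structure):
    """Normalize non-negative expression levels into probabilities.

    Assumption (not from the slides): the probability of a structure is
    proportional to the expression level of the enzyme producing its linkage.
    """
    total = sum(expression_by_structure.values())
    if total <= 0:
        raise ValueError("need at least one positive expression level")
    return {s: level / total for s, level in expression_by_structure.items()}


# Hypothetical measurements for the enzymes behind the two linkages.
levels = {"CBank:21821": 30.0, "CBank:21982": 10.0}
probs = structure_probabilities(levels)
print(probs)
```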

Page 29:

Model 1: associate probability as part of Semantic Annotation

• Annotate the mass spec diagram with all possibilities and assign probabilities according to the scientist’s or tool’s best knowledge

Page 30:

P(S | M = 3461.57) = 0.6

P(T | M = 3461.57) = 0.4

Goldberg, et al, Automatic annotation of matrix-assisted laser desorption/ionization N-glycan spectra, Proteomics 2005, 5, 865–875
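Model 1 can be pictured as a small data structure attached to each peak: every candidate structure is kept, together with its probability. The sketch below uses the values from this slide (S with 0.6, T with 0.4 at mass 3461.57); the `PeakAnnotation` class and its methods are hypothetical and not part of any tool mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class PeakAnnotation:
    mass: float
    # Maps a candidate structure label to its probability.
    candidates: dict = field(default_factory=dict)

    def add(self, structure, probability):
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be in [0, 1]")
        self.candidates[structure] = probability

    def is_complete(self, tol=1e-9):
        """True when the candidate probabilities sum to 1."""
        return abs(sum(self.candidates.values()) - 1.0) <= tol

    def most_likely(self):
        return max(self.candidates, key=self.candidates.get)


peak = PeakAnnotation(mass=3461.57)
peak.add("S", 0.6)
peak.add("T", 0.4)
print(peak.is_complete(), peak.most_likely())
```

Keeping both candidates, instead of committing to the most likely one, is what allows the uncertainty to travel with the data to later processing steps.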

Page 31:

Model 2: Probability in ontological representation of Glycan structure
• Build a generalized probabilistic glycan structure that embodies several possible glycans
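A generalized probabilistic structure can be sketched as one description in which the uncertain positions carry a distribution over alternatives, from which the concrete glycans can be enumerated. This is an illustrative reading of Model 2, not the representation used in GlycO; the position names, linkage labels and numbers are made up, and the positions are assumed to be independent.

```python
from itertools import product

# Hypothetical generalized structure: two uncertain linkage positions,
# each with a distribution over alternatives. Labels and numbers are made up.
generalized_glycan = {
    "linkage_A": {"alpha1-3": 0.7, "alpha1-6": 0.3},
    "linkage_B": {"beta1-2": 0.5, "beta1-4": 0.5},
}


def enumerate_structures(generalized):
    """Yield (concrete_structure, probability) for every combination.

    Assumes the uncertain positions are independent, so the joint
    probability is the product of the per-position probabilities.
    """
    positions = sorted(generalized)
    options = [generalized[p].items() for p in positions]
    for combo in product(*options):
        structure = {pos: choice for pos, (choice, _) in zip(positions, combo)}
        probability = 1.0
        for _, p in combo:
            probability *= p
        yield structure, probability


results = list(enumerate_structures(generalized_glycan))
for structure, probability in results:
    print(structure, round(probability, 3))
```

If the positions are not independent (for example, because the same enzyme affects both), the product no longer applies and the joint distribution would have to be stored explicitly.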

Page 32:

Recap

• Experiments usually leave us with some uncertainty

• In order to transfer the data for further processing, this uncertainty must be maintained in the description

Page 33:

Example 2: Question answering systems

Page 34:

Simple Question answering agent

Can the recent increase in the number of strong hurricanes be attributed to global warming?

Page 35:

Complex QA agent

Q1: Are humans responsible for global warming?

Q2: Is global warming responsible for increased hurricane activity?

Deduce probabilistic result

Is the recent increase in the number of strong hurricanes a man-made problem due to global warming?

Data exchanged between agents is probabilistic, so an ontology needs probabilistic representation to support such exchange at a semantic level
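A minimal sketch of such an exchange: each agent returns an answer with a probability attached, and the complex agent combines them. The combination rule here is an assumption made for illustration (the two claims are treated as independent, so the probabilities multiply); the slide does not specify how the result is deduced, and the belief values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    question: str
    probability: float  # agent's degree of belief that the answer is "yes"


def chain(first, second, question):
    """Combine two answers along a causal chain.

    Assumption (not from the slides): the two claims are independent,
    so the probability that both hold is the product.
    """
    return Answer(question, first.probability * second.probability)


# Hypothetical beliefs reported by two specialised agents.
q1 = Answer("Are humans responsible for global warming?", 0.9)
q2 = Answer("Is global warming responsible for increased hurricane activity?", 0.5)

combined = chain(q1, q2, "Is the increase in strong hurricanes man-made?")
print(combined.probability)
```

The point of the sketch is the message format: if the agents could only exchange "yes" or "no", the combined answer would lose the information that one of the two links is much weaker than the other.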

Page 36:

Example 3: More Complex Relationships

Page 37:

Cause-Effects & Knowledge discovery

[Figure: graph of entities and cause-effect relationships. Nodes: VOLCANO, LOCATION, ASH RAIN, PYROCLASTIC FLOW, ENVIRON., LOCATION, PEOPLE, WEATHER, PLANT, BUILDING. Edge labels: DESTROYS, COOLS TEMP, DESTROYS, KILLS.]

Page 38:

Inter-Ontological Relationships

A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.

NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30
   AND distance( NuclearTest.latitude, NuclearTest.longitude,
                 Earthquake.latitude, Earthquake.longitude ) < 10000
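The rule above can be sketched as an executable predicate. Several details are assumptions, since the slide gives no units: the date threshold is read as days, the distance threshold as kilometres, and distance is computed as great-circle distance with the haversine formula. The check that the earthquake does not precede the test follows the prose on the slide rather than the rule text. The `Event` class and the sample events are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt

MAX_DAYS = 30         # threshold from the slide; unit assumed to be days
MAX_DISTANCE = 10000  # threshold from the slide; unit assumed to be km


@dataclass(frozen=True)
class Event:
    event_date: date
    latitude: float   # degrees
    longitude: float  # degrees


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (Earth radius 6371 km)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding for near-antipodal points.
    return 2 * 6371.0 * asin(min(1.0, sqrt(a)))


def may_have_caused(test, quake):
    """Quake follows the test within the time window and lies within range."""
    days = (quake.event_date - test.event_date).days
    near = distance_km(test.latitude, test.longitude,
                       quake.latitude, quake.longitude) < MAX_DISTANCE
    return 0 <= days < MAX_DAYS and near


# Hypothetical events for illustration only.
test_event = Event(date(1970, 1, 1), 37.0, -116.0)
quake_near = Event(date(1970, 1, 10), 36.5, -117.0)
quake_early = Event(date(1969, 12, 25), 36.5, -117.0)
quake_far = Event(date(1970, 1, 5), -30.0, 60.0)

print(may_have_caused(test_event, quake_near))
print(may_have_caused(test_event, quake_early))
print(may_have_caused(test_event, quake_far))
```

A predicate like this only yields candidate pairs. Whether a candidate pair reflects real causation is exactly the question raised on the following slides.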

Page 39:

Knowledge Discovery - Example

Earthquake Sources, Nuclear Test Sources

Nuclear Test May Cause Earthquakes

Is it really true?

Complex Relationship: How do you model this?

Page 40:

Knowledge Discovery - Example

Earthquake Sources, Nuclear Test Sources

Is it really true?

[Figure: charts of "Number of Earthquakes" per year (1900 to 1995) for earthquakes of strength 5.8 - 7, with series "5.8-6", "6-7", "5.8-6 avg" and "6-7 avg", shown together with the number of nuclear tests ("# tests"). Annotations: "Possible correlation" and "Correlation unclear".]

Page 41:

What are powerful semantics?
• Powerful semantics should be formal
• Powerful semantics should capture implicit knowledge
• Powerful semantics should cope with inconsistencies
• Powerful semantics should deal with imprecision

Page 42:

Powerful Semantics

• The formalism needs to express probabilities and/or fuzzy memberships in a meaningful way, i.e. a reasoner must be able to meaningfully interpret the probabilistic relationships and the fuzzy membership functions

• The knowledge expressed must be interchangeable.
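To illustrate the two requirements, the sketch below contrasts a fuzzy membership degree with a probability distribution and serializes both in a tagged form, so that a receiving reasoner can tell which interpretation applies to each number. The triangular membership function is a standard textbook shape; the class name "moderate", its bounds, and the JSON layout are assumptions made for illustration, not a proposed standard.

```python
import json


def triangular(x, low, peak, high):
    """Triangular fuzzy membership: 0 outside (low, high), 1 at peak."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


# Fuzzy statement: to what degree does a value belong to a vague class?
# The class "moderate" and its bounds are hypothetical.
membership = triangular(6.5, low=5.0, peak=6.0, high=7.0)

# Probabilistic statement: exactly one alternative is true, beliefs sum to 1.
distribution = {"S": 0.6, "T": 0.4}

# A tagged, serializable record keeps the two kinds of statement apart,
# so a receiving reasoner knows how to interpret each number.
message = json.dumps([
    {"kind": "fuzzy", "class": "moderate", "degree": membership},
    {"kind": "probability", "distribution": distribution},
])
print(message)
```

The same number (for example 0.5) means different things in the two records: a partial degree of membership in one, a degree of belief about a crisp fact in the other.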

Page 43:

Current efforts
• Zhongli Ding, Yun Peng, and Rong Pan, BayesOWL: Uncertainty Modeling in Semantic Web Ontologies
  – Preliminary work; focuses on schema information only; models subclass relationships as a Bayesian Network in the form of a directed acyclic graph (DAG).
  – Inadequate because, e.g., the probability for a certain glycan structure depends on the relationships between the glycan at hand and the concentration of specific enzymes in the sample. -> probabilistic relationships
  – The probabilities are different for different individuals; probabilities solely at the class level are insufficient. -> representation of uncertainty at the instance level

Page 44:

Current efforts

• Giorgos Stoilos, Giorgos Stamou, Vassilis Tzouvaras, Jeff Z. Pan and Ian Horrocks, Fuzzy OWL: Uncertainty and the Semantic Web
  – OWL serialization of the fuzzy description logic f-SHOIN introduced by Umberto Straccia. Fuzzy OWL has model theoretic semantics.
  – Fuzzy logic semantics are inadequate for expressing probabilities. Determining e.g. a glycan structure is finally a binary decision. There is no fuzziness in glycan structure.

Page 45:

Conclusions

• Semantic Web is useful if we can capture/represent the semantics of real world objects and phenomena

• Ontologies are the way to achieve this; relationships hold the key to semantics

• So what types of expressive representation are needed to model relationships?

• Is crisp logic (e.g., DL) adequate (since current ontology representation is dominated by DL)?
  – Not for complex relationships and knowledge that involve vagueness or uncertainty

Page 46:

The road to more power

Implicit Semantics + Formal Semantics + Soft Computing Technologies = Powerful Semantics

• For more information: http://lsdis.cs.uga.edu
  – Especially see Glycomics project

Page 47:

• What happened when hypertext was married to Internet? The Web

The same could happen if soft computing can be appropriately married to current Semantic Web infrastructure.