Eagle Bioinformatics Symposium: 8. Steve Gardner, The Importance of Data Representation: New Tools...


Uploaded by: eagle-genomics-ltd

Posted on 11 May 2015


Category: Healthcare



DESCRIPTION

The volume and diversity of life science and healthcare data have created huge data integration challenges. Technologies such as federation and warehousing allowed us to manage the volume, but did not give us the ability to respond flexibly to change or to routinely create novel insights from those data. This is partly due to imperfect recording and understanding of the data's context, and partly due to the representations we use. This talk will explore some historical and future approaches to large-scale semantic data integration, and look at new graph-based and geometrical approaches to large-scale knowledge modelling.

TRANSCRIPT

Page 1: Eagle Bioinformatics Symposium: 8. Steve Gardner, The Importance of Data Representation: New Tools for Large Scale Knowledge Modelling

select aminoid, seq1[0:6], xss[0:6] from amino a where seq1='R[2,4]+polar++hydroxyl+'
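The query above appears to search protein sequences with a pattern that mixes literal residues, repeat counts and physicochemical property classes. The talk does not spell out the syntax, so what follows is only a sketch under stated assumptions: `R[2,4]` is read as two to four arginines, `polar` and `hydroxyl` are residue classes (the memberships below are one common convention, not taken from the talk), and `+` separates consecutive positions. The hypothetical `compile_pattern` helper translates such a pattern into a Python regular expression. Because the meaning of the doubled and trailing `+` in the slide's pattern is not stated, the sketch rejects empty tokens rather than guess.

```python
import re

# Hypothetical residue classes (an assumption: one common convention, not from the talk).
CLASSES = {
    "polar": "STNQCY",     # uncharged polar side chains
    "hydroxyl": "STY",     # side chains carrying an -OH group
}

# A single residue letter with an optional [min,max] repeat count, e.g. "R[2,4]".
TOKEN = re.compile(r"^([A-Z])(?:\[(\d+),(\d+)\])?$")


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a '+'-separated pattern such as 'R[2,4]+polar+hydroxyl' into a regex."""
    parts = []
    for tok in pattern.split("+"):
        if not tok:
            # Doubled or trailing '+' has no defined meaning in this sketch.
            raise ValueError("empty token in pattern")
        if tok in CLASSES:
            parts.append("[" + CLASSES[tok] + "]")
            continue
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognised token: {tok!r}")
        residue, lo, hi = m.groups()
        parts.append(residue if lo is None else f"{residue}{{{lo},{hi}}}")
    return re.compile("".join(parts))


def find_motifs(sequences: dict[str, str], pattern: str) -> list[tuple[str, int, str]]:
    """Return (sequence id, start offset, matched text) for every hit."""
    rx = compile_pattern(pattern)
    hits = []
    for seq_id, seq in sequences.items():
        for m in rx.finditer(seq):
            hits.append((seq_id, m.start(), m.group()))
    return hits
```

For example, `find_motifs({"p1": "MKRRRSTA", "p2": "GGRSTA"}, "R[2,4]+polar+hydroxyl")` returns `[("p1", 2, "RRRST")]`. Keeping the property classes as plain data means a new class is a one-line change rather than a new query operator.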


GO:0003673 : Gene_Ontology (28348)

GO:0008150 : biological_process (21805)

GO:0005575 : cellular_component (13866)

GO:0003674 : molecular_function (20801)

GO:0008369 : obsolete (289)

GO:0004432 : 1-phosphatidylinositol-4-phosphate kinase, class IA (0)

GO:0003824 : enzyme (7162)
  GO:0016301 : kinase (1027)
    GO:0004428 : inositol/phosphatidylinositol kinase (37)
      GO:0016307 : phosphatidylinositol phosphate kinase (9)
        GO:0000285 : 1-phosphatidylinositol-3-phosphate 5-kinase (1)

GO:0016740 : transferase (2130)
  GO:0016772 : transferase, transferring phosphorus-containing groups (1239)
    GO:0016773 : phosphotransferase, alcohol group as acceptor (969)
      GO:0004428 : inositol/phosphatidylinositol kinase (37)
        GO:0016307 : phosphatidylinositol phosphate kinase (9)
          GO:0000285 : 1-phosphatidylinositol-3-phosphate 5-kinase (1)

Figure labels: Ontology; Structured Data Sources; Unstructured Data Sources
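The GO listing reaches the same term, GO:0000285, through two different parents, so the ontology is a directed acyclic graph rather than a tree. The sketch below shows why that matters for counts like those in parentheses, assuming they are annotation counts: when annotations are propagated up every path, each gene must be counted once per term, not once per path. The `is_a` edges are inferred from the listing, plus an assumed link from `enzyme` and `transferase` up to `molecular_function`; they are illustrative only and not checked against any GO release. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical is_a edges (child -> parents), inferred from the listing above.
PARENTS = {
    "GO:0000285": ["GO:0016307"],
    "GO:0016307": ["GO:0004428"],
    "GO:0004428": ["GO:0016301", "GO:0016773"],  # two parents: a DAG, not a tree
    "GO:0016301": ["GO:0003824"],                # kinase -> enzyme
    "GO:0016773": ["GO:0016772"],
    "GO:0016772": ["GO:0016740"],
    "GO:0003824": ["GO:0003674"],                # assumed: enzyme -> molecular_function
    "GO:0016740": ["GO:0003674"],                # assumed: transferase -> molecular_function
    "GO:0003674": [],
}


def paths_to_root(term):
    """Enumerate every is_a path from term up to a term with no parents."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + p for parent in parents for p in paths_to_root(parent)]


def ancestors(term):
    """The term plus all of its ancestors, each included exactly once."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated_counts(annotations):
    """Count distinct genes per term, propagating each (gene, term) pair to all ancestors."""
    genes_per_term = defaultdict(set)
    for gene, term in annotations:
        for t in ancestors(term):
            genes_per_term[t].add(gene)
    return {t: len(genes) for t, genes in genes_per_term.items()}
```

Here `paths_to_root("GO:0000285")` yields two paths, one through kinase and one through the phosphotransferase branch, yet `propagated_counts([("geneA", "GO:0000285")])` still counts geneA once at `GO:0003674`, because propagation works on the ancestor set rather than on individual paths.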


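Figure: Context Vectors and Term Vectors (numbered 1, 2, 3 … n) for the terms 'Zinc' and 'Finger'. The query 'Zinc finger' combines them by OR, i.e. vector addition, into a single query vector.

Dot-product comparisons of the query vector against the term/context vectors give the semantic distance between them (a larger dot product indicates closer meaning).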
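The slide describes term and context vectors compared by dot product. One well-known way to build such vectors is random indexing, sketched below under assumptions; this is not necessarily the method used in the talk. Each context (here, a document) gets a sparse random index vector, a term's vector is the sum of the index vectors of the contexts it occurs in, and an OR query is the sum of its terms' vectors. The sketch scores with cosine similarity, i.e. the dot product normalised by vector length, so higher means semantically closer. The dimensionality, sparsity, function names and sample documents are all made up for illustration.

```python
import math
import random

DIM = 512       # dimensionality of the vector space (assumed)
NONZERO = 8     # nonzero entries per random index vector (assumed)


def index_vector(rng):
    """Sparse ternary random vector: a few +1/-1 entries, the rest zero."""
    v = [0.0] * DIM
    for pos in rng.sample(range(DIM), NONZERO):
        v[pos] = rng.choice((1.0, -1.0))
    return v


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    """Dot product normalised by both vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def train(docs, seed=0):
    """Term vector = sum of the index vectors of the documents (contexts) it occurs in."""
    rng = random.Random(seed)
    context = {doc_id: index_vector(rng) for doc_id in docs}
    terms = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            terms[term] = add(terms.get(term, [0.0] * DIM), context[doc_id])
    return context, terms


def query(terms, words):
    """OR-style query: add the term vectors of the query words."""
    q = [0.0] * DIM
    for w in words:
        q = add(q, terms.get(w.lower(), [0.0] * DIM))
    return q
```

With `docs = {"d1": "zinc finger protein binds dna", "d2": "zinc finger domain transcription", "d3": "tachycardia heart rate"}`, `_, terms = train(docs)` and `q = query(terms, ["Zinc", "finger"])`, `cosine(q, terms["dna"])` comes out higher than `cosine(q, terms["tachycardia"])`, because "dna" shares a document with both query terms while "tachycardia" shares none. No vocabulary is supplied up front, which matches the "untrained" setting described for the tachycardia search.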


'Tachycardia' search (untrained: no starting vocabulary provided) over 400K clinical trials (500 MB of XML); unfiltered result set. Approx. 1.2M 'terms' in the corpus.

Vector length = semantic distance (in the corpus); colour = term density in the corpus.


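$$
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
$$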
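As a quick check of the rotation about the x-axis on this slide, the sketch below applies the matrix to 3-D points in plain Python. With the signs as given (+sin φ in the second row), the matrix re-expresses a point in axes rotated by φ about x, which is equivalent to rotating the point itself by −φ. The function names are illustrative, not from the talk.

```python
import math


def rotation_x(phi):
    """The x-axis rotation matrix from the slide, rows as written."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0],
            [0.0, c, s],
            [0.0, -s, c]]


def apply(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
```

For example, `apply(rotation_x(math.pi / 2), [0.0, 1.0, 0.0])` gives approximately `[0, 0, -1]`, and applying `rotation_x(-math.pi / 2)` to that result recovers the original point. Because the matrix is orthogonal, lengths and dot products are preserved, so a change of viewpoint like this does not alter any distances between points.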


A B C D
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 1
1 1 1 0
1 1 1 1

A B C D
0 0 0 0
0 1 0 1 0 1 1
0 1 0 1 1 0
1 0 1 0 1 0 1 1
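The first A B C D table lists 13 of the 16 possible combinations of four binary attributes. Purely as an illustration of a compact representation (not necessarily the one discussed in the talk), each row can be packed into a 4-bit code and the whole table into a single 16-bit mask, which turns questions such as "which combinations are absent?" into bit operations. Treating A as the most significant bit is an assumption, and the helper names are hypothetical.

```python
from itertools import product

# Rows of the first A B C D table on the slide.
ROWS = [
    (0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1),
    (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1),
    (1, 0, 0, 1), (1, 0, 1, 0), (1, 0, 1, 1),
    (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1),
]


def encode(row):
    """Pack (A, B, C, D) into a 4-bit integer, A as the most significant bit."""
    code = 0
    for bit in row:
        code = (code << 1) | bit
    return code


def table_mask(rows):
    """One bit per possible row: bit k is set when the row with code k is present."""
    mask = 0
    for row in rows:
        mask |= 1 << encode(row)
    return mask


def missing_rows(rows, width=4):
    """All combinations of `width` binary attributes that do not appear in rows."""
    mask = table_mask(rows)
    return [combo for combo in product((0, 1), repeat=width)
            if not mask & (1 << encode(combo))]
```

For this table, `table_mask(ROWS)` is `0xEEEF` and `missing_rows(ROWS)` returns `[(0, 1, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0)]`: every absent combination has C = D = 0 with A or B set.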
