Large-scale integration of data and text

Lars Juhl Jensen

Post on 10-May-2015

Category: Science

TRANSCRIPT

Page 1: Large-scale integration of data and text

Lars Juhl Jensen

Large-scale integration of data and text

Page 2: Large-scale integration of data and text

Lars Juhl Jensen

Large-scale integration of data and text

Page 3: Large-scale integration of data and text

Ph.D.

Page 4: Large-scale integration of data and text
Page 5: Large-scale integration of data and text

sequence analysis

Page 6: Large-scale integration of data and text

postdoc

Page 7: Large-scale integration of data and text

staff scientist

Page 8: Large-scale integration of data and text
Page 9: Large-scale integration of data and text

protein networks

Page 10: Large-scale integration of data and text

cellular signalling

Page 11: Large-scale integration of data and text

group leader

Page 12: Large-scale integration of data and text
Page 13: Large-scale integration of data and text

cofounder

Page 14: Large-scale integration of data and text
Page 15: Large-scale integration of data and text

data integration

Page 16: Large-scale integration of data and text

omics data

Page 17: Large-scale integration of data and text

association networks

Page 18: Large-scale integration of data and text

text mining

Page 19: Large-scale integration of data and text

biomedical literature

Page 20: Large-scale integration of data and text

electronic health records

Page 21: Large-scale integration of data and text

association networks

Page 22: Large-scale integration of data and text

guilt by association

Page 23: Large-scale integration of data and text
Page 24: Large-scale integration of data and text

STRING

Page 25: Large-scale integration of data and text

Franceschini et al., Nucleic Acids Research, 2013

Page 26: Large-scale integration of data and text

1100+ genomes

Page 27: Large-scale integration of data and text

genomic context

Page 28: Large-scale integration of data and text

gene fusion

Page 29: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 30: Large-scale integration of data and text

operons

Page 31: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 32: Large-scale integration of data and text

bidirectional promoters

Page 33: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004

Page 34: Large-scale integration of data and text

phylogenetic profiles

Page 35: Large-scale integration of data and text

Korbel et al., Nature Biotechnology, 2004
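
The phylogenetic-profile idea can be illustrated with a small sketch: genes whose orthologs are present and absent in the same genomes are candidates for a functional link. The gene names, genome names, Jaccard similarity, and threshold below are all assumptions chosen for illustration; the slides do not say which similarity measure is used.

```python
from itertools import combinations

# Hypothetical toy data: gene -> set of genomes in which an ortholog is present.
profiles = {
    "geneA": {"g1", "g2", "g5"},
    "geneB": {"g1", "g2", "g5"},
    "geneC": {"g3", "g4"},
    "geneD": {"g1", "g2", "g4", "g5"},
}

def jaccard(a, b):
    """Similarity of two presence/absence profiles (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_profiles(profiles, threshold=0.7):
    """Return gene pairs whose profiles overlap at least `threshold`, best first."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            pairs.append((g1, g2, score))
    return sorted(pairs, key=lambda t: -t[2])

linked = similar_profiles(profiles)
```

With this toy input, geneA and geneB share an identical profile and rank first, while geneC is linked to nothing.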

Page 36: Large-scale integration of data and text

a real example

Page 37: Large-scale integration of data and text
Page 38: Large-scale integration of data and text
Page 39: Large-scale integration of data and text
Page 40: Large-scale integration of data and text

Cell

Cellulosomes

Cellulose

Page 41: Large-scale integration of data and text

experimental data

Page 42: Large-scale integration of data and text

gene coexpression

Page 43: Large-scale integration of data and text
Page 44: Large-scale integration of data and text

physical interactions

Page 45: Large-scale integration of data and text

Jensen & Bork, Science, 2008

Page 46: Large-scale integration of data and text

genetic interactions

Page 47: Large-scale integration of data and text

Beyer et al., Nature Reviews Genetics, 2007

Page 48: Large-scale integration of data and text

curated knowledge

Page 49: Large-scale integration of data and text

pathways

Page 50: Large-scale integration of data and text

Letunic & Bork, Trends in Biochemical Sciences, 2008

Page 51: Large-scale integration of data and text

many databases

Page 52: Large-scale integration of data and text

different formats

Page 53: Large-scale integration of data and text

different identifiers

Page 54: Large-scale integration of data and text

variable quality

Page 55: Large-scale integration of data and text

not comparable

Page 56: Large-scale integration of data and text

not same species

Page 57: Large-scale integration of data and text

hard work

Page 58: Large-scale integration of data and text

(Ph.D. students)

Page 59: Large-scale integration of data and text

quality scores

Page 60: Large-scale integration of data and text

von Mering et al., Nucleic Acids Research, 2005

Page 61: Large-scale integration of data and text

calibrate vs. gold standard

Page 62: Large-scale integration of data and text

von Mering et al., Nucleic Acids Research, 2005
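
Calibrating against a gold standard can be sketched as follows: bin the raw scores from one data source and, for each bin, measure the fraction of pairs that the gold standard confirms. The pair names, raw scores, gold-standard set, and bin width are hypothetical, and the fixed-width binning is an assumption; a real calibration would typically fit a smooth curve to far more data.

```python
from collections import defaultdict

def calibrate(scored_pairs, gold_positive, bin_width=0.2):
    """Map raw-score bins to the fraction of pairs confirmed by the gold standard."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [confirmed, total]
    for pair, raw in scored_pairs:
        b = int(raw / bin_width)
        counts[b][1] += 1
        if pair in gold_positive:
            counts[b][0] += 1
    return {b: hits / total for b, (hits, total) in sorted(counts.items())}

# Hypothetical raw scores from a single data source.
scored = [
    (("A", "B"), 0.90), (("A", "C"), 0.85), (("B", "D"), 0.95),
    (("B", "C"), 0.30), (("C", "D"), 0.25), (("A", "D"), 0.35),
]
# Hypothetical gold standard of known true associations.
gold = {("A", "B"), ("B", "D"), ("B", "C")}

table = calibrate(scored, gold)
```

Here the high-scoring bin is confirmed twice as often as the low-scoring bin, so a raw score can be translated into a comparable confidence by looking up its bin.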

Page 63: Large-scale integration of data and text

homology-based transfer

Page 64: Large-scale integration of data and text

Franceschini et al., Nucleic Acids Research, 2013

Page 65: Large-scale integration of data and text

missing most of the data

Page 66: Large-scale integration of data and text

text mining

Page 67: Large-scale integration of data and text

>10 km

Page 68: Large-scale integration of data and text

too much to read

Page 69: Large-scale integration of data and text

computer

Page 70: Large-scale integration of data and text

as smart as a dog

Page 71: Large-scale integration of data and text

teach it specific tricks

Page 72: Large-scale integration of data and text
Page 73: Large-scale integration of data and text
Page 74: Large-scale integration of data and text

named entity recognition

Page 75: Large-scale integration of data and text

comprehensive lexicon

Page 76: Large-scale integration of data and text

cyclin dependent kinase 1

Page 77: Large-scale integration of data and text

CDC2

Page 78: Large-scale integration of data and text

flexible matching

Page 79: Large-scale integration of data and text

cyclin dependent kinase 1

Page 80: Large-scale integration of data and text

cyclin-dependent kinase 1

Page 81: Large-scale integration of data and text

orthographic variation

Page 82: Large-scale integration of data and text

CDC2

Page 83: Large-scale integration of data and text

hCdc2

Page 84: Large-scale integration of data and text

“black list”

Page 85: Large-scale integration of data and text

SDS
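
The named entity recognition slides (comprehensive lexicon, flexible matching, orthographic variation, black list) can be tied together in a minimal dictionary tagger. This is a sketch under stated assumptions, not the tagger used in the talk: the lexicon entries, the identifiers, the "h" species-prefix rule, and the function names are all made up for illustration.

```python
import re

# Hypothetical mini-lexicon: name or synonym -> identifier (identifiers are made up).
LEXICON = {
    "cyclin dependent kinase 1": "CDK1",
    "CDC2": "CDK1",
    "SDS": "SDS_gene",
}
BLACKLIST = {"sds"}  # names that usually mean something else in running text

def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants collapse."""
    return re.sub(r"[\s\-]+", "", name.lower())

INDEX = {normalize(k): v for k, v in LEXICON.items()}
MAX_WORDS = max(len(k.split()) for k in LEXICON)

def tag(text):
    """Return (matched text, identifier) pairs, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            key = normalize(candidate)
            if key in BLACKLIST:
                continue
            # Orthographic variation: allow a species prefix such as "h" (human).
            ident = INDEX.get(key) or (INDEX.get(key[1:]) if key[:1] == "h" else None)
            if ident:
                hits.append((candidate, ident))
                i += n - 1
                break
        i += 1
    return hits

hits = tag("hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel with CDC2.")
```

The hyphenated spelling and the prefixed form both resolve to the same entry as the plain names, while "SDS" is skipped even though it is in the lexicon.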

Page 86: Large-scale integration of data and text

augmented browsing

Page 87: Large-scale integration of data and text

Reflect

Page 88: Large-scale integration of data and text

browser add-on

Page 89: Large-scale integration of data and text

real-time text mining

Page 90: Large-scale integration of data and text

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010

Page 91: Large-scale integration of data and text

information extraction

Page 92: Large-scale integration of data and text

co-mentioning

Page 93: Large-scale integration of data and text

within documents

Page 94: Large-scale integration of data and text

within paragraphs

Page 95: Large-scale integration of data and text

within sentences
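
Co-mentioning at the three levels named above (documents, paragraphs, sentences) can be sketched as a weighted count in which closer co-mentions contribute more. The weights, the naive sentence splitter, and the function names are assumptions for illustration only; the slides do not give a scoring formula.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical weights: closer co-mentions count more. The values are illustrative.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}

def pairs_in(text, names):
    """All unordered pairs of known names that occur in `text`."""
    found = sorted(n for n in names if re.search(rf"\b{re.escape(n)}\b", text))
    return set(combinations(found, 2))

def comention_scores(documents, names):
    """Sum weighted co-mention counts over documents, paragraphs, and sentences."""
    scores = Counter()
    for doc in documents:
        for pair in pairs_in(doc, names):
            scores[pair] += WEIGHTS["document"]
        for para in doc.split("\n\n"):
            for pair in pairs_in(para, names):
                scores[pair] += WEIGHTS["paragraph"]
            # Naive sentence split on end punctuation followed by whitespace.
            for sent in re.split(r"(?<=[.!?])\s+", para):
                for pair in pairs_in(sent, names):
                    scores[pair] += WEIGHTS["sentence"]
    return scores

docs = ["CYC1 is controlled by HAP1. CYC7 is also studied.\n\nHAP1 was discussed elsewhere."]
scores = comention_scores(docs, {"CYC1", "CYC7", "HAP1"})
```

In this example CYC1 and HAP1 share a sentence and therefore score higher than pairs that only share a paragraph.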

Page 96: Large-scale integration of data and text

NLP (Natural Language Processing)

Page 97: Large-scale integration of data and text

grammatical analysis

Page 98: Large-scale integration of data and text

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Page 99: Large-scale integration of data and text

more precise

Page 100: Large-scale integration of data and text

worse recall

Page 101: Large-scale integration of data and text

related web resources

Page 102: Large-scale integration of data and text

STITCH

Page 103: Large-scale integration of data and text

STRING + 300k chemicals

Page 104: Large-scale integration of data and text

stitch-db.org

Page 105: Large-scale integration of data and text

COMPARTMENTS

Page 106: Large-scale integration of data and text

compartments.jensenlab.org

Page 107: Large-scale integration of data and text

TISSUES

Page 108: Large-scale integration of data and text

tissues.jensenlab.org

Page 109: Large-scale integration of data and text

DISEASES

Page 110: Large-scale integration of data and text

diseases.jensenlab.org

Page 111: Large-scale integration of data and text

general framework

Page 112: Large-scale integration of data and text

curated knowledge

Page 113: Large-scale integration of data and text

experimental data

Page 114: Large-scale integration of data and text

text mining

Page 115: Large-scale integration of data and text

computational predictions

Page 116: Large-scale integration of data and text

common identifiers

Page 117: Large-scale integration of data and text

quality scores
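
The framework lists quality scores alongside several evidence types (curated knowledge, experimental data, text mining, computational predictions). One simple way to fold per-channel scores into a single confidence is sketched below, assuming each score is a probability and the channels are independent. The formula, the channel names, and the example numbers are assumptions for illustration; the slides do not state how scores are combined.

```python
from math import prod

def combine(channel_scores):
    """Probability that at least one evidence channel is correct,
    assuming the channels are independent."""
    return 1.0 - prod(1.0 - s for s in channel_scores.values())

# Hypothetical scores for one protein pair.
evidence = {"curated": 0.0, "experiments": 0.5, "textmining": 0.5, "predictions": 0.0}
combined = combine(evidence)  # 0.75
```

Two moderately confident channels reinforce each other, while channels with no evidence leave the result unchanged.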

Page 118: Large-scale integration of data and text

visualization

Page 119: Large-scale integration of data and text

web resources

Page 120: Large-scale integration of data and text

download files

Page 121: Large-scale integration of data and text

why so many?

Page 122: Large-scale integration of data and text

Swiss army knife syndrome

Page 123: Large-scale integration of data and text
Page 124: Large-scale integration of data and text

targeted resources

Page 125: Large-scale integration of data and text

common infrastructure

Page 126: Large-scale integration of data and text

medical data mining

Page 127: Large-scale integration of data and text

Jensen et al., Nature Reviews Genetics, 2012

Page 128: Large-scale integration of data and text
Page 129: Large-scale integration of data and text

opt-out

Page 130: Large-scale integration of data and text

opt-in

Page 131: Large-scale integration of data and text

centralized registries

Page 132: Large-scale integration of data and text

structured data

Page 133: Large-scale integration of data and text

Jensen et al., Nature Reviews Genetics, 2012

Page 134: Large-scale integration of data and text

14 years

Page 135: Large-scale integration of data and text

6.2 million patients

Page 136: Large-scale integration of data and text

119 million diagnoses

Page 137: Large-scale integration of data and text

distributions

Page 138: Large-scale integration of data and text

Jensen et al., submitted, 2014

Page 139: Large-scale integration of data and text

diagnosis trajectories

Page 140: Large-scale integration of data and text

Jensen et al., submitted, 2014
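
The first step toward diagnosis trajectories can be sketched as counting, across patients, how often one diagnosis is first recorded before another. The patient records and the diagnosis labels "A", "B", "C" below are hypothetical, and the sketch deliberately stops at counting: significance testing and correction for confounding factors are not shown.

```python
from collections import Counter
from itertools import combinations

def directed_pair_counts(patients):
    """Count patients in whom diagnosis a was first recorded before diagnosis b."""
    counts = Counter()
    for history in patients.values():
        first_seen = {}
        # ISO dates sort chronologically, so the first hit per code is the earliest.
        for date, code in sorted(history):
            first_seen.setdefault(code, date)
        for (a, date_a), (b, date_b) in combinations(first_seen.items(), 2):
            if date_a < date_b:  # same-day diagnoses give no direction
                counts[(a, b)] += 1
    return counts

# Hypothetical patient histories: (ISO date, diagnosis label).
patients = {
    "p1": [("2001-03-01", "A"), ("2004-06-10", "B"), ("2009-01-15", "C")],
    "p2": [("2000-01-01", "A"), ("2002-02-02", "B")],
    "p3": [("2003-05-05", "B"), ("2005-07-07", "A")],
}
counts = directed_pair_counts(patients)
```

Pairs where one direction clearly dominates, such as A before B in this toy data, are the candidates that could be chained into longer trajectories.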

Page 141: Large-scale integration of data and text

Jensen et al., submitted, 2014

Page 142: Large-scale integration of data and text

complex trajectories

Page 143: Large-scale integration of data and text

Jensen et al., submitted, 2014

Page 144: Large-scale integration of data and text

confounding factors

Page 145: Large-scale integration of data and text

correlation ≠ causation

Page 146: Large-scale integration of data and text

electronic health records

Page 147: Large-scale integration of data and text

unstructured data

Page 148: Large-scale integration of data and text
Page 149: Large-scale integration of data and text

Danish

Page 150: Large-scale integration of data and text

busy doctors

Page 151: Large-scale integration of data and text

pharmacovigilance

Page 152: Large-scale integration of data and text

custom dictionaries

Page 153: Large-scale integration of data and text

drugs

Page 154: Large-scale integration of data and text

adverse drug events

Page 155: Large-scale integration of data and text

typo rules
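
Tolerating typos when matching clinical notes against a drug dictionary can be sketched with a generic fuzzy matcher. Note that this swaps in Python's `difflib` similarity for the hand-written typo rules the slide refers to, whose details are not given; the dictionary, the misspellings, and the 0.85 cutoff are assumptions for illustration.

```python
import difflib

# Small illustrative dictionary; a real one would be far larger.
DRUGS = ["simvastatin", "citalopram", "dipyridamole"]

def match_drug(word, cutoff=0.85):
    """Return the dictionary entry closest to `word`, or None if nothing is close."""
    matches = difflib.get_close_matches(word.lower(), DRUGS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

found = [match_drug(w) for w in ["Simvastatine", "sitalopram", "patient"]]
```

A high cutoff keeps ordinary words from being mistaken for drug names, at the cost of missing heavily misspelled ones.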

Page 156: Large-scale integration of data and text

complex filters

Page 157: Large-scale integration of data and text

Eriksson et al., Drug Safety, 2014

Page 158: Large-scale integration of data and text

new adverse drug reactions

Page 159: Large-scale integration of data and text

Eriksson et al., Drug Safety, 2014

Drug substance        ADE                   p-value
Chlordiazepoxide      Nystagmus             4.0e-8
Simvastatin           Personality changes   8.4e-8
Dipyridamole          Visual impairment     4.4e-4
Citalopram            Psychosis             8.8e-4
Bendroflumethiazide   Apoplexy              8.5e-3

Page 160: Large-scale integration of data and text

direct medical implications

Page 161: Large-scale integration of data and text

Acknowledgments

STRING/STITCH: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork

Text mining: Sune Frankild, Jasmin Saric, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue

EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Robert Eriksson, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak

Page 162: Large-scale integration of data and text