STRING: Large-scale data and text mining

Lars Juhl Jensen

Posted on 27-Jun-2015

association networks

guilt by association
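
Guilt by association means inferring what an uncharacterized protein does from the proteins it is linked to. A minimal sketch of the idea, assuming a hypothetical network stored as a dict of neighbor sets and a dict of known function labels; the majority vote is just one simple way to apply the principle, not STRING's own method.

```python
from collections import Counter


def predict_function(protein, network, annotations):
    """Guess a function for `protein` by majority vote over its annotated neighbors.

    network: dict mapping each protein to the set of proteins it is linked to.
    annotations: dict mapping proteins of known function to a function label.
    Returns the most common label among the neighbors, or None if none is annotated.
    """
    votes = Counter(
        annotations[neighbor]
        for neighbor in network.get(protein, set())
        if neighbor in annotations
    )
    if not votes:
        return None
    label, _count = votes.most_common(1)[0]
    return label


# Toy data: X is uncharacterized and linked to two DNA-repair proteins and one signaling protein.
network = {"X": {"A", "B", "C"}}
annotations = {"A": "DNA repair", "B": "DNA repair", "C": "signaling"}
print(predict_function("X", network, annotations))  # DNA repair
```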

biological systems

protein networks

STRING

1100+ genomes

computational predictions

gene fusion

Korbel et al., Nature Biotechnology, 2004
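
In the gene-fusion method, two proteins that are separate in one organism but fused into a single protein in another are predicted to be functionally linked. A minimal sketch, assuming hypothetical similarity-search hits (query protein, fused target protein, aligned start and end positions) and an arbitrary overlap tolerance; it ignores which genome each protein comes from, which a real pipeline would have to check.

```python
from itertools import combinations


def fusion_links(hits, max_overlap=10):
    """Pairs of query proteins that align to different parts of the same target protein.

    hits: dict mapping query protein -> list of (target, start, end) alignment regions
          (hypothetical format, e.g. parsed from a sequence-similarity search).
    max_overlap: residues of overlap tolerated between the two regions (assumed value).
    """
    links = set()
    for (q1, hits1), (q2, hits2) in combinations(sorted(hits.items()), 2):
        for target1, s1, e1 in hits1:
            for target2, s2, e2 in hits2:
                if target1 != target2:
                    continue
                overlap = min(e1, e2) - max(s1, s2) + 1  # negative when regions are apart
                if overlap <= max_overlap:
                    links.add((q1, q2))
    return links


hits = {
    "geneA": [("fusedX", 1, 250)],    # N-terminal part of fusedX
    "geneB": [("fusedX", 260, 520)],  # C-terminal part of fusedX
    "geneC": [("otherY", 1, 300)],
}
print(fusion_links(hits))  # {('geneA', 'geneB')}
```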

gene neighborhood

Korbel et al., Nature Biotechnology, 2004
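
Gene neighborhood evidence rests on genes that stay close together on the chromosome across many genomes. A minimal sketch, assuming each genome is reduced to a hypothetical list of ortholog-group labels in chromosomal order (no strands, no operon boundaries); it counts in how many genomes two groups lie within max_gap positions of each other.

```python
from collections import Counter


def neighborhood_counts(genomes, max_gap=1):
    """Count, for each pair of ortholog groups, the genomes in which they are neighbors.

    genomes: dict mapping genome name -> list of ortholog-group labels in gene order.
    max_gap: how many positions apart two genes may be and still count as neighbors.
    """
    counts = Counter()
    for order in genomes.values():
        pairs = set()  # a set, so each genome contributes at most once per pair
        for i, a in enumerate(order):
            for b in order[i + 1 : i + 1 + max_gap]:
                if a != b:
                    pairs.add(tuple(sorted((a, b))))
        counts.update(pairs)
    return counts


genomes = {
    "genome1": ["A", "B", "C", "D"],
    "genome2": ["B", "A", "E"],
    "genome3": ["A", "F", "B"],
}
print(neighborhood_counts(genomes)[("A", "B")])             # 2
print(neighborhood_counts(genomes, max_gap=2)[("A", "B")])  # 3
```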

phylogenetic profiles

Korbel et al., Nature Biotechnology, 2004
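
A phylogenetic profile records in which genomes a gene family is present; families with matching profiles tend to be gained and lost together. A minimal sketch, assuming hypothetical gene content per genome and using a plain Hamming distance; real methods also account for how related the genomes are.

```python
def phylogenetic_profile(group, genomes):
    """Presence (1) / absence (0) of an ortholog group in each genome, in dict order."""
    return [1 if group in content else 0 for content in genomes.values()]


def profile_distance(p, q):
    """Hamming distance: number of genomes where exactly one of the two groups is present."""
    return sum(a != b for a, b in zip(p, q))


genomes = {
    "genome1": {"A", "B", "C"},
    "genome2": {"A", "B"},
    "genome3": {"C"},
    "genome4": {"A", "B", "C"},
}
a, b, c = (phylogenetic_profile(g, genomes) for g in "ABC")
print(a, b, c)                 # [1, 1, 0, 1] [1, 1, 0, 1] [1, 0, 1, 1]
print(profile_distance(a, b))  # 0 (identical profiles, candidate link)
print(profile_distance(a, c))  # 2
```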

a real example

Cell

Cellulosomes

Cellulose

experimental data

gene coexpression
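
Coexpression evidence comes from genes whose expression rises and falls together across many conditions. A minimal sketch, assuming a hypothetical expression table (one list of values per gene, same conditions in the same order) and using Pearson correlation with an arbitrary cut-off; it is not the scoring STRING applies to expression data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no coexpression signal (simplifying choice)
    return cov / (sd_x * sd_y)


def coexpressed_pairs(expression, threshold=0.9):
    """Gene pairs whose profiles correlate at or above `threshold`."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expression), 2)
        if pearson(expression[g1], expression[g2]) >= threshold
    }


expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(expression))  # {('geneA', 'geneB')}
```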

protein interactions

Jensen & Bork, Science, 2008

curated knowledge

complexes

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

not same species

hard work

(Ph.D. students)

common identifiers

quality scores

von Mering et al., Nucleic Acids Research, 2005

score calibration

von Mering et al., Nucleic Acids Research, 2005
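
Score calibration turns raw scores from very different sources into numbers on a common scale: the fraction of pairs at a given raw score that are true associations in a reference set. A minimal binning sketch, assuming hypothetical benchmark data and bin edges; it illustrates the idea and is not necessarily the procedure described by von Mering et al.

```python
from bisect import bisect_right


def calibrate(raw_scores, in_gold_standard, bin_edges):
    """Fraction of benchmark pairs in each raw-score bin that are in the gold standard.

    raw_scores: raw evidence scores (any scale) for benchmark pairs.
    in_gold_standard: parallel booleans, True if the pair is a known association.
    bin_edges: sorted bin boundaries; they define len(bin_edges) + 1 bins.
    Returns one value per bin (None for empty bins).
    """
    totals = [0] * (len(bin_edges) + 1)
    hits = [0] * (len(bin_edges) + 1)
    for score, is_true in zip(raw_scores, in_gold_standard):
        b = bisect_right(bin_edges, score)
        totals[b] += 1
        hits[b] += int(is_true)
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrated_score(raw_score, bin_edges, calibration):
    """Map a new raw score onto the calibrated scale via its bin."""
    return calibration[bisect_right(bin_edges, raw_score)]


edges = [1.0, 3.0]
calibration = calibrate(
    [0.5, 1.2, 1.8, 3.1, 3.5, 4.0],
    [False, False, True, True, True, True],
    edges,
)
print(calibration)                                # [0.0, 0.5, 1.0]
print(calibrated_score(2.0, edges, calibration))  # 0.5
```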

homology-based transfer

Franceschini et al., Nucleic Acids Research, 2013
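
Homology-based transfer lets evidence observed in one organism support a link between the corresponding proteins in another. A minimal sketch, assuming a hypothetical one-to-one ortholog map and an arbitrary discount factor for the uncertainty the transfer adds; it is not meant to reproduce the scheme in Franceschini et al.

```python
def transfer_links(links, orthologs, discount=0.8):
    """Transfer scored links from a source species to a target species via orthologs.

    links: dict mapping (protein1, protein2) in the source species -> score in [0, 1].
    orthologs: dict mapping source protein -> its ortholog in the target species.
    discount: hypothetical penalty for the uncertainty added by the transfer.
    Keeps the best score when several source links map to the same target pair.
    """
    transferred = {}
    for (a, b), score in links.items():
        if a in orthologs and b in orthologs:
            pair = tuple(sorted((orthologs[a], orthologs[b])))
            transferred[pair] = max(transferred.get(pair, 0.0), score * discount)
    return transferred


yeast_links = {("yeastA", "yeastB"): 0.9, ("yeastA", "yeastC"): 0.5}
yeast_to_human = {"yeastA": "humanA", "yeastB": "humanB"}
# Only the yeastA-yeastB link has orthologs for both partners, so only it is transferred.
print(transfer_links(yeast_links, yeast_to_human))
```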

missing most of the data

text mining

>10 km

too much to read

computer

comprehensive lexicon

CDC2

cyclin dependent kinase 1

expansion rules

hCdc2

CDC2
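
Expansion rules generate the spelling variants authors actually use from each name in the lexicon, so that a mention such as hCdc2 can be traced back to CDC2. A minimal sketch with two hypothetical rules (capitalization and an "h" prefix for human); a real lexicon would need many more.

```python
def expand(name, species_prefixes=("h",)):
    """Generate spelling variants of a lexicon name with two illustrative rules."""
    variants = {name}
    # Rule 1: capitalized form, e.g. CDC2 -> Cdc2.
    capitalized = name[:1].upper() + name[1:].lower()
    variants.add(capitalized)
    # Rule 2: species prefix, e.g. Cdc2 -> hCdc2 ("h" for human).
    for prefix in species_prefixes:
        variants.add(prefix + name)
        variants.add(prefix + capitalized)
    return variants


def build_index(lexicon):
    """Map every generated variant back to the lexicon name it came from."""
    return {variant: name for name in lexicon for variant in expand(name)}


index = build_index(["CDC2"])
print(index["hCdc2"])  # CDC2
```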

flexible matching

cyclin-dependent kinase 1

cyclin dependent kinase 1
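
Flexible matching tolerates small differences between how a name is written in the lexicon and in the text, such as a hyphen instead of a space. A minimal sketch, assuming that case, spaces and hyphens are the only differences to ignore; the toy lexicon maps both names to the CDK1 symbol for illustration.

```python
import re


def normalize(name):
    """Lower-case a name and collapse runs of spaces and hyphens into one space."""
    return re.sub(r"[\s\-]+", " ", name.lower()).strip()


def lookup(mention, lexicon):
    """Return the identifier for a mention, or None if no lexicon name matches."""
    # The normalized index is rebuilt on every call for brevity; cache it in real use.
    index = {normalize(name): identifier for name, identifier in lexicon.items()}
    return index.get(normalize(mention))


lexicon = {"cyclin dependent kinase 1": "CDK1", "CDC2": "CDK1"}
print(lookup("cyclin-dependent kinase 1", lexicon))  # CDK1
print(lookup("Cyclin Dependent Kinase 1", lexicon))  # CDK1
```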

“black list”

SDS
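
A black list removes names that match the lexicon but rarely mean the protein in practice. SDS, for example, is the symbol of a human gene (serine dehydratase) but in papers usually refers to the detergent sodium dodecyl sulfate. A minimal sketch, assuming a hypothetical lexicon and naive tokenization:

```python
import re


def tag_proteins(text, lexicon, black_list):
    """Return (mention, identifier) for every lexicon hit that is not black-listed."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)  # naive: splits "SDS-PAGE" into "SDS", "PAGE"
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon and tok not in black_list]


lexicon = {"SDS": "SDS", "CDC2": "CDK1"}
black_list = {"SDS"}
text = "Lysates were separated by SDS-PAGE and CDC2 was detected by western blot."
print(tag_proteins(text, lexicon, black_list))  # [('CDC2', 'CDK1')]
```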

co-mentioning

counting

within documents

within paragraphs

within sentences
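
Counting co-mentions within documents, paragraphs and sentences can be combined into one weighted count, giving more credit when two names appear closer together. A minimal sketch, assuming documents are already tagged into sets of entity identifiers per sentence; the weights are illustrative, not the values STRING uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights: co-mentions in a smaller unit of text earn extra credit.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def entity_pairs(entities):
    """All unordered pairs of distinct entities, as sorted tuples."""
    return {tuple(sorted(pair)) for pair in combinations(set(entities), 2)}


def co_mention_scores(documents, weights=WEIGHTS):
    """Weighted co-mention counts; each level is credited at most once per document.

    documents: list of documents; a document is a list of paragraphs, a paragraph a
    list of sentences, and a sentence a set of tagged entity identifiers.
    """
    scores = Counter()
    for paragraphs in documents:
        sentence_pairs, paragraph_pairs, document_entities = set(), set(), set()
        for sentences in paragraphs:
            paragraph_entities = set()
            for sentence in sentences:
                sentence_pairs |= entity_pairs(sentence)
                paragraph_entities |= set(sentence)
            paragraph_pairs |= entity_pairs(paragraph_entities)
            document_entities |= paragraph_entities
        for pair in entity_pairs(document_entities):
            scores[pair] += weights["document"]
        for pair in paragraph_pairs:
            scores[pair] += weights["paragraph"]
        for pair in sentence_pairs:
            scores[pair] += weights["sentence"]
    return scores


documents = [
    [[{"CDK1", "CCNB1"}, {"TP53"}], [{"CDK1"}]],  # CDK1 and CCNB1 share a sentence
    [[{"CDK1"}], [{"TP53"}]],                     # CDK1 and TP53 share only the document
]
scores = co_mention_scores(documents)
print(scores[("CCNB1", "CDK1")])  # 6.0
print(scores[("CDK1", "TP53")])   # 4.0
```

Raw weighted counts favor heavily studied proteins, so a real scoring scheme also has to correct for how often each name is mentioned overall.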

natural language processing

Gene and protein names
Cue words for entity recognition
Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
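
The bracketed example marks up noun chunks, and a relation verb between two chunks yields a directed relation. A minimal rule-based sketch, assuming that [nxpg ...] chunks hold gene or protein names and that a short hypothetical list of passive verb phrases is enough; real NLP pipelines parse far more structure.

```python
import re

# Hypothetical relation phrases; a real system would use a curated verb list.
RELATION_PHRASES = ("is controlled by", "is regulated by", "is activated by")

# A gene/protein chunk such as "[nxpg CYC1 and CYC7]" (innermost brackets only).
GENE_CHUNK = re.compile(r"\[nxpg ([^\[\]]+)\]")


def names_in(fragment):
    """Split every gene chunk in a fragment into individual names."""
    return [
        name
        for chunk in GENE_CHUNK.findall(fragment)
        for name in re.split(r",\s*|\s+and\s+", chunk.strip())
    ]


def extract_relations(annotated):
    """Return (regulator, verb, target) triples from one annotated sentence."""
    relations = []
    for phrase in RELATION_PHRASES:
        if phrase in annotated:
            left, right = annotated.split(phrase, 1)
            verb = phrase.split()[1]  # e.g. "controlled"
            for regulator in names_in(right):
                for target in names_in(left):
                    relations.append((regulator, verb, target))
    return relations


sentence = ("[nxexpr The expression of [nxgene the cytochrome genes "
            "[nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]")
print(extract_relations(sentence))
# [('HAP1', 'controlled', 'CYC1'), ('HAP1', 'controlled', 'CYC7')]
```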

text corpus

~2 million full-text articles

~22 million abstracts

Exercise 1
Go to http://string-db.org
Query for adhD (Rv3086) in Mycobacterium tuberculosis H37Rv
Switch between the different views
Check the evidence for the adhD–lipR link
Extend the network to 50 interactors
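
The same query can also be scripted. A minimal sketch that only builds the request URL, assuming STRING's REST API pattern https://string-db.org/api/[format]/[method] with the identifiers and species parameters, 83332 as the NCBI taxonomy identifier of M. tuberculosis H37Rv, and add_nodes as the parameter that extends the network; check these against the current API documentation before relying on them.

```python
import urllib.request
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"  # assumed base URL of the STRING REST API


def network_url(identifiers, species, add_nodes=None, output_format="tsv"):
    """Build a STRING 'network' request URL (assumed parameter names, see lead-in)."""
    # Several identifiers are joined with a carriage return, which encodes as %0D.
    params = {"identifiers": "\r".join(identifiers), "species": species}
    if add_nodes is not None:
        params["add_nodes"] = add_nodes  # assumed: number of extra interactors to add
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


def fetch(url):
    """Download the response as text (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


print(network_url(["Rv3086"], 83332, add_nodes=50))
# https://string-db.org/api/tsv/network?identifiers=Rv3086&species=83332&add_nodes=50
```

Calling fetch on that URL returns the network as tab-separated text, one interaction per line.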

Exercise 2
Go to the paper PMC2995261
Extract the protein names in Table 1
Create a STRING network of them
Change to “advanced” mode
Analyze the network for clusters and enrichment
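
The enrichment analysis asks whether an annotation term occurs more often among the query proteins than expected by chance. A minimal sketch of a one-sided hypergeometric test for a single term, assuming hypothetical counts; it is a generic textbook test, not necessarily the exact statistics STRING reports, and a full analysis must also correct for testing many terms.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: proteins in the genome (background), K: of those, proteins carrying the term,
    n: proteins in the query set, k: query proteins carrying the term.
    """
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical numbers: all 4 query proteins carry a term held by 5 of 20 proteins.
print(enrichment_p_value(k=4, n=4, K=5, N=20))  # about 0.00103
```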

multi-page tables

related resources

general approach

curated knowledge

experimental data

text mining

computational predictions

common identifiers

quality scores

score calibration

visualization

protein networks

string-db.org

chemical networks

stitch-db.org

subcellular localization

compartments.jensenlab.org

tissue expression

tissues.jensenlab.org

disease associations

Work on your own data

string-db.org

stitch-db.org

compartments.jensenlab.org

tissues.jensenlab.org

diseases.jensenlab.org