how will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential...
![Page 1: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/1.jpg)
How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions?
Minimally, we need to use the information that exists
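As a quick check of the scale quoted on this slide, the number of unordered gene pairs can be computed directly. This is only a sketch: `N_GENES` is the slide's round figure, and the variable names are illustrative.

```python
import math

# Round gene count taken from the slide; the exact number is an assumption.
N_GENES = 20_000

# Unordered pairs of distinct genes: n choose 2 = n * (n - 1) / 2.
n_pairs = math.comb(N_GENES, 2)

print(f"{n_pairs:,} potential pairwise interactions")  # 199,990,000
```

The result, just under 200 million, matches the "~200 million" on the slide.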
![Page 2: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/2.jpg)
June 1979: 2 relevant papers
S. Brenner (Genetics 1974) The genetics of Caenorhabditis elegans
J. Sulston & R. Horvitz (Developmental Biology 1977) Post-embryonic cell lineages of the nematode, Caenorhabditis elegans
Jan 2008: >200,000 relevant papers
![Page 3: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/3.jpg)
Prioritizing high resolution genetic interaction tests by knowledge mining
1. Full text information retrieval: Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan
2. Predicting Gene Interactions from information available in public databases: Weiwei Zhong
![Page 4: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/4.jpg)
Scientists spend more time skimming for information than reading papers.
Much of the information is detail hidden in the full text; it is neither in the abstract nor captured in MeSH terms.
We designed Textpresso to do automated skimming for researchers and database curators.
The output can be used for more sophisticated Natural Language Processing.
www.textpresso.org
Textpresso Literature Search Engine
![Page 5: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/5.jpg)
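[Comparison table. Columns: Full Text, Sentence, Ontology. Rows: PubMed, Google Scholar, Textpresso.]
Ratings as extracted (cell positions not recoverable): (-), +, + +, -, - -
Ontologies listed: MeSH, Taxonomy, Gene Ontology, Customized, Neuroscience Information Framework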
Can we do better than PubMed and Google Scholar?
![Page 6: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/6.jpg)
GENE: FOXO, HOXA1, pax2, PKD1
PATHWAY: precursor, upstream, cascade, descendants
Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry
Drosophila anatomy: denticle, wing, MP2 neuron
Categories are “bags of words”
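One way to picture a "bag of words" category is as a plain set of terms that is matched against text. The sketch below assumes a tiny lexicon copied from the example terms on this slide; `CATEGORIES`, `contains_term`, and `categories_in` are hypothetical names, not Textpresso's actual code.

```python
import re

# Hypothetical category lexicon built from the example terms on this slide.
CATEGORIES = {
    "gene": {"FOXO", "HOXA1", "pax2", "PKD1"},
    "pathway": {"precursor", "upstream", "cascade", "descendants"},
    "reporter gene": {"GFP", "EGFP", "YFP", "lacZ", "CFP", "dsRed", "mCherry"},
    "Drosophila anatomy": {"denticle", "wing", "MP2 neuron"},
}


def contains_term(text: str, term: str) -> bool:
    """Case-sensitive whole-term match, so 'GFP' does not match inside 'EGFP'."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text) is not None


def categories_in(text: str) -> dict:
    """Return, per category, the lexicon terms that occur in the text."""
    found = {}
    for category, terms in CATEGORIES.items():
        hits = sorted(t for t in terms if contains_term(text, t))
        if hits:
            found[category] = hits
    return found


result = categories_in("pax2 drives GFP in the wing")
print(result)
```

Matching is kept case-sensitive on purpose, since gene symbols often differ only by capitalization.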
![Page 7: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/7.jpg)
Individual sentences in full text are marked up with Categories
ARTICLE TEXT: egl-38 regulates lin-3 transcription in vulF in L3 larvae
TEXTPRESSO CATEGORIES: egl-38 (gene), regulates (regulation), lin-3 (gene), transcription (process), vulF (anatomy), L3 larvae (life stage)
Automatically mark up the whole corpus of papers with terms of categories, and index for rapid searching
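The mark-up and indexing step can be sketched as follows, using the example sentence from this slide. The lexicon, the function names, and the in-memory index are all assumptions made for illustration; the slides do not describe how Textpresso stores its index.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon covering only the slide's example sentence.
LEXICON = {
    "egl-38": "gene",
    "lin-3": "gene",
    "regulates": "regulation",
    "transcription": "process",
    "vulF": "anatomy",
    "L3 larvae": "life stage",
}


def mark_up(sentence):
    """Return (term, category) pairs in order of appearance in the sentence."""
    hits = []
    for term, category in LEXICON.items():
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for match in re.finditer(pattern, sentence):
            hits.append((match.start(), term, category))
    return [(term, category) for _, term, category in sorted(hits)]


def build_index(sentences):
    """Map each category to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for _, category in mark_up(sentence):
            index[category].add(sentence_id)
    return index


corpus = [
    "egl-38 regulates lin-3 transcription in vulF in L3 larvae",
    "The animals were grown at 20 degrees",  # made-up sentence with no lexicon terms
]
marked = mark_up(corpus[0])
index = build_index(corpus)
```

With an index like this, a category query becomes a set lookup rather than a scan of every paper.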
![Page 8: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/8.jpg)
What Arabidopsis genes are expressed in the meristem based on reporter genes?
14,930 A.t. papers
www.textpresso.org/arabidopsis
![Page 9: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/9.jpg)
Is a nicotinic receptor associated with Drugs of Abuse other than nicotine?
www.textpresso.org/neuroscience 15,786 papers
![Page 10: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/10.jpg)
The problem with clever fly names
Gene name (abbreviation): forager (for), ascute (as), wee (we), Washed eye (We)
Train system to recognize gene names by context
use italics from PDF ~70%
~85%
Michael Müller, Arun Rangarajan
![Page 11: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/11.jpg)
What reporter genes have been used with Drosophila genes to study human disease?
20,099 full-text fly papers
www.textpresso.org/fly
![Page 12: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/12.jpg)
Find all sentences that contain ≥2 gene names and ≥1 association or regulation word:
26,000 sentences out of 4,400 articles
simple interface to “check off” sentences
100 sentences per hour
Database curation: e.g. Gene-Gene Interactions
output into database
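The sentence filter described on this slide can be sketched as a count over category labels. The sentences and their annotations below are made-up examples, and `is_candidate` is a hypothetical helper; only the thresholds (two or more gene names, one or more association or regulation words) and the workload figures come from the slide.

```python
# Hypothetical pre-annotated sentences: (text, categories of the marked-up terms).
SENTENCES = [
    ("egl-38 regulates lin-3 transcription in vulF",
     ["gene", "regulation", "gene", "process", "anatomy"]),
    ("lin-3 is expressed in the anchor cell",
     ["gene", "anatomy"]),
    ("let-23 interacts with sem-5",
     ["gene", "association", "gene"]),
]


def is_candidate(categories):
    """True if there are >=2 gene terms and >=1 association or regulation term."""
    n_genes = categories.count("gene")
    n_relations = categories.count("association") + categories.count("regulation")
    return n_genes >= 2 and n_relations >= 1


candidates = [text for text, cats in SENTENCES if is_candidate(cats)]

# Curation workload implied by the slide's figures:
# 26,000 sentences at 100 sentences per hour.
curation_hours = 26_000 / 100

print(candidates)
print(curation_hours, "curator hours")
```

At the quoted rate, checking all 26,000 candidate sentences would take about 260 curator hours.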
![Page 13: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/13.jpg)
![Page 14: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/14.jpg)
Training Set
Training set:
4775 Positive Interactions: Genetic, Literature curation (1909); Yeast two-hybrid screen (2933)
3296 Negative Genetic Interactions: cis doubles in genetic mapping
Benchmark:
5515 Positives: KEGG database
5000 Negatives: Randomly selected
![Page 15: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/15.jpg)
Algorithm
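[Diagram: a worm gene pair is mapped to its fly orthologs and yeast orthologs (Ortholog mapping); each species yields a score (Scoring: worm score, fly score, yeast score); the three are combined into a total score (Score integration).]
Predictor lists as extracted (assignment to species not recoverable):
interaction, GO, expression, phenotype, microarray
GO, expression, phenotype, microarray
interaction, GO, localization, phenotype, microarray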
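The ortholog-mapping step can be pictured as a table lookup that turns one worm gene pair into candidate pairs in another species. The sketch below is illustrative only: the `ORTHOLOGS` table is a small hand-written subset, and the function name and data layout are assumptions rather than the pipeline's actual design.

```python
from itertools import product

# Illustrative ortholog table (hypothetical subset; a real pipeline would
# load these mappings from an orthology resource).
ORTHOLOGS = {
    "fly": {"let-60": ["Ras85D"], "let-23": ["Egfr"]},
    "yeast": {"let-60": ["RAS1", "RAS2"]},
}


def ortholog_pairs(gene_a, gene_b, species):
    """All ortholog pairs in `species` for the worm pair (gene_a, gene_b).

    Returns an empty list when either gene has no known ortholog.
    """
    table = ORTHOLOGS.get(species, {})
    return list(product(table.get(gene_a, []), table.get(gene_b, [])))


fly_pairs = ortholog_pairs("let-60", "let-23", "fly")
yeast_pairs = ortholog_pairs("let-60", "let-23", "yeast")
print(fly_pairs, yeast_pairs)
```

A pair with no orthologs in a species simply contributes no evidence from that species.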
![Page 16: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/16.jpg)
likelihood ratio
$$L = \frac{p(v \mid \mathrm{pos})}{p(v \mid \mathrm{neg})}$$
p(v | pos): probability of the predictor having value v if two genes interact
p(v | neg): probability of the predictor having value v if two genes do not interact
[Plot: C. elegans expression. Likelihood ratio L (y-axis, 0 to 7) versus term usage (% of annotated genes associated with the term; x-axis, 0 to 25).]
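A likelihood ratio of this form can be estimated by counting how often each predictor value occurs among positive and negative training pairs. The sketch below assumes a discrete predictor and adds a pseudocount to avoid division by zero; both choices, along with the function name and toy data, are assumptions not stated in the slides.

```python
from collections import Counter


def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Estimate L(v) = p(v | pos) / p(v | neg) for each observed value v.

    pos_values / neg_values: predictor values for training pairs that do /
    do not interact. The pseudocount is an assumption for smoothing.
    """
    pos_counts, neg_counts = Counter(pos_values), Counter(neg_values)
    values = set(pos_counts) | set(neg_counts)
    k = len(values)
    ratios = {}
    for v in values:
        p_pos = (pos_counts[v] + pseudocount) / (len(pos_values) + pseudocount * k)
        p_neg = (neg_counts[v] + pseudocount) / (len(neg_values) + pseudocount * k)
        ratios[v] = p_pos / p_neg
    return ratios


# Toy data: does the gene pair share an expression term?
pos = ["shared", "shared", "shared", "not shared"]
neg = ["shared", "not shared", "not shared", "not shared"]
ratios = likelihood_ratios(pos, neg)
print(ratios)
```

Values of L above 1 favor interaction and values below 1 argue against it.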
Scoring and score integration
$$\mathrm{score} = \sum_{i=1}^{n} \ln L_i$$
n: number of predictors
$L_i$: likelihood ratio of each predictor
sum the logs of the L’s
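The score integration above amounts to summing log likelihood ratios, which can be sketched in a few lines. The function name and the example ratios are illustrative assumptions.

```python
import math


def interaction_score(ratios):
    """Sum of ln(L_i) over the likelihood ratios of all predictors."""
    return sum(math.log(L) for L in ratios)


# Three hypothetical predictors: two supporting, one opposing.
example_ratios = [2.0, 0.5, 4.0]
score = interaction_score(example_ratios)
print(round(score, 3))
```

Summing logs is equivalent to taking the log of the product of the ratios, so a predictor with L = 1 leaves the score unchanged.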
![Page 17: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/17.jpg)
![Page 18: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/18.jpg)
![Page 19: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/19.jpg)
![Page 20: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/20.jpg)
lin-3
let-23
sem-5
sos-1
let-60
lin-45
mek-2
mpk-1
lip-1
ksr-1
gap-1
Legend: v1.6; v1.4 & v1.6
![Page 21: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/21.jpg)
Testing let-60 ras Interactors
87 genes have score >0.9; 17 confirmed from literature
Inactivating genes on a gain-of-function (gf) let-60 mutant by RNAi
Assay vulva precursor cell (VPC) induction

| Strain | WT% | Muv% | average | Phenotype |
|---|---|---|---|---|
| N2 | 100 | 0 | 3.0 | not Multivulva |
| let-60(gf) | 0 | 100 | 4.3 | strong Multivulva |
| let-60(gf); tax-6(RNAi) | 40 | 60 | 3.4 | weak Multivulva |
![Page 22: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/22.jpg)
let-60(gf) VPC Induction Under Various RNAi
[Bar chart: VPC induction index (y-axis, 0 to 6) for the control and each RNAi treatment, grouped as Score > 0.9 and Score < 0.6; bars are marked at p < 0.01 and p < 0.05.]
RNAi labels as extracted: control, tax-6, csn-5, qua-1, C01G8.9, pfn-3, nhr-41, C05D10.3, Y48G10A.3, dlg-1, tag-22, grd-11, W03F11.6, mig-15, taf-6.1, taf-1, lin-32, unc-55, Y59A8B.23, Y48G10A.3, wrt-8, sqv-7, wrt-4, evl-20, C07H6.3, glp-1, unc-59, grd-1, wrt-7, hog-1, cdc-25.3, che-1, mom-5, Y53C12C.1, rnt-1, cki-1, let-413, taf-4, tig-2, tag-117, psa-4, T24H10.7, lin-48, src-2, B0353.1, R05G6.10, T18D3.7, grd-2, ZC84.3, cdc-42, cki-2, F59A2.4, K10H10.1, C04C3.3, F34D6.4, F34D10.2, C25H3.4, H27A23.1, Y54G11A.1, B0035.16, M03C11.4, C41C4.8, M01F1.5, ZK945.8, ZK643.2, F26E4.12, C16A3.7, C53A3.2, H14N18.4, W02D3.6, F08A8.4, C37H5.3, F28H6.3, R10E11.3, R04B5.5, B0491.1, C06A8.6
12 hits (p<0.05) in 49 genes; 1 hit in 26 randomly selected genes
Combined with literature, 29/66 (44%) predictions confirmed
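The hit counts on this slide can be turned into rates, and the difference between predicted and random genes can be checked with a one-sided Fisher exact test. This is a sketch under stated assumptions: the test is not a statistic reported in the slides, the reading of 29/66 as (17 literature + 12 new hits) out of (17 + 49) genes is an inference from the slide's numbers, and all names are illustrative.

```python
from math import comb

# Counts taken from the slides.
hits_pred, n_pred = 12, 49   # high-scoring genes tested by RNAi
hits_rand, n_rand = 1, 26    # randomly selected genes
literature_confirmed = 17    # high-scoring genes already confirmed in literature


def fisher_one_sided(a, n1, c, n2):
    """P(at least `a` hits fall in group 1), margins fixed (hypergeometric tail)."""
    total, hits = n1 + n2, a + c
    upper = min(hits, n1)
    tail = sum(comb(hits, k) * comb(total - hits, n1 - k) for k in range(a, upper + 1))
    return tail / comb(total, n1)


rate_pred = hits_pred / n_pred
rate_rand = hits_rand / n_rand
combined = (literature_confirmed + hits_pred) / (literature_confirmed + n_pred)
p_value = fisher_one_sided(hits_pred, n_pred, hits_rand, n_rand)

print(f"predicted: {rate_pred:.1%}, random: {rate_rand:.1%}, combined: {combined:.0%}")
print(f"one-sided Fisher p = {p_value:.3f}")
```

The combined rate reproduces the 44% quoted on the slide.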
![Page 23: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/23.jpg)
let-60 ras interactors (suppressors)
tax-6 calcineurin
csn-5 COP-9 signalosome
qua-1 hedgehog-related protein
C01G8.9 SWI/SNF-related (eyelid)
C05D10.3 ABC transporter (white)
pfn-3 profilin
nhr-41 transcription factor
![Page 24: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/24.jpg)
C. elegans Interactions
Input 4,726 known interactions among 2,713 genes
Predict additional 18,863 for total of 23,589 interactions among 4,408 genes
![Page 25: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/25.jpg)
for Drosophila
![Page 26: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/26.jpg)
![Page 27: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/27.jpg)
D. melanogaster interactions
Input 4,180 known interactions among 1,262 genes
Predict 13,126 for 17,306 interactions among 6,044 genes
![Page 28: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/28.jpg)
Automated, Quantitative Phenotyping
Chris Cronin: movement analysis, BMC Genetics 2005
generative graphics
locomotion
plate demographics (Weiwei Zhong)
morphology
sexual behavior
E. Fontaine, A. Whittaker, Joel Burdick
![Page 29: How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the](https://reader035.vdocument.in/reader035/viewer/2022070409/56649e9a5503460f94b9cbf4/html5/thumbnails/29.jpg)