GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)
DESCRIPTION
Talk given at the Genome Informatics conference 2012 at Robinson College, Cambridge University.
TRANSCRIPT
![Page 1: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/1.jpg)
The Gene Wiki: Crowdsourcing human gene annotation
Andrew Su, Ph.D.
@andrewsu
[email protected]
http://sulab.org
GeneGames.org
Genome Informatics
September 6, 2012
![Page 2: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/2.jpg)
The Gene Wiki crib sheet
• Bulk creation of ~10k Wikipedia articles (http://dx.doi.org/10.1371/journal.pbio.0060175)
• Monthly stats: > 4 million views, > 1000 edits (http://dx.doi.org/10.1093/nar/gkr925)
• Text mining reveals novel Gene Ontology and Disease Ontology annotations (http://dx.doi.org/doi:10.1186/1471-2164-12-603)
• Mash-up with SNPedia for crowdsourced gene-disease database (http://www.jbiomedsem.com/content/3/S1/S6)
• Merging Wikipedia with the Semantic Web (http://dx.doi.org/10.1093/database/bar060)
http://www.slideshare.net/andrewsu
![Page 3: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/3.jpg)
http://www.flickr.com/photos/archana3k1/4124330493/
Seven million human hours
![Page 4: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/4.jpg)
Twenty million human hours
http://www.flickr.com/photos/ableman/2171326385/
![Page 5: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/5.jpg)
150 billion human hours per year
http://www.flickr.com/photos/rvp-cw/6243289302/
![Page 6: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/6.jpg)
Using games to fold proteins
Fold.it players have successfully:
• Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)
• Solved a previously-intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)
• Designed an improved protein folding algorithm (Khatib, PNAS, 2011)
• Improved enzyme activity of de novo designed enzyme (Eiben, Nat Biotechnol, 2011)
http://fold.it
![Page 9: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/9.jpg)
Using games to annotate genes?
http://genegames.org
![Page 10: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/10.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
Query: Apolipoprotein E
![Page 11: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/11.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
• Hyperlipoproteinemia, type III
• Macular degeneration, age-related
• Myocardial infarction susceptibility
Query: Apolipoprotein E
![Page 12: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/12.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
• Hyperlipoproteinemia, type III
• Macular degeneration, age-related
• Myocardial infarction susceptibility
• HIV
• Psoriasis
• Vascular Diseases
Query: Apolipoprotein E
?
?
?
?
?
![Page 13: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/13.jpg)
No good gene-disease annotation database
Query: Apolipoprotein E
Alzheimer's disease (AD); Neuropsychological Tests; Cognition Disorders; Dementia; Cognition; Disease Progression; Cardiovascular Diseases; Coronary Disease; Diabetes Mellitus, Type 2; Memory Disorders; Memory; Coronary Artery Disease; Hypertension; Mental Status Schedule; Psychiatric Status Rating Scales; Hyperlipidemias; Atrophy; Dementia, Vascular; Parkinson Disease; Brain Injuries; Myocardial Infarction; …
477 diseases!
![Page 14: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/14.jpg)
Play Dizeez to annotate gene-disease links
1. Read the clue (gene)
2. Click the related disease (only one is “right”)
3. If it’s ‘right’, you get points
4. Then on to the next question…
5. Hurry!
6. Play to win!
![Page 15: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/15.jpg)
Dizeez players seem pretty smart…
In total (since Dec 2011):
• 207 unique gamers
• 1045 games played
• 8525 guesses

| # Occurrences | Gene | Disease |
|---|---|---|
| 7 | GAST | gastrinoma |
| 7 | RBP3 | retinoblastoma |
| 7 | SSX1 | synovial sarcoma |
| 6 | TG | Graves' disease |
| 6 | CRYGC | Cataract |
| 6 | SOX8 | mental retardation |
| 6 | WRN | Werner syndrome |
| 6 | ABL1 | leukemia |
| 6 | MLL3 | leukemia |
| 6 | SNAI2 | breast carcinoma |

PubMed, OMIM, PharmGKB, Gene Wiki
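
A table like the one on this slide could be produced by tallying player guesses. The following is a minimal sketch, assuming the occurrences column counts how many times players chose a given gene-disease pair. The guess log, the `tally_guesses` name, and the `min_count` cutoff are illustrative assumptions, not details from the talk.

```python
from collections import Counter

# Hypothetical guess log: each entry is a (gene, disease) pair a player clicked.
guesses = [
    ("GAST", "gastrinoma"),
    ("RBP3", "retinoblastoma"),
    ("GAST", "gastrinoma"),
    ("WRN", "Werner syndrome"),
    ("GAST", "gastrinoma"),
    ("RBP3", "retinoblastoma"),
]


def tally_guesses(pairs, min_count=2):
    """Count each gene-disease pair and keep those seen at least min_count times."""
    counts = Counter(pairs)
    # most_common() returns pairs sorted by descending count.
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


for (gene, disease), n in tally_guesses(guesses):
    print(n, gene, disease)
```

Pairs that clear the cutoff could then be checked against existing resources to see which links are already documented.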
![Page 16: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/16.jpg)
Dizeez players seem pretty smart…
In total (since Dec 2011):
• 207 unique gamers
• 1045 games played
• 8525 guesses

| # Occurrences | Gene | Disease |
|---|---|---|
| 5 | MECOM | sarcoma |
| 4 | ATF7 | cancer |
| 3 | ABCB5 | acute myeloid leukemia |
| 3 | SART1 | glioblastoma |
| 3 | NCK1 | leukemia |
| 3 | NEK1 | cancer |

PubMed, OMIM, PharmGKB, Gene Wiki
![Page 17: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/17.jpg)
Using games to predict phenotype from genotype?
http://genegames.org
The Cure
![Page 18: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/18.jpg)
Classification problems in genome biology
• Training data: 100s of samples × 100,000s of features, each sample labeled cancer or normal
• Find patterns: SVM, neural networks, Naïve Bayes, KNN, …
• Classify new samples as cancer or normal
![Page 19: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/19.jpg)
Random forests
• Sample a subset of cases and features
• Train a decision tree (cancer vs. normal)
• Data: 100s of samples × 100,000s of features
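
The two steps on this slide, repeated many times and combined by majority vote, can be sketched as follows. This is a simplified illustration rather than a production random forest: each "tree" is a depth-one decision stump to keep the code short, and the toy expression values, function names, and parameters are all made up for the example.

```python
import random
from collections import Counter


def train_stump(rows, labels, features):
    """Find the (feature, threshold) split with the fewest training errors.

    A depth-one tree stands in for a full decision tree in this sketch.
    """
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each tree on a bootstrap sample of cases and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # sample cases
        feats = rng.sample(range(len(rows[0])), n_features)   # sample features
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Classify a new sample by majority vote over all trees."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data: 8 samples, 3 features (real data would have far more features).
rows = [[1.0, 2.0, 1.5], [2.0, 1.0, 2.5], [1.5, 3.0, 1.0], [3.0, 2.5, 2.0],
        [7.0, 8.0, 9.0], [8.0, 7.5, 7.0], [9.0, 9.0, 8.5], [7.5, 8.5, 8.0]]
labels = ["normal"] * 4 + ["cancer"] * 4

forest = train_forest(rows, labels)
print(predict(forest, [0.5, 0.5, 0.5]), predict(forest, [9.5, 9.5, 9.5]))
```

The only place the method makes a choice about which features to consider is the `rng.sample` call, which here is uniformly random.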
![Page 20: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/20.jpg)
Random forests
• Data: 100s of samples × 100,000s of features (cancer vs. normal)
![Page 21: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/21.jpg)
Random forests
• Classify new samples as cancer or normal
• Data: 100s of samples × 100,000s of features
• How to interject biological knowledge?
![Page 22: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/22.jpg)
Network-guided forests
Dutkowski & Ideker (2011). PLoS Computational Biology
![Page 23: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/23.jpg)
Network-guided forests
• Sample features by PPI network
• Train a decision tree (cancer vs. normal)
• Data: 100s of samples × 100,000s of features
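
One plausible reading of "sample features by PPI network" is to draw feature sets that are connected in a protein-protein interaction graph instead of drawing them uniformly at random. The sketch below illustrates that idea only; it is not the algorithm published by Dutkowski & Ideker, and the toy network, gene names, and function name are hypothetical.

```python
import random
from collections import deque

# Hypothetical toy PPI network as an adjacency list (gene -> interaction partners).
PPI = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA"],
    "geneC": ["geneA", "geneD"],
    "geneD": ["geneC", "geneE"],
    "geneE": ["geneD"],
    "geneF": ["geneG"],
    "geneG": ["geneF"],
}


def sample_connected_features(network, k, rng):
    """Pick a random seed gene, then grow a connected set of up to k genes
    by breadth-first search over its interaction partners."""
    seed_gene = rng.choice(sorted(network))
    chosen, queue, seen = [seed_gene], deque([seed_gene]), {seed_gene}
    while queue and len(chosen) < k:
        gene = queue.popleft()
        neighbors = [n for n in network[gene] if n not in seen]
        rng.shuffle(neighbors)  # randomize which partners are added first
        for n in neighbors:
            if len(chosen) == k:
                break
            seen.add(n)
            chosen.append(n)
            queue.append(n)
    return chosen


print(sample_connected_features(PPI, 3, random.Random(1)))
```

Each tree in the forest would then be trained on one such connected feature set, so that the genes within a tree are functionally related.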
![Page 24: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/24.jpg)
Human-guided forests
• Sample features by human intelligence
• Train a decision tree (cancer vs. normal)
• Data: 100s of samples × 100,000s of features
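
A minimal sketch of how player input might steer feature sampling, assuming each gene carries a count of how often players selected it and that genes are drawn with probability proportional to that count. The counts, gene names, and the `sample_features_by_votes` function are hypothetical; the talk does not specify how game results are turned into features.

```python
import random

# Hypothetical counts of how often players picked each gene during play.
player_picks = {"geneA": 12, "geneB": 7, "geneC": 1, "geneD": 0, "geneE": 5}


def sample_features_by_votes(picks, k, rng, pseudocount=0.0):
    """Draw up to k distinct genes, each with probability proportional to its
    pick count (plus an optional pseudocount so unpicked genes can still appear)."""
    pool = {g: n + pseudocount for g, n in picks.items() if n + pseudocount > 0}
    chosen = []
    while pool and len(chosen) < k:
        genes = list(pool)
        gene = rng.choices(genes, weights=[pool[g] for g in genes], k=1)[0]
        chosen.append(gene)
        del pool[gene]  # sample without replacement
    return chosen


print(sample_features_by_votes(player_picks, 3, random.Random(0)))
```

With `pseudocount=0.0`, genes that no player picked are never sampled; a small positive pseudocount would keep some purely random exploration in the forest.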
![Page 25: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/25.jpg)
The Cure: Genomic predictors for disease
![Page 26: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/26.jpg)
The Cure: Genomic predictors for disease
![Page 27: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/27.jpg)
The Cure: Genomic predictors for disease
![Page 28: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/28.jpg)
The Cure: Genomic predictors for disease
![Page 29: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/29.jpg)
The Cure: Genomic predictors for disease
![Page 30: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/30.jpg)
The Cure: Genomic predictors for disease
![Page 31: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/31.jpg)
Human-guided forests
• Classify new samples as cancer or normal
![Page 32: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/32.jpg)
“Critical Assessment”-style challenge
Will this work? Check our blog after October 15.
Coming soon to genegames.org
![Page 33: GeneGames.org: Crowdsourcing human gene annotation (Genome Informatics 2012)](https://reader036.vdocument.in/reader036/viewer/2022062703/554e7583b4c90545698b4cb0/html5/thumbnails/33.jpg)
Collaborators
• Doug Howe, ZFIN
• John Hogenesch, U Penn
• Jon Huss, GNF
• Luca de Alfaro, UCSC
• Angel Pizzaro, U Penn
• Faramarz Valafar, SDSU
• Pierre Lindenbaum, Fondation Jean Dausset
• Michael Martone, Rush
• Konrad Koehler, Karo Bio
• Warren Kibbe, Simon Lim, Northwestern
• Many Wikipedia editors
• WP:MCB Project
Group members
• Ben Good
• Salvatore Loguercio
• Ian Macleod
• Max Nanis
• Chunlei Wu
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Contact
http://sulab.org
[email protected]
@andrewsu
+Andrew Su
Recruiting graduate students in quantitative biology! See http://education.scripps.edu/
@genegame