Crowd Sourcing Methods to Annotate Biological Processes
Andra Waagmeester
Micelio
Brothers Grimm: Stone soup
James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf
“We try to analyze a 3D cell on a 2D level.” - Mike Washburn
Subsequently, we represent the multi-dimensional data space of this 2D view of the cell, again in a 2D space
Relational databases

Gene name   ID      Identifier
ZNF635m     18801   23126
…           …       …

Gene name   ID      Identifier
ZNF280E     POGZ    ENSG00000143442
…           …       …
Relational databases

Gene name   ID      Identifier
ZNF635m     18801   23126
…           …       …

Gene name   ID      Identifier
ZNF280E     POGZ    ENSG00000143442
…           …       …

HGNC ID   HGNC Symbol   Name
18801     POGZ          Pogo transposable element with ZNF domain
…         …             …
Graph databases
• ZNF635m is_a gene
• ZNF635m has_Entrez_ID “23126”
• ZNF635m ID “18801”
• “18801” has_symbol “POGZ”
• ZNF280E has_Ensembl_ID “ENSG00000143442”
• ZNF280E HGNC_symbol “POGZ”
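The statements on this slide can be held as plain (subject, predicate, object) tuples. The following is a minimal sketch, not a real graph database: the triples are copied from the slide, while the helper names match() and names_sharing_symbol() are hypothetical and introduced only for illustration. It shows how following edges reveals that two differently named records point at the same HGNC symbol.

```python
# Triples copied from the slide; everything else here is an illustrative assumption.
TRIPLES = [
    ("ZNF635m", "is_a", "gene"),
    ("ZNF635m", "has_Entrez_ID", "23126"),
    ("ZNF635m", "ID", "18801"),
    ("18801", "has_symbol", "POGZ"),
    ("ZNF280E", "has_Ensembl_ID", "ENSG00000143442"),
    ("ZNF280E", "HGNC_symbol", "POGZ"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def names_sharing_symbol(triples, symbol):
    """Find gene names linked to an HGNC symbol, directly or via an HGNC ID."""
    # Direct link: <name> HGNC_symbol <symbol>
    direct = {s for s, _, _ in match(triples, predicate="HGNC_symbol", obj=symbol)}
    # Two-step link: <name> ID <hgnc_id> and <hgnc_id> has_symbol <symbol>
    hgnc_ids = {s for s, _, _ in match(triples, predicate="has_symbol", obj=symbol)}
    indirect = {s for s, _, o in match(triples, predicate="ID") if o in hgnc_ids}
    return sorted(direct | indirect)


print(names_sharing_symbol(TRIPLES, "POGZ"))  # ['ZNF280E', 'ZNF635m']
```

In the relational layout the same question needs a join across three tables with differently used "ID" columns; here it is a walk over edges.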
Something more profound is needed than relabeling old wine in new bottles
Uniform Resource Identifier
• HGNCID:18801
• ENSEMBL:ENSG00000143442
• ENTREZ:23126
• PMID:20196795
• ENTREZ:23126 rdf:type dbpedia:Gene
• ENTREZ:23126 rdfs:label “ZNF635m”
• ENTREZ:23126 rdfs:seeAlso HGNCID:18801
• HGNCID:18801 rdfs:label “POGZ”
• ENSEMBL:ENSG00000143442 rdf:type dbpedia:Gene
• ENSEMBL:ENSG00000143442 rdfs:label “ZNF280E”
• ENSEMBL:ENSG00000143442 rdfs:seeAlso “HGNCID:POGZ”
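The prefixed identifiers above only become globally unique once each prefix is expanded to a full URI. A minimal sketch of that expansion follows. The ncbigene base URI matches the pattern used in the WikiPathways RDF excerpt in this talk; the other three base URIs and the expand() helper are assumptions made for illustration, not something the slides specify.

```python
# Prefix map. Only the ncbigene pattern appears in this talk's RDF excerpt;
# the remaining base URIs are assumed for the sake of the example.
PREFIXES = {
    "ENTREZ": "http://identifiers.org/ncbigene/",
    "HGNCID": "http://identifiers.org/hgnc/",      # assumed
    "ENSEMBL": "http://identifiers.org/ensembl/",  # assumed
    "PMID": "http://identifiers.org/pubmed/",      # assumed
}


def expand(curie, prefixes=PREFIXES):
    """Expand a 'PREFIX:local' identifier into a full URI (hypothetical helper)."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


for curie in ("HGNCID:18801", "ENSEMBL:ENSG00000143442", "ENTREZ:23126"):
    print(curie, "->", expand(curie))
```

Keeping the prefix map in one place means a change of resolver touches a single dictionary rather than every stored statement.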
Gerhard Michal 1974
Pathway external references
http://www.wikipathways.org/index.php/Pathway:WP430
Allows visualization of differences in expression
http://www.wikipathways.org/index.php/Pathway:WP430
Human and machine readable
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix cas: <http://identifiers.org/cas/> .
@prefix wprdf: <http://rdf.wikipathways.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
...
<http://www.ncbi.nlm.nih.gov/gene/1394>
    a gpml:DataNode , skos:Concept , wp:GeneProduct ;
    rdfs:isDefinedBy gpml:DataNode ;
    rdfs:label "CRHR1"@en ;
    dc:identifier <http://identifiers.org/ncbigene/1394> , "1394"^^xsd:string ;
    dc:source "Entrez Gene"^^xsd:string ;
    dcterms:isPartOf <http://rdf.wikipathways.org/WP4_r39380.ttl> ;
    gpml:centerx "340.0"^^xsd:float ;
    ...
311,696 articles (1.5% of PubMed) have been cited by GO annotations
Wikipedia is reasonably accurate
Wikipedia has breadth and depth
[Chart: Wikipedia vs. Britannica Online, compared by articles and by words (millions). Source: http://en.wikipedia.org/wiki/Wikipedia:Size_comparisons, July 2008]
Centralizing key data storage
Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf
Centralizing key data storage
Source: http://commons.wikimedia.org/wiki/File:Wikidata_slides_Magnus_Manske,_Cambridge,_2014-02-27.pdf
Wikidata
Provide a database of the world’s knowledge that anyone can edit
- Denny Vrandečić
Centralizing key data storage
Centralizing key data storage
Centralizing key data storage
287 language editions of Wikipedia
Biocurators/Bioinformatics community
Wikidata for biology
[Diagram: Reelin (http://www.wikidata.org/wiki/Q414043) linked to other Wikidata items]
Relations: is a, regulates, interacts with
Properties: Property:P31, Property:P128, Property:P129
Item labels: Protein, Glycoprotein, Neural development, VLDL receptor, Amyloid precursor protein, Reelin
Items: Q8054, Q187126, Q1345738, Q1979313, Q423510, Q414043
Wikidata for biology
[Diagram: the Reelin graph shown with identifiers only]
Properties: Property:P31, Property:P128, Property:P129
Items: Q8054, Q187126, Q1345738, Q1979313, Q423510, Q414043
http://wikidata.org/w/api.php?action=wbgetentities&ids=Q414043&languages=en
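The API call on this slide can be built and read programmatically. The sketch below uses only the Python standard library. The endpoint and the action, ids, and languages parameters come from the slide; format=json is added so the reply is machine readable. The helper names are hypothetical, the stub entity is hand-written to mimic the shape of a wbgetentities reply (it is not real API output), and the live fetch is left uncalled because it needs network access and may require a User-Agent header.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://wikidata.org/w/api.php"  # endpoint as shown on the slide


def entity_url(qid, language="en"):
    """Build a wbgetentities request URL for one item."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "languages": language,
        "format": "json",  # added so the response is JSON rather than HTML
    }
    return API + "?" + urlencode(params)


def fetch_entity(qid, language="en"):
    """Fetch one entity over the network (not exercised in this sketch)."""
    with urlopen(entity_url(qid, language)) as response:
        return json.load(response)["entities"][qid]


def item_values(entity, prop):
    """Collect the item IDs an entity points to through one property."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            values.append(value["id"])
    return values


# Hand-written stub in the shape of an API reply, for illustration only.
stub = {"claims": {"P31": [{"mainsnak": {"datavalue": {"value": {"id": "Q8054"}}}}]}}

print(entity_url("Q414043"))
print(item_values(stub, "P31"))  # ['Q8054']
```

Against the live service, item_values(fetch_entity("Q414043"), "P31") would list whatever "is a" statements the item carries at that moment.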
Current progress
● All human and mouse genes and proteins loaded
● All diseases (Human Disease Ontology) loaded
● Dataset of all drugs in preparation
● Model for interlinking relations ready and proposed
Our current workflow
Stone soup of data
James Taylor http://km.aifb.kit.edu/ws/ckc2007/StoneSoup-www2007.pdf
Andrew Su, Scripps
Benjamin Good, Scripps
Sebastian Burgstaller, Scripps
Lynn Schriml, U Maryland
Elvira Mitraka, U Maryland
Gang Fu, NCBI
Evan Bolton, NCBI
Paul Pavlidis, U British Columbia
Peter Robinson, Charité
Many Wikipedia and Wikidata editors
Contact:
[email protected]@micelio.be
Crowdsourcing in action