Gene Wiki and Wikimedia Foundation SPARQL Workshop
TRANSCRIPT
CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA
GENE WIKI
Benjamin Good
The Scripps Research Institute, La Jolla, California
@bgood
ACKNOWLEDGEMENTS
Gene Wikidata Team
• Andrew Su (Scripps)
• Andra Waagmeester (Micelio)
• Sebastian Burgstaller (Scripps)
• Tim Putman (Scripps), speaking next
• Julia Turner (Scripps)
• Elvira Mitraka (U Maryland)
• Justin Leong (UBC)
• Lynn Schriml (U Maryland)
• Paul Pavlidis (UBC)
• Ginger Tsueng (Scripps)
“knowledge”
• A lot
• Important
• Text
More than 2 articles published/minute
Documents
Concepts
Gene Wiki: Filtering and summarizing PubMed
GENE WIKI
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
Bot!
GENE WIKI TIMELINE
2007
Project starts
2008
ProteinBoxBot populates infoboxes for 9,000 human genes
2009
Updated bot maintaining 9,678 human genes
2011
Now at 10,369 genes; analyses show article growth and high quality
2014
Start importing gene data into Wikidata
2016a
Convert more than 11,000 gene infoboxes on Wikipedia to draw all content from Wikidata
2016b
Launch first biomedically focused web app driven by Wikidata content…
https://en.wikipedia.org/wiki/Portal:Gene_Wiki
Gene Wiki Version 1.
{{GNF_Protein_box | Name = Reelin| image = | image_source = | PDB = {{PDB2|4AD9}} | HGNCid = 18512 | MGIid = | Symbol = LACTB2 | AltSymbols =; CGI-83 | IUPHAR = | ChEMBL = | OMIM = None | ECnumber = | Homologene = 9349 | GeneAtlas_image1 = | GeneAtlas_image2 = | GeneAtlas_image3 = | Protein_domain_image = | Function = {{GNF_GO|id=GO:0005515 |text = protein binding}} {{GNF_GO|id=GO:0016787 |text = hydrolase activity}} {{GNF_GO|id=GO:0046872 |text = metal ion binding}} | Component = {{GNF_GO|id=GO:0005739 |text = mitochondrion}} | Process = {{GNF_GO|id=GO:0008152 |text = metabolic process}} | Hs_EntrezGene = 51110 | Hs_Ensembl = ENSG00000147592 | Hs_RefseqmRNA = NM_016027 | Hs_RefseqProtein = NP_057111 | Hs_GenLoc_db = hg38 | Hs_GenLoc_chr = 8 | Hs_GenLoc_start = 70635318 | Hs_GenLoc_end = 70669174 | Hs_Uniprot = Q53H82 | Mm_EntrezGene = 212442 | Mm_Ensembl = ENSMUSG00000025937 | Mm_RefseqmRNA = NM_145381 | Mm_RefseqProtein = NP_663356 | Mm_GenLoc_db = mm10 | Mm_GenLoc_chr = 1 | Mm_GenLoc_start = 13623330 | Mm_GenLoc_end = 13660546 | Mm_Uniprot = Q99KR3 | path = PBB/51110}}
Gene Wiki Version 2.
{{Infobox gene}}
• All data in Wikidata
• 1 Lua script works for all 11,000+ genes
(1 of these for every gene)
IMPACT OF WIKIDATA ON WIKIPEDIA
IMPACT BEYOND WIKIPEDIA = SPARQL
Sample of current biomedical content
• All human and mouse genes and proteins
• All Gene Ontology terms (describe function)
• All Human Disease Ontology terms
• All FDA-approved drugs
• 109+ reference microbial genomes
Burgstaller-Muehlbacher et al (2016) Database
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Putman et al (2016) Database
http://tinyurl.com/biowiki-sparql
Sample queries that are currently possible:
• “Where in the cell is the Reelin protein expressed?”
• “What diseases are treated by Metformin?”
• “What diseases might be treated by Metformin?”
http://query.wikidata.org
Example question: repurposing Metformin
http://tinyurl.com/zem3oxz
[Diagram: Metformin → interacts with → protein (Solute carrier family 22 member 3) → encoded by → gene (SLC22A3) → genetic association → ?disease (prostate cancer). Metformin → might treat? → ?disease]
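The path in the diagram above (drug, interacting protein, encoding gene, genetically associated disease) could be expressed as a SPARQL query along the following lines. This is only a sketch, not the query behind the tinyurl link: the property IDs (P129, P702, P2293) and the label-based lookup of the drug are assumptions that should be checked against Wikidata, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (the slides link to http://query.wikidata.org).
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed Wikidata property IDs; verify them before relying on this sketch.
P_INTERACTS_WITH = "P129"   # "physically interacts with"
P_ENCODED_BY = "P702"       # "encoded by"
P_GENETIC_ASSOC = "P2293"   # "genetic association"


def repurposing_query(drug_label: str, lang: str = "en") -> str:
    """Build a query that walks drug -> protein -> gene -> disease."""
    return f"""
SELECT DISTINCT ?protein ?gene ?disease ?diseaseLabel WHERE {{
  ?drug rdfs:label "{drug_label}"@{lang} .
  ?drug wdt:{P_INTERACTS_WITH} ?protein .
  ?protein wdt:{P_ENCODED_BY} ?gene .
  ?gene wdt:{P_GENETIC_ASSOC} ?disease .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}
}}
""".strip()


def query_url(sparql: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Print the query and URL; paste either into a browser or HTTP client.
    q = repurposing_query("metformin")
    print(q)
    print(query_url(q))
```

Each `?disease` returned this way is a candidate for the "might treat?" edge only; the query shows a genetic link, not evidence of efficacy.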
A SPARQL-powered user interface for consuming and editing organism data in Wikidata
Timothy E. Putman, Ph.D.
The Scripps Research Institute, La Jolla, California
@putmantime
ACKNOWLEDGEMENTS
Gene Wikidata Team
• Andrew Su (Scripps)
• Benjamin Good, just spoke
• Andra Waagmeester (Micelio)
• Sebastian Burgstaller (Scripps)
• Elvira Mitraka (U Maryland)
• Julia Turner (Scripps)
• Justin Leong (UBC)
• Lynn Schriml (U Maryland)
• Paul Pavlidis (UBC)
• Ginger Tsueng (Scripps)
Centralizing and Linking the Data
[Figure: Wikidata items linked to one another]
• Bacteria (Q10876), domain
• C. trachomatis (Q131065), species
• C. trachomatis 434/BU (Q20800254), strain
• trpA (Q21153861), gene
• TRPA (Q21153984), protein
SPARQL Query
• On page load
  • jQuery execution of SPARQL query as AJAX GET request
• On organism select
  • Get all gene and protein data for organism by taxid
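The "on organism select" step above might look roughly like this when written in Python rather than jQuery. It is a sketch under stated assumptions, not the application's actual code: the properties P685 (NCBI taxonomy ID), P703 (found in taxon) and P688 (encodes) are assumed to be how the data is modelled, and the function names and User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"


def genes_and_proteins_query(taxid: str) -> str:
    """Build a query for all genes (and encoded proteins) of one taxon."""
    if not taxid.isdigit():
        # Taxonomy IDs are numeric; reject anything else before interpolating.
        raise ValueError(f"not a numeric taxid: {taxid!r}")
    return f"""
SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {{
  ?taxon wdt:P685 "{taxid}" .
  ?gene wdt:P703 ?taxon ;
        wdt:P688 ?protein .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def run_query(sparql: str, timeout: float = 30.0) -> list:
    """Send the query as an HTTP GET and return the result bindings."""
    url = ENDPOINT + "?" + urlencode({"query": sparql})
    request = Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "gene-wiki-slides-example/0.1",  # hypothetical client name
    })
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(genes_and_proteins_query(taxid))` with the selected organism's taxid returns one binding per gene and protein pair. Note that genes without an "encodes" statement are skipped by this version of the query.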
QUESTIONS?