A discovery platform to support translational research on human diseases

ECCB T7 tutorial, September 4, 2016

Janet Piñero and Laura I. Furlong

•  How can DisGeNET help in your research?
•  Overview of the DisGeNET Platform
•  Hands-on Tutorial
   •  Web interface
   •  DisGeNET Cytoscape app
   •  DisGeNET RDF and SPARQL endpoint
   •  disgenet2r R package

DisGeNET ECCB 2016 Tutorial

Research questions

•  What are the diseases associated with the gene SIRT1?
•  What are the genes associated with Alzheimer’s disease?
•  What are the genes shared by comorbid diseases?
•  What are the genetic variants associated with obesity?
•  What are the druggable proteins associated with schizophrenia?
•  Which pathways are perturbed in Lafora disease?

High-throughput genomic technologies are helping to find disease genes and pathogenic variants

•  A typical whole exome sequencing experiment produces 30,000–100,000 variants relative to the reference genome
•  Approximately 10,000 of these variants will have a consequence for protein function
•  Only one or a few may be causative

Identification of the true pathogenic variants among all this variation is still a major challenge

The availability of comprehensive, traceable, high-quality data on genotype-phenotype relations is key

[Figure labels: phenotype, genotype]

DATA SILOS

What is the genetic basis of Wilson Disease?

Gene lists shown on the slide:
•  ATP7B
•  ATP7B
•  CP, ATP7B, PRNP, IL6, LOX, ANXA5, TNF, APOE
•  ATP7B, CP, ATP7A, COMMD1, ARSA, HFE, SLC31A1

Information on the genetic basis of diseases

•  Data silos
•  Different standards
•  Large volume

Need for resources that gather, integrate and standardize information on the genetic basis of diseases

✓  Knowledge platform on human diseases and their genes
✓  Covers all disease therapeutic areas
✓  Integrates information from expert-curated resources and from the literature
✓  Focus on gene-disease association (GDA) and its supporting evidence
✓  Standardization of the information and provenance

DisGeNET: the implementation

Gene-disease associations are drawn from biomedical databases and from text mining

BeFree: Bio-Entity Finder and Relation Extraction
http://ibi.imim.es/befree/

Piñero et al., 2015 doi:10.1093/database/bav028

DisGeNET: data sources (DisGeNET v4.0)

•  Source types: Curated, Predicted, Literature
•  Sources: GWAS Catalog, Orphanet, UniProt, CTD, LHGDN, RGD, BeFree, GAD, ClinVar, MGD

DisGeNET: statistics (version 4.0)

Source        Genes     Diseases   Associations
Curated       7,362     7,607      32,834
Predicted     2,743     2,064      10,264
Literature    16,141    11,447     403,925
All           17,381    15,093     429,036

Last update: June 2016

>94%
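The ">94%" annotation appears to refer to the share of associations that come from the literature. That reading is an assumption, but it can be checked against the statistics table with a short Python sketch. The counts are copied from the table; the variable names are illustrative.

```python
# Association counts copied from the DisGeNET v4.0 statistics table.
associations = {
    "Curated": 32_834,
    "Predicted": 10_264,
    "Literature": 403_925,
}
total_all = 429_036  # the "All" row of the table

# Share of all associations reported by each source type.
shares = {source: count / total_all for source, count in associations.items()}

for source, share in shares.items():
    print(f"{source}: {share:.1%}")
```

The three counts add up to more than the "All" total, which suggests that some associations are reported by more than one source type.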


What is Text Mining?

Text mining unlocks information by automatically extracting data from free-text resources

BioNER module
•  Entity mention and normalization
•  Fuzzy and pattern matching methods + dictionaries
•  Diseases and genes
•  Handles ambiguities between entities
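As a rough illustration of dictionary lookup combined with fuzzy matching, the sketch below normalizes disease mentions with Python's standard `difflib`. It is not BeFree's implementation: the dictionary entries, the `EX:` identifiers, the function name `normalize_mention` and the similarity cutoff are all assumptions made for the example.

```python
import difflib

# Toy dictionary: lower-cased disease names mapped to placeholder identifiers.
# Both the names and the "EX:" identifiers are illustrative, not DisGeNET data.
DISEASE_DICTIONARY = {
    "wilson disease": "EX:0001",
    "alzheimer's disease": "EX:0002",
    "schizophrenia": "EX:0003",
    "lafora disease": "EX:0004",
}


def normalize_mention(mention, cutoff=0.85):
    """Map a text mention to a dictionary identifier, or None if no match."""
    key = mention.strip().lower()
    if key in DISEASE_DICTIONARY:  # exact dictionary lookup first
        return DISEASE_DICTIONARY[key]
    # Fall back to fuzzy matching on string similarity.
    candidates = difflib.get_close_matches(
        key, list(DISEASE_DICTIONARY), n=1, cutoff=cutoff
    )
    return DISEASE_DICTIONARY[candidates[0]] if candidates else None


for mention in ["Wilson Disease", "Wilsons disease", "Alzheimers disease", "influenza"]:
    print(mention, "->", normalize_mention(mention))
```

A high cutoff keeps unrelated strings from matching; a real system would also need the disambiguation step mentioned in the list above.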

Relation Extraction module
•  Based on SVM
•  Combines Shallow Linguistic Kernel (KSL) with Dependency Kernel (KDEP)
•  Exploits shallow and deep syntactic information

http://ibi.imim.es/tools/befree/

Gene-disease association identification with BeFree

Gene-disease association types according to the DisGeNET ontology


STANDARDS

Genotype
•  Large in scale and growing rapidly (NGS)
•  Large studies on genetics of disease available
•  HGVS standard for sequence variation nomenclature
•  Standards for data exchange
   •  UniProt, NCBI, Ensembl
   •  VarioML, VariO

Phenotype
•  Phenotype data spans a wide spectrum of possible observations about an individual
•  More difficult to capture and to standardize
•  Human Phenotype Ontology, Disease Ontology
•  Broad phenotype categories used in many studies

Standards in DisGeNET
•  Gene, protein, SNPs
   •  Official Gene symbol
   •  NCBI Gene Id
   •  UniProt accession
   •  dbSNP identifier for variants
•  Diseases and phenotypes
   •  UMLS CUIs
   •  UMLS semantic types
   •  Disease Ontology
   •  Mappings to a variety of phenotype vocabularies and ontologies

DisGeNET association type ontology

http://sio.semanticscience.org

Coverage of disease vocabularies and ontologies in DisGeNET

Vocabulary   Coverage (%)
UMLS         100
MeSH         57
OMIM         40
NCIt         34
DO           20
ORDO         14
ICD9CM       11
EFO          11
HPO          8
DECIPH       0.4

Signs, symptoms and diseases in DisGeNET

•  Abnormal phenotypes, signs and symptoms: Inflammation, Seizures, Pain, Overweight
•  Diseases: Breast carcinoma, Diabetes Mellitus
•  Disease class: Cardiovascular Diseases, Autoimmune Diseases, Neurodegenerative Diseases

Concept type    Number of concepts   Number of associated genes   Number of associated SNPs
Disease         13,674               17,005                       44,467
Disease class   55                   5,739                        992
Phenotype       1,364                9,332                        2,894

DATA PRIORITIZATION

DisGeNET gene-disease association score

Indicates the popularity of a gene-disease association across all data sources

DisGeNET score = S_CURATED + S_PREDICTED + S_LITERATURE
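The additive score can be sketched in a few lines of Python. How each component is derived is not given here, so the component values, the gene names and the function name `disgenet_score` below are hypothetical and serve only to show how summing the three parts produces a ranking.

```python
def disgenet_score(s_curated, s_predicted, s_literature):
    """Sum the three source-type components into one association score."""
    return s_curated + s_predicted + s_literature


# Hypothetical components (curated, predicted, literature) for three
# associations with the same disease; not real DisGeNET values.
components = {
    "GENE_A": (0.50, 0.10, 0.05),
    "GENE_B": (0.00, 0.00, 0.12),
    "GENE_C": (0.25, 0.00, 0.02),
}

scores = {gene: disgenet_score(*parts) for gene, parts in components.items()}

# Rank genes from the best supported association to the least supported.
ranked = sorted(scores, key=scores.get, reverse=True)
for gene in ranked:
    print(f"{gene}: {scores[gene]:.2f}")
```

An association supported only by the literature (GENE_B in this toy data) ranks below one that also has curated support.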

Disease Specificity Index (DSI)

✓  Indicates how specific a gene is with respect to diseases
✓  Is inversely proportional to the number of diseases associated with a particular gene (ranges from 0 to 1)
✓  A gene associated with a large number of diseases, such as TNF (associated with >1,500 diseases), is less “specific” for any disease, and has a small DSI value (0.247)
✓  A gene associated with only one disease is more “specific” for that disease and has a DSI of 1

Top scored genes for Wilson disease

Gene     Number of diseases   DisGeNET score   DSI     Number of PMIDs   Number of SNPs
ATP7B    57                   0.819            0.596   234               99
ANXA5    129                  0.2              0.505   1                 0
PRNP     205                  0.128            0.468   4                 1
CP       114                  0.126            0.532   26                0
LOX      141                  0.123            0.498   2                 0
LOXL2    48                   0.123            0.610   1                 0
APOE     729                  0.122            0.333   2                 0
TNF      1524                 0.120            0.247   2                 0
IL6      1260                 0.120            0.268   2                 0
NDUFB7   1                    0.120            1       1                 0
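The Wilson disease table can be used to check the stated inverse relation between DSI and the number of associated diseases. The Python sketch below does not compute DSI (its formula is not given here); it only confirms that, for these ten genes, DSI falls as the number of diseases rises. The values are copied from the table, written with decimal points; the variable names are illustrative.

```python
# (gene, number of diseases, DSI) copied from the Wilson disease table.
WILSON_ROWS = [
    ("ATP7B", 57, 0.596),
    ("ANXA5", 129, 0.505),
    ("PRNP", 205, 0.468),
    ("CP", 114, 0.532),
    ("LOX", 141, 0.498),
    ("LOXL2", 48, 0.610),
    ("APOE", 729, 0.333),
    ("TNF", 1524, 0.247),
    ("IL6", 1260, 0.268),
    ("NDUFB7", 1, 1.0),
]

# Order genes from the fewest to the most associated diseases.
by_breadth = sorted(WILSON_ROWS, key=lambda row: row[1])

# DSI should strictly decrease along that ordering if the relation holds.
dsi_in_order = [dsi for _, _, dsi in by_breadth]
dsi_decreases = all(a > b for a, b in zip(dsi_in_order, dsi_in_order[1:]))

for gene, n_diseases, dsi in by_breadth:
    print(f"{gene:7s} {n_diseases:5d} {dsi:.3f}")
print("DSI decreases as the number of diseases grows:", dsi_decreases)
```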

Top scored genes for Major Depressive Disorder

Gene     Number of diseases   DisGeNET score   DSI      Number of PMIDs   Number of SNPs
SLC6A4   374                  0.236            0.411    157               5
TPH2     89                   0.211            0.548    26                1
HTR2A    222                  0.155            0.463    45                17
PCLO     20                   0.130            0.696    12                5
CRHR1    118                  0.127            0.531    11                11
CYP2D6   316                  0.127            0.4281   11                2
FKBP5    78                   0.126            0.563    16                1
SP4      16                   0.125            0.739    3                 1
GRM7     32                   0.123            0.666    5                 1
GNAI3    7                    0.122            0.812    2                 1

FLEXIBLE DATA ACCESS

DisGeNET Cytoscape app

•  Network representation of gene-disease associations and projections
•  Downstream analysis with a variety of network analysis and annotation tools available in Cytoscape

DisGeNET as Linked Open Data

✓  What are the perturbed pathways in Lafora disease?
✓  What proteins associated with Aarskog syndrome are potential drug targets?
✓  Which genes differentially expressed in beta cells are associated with pancreatic cancer?

DisGeNET as Linked Open Data

•  RDF and nanopublications
•  URIs: RDF providers or
•  SIO
•  Use of standards (11 ontologies in NCBO)
•  Metadata description (W3C HCLS)
•  Interlinking
   •  Bio2RDF
   •  Linked Life Data
•  Access
   •  Download Data Dump
   •  SPARQL Endpoint
   •  Faceted Browser
   •  Open PHACTS
   •  Nanopublication Network
   •  disgenet2R
•  Open license
•  FAIR (ELIXIR and NIH)
•  Datahub
•  Software

http://lod-cloud.net/; Aug 2014

DisGeNET as Linked Open Data

Semantic Web – Linked Data
Based on W3C standards

•  RDF: Resource Description Framework. Captures the logical structure of the data; graph representation
•  SPARQL: RDF query language

Usual Web vs Semantic Web

Usual Web             Semantic Web
Website               Dataset
Page/URL              Resource/URI
Document, textual     Formal description
HTML: presentation    RDF: semantic
Human readable        Machine readable

SPARQL Query Structure

# prefix declarations
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
# result clause
SELECT/CONSTRUCT/ASK/DESCRIBE ..OUTPUT..
# dataset definition (in a concrete query, FROM follows the result clause)
FROM <DATASETGRAPH>
# query pattern
WHERE { graph pattern }
# query modifiers
ORDER BY …
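To make the query structure concrete, the Python sketch below assembles a SELECT query from its parts (prefix declarations, result clause, query pattern, modifiers) and builds the GET URL that the SPARQL protocol uses. The helper `build_query` is illustrative, the query is a generic one that lists typed resources rather than a DisGeNET-specific query, and the endpoint URL is an assumption that should be checked against the DisGeNET site. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_query(prefixes, select_vars, pattern, order_by=None, limit=None):
    """Assemble a SELECT query: prefixes, result clause, pattern, modifiers."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    lines.append("SELECT " + " ".join(select_vars))   # result clause
    lines.append("WHERE { " + pattern + " }")          # query pattern
    if order_by:
        lines.append("ORDER BY " + order_by)           # query modifiers
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    prefixes={"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"},
    select_vars=["?resource", "?type"],
    pattern="?resource rdf:type ?type .",
    order_by="?resource",
    limit=10,
)

# Assumed endpoint address, not taken from the slides.
ENDPOINT = "http://rdf.disgenet.org/sparql/"
request_url = ENDPOINT + "?" + urlencode({"query": query})

print(query)
```

The sketch omits the optional FROM clause, so an endpoint would evaluate the query against its default graph.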


Gene (S) associated (P) Disease (O)

A statement in a publication
RB is overexpressed in bladder cancer samples as measured by ….

In RDF, a statement is a triple
•  Subject: RB1
•  Predicate: Altered Expression
•  Object: Carcinoma of bladder

http://rdf.disgenet.org/resource/gda/DGN1234
http://identifiers.org/hgnc.symbol/RB1
http://linkedlifedata.com/resource/umls/id/C0699885
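A triple of URIs can be written down directly in N-Triples, one of the standard RDF serializations. The Python sketch below does this for the RB1 example using the gene and disease URIs listed above. The predicate URI is a placeholder in the `example.org` namespace, not a DisGeNET or SIO term, and the names `Triple` and `to_ntriples` are illustrative.

```python
from collections import namedtuple

# An RDF statement: subject, predicate and object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

GENE_RB1 = "http://identifiers.org/hgnc.symbol/RB1"
DISEASE_BLADDER_CARCINOMA = "http://linkedlifedata.com/resource/umls/id/C0699885"
# Placeholder predicate URI, not a DisGeNET or SIO term.
ALTERED_EXPRESSION = "http://example.org/vocab/alteredExpression"


def to_ntriples(triple):
    """Serialize a triple of URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


statement = Triple(GENE_RB1, ALTERED_EXPRESSION, DISEASE_BLADDER_CARCINOMA)
print(to_ntriples(statement))
```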

Data Model

•  How to describe an association?
   a) As a property: Gene (S) associated (P) Disease (O)
   b) As a class: Gene (O) – P – Association (S) – P – Disease (O)

Provenance and Evidence RDF triples
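The difference between the two modelling options can be sketched with plain Python tuples. When the association is a class, it becomes a resource of its own, so provenance and evidence triples have something to attach to. The resource URIs are the ones shown earlier, and treating the `gda` URI as the association node is an assumption. Every predicate and class in the `example.org` namespace is a placeholder rather than the vocabulary DisGeNET uses, and the source and evidence values are dummy strings.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/vocab/"  # placeholder vocabulary, not SIO

gene = "http://identifiers.org/hgnc.symbol/RB1"
disease = "http://linkedlifedata.com/resource/umls/id/C0699885"
gda = "http://rdf.disgenet.org/resource/gda/DGN1234"

# (a) Association as a property: a single triple with no node of its own,
# so there is nowhere to attach evidence.
as_property = {(gene, EX + "associatedWith", disease)}

# (b) Association as a class: the association is the subject and points
# to the gene and the disease as objects.
as_class = {
    (gda, RDF_TYPE, EX + "AlteredExpressionAssociation"),
    (gda, EX + "refersTo", gene),
    (gda, EX + "refersTo", disease),
    # Provenance and evidence hang off the association node.
    (gda, EX + "source", "EXAMPLE_SOURCE"),
    (gda, EX + "evidence", "EXAMPLE_PUBLICATION"),
}


def objects(graph, subject, predicate):
    """Return the objects of all triples matching a subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects(as_class, gda, EX + "refersTo"))
print(objects(as_class, gda, EX + "evidence"))
```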

Data Model

•  Ontology-based integration
•  DisGeNET standards
   •  Shared IDs
   •  Standard ontologies

The Association node is typed (rdf:type) with a class from the DisGeNET Association Type Ontology
http://semanticscience.org/ontology/sio.owl

DisGeNET: the data model

RDF data model example (Turtle): http://rdf.disgenet.org/download/4.0.0/DisGeNET-RDF-Example.ttl

disgenet2r R package

•  R package
•  To interrogate DisGeNET data
•  To combine DisGeNET data with other resources
•  To visualize the results within the powerful R framework
•  To engage with the R/Bioconductor community
•  Launched with the release of DisGeNET v4.0 (April, 2016)

http://www.disgenet.org/
support@disgenet.org
twitter: @DisGeNET

IBI Group http://ibi.imim.es/
Alba Gutiérrez-Sacristán, Àlex Bravo, Janet Piñero, Alexia Giannoula, Miguel A. Mayer, Angela Leis, Santiago de la Peña, Emilio Centeno, Laura I. Furlong, Ferran Sanz

Past Members
Núria Queralt-Rosinach, Montserrat Cases, Solène Grosdidier, Pablo Carbonell, Anna Bauer-Mehren, Michael Rautschka
