

EBI web resources I: databases and tools

Yanbin Yin

Outline

• Intro to EBI

• Databases and web tools
  – UniProt
  – Gene Ontology

• Hands-on practice

MOST MATERIALS ARE FROM: http://www.ebi.ac.uk/training/online/course-list

Three international nucleotide sequence databases

The European Bioinformatics Institute (EBI)

Created in 1992 as part of the European Molecular Biology Laboratory (EMBL)

EMBL was created in 1974 and is a molecular biology research institution supported by 20 European countries and Australia

Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Neighbor of the Wellcome Trust Sanger Institute

http://www.ebi.ac.uk/

Research groups in EBI

InterPro

UniProt

miRBase

Major databases in EBI

EMBL-Bank (DNA and RNA sequences)
Ensembl (genomes)
ArrayExpress (microarray-based gene-expression data)
UniProt (protein sequences)
InterPro (protein families, domains and motifs)
PDBe (macromolecular structures)

Others, such as
IntAct (protein–protein interactions)
Reactome (pathways)
ChEBI (small molecules)
IntEnz (enzyme classification)
GO (gene ontology)

GenBank, Genome, MapView

GEO, nr (GenPept)

CDD, MMDB

Swiss Institute of Bioinformatics, Sanger Institute

http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi

chromatograms

A sequence might first enter ENA as SRA (Sequence Read Archive) fragmented sequence reads; it might be re-submitted as assembled WGS (Whole Genome Shotgun) sequence overlap contigs; it might be re-submitted again with further assembly as CON (Constructed) sequence entries, with the older WGS entries being consigned to the Sequence Version Archive

Data is first split into classes, then it is split into intersecting slices by taxonomy

UniProt


http://www.uniprot.org/help/uniparc

Sources of annotation for the UniProt Knowledgebase

Life as a Scientific Curator: http://www.ebi.ac.uk/about/jobs/career-profiles/scientific-curator

Scientific Database Curator job, Cambridge, United Kingdom: http://www.nature.com/naturejobs/science/jobs/589083-hgnc-gene-nomenclature-advisor

Curation generation: http://cys.bios.niu.edu/yyin/teach/PBB/Bioinformatics%20Curation%20generation.pdf

Hands-on practice 1: UniProt

www.uniprot.org
http://www.uniprot.org/help/about
http://www.uniprot.org/docs/uniprot_flyer.pdf

We are going to do ID mapping

http://cys.bios.niu.edu/yyin/teach/PBB/at-id.txt

Choose Araport here and UniProtKB here

These are UniProt IDs
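Once the mapping result is downloaded, it can be handy to load it into a lookup table. The sketch below assumes the result is a two-column, tab-separated table with a "From" / "To" header row; that layout is an assumption, and the identifiers in the example are made-up placeholders rather than real Araport or UniProtKB accessions.

```python
from collections import defaultdict


def parse_id_mapping(tsv_text):
    """Parse a two-column 'From<TAB>To' table into {source_id: [target_ids]}."""
    mapping = defaultdict(list)
    for line in tsv_text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2 or fields[0] == "From":
            continue  # skip blank or malformed lines and the header row
        source_id, target_id = fields[0], fields[1]
        mapping[source_id].append(target_id)
    return dict(mapping)


# Placeholder identifiers (hypothetical, for illustration only).
example = "From\tTo\nGENE_A\tPROT_1\nGENE_A\tPROT_2\nGENE_B\tPROT_3\n"
id_map = parse_id_mapping(example)

# IDs from the submitted list that received no mapping at all.
unmapped = [g for g in ["GENE_A", "GENE_B", "GENE_C"] if g not in id_map]
```

A list per source ID is used because one gene identifier can map to more than one protein entry.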

Select the PAL proteins and align them

The Clustal Omega program will be called to align the selected protein seqs
May take 1 min to finish

This is the MSA result page
Toggling these options on will add colors to the alignment

Go back to the protein list page
Selecting one protein will enable the BLAST button

Choosing Advanced will allow you to change BLAST parameters

Here you can make changes

We are going to search UniProt proteomes for the human protein set
Click on Advanced and you will see a pop-out window

Here you can specify search terms

Click here to get help

Click here to open a new page

Gene Ontology

The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases

The project began as a collaboration between three model organism databases, FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), in 1998

Three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

There are three separate aspects to this effort:

1. the development and maintenance of the ontologies themselves;
2. the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and
3. the development of tools that facilitate the creation, maintenance and use of ontologies.

http://geneontology.org/page/documentation

The scope of GO

GO is not a database of gene sequences, nor a catalog of gene products. Rather, GO describes how gene products behave in a cellular context.

GO is not a dictated standard, mandating nomenclature across databases. Groups participate because of self-interest, and cooperate to arrive at a consensus.

GO is not a way to unify biological databases (i.e. GO is not a 'federated solution'). Sharing vocabulary is a step towards unification, but is not, in itself, sufficient.

Gene Ontology covers three domains:

cellular component, the parts of a cell or its extracellular environment;

molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis;

biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms

The structure of GO can be described in terms of a graph, where each GO term is a node, and the relationships between the terms are edges between the nodes. GO is loosely hierarchical, with 'child' terms being more specialized than their 'parent' terms, but unlike a strict hierarchy, a term may have more than one parent term

http://geneontology.org/page/ontology-structure
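Because a term may have several parents, collecting all of its ancestors is a graph traversal rather than a walk up a single chain. The following is a minimal sketch on a toy graph; the term names are invented for illustration and are not real GO identifiers.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # a shared ancestor is visited only once
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy graph (hypothetical names): "leaf" has two parents that share one root.
toy_parents = {
    "leaf": ["branch_a", "branch_b"],
    "branch_a": ["root"],
    "branch_b": ["root"],
}

leaf_ancestors = ancestors("leaf", toy_parents)
```

The `seen` set is what keeps the traversal correct when two paths converge on the same ancestor.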

http://www.ebi.ac.uk/training/online/course/go-quick-tour/what-can-i-do-go

id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
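A term record like the one above is a series of "tag: value" lines, some of which repeat (synonym, xref). A minimal sketch of reading one such record into a dictionary of lists follows; it handles only this simple tag-value layout and is not a full parser for the ontology file format.

```python
from collections import defaultdict

STANZA = '''id: GO:0000016
name: lactase activity
namespace: molecular_function
def: "Catalysis of the reaction: lactose + H2O = D-glucose + D-galactose." [EC:3.2.1.108]
synonym: "lactase-phlorizin hydrolase activity" BROAD [EC:3.2.1.108]
synonym: "lactose galactohydrolase activity" EXACT [EC:3.2.1.108]
xref: EC:3.2.1.108
xref: MetaCyc:LACTASE-RXN
xref: Reactome:20536
is_a: GO:0004553 ! hydrolase activity, hydrolyzing O-glycosyl compounds
'''


def parse_stanza(text):
    """Collect 'tag: value' lines into {tag: [values]}; repeated tags accumulate."""
    record = defaultdict(list)
    for line in text.splitlines():
        if ": " not in line:
            continue
        tag, value = line.split(": ", 1)  # split on the first separator only
        record[tag].append(value)
    return dict(record)


term = parse_stanza(STANZA)
# The text after " ! " is a human-readable comment; keep just the parent ID.
parent_ids = [value.split(" ! ")[0] for value in term["is_a"]]
```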

Enrichment analysis: use a statistical test, e.g. Fisher's exact test

Example: in the human genome background (20,000 genes total), 40 genes are involved in the p53 signaling pathway. In a given gene list, 3 out of 300 genes belong to the p53 signaling pathway. We then ask whether 3/300 is more than expected by random chance compared with the human background of 40/20,000

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E4
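The p53 example can be worked through with the standard library alone. The sketch below computes a one-sided Fisher's exact test as a hypergeometric tail probability, that is, the chance of seeing 3 or more pathway genes when 300 genes are drawn at random from 20,000 of which 40 belong to the pathway. The function name is an illustrative choice; DAVID itself reports a modified, more conservative score, so its numbers will differ.

```python
from math import comb


def enrichment_p_value(hits, list_size, background_hits, background_size):
    """One-sided p-value: P(X >= hits) under the hypergeometric distribution."""
    total = comb(background_size, list_size)
    max_hits = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Numbers from the example: 3 of 300 in the list, 40 of 20,000 in the background.
p = enrichment_p_value(hits=3, list_size=300, background_hits=40, background_size=20000)

# Fold enrichment: proportion in the list relative to proportion in the background.
fold = (3 / 300) / (40 / 20000)
```

With these inputs the list proportion is five times the background proportion, and the p-value comes out at roughly 0.02.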

UniProt-GO annotation (GOA)

http://www.ebi.ac.uk/training/online/course/uniprot-goa-quick-tour/what-uniprot-goa

UniProt-GOA format

• The reference used to make the annotation (e.g. a journal article)
• An evidence code denoting the type of evidence upon which the annotation is based
• The date and the creator of the annotation

Gene product: Actin, alpha cardiac muscle 1, UniProtKB:P68032
GO term: heart contraction ; GO:0060047 (biological process)
Evidence code: Inferred from Mutant Phenotype (IMP)
Reference: PMID 17611253
Assigned by: UniProtKB, June 6, 2008
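Association files are commonly distributed in the tab-separated GAF layout, in which each of these fields sits in a fixed column. The sketch below rebuilds the actin example as one such row and pulls out the columns named on the slide. The row is constructed for illustration: the gene symbol, object type and taxon values are assumptions filled in to make a complete line, not values copied from a real file.

```python
from typing import NamedTuple


class GoAnnotation(NamedTuple):
    db: str
    object_id: str
    go_id: str
    reference: str
    evidence: str
    aspect: str
    date: str
    assigned_by: str


def parse_gaf_line(line):
    """Pick the slide's fields out of one tab-separated GAF 2.x row."""
    cols = line.rstrip("\n").split("\t")
    return GoAnnotation(
        db=cols[0],            # column 1: database
        object_id=cols[1],     # column 2: gene product accession
        go_id=cols[4],         # column 5: GO term ID
        reference=cols[5],     # column 6: reference
        evidence=cols[6],      # column 7: evidence code
        aspect=cols[8],        # column 9: P, F or C
        date=cols[13],         # column 14: YYYYMMDD
        assigned_by=cols[14],  # column 15: annotating group
    )


# Illustrative row; symbol, type and taxon are assumed values.
fields = ["UniProtKB", "P68032", "ACTC1", "", "GO:0060047", "PMID:17611253",
          "IMP", "", "P", "Actin, alpha cardiac muscle 1", "", "protein",
          "taxon:9606", "20080606", "UniProtKB", "", ""]
annotation = parse_gaf_line("\t".join(fields))
```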

The idea of GO annotation for new sequences

If you have a new genome/transcriptome sequenced, how do you perform a GO annotation for it?

1. Find the closest model organism which has been annotated by GO
2. BLAST your data against this closest organism
3. Transfer the GO annotation of the best match to your query sequences

For instance, if we want to annotate a fern transcriptome with GO function descriptions...

1. Find the Arabidopsis UniProt protein dataset
2. Find the Arabidopsis GOA association file
3. BLASTx fern reads (or assembled UniGenes) against the UniProt set
4. Analyze the BLAST result to link fern reads to GO terms
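The last step, linking query sequences to GO terms through their best BLAST match, can be sketched as follows. It assumes the BLAST result is in the 12-column tabular layout (query, subject, ..., e-value, bit score) and that the association file has already been reduced to a dictionary from protein ID to GO terms. All identifiers in the example are invented placeholders.

```python
def best_hits(blast_tab_text):
    """Keep the highest-bit-score subject for each query (12-column tabular)."""
    best = {}
    for line in blast_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, bitscore = cols[0], cols[1], float(cols[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_go(hits, go_by_protein):
    """Give each query the GO terms of its best-matching annotated protein."""
    return {q: sorted(go_by_protein.get(s, ())) for q, s in hits.items()}


# Placeholder data (hypothetical IDs and term labels).
blast_rows = (
    "read1\tprotA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "read1\tprotB\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120\n"
    "read2\tprotB\t85.0\t90\t13\t0\t1\t90\t5\t94\t1e-40\t150\n"
)
go_by_protein = {"protA": {"TERM_1", "TERM_2"}, "protB": {"TERM_3"}}

hits = best_hits(blast_rows)
read_to_go = transfer_go(hits, go_by_protein)
```

In practice an e-value or identity cutoff would normally be applied before transferring annotations, since a weak best hit is still a weak hit.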

Hands-on practice 2: GO annotation

http://geneontology.org/


http://amigo1.geneontology.org/cgi-bin/amigo/blast.cgi

Get an example protein sequence file from http://cys.bios.niu.edu/yyin/teach/PBB/csl-pr.fa

This is easy. Now let’s try to get a list of differentially expressed genes and then find what’s common in this list of genes in terms of functions.

We’re going to use the NCBI GEO website to get the gene list and then feed the gene list to GO enrichment analysis tools

Go to the NCBI homepage, search GEO DataSets with the keyword “GDS4831”, and hit Search

Choose “Compare 2 sets of samples”

Choose “Value means difference”
Choose “8+ fold”
Choose “higher”

Then go to Step 2

Select group A: three samples for COP1 depletion and Huh7 cell line

Group B: three samples for negative control and Huh7 cell line

Hit OK, and go to Step 3

In total, 256 gene profiles are found with 8+ fold higher expression in COP1 depletion than in negative control in the Huh7 cell line
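The "8+ fold higher" filter amounts to comparing the mean of group A with the mean of group B for each profile. The sketch below shows that comparison on invented numbers; it assumes the values are on a linear (not log-transformed) scale, and it is an illustration of the idea, not a description of how the GEO tool is implemented.

```python
from statistics import mean


def fold_change(group_a, group_b):
    """Ratio of the group A mean to the group B mean."""
    return mean(group_a) / mean(group_b)


def higher_in_a(profiles, min_fold=8.0):
    """Return names whose group A mean is at least `min_fold` times group B's."""
    return [
        name
        for name, (group_a, group_b) in profiles.items()
        if fold_change(group_a, group_b) >= min_fold
    ]


# Invented expression values: three samples per group, as in the exercise.
profiles = {
    "gene_x": ([800, 900, 1000], [100, 100, 100]),  # 9-fold higher in A
    "gene_y": ([200, 210, 190], [100, 110, 90]),    # 2-fold higher in A
}
selected = higher_in_a(profiles)
```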

To get the list of genes, choose the Gene database and hit Find items

In total, 225 genes correspond to the 256 gene profiles
To download the list of Gene IDs, hit Send to, choose UI List as the format and hit Create File

A file named “gene_result.txt” will be automatically downloaded to your local computer
Find out where it was downloaded to and open it using Notepad++
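The downloaded list can also be checked from a script before it is pasted into an enrichment tool. The sketch below assumes the file holds one numeric Gene ID per line (an assumption about the UI List layout) and returns the unique IDs in the order they first appear. The IDs in the example are placeholders.

```python
def read_gene_ids(text):
    """Return unique numeric IDs, in first-seen order, from one-ID-per-line text."""
    seen = {}
    for line in text.splitlines():
        gene_id = line.strip()
        if gene_id.isdigit():  # ignore blank lines and any non-numeric header
            seen.setdefault(gene_id, None)
    return list(seen)


# Placeholder content standing in for the downloaded file.
sample_text = "111\n222\n111\n\n333\n"
gene_ids = read_gene_ids(sample_text)
```

For the real file, the text would come from reading "gene_result.txt" from wherever the browser saved it, and the length of the returned list should match the gene count shown on the results page.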

View the file using Notepad++

Next we will use DAVID to perform function enrichment analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID)

Hit Start Analysis

Upload the list of Gene IDs

Select ENTREZ_GENE_ID

Click on Gene List

Check the submitted gene list

This allows you to view functional annotation from various resources including GO

If you have clicked on the Functional Annotation Tool, you are at this page

All these can be changed by users (whether to show them, and what to show)

Uncheck this

Select just GO

Clicking here will open a new window showing which GO terms the 225 differentially expressed genes are enriched in

In which GO categories are the genes enriched (compared to the genome background)?

Next lecture: EBI web resources II (ENSEMBL and InterPro)