Canadian Bioinformatics Workshops 4.pdf · 2018. 11. 21. · Module 4: Downstream analyses...
TRANSCRIPT
5/12/16
Canadian Bioinformatics Workshops
www.bioinformatics.ca
6/16/16
Module 4
Downstream analyses & integrative tools
David Bujold
Epigenomic Data Analysis, June 20–June 21, 2016
Module 4: Downstream analyses & integrative tools bioinformatics.ca
Learning Objectives
• Explore some downstream analyses that can be done with epigenomic assays data
• Discover sources of publicly available datasets that can be used in anyone’s projects
• Learn about online portals and tools that can ease epigenomics data analysis
Motivation for epigenomic integrative analysis
• Over 98% of the human genome does not encode protein sequences
• 76% of the genome gets transcribed
• Nearly half of the genome is accessible in some way to genetic regulatory proteins such as transcription factors
• Putting in context the information we can obtain on variants, DNA methylation, histone modifications, transcription to RNA, chromatin accessibility, etc. will ease our understanding of the underlying biology
(1) Elgar, G., & Vavouri, T. (2008). Tuning in to the signals: noncoding sequence conservation in vertebrate genomes. Trends in genetics, 24(7), 344-352.
(2) Pennisi, E. (6 September 2012). "ENCODE Project Writes Eulogy for Junk DNA". Science 337 (6099): 1159–1161.
Module Outline
1. Downstream functional analysis tools
2. Working with public datasets
3. Quality control for online resources
4. Online visualization and analysis tools
1 - Downstream functional analysis tools
Downstream functional analysis
• Once primary analysis is done for our epigenomic assay, we have:
  – A set of peak calls for ChIP-Seq assays
  – Methylation levels at CpG sites for WGB-Seq assays
• Next, we can use this data to run some functional analyses by comparing:
  – Different regions from the same dataset
  – Multiple samples of the same group
  – Different groups
Differentially methylated sites
Bock C. Analysing and interpreting DNA methylation data. Nat Rev Genet. 2012 Oct;13(10):705-19.
Methylation downstream analysis
• Identifying differentially methylated regions (DMR) across sample groups (cell types, disease status, etc.)
• Identifying regions of the genome with different methylation patterns
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
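A minimal sketch of the DMR idea on this slide, assuming per-CpG methylation levels (0.0 to 1.0) are already available for every sample. The function names, the 0.25 difference threshold and the 100-base merge gap are illustrative assumptions, not values from the workshop. Dedicated DMR callers apply statistical tests rather than a plain difference of means.

```python
from statistics import mean


def differential_sites(group_a, group_b, min_delta=0.25):
    """Return (position, delta) for CpGs whose group means differ enough.

    group_a and group_b map a CpG position to a list of methylation
    levels, one value per sample in that group.
    """
    hits = []
    for pos in sorted(set(group_a) & set(group_b)):
        delta = mean(group_b[pos]) - mean(group_a[pos])
        if abs(delta) >= min_delta:
            hits.append((pos, round(delta, 3)))
    return hits


def merge_into_regions(hits, max_gap=100):
    """Chain same-direction hits lying within max_gap bases into regions."""
    regions = []
    for pos, delta in hits:
        gained = delta > 0
        if regions and pos - regions[-1][1] <= max_gap and regions[-1][2] == gained:
            regions[-1][1] = pos  # extend the current region
        else:
            regions.append([pos, pos, gained])
    return [(start, end, "hyper" if up else "hypo") for start, end, up in regions]


# Made-up methylation levels for two sample groups (hypothetical data).
normal = {100: [0.1, 0.2], 150: [0.2, 0.2], 900: [0.8, 0.9], 2000: [0.5, 0.5]}
tumour = {100: [0.7, 0.8], 150: [0.9, 0.7], 900: [0.2, 0.3], 2000: [0.5, 0.6]}
print(merge_into_regions(differential_sites(normal, tumour)))
```

Merging neighbouring sites that change in the same direction is one simple way to go from differentially methylated sites to regions.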
What are motifs?
• Short, recurring patterns in DNA that are presumed to have a biological function (1)
• Often indicate sequence-specific binding sites for proteins such as nucleases and transcription factors (TF)
• In this example, if allowing 1 base mismatch, there are two motifs: TTGACA and GCATC:
Example from http://slideplayer.com/slide/8679835/
(1) D'haeseleer, Patrik. "What are DNA sequence motifs?." Nature biotechnology 24.4 (2006): 423-425.
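Matching a motif while allowing one base mismatch can be sketched as a sliding-window comparison. The sequences below are made up for illustration (the slide's own example sequences were not recovered), and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(sequence, motif, max_mismatches=1):
    """Yield (offset, matched_text) for every window close enough to motif."""
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if hamming(window, motif) <= max_mismatches:
            yield start, window


# Hypothetical sequences, not the ones shown on the slide.
for seq in ["ACGTTGACAGG", "CCTTGTCATAA"]:
    print(seq, list(find_motif(seq, "TTGACA")))
```

Real motif finders usually score candidates against a position weight matrix instead of counting mismatches against a single consensus string.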
Exploring motifs in ChIP-seq peaks
• Using regions previously labeled as peaks, we can try to identify motifs
• Identifying transcription factor binding sites (TFBS) is helpful to understand regulatory networks and transcription mechanisms
HOMER
• Tries to identify regulatory elements enriched in one set of sequences compared to another
• Motif discovery algorithm designed for regulatory element analysis in genomics applications (DNA sequences only)
  – Known motifs counting
  – De novo motifs identification
  – Attempts to match de novo motifs to known ones
HOMER
• findMotifsGenome.pl attempts to identify motifs in a provided list of genomic regions
• Input:
  – BED file containing the regions (peaks file)
    • Column 1: chromosome
    • Column 2: starting position
    • Column 3: ending position
    • Column 4: Unique Peak ID
    • Column 5: not used
    • Column 6: Strand (+/- or 0/1, where 0="+", 1="-")
  – Reference genome assembly
  – Size: fragment size to use for motif finding
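The three inputs above can be sketched in Python as a six-column BED writer plus the matching command line. The helper names, the `peak_N` IDs, the `hg19` genome label, the output directory and the size of 200 are assumptions made for illustration. The argument order (peak file, genome, output directory, then `-size`) follows the usage shown in the HOMER documentation.

```python
import io


def write_homer_bed(peaks, handle):
    """Write peaks as the six tab-separated columns listed on the slide."""
    for i, (chrom, start, end, strand) in enumerate(peaks, start=1):
        # Column 4 is a unique peak ID; column 5 is not used, so "." is a placeholder.
        fields = [chrom, str(start), str(end), f"peak_{i}", ".", strand]
        handle.write("\t".join(fields) + "\n")


def homer_command(bed_path, genome, out_dir, size=200):
    """Build the argument list for findMotifsGenome.pl without running it."""
    return ["findMotifsGenome.pl", bed_path, genome, out_dir, "-size", str(size)]


# Hypothetical peaks, written to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_homer_bed([("chr1", 1000, 1200, "+"), ("chr2", 5000, 5200, "-")], buffer)
print(buffer.getvalue(), end="")
print(" ".join(homer_command("peaks.bed", "hg19", "homer_out")))
```

With HOMER and the chosen genome package installed, the list could be handed to `subprocess.run(cmd, check=True)` to launch the analysis.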
HOMER - Execution steps
1. Verify peak/BED file
2. Extract sequences from the genome corresponding to the regions in the input file
3. Calculate GC/CpG content of peak sequences
4. Preparse the genomic sequences of the selected size to serve as background sequences
5. Randomly select background regions for motif discovery
6. Autonormalization of sequence bias
7. Check enrichment of known motifs
8. de novo motif finding
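Step 3 can be illustrated with two small helpers. This is only a sketch: the slides do not say exactly how HOMER defines CpG content, so the dinucleotide fraction used here is an assumption, as are the function names.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_fraction(seq):
    """Fraction of dinucleotide positions that read CG (assumed definition)."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    pairs = len(seq) - 1
    return sum(seq[i:i + 2] == "CG" for i in range(pairs)) / pairs


# Hypothetical peak sequence.
peak = "ACGCGTTAGC"
print(gc_fraction(peak), cpg_fraction(peak))
```

Values like these let background regions be chosen with a sequence composition similar to the peaks, which is the purpose of steps 4 to 6.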
HOMER
• Among generated result files, two HTML-formatted reports will be available:
  – homerResults.html: formatted output of de novo motif finding
  – knownResults.html: formatted output of known motif finding
http://homer.salk.edu/homer/ngs/peakMotifs.html
Looking for significant GO enrichment
• We can look at the biological significance of our peaks using Gene Ontology (GO) term annotations of the genome
  – GO: Set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (1)
• Popular tool: the Genomic Regions Enrichment of Annotations Tool (GREAT)
http://bejerano.stanford.edu/great/public/html/index.php
(1) Gene Ontology Consortium. "The gene ontology project in 2008." Nucleic acids research 36.suppl 1 (2008): D440-D444.
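To give a feel for what "significant enrichment" means, here is a generic gene-based over-representation test (a one-sided hypergeometric test). It is a sketch of the general idea only and is not presented as the statistic GREAT computes. All parameter names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, term_genes, hit_genes, hit_term_genes):
    """P(X >= hit_term_genes) when hit_genes are drawn without replacement.

    total_genes:    genes in the annotated genome
    term_genes:     genes annotated with the GO term
    hit_genes:      genes associated with our peaks
    hit_term_genes: genes in both sets
    """
    upper = min(term_genes, hit_genes)
    favourable = sum(
        comb(term_genes, k) * comb(total_genes - term_genes, hit_genes - k)
        for k in range(hit_term_genes, upper + 1)
    )
    return favourable / comb(total_genes, hit_genes)


# Hypothetical counts: 20,000 genes, 200 carry the term,
# 500 genes lie near peaks and 15 of those carry the term.
print(hypergeom_enrichment_p(20000, 200, 500, 15))
```

In practice many GO terms are tested at once, so p-values like this one need a multiple-testing correction before being interpreted.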
GREAT: Cis-regulatory regions functions prediction
• Binding sites are often not located in the proximal region of the gene of interest
• GREAT looks beyond this proximal region
McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010 May;28(5):495-501.
GREAT
• Input: BED file with regions of interest
• Output: Matching GO terms for Molecular Functions, Biological Processes, Phenotypes, Diseases, etc.
• Example with H3K27ac peaks from bone marrow sample:
http://bejerano.stanford.edu/great/public/html/index.php
Linking GWAS variants to ChIP-Seq data
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
Integrative analysis with Roadmap data
Roadmap Epigenomics Consortium et al. Nature 518, 317-330 (2015) doi:10.1038/nature14248
2 - Working with public datasets
Working with public datasets
• Many large consortia offer datasets for multiple tissues/diseases/conditions
• These are free resources to do more with comparative studies
• Public datasets offer no control over how assays were done, and what information is available
Roadmap Epigenomics Project
http://www.roadmapepigenomics.org/
ENCODE Consortium
https://www.encodeproject.org/
IHEC
• Nowadays, the most recent and complete resource is IHEC, the International Human Epigenome Consortium
• International effort with several funding agencies
http://ihec-epigenomes.org/
What is IHEC
• Goal: Providing standardized reference epigenomes for a variety of normal and disease tissues
  – Member groups take part in committees working on standards (assays, data/metadata distribution, ethics…)
IHEC Data Portal
• Goal: Integrate epigenomic public datasets produced within the International Human Epigenome Consortium
  – Raw data is in controlled access repositories
• As of April 2016:
  – over 7,000 human datasets
  – datasets from 7 consortia, others coming
• Offers tools for datasets discovery, visualization and pre-analysis
Publicly accessible datasets
• Datasets made available in the IHEC Data Portal are publicly accessible for everyone’s own research
• Human data offered by such consortia usually falls in one of two categories:
  – Controlled access data
    • Raw data from sequencers
    • Clinical/sensitive information such as phenotypes
    • Archived at repositories such as EGA and dbGaP
  – Public data
    • Annotation tracks, to use in tools such as UCSC Genome Browser, Ensembl and IGV
    • Some donor, sample and library metadata
    • Freely downloadable
3 - Quality control for online resources
Quality control on epigenomics datasets
• Datasets obtained online are of variable levels of quality
• Quality of downloaded datasets must be assessed
• Examples of quality control tests:
  – Raw data: FastQC
  – Signal:
    • Signal-to-noise ratio
    • ChromImpute
    • Whole-genome signal correlation across tracks
• Some QC tools are available as online resources
  – IHEC Data Portal includes some preliminary quality control tests, such as Pearson Correlation test over whole track signal
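A whole-track correlation check can be sketched as a Pearson coefficient over binned signal values. The example assumes each track has already been summarised into equal-width genomic bins; the binning, the function name and the values are illustrative assumptions, not the portal's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of the same length (at least 2 bins)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / sqrt(var_x * var_y)


# Hypothetical binned signal for two replicates of the same mark.
replicate_1 = [0.0, 1.5, 8.0, 2.0, 0.5, 0.0]
replicate_2 = [0.2, 1.0, 7.5, 2.5, 0.4, 0.1]
print(pearson(replicate_1, replicate_2))
```

Tracks from the same assay and similar cell types are expected to correlate well; a track that correlates poorly with its peers deserves a closer look before it is used.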
ChromImpute
• Allows imputing missing signal tracks
• To impute a sample for a mark, uses training data:
  – from other samples with the same mark
  – from the other marks for the given sample
Ernst J, Kellis M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues. Nature Biotechnology, 33:364-376, 2015.
4 - Online visualization and analysis tools
Online visualization and analysis tools
• Many additional resources are useful for visualizing and manipulating datasets
• In this section, we will cover a few:
  – IHEC Data Portal
  – UCSC Genome Browser
  – WashU Epigenome Browser
  – Galaxy
IHEC Data Portal - Overview
IHEC Data Portal - Data Grid
IHEC Data Portal - Datasets Correlation
IHEC Data Portal - Download
IHEC Data Portal - Coming Soon
• Comprehensive filtering based on available metadata
• Metadata extraction feature in human-readable and machine-readable formats
• Centralized data serving
• Links to permanent sessions, for easier citing and shareability
Visualizing tracks with the UCSC Genome Browser
UCSC Genome Browser Track Hubs
• Tracks can be aggregated using a text document in the UCSC Genome Browser track hub format
• Advantage: Can be easily distributed to collaborators/users of your resources
• Drawback: Need to generate this text document
• Documentation:
  – https://genome.ucsc.edu/goldenpath/help/hgTrackHubHelp.html
Small track hub example

track McGill_MS000101_monocyte_RNASeq_signal_forward
type bigWig
bigDataUrl http://epigenomesportal.ca/public_data/MS000101.monocyte.RNASeq.signal_forward.bigWig
shortLabel 000101mono.rna
longLabel MS000101 | human | monocyte | RNA-Seq | signal_forward

track McGill_MS000101_monocyte_RNASeq_signal_reverse
type bigWig
bigDataUrl http://epigenomesportal.ca/public_data/MS000101.monocyte.RNASeq.signal_reverse.bigWig
shortLabel 000101mono.rna
longLabel MS000101 | human | monocyte | RNA-Seq | signal_reverse

• Minimum properties for a track:
  – track: Symbolic name of the track
  – type: One of the supported formats
    • bigWig, bigBed, bigGenePred, bam, halSnake, vcfTabix
  – bigDataUrl: Web location (URL) of the data file
  – shortLabel: Short track description (Max 17 characters)
  – longLabel: Longer track description (displayed over tracks in the browser)
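Since the track hub text has to be generated, a small script can build each stanza and enforce the constraints listed above. This is a sketch under stated assumptions: the function name and the validation choices are hypothetical, and only the five minimum properties from the slide are written.

```python
SUPPORTED_TYPES = {"bigWig", "bigBed", "bigGenePred", "bam", "halSnake", "vcfTabix"}
MAX_SHORT_LABEL = 17  # limit stated on the slide


def track_stanza(track, track_type, big_data_url, short_label, long_label):
    """Return one track stanza as text, one 'key value' pair per line."""
    if track_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported track type: {track_type}")
    if len(short_label) > MAX_SHORT_LABEL:
        raise ValueError(f"shortLabel exceeds {MAX_SHORT_LABEL} characters")
    lines = [
        f"track {track}",
        f"type {track_type}",
        f"bigDataUrl {big_data_url}",
        f"shortLabel {short_label}",
        f"longLabel {long_label}",
    ]
    return "\n".join(lines)


# Rebuild the forward-strand stanza from the example above.
print(track_stanza(
    "McGill_MS000101_monocyte_RNASeq_signal_forward",
    "bigWig",
    "http://epigenomesportal.ca/public_data/MS000101.monocyte.RNASeq.signal_forward.bigWig",
    "000101mono.rna",
    "MS000101 | human | monocyte | RNA-Seq | signal_forward",
))
```

Stanzas are separated by a blank line in the final document, so several of them can be joined with `"\n\n".join(...)`.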
WashU Epigenome Browser
• Supports many track types included in the UCSC Browser
  – BigBeds are on the way
  – Can also load UCSC track hub documents
http://epigenomegateway.wustl.edu/browser/
Galaxy
• Web-based framework offering a user-friendly interface mapping to most popular bioinformatics tools
  – "Data intensive biology for everyone."
• Allows for reproducible results
  – Steps/parameters kept in history
Galaxy Interface
• Many tools covered in this workshop are available in Galaxy
Galaxy - Pipeline design
• Ability to design custom pipelines and import others’
  – All through a user-friendly GUI
• Tailored for small/medium scale projects with not too many samples
GenAP
• GenAP is a Canadian computing platform for life science researchers
• Leverages CANARIE high-speed network and Compute Canada (CC) High Performance Computing
• Users can create their own private, fully configured Galaxy and run their analyses on Compute Canada HPCs
• Free for Canadian academia
  – All you need is to get a Compute Canada account
GenAP Pipelines
• Free, open-source software written in Python
• Many pipelines available, such as for epigenomics:
  – RNA-Seq
  – RNA-Seq De novo
  – ChIP-Seq
  – Methylation pipeline coming soon
• All software requirements are pre-installed at many Compute Canada HPCs
GenAP Pipelines
https://bitbucket.org/mugqic/mugqic_pipelines
GenAP - Galaxy
• Private Galaxy instance, sharable with collaborators
• Compute jobs making use of group’s CC allocation
  – Faster than usegalaxy.org
GenAP Portal
• Login with your Compute Canada account
GenAP Portal
• You’re then ready to connect to the Portal
GenAP Portal - Preparing Galaxy
• Instantiating a Galaxy application within GenAP
Conclusion
• In this unit, we have covered:
  – Some types of downstream analyses with epigenomic data
  – How to obtain publicly accessible datasets for your own analyses
  – Methods to assess the quality of public data
  – How to visualize epigenomic datasets using online tools
  – Some online resources to run additional analyses with a web interface
• The following workshop will provide an introduction to some of the tools presented in these slides
• After the workshop, if you’re in Canadian Academia, get that Compute Canada / GenAP account! ☺