Post on 20-Mar-2018


Genomic Infrastructure for NGS
The small, medium and large stuff that helps get the job done

Associate Professor Mik Black
Department of Biochemistry, University of Otago

Brief aside: who am I?

· Background in statistics: my (rather diverse and collaborative) research involves the development and application of statistical methods for problems in human disease genomics.
· Heavily involved in the establishment of two government-funded national infrastructure initiatives in New Zealand:
  - NZGL (New Zealand Genomics Ltd) - inter-university collaboration in genomics and bioinformatics.
  - NeSI (NZ eScience Infrastructure) - cross institutional (universities and Crown Research Institutes) collaboration in high performance computing and eResearch.
· Formerly the bioinformatics team leader for NZGL (2012), and still an active team member...

Overview - this talk will cover...

· Large-scale institutional infrastructure: out-sourcing the generation and analysis of genomic data.
· Community infrastructure: shared genomics resources.
· Personal infrastructure: tools and techniques for getting the most out of genomic (or any) data.

But no Game of Thrones...

First though: I'm a statistician - let's generate some data.

Outsourcing your genomics

· Many highly competitive (and high quality) genomics providers exist.
  - Offer various levels of flexibility in terms of the types of data they will/can generate.
  - Provide a variety of services, from basic data generation to "bundled" full-service options which take tissue samples, and produce analytic results.
  - Some providers also provide web- or VM-based analysis options.
· Preferences depend on many things...
  - Your own competence/capacity for handling genomic data
  - Previous experience with a provider
  - Word of mouth
  - Price....

Your experiment: what to think about?

· What is the question you are trying to answer?
· Have you designed an appropriate sequencing experiment?
  - will it provide the answer to your question?
  - is there a better way to get to that answer?
· Once you have the data, what will you do with it?
  - Storage and processing? On-site or off? Space and CPU requirements?
  - Do you have access to the skills/resources required to complete the analysis task?

Ask (and answer) these questions BEFORE you generate the data
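The "space requirements" question can be roughed out before any data exists. The sketch below is only a back-of-envelope estimate in Python, not a provider's sizing tool: the read-name length, the compression ratio and the multiplier for intermediate files are all assumed values, and the function names are made up for illustration.

```python
def fastq_bytes_per_read(read_length: int, header_length: int = 60) -> int:
    """Uncompressed size of one FASTQ record (four lines).

    header_length is an ASSUMED typical read-name length, excluding '@'.
    """
    header_line = 1 + header_length + 1   # '@', read name, newline
    sequence_line = read_length + 1       # bases, newline
    separator_line = 2                    # '+', newline
    quality_line = read_length + 1        # one quality character per base, newline
    return header_line + sequence_line + separator_line + quality_line


def estimate_storage_gb(n_fragments: int, read_length: int, paired: bool = True,
                        compression_ratio: float = 0.3,
                        working_copies: int = 3) -> dict:
    """Rough storage estimate in GB (1 GB = 1e9 bytes).

    compression_ratio and working_copies are ASSUMPTIONS: the first is a
    guess at gzip compression, the second a crude multiplier to allow for
    intermediate files such as trimmed reads and alignments.
    """
    reads = n_fragments * (2 if paired else 1)
    raw_gb = reads * fastq_bytes_per_read(read_length) / 1e9
    compressed_gb = raw_gb * compression_ratio
    return {
        "raw_gb": raw_gb,
        "compressed_gb": compressed_gb,
        "working_gb": compressed_gb * working_copies,
    }


# Hypothetical experiment: 50 million read pairs at 100 bp per read.
estimate = estimate_storage_gb(n_fragments=50_000_000, read_length=100)
for label, size in estimate.items():
    print(f"{label}: {size:.1f} GB")
```

The point is not the exact numbers, but having an order-of-magnitude figure to discuss with your provider and your IT people before the data arrives.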

Outsourcing your bioinformatics

· What are you outsourcing?
  - Quality assessment and basic bioinformatics?
  - Generic data analysis?
  - Domain-specific analysis?
  - Tailored analysis for your specific question?
· Make sure a full analysis plan is in place before committing to the work.
  - If possible, have the plan inspected by an independent "expert".

DON'T underestimate the value that an expert team of bioinformaticians can bring to your project, but DO make sure you know what you will be getting from them (and the cost...)

Outsourcing your bioinformatics

· Quality assessment: this will usually be provided with the data. Don't be afraid to ask for more information (and even more QA).
· Basic bioinformatics: e.g., quality trimming/filtering and alignment to a reference genome - make sure you are very clear about what you want (if you know): trim/filter parameters, genome build, organism (!), aligner, parameters...
· Generic data analysis: e.g., variant calling, differential expression etc.
· Tailored analysis for your specific question: this requires the most specification (and input from you) and should involve a provider with a background in this area.
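One way to be "very clear about what you want" is to write the request down in a form where gaps are obvious. The Python sketch below is a hypothetical checklist, not any provider's order form: the class name, field names and example values are illustrative assumptions, and the values shown are not recommendations.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AlignmentRequest:
    """Items named above that should be agreed with the provider.

    Every field defaults to None, meaning "not yet specified".
    """
    organism: Optional[str] = None
    genome_build: Optional[str] = None
    aligner: Optional[str] = None
    aligner_parameters: Optional[str] = None
    min_base_quality: Optional[int] = None   # trim/filter parameter
    min_read_length: Optional[int] = None    # trim/filter parameter


def unspecified(request: AlignmentRequest) -> List[str]:
    """Return the names of fields that are still unspecified."""
    return [f.name for f in fields(request) if getattr(request, f.name) is None]


# Example values only - chosen to show the check, not as advice.
request = AlignmentRequest(
    organism="Homo sapiens",
    genome_build="GRCh37",
    aligner="BWA-MEM",
    min_base_quality=20,
)
print("Still to agree with provider:", unspecified(request))
```

Anything left in that list is a decision someone else will make for you, so it is worth asking about before the work starts.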

ALWAYS specify that you require the code used to perform the analysis - you need to know what was done every step of the way
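Knowing "what was done every step of the way" is easier if each step is logged alongside the code. The following is a minimal Python sketch of such a log, assuming nothing about how any particular provider works: the function names are made up, and the command string (`trim_reads ...`) is a hypothetical placeholder rather than a real tool.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_step(log, name, command, input_paths):
    """Append one analysis step (command, input checksums, time) to the log."""
    log.append({
        "step": name,
        "command": command,
        "inputs": {path: sha256_of_file(path) for path in input_paths},
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
    })
    return log


# Demo with a tiny made-up FASTQ file written to a temporary location.
demo_content = b"@read1\nACGT\n+\nIIII\n"
with tempfile.NamedTemporaryFile(suffix=".fastq", delete=False) as handle:
    handle.write(demo_content)
    demo_path = handle.name

log = []
# HYPOTHETICAL command string, recorded verbatim for later inspection.
record_step(log, "quality_trim", "trim_reads --min-quality 20 sample.fastq", [demo_path])
print(json.dumps(log, indent=2))
os.remove(demo_path)
```

A log like this does not replace the code itself, but it makes it much easier to check that the code you were given matches what was actually run on your data.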

Community resources: what is available?

· Databases/Browsers (the big players)
  - NCBI (http://www.ncbi.nlm.nih.gov/)
  - Ensembl (http://www.ensembl.org/)
  - UCSC (https://genome.ucsc.edu/)
  - MANY domain-specific options (check Nat Gen annual DB issue)
· Software tools:
  - GenomeSpace (Galaxy, GenePattern, Cytoscape,...)
  - R/Bioconductor
  - That scary command line thing....
· Generic data sources:
  - GEO, ArrayExpress, inSilicoDB, SRA, dbGaP, EGA,....

GenePattern: web-based analysis platform
http://www.broadinstitute.org/cancer/software/genepattern/

Galaxy: web-based analysis platform
https://usegalaxy.org/

GenomeSpace: joining up the cool stuff...
http://www.genomespace.org/

Bioconductor: where they get the cool stuff...
http://bioconductor.org/

Local computational resources

· Quite local:
  - NCI (HPC for bioinformatics): http://www.ncisf.org
  - Genomics Virtual Lab: http://genome.edu.au
  - NeCTAR (HPC/eResearch resources): http://nectar.org.au
· Local(ish):
  - NZGL (customized environment): http://nzgenomics.org.nz
  - NeSI ("raw" HPC): http://nesi.org.nz

Training - "I'm sure I can do it better..."

· Most researchers aren't looking to out-source the investigative component of their research.
  - Data analysis is a fundamental part of scientific investigation.
  - Many investigators want to "own" the entire data processing/analysis process, others don't.
· Training: there are MANY opportunities available for up-skilling:
  - Bioplatforms Australia: http://www.bioplatforms.com.au
  - Institute based (e.g., IMB Winter School...)
  - NZGL: http://nzgenomics.co.nz
  - Online courses (what to choose...??):
  - FREE R/Bioconductor material: http://bioconductor.org

Bioconductor course material
http://bioconductor.org/help/course-materials/2014/

Training - know what you need

· Bioinformatics is a VERY broad field: what is it that you want to learn?
  - Early-stage analysis: QA/QC, alignment
  - More specialized: variant calling (SNPs, CNV, other SV), assembly, RNA-seq count generation, metagenomics...
  - Further downstream: analysis of "processed" data (clustering, prediction, pathways, network reconstruction, phylogeny...)
· Specialization is a GOOD thing, if you can afford it
  - it's great to be a jack-of-all-trades, but the "master-of-none" trade-off can be a problem.
  - makes sense to invest your time where it will be most effective: learn the skills most relevant to what you are trying to accomplish with your research.

Training - "I'm sure I can do it better..."

DO take advantage of training opportunities, but DON'T overestimate what is being provided.

There is only so much we can teach in a few days...

Learn "enough to be dangerous", and then find a good "bioinformatics buddy" to keep you from going astray - know your limitations.

Reproducible research

· We are currently (I hope) in the midst of a "reproducibility revolution"
  - increased emphasis on sharing all aspects of our research.
  - strong emphasis on the use (and development) of open source tools that build on existing frameworks.
  - move (by many) towards the use of frameworks for ensuring that we are doing "reproducible research".
· The R computing environment provides a good example of this, but there are a number of others (e.g., IPython notebooks).
  - RStudio (http://rstudio.com) includes R markdown (v2) by default.
  - Facilitates the production of high-quality output (HTML, PDF, even Word!) with embedded analysis and results.

RStudio interface
http://rstudio.com

R markdown output
http://rmarkdown.rstudio.com/
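The idea of "embedded analysis and results" is that every number in the report is produced by code, never typed in by hand. The sketch below imitates that idea in plain Python; it is a stand-in for illustration, not R markdown itself. The simulated data, group means, seed and function names are all assumptions.

```python
import random
import statistics


def simulate_expression(seed, n_samples=10):
    """Generate two groups of made-up expression values from a fixed seed."""
    rng = random.Random(seed)  # private generator: same seed, same data
    control = [rng.gauss(10.0, 1.0) for _ in range(n_samples)]
    treated = [rng.gauss(11.0, 1.0) for _ in range(n_samples)]
    return control, treated


def build_report(seed):
    """Run the analysis and return a text report with the results embedded."""
    control, treated = simulate_expression(seed)
    difference = statistics.mean(treated) - statistics.mean(control)
    lines = [
        "Simulated two-group comparison",
        f"Seed: {seed}",
        f"Samples per group: {len(control)}",
        f"Mean difference (treated - control): {difference:.3f}",
    ]
    return "\n".join(lines)


# Re-running with the same seed regenerates exactly the same report.
report = build_report(seed=2014)
print(report)
```

Because the seed and the code are both recorded, anyone can regenerate the report and get the same numbers, which is the property the tools above are designed to give you for real analyses.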

Tools for collaboration

· GenomeSpace and the platforms it connects (e.g., Galaxy and GenePattern) provide domain-specific tools for the collaborative sharing of data and analyses.
· A number of groups combine cloud-based tools in an ad hoc fashion to generate a collaborative research environment:
  - storage provision (e.g., Dropbox, Google Drive)
  - code sharing/editing (e.g., GitHub, Bitbucket)
  - reproducible research (e.g., rmarkdown, IPython notebooks)
  - shared/collaborative web-based analysis (e.g., RStudio Server, Shiny Server).

Although seemingly haphazard, this approach provides a lot of flexibility for incorporating new tools as they emerge.

Summary

· Large-scale infrastructure - know what you are getting:
  - Clearly define plans and expectations in terms of the data and analysis that you are paying for.
  - Ensure you have the resources needed to complete the project.
· Community infrastructure - know what is available:
  - Generic and domain-specific resources exist which can facilitate, streamline and complement your research.
· Personal infrastructure - know what you are doing:
  - Know your tools, and develop (and follow!) an analysis plan.
  - The "reproducible research" paradigm offers a valuable set of resources to help ensure reproducibility.

In closing

These slides are not meant to contain a comprehensive summary of the issues and opportunities surrounding genomics and bioinformatics infrastructure.

If you still have questions after this session, please feel free to come and have a chat with me over the next couple of days (or email):

mik.black@otago.ac.nz

A (non-exhaustive) list of useful local links:

· Australia:
  - QFAB*: http://qfab.org
  - AGRF*: http://agrf.org.au
  - Aus. Bioinformatics Network*: http://australianbioinformatics.net
  - Bioplatforms Australia: http://bioplatforms.com.au
  - Ensembl/EMBL resources (local)*: http://braembl.org.au
· New Zealand:
  - NZGL: http://nzgenomics.co.nz
  - NZ Bioinformatics Institute: http://www.bioinformatics.org.nz
  - NeSI: http://nesi.org.nz

* These guys are all giving presentations this week...
