
The Genomic HyperBrowser

Statistical genome analysis made accessible and reproducible

Sveinung Gundersen

Elixir.no, UiO


Credit

• Based on a presentation given by Assoc. Prof. Geir Kjetil Sandve at a meeting in Oxford, May 7th, 2013


Focus

• Downstream analysis of high-level genome-scale data

• You want to compare your data with existing data collections

• But:

• How do we find the questions these collections can answer?

• How do we go about answering questions at this scale?


Outline

• A bioinformatician’s view on genomics

• Analyzing genomic track data

• Under the hood of the analysis tools

• A quick tour of HyperBrowser features


What is a reference genome?

• It’s a bunch of sequence

• The human genome is a collection of ~3 billion nucleotides

• It’s a map!

• Where sequences belong in relation to each other

• Essentially, it makes up a line


The whiteboard and the computer file

The reference genome acts as a coordinate system for genomic data

chr21	10079666	10120808	NM_001187
chr21	13332357	13412442	NR_026916
chr21	13700575	13700652	NR_036164
chr21	13904368	13935777	NM_174981
chr21	14137324	14142556	NR_026755
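The records above look like BED lines: chromosome, start, end and name, separated by tabs. As a minimal sketch, assuming that four-column layout and BED's 0-based, half-open coordinates, each record can be read as an interval on the reference coordinate system. The `Region` and `parse_bed` names are illustrative only:

```python
from typing import NamedTuple


class Region(NamedTuple):
    """One BED-style record: the half-open interval [start, end) on a chromosome."""
    chrom: str
    start: int
    end: int
    name: str


def parse_bed(lines):
    """Parse tab-separated lines with at least four columns (chrom, start, end, name)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and UCSC-style comment/header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        regions.append(Region(chrom, int(start), int(end), name))
    return regions


bed_text = """chr21\t10079666\t10120808\tNM_001187
chr21\t13332357\t13412442\tNR_026916
chr21\t13700575\t13700652\tNR_036164"""

for r in parse_bed(bed_text.splitlines()):
    # With half-open coordinates, the length is simply end - start
    print(r.chrom, r.start, r.end, r.name, "length:", r.end - r.start)
```

Because every record is just a position on the same line, two tracks can be compared directly by their coordinates without looking at the underlying sequence.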


DNA as a line

• This is indeed the dynamic perspective!

• DNA doesn’t change that much from hour to hour, or cell to cell

• But a lot happens along the DNA: binding by transcription factors (TFs), modifications of histones, ...

• Even for gene expression or SNPs we can usually abstract away from the underlying sequence

• Functional genomics typically refers to the genome as a line (map), not as sequence


Public data: ENCODE, FANTOM, GEO, Roadmap Epigenomics, ...

• By now, Big Science provides:

• Chromatin accessibility (DHSs) for ~350 cell samples

• Binding of ~100 TFs in several cell types

• Most histone modifications in several cell types

• Gene expression for thousands of experimental setups

• Transcription start sites (TSSs) and active promoters in ~950 cell samples

• DNA methylation, 3D genome structure, ...


Outline

• A bioinformatician’s view on genomics

• Analyzing genomic track data

• Under the hood of the analysis tools

• A quick tour of HyperBrowser features


Exploiting the data

• Data is becoming less of a bottleneck

• With so much public data, some is likely to be relevant

• Producing large amounts of new data is often within reach

• But asking the right questions is still tricky

• Forming interesting hypotheses is no easier than before

• The large scale complicates analysis


This can’t be it?!


Cell types and regions associated with multiple sclerosis (MS)

• Regions of the genome are not always active

• Activity varies, e.g., between cell types

• This is due to, e.g., modification of histones

• In which cell types are MS-associated regions active?


Cell-type specific activity of MS regions

• MS GWAS SNP locations along the genome

• Histone modification-derived chromatin states along the genome, in 9 cell types

• Derived from ENCODE data (Nature, 473, 43–49)

• Are regions around MS GWAS SNPs unexpectedly active in B cells (GM12878)?
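The slide does not say how the "regions around" each SNP were defined. A minimal sketch, assuming a fixed flank on each side of every SNP (the 100 bp below and the SNP positions are hypothetical values) and 0-based positions, turns the SNP point track into a segment track and merges windows that overlap:

```python
def flank_points(positions, flank, chrom_length):
    """Turn 0-based point positions into merged half-open segments.

    Each point p becomes [p - flank, p + flank + 1), clipped to the chromosome.
    Overlapping or touching windows are merged into a single segment.
    """
    windows = sorted(
        (max(0, p - flank), min(chrom_length, p + flank + 1)) for p in positions
    )
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Extend the previous segment instead of adding an overlapping one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


snps = [1_000, 1_050, 5_000]  # hypothetical SNP positions on one chromosome
print(flank_points(snps, flank=100, chrom_length=10_000))
# → [(900, 1151), (4900, 5101)]
```

Merging keeps the resulting segments non-overlapping, so two nearby SNPs do not count the same bases twice in a later overlap analysis.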


A simple approach

• Do MS regions overlap more than expected with B-cell AP regions?

• But this is really a bit too simple
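One way to make "more than expected" concrete is to compare the observed overlap with the number of shared bases expected if one track were placed uniformly at random along a genome of length G. In that case the expected overlap is |A|·|B|/G, where edge effects are ignored. The sketch below assumes this uniform model, and all numbers in it are hypothetical:

```python
def expected_overlap_bp(cov_a, cov_b, genome_len):
    """Expected shared bases if one track is placed uniformly at random (edge effects ignored)."""
    return cov_a * cov_b / genome_len


def enrichment(observed_bp, cov_a, cov_b, genome_len):
    """Observed / expected overlap; values above 1 mean more overlap than under the uniform model."""
    return observed_bp / expected_overlap_bp(cov_a, cov_b, genome_len)


# Hypothetical numbers: 1 Mbp of MS regions, 60 Mbp of B-cell active regions,
# a 3,000 Mbp genome and 80 kbp of observed overlap.
print(enrichment(80_000, 1_000_000, 60_000_000, 3_000_000_000))  # → 4.0
```

The uniform model is one reason the approach is too simple. If both tracks tend to fall in gene-rich regions, some excess overlap would be expected even without any specific link between MS and B cells.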


A more reasonable approach (and still quite straightforward)

• Do MS regions overlap unexpectedly more with B-cell regions than with, e.g., stem cell regions?

• Yes!

• (“Genomic regions associated with multiple sclerosis are active in B cells”, PLoS One. 2012;7(3))


Outline

• A bioinformatician’s view on genomics

• Analyzing genomic track data

• Under the hood of the analysis tools

• A quick tour of HyperBrowser features


Delineating basic types of genomic tracks

Points

Segments

Function

Bins


Track types: 7 basic track types

Genome Partition (GP)

Step Function (SF)

Function (F)

Points (P)

Segments (S)

Valued Points (VP)

Valued Segments (VS)


Track types: 8 advanced track types

Linked Points (LP)

Linked Segments (LS)

Linked Genome Partition (LGP)

Linked Valued Points (LVP)

Linked Valued Segments (LVS)

Linked Step Function (LSF)

Linked Base Pairs (LBP)

Linked Function (LF)
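The 15 types above can be read as combinations of four properties:

- whether track elements are separated by gaps
- whether they have lengths
- whether they carry values
- whether they are linked

We believe this reading is in line with the BMC Bioinformatics 2011 paper cited later in these slides, but it is stated here as an assumption. The mapping below is a hypothetical sketch of that reading, not HyperBrowser code. Under it, the combination with no gaps, lengths or values (bare base pairs) only becomes a usable type when linked, which gives LBP:

```python
from typing import Optional

# Assumed encoding: (has_gaps, has_lengths, has_values) -> basic type abbreviation
BASIC_TYPES = {
    (True, False, False): "P",    # Points
    (True, False, True): "VP",    # Valued Points
    (True, True, False): "S",     # Segments
    (True, True, True): "VS",     # Valued Segments
    (False, True, False): "GP",   # Genome Partition
    (False, True, True): "SF",    # Step Function
    (False, False, True): "F",    # Function: a value at every base pair
    (False, False, False): "BP",  # Base Pairs: only meaningful when linked
}


def track_type(has_gaps: bool, has_lengths: bool, has_values: bool,
               linked: bool = False) -> Optional[str]:
    """Return the type abbreviation, or None for unlinked bare base pairs."""
    base = BASIC_TYPES[(has_gaps, has_lengths, has_values)]
    if linked:
        return "L" + base
    return None if base == "BP" else base


basic_types = sorted(t for t in (track_type(*k) for k in BASIC_TYPES) if t)
linked_types = sorted(track_type(*k, linked=True) for k in BASIC_TYPES)
print(len(basic_types), basic_types)    # the 7 basic types
print(len(linked_types), linked_types)  # the 8 advanced (linked) types
```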


S-S Overlap
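The slide only names the comparison between two segment (S) tracks. As a sketch, assume both tracks are given as sorted, non-overlapping, half-open (start, end) pairs on one chromosome. The overlap in base pairs can then be counted in a single merge-style sweep. Counting overlapping segments instead of base pairs would be an equally valid choice of statistic:

```python
def overlap_bp(a, b):
    """Bases covered by both tracks.

    a and b are sorted lists of non-overlapping half-open (start, end) segments.
    """
    i = j = total = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance past whichever segment ends first; it cannot overlap anything later
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


print(overlap_bp([(0, 10), (20, 30)], [(5, 25)]))  # → 10
```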


The troubling random nature

• Counting overlap is straightforward

• But statistical testing requires a model of what random data would look like

• “The multitudes of possible genomes that evolution might have produced for our and other species”

• We must find something that is reasonable enough

• Does the appropriate randomness match the assumptions of the statistical tests we use?
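The slides do not fix a particular randomization. One simple choice, assumed here for illustration, is a Monte Carlo test with three steps:

1. Keep track B fixed.
2. Move each segment of track A to a uniformly random position while preserving its length.
3. Count how often the random overlap is at least as large as the observed one.

Covered bases are stored as bits of a Python int, so the overlap is a bitwise AND. The genome length and segments below are hypothetical toy values:

```python
import random


def coverage_bits(segments):
    """Bit i is set if base i is covered by any half-open (start, end) segment."""
    bits = 0
    for start, end in segments:
        bits |= ((1 << (end - start)) - 1) << start
    return bits


def overlap_bp(a, b):
    """Bases covered by both tracks; segments within a track may overlap."""
    return bin(coverage_bits(a) & coverage_bits(b)).count("1")


def random_placement(segments, genome_len, rng):
    """Keep each segment's length; draw a new uniform start so it fits in the genome."""
    placed = []
    for start, end in segments:
        length = end - start
        new_start = rng.randrange(genome_len - length + 1)
        placed.append((new_start, new_start + length))
    return placed


def monte_carlo_p_value(a, b, genome_len, n_samples=999, seed=0):
    """One-sided p-value: does A overlap B more than under random placement of A?"""
    rng = random.Random(seed)
    observed = overlap_bp(a, b)
    as_extreme = sum(
        1
        for _ in range(n_samples)
        if overlap_bp(random_placement(a, genome_len, rng), b) >= observed
    )
    # The +1 terms count the observed data as one draw from the null
    return observed, (as_extreme + 1) / (n_samples + 1)


track_b = [(100, 200), (500, 600)]
track_a = [(110, 190), (510, 590)]
print(monte_carlo_p_value(track_a, track_b, genome_len=10_000))
```

Uniform placement over the whole genome is exactly the kind of assumption the slide questions. It ignores, for example, that real elements avoid assembly gaps and tend to cluster, so a more realistic null model may be needed.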


Tracing assumptions

• Textbook Wilcoxon signed-rank H0:

• Values (4) are independent and symmetric around 0

• But what is assumed about the genomic track data?
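The textbook H0 enters the test in a specific way. If the values are independent and symmetric around 0, every pattern of signs on the observed magnitudes is equally likely. The exact one-sided signed-rank p-value can therefore be computed by brute force over all sign patterns. This minimal sketch assumes no zeros and no tied magnitudes, which the code does not handle, and uses four hypothetical values:

```python
from itertools import product


def signed_rank_statistic(values):
    """W+: sum of the ranks of |v| over the positive values (no zeros or ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    rank = {index: r + 1 for r, index in enumerate(order)}
    return sum(rank[i] for i, v in enumerate(values) if v > 0)


def exact_signed_rank_p(values):
    """One-sided P(W+ >= observed) under H0, enumerating all 2^n sign patterns."""
    observed = signed_rank_statistic(values)
    magnitudes = [abs(v) for v in values]
    patterns = list(product((1, -1), repeat=len(values)))
    at_least = 0
    for signs in patterns:
        # Under H0 each sign flip of the observed magnitudes is equally likely
        flipped = [s * m for s, m in zip(signs, magnitudes)]
        if signed_rank_statistic(flipped) >= observed:
            at_least += 1
    return at_least / len(patterns)


print(exact_signed_rank_p([1.3, 2.1, 0.4, 3.0]))   # 1/16 = 0.0625
print(exact_signed_rank_p([1.3, -2.1, 0.4, 3.0]))  # 5/16 = 0.3125
```

Neighboring measurements along the genome are often correlated. When they are, the independence part of this H0 fails, and the enumerated null no longer describes the data.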


A grammar for null models

• Specifying assumptions:

• Which of the tracks should be randomized?

• Which properties should still be preserved?

• How should track elements be randomized?

• Computing p-values according to the model

• Exact/asymptotic test if the assumptions match

• Monte Carlo with explicit randomization if needed
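The three specification questions above can be captured in a small specification object. The sketch below is hypothetical: the class, field names and option strings are illustrative and are not the HyperBrowser's own API. It shows two ways of randomizing a segment track that preserve different properties:

- segment lengths only
- segment lengths together with the lengths of the gaps between segments

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NullModel:
    """Hypothetical null-model spec: which track to randomize and what to preserve."""
    randomize: str  # "A" or "B"; the other track is kept fixed
    preserve: str   # "lengths" or "lengths_and_gaps"


def randomize_segments(segments, genome_len, preserve, rng):
    """Return a randomized copy of non-overlapping half-open (start, end) segments."""
    segs = sorted(segments)
    lengths = [end - start for start, end in segs]
    if preserve == "lengths":
        # Independent uniform start per segment; the results may overlap each other
        out = []
        for length in lengths:
            start = rng.randrange(genome_len - length + 1)
            out.append((start, start + length))
        return sorted(out)
    if preserve == "lengths_and_gaps":
        # Gaps: before the first segment, between segments, and after the last one
        gaps = [segs[0][0]]
        gaps += [segs[k + 1][0] - segs[k][1] for k in range(len(segs) - 1)]
        gaps.append(genome_len - segs[-1][1])
        rng.shuffle(lengths)
        rng.shuffle(gaps)
        # Lay segments and gaps out end to end: no overlaps, same total coverage
        out, pos = [], gaps[0]
        for length, gap_after in zip(lengths, gaps[1:]):
            out.append((pos, pos + length))
            pos += length + gap_after
        return out
    raise ValueError(f"unknown preserve option: {preserve!r}")


def sample_null(model, track_a, track_b, genome_len, rng):
    """Draw one (A, B) pair under the null model, randomizing only the chosen track."""
    if model.randomize == "A":
        return randomize_segments(track_a, genome_len, model.preserve, rng), track_b
    if model.randomize == "B":
        return track_a, randomize_segments(track_b, genome_len, model.preserve, rng)
    raise ValueError(f"unknown track: {model.randomize!r}")


rng = random.Random(1)
model = NullModel(randomize="B", preserve="lengths_and_gaps")
a = [(100, 200), (500, 600)]
b = [(120, 180), (300, 350), (900, 1000)]
print(sample_null(model, a, b, genome_len=1_000, rng=rng))
```

Repeating `sample_null` many times and recomputing a statistic such as base-pair overlap gives the Monte Carlo route. When the chosen model matches a textbook test's assumptions, an exact or asymptotic test can be used instead.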


Outline

• A bioinformatician’s view on genomics

• Analyzing genomic track data

• Under the hood of the analysis tools

• A quick tour of HyperBrowser features


HyperBrowser tool overview (figure)

Data preparation:

• External track collection (UCSC, ENCODE)

• Galaxy history data

• Spreadsheet / tabular files: Format & convert (Table 3, 2 tools)

• HB track repository (Table 3), Extract track tool

• Generate tracks (Table 3, 6 tools)

Data customization:

• Customize tracks (Table 3, 4 tools)

• Result: tracks suitable for analysis, in a basic track representation

Analysis:

• Descriptive statistics (Table 1, Analyze genomic tracks): statistics on tracks and relations

• Hypothesis testing (Table 1, Analyze genomic tracks): hypotheses supported by data

• Visualization (Table 2, 5 tools): explorative plots of tracks and relations

• Clustering analysis (Table 2, Cluster tracks): unsupervised subgrouping of tracks

• 3D analysis (Table 2, Analyze spatial co-localization): hypotheses on 3D co-localization supported by data


Current focus

• Main focus:

• Simple fetching of genomic track collections from public sources

• Handling of multi-track collections and their analysis

• Better integration of HyperBrowser with NeLS and TSD


Future directions

• Analyzing promoter-enhancer interactions, taking high-resolution chromosome conformation data into account

• Better handling of phenotype information (for pharmacology collaboration projects)

• Several other collaboration projects


Publications

• Core statistical analysis system

• “The Genomic HyperBrowser: inferential genomics at the sequence level” (Genome Biology, 2010)

• “The Genomic HyperBrowser: an analysis web server for genome-scale data” (Nucleic Acids Research, 2013)

• Types of genomic tracks

• “Identifying elemental genomic track types and representing them uniformly” (BMC Bioinformatics, 2011)


Publications

• Google maps of many-to-many analyses

• “The differential disease regulome” (BMC Genomics, 2011)

• 3D genome structure analysis

• “Handling realistic assumptions in hypothesis testing of 3D co-localization of genomic elements” (Nucleic Acids Research, 2013)


The team

Knut Liestøl

Eivind Tøstesen

Sigve Nakken

Halfdan Rydbeck

Geir Kjetil Sandve

Trevor Clancy

Fang Liu

Sveinung Gundersen

Ingrid K.

Lars

Arnoldo Frigessi

Eivind Hovig

Morten Johansen

Marit Holden

Vegard Nygaard

Egil Ferkingstad

2008

2012


Support


Conclusion

• If you want to do genome analysis and don’t want to reinvent the wheel:

• Google “HyperBrowser” and try out the web system

• Search PubMed for “HyperBrowser” and skim the 2013 NAR article