Simon Cockell Newcastle University Bioinformatics Support Unit http://bsu.ncl.ac.uk/ [email protected] Bioinformatics Tools for Epigenetics Studies


Post on 16-Mar-2018


Simon Cockell

Newcastle University

Bioinformatics Support Unit

http://bsu.ncl.ac.uk/

[email protected]

Bioinformatics Tools for Epigenetics Studies


Introduction

• Genome-wide studies still in relative infancy

• Tools landscape still developing

• Many choices for each type of data

– Not always clear which is ‘best’, or what ‘best’ even means

• Few tools are user-friendly


A word about R...

• R is a programming language

• Focused on statistics

• Extremely powerful for data analysis and visualisation

• Even programmers find it opaque

• Extensive support for bioinformatics applications through “Bioconductor”

http://www.r-project.org


Bioconductor

• R packages for Bioinformatics

• 554 (and counting) packages for analysis of:

– Microarrays

– High Throughput Assays

– Sequence Data

– Annotation, and more... (including epigenetics)

• Inherits all the user-unfriendliness of R

http://bioconductor.org


Bioconductor for Epigenetics

• Packages for array and sequencing analysis

• Certain data types well supported

– Illumina 450k methylation data

– Nimblegen tiling arrays

– Many sequencing formats

• Pipeline solutions possible without having to chain many different tools


Bioconductor


Microarray Approaches

• A microarray is just a collection of probes on a solid surface

• Any system that produces differential amounts of DNA that bind to these probes can be analysed

• Typically gene expression but increasingly:

– SNP detection

– Copy Number

– Protein Binding (ChIP-on-chip)

– Methylation (MeDIP-chip)


MeDIP-Chip


MeDIP-chip tools

• Much in common with expression microarray analysis

• Inherits normalisation (within and between chips)

• Enrichment analysis is specialised

– Needs to account for GC content of the genomic region
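To make the GC-content point concrete, here is a minimal sketch of how the GC fraction and the observed/expected CpG ratio of a probe region might be computed. The function names are illustrative and do not come from any MeDIP-chip package; real pipelines would read sequence from a genome file rather than a string.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Hypothetical probe region, used only for illustration.
region = "CGCGATATCGCG"
print(gc_content(region), cpg_obs_exp(region))
```

A GC-rich, CpG-dense region like this one would be expected to pull down more DNA in a MeDIP experiment at the same methylation level, which is why raw enrichment cannot be compared directly across regions.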


MeDIP-chip - BATMAN

• BAyesian Tool for Methylation ANalysis

• Adjusts probe log2-ratios using coupling factor based on CpG density

• More reliable enrichment calculation than simple peak-calling

• Very difficult to use

• Can also deal with MeDIP-Seq data

Down et al. Nature Biotechnology 26, 779 - 785 (2008)
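As a rough illustration of the coupling-factor idea, the sketch below computes an expected probe log2-ratio as a baseline plus a response term scaled by the coupling-weighted methylation of nearby CpGs. This is a toy forward calculation only, loosely following the description in Down et al.; Batman itself infers methylation states by Bayesian sampling, which is not shown. The linear distance weighting in `toy_coupling`, the 500 bp cutoff and all parameter values are assumptions made for the example.

```python
def toy_coupling(probe_pos, cpg_positions, max_dist=500):
    """Hypothetical coupling factors: weight falls linearly with distance.

    The real coupling factor depends on the fragment-length distribution;
    this linear decay is an assumption for illustration only.
    """
    return [max(0.0, 1.0 - abs(pos - probe_pos) / max_dist)
            for pos in cpg_positions]


def expected_log_ratio(coupling, methylation, base=0.0, response=1.0):
    """Expected probe signal: base + response * sum(coupling_c * methylation_c)."""
    if len(coupling) != len(methylation):
        raise ValueError("one coupling factor is needed per CpG")
    weighted = sum(c * m for c, m in zip(coupling, methylation))
    return base + response * weighted


# Three CpGs near a probe at position 1000, with assumed methylation levels.
coupling = toy_coupling(1000, [1000, 1250, 2000])
print(expected_log_ratio(coupling, [1.0, 0.5, 1.0], base=0.1, response=2.0))
```

The point of the sketch is that a distant CpG contributes nothing to the probe signal however methylated it is, while a probe in a CpG-dense region gives a stronger signal at the same methylation level.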


MeDIP-chip - BATMAN

“Batman is not a piece of software; it is an algorithm performed using the command prompt. As such it is not especially user-friendly and is quite a computationally technical process.”

http://en.wikipedia.org/wiki/Bayesian_tool_for_methylation_analysis


MeDIP-chip - MEDME

• Bioconductor package

• Requires a calibration experiment generating ‘fully-methylated’ DNA

– Can be simulated (though not obvious how...)

• MeDIP enrichment is not linearly correlated to the methylation level

• Can be modeled as a sigmoidal function of the log2(mCpG)

• Use this model to predict absolute and relative methylation levels from MeDIP data

Pelizzola et al. Genome Res. 18: 1652-1659 (2008)
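The sigmoidal relationship can be sketched with a four-parameter logistic curve and its inverse: the forward function maps log2(mCpG) to MeDIP enrichment, and the inverse recovers log2(mCpG) from an observed enrichment. This is not MEDME's code or API; the parameter names (`bottom`, `top`, `midpoint`, `slope`) and values are hypothetical, and in practice they would be fitted to the calibration experiment.

```python
import math


def logistic4(x, bottom, top, midpoint, slope):
    """Enrichment predicted for x = log2(mCpG) by a 4-parameter logistic."""
    return bottom + (top - bottom) / (1.0 + math.exp(-slope * (x - midpoint)))


def inverse_logistic4(y, bottom, top, midpoint, slope):
    """log2(mCpG) implied by enrichment y; only defined for bottom < y < top."""
    if not bottom < y < top:
        raise ValueError("enrichment lies outside the range of the fitted curve")
    frac = (y - bottom) / (top - bottom)
    return midpoint - math.log(1.0 / frac - 1.0) / slope


# Hypothetical fitted parameters, for illustration only.
params = dict(bottom=0.0, top=4.0, midpoint=2.0, slope=1.5)
enrichment = logistic4(3.0, **params)
log2_mcpg = inverse_logistic4(enrichment, **params)
print(enrichment, 2 ** log2_mcpg)
```

Because the curve flattens at both ends, enrichment values close to `bottom` or `top` map to very uncertain methylation estimates, which is one reason a linear reading of MeDIP signal is misleading.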


MEDME


Sequencing Approaches

• High-throughput sequencing is catalysing a revolution

• The ‘death of the microarray’ is widely predicted*

• Many older technologies being replaced with sequencing

• Epigenetics (epigenomics?) among them

• Relevant technologies:

– ChIP-Seq, MeDIP-Seq, BS-Seq

* Heidi Ledford – “The death of microarrays?” Nature 455 (2008)


Bisulphite Sequencing

• Reads produced in a standard BS-Seq experiment can map to one of 4 ‘strands’

– Watson, Watson-BS, Crick, Crick-BS

• Directional sequencing can be used to reduce the problem space

• Read mapping is still more complicated than in a standard NGS experiment
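Why four strands arise can be sketched with an in silico bisulphite conversion. The example below assumes that unmethylated C reads as T after conversion and that PCR then produces the complement of each converted strand; the four labels in the returned dictionary are one interpretation of the slide's ‘strands’, not its exact naming. Function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Convert every C to T except at the 0-based positions in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def four_read_types(watson: str) -> dict:
    """The four sequences a read may match, assuming no methylation."""
    converted_watson = bisulfite_convert(watson)
    converted_crick = bisulfite_convert(reverse_complement(watson))
    return {
        "converted Watson": converted_watson,
        "complement of converted Watson": reverse_complement(converted_watson),
        "converted Crick": converted_crick,
        "complement of converted Crick": reverse_complement(converted_crick),
    }


for label, seq in four_read_types("ACGGTC").items():
    print(label, seq)
```

In a directional library only the two converted original strands are sequenced, so an aligner has two references to search rather than four.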


BS-Seq Read Aligners

• Bismark

– Uses Bowtie for alignment against pre-indexed genomes

• BSMAP

– Uses SOAP in a similar way

• RMAPBS

– Uses a seed-and-hash strategy to locate partial matches for reads
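A seed-and-hash lookup can be sketched generically as follows: index every k-mer of the reference in a hash table, cut the read into seeds, and let each seed hit vote for the read start position it implies. This is a textbook illustration, not the implementation used by RMAPBS or any other aligner named above; the function names and the toy reference are assumptions.

```python
from collections import Counter, defaultdict


def build_seed_index(reference: str, k: int) -> dict:
    """Hash table mapping each k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read: str, index: dict, k: int) -> list:
    """Candidate read start positions, ranked by number of supporting seeds."""
    votes = Counter()
    # Use non-overlapping seeds; a read with a mismatch still keeps intact seeds.
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start >= 0:
                votes[start] += 1
    return votes.most_common()


reference = "TTGATTACAGGATCCA"  # toy reference, for illustration only
index = build_seed_index(reference, k=4)
print(candidate_positions("ATTACAGG", index, k=4))  # exact match
print(candidate_positions("ATTACTGG", index, k=4))  # one mismatch in seed 2
```

The read with a mismatch still finds its location through the intact seed, which is what "partial matches" buys; a full aligner would then verify each candidate with a base-by-base comparison.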


BS-Seq Visualisation

• Genome viewers for visualising sites of methylation & enrichment

– IGV, SeqMonk, Tablet

• Circos for novel visualisations of genome-wide data


Differential methylation from different aligners

Chatterjee A et al. Nucl. Acids Res. 2012;40:e79-e79


Genome-wide visualisations

Holger Heyn et al. (2012) Epigenetics 7(6): 542-50


Downstream Tools

• Differentially methylated regions usually related to genes

• Downstream treatment of gene lists used to confer ‘meaning’

• Examine for enrichment:

– Functions (Gene Ontology)

– Pathways (KEGG)
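Enrichment of a gene list for a function or pathway is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation; it is a generic example and not the exact statistic used by any particular tool in this talk. The argument names and the toy counts are assumptions.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, background_hits, background_size):
    """P(X >= list_hits) when list_size genes are drawn without replacement
    from a background in which background_hits genes carry the annotation."""
    max_hits = min(list_size, background_hits)
    total = comb(background_size, list_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated to a term,
# and 2 of the 3 genes in our list carry that annotation.
print(enrichment_p_value(2, 3, 4, 10))
```

When many terms are tested at once, the resulting p-values need a multiple-testing correction before any of them is interpreted.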


Ingenuity Pathway Analysis

• Take a simple gene list and annotate

• Information contained in the “Ingenuity Knowledge Base”

• Manually curated

– Physical Interactions

– Pathway Data

– Functional Data

http://www.ingenuity.com


Ingenuity Pathway Analysis: What’s it good for?

• Multiple input formats

• Associating genes with functions and pathways in common

• Attaching statistical significance to these associations

• Pharmacology-specific analysis (Tox analysis, biomarker analysis etc)

• Pretty pictures


Ingenuity Pathway Analysis: What’s it not good for?

• Transparency

• Interoperability

• “Unusual” organisms

• Overly short or long gene lists

• Tight budgets


IPA - Networks


IPA - Pathways


DAVID

• Database for Annotation, Visualization and Integrated Discovery

• http://david.abcc.ncifcrf.gov/

• DAVID is a data warehouse that links genes to functional annotation

– much like the IPA Knowledge Base

• We can use it to look for functional enrichment


Summary

• Bioinformatics for epigenetics still developing

• User-friendly tools still a rarity

– The algorithms come first, the shiny comes later

• Powerful and flexible analysis systems do exist

– e.g. Bioconductor

• Data summarisation and visualisation important where quantity of data becomes overwhelming

– Particularly for sequencing approaches


Reading List

• Thomas Down et al. (2008) A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotechnology 26, 779-85

– doi: 10.1038/nbt1414

• Mattia Pelizzola et al. (2008) MEDME: An experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18: 1652-9

– doi: 10.1101/gr.080721.108

• Nina Pälmke et al. (2011) Comprehensive analysis of DNA-methylation in mammalian tissues using MeDIP-chip. Methods 53(2): 175-84

– doi: 10.1016/j.ymeth.2010.07.006

• Aniruddha Chatterjee et al. (2012) Comparison of alignment software for genome-wide bisulphite sequence data. Nucl. Acids Res. 40(10): e79

– doi: 10.1093/nar/gks150

• Holger Heyn et al. (2012) Whole-genome bisulfite DNA sequencing of a DNMT3B mutant patient. Epigenetics 7(6): 542-50

– doi: 10.4161/epi.20523

• Brad T Sherman et al. (2007) DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. BMC Bioinformatics 8: 426

– doi: 10.1186/1471-2105-8-426


For further information, contact:

EMail: [email protected]

Website: bsu.ncl.ac.uk

Twitter: @sjcockell