Simon Cockell
Newcastle University
Bioinformatics Support Unit
http://bsu.ncl.ac.uk/
Bioinformatics Tools for Epigenetics Studies
Introduction
• Genome-wide studies still in relative infancy
• Tools landscape still developing
• Many choices for each type of data
– Not always clear which is ‘best’
• or what ‘best’ even means
• Few tools are user-friendly
A word about R...
• R is a programming language
• Focused on statistics
• Extremely powerful for data analysis and
visualisation
• Even programmers find it opaque
• Extensive support for bioinformatics applications
through “Bioconductor”
http://www.r-project.org
Bioconductor
• R packages for Bioinformatics
• 554 (and counting) packages for analysis of:
– Microarrays
– High Throughput Assays
– Sequence Data
– Annotation, and more... (including epigenetics)
• Inherits all the user-unfriendliness of R
http://bioconductor.org
Bioconductor for Epigenetics
• Packages for array and sequencing analysis
• Certain data types well supported
– Illumina 450k methylation data
– Nimblegen tiling arrays
– Many sequencing formats
• Pipeline solutions possible without having to chain
many different tools
Bioconductor
Microarray Approaches
• A microarray is just a collection of probes on a
solid surface
• Any system that produces differential amounts of
DNA that bind to these probes can be analysed
• Typically gene expression but increasingly:
– SNP detection
– Copy Number
– Protein Binding (ChIP-on-chip)
– Methylation (MeDIP-chip)
MeDIP-Chip
MeDIP-chip tools
• Much in common with expression microarray
analysis
• Inherits normalisation (within and between chip)
• Enrichment analysis specialised
– Needs to account for GC content of genomic
region
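As a rough illustration of the normalisation step, the sketch below computes per-probe log2(IP/input) ratios and median-centres them so that chips can be compared. The intensities are made up and the function names are hypothetical; real pipelines use more elaborate within- and between-chip methods.

```python
import math
from statistics import median


def log2_ratios(ip_signal, input_signal):
    """Per-probe log2(IP / input) enrichment ratios."""
    return [math.log2(ip / inp) for ip, inp in zip(ip_signal, input_signal)]


def median_centre(values):
    """Shift the values so that their median is zero."""
    centre = median(values)
    return [v - centre for v in values]


# Made-up probe intensities for one chip (illustration only).
ip_signal = [800.0, 400.0, 1600.0, 200.0, 400.0]
input_signal = [200.0, 200.0, 200.0, 200.0, 200.0]

ratios = log2_ratios(ip_signal, input_signal)
centred = median_centre(ratios)
print(ratios)
print(centred)
```

Median-centring each chip is only one simple choice; it does nothing about the GC-content effect mentioned above, which needs a sequence-aware model.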
MeDIP-chip - BATMAN
• BAyesian Tool for Methylation ANalysis
• Adjusts probe log2-ratios using coupling factor
based on CpG density
• More reliable enrichment calculation than simple
peak-calling
• Very difficult to use
• Can also deal with MeDIP-Seq data
Down et al. Nature Biotechnology 26, 779-785 (2008)
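A CpG-density-based adjustment starts from knowing how many CpGs sit near each probe. The sketch below only counts CpG dinucleotides in a window around a probe position. It is not the Batman model itself, and the sequence, window size and function name are illustrative assumptions.

```python
def count_cpgs(sequence, centre, half_window):
    """Count CpG dinucleotides that start within +/- half_window of centre."""
    seq = sequence.upper()
    start = max(0, centre - half_window)
    # Stop one base early so that seq[i + 1] always exists.
    end = min(len(seq) - 1, centre + half_window + 1)
    return sum(1 for i in range(start, end) if seq[i:i + 2] == "CG")


# Toy sequence with CpGs starting at positions 2, 4, 9 and 18.
toy_sequence = "ATCGCGTTACGATTTTTTCG"

for probe_centre in (4, 10, 14):
    print(probe_centre, count_cpgs(toy_sequence, probe_centre, 3))
```

A probe in a CpG-rich window is expected to pull down more DNA for the same methylation level, which is why the raw log2-ratio alone can mislead.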
MeDIP-chip - BATMAN
Batman is not a piece of software; it is an algorithm
performed using the command prompt. As such it is
not especially user-friendly and is quite a
computationally technical process.
http://en.wikipedia.org/wiki/Bayesian_tool_for_methylation_analysis
MeDIP-chip - MEDME
• Bioconductor package
• Requires a calibration experiment generating ‘fully-methylated’ DNA
– Can be simulated (though not obvious how...)
• MeDIP enrichment is not linearly correlated to the methylation level
• Can be modeled as a sigmoidal function of the log2(mCpG)
• Use this model to predict absolute and relative methylation levels from MeDIP data
Pelizzola et al. Genome Res. 18: 1652-1659 (2008)
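To make the sigmoidal relationship concrete, the sketch below uses a generic four-parameter logistic curve and its inverse. The functional form and all parameter values are assumptions chosen for illustration; in practice the parameters would come from fitting the calibration data, and this is not the MEDME package's own code.

```python
import math


def sigmoid_enrichment(log2_mcpg, lower, upper, midpoint, slope):
    """Expected MeDIP enrichment as a logistic function of log2(mCpG)."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (log2_mcpg - midpoint)))


def predict_log2_mcpg(enrichment, lower, upper, midpoint, slope):
    """Invert the logistic curve to estimate log2(mCpG) from an enrichment."""
    if not lower < enrichment < upper:
        raise ValueError("enrichment must lie strictly between the two plateaus")
    fraction = (enrichment - lower) / (upper - lower)
    return midpoint + math.log(fraction / (1.0 - fraction)) / slope


# Hypothetical fitted parameters (not taken from any real calibration).
params = {"lower": -1.0, "upper": 3.0, "midpoint": 2.0, "slope": 1.5}

observed = sigmoid_enrichment(3.2, **params)
recovered = predict_log2_mcpg(observed, **params)
print(observed, recovered)
```

The inverse is only defined between the two plateaus, which reflects the practical limit of the approach: enrichment values near saturation carry little information about the methylation level.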
MEDME
Sequencing Approaches
• High throughput sequencing catalysing a
revolution
• The ‘death of the microarray’ is widely predicted*
• Many older technologies being replaced with
sequencing
• Epigenetics (epigenomics?) among them
• Relevant technologies:
– ChIP-Seq, MeDIP-Seq, BS-Seq
* Heidi Ledford – “The death of microarrays?” Nature 455 (2008)
Bisulphite Sequencing
• Reads produced in standard BS-Seq experiment
can map to one of 4 ‘strands’
– Watson, Watson-BS, Crick, Crick-BS
• Directional sequencing can be used to reduce the
problem space
• Read mapping still more complicated than in a
standard NGS experiment
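The four-way ambiguity can be shown with an in-silico conversion. Bisulphite treatment turns unmethylated C into U, which is read as T; after PCR, reads can derive from each converted original strand or from its complement. How these four sequences map onto the labels on the slide is an assumption, and the sequence below is a toy example.

```python
def bisulphite_convert(seq, methylated=frozenset()):
    """Turn every C into T unless its position is listed as methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


watson = "ACGTCCGA"                   # toy forward-strand sequence
crick = reverse_complement(watson)    # the opposite strand, read 5' to 3'

converted_watson = bisulphite_convert(watson)
converted_crick = bisulphite_convert(crick)

# The four sequences a read may match in a non-directional library.
strands = {
    "converted Watson": converted_watson,
    "complement of converted Watson": reverse_complement(converted_watson),
    "converted Crick": converted_crick,
    "complement of converted Crick": reverse_complement(converted_crick),
}
for name, sequence in strands.items():
    print(f"{name}: {sequence}")
```

In a directional library only the two converted original strands are sequenced, so a mapper has two candidates to consider rather than four.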
BS-Seq Read Aligners
• Bismark
– Uses Bowtie for alignment against pre-indexed
genomes
• BSMAP
– Uses SOAP in similar way
• RMAPBS
– Uses a seed-and-hash strategy to locate
partial matches for reads
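A toy version of the three-letter strategy used by aligners such as Bismark: convert both read and reference C-to-T, align in the reduced alphabet, then compare the original bases to call methylation. Exact string search stands in for Bowtie here, only forward-strand reads are handled, and the sequences and function names are made up for illustration.

```python
def c_to_t(seq):
    """Collapse C and T into T (three-letter alphabet)."""
    return seq.replace("C", "T")


def align_and_call(read, reference):
    """Locate the read in C-to-T space, then call each reference C.

    Returns (position, {reference_index: is_methylated}) or None.
    A real aligner would tolerate mismatches and check both strands.
    """
    position = c_to_t(reference).find(c_to_t(read))
    if position < 0:
        return None
    calls = {}
    for offset, read_base in enumerate(read):
        if reference[position + offset] == "C":
            # A surviving C was protected, so it is called methylated.
            calls[position + offset] = read_base == "C"
    return position, calls


reference = "GGACGTTCAGCGA"
read = "ACGTTTAG"   # reference[2:10] with the C at index 7 converted to T

result = align_and_call(read, reference)
print(result)
```

The read would not match the unconverted reference exactly, which is why bisulphite mappers need converted indexes or a conversion-aware matching scheme.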
BS-Seq Visualisation
• Genome viewers for visualising sites of
methylation & enrichment
– IGV, SeqMonk, Tablet
• Circos for novel visualisations of genome-wide
data
Differential methylation from different aligners
Chatterjee A et al. Nucl. Acids Res. 2012;40:e79
Genome-wide visualisations
Holger Heyn et al. (2012) Epigenetics 7(6): 542-50
Downstream Tools
• Differentially methylated regions usually related to
genes
• Downstream treatment of gene lists used to confer
‘meaning’
• Examine for enrichment:
– Functions (Gene Ontology)
– Pathways (KEGG)
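Enrichment of a category in a gene list is commonly scored with a one-sided hypergeometric test. The sketch below is a minimal pure-Python version; the gene counts are invented, and individual tools may use modified statistics or apply multiple-testing corrections that are not shown here.

```python
from math import comb


def enrichment_p_value(total_genes, annotated_genes, list_size, hits):
    """P(at least `hits` annotated genes) when drawing `list_size` genes
    at random, without replacement, from `total_genes`."""
    upper = min(annotated_genes, list_size)
    favourable = sum(
        comb(annotated_genes, k) * comb(total_genes - annotated_genes, list_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total_genes, list_size)


# Invented numbers: 20000 genes in the background, 150 carry the GO term,
# and 12 of the 300 genes near differentially methylated regions have it.
p_value = enrichment_p_value(20000, 150, 300, 12)
print(f"p = {p_value:.3g}")
```

The choice of background (all genes, or only those the platform could have detected) changes the result noticeably, so it is worth stating explicitly.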
Ingenuity Pathway Analysis
• Take a simple gene list and annotate
• Information contained in the “Ingenuity Knowledge
Base”
• Manually curated
– Physical Interactions
– Pathway Data
– Functional Data
http://www.ingenuity.com
Ingenuity Pathway Analysis: What’s it good for?
• Multiple input formats
• Associating genes with functions and pathways in common
• Attaching statistical significance to these associations
• Pharmacology-specific analysis (Tox analysis, biomarker analysis etc)
• Pretty pictures
Ingenuity Pathway Analysis: What’s it not good for?
• Transparency
• Interoperability
• “Unusual” organisms
• Overly short or long gene lists
• Tight budgets
IPA - Networks
IPA - Pathways
DAVID
• Database for Annotation, Visualization and
Integrated Discovery
• http://david.abcc.ncifcrf.gov/
• DAVID is a data warehouse that links genes to
functional annotation
– much like the IPA Knowledge Base
• We can use it to look for functional enrichment
Summary
• Bioinformatics for epigenetics still developing
• User-friendly tools still a rarity
– The algorithms come first, the shiny comes later
• Powerful and flexible analysis systems do exist
– e.g. Bioconductor
• Data summarisation and visualisation important
where quantity of data becomes overwhelming
– Particularly for sequencing approaches
Reading List
• Thomas Down et al. (2008) A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nature Biotechnology 26, 779-85
– doi: 10.1038/nbt1414
• Mattia Pelizzola et al. (2008) MEDME: An experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res. 18: 1652-9
– doi: 10.1101/gr.080721.108
• Nina Pälmke et al. (2011) Comprehensive analysis of DNA-methylation in mammalian tissues using MeDIP-chip. Methods 53(2): 175-84
– doi: 10.1016/j.ymeth.2010.07.006
• Aniruddha Chatterjee et al. (2012) Comparison of alignment software for genome-wide bisulphite sequence data. Nucl. Acids Res. 40(10): e79
– doi: 10.1093/nar/gks150
• Holger Heyn et al. (2012) Whole-genome bisulfite DNA sequencing of a DNMT3B mutant patient. Epigenetics 7(6): 542-50
– doi: 10.4161/epi.20523
• Brad T Sherman et al. (2007) DAVID Knowledgebase: a gene-centered database integrating heterogeneous gene annotation resources to facilitate high-throughput gene functional analysis. BMC Bioinformatics 8: 426
– doi: 10.1186/1471-2105-8-426
For further information, contact:
EMail: [email protected]
Website: bsu.ncl.ac.uk
Twitter: @sjcockell