Using Public Databases to Inform Research Questions
Day 1
Large Reference Epigenome Projects: Understanding Natural Variation
Understanding Epigenetic Variation
• International consortia have pursued the establishment of reference epigenomes for a large number of cell types and conditions
• Epigenome Roadmap published in Nature February 2015
• Main findings of the NIH Roadmap Epigenomics Program
The Beginnings
• ENCODE (the Encyclopedia of DNA Elements) started in 2003 to identify functional elements in the non-coding parts of the genome
• Understand how the genome is packaged, regulated, and read
• In 2012, Nature, Genome Research, and Genome Biology together published 30 papers on the results of the ENCODE project
• While ENCODE transformed technology and data analysis, clinical application was limited
• Most results were from a small number of cell lines
ENCODE Data types:
• After the human genome had been mapped, it was clear the epigenome needed to be explored as well
https://www.encodeproject.org/
ENCODE to Roadmap
• Funded by NIH, the Roadmap Epigenomics Project was established to generate epigenomic data from primary cells and tissues from both healthy individuals and patients with diseases (e.g. cancer, neurodegenerative and autoimmune disease)
• To the right are the tissues and cell types profiled
• 127 human tissues and cell types
PMID: 25693563
Cell Types in Roadmap
• Many of the adult tissues investigated were broken down by cell type or region
• e.g. blood into several types of immune cell, and the brain into regions including the hippocampus and dorsolateral prefrontal cortex
PMID: 25693562
• Samples were profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression
Data Sets Available in Roadmap
Many more cell/tissue groups…
To see full list: http://www.nature.com/nature/journal/v518/n7539/fig_tab/nature14248_F2.html
111 reference epigenomes from Roadmap with 16 additional epigenomes from ENCODE
International Human Epigenome Consortium (IHEC)
• Roadmap represents an early component of IHEC
• IHEC plans to determine the epigenomes of every cell type in the human body, estimated at about a thousand
• IHEC brings together data from several consortia, including:
  • ENCODE
  • Roadmap
  • BLUEPRINT (http://www.blueprint-epigenome.eu/), which aims to decipher the epigenomes of more than 100 different types of blood cells
  • Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC)
Using these Reference Data Sets
• What questions can these data sets answer?
• What questions can they not answer?
• How can we use this data to inform our study questions?
Let's Explore the Data
• There are a couple of ways to view and download this reference data
• I am going to provide a few examples
• Human Epigenome Atlas provides interactive Visualization and Download
• Not going to focus on download because there are many ways to view the data online
• http://www.genboree.org/epigenomeatlas/index.rhtml
Epigenetic Data Browsers: Getting an Idea of Content
UCSC Genome Browser
• Very user friendly, can search for a region or a gene
• Directions:
  • Go to: http://genome.ucsc.edu/
  • Click on “Genome Browser”
  • Search a favorite gene, click “submit”
• Things to try:
  • Zooming
  • Adding tracks
  • Getting track info
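The directions above can also be captured as a shareable link. The following is a minimal Python sketch, assuming the UCSC `hgTracks` page accepts `db` and `position` as query parameters and that `position` may be a gene symbol or a region string; the helper name `ucsc_browser_url` is invented for illustration.

```python
from urllib.parse import urlencode

# Base address of the UCSC track display page.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_browser_url(position, db="hg19"):
    """Return a UCSC Genome Browser link for a gene symbol or a region.

    `position` is passed through unchanged (apart from URL encoding), so it
    can be a gene symbol such as "NANOG" or a "chrN:start-end" string.
    """
    return UCSC_HGTRACKS + "?" + urlencode({"db": db, "position": position})


# Gene symbol on the hg19 assembly used throughout these slides.
print(ucsc_browser_url("NANOG"))
```

Pasting the printed link into a web browser should open the same view as searching the gene by hand.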
Viewing the Roadmap Data
• From Genboree you can export the Roadmap data to the UCSC Genome Browser, but there are two browsers from WashU that are great for viewing these data
• WashU Epigenome Browser
  • Supports multiple organisms, visualizes chromatin-interaction data (e.g. Hi-C), performs gene set view, gene plot, and many other capabilities
• The Roadmap Epigenome Browser
  • Powered by the WashU Epigenome Browser, but specific to the Roadmap data
WashU Epigenome Browser
• http://epigenomegateway.wustl.edu/browser/
• Click on “Human hg19”
• Then click on “Public Hubs”
• Then click “Reference human epigenomes from Roadmap Epigenomics Consortium”
WashU Epigenome Browser
• Then click “Load” Roadmap Data from GEO
• Then click “Loaded” to select the samples to view
WashU Epigenome Browser
• Can select and view all epigenetic marks for one tissue
• Walk through together
• Can view one type of epigenetic mark for multiple tissues
• Prefer to do this in the Roadmap Epigenome Browser
Roadmap Epigenome Browser
• http://epigenomegateway.wustl.edu/browser/roadmap/
• Click on “hg19” and then “Load”
• Can then select epigenetic mark
• Can select region or gene to interrogate
• Click submit
Select Region
Select mark
Can see how the data clusters
Exploring SNP data in Ensembl
• http://useast.ensembl.org/index.html
• Can also explore in dbSNP but I prefer this interface
• Put rs number in search – example: rs1801133
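The same lookup can be scripted. The sketch below is written in Python and uses the Ensembl REST service at `rest.ensembl.org`, which is not mentioned in the slides; the helper names `variation_url` and `fetch_variant` are made up for illustration, and the field names read from the response are assumptions about its JSON layout.

```python
import json
from urllib.request import Request, urlopen

# Public Ensembl REST server (an assumption; the slides use the web interface).
ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(rs_id, species="human"):
    """Build the REST address for one variant, e.g. rs1801133."""
    return f"{ENSEMBL_REST}/variation/{species}/{rs_id}"


def fetch_variant(rs_id, species="human"):
    """Download the variant record as a dictionary (needs network access)."""
    request = Request(
        variation_url(rs_id, species),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


# Only the address is printed here, so the example runs offline.
print(variation_url("rs1801133"))

# With network access one could then try:
# record = fetch_variant("rs1801133")
# print(record.get("name"), record.get("most_severe_consequence"))
```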
Exploring SNP data in Ensembl
• Can explore population genetics from the 1000 Genomes Project
• Can look at LD
  • Must select a reference population
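LD is estimated from haplotype frequencies in a sample, which is why a reference population has to be chosen first. As a rough illustration of the two statistics usually reported, here is a Python sketch that computes D, |D'| and r² for two biallelic sites from haplotype counts. The counts are invented for the example and do not come from the 1000 Genomes Project, and the function name `ld_statistics` is hypothetical.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-site haplotypes.

    A/a are the alleles at the first site, B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    d = p_AB - p_A * p_B           # raw disequilibrium coefficient
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")

    r_squared = d * d / denominator

    # D' rescales D by the largest magnitude it could take given the
    # allele frequencies; the absolute value is returned here.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    return d, d_prime, r_squared


# Invented haplotype counts for illustration only.
d, d_prime, r_squared = ld_statistics(40, 10, 10, 40)
print(f"D={d:.3f}  |D'|={d_prime:.3f}  r2={r_squared:.3f}")
```

Because allele and haplotype frequencies differ between populations, the same pair of SNPs can give different values depending on which population is selected.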
Exploring Other Public Data in NCBI Epigenomics
• http://www.ncbi.nlm.nih.gov/epigenomics
• Can browse experiments or samples and then view results
Exploring Other Public Data in NCBI Epigenomics
• Choose “Browse Experiments”
• Choose species, biological source and all features
• “Select all” Experiment IDs
• Click “View on Genome”
Exploring Other Public Data in NCBI Epigenomics
• Can view in UCSC Genome Browser
• Click “View at UCSC”
Exploring Other Public Data: GEO
GEO: Gene Expression Omnibus
• A public functional genomics data repository
• Both array and sequencing data stored
• Visit GEO DataSet Site: http://www.ncbi.nlm.nih.gov/gds/?term=
• Can search a research question of interest
• Many ways to download data into R – can download processed data or raw data
  • Can download directly from GEO
  • Can use R library GEOquery
  • Both minfi and RnBeads have functions to download GEO data and format it for their specific purpose
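Searching GEO DataSets can also be done programmatically. The slides point to R packages for downloading; the sketch below instead uses Python and the NCBI E-utilities `esearch` endpoint (not mentioned in the slides) purely to show how a research question becomes a query against the `gds` database. The helper name `geo_search_url` and the example search term are invented for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; "gds" is the GEO DataSets database.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_search_url(term, retmax=20):
    """Build an esearch query that looks up `term` in GEO DataSets."""
    params = {
        "db": "gds",         # search GEO DataSets
        "term": term,        # free-text query, as typed into the web form
        "retmax": retmax,    # maximum number of record IDs to return
        "retmode": "json",   # ask for a JSON reply
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Example research question, chosen only for illustration.
print(geo_search_url("DNA methylation AND brain"))
```

Opening the printed address returns a list of matching record IDs; the data themselves would still be downloaded from GEO or through one of the R packages listed above.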
Task: Replicating some Roadmap Results
Practicing with these viewers and data sets
Brain Epigenomics
• Paper: Dissecting neural differentiation regulatory networks through epigenetic footprinting
• One of the papers published in the February issue of Nature
• Going to replicate some of the results presented in Supplementary Figure 1 and Figure 1
Brain Epigenomics: Supp. Figure 1
• Identified 3,396 differentially expressed genes between undifferentiated ES cells and the first four neural progenitor cell stages
• Pluripotency-associated genes such as OCT4 and NANOG are downregulated
• We’re going to look at H9 cells, H9 derived neuronal progenitor cells, and H9 derived neuron culture cells
• Data not available yet for all the cell types presented in paper
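To make "downregulated" concrete, expression in a derived cell type is often compared with the undifferentiated cells as a log2 fold change. The Python sketch below shows the arithmetic only: the expression values are invented placeholders, not Roadmap measurements, and the pseudocount of 1 is an arbitrary choice that avoids dividing by zero.

```python
import math


def log2_fold_change(reference, derived, pseudocount=1.0):
    """log2 ratio of derived over reference expression.

    Negative values mean lower expression in the derived cells.
    """
    return math.log2((derived + pseudocount) / (reference + pseudocount))


# Placeholder expression values (ES cells, progenitor cells); NOT real data.
example_expression = {
    "NANOG": (120.0, 3.0),
    "SOX2": (80.0, 95.0),
}

for gene, (es_value, npc_value) in example_expression.items():
    change = log2_fold_change(es_value, npc_value)
    direction = "down" if change < 0 else "up or unchanged"
    print(f"{gene}: log2FC = {change:.2f} ({direction})")
```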
Brain Epigenomics: Supp. Figure 1
• Go to: http://epigenomegateway.wustl.edu/browser/
• Going to load expression data for these 3 cell types
• Go to Human hg19
• Go to Public Hubs
• Click on “Roadmap Epigenomics Interactive Analysis Hub”
• Load “Complete Consolidated Dataset”
• Load Expression Data (under ES/iPS cells) for:
  • H9 cells
  • H9 derived neuronal progenitor cells
  • H9 derived neuron culture cells
• Explore some of the genes in Supplementary Figure 1 (e.g. NANOG)
Brain Epigenomics: Supp. Figure 1
• Look at some of the genes in the figure all together by creating a gene set
• Click “Apps”
• Select “Gene and region set”
• Upload the list of genes in the comments of the slide
• Can try “Gene set view” to see all the genes at once
• Under Apps, look at “Scatter plot” to compare two plots
Brain Epigenomics: Figure 1
Going to replicate part of this figure
Brain Epigenomics: Figure 1
• From the previous exercise we can see the downregulation of NANOG in the derived cells
• Now can add in additional epigenetic marks
  • Do they seem to be associated with regulation?
• Can also try to recreate Figure 1 for SOX2
  • Use H9 derived neuronal progenitor cells
Exploring Epigenomic Annotation of Genetic Variants
• Another paper published in the February issue
• Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser
• Going to look at multiple sclerosis-associated SNPs that were annotated using epigenomic and expression data from 31 primary human tissues (orange) and cells (light green).
• The region associated with rs756699 has H3K4me1 mostly confined to immune-related cell types (solid black box).
• The closest gene, TCF7 (3.8 kb downstream), also shows high expression in the same group of cell types
• The region surrounding rs307896 has H3K4me1 signal in all tissues and cell types (dashed black box).
• SNP rs307896 lies in an intron of SAE1, a gene that is also expressed in all the samples
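The contrast between the two SNPs comes down to how many samples carry the mark at each region. One way to express that is sketched below in Python, assuming a per-sample signal value is available for the region. All sample names, signal values, the threshold and the 80% cutoff are invented for illustration and are not taken from the paper or the browser.

```python
def fraction_with_signal(signal_by_sample, threshold):
    """Fraction of samples whose signal is at or above the threshold."""
    if not signal_by_sample:
        raise ValueError("no samples supplied")
    marked = [name for name, value in signal_by_sample.items() if value >= threshold]
    return len(marked) / len(signal_by_sample)


def describe_breadth(signal_by_sample, threshold=1.0, broad_cutoff=0.8):
    """Label a region 'broad' if most samples carry the mark, else 'restricted'."""
    fraction = fraction_with_signal(signal_by_sample, threshold)
    return "broad" if fraction >= broad_cutoff else "restricted"


# Invented H3K4me1 signal values; NOT Roadmap data.
immune_restricted_region = {
    "T cell": 5.2, "B cell": 4.1, "liver": 0.2, "lung": 0.1, "brain": 0.3,
}
ubiquitous_region = {
    "T cell": 3.0, "B cell": 2.8, "liver": 3.5, "lung": 2.2, "brain": 2.9,
}

print(describe_breadth(immune_restricted_region))
print(describe_breadth(ubiquitous_region))
```

The first pattern resembles the region around rs756699 described above and the second resembles the region around rs307896; the browser's clustering view shows the same distinction visually.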
Exploring Epigenomic Annotation of Genetic Variants
Exploring Epigenomic Annotation of Genetic Variants
• http://epigenomegateway.wustl.edu/browser/roadmap/
• Can also do this in WashU Epigenome Browser, but this has a nice clustering capability
• Add H3K4me1 and RNA-seq (have to use search)
• Load data
• Click hg19 to select the samples; select primary cells and tissues
• In window click “chr” to select the gene TCF7, zoom out
• Click the track header (e.g. “H3K4me1”) to add “Annotation track”
• Go to population variation
• Add dbSNP