Using Public Databases to Inform Research Questions Day 1

Uploaded by: amlbinder

Posted on 16-Apr-2017


TRANSCRIPT

Page 1

Using Public Databases to Inform Research Questions

Day 1

Page 2

Large Reference Epigenome Projects: Understanding Natural Variation

Page 3

Understanding Epigenetic Variation

• International consortia have pursued the establishment of reference epigenomes for a large number of cell types and conditions

• Roadmap Epigenomics results published in Nature, February 2015

• Main findings of the NIH Roadmap Epigenomics Program

Page 4

The Beginnings

• ENCODE (the Encyclopedia of DNA Elements) started in 2003 to identify functional elements in the non-coding parts of the genome

• Understand how the genome is packaged, regulated, and read

• In 2012, Nature, Genome Research and Genome Biology jointly published 30 papers on the results of the ENCODE project

• While ENCODE transformed technology and data analysis, clinical application was limited

• Most results were from a small number of cell lines

ENCODE Data types:

• After the human genome had been mapped, it was clear that the epigenome needed to be explored as well

https://www.encodeproject.org/
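The ENCODE portal can also be queried from a script instead of the web page. The sketch below is minimal and only builds a search URL. It assumes the portal's `/search/` endpoint accepts `format=json` and filter names such as `type`, `assay_title` and `biosample_ontology.term_name`. Confirm those names against the portal's own documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the ENCODE portal search endpoint and the filter names below;
# confirm them on the portal before relying on them.
ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def encode_search_url(assay_title, biosample, limit=25):
    """Build a portal search URL for experiments, asking for JSON instead of HTML."""
    params = {
        "type": "Experiment",                       # search experiments, not files
        "assay_title": assay_title,                 # e.g. "Histone ChIP-seq"
        "biosample_ontology.term_name": biosample,  # e.g. a cell line name
        "format": "json",                           # machine-readable response
        "limit": str(limit),                        # maximum number of hits
    }
    return ENCODE_SEARCH + "?" + urlencode(params)


print(encode_search_url("Histone ChIP-seq", "K562"))
```

You can fetch that URL with `urllib.request.urlopen`. The JSON it returns should list the hits under an `@graph` key, but check this against a live response.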

Page 5

ENCODE to Roadmap

• Funded by the NIH, the Roadmap Epigenomics Project was established to generate epigenomic data from primary cells and tissues of both healthy individuals and patients with diseases (e.g. cancer, neurodegenerative and autoimmune disease)

• To the right are the tissues and cell types profiled

• 127 human tissues and cell types

PMID: 25693563

Page 6

Cell Types in Roadmap

• Many of the adult tissues investigated were broken down by cell type or region

• e.g. blood into several types of immune cell, and the brain into regions including the hippocampus and dorsolateral prefrontal cortex

PMID: 25693562

• Samples were profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression

Page 7

Data Sets Available in Roadmap

Many more cell/tissue groups…

To see full list: http://www.nature.com/nature/journal/v518/n7539/fig_tab/nature14248_F2.html

111 reference epigenomes from Roadmap plus 16 additional epigenomes from ENCODE
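The two counts add up to the 127 tissues and cell types quoted earlier. This sketch checks that arithmetic using the Roadmap epigenome numbering, which is an assumption to verify against the Roadmap sample table. The assumed scheme is:

- IDs run from E001 to E129.
- E060 and E064 are unused.
- E114 to E129 are the ENCODE epigenomes added to the integrative analysis.

```python
# Assumption: Roadmap epigenome IDs run E001-E129, E060 and E064 are unused,
# and E114-E129 are the epigenomes contributed by ENCODE.
UNUSED = {60, 64}


def epigenome_ids(first, last):
    """Return Roadmap-style IDs (E001, E002, ...) for numbers first..last, skipping unused ones."""
    return [f"E{n:03d}" for n in range(first, last + 1) if n not in UNUSED]


roadmap = epigenome_ids(1, 113)   # generated by the Roadmap project
encode = epigenome_ids(114, 129)  # contributed by ENCODE

print(len(roadmap), len(encode), len(roadmap) + len(encode))
```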

Page 8

International Human Epigenome Consortium (IHEC)

• Roadmap represents an early component of IHEC

• IHEC plans to determine the epigenomes of every cell type in the human body, estimated at about a thousand

• IHEC brings together data from several consortia, including:

• ENCODE

• Roadmap

• BLUEPRINT (http://www.blueprint-epigenome.eu/), which aims to decipher the epigenomes of more than 100 different types of blood cells

• Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC)

Page 9

Using these Reference Data Sets

• What questions can these data sets answer?

• What questions can they not answer?

• How can we use this data to inform our study questions?

Page 10

Let's Explore the Data

• There are a couple of ways to view and download this reference data

• I am going to provide a few examples

• The Human Epigenome Atlas provides interactive visualization and download

• I am not going to focus on downloading, because there are many ways to view the data online

• http://www.genboree.org/epigenomeatlas/index.rhtml


Page 11

Epigenetic Data Browsers: Getting an Idea of Content

Page 12

UCSC Genome Browser

• Very user-friendly; you can search for a region or a gene

• Directions:

• Go to: http://genome.ucsc.edu/

• Click on “Genome Browser”

• Search for a favorite gene, then click “submit”

• Things to try:

• Zooming

• Adding tracks

• Getting track info
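A browser view can also be opened from a script or shared as a link. This minimal sketch assumes the standard `hgTracks` page and its `db` (assembly) and `position` (gene name or region) parameters. The region in the second call is an arbitrary placeholder, not a coordinate from the slides.

```python
from urllib.parse import urlencode

# Assumption: the hgTracks page accepts db= and position= query parameters.
UCSC_BROWSER = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_link(position, assembly="hg19"):
    """Build a UCSC Genome Browser link for a gene symbol or a chr:start-end region."""
    return UCSC_BROWSER + "?" + urlencode({"db": assembly, "position": position})


print(ucsc_link("NANOG"))                       # search by gene, as in the directions above
print(ucsc_link("chr1:1000000-1010000", "hg38"))  # placeholder region on another assembly
```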


Page 13

Viewing the Roadmap Data

• From Genboree you can export the Roadmap data to the UCSC Genome Browser, but there are two browsers from WashU that are great for viewing this data

• WashU Epigenome Browser

• Supports multiple organisms, visualizes chromatin-interaction data (e.g. Hi-C), and offers gene set view, gene plot, and many other capabilities

• The Roadmap Epigenome Browser

• Powered by the WashU Epigenome Browser, but specific to the Roadmap data

Page 14

WashU Epigenome Browser

• http://epigenomegateway.wustl.edu/browser/

• Click on “Human hg19”

• Then click on “Public Hubs”

• Then click “Reference human epigenomes from Roadmap Epigenomics Consortium”


Page 15

WashU Epigenome Browser

• Then click “Load” Roadmap Data from GEO

• Then click “Loaded” to select the samples to view


Page 16

WashU Epigenome Browser

• You can select and view all epigenetic marks for one tissue

• We will walk through this together

• You can also view one type of epigenetic mark across multiple tissues

• I prefer to do this in the Roadmap Epigenome Browser

Page 17

Roadmap Epigenome Browser

• http://epigenomegateway.wustl.edu/browser/roadmap/

• Click on “hg19” and then “Load”

• Can then select epigenetic mark

• Can select region or gene to interrogate

• Click submit

Select Region

Select mark

Page 18

You can see how the data cluster

Page 19

Exploring SNP data in Ensembl

• http://useast.ensembl.org/index.html

• You can also explore in dbSNP, but I prefer this interface

• Put rs number in search – example: rs1801133
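The same variant record can be pulled from the Ensembl REST API, which is handy when you have a long list of rs numbers. This is a minimal sketch with the following assumptions:

- It uses the `variation/human/<rsid>` endpoint, with an optional `pops=1` flag for population allele frequencies.
- It uses the separate GRCh37 server for hg19, the assembly used by the Roadmap browsers in this deck.
- The `mappings`, `location` and `allele_string` keys in the commented example are assumptions about the response shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Ensembl REST servers for the two human assemblies.
SERVERS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",  # hg19
}


def ensembl_variant_url(rsid, assembly="GRCh38", with_populations=False):
    """Build the REST URL for a variant lookup (variation/human/<rsid>)."""
    url = f"{SERVERS[assembly]}/variation/human/{rsid}"
    if with_populations:
        url += "?" + urlencode({"pops": "1"})  # ask for population allele frequencies
    return url


def fetch_variant(rsid, **kwargs):
    """Download the variant record as a dict (needs network access)."""
    request = Request(ensembl_variant_url(rsid, **kwargs),
                      headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


# Example (not run here because it needs network access):
# record = fetch_variant("rs1801133", assembly="GRCh37")
# for mapping in record.get("mappings", []):
#     print(mapping.get("location"), mapping.get("allele_string"))
```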

Page 20

Exploring SNP data in Ensembl

• Can explore population genetics from the 1000 Genomes Project

• Can look at linkage disequilibrium (LD)

• Must select a reference population
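LD can be queried the same way, and the API also requires a reference population. This sketch makes the following assumptions, so check the population names and response shape against the Ensembl REST documentation:

- It uses the Ensembl `ld/human/<rsid>/<population>` endpoint with an optional `window_size` in kb.
- The population label `1000GENOMES:phase_3:CEU` is an example of the naming scheme.
- Each returned record carries `variation1`, `variation2` and `r2` keys.

The rsIDs in the usage line are placeholders, not real results.

```python
from urllib.parse import urlencode


def ensembl_ld_url(rsid, population="1000GENOMES:phase_3:CEU", window_kb=500):
    """Build an LD query for one variant; a reference population is required."""
    base = f"https://rest.ensembl.org/ld/human/{rsid}/{population}"
    return base + "?" + urlencode({"window_size": str(window_kb)})


def strong_ld_partners(rsid, records, threshold=0.8):
    """Return (partner rsID, r2) pairs with r2 >= threshold, strongest first.

    Assumption: each record is a dict with 'variation1', 'variation2' and 'r2' keys.
    """
    partners = []
    for rec in records:
        r2 = float(rec["r2"])  # r2 may arrive as a string, so convert it
        if r2 < threshold:
            continue
        partner = rec["variation2"] if rec["variation1"] == rsid else rec["variation1"]
        partners.append((partner, r2))
    return sorted(partners, key=lambda pair: pair[1], reverse=True)


# Placeholder records, for illustration only.
toy = [{"variation1": "rsQUERY", "variation2": "rsA", "r2": "0.95"},
       {"variation1": "rsB", "variation2": "rsQUERY", "r2": "0.85"},
       {"variation1": "rsQUERY", "variation2": "rsC", "r2": "0.30"}]
print(strong_ld_partners("rsQUERY", toy))
```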

Page 21

Exploring Other Public Data in NCBI Epigenomics

• http://www.ncbi.nlm.nih.gov/epigenomics

• Can browse experiments or samples and then view results

Page 22

Exploring Other Public Data in NCBI Epigenomics

• Choose “Browse Experiments”

• Choose species, biological source and all features

• “Select all” Experiment IDs

• Click “View on Genome”

Page 23

Exploring Other Public Data in NCBI Epigenomics

• Can view in UCSC Genome Browser

• Click “View at UCSC”

Page 24

Exploring Other Public Data: GEO

GEO: Gene Expression Omnibus

• A public functional genomics data repository

• Both array and sequencing data are stored

• Visit GEO DataSet Site: http://www.ncbi.nlm.nih.gov/gds/?term=

• Can search a research question of interest

• Many ways to download data into R – can download processed data or raw data

• Can download directly from GEO

• Can use the R library GEOquery

• Both minfi and RnBeads have functions to download GEO data and format it for their specific purposes
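The R routes above are the ones the slides recommend. For readers working in Python, here is a minimal standard-library sketch of two entry points:

- an NCBI E-utilities search of GEO DataSets (`db=gds`);
- the download URL of a series' processed "series matrix" file.

The folder layout (e.g. `GSE12nnn/GSE12345/matrix/`) is an assumption based on GEO's usual FTP conventions. Series that span several platforms use per-platform file names instead. `GSE12345` is a placeholder accession.

```python
import re
from urllib.parse import urlencode

EUTILS_SEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
GEO_SERIES = "https://ftp.ncbi.nlm.nih.gov/geo/series"  # assumed base of GEO's file tree


def geo_search_url(term, max_hits=20):
    """E-utilities search of GEO DataSets (db=gds) for a research question, returning JSON."""
    return EUTILS_SEARCH + "?" + urlencode(
        {"db": "gds", "term": term, "retmax": str(max_hits), "retmode": "json"})


def series_stub(gse):
    """Folder that groups a series, e.g. GSE12345 -> GSE12nnn (last three digits -> 'nnn')."""
    if not re.fullmatch(r"GSE\d+", gse):
        raise ValueError(f"not a GEO series accession: {gse!r}")
    return "GSE" + gse[3:-3] + "nnn"


def series_matrix_url(gse):
    """Processed series matrix for a single-platform series (assumed file naming)."""
    return f"{GEO_SERIES}/{series_stub(gse)}/{gse}/matrix/{gse}_series_matrix.txt.gz"


print(geo_search_url("DNA methylation brain"))
print(series_matrix_url("GSE12345"))
```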

Page 25

Task: Replicating some Roadmap Results

Practicing with these viewers and data sets

Page 26

Brain Epigenomics

• Paper: Dissecting neural differentiation regulatory networks through epigenetic footprinting

• One of the papers published in the February 2015 issue of Nature

• Going to replicate some of the results presented in Supplementary Figure 1 and Figure 1

Page 27

Brain Epigenomics: Supp. Figure 1

• Identified 3,396 differentially expressed genes between undifferentiated ES cells and the first four neural progenitor cell stages

• Pluripotency-associated genes such as OCT4 and NANOG are downregulated

• We’re going to look at H9 cells, H9 derived neuronal progenitor cells, and H9 derived neuron culture cells

• Data are not yet available for all the cell types presented in the paper

Page 28

Brain Epigenomics: Supp. Figure 1

• Go to: http://epigenomegateway.wustl.edu/browser/

• We are going to load expression data for these 3 cell types

• Go to Human hg19

• Go to Public Hubs

• Click on “Roadmap Epigenomics Interactive Analysis Hub”

• Load “Complete Consolidated Dataset”

• Load Expression Data (under ES/iPS cells) for:

• H9 cells

• H9 derived neuronal progenitor cells

• H9 derived neuron culture cells

• Explore some of the genes in Supplementary Figure 1 (e.g. NANOG)
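The browser shows the three expression tracks side by side. The same comparison can be made numerically once you have expression values, for example from a downloaded Roadmap RNA-seq RPKM table. This minimal sketch makes two assumptions:

- The Roadmap IDs E008, E009 and E010 correspond to the H9, H9-derived neuronal progenitor and H9-derived neuron samples. Check this against the Roadmap sample table.
- A pseudocount of 1 is added so that genes with zero expression do not cause a division by zero.

`GENE_X` and its values are toy inputs, not Roadmap measurements. A negative log2 fold change means lower expression than in undifferentiated H9 cells, which is the pattern the paper reports for pluripotency genes such as NANOG.

```python
import math

# Assumed Roadmap epigenome IDs for the three H9 samples (verify in the Roadmap sample table).
H9_SAMPLES = {
    "E008": "H9 cells",
    "E009": "H9 derived neuronal progenitor cells",
    "E010": "H9 derived neuron culture cells",
}


def log2_fold_change(value, baseline, pseudocount=1.0):
    """log2((value + pc) / (baseline + pc)); negative means lower than the baseline."""
    return math.log2((value + pseudocount) / (baseline + pseudocount))


def changes_vs_baseline(expression, gene, baseline="E008"):
    """expression maps gene -> {sample ID: RPKM}; returns sample ID -> log2 fold change."""
    row = expression[gene]
    return {sample: log2_fold_change(value, row[baseline])
            for sample, value in row.items() if sample != baseline}


# Toy values for illustration only, not real measurements.
toy = {"GENE_X": {"E008": 15.0, "E009": 3.0, "E010": 0.0}}
for sample, lfc in changes_vs_baseline(toy, "GENE_X").items():
    print(H9_SAMPLES[sample], round(lfc, 2))
```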

Page 29

Brain Epigenomics: Supp. Figure 1

• Look at some of the genes in the figure all together by creating a gene set

• Click “Apps”

• Select “Gene and region set”

• Upload the list of genes in the slide comments

• Try “Gene set view” to see all the genes at once

• Under Apps, look at “Scatter plot” to compare with the plots in the figure
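Gene lists copied from a paper or slide notes often contain stray spaces, mixed case and duplicates. This sketch cleans such a list into one unique upper-case symbol per line. It assumes the browser's gene set upload accepts plain text in that layout; check what the Apps dialog actually asks for. Also note that OCT4 is usually annotated under its official symbol, POU5F1, so searching for "OCT4" may not find it.

```python
def gene_set_text(genes):
    """Return one unique, upper-case gene symbol per line, keeping first-seen order."""
    seen = set()
    cleaned = []
    for gene in genes:
        symbol = gene.strip().upper()
        if symbol and symbol not in seen:  # skip blanks and repeats
            seen.add(symbol)
            cleaned.append(symbol)
    return "\n".join(cleaned) + "\n"


def write_gene_set(genes, path):
    """Write the cleaned list to a plain-text file for upload."""
    with open(path, "w") as handle:
        handle.write(gene_set_text(genes))


print(gene_set_text([" nanog", "POU5F1", "NANOG", "", "sox2 "]), end="")
```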

Page 30

Brain Epigenomics: Figure 1

Going to replicate part of this figure

Page 31

Brain Epigenomics: Figure 1

• From the previous exercise we can see the downregulation of NANOG in the derived cells

• Now can add in additional epigenetic marks

• Do they seem to be associated with regulation?

• Can also try to recreate Figure 1 for SOX2

• Use H9 derived neuronal progenitor cells

Page 32

Exploring Epigenomic Annotation of Genetic Variants

• Another paper published in the February 2015 issue

• Epigenomic annotation of genetic variants using the Roadmap Epigenome Browser

• Going to look at multiple sclerosis–associated SNPs annotated using epigenomic and expression data from 31 primary human tissues (orange) and cells (light green).

• The region associated with rs756699 has H3K4me1 mostly confined to immune-related cell types (solid black box).

• The closest gene, TCF7 (3.8 kb downstream), also shows high expression in the same group of cell types

• The region surrounding rs307896 has H3K4me1 signal in all tissues and cell types (dashed black box).

• SNP rs307896 lies in an intron of SAE1, a gene that is also expressed in all the samples
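A description like "TCF7 (3.8 kb downstream)" depends on which strand the gene sits on. This sketch finds the gene whose transcription start site (TSS) is closest to a SNP and reports the gene's direction relative to the SNP, read in the gene's own orientation. Measuring to the TSS is an assumption here; other annotation tools measure to the gene body. An intronic SNP, like rs307896 in SAE1, needs an overlap test rather than a TSS distance. The gene names and coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int     # transcription start site, 1-based
    strand: str  # "+" or "-"


def nearest_gene(chrom: str, pos: int, genes: List[Gene]) -> Optional[Tuple[str, int, str]]:
    """Return (gene name, distance to TSS in bp, where the gene lies relative to the SNP).

    "downstream" means transcription starts after the SNP in the gene's own
    orientation, i.e. the SNP sits upstream of the gene.
    """
    on_chrom = [g for g in genes if g.chrom == chrom]
    if not on_chrom:
        return None
    gene = min(on_chrom, key=lambda g: abs(g.tss - pos))
    # Signed offset along the direction of transcription.
    offset = gene.tss - pos if gene.strand == "+" else pos - gene.tss
    if offset > 0:
        direction = "downstream"
    elif offset < 0:
        direction = "upstream"
    else:
        direction = "at the TSS"
    return gene.name, abs(gene.tss - pos), direction


# Hypothetical genes and SNP position, for illustration only.
genes = [
    Gene("GENE_A", "chr5", 10_000, "+"),
    Gene("GENE_B", "chr5", 2_000, "-"),
]
print(nearest_gene("chr5", 6_200, genes))
```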

Page 33

Exploring Epigenomic Annotation of Genetic Variants

Page 34

Exploring Epigenomic Annotation of Genetic Variants

• http://epigenomegateway.wustl.edu/browser/roadmap/

• Can also do this in the WashU Epigenome Browser, but this browser has a nice clustering capability

• Add H3K4me1 and RNA-seq (you have to use search)

• Load data

• Click hg19 to select the samples: select primary cells and tissues

• In the window, click “chr” to select the gene TCF7, then zoom out

• Click the track header (e.g. “H3K4me1”) to add an “Annotation track”

• Go to population variation

• Add dbSNP
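If you have the H3K4me1 peaks as a BED or narrowPeak file, you can also check whether a SNP falls inside a peak outside the browser. This minimal sketch assumes BED's 0-based, half-open intervals and 1-based SNP positions, as dbSNP and Ensembl report them. The peak lines, SNP names and positions are placeholders. A linear scan is fine for a handful of SNPs; genome-wide sets would call for an interval tree or a sorted search.

```python
from typing import Dict, Iterable, List, Tuple


def parse_bed(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Read chrom, start, end from BED/narrowPeak lines (0-based, half-open), skipping headers."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def snps_in_peaks(snps: Dict[str, Tuple[str, int]],
                  peaks: List[Tuple[str, int, int]]) -> Dict[str, bool]:
    """snps maps a SNP ID to (chrom, 1-based position); returns SNP ID -> inside any peak."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = {}
    for snp_id, (chrom, pos) in snps.items():
        zero_based = pos - 1  # convert the 1-based SNP position to BED's 0-based system
        hits[snp_id] = any(start <= zero_based < end for start, end in by_chrom.get(chrom, []))
    return hits


# Placeholder peaks and SNPs, for illustration only.
peaks = parse_bed(["track name=placeholder", "chr1\t100\t200\tpeak_1", "chr2\t50\t60"])
print(snps_in_peaks({"snp_in": ("chr1", 150), "snp_out": ("chr1", 250)}, peaks))
```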