june 8, 2016 encode 2016: research applications and users … · 2016-06-07 · encode chip-seq tf...
TRANSCRIPT
![Page 1: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/1.jpg)
The ENCODE Encyclopedia
Zhiping WengUniversity of Massachusetts Medical School
ENCODE 2016: Research Applications and Users MeetingJune 8, 2016
![Page 2: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/2.jpg)
The Encyclopedia Of DNA Elements Consortium
Goals:
- Catalog all functional
elements in the genome
- Develop freely available
resource for research
community
- Study human and mouse
![Page 3: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/3.jpg)
Overview of The ENCODE Consortium
GeneModels
RNA TFBinding
Data CoordinationCenter
Data Analysis Center Analysis Working Group
ElementID
ChromatinStates
HistoneMods
DNase DNAme RBPBinding
Computational Analysis Groups
Technology Development Groups
Data Production Groups
The ENCYCLOPEDIA Slide from Mike Pazin, NHGRI3
![Page 4: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/4.jpg)
encodeproject.org
![Page 5: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/5.jpg)
encodeproject.org
![Page 6: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/6.jpg)
6
![Page 7: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/7.jpg)
• Typically derived directly from the experimental data
• Data produced from ENCODE 3 uniform processing pipelines: e.g. peaks and expression quantification
Ground Level Annotations
Ground Level Annotations
![Page 8: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/8.jpg)
Gene Expression (RNA-seq)
Ground Level Annotations
Data produced by: Gingeras, Wold, Lecuyer, Hardison, Graveley
Visualization by: Feng Yue
The expression levels of genes annotated by GENCODE 19 in ~60 human cell types.
![Page 9: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/9.jpg)
Transcription Factor Binding (TF ChIP-seq)
Ground Level Annotations
Data produced by: Snyder, Myers, Bernstein, Farnham, Stam, Iyer, White, Ren, Struhl, Weissman, Hardison, Wold, Fu
Visualization by: Weng
Peaks (enriched genomic regions) of TFs computed from ~900 human and mouse ChIP-seq experiments.
![Page 10: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/10.jpg)
Factorbook: Motivation• Visualizes summarized data centered on TFs
• not easily shown in a genome browser• includes a number of useful analyses and statistical
information• Average histone profiles
• Motifs
• Heat maps
• Transcription Factor (TF)-centric repository of all ENCODE ChIP-seq datasets on TF-binding regions
• Will also visualize ChIP-seq Histone and DNase-seq datasets from ENCODE and ROADMAP soon!
10 Factorbook 10
![Page 11: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/11.jpg)
ENCODE ChIP-seq TF Datasets• Human:
• 837 ChIP-seq TF datasets• 167 TFs
• 104 cell types
• Mouse: • 170 ChIP-seq TF datasets
• 51 TFs
• 26 cell types
Last data import: February 29, 2016
11 Factorbook 11
![Page 12: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/12.jpg)
Function
• brief overview of molecular function of TF
• 3D protein structure of TF (if available)
• distilled from RefSeq, Gene Card, and wikipedia
• links to external resources
12 Factorbook 12
![Page 13: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/13.jpg)
Average Histone Profiles
• +/- 2kb (inclusive) window around peak summits
• separated by distance to the nearest annotated transcription start site
• proximal profiles have peaks within 1 kb of a TSS
• distal profiles have all other peaks
Factorbook 13
![Page 14: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/14.jpg)
Average Nucleosome Profiles
• show effect of binding of TFs on regional positioning of nucleosomes
• +/- 2kb (inclusive) window around peak summits • red lines within 1 kb of a TSS• blue lines represent all other peaks
• data from GM12878 and K562 MNase-seq
14 Factorbook 14
![Page 15: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/15.jpg)
Motif Enrichment• sequences of the top 500 TF ChIP-seq peaks were
used to identify enriched motifs de novo• MEME-ChIP
• top 5 motifs shown
15 Factorbook 15
![Page 16: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/16.jpg)
Motif Filtering
Automatically filter out motifs that may not be biologically significant
16 Factorbook 16
![Page 17: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/17.jpg)
Histone and TF Heat Maps
• compare a given TF in a specific cell type against the histone marks and other TFs in same cell type
• Pearson correlation value also shown (“r”)
• histone marks• enrichment represented in a normalized scale over a 10kb window
centered on the peak summit
• TFs• binding strengths are represented in a normalized scale over a 2kb
window, also centered on the peak summit
each column in a heat map indicates a ChIP-seq peak of the currently selected (“pivot”) TF
Columns for the “pivot” TF are sorted (left-to-right) in descending order of ChIP-seq signal
17 Factorbook 17
![Page 18: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/18.jpg)
Histone Mark Enrichment (ChIP-seq)
Ground Level Annotations
Data produced by: Ren, Bernstein, Stam, Farnham, Hardison, Snyder, Wold, Weissman
Peaks of a variety of histone marks computed from ~600 ChIP-seq experiments.
![Page 19: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/19.jpg)
Open chromatin (DNase-seq)
Ground Level Annotations
DNase I hypersensitive sites (also known as DNase-seq peaks) computed from ~300 human and mouse experiments.
Data produced by: Stam, Crawford, Hardison
![Page 20: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/20.jpg)
Topologically associating domains (TADs) and Compartments (Hi-C)
Ground Level Annotations
Data produced by: Dekker
![Page 21: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/21.jpg)
Promoter-enhancer links (ChIA-PET)
Ground Level Annotations
Data produced by: Snyder, Ruan
Links between promoters and distal regulatory elements such as enhancers computed from 8 ChIA-PET experiments.
![Page 22: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/22.jpg)
RNA Binding Protein Occupancy (eCLIP-seq)
Ground Level Annotations
Peaks computed from eCLIP-seq data in human cell lines K562 and HepG2 for a large number of RNA Binding Proteins (RBPs).
Data produced by: Yeo
![Page 23: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/23.jpg)
Middle Level Annotations
Middle Level Annotations
• Integrate multiple types of experimental data and ground level annotations
![Page 24: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/24.jpg)
Goals for Predicting Enhancer-like Regions
24
• Develop an unsupervised method applicable to both human and mouse
• Incorporate different epigenomic datasets such as DNase-seq and H3K27ac
• Apply method to as many cell and tissue types as possible
Middle Level Annotations
![Page 25: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/25.jpg)
Rationale for Developing Methods in Mouse
25
• Rich matrix of data of uniformly processed data:
- Histone modification ChIP-seq (Bing Ren)
- RNA-seq (Barbara Wold)
- DNA methylation (Joe Ecker)
- DNase-seq (John Stam)
Middle Level Annotations
![Page 26: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/26.jpg)
11.5 13.5 14.5 15.5 16.6 0
Facial Prominence
Forebrain
Heart
Hindbrain
Intestine
Kidney
Limb
Liver
Lung
Midbrain
Neural Tube
Stomach
ENCODE Data for Embryonic Mouse
H3K27ac DNase + H3K27acH3K27ac Data - Bing RenDNase Data – John StamMiddle Level Annotations
![Page 27: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/27.jpg)
Rationale for Developing Methods in Mouse
27
• Rich matrix of data of uniformly processed data:
- Histone Modification ChIP-seq (Bing Ren)
- RNA-seq (Barbara Wold)
- DNA Methylation (Joe Ecker)
- DNase-seq (John Stam)
• Experimental validations of enhancers in embryonic mice:
- VISTA Database (Len Penacchio & Axel Visel)
Middle Level Annotations
![Page 28: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/28.jpg)
• Over 2,000 total tested regions
• Over 200 active enhancers in limb, brain sub regions, and heart
Visel, …, Pennacchio (2009) NaturePennacchio,…, Rubin (2006) Nature
28Middle Level Annotations
![Page 29: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/29.jpg)
VISTA Database: Examples
Middle Level Annotations
![Page 30: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/30.jpg)
How to Center Predictions?
DNase Peaks H3K27ac Peaks
How to Rank Peaks?
p-value signal multiple signals:DNA Methylation
H3K4me1/2/3Middle Level Annotations
![Page 31: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/31.jpg)
VISTA Positive VISTA Negative
Overlaps Peak True Positive False Positive
Does Not Overlap Peak
False Negative True Negative
31
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 32: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/32.jpg)
Midbrain Predictions Centered on DNase Peaks
32Middle Level Annotations
![Page 33: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/33.jpg)
Results
33
• Centering predictions on DNase peaks results in better performance than centering on H3K27ac peaks
• Incorporating additional data such as DNA methylation and/or H3K4me1/2/3 signal did not improve performance
Middle Level Annotations
![Page 34: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/34.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 35: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/35.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 36: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/36.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 37: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/37.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 38: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/38.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 39: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/39.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 40: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/40.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 41: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/41.jpg)
Enhancer-like Region Prediction Method
Middle Level Annotations
![Page 42: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/42.jpg)
Example - Neural Tube (e11.5) Enhancer
42Middle Level Annotations
![Page 43: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/43.jpg)
Goals for Prediction Promoter-like Regions
43
• Develop an unsupervised method applicable to both human and mouse
• Incorporate different epigenomic datasets such as DNase-seq, H3K4me3, and/or H3K27ac
• Apply method to as many cell and tissue types as possible
Middle Level Annotations
![Page 44: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/44.jpg)
Promoter Prediction Method
Middle Level Annotations
GeneExpression
(FPKM)Ranked
Expression
Rank by H3K4me3
Signal
Rank by DNase Signal
Rank by H3K27ac
Signal
Gene A 3421 1 1 8 145
Gene B 2329 2 7 345 985
Gene C 432 3 4 2 217
... ... ... ... ... ...
Using a linear model, which features of proximal DNase peaks are most predictive of ranked expression?
![Page 45: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/45.jpg)
H3K4me3 Signal Only
Middle Level Annotations
![Page 46: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/46.jpg)
DNase Signal Only
Middle Level Annotations
![Page 47: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/47.jpg)
H3K27ac Signal Only
Middle Level Annotations
![Page 48: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/48.jpg)
Best Method: H3K4me3 Signal + 0.28 * DNase Signal
Middle Level Annotations
![Page 49: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/49.jpg)
Example - Promoters in GM12878
49Middle Level Annotations
![Page 50: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/50.jpg)
Visualization of Enhancer-like and Promoter-like Regions
50Middle Level Annotations
![Page 51: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/51.jpg)
Demo
zlab-annotations.umassmed.edu
• Proof of concept for enhancer-like and promoter-like visualization
• Seeking feedback from community
• Provide a sample site for DCC to implement
Encyclopedia 51
![Page 52: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/52.jpg)
Encyclopedia 52
1) Choose genome
2) Enter gene, SNP, genomic coordinate
3) Choose cell types
4) View tracks in UCSCOr WashU Genome Browser →
![Page 53: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/53.jpg)
DNase and H3K27ac signal tracks
Candidate enhancertracks
Conservation
Encyclopedia 53
![Page 54: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/54.jpg)
Encyclopedia 54
Select cell types based uponintersection of
search coordinates with peak bed files
Candidate enhancerspredicted using
• DNase + H3K27ac• DNase-only• H3K27ac-only
Tissue of origin based uponENCODE ontology information
![Page 55: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/55.jpg)
ENCODE DCCMatrix view
Encyclopedia 55www.encodeproject.org/matrix/?type=Experiment
![Page 56: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/56.jpg)
Top Level Annotations
Top Level Annotations
• Integrate a broad range of experimental data, as well as ground and middle level annotations
![Page 57: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/57.jpg)
Chromatin states
Top Level Annotations
Visualization by: Kellis
epilogos.broadinstitute.org
![Page 58: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/58.jpg)
Variant Annotation
Top Level Annotations
Visualization by: Snyder, Kellis, Gerstein
![Page 59: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/59.jpg)
Predicting Target Genes of Enhancers
1. Create benchmark dataset for method comparison
2. Evaluate correlation based methods
3. Integrate additional data to improve performance
4. Input from ENCODE groups & comparison of other methods
59Top Level Annotations
![Page 60: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/60.jpg)
Part I: Creating a Benchmark Dataset
60Top Level Annotations
![Page 61: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/61.jpg)
Promoter Capture Hi-C
Mifsud, …, Osborne (2015) Nature Genetics
Pros:• Thousands more high
resolution links than previous Hi-C datasets
Cons:• Links may not
represent functional contacts
~50,000 Enhancer-Gene links overlap enhancer-like regions
61Top Level Annotations
![Page 62: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/62.jpg)
Integrating Additional Datasets- GM12878
• ChIA-PET from the Snyder lab targeting RAD21 in GM12878
• eQTLs in lymphoblastoid cells curated by the Kellis Lab in HaploReg (also included LD SNPs r2 > 0.8)
• Hi-C (high resolution) loops in GM12878 from Aiden lab1
621. Rao, …, Aiden (2014) Cell
Top Level Annotations
![Page 63: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/63.jpg)
Overlap of Datasets with Promoter Capture Links
In total: 1,372
Replicated Links
Promoter Capture
Links
63*require one link end to contain only enhancer-like regions and other link end to contain TSSs for only one gene
*
*
![Page 64: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/64.jpg)
Distance Between Enhancers and Genes
64Top Level Annotations
![Page 65: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/65.jpg)
Determining the Negatives
For all enhancer-like regions with at least one positive link, select all genes that meet the following requirements:
#1 – Genes must be within 500Kb
#2 – Genes cannot be linked in any individual dataset (i.e. exclude enhancer-gene pairs with evidence from only one datatype)
65Top Level Annotations
![Page 66: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/66.jpg)
Positive: 335 Negative: 6,381
Positive: 335 Negative: 6,381
Positive: 670 Negative: 12,763
1,340
25,525
Training Set
Validation Set
Testing Set
Positives
Negatives
Dividing Links into Training, Validation, & Testing Sets
~5% of Cases are Positive
66Top Level Annotations
![Page 67: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/67.jpg)
Part II: Evaluation of Correlation Methods
67Top Level Annotations
![Page 68: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/68.jpg)
Correlation – Tested Parameters
• Raw signal vs Z-score normalized signal
• DNase signal vs H3K27ac signal
• ENCODE datasets vs. Roadmap datasets
• Pearson vs Spearman correlation
• Rank by correlation coefficient vs permutation p-value1
1. Method adapted from Sheffield, …, Furey (2013) Genome Research68
Top Level Annotations
![Page 69: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/69.jpg)
ROC - Correlation Methods
Ave Rank of DNase & H3K27ac, AUROC = 0.76DNase, Normalized Signal, AUROC = 0.73
DNase, Raw Signal, AUROC = 0.67H3K27ac, Normalized Signal, AUROC = 0.70
H3K27ac, Raw Signal, AUROC = 0.62
69Top Level Annotations
![Page 70: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/70.jpg)
PR - Correlation Methods
70
Ave Rank of DNase & H3K27ac, AUPR = 0.13DNase, Normalized Signal, AUPR = 0.12
DNase, Raw Signal, AUPR = 0.09H3K27ac, Normalized Signal, AUPR = 0.11
H3K27ac, Raw Signal, AUPR = 0.08
Top Level Annotations
![Page 71: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/71.jpg)
In Some Cases Correlation Accurately Predicts Links
DNase
H3K27ac
H3K4me3
RNA-seq
Genes
Enhancer-GeneLinks
Enhancer-likeRegions
TLR10
Important to Innate Immune System71
Top Level Annotations
![Page 72: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/72.jpg)
GM12878
Average H3K27ac Signal Across Enhancer-like Region
Ave
rage
H3K
27ac
Sig
nal
Acr
oss
TLR
10 T
SS
In Some Cases Correlation Accurately Predicts Links
72Top Level Annotations
![Page 73: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/73.jpg)
In Many Cases Correlation Does Not Accurately Predict Links
DNase
H3K27ac
H3K4me3
RNA-seq
Genes
Enhancer-GeneLinks
Enhancer-likeRegions
ELL
Elongation Factor Pol II
73Top Level Annotations
![Page 74: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/74.jpg)
GM12878
Average H3K27ac Signal Across Enhancer-like Region
Ave
rage
H3K
27ac
Sig
nal
Acr
oss
ELL
TSS
In Many Cases Correlation Does Not Accurately Predict Links
74Top Level Annotations
![Page 75: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/75.jpg)
Incorporating Distance Information
Distance is an important feature in predicating enhancer-gene links, but using a hard cutoff (e.g. 100Kb) results in missing 1/3 of links
We instead tested:
• Ranking by distance
• Average rank of distance and best performing correlation method (average rank of DNase and H3K27ac)
75Top Level Annotations
![Page 76: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/76.jpg)
Incorporating Distance Improves Performance
Best Correlation, AUROC = 0.76Distance Only, AUROC = 0.84
Average Rank Distance & Correlation, AUROC = 0.88
76Top Level Annotations
![Page 77: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/77.jpg)
Incorporating Distance Improves Performance
Best Correlation, AUPR = 0.13Distance Only, AUPR = 0.18
Average Rank Distance & Correlation, AUPR = 0.32
77Top Level Annotations
![Page 78: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/78.jpg)
Part II: Conclusions
• For correlation analysis:
- DNase slightly outperforms H3K27ac
- It is better to use Z-score normalized signal over raw signal
- Pearson correlation coefficient out performs Spearman
- Ranking by correlation coefficient outperforms ranking by p-value (and is much faster!)
• Incorporating distance information dramatically increases performance
78Top Level Annotations
![Page 79: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/79.jpg)
Part III: Developing Random Forest Model
79Top Level Annotations
![Page 80: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/80.jpg)
Developing Two Random Forest Models
Can be applied across all cell and tissue types
80Top Level Annotations
![Page 81: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/81.jpg)
Minimal Model Features
• Minimum distance between enhancer and gene TSS
• Average conservation across enhancer and promoter
• Average DNase Signal across enhancer and promoter
• Average H3K27ac Signal across enhancer and promoter
• Correlation of K-mers (tested 3-6mer)
• Using signals across multiple cell and tissue types:- Correlation of DNase signal- Mean and standard deviation of DNase signal- Correlation of H3K27ac Signal- Mean and standard deviation of H3K27ac signal
81Top Level Annotations
![Page 82: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/82.jpg)
ROC – Random Forest Minimal Model
Correlation, AUROC = 0.76Average Rank Distance & Correlation, AUROC = 0.88
Random Forest Minimal Model, AUROC = 0.92
82Top Level Annotations
![Page 83: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/83.jpg)
PR – Random Forest Minimal Model
Correlation, AUPR = 0.13Average Rank Distance & Correlation, AUPR = 0.32
Random Forest Minimal Model, AUPR = 0.45
83Top Level Annotations
![Page 84: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/84.jpg)
Feature Importance - Minimal Model
Feature Importance 84Top Level Annotations
![Page 85: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/85.jpg)
Comprehensive Model Features
• Minimal model features
• Gene expression & RAMPAGE Peaks
• Signal from other Histone Marks (H3K4me1/2/3, H3K27me3, H3K36me3)
• TF peaks signal (Pol2, p300, CTCF)
85Top Level Annotations
![Page 86: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/86.jpg)
ROC – Random Forest with Gene Expression
Correlation, AUROC = 0.76Average Rank Distance & Correlation, AUROC = 0.88
Random Forest Minimal Model, AUROC = 0.92Random Forest w/ Gene Expression, AUROC = 0.93
86Top Level Annotations
![Page 87: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/87.jpg)
PR – Random Forest with Gene Expression
Correlation, AUPR = 0.13Average Rank Distance & Correlation, AUPR = 0.32
Random Forest Minimal Model, AUPR = 0.45Random Forest w/ Gene Expression, AUPR = 0.52
87Top Level Annotations
![Page 88: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/88.jpg)
88
Feature Importance – RF with Gene Expression
Feature ImportanceTop Level Annotations
![Page 89: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/89.jpg)
• In corporate additional training and testing data, such as massively parallel reporter assays and STARR-seq
• Retest additional features when training set is large
• Prediction of target genes remains a major challenge.
• We also would like to define other types of regulatory elements.
Top Level Annotations
Future Directions
![Page 90: June 8, 2016 ENCODE 2016: Research Applications and Users … · 2016-06-07 · ENCODE ChIP-seq TF Datasets •Human: •837 ChIP-seq TF datasets •167 TFs •104 cell types •Mouse:](https://reader033.vdocument.in/reader033/viewer/2022050509/5f99e275f1e42a799176934f/html5/thumbnails/90.jpg)
Acknowledgements
Weng LabJill MooreMichael PurcaroArjan van der VeldeTyler BorrmanHenry PrattSowmya IyerJie Wang
Stam LabJohn Stamatoyannopoulos Bob ThurmanRichard Sandstrom
Gerstein Lab Mark Gerstein Anurag Sethi
ENCODE ConsortiumBrad BernsteinRoss HardisonLen PennacchioAxel ViselBing RenAnshul KundajeData Production Groups
90