diffReps: automated ChIP-seq differential analysis package
DESCRIPTION
diffReps is published in PLoS ONE. Link: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0065598
TRANSCRIPT
diffReps: automated ChIP-seq differential analysis package
Li Shen, Asst. Professor
Neuroscience, Mount Sinai
06/28/2013
Slides adapted from previous presentation
2
ChIP-seq differential analysis
Treatment (coc i.p.): Rep1, Rep2, Rep3
Control (sal i.p.): Rep1, Rep2, Rep3
Differences
[Figure: Venn diagram for peak lists, Treatment vs. Control, marking false positives and false negatives]
3
Subtle changes of chromatin modifications
H3K4me3 from ENCODE
[Figure: K562 and ESC tracks at ASUN (Asunder, Spermatogenesis Regulator); both tracks scaled to [0, 1.2]]
4
Existing programs for differential analysis
• ChIPDiff(2008): HMM-based approach. NOT sensitive enough for brain data.
• Peak-based: DIME(2011), DBChIP(2012). Caveats.
• Read counts + DESeq(2010)/edgeR(2010): Not convenient to use.
[Figure: K562, ESC, and Peaks tracks]
5
diffReps: a ChIP-seq differential analysis package
• Written in Perl; an easy-to-use command-line tool that does everything in one command.
• Sliding window strategy.
Workflow:
1. Background modeling
2. Normalization
3. Differential test
4. Merge and re-test
5. Multiple testing correction
diffReps.pl -tr A.bed B.bed -co C.bed D.bed -gn mm9 -re report.txt
Google code:
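A minimal sketch of the sliding window idea from this slide, written in Python rather than the package's Perl. It is not the diffReps implementation: the function name `window_counts`, the window and step sizes, and the choice to count read start positions are all assumptions made for illustration.

```python
from collections import Counter


def window_counts(read_starts, window=1000, step=100):
    """Count reads in overlapping windows along one chromosome.

    read_starts: iterable of 0-based read start positions (hypothetical input).
    window, step: illustrative sizes; window must be a multiple of step.
    Returns {window_start: count} for windows [start, start + window).
    """
    if window % step != 0:
        raise ValueError("window must be a multiple of step")
    counts = Counter()
    for pos in read_starts:
        # Last window containing pos starts at the largest multiple of step <= pos.
        last = (pos // step) * step
        # First window containing pos starts one step after (last - window).
        first = max(0, last - window + step)
        for start in range(first, last + 1, step):
            counts[start] += 1
    return dict(counts)


if __name__ == "__main__":
    # Three toy reads; with step=500 each read falls into up to two windows.
    print(window_counts([50, 150, 1050], window=1000, step=500))
```

Counting per window for every replicate gives the table that the later steps (normalization and the differential test) would operate on.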
6
Differential analysis & tail behavior
Gaussian: p=1E-5
Empirical: p=1E-5
H3K4me3 from mouse brain; bin1kb counts normalized.
7
Statistical tests for differential analysis
• Negative binomial test: models biological replicates, over-dispersion
• T-test: NOT recommended
• X2 test: SUM((emp - exp)^2 / exp) => X2 distr (p-val).
• G-test: 2 * SUM(emp * ln(emp / exp)) => X2 distr (p-val). A modification to X2 test, recommended.
diffReps on H3K4me3: cocaine vs. saline
[Venn diagram: Negative binomial test vs. T-test; region counts 6527, 282, 130]
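The X2 test and G-test named on this slide can be sketched for a single window as follows. This assumes the standard textbook forms of both statistics, one treatment and one control count per window, and expected counts split in proportion to library size. The function names and the library-size normalization are illustrative assumptions, not the diffReps code.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def expected_counts(treat, ctrl, treat_total, ctrl_total):
    """Expected counts under no difference, split by library size (assumed null)."""
    n = treat + ctrl
    frac = treat_total / (treat_total + ctrl_total)
    return n * frac, n * (1.0 - frac)


def chi_square_test(treat, ctrl, treat_total, ctrl_total):
    """Pearson statistic SUM((emp - exp)^2 / exp) and its p-value."""
    exp = expected_counts(treat, ctrl, treat_total, ctrl_total)
    stat = sum((o - e) ** 2 / e for o, e in zip((treat, ctrl), exp))
    return stat, chi2_sf_df1(stat)


def g_test(treat, ctrl, treat_total, ctrl_total):
    """G statistic 2 * SUM(emp * ln(emp / exp)) and its p-value."""
    exp = expected_counts(treat, ctrl, treat_total, ctrl_total)
    # Zero observed counts contribute nothing to the sum.
    stat = 2.0 * sum(o * math.log(o / e)
                     for o, e in zip((treat, ctrl), exp) if o > 0)
    return stat, chi2_sf_df1(stat)


if __name__ == "__main__":
    # Toy window: 30 treatment reads vs. 10 control reads, equal library sizes.
    print(chi_square_test(30, 10, 1_000_000, 1_000_000))
    print(g_test(30, 10, 1_000_000, 1_000_000))
```

Both tests compare a window's counts without modeling replicate-to-replicate variation, which is the gap the negative binomial test addresses.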
8
Two additional tools
1. Find hotspots - hotspots are regions where the differential sites or peaks occur significantly more often than random chance.
[Figure: Hotspot; Differential sites; Greedy search algorithm; Local Poisson Eval]
2. Region analysis - accepts any file whose first 3 columns are chromosome, start, end. Annotates gene and heterochromatic regions.
Easy to use: region_analysis.pl -i input.txt
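To illustrate what a local Poisson evaluation of a candidate hotspot could look like, here is a hedged Python sketch. The greedy search itself is not shown, and everything here is an assumption: the function names, the idea of estimating the expected site count from a larger surrounding region, and the example numbers are hypothetical and not taken from diffReps.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    Each term is evaluated in log space to avoid underflow for large lam.
    Very small tails are limited by floating-point precision (about 1e-16).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
              for i in range(k))
    return max(0.0, 1.0 - cdf)


def hotspot_pvalue(n_in_cluster, cluster_len, n_local, local_len):
    """Chance of seeing at least n_in_cluster sites in a cluster of cluster_len bp.

    The expected count comes from the site density of a surrounding local
    region (n_local sites over local_len bp); this is an assumed background model.
    """
    lam = n_local * cluster_len / local_len
    return poisson_sf(n_in_cluster, lam)


if __name__ == "__main__":
    # Toy case: 10 differential sites within 10 kb, 20 sites in the surrounding 1 Mb.
    print(hotspot_pvalue(10, 10_000, 20, 1_000_000))
```

A small p-value says the cluster holds more sites than the local density would predict, which matches the slide's definition of a hotspot.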
9
Test data: ENCODE H3K4me3 between K562 and ESC
Target: H3K4me3; Mock: DNA Input
Identify differential chromatin modification sites
ESC: Rep1, Rep2
K562: Rep1, Rep2
Estimate empirical false positive rate
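One way to read the target/mock design is sketched below in Python, assuming the empirical false positive rate is taken as the number of sites called in the mock (DNA input) comparison divided by the number called in the target comparison at the same p-value cutoff. That definition, the function name `efdr_by_cutoff`, and the toy p-values are assumptions for illustration only.

```python
def efdr_by_cutoff(target_pvals, mock_pvals, cutoffs):
    """Tabulate an empirical false positive rate at several p-value cutoffs.

    Returns rows of (cutoff, n_target, n_mock, n_mock / n_target).
    The ratio is NaN when no target site passes the cutoff.
    """
    rows = []
    for cutoff in cutoffs:
        n_target = sum(p <= cutoff for p in target_pvals)
        n_mock = sum(p <= cutoff for p in mock_pvals)
        rate = n_mock / n_target if n_target else float("nan")
        rows.append((cutoff, n_target, n_mock, rate))
    return rows


if __name__ == "__main__":
    # Toy p-values; real input would be the per-site p-values from each run.
    target = [1e-6, 1e-5, 1e-3, 0.2]
    mock = [1e-3, 0.5]
    for row in efdr_by_cutoff(target, mock, [1e-4, 1e-2]):
        print(row)
```

Scanning several cutoffs shows how the stringency of the test trades the number of target calls against the share that the mock comparison suggests are spurious.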
10
Sensitivity & Specificity
[Figure: Target vs. Mock; Negative binomial vs. G-test]
eFDR < .05%
11
Overlapped & specific sites
Up-regulated sites, do the same for down sites
[Figure: “Specific” and “Overlapped” sites, each split into Promoter and Genebody]
Using default p<1E-4
RNA-seq
12
Correlating differential sites with transcription
“Specific” / “Overlapped”
K562, ESC RNA-seq TopHat-Cufflinks: gene exp change, alternative promoter/splicing
13
diffReps “specific” sites - examples
14
diffReps is used in many works
Big cocaine project:
15
diffReps: current status & community feedback
diffReps published
Great to see diffreps has found a nice home in plos one. It is literally the program which has saved my sanity, my phD and probably the paper i'm writing!
- Michael Reschen, Oxford Univ., UK
http://dx.plos.org/10.1371/journal.pone.0065598
16
Acknowledgement
Contributors: Li Shen, Ningyi Shao, Xiaochuan Liu, Eric Nestler
Roles: Development, Test & result, Documentation, Google code, Money$
diffReps: