![Page 1: DNA Copy Number Analysis Qunyuan Zhang,Ph.D. Division of Statistical Genomics Department of Genetics & Center for Genome Sciences Washington University](https://reader036.vdocument.in/reader036/viewer/2022062518/56649ea15503460f94ba4b52/html5/thumbnails/1.jpg)
DNA Copy Number Analysis
Qunyuan Zhang, Ph.D.
Division of Statistical Genomics
Department of Genetics & Center for Genome Sciences
Washington University School of Medicine
03-25-2008
GEMS Course: M 21-621 Computational Statistical Genetics
Four Questions
What is Copy Number?
What can Copy Number tell us?
How to measure/quantify Copy Number?
How to analyze Copy Number?
What is Copy Number?
Gene Copy Number
The gene copy number (also "copy number variants" or CNVs) is the amount of copies of a particular gene in the genotype of an individual. Recent evidence shows that the gene copy number can be elevated in cancer cells. For instance, the EGFR copy number can be higher than normal in Non-small cell lung cancer. …Elevating the gene copy number of a particular gene can increase the expression of the protein that it encodes.
From Wikipedia www.wikipedia.org
DNA Copy Number
A Copy Number Variant (CNV) represents a copy number change involving a DNA fragment that is ~1 kilobase or larger. (Feuk et al. 2006, Nature Reviews Genetics)
DNA Copy Number ≠ DNA Tandem Repeat Number (e.g., microsatellites, with repeat units <10 bases)
DNA Copy Number ≠ RNA Copy Number (RNA Copy Number = Gene Expression Level; DNA → transcription → mRNA)
Copy number is the number of copies of a particular fragment of a nucleic acid chain. In most publications, it refers to DNA copy number.
What can Copy Number tell us?
Genetic Diversity/Polymorphisms
- restriction fragment length polymorphism (RFLP)
- amplified fragment length polymorphism (AFLP)
- random amplification of polymorphic DNA (RAPD)
- variable number of tandem repeats (VNTR; e.g., mini- and microsatellites)
- single nucleotide polymorphism (SNP)
- presence/absence of transposable elements …
- structural alterations (e.g., deletions, duplications, inversions …)
- DNA copy number variant (CNV)
Association with phenotypes/diseases → identifying genes/genetic factors
Genetic Alterations in Tumor Cells (DNA Copy Number Changes)
Homologous repeats
Segmental duplications
Chromosomal rearrangements
Duplicative transpositions
Non-allelic recombinations
……
(Figure: a normal cell has CN=2; tumor cells show deletions (CN=0, CN=1) and amplifications (CN=3, CN=4).)
How to measure/quantify Copy Number?
Quantitative Polymerase Chain Reaction (Q-PCR): DNA amplification
(dNTPs, primers, Taq polymerase, fluorescent dye)
less CN → PCR amplification → less DNA → lower fluorescence intensity
more CN → PCR amplification → more DNA → higher fluorescence intensity
(one fragment at a time)
Microarray: DNA hybridization
(dNTPs, primers, Taq polymerase, fluorescent dye)
less CN → PCR amplification → less DNA → hybridization to arrayed probes → lower intensities
more CN → PCR amplification → more DNA → hybridization to arrayed probes → higher intensities
(multiple/different fragments in a mixed pool)
Array CGH: From Image to Copy Number
Tumor: red intensity
Normal: green intensity
Red < Green: deletion (CN < 2)
Red > Green: amplification (CN > 2)
Red = Green: no alteration (CN = 2)
more DNA copies → more DNA hybridization → higher intensity
SNP Array: From Image to Copy Number
Tumor vs. Normal, Affymetrix Mapping 250K Sty I chip
~250K probe sets (~250K SNPs); each probe set has 24 probes
(Figure: tumor and normal probe-set intensities, with regions labeled CN=2 (no alteration), CN=1 and CN=0 (deletions), and CN>2 (amplification).)
more DNA copies → more DNA hybridization → higher intensity
How to Analyze Copy Number?
General Procedures for Copy Number Analysis
Finished chips → (scanner) → raw image data [.DAT files] (experiment info [.EXP])
→ (image processing software) → probe-level raw intensity data [.CEL files]
→ Preprocessing, using the chip description file [.CDF]: background adjustment, normalization, summarization
→ summarized intensity data
→ raw copy number (CN) data [log ratio of tumor/normal intensities]
→ significance test of CN changes; estimation of CN
→ smoothing and boundary determination; concurrent regions among the population
→ amplification and deletion frequencies among populations; association analysis
Background Adjustment/Correction
Reduces the unevenness of a single chip, making intensities at different positions on the chip comparable.
(Figure: chip image before and after adjustment.)
Corrected intensity: S' = S − B, where S is the observed intensity and B the background intensity.
For each region i, B(i) = mean of the lowest 2% of intensities in region i (Affymetrix MAS 5.0).
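A minimal Python sketch of the region-wise background subtraction above, assuming the chip is given as a 2-D NumPy array split into a 4 × 4 grid of regions. The grid size and the name `background_adjust` are hypothetical; the actual MAS 5.0 procedure also blends the backgrounds of neighbouring regions, which is omitted here.

```python
import numpy as np

def background_adjust(intensity, n_rows=4, n_cols=4, lowest_frac=0.02):
    """S' = S - B(i), where B(i) is the mean of the lowest 2% of intensities
    in region i. The chip is cut into an n_rows x n_cols grid (hypothetical)."""
    s = np.asarray(intensity, dtype=float)
    corrected = np.empty_like(s)
    row_blocks = np.array_split(np.arange(s.shape[0]), n_rows)
    col_blocks = np.array_split(np.arange(s.shape[1]), n_cols)
    for rows in row_blocks:
        for cols in col_blocks:
            region = s[np.ix_(rows, cols)]
            # Number of probes making up the lowest 2% (at least one).
            k = max(1, int(np.ceil(lowest_frac * region.size)))
            b = np.sort(region, axis=None)[:k].mean()
            corrected[np.ix_(rows, cols)] = region - b
    return corrected
```

Negative corrected values can occur; how to floor them is a separate design choice not covered by this sketch.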
Background Adjustment/Correction
Eliminates non-specific hybridization signal to obtain accurate intensity values for specific hybridization.
Methods: PM only, PM − MM, Ideal Mismatch (IM), etc. (PM = perfect match probe, MM = mismatch probe)
(Figure: a quartet probe set on the sense or antisense strand, built from 25-mer oligonucleotide probes.)
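As an illustration of the PM − MM option, a short sketch assuming PM and MM intensities arrive as paired arrays. Differences below a floor (a hypothetical value of 1.0) are clipped so that a later log transform stays defined; this is plain PM − MM, not MAS 5.0's Ideal Mismatch, which adjusts MM before subtracting.

```python
import numpy as np

def pm_minus_mm(pm, mm, floor=1.0):
    """Per-probe PM - MM; values below `floor` (hypothetical) are raised to it."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    return np.maximum(pm - mm, floor)
```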
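Normalization
Reduces technical variation between chips, making intensities from different chips comparable.
(Figure: intensity distributions before and after normalization.)
Methods: baseline array (linear), quantile normalization, contrast normalization, etc.
Example (standardization):
$$S' = \frac{S - \text{mean}(S)}{\text{SD}(S)}$$
so that S' has mean 0 and SD 1 (S' ~ N(0,1) if S is approximately normal).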
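A sketch of two of the listed options, assuming a probes × chips NumPy matrix. Quantile normalization gives every chip the same intensity distribution (the mean of the sorted columns); ties are broken by position, a simplification. `standardize` implements the S' formula above. Both function names are hypothetical, and baseline-array and contrast normalization are not shown.

```python
import numpy as np

def quantile_normalize(x):
    """x: probes x chips. Replace each value by the across-chip mean of the
    values sharing its rank, so all columns end up with the same distribution."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its chip
    mean_sorted = np.sort(x, axis=0).mean(axis=1)       # reference distribution
    return mean_sorted[ranks]

def standardize(s):
    """S' = (S - mean(S)) / SD(S)."""
    s = np.asarray(s, dtype=float)
    return (s - s.mean()) / s.std(ddof=1)
```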
Summarization
Combines the multiple probe intensities of each probe set into a single summarized value for subsequent analyses.
Averaging methods: PM only or PM − MM; allele-specific or non-specific
Model-based method: Li & Wong (2001), model-based gene expression index
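The two summarization styles can be sketched for a single probe set, assuming a chips × probes matrix y (e.g., PM − MM values). The model-based version fits y[i, j] ≈ θ_i φ_j by alternating least squares with Σφ_j² equal to the number of probes, which is the core of the Li & Wong model; their outlier detection is deliberately left out, and both function names are hypothetical.

```python
import numpy as np

def average_index(y):
    """Averaging method: mean over the probes of each chip."""
    return np.asarray(y, dtype=float).mean(axis=1)

def li_wong_index(y, n_iter=100):
    """Simplified Li & Wong fit: y[i, j] ~ theta[i] * phi[j].
    theta = summarized value per chip, phi = probe affinities."""
    y = np.asarray(y, dtype=float)
    n_probes = y.shape[1]
    phi = np.ones(n_probes)
    for _ in range(n_iter):
        theta = y @ phi / (phi @ phi)        # least squares for theta given phi
        phi = y.T @ theta / (theta @ theta)  # least squares for phi given theta
        phi *= np.sqrt(n_probes / (phi @ phi))  # constraint: sum(phi**2) == n_probes
    theta = y @ phi / (phi @ phi)
    return theta, phi
```

Unlike a plain average, the model down-weights nothing explicitly but lets probes with low affinity φ_j contribute less to θ_i.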
Raw Copy Number Data
S: summarized raw intensity
S': log-transformed intensity, S' = log2(S)
Raw CN: log ratio of tumor/normal intensities
$$CN = S'_{tumor} - S'_{normal} = \log_2\left(\frac{S_{tumor}}{S_{normal}}\right)$$
Paired design: S_normal = S of the paired normal sample
Group design: S_normal = average S of the group of normal samples
(Figure: distributions of S before log transformation, of log(S) after log transformation, and of the raw CN.)
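A sketch of the raw CN formula for both designs, assuming summarized intensities per SNP as NumPy-compatible arrays (function names are hypothetical). In the group design the normal intensities S are averaged before the log, as stated above.

```python
import numpy as np

def raw_cn_paired(s_tumor, s_normal):
    """Paired design: CN = log2(S_tumor) - log2(S_normal) per SNP."""
    return np.log2(s_tumor) - np.log2(s_normal)

def raw_cn_group(s_tumor, s_normals):
    """Group design: s_normals is samples x SNPs; S_normal is the average S."""
    return np.log2(s_tumor) - np.log2(np.mean(s_normals, axis=0))
```

For orientation, with a diploid normal and a pure tumor sample (no normal-cell contamination, an assumption), CN=1 gives log2(1/2) = −1, CN=3 gives log2(3/2) ≈ 0.58, and CN=4 gives log2(4/2) = 1.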
Individual Level Analysis
Analysis for each individual sample (or each sample pair)
Smoothing
Significance test of CN amplification and deletion
Boundary finding (smoothing and segmentation)
CN estimation
Smoothing via Sliding Window
(Figure: windows 1, 2, …, k, …, N slide along the SNPs, which are ordered by chromosomal position.)
Each window k contains 30 consecutive SNPs (k, k+1, k+2, …, k+29), so adjacent windows overlap and shift by one SNP.
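The sliding-window smoothing can be sketched as a moving average, assuming raw CN values already sorted by chromosomal position (the function name is hypothetical). Window k (0-based here) covers SNPs k to k+29, so n SNPs yield n − 29 windows.

```python
import numpy as np

def sliding_window_mean(raw_cn, window=30):
    """Mean raw CN of each window of `window` consecutive SNPs.
    raw_cn must be ordered by chromosomal position."""
    x = np.asarray(raw_cn, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")  # one value per full window
```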
Smoothing (sliding window = 30 SNPs)
(Figure: CN vs. position (Mbp) on chromosome 7, Affymetrix and Illumina data.)
Significance Test of CN Changes: An Example
Sliding Window Smoothing
(Figure: two panels of CN vs. position (Mbp).)
Normalization
(Figure: CN vs. position (Mbp), and the normalized values in SD units vs. position (Mbp).)
P-value Calculation
(Figure: normalized values (SD units) and −log P vs. position (Mbp).)
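One way to go from smoothed window values to SD units and p-values, sketched under two assumptions not stated on the slides: windows are centered on their overall mean and scaled by their SD, and p-values are two-sided under a standard normal null, p = erfc(|z|/√2). The function name is hypothetical.

```python
import math
import numpy as np

def window_p_values(smoothed_cn):
    """Convert smoothed window means to SD units (z) and two-sided
    normal p-values (assumed null: z ~ N(0, 1))."""
    s = np.asarray(smoothed_cn, dtype=float)
    z = (s - s.mean()) / s.std(ddof=1)
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return z, p
```

Because adjacent windows share 29 of 30 SNPs, these p-values are strongly correlated, which is worth keeping in mind when correcting for multiple testing.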
Calculate FDR for Each Window
(Figure: −log FDR and −log P vs. position (Mbp).)
Select Windows (FDR < 0.05)
(Figure: CN and −log FDR vs. position (Mbp).)
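The FDR and selection steps can be sketched with the Benjamini-Hochberg procedure; the slides do not name the FDR method, so BH is an assumption here, as are the function names.

```python
import numpy as np

def bh_fdr(p):
    """Benjamini-Hochberg adjusted p-values (q-values), one per window."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def select_windows(p, alpha=0.05):
    """Indices of windows with FDR < alpha."""
    return np.flatnonzero(bh_fdr(p) < alpha)
```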
Another Example: Intensities and Raw CNs, Chr. 1 (Pair #101)
Black: normal, red: tumor, green: tumor − normal
Significance Test for Copy Number Changes: −log(p) values, TSP data, chr. 1, pair #101
Window-based t test
Window size = 0.5 Mbp (~30 SNPs); N = number of SNPs in the window
$$t = \frac{\text{mean CN of window}}{\text{SD of window}} \times \sqrt{N} \sim t\,(df = N - 1)$$
(Figure: −log(p) vs. window position (Mbp).)
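A sketch of the window-based t test, simplified to fixed windows of 30 consecutive SNPs rather than 0.5 Mbp windows (an assumption), testing whether the mean raw CN of each window differs from 0. It uses NumPy's `sliding_window_view` and SciPy's t distribution; the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def window_t_test(raw_cn, window=30):
    """One-sample t test (mean raw CN = 0) for each window of `window`
    consecutive SNPs: t = mean / SD * sqrt(N), df = N - 1."""
    x = np.asarray(raw_cn, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    n = window
    t = windows.mean(axis=1) / windows.std(axis=1, ddof=1) * np.sqrt(n)
    p = 2 * stats.t.sf(np.abs(t), df=n - 1)  # two-sided p-value
    return t, p
```

Plotting −log10(p) against window position reproduces the kind of track described above.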
Segmentation (breaking a chromosome into pieces with homogeneous CN)
BioConductor R packages (www.bioconductor.org):
- GLAD package: adaptive weights smoothing (AWS) method
- DNAcopy package: circular binary segmentation (CBS) method
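To show the idea behind binary segmentation, here is a simplified, non-circular Python version; it is not DNAcopy's CBS, which considers arcs of a circularized segment and uses permutation-based p-values. The recursion splits a segment at the point maximizing a two-sample statistic and stops when that statistic falls below a threshold; the threshold of 5.0, the minimum segment size, and the function name are hypothetical.

```python
import numpy as np

def binary_segmentation(x, threshold=5.0, min_size=5):
    """Return sorted breakpoint indices. A segment is split at the index
    maximizing |mean_left - mean_right| / (sd * sqrt(1/n_left + 1/n_right))
    if that statistic exceeds `threshold`; sd is the SD of the whole segment."""
    x = np.asarray(x, dtype=float)
    breaks = []

    def split(start, end):
        seg = x[start:end]
        n = len(seg)
        if n < 2 * min_size:
            return
        sd = seg.std(ddof=1)
        if sd == 0:  # constant segment: nothing to split
            return
        best_stat, best_i = 0.0, None
        for i in range(min_size, n - min_size + 1):
            left, right = seg[:i], seg[i:]
            stat = abs(left.mean() - right.mean()) / (sd * np.sqrt(1 / i + 1 / (n - i)))
            if stat > best_stat:
                best_stat, best_i = stat, i
        if best_i is not None and best_stat > threshold:
            breaks.append(start + best_i)
            split(start, start + best_i)
            split(start + best_i, end)

    split(0, len(x))
    return sorted(breaks)
```

For example, a noiseless profile of 50 zeros, 50 twos and 50 zeros is split at indices 50 and 100.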
CN Estimation: Hidden Markov Model (HMM)
Software: CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
(Figure: along SNPs ordered by position (…, SNP_i, SNP_i+1, SNP_i+2, SNP_i+3, SNP_i+4, …), each SNP has a hidden status, its unknown CN (CN=?), and an observed status, its raw CN (log ratio of intensities).)
CN estimation: finding the sequence of CN values that is most probable given the observed raw CNs.
Algorithm: Viterbi algorithm (can be iterative)
The following information/assumptions are needed:
Background probabilities: overall probabilities of the possible CN values,
$$P(CN = x), \quad x = 0, 1, 2, 3, 4, \dots, n \ (\text{usually } n < 10)$$
Transition probabilities: probabilities of the CN value at each SNP conditional on the CN at the previous SNP,
$$P(CN_{i+1} = x \mid CN_i = x'), \quad x, x' \in \{0, 1, 2, \dots, n\}$$
Emission probabilities: probabilities of the observed raw CN at each SNP conditional on the hidden (unknown, true) CN status, described by the density f(x | CN = y) of the log ratio,
$$P(\text{log ratio} < x \mid CN = y) = \int_{-\infty}^{x} f(u \mid CN = y)\,du, \quad x \in \mathbb{R},\ y = 0, 1, 2, \dots, n$$
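A compact Viterbi sketch in log space under illustrative assumptions that are not from the slides: hidden states CN ∈ {1, 2, 3, 4}, uniform background probabilities, a hypothetical "stay" probability of 0.99 with the rest spread evenly over the other states, and Gaussian emissions centered at log2(CN/2) with a hypothetical SD of 0.3. CN=0 is left out because log2(0/2) is undefined under this emission model.

```python
import numpy as np

def viterbi_cn(log_ratio, states=(1, 2, 3, 4), sd=0.3, p_stay=0.99):
    """Most probable CN sequence for raw CNs (log ratios) ordered by position."""
    x = np.asarray(log_ratio, dtype=float)
    states = np.asarray(states)
    k = len(states)
    means = np.log2(states / 2.0)
    # Log Gaussian emission densities; the shared constant term is dropped.
    log_emit = -0.5 * ((x[:, None] - means[None, :]) / sd) ** 2
    log_trans = np.full((k, k), np.log((1 - p_stay) / (k - 1)))
    np.fill_diagonal(log_trans, np.log(p_stay))
    log_prior = np.full(k, -np.log(k))  # uniform background probabilities
    n = len(x)
    score = log_prior + log_emit[0]
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        cand = score[:, None] + log_trans  # cand[a, b]: state a at SNP i-1 -> b at SNP i
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[i]
    path = np.empty(n, dtype=int)
    path[-1] = score.argmax()
    for i in range(n - 1, 0, -1):  # trace the best path backward
        path[i - 1] = back[i, path[i]]
    return states[path]
```

A larger p_stay penalizes state changes more, trading sensitivity to short CN changes for fewer spurious breakpoints.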
HMM Estimation of CN for Chr. 1 (Pair #101)
Black: normal intensities; red: tumor intensities; green: tumor − normal
Blue: HMM-estimated CNs in tumor tissue
(Figure: estimated segments with CN=1, CN=2, CN=3 and CN=4.)
Population Level Analysis
Analysis for the whole group (or sub-group) of samples
Overall significance test
Amplification and deletion frequencies summarization
Common/concurrent region finding
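The frequency summarization and common-region steps can be sketched as follows, assuming a samples × SNPs matrix of integer CN calls (for example from an HMM) and a normal CN of 2. The 0.2 frequency cut-off for a "concurrent" region and the function names are hypothetical choices for illustration.

```python
import numpy as np

def alteration_frequencies(cn_calls, normal_cn=2):
    """Per-SNP fraction of samples with amplification (CN > normal_cn)
    and with deletion (CN < normal_cn)."""
    cn = np.asarray(cn_calls)
    amp = (cn > normal_cn).mean(axis=0)
    dele = (cn < normal_cn).mean(axis=0)
    return amp, dele

def concurrent_regions(freq, min_freq=0.2):
    """Runs of consecutive SNPs with frequency >= min_freq,
    as half-open (start, end) index pairs."""
    regions, start = [], None
    for i, f in enumerate(freq):
        if f >= min_freq and start is None:
            start = i
        elif f < min_freq and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(freq)))
    return regions
```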
Raw CN Changes of Chr. 14 (averaged over ~400 pairs)
Genome-wide Raw Copy Number Changes (sliding window plot, averaged over ~400 pairs)
Sliding Window Test of Significance of CN Changes: −log(p) values, based on ~400 pairs
Visualization of Concurrent Regions of Chr. 14 (~400 pairs)
(Figure: samples vs. positions.)
Software
Affymetrix chips (www.affymetrix.com); Illumina chips (www.illumina.com)
CNAT (www.affymetrix.com); dChip (www.dchip.org); CNAG (www.genome.umin.jp)
GenePattern (www.broad.mit.edu/cancer/software/genepattern/)
BioConductor R packages (www.bioconductor.org):
- GLAD package: adaptive weights smoothing (AWS) method
- DNAcopy package: circular binary segmentation (CBS) method