CZ5225: Modeling and Simulation in Biology
Lecture 10: Copy Number Variations
Prof. Chen Yu Zong
Tel: 6516-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 08-14, level 8, S16, NUS
Copy number variation (CNV): What is it?
• A form of human genetic variation: instead of 2 copies of each region of each chromosome (diploid), some people have amplifications or losses (> 1 kb) in different regions
  – this doesn't include translocations or inversions
• We all have such regions
  – the publicly available genome NA15510 has between 5 & 240 by various estimates
  – they are only rarely harmful (but rare things do happen)
Copy-number probes are used to quantify the amount of DNA at known loci
CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
PM: ATCGGTAGCCATTCATGAGTTACTA
CN=1: PM = c
CN=2: PM = 2c
CN=3: PM = 3c
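The proportionality PM = CN × c shown above can be sketched in a few lines of Python. This is only an illustration of the idealized relation on the slide: the function name and the value of c are hypothetical, and real probe signals also carry background and noise.

```python
def copy_number_from_pm(pm: float, c: float) -> float:
    """Idealized copy-number estimate, assuming PM = CN * c with no background."""
    if c <= 0:
        raise ValueError("the per-copy signal c must be positive")
    return pm / c


# Hypothetical per-copy signal; on a real chip it would come from calibration.
c = 150.0
for pm in (150.0, 300.0, 450.0):
    print(pm, copy_number_from_pm(pm, c))
```

Under this assumption a doubling of the PM signal corresponds to a doubling of the copy number.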
Copy number variation: Population genomics
The genomes of two humans differ more in a structural sense than at the nucleotide level; a recent paper estimates that on average two of us differ by
~ 4 - 24 Mb of genetic material due to Copy Number Variation
~ 2.5 Mb due to Single Nucleotide Polymorphisms
Abundance of CNVs in the human population
Still an open question, but probably thousands, at low allelic frequency (<20%)
Abundance of deletion CNVs in the human population
Comparison of overlapping CNVs identified by Conrad et al. (2006) and McCarroll et al. (2006). Freeman et al. Genome Res 2006
Non-allelic homologous recombination events between low-copy repeats (LCR-NAHR)
Lupski & Inoue, TIG 2002
Duplications and Deletions of LCRs mediated by NAHR
LCRs in direct orientation
LCRs in inverted orientation: Inversions
Intrachromatid recombination between LCRs
LCRs in direct orientation: Deletion
LCRs in inverted orientation: Inversion
Mechanisms generating genomic deletions
Copy number variation: Relations to human disease
Responsible for a number of rare genetic conditions. For example, Down syndrome (trisomy 21), Cri du chat syndrome (a partial deletion of 5p).
Implicated in complex diseases. For example:
CCL3L1 CN is associated with HIV/AIDS susceptibility; also, some sporadic (non-inherited) CN variants are strongly associated with autism.
Tumors typically have a lot of chromosomal abnormalities, including recurrent CN changes.
Evolutionary and medical implications of CNVs: CCL3L1 as an example
When CCL3L1 occupies the CCR5 receptor on CD4 cells, it blocks HIV's entry.
Gonzalez et al., Science, 2005
Copy-number variation of CCL3L1 within and among human and chimp populations
Gonzalez et al., Science, 2005
Individuals with a high CCL3L1 gene copy number relative to their population average are more resistant to HIV infection than those with a low copy number, presumably because there is more ligand to compete with HIV during binding to CCR5.
CCL3L1 and HIV Infection
Trisomy 21
Partial deletion of chr 5p
A cytogeneticist's story
“The story is about diagnosis of a 3-month-old baby with macrocephaly and some heart problems. The doctors questioned a couple of syndromes which we tested for and found negative. Rather than continue this ‘shot in the dark’ approach, we put the case on an array and found a 2 Mb deletion which notably deletes the gene NSD1 on chr 5, mutations in which are known to cause Sotos syndrome. This is an overgrowth syndrome and fits with the macrocephaly.
The bottom line is that we are able to diagnose quicker by this approach and delineate exactly the underlying genetic change.”
A cytogeneticist's story
Chromosome 5: 2 Mb deletion
Many tumors have gross CN changes
A lung cancer cell line vs matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:6071-6079
Research into gonad dysfunction: Human sex reversal
• 20% of 46,XY females have mutations in SRY
• 80% of 46,XY females unexplained!
• 90% of 46,XX males are due to translocation of SRY
• 10% of 46,XX males unexplained!
Suggests loss of function and gain of function mutations in other genes may cause sex reversal. We’re looking at shared deletions.
Affymetrix SNP chip terminology
Genomic DNA: GTAGCCATCGGTA [A/G] GTACTCAATGAT (the bracketed position is a SNP)
Perfect Match probe for Allele A: ATCGGTAGCCATTCATGAGTTACTA
Perfect Match probe for Allele B: ATCGGTAGCCATCCATGAGTTACTA
Genotyping: answering the question about the two copies of the chromosome on which the SNP is located: Is a sample AA (AA), AB (AG) or BB (GG) at this SNP?
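The relation between the genomic sequence and the two Perfect Match probes can be checked with a short Python sketch. It assumes, as the sequences on the slide suggest, that each 25-mer probe is the base-by-base complement of the 12 bases on either side of the SNP plus the allele itself; the function and variable names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base complement (no reversal), matching how the slide draws probes."""
    return "".join(COMPLEMENT[base] for base in seq)


def pm_probe(left_flank: str, allele: str, right_flank: str) -> str:
    """Perfect Match probe for one allele: complement of flank + allele + flank."""
    return complement(left_flank + allele + right_flank)


# 12-base flanks taken from the genomic sequence on the slide.
LEFT = "TAGCCATCGGTA"
RIGHT = "GTACTCAATGAT"

probe_a = pm_probe(LEFT, "A", RIGHT)  # allele A carries base A at the SNP
probe_b = pm_probe(LEFT, "G", RIGHT)  # allele B carries base G at the SNP
print(probe_a)
print(probe_b)
```

The two results reproduce the probe sequences listed above, differing only at the central (13th) base.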
Affymetrix GeneChip
• Chip size: 1.28 cm × 1.28 cm, 6.4 million features / chip
• Feature size: 5 µm × 5 µm
• > 1 million identical 25 bp probes / feature
GeneChip Mapping Assay Overview
• 250 ng Genomic DNA
• RE Digestion (Xba)
• Adaptor Ligation
• PCR: One Primer Amplification (Complexity Reduction)
• Fragmentation and Labeling
• Hyb & Wash
• Genotypes: AA, BB, AB
Principal low-level analysis steps
• Background adjustment and normalization at probe level
  These steps are to remove lab/operator/reagent effects
• Combining probe level summaries to probe set level summary: best done robustly, on many chips at once
  This is to remove probe affinity effects and discordant observations (gross errors/non-responding probes, etc.)
• Possibly further rounds of normalization (probe set level) as lab/cohort/batch/other effects are frequently still visible
• Derive the relevant copy-number quantities
Finally, quality assessment is an important low-level task.
Preprocessing for total CN using SNP probe pairs (250K chip)
Modification by H Bengtsson of a method due to A Wirapati developed some years ago for microsatellite genotyping; similar to the approach used by Illumina.
Background adjustment and normalization
Outcome similar to that achieved by quantile normalization
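For comparison, quantile normalization itself can be sketched as follows. This is a minimal pure-Python illustration of the general technique, not the crosstalk-based preprocessing used here; the function name and the toy data are hypothetical, and ties are handled naively.

```python
def quantile_normalize(arrays):
    """Give every chip the same empirical distribution of probe signals.

    arrays: list of equal-length lists, one list of probe signals per chip.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all chips must have the same number of probes")
    sorted_chips = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_chips) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda idx: a[idx])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # the probe keeps its rank, gets the reference value
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # toy data, two chips and three probes
print(quantile_normalize(chips))
```

After normalization every chip contains the same set of values; only the order of those values across probes differs from chip to chip.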
Low-level analysis problems remain unsolved; why?
• The feature size keeps decreasing and so the # features/chip keeps increasing
• Fewer and fewer features are used for a given measurement, allowing more measurements to be made using a single chip
These considerations all place more and more demands on the low-level analysis: to maintain the quality of existing measurements, and to obtain good new ones.
SNP probes can be used to estimate total copy numbers
AA: PM = PMA + PMB = 2c
AB: PM = PMA + PMB = 2c
BB: PM = PMA + PMB = 2c
AAB: PM = PMA + PMB = 3c
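The point of the slide, that the summed signal depends on the total number of copies and not on the genotype, can be illustrated with a toy Python sketch. It assumes an idealized chip with no crosstalk or background, where each copy of an allele contributes the same signal c; the function names are illustrative.

```python
def ideal_signals(genotype: str, c: float = 1.0):
    """Idealized (PMA, PMB) for a genotype string such as 'AB' or 'AAB'."""
    pm_a = genotype.count("A") * c
    pm_b = genotype.count("B") * c
    return pm_a, pm_b


def total_pm(pm_a: float, pm_b: float) -> float:
    """Total copy-number signal: PM = PMA + PMB."""
    return pm_a + pm_b


for genotype in ("AA", "AB", "BB", "AAB"):
    print(genotype, total_pm(*ideal_signals(genotype)))
```

The three diploid genotypes give the same total, while the three-copy genotype AAB gives 1.5 times as much.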
SNP probe tiling strategy: central probe quartet (0 allele)
Target: TAGCCATCGGTA N GTACTCAATGAT (SNP at position 0, N = A / G)
PM, allele A: ATCGGTAGCCAT T CATGAGTTACTA
MM, allele A: ATCGGTAGCCAT A CATGAGTTACTA
PM, allele B: ATCGGTAGCCAT C CATGAGTTACTA
MM, allele B: ATCGGTAGCCAT G CATGAGTTACTA
SNP probe tiling strategy: +4 offset probe quartet (+4 allele)
Target: TAGCCATCGGTA N GTA C TCAATGATCAGCT (SNP N = A / G; the probe is centered on the +4 position)
PM, allele A: GTAGCCAT T CAT G AGTTACTAGTCG
MM, allele A: GTAGCCAT T CAT C AGTTACTAGTCG
PM, allele B: GTAGCCAT C CAT G AGTTACTAGTCG
MM, allele B: GTAGCCAT C CAT C AGTTACTAGTCG
SNP for Identifying Copy Number Variations
• Using SNP chips to identify change in total copy number (i.e. CN ≠ 2)
• Outline a new method (CRMA)
• Evaluate and compare it with other methods
• Make some closing remarks on further issues
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA steps:
• Preprocessing (probe signals): allelic crosstalk (or quantile)
• Total CN: PM = PMA + PMB
• Summarization (SNP signals $\theta$): log-additive, PM only
• Post-processing: fragment-length (GC-content)
• Raw total CNs: $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$, chip i, probe j, R = Reference
A few details are passed over. Ask me later if you care about them.
Crosstalk between alleles adds significant artifacts to signals
Cross-hybridization:
Allele A: TCGGTAAGTACTC
Allele B: TCGGTATGTACTC
AA: PMA >> PMB
AB: PMA ≈ PMB
BB: PMA << PMB
There are six possible allele pairs
• Nucleotides: {A, C, G, T}
• Pairs (in alphabetical order):
  – (A,C), (A,G), (A,T), (C,G), (C,T), (G,T)
• Because different nucleotides bind differently, the crosstalk from A to C might be very different from that from A to T.
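The count of six follows from choosing 2 of the 4 nucleotides without regard to order, which can be verified with the standard library:

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

# All unordered pairs of distinct nucleotides, in alphabetical order.
allele_pairs = list(combinations(NUCLEOTIDES, 2))
print(len(allele_pairs), allele_pairs)
```

Because crosstalk may differ between pairs, it is natural to handle each of these six groups of SNPs separately.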
Crosstalk between alleles is easy to spot
Example: data from one array; probe pairs (PMA, PMB) for nucleotide pair (A,T)
[Figure: PMB plotted against PMA; the offset is marked +]
Crosstalk between alleles can be estimated and corrected for
What is done:
• Offset is removed from SNPs and CN units.
• Crosstalk is removed from SNPs.
[Figure: PMB plotted against PMA after correction, with genotype groups AA, AB and BB; no offset (+)]
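One way to picture the correction is a linear model in which the observed pair (PMA, PMB) equals an offset plus a 2×2 crosstalk matrix applied to the true allele signals; correcting then means subtracting the offset and applying the inverse matrix. The Python sketch below assumes exactly that model. The slides do not give the estimation procedure, so the offset and matrix values here are hypothetical inputs rather than estimates, and the function name is illustrative.

```python
def correct_crosstalk(pm_a, pm_b, offset, crosstalk):
    """Remove offset and allelic crosstalk from one (PMA, PMB) probe pair.

    Assumed model (not taken from the slides):
        pm_a = offset[0] + w_aa * true_a + w_ab * true_b
        pm_b = offset[1] + w_ba * true_a + w_bb * true_b
    """
    (w_aa, w_ab), (w_ba, w_bb) = crosstalk
    det = w_aa * w_bb - w_ab * w_ba
    if det == 0:
        raise ValueError("crosstalk matrix is singular")
    a = pm_a - offset[0]
    b = pm_b - offset[1]
    # Apply the inverse of the 2x2 crosstalk matrix.
    true_a = (w_bb * a - w_ab * b) / det
    true_b = (-w_ba * a + w_aa * b) / det
    return true_a, true_b


# Hypothetical parameters for one nucleotide pair.
offset = (100.0, 100.0)
crosstalk = ((1.0, 0.25), (0.5, 1.0))

# An AA sample: all true signal is on allele A, yet PMB is well above the offset.
print(correct_crosstalk(2100.0, 1100.0, offset, crosstalk))
```

In this toy example the corrected AA sample has all of its signal on allele A and none on allele B, which is the pattern the corrected scatter plot shows.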
CRMA step, Preprocessing (probe signals): allelic crosstalk (or quantile)
Already briefly described.
CRMA step, Total CN: PM = PMA + PMB
That's it!
CRMA step, Summarization (SNP signals $\theta$): log-additive, PM only
$\log_2(PM_{ijk}) = \log_2\theta_{ij} + \log_2\phi_{jk} + \varepsilon_{ijk}$
Fit using rlm
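To make the log-additive model concrete, the sketch below fits it for a single SNP with median polish. This is a stand-in: the slide fits the model with rlm (a robust linear model), whereas median polish is a different, simpler robust procedure for the same additive structure. Reading i as the chip, j as the SNP and k as the probe is an assumption, and the data are invented.

```python
from statistics import median


def median_polish(y, n_iter=10):
    """Robustly fit y[i][k] = chip_effect[i] + probe_effect[k] + residual[i][k].

    y holds log2(PM) for one SNP: rows are chips, columns are probes.
    chip_effect[i] plays the role of log2(theta_ij), probe_effect[k] of log2(phi_jk).
    """
    n_rows, n_cols = len(y), len(y[0])
    resid = [row[:] for row in y]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):  # sweep row medians into the row effects
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)  # keep probe effects centered on zero
        overall += m
        col_eff = [c - m for c in col_eff]
        for k in range(n_cols):  # sweep column medians into the probe effects
            m = median(resid[i][k] for i in range(n_rows))
            col_eff[k] += m
            for i in range(n_rows):
                resid[i][k] -= m
        m = median(row_eff)  # move the common level into the overall term
        overall += m
        row_eff = [r - m for r in row_eff]
    chip_effects = [overall + r for r in row_eff]
    return chip_effects, col_eff, resid


# Invented log2(PM) values: 3 chips x 3 probes, with one gross error (20.0).
log_pm = [[9.0, 10.0, 20.0], [10.0, 11.0, 12.0], [8.0, 9.0, 10.0]]
chip_effects, probe_effects, residuals = median_polish(log_pm)
print(chip_effects, probe_effects, residuals[0][2])
```

The gross error ends up in the residuals rather than in the chip effect, which is the kind of robustness the summarization step is meant to provide.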
CRMA step, Post-processing: fragment-length (GC-content)
Longer fragments get less well amplified by PCR and so give weaker SNP signals.
[Figures: fragment-length effect on the 100K and 500K chips]
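A minimal way to picture a fragment-length correction is to estimate the trend of log signal against fragment length and subtract it. The Python sketch below uses an ordinary least-squares straight line for that trend; the slides do not say what functional form CRMA uses, so the linear fit, the function name and the data are all assumptions made for illustration.

```python
def remove_fragment_length_trend(log_signals, fragment_lengths):
    """Subtract a fitted linear trend of log2 signal on fragment length.

    Returns the corrected signals (mean level preserved) and the fitted slope.
    """
    n = len(log_signals)
    if n != len(fragment_lengths) or n < 2:
        raise ValueError("need at least two matched (signal, length) pairs")
    mean_x = sum(fragment_lengths) / n
    mean_y = sum(log_signals) / n
    sxx = sum((x - mean_x) ** 2 for x in fragment_lengths)
    if sxx == 0:
        raise ValueError("fragment lengths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(fragment_lengths, log_signals))
    slope = sxy / sxx
    corrected = [y - slope * (x - mean_x)
                 for x, y in zip(fragment_lengths, log_signals)]
    return corrected, slope


# Invented data: longer fragments give weaker log2 signals.
lengths = [400.0, 800.0, 1200.0, 1600.0]
signals = [11.0, 10.5, 10.0, 9.5]
corrected, slope = remove_fragment_length_trend(signals, lengths)
print(slope, corrected)
```

A negative slope reflects the weaker signal from longer fragments; after the correction the signals no longer depend on fragment length.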
CRMA step, Raw total CNs: $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$
Care required with the number and nature of Reference samples used
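The raw copy-number ratio can be sketched as below. The reference signal for each unit is taken here as the median over a chosen set of reference chips, which is an assumption; the slides only say that the choice of references needs care. The function name and the numbers are illustrative.

```python
from math import log2
from statistics import median


def raw_copy_numbers(theta, reference_rows):
    """Compute M[i][j] = log2(theta[i][j] / theta_ref[j]).

    theta: list of rows, one per chip i, holding the summarized signal per unit j.
    reference_rows: indices of the chips used to build the reference (assumed
    here to be combined by a per-unit median).
    """
    n_units = len(theta[0])
    theta_ref = [median(theta[i][j] for i in reference_rows)
                 for j in range(n_units)]
    return [[log2(row[j] / theta_ref[j]) for j in range(n_units)]
            for row in theta]


# Invented signals: chip 2 has a gain at unit 0 and a loss at unit 1.
theta = [
    [1000.0, 2000.0],
    [1000.0, 2000.0],
    [1500.0, 1000.0],
    [1000.0, 2000.0],
]
m = raw_copy_numbers(theta, reference_rows=[0, 1, 3])
print(m[2])
```

With a reference built from normal samples, M near 0 means no change, positive M a gain and negative M a loss; including abnormal samples in the reference would bias these values.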
Comparison of 4 methods
| Step | CRMA | dChip (Li & Wong 2001) | CNAG* (Nannya et al 2005) | CNAT v4 (Affymetrix 2006) |
| Preprocessing (probe signals) | allelic crosstalk (quantile) | quantile | scale | quantile |
| Total CN | PM=PMA+PMB | PM=PMA+PMB, MM=MMA+MMB | PM=PMA+PMB | $\theta = \theta_A + \theta_B$ |
| Summarization (SNP signals $\theta$) | log-additive, PM only | multiplicative, PM-MM | - | "log-additive", PM-only |
| Post-processing | fragment-length (GC-content) | - | fragment-length (GC-content) | fragment-length (GC-content) |
| Raw total CNs | $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$ | $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$ | $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$ | $M_{ij} = \log_2(\theta_{ij}/\theta_{Rj})$ |
Further bioinformatic issues
• Estimating copy number: needs calibration data
• Segmentation (of chromosomes into constant copy number regions): an HMM-like algorithm
• Analyzing family CN data: a different HMM
• Incorporating non-polymorphic probes: independent HMM observations to be weighted and combined
• Dealing with mixed normal-abnormal samples
• Utilizing poor quality DNA samples
• Estimating allele-specific copy number