Copy number variation (CNV): What is it?

CZ5225: Modeling and Simulation in Biology. Lecture 10: Copy Number Variations. Prof. Chen Yu Zong. Tel: 6516-6877. Email: [email protected]. http://bidd.nus.edu.sg. Room 08-14, level 8, S16, NUS.


Post on 30-Jan-2016


Page 1: Copy number variation (CNV) What  is it?

CZ5225: Modeling and Simulation in Biology

Lecture 10: Copy Number Variations

Prof. Chen Yu Zong

Tel: 6516-6877
Email: [email protected]

http://bidd.nus.edu.sg
Room 08-14, level 8, S16, NUS

Page 2: Copy number variation (CNV) What  is it?

Copy number variation (CNV): What is it?

• A form of human genetic variation: instead of 2 copies of each region of each chromosome (diploid), some people have amplifications or losses (> 1 kb) in different regions
  – this doesn’t include translocations or inversions

• We all have such regions
  – the publicly available genome NA15510 has between 5 & 240 by various estimates
  – they are only rarely harmful (but rare things do happen)

Page 3: Copy number variation (CNV) What  is it?

Copy-number probes are used to quantify the amount of DNA at known loci

CN locus: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
PM:            ATCGGTAGCCATTCATGAGTTACTA

CN=1: PM = c
CN=2: PM = 2c
CN=3: PM = 3c
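The proportionality on this slide (PM = CN × c) can be turned around to estimate copy number from a probe signal. The sketch below is only an illustration: it assumes the per-copy signal c is estimated from reference samples believed to carry two copies, and the function names and numbers are made up for the example.

```python
from statistics import median

def estimate_unit_signal(reference_pm, reference_cn=2):
    """Estimate c, the signal contributed by one DNA copy, from
    reference samples assumed to carry `reference_cn` copies."""
    return median(reference_pm) / reference_cn

def estimate_copy_number(pm, c):
    """Invert PM = CN * c to get a (non-integer) copy-number estimate."""
    return pm / c

# Hypothetical PM intensities of three diploid reference samples.
c = estimate_unit_signal([190.0, 200.0, 210.0])
print(estimate_copy_number(300.0, c))
```

Real intensities are noisy and not perfectly linear in copy number, so such an estimate is a starting point rather than a call.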

Page 4: Copy number variation (CNV) What  is it?

Copy number variation: Population genomics

The genomes of two humans differ more in a structural sense than at the nucleotide level; a recent paper estimates that on average two of us differ by

• ~4 - 24 Mb of genetic material due to Copy Number Variation
• ~2.5 Mb due to Single Nucleotide Polymorphisms

Page 5: Copy number variation (CNV) What  is it?

Abundance of CNVs in the human population?

Still an open question but probably thousands, at low allelic frequency (<20%)

Page 6: Copy number variation (CNV) What  is it?

Abundance of deletion CNVs in the human population

Comparison of overlapping CNVs identified by Conrad et al. (2006) and McCarroll et al. (2006). Freeman et al. Genome Res 2006

Page 7: Copy number variation (CNV) What  is it?

Non-allelic homologous recombination events between low-copy repeats (LCR-NAHR)

Lupski & Inoue, TIG 2002

Page 8: Copy number variation (CNV) What  is it?

Duplications and Deletions of LCRs mediated by NAHR

[Figure labels: LCRs in direct orientation; LCRs in inverted orientation; Inversions]

Page 9: Copy number variation (CNV) What  is it?

Intrachromatid recombination between LCRs

[Figure labels: LCRs in direct orientation (Deletion); LCRs in inverted orientation (Inversion)]

Page 10: Copy number variation (CNV) What  is it?

Mechanisms generating genomic deletions

Page 11: Copy number variation (CNV) What  is it?

Copy number variation: Relations to human disease

Responsible for a number of rare genetic conditions. For example, Down syndrome (trisomy 21), Cri du chat syndrome (a partial deletion of 5p).

Implicated in complex diseases. For example:

CCL3L1 CN and HIV/AIDS susceptibility; also, some sporadic (non-inherited) CN variants are strongly associated with autism.

Tumors typically have a lot of chromosomal abnormalities, including recurrent CN changes.

Page 12: Copy number variation (CNV) What  is it?

Evolutionary and medical implications of CNVs: CCL3L1 as an example

When CCL3L1 occupies the CCR5 receptor on CD4 cells, it blocks HIV's entry.

Gonzalez et al., Science, 2005

Page 13: Copy number variation (CNV) What  is it?

Copy-number variation of CCL3L1 within and among human and chimp populations

Gonzalez et al., Science, 2005

Page 14: Copy number variation (CNV) What  is it?

CCL3L1 and HIV Infection

Individuals with a high CCL3L1 gene copy number relative to their population average are more resistant to HIV infection than those with a low copy number, presumably because there is more ligand to compete with HIV during binding to CCR5.

Gonzalez et al., Science, 2005

Page 15: Copy number variation (CNV) What  is it?

Trisomy 21

[Figure: image not available in this transcript]

Page 16: Copy number variation (CNV) What  is it?

Partial deletion of chr 5p

[Figure: image not available in this transcript]

Page 17: Copy number variation (CNV) What  is it?

A cytogeneticist’s story

“The story is about diagnosis of a 3 month old baby with macrocephaly and some heart problems. The doctors questioned a couple of syndromes which we tested for and found negative. Rather than continue this ‘shot in the dark’ approach, we put the case on an array and found a 2Mb deletion which notably deletes the gene NSD1 on chr 5, mutations in which are known to cause Sotos syndrome. This is an overgrowth syndrome and fits with the macrocephaly.

The bottom line is that we are able to diagnose quicker by this approach and delineate exactly the underlying genetic change.”

Page 18: Copy number variation (CNV) What  is it?

A cytogeneticist’s story

[Figure: Chromosome 5, with the 2Mb deletion marked]

Page 19: Copy number variation (CNV) What  is it?

Many tumors have gross CN changes

A lung cancer cell line vs matched normal lymphoblast, from Nannya et al., Cancer Res 2005;65:6071-6079

Page 20: Copy number variation (CNV) What  is it?

Research into gonad dysfunction: Human sex reversal

• 20% of 46,XY females have mutations in SRY

• 80% of 46,XY females unexplained!

• 90% of 46,XX males due to translocation of SRY

• 10% of 46,XX males unexplained!

Suggests loss of function and gain of function mutations in other genes may cause sex reversal. We’re looking at shared deletions.

Page 21: Copy number variation (CNV) What  is it?

Affymetrix SNP chip terminology

Genomic DNA: GTAGCCATCGGTA [A/G] GTACTCAATGAT   (the bracketed base is a SNP)

Perfect Match probe for Allele A: ATCGGTAGCCATTCATGAGTTACTA
Perfect Match probe for Allele B: ATCGGTAGCCATCCATGAGTTACTA

Genotyping: answering the question about the two copies of the chromosome on which the SNP is located:
Is a sample AA (AA), AB (AG) or BB (GG) at this SNP?
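The two Perfect Match probes on this slide are the base-by-base complements of the genomic target carrying allele A or allele G. A small sketch that reproduces them, assuming 12-base flanks on each side of the SNP and writing the probe aligned under the target as the slide does (complemented, not reversed); the function name is illustrative only.

```python
# Translation table that maps each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def perfect_match_probe(upstream, allele, downstream):
    """Return the 25-mer that pairs with upstream + allele + downstream,
    written base-by-base under the target strand."""
    return (upstream + allele + downstream).translate(COMPLEMENT)

# Flanking sequence taken from the slide; the SNP is A (allele A) or G (allele B).
probe_a = perfect_match_probe("TAGCCATCGGTA", "A", "GTACTCAATGAT")
probe_b = perfect_match_probe("TAGCCATCGGTA", "G", "GTACTCAATGAT")
print(probe_a)
print(probe_b)
```

The two probes differ only at the central (13th) base, which is what lets the chip distinguish the alleles.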

Page 22: Copy number variation (CNV) What  is it?

Affymetrix GeneChip

• Chip: 1.28 cm × 1.28 cm, 6.4 million features / chip
• Feature: 5 µm × 5 µm, > 1 million identical 25 bp probes / feature

Page 23: Copy number variation (CNV) What  is it?

GeneChip Mapping Assay Overview

250 ng Genomic DNA
→ RE Digestion (Xba)
→ Adaptor Ligation
→ PCR: One Primer Amplification (Complexity Reduction)
→ Fragmentation and Labeling
→ Hyb & Wash
→ AA / BB / AB

Page 24: Copy number variation (CNV) What  is it?

Principal low-level analysis steps

• Background adjustment and normalization at probe level
  These steps are to remove lab/operator/reagent effects

• Combining probe level summaries to probe set level summary: best done robustly, on many chips at once
  This is to remove probe affinity effects and discordant observations (gross errors/non-responding probes, etc)

• Possibly further rounds of normalization (probe set level) as lab/cohort/batch/other effects are frequently still visible

• Derive the relevant copy-number quantities

Finally, quality assessment is an important low-level task.

Page 25: Copy number variation (CNV) What  is it?

Preprocessing for total CN using SNP probe pairs (250K chip)

[Figure: probe-pair signals; surviving genotype labels: AA, AT, TT]

Modification by H Bengtsson of a method due to A Wirapati developed some years ago for microsatellite genotyping; similar to the approach used by Illumina.

Page 26: Copy number variation (CNV) What  is it?

Background adjustment and normalization

Outcome similar to that achieved by quantile normalization
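For readers unfamiliar with the comparison point, here is a minimal sketch of quantile normalization itself (not of the crosstalk-based method the slide reports on). It assumes a probes × arrays NumPy matrix with no tied values; the function name is illustrative.

```python
import numpy as np

def quantile_normalize(x):
    """Force every column (array) of x to share the same empirical
    distribution: the mean of the sorted columns."""
    x = np.asarray(x, dtype=float)
    # Rank of each value within its column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Target distribution: average of the columns after sorting each one.
    target = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the target quantile at its rank.
    return target[ranks]

signals = np.array([[1.0, 10.0],
                    [2.0, 30.0],
                    [3.0, 20.0]])
normalized = quantile_normalize(signals)
```

After the call, both columns contain the same set of values, ordered according to each array's original ranks.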

Page 27: Copy number variation (CNV) What  is it?

Low-level analysis problems remain unsolved; why?

• The feature size keeps shrinking and so the # features/chip keeps growing;
• Fewer and fewer features are used for a given measurement, allowing more measurements to be made using a single chip

These considerations all place more and more demands on the low-level analysis: to maintain the quality of existing measurements, and to obtain good new ones.

Page 28: Copy number variation (CNV) What  is it?

SNP probes can be used to estimate total copy numbers

AA:  PM = PMA + PMB = 2c
AB:  PM = PMA + PMB = 2c
BB:  PM = PMA + PMB = 2c
AAB: PM = PMA + PMB = 3c
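The point of this slide is that the sum PMA + PMB depends on the total number of copies, not on the genotype. A toy sketch under idealized assumptions (no crosstalk, no background, each copy contributing exactly c); the intensities are invented for illustration.

```python
def total_copy_number(pm_a, pm_b, c):
    """Total copy number from the summed allele signals, PM = PMA + PMB."""
    return (pm_a + pm_b) / c

c = 100.0  # hypothetical signal per DNA copy
# Idealized (PMA, PMB) pairs for the four genotypes on the slide.
signals = {
    "AA": (200.0, 0.0),
    "AB": (100.0, 100.0),
    "BB": (0.0, 200.0),
    "AAB": (200.0, 100.0),
}
totals = {g: total_copy_number(a, b, c) for g, (a, b) in signals.items()}
print(totals)
```

The three diploid genotypes give the same total, while AAB stands out as three copies.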

Page 29: Copy number variation (CNV) What  is it?

SNP probe tiling strategy

Target: TAGCCATCGGTA N GTACTCAATGAT   (N = SNP, A / G, at position 0)

Central probe quartet (0 position):
PM, allele A: ATCGGTAGCCAT T CATGAGTTACTA
MM, allele A: ATCGGTAGCCAT A CATGAGTTACTA
PM, allele B: ATCGGTAGCCAT C CATGAGTTACTA
MM, allele B: ATCGGTAGCCAT G CATGAGTTACTA

Page 30: Copy number variation (CNV) What  is it?

SNP probe tiling strategy

Target: TAGCCATCGGTA N GTA C TCAATGATCAGCT   (N = SNP, A / G; the marked C is the +4 position)

+4 offset probe quartet:
PM, allele A: GTAGCCAT T CAT G AGTTACTAGTCG
MM, allele A: GTAGCCAT T CAT C AGTTACTAGTCG
PM, allele B: GTAGCCAT C CAT G AGTTACTAGTCG
MM, allele B: GTAGCCAT C CAT C AGTTACTAGTCG
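The tiling on these two slides can be sketched in a few lines: slide a 25-base window along the target by the chosen offset, complement it to get the PM probe, and flip the probe's central base to get the MM probe. This is an illustrative reconstruction from the sequences on the slides, not Affymetrix's design software; the function and argument names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def probe_quartet(upstream, downstream, alleles, offset=0, length=25):
    """Return {allele_name: (PM, MM)} for a window shifted by `offset`
    bases relative to the SNP-centred position."""
    half = length // 2
    start = len(upstream) - half + offset
    quartet = {}
    for name, base in alleles.items():
        target = upstream + base + downstream
        window = target[start:start + length]
        if start < 0 or len(window) < length:
            raise ValueError("flanking sequence too short for this offset")
        pm = window.translate(COMPLEMENT)
        # The mismatch probe differs from PM only at the central base.
        mm = pm[:half] + pm[half].translate(COMPLEMENT) + pm[half + 1:]
        quartet[name] = (pm, mm)
    return quartet

up, down = "TAGCCATCGGTA", "GTACTCAATGATCAGCT"
central = probe_quartet(up, down, {"A": "A", "B": "G"}, offset=0)
shifted = probe_quartet(up, down, {"A": "A", "B": "G"}, offset=4)
```

With offset 0 the mismatch falls on the SNP itself; with offset +4 it falls four bases downstream of the SNP, which matches the marked position on the slide.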

Page 31: Copy number variation (CNV) What  is it?

SNP for Identifying Copy Number Variations

• Using SNP chips to identify change in total copy number (i.e. CN ≠ 2)

• Outline a new method (CRMA)

• Evaluate and compare it with other methods

• Make some closing remarks on further issues

Page 32: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

CRMA
Preprocessing (probe signals): allelic crosstalk (or quantile)
Total CN: PM = PMA + PMB
Summarization (SNP signals θ): log-additive, PM only
Post-processing: fragment-length (GC-content)
Raw total CNs: M_ij = log2(θ_ij / R_j), chip i, probe j; R = Reference

A few details are passed over. Ask me later if you care about them.

Page 33: Copy number variation (CNV) What  is it?

Crosstalk between alleles - adds significant artifacts to signals

Cross-hybridization:
Allele A: TCGGTAAGTACTC
Allele B: TCGGTATGTACTC

AA: PMA >> PMB
AB: PMA ≈ PMB
BB: PMA << PMB

Page 34: Copy number variation (CNV) What  is it?

There are six possible allele pairs

• Nucleotides: {A, C, G, T}
• Ordered pairs:
  – (A,C), (A,G), (A,T), (C,G), (C,T), (G,T)

• Because different nucleotides bind differently, the crosstalk from A to C might be very different from A to T.
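The count of six follows from choosing 2 of the 4 nucleotides without regard to order. A quick check with the standard library:

```python
from itertools import combinations

# All unordered pairs of distinct nucleotides, listed alphabetically.
allele_pairs = list(combinations("ACGT", 2))
print(allele_pairs)
```

Each of these pairs can then be calibrated separately, since the slide notes that crosstalk differs between nucleotide pairs.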

Page 35: Copy number variation (CNV) What  is it?

Crosstalk between alleles is easy to spot

Example: Data from one array. Probe pairs (PMA, PMB) for nucleotide pair (A,T)

[Figure: scatter plot of PMB against PMA with the genotype groups AA, AB and BB labeled and the offset marked (+)]

Page 36: Copy number variation (CNV) What  is it?

Crosstalk between alleles can be estimated and corrected for

What is done:
• Offset is removed from SNPs and CN units.
• Crosstalk is removed from SNPs.

[Figure: scatter plot of PMB against PMA after correction, with the genotype groups AA, AB and BB labeled and no offset (+)]
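One common way to picture this correction is as a linear model: observed (PMA, PMB) = offset + W · true signal, where W is a 2×2 crosstalk matrix. The sketch below only applies the correction for a given offset and W; how they are estimated from the data is not shown, and the model form, names and numbers are assumptions made for illustration rather than the method's actual implementation.

```python
import numpy as np

def correct_crosstalk(pm_pairs, offset, w):
    """Undo observed = offset + W @ true for each (PMA, PMB) row."""
    pm_pairs = np.asarray(pm_pairs, dtype=float)
    shifted = pm_pairs - np.asarray(offset, dtype=float)
    # Solve W @ true = shifted for every probe pair at once.
    return np.linalg.solve(np.asarray(w, dtype=float), shifted.T).T

# Hypothetical calibration parameters for one nucleotide pair.
offset = np.array([100.0, 120.0])
w = np.array([[1.0, 0.2],    # 20% of the B signal leaks into PMA
              [0.3, 1.0]])   # 30% of the A signal leaks into PMB

true_signal = np.array([[1000.0, 0.0],    # AA
                        [500.0, 500.0],   # AB
                        [0.0, 1000.0]])   # BB
observed = offset + true_signal @ w.T
corrected = correct_crosstalk(observed, offset, w)
```

After correction the AA and BB groups lie on the axes and the common offset is gone, which is the picture the slide describes.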

Page 37: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Preprocessing (probe signals): allelic crosstalk (or quantile)
Already briefly described.

Page 38: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Total CN: PM = PMA + PMB
That’s it!

Page 39: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Summarization (SNP signals θ): log-additive, PM only

log2(PM_ijk) = log2(θ_ij) + log2(φ_jk) + ε_ijk

Fit using rlm
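The log-additive model separates a chip effect (θ) from a probe-affinity effect (φ) for each SNP. The slide fits it with rlm; as a stand-in that needs only NumPy, the sketch below uses median polish, a different but also outlier-resistant way to fit the same additive structure. It handles one SNP at a time, and the function name and matrix layout (chips × probes) are assumptions.

```python
import numpy as np

def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i, k] ~ chip[i] + probe[k] by alternating median sweeps.
    Returns (chip effects incl. overall level, probe effects, residuals)."""
    resid = np.asarray(log_pm, dtype=float).copy()
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    overall = 0.0
    for _ in range(n_iter):
        # Sweep out row (chip) medians.
        r = np.median(resid, axis=1)
        resid -= r[:, None]
        row += r
        shift = np.median(col)
        col -= shift
        overall += shift
        # Sweep out column (probe) medians.
        c = np.median(resid, axis=0)
        resid -= c[None, :]
        col += c
        shift = np.median(row)
        row -= shift
        overall += shift
    return overall + row, col, resid
```

Here the first return value plays the role of log2(θ) for each chip, the second of log2(φ) for each probe, and large residuals flag discordant probe observations.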

Page 40: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Post-processing: fragment-length (GC-content)

[Figure: 100K chip]

Longer fragments get less well amplified by PCR and so give weaker SNP signals

Page 41: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Post-processing: fragment-length (GC-content)

[Figure: 500K chip]

Longer fragments get less well amplified by PCR and so give weaker SNP signals

Page 42: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Post-processing: fragment-length (GC-content)

[Figure: 500K chip]

Longer fragments get less well amplified by PCR and so give weaker SNP signals
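Because signal weakens systematically with fragment length, the trend can be estimated and subtracted. The slides do not say which curve is used, so the sketch below simply assumes a low-degree polynomial in fragment length fitted per array; the function name, the polynomial choice and the units (kb) are all illustrative assumptions.

```python
import numpy as np

def remove_fragment_length_effect(log_signal, fragment_kb, degree=2):
    """Subtract a polynomial trend in fragment length from log-scale
    signals, keeping the overall mean level unchanged."""
    log_signal = np.asarray(log_signal, dtype=float)
    fragment_kb = np.asarray(fragment_kb, dtype=float)
    # Least-squares polynomial fit of signal against fragment length.
    coef = np.polyfit(fragment_kb, log_signal, degree)
    trend = np.polyval(coef, fragment_kb)
    return log_signal - trend + trend.mean()
```

In practice a robust or smooth fit would be preferable to plain least squares, since true copy-number changes should not be absorbed into the trend.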

Page 43: Copy number variation (CNV) What  is it?

Copy-number estimation using Robust Multichip Analysis (CRMA)

[CRMA pipeline table repeated from Page 32]

Raw total CNs: M_ij = log2(θ_ij / R_j)

Care required with the number and nature of Reference samples used
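The raw copy-number measure is a log-ratio of each chip's SNP signal against a reference. The slide does not say how R is formed; the sketch below assumes the per-SNP median over a chosen set of reference chips, which is one common choice, and the names are illustrative.

```python
import numpy as np

def raw_log_ratios(theta, reference_rows=None):
    """M[i, j] = log2(theta[i, j] / R[j]), with R[j] taken as the median
    of theta over the reference chips (all chips if none are given)."""
    theta = np.asarray(theta, dtype=float)
    ref = theta if reference_rows is None else theta[reference_rows]
    r = np.median(ref, axis=0)
    return np.log2(theta / r)

# Hypothetical SNP-level signals: 3 chips x 2 SNPs.
theta = np.array([[100.0, 200.0],
                  [100.0, 200.0],
                  [150.0, 200.0]])
m = raw_log_ratios(theta, reference_rows=[0, 1])
# If the reference is diploid and the signal is linear in copy number,
# 2 * 2**M gives a rough copy-number estimate.
cn = 2 * 2 ** m
```

The caution on the slide applies directly: if the reference set is small or itself carries copy-number changes at a SNP, R is biased and so is every M at that SNP.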

Page 44: Copy number variation (CNV) What  is it?

Comparison of 4 methods

Methods: CRMA; dChip (Li & Wong 2001); CNAG* (Nannya et al 2005); CNAT v4 (Affymetrix 2006)

Preprocessing (probe signals):
  CRMA: allelic crosstalk (quantile); dChip: quantile; CNAG*: scale; CNAT v4: quantile

Total CN:
  CRMA: PM = PMA + PMB; dChip: PM = PMA + PMB, MM = MMA + MMB; CNAG*: PM = PMA + PMB; CNAT v4: θ = θA + θB

Summarization (SNP signals θ):
  CRMA: log-additive, PM only; dChip: multiplicative, PM-MM; CNAT v4: “log-additive”, PM-only

Post-processing:
  fragment-length (GC-content), listed for three of the four methods

Raw total CNs:
  M_ij = log2(θ_ij / R_j) for all four methods

Page 45: Copy number variation (CNV) What  is it?

Further bioinformatic issues

• Estimating copy number: needs calibration data
• Segmentation (of chromosomes into constant copy number regions): an HMM-like algorithm
• Analyzing family CN data: a different HMM
• Incorporating non-polymorphic probes: independent HMM observations to be weighted and combined
• Dealing with mixed normal-abnormal samples
• Utilizing poor quality DNA samples
• Estimating allele-specific copy number
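To make the segmentation bullet concrete, here is a minimal HMM sketch: three hidden states (loss, normal, gain), Gaussian emissions on the log-ratios, and Viterbi decoding to find the most likely state path along a chromosome. The state means, noise level and transition probability are invented defaults, not values from the lecture, and the lecture's own "HMM-like algorithm" may differ.

```python
import numpy as np

def viterbi_segment(m, means=(-1.0, 0.0, 0.58), sd=0.25, stay=0.99):
    """Most likely state per SNP (0 = loss, 1 = normal, 2 = gain)
    for log-ratios m ordered along the chromosome."""
    m = np.asarray(m, dtype=float)
    mu = np.asarray(means, dtype=float)
    k = len(mu)
    # Gaussian log-likelihoods (constants dropped), shape (n, k).
    log_e = -0.5 * ((m[:, None] - mu[None, :]) / sd) ** 2
    # Transition log-probabilities: stay with prob `stay`, else switch.
    log_t = np.full((k, k), np.log((1.0 - stay) / (k - 1)))
    np.fill_diagonal(log_t, np.log(stay))
    score = log_e[0] - np.log(k)
    back = np.zeros((len(m), k), dtype=int)
    for t in range(1, len(m)):
        cand = score[:, None] + log_t          # cand[i, j]: state i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(k)] + log_e[t]
    # Trace the best path backwards.
    path = np.empty(len(m), dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(len(m) - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Runs of identical states in the returned path are the constant copy-number regions; the high `stay` probability keeps single noisy SNPs from creating spurious segments.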