Population Approaches to Detecting and Genotyping Copy Number Variation Lachlan Coin July 2010


Outline
- Population-haplotype approach to CNV detection and genotyping
- Application to SNP and CGH data
- Application to NGS sequence data

cnvHap approach to CNV discovery and genotyping

Coin et al., Nature Methods 7, 541-546 (2010)

Example of trained model

cnvHap models haploid CN transitions
Specify a per-base global transition rate matrix

Rate matrix entries q_ij (q00, q10, ...) are indexed by copy number from (0, 1, 2, 3, 4) and copy number to (0, 1, 2, 3, 4).

The rate matrix is multiplied by a position-specific scalar rate.
Values are trained using EM, following the approach of Klosterman et al., used in Xrate for finding substitution rates.
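To make the role of the rate matrix concrete, here is a minimal sketch of how a per-base rate matrix and a position-specific scalar could be turned into copy-number transition probabilities between two neighbouring positions, using the standard continuous-time Markov chain relation P = exp(Q * rate * distance). This is an illustration under stated assumptions, not the cnvHap implementation: the function names, the nearest-neighbour structure of the example matrix, the row = "from" / column = "to" convention, and the numeric rates are all hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, squarings=10, terms=12):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, scaled)
        term = [[x / k for x in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def make_rate_matrix(rate_up, rate_down, n_states=5):
    """Hypothetical rate matrix over haploid copy numbers 0..4.

    Assumption: only transitions between neighbouring copy numbers have
    non-zero rates; rows are 'from', columns are 'to'.
    """
    q = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            q[i][i + 1] = rate_up
        if i - 1 >= 0:
            q[i][i - 1] = rate_down
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    return q


def transition_probs(q, site_rate, distance_bp):
    """P = exp(Q * site_rate * distance_bp) for one pair of adjacent positions."""
    scaled = [[x * site_rate * distance_bp for x in row] for row in q]
    return mat_exp(scaled)


q = make_rate_matrix(rate_up=1e-6, rate_down=2e-6)
p = transition_probs(q, site_rate=1.0, distance_bp=1000)
```

A larger position-specific rate makes a copy-number change between the two positions more likely, which is how a single global matrix can still accommodate local hotspots.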

cnvHap joint model of CNV + SNP haplotypes

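Cluster positions modelled using a linear model: the cluster parameters r_m(g) and b_m(g) are linear combinations of the following features of genotype g.

\[
\begin{aligned}
f_0(g) &= 1 \\
f_1(g) &= \log(\mathrm{CN}(g)/2) \\
f_2(g) &= \left(\log(\mathrm{CN}(g)/2)\right)^2 \\
f_3(g) &= \mathrm{bfrac}(g) \\
f_4(g) &= \mathrm{bfrac}(g)\,(1 - \mathrm{bfrac}(g)) \\
f_5(g) &= \mathrm{bfrac}(g)\,(\mathrm{bfrac}(g) - 0.5)\,(\mathrm{bfrac}(g) - 1)
\end{aligned}
\]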

Model fitted using ridge regression, carried out at each iteration of the EM algorithm
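The following is a minimal sketch of the kind of fit described here: the six genotype features (a constant, log(CN/2) and its square, and three polynomial terms in the B-allele fraction) are computed for each genotype, and a weighted ridge regression is solved for one cluster parameter. Several things are assumptions for illustration: the minus signs in the bfrac terms, the natural logarithm, the restriction to CN > 0, the use of per-observation weights standing in for E-step posteriors, penalising the intercept along with the other coefficients, and all function names.

```python
import math


def features(cn, bfrac):
    """Feature vector f0..f5 for a genotype with copy number cn > 0 and B-allele fraction bfrac."""
    l = math.log(cn / 2)  # assumption: natural log; undefined for cn == 0
    return [1.0, l, l * l, bfrac, bfrac * (1 - bfrac),
            bfrac * (bfrac - 0.5) * (bfrac - 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(rows, targets, weights, lam):
    """Weighted ridge regression: solve (X'WX + lam*I) beta = X'Wy."""
    p = len(rows[0])
    xtx = [[lam if i == j else 0.0 for j in range(p)] for i in range(p)]
    xty = [0.0] * p
    for x, y, w in zip(rows, targets, weights):
        for i in range(p):
            xty[i] += w * x[i] * y
            for j in range(p):
                xtx[i][j] += w * x[i] * x[j]
    return solve(xtx, xty)


# Hypothetical usage: (copy number, B-allele fraction) pairs for CN 1..4.
genotypes = [(1, 0.0), (1, 1.0), (2, 0.0), (2, 0.5), (2, 1.0),
             (3, 0.0), (3, 1 / 3), (3, 2 / 3), (3, 1.0),
             (4, 0.0), (4, 0.25), (4, 0.5), (4, 0.75), (4, 1.0)]
design = [features(cn, b) for cn, b in genotypes]
```

In an EM setting, `weights` would be recomputed from the current posterior genotype probabilities before each call to `ridge_fit`, and the ridge penalty keeps the fit stable when some genotype clusters are sparsely populated.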

Using Illumina SNP arrays

Combined Illumina and Agilent arrays
[Figure: paired Illumina and Agilent panels]

Some CNVs exhibit shared structure

Improved CNV genotyping accuracy
[Figure: cumulative frequency of squared Pearson correlation]

A deletion at 16p11.2 in a patient with extreme obesity

[Figure: aCGH log2 ratio (+1 to -3) along chromosome 16, 28.9 Mb to 30.7 Mb, with MLPA probes and segmental duplications marked]

- Deletion estimated by aCGH to be 546kb-700kb
- Flanked by segmental duplication (>99% sequence identity)
- Probably arises by NAHR, implying deletion is 739kb
- BMI = 29.2 kg/m^2 at age 7
- Learning difficulties, delayed speech

RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727

16p11.2 deletions in obesity and population cohorts

Cohort                                          Obese     Lean/Normal weight
French child obesity case:control               4/643     0/530
British extreme early-onset obesity (SCOOP)     3/931     -
French adult obesity case:control               4/705     0/669
French bariatric surgery patients               2/141     -
Swedish discordant siblings                     2/159     0/140
Population cohorts (NFBC1966, CoLaus, EGCUT)    3/1592    1/6235

Obesity: P = 5.8x10^-7, OR = 29.8 [3.9-225]
Morbid obesity: P = 6.4x10^-8, OR = 43.0 [5.6-329]

Coverage affected by GC content

Regression model fit to correct for GC bias
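A minimal sketch of what a regression-based GC correction could look like: read depth per window is regressed on the window's GC fraction, and each window's depth is then rescaled by the ratio of the overall mean depth to the depth predicted from its GC content. The straight-line form of the regression, the multiplicative rescaling, and the function names are assumptions made for illustration; the slide does not specify the regression model actually used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gc_correct(depths, gc_fractions):
    """Rescale each window's depth by mean depth / depth predicted from GC.

    Assumes the fitted line stays positive over the observed GC range.
    """
    intercept, slope = fit_line(gc_fractions, depths)
    mean_depth = sum(depths) / len(depths)
    return [d * mean_depth / (intercept + slope * g)
            for d, g in zip(depths, gc_fractions)]


# Hypothetical windows whose depth rises linearly with GC content.
gc = [0.3, 0.4, 0.5, 0.6]
depth = [20.0, 25.0, 30.0, 35.0]
corrected = gc_correct(depth, gc)
```

After correction, windows that differ only because of GC content have the same depth, so remaining deviations are more likely to reflect copy number.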

Loess curves fit to remove residual spatial variation of coverage
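For readers unfamiliar with loess, the sketch below fits a local linear regression with tricube weights along genomic position and subtracts the fitted trend from the depth signal, re-centring on the mean. It is a simplified illustration, not the procedure behind these slides: the span, the subtractive correction, the absence of robustness iterations, and the function names are all assumptions.

```python
import math


def loess(xs, ys, span=0.5):
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = max(2, int(math.ceil(span * n)))  # number of neighbours in each local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(ws)
        mean_x = sum(w * x for w, x in zip(ws, xs)) / sw
        mean_y = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


def remove_spatial_trend(positions, depths, span=0.5):
    """Subtract the loess trend from the depths and re-centre on the mean depth."""
    trend = loess(positions, depths, span)
    mean_depth = sum(depths) / len(depths)
    return [d - t + mean_depth for d, t in zip(depths, trend)]


# Hypothetical example: a smooth drift in depth along the chromosome.
positions = [float(i) for i in range(10)]
depths = [2.0 * x + 1.0 for x in positions]
flattened = remove_spatial_trend(positions, depths)
```

The span controls how local the fit is: a small span follows short-range waviness, while a large span removes only broad drifts and is less likely to absorb a genuine CNV signal.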

Detecting CNVs with NGS data
[Figure: depth/haploid coverage and B-allele frequency]

NGS versus CGH data
[Figure: NGS data and CGH data, chrom1:350mb-351mb]

NGS vs CGH data

Haplotype structure of deletion

NGS amplification
[Figure: depth/coverage]

With consistent break-points in population

Imputation error rate

Switch error rate
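As a reminder of what a switch error rate measures, here is a minimal sketch of the usual diploid definition: among consecutive heterozygous sites, count how often the inferred phase flips relative to the true phase. The function name and input layout are hypothetical, and this is the standard diploid metric rather than necessarily the exact variant used for the polyploid results in this talk.

```python
def switch_error_rate(true_hap, inferred_hap, het_sites):
    """Fraction of consecutive heterozygous site pairs with a phase switch.

    true_hap and inferred_hap hold the 0/1 alleles carried on one haplotype
    of a diploid individual; het_sites lists the indices of heterozygous sites.
    """
    # True where the inferred haplotype agrees with the true one at a het site.
    orientation = [true_hap[i] == inferred_hap[i] for i in het_sites]
    if len(orientation) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)


# Hypothetical example: phase is correct for two sites, then flipped.
truth = [0, 1, 0, 1, 1]
inferred = [0, 1, 1, 0, 0]
rate = switch_error_rate(truth, inferred, het_sites=[0, 1, 2, 3, 4])
```

Because the metric compares relative orientation between neighbouring sites, an inferred haplotype that is the exact complement of the truth counts as zero switches.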

Polyploid phasing and imputation

Conclusions
- Population-haplotype model enables joint CNV discovery and genotyping using array data
- Preliminary results indicate this will also help with NGS data
- Combining information from multiple platforms improves sensitivity
- Imputation still works for ploidy > 2, but phasing becomes more difficult

Acknowledgements
Evangelos Bellos
Shu-Yi Su
Robin Walters
David Balding (UCL)
Rob Sladek (McGill)
Julian Asher
Alex Blakemore
Adam de Smith
Philippe Froguel
Julia El-Sayed Moustafa