Page 1: Population Approaches to Detecting and Genotyping Copy Number Variation

Population Approaches to Detecting and Genotyping Copy Number Variation

Lachlan Coin

July 2010

Outline

• Population-haplotype approach to CNV detection and genotyping

• Application to SNP and CGH data

• Application to NGS sequence data

cnvHap approach to CNV discovery and genotyping

Coin et al., Nature Methods 7, 541-546 (2010)

Example of trained model

cnvHap models haploid CN transitions

• Specify a per-base global transition rate matrix

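Q = (q_{ij}), where q_{ij} is the rate of moving from haploid copy number i to copy number j (i, j = 0, ..., 4):

$$Q = \begin{pmatrix} q_{00} & q_{01} & \cdots & q_{04} \\ q_{10} & q_{11} & \cdots & q_{14} \\ \vdots & \vdots & \ddots & \vdots \\ q_{40} & q_{41} & \cdots & q_{44} \end{pmatrix}$$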

• Rate matrix multiplied by a position-specific scalar rate

• Values trained using EM, following the approach of Klosterman et al. used in Xrate for finding substitution rates
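One way to read the two bullets above is as a continuous-time Markov chain along the genome: the transition probabilities between haploid copy numbers at two probes d bases apart are the matrix exponential of the global rate matrix Q, scaled by the position-specific rate s and by d. The sketch below is a minimal pure-Python illustration under that assumption; it is not cnvHap's code, and the helper names (`make_rate_matrix`, `transition_matrix`) and all rate values are hypothetical.

```python
# Sketch only: assumes adjacent-probe transitions follow a continuous-time
# Markov chain, P(d) = expm(Q * s * d). Q, s and d values are hypothetical.


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_expm(m, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(rrow, trow)]
                  for rrow, trow in zip(result, term)]
    for _ in range(squarings):
        # exp(M) = exp(M / 2^s) ** (2^s)
        result = mat_mul(result, result)
    return result


def make_rate_matrix(off_diag):
    """Zero the diagonal, then set it so each row sums to zero (generator matrix)."""
    n = len(off_diag)
    q = [[0.0 if i == j else float(off_diag[i][j]) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def transition_matrix(q, scalar_rate, distance):
    """P[i][j]: probability of copy number j at the next probe given i at this one."""
    scaled = [[x * scalar_rate * distance for x in row] for row in q]
    return mat_expm(scaled)


# Hypothetical rates for haploid copy numbers 0..4 (1 = normal): returning to
# copy number 1 is assumed to be faster than leaving it.
OFF = [[1e-5 if j == 1 else 1e-8 for j in range(5)] for i in range(5)]
Q = make_rate_matrix(OFF)
P = transition_matrix(Q, scalar_rate=2.0, distance=5000)
P0 = transition_matrix(Q, scalar_rate=2.0, distance=0)
```

Each row of P sums to one, and a larger position-specific rate or a longer gap between probes moves P further from the identity.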

cnvHap joint model of CNV + SNP haplotypes

Cluster positions modelled using a linear model

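$$\begin{pmatrix} r_m(g) \\ b_m(g) \end{pmatrix} = \beta \, f(g), \qquad f(g) = \big(f_0(g), \ldots, f_5(g)\big)^{\top}$$

$$f_0(g) = 1, \quad f_1(g) = \log\big(CN(g)/2\big), \quad f_2(g) = \big(\log(CN(g)/2)\big)^2,$$

$$f_3(g) = \mathrm{bfrac}(g), \quad f_4(g) = \mathrm{bfrac}(g)\,\big(1 - \mathrm{bfrac}(g)\big), \quad f_5(g) = \mathrm{bfrac}(g)\,\big(\mathrm{bfrac}(g) - 0.5\big)\,\big(\mathrm{bfrac}(g) - 1\big)$$

where CN(g) and bfrac(g) are the copy number and B-allele fraction of genotype g, and r_m(g), b_m(g) are the modelled cluster positions of the LogR and B-allele frequency signals.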

Model fitted using ridge regression, carried out at each iteration of the EM algorithm
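A minimal sketch of the fitting step, assuming the feature basis f_0..f_5 is 1, log(CN/2), (log(CN/2))², bfrac, bfrac(1 - bfrac) and bfrac(bfrac - 0.5)(bfrac - 1) (our reading of this slide's formulas), and that each observation is weighted by its posterior genotype probability from the E-step. The penalty `lam`, the helper names and the data are hypothetical. One point worth noting: the ridge penalty keeps the normal equations solvable even when only a few genotypes are observed and the six features are collinear.

```python
import math

# Sketch only: the feature basis is our reading of the slide; weights are
# assumed to be E-step posterior probabilities; lam and data are hypothetical.


def features(cn, bfrac):
    """f(g) for a genotype with copy number cn > 0 and B-allele fraction bfrac."""
    lr = math.log(cn / 2.0)  # log(CN/2); handling of CN = 0 is not shown here
    return [1.0, lr, lr ** 2, bfrac,
            bfrac * (1.0 - bfrac),
            bfrac * (bfrac - 0.5) * (bfrac - 1.0)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_ridge(xs, ys, ws, lam):
    """Minimise sum_i w_i (y_i - beta.x_i)^2 + lam * |beta|^2 (intercept penalised too)."""
    p = len(xs[0])
    a = [[sum(w * x[j] * x[k] for x, w in zip(xs, ws)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(w * x[j] * y for x, y, w in zip(xs, ys, ws)) for j in range(p)]
    return solve(a, b)


def predict(beta, x):
    """Modelled cluster position beta . f(g)."""
    return sum(bj * xj for bj, xj in zip(beta, x))


# Hypothetical genotypes (copy number, B-allele fraction) and posterior weights.
GENOTYPES = [(1, 0.0), (1, 1.0), (2, 0.0), (2, 0.5), (2, 1.0),
             (3, 0.0), (3, 1 / 3), (3, 2 / 3), (3, 1.0)]
WEIGHTS = [1.0, 0.9, 1.0, 0.8, 1.0, 0.7, 0.6, 0.6, 0.7]
X = [features(cn, bf) for cn, bf in GENOTYPES]
# Synthetic LogR-like targets lying exactly on 0.6 * log(CN/2).
Y = [0.6 * math.log(cn / 2.0) for cn, _ in GENOTYPES]

beta_small = weighted_ridge(X, Y, WEIGHTS, lam=1e-6)
beta_large = weighted_ridge(X, Y, WEIGHTS, lam=10.0)
```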

Using Illumina SNP arrays

Combined Illumina and Agilent arrays

[Figure: three paired Illumina/Agilent panels]

Some CNVs exhibit shared structure

Improved CNV genotyping accuracy

[Figure: cumulative frequency of squared Pearson correlation]
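A minimal sketch of the accuracy measure, assuming r² is computed per CNV between estimated and reference copy-number dosages across samples, and that the plotted curve is the empirical cumulative frequency of those per-CNV values. The slide does not say which reference data or plotting direction is used; function names and inputs are hypothetical.

```python
# Sketch only: per-CNV squared Pearson correlation between estimated and
# reference copy-number dosages, plus an empirical cumulative frequency.
# All inputs below are hypothetical.


def squared_pearson(xs, ys):
    """r^2 between two equal-length numeric sequences (nan if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def cumulative_frequency(values, threshold):
    """Fraction of values less than or equal to threshold (empirical CDF)."""
    return sum(v <= threshold for v in values) / len(values)


R2 = squared_pearson([1, 2, 3, 4], [1, 3, 2, 4])
```

A CNV that is monomorphic in the sample has no defined correlation, which is why the sketch returns nan in that case.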

A deletion at 16p11.2 in a patient with ‘extreme obesity’

• estimated by aCGH to be 546-700 kb

• flanked by segmental duplications (>99% sequence identity)

• probably arises by non-allelic homologous recombination (NAHR), implying the deletion is 739 kb

• BMI = 29.2 kg/m² at age 7½

• learning difficulties, delayed speech

[Figure: aCGH log2 ratio (-3 to +1) across chromosome 16, 28.9-30.7 Mb (band p11.2), with MLPA probes and segmental duplications marked]

RG Walters et al. Nature 463, 671-675 (2010) doi:10.1038/nature08727

16p11.2 deletions in obesity and population cohorts

Deletion carriers / individuals tested:

Cohort                                          Lean/normal weight   Obese
British extreme early-onset obesity (SCOOP)     -                    3/931
French child obesity case:control               0/530                4/643
French adult obesity case:control               0/669                4/705
Population cohorts (NFBC1966, CoLaus, EGPUT)    1/6235               3/1592
Swedish discordant siblings                     0/140                2/159
French bariatric surgery patients               -                    2/141

Obesity: P = 5.8×10⁻⁷, OR = 29.8 [3.9–225]

Morbid obesity: P = 6.4×10⁻⁸, OR = 43.0 [5.6–329]

Coverage affected by GC content

Regression model fit to correct for GC bias
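The slide does not state the form of the GC regression. A minimal sketch, assuming coverage in each window is a quadratic function of its GC fraction and that the fitted trend is divided out multiplicatively (both assumptions; `gc_correct` and the data are hypothetical):

```python
# Sketch only: assumes coverage depends on window GC fraction through a
# quadratic polynomial and that the correction is multiplicative.
# Data below are hypothetical.


def fit_poly(xs, ys, degree=2):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    p = degree + 1
    m = [[sum(x ** (j + k) for x in xs) for k in range(p)]
         + [sum(y * x ** j for x, y in zip(xs, ys))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[j][p] / m[j][j] for j in range(p)]


def poly_eval(coefs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ..."""
    return sum(c * x ** j for j, c in enumerate(coefs))


def gc_correct(gc, coverage, degree=2):
    """Divide out the fitted GC trend, rescaling to the mean coverage."""
    coefs = fit_poly(gc, coverage, degree)
    mean_cov = sum(coverage) / len(coverage)
    return [c * mean_cov / poly_eval(coefs, g) for g, c in zip(gc, coverage)]


GC = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
# Synthetic coverage with a purely GC-driven (quadratic) bias.
GC_COV = [30.0 * (1 + 2 * (g - 0.45) - 5 * (g - 0.45) ** 2) for g in GC]
CORRECTED = gc_correct(GC, GC_COV)
```

When the bias is exactly quadratic, as in this synthetic example, every corrected window returns to the mean coverage.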

Loess curves fit to remove residual spatial variation of coverage
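A minimal sketch of the residual smoothing step, using a basic LOWESS (local linear regression with tricube weights, without the robustness iterations of full LOESS) over genomic position. The span `frac`, the additive correction in `remove_spatial_trend` and the data are hypothetical choices, not taken from the slides:

```python
import math

# Sketch only: basic LOWESS (local linear fit, tricube weights, no
# robustness iterations). frac and the data are hypothetical.


def lowess(xs, ys, frac=0.3):
    """Local linear smooth of ys against xs; returns the fitted value at each x."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        # Bandwidth: distance to the k-th nearest point (1.0 guards duplicate
        # positions, assuming integer base positions).
        h = dists[k - 1] or 1.0
        sw = sx = sy = sxx = sxy = 0.0
        for x, y in zip(xs, ys):
            u = abs(x - x0) / h
            if u >= 1.0:
                continue
            w = (1.0 - u ** 3) ** 3  # tricube weight
            dx = x - x0  # centre on x0 for numerical stability
            sw += w
            sx += w * dx
            sy += w * y
            sxx += w * dx * dx
            sxy += w * dx * y
        denom = sw * sxx - sx * sx
        if sxx > 0.0 and denom > 0.0:
            slope = (sw * sxy - sx * sy) / denom
            fitted.append((sy - slope * sx) / sw)  # local line evaluated at x0
        else:
            fitted.append(sy / sw)  # fall back to the weighted mean
    return fitted


def remove_spatial_trend(positions, coverage, frac=0.3):
    """Subtract the LOWESS trend and add back the mean so values stay in coverage units."""
    trend = lowess(positions, coverage, frac)
    mean_cov = sum(coverage) / len(coverage)
    return [c - t + mean_cov for c, t in zip(coverage, trend)]


POS = [i * 1000 for i in range(50)]
DRIFT_COV = [20.0 + 0.0001 * p for p in POS]  # purely linear drift along the genome
FLAT = remove_spatial_trend(POS, DRIFT_COV)
```

A local linear fit reproduces a linear drift exactly, so in this example every corrected value equals the mean coverage.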

Detecting CNVs with NGS data

[Figure panels: depth/haploid coverage; B-allele frequency]
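The two signals on this slide can be sketched as follows, assuming read depth at a locus is Poisson with mean CN × haploid coverage and counting reads for the alternative allele as the B allele. These are modelling assumptions for illustration only; the function names, the `floor` used to keep CN = 0 finite, and the example counts are hypothetical.

```python
import math

# Sketch only: assumes read depth is Poisson with mean CN * haploid coverage
# and treats the alternative allele as the "B" allele. Values are hypothetical.


def copy_number_point_estimate(depth, haploid_coverage):
    """Observed depth divided by the expected depth contributed by one copy."""
    return depth / haploid_coverage


def depth_log_likelihood(depth, cn, haploid_coverage, floor=0.05):
    """Poisson log-likelihood of depth given cn; floor keeps CN = 0 finite (assumed)."""
    mean = max(cn, floor) * haploid_coverage
    return depth * math.log(mean) - mean - math.lgamma(depth + 1)


def b_allele_frequency(ref_reads, alt_reads):
    """Fraction of reads at a SNP supporting the B allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else float("nan")


def expected_bafs(cn):
    """B-allele fractions a genotype with cn copies can produce: 0, 1/cn, ..., 1."""
    return [b / cn for b in range(cn + 1)] if cn > 0 else []


HAPLOID_COV = 15.0
DEPTH = 44
best_cn = max(range(5), key=lambda cn: depth_log_likelihood(DEPTH, cn, HAPLOID_COV))
baf = b_allele_frequency(ref_reads=10, alt_reads=20)
nearest = min(expected_bafs(best_cn), key=lambda b: abs(b - baf))
```

Here the depth points to three copies, and the B-allele frequency of 2/3 is consistent with two B alleles out of three, illustrating why the two signals are read together.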

NGS versus CGH data

[Figure panels: NGS data, chrom1:350mb-351mb; CGH data, chrom1:350mb-351mb]

NGS vs CGH data

Haplotype structure of deletion

NGS amplification

[Figure: depth/coverage]

With consistent breakpoints across the population

Polyploid phasing and imputation

[Figure panels: imputation error rate; switch error rate]
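For orientation, the two metrics named on this slide can be sketched for the diploid case: switch error is the fraction of consecutive heterozygous-site intervals where the inferred phase flips relative to the truth, and imputation error is the fraction of masked genotypes called incorrectly. This sketch assumes inferred genotypes agree with the truth at heterozygous sites; defining switch errors for ploidy > 2 needs a more general treatment that is not shown here, and the inputs are hypothetical.

```python
# Sketch only, diploid case: switch error between a true and an inferred
# haplotype pair, and imputation error on masked genotypes. Assumes the
# inferred genotypes match the truth at heterozygous sites. Inputs are hypothetical.


def switch_error_rate(true_pair, inferred_pair):
    """Fraction of consecutive heterozygous-site intervals where the phase flips."""
    t1, t2 = true_pair
    i1, _ = inferred_pair
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return float("nan")
    # Orientation: does inferred haplotype 1 carry true haplotype 1's allele here?
    orient = [i1[k] == t1[k] for k in het]
    switches = sum(a != b for a, b in zip(orient, orient[1:]))
    return switches / (len(het) - 1)


def imputation_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of masked genotypes imputed incorrectly."""
    errors = sum(t != i for t, i in zip(true_genotypes, imputed_genotypes))
    return errors / len(true_genotypes)


SER = switch_error_rate(([0, 1, 0, 1, 1], [1, 0, 0, 0, 1]),
                        ([0, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
IER = imputation_error_rate([0, 1, 2, 1], [0, 1, 1, 1])
```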

Conclusions

• Population-haplotype model enables joint CNV discovery and genotyping using array data

• Preliminary results indicate that this approach will also help with NGS data

• Combining information from multiple platforms improves sensitivity

• Imputation still works for ploidy > 2, but phasing becomes more difficult

Acknowledgements

Evangelos Bellos

Shu-Yi Su

Robin Walters

Julian Asher

Alex Blakemore

Adam de Smith

Philippe Froguel

Julia El-Sayed Moustafa

David Balding (UCL)

Rob Sladek (McGill)