Segmental Variation (Copy Number Variants and other gross chromosomal rearrangements)

Upload: mabli

Post on 19-Jan-2016

Page 1: SEGMENTAL VARIATION (Copy Number Variants and other gross chromosomal rearrangements)

SLIDE 1

SEGMENTAL VARIATION

(Copy Number Variants and other gross chromosomal rearrangements)

Allen E. Bale, M.D.

Dept. of Genetics

SLIDE 2

Importance of Copy Number Variants (CNVs) and Other Rearrangements in Health and Disease

• Constitutional (germ-line) variants in hereditary conditions

– Large and small copy number variants

– Translocations and inversions: rarely cause a phenotype but may generate CNVs due to mis-pairing during meiosis

• Somatically acquired variants in cancer

– Duplications and deletions: amplification of oncogene; loss of tumor suppressor

– Translocations and inversions: place oncogene under control of an active promoter

SLIDE 3

What is the origin of structural variants?

• An area of active research

• Recurrent constitutional CNVs: Often related to illegitimate recombination between homologous, but non-identical, sequences

• Rare, non-recurrent, constitutional CNVs: No obvious sequence homology at breakpoints; possibly non-homologous end joining

• Tumor CNVs: Any mechanism to create a rearrangement that favors tumor growth, often non-homologous end joining.

SLIDE 4

Cytogenetically visible CNVs and translocations

SLIDE 5

A Really Large CNV

SLIDE 6

Somatically acquired translocation

SLIDE 7

Limitations of Cytogenetics

• Cell has to be proliferating in order to arrest chromosomes at metaphase (when they are visible under the microscope)

• Resolution is limited (in the range of 5 Mb)

• Requires highly skilled technologists and still a lot of hands-on time, even with sophisticated image processing

SLIDE 8

Submicroscopic CNVs: Array CGH*

*Frequently referred to as “chromosome microarray”

SLIDE 9

Example: Submicroscopic 22q deletion

• Abnormal nose, ears, and palate

• Also heart, parathyroid, and thymus abnormalities

SLIDE 10

Limitations of Array CGH

• Can’t detect translocations and inversions

• Resolution still limited by number of probes on the array—typical resolution about 100 kb

• Still a fair amount of variability in results depending on exactly which array is used

SLIDE 11

Genome-scale sequencing to detect rearrangements

If you could sequence each chromosome as one continuous piece of DNA, from one end to the other with no gaps in the sequence, what structural variants would you miss?

SLIDE 12

Genome-scale sequencing to detect rearrangements

What methods are currently in use?

• Depth-of-coverage methods

Regions that are deleted or duplicated should yield fewer or more reads, respectively

• Detection of breakpoints by:

– Short paired reads (like Illumina paired-end sequencing)

Are the sequences at the two ends of a fragment both from the same chromosome? Are they the right distance apart?

– Long reads (kb-scale)

Direct sequencing of breakpoints

SLIDE 13

Genome-scale sequencing to detect rearrangements

• Depth-of-coverage method

• Detection of breakpoints by short paired reads

• Detection of breakpoints by long reads

Compared with cytogenetics and array CGH, how would the approaches above perform?

• What would be missed by depth-of-coverage reading?

• What would be missed by detection of breakpoints?

• What problems do you foresee with these two approaches?

SLIDE 14

Depth-of-coverage example:

Whole exome sequencing as a tool to identify both sequence variants and CNVs

SLIDE 15

Whole exome sequencing (see Dr. Lifton’s lecture)

• Capture portions of the genome containing exons in order to efficiently sequence coding regions

• Not designed for CNV detection, but potentially contains information on gene dosage

• For any gene, the number of fragments captured on the array and sequenced should be proportional to the representation in the starting material

SLIDE 16

Array CGH vs. Exome Sequencing

SLIDE 17

Does this work at all?

• Total reads on the X chromosome were counted in a series of males and females

• Gene dosage for the X chromosome in males should be half the gene dosage for the X chromosome in females
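The sanity check above can be sketched in a few lines. This is only an illustration, not the code used for the slides: the function names are hypothetical, the read counts are made up, and normalizing X reads to autosomal reads (rather than to total reads) is an assumption.

```python
from statistics import mean

def x_fraction(x_reads: int, autosomal_reads: int) -> float:
    """X-chromosome reads normalized to autosomal reads for one sample."""
    if autosomal_reads <= 0:
        raise ValueError("autosomal_reads must be positive")
    return x_reads / autosomal_reads

def male_to_female_x_ratio(males, females) -> float:
    """Mean normalized X coverage in males divided by that in females.

    Each sample is an (x_reads, autosomal_reads) tuple.
    A value near 0.5 is what one X copy versus two would predict.
    """
    male_mean = mean(x_fraction(x, a) for x, a in males)
    female_mean = mean(x_fraction(x, a) for x, a in females)
    return male_mean / female_mean

# Made-up counts, for illustration only.
males = [(1_000_000, 50_000_000), (1_100_000, 55_000_000)]
females = [(2_000_000, 50_000_000), (1_800_000, 45_000_000)]
ratio = male_to_female_x_ratio(males, females)
print(f"male/female X dosage ratio: {ratio:.2f}")
```

Normalizing to autosomal reads keeps the denominator independent of the X chromosome itself, which would otherwise pull the ratio slightly away from one half.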

SLIDE 18

Does it work for single exons?

Reads were counted for each exon of the OTC gene on the X chromosome. Males should have one half the female dosage.

• Read number varies among exons due to different capture efficiencies but is consistent from subject to subject.

• Exons with sufficient read numbers show dosage effect.

• Performs very well for this 70 kb gene taken as a single unit.
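A per-exon version of the same comparison might look like the sketch below. The cutoff `min_reads=50` is an arbitrary placeholder for "sufficient read numbers", the function names are hypothetical, and the counts are assumed to be already normalized for sequencing depth.

```python
def exon_dosage_ratios(male_counts, female_counts, min_reads=50):
    """Male/female read ratio per exon.

    Returns None for exons where either count falls below min_reads,
    since a ratio of two small counts is too noisy to interpret.
    """
    ratios = []
    for m, f in zip(male_counts, female_counts):
        if m < min_reads or f < min_reads:
            ratios.append(None)
        else:
            ratios.append(m / f)
    return ratios

def gene_dosage_ratio(male_counts, female_counts):
    """Ratio for the gene taken as a single unit (all exons pooled)."""
    return sum(male_counts) / sum(female_counts)

# Made-up depth-normalized counts for three exons.
male = [100, 10, 300]
female = [200, 22, 600]
print(exon_dosage_ratios(male, female))
print(round(gene_dosage_ratio(male, female), 3))
```

Pooling all exons gives a more stable estimate than any single exon, which is consistent with the observation that the gene performs well when taken as a single unit.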

SLIDE 19

Approach to scanning the whole genome for CNVs

• The genome was divided into 50 kb windows.

• Intervals with zero reads were removed.

• Mean number of reads and standard deviations for each interval were calculated from 10 exome sequences.

• Depth of coverage in a single patient was compared to average and standard deviation of depth of coverage.

• Algorithms were developed for:

– Classifying X chromosome as being deleted in males compared with females

– Classifying X chromosome as being duplicated in females compared with males
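The windowing and comparison steps above can be sketched as follows. This is a minimal illustration, not the algorithm developed for the slides: the z-score cutoff of 3 is an assumed placeholder (the slides describe calibrating the classifiers on X-chromosome dosage instead), all function names are hypothetical, and counts are assumed to be already normalized for total sequencing depth.

```python
from collections import Counter
from statistics import mean, stdev

WINDOW = 50_000  # 50 kb windows, as on the slide

def window_counts(read_positions, window=WINDOW):
    """Count reads per window; read_positions yields (chrom, pos) pairs."""
    return Counter((chrom, pos // window) for chrom, pos in read_positions)

def reference_stats(reference_samples):
    """Mean and SD of read count per window across the reference exomes.

    Windows with zero reads in every reference sample are dropped.
    Needs at least two reference samples for the standard deviation.
    """
    windows = set().union(*reference_samples)
    stats = {}
    for w in windows:
        values = [sample.get(w, 0) for sample in reference_samples]
        if sum(values) == 0:
            continue
        stats[w] = (mean(values), stdev(values))
    return stats

def call_windows(patient, stats, threshold=3.0):
    """Label windows whose depth deviates from the reference mean by >= threshold SDs."""
    calls = {}
    for w, (mu, sd) in stats.items():
        if sd == 0:
            continue  # no spread in the reference, z-score undefined
        z = (patient.get(w, 0) - mu) / sd
        if abs(z) >= threshold:
            calls[w] = "duplicated" if z > 0 else "deleted"
    return calls

# Made-up counts: 10 reference exomes, two windows on chr5.
depths = [98, 99, 100, 101, 102, 98, 99, 100, 101, 102]
reference = [{("chr5", 0): d, ("chr5", 1): d} for d in depths]
patient = {("chr5", 0): 150, ("chr5", 1): 100}
print(call_windows(patient, reference_stats(reference)))
```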

SLIDE 20

Chromosomal coverage with non-zero, 50 kb intervals corresponds exactly to density of coding sequences

SLIDE 21

Test case: Female with a 338 kb duplication on 5q35

Diagram shows all loci passing initial algorithm

SLIDE 22

Filter #1: Require two adjacent intervals to both be deleted or duplicated
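One way to express this filter, assuming calls are stored as a dictionary keyed by `(chromosome, window_index)` with values `"deleted"` or `"duplicated"`. That data layout and the function name are assumptions for illustration, and "adjacent" is taken to mean consecutive window indices on the same chromosome.

```python
def require_adjacent(calls):
    """Keep a called window only if a neighbouring window has the same call."""
    kept = {}
    for (chrom, idx), state in calls.items():
        left = calls.get((chrom, idx - 1))
        right = calls.get((chrom, idx + 1))
        if left == state or right == state:
            kept[(chrom, idx)] = state
    return kept

# Made-up calls: only the two neighbouring chr5 duplications survive.
calls = {
    ("chr5", 10): "duplicated",
    ("chr5", 11): "duplicated",
    ("chr2", 4): "deleted",      # isolated call
    ("chr5", 20): "deleted",     # neighbour has a different call
    ("chr5", 21): "duplicated",
}
print(require_adjacent(calls))
```

With 50 kb windows, this filter means that an event must span at least part of two windows to be reported, which trades sensitivity for small CNVs against fewer isolated false calls.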

SLIDE 23

Filter #2: Remove “deleted regions” that contain heterozygous variants
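The logic behind this filter is that a heterozygous deletion leaves a single copy of the region, so a true heterozygous genotype inside it should not be observed. A minimal sketch, assuming calls keyed by `(chromosome, window_index)` and a list of heterozygous variant positions taken from the same exome (the layout and names are hypothetical):

```python
def drop_deletions_with_het_variants(calls, het_sites, window=50_000):
    """Remove 'deleted' windows that contain at least one heterozygous variant.

    calls: dict mapping (chrom, window_index) -> "deleted" or "duplicated"
    het_sites: iterable of (chrom, pos) for heterozygous variant calls
    """
    het_windows = {(chrom, pos // window) for chrom, pos in het_sites}
    return {
        w: state
        for w, state in calls.items()
        if not (state == "deleted" and w in het_windows)
    }

# Made-up example: the deletion call in window 10 is contradicted by a het site.
calls = {("chr5", 10): "deleted", ("chr5", 11): "deleted", ("chr5", 12): "duplicated"}
het_sites = [("chr5", 520_000), ("chr5", 610_000)]
print(drop_deletions_with_het_variants(calls, het_sites))
```

Duplicated windows are left untouched because heterozygous variants are fully compatible with three copies.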

SLIDE 24

Filter #3: Remove intervals with read counts <200
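A sketch of this filter follows. The slide does not say which read count the cutoff applies to; the code below assumes it is the mean count per window in the reference exomes, and the function name and data layout are hypothetical.

```python
def drop_low_count_windows(calls, reference_mean, min_reads=200):
    """Keep only calls in windows whose reference mean read count is >= min_reads.

    calls: dict mapping (chrom, window_index) -> "deleted" or "duplicated"
    reference_mean: dict mapping (chrom, window_index) -> mean reads in the reference set
    """
    return {
        w: state
        for w, state in calls.items()
        if reference_mean.get(w, 0) >= min_reads
    }

# Made-up example: window 11 is too sparsely covered to trust.
calls = {("chr5", 10): "duplicated", ("chr5", 11): "duplicated"}
reference_mean = {("chr5", 10): 850.0, ("chr5", 11): 120.0}
print(drop_low_count_windows(calls, reference_mean))
```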

SLIDE 25

Application to 7 subjects with deletions or duplications in 500 kb to 1 Mb range

SLIDE 26

Some problems with use of exome data

• Intervals with no genes are not covered (important?)

• Intervals with large genes having close homologs elsewhere in the genome cannot be accurately evaluated.

• Because this technology is evolving rapidly, the normal standard to which a test sample is compared needs to be a pool of recent exome sequences (the false discovery rate is very high with non-homogeneous samples).

SLIDE 27

For a review of published depth-of-coverage methods for exome or genome data see:

Klambauer, G. et al. (2012). "cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate." Nucleic Acids Res.

Compares several programs, none of which work really well.

Two newer programs for exome sequencing are in your reading list.

SLIDE 28

Paired-end methods

• Illumina HiSeq, the current industry leader in high-throughput sequencing, generates short reads from fragments 200 to 600 bp long.

• Reading both ends of the same fragment gives you sequences that should lie 200 to 600 bp apart

• Other methods can generate paired fragments that lie even farther apart
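The two questions a paired-end caller asks of each read pair (same chromosome? right distance apart?) can be sketched as below. The `ReadPair` type and labels are hypothetical, the positions are assumed to be the outer coordinates of the fragment on the reference, and the 200 to 600 bp bounds are taken from the fragment sizes quoted above. Read orientation, which would be needed to detect inversions, is deliberately left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int

def classify_pair(pair: ReadPair, min_span: int = 200, max_span: int = 600) -> str:
    """Label a mapped read pair as concordant or as one kind of discordant."""
    if pair.chrom1 != pair.chrom2:
        # Ends map to different chromosomes: candidate translocation.
        return "different_chromosomes"
    span = abs(pair.pos2 - pair.pos1)
    if span > max_span:
        # Ends map too far apart on the reference: consistent with a deletion in the sample.
        return "too_far_apart"
    if span < min_span:
        # Ends map too close together: consistent with an insertion in the sample.
        return "too_close"
    return "concordant"

print(classify_pair(ReadPair("chr1", 1000, "chr1", 1350)))
print(classify_pair(ReadPair("chr1", 1000, "chr1", 9000)))
print(classify_pair(ReadPair("chr1", 1000, "chr9", 1300)))
```

A single discordant pair is weak evidence; callers generally require several independent pairs that point to the same breakpoint.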

SLIDE 29

Long paired-end methods

Paired end mapping—up to thousands of bp apart

From Korbel et al., 2009

SLIDE 30

Identifying Structural Mutations: Deletions & Duplications

SLIDE 31

Identifying Structural Mutations: Inversions

SLIDE 32

Identifying Structural Mutations: Translocations

SLIDE 33

Analyzing structural variations from paired end data

• PEMer (Korbel et al., 2009): For discovery of CNVs and inversions; could also be implemented for translocations

• BreakDancer (Chen et al., 2009): For discovery of CNVs, inversions, and translocations

SLIDE 34

Identifying Structural Mutations with paired end sequence: What goes wrong?

SLIDE 35

How to overcome problems with paired end detection of CNVs

Separating the wheat from the chaff

• Technical artifacts (ligation of unrelated fragments during library preparation) may be numerous but will be random

• Artifacts related to homologous sequences (see previous slide) will be reproducible but common to all samples

• Real structural variants will be reproducible within a sample and not common to all samples

• How much reading depth do you need to detect the real variants?
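The three categories above suggest a simple triage of candidate junctions. The sketch below is illustrative only: the thresholds `min_pairs=3` and `common_fraction=0.9` are arbitrary assumptions, and the function name and input layout are hypothetical.

```python
def classify_candidate(support_by_sample, sample, min_pairs=3, common_fraction=0.9):
    """Triage one candidate junction.

    support_by_sample: dict mapping sample name -> number of discordant
    read pairs supporting this junction in that sample.
    sample: the sample being analyzed.
    """
    n_here = support_by_sample.get(sample, 0)
    if n_here < min_pairs:
        # Too few supporting pairs: not reproducible within the sample.
        return "likely_technical_artifact"
    supported = sum(1 for n in support_by_sample.values() if n >= min_pairs)
    if supported / len(support_by_sample) >= common_fraction:
        # Reproducible but seen in (nearly) every sample.
        return "likely_homology_artifact"
    # Reproducible within this sample and not common to all samples.
    return "candidate_real_variant"

# Made-up support counts for three junctions across four samples.
print(classify_candidate({"A": 5, "B": 0, "C": 0, "D": 0}, "A"))
print(classify_candidate({"A": 1, "B": 0, "C": 0, "D": 0}, "A"))
print(classify_candidate({"A": 5, "B": 6, "C": 4, "D": 7}, "A"))
```

One caveat of this rule is that a genuine structural polymorphism carried by nearly everyone in the cohort would also be labeled a homology artifact; distinguishing the two needs additional evidence.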

SLIDE 36

Toward direct sequencing of breakpoints

• Long reads

– PacBio can generate reads of 1000 bp or so

– Nanopore sequencing said to generate reads in the tens of thousands of bp

• Strobe sequencing with PacBio: Normally read length is limited due to inactivation of polymerase by laser. Short bursts of laser give sample sequences along a stretch of DNA in the 20 kb range.

SLIDE 37

Programs for analysis of longer reads that directly sequence breakpoints

• CREST (Wang et al., 2011): Detects small and large structural variants by direct sequencing of breakpoints.

• SRiC (Zhang et al., 2011): Similar to CREST

• Algorithm for strobe reads (Ritz et al., 2010)

SLIDE 38

Conclusions

• Structural variation in the genome accounts for a great deal of human phenotypic variability including disease

• Depth-of-coverage methods can detect many CNVs but not inversions and translocations. Variation from sample to sample limits sensitivity and specificity.

• Whole genome sequencing, which can identify all types of structural variants, will supersede depth-of-coverage methods.

• Large scale and small scale duplications and repetitive sequences remain a major obstacle.

SLIDE 39

Acknowledgments for exome CNV analysis

Department of Genetics: Patricia Gordon, Christopher Heffelfinger, Murim Choi, Shrikant Mane, Richard Lifton, Allen Bale

Neuropsychiatric Genetics Program: Stephan Sanders, Matthew State

School of Public Health, Biostatistics Division: Annette Molinaro