genome representation and variant identification


Upload: nardo

Post on 22-Feb-2016



Page 1: Genome representation  and  variant identification

Genome representation and variant identification

Deanna M. Church, NCBI

Page 2: Genome representation  and  variant identification
Page 3: Genome representation  and  variant identification

The Reference Assembly is NOT Static

NCBI35 (hg17)
NCBI36 (hg18)
GRCh37 (hg19)
GRCh37.p9

Page 4: Genome representation  and  variant identification

Image credit: http://www.tohlejokes.com

Page 5: Genome representation  and  variant identification

http://genomereference.org

Page 6: Genome representation  and  variant identification

Resolved: 716
Open: 697

Page 7: Genome representation  and  variant identification

http://www.ncbi.nlm.nih.gov/dbvar

Page 8: Genome representation  and  variant identification

Studies

Variant Regions

Variant Calls

Variant Region: nsv531833
type: CNV

Variant Calls: nssv577112
type: copy number gain
Method: Oligo aCGH
Analysis: Probe signal intensity
phenotype: Autism; etc.
Clinical: Pathogenic
Copy Number: 3

Variant Calls: nssv580124
type: copy number loss
Method: Oligo aCGH
Analysis: Probe signal intensity
phenotype: Autism
Clinical: Pathogenic
Copy Number: 1

Methods

Analysis

Publications

Samples

Submitted assembly
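The hierarchy on this slide (a variant region grouping one or more variant calls) can be sketched as a small data model. This is an illustrative sketch only, not the dbVar schema: the class and field names (`VariantRegion`, `VariantCall`, `call_types`) are assumptions chosen to mirror the labels shown above, and the values are the ones listed on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariantCall:
    """One submitted call (nssv accession); field names are hypothetical."""
    accession: str
    call_type: str
    method: str
    analysis: str
    phenotype: str
    clinical: str
    copy_number: Optional[int] = None


@dataclass
class VariantRegion:
    """A region (nsv accession) that groups the calls supporting it."""
    accession: str
    region_type: str
    calls: List[VariantCall] = field(default_factory=list)

    def call_types(self) -> List[str]:
        # Distinct call types seen in this region, in alphabetical order.
        return sorted({c.call_type for c in self.calls})


# Values copied from the slide; the structure itself is an assumption.
region = VariantRegion(
    accession="nsv531833",
    region_type="CNV",
    calls=[
        VariantCall("nssv577112", "copy number gain", "Oligo aCGH",
                    "Probe signal intensity", "Autism; etc.", "Pathogenic", 3),
        VariantCall("nssv580124", "copy number loss", "Oligo aCGH",
                    "Probe signal intensity", "Autism", "Pathogenic", 1),
    ],
)

print(region.accession, region.call_types())
```

A single CNV region can therefore carry both a gain and a loss call, which is why the region type is the more general "CNV".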

Page 9: Genome representation  and  variant identification

Variant Call Ambiguity

start stop

Inner start Inner stop

Outer start Outer stop

Probes with decreased signal intensity

Probes with expected signal intensity

breakpoint breakpoint

Inner start Inner stop
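One way to read the inner and outer coordinates on this slide: the inner interval is bounded by the outermost probes with decreased signal, and the outer interval by the nearest flanking probes with expected signal, so each true breakpoint lies somewhere between an outer and an inner coordinate. The sketch below assumes that reading. The `Probe` type, the single-position probes, and the example positions are hypothetical, and it assumes one contiguous run of affected probes.

```python
from typing import Dict, List, NamedTuple, Optional


class Probe(NamedTuple):
    position: int      # hypothetical single coordinate per probe
    decreased: bool    # True if signal intensity is decreased


def call_bounds(probes: List[Probe]) -> Optional[Dict[str, Optional[int]]]:
    """Return inner/outer start and stop for one run of decreased probes."""
    ordered = sorted(probes, key=lambda p: p.position)
    hits = [i for i, p in enumerate(ordered) if p.decreased]
    if not hits:
        return None  # no affected probes, so no call
    first, last = hits[0], hits[-1]
    return {
        # Outermost affected probes: the call is at least this large.
        "inner_start": ordered[first].position,
        "inner_stop": ordered[last].position,
        # Nearest unaffected flanking probes: the call is at most this large.
        "outer_start": ordered[first - 1].position if first > 0 else None,
        "outer_stop": (ordered[last + 1].position
                       if last + 1 < len(ordered) else None),
    }


# Made-up probe positions, for illustration only.
probes = [Probe(100, False), Probe(200, False), Probe(300, True),
          Probe(400, True), Probe(500, False), Probe(600, False)]
bounds = call_bounds(probes)
print(bounds)
```

With these made-up probes, the left breakpoint lies between 200 and 300 and the right breakpoint between 400 and 500.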

Page 10: Genome representation  and  variant identification

Variant Call Ambiguity

Outer start Outer stop

Fosmid clone (40 Kb +/- 1 Kb)

20 Kb: Clone has an insertion relative to the genome

60 Kb: Clone has a deletion relative to the genome
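The fosmid logic above can be sketched as a comparison between the distance separating the mapped clone ends on the reference and the expected insert size. This is a minimal sketch, assuming the 40 Kb +/- 1 Kb insert size from the slide; the function name and the "concordant" label are hypothetical.

```python
EXPECTED_INSERT = 40_000   # fosmid insert size from the slide (bp)
TOLERANCE = 1_000          # +/- 1 Kb from the slide


def classify_clone(mapped_span: int,
                   expected: int = EXPECTED_INSERT,
                   tolerance: int = TOLERANCE) -> str:
    """Classify a clone by the reference distance between its mapped ends."""
    if mapped_span < expected - tolerance:
        # Ends map too close together: the clone carries sequence
        # that the reference lacks.
        return "insertion"
    if mapped_span > expected + tolerance:
        # Ends map too far apart: the clone lacks sequence
        # that the reference has.
        return "deletion"
    return "concordant"


for span in (20_000, 40_500, 60_000):
    print(span, classify_clone(span))
```

The clone ends locate the event only to within the span of the clone, so the breakpoints remain ambiguous, which is the point of the outer start and outer stop coordinates.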

Page 11: Genome representation  and  variant identification

Assembly, Mis-assembly, Biology and Variant Interpretation

Page 12: Genome representation  and  variant identification

BAC insert

BAC vector

Shotgun sequence

Assemble

GAPS

“Finishers” go in to fill the gaps manually, often by PCR

Page 13: Genome representation  and  variant identification

NCBI36 (hg18)

GRCh37 (hg19)

Page 14: Genome representation  and  variant identification

NCBI35 (hg17)

GRCh37 (hg19)

AL139246.20

AL139246.21

Page 15: Genome representation  and  variant identification

Build sequence contigs based on contigs defined in TPF (Tiling Path File).

Check for orientation consistencies
Select switch points
Instantiate sequence for further analysis

Switch point

Consensus sequence
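To illustrate why the choice of switch point matters, here is a toy sketch of joining two overlapping components into one sequence. It is not the GRC or NCBI contig-building code: the function name, the ungapped overlap, and the short example sequences are all assumptions made for illustration.

```python
def join_at_switch_point(left: str, right: str, overlap: int, switch: int) -> str:
    """Join two components whose ends overlap by `overlap` bases.

    `switch` is the offset inside the overlap (0..overlap) at which the
    path stops following `left` and starts following `right`.
    Assumes an ungapped overlap, which real clone overlaps need not have.
    """
    if overlap > min(len(left), len(right)):
        raise ValueError("overlap longer than a component")
    if not 0 <= switch <= overlap:
        raise ValueError("switch point must lie inside the overlap")
    left_part = left[: len(left) - overlap + switch]
    right_part = right[switch:]
    return left_part + right_part


# Hypothetical components that differ at one base inside a 4-base overlap
# (left ends in CGTA, right begins with CGGA).
left = "AAAACGTA"
right = "CGGATTTT"

early = join_at_switch_point(left, right, overlap=4, switch=2)
late = join_at_switch_point(left, right, overlap=4, switch=4)
print(early)
print(late)
```

When the two components come from different haplotypes, moving the switch point changes which allele ends up in the consensus, so the resulting sequence can be a mosaic that no individual carries.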

Page 16: Genome representation  and  variant identification

NCBI36

Page 17: Genome representation  and  variant identification

nsv832911 (nstd68) Submitted on NCBI35 (hg17)

Page 18: Genome representation  and  variant identification

NCBI35 (hg17) Tiling Path

GRCh37 (hg19) Tiling Path

Gap Inserted

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

NC_000015.9 (chr15)

Removed from assembly

Added to assembly

HG-24

Page 19: Genome representation  and  variant identification

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

Page 20: Genome representation  and  variant identification

AC074378.4, AC079749.5

AC134921.2, AC147055.2

AC140484.1, AC019173.4

AC093720.2, AC021146.7

NCBI36 NC_000004.10 (chr4) Tiling Path

Xue Y et al, 2008

TMPRSS11E TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path

AC074378.4, AC079749.5

AC134921.1, AC147055.2

AC093720.2, AC021146.7

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)

AC074378.4, AC140484.1

AC019173.4, AC226496.2

AC021146.7

TMPRSS11E2

nsv532126 (nstd37)

Page 21: Genome representation  and  variant identification

GRCh37

Page 22: Genome representation  and  variant identification

81 FIX Patches
71 NOVEL Patches

GRCh37.p9

Page 23: Genome representation  and  variant identification

Dennis et al., 2012

1q32 1q21 1p21

1p21 patch alignment to chromosome 1

Page 24: Genome representation  and  variant identification

Finding the data

Page 25: Genome representation  and  variant identification

How dbVar* manages data

*and most other NCBI databases too

Object    Method       Analysis                Clinical assertion  NCBI36 location  Etc…
nsv1000   Oligo aCGH   Probe signal intensity  None                Location         Etc…
nsv2000   Sequencing   Paired end analysis     None                Location         Etc…
nsv3000   Sequencing   Read Depth              Benign              Location         Etc…
…         …            …                       …                   …                …

Search Term
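The table suggests that a search term is matched against the fields stored for each object. A rough sketch of that idea follows. It is not how dbVar or Entrez indexing is implemented: the `search` function, the dictionary keys, and the substring matching are assumptions. The location column is left out because the slide does not give actual coordinates.

```python
from typing import Dict, List, Optional

# Rows copied from the slide; key names are hypothetical.
records: List[Dict[str, Optional[str]]] = [
    {"object": "nsv1000", "method": "Oligo aCGH",
     "analysis": "Probe signal intensity", "clinical_assertion": None},
    {"object": "nsv2000", "method": "Sequencing",
     "analysis": "Paired end analysis", "clinical_assertion": None},
    {"object": "nsv3000", "method": "Sequencing",
     "analysis": "Read Depth", "clinical_assertion": "Benign"},
]


def search(rows: List[Dict[str, Optional[str]]], term: str,
           field: Optional[str] = None) -> List[str]:
    """Return object IDs whose fields contain `term` (case-insensitive).

    If `field` is given, only that field is searched; otherwise all are.
    """
    needle = term.lower()
    hits = []
    for row in rows:
        values = [row.get(field)] if field else list(row.values())
        if any(v is not None and needle in v.lower() for v in values):
            hits.append(row["object"])
    return hits


print(search(records, "sequencing"))
print(search(records, "benign", field="clinical_assertion"))
```

A consequence of this model is that a term only finds a record if it matches something stored for that record, which is relevant when a query is really a genomic location rather than a keyword.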

Page 26: Genome representation  and  variant identification
Page 27: Genome representation  and  variant identification
Page 28: Genome representation  and  variant identification

Variant submitted on NCBI35 (hg17)
Failed to remap to NCBI36 (hg18)
Successful remap to GRCh37 (hg19)
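A variant can fail to remap to one assembly yet remap to another when the source region has no usable alignment in the first target. The toy sketch below shows that behaviour with ungapped alignment blocks. It is not the NCBI remapping algorithm: the `Block` type, the `remap` function, and every coordinate are hypothetical, and real remapping also handles intervals spanning several blocks, strand changes, and gaps.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    src_start: int   # start on the source assembly (inclusive)
    src_end: int     # end on the source assembly (exclusive)
    dst_start: int   # where src_start lands on the target assembly


def remap(start: int, end: int,
          blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Remap [start, end) if one block fully contains it, else return None."""
    for b in blocks:
        if b.src_start <= start and end <= b.src_end:
            offset = b.dst_start - b.src_start
            return start + offset, end + offset
    return None


# Made-up alignments from a source assembly to two target assemblies.
to_target_a = [Block(0, 1000, 0)]                           # 1000-2000 unaligned
to_target_b = [Block(0, 1000, 0), Block(1000, 2000, 1500)]  # region placed later

variant = (1200, 1300)
print(remap(*variant, to_target_a))
print(remap(*variant, to_target_b))
```

In this made-up case the variant sits in sequence that target A does not align to, so it is lost there, while target B places the same sequence at a shifted position.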

Page 29: Genome representation  and  variant identification
Page 30: Genome representation  and  variant identification

No results in ‘normal’ dbVar search
Genome Sensor predicts this is a location -> points to dbVar Genome Browser

Page 31: Genome representation  and  variant identification
Page 32: Genome representation  and  variant identification

Acknowledgements

dbVar

John Lopez
Tim Hefferon
John Garner
Chao Chen
George Zhou
Victor Ananiev

NCBI

Collaborators

DGVa
DGV

GRC

NCBI

Valerie Schneider
Nathan Bouk
Hsiu-Chuan Chen

Collaborators

TGI-WU
WTSI
EBI

ISCA

NCBI Genomes, Viewers and Variation groups