schneider_agbt2014

Taking Advantage of GRCh38, Valerie Schneider, 12 February 2014

Upload: vaschn

Post on 11-May-2015


Category:

Technology



DESCRIPTION

Presentation on the GRCh38 human reference genome assembly from the 2014 AGBT meeting.

TRANSCRIPT

Page 1: Schneider_AGBT2014

Taking Advantage of GRCh38

Valerie Schneider

12 February 2014

Page 2: Schneider_AGBT2014

Introducing GRCh38

GRCh38: Dec. 24, 2013

Page 3: Schneider_AGBT2014

Time for change

GRCh37.p13

• 178 Regions: 3.15% of chromosome sequence

• 131 FIX patches: add 6.8 Mb novel sequence

• 73 NOVEL patches: add >800 kb novel sequence

GRCh38

• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)

• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
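As a rough consistency check on the GRCh38 figures above, the two numbers quoted for the regions with alt loci (61.9 Mb and 2% of chromosome sequence) imply a total chromosome sequence length. This is a minimal sketch; the variable names are illustrative, and because the 2% on the slide is presumably rounded, the result is only approximate.

```python
# Figures quoted on the slide for the GRCh38 regions with alt loci.
region_mb = 61.9         # Mb covered by the 178 regions
region_fraction = 0.02   # "2% of chromosome sequence" (assumed to be rounded)

# Total chromosome sequence implied by those two figures.
implied_total_mb = region_mb / region_fraction
print(f"Implied chromosome sequence: ~{implied_total_mb:.0f} Mb")
```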

Page 4: Schneider_AGBT2014

GRCh38: Assembly Stats

http://genomereference.org

Page 5: Schneider_AGBT2014

GRCh38: Annotation Stats

Page 6: Schneider_AGBT2014

GRCh38 Sequence Updates

• SNV, MAF = 0: n=15,244

• Insertions, MAF = 0: n=834

• Deletions, MAF = 0: n=1541

• Mismatch in pseudo/pr txpt, MAF < 5%: n=1413

• Annotator and clinical requests: n= ~260
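The category counts on this slide can be tallied to see how the sequence updates break down. This is only a sketch: it assumes the five categories do not overlap, which the slide does not state, and it treats the "~260" requests as exactly 260.

```python
# Counts as quoted on the slide; labels are paraphrased from the chart.
counts = {
    "SNV, MAF = 0": 15244,
    "Insertions, MAF = 0": 834,
    "Deletions, MAF = 0": 1541,
    "Mismatch in pseudo/pr txpt, MAF < 5%": 1413,
    "Annotator and clinical requests": 260,  # slide says "~260"
}

# Assumes categories are disjoint, so a simple sum is meaningful.
total = sum(counts.values())
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / total:.1f}% of tallied updates)")
print("Total tallied updates:", total)
```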

Page 7: Schneider_AGBT2014

GRCh38 Sequence Updates

Pile-Up Analysis: “Never Seen” Mismatched Bases Originating from RP11 Components

79% of these bases are heterozygous in RP11 WGS

n=10489
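For scale, the percentage above can be turned into an approximate base count. The sketch assumes that n=10489 is the number of "never seen" mismatched bases from RP11 components and that the 79% applies to that total; the slide does not spell out either point.

```python
never_seen_bases = 10489   # n from the slide (assumed to be the denominator)
het_fraction = 0.79        # "79% of these bases are heterozygous in RP11 WGS"

# Approximate split into heterozygous and remaining bases.
approx_het = round(never_seen_bases * het_fraction)
approx_other = never_seen_bases - approx_het
print(f"~{approx_het} heterozygous, ~{approx_other} other")
```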

Page 8: Schneider_AGBT2014

GRCh38: Sequence Updates

Coding Consequences

Page 9: Schneider_AGBT2014

GRCh38 Model Centromeres

Until now, centromeres have been defined as multi-megabase gaps in the assembly

Page 10: Schneider_AGBT2014
Page 11: Schneider_AGBT2014

GRCh38 Model Centromeres

Karen Miga (Kent Lab, UCSC)

Page 12: Schneider_AGBT2014

GRCh38 Model Centromeres

http://genomereference.org

Page 13: Schneider_AGBT2014

Figure labels: 1q32, 1q21, 1p21

Dennis et al., 2012

GRCh38 Sequence Addition

Page 14: Schneider_AGBT2014

HYDIN: chr16 (16q22.2)

HYDIN2: chr1 (1q21.1)

• Missing in NCBI35/NCBI36

• Unlocalized in GRCh37

• Placed in GRCh38

Alignment of HYDIN2 Genomic, 300 Kb, 99.4% ID

Alignment of HYDIN CHM1_1.0, >99.9% ID

Doggett et al., 2006

GRCh38 Path Updates

Page 15: Schneider_AGBT2014

GRCh38: Novel Sequence

Page 16: Schneider_AGBT2014

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

GRCh38 Alt Loci

Page 17: Schneider_AGBT2014

GRCh38: Alt Loci

Page 18: Schneider_AGBT2014

GRCh38: Alt Loci

Black: deletion configuration

Kidd et al., PLoS Genet. (2007) PMID: 17447845

Part of chr22 assembly

Alternate locus for chr22

Page 19: Schneider_AGBT2014

GRCh38: Alt Loci

Figure labels: chromosome; alt/patch; reads; on-target alignment; off-target alignments (n=122,922)
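One way to read the on-target/off-target distinction on this slide is as an interval-overlap test: an alignment counts as on-target if it falls on the alt/patch scaffold or on the chromosome region where that scaffold is placed, and off-target otherwise. The sketch below assumes that interpretation. The `Interval` type, the function names, and all contig names and coordinates are hypothetical and do not come from the presentation.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same contig and share at least one base."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def classify_alignment(aln: Interval, targets: list[Interval]) -> str:
    """Label an alignment by whether it overlaps any expected target interval."""
    return "on-target" if any(overlaps(aln, t) for t in targets) else "off-target"


# Hypothetical targets: the alt scaffold itself and its assumed chromosome placement.
targets = [
    Interval("alt_scaffold_1", 0, 50_000),
    Interval("chr22", 1_000_000, 1_050_000),
]

print(classify_alignment(Interval("alt_scaffold_1", 100, 200), targets))
print(classify_alignment(Interval("chr5", 100, 200), targets))
```

A real pipeline would take these intervals from the aligner output and the assembly's alt placement files rather than hard-coding them.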

Page 20: Schneider_AGBT2014

GRCh38: Alt Loci

Page 21: Schneider_AGBT2014

Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly

Mask1: mask chr for fix patches, scaffold for novel/alts. Mask2: mask only on scaffolds
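The two masking schemes described above can be sketched as a small decision rule plus a hard-masking helper. This is an illustrative sketch only: the function names, the scaffold type labels (`"fix"`, `"novel"`, `"alt"`), and the choice of hard-masking with `N` are assumptions, and the slide does not say which coordinates within each sequence are masked.

```python
def mask_target(scaffold_type: str, scheme: str) -> str:
    """Return which sequence is masked for a scaffold under a scheme.

    Mask1: mask the chromosome for fix patches, the scaffold for novel/alts.
    Mask2: mask only on scaffolds.
    """
    if scheme not in ("mask1", "mask2"):
        raise ValueError(f"unknown scheme: {scheme}")
    if scaffold_type not in ("fix", "novel", "alt"):
        raise ValueError(f"unknown scaffold type: {scaffold_type}")
    if scheme == "mask1" and scaffold_type == "fix":
        return "chromosome"
    return "scaffold"


def hard_mask(seq: str, intervals: list[tuple[int, int]]) -> str:
    """Replace 0-based, half-open intervals of seq with 'N' (assumed mask style)."""
    masked = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(seq))):
            masked[i] = "N"
    return "".join(masked)


print(mask_target("fix", "mask1"), mask_target("fix", "mask2"))
print(hard_mask("ACGTACGT", [(2, 5)]))
```

In practice the intervals to mask would come from the alignments between each scaffold and its chromosome region, so that only one copy of the shared sequence remains visible to the aligner.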

GRCh38: Alt Loci

Page 22: Schneider_AGBT2014

http://www.ncbi.nlm.nih.gov/genome/tools/remap

ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38

Page 23: Schneider_AGBT2014

GRCh38 Credits

Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes

GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Matthew Hurles
• Richard Gibbs