schneider_agbt2014
DESCRIPTION
Presentation on the GRCh38 human reference genome assembly from the 2014 AGBT meeting.
TRANSCRIPT
Taking Advantage of GRCh38
Valerie Schneider
12 February 2014
Introducing GRCh38
GRCh38: Dec. 24, 2013
Time for change
GRCh37.p13
• 178 Regions: 3.15% of chromosome sequence
• 131 FIX patches: add 6.8 Mb novel sequence
• 73 NOVEL patches: add >800 kb novel sequence
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 Alt Loci: 3.6 Mb novel sequence relative to chromosomes
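As a rough cross-check of the GRCh38 figures above, the quoted 61.9 Mb and 2% can be combined to estimate the total chromosome sequence they imply. This is only a sketch: the 2% on the slide is rounded, so the result is approximate, and the variable names are illustrative.

```python
# Figures quoted on the slide for GRCh38.
alt_region_mb = 61.9        # Mb of chromosome sequence in regions with alt loci
alt_region_fraction = 0.02  # "2% of chromosome sequence" (rounded on the slide)
novel_alt_mb = 3.6          # Mb of novel sequence in the 261 alt loci

# Total chromosome sequence implied by the two figures above (approximate).
implied_total_mb = alt_region_mb / alt_region_fraction

# Novel alt-locus sequence expressed as a fraction of that implied total.
novel_fraction = novel_alt_mb / implied_total_mb

print(f"Implied chromosome sequence: ~{implied_total_mb:.0f} Mb")
print(f"Novel alt-locus sequence: ~{novel_fraction:.2%} of that total")
```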
GRCh38: Annotation Stats
MAF=0 Insertions: n=834
MAF=0 Deletions: n=1541
MAF<5% Mismatch in pseudo/pr txpt: n=1413
Annotator and clinical requests: n= ~260
GRCh38 Sequence Updates
SNV MAF = 0
n=15,244
GRCh38 Sequence Updates
Pile-Up Analysis: “Never Seen” Mismatched Bases Originating from RP11 Components
79% of these bases are heterozygous in RP11 WGS
n=10489
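The idea behind a "never seen" base can be sketched as follows: a reference position is flagged when reads cover it but none of them carries the reference base. This is a minimal illustration, not the analysis used for the slide. The function name, the toy input format (one string of observed read bases per position), and the choice to skip uncovered positions are all assumptions.

```python
from collections import Counter

def never_seen_positions(reference, pileup):
    """Return 0-based positions whose reference base is absent from the pile-up.

    reference: string of reference bases.
    pileup: list of strings, one per position, with the read bases observed there.
    """
    flagged = []
    for pos, (ref_base, observed) in enumerate(zip(reference, pileup)):
        counts = Counter(observed.upper())
        # Positions without coverage are skipped: no reads means no evidence.
        if counts and counts[ref_base.upper()] == 0:
            flagged.append(pos)
    return flagged

# Toy example: position 1 has reference base C, but every read shows T.
reference = "ACGT"
pileup = ["AAAA", "TTTT", "GGGA", ""]
print(never_seen_positions(reference, pileup))
```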
GRCh38: Sequence Updates
Coding Consequences
GRCh38 Model Centromeres
Until now, centromeres have been defined as multi-megabase gaps in the assembly
GRCh38 Model Centromeres
Karen Miga (Kent Lab, UCSC)
1q32 1q21 1p21
Dennis et al., 2012
GRCh38 Sequence Addition
HYDIN: chr16 (16q22.2)
HYDIN2: chr1 (1q21.1)
Missing in NCBI35/NCBI36; unlocalized in GRCh37; placed in GRCh38
Alignment of HYDIN2 Genomic, 300 Kb, 99.4% ID
Alignment of HYDIN CHM1_1.0, >99.9% ID
Doggett et al., 2006
GRCh38 Path Updates
GRCh38: Novel Sequence
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
GRCh38 Alt Loci
Black: deletion configuration
Kidd et al., PLoS Genet. (2007) PMID: 17447845
Part of chr22 assembly
Alternate locus for chr22
Figure labels: chromosome; alt/patch; reads
On-target alignment
Off-target alignments (n=122,922)
GRCh38: Alt Loci
Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly
Mask1: mask chr for fix patches, scaffold for novel/alts
Mask2: mask only on scaffolds
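The masking step itself can be sketched as replacing chosen intervals of a sequence with N so that reads can no longer align there. This is a minimal illustration only: the function name, the 0-based half-open interval convention, and the toy sequence are assumptions. Which intervals are masked, and on which sequence (chromosome or scaffold), is what distinguishes Mask1 from Mask2.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of sequence with each 0-based, half-open [start, end) masked."""
    bases = list(sequence)
    for start, end in intervals:
        if not 0 <= start <= end <= len(bases):
            raise ValueError(f"interval ({start}, {end}) is outside the sequence")
        # Overwrite the interval with the mask character; the length is unchanged.
        bases[start:end] = mask_char * (end - start)
    return "".join(bases)

# Toy example: mask the chromosome interval that a hypothetical fix patch replaces.
chromosome = "ACGTACGTAC"
print(mask_intervals(chromosome, [(2, 6)]))
```

Keeping the sequence length unchanged means coordinates on the masked sequence still match the unmasked assembly.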
GRCh38: Alt Loci
http://www.ncbi.nlm.nih.gov/genome/tools/remap
ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38
GRCh38 Credits
Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Matthew Hurles
• Richard Gibbs