© 2014 Personalis, Inc. All rights reserved.
Pioneering Genome-Guided Medicine
Deanna M. Church
Senior Director of Genomics and Content
Transitioning to GRCh38
Personalis, Inc. 2
Who we are
Inherited Disease Diagnostics
Cancer Services
ACE Platform
Research Services
Reference assembly influence
[Figure: Sample carries Gene1 and Gene2; Ref Assembly carries only Gene1]
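The slide's Sample vs. Ref Assembly comparison can be illustrated with a toy model, under stated assumptions: if the sample carries a paralog (Gene2) that the reference assembly lacks, reads from Gene2 align to Gene1, and the sequence differences between the two paralogs show up as apparent variants. The sequences below are hypothetical, and the "aligner" is a naive ungapped best-mismatch search, not how production aligners work.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_placement(read: str, reference: str) -> tuple[int, int]:
    """Naive ungapped alignment: (offset, mismatches) of the first best-scoring placement."""
    best = (-1, len(read) + 1)
    for offset in range(len(reference) - len(read) + 1):
        d = hamming(read, reference[offset:offset + len(read)])
        if d < best[1]:
            best = (offset, d)
    return best


def apparent_variants(read: str, reference: str) -> list[tuple[int, str, str]]:
    """(reference position, reference base, read base) for each mismatch at the best placement."""
    offset, _ = best_placement(read, reference)
    window = reference[offset:offset + len(read)]
    return [(offset + i, r, q) for i, (r, q) in enumerate(zip(window, read)) if r != q]


# Hypothetical sequences: Gene2 is a paralog of Gene1 that differs at two positions.
GENE1 = "ACGTTGCAAGCTTACG"
GENE2 = "ACGTAGCAAGCTTTCG"
REFERENCE = "TTTT" + GENE1 + "GGGG"  # the reference assembly contains only Gene1

read_from_gene2 = GENE2[2:14]
print(apparent_variants(read_from_gene2, REFERENCE))  # [(8, 'T', 'A'), (17, 'A', 'T')]
```

In this toy, a read taken from Gene1 itself (for example `GENE1[2:14]`) lands at the same offset with no mismatches, so the two "variants" reported for the Gene2 read come only from the paralog missing from the reference.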
Excitement about GRCh38
[Figure panels: GGAACGCAGGGAACACAG; DPYD, R->C; Alt loci; Model Centromere Sequences (Miga et al., 2014)]
CCL3 region: GRCh37
NC_000017.10 (chr17): 34,442,621-35,005,379
CCL5-TBC1D3 region: GRCh38
NC_000017.11 (chr17): 36,032,574-36,269,924
NT_187661.1
100 kb deletion on chromosome
Steinberg et al., 2014 http://dx.doi.org/10.1101/006841
Alternate Loci and Genes
Unique sequence in alternate loci
3.6 Mb of novel sequence
153 genes not on the primary assembly
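A count like "genes not on the primary assembly" amounts to a set difference between genes annotated on alternate-locus sequences and genes annotated on primary-assembly sequences. A minimal sketch, assuming annotations are available as (sequence ID, gene ID) pairs and that every sequence outside the primary set is an alternate locus (real assemblies also contain unlocalized and unplaced scaffolds, which this ignores). `GENE_X` is a hypothetical placeholder.

```python
from typing import Iterable


def genes_only_on_alts(annotations: Iterable[tuple[str, str]], primary_seqids: set[str]) -> list[str]:
    """Gene IDs annotated on at least one non-primary sequence and on no primary sequence."""
    on_primary: set[str] = set()
    on_alt: set[str] = set()
    for seqid, gene in annotations:
        # Assumption: anything not listed as primary is treated as an alt locus.
        (on_primary if seqid in primary_seqids else on_alt).add(gene)
    return sorted(on_alt - on_primary)


PRIMARY = {"NC_000006.12"}  # GRCh38 chromosome 6 (RefSeq accession)
ANNOTATIONS = [
    ("NC_000006.12", "MICB"),   # MICB is on the primary assembly...
    ("NT_167244.2", "MICB"),    # ...and also on an alt locus
    ("NT_167244.2", "GENE_X"),  # hypothetical gene present only on the alt locus
]
print(genes_only_on_alts(ANNOTATIONS, PRIMARY))  # ['GENE_X']
```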
Alt Loci and Genes
25% Medically Interpretable Genes (MIG)
[Figure: Primary Assembly vs. Alt Locus; values shown: 6.4%, 6.2%, 0.18%]
Alt Loci and Genes
NT_167246.2: MHC alternate locus
No SNP annotation
Sparse SNP annotation
Analysis challenges
[Figure: Primary Assembly and Alt Locus; paralogous duplication vs. allelic duplication; MapQ]
https://github.com/GenomeRef/SoftwareDevTracking
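Why alternate loci can depress MapQ, in a toy model: MAPQ-style scores reflect how much better the best hit is than the runner-up, so a read matching the primary assembly and an alt locus equally well looks ambiguous, even though the two hits are allelic copies of one locus. The sketch below, under simplifying assumptions, collapses hits that fall on an alt locus or inside the primary region it replaces before scoring. The scoring rule (`toy_mapq`), the `ALT_REGIONS` table, and its coordinates are hypothetical; aligners such as BWA-MEM have their own ALT-aware logic, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    seqid: str
    pos: int
    score: int  # alignment score; higher is better


def toy_mapq(scores) -> int:
    """Hypothetical MAPQ-like value: scaled gap between best and second-best score, capped at 60."""
    ranked = sorted(scores, reverse=True)
    if len(ranked) < 2:
        return 60
    return min(60, 6 * (ranked[0] - ranked[1]))


def locus_key(hit: Hit, alt_regions: dict) -> tuple:
    """Map a hit to a locus key; an alt locus and the primary region it replaces share one key."""
    if hit.seqid in alt_regions:
        return alt_regions[hit.seqid]
    for chrom, start, end in alt_regions.values():
        if hit.seqid == chrom and start <= hit.pos <= end:
            return (chrom, start, end)
    return (hit.seqid, hit.pos, hit.pos)


def alt_aware_mapq(hits, alt_regions: dict) -> int:
    """Keep the best score per locus, then score the loci instead of the raw hits."""
    best = {}
    for h in hits:
        key = locus_key(h, alt_regions)
        best[key] = max(best.get(key, h.score), h.score)
    return toy_mapq(best.values())


# Hypothetical coordinates for the primary span replaced by the MHC alt locus NT_167246.2.
ALT_REGIONS = {"NT_167246.2": ("NC_000006.12", 28_000_000, 34_000_000)}

allelic = [Hit("NC_000006.12", 31_494_900, 150), Hit("NT_167246.2", 2_810_700, 150)]
paralogous = [Hit("NC_000006.12", 31_494_900, 150), Hit("NC_000001.11", 5_000, 150)]

print(toy_mapq(h.score for h in allelic))      # 0: looks ambiguous
print(alt_aware_mapq(allelic, ALT_REGIONS))     # 60: one locus, two allelic copies
print(alt_aware_mapq(paralogous, ALT_REGIONS))  # 0: genuinely different loci
```

Collapsing every primary hit inside the replaced span is a simplification: a true paralog lying within that span would also be merged, which is exactly the allelic-versus-paralogous distinction a real aligner has to make more carefully.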
Analysis challenges: variant representation
[Figure: Primary Assembly and Alt Locus; variant G>C]
1/1: only valid if homozygous for the Alt
1/.: correct if heterozygous for the Alt
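One way to read the genotype rule for a variant on an alt locus, as a sketch under assumptions: at a site on an alt-locus sequence, only the sample haplotypes that actually follow the alt locus have an allele; haplotypes that follow the primary assembly are represented elsewhere, so their slot in a diploid VCF genotype is the missing value `.`. The function name and input encoding are hypothetical.

```python
def alt_contig_genotype(alt_haplotypes: list[bool]) -> str:
    """Diploid GT string for a site on an alt locus.

    alt_haplotypes has one entry per sample haplotype that follows the alt locus
    (0, 1 or 2 entries); True means that haplotype carries the variant allele.
    Haplotypes that follow the primary assembly are written as '.'.
    """
    if len(alt_haplotypes) > 2:
        raise ValueError("a diploid sample has at most two haplotypes")
    alleles = ["1" if has_variant else "0" for has_variant in alt_haplotypes]
    alleles += ["."] * (2 - len(alleles))
    return "/".join(alleles)


print(alt_contig_genotype([True, True]))  # 1/1: homozygous for the alt locus
print(alt_contig_genotype([True]))        # 1/.: heterozygous for the alt locus
```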
Waiting for graph representations?
Credit: UC Santa Cruz Genomics Institute
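Graph representations approach the problem by storing shared sequence once and letting the primary assembly and each alternate locus be paths through the same graph. A minimal sketch of that idea, with hypothetical node sequences and path names; it is not the data model of any particular graph-genome tool.

```python
NODES = {1: "TTAG", 2: "G", 3: "C", 4: "ATCC"}  # hypothetical node sequences
EDGES = {(1, 2), (1, 3), (2, 4), (3, 4)}       # a single bubble: node 2 vs node 3
PATHS = {"primary": [1, 2, 4], "alt": [1, 3, 4]}


def spell(path: list[int]) -> str:
    """Concatenate node sequences along a path, checking that every step is an edge."""
    for a, b in zip(path, path[1:]):
        if (a, b) not in EDGES:
            raise ValueError(f"no edge {a}->{b}")
    return "".join(NODES[n] for n in path)


for name, path in PATHS.items():
    print(name, spell(path))
# primary TTAGGATCC
# alt TTAGCATCC
```

In this toy, the flanking nodes are shared, so a read from either haplotype has a single place to align, and the difference between haplotypes is simply which branch of the bubble the read supports.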
Analysis challenges
chr19 vs 19
GenBank: CM000681.2
RefSeq: NC_000019.10
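The same chromosome can appear under several names (UCSC-style "chr19", bare "19", GenBank and RefSeq accessions), so tools that mix data sources need a lookup to one canonical name. A minimal sketch, assuming the RefSeq accession is chosen as canonical; the table holds only the chromosome 19 names, and `build_lookup` and `normalize` are hypothetical helpers.

```python
ALIASES = {
    # canonical name -> every name that refers to the same GRCh38 sequence
    "NC_000019.10": ["chr19", "19", "CM000681.2", "NC_000019.10"],
}


def build_lookup(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Invert the alias table, refusing any name claimed by two sequences."""
    lookup: dict[str, str] = {}
    for canonical, names in aliases.items():
        for name in names:
            if lookup.get(name, canonical) != canonical:
                raise ValueError(f"{name!r} maps to both {lookup[name]!r} and {canonical!r}")
            lookup[name] = canonical
    return lookup


def normalize(name: str, lookup: dict[str, str]) -> str:
    """Return the canonical name, failing loudly on unknown names instead of guessing."""
    if name not in lookup:
        raise KeyError(f"unknown sequence name: {name!r}")
    return lookup[name]


LOOKUP = build_lookup(ALIASES)
print(normalize("chr19", LOOKUP), normalize("19", LOOKUP))  # NC_000019.10 NC_000019.10
```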
Analysis challenges
chr19_KI270938v1_alt
CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1
GenBank: KI270886.1
RefSeq: NT_187640.1
Analysis challenges: MICB
Reporting formats (GFF, VCF, etc.) don’t manage multiple locations easily
NW_003871068.1
NC_000006.12 BestRefSeq gene 31494881 31511124 . + . ID=gene13336;Name=MICB;Dbxref=GeneID:4277
NT_167244.2 BestRefSeq gene 2827449 2843674 . + . ID=gene42005;Name=MICB;Dbxref=GeneID:4277
NT_113891.3 BestRefSeq gene 2972222 2988464 . + . ID=gene43669;Name=MICB;Dbxref=GeneID:4277
NT_167245.2 BestRefSeq gene 2742492 2758910 . + . ID=gene44377;Name=MICB;Dbxref=GeneID:4277
NT_167246.2 BestRefSeq gene 2810648 2816200 . + . ID=gene44827;Name=MICB;Dbxref=GeneID:4277
NT_167247.2 BestRefSeq gene 2836836 2853071 . + . ID=gene45127;Name=MICB;Dbxref=GeneID:4277
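The MICB records share Dbxref=GeneID:4277, so recognizing that they describe one gene placed on six different sequences means grouping records by that identifier. A minimal sketch, assuming tab-separated GFF3 gene records like the lines above; it handles only the `gene` feature type and is not a full GFF3 parser.

```python
from collections import defaultdict

GFF_GENE_LINES = [
    "NC_000006.12\tBestRefSeq\tgene\t31494881\t31511124\t.\t+\t.\tID=gene13336;Name=MICB;Dbxref=GeneID:4277",
    "NT_167244.2\tBestRefSeq\tgene\t2827449\t2843674\t.\t+\t.\tID=gene42005;Name=MICB;Dbxref=GeneID:4277",
    "NT_113891.3\tBestRefSeq\tgene\t2972222\t2988464\t.\t+\t.\tID=gene43669;Name=MICB;Dbxref=GeneID:4277",
    "NT_167245.2\tBestRefSeq\tgene\t2742492\t2758910\t.\t+\t.\tID=gene44377;Name=MICB;Dbxref=GeneID:4277",
    "NT_167246.2\tBestRefSeq\tgene\t2810648\t2816200\t.\t+\t.\tID=gene44827;Name=MICB;Dbxref=GeneID:4277",
    "NT_167247.2\tBestRefSeq\tgene\t2836836\t2853071\t.\t+\t.\tID=gene45127;Name=MICB;Dbxref=GeneID:4277",
]


def parse_attributes(field: str) -> dict[str, str]:
    """Split a GFF3 column-9 string ('key=value;key=value') into a dict."""
    return dict(item.split("=", 1) for item in field.split(";") if item)


def gene_locations(lines) -> dict[str, list[tuple[str, int, int, str]]]:
    """Group gene records by NCBI GeneID: GeneID -> [(seqid, start, end, strand), ...]."""
    locations = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        # Dbxref holds comma-separated DB:ID pairs; keep only the NCBI GeneID.
        xrefs = dict(x.split(":", 1) for x in attrs.get("Dbxref", "").split(",") if x)
        gene_id = xrefs.get("GeneID")
        if gene_id is not None:
            locations[gene_id].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return dict(locations)


for seqid, start, end, strand in gene_locations(GFF_GENE_LINES)["4277"]:
    print(seqid, start, end, strand)
```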
Analysis challenges
• Need aligners that can distinguish allelic and paralogous duplication
• Need variant callers/modules that can correctly assign genotypes in complex regions
• Need to extend file formats to accommodate the new assembly model