church dm grc_workshop

Upload: genome-reference-consortium

Post on 02-Jul-2015

DESCRIPTION

Using the genome by Deanna Church

TRANSCRIPT

Page 1: Church dm grc_workshop

© 2014 Personalis, Inc. All rights reserved.

Pioneering Genome-Guided Medicine

A view from the trenches

Deanna M. Church

Senior Director of Genomics and Content

Real world challenges to using GRCh38

Page 2: Church dm grc_workshop

Personalis, Inc. | Confidential and Proprietary

Acknowledgements

Personalis

Jason Harris

Sarah Garcia

Jeanie Tirch

Gabor Bartha

Mark Pratt

Scott Kirk

Michael Clark

Rich Chen

John West

Genome Reference Consortium

NCBI

Valerie Schneider

Nathan Bouk

Terence Murphy

Alex Astashyn

Donna Maglott

Melissa Landrum

Wendy Rubinstein

Jennifer Lee

Page 3: Church dm grc_workshop

Who we are

Inherited Disease Diagnostics

Cancer Services

ACE Platform

Research Services

Page 4: Church dm grc_workshop

Accuracy is key to what we do

Case courtesy of Geisinger Health System

• Both affected children

– Macrocephaly

– Low muscle tone, hypotonia

– Delay in early milestones

– Dysphagia

– Esotropia

• Affected Male (3 yr)

– Intellectual disability

– Mild hearing loss

– High arched palate

– Small cyst near eye

• Affected Female (15 mo)

– Sleep apnea

– Failure to thrive

– Laryngomalacia

– Anisocoria

– Small optic nerves

Novel 2 bp deletion in GATAD2B

Called by GATK as paternally inherited

Page 5: Church dm grc_workshop

Accuracy is key to what we do

Sample    GATK-determined Genotype   Ref   Alt   Depth   Allele Freq.
Father    0/1                        108   11    121     0.09
Mother    0/0                        111   0     112     0.00
Brother   0/1                        63    44    109     0.40
Sister    0/1                        64    52    119     0.44

Case courtesy of Geisinger Health System
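
The allele-frequency column in this table is Alt divided by Depth. A minimal sketch of that arithmetic follows, together with a binomial check of how plausible each heterozygous call is under the roughly 50% alt fraction expected for a germline heterozygote. The function names and the 0.001 cutoff are assumptions made for illustration; this is not the Personalis or GATK method.

```python
from math import comb

# Rows from the slide: sample -> (GATK genotype, ref reads, alt reads, depth)
CALLS = {
    "Father":  ("0/1", 108, 11, 121),
    "Mother":  ("0/0", 111, 0, 112),
    "Brother": ("0/1", 63, 44, 109),
    "Sister":  ("0/1", 64, 52, 119),
}

def allele_fraction(alt, depth):
    """Alt-supporting reads as a fraction of total depth."""
    return alt / depth

def het_lower_tail(alt, depth, p=0.5):
    """P(X <= alt) for X ~ Binomial(depth, p): how surprising so few alt reads are."""
    return sum(comb(depth, k) * p**k * (1 - p)**(depth - k) for k in range(alt + 1))

def suspicious_hets(calls, cutoff=1e-3):  # cutoff is an illustrative assumption
    """Het calls whose alt read count is far below the germline expectation."""
    return [sample for sample, (gt, _ref, alt, depth) in calls.items()
            if gt == "0/1" and het_lower_tail(alt, depth) < cutoff]

for sample, (gt, _ref, alt, depth) in CALLS.items():
    print(f"{sample:8s} {gt}  AF={allele_fraction(alt, depth):.2f}")
print(suspicious_hets(CALLS))
```

With the slide's numbers, only the father's call is flagged: 11 alt reads out of 121 is far too few for a typical germline heterozygote, a detail that the bare 0/1 genotype hides.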

Page 6: Church dm grc_workshop

Excitement about GRCh38

GGAACGCAGGGAACACAG

DPYD

R->C

Alt loci

Model Centromere Sequences

Miga et al., 2014

Page 7: Church dm grc_workshop

Medical content not on chromosome sequences

Page 8: Church dm grc_workshop

Medical content not on chromosome sequences

NT_113939: chr19 unlocalized contig

GRCh37

GRCh38

Page 9: Church dm grc_workshop

Medical content not on chromosome sequences

NT_167246.2: MHC alternate locus

No SNP annotation

Sparse SNP annotation

Page 10: Church dm grc_workshop

By any other name

chr19 vs 19

GenBank: CM000681.2

RefSeq: NC_000019.10

Page 11: Church dm grc_workshop

By any other name

chr19_KI270938v1_alt

CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1

GenBank: KI270886.1

RefSeq: NT_187640.1
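
The UCSC-style names on these two slides follow a regular pattern: primary chromosomes add a "chr" prefix to the bare chromosome number, and scaffold names embed the GenBank accession with the version dot replaced by "v". A minimal sketch of that convention is below. The helper names are hypothetical, names without a suffix (such as chrUn scaffolds) are not handled, and mapping a GenBank accession to its RefSeq counterpart still needs a lookup table such as an assembly report.

```python
import re

def ucsc_primary_name(chrom):
    """'19' -> 'chr19'; the mitochondrion is 'MT' in bare style and 'chrM' in UCSC style."""
    return "chrM" if chrom == "MT" else f"chr{chrom}"

def ucsc_scaffold_name(chrom, genbank_acc, kind):
    """Build a UCSC-style scaffold name from e.g. ('19', 'KI270938.1', 'alt')."""
    acc, version = genbank_acc.split(".")
    return f"chr{chrom}_{acc}v{version}_{kind}"

_SCAFFOLD = re.compile(
    r"^chr(?P<chrom>[^_]+)_(?P<acc>[A-Z]+\d+)v(?P<ver>\d+)_(?P<kind>alt|random|fix)$"
)

def parse_ucsc_scaffold(name):
    """Recover (chromosome, GenBank accession.version, kind) from a UCSC-style name."""
    m = _SCAFFOLD.match(name)
    if m is None:
        raise ValueError(f"not a UCSC-style scaffold name: {name}")
    return m["chrom"], f"{m['acc']}.{m['ver']}", m["kind"]

print(ucsc_primary_name("19"))
print(parse_ucsc_scaffold("chr19_KI270938v1_alt"))
```

Normalizing every input file to one naming scheme before comparison avoids silent mismatches between files that describe the same sequence.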

Page 12: Church dm grc_workshop

Unflattening the data

MICB

Reporting formats (GFF, VCF, etc.) don’t manage multiple locations easily

Page 13: Church dm grc_workshop

NW_003871068.1

NC_000006.12 BestRefSeq gene 31494881 31511124 . + . ID=gene13336;Name=MICB;Dbxref=GeneID:4277

NT_167244.2 BestRefSeq gene 2827449 2843674 . + . ID=gene42005;Name=MICB;Dbxref=GeneID:4277

NT_113891.3 BestRefSeq gene 2972222 2988464 . + . ID=gene43669;Name=MICB;Dbxref=GeneID:4277

NT_167245.2 BestRefSeq gene 2742492 2758910 . + . ID=gene44377;Name=MICB;Dbxref=GeneID:4277

NT_167246.2 BestRefSeq gene 2810648 2816200 . + . ID=gene44827;Name=MICB;Dbxref=GeneID:4277

NT_167247.2 BestRefSeq gene 2836836 2853071 . + . ID=gene45127;Name=MICB;Dbxref=GeneID:4277

Building SnpEff
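
The six MICB rows above share one GeneID but carry six different feature IDs on six different sequences. A minimal sketch of "unflattening" such a file, grouping gene features by their Dbxref so that one gene maps to all of its placements, might look like this. The function names are hypothetical; the parser assumes a single Dbxref value per row, as in the rows shown, and splits on whitespace because the slide text lost the tabs that real GFF3 uses.

```python
from collections import defaultdict

# Gene rows copied from the slide (whitespace-separated here; real GFF3 is tab-separated)
GFF = """\
NC_000006.12 BestRefSeq gene 31494881 31511124 . + . ID=gene13336;Name=MICB;Dbxref=GeneID:4277
NT_167244.2 BestRefSeq gene 2827449 2843674 . + . ID=gene42005;Name=MICB;Dbxref=GeneID:4277
NT_113891.3 BestRefSeq gene 2972222 2988464 . + . ID=gene43669;Name=MICB;Dbxref=GeneID:4277
NT_167245.2 BestRefSeq gene 2742492 2758910 . + . ID=gene44377;Name=MICB;Dbxref=GeneID:4277
NT_167246.2 BestRefSeq gene 2810648 2816200 . + . ID=gene44827;Name=MICB;Dbxref=GeneID:4277
NT_167247.2 BestRefSeq gene 2836836 2853071 . + . ID=gene45127;Name=MICB;Dbxref=GeneID:4277
"""

def parse_attributes(field):
    """Turn 'ID=x;Name=y' into {'ID': 'x', 'Name': 'y'}."""
    return dict(pair.split("=", 1) for pair in field.split(";") if pair)

def locations_by_gene(gff_text):
    """Map each Dbxref (e.g. 'GeneID:4277') to every placement of that gene."""
    genes = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(None, 8)          # nine GFF columns
        seqid, _source, ftype, start, end = cols[:5]
        if ftype != "gene":
            continue
        attrs = parse_attributes(cols[8])
        genes[attrs["Dbxref"]].append((seqid, int(start), int(end), attrs["ID"]))
    return dict(genes)

for seqid, start, end, feature_id in locations_by_gene(GFF)["GeneID:4277"]:
    print(seqid, start, end, feature_id)
```

Keying on the stable gene identifier rather than the per-row ID is the design choice that matters here: the row IDs are unique per placement, so a tool that keys on them sees six unrelated genes.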

Page 14: Church dm grc_workshop

Incremental steps: using fix patches

SHANK2

Page 15: Church dm grc_workshop

Using Fix patches to improve alignments

Incremental steps: using fix patches

Page 16: Church dm grc_workshop

Migrating to GRCh38: using Fix patches

hs37d5

Fix patch

Page 17: Church dm grc_workshop

Migrating to GRCh38: using Fix patches

hs37d5

Fix patch

Page 18: Church dm grc_workshop

Migrating to GRCh38: using Fix patches

GRCh37 vs. Fix Patch

GRCh38

Page 19: Church dm grc_workshop

GRCh37.p13 Improved alignments outside of fix patch regions

Regions outside of fix patches

Jason Harris

Panels: hs37d5 and GRCh37.p13

378 ten-kb windows that don’t overlap fix patches with >10 SNV call differences
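
The count on this slide can be reproduced in outline: bin the positions where two call sets disagree into 10 kb windows, drop windows that overlap a fix patch, and keep those with more than 10 differences. The sketch below assumes calls are available as sets of (chrom, pos, ref, alt) tuples and patch regions as 1-based inclusive intervals. The function name and input shapes are hypothetical, not the analysis code used for the slide.

```python
from collections import Counter

WINDOW = 10_000   # 10 kb windows, as on the slide
MIN_DIFFS = 10    # the slide counts windows with >10 SNV call differences

def differing_windows(calls_a, calls_b, patch_regions,
                      window=WINDOW, min_diffs=MIN_DIFFS):
    """Return (chrom, window_index) for windows outside patches with > min_diffs differences.

    calls_a, calls_b: sets of (chrom, pos, ref, alt), 1-based positions.
    patch_regions: iterable of (chrom, start, end), 1-based inclusive.
    """
    diffs = calls_a ^ calls_b   # calls present in exactly one of the two sets
    per_window = Counter((chrom, (pos - 1) // window) for chrom, pos, *_ in diffs)

    def overlaps_patch(chrom, idx):
        w_start, w_end = idx * window + 1, (idx + 1) * window
        return any(c == chrom and s <= w_end and e >= w_start
                   for c, s, e in patch_regions)

    return sorted(w for w, n in per_window.items()
                  if n > min_diffs and not overlaps_patch(*w))

# Toy input, invented for illustration: 11 differing calls in each of two windows
calls_a = {("chr1", 100 + i, "A", "G") for i in range(11)}
calls_a |= {("chr1", 50_001 + i, "C", "T") for i in range(11)}
calls_b = set()
patches = [("chr1", 49_000, 51_000)]
print(differing_windows(calls_a, calls_b, patches))
```

In the toy input the second window is dropped because it overlaps the patch interval, leaving only the first.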

Page 20: Church dm grc_workshop

GRCh37.p13 Improved alignments outside of fix patch regions

Jason Harris

Three panel pairs: hs37d5 and GRCh37.p13

Page 21: Church dm grc_workshop

Using Fix patches

Page 22: Church dm grc_workshop

Aligning GRCh37 and GRCh38

[Diagram: a sequence in assembly 1 aligned to a sequence in assembly 2; labeled regions A, B and B’]

Unique, well-aligned region in both assemblies

First Pass (FP) alignments

Second Pass (SP) alignments

Region labels: SP only; SP + FP; Expansion; Collapse

Page 23: Church dm grc_workshop

Aligning GRCh37 and GRCh38

Page 24: Church dm grc_workshop

Mapping to GRCh38

Page 25: Church dm grc_workshop

Mapping to GRCh38

Dataset        Starting loci   Failure   Unique to Primary   Unique to Alts   Collapse in GRCh37   Collapse in GRCh38
GWAS catalog   7,991           0         7,827               0                14                   0
ClinVar*       88,343          3         86,549              5                278                  4
GO-ESP 6500    1,982,177       180       1,920,864           339              5,792                324
GIAB           2,915,713       274       2,874,786           47               1,662                4

*clinvar_20140902.vcf

NCBI assembly-assembly alignments from Sept 20, 2014, software version 1.7
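
The table is easier to compare across datasets as rates. The short sketch below computes, for each dataset, the percentage of starting loci that map uniquely to the primary assembly and the percentage that fall in one of the four problem columns (failure, unique to alts, or either collapse category). Grouping those four columns together is an assumption made here for illustration; the slide does not define such a combined measure.

```python
# Counts from the table: dataset -> (starting loci, failure, unique to primary,
# unique to alts, collapse in GRCh37, collapse in GRCh38)
REMAP = {
    "GWAS catalog": (7_991, 0, 7_827, 0, 14, 0),
    "ClinVar": (88_343, 3, 86_549, 5, 278, 4),
    "GO-ESP 6500": (1_982_177, 180, 1_920_864, 339, 5_792, 324),
    "GIAB": (2_915_713, 274, 2_874_786, 47, 1_662, 4),
}

def percent_unique_primary(row):
    """Share of starting loci that map uniquely to the primary assembly."""
    start, _fail, primary, *_rest = row
    return 100.0 * primary / start

def percent_problem(row):
    """Share of starting loci in the failure, alt-only, or collapse columns."""
    start, fail, _primary, alts, collapse37, collapse38 = row
    return 100.0 * (fail + alts + collapse37 + collapse38) / start

for name, row in REMAP.items():
    print(f"{name:13s} primary={percent_unique_primary(row):.2f}%  "
          f"problem={percent_problem(row):.3f}%")
```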

Page 26: Church dm grc_workshop

Remap vs. liftOver

liftOver-dbSNP remap

rs141109950

chr7

Page 27: Church dm grc_workshop

Remap vs. liftOver

rs267602252

remap liftOver
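
Comparisons like the two on these slides can be found systematically by running both tools on the same variant list and diffing the results. A minimal sketch follows, assuming each tool's output has already been loaded into a dict from variant ID to (sequence, position). The IDs and coordinates in the example are placeholders invented for illustration, not real remap or liftOver results.

```python
def discordant_sites(remap, liftover):
    """Return IDs whose target location differs between the two maps.

    A value of None means the tool produced no location for that ID.
    """
    out = {}
    for var_id in sorted(set(remap) | set(liftover)):
        a, b = remap.get(var_id), liftover.get(var_id)
        if a != b:
            out[var_id] = (a, b)
    return out

# Placeholder IDs and coordinates, invented purely for illustration
remap = {"rsA": ("chr7", 1000), "rsB": ("chr7", 2000), "rsC": ("chr7_alt1", 50)}
liftover = {"rsA": ("chr7", 1000), "rsB": ("chr7", 2050)}

for var_id, (r, l) in discordant_sites(remap, liftover).items():
    print(var_id, "remap:", r, "liftOver:", l)
```

Sites that come back discordant or missing are the ones worth inspecting by eye, as on the slides.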

Page 28: Church dm grc_workshop

Migrating to GRCh38

First Pass remap

Second Pass remap

Page 29: Church dm grc_workshop

Migrating to GRCh38

New PRODH paralog

Sequence is unlocalized on chr22.

Page 30: Church dm grc_workshop

Using GRCh38 to improve GRCh37 annotation

KCNE1

Alignment to new paralog added in GRCh38

Page 31: Church dm grc_workshop

Getting the most out of the reference

Still challenging because tools and data structures expect a flat assembly

Remap/liftOver not the final answer for moving variation

Even modest changes (via fix patches) are promising