agbt2017 reference workshop: fulton

Laboratory Aspects of Generating High Quality Assemblies, MGI Reference Genomes Workshop, Bob Fulton, February 13th 2017

Uploaded by: genome-reference-consortium

Posted on 20-Mar-2017


TRANSCRIPT

Page 1: AGBT2017 Reference Workshop: Fulton

Laboratory Aspects of Generating High Quality Assemblies

MGI Reference Genomes Workshop

Bob Fulton, February 13th 2017

Page 2

Primary Objectives

• Develop Tools and Techniques to Provide High-Quality, Haplotype-Resolved Genome Assemblies, Sampling and Capturing as Much Human Diversity as Possible

Page 3

Sequencing Strategy for Reference Genomes

• PacBio Large Insert Library Construction
• Linked Reads with 10X Genomics
• Validation Using BioNano Physical Map

Page 4

PacBio

Page 5

PacBio WGS Library Construction

• High Molecular Weight Genomic DNA
  • DNA must be of sufficient quality to allow for 50 kb shearing to produce PacBio Continuous Long Reads (CLR)
• Consistent Shearing at 50 kb
  • Preferred method: Diagenode Megaruptor
    • Fragment size setting: 50 kb
• Working on 3 Methods for Library Construction
  • PacBio SMRTbell: Current Standard PacBio SMRTbell Template Prep Kit 1.0 and SMRTbell Damage Repair Kit
  • Hybrid Library: Swift Accel-NGS XL Library Prep Kit, but Substituting the PacBio Damage Repair Kit
  • Swift Library: Swift Accel-NGS XL Library Prep Kit Including Swift DNA Repair Enzymes
    • New Data Recently Available with New Repair Process

Page 6

HG02818 Library Preparation and Sequencing

• Three library reactions (15 µg each) of HG02818 were processed using the PacBio SMRTbell, Hybrid, and Swift library preps.

• Library recoveries going into BluePippin size selection for the Hybrid and Swift methods were roughly double that of the PacBio SMRTbell prep.

• All libraries were size-selected on the BluePippin (BP) at 20–50 kb.

• The PacBio SMRTbell library generated over 1 Gb of data on the first two SMRT cells. Additional SMRT cells produced less data, as the library appeared to degrade.

Library Method     Library Recovery Pre-BP    ROI (Read of Insert) Read Length (bp)
PacBio SMRTbell    35.8% (5.3 µg)             12,178
Hybrid             68.8% (10.3 µg)            13,511
Swift              70.9% (10.6 µg)            10,232
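The recovery figures can be cross-checked with simple arithmetic. The minimal sketch below assumes that each reaction started from exactly 15 µg and that the percentages are mass recovered before BluePippin size selection. It recomputes the recovered mass and expresses each method's recovery as a multiple of the SMRTbell prep. The names `recovered_mass` and `fold_vs` are hypothetical helpers written for this illustration.

```python
# Minimal sketch: cross-check pre-BluePippin library recovery for HG02818.
# Assumption: every reaction started from exactly 15 ug of genomic DNA.
INPUT_UG = 15.0

# Percent recovery before BluePippin size selection, as reported on the slide.
RECOVERY_PCT = {
    "PacBio SMRTbell": 35.8,
    "Hybrid": 68.8,
    "Swift": 70.9,
}


def recovered_mass(pct: float, input_ug: float = INPUT_UG) -> float:
    """Mass in ug implied by a percent recovery of the input DNA."""
    return input_ug * pct / 100.0


def fold_vs(pct: float, reference_pct: float) -> float:
    """Recovery of one method expressed as a multiple of a reference method."""
    return pct / reference_pct


if __name__ == "__main__":
    ref = RECOVERY_PCT["PacBio SMRTbell"]
    for method, pct in RECOVERY_PCT.items():
        print(f"{method:16s} {recovered_mass(pct):6.2f} ug "
              f"{fold_vs(pct, ref):5.2f}x SMRTbell")
```

The implied masses (about 5.37, 10.32, and 10.6 µg) are close to the slide's values. The Hybrid and Swift ratios come out near 1.92 and 1.98, consistent with "roughly double".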

Page 7

HG02818 Library Preparation and Sequencing

[Figure: Read of Insert Mbases per SMRT cell (y axis, 0 to 1,800) by date of PacBio RSII sequencing run (x axis, 11/6/16 to 12/6/16), plotted for the PacBio SMRTbell, Hybrid, and Swift libraries]

Page 8

Subread Length Comparisons - HG02818

SMRTbell Library

• Mean Subread Length: 11,391 bp

• N50 Subread Length: 17,007 bp

Hybrid Libraries

• Mean Subread Length: 13,406 bp

• N50 Subread Length: 18,649 bp

Page 9

Subread Length Comparisons - HG02818

Swift Library

• Mean Subread Length: 10,163 bp

• N50 Subread Length: 15,220 bp

E. coli, New Swift-Only Kit

• Mean Subread Length: 16,387 bp

• N50 Subread Length: 22,625 bp
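The two statistics reported for each library measure different things. The mean is the average subread length. The N50 is the length L such that subreads of length L or longer contain at least half of all sequenced bases. The sketch below computes both from a plain list of subread lengths. It assumes the lengths have already been extracted from the PacBio output, which is outside the scope of this illustration.

```python
from collections.abc import Sequence
from statistics import mean


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # cumulative bases in reads this long or longer
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def summarize(lengths: Sequence[int]) -> dict[str, float]:
    """Mean and N50 of a collection of subread lengths."""
    return {"mean": mean(lengths), "n50": n50(lengths)}


if __name__ == "__main__":
    print(summarize([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

An N50 well above the mean, as reported for every library above, means that a large share of the bases sits in the longer subreads.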

Page 10

Agilent TapeStation Assessment of Library Size

PacBio SMRTbell, No BluePippin Size Selection

Page 11

Agilent TapeStation Assessment of Library Size

PacBio SMRTbell, 6–50 kb BluePippin Size Selection

Page 12

Agilent TapeStation Assessment of Library Size

Hybrid Prep, Pre-BluePippin Size Selection

Page 13

Agilent TapeStation Assessment of Library Size

PacBio SMRTbell, 8–50 kb BluePippin Size Selection

Page 14

Agilent TapeStation Assessment of Library Size

Hybrid Prep, 18–50 kb BluePippin Size Selection

Page 15

10X Genomics

Page 16

10X Genomics

• Chromium Instrument
• Long-Range Linking Information on a Genome-Wide Scale
• Phasing Information Across a Genome
• Enhanced Variant Calling and Structural Variation Detection
• De Novo Assembly of Diploid Genomes
• Both WGS and Targeted Approaches

Page 17

10X Genomics Overview

(Church 10X Genomics)

Page 18

10X Genomics Phasing – Important for Heterozygous Allele vs. Repeat Copy Resolution

(Church 10X Genomics)

Page 19

(Church 10X Genomics)

Page 20

BioNano

Page 21

Page 22

BioNano Stats from Human Cell Lines

Sample     Genome Coverage   Mol N50 (kb)   # of Map Contigs   Contig N50 (Mb)   Total Map Size (Gb)
NA19240    96X               174.9          3148               1.26              2.85
NA19238    93X               216.9          2798               1.47              2.93
NA19239    118X              201.0          2565               1.68              2.96
HG00733    157X              202.9          2484               1.69              2.92
HG00514    161X              211.7          3025               1.35              2.83
NA12878    134X              202.7          2739               1.46              2.84
HG01352    117X              184.5          3666               1.01              2.80

Page 23

Large Inversion in HG00514

Page 24

Printrepeats Output Showing a ~25 kb Inverted Repeat
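The slide does not say how Printrepeats detects repeats. The sketch below is therefore a generic, swapped-in illustration rather than that tool's method. Candidate arms of an inverted repeat are seeded by finding k-mers whose reverse complement occurs further along the same strand. Real ~25 kb arms would also need seed extension and mismatch tolerance. This sketch reports only exact k-mer seeds, on a hypothetical toy sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_kmer_hits(seq: str, k: int) -> list[tuple[int, int]]:
    """Pairs (i, j), i < j, where the k-mer at j is the reverse complement
    of the k-mer at i: candidate seeds for the two arms of an inverted repeat."""
    positions: dict[str, list[int]] = {}
    for j in range(len(seq) - k + 1):
        positions.setdefault(seq[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq) - k + 1):
        for j in positions.get(revcomp(seq[i:i + k]), []):
            if j > i:  # skip self-hits of palindromic k-mers and duplicate pairs
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    arm = "ACCGTTAG"
    # Hypothetical sequence: an 8 bp arm, a 4 bp spacer, then the inverted arm.
    seq = "GGG" + arm + "TTTT" + revcomp(arm) + "GGG"
    print(inverted_kmer_hits(seq, k=8))
```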

Page 25

Read Mapping of Short Reads

[Diagram: short reads placed against a sequence drawn as segments A C G T G T; several reads are marked "?" where their placement is ambiguous]
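The problem in this diagram can be reproduced with a toy model. A read that falls entirely inside one copy of a repeat matches every copy, so it cannot be placed uniquely. The sketch below assumes exact matching on one strand and uses a made-up genome containing one 30 bp repeat twice. Real aligners handle mismatches and both strands, and they report mapping quality instead of a yes/no answer.

```python
from collections import Counter


def uniquely_placeable_fraction(genome: str, read_len: int) -> float:
    """Fraction of read start positions whose read_len window occurs exactly
    once in the genome (exact matches, forward strand only)."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    return sum(counts[w] == 1 for w in windows) / len(windows)


# Hypothetical toy genome: one 30 bp repeat present twice, separated by
# distinct flanking sequence.
FLANK_1 = "TTGACCATGCAT"
REPEAT = "GATTACAGGCTTCAACGTAGCCTAGTCCAT"
FLANK_2 = "CAGTTGCAATCG"
FLANK_3 = "ACTGGTACCTTA"
GENOME = FLANK_1 + REPEAT + FLANK_2 + REPEAT + FLANK_3

if __name__ == "__main__":
    for read_len in (10, 20, 40):
        frac = uniquely_placeable_fraction(GENOME, read_len)
        print(f"read length {read_len:2d}: {frac:.2f} of positions uniquely placeable")
```

Reads longer than the repeat always include some flanking sequence, so they can be placed uniquely. This is the advantage of long reads that the following slides illustrate.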

Page 26

Short Read Assembly

[Diagram: the same segments A C G T G T and short reads, several marked "?", with a column listing the segments A, C, G, T, G, T]

Page 27

Long (PacBio) Reads

[Diagram: segments A C G T G T with long reads, each spanning several segments]

Page 28

10X Linked Reads

[Diagram: segments A C G T G T with linked reads distributed along long molecules]

Page 29

10X Linked Reads

[Diagram: segments A C G T G T with sparsely sampled linked reads along each molecule]

We Only Achieve ~0.2X Coverage per Molecule
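To see what ~0.2X per molecule means in practice, the sketch below estimates the number of read pairs drawn from a single molecule. It also estimates the fraction of that molecule's bases left unsequenced, using the Poisson (Lander-Waterman) approximation e^(-c). The 2×150 bp read pairs and the molecule lengths are assumptions for illustration; the slide states neither. Reads are also assumed to land independently and uniformly along the molecule.

```python
import math


def read_pairs_per_molecule(molecule_bp: float, coverage: float,
                            read_bp: int = 150, reads_per_pair: int = 2) -> float:
    """Expected read pairs sampled from one molecule at a given per-molecule coverage.
    read_bp and reads_per_pair are assumed values (2 x 150 bp), not from the slide."""
    return coverage * molecule_bp / (read_bp * reads_per_pair)


def uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases with no read."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical molecule lengths; ~0.2X per molecule as quoted on the slide.
    for molecule_kb in (20, 50, 100):
        pairs = read_pairs_per_molecule(molecule_kb * 1000, 0.2)
        print(f"{molecule_kb:3d} kb molecule: ~{pairs:.0f} read pairs")
    print(f"fraction of molecule unsequenced at 0.2X: {uncovered_fraction(0.2):.2f}")
```

At 0.2X, roughly 80% of each molecule's bases are never read. Each barcode therefore contributes long-range linkage, while base-level sequence must come from reads pooled across many molecules.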

Page 30

10X Linked Reads – Resolving Alleles vs Repeats

[Diagram: segments A C G T/G G T, where the fourth segment appears as either T or G, with linked reads carrying each version]
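The distinction on this slide can be framed as a barcode co-occurrence test. If T and G at a site are two alleles on different haplotypes, a single molecule (one barcode) should carry only one of them. If they are two collapsed repeat copies that lie within one molecule, a single barcode can show both. This is an illustrative heuristic of our own, not 10X Genomics' algorithm or the workshop's pipeline. It assumes reads are already mapped and barcode-tagged, and it ignores the fact that one partition can hold several molecules. The site name `seg4` is hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def bases_per_barcode(observations: Iterable[tuple[str, str, str]],
                      site: str) -> dict[str, set[str]]:
    """Map barcode -> set of bases observed at `site` in reads with that barcode."""
    seen: dict[str, set[str]] = defaultdict(set)
    for barcode, obs_site, base in observations:
        if obs_site == site:
            seen[barcode].add(base)
    return seen


def fraction_with_both(observations: Iterable[tuple[str, str, str]],
                       site: str, base_a: str, base_b: str) -> float:
    """Fraction of barcodes covering `site` whose reads show both bases.
    Heuristic: near 0 fits two alleles on different haplotypes; near 1 fits
    two collapsed repeat copies carried on the same molecule."""
    seen = bases_per_barcode(observations, site)
    if not seen:
        return 0.0
    both = sum(1 for bases in seen.values() if {base_a, base_b} <= bases)
    return both / len(seen)


if __name__ == "__main__":
    het = [("bc1", "seg4", "T"), ("bc2", "seg4", "G"),
           ("bc3", "seg4", "T"), ("bc4", "seg4", "G")]
    repeat = [("bc1", "seg4", "T"), ("bc1", "seg4", "G"),
              ("bc2", "seg4", "G"), ("bc2", "seg4", "T")]
    print(fraction_with_both(het, "seg4", "T", "G"),
          fraction_with_both(repeat, "seg4", "T", "G"))
```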

Page 31

BioNano Map

[Diagram: segments A C G T G T with nick sites marked along the map]

Page 32

BioNano Map

[Diagram: segments A C G T G T with nick sites marked along the map]

Indicates Flipped Loop of Inverted Repeat
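A physical map records the spacing between labels, and inverting a segment reverses the order of the spacings inside it. The sketch below models labels as in-silico matches of a nicking-enzyme motif on either strand. The default motif GCTCTTC (the Nt.BspQI recognition site) is an assumption, since the slide does not name the enzyme. Real BioNano maps come from label spacings measured on imaged molecules, not from sequence. The toy sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def label_positions(seq: str, motif: str = "GCTCTTC") -> list[int]:
    """0-based start positions of `motif` on either strand (assumed nick motif)."""
    hits = set()
    for pattern in {motif, revcomp(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def spacings(positions: list[int]) -> list[int]:
    """Distances between consecutive labels, as a map would record them."""
    return [b - a for a, b in zip(positions, positions[1:])]


def invert(seq: str, start: int, end: int) -> str:
    """Return seq with seq[start:end] replaced by its reverse complement."""
    return seq[:start] + revcomp(seq[start:end]) + seq[end:]


if __name__ == "__main__":
    m = "GCTCTTC"
    # Hypothetical toy sequence with three labels at uneven spacing.
    seq = "A" * 10 + m + "A" * 20 + m + "A" * 40 + m + "A" * 10
    print(label_positions(seq), spacings(label_positions(seq)))
    flipped = invert(seq, 0, len(seq))
    print(label_positions(flipped), spacings(label_positions(flipped)))
```

Inverting the toy sequence turns spacings of [27, 47] into [47, 27]: the same distances in reversed order. A flipped inverted-repeat loop would produce this kind of label pattern in a map.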

Page 33

Future Plans

• Refine Existing Platforms
  • Longer Linking
  • Longer Sequences
  • Cost Reductions
• Investigate New Platforms
  • PacBio Sequel
  • Oxford Nanopore
• Investigate New Techniques
  • Hybridization of Long Linked Reads in Lieu of Large Insert Clones to Capture Allelic Diversity Across as Many Humans as Possible

Page 34

Summary

• Goal: Generate Robust Data Sets for Additional High-Quality Reference Genomes Representing the Full Range of Genetic Diversity in Humans

• These Long-Read (Long-Range) Sequencing and Mapping Applications Provide Orthogonal, Synergistic Data Sets to Help Accomplish Our Goal.

• Each System Possesses Unique Challenges and Requires Optimization of Protocols and Running Conditions Specific to Our Needs.

• Experience and Communication Are Key.

(Magrini)

Page 35

Acknowledgements

The McDonnell Genome Institute at Washington University in St. Louis

Tina Graves, Amy Ly, Lisa Cook, Catrina Fronick, Karyn Meltz Steinberg, Wes Warren, Chad Tomlinson, Eddie Belter, Susan Dutcher

10X Genomics

Deanna Church, Michael Chase

BioNano Genomics

Alex Hastie

Pacific Biosciences

Nick Sisneros, Laura Nolden

Nationwide Children’s Hospital

Rick Wilson, Vince Magrini, Sean McGrath

NCBI

Valerie Schneider