

Introduction

Over the last two years, the production teams at The Genome Center (GC) have been working to design flexible pipelines that are able to handle a multitude of diverse projects. With the advent of next generation sequencing technologies that allow rapid and low-cost data generation, many more samples are needed to fill the production pipelines. The cost of data generation has fallen to a point where new and creative project designs can be considered that were not feasible in the past. Because of this greater freedom in project design, the pipeline had to handle both projects with large numbers of samples needing focused data generation and projects with only a few samples needing genome-wide coverage.

The first step in this process was to develop a long-term plan that involved every aspect of The Genome Center. The senior leadership from informatics, technology development, analysis and production meet weekly to monitor the pipelines, discuss new projects and decide how best to take advantage of new technologies. Every project we accept first undergoes a thorough review to make sure that we can meet the expectations of the collaborators in a cost-effective and timely manner. We then work with collaborators to explain the approach and attendant timelines.

The incoming sample management group is the entry point to the pipeline and was designed to handle large numbers of samples. Software tools allow the efficient import of barcoded or non-barcoded samples into the laboratory information management system (LIMS) database. Project tracking and sample progress monitoring tools facilitate communication of progress with each collaborator. Within Sample Management, the resource bank houses all of the samples brought into the center, provides samples for QC, and manages distribution to the appropriate group when project production work begins. In addition, all of the lab and analytical processes associated with each sample are catalogued in a new “work order” system developed by the LIMS group.

The downstream production groups, such as library construction and the data generation groups (3730, 454 and Illumina), work from uniform, SOP-formatted protocols and LIMS tools. These groups process large numbers of samples, using automation to provide more uniform results and to minimize sample processing mistakes and contamination. The protocols also allow for pooling of samples, which reduces the cost of a project by using the next generation sequencing platforms more cost-effectively.

After data generation, projects enter the specified analysis pipeline(s). The Genome Center offers a wide range of analysis capabilities, including single nucleotide variant, indel, and structural variation detection. We have high-throughput validation schemes available, as well as replication capabilities.

Project Management Work Flow

The Estimates group is contacted before a project begins. The sample submission sheet is sent to, and completed by, the collaborator or internal contact.

Samples arrive at The Genome Center with the completed sample submission sheet and are delivered to project management.

The Project Management Team checks for correct paperwork, including the sample submission sheet, signed estimate, IRB, billing contact, and account based upon the funding source.

A project heading is communicated to the laboratory information management system database, and the project is created.

A work order is created with information from the submission sheet; the samples are barcoded and checked into the freezer, and the work order is sent to the appropriate groups.

The resource bank group claims the samples, quantitates them, and creates aliquots.

Samples are sent to library construction and are then forwarded to the appropriate sequencing pipeline after library construction is complete.
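The paperwork check in the work flow above can be sketched as a simple checklist test. This is only an illustration: the item names are hypothetical identifiers chosen for this example (the poster lists the documents but not how LIMS stores them), and treating "billing contact" and "account" as two separate items is an assumption.

```python
# Hypothetical identifiers for the paperwork items named in the work flow.
# The real LIMS field names are not given in the poster.
REQUIRED_PAPERWORK = (
    "sample_submission_sheet",
    "signed_estimate",
    "irb",
    "billing_contact",
    "account",
)


def missing_paperwork(received):
    """Return the required items absent from `received`, in checklist order."""
    received = set(received)
    return [item for item in REQUIRED_PAPERWORK if item not in received]


def ready_for_project_creation(received):
    """True only when every required item has been received."""
    return not missing_paperwork(received)


# Example: a submission that still lacks billing details.
partial = ["sample_submission_sheet", "signed_estimate", "irb"]
print(missing_paperwork(partial))
print(ready_for_project_creation(partial))
```

Returning the list of missing items, rather than a bare yes/no, makes it easy to tell a collaborator exactly what is still outstanding.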

PROJECT MANAGEMENT, DATA GENERATION, AND ANALYSIS AT THE GENOME CENTER

Elaine Mardis, Lucinda Fulton, and The Genome Center Production Group

The Genome Center, Washington University School of Medicine, St. Louis, Missouri 63108, USA

Number of GAIIx’s = 50
Number of HiSeqs = 3

101 Cycle Paired End Run
• Avg. Gb/run: 34
• Reads/run: 170 M PF
• Run Time: 10 days
• Avg. Error rate R1: 2.07
• Avg. Error rate R2: 1.55
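As a rough consistency check on the run statistics above, the yield per run can be recomputed from the read count and read length. The sketch assumes that "170 M PF" counts pass-filter clusters, each sequenced as two 101-base reads; the poster does not state this explicitly.

```python
CYCLES_PER_READ = 101          # from "101 Cycle Paired End Run"
READS_PER_CLUSTER = 2          # paired end: read 1 and read 2
PF_CLUSTERS_PER_RUN = 170e6    # assumption: "170 M PF" counts clusters, not single reads


def expected_gb_per_run(clusters, cycles, reads_per_cluster):
    """Total sequenced bases per run, in gigabases."""
    return clusters * cycles * reads_per_cluster / 1e9


gb_per_run = expected_gb_per_run(PF_CLUSTERS_PER_RUN, CYCLES_PER_READ, READS_PER_CLUSTER)
print(f"{gb_per_run:.2f} Gb per run")
```

Under that assumption the result is about 34.3 Gb, in line with the reported average of 34 Gb per run.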

454 Sequencing Pipeline

• Sample received from Lib Core
• Titration Enrichment Experiment (TERM) emulsification and amplification: 7 hour process
• TERM Recovery & Enrichment: 4 hour process
• Large Volume (LV) Emulsion emulsification and amplification: 8 hour process
• LV Recovery & Enrichment: 4 hour process
• Sequencing Run Preparation: 2 hour process
• Sequencing Run: 9 hour process
• To Analysis
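The hands-on duration of the 454 pipeline can be totalled from the six process times in the flowchart. The sketch below assumes the six durations belong to the six processing steps in the order they appear; the total does not depend on that pairing, but the per-step labels do. It also ignores any waiting time between steps, which the poster does not report.

```python
# (step, hours); the pairing of durations to steps is an assumption
# based on the order in which both appear in the flowchart.
STEP_HOURS = [
    ("TERM emulsification and amplification", 7),
    ("TERM recovery and enrichment", 4),
    ("LV emulsification and amplification", 8),
    ("LV recovery and enrichment", 4),
    ("Sequencing run preparation", 2),
    ("Sequencing run", 9),
]

total_hours = sum(hours for _, hours in STEP_HOURS)
longest_step = max(STEP_HOURS, key=lambda step: step[1])

print(f"Total process time: {total_hours} hours")
print(f"Longest step: {longest_step[0]} ({longest_step[1]} hours)")
```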

454 Sequencing Capacity

• 8 total 454 instruments
• 120 Titanium runs per month (50.4 Gb per month)
• Average fragment yield per run: 1.2 million reads
• Average read length: 350 bp
• Average yield per run: 420 megabases
• Currently, ~10% of the pipeline is dedicated to HMP projects.
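The capacity figures above follow from one another, which the short calculation below makes explicit. The input numbers are taken from the list; the HMP figure is an approximation that assumes the ~10% share applies evenly to monthly output.

```python
RUNS_PER_MONTH = 120
READS_PER_RUN = 1_200_000
AVG_READ_LENGTH_BP = 350
HMP_SHARE = 0.10  # "~10%" in the text, so the derived HMP figure is approximate


def megabases_per_run(reads, read_length_bp):
    """Yield of a single run in megabases."""
    return reads * read_length_bp / 1e6


def gigabases_per_month(runs, reads, read_length_bp):
    """Monthly yield across all runs in gigabases."""
    return runs * reads * read_length_bp / 1e9


mb_per_run = megabases_per_run(READS_PER_RUN, AVG_READ_LENGTH_BP)
gb_per_month = gigabases_per_month(RUNS_PER_MONTH, READS_PER_RUN, AVG_READ_LENGTH_BP)
hmp_gb_per_month = gb_per_month * HMP_SHARE

print(f"{mb_per_run:.0f} Mb per run")
print(f"{gb_per_month:.1f} Gb per month")
print(f"{hmp_gb_per_month:.2f} Gb per month for HMP (approximate)")
```

The computed values, 420 Mb per run and 50.4 Gb per month, match the figures quoted in the list.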


Resource Bank Work Flow

• Sterile Environment for Sample Assessment
– Pipettes, tips, water, empty aliquot tubes are sterilized in a hood with UV
– Countertop in the hood is washed with DNA Away prior to introducing samples
– Disposable Lab Coats, Face Masks, and Gloves are worn when handling samples
– Filtered pipette tips are utilized
– Water used to generate dilutions is disposed of daily per technician

• QC Metrics
– Quantitation Method: Qubit Assay
– Quality Metric: Agarose Gel or Agilent Electropherogram

• Sample Tracking via LIMS
– Initial Source volume is recorded
– Qubit Concentration is recorded
– Agarose Gel and/or Agilent Trace is saved
– Source tubes are linked to each aliquot created
– All samples (stocks and aliquots) are logged into a freezer system
– Inventory reports can be generated to determine material remaining

• Sample Distribution
– Manual Aliquots are performed in a sterile environment in the hood
– Automated Aliquots are being tested for sterility utilizing the EpMotion

Library Construction Work Flow

• Library Preparation Environment
– Technician work space is wiped down with bleach prior to and immediately following the completion of library construction
– Lab Coat, Goggles and Gloves are worn when handling samples
– Library reagents are aliquoted specifically for one library event to prevent cross contamination

• Library Prep Methods
– 454 Library Prep: manual
  • 125 MIDs exist for samples that need to be pooled prior to data generation
– Illumina Library Prep: manual or automated
  • 12 indexes exist for samples that need to be pooled prior to data generation

• Library QC Metrics
– Initial sample verification: aliquot barcode and volume are confirmed against resource bank information
– Final Library Quantitation: Qubit Assay
– Final Library Quality: Agilent Electropherogram

Library Construction Capacity

| Library Prep | Prep Time (days) | Number of samples per person | Total Libraries per person per week | Library Technician Allocation | Total Library Throughput per Week |
|---|---|---|---|---|---|
| Manual Indexing (w/ qPCR) | 3 | 6 | 10 | 3 | 30 |
| Automated Indexing (w/ qPCR) | 5 | 48 | 48 | 2 | 96 |
| Small fragment Paired End (w/ qPCR) | 4 | 4 | 5 | 2 | 10 |
| Fragment | 1 | 4 | 20 | 1 | 20 |
| 3Kb Paired End | 3 | 2 | 3 | 2 | 7 |
| Nick Translation Rapid Prep | 1 | 8 | 40 | 1 | 40 |
| Total | | | | | 203 |

[Chart: Total Libraries per week, by platform (454, Illumina)]
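The weekly throughput figures can be cross-checked by multiplying the per-person rate by the technician allocation. The numbers below are copied from the capacity table; the row labels follow one reading of the flattened table header and should be treated as an assumption.

```python
# (library prep, libraries per person per week, technicians, reported weekly total)
# Labels are an assumed reading of the table header; numbers are as reported.
ROWS = [
    ("Manual Indexing (w/ qPCR)", 10, 3, 30),
    ("Automated Indexing (w/ qPCR)", 48, 2, 96),
    ("Small fragment Paired End (w/ qPCR)", 5, 2, 10),
    ("Fragment", 20, 1, 20),
    ("3Kb Paired End", 3, 2, 7),
    ("Nick Translation Rapid Prep", 40, 1, 40),
]

reported_total = sum(reported for _, _, _, reported in ROWS)

# Rows where rate x technicians differs from the reported weekly total.
mismatches = [
    (label, per_person * technicians, reported)
    for label, per_person, technicians, reported in ROWS
    if per_person * technicians != reported
]

print(f"Sum of reported weekly totals: {reported_total}")
for label, computed, reported in mismatches:
    print(f"{label}: computed {computed}, reported {reported}")
```

The reported totals sum to the stated 203 libraries per week. One row (3 libraries per person per week with 2 technicians) gives 6 by multiplication against a reported 7; the poster does not explain the difference, which may simply reflect rounding of the per-person rate.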

Illumina Sequencing Pipeline

| Methods | Qubit Assessments | Agilent Assessments | Manual Aliquot Generation |
|---|---|---|---|
| Number of samples per person per day | 100 | 48 | 60 |
| Total Samples per Week | 500 | 240 | 300 |
| Resource Bank Technician Allocation | 1 | | 0.5 |
| Total Sample Assessment per Week | 500 | | 150 |

Poster designed by Kerri Ochoa

The Genome Center has a robust system for variant validation that extends from single nucleotide variants (SNVs), to small indels (~<100 bp), to larger structural variations (large indels, translocations, and inversions). Several methods are available and are chosen on the basis of scale, turnaround time, or other project-specific guidelines. The methods are outlined briefly below.

Targeting Method

PCR: Specific targeting, scalable from one site to thousands, applicable to all sequencing platforms, cost-effective at small scale (<1000 sites), and rapid turnaround (~5 days from site definition to oligo in hand)

Capture: Efficient design and capture, cost-effective at large scale (>1000 sites), applicable to next generation sequencing platforms
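The trade-off between the two targeting methods can be sketched as a simple selection rule. This is an illustration only: the function name is hypothetical, the poster does not say what to do at exactly 1000 sites (the sketch keeps PCR there), and real choices also weigh turnaround time and project-specific guidelines.

```python
NEXT_GEN_PLATFORMS = {"454", "Illumina"}
ALL_PLATFORMS = NEXT_GEN_PLATFORMS | {"3730"}
SITE_THRESHOLD = 1000  # from "<1000 sites" for PCR and ">1000 sites" for capture


def choose_targeting_method(n_sites, platform):
    """Pick 'PCR' or 'Capture' from site count and sequencing platform.

    Assumptions: capture is only used with next generation platforms,
    and exactly SITE_THRESHOLD sites falls back to PCR.
    """
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    if platform not in ALL_PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")
    if platform in NEXT_GEN_PLATFORMS and n_sites > SITE_THRESHOLD:
        return "Capture"
    return "PCR"


print(choose_targeting_method(50, "Illumina"))
print(choose_targeting_method(5000, "454"))
print(choose_targeting_method(5000, "3730"))
```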

Sequencing Method

3730: Advantages – Rapid turnaround (~1 day from template to sequence), inexpensive at small scale

Disadvantages – Expensive at larger scale, mixed alleles in a single sequence

454: Advantages – 2-3 day turnaround, cost-effective at large scale, digital (clonal) and deep read counts for calculation of variant allele frequencies, orthogonal sequencing to the common Illumina discovery platform

Disadvantages – Ambiguities in mononucleotide runs, cost per sequence still higher than Illumina, products are pooled so normalization is necessary

Illumina: Advantages – Extremely cost-effective at large scale, digital (clonal) and deep read counts for calculation of variant allele frequencies

Disadvantages – Concatenation and shearing necessary in library construction, not an orthogonal validation when Illumina is also the discovery platform

Variant Validation

Analysis