CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

Upload: betsy

Post on 14-Jan-2016

TRANSCRIPT

Page 1: CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services

CSU IDRC Next Generation Sequencing Core
Genomic Sequencing Services

Page 2

Semiconductor DNA Sequencing

Ion Proton; Ion Torrent

“Sequencing on a Chip”

Page 3

Semiconductor Sequencing in a Nutshell

“It’s a computational pH meter”
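The "pH meter" description can be illustrated with a toy model. In semiconductor sequencing, nucleotides are flowed over the chip one type at a time, and each incorporation releases hydrogen ions, so the signal in a flow grows with the number of bases incorporated. The sketch below is an idealized, noise-free illustration only: the `TACG` flow order, the function names, and the example read are assumptions made for this example, not details taken from the slides.

```python
FLOW_ORDER = "TACG"  # assumed repeating flow order, for illustration only


def simulate_flows(read, n_flows=32, flow_order=FLOW_ORDER):
    """Return one idealized signal per flow: the number of bases incorporated."""
    signals = []
    pos = 0
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated in a single flow,
        # so the signal scales with the run length.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append(float(run))
    return signals


def call_bases(signals, flow_order=FLOW_ORDER):
    """Turn per-flow signals back into a base sequence by rounding each signal."""
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        called.append(base * int(round(signal)))
    return "".join(called)


signals = simulate_flows("TTACGGA", n_flows=8)
print(signals)              # [2.0, 1.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0]
print(call_bases(signals))  # TTACGGA
```

Real signals are noisy, which is why long homopolymer runs are harder to call than this toy suggests.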

Page 4

Metagenomics

• Environmental samples of communities of organisms
  • water, soil samples
  • human & animal microbiomes
  • mine tailings, oil spills
  • deep sea, polar ice
  • etc.

Page 5

Metagenomics Pipeline

[Pipeline diagram; labels only]

Torrent/Proton sequencers

CSU Cray supercomputer; Oak Ridge Titan supercomputer

NCBI nucleotide databases

MEGAN

Page 6

Metagenomics Tools

Ion Proton Sequencer
• In: Sample DNA
• Out: 50M DNA fragments

NCBI nucleotide database
• DNA fragments
• 15M+ records

Do the math:
• 50M * 15M = 7.5 × 10^14, i.e. on the order of 10^14 queries

mpiBLAST
• Highly parallelized BLAST algorithm
• NGS sample DNA
• Query NCBI DB

CSU Cray XT6m
• 2,016 CPU cores
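The "do the math" estimate can be checked in a few lines. This is a back-of-the-envelope sketch using only the figures on this slide; it treats every fragment-versus-record pair as one comparison, which is a naive upper bound (BLAST uses indexed seeding rather than exhaustive pairwise comparison), and it assumes the work divides evenly across cores.

```python
reads = 50_000_000      # DNA fragments per Ion Proton run (from the slide)
records = 15_000_000    # NCBI nucleotide records (slide says "15M+")
cores = 2016            # CSU Cray XT6m CPU cores (from the slide)

# Naive all-against-all count: every read compared with every record.
comparisons = reads * records
print(f"{comparisons:.1e} read-vs-record comparisons")  # 7.5e+14

# Share of the work per core, assuming a perfectly even split.
per_core = comparisons / cores
print(f"{per_core:.2e} comparisons per core")
```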

Page 7

Metagenomics

• Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
• Florida Everglades water samples (4)
• “What species are in the water?”

• CSU NextGen Sequencing Core: Ion Proton; 2 weeks
• CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
• Results

Page 8

Metagenomics

• Rarefaction curves
• Estimate species richness
• Asymptotic?
• Find rare species
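A rarefaction curve can be sketched by repeatedly subsampling the classified reads at increasing depths and counting how many distinct species appear. The example below is a minimal illustration, not the analysis used for these samples: the function name, the community composition, and the species labels are all hypothetical.

```python
import random


def rarefaction_curve(labels, depths, trials=20, seed=0):
    """Mean number of distinct species seen when subsampling `depth` reads."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Sample without replacement, count distinct labels, average over trials.
        counts = [len(set(rng.sample(labels, depth))) for _ in range(trials)]
        curve.append(sum(counts) / trials)
    return curve


# Hypothetical community: one species label per classified read.
labels = (["sp_A"] * 500 + ["sp_B"] * 300 + ["sp_C"] * 150
          + ["sp_D"] * 40 + ["sp_E"] * 9 + ["sp_F"] * 1)
depths = [1, 10, 100, 500, 1000]

curve = rarefaction_curve(labels, depths)
for depth, richness in zip(depths, curve):
    print(depth, richness)
```

If the curve flattens before the full depth is reached, more sequencing is unlikely to reveal many new species; a curve that is still rising suggests rare species remain undetected, as with the single `sp_F` read here.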

Page 9

Computational Resources

Oak Ridge Titan Cray XK7 Supercomputer
• 300K CPU cores; 50M GPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 100% of sample DNA

CSU Cray XT6m Supercomputer
• 2,016 CPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 1% of sample DNA

Strong scaling
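Strong scaling means keeping the problem size fixed while adding cores and measuring how much faster the job finishes. The slides give no timings, so the sketch below uses made-up numbers purely to show how speedup and parallel efficiency would be computed; the function name and both timing values are hypothetical.

```python
def strong_scaling(base_cores, base_hours, cores, hours):
    """Return (speedup, efficiency) relative to a baseline run of the same problem."""
    speedup = base_hours / hours        # how many times faster the larger run is
    ideal = cores / base_cores          # speedup expected under perfect scaling
    efficiency = speedup / ideal        # 1.0 means perfect strong scaling
    return speedup, efficiency


# Hypothetical timings for the same fixed workload (not measured values).
speedup, efficiency = strong_scaling(base_cores=100, base_hours=24.0,
                                     cores=1000, hours=3.0)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")  # speedup 8.0x, efficiency 80%
```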

Page 10

Summary

Big Data Issues

• Semiconductor sequencer data

• Large-scale database queries

• High-performance computing