CSU IDRC Next Generation Sequencing Core Genomic Sequencing Services
Semiconductor DNA Sequencing
Ion Proton; Ion Torrent
“Sequencing on a Chip”
Semiconductor Sequencing in a Nutshell
“It’s a computational pH meter”
Metagenomics
• Environmental samples of communities of organisms
• water, soil samples
• human & animal microbiomes
• mine tailings, oil spills
• deep sea, polar ice
• etc.
Metagenomics Pipeline
CSU Cray supercomputer; Oak Ridge Titan supercomputer
Torrent/Proton sequencers; MEGAN
NCBI nucleotide databases
Metagenomics Tools
Ion Proton Sequencer
• In: Sample DNA
• Out: 50M DNA fragments
NCBI nucleotide database
• DNA fragments
• 15M+ records
Do the math:
• 50M × 15M = 7.5 × 10^14 queries (on the order of 10^14)
mpiBLAST
• Highly parallelized BLAST algorithm
• NGS sample DNA
• Query NCBI DB
CSU Cray XT6m
• 2,016 CPU cores
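The "do the math" step can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it treats every read-versus-record pairing as one "query", which is how the slide counts them, and does not model how BLAST actually indexes or prunes the search. The variable names are illustrative.

```python
# Back-of-the-envelope size of the search problem described on the slides.
reads = 50_000_000        # DNA fragments from one Ion Proton run (slide figure)
db_records = 15_000_000   # NCBI nucleotide records (slide says "15M+")

# Naive count: every read paired with every database record.
comparisons = reads * db_records
print(f"{comparisons:.1e} read-vs-record pairings")  # 7.5e+14

# Average share of that work per core on the CSU Cray XT6m (2,016 cores).
cores = 2_016
per_core = comparisons / cores
print(f"{per_core:.1e} pairings per core")
```

Even split evenly over all 2,016 cores, each core is left with hundreds of billions of pairings, which is why the pipeline relies on a parallel BLAST implementation.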
Metagenomics
• Dr. Toni Piaggio, National Wildlife Research Center, Fort Collins
• Florida Everglades water samples (4)
• “What species are in the water?”
• CSU NextGen Sequencing Core: Ion Proton; 2 weeks
• CSU Cray: 1,000 cores, 24 hours, 4 runs; 1 week
• Results
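For a rough sense of the compute budget behind this case study, the Cray figures can be turned into core-hours. The sketch below assumes that each of the four runs used 1,000 cores for a full 24 hours; the slide does not spell this out, so treat the result as an upper-bound estimate. The function name `core_hours` is illustrative.

```python
def core_hours(cores: int, hours_per_run: float, runs: int) -> float:
    """Total core-hours, assuming every run holds all cores for the full duration."""
    return cores * hours_per_run * runs

# Figures from the slide: 1,000 cores, 24 hours, 4 runs (one per water sample).
total = core_hours(cores=1_000, hours_per_run=24, runs=4)
print(total)  # 96000
```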
Metagenomics
• Rarefaction curves
• Estimate species richness
• Asymptotic?
• Find rare species
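A rarefaction curve can be sketched by repeatedly subsampling the classified reads at increasing depths and counting how many distinct species appear. The code below is a minimal illustration, not the method used in the study: the species labels and abundances are a made-up toy community, and `rarefaction_curve` is a hypothetical helper name.

```python
import random

def rarefaction_curve(read_labels, depths, n_reps=20, seed=0):
    """Mean number of distinct species seen in random subsamples of each depth."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        # Draw n_reps subsamples without replacement and count species in each.
        richness = [len(set(rng.sample(read_labels, depth))) for _ in range(n_reps)]
        curve.append((depth, sum(richness) / n_reps))
    return curve

# Hypothetical toy community: one species label per classified read.
reads = (["sp_A"] * 500 + ["sp_B"] * 300 + ["sp_C"] * 150
         + ["sp_D"] * 45 + ["sp_E"] * 5)

curve = rarefaction_curve(reads, depths=[10, 50, 100, 500, 1000])
for depth, mean_richness in curve:
    print(depth, round(mean_richness, 2))
```

If the curve flattens out as depth grows, further sequencing is unlikely to reveal many new species; if it is still climbing, rare species (like `sp_E` in the toy data) are probably still being missed.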
Computational Resources
Oak Ridge Titan Cray XK7 Supercomputer
• 300K CPU cores; 50M GPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 100% of sample DNA
CSU Cray XT6m Supercomputer
• 2,016 CPU cores
• mpiBLAST
• NCBI nucleotide DB
• Query 1% of sample DNA
Strong scaling
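Strong scaling asks how much faster a fixed-size job runs as cores are added. One common way to reason about it is Amdahl's law, sketched below. The parallel fraction used here is an assumed value for illustration only, not a measurement from these runs; the core counts are the ones quoted on the slides.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Ideal speedup over one core for a fixed problem size (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

def parallel_efficiency(cores: int, parallel_fraction: float) -> float:
    """Speedup divided by core count; 1.0 would be perfect strong scaling."""
    return amdahl_speedup(cores, parallel_fraction) / cores

ASSUMED_PARALLEL_FRACTION = 0.9999  # hypothetical, for illustration only

for cores in (1, 2_016, 300_000):  # single core, CSU Cray XT6m, Titan CPU cores
    s = amdahl_speedup(cores, ASSUMED_PARALLEL_FRACTION)
    e = parallel_efficiency(cores, ASSUMED_PARALLEL_FRACTION)
    print(f"{cores:>7} cores: speedup {s:10.1f}, efficiency {e:.3f}")
```

Under this model, even a tiny serial fraction caps the achievable speedup, so efficiency drops as the core count grows from thousands to hundreds of thousands.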
Summary
Big Data Issues
• Semiconductor sequencer data
• Large-scale database queries
• High-performance computing