Talk at the BaseSpace Developer Conference, San Francisco, 2013
TRANSCRIPT
Bioinformatics|Software|Services
NOVOALIGN BASESPACE APP
Zayed Albertyn
Bioinformatics Director, Novocraft Technologies Sdn Bhd
Illumina® BaseSpace Developer Conference, San Francisco
9th December 2013
Novocraft Technologies Sdn Bhd
• Incorporated in 2008, BioNexus Status Company
• Small team of Mathematicians, Biologists & Software Engineers
• Develop innovative, world-class products
• High-performance computing for the growing genomics era
• International Market & User Base
Products
• Novoalign
– Illumina, 454
• NovoalignCS
– SOLiD
• Novosort
• Cluster Solutions
– NovoalignMPI, NovoalignCSMPI
• NGS WorkBench (web)
• All running on standard commodity hardware
– No special GPU/supercomputer required
– Mac OS & Linux versions available
– Open source operating system (Linux)
• NGS Cloud computing HPC workflows
– Amazon EC2/S3/EBS
NGS Services
Consultation on NextGen projects
• Exome
• Whole genome
• SNV, Indel, Structural Variations (SVs)
• RNASeq
• ChIP-Seq
• Methylome
• Small RNA
• de-novo assembly
Automated pipelines
In-house/custom and open source software
Illumina and other platforms
Cloud solutions: packaged AMIs, containers
Collaborations
• Academic/research institutes
• Industry
– HPC providers
– Pharma
– Cloud solutions
• Resellers
– US and Global
A few of our NOVOALIGN users
User Examples
NOVOALIGN
• Hash-based aligner
• Peer reviewed publications: 2009-present
• Accuracy
– SNPs and short Indels
• Read length > 250 bp as of V3.X.X
ROC Curves
• True Positive vs False Positive rate
• Higher Y value: better at finding the “true” result
• Lower X value: better at excluding “false” results
http://lh3lh3.users.sourceforge.net/alnROC.shtml
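As a rough illustration of how such a curve can be built, the sketch below sweeps a mapping-quality threshold over simulated reads whose true origin is known. The function name `roc_points` and the `(mapq, is_correct)` input format are assumptions made for this example; they are not part of Novoalign or of the linked benchmark.

```python
from typing import Iterable, List, Tuple


def roc_points(alignments: Iterable[Tuple[int, bool]]) -> List[Tuple[int, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per MAPQ threshold.

    Each input item is (mapq, is_correct) for one simulated read.
    A read counts as "called" at a threshold when its mapq >= threshold.
    """
    records = list(alignments)
    total_correct = sum(1 for _, ok in records if ok)
    total_wrong = len(records) - total_correct
    points = []
    # Sweep from the strictest threshold to the most permissive one.
    for threshold in sorted({q for q, _ in records}, reverse=True):
        called = [ok for q, ok in records if q >= threshold]
        tp = sum(called)          # correctly placed reads that were called
        fp = len(called) - tp     # misplaced reads that were called
        tpr = tp / total_correct if total_correct else 0.0
        fpr = fp / total_wrong if total_wrong else 0.0
        points.append((threshold, fpr, tpr))
    return points


# Hypothetical toy data: (mapping quality, aligned to the true location?)
reads = [(60, True), (60, True), (30, True), (30, False), (0, False), (0, True)]
for threshold, fpr, tpr in roc_points(reads):
    print(threshold, fpr, tpr)
```

Plotting the false positive rate on X and the true positive rate on Y gives one curve per aligner; a curve that climbs higher while staying further left is the better one.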
The performance of various methods for mapping reads to reference repeats.
Highnam G et al. Nucl. Acids Res. 2013;41:e32
http://www.bioplanet.com/gcat
http://www.bioplanet.com/gcat/reports/112/variant-calls/ion-torrent-225bp-se-exome-30x/novoalign-gatk-ug/compare-183-119/group-read-depth
Genome-in-a-bottle Consortium dataset
http://bcbio.wordpress.com
Courtesy Brad Chapman & Oliver Hoffman, HSPH
“Our standard workflow uses novoalign based on its stringency in resolving large insertions and deletions. These results suggest equally good results using bwa mem, along with improved processing times”
Graphical representation of the total number of downstream false positives expressed as a percentage...
Oliver GR. 2012 [http://f1000r.es/NMpsFc] F1000Research 2012, 1:2 (doi: 10.12688/f1000research.1-2.v2)
Novosort comparison on Illumina reads
Developing on BaseSpace
Motivation
• Reach out to more users• Enable seamless integration with the cloud• Establish BaseSpace Novoalign community
What is the App?
Alignment
• Alignment Quality Calibration
• Multithreaded
• Adaptor stripping
Sorting
• Novosort
• Multithreaded
Variant Calling
• Freebayes
• SNPs & Indels
What is the App?
• Novoalign
– Paired-end
– Human genome only, others later
– Caveat: requires a machine with at least 8 GB RAM
• Alignment coordinate-sorting
– Novosort
• Variant Calling
– Freebayes (Erik Garrison & Gabor Marth)
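Coordinate sorting orders alignment records by reference sequence and then by leftmost mapped position, the order that variant callers generally expect. The sketch below shows the idea on SAM text lines, assuming tab-separated records with RNAME in column 3 and POS in column 4 as in the SAM format. The function name `coordinate_sort` and the `ref_order` argument are illustrative only and say nothing about how Novosort is implemented.

```python
from typing import Dict, List, Tuple


def coordinate_sort(sam_lines: List[str], ref_order: List[str]) -> List[str]:
    """Sort SAM alignment lines by (reference index, position); headers stay first."""
    rank: Dict[str, int] = {name: i for i, name in enumerate(ref_order)}
    headers = [ln for ln in sam_lines if ln.startswith("@")]
    records = [ln for ln in sam_lines if ln and not ln.startswith("@")]

    def key(line: str) -> Tuple[int, int]:
        fields = line.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Unmapped reads (RNAME "*") and references missing from ref_order sort last.
        return (rank.get(rname, len(rank)), pos)

    return headers + sorted(records, key=key)


# Hypothetical toy records; only the first four columns matter for sorting.
lines = [
    "@HD\tVN:1.4",
    "r1\t0\tchr2\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for line in coordinate_sort(lines, ["chr1", "chr2"]):
    print(line.split("\t")[0])
```

In a real pipeline the reference order would come from the @SQ lines of the header rather than from a hand-written list.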
New-developer Challenges
• The “Docker” way of doing things
– Image vs Container
• Front-end: JavaScript/CSS
• Back-end: Algorithms/scripting
Back-end process
Front-end process
Perl/C++/R/Python
Back-end Development Process
Start the Native VM
• VMware
• Linux environment
Start your own Docker Repository
• Create new IMAGE on Docker.io
• Done automatically on your first push
Attach to your image
• Docker run …
Make small test dataset
• Illumina cancer panel reads
• Subset chr22 alignments
Develop the app back-end process
• Automated script runs pipeline
• Alignment -> sorting -> variant calling
Postprocess
• Charting with R
• ggplot2
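For the small test dataset step, one way to cut alignments down to chr22 is to filter SAM text on the reference-name column. The sketch below assumes uncompressed SAM input with RNAME in column 3 and a reference named `chr22`. The function name `subset_reference` is hypothetical, and the talk does not say which tool was actually used.

```python
from typing import Iterable, Iterator


def subset_reference(sam_lines: Iterable[str], ref_name: str = "chr22") -> Iterator[str]:
    """Yield header lines plus the alignment records mapped to ref_name."""
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Keep the header, but drop @SQ entries that describe other references.
            if fields[0] == "@SQ" and f"SN:{ref_name}" not in fields[1:]:
                continue
            yield line
        elif len(fields) > 2 and fields[2] == ref_name:
            yield line


# Hypothetical toy input; the sequence lengths are placeholders.
lines = [
    "@HD\tVN:1.4",
    "@SQ\tSN:chr1\tLN:1000",
    "@SQ\tSN:chr22\tLN:2000",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr22\t15\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
for line in subset_reference(lines):
    print(line)
```

A read is kept or dropped by its own RNAME only, so a pair whose mate maps to another chromosome loses that mate. For a paired-end test set it may be worth checking that the pipeline tolerates such orphaned reads.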
Front-end Development Process
BaseSpace Developer tools
• Code editor
• Preview form inputs
Initiate test runs
• Send data to your backend Native app
Build Report form
• Write Liquid/JS/HTML5
App Screenshots
Acknowledgements
Novocraft
Leadership: Colin Hercus, Haniza Hashim
Bioinformatics: Akzam Saidin, Kaamesh Kaamahalaran, Abdul Malik Ahmad
Software Development: Deepa Murugan, Sharon Chin, Laura Hamit
Illumina: Raymond Teckotzky, Mayank Tyagi
VT/GeneByGene: David Mittelman, Gareth Highnam, Nir Liebovich, Jason Wang
HSPH Bioinformatics Core: Oliver Hoffman, Brad Chapman