H. Mishima - Biogem, Ruby UCSC API, and BioRuby
DESCRIPTION
Presentation by H. Mishima at BOSC2012: Biogem, Ruby UCSC API, and BioRuby
TRANSCRIPT
Biogem, Ruby UCSC API, and BioRuby
Hiroyuki Mishima (Nagasaki University), Raoul J.P. Bonnal, Naohisa Goto, Francesco Strozzi, Toshiaki Katayama, Pjotr Prins
BioRuby
• a bioinformatics library for the Ruby language
• >11 years - project since Nov. 21, 2000
BioRuby is an open-source project
BUT, I HAVE A QUESTION...
Are open source projects truly open?
Aspects of the word ‘OPEN’
•OPEN for redistribution
•OPEN for source code access
•OPEN for contribution
CENTRALIZED APPROACH
• Pros
– QC for stability and consistency
– easy to apply coding standards
– enables extensive tests and documentation
• Cons
– heavy burden on release managers
– longer process, sparser releases
– lack of cutting-edge features
Two ways to participate in BioRuby development
1. Be a committer
   1. be a trusted contributor in the community
   2. get an open-bio.org account
   3. be a CVS/SVN committer
2. Send patches to (busy) core members
   1. wait for patch evaluation
   2. wait for the next release of BioRuby
BARRIERS TO ENTRY
Lower the barrier to entry!
Actions of BioRuby
• more OPEN for source code access
• more OPEN for contribution
Social Coding Using GitHub
In 2010, the BioRuby project source repository moved to GitHub
ACTION 1
• Users can fork the code freely.
• Users still have to wait for acceptance of pull requests to get their code incorporated into the official repository.
ACTION 2
Plug-in system - BioGem
DECENTRALIZED APPROACH
• Enables expanding BioRuby without tweaking its stable core
• plug-ins are maintained by their authors
• encourages ‘best practice’ using a tool (biogem command)
– Standard directory structure
– version control using Git
– Using the RubyGems packaging system
– testing and documentation
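As a rough illustration of the "RubyGems packaging system" point, here is a minimal hand-written sketch of plug-in metadata using the standard Gem::Specification API. The gem name bio-hello-example, the version, the author, and the dependency on the bio gem are all assumptions made for illustration. This is not the output of the biogem command.

```ruby
require "rubygems"

# Hypothetical plug-in metadata; every value below is an illustrative assumption.
spec = Gem::Specification.new do |s|
  s.name          = "bio-hello-example"        # "bio-" prefix, as in bio-ucsc-api
  s.version       = "0.0.1"
  s.summary       = "Example BioRuby plug-in (illustration only)"
  s.authors       = ["A. Plugin Author"]
  s.files         = Dir["lib/**/*.rb"]         # standard lib/ directory layout
  s.require_paths = ["lib"]
  s.add_runtime_dependency "bio"               # the plug-in builds on the BioRuby core
end

puts "#{spec.name} #{spec.version}"
```

Keeping the metadata in a gem specification is what lets a plug-in be installed with gem install, independently of BioRuby core releases.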
The Biogems workflow
Biogems.info – a portal site for Biogem users
• rank in total downloads (rank up & down)
• citation, current version, date of latest release, links to source code, status of Travis continuous integration
highly motivating (me)
Database / web-service API: bio ucsc api, intermine, eutils, sequenceserver, goruby, bio ensembl
Wrapper: bio samtools, bio logger, bio bwa, bio signalp, bio sge, bio exportpred, bio tabix
Application: scaffolder, genfrag, bio isoelectric point, bio phyta, bio tm hmm, dna sequence aligner, bio gag, bio kmer counter
File Parser: bio gff3, bio assembly, bio blastxmlparser, bio faster, bio alignment, bio nexml, bio kb illumina, bio octopus, bio affy, bio dbsnp, bio rdf, bio hmmer model, bio hmmer3 report, bio pileup iterator, bio phyloxml
Visualization: bio graphics
Framework: bio ngs
Toolbox: bio genomic interval, bio bigbio, bio hello, bio plasmoap, bio cnls screenscraper, bio data, bio aliphatic index, bio hydropathy, bio gngm
Biogem Example: bio hello
Biogem Collection: bio core
more than 60 Biogems...
Database access-related
Next Generation Sequencing-related
Hiro Mishima
• NOT a core developer of BioRuby
• not a computer scientist but a dentist
• semi-dry biologist
• human geneticist
BioGem is lowering barriers to entry
Ruby UCSC API
>40,000 tables!
$ gem install bio-ucsc-api
How to get started
EASY!
require 'bio-ucsc'
Bio::Ucsc::Hg19.connect
result = Bio::Ucsc::Hg19::Snp131.find_by_name("rs56289060")
puts result.chrom # => "chr1"
A query written with a fluent interface.
region = "chr17:7,579,614-7,579,700"
condition = Bio::Ucsc::Hg19::Snp131.with_interval(region).select(:name)
puts condition.to_sql
SQL made easy
SELECT name FROM `snp131`
WHERE (chrom = 'chr17'
  AND bin IN (642,80,9,1,0)
  AND ( (chromStart BETWEEN 7579613 AND 7579700)
     OR (chromEnd BETWEEN 7579613 AND 7579700)
     OR (chromStart <= 7579613 AND chromEnd >= 7579700) ));
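The bin list (642,80,9,1,0) and the start coordinate 7579613 in the SQL above are consistent with the standard UCSC binning scheme and with zero-based start positions. The plain-Ruby sketch below recomputes both from the region string. The helper names parse_region and overlapping_bins are hypothetical and are not part of Ruby UCSC API. The sketch assumes the standard (non-extended) scheme, which covers positions below 512 Mb.

```ruby
# Offsets of the first bin at each level, finest (128 kb) to coarsest (512 Mb).
BIN_OFFSETS     = [585, 73, 9, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # 2**17 bases = 128 kb, the smallest bin size
BIN_NEXT_SHIFT  = 3  # each coarser level is 8 times larger

# Hypothetical helper: turns "chr17:7,579,614-7,579,700" (1-based, inclusive)
# into [chrom, zero_based_start, end].
def parse_region(region)
  match = /\A(\w+):([\d,]+)-([\d,]+)\z/.match(region)
  raise ArgumentError, "unrecognized region: #{region}" unless match

  first = match[2].delete(",").to_i
  last  = match[3].delete(",").to_i
  [match[1], first - 1, last]
end

# Hypothetical helper: every bin that can hold a feature overlapping
# the zero-based, half-open interval [chrom_start, chrom_end).
def overlapping_bins(chrom_start, chrom_end)
  start_bin = chrom_start >> BIN_FIRST_SHIFT
  end_bin   = (chrom_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT # move up to the next, coarser level
    end_bin   >>= BIN_NEXT_SHIFT
    bins
  end
end

chrom, chrom_start, chrom_end = parse_region("chr17:7,579,614-7,579,700")
puts [chrom, chrom_start, chrom_end].inspect      # => ["chr17", 7579613, 7579700]
puts overlapping_bins(chrom_start, chrom_end).inspect # => [642, 80, 9, 1, 0]
```

Restricting the query to these few bins lets the database skip most rows before it evaluates the coordinate comparisons.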
Details of Ruby UCSC API: please see the poster presentations
BOSC2012 #15
ISMB2012 #I06
FUTURE DIRECTION of BioGem
• QC by peer review is still important.
– ensures stability and quality of code and documents
– educates plug-in authors
• R/Bioconductor has an excellent peer-review system
– good coding style and well-formatted documents
– requires huge human resources and effort
Solutions would be…
• recommended collections
– Bio-Core (Raoul J.P. Bonnal)
• loose/casual peer review
• need to draw up guidelines for designing “good” biogems
Common challenge among Bio* projects:
Balance between lowering barrier to entry and keeping higher quality
ACKNOWLEDGMENTS
• All BioRuby contributors
• Ruby UCSC API
– Jan Aerts
• The BioRuby Panel
– Raoul Bonnal
– Naohisa Goto
– Francesco Strozzi
– Toshiaki Katayama
– Pjotr Prins
• Dept. of Human Genetics, Nagasaki Univ.
– Koh-ichiro Yoshiura
• Google Summer of Code students
• O|B|F – Open Bioinformatics Foundation
QUESTION?
or mishima_eng