H Mishima - Biogem, Ruby UCSC API, and BioRuby

Post on 11-Jun-2015

DESCRIPTION

Presentation by H Mishima at BOSC2012: Biogem, Ruby UCSC API, and BioRuby

TRANSCRIPT

Biogem, Ruby UCSC API, and BioRuby

Hiroyuki Mishima (Nagasaki University), Raoul J.P. Bonnal, Naohisa Goto, Francesco Strozzi, Toshiaki Katayama, Pjotr Prins

BioRuby
• a bioinformatics library for the Ruby language
• >11 years - project since Nov. 21, 2000

BioRuby

is an open-source project

BUT, I HAVE A QUESTION...

Are open source projects truly open?

Aspects of the word ‘OPEN’

•OPEN for redistribution

•OPEN for source code access

•OPEN for contribution

CENTRALIZED APPROACH
• Pros
– QC for stability and consistency
– easy to apply coding standard
– enables extensive tests and documentation
• Cons
– heavy burden on release managers
– longer process, sparser release
– lack of cutting-edge features

Two ways to participate in BioRuby development

1. Be a committer
   1. be a trusted contributor in the community
   2. get an open-bio.org account
   3. be a CVS/SVN committer

2. Send patches to (busy) core-members
   1. wait for patch evaluation
   2. wait for next release of BioRuby

BARRIERS TO ENTRY

Lower the barrier to entry!

Actions of BioRuby
• more OPEN for source code access
• more OPEN for contribution

Social Coding Using GitHub

In 2010, the BioRuby project source repository moved to GitHub

ACTION 1

• Users can fork the code freely.
• Users still have to wait for acceptance of pull-requests to get their code incorporated into the official repository.

ACTION 2

Plug-in system - BioGem

DECENTRALIZED APPROACH
• Enables expanding BioRuby without tweaking its stable core
• plug-ins are maintained by their authors
• encourage ‘best practice’ using a tool (biogem command)
– Standard directory structure
– version control using Git
– Using the RubyGems packaging system
– testing and documentation
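
The RubyGems packaging point can be illustrated with the specification object RubyGems uses to describe a gem. This is only a minimal sketch and not the output of the biogem command: the plug-in name `bio-example`, its version, author, file list, and the version constraint on the `bio` gem are all made up for illustration. It shows how a plug-in can declare the BioRuby core as a dependency instead of modifying it.

```ruby
require "rubygems"

# Hypothetical plug-in metadata; every value below is an illustrative assumption.
spec = Gem::Specification.new do |s|
  s.name    = "bio-example"
  s.version = "0.1.0"
  s.summary = "Example BioRuby plug-in"
  s.authors = ["Plug-in Author"]
  s.files   = ["lib/bio-example.rb"]
  # The plug-in depends on the BioRuby core gem rather than changing it.
  s.add_dependency "bio", ">= 1.4.2"
end

puts spec.full_name
puts spec.dependencies.map(&:name).inspect
```

Because the core is only referenced as a dependency, the plug-in author can release new versions on their own schedule.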

The Biogems workflow

Biogems.info – a portal site for Biogem users

rank in total downloads (rank up & down), citation, current version, date of latest release, links to source code, status of Travis continuous integration

highly motivating (me)

Database / web-service API: bio ucsc api, intermine, eutils, sequenceserver, goruby, bio ensembl

Wrapper: bio samtools, bio logger, bio bwa, bio signalp, bio sge, bio exportpred, bio tabix

Application: scaffolder, genfrag, bio isoelectric point, bio phyta, bio tm hmm, dna sequence aligner, bio gag, bio kmer counter

File Parser: bio gff3, bio assembly, bio blastxmlparser, bio faster, bio alignment, bio nexml, bio kb illumina, bio octopus, bio affy, bio dbsnp, bio rdf, bio hmmer model, bio hmmer3 report, bio pileup iterator, bio phyloxml

Visualization: bio graphics

Framework: bio ngs

Toolbox: bio genomic interval, bio bigbio, bio hello, bio plasmoap, bio cnls screenscraper, bio data, bio aliphatic index, bio hydropathy, bio gngm

Biogem Example: bio hello

Biogem Collection: bio core

more than 60 Biogems...

• Database Access-related
• Next Generation Sequencing-related

Hiro Mishima
• NOT a core developer of BioRuby
• not a computer scientist but a dentist
• semi-dry biologist
• human geneticist

BioGem is lowering barriers to entry

Ruby UCSC API

>40,000 tables!

$ gem install bio-ucsc-api

How to get started

EASY!

require 'bio-ucsc'
Bio::Ucsc::Hg19.connect
result = Bio::Ucsc::Hg19::Snp131.find_by_name("rs56289060")
puts result.chrom # => "chr1"

A query written in fluent interface.

region = "chr17:7,579,614-7,579,700"
condition = Bio::Ucsc::Hg19::Snp131.with_interval(region).select(:name)
puts condition.to_sql
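
To show what "fluent interface" means here, the following is a minimal sketch of a chainable query object. It is not the bio-ucsc-api implementation: the class `SketchQuery` and its `where` method are hypothetical, and only the idea is kept, namely that each call returns a new query object and SQL text is produced only when `to_sql` is called.

```ruby
# Hypothetical, simplified query object for illustration only.
class SketchQuery
  def initialize(table, columns = ["*"], conditions = [])
    @table = table
    @columns = columns
    @conditions = conditions
  end

  # Each chainable method returns a new object, so calls can be strung together.
  def select(*columns)
    SketchQuery.new(@table, columns.map(&:to_s), @conditions)
  end

  def where(condition)
    SketchQuery.new(@table, @columns, @conditions + [condition])
  end

  # SQL is assembled only when it is asked for.
  def to_sql
    sql = "SELECT #{@columns.join(', ')} FROM `#{@table}`"
    sql += " WHERE #{@conditions.join(' AND ')}" unless @conditions.empty?
    sql
  end
end

query = SketchQuery.new("snp131").where("chrom = 'chr17'").select(:name)
puts query.to_sql # => SELECT name FROM `snp131` WHERE chrom = 'chr17'
```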

SQL made easy

SELECT name FROM `snp131`
WHERE (chrom = 'chr17'
  AND bin in (642,80,9,1,0)
  AND ( (chromStart BETWEEN 7579613 AND 7579700)
     OR (chromEnd BETWEEN 7579613 AND 7579700)
     OR (chromStart <= 7579613 AND chromEnd >= 7579700) ));
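
The `bin in (642,80,9,1,0)` clause comes from the UCSC binning scheme, which lets the database narrow the search to a few index bins before comparing coordinates. The sketch below recomputes those bin numbers for the region queried above. It assumes the standard UCSC scheme (five levels, smallest bins of 2^17 bases, each level 8 times larger) and is not taken from the bio-ucsc-api source; `parse_region` and `overlapping_bins` are hypothetical helper names.

```ruby
# Standard UCSC binning constants (assumed; the gem may compute bins differently).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # smallest bins cover 2**17 = 131,072 bases
BIN_NEXT_SHIFT = 3   # each level up is 8 times larger

# Turn "chr17:7,579,614-7,579,700" (1-based) into [chrom, 0-based start, end].
def parse_region(region)
  m = region.delete(",").match(/\A(\w+):(\d+)-(\d+)\z/)
  raise ArgumentError, "unrecognized region: #{region}" if m.nil?
  [m[1], m[2].to_i - 1, m[3].to_i]
end

# List every bin, from finest to coarsest level, that can hold an
# item overlapping the half-open interval [chrom_start, chrom_end).
def overlapping_bins(chrom_start, chrom_end)
  start_bin = chrom_start >> BIN_FIRST_SHIFT
  end_bin = (chrom_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin >>= BIN_NEXT_SHIFT
    bins
  end
end

chrom, chrom_start, chrom_end = parse_region("chr17:7,579,614-7,579,700")
p [chrom, chrom_start, chrom_end]            # => ["chr17", 7579613, 7579700]
p overlapping_bins(chrom_start, chrom_end)   # => [642, 80, 9, 1, 0]
```

The 1-based start 7,579,614 becomes 7579613 because UCSC tables store zero-based, half-open coordinates, which matches the numbers in the generated SQL.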

Details of Ruby UCSC API:

Please find poster presentations
• BOSC2012 #15
• ISMB2012 #I06

FUTURE DIRECTION of BioGem
• Still QC by peer-review is important.
– ensures stability and quality of codes and documents
– educates plug-in authors
• R/Bioconductor has excellent peer-review system
– good coding style and well-formatted document
– requires huge human resources and efforts

• recommended collections
• Bio-Core (Raoul J.P. Bonnal)
• loose/casual peer-review
• need to draw up guidelines for designing “good” biogems

Solutions would be…

Common challenge among Bio* projects:

Balance between lowering barrier to entry and keeping higher quality

ACKNOWLEDGMENTS
• All BioRuby contributors
• Ruby UCSC API
– Jan Aerts
• The BioRuby Panel
– Raoul Bonnal
– Naohisa Goto
– Francesco Strozzi
– Toshiaki Katayama
– Pjotr Prins
• Dept. of Human Genetics, Nagasaki Univ.
– Koh-ichiro Yoshiura
• Google Summer of Code students
• O|B|F – Open Bioinformatics Foundation

QUESTION?

or mishima_eng
