H Mishima - Biogem, Ruby UCSC API, and BioRuby


Upload: jan-aerts

Post on 11-Jun-2015


DESCRIPTION

Presentation by H Mishima at BOSC2012: Biogem, Ruby UCSC API, and BioRuby

TRANSCRIPT

Page 1: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Biogem,

Ruby UCSC API,

and BioRuby

Hiroyuki Mishima (Nagasaki University), Raoul J.P. Bonnal, Naohisa Goto, Francesco Strozzi, Toshiaki Katayama, Pjotr Prins

Page 2: H Mishima - Biogem, Ruby UCSC API, and BioRuby

BioRuby

• a bioinformatics library for the Ruby language

• >11 years - project since Nov. 21, 2000

Page 3: H Mishima - Biogem, Ruby UCSC API, and BioRuby

BioRuby

is an open-source project

BUT, I HAVE A QUESTION...

Page 4: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Are open source projects truly open?

Page 5: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Aspects of the word ‘OPEN’

•OPEN for redistribution

•OPEN for source code access

•OPEN for contribution

Page 6: H Mishima - Biogem, Ruby UCSC API, and BioRuby

CENTRALIZED APPROACH

• Pros

– QC for stability and consistency
– easy to apply coding standard
– enables extensive tests and documentation

• Cons

– heavy burden on release managers
– longer process, sparser release
– lack of cutting-edge features

Page 7: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Two ways to participate in BioRuby development

1. Be a committer
   1. be a trusted contributor in the community
   2. get an open-bio.org account
   3. be a CVS/SVN committer

2. Send patches to (busy) core-members
   1. wait for patch evaluation
   2. wait for next release of BioRuby

Page 8: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Two ways to participate in BioRuby development

1. Be a committer
   1. be a trusted contributor in the community
   2. get an open-bio.org account
   3. be a CVS/SVN committer

2. Send patches to (busy) core-members
   1. wait for patch evaluation
   2. wait for next release of BioRuby

BARRIERS TO ENTRY

Page 9: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Lower the barrier to entry!

Page 10: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Actions of BioRuby

• more OPEN for source code access

• more OPEN for contribution

Page 11: H Mishima - Biogem, Ruby UCSC API, and BioRuby

ACTION 1

Social Coding Using GitHub

In 2010, the BioRuby project source repository moved to GitHub.

Page 12: H Mishima - Biogem, Ruby UCSC API, and BioRuby

• Users can fork the code freely.

• Users still have to wait for acceptance of pull-requests to get their code incorporated into the official repository.

Page 13: H Mishima - Biogem, Ruby UCSC API, and BioRuby

ACTION 2

Plug-in system - BioGem

Page 14: H Mishima - Biogem, Ruby UCSC API, and BioRuby

DECENTRALIZED APPROACH

• Enables expanding BioRuby without tweaking its stable core

• plug-ins are maintained by their authors

• encourage ‘best practice’ using a tool (biogem command)

– Standard directory structure
– version control using Git
– Using the RubyGems packaging system
– testing and documentation
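To make the "RubyGems packaging system" point concrete, here is a minimal sketch of how a plug-in could describe itself as a gem using the standard `Gem::Specification` API. The plug-in name `bio-example`, the author, the file list, and the `bio` version constraint are all hypothetical values chosen for illustration; this is not the output of the biogem command.

```ruby
require "rubygems"

# Hypothetical plug-in description; every value below is an illustrative
# assumption, not something generated by the biogem command.
spec = Gem::Specification.new do |s|
  s.name    = "bio-example"
  s.version = "0.0.1"
  s.summary = "Example BioRuby plug-in"
  s.authors = ["A. Plugin Author"]
  s.license = "MIT"
  # A conventional layout: library code under lib/, tests under test/.
  s.files   = ["lib/bio-example.rb", "test/test_bio-example.rb", "README.md"]
  # The plug-in depends on the BioRuby core gem instead of modifying it.
  s.add_runtime_dependency "bio", ">= 1.4.2"
end

puts spec.full_name
puts spec.runtime_dependencies.map(&:to_s)
```

Declaring the BioRuby core as a runtime dependency is what lets a plug-in extend BioRuby while its author keeps control of the release schedule.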

Page 15: H Mishima - Biogem, Ruby UCSC API, and BioRuby

The Biogems workflow

Page 16: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Biogems.info

Biogems.info – a portal site for Biogem users

rank in total downloads (rank up & down), citation, current version, date of latest release, links to source code, status of Travis continuous integration

highly motivating (me)

Page 17: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Database / web-service API: bio ucsc api, intermine, eutils, sequenceserver, goruby, bio ensembl

Wrapper: bio samtools, bio logger, bio bwa, bio signalp, bio sge, bio exportpred, bio tabix

Application: scaffolder, genfrag, bio isoelectric point, bio phyta, bio tm hmm, dna sequence aligner, bio gag, bio kmer counter

File Parser: bio gff3, bio assembly, bio blastxmlparser, bio faster, bio alignment, bio nexml, bio kb illumina, bio octopus, bio affy, bio dbsnp, bio rdf, bio hmmer model, bio hmmer3 report, bio pileup iterator, bio phyloxml

Visualization: bio graphics

Framework: bio ngs

Toolbox: bio genomic interval, bio bigbio, bio hello, bio plasmoap, bio cnls screenscraper, bio data, bio aliphatic index, bio hydropathy, bio gngm

Biogem Example: bio hello

Biogem Collection: bio core

more than 60 Biogems...

Page 18: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Database / web-service API: bio ucsc api, intermine, eutils, sequenceserver, goruby, bio ensembl

Wrapper: bio samtools, bio logger, bio bwa, bio signalp, bio sge, bio exportpred, bio tabix

Application: scaffolder, genfrag, bio isoelectric point, bio phyta, bio tm hmm, dna sequence aligner, bio gag, bio kmer counter

File Parser: bio gff3, bio assembly, bio blastxmlparser, bio faster, bio alignment, bio nexml, bio kb illumina, bio octopus, bio affy, bio dbsnp, bio rdf, bio hmmer model, bio hmmer3 report, bio pileup iterator, bio phyloxml

Visualization: bio graphics

Framework: bio ngs

Toolbox: bio genomic interval, bio bigbio, bio hello, bio plasmoap, bio cnls screenscraper, bio data, bio aliphatic index, bio hydropathy, bio gngm

Biogem Example: bio hello

Biogem Collection: bio core

Database Access-related

Next Generation Sequencing-related

Page 19: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Hiro Mishima

• NOT a core developer of BioRuby

• not a computer scientist but a dentist

• semi-dry biologist

• human geneticist

BioGem is lowering barriers to entry

Page 20: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Ruby UCSC API

Page 21: H Mishima - Biogem, Ruby UCSC API, and BioRuby

>40,000 tables!

Page 22: H Mishima - Biogem, Ruby UCSC API, and BioRuby

How to get started

$ gem install bio-ucsc-api

EASY!

Page 23: H Mishima - Biogem, Ruby UCSC API, and BioRuby

require 'bio-ucsc'
Bio::Ucsc::Hg19.connect
result = Bio::Ucsc::Hg19::Snp131.find_by_name("rs56289060")
puts result.chrom # => "chr1"

A query written in fluent interface.

Page 24: H Mishima - Biogem, Ruby UCSC API, and BioRuby

region = "chr17:7,579,614-7,579,700"
condition = Bio::Ucsc::Hg19::Snp131.with_interval(region).select(:name)
puts condition.to_sql

SQL made easy

SELECT name FROM `snp131`
WHERE (chrom = 'chr17'
  AND bin in (642,80,9,1,0)
  AND ( (chromStart BETWEEN 7579613 AND 7579700)
     OR (chromEnd BETWEEN 7579613 AND 7579700)
     OR (chromStart <= 7579613 AND chromEND >= 7579700) ));
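The numbers in the generated SQL can be reproduced by hand. The sketch below assumes the standard UCSC binning scheme (five bin levels, the smallest covering 128 kb, each level eight times larger than the one below) and a conversion from the 1-based region string to a 0-based start. The method names `parse_region` and `overlapping_bins` are hypothetical and are not part of bio-ucsc-api. The sketch shows where the values come from and is not a description of the gem's internals.

```ruby
# Offset of the first bin at each level, finest (128 kb) to coarsest (512 Mb).
BIN_OFFSETS     = [585, 73, 9, 1, 0].freeze
BIN_FIRST_SHIFT = 17 # 2**17 bases = 128 kb, the smallest bin size
BIN_NEXT_SHIFT  = 3  # each level is 2**3 = 8 times larger

# "chr17:7,579,614-7,579,700" -> ["chr17", 7579613, 7579700]
# (1-based inclusive input, 0-based start / 1-based end output)
def parse_region(region)
  chrom, range = region.split(":")
  first, last = range.delete(",").split("-").map { |s| Integer(s) }
  [chrom, first - 1, last]
end

# All bins, at every level, that can hold a feature overlapping the interval.
def overlapping_bins(zero_start, one_end)
  start_bin = zero_start >> BIN_FIRST_SHIFT
  end_bin   = (one_end - 1) >> BIN_FIRST_SHIFT
  BIN_OFFSETS.flat_map do |offset|
    bins = ((offset + start_bin)..(offset + end_bin)).to_a
    start_bin >>= BIN_NEXT_SHIFT
    end_bin   >>= BIN_NEXT_SHIFT
    bins
  end
end

chrom, zero_start, one_end = parse_region("chr17:7,579,614-7,579,700")
puts "#{chrom} #{zero_start} #{one_end}"
puts overlapping_bins(zero_start, one_end).inspect
```

For the region on the slide this gives a start of 7579613 and the bins 642, 80, 9, 1 and 0, matching the `bin in (...)` clause. Restricting the query to those bins lets the database use the bin index before it tests the exact coordinates.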

Page 25: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Details of Ruby UCSC API:

Please find poster presentations

BOSC2012 #15
ISMB2012 #I06

Page 26: H Mishima - Biogem, Ruby UCSC API, and BioRuby

FUTURE DIRECTION of BioGem

• QC by peer review is still important.

– ensures stability and quality of code and documentation
– educates plug-in authors

• R/Bioconductor has an excellent peer-review system

– good coding style and well-formatted documentation
– requires huge human resources and effort

Page 27: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Solutions would be…

• recommended collections

– Bio-Core (Raoul J.P. Bonnal)

• loose/casual peer-review

• need to draw up guidelines for designing “good” biogems

Page 28: H Mishima - Biogem, Ruby UCSC API, and BioRuby

Common challenge among Bio* projects:

Balance between lowering the barrier to entry and keeping quality high

Page 29: H Mishima - Biogem, Ruby UCSC API, and BioRuby

ACKNOWLEDGMENTS

• All BioRuby contributors

• Ruby UCSC API
– Jan Aerts

• The BioRuby Panel
– Raoul Bonnal
– Naohisa Goto
– Francesco Strozzi
– Toshiaki Katayama
– Pjotr Prins

• Dept. of Human Genetics, Nagasaki Univ.
– Koh-ichiro Yoshiura

• Google Summer of Code students

• O|B|F – Open Bioinformatics Foundation

Page 30: H Mishima - Biogem, Ruby UCSC API, and BioRuby

QUESTIONS?

or mishima_eng