Biopython Project Update (BOSC 2012)
DESCRIPTION
Highlights of the Biopython project for computational biology, 2011-2012: Artemis-like genome track comparison with GenomeDiagram, new formats for SeqIO, phylogenetics with Bio.Phylo, Bio.PDB improvements, and an update on Google Summer of Code (GSoC) projects.
TRANSCRIPT
Project Update
Eric Talevich, Peter Cock, Brad Chapman, João Rodrigues,
and Biopython contributors
Bioinformatics Open Source Conference (BOSC)
July 14, 2012
Long Beach, California, USA
Hello, BOSC
Biopython is a freely available Python library for biological computation, and a long-running, distributed collaboration to produce and maintain it [1].
● Supported by the Open Bioinformatics Foundation (OBF)
● "This is Python's Bio* library. There are several Bio* libraries like it, but this one is ours."
● http://biopython.org/
_____
[1] Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163
Bio.Graphics (Biopython 1.59, February 2012)
New features in...
BasicChromosome:
● Draw simple sub-features on chromosome segments
● Show the position of genes, SNPs or other loci
GenomeDiagram [2]:
● Cross-links between tracks
● Track-specific start/end positions for showing regions
_____
[2] Pritchard, L., White, J.A., Birch, P.R., Toth, I. (2006) GenomeDiagram: a python package for the visualization of large-scale genomic data. Bioinformatics 22(5) 616-7. doi:10.1093/bioinformatics/btk021
BasicChromosome: Potato NB-LRRs
Jupe et al. (2012) BMC Genomics
GenomeDiagram: A tale of three phages
Swanson et al. (2012) PLoS One (to appear)
GenomeDiagram imitates Artemis Comparison Tool (ACT)
SeqIO and AlignIO (Biopython 1.58, August 2011)
● SeqXML format [3]
● Read support for ABI chromatogram files (Wibowo A.)
● "phylip-relaxed" format (Connor McCoy, Brandon I.)
○ Relaxes the 10-character limit on taxon names
○ Space-delimited instead
○ Used in RAxML, PhyML, PAML, etc.
_____
[3] Schmitt et al. (2011) SeqXML and OrthoXML: standards for sequence and orthology information. Briefings in Bioinformatics 12(5): 485-488. doi:10.1093/bib/bbr025
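A minimal sketch of the "phylip-relaxed" format in use, assuming a recent Biopython install. The taxon names and sequences are invented for illustration. It writes a small alignment to an in-memory handle and reads it back, showing that names longer than 10 characters survive the round trip.

```python
from io import StringIO

from Bio import AlignIO
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical records; both taxon names exceed the strict 10-character limit.
records = [
    SeqRecord(Seq("ACGTACGTAA"), id="Homo_sapiens_chr1"),
    SeqRecord(Seq("ACGTTCGTAA"), id="Pan_troglodytes_chr1"),
]
alignment = MultipleSeqAlignment(records)

# Write in the relaxed (space-delimited) PHYLIP variant.
handle = StringIO()
AlignIO.write(alignment, handle, "phylip-relaxed")
phylip_text = handle.getvalue()

# Read it back and collect the taxon names.
round_trip = AlignIO.read(StringIO(phylip_text), "phylip-relaxed")
ids = [record.id for record in round_trip]
print(phylip_text)
```

Passing "phylip" instead of "phylip-relaxed" selects the strict variant, which is where the 10-character name limit applies.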
Bio.Phylo & pypaml
● PAML interop: wrappers, I/O, glue
○ Merged Brandon Invergo’s pypaml as Bio.Phylo.PAML (Biopython 1.58, August 2011)
● Phylo.draw improvements
● RAxML wrapper (Biopython 1.60, June 2012)
● Paper in review [4]
_____
[4] Talevich, E., Invergo, B.M., Cock, P.J.A., Chapman, B.A. (2012) Bio.Phylo: a unified toolkit for processing, analysis and visualization of phylogenetic data in Biopython. BMC Bioinformatics 13:209. doi:10.1186/1471-2105-13-209
Phylo.draw and matplotlib
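As a rough illustration of the Bio.Phylo workflow behind this slide, the sketch below parses a tree from a Newick string and inspects it. The tree itself is made up for the example. Phylo.draw needs matplotlib and a display, so the runnable part uses the text-only Phylo.draw_ascii and leaves the matplotlib call as a comment.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical three-taxon tree with branch lengths.
newick = "((A:1.0,B:1.0):1.0,C:2.0);"
tree = Phylo.read(StringIO(newick), "newick")

# Basic queries on the tree object.
leaf_names = [leaf.name for leaf in tree.get_terminals()]
n_leaves = tree.count_terminals()

# Sum of branch lengths on the path between tips A and C.
dist_a_c = tree.distance({"name": "A"}, {"name": "C"})

# Plain-text rendering; works without matplotlib.
Phylo.draw_ascii(tree)

# With matplotlib installed, the graphical equivalent would be:
# Phylo.draw(tree)
```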
Bio.bgzf (Blocked GNU Zip Format)
● BGZF is a GZIP variant that compresses data in independent blocks of at most 64 KB each
● Used in Next Generation Sequencing for efficient random access to compressed files
○ SAM + BGZF = BAM
Bio.SeqIO can now index BGZF compressed sequence files. (Biopython 1.60, June 2012)
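The sketch below shows one way this can look in practice, assuming a recent Biopython install. The file name and sequences are invented for the example. It writes a small BGZF-compressed FASTA file, indexes it with SeqIO.index, and then shows how a BGZF "virtual offset" packs a block start and a within-block position into one integer.

```python
import os
import tempfile

from Bio import SeqIO, bgzf

# Write a tiny FASTA file with BGZF compression (hypothetical data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.fasta.bgz")
writer = bgzf.BgzfWriter(path, "wb")
writer.write(b">seq1\nACGTACGT\n>seq2\nTTGGCCAA\n")
writer.close()

# SeqIO.index detects BGZF and supports random access by record ID.
index = SeqIO.index(path, "fasta")
ids = sorted(index)
seq2 = str(index["seq2"].seq)
index.close()

# A virtual offset combines the compressed block's start position in the
# file with an offset inside the decompressed block.
voffset = bgzf.make_virtual_offset(100000, 10)
block_start, within_block = bgzf.split_virtual_offset(voffset)
```

Only the block containing the requested record has to be decompressed, which is what makes random access into the compressed file cheap.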
TogoWS (Biopython 1.59, February 2012)
● TogoWS is an integrated web resource for bioinformatics databases and services
● Provided by the Database Center for Life Science in Japan
● Usage is similar to NCBI Entrez
_____
http://togows.dbcls.jp/
PyPy and Python 3
Biopython:
● works well on PyPy 1.9 (excluding NumPy & C extensions)
● works on Python 3 (excluding some C extensions), but concerns remain about performance in default unicode mode.
○ Currently 'beta' level support.
Bio.PDB
● mmCIF parser restored (Biopython 1.60, June 2012)
○ Lenna Peterson fixed a 4-year-old lex/yacc-related compilation issue
○ That was awesome
○ Now she's a GSoC student
○ Py3/PyPy/Jython compatibility in progress
● Merging GSoC results incrementally
○ Atom element names & weights (João Rodrigues, GSoC 2010)
○ Lots of feature branches remaining...
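To illustrate the atom element and weight support mentioned above, here is a small sketch assuming a recent Biopython with NumPy installed. The atoms are built by hand with made-up coordinates purely for the example. In normal use they would come from a parsed structure.

```python
import numpy as np

from Bio.PDB.Atom import Atom

# Hand-built atoms (hypothetical coordinates, B-factor and occupancy).
# Arguments: name, coord, bfactor, occupancy, altloc, fullname, serial, element
ca = Atom("CA", np.array([0.0, 0.0, 0.0]), 20.0, 1.0, " ", " CA ", 1, element="C")
n = Atom("N", np.array([1.46, 0.0, 0.0]), 20.0, 1.0, " ", " N  ", 2, element="N")

# Each atom carries its element symbol and the matching atomic weight.
for atom in (ca, n):
    print(atom.get_name(), atom.element, atom.mass)
```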
Bio.PDB feature branches
[Figure: timeline of Bio.PDB feature branches from 2010 to 2012 and beyond: GSoC, mmCIF Parser, Bio.Struct, InterfaceAnalysis, Mocapy++, Generic Features, PDBParser]
Google Summer of Code (GSoC)
In 2011, Biopython had three projects funded via the OBF:
● Mikael Trellet (Bio.PDB)
● Michele Silva (Bio.PDB, Mocapy++)
● Justinas Daugmaudis (Mocapy++)
In 2012, we have two projects via the OBF:
● Wibowo Arindrarto (SearchIO)
● Lenna Peterson (Variants)
_____
http://biopython.org/wiki/Google_Summer_of_Code
http://www.open-bio.org/wiki/Google_Summer_of_Code
https://www.google-melange.com/
GSoC 2011: Mikael Trellet
Biomolecular interfaces in Bio.PDB
Mentor: João Rodrigues
● Representation of protein-protein interfaces: SM(I)CRA
● Determining interfaces from PDB coordinates
● Analyses of these objects
_____
http://biopython.org/wiki/GSoC2011_mtrellet
GSoC 2011: Michele Silva
Python/Biopython bindings for Mocapy++
Mentor: Thomas Hamelryck
Michele Silva wrote a Python bridge for Mocapy++ and linked it to Bio.PDB to enable statistical analysis of protein structures.
More-or-less ready to merge after the next Mocapy++ release.
_____
http://biopython.org/wiki/GSOC2011_Mocapy
GSoC 2011: Justinas Daugmaudis
Mocapy extensions in Python
Mentor: Thomas Hamelryck
A complementary project: a plugin system for Mocapy++ that lets users easily write new nodes (probability distribution functions) in Python.
He's finishing this as part of his master's thesis project with Thomas Hamelryck.
_____
http://biopython.org/wiki/GSOC2011_MocapyExt
GSoC 2012: Lenna Peterson
Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
Mentors: Brad Chapman, James Casbon
● I/O for VCF, GVF formats
● Internal schema for variant data
_____
http://arklenna.tumblr.com/tagged/gsoc2012
GSoC 2012: Wibowo Arindrarto
SearchIO implementation in Biopython
Mentor: Peter Cock
Unified, BioPerl-like API for search results from BLAST, HMMer, FASTA, etc.
_____
http://biopython.org/wiki/SearchIO
http://bow.web.id/blog/tag/gsoc/
Thanks
● OBF
● BOSC organizers
● Biopython contributors
● Scientists like you
Check us out:
● Website: http://biopython.org
● Code: https://github.com/biopython/biopython