
Posted on 18-Dec-2015


1

why to become a Pyologist

Perl is for plumbers – Python is for biologists

Stefan Maetschke, Teasdale Group

2

why

Biologists suffer for no good reason:
• Perl is difficult to write and read
• Perl gives weak error feedback
• Perl obscures basic concepts
• Limited understanding of principles
• Low productivity
• Reduced research scope

Perl is for plumbers – Python is for scientists
I want to have an easy life

why, why, why …

3

plumbers and others

spectrum of tasks, tools and roles:

sys admin: plumbing, vi, awk/Perl, grep/diff
SW developer: designing, Emacs/IDE, C/C++/Java, UML/unit tests
scientist: Python

4

equals(Python, Perl)

Cross-platform, open-source, scripting language, multi-paradigm, dynamic typing, statement ratio (relative to C): 6

Python: "There should be one way", Guido van Rossum, 1991, easy
Perl: "There's more than one way", Larry Wall, 1987, difficult

5

you must be joking!

http://www.strombergers.com/python/

my @list = ('a', 'b', 'c');
my %hash;
$hash{'letters'} = \@list;
print "@{$hash{'letters'}}\n";

list = ['a', 'b', 'c']
hash = {}
hash['letters'] = list
print(hash['letters'])

package Person;
use strict;

sub new {
    my $class = shift;
    my $age = shift or die "Must pass age";
    my $rSelf = {'age' => $age};
    bless($rSelf, $class);
    return $rSelf;
}

class Person:
    def __init__(self, age):
        self.age = age

@list = ( ['a', 'b', 'c'], [1, 2, 3] );
print "@{$list[0]}\n";

list = [['a', 'b', 'c'], [1, 2, 3]]
print(list[0])

6

More Perl bashing…

http://www.strombergers.com/python/

sub add { $_[0] + $_[1]; }

def add(a, b): return a + b

sub add { my ($a, $b) = @_; return $a + $b; }

sub add { my $a = shift; my $b = shift; return $a + $b; }

def diff(a, b): return len(a) - len(b)

sub diff {
    my ($aref, $bref) = @_;
    my @a = @$aref;
    my @b = @$bref;
    return scalar(@a) - scalar(@b);
}

sub add($$) { local ($a, $b) = @_; return $a + $b; }

7

complexity wall

simple scripts

≈ 100 lines => fun stops

Higher order concepts

Data structures, Functions, Classes

=> Python allows you to break through the complexity wall

everything you can do in Python you can do in Perl, but you don't

8

googliness

Google hits for "X language", "X load file" and "X bioinformatics", where X is the language name (kilo-hits, May 2008):

Language    X language   X load file   X bioinformatics
C               53,000         1,820                572
Java             7,760         2,890                320
C++              1,290         3,100                231
C#               1,020           794                161
Perl             1,150           685                101
Python             527           798                199
Ruby               470           806                186
Scala              394           354                 69
Haskell            212           323                 74

9

and the winner is…

[chart: benchmark comparison of languages; the Python result is shown without Psyco]

http://shootout.alioth.debian.org/

10

damn lies and stats

http://rengelink.textdriven.com/blog/

SourceForge projects

Perl declining, Python increasing? May 2008 keyword search: Perl 3,474 projects, Python 4,063 projects

11

see the light…

classify Iris plants

Fisher, R.A. "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7, Part II, 179-188 (1936)

http://archive.ics.uci.edu/ml/datasets/Iris

Three species:
• Iris setosa
• Iris versicolor
• Iris virginica

Four attributes (in cm):
• sepal length
• sepal width
• petal length
• petal width

12

Iris – convert data
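A minimal sketch of the conversion step in plain Python, assuming the comma-separated layout of the UCI `iris.data` file (four numeric attributes, then the species name, one plant per line). The function name `parse_iris` and the sample rows are illustrative, not taken from the slide.

```python
import csv
import io


def parse_iris(text):
    """Split iris.data-style text into numeric feature rows and species labels."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines (the UCI file ends with some)
            continue
        features.append([float(value) for value in row[:4]])  # 4 measurements
        labels.append(row[4])  # species name
    return features, labels


# Hypothetical sample in the iris.data layout
sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
features, labels = parse_iris(sample)
print(features[0], labels[0])  # [5.1, 3.5, 1.4, 0.2] Iris-setosa
```

To load the real file, pass its contents in, e.g. `parse_iris(open("iris.data").read())`.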

13

Iris – correlation
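A sketch of the correlation step, computing Pearson's coefficient r between two attributes with the standard library only. The helper `pearson` and the short value lists are hypothetical illustrations, not the slide's code or the full dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # covariance and variances, all without the common 1/(n-1) factor
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical petal measurements (cm)
petal_length = [1.4, 4.7, 6.0, 1.3, 4.5]
petal_width = [0.2, 1.4, 2.5, 0.2, 1.5]
print(round(pearson(petal_length, petal_width), 3))
```

With NumPy available, `numpy.corrcoef(features, rowvar=False)` returns the 4×4 matrix of pairwise Pearson coefficients for all attributes in one call.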

14

Iris – do stats
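A sketch of per-species summary statistics (mean and sample standard deviation of one attribute) using the standard `statistics` module. `stats_by_species` and the nine sample values are hypothetical; `stdev` needs at least two values per species.

```python
from collections import defaultdict
from statistics import mean, stdev


def stats_by_species(values, labels):
    """Group one attribute by species and return (mean, sample stdev) per group."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return {label: (mean(vals), stdev(vals)) for label, vals in groups.items()}


# Hypothetical petal lengths (cm), three plants per species
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
species = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3

for label, (m, s) in stats_by_species(petal_length, species).items():
    print(f"{label:16s} mean={m:.2f} sd={s:.2f}")
```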

15

Iris – linear regression
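A sketch of the regression step as an ordinary least-squares fit of petal width on petal length, y = intercept + slope·x. The choice of attributes, the helper `linear_fit` and the sample values are assumptions for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # spread of x around its mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept


# Hypothetical petal measurements (cm)
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
petal_width = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]
slope, intercept = linear_fit(petal_length, petal_width)
print(f"petal width ~ {intercept:.3f} + {slope:.3f} * petal length")
```

`numpy.polyfit(x, y, 1)` returns the same slope and intercept (highest power first), and `scipy.stats.linregress` additionally reports r and a p-value.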

16

Iris – plot data

17

libs for life science
Scientific computing: SciPy, NumPy, matplotlib
Bioinformatics: Biopython
Phylogenetic trees: Mavric, Plone, P4, Newick
Microarrays: SciGraph, CompClust
Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib
Dynamic systems modeling: PyDSTool
Protein structure visualization: PyMOL, UCSF Chimera
Networks/Graphs: NetworkX, PyGraphviz
Symbolic math: SymPy, Sage
Wrapper for C/C++ code: SWIG, Pyrex, Cython
R/S-PLUS interface: RSPython, RPy
Java interface: Jython
Fortran to Python: F2PY
…

Also check out: http://www.scipy.org/Topical_Software and http://pypi.python.org/pypi

18

last words

Perl: perfect for plumbing
Python: excellent for scientific programming
• Easy to learn, write and maintain
• Suited for scripting and mid-size projects
• Huge number of scientific libraries
• An attractive alternative to Matlab/R
• Easy integration of Java, C/C++ or Fortran code

19

questions

Interest in a Python course?

isn’t Python lovely…

20

21

links
Wikipedia – Python: http://en.wikipedia.org/wiki/Python
Instant Python: http://hetland.org/writing/instant-python.html
How to think like a computer scientist: http://openbookproject.net//thinkCSpy/
Dive into Python: http://www.diveintopython.org/
Python course in bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html
Beginning Python for bioinformatics: http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html
SciPy Cookbook: http://www.scipy.org/Cookbook
Matplotlib Cookbook: http://www.scipy.org/Cookbook/Matplotlib
Biopython tutorial and cookbook: http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html
Huge collection of Python tutorials: http://www.awaretek.com/tutorials.html
What's wrong with Perl: http://www.garshol.priv.no/download/text/perl.html
20 Stages of Perl to Python conversion: http://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993
Why Python: http://www.linuxjournal.com/article/3882

22

some papers
Bassi S. (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
Mangalam H. (2002) The Bio* toolkits--a brief overview. Brief Bioinform 3(3):296-302.
Fourment M., Gillings M.R. (2008) A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9:82.

23

to whom it may concern

• NPs who don't use Perl yet
• NPs who want to see the light
• NPs who want to give their code away without being rightfully ashamed
• Matlab aficionados

NP = Non-Programmer

24

one of ten Perl myths: http://www.perl.com/pub/a/2000/01/10PerlMyths.html

“…Perl works the way you do…”

“…That's one, fairly natural way to think about it…”

while (<>) { s/(.*):(.*)/$2:$1/; print; }

Swap two sections of a string: “aaa:bbb” -> “bbb:aaa”

import fileinput

for line in fileinput.input():
    line = line.strip()
    first, second = line.split(':')
    print(second + ':' + first)

while (<>) { chomp; ($first, $second) = split /:/; print $second, ":", $first, "\n"; }

“…we can happily consign the idea that ‘Perl is hard’ to mythology.”

import fileinput
from re import sub

for line in fileinput.input():
    print(sub('(.*):(.*)', r'\2:\1', line.rstrip('\n')))

25

camel chaos
• does not scale well
• complex syntax
• cryptic commands
• does not encourage clear code
• difficult to read/maintain
• hard to understand the principles
• error prone
• no check of subroutine arguments
• variables are global by default
• …
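The last two points contrast directly with Python. A small illustrative sketch (the names `add`, `call_with_wrong_arity`, `counter` and `bump` are made up for this example): Python checks the number of arguments at call time and raises `TypeError`, and a name assigned inside a function is local unless it is declared `global`.

```python
def add(a, b):
    return a + b


def call_with_wrong_arity():
    """Return the message Python raises when an argument is missing."""
    try:
        add(1)  # one argument short
    except TypeError as err:
        return str(err)
    return "no error"


counter = 0


def bump():
    counter = 1  # creates a new local name; the module-level counter is untouched
    return counter


print(call_with_wrong_arity())
print(bump(), counter)  # the global stays 0
```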

26

why Python
• overcome the complexity wall
• many excellent scientific libraries
• clear, easy-to-learn syntax
• hard to do it wrong
• does not require prior suffering/experience

27

my bias
R&D: C/C++ -> applied ML in robotics, image processing, quality control
SW Development: Java -> speech processing, data mining
Computational Biology: Java, Python
Other languages I played with: Ada, APL, Basic, MatLab, Modula, Pascal, Perl, Prolog, R, Groovy, Forth, Fortran, Scala, Assembly code