Introduction to perl programming: the minimum to know!

Bioinformatic and Comparative Genome Analysis Course
HKU-Pasteur Research Centre - Hong Kong, China
August 17 - August 29, 2009

Fredj Tekaia, Institut Pasteur


perl

A basic program

#!/bin/perl

# Program to print a message

print 'Hello world.'; # Print a message

Variables, Arrays

$val = 9;
$val = "9";
$val = "ABC transporter";

• case sensitive: $val is different from $Val
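As a small illustration of the points above, the sketch below stores a number and a string in scalars and shows that $val and $Val are two separate variables. The values, the extra variable names ($sum, $text) and the "use strict; use warnings;" lines are additions for this example, not part of the original slides.

```perl
use strict;
use warnings;

my $val = 9;                  # a number
my $Val = "ABC transporter";  # a different variable: names are case sensitive

my $sum  = $val + 1;          # $val used as a number
my $text = "$val genes";      # $val interpolated into a string

print "$sum\n";    # prints 10
print "$text\n";   # prints 9 genes
print "$Val\n";    # prints ABC transporter
```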

Operations and Assignment

Perl uses arithmetic operators:

$a = 1 + 2;   # Add 1 and 2 and store in $a
$a = 3 - 4;   # Subtract 4 from 3 and store in $a
$a = 5 * 6;   # Multiply 5 and 6
$a = 7 / 8;   # Divide 7 by 8 to give 0.875
$a = 9 ** 10; # Nine to the power of 10
$a = 5 % 2;   # Remainder of 5 divided by 2
$a++;         # Return $a and then increment it
$a--;         # Return $a and then decrement it

For strings perl has, among others:

$a = $b . $c; # Concatenate $b and $c
$a = $b x $c; # $b repeated $c times
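The operators listed above can be tried with concrete values. This is only a sketch: the variable names and the values ("AT", 5, and so on) are chosen for illustration.

```perl
use strict;
use warnings;

my $quotient  = 7 / 8;     # 0.875
my $power     = 9 ** 10;   # 3486784401
my $remainder = 5 % 2;     # 1

my $count = 5;
my $old   = $count++;      # $old receives 5, then $count becomes 6

my $motif  = "AT";
my $joined = $motif . "G"; # concatenation gives ATG
my $repeat = $motif x 3;   # repetition gives ATATAT

print "$quotient $power $remainder\n";
print "$old $count\n";
print "$joined $repeat\n";
```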

To assign values perl includes:

$a = $b;  # Assign $b to $a
$a += $b; # Add $b to $a
$a -= $b; # Subtract $b from $a
$a .= $b; # Append $b onto $a
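A short sketch of the assignment operators in use, for example to keep a running total or to build a sequence piece by piece. The names $total and $seq and the values are hypothetical.

```perl
use strict;
use warnings;

my $total = 10;
$total += 5;      # $total is now 15
$total -= 3;      # $total is now 12

my $seq = "ATG";
$seq .= "GCC";    # $seq is now ATGGCC
$seq .= "TAA";    # $seq is now ATGGCCTAA

print "$total\n";
print "$seq\n";
```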

Array variables

An array variable is a list of scalars (i.e. numbers and/or strings).

Array variables are prefixed by: @

@SEQNAME = ("MG001", "MG002", "MG003");

$SEQNAME[2] # MG003

Attention: indices start at 0: 0, 1, 2, ...

@num = (0,1,2,3);
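The indexing rule can be checked with a few lines. This sketch reuses @SEQNAME from the slide; $last and $n are names introduced here. $#SEQNAME gives the index of the last element and scalar(@SEQNAME) gives the number of elements.

```perl
use strict;
use warnings;

my @SEQNAME = ("MG001", "MG002", "MG003");

my $first = $SEQNAME[0];       # MG001: the first element has index 0
my $third = $SEQNAME[2];       # MG003
my $last  = $#SEQNAME;         # 2: index of the last element
my $n     = scalar(@SEQNAME);  # 3: number of elements

print "$first $third $last $n\n";
```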

@L_CODONS = ('TTT','TTC','TTA','TTG', 'CTT','CTC','CTA','CTG', 'ATT','ATC','ATA','ATG', 'GTT','GTC','GTA','GTG',

'TCT','TCC','TCA','TCG', 'CCT','CCC','CCA','CCG', 'ACT','ACC','ACA','ACG', 'GCT','GCC','GCA','GCG',

'TAT','TAC','TAA','TAG', 'CAT','CAC','CAA','CAG', 'AAT','AAC','AAA','AAG', 'GAT','GAC','GAA','GAG',

'TGT','TGC','TGA','TGG', 'CGT','CGC','CGA','CGG', 'AGT','AGC','AGA','AGG', 'GGT','GGC','GGA','GGG');

@AA = ('A','R','N','D','C','Q','E','G','H','I','L','K','M','F','P','S','T','W','Y','V','B');

@mm = ('a','r','n','d','c','q','e','g','h','i','l','k','m','f','p','s','t','w','y','v','b');

Associative arrays : hash tables

Ordinary list arrays allow us to access their elements by number. The first element of array @AA is $AA[0]. The second element is $AA[1], and so on.

But perl also allows us to create arrays which are accessed by string. These are called associative arrays (hashes).

The associative array itself is prefixed by a % sign.

%ages = ("Michael", 39, "Angie", 27, "Willy", "21 years", "The Queen Mother", 108);

$ages{"Michael"};          # Returns 39
$ages{"Angie"};            # Returns 27
$ages{"Willy"};            # Returns "21 years"
$ages{"The Queen Mother"}; # Returns 108
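A runnable sketch of the same associative array. The built-in functions keys and sort are not covered in the slides; they are used here only to visit the entries in a fixed (alphabetical) order, since a hash itself has no defined order. The variable $report is introduced for the example.

```perl
use strict;
use warnings;

my %ages = ("Michael", 39, "Angie", 27, "Willy", "21 years",
            "The Queen Mother", 108);

my $angie = $ages{"Angie"};    # 27

# Visit every key in alphabetical order and collect "name: value" lines
my $report = "";
foreach my $name (sort keys %ages) {
    $report .= "$name: $ages{$name}\n";
}
print $report;
```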

File handling

A script (cat.pl) equivalent to the UNIX cat:

#!/bin/perl
open(FILE, "GMG.pep");
while (<FILE>)    # Read the file one line at a time into $_
{
    print $_;
}
close(FILE);

use: chmod a+x cat.pl ; ./cat.pl
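A variant of the same idea, given as a sketch rather than as the course's method: it uses a lexical filehandle, the three-argument form of open, and "or die" to report a file that cannot be opened. The file name example.pep and its two lines are hypothetical; the script writes the file itself so that it does not depend on GMG.pep being present.

```perl
use strict;
use warnings;

my $file = "example.pep";   # hypothetical file name, used only in this sketch

# Write two lines so the example is self-contained
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out ">MG001 example protein\n";
print $out "MKILINKSEL\n";
close($out);

# Read the file back line by line, as cat.pl does
my $nlines = 0;
open(my $in, '<', $file) or die "Cannot open $file: $!";
while (my $line = <$in>) {
    print $line;
    $nlines++;
}
close($in);

unlink $file;   # remove the example file
```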

split

A very useful function in perl: splits up a string and places it into an array.

#!/bin/perl
open(FILE, "GMG.pep");
while (<FILE>)
{
    @tab = split(/\s+/, $_);   # Split the line on whitespace
    print "$tab[0]\n";         # Print the first field
}
close(FILE);

@tab=split(/\s+/,$_,n); splits $_ into at most n fields.

#!/bin/perl
open(FILE, "GMG.pep");
while (<FILE>)
{
    @tab = split(/\s+/, $_, 2);   # First field, then the rest of the line
    $NOM{$tab[0]} = $tab[1];
    print $NOM{$tab[0]};
}
close(FILE);
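The effect of the third argument is easier to see on a single string. In this sketch the line content is hypothetical and chomp (not shown in the slides) removes the trailing newline before splitting.

```perl
use strict;
use warnings;

my $line = "MG001 DNA polymerase III\n";   # hypothetical input line
chomp($line);                              # remove the trailing newline

my @all = split(/\s+/, $line);      # ("MG001", "DNA", "polymerase", "III")
my @two = split(/\s+/, $line, 2);   # ("MG001", "DNA polymerase III")

my %NOM;
$NOM{$two[0]} = $two[1];            # name => description

print scalar(@all), "\n";           # prints 4
print "$NOM{'MG001'}\n";            # prints DNA polymerase III
```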

Control structures

foreach

To go through each element of an array or other list-like structure (such as lines in a file) perl uses the foreach structure. This has the form

foreach $nom (@SEQNAME) # Visit each item in turn
                        # and call it $nom
{
    print "$nom\n";     # Print the item
}

foreach $j ( 0 .. 2)    # Visit each value in turn
                        # and call it $j
{
    print "$SEQNAME[$j]\n"; # Print the item
}

foreach $j ( 0 .. $#AA) # Visit each value in turn
                        # and call it $j
{
    print "$AA[$j]\n";  # Print the item
}
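Both forms of foreach, by element and by index, can be compared in one small script. This is a sketch with a shortened, hypothetical @AA; $by_item and $by_index are names introduced here to collect what each loop visits.

```perl
use strict;
use warnings;

my @AA = ('A', 'R', 'N', 'D');   # shortened list for the example

# Loop over the elements themselves
my $by_item = "";
foreach my $aa (@AA) {
    $by_item .= $aa;
}

# Loop over the indices 0 .. $#AA and look each element up
my $by_index = "";
foreach my $j (0 .. $#AA) {
    $by_index .= "$j=$AA[$j] ";
}

print "$by_item\n";    # prints ARND
print "$by_index\n";   # prints 0=A 1=R 2=N 3=D followed by a space
```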

Testing

Here are some tests on numbers and strings.

$a == $b # Is $a numerically equal to $b?
         # Beware: Don't use the = operator.
$a != $b # Is $a numerically unequal to $b?
$a eq $b # Is $a string-equal to $b?
$a ne $b # Is $a string-unequal to $b?

You can also use logical and, or and not:

($a && $b) # Are $a and $b both true?
($a || $b) # Is either $a or $b true?
!($a)      # Is $a false?
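The difference between numeric and string comparison shows up as soon as two values look different but represent the same number. A minimal sketch, with hypothetical values and variable names:

```perl
use strict;
use warnings;

my $x = "1.0";
my $y = "1";

my $num_equal = 0;
my $str_equal = 0;
$num_equal = 1 if ($x == $y);   # true: 1.0 and 1 are the same number
$str_equal = 1 if ($x eq $y);   # false: "1.0" and "1" are different strings

print "numerically equal\n" if ($num_equal);
print "different strings\n" if (!($str_equal));
print "both conditions hold\n" if ($num_equal && !($str_equal));
```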

for

for (initialise; test; inc)
{
    first_action;
    second_action;
    etc....
}

for ($i = 0; $i < 10; ++$i) # Start with $i = 0
                            # Do it while $i < 10
                            # Increment $i before repeating
{
    print "$i\n";
}
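The increment part of a for loop does not have to be ++$i. As an illustration only, the sketch below steps through a hypothetical DNA string three letters at a time; the built-in functions length, substr and push are not covered in the slides.

```perl
use strict;
use warnings;

my $dna = "ATGGCCTAA";   # hypothetical sequence
my @codons;

# Start at position 0, continue while inside the string, advance by 3
for (my $i = 0; $i < length($dna); $i += 3) {
    push(@codons, substr($dna, $i, 3));   # take 3 letters starting at $i
}

print "@codons\n";   # prints ATG GCC TAA
```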

Conditionals

if ($a)
{
    print "The string is not empty\n";
}
else
{
    print "The string is empty\n";  # Also reached when $a is "0" or 0
}

#!/bin/perl
open(FILE, "GMG.pep");
while (<FILE>)
{
    print $_ if ( m/>/ );   # Print only the lines containing >
}
close(FILE);
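The same test can drive an if/else that treats header lines and sequence lines differently. This sketch works on a hypothetical in-memory list of lines instead of GMG.pep, and uses /^>/ so that only lines starting with > count as headers.

```perl
use strict;
use warnings;

# Hypothetical lines in the style of a protein sequence file
my @lines = (">MG001 example header", "MKILINKSEL",
             ">MG002 another header", "MSTNPKPQRK");

my $headers  = 0;
my $residues = 0;
foreach my $line (@lines) {
    if ($line =~ m/^>/) {
        $headers++;                   # a header line
    }
    else {
        $residues += length($line);   # a sequence line
    }
}

print "$headers headers, $residues residues\n";   # prints 2 headers, 20 residues
```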

String matching

$a eq $b # Is $a string-equal to $b?
$a ne $b # Is $a string-unequal to $b?

Here are some special RE characters and their meaning:

. # Any single character except a newline
^ # The beginning of the line or string
$ # The end of the line or string
* # Zero or more of the last character
+ # One or more of the last character
? # Zero or one of the last character
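A few matches against one hypothetical sequence show these characters at work. The =~ operator applies the pattern to the variable on its left; the flag variables are names introduced for this sketch.

```perl
use strict;
use warnings;

my $seq = "ATGGCCTAA";   # hypothetical sequence

my ($start, $end, $plus, $optional, $wrong) = (0, 0, 0, 0, 0);
$start    = 1 if ($seq =~ /^ATG/);    # begins with ATG
$end      = 1 if ($seq =~ /TA.$/);    # ends with TA plus any one character
$plus     = 1 if ($seq =~ /TG+C/);    # T, one or more G, then C (TGGC)
$optional = 1 if ($seq =~ /GCA?C/);   # GC, an optional A, then C (GCC)
$wrong    = 1 if ($seq =~ /^GCC/);    # does not begin with GCC

print "$start $end $plus $optional $wrong\n";   # prints 1 1 1 1 0
```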

Some more special characters

\n # A newline
\t # A tab
\w # Any alphanumeric (word) character.
   # The same as [a-zA-Z0-9_]
\W # Any non-word character.
   # The same as [^a-zA-Z0-9_]
\d # Any digit. The same as [0-9]
\D # Any non-digit. The same as [^0-9]
\s # Any whitespace character: space,
   # tab, newline, etc
\S # Any non-whitespace character
\b # A word boundary, outside [] only
\B # No word boundary
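These character classes can be tested on short strings. A sketch with hypothetical values; each flag is set to 1 only if the pattern matches.

```perl
use strict;
use warnings;

my $id = "MG001";   # hypothetical identifier

my ($word, $digits, $space, $whole, $inside) = (0, 0, 0, 0, 0);
$word   = 1 if ($id =~ /^\w+$/);                   # only word characters
$digits = 1 if ($id =~ /\d\d\d$/);                 # ends with three digits
$space  = 1 if ($id =~ /\s/);                      # contains no whitespace
$whole  = 1 if ("DNA polymerase" =~ /\bDNA\b/);    # DNA as a whole word
$inside = 1 if ("cDNA" =~ /\bDNA\b/);              # no boundary between c and D

print "$word $digits $space $whole $inside\n";     # prints 1 1 0 1 0
```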

Characters like $, |, [, ), \, / and so on are special cases in regular expressions. If you want to match one of them literally, you have to precede it by a backslash (\). So:

\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
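Escaping matters in practice when a separator is itself a special character. In the sketch below the identifier string is hypothetical; an unescaped | would mean "or" in a pattern, so the literal bar is written \|.

```perl
use strict;
use warnings;

my $entry = "sp|P12345|DPO3B";        # hypothetical bar-separated identifier

my @parts = split(/\|/, $entry);      # split on the literal vertical bar

my $star  = 0;
my $slash = 0;
$star  = 1 if ("2*3" =~ /2\*3/);      # literal asterisk
$slash = 1 if ("a/b" =~ /a\/b/);      # literal slash inside /.../

print scalar(@parts), " $parts[1] $star $slash\n";   # prints 3 P12345 1 1
```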

Substitution and translation

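s/london/London/                # Replace the first london in $_ by London

$sentence =~ s/london/London/;  # The same, applied to $sentence

g option for global substitution; i option (for "ignore case"):

s/london/London/gi

Translation

$sentence =~ tr/abc/edf/;  # Replace a by e, b by d, c by f in $sentence
tr/a-z/A-Z/;               # converts $_ to upper case
tr/A-Z/a-z/;               # converts $_ to lower case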
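The operators above can be compared side by side. A sketch with hypothetical strings: s/// without options changes only the first match, the g and i options change every match regardless of case, and tr replaces characters one for one.

```perl
use strict;
use warnings;

my $once = "london is far from london";
$once =~ s/london/London/;        # only the first match is replaced

my $all = "london, LONDON, london";
$all =~ s/london/London/gi;       # every match, ignoring case

my $word = "cab";
$word =~ tr/abc/edf/;             # a becomes e, b becomes d, c becomes f

my $dna = "atggcc";
$dna =~ tr/a-z/A-Z/;              # convert to upper case

print "$once\n";   # prints London is far from london
print "$all\n";    # prints London, London, London
print "$word\n";   # prints fed
print "$dna\n";    # prints ATGGCC
```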