what perl can do for you or what you can do with perl paul boddie biotechnology centre of oslo

119
What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo MBV 3070 May 2009 (Originally given by Ian Donaldson, April 2007) http://donaldson.uio.no/wiki/MBV3070

Upload: liesel

Post on 21-Jan-2016

31 views

Category:

Documents


0 download

DESCRIPTION

What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo MBV 3070 May 2009 (Originally given by Ian Donaldson, April 2007) ‏ http://donaldson.uio.no/wiki/MBV3070. Much of the material in this lecture is from the - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

What Perl can do for youOr

What you can do with Perl

Paul BoddieBiotechnology Centre of Oslo

MBV 3070May 2009

(Originally given by Ian Donaldson, April 2007)

http://donaldson.uio.no/wiki/MBV3070

Page 2: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Much of the material in this lecture is from the “Lecture to Programming – Exercises” developed forthe Canadian Bioinformatics Workshops by

Sohrab ShahSanja RogicBoris SteipeWill Hsiao

And released under a Creative Commons license

Page 3: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

http://creativecommons.org/licenses/by-sa/2.5/

Page 4: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

This lecture is a crash course in the Perl programming language and roughly covers the same material as the perlintro at http://perldoc.perl.org.

It is meant to complement the lab work that follows the “Beginning Perl” tutorial at http://www.perl.org/books/beginning-perl/

Page 5: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

This lecture

Why program?Why Perl?What is Perl?Getting Started With PerlBasic SyntaxPerl variable typesConditional and looping constructsBuilt-in operators and functionsFiles and I/ORegular ExpressionsSubroutines

Page 6: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Steps to create and run a Perl program

• Create a directory called “my_perl”> mkdir my_perl

• Change into “perl_one” directory> cd my_perl

• Open file “myscript1.pl” in text editor> notepad myscript1.plx

• Write and save your program• Change the permissions (if you are on a

unix platform)• Run it!

> perl myscript1.plx

This starts a program called

notepad

This starts a program called

perl

Page 7: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Parts of a Perl Program

#!/usr/bin/perl

#This is a very simple program that prints

#“My first Perl program” to the screen

#Print “My first Perl program” followed by a new

#line to the screen

print(“My first Perl program\n”);

Path to the Perl interpreter

Comments get ignored by the

interpreter

Statements are the instructions of your program

All statements must be syntactically

correct(eg semi-colon)

Page 8: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Parts of a Perl program• Interpreter line

– This tells the operating system where the Perl interpreter is sitting

– Every Perl program must have this line

• Comments– Lines that begin with ‘#’ that are ignored by the

interpreter– Used to document the program– Extremely important in deciphering what the statements in

the program are supposed to do

• Statements– Instructions of the program– Follow a strict syntax that must be learned by the

programmer– Ill-formatted statements will prevent your program from

running

Page 9: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Basic Perl syntax

print "Hello, world";

#this is a comment

print “Hello world”

;

Page 10: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 1

• Task– print something to the screen

• Concepts– structure of a program– perl interpreter line– comments– statements

Page 11: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perl

use strict;

use warnings;

print "My first Perl program\n";

#try single quotes

print "First line\nsecond line and there is a tab\there\n";

Syntax Exercise

1

Page 12: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Common errors

• ‘Command not found’– Is perl is your $PATH ?

• ‘Can't find string terminator '"' anywhere before EOF’ – Did you close the quotation marks?

• Did you remember the semi-colon?

Page 13: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

How does Perl represent data?

• Most data we will encounter is the form of numbers or strings (text)– These are known as scalars

• Numbers– Integers, real, binary...

• Strings– Any text including special characters

enclosed in single or double quotes

Page 14: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Numbers7

3.14159

-0.00004

3.04E15 Perl supports scientific notation

Page 15: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Strings

“This is a string”

‘This is also a string’

“ACGACTACGACTAGCATCAGCATCAG”

Page 16: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Special characters for strings• Strings can contain special characters for things like

tab and newline• These characters need to be escaped with a backslash

– \t means tab– \n means newline

• These characters do not get interpreted inside single quotes

“Put a tab\t here”“There is a newline \n here”

No tab because \t will not be interpreted inside single

quotes

‘Is there a \t tab here?’

Page 17: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variables: how to store data in memory

• A variable is a ‘container’ that lets you store data• Every variable has a name that acts as a label on the

data it is storing– Variable names begin with a dollar sign followed by

an alphanumeric name that may contain underscores• First character after the dollar sign must be a

letter•$x, $this_variable, $DNA, $DNA2

• To store information in a variable, use the assignment operator ‘=‘

Page 18: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variables in action

$myNumber = 7;

7myNumber

Assignment operator

Page 19: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 20: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Using arithmetic operators

$sum = 3 + 4;

$difference = 4 – 3;

$product = 4 * 3;

$quotient = 4 / 3;

$power = 4**3;

$mod = 4 % 3;

$total = (4+3)*4-8; Precedence follows rules of

arithmetic

Page 21: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Increment/Decrement

• Other handy operators on integer variables$var = 1;

$var++; # $var now equals 2

$var--; # $var now equals 1

Page 22: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variable interpolation

• Substituting the name of a variable for its value in a double-quoted string

• Useful for printing out the value of a variable

$myVariable = 7;print “The value of my variable is: $myVariable\n”;

The value of my variable is: 7Variables only

get interpolated inside double-quoted strings

Page 23: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variable interpolation

• What about variables inside single-quoted strings?

$myVariable = 7;print ‘The value of my variable is: $myVariable\n’;

The value of my variable is: $myVariable

Variables do not get

interpolated inside single-quoted strings

Page 24: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 2

• Task– use numerical operators– print interpolated variables to the

screen

• Concepts– using numerical operators– variable interpolation

Page 25: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 2 - Functions

• print ITEM1, ITEM2, ITEM3, ... – Prints a list of variables, numbers,

strings to the screen– The list might only contain one item

Page 26: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perluse strict;use warnings;

#assign values to variables $x and $y and print them outmy $x = 4;my $y = 5.7;print "x is $x and y is $y\n";

#example of arithmetic expressionmy $z = $x + $y**2;$x++;print "x is $x and z is $z\n";

#evaluating arithmetic expression within print commandprint "add 3 to $z: $z + 3\n"; #did it work?print "add 3 to $z:", $z + 3,"\n";

Syntax Exercise

2

Page 27: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variables can also store strings

“ACGT”

myString

$myString = “ACGT”;

Page 28: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

The value of a variable can be changed

“ACGT” “ACGU”

myString

$myString = “ACGT”;$myString = “ACGU”;

Variables are so named because they can take on

different values.

Page 29: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

String operators

• Strings, like numbers also have operators

• Concatenation is done with ‘.’$exon1 = “ATGTGG”;

$exon2 = “TTGTGA”;

$mRNA = $exon1 . $exon2;

print “The 2 exon transcript is: $mRNA \n”;

The 2 exon transcript is: ATGTGGTTGTGA

The dot operator stitches strings to

the left and right of it together into a

new string

Page 30: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

String operators

• How do I find the length of my string?• Perl provides a useful function called...length

$DNA = “ACGTG”;

$seqLength = length ($DNA);

print “my sequence is $seqLength residues long\n”;

my sequence is 5 residues long

We’ll go into functions in more

detail later on

Page 31: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

String operators• How do I extract a substring (part

of) of a string?• Use the substr function$longDNA = “ACGACTAGCATGCATCGACTACGACTACGATCAGCATCGACTACGTTTTAGCATCA”

$shortDNA = substr ($longDNA, 0, 10);

print “The first 10 residues of my sequence are: $shortDNA\n”;

The first 10 residues of my sequence are: ACGACTAGCA

String from which to extract

the substring

Start from this position – Perl starts counting

from 0

Length of the

substring

Page 32: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

String operators• How do you find the position of a given

substring in a string?– Find the position of the first ambiguous “N”

residue in a DNA sequence

• This can be done with the index function

$DNA = “AGGNCCT”;

$position = index ($DNA, “N”);

print “N found at position: $position\n”;

N found at position: 3

The string to search

The substring to search for

Perl starts counting from 0

index returns -1 if substring is not

found

Page 33: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 3

• Task– concatenate two strings together and

calculate the new string’s length– extract a substring from the new

string

• Concepts– string operators and functions

Page 34: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Functions in Exercise 3

• STRING1 . STRING2– concatenate two strings

• length (VARIABLENAME)– returns the length of a string

• substr (VARIABLENAME, START, LENGTH)– returns a substring from

VARIABLENAME starting at position START with length LENGTH

Page 35: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Variables - recap

• Variables are containers to store data in the memory of your program

• Variables have a name and a value• The value of a variable can change• In Perl, variables starts with a ‘$’

and values are given to it with the ‘=‘ operator

Page 36: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

I/O 2: Inputting data into variables

• So far we have ‘hard coded’ our data in our programs • Sometimes we want to get input from the user• This is done with the ‘<>’ operator

– <STDIN> can be used interchangeably

• This allows your program to interact with the user and store data typed in on the keyboard

$input = <>;print “You typed $input\n”;

Page 37: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

I/O 2: Inputting data into variables

• Extra newline character?• When using <>, the new line

character is included in the input• You can remove it by calling

chomp$input = <STDIN>;chomp ($input);print “$input was typed\n”;

Page 38: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perluse strict;use warnings;

#TASK: Concatenate two given sequences, #find the length of the new sequence and#print out the second codon of the sequence

#assign strings to variablesmy $DNA = "GATTACACAT";my $polyA = "AAAA";

#concatenate two stringsmy $modifiedDNA = $DNA.$polyA;

#calculate the length of $modifiedDNA and#print out the value of the variable and its lengthmy $DNAlength = length($modifiedDNA);print "Modified DNA: $modifiedDNA has length $DNAlength\n";

#extract the second codon in $modifiedDNAmy $codon = substr($modifiedDNA,3,3);print "Second codon is $codon\n";

Syntax Exercise

3

Page 39: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 4

• Task– get a users name and age and print

their age in days to the screen

• Concept– getting input from the user

Page 40: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 4

• <> – get input from the keyboard

• chomp (VARIABLENAME)– removes the newline character from

the end of a string

Page 41: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Syntax Exercise

4

#!/usr/bin/perluse strict;use warnings;

#TASK: Ask the user for her name and age#and calculate her age in days#get a string from the keyboardprint "Please enter your name\n";my $name = <STDIN>;#getting rid of the new line character#try leaving this line outchomp($name); #prompt the user for his/her age#get a number from the keyboardprint "$name please enter your age\n";my $age = <>;chomp($age);#calculate age in daysmy $age_in_days = $age*365;print "You are $age_in_days days old\n";

Page 42: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Arrays

• Arrays are designed to store more than one item in a single variable

• An ordered list of data (1, 2, 3, 4, 5)(“Mon”, “Tues”, “Wed”, “Thurs”, “Fri”)($gene1, $gene2, …)

• Array variables start with ‘@’@myNumbers = (1, 2, 3, 4, 5);@myGenes = (“gene1”, “gene2”, “gene3”);

Page 43: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 44: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Array operators – [ ]• Accessing the individual elements

of an array is done with the square brackets operator – [ ]

@myNumbers = (“one”,”two”,”three”,”four”);

$element = $myNumbers[0];

print “The first element is: $element\n”;

The first element is: one

When using the [] operator to access an element, the ‘@’ changes to a ‘$’ because elements are scalars

When using the [] operator, the index of the element you want goes in the brackets

Page 45: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Array functions - scalar

• How do I know how many elements are in my array?

• use the scalar function

@myNumbers = (“one”,”two”,”three”,”four”);

$numElements = scalar (@myNumbers);

print “There are $numElements elements in my array\n”;

There are 4 elements in my array

Page 46: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Array functions - reverse

• You can reverse the order of the elements in an array with reverse

@myNumbers = (“one”,”two”,”three”,”four”);

@backwards = reverse(@myNumbers);

print “@backwards\n”;

four three two one

Page 47: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 5

• Task– create an array and access its

elements

• Concepts– declaring an array– square bracket operator– array functions– array variable interpolation

Page 48: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 5

• $ARRAYNAME[ INDEX]– accesses individual element of the

array indexed by INDEX• remember indexes start from 0

– precede ARRAYNAME with ‘$’

• scalar (ARRAYNAME)– returns the number of elements in the

array

Page 49: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Arrays - recap

• Arrays are used to store lists of data• Array names begin with ‘@’• Individual elements are indexed and

can be accessed with the square bracket operator– use ‘$’ for the variable name!

• Numerous functions exist to manipulate data in arrays

Page 50: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Syntax Exercise

5

#!/usr/bin/perluse strict;use warnings;

#initialize an arraymy @bases = ("A","C","G","T");

#print two elements of the arrayprint $bases[0],$bases[2],"\n";

#print the whole arrayprint @bases,"\n"; #try with double quotes

#print the number of elements in the arrayprint scalar(@bases),"\n";

Page 51: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Control structures

statement 1

statement 2

statement 3

end

statement 1

?

statement 2 statement 3

end

true false

Programs do not necessarily run linearly

Page 52: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions

• Perl allows us to program conditional paths through the program– Based on whether a condition is

true, do this or that• Is condition true? If so then do

{ conditional expression }• Most often conditional expressions

are mediated by comparing two numbers or two strings with comparison operators….

Page 53: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 54: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 55: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

What is true?

• Everything except– 0– “”

• empty string

• Comparison operators evaluate to 0 or 1

Page 56: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions - if

• In English: – “If the light is red,

then apply the brake.”

• In Perl:if ($light eq “red”) { $brake = “yes”;}

if (CONDITION) { STATEMENT1; STATEMENT2; STATEMENT3; ...}

This is known as a conditional block. If the

condition is true, execute all statements enclosed by

the curly braces

Page 57: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions – if, else

• In English: – “If the light is red, then apply the brake. Otherwise

cross carefully.”

• In Perl:if ($light eq “red”) { $brake = “yes”;} else { $cross = “yes”;}

light eq “red”?$brake = “yes” $cross = “yes”

This statement only gets executed if ($light eq “red”) is

true

This statement only gets executed if ($light eq “red”) is

false

true false

Page 58: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions - while

• What if we need to repeat a block of statements ?

• This can be done with a while loop

light eq “red”?$brake = “yes”;

$light = check_light();

$gas = “yes”

true

false

$light = check_light();while ($light eq “red”) { $brake = “yes”; $light = check_light();}$gas = “yes”;

Page 59: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions - while

• while loop structure

while (CONDITION) { STATEMENT1; STATEMENT2; STATEMENT3; ...}

This is known as a conditional block. While

the condition is true, execute all statements enclosed by the curly

braces

Page 60: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Conditional expressions - while

• Watch out for infinite loops!

$light = check_light();while ($light eq “red”) { $brake = “yes”; $light = “red”;}$cross = “yes”;

light eq “red”?$brake = “yes”;

$light = “red”;

$cross = “yes”

true

false

Spot the logic error!

Page 61: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

• This loop is better

Conditional expressions - while

$light = check_light();while ($light eq “red”) { $brake = “yes”; $light = check_light();}$gas = “yes”;

light eq “red”?$brake = “yes”;

$light = check_light();

$gas = “yes”

true

false

Page 62: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 6

• Task– count the number of G’s in a DNA

sequence

• Concepts– Conditional expressions– If statement– While loop

Page 63: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 6

Is current position < length?

Get the base at the current position

output

true

false

Is base “G”?

add to the count

Advance the current position

true

false

Page 64: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Syntax Exercise

6

#!/usr/bin/perluse strict;use warnings;

#TASK: Count the frequency of base G#in a given DNA sequence

my $DNA = "GATTACACAT";

#initialize $countG and $currentPosmy $countG = 0;my $currentPos = 0;

#calculate the length of $DNAmy $DNAlength = length($DNA);

#for each letter in the sequence check if it is the base G#if 'yes' increment $countGwhile($currentPos < $DNAlength){

my $base = substr($DNA,$currentPos,1);if($base eq "G"){

$countG++;}$currentPos++;

} #end of while loop

#print out the number of Gsprint "There are $countG G bases\n";

Page 65: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Loops - for

• To repeat statements a fixed number of times, use a for loop

for (INITIAL ASSIGNMENT; CONDITION; INCREMENT) { STATEMENT1; STATEMENT2; STATEMENT3; ...}

for ($i = 1; $i < 11; $i++) { print $i, “\n”; }

This code prints the numbers 1-10 with each number on a

new line

Page 66: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Loops - foreach

• foreach loops are used to iterate over an array– access each element of the array once

@array = (2,4,6,8);foreach $element (@array) { $element++; print $element, “\n”;}

3579

Page 67: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 7

• Task– Initialize an array and print out its

elements

• Concepts– for loops– foreach loops

Page 68: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Syntax Exercise

7#!/usr/bin/perluse strict;use warnings;my @array;#initialize a 20-element array with numbers 0,...19for(my $i=0;$i<20;$i++){

$array[$i] = $i;}

#print elements one-by-one using foreachforeach my $element (@array){

print "$element\n";}

Page 69: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Control structures - recap

• Control structures help your program make decisions about what to do based on conditional expressions

• Can construct conditional expressions using comparison operators for numbers and strings

• If – do something if the condition is true (otherwise do something else)

• While – do something while the condition is true• For – do something a fixed number of times• Foreach – do something for every element of an array

Page 70: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

End of part 1

Go through the above exerciseson your own and then read through Chapters 1-2

….here’s how

Page 71: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

1.Click on START then Run

2.Type “command”…this opens a command line window.

3.Type “notepad myscript1.plx”…this opens a notepad document called myscript1.plx

Page 72: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 73: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

7. You should see a message indicating that some version of perl is installed:

Page 74: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

8. Now type:M:\> perl myscript1.plxAnd you should see…

Page 75: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

9. Go back through the exercises and enter them by hand into new files repeating the above steps.

10. Feel free to play around; alter the example code, make mistakes and see what happens.

Page 76: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

1. If you finish the exercises, open a web browser and go to http://learn.perl.org/library/beginning_perl/3145_Chap01.pdf

Follow the tutorial from page 24. There’s lots more detail here than in the lecture that is worth going over.

Page 77: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Regular Expressions• Regular expressions are used to define

patterns you wish to search for in strings• Use a syntax with rules and operators

– Can create extremely sophisticated patterns• Numbers, letters, case insensitivity, repetition,

anchoring, zero or one, white space, tabs, newlines, etc....

– Patterns are deterministic but can be made extremely specific or extremely general

– Test for match, replace, select

Page 78: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Using REGEX

• =~ is the operator we use with REGEX

• =~ is combined with utility operators to match, replace$DNA = “AGATGATAT”;

if ($DNA =~ m/ATG/) {

print “Match!”;

}

=~ pattern match comparison

operator

The pattern is a set of characters

between //

Matching leaves the string

unchanged

Page 79: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

REGEX - Substitution• You can substitute the parts of a

string that match a regular expression with another string

$DNA = “AGATGATAT”;

$DNA =~ s/T/U/g;

print $DNA, “\n”;

AGAUGAUAU

Substitution operator

Pattern to search for

Replacement string

Global replacementSubstitution

changes the variable

Page 80: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

REGEX - Translation• You can translate a string by

exchanging one set of characters for another set of characters

$DNA = “AGATGATAT”;

$DNA =~ tr/ACGT/TGCA/;

print $DNA, “\n”;

Translation operator

Set of characters to replace

Replacement characters

Translation changes the

variable

TCTACTATA

Page 81: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 8

• Task– transcription and reverse complement

a DNA sequence

• Concepts– Simple regular expressions using s

and tr

Page 82: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 8 - Functions

• reverse(STRING)– Function that reverses a string

• STRING =~ s/PATTERN/REPLACEMENT/modifiers– This is the substitute operator

• STRING =~ tr/CHARS/REPLACEMENT CHARS/– This is the translation operator

Page 83: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Syntax Exercise

8

#!/usr/bin/perluse strict;use warnings;

#TASK: For a given DNA sequence find its RNA transcript,#find its reverse complement and check if#the reverse complement contains a start codon

my $DNA = "GATTACACAT";

#transcribe DNA to RNA - T changes to Umy $RNA = $DNA;$RNA =~ s/T/U/g;print "RNA sequence is $RNA\n";

#find the reverse complement of $DNA using substitution operator#first - reverse the sequencemy $rcDNA = reverse($DNA);

$rcDNA =~ s/T/A/g;$rcDNA =~ s/A/T/g;$rcDNA =~ s/G/C/g;$rcDNA =~ s/C/G/g;

print "Reverse complement of $DNA is $rcDNA\n"; #did it work?

#find the reverse complement of $DNA using translation operator#first - reverse the sequence$rcDNA = reverse($DNA);#then - complement the sequence$rcDNA =~ tr/ACGT/TGCA/;#then - print the reverse complementprint "Reverse complement of $DNA is $rcDNA\n";

#look for a start codon in the reverse sequenceif($rcDNA =~ /ATG/){

print "Start codon found\n";}else{

print "Start codon not found\n";

Page 84: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

REGEX - recap

• REGEX are used to find patterns in text

• The syntax must be learned in order to be exploited

• Extremely powerful for processing and manipulating text

Page 85: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Functions• Functions (sub-routines) are like small programs

inside your program• Like programs, functions execute a series of

statements that process input to produce some desired output

• Functions help to organize your program – parcel it into named functional units that can be

called repeatedly• There are literally hundreds of functions built-in to

Perl• You can make your own functions

Page 86: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

What happens when you call a function?

$DNA = “ACATAATCAT”;

$rcDNA = reverse($DNA);

$rcDNA =~ tr/ACGT/TGCA/;

sub reverse { # process input

# return output}

Page 87: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Calling a function

• Input is passed to a function by way of an ordered parameter list

$result = function_name (parameter list);Basic syntax of calling a function

$longDNA = "ACGACTAGCATGCATCGACTACGACTACGATCAGCATCGACT"

$shortDNA = substr ($longDNA, 0, 10);

String from

which to extract

the substring

Start from this

position

Length of the substring

Page 88: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Useful string functions in Perl• chomp(STRING) OR chomp(ARRAY) –

– Uses the value of the $/ special variable to remove endings from STRING or each element of ARRAY. The line ending is only removed if it matches the current value of $/.

• chop(STRING) OR chop(ARRAY)– Removes the last character from a string or the last character from every element in an array. The last character chopped is

returned.

• index(STRING, SUBSTRING, POSITION) – Returns the position of the first occurrence of SUBSTRING in STRING at or after POSITION. If you don't specify POSITION, the search

starts at the beginning of STRING.

• join(STRING, ARRAY) – Returns a string that consists of all of the elements of ARRAY joined together by STRING. For instance, join(">>", ("AA", "BB", "cc"))

returns "AA>>BB>>cc".

• lc(STRING)– Returns a string with every letter of STRING in lowercase. For instance, lc("ABCD") returns "abcd".

• lcfirst(STRING)– Returns a string with the first letter of STRING in lowercase. For instance, lcfirst("ABCD") returns "aBCD".

• length(STRING)– Returns the length of STRING.

• split(PATTERN, STRING, LIMIT)– Breaks up a string based on some delimiter. In an array context, it returns a list of the things that were found. In a scalar context, it

returns the number of things found.

• substr(STRING, OFFSET, LENGTH)– Returns a portion of STRING as determined by the OFFSET and LENGTH parameters. If LENGTH is not specified, then everything

from OFFSET to the end of STRING is returned. A negative OFFSET can be used to start from the right side of STRING.

• uc(STRING)– Returns a string with every letter of STRING in uppercase. For instance, uc("abcd") returns "ABCD".

• ucfirst(STRING)– Returns a string with the first letter of STRING in uppercase. For instance, ucfirst("abcd") returns "Abcd".

source: http://www.cs.cf.ac.uk/Dave/PERL/

Page 89: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Useful array functions in Perl

• pop(ARRAY) – Returns the last value of an array. It also reduces the size of the array by one.

• push(ARRAY1, ARRAY2)– Appends the contents of ARRAY2 to ARRAY1. This increases the size of ARRAY1 as needed.

• reverse(ARRAY) – Reverses the elements of a given array when used in an array context. When used in a scalar

context, the array is converted to a string, and the string is reversed.

• scalar(ARRAY) – Evaluates the array in a scalar context and returns the number of elements in the array.

• shift(ARRAY) – Returns the first value of an array. It also reduces the size of the array by one.

• sort(ARRAY) – Returns a list containing the elements of ARRAY in sorted order. See next Chapter 8on

References for more information.

• split(PATTERN, STRING, LIMIT)– Breaks up a string based on some delimiter. In an array context, it returns a list of the things

that were found. In a scalar context, it returns the number of things found.

source: http://www.cs.cf.ac.uk/Dave/PERL/

Page 90: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

String functions - split• ‘splits’ a string into an array based on a

delimiter• excellent for processing tab or comma

delimited files$line = “MacDonald,Old,The farm,Some city,BC,E1E 1O1”;

($lastname, $firstname, $address, $city, $province, $postalcode) = split (/,/, $line);

LAST NAME: MacDonald FIRST NAME: OldADDRESS: The FarmCITY: Some cityPROVINCE: BCPOSTAL CODE: E1E 1O1

print (“LAST NAME: “, $lastname, “\n”, “FIRST NAME: “, $firstname, “\n”, “ADDRESS: “, $address, “\n”, “CITY: “, $city, “\n”, “PROVINCE: “, $province, “\n”, “POSTAL CODE: “, $postalcode, “\n”);

REGEX goes here

String goes here

Page 91: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Array functions - sort

• You can sort the elements in your array with ‘sort’

@myNumbers = ("one","two","three","four");

@sorted = sort(@myNumbers);

print “@sorted\n”;

four one three two

sorts alphabeticall

y

Page 92: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Making your own function

sub function_name {

(my $param1, my $param2, ...) = @_;

# do something with the parameters

my $result = ...

return $result;

}

This is the function name. Use this name to ‘call’ the function

from within your program

What is this? This is an array that gets created automatically

to hold the parameter list.

‘sub’ tells the interpreter you are declaring a function

What is the word ‘my’ doing here? ‘my’ is a variable

qualifier that makes it local to the function. Without it, the

variable is available anywhere in the program. It is good

practice to use ‘my’ throughout your programs – more on this

tomorrow.

‘return’ tells the interpreter to go back to the place in the

program that called this function. When followed by scalars or variables, these values are passed back to

where the function was called. This is the output of the

function

Page 93: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Making your own function - example

sub mean {

my @values = @_;

my $numValues = scalar @values;

my $mean;

foreach my $element (@values) {

my $sum = $sum + $element;

}

$mean = $mean / $numValues;

return $mean;

}

Function definition

local variables to be used inside the function

do the work!

return the answer

$avg = mean(1,2,3,4,5);

Page 94: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 9

• Task– Create a function to reverse

complement a DNA sequence

• Concepts– creating and calling functions

Page 95: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perluse strict;use warnings;

#TASK: Make a subroutine that calculates#the reverse#complement of a DNA sequence and call it#from the main program

#body of the main program with the function callmy $DNA = "GATTACACAT";my $rcDNA = revcomp($DNA); print "$rcDNA\n";exit;#definition of the function for reverse complementsub revcomp{

my($DNAin) = @_;my $DNAout = reverse($DNAin);$DNAout =~ tr/ACGT/TGCA/;return $DNAout;

}

Syntax Exercise

9

Page 96: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Functions - recap• A function packages up a set of

statements to perform a given task• Functions take a parameter list as

input and return some output• Perl has hundreds of functions

built-in that you should familiarise yourself with– Keep a good book, or URL handy at all

times• You can (and should!) make your

own functions

Page 97: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

I/O 3: Filehandles

• So far, we have only considered input data that we created inside our program, or captured from the user

• Most data of any size is stored on your computer as files– Fasta files for sequences, tab-delimited

files for gene expression, General Feature Format for annotations, XML files for literature records, etc...

• Filehandles allow us to access data contained in files from our programs

Page 98: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Filehandles• Filehandles have 3 major operations

– open– read one line at a time with ‘<>’– close

• Filehandles are special variables that don’t use ‘$’– by convention Filehandle names are ALLCAPS

• The default filehandle for input is STDIN• The default filehandle for output is STDOUT

Page 99: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Filehandles - example

seq1 ACGACTAGCATCAGCAT 47.0seq2 AAAAATGATCGACTATATAGCATA 25.0seq3 AAAGGTGCATCAGCATGG 50.0

gcfile.txt

Page 100: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Filehandles - example

open (FILE, “gcfile.txt“);

while (<FILE>) { $line = $_; chomp ($line);

($id,$seq,$gc) = split (/\t/,$line);

print ("ID: " , $id, "\n", "SEQ: ", $seq, "\n", "GC: ", $gc, "\n");}

close (FILE);

opens the filehandle

reads the next line of the file into the automatic $_ variable

removes the newline character from $line

splits the line on \t

prints out the data in a nice format

closes the filehandle

Page 101: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 10

• Task– read a DNA sequence from a file and

reverse complement it

• Concepts– reading and processing data from a

file

Page 102: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perluse strict;use warnings;

#TASK: Read DNA sequences from ‘DNAseq’ input file – #there is one sequence per line#For each sequence find the reverse complement and #print it to ‘DNAseqRC’ output file

#open input and output filesopen(IN,"DNAseq.txt");open(OUT,">DNAseqRC.txt");

#read the input file line-by-line#for each line find the reverse complement#print it in the output filewhile(<IN>){

chomp;my $rcDNA = revcomp($_);print OUT "$rcDNA\n";

}

#close input and output filesclose(IN);close(OUT);exit();

#definition of the function for reverse complementsub revcomp{

my($DNAin) = @_;my $DNAout = reverse($DNAin);$DNAout =~ tr/ACGT/TGCA/;return $DNAout;

}

Syntax Exercise

10

Page 103: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Process management

• Perl allows you to make ‘system calls’ inside your program– Executing Unix commands/programs from

your code

• Useful for ‘wrapping’ other programs inside a Perl program– Automated executions of a program with

variable arguments

• Create pipelines of programs where output from one program is used as input to a subsequent program

Create Fasta sequence file

Format sequence with formatdb

Blast against the new database

Page 104: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

system() and backticks• There are two common ways of

executing system calls• Use the function system()

– does not capture the output

• Use the backticks operators– captures the output

$command = “dir”;

system($command);

$listing = `$command`;

Executes ‘dir‘ as though you typed it at the prompt. Prints the output to the

screen

Executes ‘ls –l‘ and stores the output in $linting. Does not print the output to the

screen

Page 105: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Exercise 11

• Task– List the files in your working directory

• Concepts– Backticks operator to execute a Unix

command and capture and process its output

Page 106: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

#!/usr/bin/perluse strict;use warnings;

#TASK: Print a list of all Perl programs we did so far.#These files can be found in your current directory and #they start with the word ‘program’

print "List of programs we made today:\n";

#system call for 'ls' function - the result goes into a stringmy $listing = `dir`; #these are back quotes

#split the string to get individual filesmy @files = split(/\n/,$listing);

#use foreach to step through the array#if a file contains word 'program' print it outforeach my $file (@files){

if($file =~ /program/){ #change this to reflect how you named files

print "$file\n";}

}

Syntax Exercise

11

Page 107: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting it all together

• Write a program that reads a DNA sequence from a file, calculates its GC content, finds its reverse complement and writes the reverse complement to a file

• Concepts– covers most of the topics so far

Page 108: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Programming tips

• Think hard and plan before even starting your program

• Save and run often!• Include lots of comments• Use meaningful variable and function names

– self-documenting code• Keep it simple

– If its complex, break the problem into simple, manageable parts

• Use functions• Avoid hard-coding values – use variables!• Make your code modifiable, modular, legible

Page 109: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• You have learned about the following concepts so far…

• Variables (scalars, arrays and hashes)• Operators (. , = + - * / )• Functions (also known as subroutines)• Conditional control structures (if elsif)• Looping control structures (while for)• User input/output• File input/output (I/O)

Page 110: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• The good news is that, all programming languages share these concepts, so you are in a good position to learn other programming languages like…

• Python• Java• C• C++• Java

Page 111: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• For now, stick to Perl and learn it well before you move on to other languages.

• But you might ask how does Perl compare to these other languages and why did I choose this as an introductory language?

Page 112: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• Perl is good for things like:• Quickly writing small scripts (less than 200 lines) to

– Automate repetitive analyses– Parse results obtained from other programs (i.e. regular expressions)– PERL stands for Practical Extraction Report language

• Pipelining programs (i.e. stringing together multiple programs where the input of one program becomes the input of another). Remember we covered system calls and the back-tick (`) operator

• Retrieving and working with biological data (see BioPerl or NCBI’s NetEntrez interface)

• Making and working with a small local database (not covered in this course…see MySQL, SQL and the Perl DBI.pm module)

• Fast proto-typing of code

• Making simple interactive web-pages (although I’d suggest PHP for that)

Page 113: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• Perl is NOT so good for things like:

• Writing larger software systems. Consider moving to Python or Java for larger projects.

• Fast Performance and heavy-duty number crunching. Consider C or C++.

• Choosing the right language, involves realizing that computer programming problems belong to a spectrum of complexity and it’s important to choose the right tool for the job.

Page 114: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

JAVA

C/C++

Increasing Complexity and Speed requirements

Perl

Page 115: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

Putting Perl in Context

• Perl is an interpreted language

Perl code

Machine codeC code

Byte code

CPU

Machine code

Run time

Compile time Run time

Java code Byte code Machine code

Run time

Page 116: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo

End of Lecture 1

Page 117: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 118: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo
Page 119: What Perl can do for you Or What you can do with Perl Paul Boddie Biotechnology Centre of Oslo