Page 1: Computer Programming for Biologists, Class 8, Nov 28th, 2014, Karsten Hokamp

Computer Programming for Biologists

Class 8

Nov 28th, 2014

Karsten Hokamp

http://bioinf.gen.tcd.ie/GE3M25/programming

Page 2

Overview

• Revision
• Subroutines

Page 3

Revision - Hashes

my %seq = ();              # initialisation
$freq{$char} = 0;          # storing a value
$freq{$char}++;            # changing a value
my $aa = $code{$codon};    # extracting

foreach my $header (sort keys %seq) {
    my $seq = $seq{$header};
    …
}
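The hash operations above can be tried out in a few lines. This is only a minimal sketch: the headers gene_a and gene_b and their sequences are made-up example data, not part of the course material.

```perl
use strict;
use warnings;

# Hypothetical example data: headers mapped to sequences.
my %seq = ();                 # initialisation
$seq{'gene_b'} = 'ATGCC';     # storing a value
$seq{'gene_a'} = 'GGTA';

# Walk through the headers in alphabetical order and
# collect one "header: length" line per sequence.
my @report;
foreach my $header (sort keys %seq) {
    my $seq = $seq{$header};  # extracting
    push @report, "$header: " . length($seq);
}

print "$_\n" foreach @report;
```

Because the keys are sorted, gene_a is reported before gene_b even though it was stored second.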

Page 4

Hash Variables

Scalars vs Hash

Initialisation of values:

Scalars:
    my $A = 0;
    my $C = 0;
    my $G = 0;
    my $T = 0;

Hash:
    my %frequency = ();

[Figure: four separate scalar boxes A, C, G and T, each holding 0]

Page 5

Hash Variables

Scalars vs Hash

Increment:

Scalars:
    if ($char eq 'A') {
        $A++;
    } elsif ($char eq 'C') {
        $C++;
    } elsif ($char eq 'G') {
        $G++;
    } elsif ($char eq 'T') {
        $T++;
    }

Hash:
    $frequency{$char}++;

[Figure: after reading a C, the scalar C and the entry C in %frequency each hold 1]

Page 6

Hash Variables

Scalars vs Hash

[Figure: %frequency after counting, with A = 5, C = 9, G = 7 and T = 5, shown next to the scalar variables A, C, G and T]

Page 7

Hash Variables

Scalars vs Hash

Output:

Scalars:
    print "Frequency of A: $A\n";
    print "Frequency of C: $C\n";
    print "Frequency of G: $G\n";
    print "Frequency of T: $T\n";

Hash:
    foreach my $char (keys %frequency) {
        print "Frequency of $char: $frequency{$char}\n";
    }
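Putting initialisation, increment and output together, the hash version might look like the sketch below. The example sequence is made up, and the keys are sorted here only so that the output order is predictable.

```perl
use strict;
use warnings;

my $sequence  = 'ACGTTGCAAC';   # made-up example sequence
my %frequency = ();             # initialisation

# split // breaks the string into single characters
foreach my $char (split //, $sequence) {
    $frequency{$char}++;        # one line replaces the whole if/elsif chain
}

foreach my $char (sort keys %frequency) {
    print "Frequency of $char: $frequency{$char}\n";
}
```

The same loop keeps working unchanged if other characters (for example N) turn up in the sequence, which the scalar version would silently ignore.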

Page 8

Subroutines

• write your own functions
• run "programs" within a program

Page 9

Subroutines

Definition:

sub name_of_routine {
    # optional arguments in @_ (special array with arguments to subroutine), e.g.
    my ($arg1, $arg2) = @_;

    # specify statements
    statement1;
    statement2;
    …

    # optionally return scalar or list, e.g.
    return $result1, $result2;
}
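As a concrete instance of this template, here is a small sketch of a subroutine that takes one argument from @_ and returns a list of two values. The name gc_content and the test sequence are illustrative assumptions, not taken from the slides.

```perl
use strict;
use warnings;

# gc_content is a hypothetical example subroutine.
sub gc_content {
    # the argument arrives in the special array @_
    my ($sequence) = @_;

    # statements: tr with an empty replacement only counts the matches
    my $length = length($sequence);
    my $gc     = ($sequence =~ tr/GCgc//);

    # return a list of two values
    return ($gc, $length);
}

my ($gc, $length) = gc_content('ATGCGC');
print "GC: $gc of $length bases\n";
```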

Page 10

Subroutines

Usage:

    name_of_routine;        # only works if the subroutine is defined earlier in the script
or
    $rv = &name_of_routine();
or
    @results = &name_of_routine($arg1, $arg2);

&: (optional) symbol indicating subroutine
(optionally) capture return value(s)
(optionally) submit list of arguments
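The call styles can be compared side by side in a short sketch. The subroutines seq_length and first_and_last are hypothetical examples made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical example: returns a single scalar.
sub seq_length {
    my ($sequence) = @_;
    return length($sequence);
}

# Hypothetical example: returns a list of two values.
sub first_and_last {
    my ($sequence) = @_;
    return (substr($sequence, 0, 1), substr($sequence, -1));
}

seq_length('ATGC');                          # call only, return value ignored
my $rv      = &seq_length('ATGC');           # capture a scalar, with the optional &
my @results = first_and_last('ATGC');        # capture a list, without the &

print "length: $rv\n";
print "first and last base: @results\n";
```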

Page 12

Subroutines

Example:

my $dna = shift;
my $rev_comp = &reverse_complement($dna);
print "reverse complement:\n" . &format($rev_comp, 60);

# subroutines:
sub reverse_complement {
    my $out = reverse shift @_;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}

sub format {
    my ($sequence, $width) = @_;
    …

• The subroutine works on a copy of $dna (shift copies the argument), so $dna itself is not changed
• Main area stays tidy
• Details hidden towards end of script
• Code is re-usable, can be applied multiple times
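A runnable version of this example could look like the sketch below. Since the body of format is not shown on the slide, it is replaced here by wrap_sequence, a hypothetical stand-in that simply breaks the sequence into lines of a given width. The input sequence is also made up.

```perl
use strict;
use warnings;

sub reverse_complement {
    my $out = reverse shift @_;        # scalar assignment, so the string is reversed
    $out =~ tr/acgtACGT/tgcaTGCA/;     # swap each base for its complement
    return $out;
}

# wrap_sequence is a hypothetical stand-in for the slide's format():
# it cuts the sequence into lines of at most $width characters.
sub wrap_sequence {
    my ($sequence, $width) = @_;
    my $out = '';
    while (length($sequence)) {
        $out .= substr($sequence, 0, $width, '') . "\n";
    }
    return $out;
}

my $dna      = 'ATGCaa';               # made-up example input
my $rev_comp = reverse_complement($dna);
print "reverse complement:\n" . wrap_sequence($rev_comp, 4);
```

After the call, $dna still holds the original sequence, because both subroutines only modify their own copies.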

Page 13

Subroutines

• Can be placed anywhere in the program
• Normally all subroutines are located after the main block of code
• Definition starts with 'sub' followed by name
• Statements enclosed in curly brackets
• Statements normally written indented
• Optionally provide arguments
• Optionally return values
• Can be nested

Page 14

Subroutines

Scenario:

• Read in DNA sequence
• Translate in all six reading frames

6 x translation of a sequence

Page 16

Subroutines

Inefficient coding:

# frame 1:
$sequence = $orig_seq;
# Block of translation code, e.g.
$prot = '';
while ($sequence) {
    $codon = substr $sequence, 0, 3, '';
    $aa = $genetic_code{$codon};
    $prot .= $aa;
}
print "translation: $prot\n";

# frame 2:
# restore the sequence (the loop above used it up), then remove first base
$sequence = $orig_seq;
substr $sequence, 0, 1, '';

Page 17

Subroutines

Inefficient coding:

# frame 1:
$sequence = $orig_seq;
[Block of translation code]

# frame 2:
# restore the sequence, then remove first base
$sequence = $orig_seq;
substr $sequence, 0, 1, '';
[Block of translation code]

# frame 3:
…
# frame -1:
…

The same block of code is specified 6 times.

Page 18

Subroutines

More efficient coding:

# frame 1:
$sequence = $orig_seq;
&translate($sequence);
# frame 2:
# remove first base
substr $sequence, 0, 1, '';
&translate($sequence);
# frame 3:
…
# frame -1:
…

sub translate {
    my $input = shift;
    …
    print "translation: $prot\n";
}

• 6 times use of subroutine
• 1 specification of translation code

Page 19

Subroutines

Alternative:

# frame 1:
$sequence = $orig_seq;
print &translate($sequence), "\n";    # print return value
# frame 2:
# remove first base
substr $sequence, 0, 1, '';
print &translate($sequence), "\n";
# frame 3:
…
# frame -1:
…

sub translate {
    my $input = shift;
    …
    return $protein;                  # return translated sequence
}
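One way to fill in the elided parts is sketched below. It is an illustration under stated assumptions, not the course's reference solution: the codon table is built from the standard genetic code, codons that are not in the table are written as X, leftover bases at the end are ignored, and six_frame_translation and the test sequence are made-up names and data.

```perl
use strict;
use warnings;

# Standard genetic code, listed in TCAG order for each codon position.
my @bases       = qw(T C A G);
my $amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %genetic_code;
my $i = 0;
foreach my $b1 (@bases) {
    foreach my $b2 (@bases) {
        foreach my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($amino_acids, $i++, 1);
        }
    }
}

sub reverse_complement {
    my $out = reverse shift @_;
    $out =~ tr/acgtACGT/tgcaTGCA/;
    return $out;
}

# Translate one reading frame; works on a copy of the argument.
sub translate {
    my ($input) = @_;
    $input = uc $input;
    my $protein = '';
    while (length($input) >= 3) {
        my $codon = substr($input, 0, 3, '');
        # assumption: unknown codons (e.g. containing N) become X
        $protein .= exists($genetic_code{$codon}) ? $genetic_code{$codon} : 'X';
    }
    return $protein;
}

# Hypothetical helper: returns a hash with keys 1, 2, 3, -1, -2, -3.
# Assumes the sequence is at least three bases long.
sub six_frame_translation {
    my ($dna) = @_;
    my $rev = reverse_complement($dna);
    my %frames;
    foreach my $offset (0 .. 2) {
        $frames{ $offset + 1 }    = translate(substr($dna, $offset));
        $frames{ -($offset + 1) } = translate(substr($rev, $offset));
    }
    return %frames;
}

my %frames = six_frame_translation('ATGGCCTAA');   # made-up test sequence
foreach my $frame (1, 2, 3, -1, -2, -3) {
    print "frame $frame: $frames{$frame}\n";
}
```

The translation code is written once and called six times from the loop, so a later change (for example a different genetic code) only has to be made in one place.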

Page 20

Subroutines

Other uses – recursion:

# calculate factorial value for a given number:
my $fv = &fact(10);
print "factorial 10 is $fv\n";

sub fact {
    my $val = shift;
    my $fact = 1;
    if ($val > 1) {
        $fact = $val * &fact($val-1);    # call subroutine within itself
    }
    return $fact;
}

Page 21

Subroutines

Other uses – recursion:

$val = 10;
$fact = $val * &fact($val-1);
$fact = 10 * fact(9);
$fact = 10 * 9 * fact(8);
…
$fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * fact(1);
$fact = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;
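The unfolding of the recursive calls can be made visible by recording each call. The sketch below uses fact_traced, a hypothetical variant of fact with an extra depth argument added purely for illustration, and a smaller number to keep the output short.

```perl
use strict;
use warnings;

my @trace;    # one entry per call, indented by recursion depth

# fact_traced is a hypothetical variant of fact() that records each call.
sub fact_traced {
    my ($val, $depth) = @_;
    $depth = 0 unless defined $depth;
    push @trace, ('  ' x $depth) . "fact($val)";

    my $fact = 1;
    if ($val > 1) {
        # the subroutine calls itself with a smaller value
        $fact = $val * fact_traced($val - 1, $depth + 1);
    }
    return $fact;
}

my $fv = fact_traced(4);
print "$_\n" foreach @trace;
print "factorial 4 is $fv\n";
```

Each call waits for the next one to return, so the multiplications happen on the way back up once fact(1) has returned 1.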

Page 22

Subroutines

• reduce programming effort
• improve flow
• increase clarity
• enable recursion

Page 23

Exercises

Extend your sequence analysis tool:

- add translation into protein as a subroutine into your script

e-mail me at [email protected] with questions or problems

Page 24

Next week

Mock exam!