Introduction to Programming: Perl for Biologists

Timothy M. Kunau

Center for Biomedical Research Informatics

Academic Health Center

University of Minnesota

[email protected]

Bioinformatics Summer Institute 2007

Outline

•Art and Programming

•Getting Started

•Biology and Computer Science

•Bioinformatics Data

•Perl basics:

•Strings and Variables

•Math and Logic

•Looping, operators, and functions

Art and Programming

• Moving from Data to Story

• Systems = Beauty

“Science is what we understand well enough to explain to a computer. Art is everything else we do.”

Donald Knuth

Edit -- Run -- Revise (and Save)

•As a programmer, most of your time will be spent planning, testing, and revising your program.

•Running is often incidental on today’s hardware.

•Carefully written programs can be productive tools for years.

•Programming is a method of communication: your code must be readable by both the computer and your users.

Errors and Debugging

• Rarely involves actual insects.

• If the task is well understood, errors are mostly typographical.

• The resulting error messages can be extraordinarily helpful.

• If the task is not well understood or the data is irregular, the program may contain a ‘logical’ error that requires more thought.

• Beware: a valid program can still produce the wrong result.

Programming

• Is an exercise in problem solving:

• iterative

• gradual

• often a solitary activity

• Social activity

• You are now part of a community of tool builders.

• A program rarely stands alone; it interacts with the other programs that make up its environment, each building on the others.

• Systematic and beautiful

Programming

• Is an economically valuable skill.

• Commercial and proprietary systems are built to protect their economic value.

• Open Source projects are different.

• Open Source software projects publish their source code so that it can be shared and improved by the community of users.

http://www.opensource.org/

Open Source Programs

•Firefox

•Linux

•MySQL

•Apache web server

•Languages:

•Perl

•Ruby

•Python

Programming Strategies

•Break down into two major approaches:

1. Find a program written by someone else.

2. Write one yourself.

•The reality is usually somewhere in between.

Programming Strategies

• Open Source programming communities are often large and prolific.

• If you cannot find a program that does exactly what you need -- you can likely find one that does most of what you need.

• A little tweaking is often significantly quicker than rolling your own.

• “A day in the library can save you six months in the lab.” -- ancient adage

Programming Strategies

• It is important to become aware of the communities that use and support the tools you use.

• Some copyrights may apply but use is generally free.

• CPAN

What has been will be again, what has been done will be done again; there is nothing new under the sun.

(Ecclesiastes 1:9 NIV)

The Process

1. Identify the inputs, data, and specifications from the user.

2. Design the solution as a series of steps toward the desired result.

3. Decide on the output(s). Does the result print to the screen or to a file? How will this output be used? Does format matter?

4. Refine the design with increasing detail. (pseudocode)

5. Do appropriate code modules exist? (CPAN)

6. Write the program.

Pseudocode

• An informal outline of a program that leaves out details and does not follow formal syntax.

• A quick and informal way to collect your ideas about solving the problem at hand.

get the name of DNA file from user
read in DNA from DNA file
for each element
    if element is DNA, then add one to the count
print count
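The pseudocode above might translate into Perl roughly as follows. This is only a sketch under stated assumptions, not the course's own solution: it assumes "element is DNA" means "the character is one of A, C, G, or T", the sub name count_bases and the sample string are hypothetical, and the input is a string rather than a file named by the user so that the example stays self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the characters in a string that are DNA bases.
# Assumption: "element is DNA" means the character is A, C, G, or T.
sub count_bases {
    my ($dna) = @_;
    my $count = 0;
    for my $element (split //, uc $dna) {      # for each element
        $count++ if $element =~ /^[ACGT]$/;    # if element is DNA, add one
    }
    return $count;
}

# Hypothetical input; a real program would read this from the user's file.
my $dna = "GCTTTAATCTNNTGTAGG\n";
print count_bases($dna), "\n";                 # print count
```

Anything that is not one of the four bases (the N placeholders, the trailing newline) is simply not counted.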

What is Perl?

•Scripting language by Larry Wall, first released in 1987

•Born of AWK

•Practical Extraction and Report Language

•Pathologically Eclectic Rubbish Lister

•Disturbingly flexible in form, format, and usage.

•“Swiss Army chain-saw”

Why Perl?!

•An easy language to use, though sometimes hard to learn. Some choices were made to make things easier for the programmer at the expense of the student.

•Fast, cross-platform text processing.

•Good pattern matching. (regex)

•Many extensions for Life Sciences data types. (BioPerl)

•Many biologists already know Perl.

•Powerful

#!/usr/local/bin/perl -w

use SOAP::Lite;

print STDERR "Welcome to the SOAP demonstration\n";

my $res;
$servername = "inquiry.ccgb.umn.edu";

my $server = SOAP::Lite
    -> uri("http://$servername/Backbeat")
    -> proxy("http://$servername/cgi-bin/bipod/BIFX.pl");

$res = $server-> (SOAP::Data->name(USER)->value("kunau"),
                  SOAP::Data->name(PASSWORD)->value(” ));

my $ticket;
if ($res->result()) { $ticket = $res->result(); }
print STDERR "Got ticket $ticket\n";

my $id = "nt:ABY13260";
= $id;
=~ s/: ;

$res = $server-> (SOAP::Data->name(TICKET)->value($ticket),
                  SOAP::Data->name("BLOCKING")->value(1),
                  SOAP::Data->name("sequence")->value("$id"),
                  SOAP::Data->name( )->value("fasta"),
                  SOAP::Data->name("outseq")->value( ));
($res);
print STDERR "fetched file for $id\n";

$res = $server-> (SOAP::Data->name(TICKET)->value($ticket),
                  SOAP::Data->name("BLOCKING")->value(0),
                  SOAP::Data->name("blastall")->value("blastn"),
                  SOAP::Data->name("query")->value( ),
                  SOAP::Data->name( )->value("yeast.nt"),
                  SOAP::Data->name( )->value("yeast.nt"),
                  SOAP::Data->name( )->value( . ".blastx"));
($res);

my $jid = 0;
if ($res->result()) { $jid = $res->result(); }
print "Submitted BLAST for . Got job id $jid\n";

# Client side block
my $result = "";
while ($result ne "FINISHED") {
    print "Checking status for job $jid\n";
    $res = $server-> (
        SOAP::Data->name("TICKET")->value($ticket),
        $jid));
    ($res);
    if ($res->result()) { $result = $res->result(); }
    print "Got status $result\n";
    if ($result ne "FINISHED") { sleep 3; }
}

$res = $server-> (SOAP::Data->name(TICKET)->value($ticket),
                  SOAP::Data->name(FILENAME)->value("blastall.txt"));
($res);
if ($res->result()) {
    $result = $res->result();
    print "Got status $result\n";
    if ($result ne "FINISHED") { sleep 3; }
}

$res = $server-> (SOAP::Data->name(TICKET)->value($ticket),
                  SOAP::Data->name(FILENAME)->value("blastall.txt"));
($res);
if ($res->result()) { print $res->result(); }

###################### SUBROUTINES #####################

sub {
    my $res = shift;
    if (my $fault = $res->fault()) {
        my %fault = %$fault;
        while (my ($key, $val) = each (%fault)) {
            print "$key $val\n";
        }
    }
}

Login

Get a ticket

Configure a service

Submit request

Check status (rinse, repeat)

Print result

Beginning Perl for Bioinformatics

• Hardcover: 400 pages

• Publisher: O'Reilly Media, Inc.; 1st edition (October 15, 2001)

• Language: English

• ISBN: 0596000804

• Product Dimensions: 9.2 x 7.1 x 0.9 inches

• Shipping Weight: 1.3 pounds

• Average Customer Review: 4.5/5 based on 25 reviews

Mastering Perl for Bioinformatics

• Hardcover: 377 pages

• Publisher: O'Reilly Media, Inc.; 1st edition (June, 2003)

• Language: English

• ISBN: 0596003072

• Product Dimensions: 9.4 x 6.8 x 0.9 inches

• Shipping Weight: 1.4 pounds

• Average Customer Review: 4.5/5 based on 8 reviews

Safari Books on-line

http://proquest.safaribooksonline.com/home

Safari: Perl

Safari: bioinformatics

Getting Started

•The programming rite of passage.

•Tidbits

•print "string";

•newline: "\n"

•tab: "\t"

•# comments

•All about context
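The tidbits above can be tried together in one short script. This is a minimal sketch; the variable names and sample strings are made up for illustration, and the last part shows one common meaning of "context" in Perl (the same array behaving differently in list and scalar context).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Everything after a '#' on a line is a comment and is ignored by perl.

print "Hello, world!\n";     # "\n" ends the line
print "gene\tlength\n";      # "\t" puts a tab between the two words

# Context: the same array gives different answers depending on what is asked.
my @bases = ("A", "C", "G", "T");
my @copy  = @bases;          # list context: all four elements are copied
my $count = @bases;          # scalar context: the number of elements

print "bases: @copy\n";
print "count: $count\n";
```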

A simple program

#!/usr/bin/perl -w
#
# a program to do the obvious
#
print "Hello, world!\n";

A simple result

% ./hello-world.pl

Hello, world!

How does it work?

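#!/usr/bin/perl -w
#
# a program to do the obvious
#
print "Hello, world!\n";

• The first line (the ‘shebang’) tells the system which interpreter runs the program. A Perl program that is run directly from the command line begins with this line; the -w switch turns on warnings.

• The lines beginning with # are comments.

• The ‘print’ function sends the quoted text to the default output device (standard output), normally the screen.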

Theme and variation

#!/usr/bin/perl -w
#
# assign a value to $message
my $message = "Hello, world!\n";
# print the $message
print $message;

Store the value “Hello, world!” in a container called a variable.

Theme and variation

#!/usr/bin/perl -w
#
# assign a value to $message
my $message = qq{Hello, world!\n};
# print the $message
print $message;

Don’t let a change in form throw you.
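To see why the change in form is harmless, the following sketch compares the quoting styles side by side. The variable names are hypothetical; the behavior shown is standard Perl: qq{} acts like double quotes, while single quotes switch off both variable interpolation and escapes such as \n.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "world";

my $double  = "Hello, $name!\n";      # double quotes interpolate $name and \n
my $generic = qq{Hello, $name!\n};    # qq{} behaves exactly like double quotes
my $single  = 'Hello, $name!\n';      # single quotes keep the text literally

print $double;
print $generic;
print $single, "\n";                  # prints the characters $name and \n as typed
```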

TMTOWTDI

•There’s More Than One Way To Do It

•This can be frustrating for new users.

•We’ll try to focus on what we’re doing. Don’t worry about all the possible ways to do it yet.
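As a small illustration of TMTOWTDI, here are three ways to count the G bases in a sequence. The sequence and variable names are made up for this sketch, and there is no need to memorize all three; the point is only that different-looking code can give the same answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "GATTACAGGC";

# Way 1: tr/// returns the number of characters it matched.
my $by_tr = ($dna =~ tr/G//);

# Way 2: a global match in list context, counted via an empty list assignment.
my $by_regex = () = $dna =~ /G/g;

# Way 3: an explicit loop over each character.
my $by_loop = 0;
for my $base (split //, $dna) {
    $by_loop++ if $base eq "G";
}

print "tr: $by_tr, regex: $by_regex, loop: $by_loop\n";
```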

LAB: Let’s try it!

• Log in to your workstation

• launch a terminal window

•mkdir bsi2007

•cd bsi2007

• launch a text editor: pico, vi, emacs

• create and save your “Hello, world!” program

• Run it

LAB: Let’s try it!

% mkdir bsi2007

% cd bsi2007

% pico hello-world.pl

% chmod +x hello-world.pl

% ./hello-world.pl

LAB: Let’s try it!

#!/usr/bin/perl -w
#
# a program to do the obvious
#
print "Hello, world!\n";

LAB: Let’s try a little variation.

#!/usr/bin/perl -w
#
# assign a value to $message
my $message = "Hello, world!\n";
# print the $message
print $message;

LAB: break it.

What happens when:

1. You remove a semicolon?

2. You remove a dollar sign?

3. You change the shebang?

4. Can you change the shebang to something else that works?

lather --> rinse --> repeat

The goal of testing is to cause your code to fail. The goal of testing is not to cause your code to succeed.

D. Conway

LAB: A simple program

#!/usr/bin/perl -w
#
# a program to do the obvious
#
print "Hello, world!\n";

LAB: A simple result

% ./hello-world.pl

Hello, world!

Biology and Computer Science

• The Life Sciences and many of the Computer Sciences grew up together.

• Databases

• Languages

• Networks

• the World Wide Web

“It is better to use one’s head for a few minutes, than to use a computing machine for a few days.”

Francis Crick

A brief history

• 1950’s: Double helix structure of DNA

• 1960’s: Manual alignment using “edit distances”

• 1970’s: Optimal global alignment (Needleman & Wunsch)

• Substitution matrices (Dayhoff)

• 1980’s: Optimal local alignment (Smith & Waterman)

• 1990’s: Heuristic local alignment search

• FASTA: (Pearson et al.),

• BLAST: (Altschul et al.)

Disconnects

•Social differences

•Managing expectations

•Developing a common vocabulary

•Conway’s Law

Conway’s Law states:

“Organizations which design systems are constrained to produce designs which are copies of the communication structures of their organizations.”

In other words:

Any piece of software reflects the organizational structure that produced it.

Social differences

• Tool building versus the great discovery:

• Computer scientists create new rules to engineer a solution. (“Inventing laws”)

• Life scientists look for the exception that breaks the rules. (“Discover laws”)

Social differences

Sharing results

• Biologists: sit on it until ready to publish

• Computer scientists: share but do not guarantee correctness

Reporting results

• Biologists: peer-reviewed papers

• Computer scientists: talks at conferences; publish source code

Who’s who (on publications)

• Biologists: lab leader always last

• Computer scientists: lab leader second, least involved last

Managing Expectations

What can we expect from each other?

Life Sciences are presenting the grand challenges of our time...

What does Computer Science have to offer Life Sciences research?

Developing a common vocabulary

Words in common but with different meanings:

Array, chip, clone, cluster, database, domain, insert, library, node, partitioning, root, sequence, transformation, tree, vector, virus

Isn’t it odd?

Biology is the only science in which multiplication means the same thing as division.

Developing a common vocabulary

• The importance of interpreters.

• Constrained and negotiated vocabularies, Ontologies:

• “gene expression” and “Gene Expression” and “gene regulation”

• “putative kinase” and “possibly a kinase” and “it may be something, but it isn’t a kinase”

• Metadata without guidelines will lead to entropy.

• Folksonomy: in-formalisms, tagging?

• You are becoming interpreters.
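One way a constrained vocabulary can be enforced in code is to normalize free text and look it up in an agreed list. The sketch below is purely illustrative: the two-entry vocabulary, the sub name normalize_term, and the sample inputs are assumptions made for this example, not part of any real ontology.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical controlled vocabulary: normalized text => agreed term.
my %controlled = (
    "gene expression" => "gene expression",
    "gene regulation" => "gene regulation",
);

# Lower-case the text and tidy its whitespace before looking it up.
sub normalize_term {
    my ($text) = @_;
    my $key = lc $text;
    $key =~ s/^\s+|\s+$//g;      # trim leading and trailing whitespace
    $key =~ s/\s+/ /g;           # collapse internal runs of whitespace
    return $controlled{$key};    # undef when the term is not in the vocabulary
}

for my $raw ("gene expression", "Gene  Expression", "possibly a kinase") {
    my $term = normalize_term($raw);
    print "$raw => ", (defined $term ? $term : "(not in vocabulary)"), "\n";
}
```

Terms that fall outside the vocabulary are reported rather than guessed at, which is where a human interpreter has to step in.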

Developing a common vocabulary

BioBench-Bob: “The information is in the file, what’s the problem?”

Compu-Carla: “This file is a mess! How about some consistency and structure?”

What we have here is a failure to communicate.

Compu-Carla: “The information is all in the database, why are you complaining?”

BioBench-Bob: “How do I read it?”

Conway’s Law

“Organizations which design systems are constrained to produce designs which are copies of the communication structures of their organizations.”

Bioinformatics Data

“Quantity has a quality all its own”

Russian military axiom

GBREL.TXT Genetic Sequence Data Bank

April 15 2007

NCBI-GenBank Flat File Release 159.0

Distribution Release Notes

71,802,595 loci, 75,742,041,056 bases, from 71,802,595 reported sequences

Bioinformatics Data

• Often unstructured or semi-structured.

• Data appears as text strings:

• Nucleotide and protein sequences: FASTA flat-files, et alia.

• Annotation: often free-text

• Feudal states (Lincoln Stein)

FASTA

>ContigId:Contig1 AssemblyProcessId:MtSC AssemblyProcessVersion:6
GCTTTAATCTTGTAGGTTTGATGAAAGAATAAGTTCGTTTGCTGAGAAGA
AGTTTACAAGAGATGGTATAGAAGTTCAAACTGGATGCCGCGTTATGAGT
GTTGATGACAAGGAAATTACAGTGAAGGTGAAATCAACGGGAGAGGTTTG
CTCGGTTCCCCATGGATTGATTATCTGGTCTACTGGCATTTCTACTCTTC
CAGTTATAAGAGATTTTATGGAAGAAATTGGTCAGACTAAAAGGCATGTA
CTGGCAACCGATGAATGGTTGAGAGTGAAGGAATGTGAAGATGTGTTTGC
CATTGGTGATTGTTCATCAATAAATCAACGTAAAATCATGGATGATATCT
TGGACATATTTAAGGCTGCAGACAAAAATAACTCCGGTACCTTAACTGTG
TAAGAATGCGAAGAAGTGATGGATGAATGTATCTTAAGATATCCTGCAGT
GGAATGC

Medicago truncatula consensus sequence
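A FASTA record like the one above is simple to read with plain Perl. The following is a minimal sketch, not a replacement for BioPerl: the sub names parse_fasta and gc_content are hypothetical, and the embedded sample is abbreviated to the header and first two sequence lines of the record on this slide.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse FASTA-formatted text into a list of [header, sequence] pairs.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                 # drop trailing whitespace
        next if $line eq "";
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ""];       # a '>' line starts a new record
        } elsif (@records) {
            $records[-1][1] .= $line;      # sequence lines are concatenated
        }
    }
    return @records;
}

# Fraction of G and C bases in a sequence.
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);          # tr counts without changing $seq
    return $gc / length($seq);
}

# Abbreviated sample: header and first two sequence lines from the slide.
my $fasta = <<'END';
>ContigId:Contig1 AssemblyProcessId:MtSC AssemblyProcessVersion:6
GCTTTAATCTTGTAGGTTTGATGAAAGAATAAGTTCGTTTGCTGAGAAGA
AGTTTACAAGAGATGGTATAGAAGTTCAAACTGGATGCCGCGTTATGAGT
END

for my $rec (parse_fasta($fasta)) {
    my ($header, $seq) = @$rec;
    printf "%s\nlength: %d  GC: %.2f\n", $header, length($seq), gc_content($seq);
}
```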

GenBank

LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999
DEFINITION  Saccharomyces cerevisiae TCP1-beta gene, partial cds, and Axl2p
            (AXL2) and Rev7p (REV7) genes, complete cds.
ACCESSION   U49845
VERSION     U49845.1  GI:1293613
KEYWORDS    .
SOURCE      Saccharomyces cerevisiae (baker's yeast)
  ORGANISM  Saccharomyces cerevisiae
            Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
            Saccharomycetales; Saccharomycetaceae; Saccharomyces.
REFERENCE   1  (bases 1 to 5028)
  AUTHORS   Torpey,L.E., Gibbs,P.E., Nelson,J. and Lawrence,C.W.
  TITLE     Cloning and sequence of REV7, a gene whose function is required for
            DNA damage-induced mutagenesis in Saccharomyces cerevisiae
  JOURNAL   Yeast 10 (11), 1503-1509 (1994)
  PUBMED    7871890
REFERENCE   2  (bases 1 to 5028)
  AUTHORS   Roemer,T., Madden,K., Chang,J. and Snyder,M.
  TITLE     Selection of axial growth sites in yeast requires Axl2p, a novel
            plasma membrane glycoprotein
  JOURNAL   Genes Dev. 10 (7), 777-793 (1996)
  PUBMED    8846915
REFERENCE   3  (bases 1 to 5028)
  AUTHORS   Roemer,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (22-FEB-1996) Terry Roemer, Biology, Yale University, New
            Haven, CT, USA
FEATURES             Location/Qualifiers
     source          1..5028
                     /organism="Saccharomyces cerevisiae"
                     /db_xref="taxon:4932"
                     /chromosome="IX"
                     /map="9"
     CDS             <1..206
                     /codon_start=3
                     /product="TCP1-beta"
                     /protein_id="AAA98665.1"
                     /db_xref="GI:1293614"
                     /translation="SSIYNGISTSGLDLNNGTIADMRQLGIVESYKLKRAVVSSASEA
                     AEVLLRVDNIIRARPRTANRQHM"
     gene            687..3158
                     /gene="AXL2"
     CDS             687..3158
                     /gene="AXL2"

Approximately 71,802,595 loci, 75,742,041,056 bases, from 71,802,595 reported sequences in traditional GenBank divisions as of April 2007.

GenBank

/note="plasma membrane glycoprotein" /codon_start=1 /function="required for axial budding pattern of S. cerevisiae" /product="Axl2p" /protein_id="AAA98666.1" /db_xref="GI:1293615" /translation="MTQLQISLLLTATISLLHLVVATPYEAYPIGKQYPPVARVNESF TFQISNDTYKSSVDKTAQITYNCFDLPSWLSFDSSSRTFSGEPSSDLLSDANTTLYFN VILEGTDSADSTSLNNTYQFVVTNRPSISLSSDFNLLALLKNYGYTNGKNALKLDPNE VFNVTFDRSMFTNEESIVSYYGRSQLYNAPLPNWLFFDSGELKFTGTAPVINSAIAPE TSYSFVIIATDIEGFSAVEVEFELVIGAHQLTTSIQNSLIINVTDTGNVSYDLPLNYV YLDDDPISSDKLGSINLLDAPDWVALDNATISGSVPDELLGKNSNPANFSVSIYDTYG DVIYFNFEVVSTTDLFAISSLPNINATRGEWFSYYFLPSQFTDYVNTNVSLEFTNSSQ DHDWVKFQSSNLTLAGEVPKNFDKLSLGLKANQGSQSQELYFNIIGMDSKITHSNHSA NATSTRSSHHSTSTSSYTSSTYTAKISSTSAAATSSAPAALPAANKTSSHNKKAVAIA CGVAIPLGVILVALICFLIFWRRRRENPDDENLPHAISGPDLNNPANKPNQENATPLN NPFDDDASSYDDTSIARRLAALNTLKLDNHSATESDISSVDEKRDSLSGMNTYNDQFQ SQSKEELLAKPPVQPPESPFFDPQNRSSSVYMDSEPAVNKSWRYTGNLSPVSDIVRDS YGSQKTVDTEKLFDLEAPEKEKRTSRDVTMSSLDPWNSNISPSPVRKSVTPSPYNVTK HRNRHLQNIQDSQSGKNGITPTTMSTSSSDDFVPVKDGENFCWVHSMEPDRRPSKKRL VDFSNKSNVNVGQVKDIHGRIPEML" gene complement(3300..4037) /gene="REV7" CDS complement(3300..4037) /gene="REV7" /codon_start=1 /product="Rev7p" /protein_id="AAA98667.1" /db_xref="GI:1293616" /translation="MNRWVEKWLRVYLKCYINLILFYRNVYPPQSFDYTTYQSFNLPQ FVPINRHPALIDYIEELILDVLSKLTHVYRFSICIINKKNDLCIEKYVLDFSELQHVD KDDQIITETEVFDEFRSSLNSLIMHLEKLPKVNDDTITFEAVINAIELELGHKLDRNR RVDSLEEKAEIERDSNWVKCQEDENLPDNNGFQPPKIKLTSLVGSDVGPLIIHQFSEK LISGDDKILNGVYSQYEEGESIFGSLF"ORIGIN 1 gatcctccat atacaacggt atctccacct caggtttaga tctcaacaac ggaaccattg 61 ccgacatgag acagttaggt atcgtcgaga gttacaagct aaaacgagca gtagtcagct 121 ctgcatctga agccgctgaa gttctactaa gggtggataa catcatccgt gcaagaccaa 181 gaaccgccaa tagacaacat atgtaacata tttaggatat acctcgaaaa taataaaccg 241 ccacactgtc attattataa ttagaaacag aacgcaaaaa ttatccacta tataattcaa 301 agacgcgaaa aaaaaagaac aacgcgtcat agaacttttg gcaattcgcg tcacaaataa 361 attttggcaa cttatgtttc ctcttcgagc agtactcgag ccctgtctca agaatgtaat 421 aatacccatc gtaggtatgg ttaaagatag catctccaca 
acctcaaagc tccttgccga

GenBank

481 gagtcgccct cctttgtcga gtaattttca cttttcatat gagaacttat tttcttattc 541 tttactctca catcctgtag tgattgacac tgcaacagcc accatcacta gaagaacaga 601 acaattactt aatagaaaaa ttatatcttc ctcgaaacga tttcctgctt ccaacatcta 661 cgtatatcaa gaagcattca cttaccatga cacagcttca gatttcatta ttgctgacag 721 ctactatatc actactccat ctagtagtgg ccacgcccta tgaggcatat cctatcggaa 781 aacaataccc cccagtggca agagtcaatg aatcgtttac atttcaaatt tccaatgata 841 cctataaatc gtctgtagac aagacagctc aaataacata caattgcttc gacttaccga 901 gctggctttc gtttgactct agttctagaa cgttctcagg tgaaccttct tctgacttac 961 tatctgatgc gaacaccacg ttgtatttca atgtaatact cgagggtacg gactctgccg 1021 acagcacgtc tttgaacaat acataccaat ttgttgttac aaaccgtcca tccatctcgc 1081 tatcgtcaga tttcaatcta ttggcgttgt taaaaaacta tggttatact aacggcaaaa 1141 acgctctgaa actagatcct aatgaagtct tcaacgtgac ttttgaccgt tcaatgttca 1201 ctaacgaaga atccattgtg tcgtattacg gacgttctca gttgtataat gcgccgttac 1261 ccaattggct gttcttcgat tctggcgagt tgaagtttac tgggacggca ccggtgataa 1321 actcggcgat tgctccagaa acaagctaca gttttgtcat catcgctaca gacattgaag 1381 gattttctgc cgttgaggta gaattcgaat tagtcatcgg ggctcaccag ttaactacct 1441 ctattcaaaa tagtttgata atcaacgtta ctgacacagg taacgtttca tatgacttac 1501 ctctaaacta tgtttatctc gatgacgatc ctatttcttc tgataaattg ggttctataa 1561 acttattgga tgctccagac tgggtggcat tagataatgc taccatttcc gggtctgtcc 1621 cagatgaatt actcggtaag aactccaatc ctgccaattt ttctgtgtcc atttatgata 1681 cttatggtga tgtgatttat ttcaacttcg aagttgtctc cacaacggat ttgtttgcca 1741 ttagttctct tcccaatatt aacgctacaa ggggtgaatg gttctcctac tattttttgc 1801 cttctcagtt tacagactac gtgaatacaa acgtttcatt agagtttact aattcaagcc 1861 aagaccatga ctgggtgaaa ttccaatcat ctaatttaac attagctgga gaagtgccca 1921 agaatttcga caagctttca ttaggtttga aagcgaacca aggttcacaa tctcaagagc 1981 tatattttaa catcattggc atggattcaa agataactca ctcaaaccac agtgcgaatg 2041 caacgtccac aagaagttct caccactcca cctcaacaag ttcttacaca tcttctactt 2101 acactgcaaa aatttcttct acctccgctg ctgctacttc ttctgctcca gcagcgctgc 2161 cagcagccaa 
taaaacttca tctcacaata aaaaagcagt agcaattgcg tgcggtgttg 2221 ctatcccatt aggcgttatc ctagtagctc tcatttgctt cctaatattc tggagacgca 2281 gaagggaaaa tccagacgat gaaaacttac cgcatgctat tagtggacct gatttgaata 2341 atcctgcaaa taaaccaaat caagaaaacg ctacaccttt gaacaacccc tttgatgatg 2401 atgcttcctc gtacgatgat acttcaatag caagaagatt ggctgctttg aacactttga 2461 aattggataa ccactctgcc actgaatctg atatttccag cgtggatgaa aagagagatt 2521 ctctatcagg tatgaataca tacaatgatc agttccaatc ccaaagtaaa gaagaattat 2581 tagcaaaacc cccagtacag cctccagaga gcccgttctt tgacccacag aataggtctt 2641 cttctgtgta tatggatagt gaaccagcag taaataaatc ctggcgatat actggcaacc 2701 tgtcaccagt ctctgatatt gtcagagaca gttacggatc acaaaaaact gttgatacag 2761 aaaaactttt cgatttagaa gcaccagaga aggaaaaacg tacgtcaagg gatgtcacta 2821 tgtcttcact ggacccttgg aacagcaata ttagcccttc tcccgtaaga aaatcagtaa 2881 caccatcacc atataacgta acgaagcatc gtaaccgcca cttacaaaat attcaagact 2941 ctcaaagcgg taaaaacgga atcactccca caacaatgtc aacttcatct tctgacgatt 3001 ttgttccggt taaagatggt gaaaattttt gctgggtcca tagcatggaa ccagacagaa 3061 gaccaagtaa gaaaaggtta gtagattttt caaataagag taatgtcaat gttggtcaag

GenBank

3121 ttaaggacat tcacggacgc atcccagaaa tgctgtgatt atacgcaacg atattttgct
3181 taattttatt ttcctgtttt attttttatt agtggtttac agatacccta tattttattt
3241 agtttttata cttagagaca tttaatttta attccattct tcaaatttca tttttgcact
3301 taaaacaaag atccaaaaat gctctcgccc tcttcatatt gagaatacac tccattcaaa
3361 attttgtcgt caccgctgat taatttttca ctaaactgat gaataatcaa aggccccacg
3421 tcagaaccga ctaaagaagt gagttttatt ttaggaggtt gaaaaccatt attgtctggt
3481 aaattttcat cttcttgaca tttaacccag tttgaatccc tttcaatttc tgctttttcc
3541 tccaaactat cgaccctcct gtttctgtcc aacttatgtc ctagttccaa ttcgatcgca
3601 ttaataactg cttcaaatgt tattgtgtca tcgttgactt taggtaattt ctccaaatgc
3661 ataatcaaac tatttaagga agatcggaat tcgtcgaaca cttcagtttc cgtaatgatc
3721 tgatcgtctt tatccacatg ttgtaattca ctaaaatcta aaacgtattt ttcaatgcat
3781 aaatcgttct ttttattaat aatgcagatg gaaaatctgt aaacgtgcgt taatttagaa
3841 agaacatcca gtataagttc ttctatatag tcaattaaag caggatgcct attaatggga
3901 acgaactgcg gcaagttgaa tgactggtaa gtagtgtagt cgaatgactg aggtgggtat
3961 acatttctat aaaataaaat caaattaatg tagcatttta agtataccct cagccacttc
4021 tctacccatc tattcataaa gctgacgcaa cgattactat tttttttttc ttcttggatc
4081 tcagtcgtcg caaaaacgta taccttcttt ttccgacctt ttttttagct ttctggaaaa
4141 gtttatatta gttaaacagg gtctagtctt agtgtgaaag ctagtggttt cgattgactg
4201 atattaagaa agtggaaatt aaattagtag tgtagacgta tatgcatatg tatttctcgc
4261 ctgtttatgt ttctacgtac ttttgattta tagcaagggg aaaagaaata catactattt
4321 tttggtaaag gtgaaagcat aatgtaaaag ctagaataaa atggacgaaa taaagagagg
4381 cttagttcat cttttttcca aaaagcaccc aatgataata actaaaatga aaaggatttg
4441 ccatctgtca gcaacatcag ttgtgtgagc aataataaaa tcatcacctc cgttgccttt
4501 agcgcgtttg tcgtttgtat cttccgtaat tttagtctta tcaatgggaa tcataaattt
4561 tccaatgaat tagcaatttc gtccaattct ttttgagctt cttcatattt gctttggaat
4621 tcttcgcact tcttttccca ttcatctctt tcttcttcca aagcaacgat ccttctaccc
4681 atttgctcag agttcaaatc ggcctctttc agtttatcca ttgcttcctt cagtttggct
4741 tcactgtctt ctagctgttg ttctagatcc tggtttttct tggtgtagtt ctcattatta
4801 gatctcaagt tattggagtc ttcagccaat tgctttgtat cagacaattg actctctaac
4861 ttctccactt cactgtcgag ttgctcgttt ttagcggaca aagatttaat ctcgttttct
4921 ttttcagtgt tagattgctc taattctttg agctgttctc tcagctcctc atatttttct
4981 tgccatgact cagattctaa ttttaagcta ttcaatttct ctttgatc
//


Swiss-Prot

ID   DMI1_MEDTR     STANDARD;      PRT;   882 AA.
AC   Q6RHR6;
DT   29-MAR-2005, integrated into UniProtKB/Swiss-Prot.
DT   05-JUL-2004, sequence version 1.
DT   04-APR-2006, entry version 13.
DE   Putative ion channel DMI-1 (Does not make infections protein 1).
GN   Name=DMI1;
OS   Medicago truncatula (Barrel medic).
OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
OC   Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
OC   rosids; eurosids I; Fabales; Fabaceae; Papilionoideae; Trifolieae;
OC   Medicago.
OX   NCBI_TaxID=3880;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [MRNA], INDUCTION, AND TISSUE SPECIFICITY.
RC   TISSUE=Root;
RX   PubMed=14963334; DOI=10.1126/science.1092986;
RA   Ane J.-M., Kiss G.B., Riely B.K., Penmetsa R.V., Oldroyd G.E.,
RA   Ayax C., Levy J., Debelle F., Baek J.-M., Kalo P., Rosenberg C.,
RA   Roe B.A., Long S.R., Denarie J., Cook D.R.;
RT   "Medicago truncatula DMI1 required for bacterial and fungal symbioses
RT   in legumes.";
RL   Science 303:1364-1367(2004).
CC   -!- FUNCTION: Required for early signal transduction events leading to
CC       endosymbioses. Acts early in a signal transduction chain leading
CC       from the perception of Nod factor to the activation of calcium
CC       spiking. Also involved in mycorrhizal symbiosis.
CC   -!- SUBCELLULAR LOCATION: Plastid; chloroplast; chloroplast membrane;
CC       multi-pass membrane protein (Potential).
CC   -!- TISSUE SPECIFICITY: Mainly expressed in roots. Also detected in
CC       pods, flowers, leaves, and stems.
CC   -!- INDUCTION: Not induced after bacterial or Nod factor treatment.
CC   -!- SIMILARITY: Belongs to the castor/pollux family.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; AY497771; AAS49490.1; -; mRNA.
KW   Chloroplast; Coiled coil; Ion transport; Ionic channel; Membrane;
KW   Plastid; Transmembrane; Transport.
FT   CHAIN         1    882       Putative ion channel DMI-1.
FT                                /FTId=PRO_0000165855.
FT   TRANSMEM    129    149       Potential.
FT   TRANSMEM    192    212       Potential.
FT   TRANSMEM    255    275       Potential.
FT   TRANSMEM    307    327       Potential.
FT   COILED      378    403       Potential.
FT   COMPBIAS     78     96       Pro-rich.
FT   COMPBIAS    114    117       Poly-Ser.

Release 53.0 of 29-May-07 of UniProtKB/Swiss-Prot contains 269,293 sequence entries, comprising 98,902,758 amino acids abstracted from 156,204 references.


Tower of Babel, Pieter Brueghel the Elder, 1563.


XML

<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://uniprot.org/uniprot http://www.uniprot.org/support/docs/uniprot.xsd">
<entry dataset="Swiss-Prot" created="2005-03-29" modified="2006-04-04" version="13">
  <accession>Q6RHR6</accession>
  <name>DMI1_MEDTR</name>
  <protein>
    <name>Putative ion channel DMI-1</name>
    <name>Does not make infections protein 1</name>
  </protein>
  <gene>
    <name type="primary">DMI1</name>
  </gene>
  <organism key="1">
    <name type="scientific">Medicago truncatula</name>
    <name type="common">Barrel medic</name>
    <dbReference type="NCBI Taxonomy" id="3880" key="2"/>
    <lineage>
      <taxon>Eukaryota</taxon>
      <taxon>Viridiplantae</taxon>
      <taxon>Streptophyta</taxon>
      <taxon>Embryophyta</taxon>
      <taxon>Tracheophyta</taxon>
      <taxon>Spermatophyta</taxon>
      <taxon>Magnoliophyta</taxon>
      <taxon>eudicotyledons</taxon>
      <taxon>core eudicotyledons</taxon>
      <taxon>rosids</taxon>
      <taxon>eurosids I</taxon>
      <taxon>Fabales</taxon>
      <taxon>Fabaceae</taxon>
      <taxon>Papilionoideae</taxon>
      <taxon>Trifolieae</taxon>
      <taxon>Medicago</taxon>
    </lineage>
  </organism>
  <reference key="3">
    <citation type="journal article" date="2004" name="Science" volume="303" first="1364" last="1367">
      <title>Medicago truncatula DMI1 required for bacterial and fungal symbioses in legumes.</title>
      <authorList>
        <person name="Ane J.-M."/>
        <person name="Kiss G.B."/>
        <person name="Riely B.K."/>
        <person name="Penmetsa R.V."/>
        <person name="Oldroyd G.E."/>
        <person name="Ayax C."/>
        <person name="Levy J."/>


Two principal problems in bioinformatics

•Distribution: data is created and controlled by autonomous groups all over the world.

•Biology is hard and messy: there are large collections of data and many data types and tools, few of which talk to each other.


Perl is often the glue that binds these systems together.


Perl basics: Strings

•Primitives:

•Strings

•Numerics

TGACATGCTAGCTAGCTAGCTAT

1356

#@$!$!%@&&!@


Data types

• Scalar: a variable quantity that cannot be resolved into components.

• List: a collection of items, often stored in an array.

• Hash: a dish of cooked meat cut into small pieces and re-cooked, usually with potatoes.


Data types

• Scalar: a variable quantity that cannot be resolved into components.

• List: a collection of items, often stored in an array.

• Hash: like an array, but instead of indexing values by number, values are accessed by name. Think of them as name-value pairs.


Data types

• Scalar:

my $var = "a";
my $num = 10;

• List:

my @fruit_list = ('apple', 'orange', 'banana');

• Hash:

my %ip2hostname = (
    "160.94.109.65"  => "leaf.cbri.umn.edu",
    "160.94.109.55"  => "blastoma.cbri.umn.edu",
    "160.94.109.211" => "kierkegaard.cbri.umn.edu"
);
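Reading values back out of the three data types could look like the sketch below. It reuses the variables from this slide; the lookup_host helper and the "unknown" fallback are hypothetical names added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num        = 10;
my @fruit_list = ('apple', 'orange', 'banana');
my %ip2hostname = (
    "160.94.109.65"  => "leaf.cbri.umn.edu",
    "160.94.109.55"  => "blastoma.cbri.umn.edu",
    "160.94.109.211" => "kierkegaard.cbri.umn.edu",
);

# Hypothetical helper: return the hostname stored for an IP address,
# or the string "unknown" when the address is not a key in the hash.
sub lookup_host {
    my ($ip) = @_;
    return exists $ip2hostname{$ip} ? $ip2hostname{$ip} : "unknown";
}

print "scalar: $num\n";
print "first fruit: $fruit_list[0]\n";    # array indexes start at 0
print "number of fruits: ", scalar(@fruit_list), "\n";
print "host: ", lookup_host("160.94.109.65"), "\n";
```

A single element of an array or hash is itself a scalar, which is why it is written with a $ rather than @ or %.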


Math

• Standard arithmetic: +, -, *, /

• Modulus (remainder) operator: %

•4 % 2 = 0 and 5 % 2 = 1

• Operate in place: $num += 3;

• Increment and decrement a variable: $i++, $a--

• Exponentiation: 2**5

• Square root: sqrt(9)


Some Math Code

# Pythagorean theorem
my $a = 3; my $b = 4;
my $c = sqrt($a**2 + $b**2);

# what's left over from the division
my $x = 22; my $y = 6;
my $div = int( $x / $y );
my $mod = $x % $y;
print "output: ", $div, " ", $mod, "\n";

output: 3 4


Logic and Equality

•if / unless / elsif / else

•if( TEST ) { DO SOMETHING }
 elsif( TEST ) { SOMETHING ELSE }
 else { DO SOMETHING ELSE IN CASE }

• Equality: == (numbers) and eq (strings)

• Numeric Less/Greater than: <, <=, >, >=

• String (lexical) comparisons: lt, le, gt, ge
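The slide lists unless without showing it, so here is a small sketch that combines if / elsif / else with unless. The size_class helper and its length thresholds are assumptions made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify a sequence length with if / elsif / else.
# The thresholds 100 and 1000 are arbitrary values chosen for illustration.
sub size_class {
    my ($length) = @_;
    if ( $length < 100 ) {
        return "short";
    }
    elsif ( $length < 1000 ) {
        return "medium";
    }
    else {
        return "long";
    }
}

my $seq = "TGACATGCTAGCTAGCTAGCTAT";

# unless runs its block only when the test is false.
unless ( $seq eq "" ) {
    print "sequence is ", size_class( length($seq) ), "\n";
}
```

Note that the length test uses a numeric operator (<) while the empty-string test uses a string operator (eq).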


Testing equality

my $str1 = "mumbo";
my $str2 = "jumbo";

if( $str1 eq $str2 ) {
    print "strings are equal\n";
}

if( $str1 lt $str2 ) {
    print "less\n";
} else {
    print "more\n";
}



Testing equality

my $num1 = "10";
my $num2 = "100";

if( $num1 == $num2 ) {
    print "nums are equal\n";
}

if( $num1 < $num2 ) {
    print "less\n";
} else {
    print "more\n";
}
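Choosing the operator matters because the same two values can compare differently as numbers and as strings. The sketch below illustrates this with a hypothetical compare_both_ways helper, which is not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: describe how two values compare numerically
# (with <) and as strings (with lt).
sub compare_both_ways {
    my ( $x, $y ) = @_;
    my $numeric = $x <  $y ? "less" : "not less";
    my $string  = $x lt $y ? "less" : "not less";
    return "numeric: $numeric, string: $string";
}

print compare_both_ways( "10", "100" ), "\n";
print compare_both_ways( "9",  "10" ),  "\n";
```

For "9" and "10" the numeric test says less, but the string test does not, because lt compares character by character and "9" sorts after "1".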


Boolean Logic

• AND: && and

• OR: || or

• NOT: ! not

if( $a > 10 && $a <= 20 ) {
    # do something interesting here
}
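As a runnable version of the range test above, the sketch below wraps it in a function and negates it with not. The helper names in_range and out_of_range are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 when $value is greater than 10 and at most 20,
# the same test as on the slide; 0 otherwise.
sub in_range {
    my ($value) = @_;
    return ( $value > 10 && $value <= 20 ) ? 1 : 0;
}

# The word forms (and, or, not) mean the same as && || ! but bind more loosely.
sub out_of_range {
    my ($value) = @_;
    return ( not in_range($value) ) ? 1 : 0;
}

for my $n ( 10, 11, 20, 21 ) {
    print "$n: ", ( in_range($n) ? "in range" : "out of range" ), "\n";
}
```

Testing the values on either side of each boundary (10 and 11, 20 and 21) is a quick way to check that > and <= were not swapped.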


Loops

•while( TEST ) { }

•until( ! TEST ) { }

•for( $i = 0 ; $i < 10; $i++ ) {}

•foreach $item ( @list ) { }

•for $item ( @list ) { }
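The four loop forms can be compared side by side. This is a minimal sketch; the list contents and variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ( 3, 1, 4 );

# C-style for loop: visit each index in turn.
my $total_for = 0;
for ( my $i = 0 ; $i < @list ; $i++ ) {
    $total_for += $list[$i];
}

# foreach loop: visit each item directly.
my $total_foreach = 0;
foreach my $item (@list) {
    $total_foreach += $item;
}

# while loop: repeat as long as the test is true.
my $countdown   = 3;
my $while_steps = 0;
while ( $countdown > 0 ) {
    $countdown--;
    $while_steps++;
}

# until loop: repeat as long as the test is false.
my $countup = 0;
until ( $countup >= 3 ) {
    $countup++;
}

print "$total_for $total_foreach $while_steps $countup\n";
```

In the for test, @list is used as a number, which gives the count of items in the array.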


Using logic

for( $i = 0; $i < 20; $i++ ) {
    if( $i == 0 ) {
        print "$i is 0\n";
    } elsif( $i / 2 == 0 ) {
        print "$i is even\n";
    } else {
        print "$i is odd\n";
    }
}


Using logic: subtle

for( $i = 0; $i < 20; $i++ ) {
    if( $i == 0 ) {
        print "$i is 0\n";
    } elsif( $i % 2 == 0 ) {    # % (modulus), not / (division)
        print "$i is even\n";
    } else {
        print "$i is odd\n";
    }
}


Using logic: looping

•for( $i = 0; $i < 20; $i++ ) { } runs the block once for each value of $i from 0 to 19.


Using logic: comparing

•$i == 0 and $i % 2 == 0 are numeric comparisons; the first test that is true decides which message is printed.


What is truth?

•True

    •if( "zero" ) { }

    •if( 23 || -1 || ! 0 ) { }

    •$x = "0 or none"; if( $x ) { }

•False

    •if( 0 || undef || '' || "0" ) { }
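To check these rules directly, a short script can run each value through a test. The truth helper below is a hypothetical name, and the "0.0" case is an extra value added to show that only the exact string "0" is false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report whether Perl treats a value as true or false.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

# Each pair is a label for printing and the value being tested.
my @tests = (
    [ '"zero"',      "zero" ],
    [ '23',          23 ],
    [ '-1',          -1 ],
    [ '"0 or none"', "0 or none" ],
    [ '0',           0 ],
    [ 'undef',       undef ],
    [ q{''},         '' ],
    [ '"0"',         "0" ],
    [ '"0.0"',       "0.0" ],    # true: only the exact string "0" is false
);

for my $test (@tests) {
    my ( $label, $value ) = @$test;
    print "$label is ", truth($value), "\n";
}
```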


Special variables

• This is why many people dislike Perl.

• Too many little silly things to remember.

• One of the trade-offs that make it harder to learn and ultimately easier to use.

• See perldoc perlvar for more detailed information.


Some special variables

•$! : error messages here

•$, : separator when doing print @array; (for print "@array"; the separator is $")

•$/ : record delimiter ("\n" usually)

•$a,$b : used in sorting

•$_ : implicit variable

•perldoc perlvar for more info
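A few of these variables can be seen in action in the sketch below. The interpolate_with helper and the file path /no/such/file.txt are hypothetical; the path is assumed not to exist so that open fails and sets $!.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = ( 'T', 'A', 'G', 'C' );

# Hypothetical helper: interpolate a list into a string with a chosen $".
sub interpolate_with {
    my ( $separator, @items ) = @_;
    local $" = $separator;    # list separator, used inside "@items"
    return "@items";
}

# $a and $b are set by sort for each comparison.
my @sorted = sort { $a cmp $b } @bases;

{
    local $, = "-";           # output field separator, used by print LIST
    print @sorted;            # prints A-C-G-T
    print "\n";
}

print interpolate_with( ", ", @sorted ), "\n";    # prints A, C, G, T

# $! holds the reason the last system call failed.
# The path is assumed not to exist, so open should fail here.
open( my $fh, '<', '/no/such/file.txt' )
    or print "open failed: $!\n";
```

Using local keeps each change confined to its block, so the rest of the program still sees the default separators.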


The Implicit variable?

•Implicit variable is $_

•It is the last thing we were thinking about.

•Examples:

for ( @list ) { print $_ };

while(<IN>) { print $_};
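Many built-in functions fall back to $_ when called without an argument. The sketch below shows print and chomp doing so; the in-memory filehandle stands in for a real input file and the sample lines are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ( "contig", "seq", "phrap" );

# for sets $_ to each item in turn; print with no arguments prints $_.
for (@list) {
    print;
    print "\n";
}

# Reading from a filehandle in a while test sets $_ to each line.
# The in-memory filehandle below stands in for a real input file.
my $text = "ACGT\nTTGA\n";
open( my $in, '<', \$text ) or die "cannot open in-memory file: $!";
my $line_count = 0;
while (<$in>) {
    chomp;                      # chomp with no argument works on $_
    $line_count++;
    print "line $line_count: $_\n";
}
close($in);
```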


Some operators and built-in functions

• tr///: transliteration from one group of characters to another.

• lc, lcfirst

• uc, ucfirst

• chomp: removes the line endings from all elements of a list; returning the (total) number of chars removed.

• chop: chops off the last character on all elements of a list; returns the last chopped char.
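Each of these can be tried on a short string. The sketch below uses a made-up sequence; the comments give the value each variable should hold after the statement runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "acgtTGCA\n";

my $removed = chomp($dna);        # $dna is now "acgtTGCA"; $removed is 1
my $upper   = uc($dna);           # "ACGTTGCA"
my $lower   = lc($dna);           # "acgttgca"
my $first   = ucfirst($lower);    # "Acgttgca"

# tr/// returns the number of characters it matched; with an empty
# replacement list and no modifiers it counts without changing the string.
my $gc_count = ( $upper =~ tr/GC// );    # 4

my $chopped = chop($lower);       # removes and returns the last character, "a"

print "$upper $first $gc_count $chopped $lower\n";
```

chomp is usually the safer choice for input lines, since chop removes the last character whatever it is.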


LAB: Math

#!/usr/bin/perl -w
#
# assign values
my $num1 = 22;
my $num2 = 7;

my $result = $num1 / $num2;

# print the result
print $result;

% pico pi.pl

% chmod +x pi.pl

% ./pi.pl


LAB: Math

#!/usr/bin/perl -w
#
# assign values
my $num1 = 22;
my $num2 = 7;

my $result = int($num1 / $num2);

# print the result
print $result;

% pico pi.pl

% chmod +x pi.pl

% ./pi.pl


LAB: Math

#!/usr/bin/perl -w
#
# assign values
my $num1 = 22;
my $num2 = 7;

my $result = $num1 % $num2;

# print the result
print $result;

% pico pi.pl

% chmod +x pi.pl

% ./pi.pl


LAB: break it.

What happens when:

1. You change the operation?

2. You change the values?

3. You put the numbers in quotes?

4. You add another number and multiply the result?
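One way to explore items 3 and 4 is sketched below. It is only one possible experiment, not the lab's answer key; the extra factor of 2 and the variable names are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numbers in quotes are strings, but arithmetic operators convert them.
my $num1 = "22";
my $num2 = "7";

my $quotient  = $num1 / $num2;           # about 3.142857
my $whole     = int( $num1 / $num2 );    # 3
my $remainder = $num1 % $num2;           # 1
my $product   = $whole * $num2 * 2;      # 42: a third number multiplied in

printf "%.3f %d %d %d\n", $quotient, $whole, $remainder, $product;
```

Quoted values that do not look like numbers, such as "22abc", still convert but produce a warning when warnings are enabled.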


LAB: Loops and logic

#!/usr/bin/perl -w

for( $i = 0; $i < 20; $i++ ) {
    if( $i == 0 ) {
        print "$i is 0\n";
    } elsif( $i % 2 == 0 ) {
        print "$i is even\n";
    } else {
        print "$i is odd\n";
    }
}

% pico loops-and-logic.pl

% chmod +x loops-and-logic.pl

% ./loops-and-logic.pl


LAB: Loops and logic

% ./loops-and-logic.pl
0 is 0
1 is odd
2 is even
3 is odd
4 is even
5 is odd
6 is even
7 is odd
8 is even
9 is odd
10 is even
11 is odd
12 is even
13 is odd
14 is even
15 is odd
16 is even
17 is odd
18 is even
19 is odd


LAB: Loops and logic

#!/usr/bin/perl -w

foreach $item ( "contig", "seq", "phrap" ) {
    if( $item eq "phrap" ) {
        print "Is there a phred file for this '$item' file?\n";
    } elsif( $item eq "seq" ) {
        print "Is '$item' in FASTA format?\n";
    } else {
        print "'$item' is an unknown type.\n";
    }
}

% pico foreach.pl

% chmod +x foreach.pl

% ./foreach.pl


LAB: Loops and logic

#!/usr/bin/perl -w

my @items = ( "contig", "seq", "phrap" );

foreach $item ( @items ) {
    if( $item eq "phrap" ) {
        print "Is there a phred file for this '$item' file?\n";
    } elsif( $item eq "seq" ) {
        print "Is '$item' in FASTA format?\n";
    } else {
        print "'$item' is an unknown type.\n";
    }
}

% pico foreach.pl

% chmod +x foreach.pl

% ./foreach.pl


LAB: break it.

What happens when:

1. You change the test?

2. You change the values?

3. Test with booleans?


LAB: operators and functions

#!/usr/bin/perl -w
# Transcribing DNA into RNA

# The DNA
my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen
print "Here is the starting DNA:\n\n";

print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's.
my $RNA = $DNA;

$RNA =~ s/T/U/g;

# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";

print "$RNA\n";

% pico transcribe.pl

% chmod +x transcribe.pl

% ./transcribe.pl



LAB: break it.

What happens when:

1. You change the case?

2. You change the case with different methods? (tr///, \L, \U, lc(), uc())

3. You reverse the sequence?
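The methods named in item 2 can be compared on the lab's own sequence. This is a sketch of one possible experiment; the variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

my $lower_lc = lc($DNA);                   # built-in function
my $lower_L  = "\L$DNA\E";                 # \L ... \E inside a double-quoted string
( my $lower_tr = $DNA ) =~ tr/A-Z/a-z/;    # transliteration on a copy

my $upper_again = uc($lower_lc);           # back to upper case
my $reversed    = reverse($DNA);           # in scalar context, reverse flips a string

print "$lower_lc\n$reversed\n";
```

All three lowercase results should be identical, which makes it easy to confirm that the methods agree.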


If you remember nothing else

•Biology is hard and messy.

•The key problems are social. Together we are smarter than any one of us.

•Technology is easy by comparison.


Parting Thoughts: an assignment.

1. Calculate the reverse complement of a DNA strand using the tr/// operation.

2. Read about file handling. (Safari on-line documentation is available.)

3. Read about Regular Expressions (regex). (Safari)

4. Find CPAN.ORG and locate a module that would be useful to you as a biologist.

5. Read about that module and email me ([email protected]) the following details:

    1. The name of the module.

    2. The name of the person who wrote it.

    3. What it does.

    4. How it would be useful to you.


Questions?


Thank You.