Managing complexity (Advanced Perl): Using Perl for specific tasks with help from Bioperl and others


Upload: virginia-bailey

Post on 03-Jan-2016


Page 1: Managing complexity (Advanced Perl) Using perl for specific tasks with help from Bioperl and others


Page 2: Login

Username: bioinfouser
Password: loginbioinfo

Page 3: Funny?

Page 4: Goals

- I assume you already know Perl basics
- Some more advanced features
- Learn how to write OO code
  - More flexible modules
  - Understand other modules
- Some APIs that you may need
  - Bioperl
  - Perl DBI

Page 5: What I assume you already know

- Scalars
- Arrays
- Hashes
- Control structures (if-then, for, foreach, while, etc.)
- File IO

Page 6: Managing complexity

- By managing complexity
  - Make hard tasks easy(er)
- Perl itself does this
  - Regular expressions, text manipulations
- Extensions (modules) do this
- May come at the expense of execution speed
  - You may not care
  - Consider the big picture
    - Development time
    - Errors
  - Extremely custom software
  - Some things need speed

Page 7: How complex is it now?

- Perl is a very compact language in terms of human languages
- Perl is large compared with other languages
  - TMTOWTDI (There's More Than One Way To Do It)
  - Perl has approximately 233 reserved words
  - Java has approximately 47 reserved words
- Both are easy to learn, harder to use effectively

Page 8: General practices

- Always use #!/usr/bin/perl -w or use warnings;
- Consider use strict; for scripts longer than 10 lines
- You can't have too many comments
  - #
  - =head
  - =cut
  - perldoc

Page 9: Getting values into the program or subroutine

- Perl behaves as pass by value when arguments are copied out of @_, which is the usual idiom (the elements of @_ themselves are aliases to the caller's variables)
- A scalar can have as a value a "pointer" (a reference) to an array, hash, function etc.
- The args to a function arrive in a special variable called @_ (the command-line args to a program arrive in @ARGV)

    my $first_value = shift @_;
    my $first_value = $_[0];
    my $first_value = shift;
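The three ways of reading the first argument can be compared side by side. This is a minimal sketch; the subroutine names are made up for illustration. Note that the first element of @_ is $_[0], and that writing to $_[0] directly changes the caller's variable, whereas working on a copy does not.

```perl
use strict;
use warnings;

# Three equivalent ways to read the first argument (hypothetical helper names).
sub first_by_explicit_shift { my $first_value = shift @_; return $first_value; }
sub first_by_index          { my $first_value = $_[0];    return $first_value; }
sub first_by_implicit_shift { my $first_value = shift;    return $first_value; }

# Working on a copy leaves the caller's variable alone.
sub modify_copy  { my $value = shift; $value = "changed"; return; }

# Writing to $_[0] changes the caller's variable, because @_ aliases it.
sub modify_alias { $_[0] = "changed"; return; }

my $text = "original";
modify_copy($text);
my $after_copy = $text;     # still "original"

modify_alias($text);
my $after_alias = $text;    # now "changed"

print first_by_explicit_shift("a", "b"), "\n";
print first_by_index("a", "b"), "\n";
print first_by_implicit_shift("a", "b"), "\n";
print "$after_copy / $after_alias\n";
```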

Page 10: References

    my @array = ("one", "two", "three", "four");

    function_call(@array);
    function_call(\@array);
    function_call(["one", "two", "three"]);

    sub function_call {
        my $passed = shift @_;
        print $passed;
    }

Output

    oneARRAY(0x80601a0)ARRAY(0x804c9a0)
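Printing a reference only shows its type and address; to get at the data it has to be dereferenced. The sketch below shows one way to do that, with hypothetical subroutine names, and contrasts it with passing the array itself, which is flattened into @_.

```perl
use strict;
use warnings;

my @array = ("one", "two", "three", "four");

# Receives an array reference and counts the elements it points to.
sub count_items {
    my $array_ref = shift;
    return scalar @{$array_ref};
}

# Receives a flattened list: each element arrives separately in @_.
sub count_args {
    return scalar @_;
}

my $from_reference = count_items(\@array);                   # named array, by reference
my $from_anonymous = count_items(["one", "two", "three"]);   # anonymous array
my $from_flat_list = count_args(@array);                     # flattened list

my $array_ref      = \@array;
my $second_element = $array_ref->[1];                        # arrow dereference

print "$from_reference $from_anonymous $from_flat_list $second_element\n";
```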

Page 11: Debugging complex data structures

- Print the reference
  - It will tell you a little bit of information
- Use the Data::Dumper module
  - This will give you a snapshot of the whole data structure
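Both debugging approaches can be tried on a small nested structure. This is only a sketch: the hash layout is an assumption made for illustration, populated with the ATP7B gene name and aliases that appear later in these slides.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print hash keys in a stable, sorted order.
$Data::Dumper::Sortkeys = 1;

# Illustrative nested structure: a hash holding a scalar and an array reference.
my %gene = (
    name    => "ATP7B",
    aliases => [
        "Wilson disease-associated protein",
        "Copper-transporting ATPase 2",
    ],
);

my $gene_ref = \%gene;

# Printing the reference shows only its type and memory address.
my $as_string = "$gene_ref";
print "$as_string\n";

# Dumper returns a printable snapshot of the whole structure.
my $dump = Dumper($gene_ref);
print $dump;
```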

Page 12: Some more advanced features

Page 13: Regular expressions

- Not Perl specific
- Very useful
- What they do:
  - String comparisons
  - String substitutions
  - Substring selection

Page 14: Regex

    $string =~ /find/
    $string =~ /find$/
    $string =~ /^find/
    $string =~ /^find$/

The match operator can also be written with a leading m, as in m/find/.

    .    Match any character (except newline, by default)
    \w   Match "word" character (alphanumeric plus "_")
    \W   Match non-word character
    \s   Match whitespace character
    \S   Match non-whitespace character
    \d   Match digit character
    \D   Match non-digit character
    \t   Match tab
    \n   Match newline
    \r   Match return
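The effect of the ^ and $ anchors is easiest to see by running the four patterns against one string. The test string here is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

my $string = "find me";

# Each match is reduced to 1 (matched) or 0 (did not match).
my $anywhere = $string =~ /find/   ? 1 : 0;   # "find" anywhere in the string
my $at_end   = $string =~ /find$/  ? 1 : 0;   # "find" at the end
my $at_start = $string =~ /^find/  ? 1 : 0;   # "find" at the start
my $whole    = $string =~ /^find$/ ? 1 : 0;   # the string is exactly "find"

# Escapes combine freely: word characters, one whitespace, then digits.
my $classes  = "gene 42" =~ /^\w+\s\d+$/ ? 1 : 0;

print "anywhere=$anywhere at_end=$at_end at_start=$at_start whole=$whole classes=$classes\n";
```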

Page 15: Repetition

    $string =~ /(ti){2}/
    $string =~ /A*T+G?C{3}A{3,}T{4,6}/

Character Classes

    $string =~ /[ATGCN]/
    $string =~ /[^ATGCNatgcn]/i
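A short sketch of how these quantifiers and character classes behave. The test strings are made up so that each part of the pattern has something to match.

```perl
use strict;
use warnings;

# (ti){2} needs the group "ti" twice in a row: "titi".
my $double_ti = "mititi" =~ /(ti){2}/ ? 1 : 0;

# A* (zero or more), T+ (one or more), G? (optional),
# C{3} (exactly 3), A{3,} (3 or more), T{4,6} (4 to 6).
my $quantified = "TGCCCAAATTTT" =~ /A*T+G?C{3}A{3,}T{4,6}/ ? 1 : 0;

# A negated class finds any character that is NOT a DNA letter.
my $has_bad_char = "ATGXC" =~ /[^ATGCN]/i ? 1 : 0;
my $clean_dna    = "atgcn" =~ /[^ATGCN]/i ? 0 : 1;

print "$double_ti $quantified $has_bad_char $clean_dna\n";
```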

Page 16: Selection/Replacement

    $string =~ /(A{3,8})/;
    print $1;

    $string =~ s/a/A/

    $string =~ tr/atgc/ATGC/
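The sketch below runs a capture, a substitution and a transliteration on small made-up sequences. Two details it illustrates: s/a/A/ replaces only the first match unless /g is added, and tr works on plain character lists, so no square brackets are needed.

```perl
use strict;
use warnings;

# Selection: in list context a match returns its captured groups.
my ($run) = "ttAAAAtt" =~ /(A{3,8})/;

my $dna = "atgcatgc";

# Substitution: first match only, then every match with /g.
(my $first_only = $dna) =~ s/a/A/;
(my $every_a    = $dna) =~ s/a/A/g;

# Transliteration: map each character to its counterpart.
(my $upper = $dna) =~ tr/atgc/ATGC/;

# With an empty replacement list tr only counts the matching characters.
my $gc_count = ($dna =~ tr/gc//);

print "$run $first_only $every_a $upper $gc_count\n";
```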

Page 17: Additional syntax

    $string =~ /AT*?AT/

    $string =~ m#/var/log/messages#

    $_ = "ATATATAGTGTGCGTGATATGGG";

    ($one, $two, $three) = /AT..AT/g;
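A sketch of the three ideas on this slide: non-greedy quantifiers, alternative delimiters, and /g in list context. The greedy and non-greedy contrast uses .* rather than the slide's T*, an assumption made here because the difference only shows when the quantified part could also swallow the text that follows it.

```perl
use strict;
use warnings;

# Greedy .* takes as much as possible; non-greedy .*? as little as possible.
my ($greedy) = "ATGGATCCAT" =~ /(A.*AT)/;
my ($lazy)   = "ATGGATCCAT" =~ /(A.*?AT)/;

# m with another delimiter avoids escaping the slashes in a path.
my $path_found = "tail /var/log/messages" =~ m#/var/log/messages# ? 1 : 0;

# In list context, /g returns all non-overlapping matches from $_.
$_ = "ATATATAGTGTGCGTGATATGGG";
my ($one, $two, $three) = /AT..AT/g;   # only one match exists in this string
my @all_at = /AT/g;

print "$greedy $lazy $path_found $one ", scalar(@all_at), "\n";
print "second match is ", (defined $two ? $two : "undefined"), "\n";
```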

Page 18: What is a module

- Two types
  - Object-oriented type
    - Provides something similar to a class definition
  - Remote function call
    - Provides a method to import subroutines or variables for the main program to use

Page 19: Howto: Making a module

Create a file called workSaver.pm

    ###########
    package workSaver;

    sub doStuff {
        print "Stuff done\n";
    }

    1; #statement that evaluates to true
    ###########

Now you can use it with "use workSaver;"*

*Some restrictions apply
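A sketch of how the simple module is called. To keep it runnable as a single file, the package is declared inline instead of being loaded from workSaver.pm. That substitution is an assumption of this example; with a real file the program would start with "use workSaver;" and the file would have to sit in a directory listed in @INC (for instance via "use lib"). Because this module exports nothing, the subroutine is called by its full package name.

```perl
use strict;
use warnings;

# Stand-in for the contents of workSaver.pm.
{
    package workSaver;

    sub doStuff {
        print "Stuff done\n";
    }
}

# Nothing was exported, so the call uses the full package-qualified name.
workSaver::doStuff();

# A bare doStuff() would fail here: main has no subroutine of that name.
my $visible_in_main = defined &main::doStuff ? 1 : 0;
print "doStuff visible in main: $visible_in_main\n";
```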

Page 20: Howto: Making a module cont.

- This method works very well for subroutines that are used in several programs.
- Reduces the "clutter" in your program
- Provides one maintenance point instead of an unknown number.
  - Eases bug fixes
  - Careful of boundaries

Page 21: More complete method

- Allows you to "pollute" the namespace of the original program selectively.
- Makes the use of functions and variables easier
- Still used about the same way as the simple method, but things are clearer

Page 22: More complete

    package functional;
    use strict;
    use Exporter;
    our @ISA = ("Exporter");
    our @EXPORT = qw();
    our @EXPORT_OK = qw($variable1 $variable2 printout);
    our $VERSION = 2.0;

    our $variable1 = "var1";
    our $variable2 = "var2";
    my $variable3 = "var3";

    sub printout {
        my $passed_variable = shift;
        print "Your variable is $passed_variable mine are $variable1 , $variable2, $variable3 \n";
    }

    1;
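A sketch of the calling side of an Exporter-based module. So that it runs as one file, a cut-down copy of the package sits in a BEGIN block and is marked as loaded through %INC. Both of those are scaffolding assumed for this example, and printout returns its message instead of printing it so the result can be inspected. With a real functional.pm on disk, only the "use functional qw(...)" line would be needed.

```perl
use strict;
use warnings;

# Scaffolding: define the module at compile time and mark it as loaded.
BEGIN {
    package functional;
    use Exporter;
    our @ISA       = ("Exporter");
    our @EXPORT    = qw();                        # nothing exported by default
    our @EXPORT_OK = qw($variable1 $variable2 printout);

    our $variable1 = "var1";
    our $variable2 = "var2";
    my  $variable3 = "var3";                      # lexical: never exportable

    sub printout {
        my $passed_variable = shift;
        return "Your variable is $passed_variable, mine are $variable1, $variable2, $variable3\n";
    }

    $INC{'functional.pm'} = __FILE__;
}

# Selectively "pollute" the namespace: request only what is needed.
use functional qw($variable1 printout);

my $message = printout("hello");
print $message;
print "Imported: $variable1\n";

# $variable2 was not requested, so it needs its full package name here.
print "Not imported: $functional::variable2\n";
```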

Page 23: CPAN

- Wouldn't it be nice to have a place where:
  - You could find a bunch of Perl modules
  - It would be browsable
  - Searchable
  - Big pipe for people to download stuff
  - Other people would be encouraged to submit fixes and updates
  - And it was all free

Page 24: Sources of modules/information

- www.CPAN.org
- www.bioperl.org
- www.perl.com
- www.cetus-links.org/oo_infos.html

Page 25: Bioperl

- Set of modules that are extremely useful for working with biological data. Actively maintained.
- www.bioperl.org is a very good place to get the basics of Bioperl
- We will go through an example to see a typical use

Page 26: Bioperl has several basic types of objects

- Seq: a sequence, the most common type (Bio::Seq)
- Location objects: where it is, how long it is, etc.
- Interface objects: Bio::xyzI. No implementation, mostly documentation
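A minimal sketch of working with a Bio::Seq object, assuming Bioperl is installed. The identifier and the nine-base sequence are made up for illustration, and the script exits quietly when the module cannot be loaded.

```perl
use strict;
use warnings;

# Skip quietly when Bioperl is not installed on this machine.
eval { require Bio::Seq; 1 }
    or do { print "Bio::Seq not available, skipping\n"; exit 0; };

# Build a sequence object from a made-up DNA string.
my $seq = Bio::Seq->new(
    -id       => "example_seq",    # hypothetical identifier
    -seq      => "ATGGCCTAA",      # invented sequence: ATG GCC TAA
    -alphabet => "dna",
);

my $id      = $seq->display_id;        # identifier given above
my $length  = $seq->length;            # number of bases
my $revcom  = $seq->revcom->seq;       # reverse complement as a string
my $protein = $seq->translate->seq;    # translated protein as a string

print "$id: $length bases\n";
print "reverse complement: $revcom\n";
print "translation: $protein\n";
```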

Page 27: Bioperl documentation

- Several different ways to find out about a module
  - perldoc Bio::Seq
  - bioperl.org
  - /usr/lib/perl5/site_perl/5.8.0/bptutorial.pl 100 Bio::Seq
  - Data::Dumper to print the data structure
  - Print the variable

Page 28: Bioperl demo

Page 29: Why use a database

- Transaction control: only one user can modify the data at any one time.
- Access control: some people can modify data, some can read data, others can create data structures.
- Fast handling of lots of data
- Precise definition of data (mostly).
- Easy to share data resources with others

Page 30: Many choices

- There are many types: MS Access, Excel (sort of), Sybase, Oracle, Postgres, mSQL, MySQL ...
- They each have their niche and function best in certain cases; there is also considerable overlap.
- SQL (Structured Query Language) is a common thread

Page 31: MySQL is better than YourSQL

- Free on Unix
- Good developer support
- Constant bug fixes and feature addition
- Good scalability to medium size and load, OK performance.
- Easy to install.
- Used at the Ensembl and UCSC genome browsers, so a lot of information is readily available in that format.

Page 32: Table Structure - Schema

    Gene table:       Gene_ID, Name
    Alias table:      Alias_ID, Gene_ID, Alias
    Reference table:  Reference_ID, Gene_ID, Reference, DataSource

Gene: ATP7B

Aliases:
- Wilson disease-associated protein
- Copper-transporting ATPase 2

References:
- Enzyme Commission: 3.6.3.4
- UniGene: Hs.84999
- AffyProbeU133: 204624_at
- AffyProbeU95: 37930_at
- RefSeq: NM_000053
- GenBank: AF034838
- GenBank: U11700
- LocusLink: 540

Page 33: SQL (MySQL dialect)

    SELECT col_name FROM table WHERE col_name = value;

    SELECT COUNT(*) FROM table WHERE col_name LIKE '%value%';

    SELECT COUNT(DISTINCT col_name) FROM table WHERE col_name IS NOT NULL;

CREATE, UPDATE, DELETE, INSERT have similar forms

Page 34: SQL cont.

- USE database_name
  - Can also be specified on the command line with -D
- SHOW TABLES: lists all the tables in that database (also SHOW DATABASES).
- DESCRIBE table_name: lists the columns and datatypes for each column
  - or SHOW COLUMNS FROM table_name

Page 35: More advanced SELECTs

    SELECT (column_list) FROM (table_list) WHERE (constraints) GROUP BY (grouping columns) ORDER BY (sorting columns) LIMIT (limit number);

    SELECT col_name FROM table1, table2 WHERE table1_val = table2_val AND table1_val2 > value;

The second statement is an example of an equi-join.

Page 36: Getting the names right

- If you only have one table you only need to use the column name
- When you are using joins this may not be adequate.
  - If two tables both have a column called primary, you need to refer to it as table1.primary or table2.primary

Page 37: Data Types

- INT
  - TINYINT: -128 to 127
  - SMALLINT: -32768 to 32767
  - MEDIUMINT: -8388608 to 8388607
  - INT: -2147483648 to 2147483647
  - BIGINT: -9223372036854775808 to 9223372036854775807
- FLOAT
  - FLOAT: 4 bytes
  - DOUBLE: 8 bytes

Page 38

- CHAR
  - CHAR(n): character string of length n; takes n bytes
  - VARCHAR(n): character string up to n long; takes L+1 bytes, where L is the actual length
  - TEXT: up to 2^16 - 1 bytes
- BLOBs: Binary Large OBjects

Page 39: Perl DBI

- Method for Perl to connect to a database (virtually any database) and read or modify data.
- The statements are constructed very much like the SQL statements that would be entered on the command line, so learning SQL is still necessary

Page 40: Statements in DBI

- Connect
  - Used to establish initial connection
- Prepare
  - Prepare a statement to execute
- Execute
  - Execute the statement
- Do
  - Prepare a statement that does not return results and execute it

Page 41

- Fetch
  - Several types used to get returned data
- Disconnect
  - Disconnect from the server

Page 42: Types of fetch

- "fetchrow_array"
  - Used to fetch an array of scalars each time
  - Can also use "fetchrow_arrayref"
- "fetchrow_hashref"
  - Used to fetch a hash indexed by column name, returned as a reference (DBI offers only the reference form for hashes).
  - Slower but cleaner code.

Page 43: More advanced statements

- Quote
  - Used to properly quote data for use with a prepare statement

        $value = $dbh->quote($blast_result);

- Placeholders
  - Speeds up execution, optional

        my $prep = $dbh->prepare("select x from y where z = ?");

        loop_start
            $prep->bind_param(1, $z);
            $prep->execute();
        loop_end
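The DBI steps from the last few slides (connect, do, prepare, execute with placeholders, fetch, quote, disconnect) can be strung together in one short script. This is a sketch under stated assumptions: it uses an in-memory SQLite database through DBD::SQLite instead of MySQL so that no server is needed, the table is a simplified version of the Alias table from the schema slide, and the script exits quietly if DBI or DBD::SQLite is not installed. For MySQL the connect string would take the form "dbi:mysql:database=NAME;host=HOST" with a real user name and password.

```perl
use strict;
use warnings;

# Skip quietly when the database modules are not installed.
eval { require DBI; require DBD::SQLite; 1 }
    or do { print "DBI/DBD::SQLite not available, skipping\n"; exit 0; };

# Connect: in-memory SQLite stands in for a MySQL server here.
my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "", { RaiseError => 1 });

# Do: prepare and execute a statement that returns no rows.
$dbh->do("CREATE TABLE alias (alias_id INTEGER PRIMARY KEY, gene_id INTEGER, alias TEXT)");

# Placeholders: prepare once, execute once per value.
my $insert = $dbh->prepare("INSERT INTO alias (gene_id, alias) VALUES (?, ?)");
for my $alias ("Wilson disease-associated protein", "Copper-transporting ATPase 2") {
    $insert->execute(1, $alias);
}

# Prepare, bind and execute a query.
my $select = $dbh->prepare("SELECT alias_id, alias FROM alias WHERE gene_id = ? ORDER BY alias_id");
$select->bind_param(1, 1);
$select->execute();

# Fetch: each row arrives as a hash reference keyed by column name.
my @aliases;
while (my $row = $select->fetchrow_hashref) {
    push @aliases, $row->{alias};
}

# Quote: escape a value before interpolating it into SQL text.
my $quoted = $dbh->quote("it's");

# Disconnect from the database.
$dbh->disconnect;

print join("; ", @aliases), "\n";
print "quoted: $quoted\n";
```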