Functions and modules in Python
Day 4 of an introductory Python course for biologists. Theme: functions and modules.
Homework: TranslateProtein.py
● Input files are in /projects/temporary/cees-python-course/Karin
  ● translationtable.txt - tab separated
  ● dna31.fsa
● Script should:
  ● Open the translationtable.txt file and read it into a dictionary
  ● Open the dna31.fsa file and read the contents
  ● Translate the DNA into protein using the dictionary
  ● Print the translation in fasta format to the file TranslateProtein.fsa. Each protein line should be 60 characters long.
Modularization
● Programs can get big
● Risk of doing the same thing many times
● Functions and modules encourage
  ● re-usability
  ● readability
  ● easier maintenance
Functions
● Most common way to modularize a program
● Takes values as parameters, executes code on them, returns results
● Functions are also built in to Python:
  ● open(filename, mode)
  ● sum([list of numbers])
● These do something with their parameters and return the results
Functions – how to define
def FunctionName(param1, param2, ...):
    """ Optional function description (docstring) """
    FUNCTION CODE ...
    return DATA
● keyword: def – says this is a function
● functions need names
● parameters are optional, but common
● docstring useful, but not mandatory
● FUNCTION CODE does something
● keyword: return – returns the results
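As an illustration of the pieces listed above, here is a minimal sketch of a complete function. The name gc_content and the example sequence are made up for this illustration, and print is written with parentheses so the sketch also runs under Python 3.

```python
def gc_content(dna):
    """Return the fraction of G and C bases in a DNA string."""
    dna = dna.upper()                      # accept lower-case input too
    gc = dna.count("G") + dna.count("C")   # FUNCTION CODE: does the work
    return float(gc) / len(dna)            # return: hands the result back

fraction = gc_content("ATGCGC")
print(fraction)
print(gc_content.__doc__)                  # the docstring is stored on the function
```

Note that this sketch fails with a ZeroDivisionError on an empty string; a real script would probably want to check for that first.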
Function example
>>> def hello(name):
...     results = "Hello World to " + name + "!"
...     return results
...
>>> hello()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: hello() takes exactly 1 argument (0 given)
>>> hello("Lex")
'Hello World to Lex!'
>>>
● Task: make script from this – take name from command line
● Print results to screen
Function example script
import sys

def hello(name):
    results = "Hello World to " + name + "!"
    return results

name = sys.argv[1]
functionresult = hello(name)
print functionresult

[karinlag@freebee]% python hello.py
Traceback (most recent call last):
  File "hello.py", line 8, in ?
    name = sys.argv[1]
IndexError: list index out of range
[karinlag@freebee]% python hello.py Lex
Hello World to Lex!
[karinlag@freebee]%
Returning values
● Returning is not mandatory, if no return, None is returned by default
● Can return more than one value - results will be shown as a tuple
>>> def test(x, y):
...     a = x*y
...     return x, a
...
>>> test(1,2)
(1, 2)
>>>
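Both points about return values can be tried out in a few lines. This is a small sketch; the function names no_return and product_and_sum are invented for the illustration.

```python
def no_return(x):
    doubled = x * 2          # computed, but never returned

def product_and_sum(x, y):
    return x * y, x + y      # two values, packed into one tuple

nothing = no_return(3)                    # no return statement, so this is None
both = product_and_sum(2, 5)              # the whole tuple in one variable
product, total = product_and_sum(2, 5)    # or unpack it into two variables

print(nothing)
print(both)
print(product)
print(total)
```

Unpacking into separate variables is the same technique the exercise template uses for description and DNA_string.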
Function scope
● Variables defined inside a function can only be seen there!
● To access the value of a variable defined inside a function: return it
Scope example
>>> def test(x):
...     z = 10
...     print "the value of z is " + str(z)
...     return x*2
...
>>> z = 50
>>> test(3)
the value of z is 10
6
>>> z
50
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>>
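The same scope rule can be checked in a script rather than at the prompt. In this sketch (set_z is a hypothetical name), assigning to z inside the function creates a separate local variable, and the only way to get its value out is to return it.

```python
z = 50

def set_z():
    z = 10           # a new, local z; the z outside the function is untouched
    return z         # return is how the local value gets out

returned = set_z()

print(z)             # the outer z still has its old value
print(returned)      # the value that was returned from inside
```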
Parameters
● Functions can take parameters – not mandatory
● Arguments are matched to parameters in the order in which they are given
>>> def test(x, y):
...     print x*2
...     print y + str(x)
...
>>> test(2, "y")
4
y2
>>> test("y", 2)
yy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in test
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>>
Named parameters
● Can use named parameters
>>> def test(x, y):
...     print x*2
...     print y + str(x)
...
>>> test(2, "y")
4
y2
>>> test("y", 2)
yy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in test
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> test(y="y", x=2)
4
y2
>>>
Default parameters
● Parameters can be given a default value
● With a default, the parameter does not have to be specified; the default will be used
● Can still name the parameter in the parameter list
>>> def hello(name = "Everybody"):
...     results = "Hello World to " + name + "!"
...     return results
...
>>> hello("Anna")
'Hello World to Anna!'
>>> hello()
'Hello World to Everybody!'
>>> hello(name = "Annette")
'Hello World to Annette!'
>>>
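Defaults become more useful once a function has several parameters. The sketch below uses a made-up function, greet, to show a required parameter combined with two optional ones. Parameters with defaults have to come after the ones without.

```python
def greet(name, greeting="Hello", punctuation="!"):
    # name is required; greeting and punctuation fall back to their defaults
    return greeting + " " + name + punctuation

a = greet("Anna")                      # both defaults used
b = greet("Anna", "Goodbye")           # second argument given by position
c = greet("Anna", punctuation="?")     # skip greeting, name punctuation instead

print(a)
print(b)
print(c)
```

Naming the parameter, as in the last call, is the only way to override a later default while leaving an earlier one alone.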
Exercise TranslateProteinFunctions.py
● Use script from homework
● Create the following functions:
  ● get_translation_table(filename)
    – return dict with codons and protein codes
  ● read_dna_string(filename)
    – return tuple with (descr, DNA_string)
  ● translate_protein(dictionary, DNA_string)
    – return the protein version of the DNA string
  ● pretty_print(descr, protein_string, outname)
    – write result to outname in fasta format
TranslateProteinFunctions.py
import sys

YOUR CODE GOES HERE!!!!

translationtable = sys.argv[1]
fastafile = sys.argv[2]
outfile = sys.argv[3]

translation_dict = get_translation_table(translationtable)
description, DNA_string = read_dna_string(fastafile)
protein_string = translate_protein(translation_dict, DNA_string)
pretty_print(description, protein_string, outfile)
get_translation_table
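def get_translation_table(translationtable):
    # Open the file named by the parameter, not a hard-coded file name
    fh = open(translationtable, 'r')
    trans_dict = {}
    for line in fh:
        # Each line holds a codon and its amino acid, separated by a tab
        codon = line.split()[0]
        aa = line.split()[1]
        trans_dict[codon] = aa
    fh.close()
    return trans_dict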
read_dna_string
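def read_dna_string(fastafile):
    fh = open(fastafile, "r")
    # First line is the header: drop the leading ">" and the trailing newline
    line = fh.readline()
    header_line = line[1:].rstrip("\n")
    # The remaining lines are sequence. rstrip removes the newline only if
    # there is one, so a last line without a newline does not lose a base.
    seq = ""
    for line in fh:
        seq += line.rstrip("\n")
    fh.close()
    return (header_line, seq)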
translate_protein
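def translate_protein(translation_dict, DNA_string):
    aa_seq = ""
    # Step through the DNA one codon (3 bases) at a time. Stopping at
    # len(DNA_string) - 2 includes the last complete codon and skips
    # any incomplete codon at the end.
    for i in range(0, len(DNA_string) - 2, 3):
        codon = DNA_string[i:i+3]
        one_letter = translation_dict[codon]
        aa_seq += one_letter
    return aa_seq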
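One way to check a translation function without any input files is to call it with a tiny hand-made table. The sketch below is an assumption-laden stand-in, not the course solution: the function name translate, the three-codon table and the test strings are all invented here. The loop bound len(dna) - 2 is chosen so that the last complete codon is translated and leftover bases are ignored.

```python
def translate(table, dna):
    """Translate dna codon by codon using the dictionary table."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):     # i = 0, 3, 6, ... while a full codon remains
        protein += table[dna[i:i + 3]]
    return protein

# Hypothetical mini table: three codons from the standard genetic code
tiny_table = {"ATG": "M", "AAA": "K", "GCT": "A"}

exact = translate(tiny_table, "ATGAAAGCT")       # length is a multiple of 3
leftover = translate(tiny_table, "ATGAAAGCTAT")  # two extra bases at the end

print(exact)
print(leftover)
```

A codon that is missing from the table raises a KeyError, which is usually a helpful signal that the table or the input file is not what you expected.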
pretty_print
def pretty_print(description, protein_string, outfile):
    fh = open(outfile, "w")
    fh.write(">" + description + "\n")
    # Write the protein in lines of 60 characters
    for i in range(0, len(protein_string), 60):
        fh.write(protein_string[i:i+60] + "\n")
    fh.close()
Modules
● A module is a file with functions, constants and other code in it
● Module name = filename without .py
● Can be used inside another program
● Needs to be imported into the program
● Lots of builtin modules: sys, os, os.path....
● Can also create your own
Using module
● One of two import statements:
  1: import modulename
  2: from modulename import function/constant
● If method 1:
  ● modulename.function(arguments)
● If method 2:
  ● function(arguments) – module name not needed
  ● beware of function name collision
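The two import styles, and the name collision the last bullet warns about, can be seen side by side in a short sketch. It uses the standard library module math purely as an example; the replacement sqrt function is deliberately silly and made up for the demonstration.

```python
import math              # method 1: import the whole module
from math import sqrt    # method 2: import one name from the module

a = math.sqrt(16)        # method 1: module name needed
b = sqrt(16)             # method 2: module name not needed

def sqrt(x):             # name collision: this definition replaces the imported sqrt
    return "not a number at all"

c = sqrt(16)             # now calls our own function
d = math.sqrt(16)        # method 1 is unaffected by the collision

print(a)
print(b)
print(c)
print(d)
```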
Operating system modules – os and os.path
● Modules dealing with files and operating system interaction
● Commonly used functions:
  ● os.getcwd() – get working directory
  ● os.chdir(path) – change working directory
  ● os.listdir(path) – get a list of everything (files and directories) in the directory
  ● os.mkdir(path) – create directory
  ● os.path.join(dirname, dirname/filename...) – join the parts into one path
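The functions listed above can be tried safely in a scratch directory. This is a sketch under a few assumptions: tempfile.mkdtemp (from the standard library, not mentioned in the slides) is used only so that no real data is touched, and the directory and file names are invented.

```python
import os
import tempfile

start_dir = os.getcwd()                      # remember where we started

work_dir = tempfile.mkdtemp()                # scratch directory for the demo
new_dir = os.path.join(work_dir, "fasta")    # build the path, do not glue strings
os.mkdir(new_dir)

# Create two empty files so that there is something to list
for filename in ["a.fsa", "notes.txt"]:
    fh = open(os.path.join(new_dir, filename), "w")
    fh.close()

# os.listdir returns bare names in no particular order, so sort them
entries = sorted(os.listdir(new_dir))

# Keep only the .fsa files and turn each name into a full path
fasta_paths = []
for entry in entries:
    if entry.endswith(".fsa"):
        fasta_paths.append(os.path.join(new_dir, entry))

os.chdir(new_dir)                            # move into the new directory ...
os.chdir(start_dir)                          # ... and back again

print(entries)
print(fasta_paths)
```

Because os.listdir returns names without the directory part, os.path.join is needed before the files can be opened from somewhere else.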
Your own modules
● Three steps:
1. Create file with functions in it. Module name is same as filename without .py
2. In other script, do import modulename
3. In other script, use function like this: modulename.functionname(args)
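The three steps can be walked through in a single runnable sketch. Normally step 1 is done in a text editor, with the module file saved next to your script. Here the file is written from code into a temporary directory, and that directory is added to sys.path, only to keep the example self-contained. The module name myseqtools and its function reverse are hypothetical.

```python
import os
import sys
import tempfile

# Step 1: create a file with a function in it (module name: myseqtools)
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "myseqtools.py")
fh = open(module_path, "w")
fh.write("def reverse(dna):\n")
fh.write("    return dna[::-1]\n")
fh.close()

# Only needed because the module is not in the same directory as this script
sys.path.insert(0, module_dir)

# Step 2: import the module by its file name without .py
import myseqtools

# Step 3: use the function as modulename.functionname(args)
result = myseqtools.reverse("ATGC")
print(result)
```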
Separating module use and main use
● Files containing python code can be:
  ● script file
  ● module file
● Module functions can be used in scripts
● But: modules can also be scripts
● Question is – how do you know if the code is being executed in the module script or an external script?
Module use / main use
● When a script is being run, within that script a variable called __name__ will be set to the string "__main__"
● When the same file is imported as a module, __name__ is set to the module name instead
● Can test on this string to see if this script is being run
● Benefit: can define functions in script that can be used in module mode later
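A minimal sketch of the test on __name__ is shown here. The function count_bases and the file name in the comment are invented for the illustration.

```python
def count_bases(dna):
    """Return a dictionary with the number of times each base occurs."""
    counts = {}
    for base in dna:
        counts[base] = counts.get(base, 0) + 1   # start at 0 for a new base
    return counts

# Always runs, whether the file is started directly or imported
print("__name__ is " + __name__)

if __name__ == "__main__":
    # Runs only when the file is started directly, e.g. python count_bases.py
    print(count_bases("ATGCGC"))
```

If another script imports this file, the function becomes available there, but the block under the if test is skipped.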
Module mode / main mode
import sys

<code as before>

translationtable = sys.argv[1]
fastafile = sys.argv[2]
outfile = sys.argv[3]

translation_dict = get_translation_table(translationtable)
description, DNA_string = read_dna_string(fastafile)
protein_string = translate_protein(translation_dict, DNA_string)
pretty_print(description, protein_string, outfile)

When this script is being used, whether run directly or imported, the code above will always run, no matter what!
Module use / main use
# this is a script
import sys
import TranslateProteinFunctions

description, DNA_string = TranslateProteinFunctions.read_dna_string(sys.argv[1])
print description

[karinlag@freebee]% python modtest.py dna31.fsa
Traceback (most recent call last):
  File "modtest.py", line 2, in ?
    import TranslateProteinFunctions
  File "TranslateProteinFunctions.py", line 44, in ?
    fastafile = sys.argv[2]
IndexError: list index out of range
[karinlag@freebee]Karin%
TranslateProteinFunctions.py with main
import sys
<code as before>
if __name__ == "__main__":
    translationtable = sys.argv[1]
    fastafile = sys.argv[2]
    outfile = sys.argv[3]

    translation_dict = get_translation_table(translationtable)
    description, DNA_string = read_dna_string(fastafile)
    protein_string = translate_protein(translation_dict, DNA_string)
    pretty_print(description, protein_string, outfile)
ConcatFasta.py
● Create a script that has the following:
  ● function get_fastafiles(dirname)
    – gets all the files in the directory, checks if they are fasta files (end in .fsa), returns list of fasta files
    – hint: you need os.path to create full relative file names
  ● function concat_fastafiles(filelist, outfile)
    – takes a list of fasta files, opens and reads each of them, writes them to outfile
  ● if __name__ == "__main__":
    – do what needs to be done to run script
● Remember imports!