-
Outline: General Introduction, Basic Types in Python, Programming, Exercises
Python course in Bioinformatics
Xiaohui Xie
March 31, 2009
Xiaohui Xie Python course in Bioinformatics
-
General Introduction
Basic Types in Python
Programming
Exercises
-
Why Python?
- Scripting language suited to rapid application development
- Minimalistic syntax
- Powerful
- Flexible data structures
- Widely used in bioinformatics and many other domains
-
Where to get Python and learn more?
- Main source of information: http://docs.python.org/
- Tutorial: http://docs.python.org/tutorial/index.html
- Biopython: http://biopython.org/wiki/Main_Page
-
Invoking Python
- To start: type python on the command line
- It will look like
Python 2.5.2 (r252:60911, Mar 25 2009, 00:12:33)
[GCC 4.1.2 (Gentoo 4.1.2 p1.0.2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
- You can now type commands on the line marked by >>>
- To leave: type the end-of-file character (Ctrl-D on Unix, Ctrl-Z on Windows)
- This is called interactive mode
-
Appetizer Example
- Task: Print all numbers in a given file
- File: numbers.txt
2.1
3.2
4.3
- Code: print.py
# Note: the code lines begin in the first column of the file. In
# Python, code indentation *is* syntactically relevant. Thus, the
# hash # (which is a comment symbol; everything past a hash is
# ignored on the current line) marks the first column of the code.
data = open("numbers.txt", "r")
for d in data:
    print d
data.close()
-
Appetizer Example cont’d
- Task: Print the sum of all the data in the file
- Code: sum.py
data = open("numbers.txt", "r")
s = 0
for d in data:
    s = s + float(d)
print s
data.close()
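The two appetizer scripts can be folded into one reusable function. What follows is a minimal sketch rather than part of the original slides: the function name sum_numbers and the temporary directory are assumptions made for illustration, and the code is written for Python 3, whereas the slides use Python 2.5. It uses a with block, so the file is closed automatically.

```python
import os
import tempfile


def sum_numbers(path):
    """Return the sum of the numbers in a file holding one number per line."""
    total = 0.0
    # The with block closes the file when it ends, even after an error
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:                      # skip blank lines
                total += float(line)
    return total


# Write a throwaway copy of numbers.txt so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
numbers_path = os.path.join(tmp_dir, "numbers.txt")
with open(numbers_path, "w") as handle:
    handle.write("2.1\n3.2\n4.3\n")

total = sum_numbers(numbers_path)
print(round(total, 1))
```

Returning the sum instead of printing it lets the caller decide what to do with the value.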
-
Interactive Mode
- the >>> prompt accepts a command
- a command is ended by a newline
- variables need not be declared, but must be assigned a value before they are used
- a colon ":" opens a block
- the ... prompt denotes that a block is expected
- no prompt means Python output
- a block is indented
- ending the indentation ends the block
-
Differences to Java or C
- can be used interactively, which makes it much easier to test and debug programs
- no declaration of variables
- no braces to denote a block, just indentation (Emacs supports this style)
- a comment begins with "#"; everything after it on that line is ignored
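A short sketch of these points in one script. It is written for Python 3 (the slides use Python 2.5), and the sequence "gattaca" and the variable names are illustrative assumptions, not taken from the slides.

```python
# No declarations: a name comes into existence when it is first assigned
count = 0

for base in "gattaca":         # the colon opens a block
    if base == "a":            # a nested block is indented one level further
        count = count + 1      # this line belongs to the if block

# Back in the first column: both blocks have ended
print(count)
```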
-
Numbers
- Example
>>> 2+2
4
>>> # This is a comment
... 2+2
4
>>> 2+2 # and a comment on the same line as code
4
>>> (50-5*6)/4
5
>>> # Integer division returns the floor:
... 7/3
2
>>> 7/-3
-3
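The integer results above depend on Python 2, where / applied to two integers floors the result. As a hedged sketch for readers on a newer interpreter: the // operator performs floor division in both Python 2.2+ and Python 3, so it reproduces the values shown above under either version. The variable names are illustrative only.

```python
floor_pos = 7 // 3     # floor division: the largest integer not above 7/3
floor_neg = 7 // -3    # floors toward negative infinity, not toward zero
true_div = 7 / 3.0     # a float operand gives true division in Python 2 and 3

print(floor_pos, floor_neg, true_div)
```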
-
Numbers cont’d
- Example
>>> width = 20
>>> height = 5*9
>>> width * height
900
>>> # Variables must be defined (assigned a value) before they can be
... # used, or an error will occur:
... # try to access an undefined variable
... n
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'n' is not defined
-
Strings
- Strings can be enclosed in single quotes or double quotes
- Example
>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
-
Strings cont’d
- Strings can be surrounded by a pair of matching triple quotes: """ or '''. Line ends do not need to be escaped inside triple quotes, but they are included in the string.
- Example
print """
Usage: thingy [OPTIONS]
     -h                        Display this usage message
     -H hostname               Hostname to connect to
"""
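To make the point about included line ends concrete, here is a small sketch in Python 3 syntax (the slides use Python 2.5). The variable names are assumptions; splitlines and count are standard string methods.

```python
# A triple-quoted string keeps every line break typed inside it
usage = """Usage: thingy [OPTIONS]
  -h           Display this usage message
  -H hostname  Hostname to connect to
"""

lines = usage.splitlines()           # split at the embedded newlines
newline_count = usage.count("\n")    # one newline per line of text

print(len(lines))
print(newline_count)
```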
-
Strings cont’d
- Strings can be concatenated (glued together) with the + operator, and repeated with *:
- Example
>>> word = 'Help' + 'A'
>>> word
'HelpA'
>>> '<' + word*5 + '>'
'<HelpAHelpAHelpAHelpAHelpA>'
>>> 'str' 'ing'              #  <-  This is ok
'string'
>>> 'str'.strip() + 'ing'    #  <-  This is ok
'string'
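Concatenation and repetition are handy for assembling sequences. The sketch below uses Python 3 syntax (the slides use Python 2.5), and the sequences and variable names are made up for illustration.

```python
start_codon = "atg"
rest = "gcc" + "taa"             # + glues two strings together
gene = start_codon + rest

poly_a_tail = "a" * 10           # * repeats a string
transcript = gene + poly_a_tail

print(transcript)
```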
-
Strings cont’d
- Strings can be subscripted (indexed); as in C, the first character of a string has subscript (index) 0.
- There is no separate character type; a character is simply a string of size one.
- Substrings can be specified with the slice notation: two indices separated by a colon.
- Example
>>> word = 'Help' + 'A'
>>> word[4]
'A'
>>> word[0:2]
'He'
>>> word[:2]    # The first two characters
'He'
>>> word[2:]    # Everything except the first two characters
'lpA'
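Slicing is the natural way to cut a DNA string into codons. This is a sketch under stated assumptions: the sequence is invented, the names are illustrative, and the code is in Python 3 syntax rather than the Python 2.5 of the slides.

```python
dna = "atggcctgatag"

first_codon = dna[0:3]            # characters at index 0, 1 and 2
last_base = dna[len(dna) - 1]     # the last index is the length minus one

# Step through the string three characters at a time
codons = []
for start in range(0, len(dna), 3):
    codons.append(dna[start:start + 3])

print(codons)
```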
-
Strings cont’d
- Unlike a C string, a Python string cannot be changed. Assigning to an indexed position in the string results in an error:
- Example
>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
>>> 'x' + word[1:]
'xelpA'
>>> 'Splat' + word[4]
'SplatA'
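In a script, the same behaviour can be observed by catching the error. The sketch below is written for Python 3 (the slides use Python 2.5); the flag assignment_failed is a name introduced only for this illustration.

```python
word = "HelpA"

try:
    word[0] = "x"                 # strings do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

# The usual workaround: build a new string from slices of the old one
new_word = "x" + word[1:]

print(assignment_failed, word, new_word)
```

The original string is left untouched; only the new name refers to the modified text.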
-
Strings cont’d
- Example
>>> from string import *
>>> dna = 'gcatgacgttattacgactctg'
>>> len(dna)
22
>>> 'n' in dna
False
>>> count(dna, 'a')
5
>>> replace(dna, 'a', 'A')
'gcAtgAcgttAttAcgActctg'
- Exercise: Calculate the GC percentage of dna
-
Strings cont’d
- Solution: Calculate the GC percentage
>>> gc = (count(dna, 'c') + count(dna, 'g')) / float(len(dna)) * 100
>>> "%.2f" % gc
'45.45'
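The same calculation can be wrapped in a function that uses string methods instead of the string module. This is a sketch, not the course's official solution: the name gc_percent and the handling of empty input are assumptions, and the syntax is Python 3 rather than the Python 2.5 of the slides.

```python
def gc_percent(dna):
    """Return the percentage of G and C bases in a DNA string."""
    dna = dna.lower()                 # accept upper- or lower-case input
    if not dna:
        return 0.0                    # assumption: empty input gives 0
    gc = dna.count("g") + dna.count("c")
    return 100.0 * gc / len(dna)


dna = "gcatgacgttattacgactctg"
result = "%.2f" % gc_percent(dna)
print(result)
```

For this 22-base sequence there are 5 g and 5 c, so the function returns 10/22 of 100.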
-
Strings cont’d
- Exercise: Calculate the complement of a DNA sequence
A - T
C - G
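One possible approach to this exercise, offered as a sketch rather than as the course's solution: look up each base in a dictionary of pairings. The names COMPLEMENT and complement are assumptions, the code assumes the input contains only the letters a, c, g and t (in either case), and it is written for Python 3.

```python
# Each base maps to its pairing partner
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def complement(dna):
    """Return the complement of a DNA string, in lower case."""
    result = []
    for base in dna.lower():
        result.append(COMPLEMENT[base])   # raises KeyError for other letters
    return "".join(result)


print(complement("ACCGGTTAATT"))
```

Applying the function twice gives back the (lower-cased) input, which is a quick sanity check.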
-
Lists
- A list is a sequence of comma-separated values (items) between square brackets.
- List items need not all have the same type (lists are a compound data type)
>>> a = ['spam', 'eggs', 100, 1234]
>>> a[0]
'spam'
>>> a[3]
1234
>>> a[-2]
100
>>> a[1:-1]
['eggs', 100]
>>> a[:2] + ['bacon', 2*2]
['spam', 'eggs', 'bacon', 4]
>>> 3*a[:3] + ['Boo!']
['spam', 'eggs', 100, 'spam', 'eggs', 100, 'spam', 'eggs', 100, 'Boo!']
-
Lists cont’d
- Unlike strings, which are immutable, lists allow individual elements to be changed
- Assignment to slices is also possible, and this can even change the size of the list or clear it entirely
- Example
>>> a
['spam', 'eggs', 100, 1234]
>>> a[2] = a[2] + 23
>>> a[0:2] = [1, 12]    # Replace some items:
>>> a[0:2] = []         # Remove some:
>>> a
[123, 1234]
>>> a[1:1] = ['bletch', 'xyzzy']    # Insert some:
>>> a
[123, 'bletch', 'xyzzy', 1234]
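The effect of slice assignment on the list length is easiest to see step by step. The following sketch uses Python 3 syntax (the slides use Python 2.5); the list contents and names are invented for illustration.

```python
bases = list("atgc")              # ['a', 't', 'g', 'c']

bases[0] = "A"                    # change a single element in place
bases[1:3] = ["x", "y", "z"]      # replace two items with three: the list grows
after_replace = bases[:]          # a full slice makes a copy of the list

bases[1:4] = []                   # assign an empty list: the items are removed

print(after_replace)
print(bases)
```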
-
Lists cont’d
- Functions returning a list
>>> range(3)
[0, 1, 2]
>>> range(10,20,2)
[10, 12, 14, 16, 18]
>>> range(5,2,-1)
[5, 4, 3]
>>> aas = "ALA TYR TRP SER GLY".split()
>>> aas
['ALA', 'TYR', 'TRP', 'SER', 'GLY']
>>> " ".join(aas)
'ALA TYR TRP SER GLY'
>>> l = list('atgatgcgcccacgtacga')
>>> l
['a', 't', 'g', 'a', 't', 'g', 'c', 'g', 'c', 'c', 'c', 'a',
 'c', 'g', 't', 'a', 'c', 'g', 'a']
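A note for readers on a newer interpreter, with a small sketch: in Python 3, range returns a lazy range object rather than a list, so it has to be wrapped in list() to get the output shown above. The wrapped form also works in Python 2. The variable names are illustrative.

```python
evens = list(range(10, 20, 2))        # list() turns the range into a real list
countdown = list(range(5, 2, -1))     # a negative step counts downwards

aas = "ALA TYR TRP SER GLY".split()   # split on whitespace
joined = "-".join(aas)                # join with any separator string

print(evens, countdown, joined)
```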
-
Dictionaries
- A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique
- A pair of braces creates an empty dictionary: {}
- Placing a comma-separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary
- The main operations on a dictionary are storing a value with some key and extracting the value given the key
- Example
>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
-
Dictionaries cont’d
- Example
>>> tel = {'jack': 4098, 'sape': 4139, 'guido': 4127}
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
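A typical bioinformatics use of a dictionary is counting bases. The sketch below is an illustration, not part of the slides: it is written for Python 3, the names are assumptions, and it relies on the standard dict.get method, which returns a default when the key is missing.

```python
dna = "gcatgacgttattacgactctg"

counts = {}                                    # start with an empty dictionary
for base in dna:
    counts[base] = counts.get(base, 0) + 1     # store the updated count under the key

# Sort the keys for a stable display order
for base in sorted(counts):
    print(base, counts[base])
```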
-
Programming
- Example
a, b = 3, 4
if a > b:
    print a + b
else:
    print a - b
-
Programming cont’d
- Example
>>> # Fibonacci series:
... # the sum of two elements defines the next
... a, b = 0, 1
>>> while b < 10:
...     print b
...     a, b = b, a+b
...
1
1
2
3
5
8
-
Programming features
- multiple assignment: the right-hand side is evaluated before anything on the left is assigned, and its expressions are evaluated from left to right
- the while loop executes as long as its condition is true (non-zero, not the empty string, not None)
- every line of a block must have the same indentation
- in interactive mode an empty line is needed to indicate the end of a block (not required in a script file)
- print writes out the values it is given
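Two of these features in a small sketch, in Python 3 syntax (the slides use Python 2.5). The values and names are chosen only for illustration.

```python
a, b = 1, 2
a, b = b, a + b                  # the right-hand side uses the old values: (2, 1 + 2)

remaining = "atg"
steps = 0
while remaining:                 # a non-empty string counts as true
    remaining = remaining[1:]    # drop the first character
    steps += 1                   # the loop ends once the string is empty

print(a, b, steps)
```

Because the right-hand side is evaluated first, no temporary variable is needed to update a and b together.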
-
Printing
- Example
>>> i = 256*256
>>> print 'The value of i is', i
The value of i is 65536
-
Flow control
- Example
x = 35
if x < 0:
    x = 0
    print 'Negative changed to zero'
elif x == 0:
    print 'Zero'
elif x == 1:
    print 'Single'
else:
    print 'More'
-
Iteration
- Python's for statement iterates over the items of a sequence (string, list, generated sequence)
- Example
a = ['cat', 'window', 'defenestrate']
for x in a:
    print x, len(x)
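Since a string is also a sequence, for can walk over a DNA string base by base. The sketch below uses the built-in enumerate, which does not appear on the slides, to obtain each position alongside its base; the names are illustrative and the syntax is Python 3.

```python
dna = "gattaca"

positions = []
for index, base in enumerate(dna):    # yields (0, 'g'), (1, 'a'), ...
    if base == "a":
        positions.append(index)       # remember where each 'a' occurs

print(positions)
```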
-
Defining functions
- Example
def fib(n):    # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

# Now call the function we just defined:
fib(2000)
# prints:
# 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
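The function above prints its results and returns nothing. A common variation, sketched here in Python 3 syntax, collects the numbers in a list and returns it so that the caller can reuse them; the name fib_list is an assumption introduced for this example.

```python
def fib_list(n):
    """Return a list of the Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)          # collect instead of printing
        a, b = b, a + b
    return result


series = fib_list(2000)
print(series)
```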
-
Reverse Complement of DNA
- Exercise: Find the reverse complement of a DNA sequence
- Example
5' - ACCGGTTAATT - 3' : forward strand
3' - TGGCCAATTAA - 5' : reverse strand
So the reverse complement of ACCGGTTAATT is AATTAACCGGT
-
Reverse Complement of DNA
- Solution: Find the reverse complement of a DNA sequence
from string import *

def revcomp(dna):
    """ reverse complement of a DNA sequence """
    comp = dna.translate(maketrans("AGCTagct", "TCGAtcga"))
    lcomp = list(comp)
    lcomp.reverse()
    return join(lcomp, "")
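The solution above depends on the Python 2 string module. For readers on Python 3, a hedged equivalent is sketched below: it uses the built-in str.maketrans and a slice with step -1 to reverse the string. The constant name COMPLEMENT_TABLE is an assumption; the translation strings are the ones from the slide.

```python
# Translation table mapping each base to its complement, both cases
COMPLEMENT_TABLE = str.maketrans("AGCTagct", "TCGAtcga")


def revcomp(dna):
    """Return the reverse complement of a DNA sequence."""
    comp = dna.translate(COMPLEMENT_TABLE)   # complement every base
    return comp[::-1]                        # a step of -1 reverses the string


print(revcomp("ACCGGTTAATT"))
```

Taking the reverse complement twice returns the original sequence, which makes a convenient self-check.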
-
Translate a DNA sequence
- Exercise: Translate a DNA sequence to an amino acid sequence
- Genetic code
standard = { 'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
             'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
             'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
             'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
             'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
             'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
             'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
             'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
             'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
             'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
             'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
             'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
             'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
             'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
             'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
             'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G' }
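One possible way to approach the exercise, given as a sketch rather than the course's solution. To stay self-contained it rebuilds the standard genetic code from a 64-letter string in t, c, a, g order instead of retyping the dictionary. The names codon_table and translate are assumptions, incomplete trailing codons are ignored by design, and the code is written for Python 3.

```python
BASES = "tcag"
# Amino acids for all 64 codons, with each codon position cycling through t, c, a, g
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = [a + b + c for a in BASES for b in BASES for c in BASES]
codon_table = dict(zip(codons, AMINO_ACIDS))


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.lower()
    protein = []
    # Stop before a trailing fragment shorter than three bases
    for start in range(0, len(dna) - 2, 3):
        protein.append(codon_table[dna[start:start + 3]])
    return "".join(protein)


print(translate("atggcctaa"))
```

The function translates through stop codons instead of halting at the first one; stopping early would be a small change to the loop.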