Introduction to Python

LING 5200
Computational Corpus Linguistics
Nianwen Xue

LING 5200, 2006. Based on Kevin Cohen’s LING 5200.

What's a programming language?

A way of converting a text file to instructions for the machine


What's a programming language?

Lexicon
Syntax

Vs. natural languages: no ambiguity


What does a program do?

Take in data (input)
Do something with it (processing)
Produce output


What does a program do?

Input: your regex, one or more files, switches

For each line in each file, determine whether or not it matches your regex

Tell you about it

egrep '^[0-9]+\/' epw.cd
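The three steps above (take in input, process it, tell you about it) can be sketched in Python. This is only an illustration of what egrep does for the command shown, not how egrep is implemented. The function name grep_lines and the sample lines are made up for the example, and it uses Python 3 syntax rather than the Python 2 used in the rest of these slides.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that match the regular expression, like egrep."""
    regex = re.compile(pattern)
    # re.search looks anywhere in the line; ^ in the pattern anchors it to the start
    return [line for line in lines if regex.search(line)]

# Hypothetical sample data standing in for the contents of a file
sample = ["12/abc", "no digits here", "7/x", "x/7"]
matches = grep_lines(r"^[0-9]+/", sample)
for line in matches:
    print(line)
```

Only the lines that begin with one or more digits followed by a slash are printed.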


Producing output in Python

print "hello, world"

In this statement, print is the verb and "hello, world" is the noun (its object).

Producing output

Filename: helloWorld.py
What do the file's permissions need to be?


Producing output

babel> ./helloWorld.py
./helloWorld.py: line 1: print: command not found

Producing output

#!/usr/local/bin/python

print "hello, world"

The first line, #!/usr/local/bin/python, is “the magic line”: it tells the shell to run the script with the Python interpreter at that path.

Producing output

babel> ./helloWorld.py
hello, world
babel>

Producing output

#!/usr/local/bin/python

print "hello, world\n"

The backslash is the "escape" character. Because print already ends the line, the \n here adds one extra blank line.

Producing output

\t tab
\n "newline"

Comments

#!/usr/local/bin/python

# the purpose of this program
# is to print "hello, world" to
# the screen.

# author: [email protected]
# 303-735-5383

# do the actual printing
print "hello, world\n"

"Commenting" your code

“Not comments”

Comments

#1 use for comments: adding notes to yourself/other programmers

# the purpose of this program

# is to print "hello, world" to

# the screen.

# author: [email protected]

# 303-735-5383


Comments

Other use: causing Python to ignore a line

# print "hello, world\n"
print "goodbye, cruel world\n"

"Commenting out" a line of code

Comments

Own-line or end-of-line formats

# print it

print "hello, world\n"

print "hello, world\n" # print it


Comments

Start comments with #: the rest of the line is ignored.

Can include a “documentation string” as the first line of any new function or class that you define.

The development environment, debugger, and other tools use it: it’s good style to include one.

def my_function(x, y):
    """This is the docstring. This
    function does blah blah blah."""
    # The code would go here...
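As a small sketch of how tools get at a documentation string, the example below reads it back from the function object. The function body (returning a sum) is invented for the illustration, and the code is written in Python 3 syntax.

```python
def my_function(x, y):
    """Return the sum of x and y."""
    # The comment is ignored by Python; the docstring above is kept
    return x + y

# The docstring is stored on the function object itself
doc = my_function.__doc__
print(doc)
```

The built-in help(my_function) displays the same text interactively.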


Whitespace

Whitespace is meaningful in Python: especially indentation and placement of newlines.

Use a newline to end a line of code. (Not a semicolon like in C++ or Java.) (Use \ when you must go to the next line prematurely.)

No braces { } to mark blocks of code in Python… Use consistent indentation instead. The first line with less indentation is considered outside of the block.

Often a colon appears at the end of the line that starts a new block. (We’ll see this later for function and class definitions.)
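The whitespace rules above can be seen together in one short sketch. The function describe and its behavior are hypothetical, chosen only to show a colon, an indented block, a return to less indentation, and a \ continuation. Python 3 syntax is assumed.

```python
def describe(number):
    # The colon ends the line that introduces a block
    if number % 2 == 0:
        # Indented lines belong to the "if" block
        label = "even"
    else:
        label = "odd"
    # Less indentation: back in the function body, outside the if/else
    return "%d is %s" % (number, label)

# A backslash lets one statement continue on the next line
total = 1 + 2 + \
    3
print(describe(total))
```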


Getting input

From the user
input = raw_input('Your name please:\n')
print input

From a file
input_file = open('phone-numbers.txt', "r")
Will learn what to do with a file later

Producing output

I'd like to print something different every once in a while…

#!/usr/local/bin/python
#my first python program
print "hello world\n"
name = raw_input("Your name please:")
print name

Variables

Name
Contents
Location in memory

Variables

Name (name)
Contents (Kinder)
Location in memory (13025)

In Python the variable is written simply as name (no $ prefix).

Good and bad names

1stnumber = 32
Print = 32
Large-number = 123456789
Dir:subdir = "/home/corpora"

Naming Rules

Names are case sensitive and cannot start with a number. They can contain letters, numbers, and underscores.
bob Bob _bob _2_bob_ bob_2 BoB

There are some reserved words:
and, assert, break, class, continue, def, del, elif, else, except, exec, finally, for, from, global, if, import, in, is, lambda, not, or, pass, print, raise, return, try, while
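One way to check a candidate name against these rules is sketched below. The helper is_valid_name is hypothetical, and the code assumes Python 3, where the reserved-word list differs slightly from the one above (for example, print and exec are no longer reserved).

```python
import keyword

def is_valid_name(name):
    """True if name is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["bob", "_2_bob_", "1stnumber", "Large-number", "class"]:
    print(candidate, is_valid_name(candidate))
```

Names that start with a digit, contain a hyphen, or collide with a reserved word are rejected.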


Accessing Non-existent Name

If you try to access a name before it’s been properly created (by placing it on the left side of an assignment), you’ll get an error.

>>> y
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in -toplevel-
    y
NameError: name 'y' is not defined
>>> y = 3
>>> y
3
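The same error can be caught inside a program instead of ending it. The sketch below assumes Python 3; the function name read_before_assignment and the name undefined_name are made up for the illustration.

```python
def read_before_assignment():
    try:
        return undefined_name  # this name is never assigned anywhere
    except NameError as err:
        # The error message names the missing identifier
        return str(err)

message = read_before_assignment()
print(message)
```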


Names and References 1

Python has no pointers like C or C++. Instead, it has “names” and “references”. (Works a lot like Lisp or Java.)

You create a name the first time it appears on the left side of an assignment statement:

x = 3

Names store “references”, which are like pointers to locations in memory that store a constant or some object.

Python determines the type of the reference automatically based on what data is assigned to it.

It also decides when to delete it via garbage collection after any names for the reference have passed out of scope.

Names and References 2

There is a lot going on when we type:

x = 3

First, an integer 3 is created and stored in memory.

A name x is created.

A reference to the memory location storing the 3 is then assigned to the name x.

[Diagram: the name list holds Name: x, Ref: <address1>; memory at <address1> holds Type: Integer, Data: 3]
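The name-and-reference picture can be inspected from inside Python with the built-in id() function and the is operator. This is a minimal sketch in Python 3 syntax. In the CPython implementation id() happens to be the object's memory address, but that is an implementation detail rather than something the slides rely on.

```python
x = 3
y = x  # a second name for the same object, not a copy

print(x is y)            # both names hold the same reference
print(id(x) == id(y))    # id() identifies the object a name refers to
print(type(x).__name__)  # the type belongs to the object, not the name
```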


Names and References 3

The data 3 we created is of type integer. In Python, the basic data types integer, float, and string are “immutable.”

This doesn’t mean we can’t change the value of x… For example, we could increment x.

>>> x = 3
>>> x = x + 1
>>> print x
4

Names and References 4

If we increment x, then what’s really happening is:

The reference of name x is looked up.
The value at that reference is retrieved.
The 3+1 calculation occurs, producing a new data element 4 which is assigned to a fresh memory location with a new reference.
The name x is changed to point to this new reference.
The old data 3 is garbage collected if no name still refers to it.

[Diagram, final state: Name: x, Ref: <address2>; memory at <address2> holds Type: Integer, Data: 4]
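The rebinding described in these steps can be observed directly. In this sketch (Python 3 syntax) the extra name old is an addition for the illustration: it keeps a reference to the original 3 so the two objects can be compared afterwards.

```python
x = 3
old = x          # keep a second name on the original 3
x = x + 1        # builds a new object 4 and rebinds x to it

print(x, old)
print(x is old)  # the two names now refer to different objects
```

The integer 3 itself was never modified; only the name x was pointed somewhere new.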


Assignment 1

So, for simple built-in datatypes (integers, floats, strings), assignment behaves as you would expect:

>>> x = 3      # Creates 3, name x refers to 3
>>> y = x      # Creates name y, refers to 3.
>>> y = 4      # Creates ref for 4. Changes y.
>>> print x    # No effect on x, still ref 3.
3

[Diagram, final state: Name: x, Ref: <address1> points to Type: Integer, Data: 3; Name: y, Ref: <address2> points to Type: Integer, Data: 4]

Assignment 2

But we’ll see that for other more complex data types assignment seems to work differently.

We’re talking about: lists, dictionaries, user-defined classes. We will learn details about all of these types later.

The important thing is that they are “mutable.” This means we can make changes to their data without having to copy it into a new memory reference address each time.

immutable          mutable
>>> x = 3          x = some mutable object
>>> y = x          y = x
>>> y = 4          make a change to y
>>> print x        look at x
3                  x will be changed as well

Assignment 3

Assume we have a name x that refers to a mutable object of some user-defined class. This class has a “set” and a “get” function for some value.

>>> x.getSomeValue()
4

We now create a new name y and set y=x.

>>> y = x

This creates a new name y which points to the same memory reference as the name x. Now, if we make some change to y, then x will be affected as well.

>>> y.setSomeValue(3)
>>> y.getSomeValue()
3
>>> x.getSomeValue()
3
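The slide leaves the user-defined class unspecified. A minimal sketch of one that would behave as shown is given below; the class name Box and its internals are hypothetical, and only the method names setSomeValue and getSomeValue come from the slide. Python 3 syntax is assumed.

```python
class Box:
    """Hypothetical class with a "set" and a "get" method for one value."""

    def __init__(self, value):
        self._value = value

    def setSomeValue(self, value):
        # Changes the object in place; no new object is created
        self._value = value

    def getSomeValue(self):
        return self._value

x = Box(4)
y = x                    # y is another name for the same Box
y.setSomeValue(3)        # a change made through y...
print(x.getSomeValue())  # ...is visible through x
```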


Assignment 4

Because mutable data types can be changed in place without producing a new reference every time there is a modification, changes made through one name for a reference will seem to affect all the other names for that same reference. This leads to the behavior on the previous slide.

Passing Parameters to Functions:

When passing parameters, immutable data types appear to be “call by value” while mutable data types appear to be “call by reference.”

(Mutable data can be changed inside a function to which they are passed as a parameter. Immutable data seems unaffected when passed to functions.)
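The difference in parameter passing can be sketched with an integer (immutable) and a list (one of the mutable types mentioned earlier). The function names add_one and append_one are invented for the example, and Python 3 syntax is assumed.

```python
def add_one(number):
    number = number + 1   # rebinds the local name only
    return number

def append_one(items):
    items.append(1)       # changes the caller's list in place

n = 3
add_one(n)                # the caller's n is not affected
numbers = [0]
append_one(numbers)       # the caller's list is affected
print(n, numbers)
```

In both calls the function receives a reference to the caller's object. What differs is whether the object can be changed in place.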


Multiple Assignment

You can also assign to multiple names at the same time.

>>> x, y = 2, 3
>>> x
2
>>> y
3

Basic Datatypes

Integers (default for numbers)
z = 5 / 2 # Answer is 2, integer division.

Floats
x = 3.456

Strings
'the movie "gladiator"'
Can use "" or '' to specify. "abc" 'abc' (Same thing.)
Unmatched ones can occur within the string. "matt's"
Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them: """a'b"c"""

Python and Types

Python determines the data types in a program automatically. “Dynamic Typing”

But Python’s not casual about types: it enforces them after it figures them out. “Strong Typing”

So, for example, you can’t just append an integer to a string. You must first convert the integer to a string itself.

x = "the answer is " # Decides x is string.
y = 23 # Decides y is integer.
print x + y # Python will complain about this.
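The complaint Python raises here is a TypeError, and the conversion the slide asks for is done with str(). The sketch below uses Python 3 syntax; the variable name result is an addition for the example.

```python
x = "the answer is "
y = 23

try:
    result = x + y          # str + int is rejected
except TypeError:
    result = x + str(y)     # convert the integer to a string first

print(result)
```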


Numerical Operations


Numerical operations

Integer and float additions
4 + 6
4 + 6.0

Will the results be the same?
5 / 3
5 / 3.0

Python will first convert the operands up to the most complicated operand type, and then perform the math on the same-type operands.
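These expressions can be checked directly. One caveat: the sketch below is written for Python 3, where / between two integers gives a float, so the integer division that the Python 2 slides describe for 5 / 3 is written 5 // 3 here. The variable names are additions for the example.

```python
# Adding two integers gives an integer; mixing in a float gives a float
print(4 + 6, type(4 + 6).__name__)
print(4 + 6.0, type(4 + 6.0).__name__)

floor_result = 5 // 3    # floor division on integers
float_result = 5 / 3.0   # the integer 5 is converted up to a float first
print(floor_result, float_result)
```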


Operator precedence

Numerical operator precedence: *, /, //, % are applied before +, -

Use parentheses to break precedence: (3 + 5) * 6
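A short check of the precedence rule, with and without parentheses. The variable names are made up for the example, and Python 3 syntax is assumed.

```python
without_parens = 3 + 5 * 6     # multiplication happens first
with_parens = (3 + 5) * 6      # parentheses force the addition first
remainder_first = 10 - 7 % 4   # % is applied before -
print(without_parens, with_parens, remainder_first)
```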


String Operations


Using +

name = "John Doe"

print "hello, " + name

+ means concatenation when applied to a string

Using *

name = "John Doe"

print "hello, " * 3 + name

* means repetition when applied to a string

Index and slice

str = "counterterrorism"
str[4]
str[-4]
str[0:7]
str[7:13]
str[13:16]

Index and slice

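[Diagram: the letters of "counterterrorism" laid out in order, labeled with positions 0 1 2 3 4 5 … 15 16 counting from the left and … -3 -2 -1 counting from the right]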
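The index and slice expressions from the previous slide can be tried as follows. The variable is called word here rather than str, an assumption made to avoid hiding Python's built-in str type. Python 3 syntax is assumed.

```python
word = "counterterrorism"

print(word[4])      # single character at index 4 (counting from 0)
print(word[-4])     # fourth character from the end
print(word[0:7])    # indices 0 up to, but not including, 7
print(word[7:13])
print(word[13:16])
```

The three slices split the word into "counter", "terror", and "ism".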


len and replace

len(str)
str.replace('ism', 'ist')
str.replace('ism', '')
What is the value of str?

str1 = str.replace('ism', '')

replace('ism', 'ist'), what happens?
str.len(str), what happens?

replace is a method of the “str” object, len is a built-in function
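The questions on this slide can be answered by running a sketch like the one below. It uses the name word instead of str (an assumption, to avoid hiding the built-in type) and Python 3 syntax.

```python
word = "counterterrorism"

replaced = word.replace("ism", "ist")
print(replaced)
print(word)        # unchanged: replace returns a new string
print(len(word))   # len is a built-in function, not a method

try:
    word.len()     # strings have no len method
except AttributeError as err:
    print("AttributeError:", err)
```

Calling replace('ism', 'ist') with no string in front of it fails with a NameError, because replace exists only as a method of string objects.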


String Operations

We can use some methods built in to the string data type to perform some formatting operations on strings:

>>> "hello".upper()
'HELLO'

There are many other handy string operations available. Check the Python documentation for more.

String Formatting Operator: %

The operator % allows us to build a string out of many data items in a “fill in the blanks” fashion.

It also allows us to control how the final string output will appear. For example, we could force a number to display with a specific number of digits after the decimal point.

It is very similar to the sprintf command of C.


Formatting Strings with %

>>> x = "abc"
>>> y = 34
>>> "%s xyz %d" % (x, y)
'abc xyz 34'

The tuple following the % operator is used to fill in the blanks in the original string marked with %s or %d.

Check Python documentation for whether to use %s, %d, or some other formatting code inside the string.
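A few more formatting codes, including the digits-after-the-decimal-point case mentioned earlier, are sketched below. The variable names filled, rounded, and padded are additions for the example, and Python 3 syntax is assumed.

```python
x = "abc"
y = 34

filled = "%s xyz %d" % (x, y)
rounded = "%.2f" % 3.14159   # two digits after the decimal point
padded = "%5d|" % y          # right-aligned in a field 5 characters wide
print(filled)
print(rounded)
print(padded)
```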


Printing with Python

You can print a string to the screen using “print.”

Using the % string operator in combination with the print command, we can format our output text.

>>> print "%s xyz %d" % ("abc", 34)
abc xyz 34

“Print” automatically adds a newline to the end of the string. If you include a list of strings, it will concatenate them with a space between them.

>>> print "abc"
abc
>>> print "abc", "def"
abc def

Getting more practice: Python for Linguists

http://verbs.colorado.edu/~xuen/teaching/ling5200/PythonForLinguists/Python-1.pdf


More Python resources

http://docs.python.org/tut/tut.html