Getting set with Python and NLTK Tuples, Strings, Numeric types

Upload: kerri

Post on 24-Feb-2016

Python
• By now, you should have Python available on a computer you can use.
• Download nltk: http://nltk.org/install.html
• From python (type python from the command line or use your favorite IDE)
  – import nltk
  – nltk.download()
• This opens a window with choices of what to download. Choose “book”


Getting started
• from nltk.book import *
• Let’s do some of the basic operations of the nltk
  – Create a concordance from text1 of “monstrous”
  – Repeat for another word in another text
  – Find words similar to “monstrous” in several texts
  – Determine the length of some of the texts
  – Get a sorted list of the words in one of the texts. How many distinct words are in the text?


Defining a function
>>> def lexical_diversity(text):
...     return len(text) / len(set(text))
...

Note the indentation. Python does not use brackets to indicate boundaries of blocks of code. The indentation is necessary. The return statement hands the computed value back to the caller.

Do a “lexical_diversity” test on one of the other texts.
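A minimal sketch of the same function, written so it behaves the same under Python 2 and Python 3. The sample token list is made up for illustration; with NLTK loaded you would pass text1, text2, and so on instead.

```python
def lexical_diversity(tokens):
    """Return the average number of times each distinct token is used."""
    # float() keeps the division exact under Python 2 as well as Python 3
    return len(tokens) / float(len(set(tokens)))

# Hypothetical sample standing in for one of the NLTK texts
sample = ['the', 'cat', 'sat', 'on', 'the', 'mat']
diversity = lexical_diversity(sample)
print(diversity)  # 6 tokens over 5 distinct tokens
```

Without float(), Python 2 would do integer division here and report 1.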


Lists: Review and some operators not mentioned before
• List is a mutable collection of objects of arbitrary type.
  – Create a list:
    • places = list() or places = []
    • places = ["home", "work", "hotel"]
    • otherplaces = ['home', 'office', 'restaurant']
  – Changing a list:
    • places.append('restaurant')
    • places.insert(0, 'stadium')
    • places.remove('work')
    • places.extend(otherplaces)
    • places.pop()
    • places.pop(3)
    • places[1] = "beach"
    • places.sort()
    • places.reverse()

Note use of single or double quotes

Note use of () or []

Not a complete set – selected by text authors
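The list-changing operations above can be traced step by step. This sketch runs them in the order listed on the slide; the comments show the list after each call.

```python
places = ["home", "work", "hotel"]
otherplaces = ['home', 'office', 'restaurant']

places.append('restaurant')   # home, work, hotel, restaurant
places.insert(0, 'stadium')   # stadium, home, work, hotel, restaurant
places.remove('work')         # stadium, home, hotel, restaurant
places.extend(otherplaces)    # ... plus home, office, restaurant (7 items)
last = places.pop()           # removes and returns the final 'restaurant'
fourth = places.pop(3)        # removes and returns the item at position 3
places[1] = "beach"           # stadium, beach, hotel, home, office
places.sort()                 # beach, home, hotel, office, stadium
places.reverse()              # stadium, office, hotel, home, beach
print(places)
```

Every call except pop() returns None and changes the list in place, which is why none of the results are assigned back to places.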


Information about lists
• Again, the list of places
• len(places)
• places[i] --- positive or negative values
• "beach" in places
• places.count("home")
• places.index("stadium")
• places.index('home', 0, 4)
• places == otherplaces
• places != otherplaces
• places < otherplaces
• places.index('home')
• places.index('home', 2) -- start looking at spot 2
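A short sketch of these queries on a fixed list, so the answers can be checked by hand. The list contents are chosen for illustration only.

```python
places = ['stadium', 'beach', 'hotel', 'home', 'office', 'home']
otherplaces = ['home', 'office', 'restaurant']

size = len(places)                     # 6
first, last = places[0], places[-1]    # negative index counts from the end
has_beach = "beach" in places          # True
home_count = places.count("home")      # 2
first_home = places.index('home')      # 3, the first match
later_home = places.index('home', 4)   # 5, search starts at spot 4
same = places == otherplaces           # False
# Lists compare item by item: 'stadium' sorts after 'home'
smaller = places < otherplaces         # False
print(size, first, last, home_count, first_home, later_home)
```

None of these calls changes the list.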


New lists
• from old lists
  – places[0:3]
  – places[1:4:2]
  – places + otherplaces
    • note places + "pub" vs places + ['pub']
  – places * 2
• Creating a list
  – range(5,100,25) -- how many entries
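The following sketch builds new lists from old ones and answers the range question. The list contents are illustrative; list(range(...)) is used so the result is a list under Python 3 as well as Python 2.

```python
places = ['stadium', 'beach', 'hotel', 'home', 'office']
otherplaces = ['home', 'office', 'restaurant']

first_three = places[0:3]         # items at positions 0, 1, 2
every_other = places[1:4:2]       # positions 1 and 3
combined = places + otherplaces   # a new list; places is unchanged
with_pub = places + ['pub']       # list + list works

try:
    places + "pub"                # list + str does not
    concat_error = None
except TypeError as err:
    concat_error = err

doubled = places * 2              # the items repeated twice
steps = list(range(5, 100, 25))   # start 5, stop before 100, step 25
print(steps)
```

The slice and concatenation results are new lists, so places itself still has its original five items at the end.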


Immutable objects
• Lists are mutable.
  – Operations that can change a list –
    • Name some –
• Two important types of objects are not mutable: str and tuple
  – tuple is like a list, but is not mutable
    • A fixed sequence of arbitrary objects
    • Defined with () instead of []
    • grades = ("A", "A-", "B+", "B", "B-", "C+", "C")
  – str (string) is a fixed sequence of characters
• Operations on lists that do not change the list can be applied to tuple and to str also
• Operations that make changes must create a new copy of the structure to hold the changed version
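A small sketch of what immutability means in practice, using the grades tuple from the slide. Read-only operations work as they do on lists, an in-place change raises TypeError, and a "change" is really a new object.

```python
grades = ("A", "A-", "B+", "B", "B-", "C+", "C")

count = len(grades)           # read-only operations work as on lists
top_two = grades[:2]          # slicing gives a new tuple
has_b = "B" in grades

try:
    grades[0] = "A+"          # item assignment is not allowed on a tuple
    error = None
except TypeError as err:
    error = err

extended = grades + ("D",)    # builds a new, longer tuple
name = "Ella Lane"
upper_name = name.upper()     # builds a new string; name is unchanged
print(extended, upper_name)
```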


Strings
• Strings are specified using quotes – single or double
  – name1 = "Ella Lane"
  – name2 = 'Tom Riley'
• If the string contains a quotation mark, it must be distinct from the marks denoting the string, or be escaped with a backslash:
  – part1 = "Ella's toy"
  – part2 = 'Tom\'s plane'
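A quick check of the two ways to put an apostrophe inside a string. The variable part3 is added here for comparison and is not on the slide.

```python
part1 = "Ella's toy"       # double quotes outside, apostrophe inside
part2 = 'Tom\'s plane'     # single quotes outside, apostrophe escaped
part3 = "Tom's plane"      # same text, written without the escape
same_text = part2 == part3
print(part1, part2, same_text)
```

The backslash is not stored in the string; it only tells Python that the next quote is text rather than the end of the string.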


Methods
• In general, methods that do not change the list are available to use with str and tuple
• String methods
>>> message=("Meet me at the coffee shop. OK?")
>>> message.lower()
'meet me at the coffee shop. ok?'
>>> message.upper()
'MEET ME AT THE COFFEE SHOP. OK?'


Immutable, but…
• It is possible to create a new string with the same name as a previous string. This leaves the previous string without a label.
>>> note="walk today"
>>> note
'walk today'
>>> note = "go shopping"
>>> note
'go shopping'

The original string is not changed, but it can no longer be accessed because it no longer has a label. Python is then free to reclaim it.
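One way to see that reassignment does not alter the old string is to give it a second label first. The name old_label is introduced here for illustration.

```python
note = "walk today"
old_label = note          # a second label for the same string object
note = "go shopping"      # rebinds the name note to a different string
shout = note.upper()      # methods also return new strings
print(old_label, note, shout)
```

Because old_label still refers to the first string, it remains reachable and unchanged.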


Strings and Lists of Strings
• Extract individual words from a string
>>> words = message.split()
>>> words
['Meet', 'me', 'at', 'the', 'coffee', 'shop.', 'OK?']

• OK to split on any token
>>> terms=("12098,scheduling,of,real,time,10,21,,real time,")
>>> terms
'12098,scheduling,of,real,time,10,21,,real time,'
>>> termslist=terms.split()
>>> termslist
['12098,scheduling,of,real,time,10,21,,real', 'time,']
>>> termslist=terms.split(',')
>>> termslist
['12098', 'scheduling', 'of', 'real', 'time', '10', '21', '', 'real time', '']

Note that after message.split() there are no spaces in the words in the list. The spaces were used to separate the words and are dropped. In the same way, split(',') drops the commas but keeps the space inside 'real time', and two commas in a row produce an empty string.


String Methods
• Methods for strings, not lists:
  – terms.isalpha()
  – terms.isdigit()
  – terms.isspace()
  – terms.islower()
  – terms.isupper()
  – message.lower()
  – message.upper()
  – message.capitalize()
  – message.center(80) (center in 80 places)
  – message.ljust(80) (left justify in 80 places)
  – message.rjust(80)
  – message.strip() (remove left and right white spaces)
  – message.strip(chars) (returns string with left and/or right chars removed)
  – startnote.replace("Please m","M")
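A brief sketch exercising several of these methods. The value of startnote is not given on the slide, so the one used here is an assumption, and a width of 40 is used instead of 80 to keep the output short.

```python
message = "Meet me at the coffee shop. OK?"
startnote = "Please meet me at noon"     # assumed value, not from the slide

all_digits = "12098".isdigit()           # True
all_letters = "real time".isalpha()      # False, the space is not a letter
capitalized = message.capitalize()       # first letter upper, the rest lower
centered = message.center(40)            # padded with spaces on both sides
left = message.ljust(40)                 # padded on the right
right = message.rjust(40)                # padded on the left
trimmed = "  hello  ".strip()            # whitespace removed from both ends
no_x = "xxhixx".strip("x")               # the given characters removed instead
shorter = startnote.replace("Please m", "M")
print(repr(centered))
print(shorter)
```

Each method returns a new string; message and startnote keep their original values.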


Adding lists
• sent1 is the first sentence in text1, sent2 the first sentence in text2, etc. – expressed as lists of words.
• sent1+sent2 is the list that is the first sentence of text1 followed by the first sentence of text2
• Try it: combine the sentences of some texts.


Indexing
>>> text4[173]
'awaken'
>>> text4.index('awaken')
173

Slicing
>>> text4[1000:1100]
['that', 'the', 'propitious', 'smiles', 'of', 'Heaven', 'can', 'never', 'be', 'expected', 'on', 'a', 'nation', 'that', 'disregards', 'the', 'eternal', 'rules', 'of', 'order', 'and', 'right', 'which', 'Heaven', 'itself', 'has', 'ordained', ';', 'and', 'since', 'the', 'preservation', 'of', 'the', 'sacred', 'fire', 'of', …
>>> text4[:10]
['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House']
>>> len(text4)
145735
>>> text4[145720:]
['you', '.', 'God', 'bless', 'you', '.', 'And', 'God', 'bless', 'the', 'United', 'States', 'of', 'America', '.']


>>> saying = ['After','all','is','said','and','done','more','is','said','than','done']
>>> tokens=set(saying)
>>> tokens
set(['and', 'all', 'said', 'is', 'After', 'done', 'than', 'more'])
>>> tokens=sorted(tokens)
>>> tokens
['After', 'all', 'and', 'done', 'is', 'more', 'said', 'than']
>>> tokens[-2:]
['said', 'than']


Some statistics on text
• Frequency distributions
>>> fdist1=FreqDist(text1)
>>> fdist1
<FreqDist with 19317 samples and 260819 outcomes>
>>> vocabulary1=fdist1.keys()
>>> vocabulary1[:50]
[',', 'the', '.', 'of', 'and', 'a', 'to', ';', 'in', 'that', "'", '-', 'his', 'it', 'I', 's', 'is', 'he', 'with', 'was', 'as', '"', 'all', 'for', 'this', '!', 'at', 'by', 'but', 'not', '--', 'him', 'from', 'be', 'on', 'so', 'whale', 'one', 'you', 'had', 'have', 'there', 'But', 'or', 'were', 'now', 'which', '?', 'me', 'like']
>>> fdist1['whale']
906
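To see what a frequency distribution holds without loading the NLTK texts, the sketch below uses collections.Counter from the standard library as a stand-in for FreqDist. This is a substitution made for illustration, not the NLTK class itself, and the word list is the short saying from the earlier slide rather than text1.

```python
from collections import Counter

saying = ['After', 'all', 'is', 'said', 'and', 'done',
          'more', 'is', 'said', 'than', 'done']

fdist = Counter(saying)            # maps each word to how often it occurs
samples = len(fdist)               # number of distinct words
outcomes = sum(fdist.values())     # total number of words counted
said_count = fdist['said']         # look up one word, like fdist1['whale']
top_three = fdist.most_common(3)   # (word, count) pairs, most frequent first
print(samples, outcomes, said_count)
print(top_three)
```

In the vocabulary list on the slide the most frequent words come first; with a Counter, most_common() is the call that gives that ordering.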


Spot check
• With a partner, do exercises 2.14, 2.15, 2.16, 2.17 (Python book)
  – Half the room do first and last. Other half do the middle two. Choose a spokesperson to present your answers (one person per problem). Choose another person to be designated questioner of other side (though anyone can ask a question, that person must do so.)


Numeric types
• int – whole numbers, no decimal places
• float – decimal numbers, with decimal place
• long – arbitrarily long ints. Python does conversion when needed
• operations between same types give a result of that type
• operations between int and float yield float

>>> 3/2
1
>>> 3./2.
1.5
>>> 3/2.
1.5
>>> 3.//2.
1.0
>>> 18%4
2
>>> 18//4
4
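The session above is from Python 2, where 3/2 between two ints gives 1. Under Python 3 the same expression gives 1.5, so the sketch below uses // wherever whole-number division is intended; written this way, the results are the same in both versions.

```python
whole = 3 // 2         # 1    floor division on ints gives an int
exact = 3. / 2.        # 1.5  float divided by float
mixed = 3 / 2.         # 1.5  int with float yields float
floored = 3. // 2.     # 1.0  floor division on floats stays a float
remainder = 18 % 4     # 2
quotient = 18 // 4     # 4
# quotient and remainder fit together: 18 == 4 * 4 + 2
rebuilt = 4 * quotient + remainder
big = 2 ** 100         # ints grow as large as needed
print(whole, exact, mixed, floored, remainder, quotient, big)
```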


Numeric operators

book slide


Casting
Convert from one type to another

>>> str(3.14159)
'3.14159'
>>> int(3.14159)
3
>>> round(3.14159)
3.0
>>> round(3.5)
4.0
>>> round(3.499999999999)
3.0
>>> num=3.789
>>> num
3.7890000000000001
>>> str(num)
'3.789'
>>> str(num+4)
'7.789'
>>> list(num)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'float' object is not iterable
>>> list(str(num))
['3', '.', '7', '8', '9']
>>> tuple(str(num))
('3', '.', '7', '8', '9')
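The same conversions as a script, as a rough sketch. One difference to expect: round() with a single argument returns a float in Python 2 (as in the session above) and an int in Python 3, so the values compare equal but print differently.

```python
as_text = str(3.14159)       # number to string
truncated = int(3.14159)     # drops the fractional part, no rounding
rounded = round(3.14159)     # 3.0 in Python 2, 3 in Python 3
rounded_up = round(3.5)      # 4.0 in Python 2, 4 in Python 3

num = 3.789
digits = list(str(num))      # a string can be split into characters

try:
    list(num)                # a float cannot be turned into a list
    cast_error = None
except TypeError as err:
    cast_error = err

back_again = float("3.789")  # string to number
print(as_text, truncated, rounded, rounded_up, digits)
```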


Functions
• We have seen some of these before

book slide


Modules
• Collections of things that are very handy to have, but not as universally needed as the built-in functions.
>>> from math import pi
>>> pi
3.1415926535897931
>>> import math
>>> math.sqrt(32)*10
56.568542494923804
>>>
• We will use the nltk module
• Once imported, use help(<module>) for full documentation
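The two import styles side by side, as a small sketch. Only the standard math module is used, so nothing needs to be installed.

```python
from math import pi      # brings in one name; used as pi
import math              # brings in the module; names are used as math.xxx

scaled_root = math.sqrt(32) * 10
same_pi = pi == math.pi  # both styles reach the same value
print(pi)
print(scaled_root)
```

With the second style the module prefix shows where each name comes from, which helps once several modules are in use.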


Common modules

book slide


Expressions
• Multi-part operations, including operators and/or function calls
• Order of operations same as arithmetic
  – Function evaluation
  – Parentheses
  – Exponentiation (right to left)
  – Multiplication and Division (left to right)
  – Addition and Subtraction (left to right)

book slide
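A few expressions that show each precedence rule in turn. The specific numbers are chosen for illustration.

```python
a = 1 + 2 * 3          # multiplication first: 1 + 6
b = (1 + 2) * 3        # parentheses first: 3 * 3
c = 2 ** 3 ** 2        # exponentiation right to left: 2 ** 9
d = 20 // 5 * 2        # left to right: (20 // 5) * 2
e = 10 - 4 - 3         # left to right: (10 - 4) - 3
f = abs(-2) ** 2 + 1   # function call first, then power, then addition
print(a, b, c, d, e, f)
```

Reading c left to right would give 8 ** 2 = 64, and reading e right to left would give 10 - 1 = 9, so the direction matters.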


Evaluation trees make precedence clear
1 + 2 * 3

book slide


Evaluation tree for strings
fullname = firstName + ' ' + lastName

book slide


Boolean
Values are False or True

book slide

X      Y      not X  X and Y  X or Y  X == Y  X != Y
False  False  True   False    False   True    False
False  True   True   False    True    False   True
True   False  False  False    True    False   True
True   True   False  True     True    True    False
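The table above can be regenerated by letting Python evaluate each column. This sketch loops over the four combinations of X and Y in the same order as the table.

```python
from itertools import product

rows = []
for x, y in product([False, True], repeat=2):
    # One tuple per table row, columns in the same order as the header
    rows.append((x, y, not x, x and y, x or y, x == y, x != y))

print("X Y not-X and or == !=")
for row in rows:
    print(" ".join(str(value) for value in row))
```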


Evaluation tree involving boolean values

book slide


Source code in file
• Avoid retyping each command each time you run the program. Essential for non-trivial programs.
• Allows exactly the same program to be run repeatedly -- still interpreted, but no accidental changes
• Use print statement to output to display
• File has .py extension
• Run by typing python <filename>.py

python termread.py


Basic I/O
• print
  – list of items separated by commas
  – automatic newline at end
  – forced newline: the character '\n'
• raw_input(<string prompt>)
  – input from the keyboard
  – input comes as a string. Cast it to make it into some other type
• input(<prompt>)
  – input is evaluated as a Python expression, so a typed number comes as a numeric value, int or float


Case Study – Date conversion
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
date = raw_input('Enter date (mm-dd-yyyy)')
pieces = date.split('-')
# Months are numbered from 1, but list positions start at 0
monthVal = months[int(pieces[0]) - 1]
print monthVal+ ' '+pieces[1]+', '+pieces[2]

Try it – run it on your machine with a few dates
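The case study can also be wrapped in a function, which makes it easy to try several dates without typing each one at the prompt. This is a sketch under assumptions: the function name convert_date and the sample dates are made up here, and the keyboard prompt is left out so the code runs the same under Python 2 and Python 3.

```python
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def convert_date(date):
    """Turn 'mm-dd-yyyy' into 'Mon dd, yyyy'."""
    pieces = date.split('-')
    # Month 1 is at list position 0, so subtract 1
    month_name = MONTHS[int(pieces[0]) - 1]
    return month_name + ' ' + pieces[1] + ', ' + pieces[2]

converted = convert_date('02-24-2016')
year_end = convert_date('12-31-1999')
print(converted)
print(year_end)
```

To make it interactive again, pass it the string returned by raw_input (Python 2) or input (Python 3).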


Spot check
• Again, split the class. Work in pairs
  – Side by my office do Exercise 2.24 and 2.28
  – Other side do Exercise 2.26 and 2.27
• Again, designate a person to report on each of the side’s results and a person who is designated question generator for the other side’s results
  – No repeats of individuals from the first set!


For Next Week
• 2.36
  – Check now to make sure that you understand it.
  – Make a .py file, which you will submit.
  – I will get the Blackboard site ready for an upload.