LING 408/508: Programming for Linguists, Lecture 26, December 7th

LING 408/508: Programming for Linguists

Lecture 26, December 7th

Administrivia

• Due Dates
  – Term Programming Project
    • Write-up and code by December 17th
  – Homework 12
    • Install NLTK. Screenshots to show it working. Due Friday 11th

www.nltk.org

Install

get-pip.py

PIP (Pip Installs Packages)

Using pip to install nltk

Install nltk_data

• Command:
  – sudo python -m nltk.downloader -d /usr/local/share/nltk_data all

Test nltk

Test nltk_data

Ubuntu

curl

• Command line tool using URL syntax to transfer a file
  – curl URL (downloads file specified in URL)
  – can understand http/https and ftp/sftp protocols

setuptools

Ubuntu pip

Ubuntu nltk

Ubuntu nltk_data

Homework 12

• Install NLTK
  – show it working

1st sentence from the WSJ section of the Penn Treebank
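
One way to run that check from Python is sketched below. It assumes NLTK and the treebank sample from nltk_data are already installed; the helper name first_wsj_sentence is hypothetical and is not taken from the slides.

```python
def first_wsj_sentence():
    """Return the tokens of the first sentence in NLTK's Penn Treebank WSJ sample."""
    # Imported inside the function so this file still loads where NLTK is absent.
    from nltk.corpus import treebank
    return list(treebank.sents()[0])

# Usage, once NLTK and nltk_data are installed:
#   print(" ".join(first_wsj_sentence()))
```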

Homework 12

Pick a sentence of your choosing, tokenize and Part-of-Speech (POS) label it. Comment on whether the labeling is good.

Tagset documented here: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

Homework 12

• Pick another sentence with noun-noun compounds,
  – e.g. Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace.

• Tokenize, POS tag, and extract named entities
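
The three steps can be sketched with NLTK's word_tokenize, pos_tag and ne_chunk functions. This is a minimal sketch, assuming NLTK and nltk_data are installed; the wrapper name analyze is hypothetical, and the output depends on the installed models, so none is shown.

```python
def analyze(sentence):
    """Tokenize, POS-tag, and named-entity-chunk one sentence with NLTK."""
    # Imported inside the function so this file still loads where NLTK is absent.
    import nltk
    tokens = nltk.word_tokenize(sentence)   # list of word and punctuation strings
    tagged = nltk.pos_tag(tokens)           # list of (token, Penn Treebank tag) pairs
    entities = nltk.ne_chunk(tagged)        # tree whose labeled subtrees are entities
    return tokens, tagged, entities

# Usage, once NLTK and nltk_data are installed:
#   tokens, tagged, entities = analyze("F-16s are based in Atlantic City.")
#   print(tagged)
#   print(entities)
```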

Homework 12

GPE = Geo-Political Entity

http://www.nltk.org/book/

Today's Topics

• Chapter 11: Data Collections
  – Lists
  – Arrays
  – Dictionaries (Hash)

Lists

• We've seen these before …
• Example:
  – range(start,stop,step)
  – parameters are integers
  – start = 0 by default
  – step = 1 by default
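
A few illustrative calls show the defaults at work. The variable names are made up for this sketch, and list() is wrapped around range so the values print as a list in Python 3 as well.

```python
# stop only: start defaults to 0 and step defaults to 1
first_five = list(range(5))
# start, stop and step all given; stop itself is never included
evens = list(range(0, 10, 2))
# a negative step counts down
countdown = list(range(5, 0, -1))

print(first_five)  # [0, 1, 2, 3, 4]
print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [5, 4, 3, 2, 1]
```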

Lists

• split:
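
The slide's own example did not survive in this transcript, so the following is a stand-in sketch with made-up strings showing how split turns a string into a list.

```python
sentence = "the quick brown fox"

# With no argument, split() breaks on runs of whitespace.
words = sentence.split()
print(words)   # ['the', 'quick', 'brown', 'fox']

# With a separator, every occurrence splits, so empty fields are kept.
fields = "a,b,,c".split(",")
print(fields)  # ['a', 'b', '', 'c']
```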

Lists

• Basic operations:
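
The table of operations from the slide is missing here. As an assumption about what it covered, this sketch runs through the standard sequence operations on lists: concatenation, repetition, indexing, slicing, length and membership.

```python
a = [1, 2, 3]
b = [4, 5]

joined = a + b        # concatenation: [1, 2, 3, 4, 5]
repeated = b * 2      # repetition: [4, 5, 4, 5]
first = a[0]          # indexing from the front: 1
last = a[-1]          # negative index counts from the end: 3
middle = joined[1:4]  # slice, end index excluded: [2, 3, 4]
size = len(joined)    # length: 5
has_three = 3 in a    # membership test: True

print(joined, repeated, first, last, middle, size, has_three)
```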

Lists

a = [1,2,3]

Lists

• Lists are malleable:

Can implement stacks and queues easily
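
As a minimal sketch of the stack case (last in, first out), append pushes onto the end of the list and pop with no argument removes from the same end. The values are made up for illustration.

```python
stack = []

# push: add items to the end of the list
stack.append("a")
stack.append("b")
stack.append("c")

# pop: remove and return the most recently added item
top = stack.pop()

print(top)    # c
print(stack)  # ['a', 'b']
```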

Lists

Queue #1: list.append(x) and list.pop(0)
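
A short sketch of Queue #1 (first in, first out), with made-up values: items join at the end and leave from the front.

```python
queue = []

# enqueue at the end of the list
queue.append(1)
queue.append(2)
queue.append(3)

# dequeue from the front: the oldest item comes out first
first = queue.pop(0)

print(first)  # 1
print(queue)  # [2, 3]
```

Note that pop(0) has to shift every remaining element, so for long queues the standard library's collections.deque is the usual alternative.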

Lists

Queue #2: list.insert(0,x) and list.pop(len(list)-1)

non-object-oriented function: del
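
Queue #2 and del can be sketched as follows, again with made-up values. Here items join at the front and leave from the end; del is a statement applied to an index or slice rather than a method called on the list.

```python
queue = []

# enqueue at the front of the list
queue.insert(0, "a")
queue.insert(0, "b")
queue.insert(0, "c")
print(queue)  # ['c', 'b', 'a']

# dequeue from the end; pop(len(queue) - 1) is the same as pop()
first = queue.pop(len(queue) - 1)
print(first)  # a
print(queue)  # ['c', 'b']

# del removes by index or by slice and returns nothing
letters = ["w", "x", "y", "z"]
del letters[1]
print(letters)  # ['w', 'y', 'z']
del letters[0:2]
print(letters)  # ['z']
```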

Higher order functions

map(function,list)
reduce(function,list)

inline anonymous function: lambda
example: λx.x+x is written lambda x:x+x

filter(function,list)
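
The three functions can be tried on a small made-up list. This sketch imports reduce from functools, which Python 3 requires, and wraps map and filter in list() because in Python 3 they return iterators rather than lists.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map applies the function to every element
doubled = list(map(lambda x: x + x, numbers))

# reduce folds the list into one value, left to right
total = reduce(lambda acc, x: acc + x, numbers)

# filter keeps only the elements for which the function returns True
evens = list(filter(lambda x: x % 2 == 0, numbers))

print(doubled)  # [2, 4, 6, 8]
print(total)    # 10
print(evens)    # [2, 4]
```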

Standard Deviation Example
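
The slide's code is not in this transcript, so what follows is a stand-in sketch rather than the lecture's version. It assumes the sample standard deviation (dividing by n - 1); the function names mean and std_dev and the data are made up for illustration.

```python
import math

def mean(nums):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(nums) / float(len(nums))

def std_dev(nums):
    """Sample standard deviation (divides by n - 1); needs at least two values."""
    xbar = mean(nums)
    # sum of squared deviations from the mean
    sum_sq = sum((x - xbar) ** 2 for x in nums)
    return math.sqrt(sum_sq / (len(nums) - 1))

data = [1, 2, 3, 4, 5]
print(mean(data))     # 3.0
print(std_dev(data))  # square root of 2.5, roughly 1.58
```

Dividing by len(nums) instead of len(nums) - 1 would give the population standard deviation.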