Upload: erik-wheeler

Post on 17-Jan-2016

Lecture 1 Overview

Topics: Overview

Readings: Chapters 1, 2

January 14, 2013

CSCE 771 Natural Language Processing

– 2 – CSCE 771 Spring 2013

Overview

Pragmatic issues

Course Plans
• Foundation for research

Today
• Challenge of 2001's HAL
• Areas of Research
• Examples of Language Processing

Slide from: Speech and Language Processing Jurafsky and Martin

NLP Why Should You Care?

Two trends

1. An enormous amount of knowledge is now available in machine readable form as natural language text

2. Conversational agents are becoming an important form of human-computer communication

Much of human-human communication is now mediated by computers

Commercial World

Lots of exciting stuff going on…

Powerset

Slide from: Speech and Language Processing Jurafsky and Martin

Google Translate

Slide from: Speech and Language Processing Jurafsky and Martin

Web Q/A

Slide from: Speech and Language Processing Jurafsky and Martin

HAL 9000 of 2001: A Space Odyssey

A scene from Arthur Clarke and Stanley Kubrick's 2001

DAVE: Open the pod bay doors, HAL.

HAL: I'm sorry Dave, I'm afraid I can't do that.

Notes on Context:
• HAL is the main computer on the spaceship
• HAL is paranoid and decides to kill off the crew

Clarke a little too Optimistic

We don't have a HAL today in 2009.

How close are we?
• Computers replaced bank tellers (in many instances)
• But the NASA computers don't talk yet
• Microsoft XP/Vista's voice commands
• Adobe Reader reading PDF documents

But can they understand spoken commands?

Challenges in developing HAL

So what are the major challenges in developing HAL?

Speech recognition

Natural Language understanding

Information retrieval

Information extraction

Inference

Speech generation

Samples of Language Processing

Text processing (in Unix)
• wc – word count
• grep regexpr files – print lines in the files that match re
• find

More knowledgeable processing
• spelling checking/correcting
• grammar checking

Information retrieval
• Find all documents on decomposition by David Parnas

Even More knowledgeable processing

Information extraction
• Reading the "online" Wall Street Journal: What was the dividend paid by GM last year?
• USC Handbook: How many hours does it take to get a PhD in CSE?

Machine translation
• The spirit is willing but the body is weak.
• To Russian: Sprit охотно готово но тело слабо.
• Back to English: Vodka is good but the meat is rotten. (Rich 86)
• Babelfish - http://world.altavista.com/tr
• Back to English: Sprit is willingly prepared but body weakly.

Even Deeper Understanding

Email access over the phone
• Respond to commands "list all emails from Bob"
• Read email message 8
• Text to speech

Assistants
• Agents reading the net summarizing a topic

Subcategories of Knowledge in S&L

Phonetics/phonology

Morphology – shape and behavior of words in contexts

Syntax – the legitimate sequences of words

Semantics – the meanings of words, phrases, sentences and documents

Pragmatics – the appropriate use of language – politeness, direct/indirectness

Discourse conventions – correctly structuring conversations

Ambiguity: I made her duck.

1. ..
2. ..
3. ..
4. ..
5. ..

Word Ambiguity

Her – who is this?

Made
• Verb with meanings: 1) create 2) cook 3) force

Duck
• Noun: the waterfowl, the food
• Verb

So how do we resolve this sentence?
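
One way to see why resolution is hard: if each ambiguous word carries a list of candidate senses, the number of raw sense assignments is the product of the list sizes. The sketch below only enumerates those assignments for the senses named on the slide. The table layout, the sense labels, and the function name `sense_combinations` are illustrative assumptions, and many of the combinations are not sensible readings.

```python
from itertools import product

# Candidate senses as listed on the slide; the labels are illustrative only.
senses = {
    "made": ["create", "cook", "force"],
    "duck": ["noun: waterfowl", "noun: food", "verb"],
}

def sense_combinations(sense_table):
    """Return every assignment of one sense to each ambiguous word."""
    words = list(sense_table)
    return [dict(zip(words, combo))
            for combo in product(*(sense_table[w] for w in words))]

combos = sense_combinations(senses)
print(len(combos), "raw sense assignments before any filtering")
```

A real system would prune this set with part-of-speech and syntactic constraints rather than list it exhaustively.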

Turing Test

A computer simulates intelligence

http://en.wikipedia.org/wiki/Turing_test

The Chinese room

John Searle's 1980 paper Minds, Brains, and Programs proposed an argument against the Turing Test known as the "Chinese room" thought experiment.

Searle argued that software (such as ELIZA) could pass the Turing Test simply by manipulating symbols of which they had no understanding.

Without understanding, they could not be described as "thinking" in the same sense people do.

Loebner Prize – competition since 1991 for the best attempt at passing the Turing Test

http://en.wikipedia.org/wiki/Turing_test

Loebner Prize

The prizes for each year include:

$2,000 for the most human seeming of all bots for that year - awarded every year

$25,000 for the first bot that judges cannot distinguish from a real human in a text-only based Turing Test (awarded once only)

$100,000 to the first bot that judges cannot distinguish from a real human in a Turing Test that includes deciphering and understanding text, visual, auditory (and tactile?) input.

http://en.wikipedia.org/wiki/Loebner_prize

http://www.loebner.net/Prizef/loebner-prize.html

Finite Automata arose in the 1950s

1936 Turing's model of algorithmic computation

1943 McCulloch-Pitts model of the neuron

1951, 1956 Kleene first introduced finite automata and regular expressions

1959 Rabin and Scott - Nondeterministic finite automata

1968 Thompson first to compile regular expressions into an editor for text searching

Key Concepts #1 Formal Language

A formal language is a set of (finite) strings over a finite alphabet.

Key Concept #1: A model that can both recognize and generate all and only the strings of a formal language acts as a definition of the language.

$L(re) = L(M_{nfa})$

Formal languages are not the same as natural languages.

Linguists are generally more interested in generative grammars; computer scientists are more interested in recognizing.

Formal Languages

Alphabet: $\Sigma$ (finite set of symbols)

Strings: $s = c_1 c_2 \dots c_n$ (finite sequence of characters)
• Length $|s| = n$

Language: a language is a set of strings

Example languages over $\Sigma = \{a, b, c\}$
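
To make the definitions concrete, the sketch below enumerates short strings over the alphabet {a, b, c} and picks out two languages by membership tests. The two languages ("starts with a" and "even length") and all function names are hypothetical choices for illustration; the slide's own examples did not survive.

```python
from itertools import product

SIGMA = ("a", "b", "c")  # the alphabet from the slide

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Two hypothetical languages, each defined by a membership test.
def in_starts_with_a(s):
    return s.startswith("a")

def in_even_length(s):
    return len(s) % 2 == 0

universe = list(strings_up_to(SIGMA, 2))
starts_with_a = [s for s in universe if in_starts_with_a(s)]
even_length = [s for s in universe if in_even_length(s)]
print(len(universe), starts_with_a, even_length[:3])
```

Both languages are infinite; the code only looks at their members of length at most 2.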

Regular Expressions

Regular Expression Examples

Finite Automata to recognize a Language
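
The slide's diagram is not available, so here is a minimal sketch of the idea under stated assumptions: a deterministic finite automaton stored as a transition table, recognizing the hypothetical language of the regular expression ab*c. The state numbering and the name `accepts` are illustrative only.

```python
# Transition table for a hypothetical DFA recognizing the language of ab*c.
TRANSITIONS = {
    (0, "a"): 1,   # an initial 'a' is required
    (1, "b"): 1,   # any number of 'b's
    (1, "c"): 2,   # a final 'c' leads to the accepting state
}
START, ACCEPTING = 0, {2}

def accepts(s):
    """Run the DFA over s; a missing transition means the string is rejected."""
    state = START
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

for candidate in ["ac", "abbbc", "abca", ""]:
    print(repr(candidate), accepts(candidate))
```

Recognition takes one table lookup per input character, which is why automata are attractive for text searching.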

CSCE 531 – Overview in one slide

% flex lang.l                    // lex.yy.c
% bison -o lang.c lang.y         // lang.c
% gcc lex.yy.c lang.c -o parse
% parse input

[Diagram: lang.l → FLEX → lex.yy.c (yylex()); lang.y → BISON → lang.c (yyparse()); Input source program; Executable Program]

Regular Expressions in Unix tools

Ken Thompson: regular expressions in ed, ex, vi
• Reg-expr → NFA, then simulate
• Global pattern match command

g/Unix/s/Unix/UNIX/g
g/re/print == grep

Grep family

Global match Regular Expression and Print (GREP)
• grep [uU]nix f1 f2 … fn
• egrep pat files // efficient: NFA → DFA, then execute
• fgrep pat files // fixed grep for fixed strings

Find for searching directories (not really reg expr)
• find dir -name pat // search for files with name matching pat
• find dir -exec grep pat {} \; // search in files for the pattern pat
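
As a rough illustration of what grep does, the sketch below filters a list of lines with a compiled regular expression. It works on in-memory lines rather than files so that it is self-contained; the function name `grep` and the sample data are assumptions, not part of any Unix tool.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for the regular expression."""
    prog = re.compile(pattern)
    return [line for line in lines if prog.search(line)]

sample = ["Unix is old", "I use unix daily", "Linux too", "nothing here"]
matches = grep(r"[uU]nix", sample)
for line in matches:
    print(line)
```

Note that "Linux too" is not selected: the pattern needs the letters u, n, i, x in that order.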

Editing scripts

Create a script of editing commands then execute with

ex file1 < edScript

Example:

1,$s/[uU]nix/UNIX/g
1,$s/langauge/language/g
g/^$/d // delete empty lines ^=start of line $=end
…
w
q
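
The same three edits can be sketched in Python with the `re` module. This is an approximation of the ex script, not a replacement for it: the function name `apply_edit_script` is hypothetical, the text is handled as an in-memory string, and a trailing newline is dropped along with the empty lines.

```python
import re

def apply_edit_script(text):
    """Apply the three edits from the example script to a block of text."""
    lines = text.split("\n")
    # 1,$s/[uU]nix/UNIX/g
    lines = [re.sub(r"[uU]nix", "UNIX", line) for line in lines]
    # 1,$s/langauge/language/g  (a fixed string, so str.replace is enough)
    lines = [line.replace("langauge", "language") for line in lines]
    # g/^$/d  (delete empty lines)
    lines = [line for line in lines if line != ""]
    return "\n".join(lines)

before = "unix and Unix\n\nthe C langauge\n"
after = apply_edit_script(before)
print(after)
```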

Other Unix regular expression Based Tools

sed (stream editor)

awk

Perl – scripting language

Python

Ruby

regcomp, regexec in C

Python String constants

http://docs.python.org/2/library/stdtypes.html

string.ascii_letters

string.ascii_lowercase

string.ascii_uppercase

string.digits - The string '0123456789'.

string.hexdigits - The string '0123456789abcdefABCDEF'.

string.letters - The specific value is updated when locale.setlocale() is called.

string.lowercase

string.octdigits - The string '01234567'.

string.punctuation - String of ASCII characters which are considered punctuation

string.printable

string.uppercase

string.whitespace
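
A short sketch of how these constants can be used: classifying single characters by membership. It is written for Python 3, where the locale-dependent `string.letters`, `string.lowercase`, and `string.uppercase` from the list above no longer exist, so only the `ascii_*` variants are used. The function name `char_class` is hypothetical.

```python
import string

def char_class(ch):
    """Classify one character using the constants in the string module."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    if ch in string.punctuation:
        return "punctuation"
    if ch in string.whitespace:
        return "whitespace"
    return "other"

classes = [char_class(c) for c in "a7! "]
print(classes)
```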

String Method Examples

s = "i think 771 is going great!"
print(s.capitalize())

# center(width[, fillchar])
print(':' + s.center(44, '.') + ':')

# count(sub[, start[, end]])
print(s.count("in"))
print(s.count("in", 13))
print(s.count("in", 3))
print(s.count("in", 13, 22))
print(s.count("in", 13, 15))

# decode([encoding[, errors]])
# encode([encoding[, errors]])
# endswith(suffix[, start[, end]])

expandtabs([tabsize])

find(sub[, start[, end]])

index(sub[, start[, end]]) Like find(), but raise ValueError when the substring is not found.

isalnum()

isalpha()

isdigit()
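
The difference between find() and index() shows up only when the substring is absent. The sketch below reuses the sample string from the earlier slide; the wrapper name `safe_index` is a hypothetical helper.

```python
s = "i think 771 is going great!"

def safe_index(text, sub):
    """Wrap index() so a missing substring yields None instead of an exception."""
    try:
        return text.index(sub)
    except ValueError:
        return None

found = s.find("771")           # position of the first occurrence
missing = s.find("xyz")         # find() signals "not found" with -1
wrapped = safe_index(s, "xyz")  # index() would raise ValueError here
print(found, missing, wrapped, "771".isdigit(), "771a".isalnum())
```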

rpartition(sep)

rsplit([sep[, maxsplit]])

rstrip([chars])

split([sep[, maxsplit]])

splitlines([keepends])

startswith(prefix[, start[, end]])

strip([chars])

swapcase()

title()

translate(table[, deletechars])

upper()

zfill(width)

Python re – Regular expressions

• http://docs.python.org/library/re.html

• re – Regular expression module
• Operators (special characters)
• Lookahead / lookbehind
• Search vs match
• re module contents

Python Regular Expressions

http://docs.python.org/2/library/re.html

Fundamental Re Operators in Python

RegExpr    Matches
c          the single character c
A|B        either re A or re B
AB         re A followed by re B
A*         0 or more repetitions of the re A
(A)        re A, i.e. the re inside the parentheses
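
The fundamental operators can be tried directly with Python's `re` module. The pattern below, a hypothetical example, combines alternation, repetition, concatenation, and grouping; `fullmatch` is used so that the whole string must belong to the language.

```python
import re

# (a|b)*c : any mix of a's and b's (possibly none) followed by a single c.
pattern = re.compile(r"(a|b)*c")

results = {s: bool(pattern.fullmatch(s)) for s in ["c", "abbac", "abca", "ab"]}
for s, ok in results.items():
    print(s, ok)
```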

Other Operators in Python

RegExpr    Matches
.          (Dot.) In the default mode, this matches any character except a newline. …
A+         1 or more repetitions of the re A
A?         0 or 1 repetitions of the re A
A{m}       exactly m repetitions of the re A
A{m,n}     from m to n repetitions of the re A
\c         quoted character
[chars]    character class

Greedy Operators in Python

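Non Greedy Operators in Python

RegExpr    Matches
A*?        0 or more repetitions of A, as few as possible
A+?        1 or more repetitions of A, as few as possible
A??        0 or 1 repetitions of A, preferring 0
A{m,n}?    from m to n repetitions of A, as few as possible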
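
A small sketch contrasting greedy and non-greedy repetition on the same input. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.*>", text).group(0)   # .* runs on to the last '>'
lazy = re.search(r"<.*?>", text).group(0)    # .*? stops at the first '>'

# The same contrast with a bounded repetition.
greedy_runs = re.findall(r"a{2,3}", "aaaa")
lazy_runs = re.findall(r"a{2,3}?", "aaaa")

print(greedy)
print(lazy)
print(greedy_runs, lazy_runs)
```

The greedy form takes as much text as it can while still allowing a match; the non-greedy form takes as little as it can.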

Groups

The actual text that matches a re in parentheses is a group, which can be referred to later.

Example: (?P<frst>[a-z]{3}) (?P=frst)

Meaning of special characters
(A)            Matches re A, and indicates the start and end of a group
(?P<name>A)    Matches A and names the group "name"
(?:A)          A non-capturing version of regular parentheses
(?P=name)      Matches whatever text was matched by the earlier group named name.
\number        Matches the contents of the group of that number.
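
The slide's example pattern can be run as follows. The test sentences are assumptions; the pattern is the one from the slide, written without the stray space after the group name.

```python
import re

# Three lowercase letters, a space, then the same three letters again.
repeat = re.compile(r"(?P<frst>[a-z]{3}) (?P=frst)")

m = repeat.search("say abc abc now")
hit = m.group("frst")                       # text captured by the named group
miss = repeat.search("say abc abd now")     # no repeated triple, so None

# The same idea with a numbered backreference.
numbered = re.search(r"([a-z]{3}) \1", "xyz xyz").group(1)

print(hit, miss, numbered)
```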

Group related

Meaning of special characters
(?#…)      A comment
(?=A)      lookahead assertion
(?!A)      negative lookahead assertion
(?<=A)     lookbehind assertion
(?<!...)   negative lookbehind assertion
(?(id/name)yes-pattern|no-pattern)    matches yes-pattern if the group with the given id or name matched, otherwise no-pattern
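
Assertions test the surrounding text without consuming it. The sketch below uses a made-up sample string to show a lookbehind, a negative lookbehind, and a lookahead; none of the names come from the slides.

```python
import re

text = "price: $40, cost: 25, fee: $7"

# Digits preceded by '$' (lookbehind); the '$' is not part of the match.
dollars = re.findall(r"(?<=\$)\d+", text)

# Digits NOT preceded by '$' or by another digit (negative lookbehind).
plain = re.findall(r"(?<![\$\d])\d+", text)

# Words followed by a colon (lookahead); the colon is not consumed.
labels = re.findall(r"\w+(?=:)", text)

print(dollars, plain, labels)
```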

Positional special characters

Meaning of special character

'^' (Caret.) Matches the start of the string

'$' Matches the end of the string or just before the newline at the end of the string

Positional special characters

\A Matches only at the start of the string.

\b Matches the empty string, but only at the beginning or end of a word.

\B Matches the empty string, but only when it is not at the beginning or end of a word.

\d matches any decimal digit --- \D any non-digit character

\s matches any whitespace character, equivalent to [ \t\n\r\f\v] --- \S any non-whitespace character

\w matches any alphanumeric character and the underscore --- \W any other character

\Z Matches only at the end of the string
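
A brief sketch of the word-boundary and character-class escapes on a made-up sentence. Raw string literals are used so that `\b` reaches the regex engine as a word boundary rather than a backspace character.

```python
import re

text = "CSCE 771 meets at 1430; the cat concatenates."

numbers = re.findall(r"\d+", text)         # runs of decimal digits
whole_cat = re.findall(r"\bcat\b", text)   # 'cat' as a complete word
inner_cat = re.findall(r"\Bcat\B", text)   # 'cat' strictly inside a word
words = re.findall(r"\w+", text)           # runs of word characters

print(numbers, len(whole_cat), len(inner_cat), words)
```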

re Module - Matching vs Searching

import re

re.match(pattern, line)

re.search(pattern, line)

>>> re.match("c", "abcdef") # No match

>>> re.search("c", "abcdef") # Match

<_sre.SRE_Match object at ...>

re.compile

re.compile(pattern[, flags])

prog = re.compile(pattern)

result = prog.match(string)
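
A compiled pattern is typically built once and reused across many strings. The sketch below, with made-up sample lines, also shows match() and search() side by side on the same compiled object.

```python
import re

prog = re.compile(r"[uU]nix")     # compile once, reuse many times

lines = ["unix rules", "I like Unix", "Linux"]
starts = [bool(prog.match(line)) for line in lines]     # match() tries position 0 only
contains = [bool(prog.search(line)) for line in lines]  # search() scans the whole line

print(starts)
print(contains)
```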

Python's Raw String Format

What regular expression matches the two character pattern "\\"?

• Re = "\\\\"

Sometimes it simplifies patterns to disable the '\' escape. The "raw" prefix changes how Python interprets '\' in a string literal, so the backslashes reach the regular expression engine unchanged.

For instance

"\n" is a regular expression of one character, the newline

r"\n" is a regular expression of two characters, '\' and 'n'
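
The lengths of the two literals, and what each matches, can be checked directly. The sample path string is an assumption used only for illustration.

```python
import re

plain = "\n"      # one character: a newline
raw = r"\n"       # two characters: a backslash and 'n'

# Both patterns match a newline: the first is a literal newline character,
# the second is the regex escape \n interpreted by the re module.
both_match_newline = (bool(re.search(plain, "a\nb"))
                      and bool(re.search(raw, "a\nb")))

# Matching one literal backslash: the regex is \\ in either spelling.
path = "C:\\temp"                        # the text C:\temp
hits_plain = re.findall("\\\\", path)    # ordinary literal, four backslashes typed
hits_raw = re.findall(r"\\", path)       # raw literal, two backslashes typed

print(len(plain), len(raw), both_match_newline, hits_plain == hits_raw)
```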

Natural Language Toolkit

• http://nltk.org/

• interfaces to over 50 corpora and lexical resources such as WordNet

• suite of text processing libraries for
  • classification,
  • tokenization,
  • stemming,
  • tagging,
  • parsing, and
  • semantic reasoning.

Installing NLTK

http://nltk.org/install.html

Windows 32-bit binary installation

1. Install Python: http://www.python.org/download/releases/2.7.3/

2. Install Numpy (optional): http://sourceforge.net/projects/numpy/files/NumPy/1.6.2/numpy-1.6.2-win32-superpack-python2.7.exe

3. Install NLTK: http://pypi.python.org/pypi/nltk

4. Install PyYAML: http://pyyaml.org/wiki/PyYAML

5. Test installation: Start>Python27, then type import nltk

Installing NLTK Data

http://nltk.org/nltk_data/

Run the Python interpreter and type the commands:

>>> import nltk

>>> nltk.download()

Eliza

1966 Weizenbaum – program that chatted simulating a Rogerian psychologist

User: Men are all alike.

Eliza: IN WHAT WAY?

User: They are always bugging us about something.

Eliza: CAN YOU THINK OF A SPECIFIC EXAMPLE

…

http://en.wikipedia.org/wiki/Eliza

http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk/chat/eliza.py?r=8479
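
The exchange above can be imitated with a handful of regular-expression rules. This is a toy sketch, not Weizenbaum's script and not the NLTK implementation linked above: the rule list, the fallback reply, and the function name `respond` are all assumptions.

```python
import re

# A few hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r".*\ball\b.*", re.IGNORECASE), "IN WHAT WAY?"),
    (re.compile(r".*\balways\b.*", re.IGNORECASE),
     "CAN YOU THINK OF A SPECIFIC EXAMPLE?"),
    (re.compile(r".*\bI am (.*)", re.IGNORECASE), "WHY ARE YOU {0}?"),
]

def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    cleaned = utterance.rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.match(cleaned)
        if m:
            # Captured text, if any, is echoed back in upper case.
            return template.format(*(g.upper() for g in m.groups()))
    return "PLEASE GO ON."

for line in ["Men are all alike.",
             "They are always bugging us about something.",
             "I am sad."]:
    print(respond(line))
```

The program has no model of meaning; it only rewrites matched text, which is the point Searle's argument turns on.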

Links and References

Eliza
• http://i5.nyu.edu/~mm64/x52.9265/january1966.html
• http://www-ai.ijs.si/eliza/eliza.html
• http://www.strout.net/info/coding/python/ai/therapist.py

Turing Test
• http://www.abelard.org/turpap/turpap.htm

IBM's Watson

http://en.wikipedia.org/wiki/Watson_%28computer%29

Watson Architecture

http://en.wikipedia.org/wiki/Watson_%28computer%29

The Face of Watson

https://www.youtube.com/watch?v=WIKM732oEek

Text to Speech