1/11/2016 CPSC503 Winter 2010: CPSC 503 Computational Linguistics, Lecture 2, Giuseppe Carenini
04/21/23 CPSC503 Winter 2010 1
CPSC 503
Computational Linguistics
Lecture 2
Giuseppe Carenini
Today Sep 14
• Subscribe to mailing list cpsc503 (majordomo)
• Questionnaire
• Brief check of some background knowledge (& annotated corpora)
• English Morphology
• FSA and Morphology
• Start: Finite State Transducers (FST) and Morphological Parsing/Gen.
Finite state machines
Regular Expressions & Finite State Automata 6.7
Finite State Transducers 2.0
Hidden-Markov Models 4.2
Basic Probability, Bayesian Statistics and Information Theory
Conditional Probability
Programming 7.2 Java
Bayesian Networks 6.5
5.4 Python
Entropy 5.4
3.4 Dynamic Programming
Machine Learning 5.7
Supervised Classification (e.g., Decision Trees)
Search Algorithms 4.5
6.0
Unsupervised Learning (e.g., clustering)
Linguistics 4.3
2.4
Richer Formalisms
Context-Free Grammar 4.3
First-Order Logics 5.4
Today Sep 14
• Brief check of some background knowledge
• English Morphology
• FSA and Morphology
• Start: Finite State Transducers (FST) and Morphological Parsing/Gen.
Knowledge-Formalisms Map (including probabilistic formalisms)
Formalisms:
• State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models)
• Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars)
• Logical formalisms (First-Order Logics, Prob. Logics)
• AI planners (MDP Markov Decision Processes)
Linguistic knowledge:
• Morphology
• Syntax
• Semantics
• Pragmatics, Discourse and Dialogue
Next Two Lectures
State Machines (no prob.)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers
(English) Morphology
Logical formalisms (First-Order Logics)
Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
Syntax
Pragmatics, Discourse and Dialogue
Semantics
AI planners
??
b a a a ! \
0 1 2 3 4 5 6
b a b a ! \
0 1 2 3 4 5 6
??
/CPSC50[34]/
/^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/
/[0-9]+(\.[0-9]+){3}/
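A minimal sketch of what the three expressions above match, using Python's `re` module. The sample lines are invented for illustration and are not from the lecture. Note that the third pattern only checks the shape "digits, then three dot-digit groups"; it does not check that each number is in a valid range.

```python
import re

# The three patterns from the slide, written as Python raw strings.
COURSE = re.compile(r"CPSC50[34]")
HEADER = re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)")
DOTTED_QUAD = re.compile(r"[0-9]+(\.[0-9]+){3}")

# Hypothetical sample lines, invented for this illustration.
lines = [
    "From: instructor@example.org",
    "subject: CPSC503 and CPSC504 schedules",
    "Dated material is ignored",
    "Server address is 192.168.0.1 today",
]

# Every course code found anywhere in the sample lines.
course_hits = [m.group(0) for line in lines for m in COURSE.finditer(line)]

# Lines that start with From, Subject or Date as a whole word.
header_lines = [line for line in lines if HEADER.search(line)]

# Every dotted sequence of four numbers.
quad_hits = [m.group(0) for line in lines for m in DOTTED_QUAD.finditer(line)]

print(course_hits)
print(header_lines)
print(quad_hits)
```

The line starting with "Dated" is not selected because `\b` requires a word boundary right after "Date".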
Fundamental Relations
[Diagram: three nodes (FSA, Regular Expressions, Many Linguistic Phenomena) connected by arrows labeled "model", "implement (generate and recognize)" and "describe"]
Second Usage of RegExp: Text Searching/Editing
Find me all instances of the determiner “the” in an English text.
– To count them
– To substitute them with something else
You try:
/the/
/[tT]he/
/\bthe\b/
/\b[tT]he\b/
The other cop went to the bank but there were no people there.
s/\b([tT]he|[Aa]n?)\b/DET/
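The four candidate patterns and the substitution can be tried on the example sentence with Python's `re` module. This is a sketch only; one assumption to keep in mind is that `re.sub` replaces every match by default, so it behaves like the substitution above with a global flag.

```python
import re

sentence = "The other cop went to the bank but there were no people there."

# Count the matches of each candidate pattern from the slide.
patterns = [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]
counts = {p: len(re.findall(p, sentence)) for p in patterns}

# Replace every determiner matched by the slide's pattern with DET.
tagged = re.sub(r"\b([tT]he|[Aa]n?)\b", "DET", sentence)

for pattern, count in counts.items():
    print(pattern, count)
print(tagged)
```

The first two patterns over-count because they also match inside "other" and "there"; only the last one finds both "The" and "the" and nothing else.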
Annotated Corpora
• Example: The CoNLL corpora provide chunk structures, which are encoded as flat trees.
• The CoNLL 2000 Corpus includes ***phrasal chunks***
• The CoNLL 2002 Corpus includes ***named entity chunks***.
• http://nltk.googlecode.com/svn/trunk/doc/howto/corpus.html
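To make "chunk structures encoded as flat trees" concrete, here is a small sketch in plain Python (it does not use NLTK). It assumes the word / part-of-speech / IOB-chunk-tag layout of the CoNLL 2000 data; the sentence, the function name `iob_to_flat_tree` and the tuple representation of the tree are illustrative choices, not part of the lecture.

```python
# Hypothetical tagged sentence: (word, part-of-speech tag, IOB chunk tag).
tagged = [
    ("He", "PRP", "B-NP"),
    ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"),
    ("current", "JJ", "I-NP"),
    ("deficit", "NN", "I-NP"),
    (".", ".", "O"),
]

def iob_to_flat_tree(tokens):
    """Group IOB-tagged tokens into a flat (depth-one) tree.

    The result is a list whose items are either (chunk_label, [(word, pos), ...])
    for a chunk, or a bare (word, pos) pair for a token outside any chunk.
    """
    tree = []
    open_label = None  # label of the chunk currently being extended, if any
    for word, pos, iob in tokens:
        if iob == "O":
            tree.append((word, pos))
            open_label = None
            continue
        prefix, label = iob.split("-", 1)
        if prefix == "B" or label != open_label:
            tree.append((label, []))  # start a new chunk
            open_label = label
        tree[-1][1].append((word, pos))
    return tree

flat_tree = iob_to_flat_tree(tagged)
for node in flat_tree:
    print(node)
```

The tree is "flat" because chunks never nest: every chunk hangs directly under the sentence node.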
Next Two Lectures
State Machines (no prob.)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers
(English) Morphology
Logical formalisms (First-Order Logics)
Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
Syntax
Pragmatics, Discourse and Dialogue
Semantics
AI planners
English Morphology
Def. The study of how words are formed from minimal meaning-bearing units (morphemes)
• We can usefully divide morphemes into two classes
– Stems: The core meaning-bearing units
– Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions
Examples: unhappily, ……………
Word Classes
• For now, word classes are nouns, verbs, adjectives and adverbs.
• We’ll go into the gory details in Ch 5
• Word class determines to a large degree the way that stems and affixes combine
English Morphology
• We can also divide morphology up into two broad classes
– Inflectional
– Derivational
Inflectional Morphology
• The resulting word:
– Has the same word class as the original
– Serves a grammatical/semantic purpose different from the original
Nouns, Verbs and Adjectives (English)
• Nouns are simple (not really)
– Markers for plural and possessive
• Verbs are only slightly more complex
– Markers appropriate to the tense of the verb and to the person
• Adjectives
– Markers for comparative and superlative
Regulars and Irregulars
• Some words misbehave (refuse to follow the rules)
– Mouse/mice, goose/geese, ox/oxen
– Go/went, fly/flew
• Regulars…
– Walk, walks, walking, walked, walked
• Irregulars
– Eat, eats, eating, ate, eaten
– Catch, catches, catching, caught, caught
– Cut, cuts, cutting, cut, cut
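One common way to handle the regular/irregular split is a default rule plus an exception table. The sketch below is only an illustration of that idea: the function name `paradigm` is hypothetical, the irregular entries are the ones listed on the slide, and the regular rule is deliberately naive (plain concatenation, no spelling changes).

```python
# Irregular paradigms from the slide: (3rd singular, -ing form, past, past participle).
IRREGULAR = {
    "eat": ("eats", "eating", "ate", "eaten"),
    "catch": ("catches", "catching", "caught", "caught"),
    "cut": ("cuts", "cutting", "cut", "cut"),
}

def paradigm(stem):
    """Return (stem, 3sg, -ing form, past, past participle) for a verb stem."""
    if stem in IRREGULAR:
        # Irregular verbs are simply looked up.
        return (stem,) + IRREGULAR[stem]
    # Naive regular rule: concatenate the suffix, ignoring spelling rules.
    return (stem, stem + "s", stem + "ing", stem + "ed", stem + "ed")

print(paradigm("walk"))
print(paradigm("eat"))
```

The naive rule is only right for stems like "walk"; stems that need a spelling change would have to be handled separately.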
Derivational Morphology
• Derivational morphology is the messy stuff that no one ever taught you.
– Changes of word class
– Less productive (-ant: V -> N only with V of Latin origin!)
Derivational Examples
• Verb/Adj to Noun
-ation computerize computerization
-ee appoint appointee
-er kill killer
-ness fuzzy fuzziness
Derivational Examples
• Noun/Verb to Adj
-al Computation Computational
-able Embrace Embraceable
-less Clue Clueless
Compute
• Many paths are possible…
• Start with compute
– Computer -> computerize -> computerization
– Computation -> computational
– Computer -> computerize -> computerizable
– Compute -> computee
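The derivation paths above can be viewed as a small graph rooted at "compute". The sketch below enumerates them; it assumes that "computer" and "computation" are each one derivation step away from "compute", which the slide implies but does not draw, and the names `DERIVES` and `paths` are illustrative.

```python
# Derivation steps from the slide as a graph: word -> words derived from it.
DERIVES = {
    "compute": ["computer", "computation", "computee"],
    "computer": ["computerize"],
    "computerize": ["computerization", "computerizable"],
    "computation": ["computational"],
}

def paths(word):
    """Return every derivation path from `word` to a word with no further step."""
    next_words = DERIVES.get(word, [])
    if not next_words:
        return [[word]]
    return [[word] + rest for nxt in next_words for rest in paths(nxt)]

all_paths = paths("compute")
for path in all_paths:
    print(" -> ".join(path))
```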
Summary
State Machines (no prob.)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers
(English) Morphology
Logical formalisms (First-Order Logics)
Rule systems (and prob. version) (e.g., (Prob.) Context-Free Grammars)
Syntax
Pragmatics, Discourse and Dialogue
Semantics
AI planners
FSAs and Morphology
• GOAL1: recognize whether a string is an English word
• PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language)
2. Then we’ll add in the actual stems
FSA for Portion of Noun Inflectional Morphology
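The figure for this slide is not part of the transcript. As a rough sketch, the morphotactics of noun inflection can be written as an FSA over morpheme classes. The state names, class names and arcs below are assumptions modeled on the usual textbook automaton (regular noun optionally followed by plural -s; irregular singular and plural nouns accepted as they are), not a copy of the slide.

```python
# Assumed arcs: (state, morpheme class) -> next state.
TRANSITIONS = {
    ("q0", "reg-noun"): "q1",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
    ("q1", "plural-s"): "q2",
}
ACCEPTING = {"q1", "q2"}

def accepts(classes):
    """Run the FSA over a sequence of morpheme classes."""
    state = "q0"
    for morpheme_class in classes:
        state = TRANSITIONS.get((state, morpheme_class))
        if state is None:  # no arc for this class: reject
            return False
    return state in ACCEPTING

print(accepts(["reg-noun", "plural-s"]))       # e.g. cat + s
print(accepts(["irreg-sg-noun", "plural-s"]))  # e.g. goose + s
```

At this stage the automaton only knows about classes of morphemes; actual stems still have to be plugged in for each class.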
Adding the Stems
But it does not express that:
• Reg. nouns ending in -s, -z, -sh, -ch, -x -> -es (kiss, waltz, bush, rich, box)
• Reg. nouns ending in -y preceded by a consonant change the -y to -i
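The two spelling rules can be written as a small function. This is a sketch under two assumptions: after changing -y to -i the suffix -es is added, and any letter other than a, e, i, o, u counts as a consonant. The function name `regular_plural` is hypothetical.

```python
VOWELS = "aeiou"

def regular_plural(noun):
    """Apply the two spelling rules from the slide, otherwise just add -s."""
    # Rule 1: stems ending in -s, -z, -sh, -ch, -x take -es.
    if noun.endswith(("s", "z", "sh", "ch", "x")):
        return noun + "es"
    # Rule 2: consonant + y becomes consonant + ies (assumes -es follows the -i).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

for noun in ["kiss", "waltz", "bush", "rich", "box", "butterfly", "boy", "cat"]:
    print(noun, "->", regular_plural(noun))
```

A plain FSA over stems and a bare -s suffix cannot express these changes, which is one motivation for moving to transducers.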
Small Fragment of V and N Derivational Morphology
[noun_i] e.g. hospital
[adj_-al] e.g. formal
[adj_-ous] e.g. arduous
[verb_j] e.g. speculate
[verb_k] e.g. conserve
GOAL2: Morphological Parsing/Generation (vs. Recognition)
• Recognition is usually not quite what we need.
– Usually given a word we need to find: the stem and its class and morphological features (parsing)
– Or we have a stem and its class and morphological features and we want to produce the word (production/generation)
• Examples (parsing)
– From “cats” to “cat +N +PL”
– From “lies” to ……
Computational problems in Morphology
• Recognition: recognize whether a string is an English word (FSA)
• Parsing/Generation: word <-> stem, class, lexical features
e.g., lies <-> lie +N +PL
lies <-> lie +V +3SG
….….
• Stemming: word -> stem
….
Finite State Transducers
• FSA cannot help….
• The simple story
– Add another tape
– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”
FSTs
[Figure labels: generation, parsing]
(Simplified) FST formal definition (you can skip 3.4.1 unless you want to work on FST)
• Q: a finite set of states
• I, O: input and output alphabets (which may include ε)
• Σ: a finite alphabet of complex symbols i:o, with i ∈ I and o ∈ O
• q0: the start state
• F: a set of accept/final states (F ⊆ Q)
• A transition relation δ that maps Q × Σ to 2^Q
FST can be used as…
• Translators: input one string from I, output another from O (or vice versa)
• Recognizers: input a string from I×O
• Generators: output a string from I×O
Simple Example
Transitions (as a translator):
• c:c means read a c on one tape and write a c on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)
[FST arcs: c:c, a:a, t:t, +N:ε, then either +PL:s or +SG:ε]
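A minimal sketch of this transducer used as a translator in both directions. The integer state names, the arc table layout and the function names `generate` and `parse` are assumptions for illustration; the arcs themselves are the lexical:surface pairs listed above, with the empty string standing for ε.

```python
# state -> list of (lexical symbol, surface string, next state); "" stands for ε.
ARCS = {
    0: [("c", "c", 1)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("+N", "", 4)],
    4: [("+PL", "s", 5), ("+SG", "", 5)],
}
FINAL = {5}

def generate(lexical):
    """Generation: a list of lexical symbols -> all matching surface strings."""
    results = []

    def walk(state, i, out):
        if i == len(lexical) and state in FINAL:
            results.append(out)
        for lex, surf, nxt in ARCS.get(state, []):
            if i < len(lexical) and lexical[i] == lex:
                walk(nxt, i + 1, out + surf)

    walk(0, 0, "")
    return results

def parse(surface):
    """Parsing: a surface string -> all matching lexical strings."""
    results = []

    def walk(state, i, out):
        if i == len(surface) and state in FINAL:
            results.append(out)
        for lex, surf, nxt in ARCS.get(state, []):
            # An ε surface side ("") matches without consuming input.
            if surface.startswith(surf, i):
                tag = " " + lex if lex.startswith("+") else lex
                walk(nxt, i + len(surf), out + tag)

    walk(0, 0, "")
    return results

print(generate(list("cat") + ["+N", "+PL"]))
print(parse("cats"))
print(parse("cat"))
```

The same arc table serves both directions; only the tape that is read and the tape that is written are swapped.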
Examples (as a translator)
• Parsing (surface to lexical): c a t s
• Generation (lexical to surface): c a t +N +SG
[FST arcs: c:c, a:a, t:t, +N:ε, then either +PL:s or +SG:ε]
Slightly More complex Example
Transitions (as a translator):
• l:l means read an l on one tape and write an l on the other (or vice versa)
• +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
• +PL:s means read +PL and write an s (or vice versa)
• …
[FST over states q0 to q7; arcs l:l, i:i, e:e, then either +N:ε followed by +PL:s, or +V:ε followed by +3SG:s]
Examples (as a translator)
• Parsing (surface to lexical): l i e s
• Generation (lexical to surface): l i e +V +3SG
[FST over states q0 to q7; arcs l:l, i:i, e:e, then either +N:ε followed by +PL:s, or +V:ε followed by +3SG:s]
Examples (as a recognizer and a generator)
• Lexical tape: l i e +V +3SG
• Surface tape: l i e s
[FST over states q0 to q7; arcs l:l, i:i, e:e, then either +N:ε followed by +PL:s, or +V:ε followed by +3SG:s]
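A sketch of the same "lie" transducer used as a recognizer and as a generator of lexical:surface pairs. The assignment of q4 to q7 to the two branches, the choice of final states and all function names are guesses for illustration, since the figure is not part of the transcript; the empty string stands for ε.

```python
# state -> list of (lexical symbol, surface string, next state); "" stands for ε.
# Assumed numbering: q4/q5 on the noun branch, q6/q7 on the verb branch.
ARCS = {
    "q0": [("l", "l", "q1")],
    "q1": [("i", "i", "q2")],
    "q2": [("e", "e", "q3")],
    "q3": [("+N", "", "q4"), ("+V", "", "q6")],
    "q4": [("+PL", "s", "q5")],
    "q6": [("+3SG", "s", "q7")],
}
FINAL = {"q5", "q7"}

def recognize(pairs):
    """Recognizer: accept or reject a sequence of (lexical, surface) pairs."""
    states = {"q0"}
    for pair in pairs:
        states = {
            nxt
            for state in states
            for lex, surf, nxt in ARCS.get(state, [])
            if (lex, surf) == pair
        }
    return bool(states & FINAL)

def generate_pairs(state="q0", prefix=()):
    """Generator: enumerate every accepted sequence of pairs (the FST is acyclic)."""
    if state in FINAL:
        yield prefix
    for lex, surf, nxt in ARCS.get(state, []):
        yield from generate_pairs(nxt, prefix + ((lex, surf),))

def tapes(pairs):
    """Split a pair sequence into its lexical and surface strings."""
    lexical = "".join(" " + lex if lex.startswith("+") else lex for lex, _ in pairs)
    surface = "".join(surf for _, surf in pairs)
    return lexical, surface

verb_reading = (("l", "l"), ("i", "i"), ("e", "e"), ("+V", ""), ("+3SG", "s"))
print(recognize(verb_reading))
print([tapes(p) for p in generate_pairs()])
```

Both generated pairs share the surface string "lies", which is the ambiguity a morphological parser has to report.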
Introductions
• Your Name
• Previous experience in NLP?
• Why are you interested in NLP?
• Are you thinking of NLP as your main research area? If not, what else do you want to specialize in….
• Anything else…………
Next Time
• Finish FST and morphological analysis
• Porter Stemmer
• Read Chp. 3 up to 3.10 excluded (def. of FST: understand the one on slides) (3.4.1 optional)
Assignment-1 will be out today (due Sept 21)