Lecture 5: Morphology
Kai-Wei Chang, CS @ University of Virginia
Course webpage: http://kwchang.net/teaching/NLP16
6501 Natural Language Processing
This lecture
- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions
Words
- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods
A Turkish word
- How about English?
Example from Julia Hockenmaier, Intro to NLP
Longest word in English
- Longest word in Shakespeare's works: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein
What is Morphology?
- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
  - e.g., dis-grace-ful-ly
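As a rough illustration of stems and affixes, the sketch below splits a word into prefix, stem, and suffix morphemes by trying every way of peeling off known affixes. The affix lists, the single stem, and the function `segment` are hypothetical, invented for this example; it assumes a word is zero or more prefixes, exactly one stem, then zero or more suffixes, and it ignores spelling changes at morpheme boundaries.

```python
# Hypothetical mini-lexicon for illustration only.
PREFIXES = ["dis", "un"]
STEMS = ["grace"]
SUFFIXES = ["ful", "ly"]


def segment(word):
    """Split word into prefixes + stem + suffixes; return None if impossible."""

    def parse_suffixes(rest):
        # Zero or more suffixes must consume the rest of the word.
        if rest == "":
            return []
        for suf in SUFFIXES:
            if rest.startswith(suf):
                tail = parse_suffixes(rest[len(suf):])
                if tail is not None:
                    return [suf] + tail
        return None

    def parse_from(rest):
        # Either a stem starts here, or we peel off one more prefix.
        for stem in STEMS:
            if rest.startswith(stem):
                tail = parse_suffixes(rest[len(stem):])
                if tail is not None:
                    return [stem] + tail
        for pre in PREFIXES:
            if rest.startswith(pre):
                tail = parse_from(rest[len(pre):])
                if tail is not None:
                    return [pre] + tail
        return None

    return parse_from(word)


print("-".join(segment("disgracefully")))  # dis-grace-ful-ly
```

Words that cannot be covered by the listed morphemes, such as graceless, return `None`.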
Inflectional Morphology
Create different forms of the same word:
- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book's
  - Personal pronouns: he, she, her, them, us
- Serves a grammatical/semantic purpose that is different from the original but is transparently related to the original
Derivational Morphology
Create different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - un-: undo, unseen, ...
  - mis-: mistake, misunderstand, ...
- Adjectivization:
  - V + -able: doable
  - N + -al: national
What else?
- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, ...
Morphological parsing and generation
- Morphological parsing:
- Morphological generation
  - What words can be generated from grace?
  - grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully
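Generation can be sketched as enumerating every path through a small affix automaton. The machine below is a hypothetical one written for this example (optional un-, optional dis-, the stem grace, then optional -ful and, only after -ful, optional -ly); because it has no cycles, listing all paths terminates.

```python
# Hypothetical affix FSA. Each arc is (morpheme, next_state);
# the empty string "" is an epsilon arc that skips an optional slot.
ARCS = {
    0: [("un", 1), ("", 1)],
    1: [("dis", 2), ("", 2)],
    2: [("grace", 3)],
    3: [("ful", 4)],
    4: [("ly", 5)],
}
FINAL = {3, 4, 5}


def generate(state=0, prefix=""):
    """Enumerate every word spelled out by a path from `state` to a final state."""
    words = [prefix] if state in FINAL else []
    for morpheme, nxt in ARCS.get(state, []):
        words.extend(generate(nxt, prefix + morpheme))
    return words


print(generate())
```

This machine produces 12 forms, a superset of the list above (it also yields, for example, ungraceful and disgracefully).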
Finite State Automata
- FSAs and regular expressions have the same expressive power
- The FSA above accepts the strings matching r/baa+!/: a b, then two or more a's, then !
Finite State Automata
- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !}
  - Start state: q0
  - Accept state: q4
  - 5 transitions
- Are there other machines that correspond to the same language r/baa+!/? Yes.
Alphabet just means a finite set of symbols in the input
Can have many accept states
Formal definition
- You can specify an FSA by enumerating the following things:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state: q0 ∈ Q
  - A set of accept/final states: F ⊆ Q
  - A transition function δ that maps Q × Σ to Q
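The five components map directly onto a small data structure. Below is a minimal Python sketch, assuming the transition function may be partial: a missing (state, symbol) entry acts as an implicit reject state, a common shortcut for the total function Q × Σ → Q. The class name `DFA` is hypothetical, and the machine is assumed to be the r/baa+!/ recognizer described above (states q0 to q4, start q0, accept q4, 5 transitions).

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set      # Q
    alphabet: set    # Σ
    start: str       # q0
    accept: set      # F
    delta: dict      # δ: (state, symbol) -> state; a missing entry means reject

    def accepts(self, string):
        state = self.start
        for symbol in string:
            if symbol not in self.alphabet or (state, symbol) not in self.delta:
                return False
            state = self.delta[(state, symbol)]
        return state in self.accept


# The r/baa+!/ machine: 5 states and 5 transitions.
sheep = DFA(
    states={"q0", "q1", "q2", "q3", "q4"},
    alphabet={"b", "a", "!"},
    start="q0",
    accept={"q4"},
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",
        ("q3", "!"): "q4",
    },
)

for s in ["baa!", "baaaa!", "ba!", "baa"]:
    print(s, sheep.accepts(s))
```

The self-loop on q3 is what lets the machine accept any number of extra a's.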
Example: dollars and cents
Yet another view – table representation
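| State | b | a | ! | ε |
|---|---|---|---|---|
| 0 | 1 | - | - | - |
| 1 | - | 2 | - | - |
| 2 | - | 2,3 | - | - |
| 3 | - | - | 4 | - |
| 4 (accept) | - | - | - | - |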
If you're in state 1 and you're looking at an a, go to state 2
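The table can drive a recognizer directly: for the current state and input symbol, look up the cell to find the next state. Because the cell for state 2 on a lists two states (2 and 3), this sketch keeps a set of current states rather than a single one; that is one way to handle the choice, assumed here rather than taken from the slide. The names `TABLE` and `recognize` are hypothetical.

```python
# Transition table: (state, symbol) -> set of next states.
# Empty cells are simply absent; the ε column is empty, so it is omitted.
TABLE = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},  # the "2,3" cell: two possible next states
    (3, "!"): {4},
}
START = 0
ACCEPT = {4}


def recognize(tape):
    """Table-driven recognition that tracks every state the machine could be in."""
    current = {START}
    for symbol in tape:
        # Look up each (state, symbol) cell and collect all next states.
        current = set().union(*(TABLE.get((q, symbol), set()) for q in current))
        if not current:
            return False  # every path hit an empty cell
    return bool(current & ACCEPT)


for s in ["baa!", "baaa!", "ba!", "baa"]:
    print(s, recognize(s))
```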
Non-Deterministic FSA
- ε-transitions
- More than one possible next state
- Equivalent to deterministic FSAs
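The equivalence claim can be made concrete with the subset construction, the standard way to turn a non-deterministic FSA (including ε-transitions) into a deterministic one whose states are sets of NFSA states. The NFSA below is a hypothetical variant of the r/baa+!/ machine that uses an ε-arc from state 3 back to state 2; the names `eps_closure`, `subset_construction`, and `dfa_accepts` are made up for this sketch.

```python
from collections import deque

EPS = None  # label used for ε-transitions

# Hypothetical NFSA for r/baa+!/: the ε-arc from 3 back to 2 allows extra a's.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {3},
    (3, "!"): {4},
    (3, EPS): {2},
}
NFA_START, NFA_ACCEPT = 0, {4}
ALPHABET = ["b", "a", "!"]


def eps_closure(states):
    """All states reachable from `states` using only ε-arcs."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction():
    """Build an equivalent DFA whose states are sets of NFSA states."""
    start = eps_closure({NFA_START})
    delta, seen, queue = {}, {start}, deque([start])
    while queue:
        S = queue.popleft()
        for sym in ALPHABET:
            moved = set()
            for q in S:
                moved |= NFA.get((q, sym), set())
            if not moved:
                continue  # no arc: the DFA rejects on this symbol
            T = eps_closure(moved)
            delta[(S, sym)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    accept = {S for S in seen if S & NFA_ACCEPT}
    return start, delta, accept


DFA_START, DFA_DELTA, DFA_ACCEPT = subset_construction()


def dfa_accepts(tape):
    S = DFA_START
    for sym in tape:
        if (S, sym) not in DFA_DELTA:
            return False
        S = DFA_DELTA[(S, sym)]
    return S in DFA_ACCEPT
```

For this machine the construction yields five DFA states: {0}, {1}, {2}, {2, 3}, and {4}.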
Regular expression
- Equivalent to FSAs
- Matching strings with regular expressions (e.g., Perl, Python, grep):
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter
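Python's standard `re` module follows the same two-step workflow: `re.compile` translates the pattern once, and the compiled object is then run over each input string. One caveat: CPython's `re` compiles to its own internal code for a backtracking matcher rather than to an FSA transition table, so this illustrates the compile-then-interpret split, not the table itself. The helper name `is_sheep_talk` is hypothetical.

```python
import re

# Step 1: translate the pattern into a matcher once.
SHEEP = re.compile(r"baa+!")


def is_sheep_talk(s):
    # Step 2: run the compiled matcher; fullmatch requires the whole string to match.
    return SHEEP.fullmatch(s) is not None


for s in ["baa!", "baaaa!", "ba!", "baa!!"]:
    print(s, is_sheep_talk(s))
```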
Model morphology with FSA
- Regular singular nouns are OK
- Regular plural nouns have an -s on the end
- Irregulars are OK as is
Now plug in the words
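Plugging words into the noun FSA can be sketched as a recognizer whose arcs are labeled with whole word lists rather than single letters. The word lists (`REG_NOUN`, `IRREG_SG_NOUN`, `IRREG_PL_NOUN`) are hypothetical samples and the state names are assumptions: q1 is reached after a regular noun, q2 after a plural or an irregular form.

```python
# Hypothetical word lists standing in for the lexicon plugged into the noun FSA.
REG_NOUN = {"cat", "dog", "fox"}
IRREG_SG_NOUN = {"goose", "mouse", "sheep"}
IRREG_PL_NOUN = {"geese", "mice", "sheep"}

# Arcs are (source, set of morphemes, target): regular singular,
# regular noun + -s, and irregular forms as is.
ARCS = [
    ("q0", REG_NOUN, "q1"),
    ("q0", IRREG_SG_NOUN, "q2"),
    ("q0", IRREG_PL_NOUN, "q2"),
    ("q1", {"s"}, "q2"),
]
ACCEPT = {"q1", "q2"}


def accepts_noun(word, state="q0"):
    """True if some path through the arcs spells out exactly `word`."""
    if word == "":
        return state in ACCEPT
    for src, morphemes, dst in ARCS:
        if src != state:
            continue
        for m in morphemes:
            if word.startswith(m) and accepts_noun(word[len(m):], dst):
                return True
    return False


for w in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, accepts_noun(w))
```

Note that this recognizer accepts foxs and rejects foxes, because it has no spelling rules; that gap is the spelling-rule challenge discussed later in the lecture.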
Derivational Rules
From recognition to parsing
- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From "cats" to "cat +N +p"
Transitions
- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N
Arcs along the path: c:c a:a t:t ε:+N s:+p
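The cats example can be run as a tiny transducer: each arc reads an input symbol (or nothing, for ε) and appends an output string. This is a minimal sketch with hypothetical names (`ARCS`, `FINAL`, `transduce`); it hard-codes the single path for cats and assumes the machine has no ε-cycles, so the recursive search terminates.

```python
EPS = ""  # an ε input: the arc consumes no input symbol

# Each arc is (source, input, output, target), mirroring c:c a:a t:t ε:+N s:+p.
# Outputs carry a leading space so the result reads "cat +N +p".
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, EPS, " +N", 4),
    (4, "s", " +p", 5),
]
FINAL = {5}


def transduce(word, state=0, out=""):
    """Return the outputs of all paths that consume the whole input and end in a final state."""
    results = [out] if word == "" and state in FINAL else []
    for src, inp, outp, dst in ARCS:
        if src != state:
            continue
        if inp == EPS:
            results += transduce(word, dst, out + outp)
        elif word.startswith(inp):
            results += transduce(word[len(inp):], dst, out + outp)
    return results


print(transduce("cats"))  # ['cat +N +p']
```

Returning a list of outputs, rather than a single string, leaves room for inputs that have more than one analysis.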
Challenge: Ambiguity
- books: book +N +p or book +V +z (3rd person singular)
- Non-deterministic FSA: allows multiple paths through a machine to lead to the same accept state
- Bias the search (or learn) so that a few likely paths are explored
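The books ambiguity can be sketched by collecting the output of every accepting path instead of stopping at the first, and then letting a scoring function choose among them. The lexicon, the tag table, and the names `analyses` and `best_analysis` are hypothetical; the caller-supplied `score` stands in for whatever bias or learned model ranks the paths.

```python
# Hypothetical lexicon: "book" is listed both as a noun and as a verb.
LEXICON = {"book": ["+N", "+V"], "cat": ["+N"], "walk": ["+V"]}
# The -s suffix reads as plural on nouns and 3rd person singular on verbs.
S_TAG = {"+N": "+p", "+V": "+z"}


def analyses(word):
    """Collect the output of every accepting path (all parses, not just one)."""
    results = []
    if word.endswith("s"):
        stem = word[:-1]
        for pos in LEXICON.get(stem, []):
            results.append(f"{stem} {pos} {S_TAG[pos]}")
    return results


def best_analysis(word, score):
    """Bias the choice with a caller-supplied scoring function."""
    candidates = analyses(word)
    return max(candidates, key=score) if candidates else None


print(analyses("books"))  # ['book +N +p', 'book +V +z']
```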
Challenge: Spelling rules
- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat+s = cats
  - fox+s = foxes
  - make+ing = making
- How can we model it?
Intermediate representation
Overall Scheme
- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- A large set of machines that capture spelling rules
  - Intermediate forms to surface
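The two-level scheme can be sketched as a pipeline of stages. In a real system each stage would be a transducer and the cascade would be built by composing them; here plain Python functions applied in order stand in for those machines. The lexicon, the lexical-form format ("fox +N +p"), and all function names are hypothetical, and the e-insertion rule is simplified to stems ending in x, s, or z.

```python
import re

# Stage 1 (hypothetical lexicon transducer): lexical form -> intermediate form.
# "^" marks a morpheme boundary and "#" the end of the word.
LEXICON = {"cat", "dog", "fox"}


def lexical_to_intermediate(lexical):
    stem, *features = lexical.split()
    if stem not in LEXICON:
        raise KeyError(f"unknown stem: {stem}")
    return stem + ("^s" if "+p" in features else "") + "#"


# Stage 2: spelling-rule machines, each mapping one tape to the next.
def e_insertion(tape):
    # Insert e between a stem ending in x, s or z and the plural suffix.
    return re.sub(r"([xsz])\^s#", r"\1^es#", tape)


def erase_boundaries(tape):
    return tape.replace("^", "").replace("#", "")


SPELLING_RULES = [e_insertion, erase_boundaries]


def generate(lexical):
    tape = lexical_to_intermediate(lexical)
    for rule in SPELLING_RULES:
        tape = rule(tape)
    return tape


print(generate("fox +N +p"))  # foxes
print(generate("cat +N +p"))  # cats
```

Keeping the spelling rules in a list mirrors the "large set of machines" idea: adding a rule means adding a stage, without touching the lexicon.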
Lexical to intermediate level
Intermediate level to surface
- The "add an e" rule for -s
  - Example: fox^s# ↔ foxes# (^ marks a morpheme boundary, # the end of the word)
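Because the rule is a relation (↔), the same rewrite can be run in both directions. This is a minimal sketch, assuming tapes keep the final # and using regular-expression substitutions in place of a compiled transducer; `to_surface` and `to_intermediate` are hypothetical names, and the rule context is simplified to x, s, or z before ^s#.

```python
import re


def to_surface(intermediate):
    # Generation: fox^s# -> foxes# (insert e, then drop the ^ boundary).
    with_e = re.sub(r"([xsz])\^s#", r"\1es#", intermediate)
    return with_e.replace("^", "")


def to_intermediate(surface):
    # Parsing: foxes# -> fox^s# (undo the e-insertion).
    return re.sub(r"([xsz])es#", r"\1^s#", surface)


print(to_surface("fox^s#"))       # foxes#
print(to_intermediate("foxes#"))  # fox^s#
```

The parsing direction only restores the boundary this rule touched (cats# stays cats#), and it always assumes the e was inserted; a full analyzer relies on the lexicon stage to resolve such cases.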
Other applications of FSTs
- ELIZA: https://en.wikipedia.org/wiki/ELIZA
  - Implemented using pattern matching (FSTs)
ELIZA as an FST cascade
Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU
A simple rule:
1. Replace you with I and me with you:
   I don't argue with you.
2. Replace <...> with Why do you think <...>:
   Why do you think I don't argue with you.
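The two-step rule can be sketched as a cascade of string rewrites, each standing in for one transducer stage. The function names are hypothetical and this is not ELIZA's original implementation. The pronoun swap is done in a single pass on purpose: applying "me → you" first and then "you → I" would turn both pronouns into I.

```python
import re

# Step 1: swap pronouns in one pass so a freshly written "you" is not re-swapped.
SWAPS = {"you": "I", "me": "you"}


def swap_pronouns(text):
    return re.sub(
        r"\b(you|me)\b",
        lambda m: SWAPS[m.group(1).lower()],
        text,
        flags=re.IGNORECASE,
    )


# Step 2: wrap the rewritten sentence in a fixed template.
def why_template(text):
    return "Why do you think " + text.rstrip(".")


def eliza(utterance):
    return why_template(swap_pronouns(utterance)).upper()


print(eliza("You don't argue with me."))
# WHY DO YOU THINK I DON'T ARGUE WITH YOU
```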
What about compounds?
- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery), not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)), not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure