6/16/20151/1/ morphology and finite-state transducers part 2 ics 482: natural language processing...
Post on 20-Dec-2015
221 views
TRANSCRIPT
![Page 1: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/1.jpg)
04/18/23 1/
Morphology and Finite-state Transducers Part 2
ICS 482: Natural Language Processing
Lecture 6Husni Al-Muhtaseb
![Page 2: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/2.jpg)
04/18/23 2/
ICS 482: Natural Language Processing
Lecture 6
Morphology and Finite-state Transducers Part 2
Husni Al-Muhtaseb
الرحيم الرحمن الله بسم
![Page 3: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/3.jpg)
NLP Credits and Acknowledgment
These slides were adapted from presentations of the Authors of the
bookSPEECH and LANGUAGE PROCESSING:
An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
and some modifications from presentations found in the WEB by
several scholars including the following
![Page 4: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/4.jpg)
NLP Credits and Acknowledgment
If your name is missing please contact memuhtaseb
AtKfupm.
Edu.sa
![Page 5: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/5.jpg)
NLP Credits and AcknowledgmentHusni Al-MuhtasebJames MartinJim MartinDan JurafskySandiway FongSong young inPaula MatuszekMary-Angela
PapalaskariDick Crouch Tracy KinL. Venkata
SubramaniamMartin Volk Bruce R. MaximJan HajičSrinath SrinivasaSimeon NtafosPaolo PirjanianRicardo VilaltaTom Lenaerts
Heshaam Feili Björn GambäckChristian Korthals Thomas G.
DietterichDevika
SubramanianDuminda
Wijesekera Lee McCluskey David J. KriegmanKathleen McKeownMichael J. CiaraldiDavid FinkelMin-Yen KanAndreas Geyer-
Schulz Franz J. KurfessTim FininNadjet BouayadKathy McCoyHans Uszkoreit Azadeh Maghsoodi
Khurshid Ahmad
Staffan Larsson
Robert Wilensky
Feiyu Xu
Jakub Piskorski
Rohini Srihari
Mark Sanderson
Andrew Elks
Marc Davis
Ray Larson
Jimmy Lin
Marti Hearst
Andrew McCallum
Nick KushmerickMark CravenChia-Hui ChangDiana MaynardJames Allan
Martha Palmerjulia hirschbergElaine RichChristof Monz Bonnie J. DorrNizar HabashMassimo PoesioDavid Goss-GrubbsThomas K HarrisJohn HutchinsAlexandros PotamianosMike RosnerLatifa Al-Sulaiti Giorgio Satta Jerry R. HobbsChristopher ManningHinrich SchützeAlexander GelbukhGina-Anne Levow Guitao GaoQing MaZeynep Altan
![Page 6: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/6.jpg)
04/18/23 6/
Previous Lectures• 1 Pre-start questionnaire• 2 Introduction and Phases of an NLP system• 2 NLP Applications• 3 Chatting with Alice• 3 Regular Expressions, Finite State Automata• 3 Regular languages• 4 Regular Expressions & Regular languages• 4 Deterministic & Non-deterministic FSAs• 5 Morphology: Inflectional & Derivational• 5 Parsing
![Page 7: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/7.jpg)
04/18/23 7/
Today’s Lecture
• Review of Morphology• Finite State Transducers• Stemming & Porter Stemmer
![Page 8: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/8.jpg)
04/18/23 8/
Reminder: Quiz 1 Next class
• Next time: Quiz – Ch 1!, 2, & 3 (Lecture presentations)– Do you need a sample quiz?
• What is the difference between a sample and a template?• Let me think – It might appear at the WebCt site on late
Saturday.
![Page 9: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/9.jpg)
04/18/23 9/
Introduction
State Machines (no probability)
• Finite State Automata (and Regular Expressions)
• Finite State Transducers
(English)
Morphology
Logical formalisms
(First-Order Logics)
Rule systems (and prob. version)
(e.g., (Prob.) Context-Free Grammars)
Syntax
Pragmatics
Discourse and Dialogue
Semantics
AI planners
![Page 10: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/10.jpg)
04/18/23 10/
English Morphology
• Morphology is the study of the ways that words are built up from smaller meaningful units called morphemes
• morpheme classes– Stems: The core meaning bearing units– Affixes: Adhere to stems to change their
meanings and grammatical functions– Example: unhappily
![Page 11: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/11.jpg)
04/18/23 11/
English Morphology
• We can also divide morphology up into two broad classes– Inflectional– Derivational
• Non English– Concatinative Morphology– Templatic Morphology
![Page 12: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/12.jpg)
04/18/23 12/
Word Classes
• By word class, we have in mind familiar notions like noun, verb, adjective and adverb
• Why to concerned with word classes?– The way that stems and affixes combine is based
to a large degree on the word class of the stem
![Page 13: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/13.jpg)
04/18/23 13/
Inflectional Morphology
• Word building process that serves grammatical function without changing the part of speech or the meaning of the stem
• The resulting word– Has the same word class as the original– Serves a grammatical/ semantic purpose different
from the original
![Page 14: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/14.jpg)
04/18/23 14/
Inflectional Morphology in English
on Nouns• PLURAL -s books• POSSESSIVE -’s Mary’son Verbs• 3 SINGULAR -s s/he knows• PAST TENSE -ed talked• PROGRESSIVE -ing talking• PAST PARTICIPLE -en, -ed written, talkedon Adjectives• COMPARATIVE -er longer• SUPERLATIVE -est longest
![Page 15: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/15.jpg)
04/18/23 15/
Nouns and Verbs (English)
• Nouns are simple– Markers for plural and possessive
• Verbs are slightly more complex– Markers appropriate to the tense of the verb
• Adjectives– Markers for comparative and superlative
![Page 16: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/16.jpg)
04/18/23 16/
Regulars and Irregulars• some words misbehave (refuse to follow the
rules)– Mouse/mice, goose/geese, ox/oxen– Go/went, fly/flew
• The terms regular and irregular will be used to refer to words that follow the rules and those that don’t.
![Page 17: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/17.jpg)
04/18/23 17/
Regular and Irregular Verbs
• Regulars…– Walk, walks, walking, walked, walked
• Irregulars– Eat, eats, eating, ate, eaten– Catch, catches, catching, caught, caught– Cut, cuts, cutting, cut, cut
![Page 18: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/18.jpg)
04/18/23 18/
Derivational Morphology
• word building process that creates new words, either by changing the meaning or changing the part of speech of the stem– Irregular meaning change– Changes of word class
![Page 19: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/19.jpg)
04/18/23 19/
Examples of derivational morphemes in English that change the part of speech
• ful (N → Adj) – pain → painful– beauty → beautiful– truth → truthful– cat → *catful– rain → *rainful
• ment (V → N) establish →
establishment
• ity (Adj → N) – pure → purity
• ly (Adj → Adv) – quick → quickly
• en (Adj → V) – wide → widen
![Page 20: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/20.jpg)
04/18/23 20/
Examples of derivational morphemes in English that change the meaning
• dis- – appear → disappear
• un- – comfortable → uncomfortable
• in- – accurate → inaccurate
• re- – generate → regenerate
• inter- – act → interact
![Page 21: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/21.jpg)
04/18/23 21/
Examples on Derivational Morphology
V → N
compute computer
nominate nominee
deport deportation
computerize computerization
N → V
computer computerize
A → N
furry furriness
apt aptitude
sincere sincerity
N → A
cat catty, catlike
hope hopeless
magic magical
V → A
love lovable
A → V
black blacken
modern modernize
![Page 22: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/22.jpg)
04/18/23 22/
Derivational Examples
• Verb/Adj to Noun
-ation computerize computerization
-ee appoint appointee
-er kill killer
-ness fuzzy fuzziness
![Page 23: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/23.jpg)
04/18/23 23/
Derivational Examples
• Noun/ Verb to Adj
-al Computation Computational
-able Embrace Embraceable
-less Clue Clueless
![Page 24: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/24.jpg)
04/18/23 24/
Compute
• Many paths are possible…• Start with compute
– Computer -> computerize -> computerization– Computation -> computational– Computer -> computerize -> computerizable– Compute -> computee
![Page 25: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/25.jpg)
04/18/23 25/
Templatic Morphology: Root Pattern Examples from Arabic
Word & Transliteration
MeaningWord & Transliteration
Meaning
<naâma> [ He slept [ناَم <naâ'imun> [ Sleeping [نائم�
<yanaâmu> [ He sleeps [يناَم�<munawwamun> [ [منَّو�َم�
Under hypnotic
<nam> [ Sleep [نم� <na'ûmun> [ Late riser [نؤوَم�
<tanwçmun> [ [تنَّويم�
Lulling to sleep <'anwamu> [ [أنَّوَمMore given to sleep
<manaâmun> [ [مناَم�
Dream<nawwaâmun> [ اَم� � [نَّو
The most given to sleep
<nawmatun> [نَّومة]
Of one sleep<manaâmun> [ [مناَم�
Dormitory
<nawwaâmatun> [ [نَّوامة�
Sleeper<'an yanaâma> [ يناَم [أن
That he sleeps
<nawmiyyatun> [ [نَّومية�
Pertaining to sleep
<munawwamun> [ [منَّو!َم�
hypnotic
![Page 26: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/26.jpg)
04/18/23 26/
Morphotactic Models
• English nominal inflection
q0 q2q1
plural (-s)reg-n
irreg-sg-n
irreg-pl-n
•Inputs: cats, goose, geese
•reg-n: regular noun
•irreg-pl-n: irregular plural noun
•irreg-sg-n: irregular singular noun
![Page 27: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/27.jpg)
04/18/23 27/
• Derivational morphology: adjective fragment
q3
q5
q4
q0
q1 q2un-
adj-root1
-er, -ly, -est
adj-root1
adj-root2
-er, -est
• Adj-root1: clear, happy, real
• Adj-root2: big, red
![Page 28: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/28.jpg)
04/18/23 28/
Using FSAs to Represent the Lexicon and Do Morphological Recognition
• Lexicon: We can expand each non-terminal in our NFSA into each stem in its class (e.g. adj_root2 = {big, red}) and expand each such stem to the letters it includes (e.g. red r e d, big b i g)
q0
q1
r e
q2
q4
q3
-er, -est
db
gq5 q6i
q7
![Page 29: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/29.jpg)
04/18/23 29/
Limitations• To cover all of English will require very
large FSAs with consequent search problems– Adding new items to the lexicon means re-
computing the FSA– Non-determinism
• FSAs can only tell us whether a word is in the language or not – what if we want to know more?– What is the stem?– What are the affixes?– We used this information to build our FSA:
can we get it back?
![Page 30: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/30.jpg)
04/18/23 30/
Parsing with Finite State Transducers
• cats cat +N +PL• Kimmo Koskenniemi’s two-level morphology
– Words represented as correspondences between lexical level (the morphemes) and surface level (the orthographic word)
– Morphological parsing :building mappings between the lexical and surface levels
c a t +N +PL
c a t s
![Page 31: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/31.jpg)
04/18/23 31/
Finite State Transducers
• FSTs map between one set of symbols and another using an FSA whose alphabet is composed of pairs of symbols from input and output alphabets
• In general, FSTs can be used for– Translator (Hello:مرحبا)– Parser/generator (Hello:How may I help you?)– To map between the lexical and surface levels of
Kimmo’s 2-level morphology
![Page 32: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/32.jpg)
04/18/23 32/
• FST is a 5-tuple consisting of– Q: set of states {q0,q1,q2,q3,q4} : an alphabet of complex symbols, each is an
i/o pair such that i I (an input alphabet) and o O (an output alphabet) and is in I x O
– q0: a start state– F: a set of final states in Q {q4} (q,i:o): a transition function mapping Q x to
Q– Emphatic Sheep Quizzical Cow
q0 q4q1 q2 q3
b:m a:o a:oa:o !:?
![Page 33: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/33.jpg)
04/18/23 33/
FST for a 2-level Lexicon
• Example
Reg-n Irreg-pl-n Irreg-sg-n
c a t g o:e o:e s e g o o s e
q0 q1 q2 q3c a t
q1 q3 q4q2
se:o e:o e
q0 q5
g
![Page 34: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/34.jpg)
04/18/23 34/
FST for English Nominal Inflection
q0 q7
+PL:^s#
Combining (cascade or composition) this FSA with FSAs for each noun type replaces e.g. reg-n with every regular noun representation in the lexicon
q1 q4
q2 q5
q3 q6
reg-n
irreg-n-sg
irreg-n-pl
+N:
+PL:-s#
+SG:-#
+SG:-#
+N:
+N:
![Page 35: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/35.jpg)
04/18/23 35/
Orthographic Rules and FSTs
• Define additional FSTs to implement rules such as consonant doubling (beg begging), ‘e’ deletion (make making), ‘e’ insertion (watch watches), etc.
Lexical f o x +N +PL
Intermediate f o x ^ s #
Surface f o x e s
![Page 36: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/36.jpg)
04/18/23 36/
• Note: These FSTs can be used for generation as well as recognition by simply exchanging the input and output alphabets (e.g. ^s#:+PL)
![Page 37: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/37.jpg)
04/18/23 37/
FSAs and the Lexicon
• First we’ll capture the morphotactics– The rules governing the ordering of affixes in a
language.
• Then we’ll add in the actual stems
![Page 38: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/38.jpg)
04/18/23 38/
Simple Rules
![Page 39: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/39.jpg)
04/18/23 39/
Adding the Words
But it does not express that:
•Reg nouns ending in –s, -z, -sh, -ch, -x -> es (kiss, waltz, bush, rich, box)
•Reg nouns ending –y preceded by a consonant change the –y to -i
![Page 40: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/40.jpg)
04/18/23 40/
Derivational Rules
[nouni] eg. hospital[adjal] eg. formal[adjous] eg. arduous[verbj] eg. speculate[verbk] eg. conserve
![Page 41: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/41.jpg)
04/18/23 41/
Parsing/Generation vs. Recognition
• Recognition is usually not quite what we need. – Usually if we find some string in the language we
need to find the structure in it (parsing)– Or we have some structure and we want to produce
a surface form (production/ generation)
![Page 42: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/42.jpg)
04/18/23 42/
In other words
• Given a word we need to find: the stem and its class and properties (parsing)
• Or we have a stem and its class and properties and we want to produce the word (production/generation)
• Example (parsing)– From “cats” to “cat +N +PL”– From “lies” to ……
![Page 43: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/43.jpg)
04/18/23 43/
Applications
• The kind of parsing we’re talking about is normally called morphological analysis
• It can either be – An important stand-alone component of an
application (spelling correction, information retrieval)
– Or simply a link in a chain of processing
![Page 44: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/44.jpg)
04/18/23 44/
Finite State Transducers
• The simple story– Add another tape– Add extra symbols to the transitions
– On one tape we read “cats”, on the other we write “cat +N +PL”, or the other way around.
![Page 45: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/45.jpg)
04/18/23 45/
FSTs
generationparsing
![Page 46: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/46.jpg)
04/18/23 46/
Transitions
• c:c means read a c on one tape and write a c on the other
• +N:ε means read a +N symbol on one tape and write nothing on the other
• +PL:s means read +PL and write an s
c:c a:a t:t +N:ε +PL:s
![Page 47: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/47.jpg)
04/18/23 47/
Typical Uses
• Typically, we’ll read from one tape using the first symbol on the machine transitions (just as in a simple FSA).
• And we’ll write to the second tape using the other symbols on the transitions.
![Page 48: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/48.jpg)
04/18/23 48/
Ambiguity
• Recall that in non-deterministic recognition multiple paths through a machine may lead to an accept state.– Didn’t matter which path was actually traversed
• In FSTs the path to an accept state does matter since different paths represent different parses and different outputs will result
![Page 49: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/49.jpg)
04/18/23 49/
Ambiguity
• What’s the right parse for– Unionizable– Union-ize-able– Un-ion-ize-able
• Each represents a valid path through the derivational morphology machine.
![Page 50: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/50.jpg)
04/18/23 50/
Ambiguity
• There are a number of ways to deal with this problem– Simply take the first output found– Find all the possible outputs (all paths) and return
them all (without choosing)– Bias the search so that only one or a few likely
paths are explored
![Page 51: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/51.jpg)
04/18/23 51/
More Details
• Its not always as easy as – “cat +N +PL” <-> “cats”
• There are geese, mice and oxen• There are also spelling/ pronunciation
changes that go along with inflectional changes
![Page 52: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/52.jpg)
04/18/23 52/
Multi-Tape Machines
• To deal with this we can simply add more tapes and use the output of one tape machine as the input to the next
• So to handle irregular spelling changes we’ll add intermediate tapes with intermediate symbols
![Page 53: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/53.jpg)
04/18/23 53/
Spelling Rules and FSTs
Name Description of Rule Example
Consonant doubling
1-letter consonant doubled before -ing/-ed
beg/begging
E deletion Silent e dropped before
-ing and –edmake/making
E insertion e added after –s, -z, -x,
-ch, -sh before -swatch/watches
Y replacement -y changes to –ie before
-s, and to -i before -edtry/tries
K insertion verbs ending with vowel + -c add -k
panic/panicked
![Page 54: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/54.jpg)
04/18/23 54/
Multi-Level Tape Machines
• We use one machine to transducer between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
![Page 55: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/55.jpg)
04/18/23 55/
Lexical to Intermediate Level
Machine
![Page 56: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/56.jpg)
04/18/23 56/
FST for the E-insertion Rule: Intermediate to Surface
q0 q3 q4
q5
q1 q2
^:
:e
^:
^:
z, s, xz, s, x
z, s, x
s
#other
z, x#, other
#, other
#
other
s
• The add an “e” rule as in fox^s# <-> foxes #__^/ s
z
s
x
e
MachineMore
![Page 57: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/57.jpg)
04/18/23 57/
Note
• A key feature of this machine is that it doesn’t do anything to inputs to which it doesn’t apply.
• Meaning that: they are written out unchanged to the output tape.
![Page 58: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/58.jpg)
04/18/23 58/
English Spelling Changes
• We use one machine to transduce between the lexical and the intermediate level, and another to handle the spelling changes to the surface tape
![Page 59: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/59.jpg)
04/18/23 59/
Foxes
Machine 1
Machine 2
![Page 60: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/60.jpg)
04/18/23 60/
Overall Plan
![Page 61: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/61.jpg)
04/18/23 61/
Final Scheme: Part 1
![Page 62: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/62.jpg)
04/18/23 62/
Final Scheme: Part 2
![Page 63: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/63.jpg)
04/18/23 63/
Stemming vs Morphology
• Sometimes you just need to know the stem of a word and you don’t care about the structure.
• In fact you may not even care if you get the right stem, as long as you get a consistent string.
• This is stemming… it most often shows up in IR (Information Retrieval) applications
![Page 64: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/64.jpg)
04/18/23 64/
Stemming in IR
• Run a stemmer on the documents to be indexed
• Run a stemmer on users queries• Match
– This is basically a form of hashing
![Page 65: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/65.jpg)
04/18/23 65/
Porter Stemmer
• No lexicon needed• Basically a set of staged sets of rewrite rules
that strip suffixes• Handles both inflectional and derivational
suffixes• Doesn’t guarantee that the resulting stem is
really a stem • Lack of guarantee doesn’t matter for IR
![Page 66: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/66.jpg)
04/18/23 66/
Porter Example
• Computerization– ization -> -ize computerize– ize -> ε computer
• Other Rules– ing -> ε (motoring -> motor)– ational -> ate (relational -> relate)
• Practice: See Poter’s Stemmer at Appendix B and suggest some rules for A KFUPM Arabic Stemmer
![Page 67: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/67.jpg)
04/18/23 67/
Porter Stemmer
• The original exposition of the Porter stemmer did not describe it as a transducer but…– Each stage is separate transducer– The stages can be composed to get one big
transducer
![Page 68: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/68.jpg)
04/18/23 68/
Human Morphological Processing: How do people represent words?
• Hypotheses:– Full listing hypothesis: words listed – Minimum redundancy hypothesis: morphemes
listed
• Experimental evidence:– Priming experiments (Does seeing/ hearing one
word facilitate recognition of another?)– Regularly inflected forms prime stem but not
derived forms – But spoken derived words can prime stems if
they are semantically close (e.g. government/govern but not department/depart)
![Page 69: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/69.jpg)
04/18/23 69/
Reminder: Quiz 1 Next class
• Next time: Quiz – Ch 1!, 2, & 3 (Lecture presentations)– Do you need a sample quiz?
• What is the difference between a sample and a template?• Let me think – It might appear at the WebCt site on late
Saturday.
![Page 70: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/70.jpg)
04/18/23 70/
More Examples
![Page 71: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/71.jpg)
04/18/23 71/
Using FSTs for orthographic rules
#__/ s
z
s
x
e
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
![Page 72: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/72.jpg)
04/18/23 72/
Using FSTs for orthographic rules
fox^s#…we get to q1 with ‘x’
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
![Page 73: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/73.jpg)
04/18/23 73/
Using FSTs for orthographic rules
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…we get to q2 with ‘^’
![Page 74: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/74.jpg)
04/18/23 74/
Using FSTs for orthographic rules
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…we can get to q3 with ‘NULL’
![Page 75: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/75.jpg)
04/18/23 75/
Using FSTs for orthographic rules
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…we also get to q5 with ‘s’but we don’t want to!
![Page 76: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/76.jpg)
04/18/23 76/
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…we also get to q5 with ‘s’but we don’t want to!
So why is this transition there??friend^ship, ?fox^s^s (= foxes’s)
![Page 77: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/77.jpg)
04/18/23 77/
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…q4 with s
![Page 78: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/78.jpg)
04/18/23 78/
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
fox^s#…q0 with # (accepting state)
Back
![Page 79: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/79.jpg)
04/18/23 79/
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
arizona: we leave q0 but return
Other transitions…
![Page 80: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/80.jpg)
04/18/23 80/
#
q0 q1 q2 q3 q4
q5:̂#
other
otherZ! = Z, s, x
Z! Z!
Z!
S
#, other
:e
#, other z,x
^:
^:
s
m i s s ^ s
Other transitions…
![Page 81: 6/16/20151/1/ Morphology and Finite-state Transducers Part 2 ICS 482: Natural Language Processing Lecture 6 Husni Al-Muhtaseb](https://reader036.vdocument.in/reader036/viewer/2022062516/56649d445503460f94a2140c/html5/thumbnails/81.jpg)
04/18/23 81/
الله ورحمة عليكم السالم
اللهم سبحانكال أن أشهد وبحمدكأستغفرك أنت إال إله
اليك وأتَّوب