Computational Morphology
Pawan Goyal
CSE, IIT Kharagpur
August 14, 2014
Pawan Goyal (IIT Kharagpur) Computational Morphology August 14, 2014 1 / 34
-
Prosody and orthography
Orthography: the conventional spelling system of a language.
Prosody: the pattern of sounds in a language.
Case for English
A given morpheme is represented with a single orthography even though it has different surface phonetic representations in different contexts.
Ex: The past tense suffix -ed is always written the same way despite three distinct phonetic realizations in three clearly defined contexts:
/t/, e.g. dip, dipped
/d/, e.g. boom, boomed
/ɪd/, e.g. loot, looted
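The three contexts for -ed can be sketched as a small rule over the final sound of the stem. This is a minimal illustration, assuming the final sound is supplied as a phoneme symbol rather than derived from the spelling; the set `VOICELESS` and the function name are hypothetical and not part of the lecture.

```python
# Voiceless consonants other than /t/ (assumed, simplified inventory).
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}

def past_tense_allomorph(final_sound: str) -> str:
    """Return the phonetic realization of -ed after a stem ending in final_sound."""
    if final_sound in {"t", "d"}:
        return "ɪd"   # loot -> looted
    if final_sound in VOICELESS:
        return "t"    # dip -> dipped
    return "d"        # boom -> boomed (voiced consonants and vowels)

for stem, final in [("dip", "p"), ("boom", "m"), ("loot", "t")]:
    print(stem, "+ -ed is pronounced /" + past_tense_allomorph(final) + "/")
```

The spelling stays -ed in all three cases; only the pronunciation is chosen by context.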
-
Prosody and orthography
Case for Sanskrit
An advanced discipline of phonetics explicitly described prosodic changes. These changes, well known by the term sandhi, are represented in writing.
Ex: The past passive participle suffix -ta, variously realized as ta or dha depending solely upon the phonetic context, is written as follows:
/ta/, e.g. from su (press), suta (pressed)
/dha/, e.g. from budh (awake), buddha (awakened)
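Because Sanskrit writes the sandhi result, a generator has to produce the changed spelling. The sketch below covers only the two examples on the slide, assuming a simplified roman transliteration and a single rule (a root-final voiced aspirate loses its aspiration and the suffix surfaces as dha). Real sandhi involves many more rules, and the names used here are hypothetical.

```python
# Root-final voiced aspirated stops and their unaspirated counterparts
# (assumption: only these three are handled in this sketch).
VOICED_ASPIRATES = {"gh": "g", "dh": "d", "bh": "b"}

def past_passive_participle(root: str) -> str:
    """Attach the suffix -ta, writing it as dha after a voiced aspirate."""
    for aspirate, plain in VOICED_ASPIRATES.items():
        if root.endswith(aspirate):
            # budh + ta -> bud + dha
            return root[: -len(aspirate)] + plain + "dha"
    return root + "ta"  # su + ta -> suta

for root in ["su", "budh"]:
    print(root, "->", past_passive_participle(root))
```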
-
Morphology
Morphology studies the internal structure of words: how words are built up from smaller meaningful units called morphemes.
dogs: 2 morphemes, dog and -s
  -s is a plural marker on nouns
unladylike: 3 morphemes
  un-: not
  lady: well-behaved woman
  -like: having the characteristic of
-
Allomorphs
Variants of the same morpheme that cannot be replaced by one another
Example
Plural morpheme: cat-s, judge-s, dog-s (pronounced [s], [ɪz], [z] respectively)
Negation ("opposite"): un-happy, in-comprehensible, im-possible, ir-rational
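The in-/im-/ir- variants are chosen by the first sound of the stem, which can be sketched as below. This is a minimal illustration working on spelling only; the il- case (as in illegal) is added from general knowledge, and the choice between un- and in- is lexical and is not modeled here.

```python
def negative_in_prefix(stem: str) -> str:
    """Pick the allomorph of the negative prefix in- from the stem's first letter."""
    first = stem[0]
    if first in "pbm":
        return "im" + stem   # possible -> impossible
    if first == "r":
        return "ir" + stem   # rational -> irrational
    if first == "l":
        return "il" + stem   # legal -> illegal
    return "in" + stem       # comprehensible -> incomprehensible

for stem in ["comprehensible", "possible", "rational"]:
    print(stem, "->", negative_in_prefix(stem))
```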
-
Bound and Free Morphemes
Bound
Cannot appear as a word by itself.
-s (dog-s), -ly (quick-ly), -ed (walk-ed)
Free
Can appear as a word by itself; can often combine with other morphemes too.
house (house-s), walk (walk-ed), of, the, or
-
Stems and Affixes
Stems (roots): the core meaning-bearing units
Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions
Mostly, stems are free morphemes and affixes are bound morphemes
-
Types of affixes
Prefix: un-, anti-, etc. (a-, ati-, pra-, etc.)
  un-happy, pre-existing
Suffix: -ity, -ation, etc. (-taa, -ke, -ka, etc.)
  talk-ing, quick-ly
Infix: n in vindati (he knows), as contrasted with vid (to know)
  Philippines (Tagalog): basa (read), b-um-asa (read)
  English: abso-bloody-lutely (emphasis)
Circumfix: precedes and follows the stem
  Dutch: berg (mountain), ge-berg-te (mountains)
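Infixes and circumfixes cannot be produced by plain prefixing or suffixing. The sketch below reproduces the two examples, assuming the Tagalog infix -um- goes immediately before the first vowel of the stem; both function names are hypothetical.

```python
VOWELS = "aeiou"

def infix_um(stem: str) -> str:
    """Insert -um- before the first vowel of the stem (basa -> bumasa)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + "um" + stem[i:]
    return "um" + stem  # no vowel found: fall back to prefixing

def circumfix(stem: str, before: str, after: str) -> str:
    """Wrap the stem with material on both sides (berg -> gebergte)."""
    return before + stem + after

print(infix_um("basa"))
print(circumfix("berg", "ge", "te"))
```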
-
Content and functional morphemes
Content morphemes
Carry some semantic content
car, -able, un-
Functional morphemes
Provide grammatical information
-s (plural), -s (3rd singular)
-
Inflectional and Derivational Morphology
Two different kinds of relationship among words
Inflectional morphology
Grammatical: number, tense, case, gender
Creates new forms of the same word: bring, brought, brings, bringing
Derivational morphology
Creates new words, often changing the part of speech: logic, logical, illogical, illogicality, logician
Fairly systematic, but some derivations are missing: sincere - sincerity, scarce - scarcity, curious - curiosity, fierce - fiercity?
-
Morphological processes
Concatenation
Adding continuous affixes, the most common process:
hope+less, un+happy, anti+capital+ist+s
Often, there are phonological/graphemic changes at morpheme boundaries:
book + s [s], shoe + s [z]
happy + er → happier
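Concatenation with a boundary adjustment can be sketched as a left-to-right fold over the morphemes. This is a minimal illustration covering only the graphemic y → i change; the rule over-applies to words it was not written for, and the function names are hypothetical.

```python
from functools import reduce

VOWELS = "aeiou"

def join(left: str, right: str) -> str:
    """Concatenate two morphemes, changing a consonant + y ending to i."""
    consonant_y = len(left) > 1 and left[-1] == "y" and left[-2] not in VOWELS
    if consonant_y and not right.startswith("i"):
        if right == "s":
            right = "es"  # plural -s is written -es after the change
        return left[:-1] + "i" + right
    return left + right

def concatenate(morphemes):
    """Fold a list of morphemes into one surface form."""
    return reduce(join, morphemes)

for parts in [["hope", "less"], ["un", "happy"],
              ["anti", "capital", "ist", "s"], ["happy", "er"]]:
    print("+".join(parts), "->", concatenate(parts))
```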
-
Morphological processes
Reduplication: part of the word or the entire word is doubled
Nama: go (look), go-go (examine with attention)
Tagalog: basa (read), ba-basa (will read)
Sanskrit: pac (cook), papaca (perfect form, cooked)
Phrasal reduplication (Telugu): pillavaḍu naḍustu naḍustu paḍi poyaḍu (The child fell down while walking)
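Full and partial reduplication can be sketched as string copying. This is a minimal illustration for the Nama and Tagalog examples only, assuming partial reduplication copies the stem up to and including its first vowel. The Sanskrit perfect needs further changes that are not modeled here.

```python
VOWELS = "aeiou"

def full_reduplication(stem: str) -> str:
    """Double the entire word (go -> go-go)."""
    return stem + "-" + stem

def partial_reduplication(stem: str) -> str:
    """Copy the stem up to and including its first vowel (basa -> ba-basa)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[: i + 1] + "-" + stem
    return full_reduplication(stem)  # no vowel: copy everything

print(full_reduplication("go"))
print(partial_reduplication("basa"))
```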
-
Morphological processes
Suppletion
Irregular relation between the words
go - went, good - better
Morpheme-internal changes
The word changes internally
sing - sang - sung, man - men, goose - geese
-
Word Formation
Compounding
Words formed by combining two or more words
Examples in English:
Adj + Adj → Adj: bitter-sweet
N + N → N: rain-bow
V + N → V: pick-pocket
P + V → V: over-do
Particular to languages
room-temperature: Hindi translation?
Can be non-compositional
aśva-karṇa (horse-ear?): name of a medicinal plant
-
Word Formation
Acronyms
laser: Light Amplification by Stimulated Emission of Radiation
Blending
Parts of two different words are combined
breakfast + lunch → brunch
smoke + fog → smog
motor + hotel → motel
Clipping
Longer words are shortened
doctor (doc), laboratory (lab), advertisement (ad), dormitory (dorm), examination (exam), bicycle (bike), refrigerator (fridge)
-
Processing morphology
Lemmatization: word → lemma
  saw → {see, saw}
Morphological analysis: word → setOf(lemma + tag)
  saw → {<see, verb.past>, <saw, noun.sg>}
Tagging: word → tag, considers context
  Peter saw her → { }
Morpheme segmentation: de-nation-al-iz-ation
Generation: see + verb.past → saw
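The relation between these tasks can be sketched with a single toy table that maps a surface form to its (lemma, tag) analyses. The table below holds only the entry for saw and is an assumption made for illustration; a real system computes these analyses rather than listing them.

```python
# Toy analysis table: surface form -> list of (lemma, tag) pairs.
ANALYSES = {
    "saw": [("see", "verb.past"), ("saw", "noun.sg")],
}

def analyze(word):
    """Morphological analysis: all (lemma, tag) pairs for a word, ignoring context."""
    return ANALYSES.get(word, [])

def lemmatize(word):
    """Lemmatization: the set of possible lemmas, with tags dropped."""
    return {lemma for lemma, _ in analyze(word)}

def generate(lemma, tag):
    """Generation: the inverse of analysis, from lemma + tag to surface forms."""
    return {form for form, entries in ANALYSES.items() if (lemma, tag) in entries}

print(lemmatize("saw"))
print(analyze("saw"))
print(generate("see", "verb.past"))
```

Tagging differs from analysis in that it must pick one tag per word using the sentence context, which this lookup does not attempt.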
-
What are the applications?
Text-to-speech synthesis:
  lead: verb or noun?
  read: present or past?
Search and information retrieval
Machine translation, grammar correction
-
Morphological Analysis
Goal
To take input forms like those in the first column and produce output forms like those in the second column.
Output contains stem and additional information: +N for noun, +SG for singular, +PL for plural, +V for verb, etc.
-
Issues involved
boy → boys
fly → flys → flies (y → i rule)
Toiling → toil
Duckling → duckl?
Getter → get + er
Doer → do + er
Beer → be + er?
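These failures are easy to reproduce with a stripper that only knows a list of suffixes. The sketch below is deliberately naive, with a hypothetical suffix list and no lexicon or spelling rules, so it returns duckl, gett and be as stems.

```python
# Hypothetical suffix list; there is no lexicon and no spelling rules.
SUFFIXES = ["ing", "er", "es", "s"]

def naive_strip(word: str):
    """Return every (stem, suffix) split where the word ends in a known suffix."""
    word = word.lower()
    return [(word[: -len(suffix)], suffix)
            for suffix in SUFFIXES
            if word.endswith(suffix) and len(word) > len(suffix)]

for word in ["Toiling", "Duckling", "Getter", "Doer", "Beer"]:
    print(word, "->", naive_strip(word))
```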
-
Knowledge Required
Knowledge of stems or roots
Duck is a possible root, not duckl.
We need a dictionary (lexicon)
Morphotactics
Which classes of morphemes follow other classes of morphemes inside the word?
Ex: the plural morpheme follows the noun
Only some endings go on some words
Do + er: ok
Be + er: not so
Spelling change rules
Adjust the surface form using spelling change rules
Get + er → getter
Fox + s → foxes
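The three kinds of knowledge can be combined in a small analyzer: a lexicon of stems, a table saying which suffix may follow which stem class, and spelling rules applied in reverse. This is a minimal sketch under assumed data; `LEXICON`, `SUFFIXES`, `BLOCKED` and the three undo rules are hypothetical, and the analyzer is permissive in that it does not check that a spelling change was required.

```python
# Stems with their word class (assumed mini-lexicon).
LEXICON = {"duck": "N", "fox": "N", "boy": "N", "fly": "N",
           "get": "V", "do": "V", "be": "V", "toil": "V"}
# Morphotactics: which stem classes each suffix may follow.
SUFFIXES = {"s": {"N"}, "er": {"V"}, "ing": {"V"}}
# Stem-specific exceptions: be + er is not a word.
BLOCKED = {("be", "er")}

def candidate_stems(base: str):
    """Yield the base itself plus forms with spelling changes undone."""
    yield base
    if len(base) > 2 and base[-1] == base[-2]:
        yield base[:-1]            # gett -> get (consonant doubling)
    if base.endswith("e"):
        yield base[:-1]            # foxe -> fox (e-insertion)
    if base.endswith("ie"):
        yield base[:-2] + "y"      # flie -> fly (y -> i)

def analyze(word: str):
    """Return (stem, suffix) analyses licensed by lexicon and morphotactics."""
    word = word.lower()
    results = [(word, None)] if word in LEXICON else []
    for suffix, classes in SUFFIXES.items():
        if not word.endswith(suffix):
            continue
        for stem in candidate_stems(word[: -len(suffix)]):
            if LEXICON.get(stem) in classes and (stem, suffix) not in BLOCKED:
                results.append((stem, suffix))
    return results

for word in ["getter", "foxes", "flies", "doer", "duckling", "beer"]:
    print(word, "->", analyze(word))
```

With this data, duckling and beer receive no analysis, so they would have to be listed in the lexicon as whole words.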
-
Why can't this be put in a big lexicon?
English: just 317,477 forms from 90,196 lexical entries, a ratio of 3.5:1
Sanskrit: 11 million forms from a lexicon of 170,000 entries, a ratio of 64.7:1
New forms can be created, by compounding etc.
One of the most common methods is finite-state machines
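The two ratios follow directly from the counts on the slide, as this short check shows.

```python
# Forms per lexical entry, using the counts given on the slide.
english_ratio = 317_477 / 90_196
sanskrit_ratio = 11_000_000 / 170_000

print(f"English:  {english_ratio:.1f} forms per entry")
print(f"Sanskrit: {sanskrit_ratio:.1f} forms per entry")
```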
-
Finite State Automaton (FSA)
What is an FSA?
A kind of directed graph
Nodes are called states, edges are labeled with symbols (possibly empty)
Start state and accepting states
Recognizes regular languages, i.e., languages specified by regular expressions
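The definition maps onto a small data structure: a start state, a set of accepting states, and labeled edges. The sketch below assumes a deterministic automaton without empty transitions and uses the regular expression ab* as a hypothetical example language; the class and its field names are illustrative.

```python
class FSA:
    """A deterministic finite-state automaton stored as a labeled graph."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        # Edges: (state, symbol) -> next state.
        self.transitions = dict(transitions)

    def accepts(self, symbols) -> bool:
        """Follow one edge per input symbol; accept if we end in an accepting state."""
        state = self.start
        for symbol in symbols:
            state = self.transitions.get((state, symbol))
            if state is None:      # no edge for this symbol: reject
                return False
        return state in self.accepting

# Automaton for the regular expression ab*: q0 -a-> q1, q1 -b-> q1.
ab_star = FSA(start="q0", accepting={"q1"},
              transitions={("q0", "a"): "q1", ("q1", "b"): "q1"})

for text in ["a", "abbb", "ba", ""]:
    print(repr(text), ab_star.accepts(text))
```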
-
FSA for nominal inflection in English
-
FSA for English Adjectives
Words modeled
happy, happier, happiest, real, unreal, cool, coolly, clear, clearly, unclear, unclearly, ...
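The automaton figure is not reproduced in this transcript, so the sketch below assumes one plausible layout: an optional un-, then an adjective root, then an optional -er, -est or -ly. It works on morpheme sequences rather than spellings, and the state numbering is hypothetical.

```python
ROOTS = {"happy", "real", "cool", "clear"}
SUFFIXES = {"er", "est", "ly"}

def accepts(morphemes) -> bool:
    """States: 0 start, 1 after un-, 2 after root (accepting), 3 after suffix (accepting)."""
    state = 0
    for morpheme in morphemes:
        if state == 0 and morpheme == "un":
            state = 1
        elif state in (0, 1) and morpheme in ROOTS:
            state = 2
        elif state == 2 and morpheme in SUFFIXES:
            state = 3
        else:
            return False
    return state in (2, 3)

for parts in [["happy", "est"], ["un", "clear", "ly"], ["cool", "ly"],
              ["ly", "clear"], ["un"]]:
    print("+".join(parts), accepts(parts))
```

An automaton this coarse also accepts sequences such as un+real+ly, so the root classes would need to be refined to rule those out.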
-
Morphotactics
The last two examples model some parts of the English morphotactics
But what about the information about regular and irregular roots?
Lexicon
Can we include the lexicon in the FSA?
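One way to include the lexicon is to replace each arc labeled with a morpheme class by one arc per lexicon entry of that class. The sketch below assumes a nominal-inflection automaton with regular nouns, irregular singulars, irregular plurals and a plural suffix; the class names, the mini-lexicon and the state numbers are illustrative and are not taken from the slides.

```python
# Assumed mini-lexicon, grouped by morpheme class.
LEXICON = {
    "reg-noun": {"fox", "cat", "dog"},
    "irreg-sg-noun": {"goose", "mouse"},
    "irreg-pl-noun": {"geese", "mice"},
    "plural": {"s"},
}
# Class-level arcs: (source state, morpheme class, target state).
ARCS = [(0, "reg-noun", 1), (0, "irreg-sg-noun", 2),
        (0, "irreg-pl-noun", 2), (1, "plural", 2)]
ACCEPTING = {1, 2}

def expand(arcs, lexicon):
    """Replace every class arc by one arc per lexicon entry of that class."""
    return {(src, morpheme): dst
            for src, cls, dst in arcs
            for morpheme in lexicon[cls]}

TRANSITIONS = expand(ARCS, LEXICON)

def accepts(morphemes) -> bool:
    state = 0
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, morpheme))
        if state is None:
            return False
    return state in ACCEPTING

for parts in [["cat"], ["cat", "s"], ["geese"], ["goose", "s"]]:
    print("+".join(parts), accepts(parts))
```

Because the input is a morpheme sequence, fox + s is accepted without the spelling change to foxes; that adjustment is a separate step.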
-
FSA for nominal inflection in English
-
After adding a mini-lexicon
-
Some properties of FSAs: Elegance
The recognition problem can be solved in time linear in the length of the input (independent of the size of the automaton)
There is an algorithm to transform each automaton into a unique equivalent automaton with the least number of states
An FSA is deterministic iff it has no empty (ε) transition and for each state and each symbol, there is at most one applicable transition
Every non-deterministic automaton can be transformed into a deterministic one
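The last property can be sketched with the standard subset construction: each state of the deterministic automaton is a set of states of the original one, closed under empty transitions. The representation below (a dict from (state, symbol) to a set of states, with the empty string as the ε label) and the example automaton are assumptions made for illustration.

```python
from collections import deque

EPS = ""  # label used for empty transitions in this sketch

def eps_closure(states, delta):
    """All states reachable from `states` using only empty transitions."""
    stack, closure = list(states), set(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def determinize(start, accepting, delta, alphabet):
    """Subset construction: return (start, accepting, transitions) of a DFA."""
    d_start = eps_closure({start}, delta)
    d_delta, d_accepting = {}, set()
    queue, seen = deque([d_start]), {d_start}
    while queue:
        current = queue.popleft()
        if current & set(accepting):
            d_accepting.add(current)
        for symbol in alphabet:
            targets = set()
            for state in current:
                targets |= set(delta.get((state, symbol), ()))
            if not targets:
                continue
            nxt = eps_closure(targets, delta)
            d_delta[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return d_start, d_accepting, d_delta

def accepts(dfa, symbols) -> bool:
    """One dictionary lookup per input symbol, hence linear in the input length."""
    state, accepting, delta = dfa[0], dfa[1], dfa[2]
    for symbol in symbols:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# Example NFA over morphemes: un- is optional because of the empty transition.
nfa_delta = {(0, EPS): {1}, (0, "un"): {1}, (1, "clear"): {2}, (2, "ly"): {3}}
dfa = determinize(0, {2, 3}, nfa_delta, ["un", "clear", "ly"])

for parts in [["clear"], ["un", "clear", "ly"], ["un"], ["ly"]]:
    print("+".join(parts), accepts(dfa, parts))
```

The resulting automaton has no empty transitions and at most one transition per state and symbol, which matches the definition of determinism given above.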