![Page 1: Computational support for minority languages using a typologically oriented questionnaire system Lori Levin Language Technologies Institute School of Computer](https://reader035.vdocument.in/reader035/viewer/2022070415/5697bf751a28abf838c807dd/html5/thumbnails/1.jpg)
Computational support for minority languages using a
typologically oriented questionnaire system
Lori Levin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
Joint work with Jeff Good
Outline
• The AVENUE MT project
  – Including a list of languages we have worked on
• The elicitation tool
  – Including which kinds of fonts it works for
• The questionnaire
  – Including which languages it has been translated into
• Tools for building and revising questionnaires
MT Approaches
[Diagram: MT approaches between source and target. Labels recovered from the figure:]
• Source: Mi chiamo Lori
• Syntactic Parsing: Pronoun-acc-1-sg chiamare-1sg N
• Semantic Analysis
• Interlingua: introduce-self
• Sentence Planning
• Text Generation: [np poss-1sg “name”] BE-pres N
• Target: My name is Lori
• Transfer Rules
• Direct: SMT, EBMT
• AVENUE: Automate Rule Learning
AVENUE Machine Translation System
A rule consists of type information, a synchronous context-free rule, alignments, x-side constraints, y-side constraints, and xy-constraints, e.g. ((Y1 AGR) = (X1 AGR)).
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
 (X1::Y1)
 (X1::Y3)
 (X2::Y4)
 (X3::Y2)
 ((X1 AGR) = *3-SING)
 ((X1 DEF) = *DEF)
 ((X3 AGR) = *3-SING)
 ((X3 COUNT) = +)
 ((Y1 DEF) = *DEF)
 ((Y3 DEF) = *DEF)
 ((Y2 AGR) = *3-SING)
 ((Y2 GENDER) = (Y4 GENDER))
)
Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI)
Rule learning: Katharina Probst
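As a rough illustration of how the alignments in the NP rule above drive reordering, the following Python sketch fills each target-side slot from the source word it is aligned to. The dictionary encoding of the rule, the function name `apply_rule`, and the three-word toy lexicon are assumptions made for this example; they are not the AVENUE rule format or engine, and the agreement and definiteness constraints are not checked here.

```python
# Hypothetical encoding of the NP rule shown on the slide.
RULE = {
    "x_side": ["DET", "ADJ", "N"],
    "y_side": ["DET", "N", "DET", "ADJ"],
    # (x index, y index) pairs, 1-based, copied from the slide's alignments.
    "alignments": [(1, 1), (1, 3), (2, 4), (3, 2)],
}

# Toy lexicon, assumed for illustration only.
LEXICON = {"the": "ha-", "old": "zaqen", "man": "ish"}


def apply_rule(rule, source_words, lexicon):
    """Fill every target slot with the translation of its aligned source word."""
    if len(source_words) != len(rule["x_side"]):
        raise ValueError("source phrase does not match the x-side of the rule")
    target = [None] * len(rule["y_side"])
    for x_index, y_index in rule["alignments"]:
        target[y_index - 1] = lexicon[source_words[x_index - 1]]
    return target


tokens = apply_rule(RULE, ["the", "old", "man"], LEXICON)
print(tokens)  # ['ha-', 'ish', 'ha-', 'zaqen']
```

Joining each prefix to the following word gives the slide's target phrase, ha-ish ha-zaqen. Note that the single source determiner is aligned to two target slots, which is how the rule copies definiteness onto both the noun and the adjective.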
AVENUE
• Rules can be written by hand or learned automatically.
• Hybrid
  – Rule-based transfer
  – Statistical decoder
  – Multi-engine combinations with SMT and EBMT
AVENUE systems (small and experimental, but tested on unseen data)
• Hebrew-to-English
  – Alon Lavie, Shuly Wintner, Katharina Probst
  – Hand-written and automatically learned
  – Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules.
• Hindi-to-English
  – Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
  – Automatically learned
  – Performs better than SMT when training data is limited to 50K words
AVENUE systems (small and experimental, but tested on unseen data)
• English-to-Spanish
  – Ariadna Font Llitjos
  – Hand-written, automatically corrected
• Mapudungun-to-Spanish
  – Roberto Aranovich and Christian Monson
  – Hand-written
• Dutch-to-English
  – Simon Zwarts
  – Hand-written
Outline
• The AVENUE MT project
• The elicitation tool
• The questionnaire
• Tools for building questionnaires
Elicitation
• Get data from someone who is
  – Bilingual
  – Literate
    • With consistent spelling
  – Not experienced with linguistics
English-Hindi Example
Elicitation Tool: Erik Peterson
English-Chinese Example
Note: Translator has to insert spaces between words in Chinese.
English-Arabic Example
Outline
• The AVENUE MT project
• The elicitation tool
• The questionnaire
• Tools for building questionnaires
Size of Questionnaire
• Around 3200 sentences
• 20K words
Questionnaire Sample: clause level
Sample sentences:
• Mary is writing a book for John.
• Who let him eat the sandwich?
• Who had the machine crush the car?
• They did not make the policeman run.
• Mary had not blinked.
• The policewoman was willing to chase the boy.
• Our brothers did not destroy files.
• He said that there is not a manual.
• The teacher who wrote a textbook left.
• The policeman chased the man who was a thief.
• Mary began to work.
Features covered:
• Tense, aspect, transitivity, animacy
• Questions, causation and permission
• Interaction of lexical and grammatical aspect
• Volitionality
• Embedded clauses and sequence of tense
• Relative clauses
• Phase aspect
Questionnaire Sample: noun phrase level
Sample sentences:
• The man quit in November.
• The man works in the afternoon.
• The balloon floated over the library.
• The man walked over the platform.
• The man came out from among the group of boys.
• The long weekly meeting ended.
• The large bus to the post office broke down.
• The second man laughed.
• All five boys laughed.
Features covered:
• Temporal and locative meanings
• Quantifiers
• Numbers
• Combinations of different types of modifiers
  – My book
    • Possession, definiteness
  – A book of mine
    • Possession, indefiniteness
Organization into Minimal Pairs
srcsent: Tú caíste.
tgtsent: Eymi ütrünagimi.
aligned: ((1,1),(2,2))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) fell

srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling

srcsent: Tú caíste.
tgtsent: Eymi ütrunagimi.
aligned: ((1,1),(2,2))
context: tú = María [femenino, 2a persona del singular]
comment: You (Mary) fell
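The elicitation records above can be read into a simple data structure. The Python sketch below assumes each field sits on its own line as `key: value`, and that an alignment group such as `(2 3,2 3)` pairs source word positions 2 and 3 with target word positions 2 and 3. The function names `parse_alignment` and `parse_record` are hypothetical and this is not the elicitation tool's own reader.

```python
import re

# One alignment group: "(source positions,target positions)".
PAIR = re.compile(r"\(([\d ]+),([\d ]+)\)")


def parse_alignment(text):
    """Turn '((1,1),(2 3,2 3))' into [([1], [1]), ([2, 3], [2, 3])]."""
    return [
        ([int(n) for n in src.split()], [int(n) for n in tgt.split()])
        for src, tgt in PAIR.findall(text)
    ]


def parse_record(block):
    """Read one record, assuming one 'key: value' field per line."""
    record = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    record["aligned"] = parse_alignment(record["aligned"])
    return record


sample = """
srcsent: Tú estás cayendo.
tgtsent: Eymi petu ütrünagimi.
aligned: ((1,1),(2 3,2 3))
context: tú = Juan [masculino, 2a persona del singular]
comment: You (John) are falling
"""

record = parse_record(sample)
print(record["aligned"])  # [([1], [1]), ([2, 3], [2, 3])]
```

Keeping the positions as lists makes many-to-many groups, such as the two-word progressive here, easy to compare across the members of a minimal pair.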
Feature Detection: Spanish
The girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
La niña vió un libro rojo

A girl saw a red book.
((1,1)(2,2)(3,3)(4,4)(5,6)(6,5))
Una niña vió un libro rojo

I saw the red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi el libro rojo

I saw a red book.
((1,1)(2,2)(3,3)(4,5)(5,4))
Yo vi un libro rojo

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: no
Marked-on-dependent: yes
Marked-on-governor: no
Marked-on-other: no
Add/delete-word: no
Change-in-alignment: no
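Two of the fields in the summary above, Add/delete-word and Change-in-alignment, can be approximated by a surface comparison of the two translations in a minimal pair. The Python sketch below is a crude heuristic written for illustration only: the function name `compare_minimal_pair` and the record layout are assumptions, and it does not decide the head, dependent, or governor fields, which need syntactic information.

```python
def compare_minimal_pair(first, second):
    """Surface comparison of the two elicited translations in a minimal pair."""
    tokens_a = first["tgt"].split()
    tokens_b = second["tgt"].split()
    same_length = len(tokens_a) == len(tokens_b)
    differing = []
    if same_length:
        # 1-based positions where the two translations use different words.
        differing = [
            i
            for i, (a, b) in enumerate(zip(tokens_a, tokens_b), start=1)
            if a != b
        ]
    return {
        "add_delete_word": not same_length,
        "change_in_alignment": first["aligned"] != second["aligned"],
        "differing_positions": differing,
    }


# The Spanish object pair from the slide: definite vs. indefinite.
definite = {
    "tgt": "Yo vi el libro rojo",
    "aligned": [(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)],
}
indefinite = {
    "tgt": "Yo vi un libro rojo",
    "aligned": [(1, 1), (2, 2), (3, 3), (4, 5), (5, 4)],
}

report = compare_minimal_pair(definite, indefinite)
print(report)
```

For this pair the comparison finds no added or deleted word, no change in alignment, and a single differing word at position 3 (el vs. un), which agrees with the slide's summary for Spanish.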
Feature Detection: Chinese
A girl saw a red book.
((1,2)(2,2)(3,3)(3,4)(4,5)(5,6)(5,7)(6,8))
有 一个 女人 看见 了 一本 红色 的 书 。
The girl saw a red book.
((1,1)(2,1)(3,3)(3,4)(4,5)(5,6)(6,7))
女人 看见 了 一本 红色的 书
Feature: definiteness
Values: definite, indefinite
Function-of-*: subject
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: no
Feature Detection: Chinese
I saw the red book.
((1, 3)(2, 4)(2, 5)(4, 1)(5, 2))
红色的 书, 我 看见 了

I saw a red book.
((1,1)(2,2)(2,3)(2, 4)(4,5)(5,6))
我 看见 了 一本 红色的 书 。

Feature: definiteness
Values: definite, indefinite
Function-of-*: object
Marked-on-head-of-*: no
Marked-on-dependent: no
Marked-on-governor: no
Add/delete-word: yes
Change-in-alignment: yes
Feature Detection: Hebrew
A girl saw a red book.
((2,1)(3,2)(5,4)(6,3))
ילדה ראתה ספר אדום

The girl saw a red book.
((1,1)(2,1)(3,2)(5,4)(6,3))
הילדה ראתה ספר אדום

I saw a red book.
((2,1)(4,3)(5,2))
ראיתי ספר אדום

I saw the red book.
((2,1)(3,3)(3,4)(4,4)(5,3))
ראיתי את הספר האדום

Feature: definiteness
Values: definite, indefinite
Function-of-*: subj, obj
Marked-on-head-of-*: yes
Marked-on-dependent: yes
Marked-on-governor: no
Add-word: no
Change-in-alignment: no
Feature Detection Feeds into…
• Corpus Navigation: which minimal pairs to pursue next.
  – Don’t pursue gender in Mapudungun
  – Do pursue definiteness in Hebrew
• Morphology Learning:
  – Morphological learner identifies the forms of the morphemes
  – Feature detection identifies the functions
• Rule learning:
  – Rule learner will have to learn a constraint for each morpho-syntactic marker that is discovered
    • E.g., Adjectives and nouns agree in gender, number, and definiteness in Hebrew.
Languages
• The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium as part of the Reflex program.
• Translated (by LDC) into:
  – Thai
  – Bengali
• Plans to translate into:
  – Seven “strategic” languages per year for five years.
• As one small part of a language pack (BLARK) for each language.
Languages
• Spanish version in progress at New Mexico State University (Helmreich and Cowie)
  – Plans to translate into Guarani
• Portuguese version in progress in Brazil (Marcello Modesto)
  – Plans to translate into Karitiana
    • 200 speakers
• Plans to translate into Inupiaq (Kaplan and MacLean)
Previous Elicitation Work
• Pilot corpus
  – Around 900 sentences
  – No feature structures
• Mapudungun
  – Two partial translations
• Quechua
  – Three translations
• Aymara
  – Seven translations
• Hebrew
• Hindi
  – Several translations
• Dutch
Feature Structures
• The questionnaire is actually a corpus of feature structures that happen to have English or Spanish sentences attached to them.
Bengali example with feature structure
srcsent: The large bus to the post office broke down.
context:
tgtsent:
((actor ((modifier ((mod-role mod-descriptor)(mod-role role-loc-general-to))) (np-identifiability identifiable)(np-specificity specific)(np-biological-gender bio-gender-n/a)(np-animacy anim-inanimate)(np-person person-third)(np-function fn-actor)(np-general-type common-noun-type)(np-number num-sg)(np-pronoun-exclusivity inclusivity-n/a)(np-pronoun-antecedent antecedent-n/a)(np-distance distance-neutral)))
(c-general-type declarative-clause)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)(c-comparator-function comparator-n/a)(c-causee-control control-n/a)(c-our-situations situations-n/a)(c-comparand-type comparand-n/a)(c-causation-directness directness-n/a)(c-source source-neutral)(c-causee-volitionality volition-n/a)(c-assertiveness assertiveness-neutral)(c-solidarity solidarity-neutral)(c-polarity polarity-positive)(c-v-grammatical-aspect gram-aspect-neutral)(c-adjunct-clause-type adjunct-clause-type-n/a)(c-v-phase-aspect phase-aspect-neutral)(c-v-lexical-aspect activity-accomplishment)(c-secondary-type secondary-neutral)(c-event-modality event-modality-none)(c-function fn-main-clause)(c-minor-type minor-n/a)(c-copula-type copula-n/a)(c-v-absolute-tense past)(c-power-relationship power-peer)(c-our-shared-subject shared-subject-n/a)(c-question-gap gap-n/a))
Why feature structures?
• Decide what grammatical meaning to elicit.
• Represent it in a feature structure.
• Formulate an English or Spanish sentence that expresses that meaning.
  – We can use the same corpus of feature structures for several elicitation languages
• Have the informant translate it.
Grammatical meanings vs syntactic categories
• Features and values are based on a collection of grammatical meanings
  – Many of which are similar to the grammatemes of the Prague Treebanks
Grammatical Meanings
YES
• Semantic Roles
• Identifiability
• Specificity
• Time
  – Before, after, or during time of speech
• Modality
NO
• Case
• Voice
• Determiners
• Auxiliary verbs
Grammatical Meanings
YES
• How is identifiability expressed?
  – Determiner
  – Word order
  – Optional case marker
  – Optional verb agreement
• How is specificity expressed?
• How are generics expressed?
• How are predicate nominals marked?
NO
• How are English determiners translated?
  – The boy cried.
  – The lion is a fierce beast.
  – I ate a sandwich.
  – He is a soldier.
    • Il est soldat.
Argument Roles
• Actor
• Undergoer
• Predicate and predicatee
  – The woman is the manager.
• Recipient
  – I gave a book to the students.
• Beneficiary
  – I made a phone call for Sam.
Why not subject and object?
• Languages use their voice systems for different purposes.
• Mapudungun obligatorily uses an inverse-marked verb when third person acts on first or second person.
  – Verb agrees with undergoer
  – Undergoer exhibits other subjecthood properties
  – Actor may be object.
• Yes: How are actor and undergoer encoded in combination with other semantic features like adversity (Japanese) and person (Mapudungun)?
• No: How is English voice translated into another language?
Argument Roles
• Accompaniment
  – With someone
  – With pleasure
• Material
  – (out) of wood
• About 20 more roles
  – From the Lingua checklist; Comrie & Smith (1977)
  – Many also found in tectogrammatical representations in the Prague Treebanks
• Around 80 locative relations
  – From Lingua checklist
• Many temporal relations
Noun Phrase Features
• Person
• Number
• Biological gender
• Animacy
• Distance (for deictics)
• Identifiability
• Specificity
• Possession
• Other semantic roles
  – Accompaniment, material, location, time, etc.
• Type
  – Proper, common, pronoun
• Cardinals
• Ordinals
• Quantifiers
• Given and new information
  – Not used yet because of limited context in the elicitation tool.
Clause level features
• Tense
• Aspect
  – Lexical, grammatical, phase
• Type
  – Declarative, open-q, yes-no-q
• Function
  – Main, argument, adjunct, relative
• Source
  – Hearsay, first-hand, sensory, assumed
• Assertedness
  – Asserted, presupposed, wanted
• Modality
  – Permission, obligation
  – Internal, external
Other clause types (constructions)
• Causative
  – Make/let/have someone do something
• Predication
  – May be expressed with or without an overt copula.
• Existential
  – There is a problem.
• Impersonal
  – One doesn’t smoke in restaurants in the US.
• Lament
  – If only I had read the paper.
• Conditional
• Comparative
• Etc.
Outline
• The AVENUE MT project
• The elicitation tool
• The questionnaire
• Tools for building questionnaires
Mar 1, 2006
The Process
[Diagram: the questionnaire-building process. Labels recovered from the figure:]
• Feature Specification: list of semantic features and values
• Feature Maps: which combinations of features and values are of interest (Clause-Level, Noun-Phrase, Tense & Aspect, Modality, …)
• Feature Structure Sets
• Reverse Annotated Feature Structure Sets: add English sentences
• The Corpus
• Sampling
• Smaller Corpus
Feature Specification
• Defines Features and their values
• Sets default values for features
• Specifies feature requirements and restrictions
• Written in XML
Feature Specification
Feature: c-copula-type (a copula is a verb like “be”; some languages do not have copulas)
Values:
• copula-n/a
  – Restrictions: 1. ~(c-secondary-type secondary-copula)
  – Notes:
• copula-role
  – Restrictions: 1. (c-secondary-type secondary-copula)
  – Notes: 1. A role is something like a job or a function. "He is a teacher" "This is a vegetable peeler"
• copula-identity
  – Restrictions: 1. (c-secondary-type secondary-copula)
  – Notes: 1. "Clark Kent is Superman" "Sam is the teacher"
• copula-location
  – Restrictions: 1. (c-secondary-type secondary-copula)
  – Notes: 1. "The book is on the table" There is a long list of locative relations later in the feature specification.
• copula-description
  – Restrictions: 1. (c-secondary-type secondary-copula)
  – Notes: 1. A description is an attribute. "The children are happy." "The books are long."
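The restrictions listed for c-copula-type can be read as conditions on the rest of the feature structure, with `~` marking negation. The Python sketch below checks them for a flat feature structure stored as a dictionary. The table layout, the function name `value_allowed`, and the treatment of `~` as simple negation are assumptions made for illustration; the actual specification is written in XML and is not reproduced here.

```python
# Each restriction is (feature, value, must_hold).
# must_hold=False encodes the "~" (negated) restriction from the slide.
COPULA_RESTRICTIONS = {
    "copula-n/a": [("c-secondary-type", "secondary-copula", False)],
    "copula-role": [("c-secondary-type", "secondary-copula", True)],
    "copula-identity": [("c-secondary-type", "secondary-copula", True)],
    "copula-location": [("c-secondary-type", "secondary-copula", True)],
    "copula-description": [("c-secondary-type", "secondary-copula", True)],
}


def value_allowed(copula_value, feature_structure):
    """Check whether a c-copula-type value is compatible with the other features."""
    return all(
        (feature_structure.get(feature) == value) == must_hold
        for feature, value, must_hold in COPULA_RESTRICTIONS[copula_value]
    )


copular_clause = {"c-secondary-type": "secondary-copula"}
other_clause = {"c-secondary-type": "secondary-neutral"}

print(value_allowed("copula-role", copular_clause))  # True
print(value_allowed("copula-n/a", copular_clause))   # False
print(value_allowed("copula-n/a", other_clause))     # True
```

A check of this kind lets a generator discard feature structures that pair a copula value with a non-copular clause before any sentence is written for them.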
Feature Maps
• Some features interact in the grammar
  – English –s reflects person and number of the subject and tense of the verb.
  – In expressing the English present progressive tense, the auxiliary verb is in a different place in a question and a statement:
    • He is running.
    • Is he running?
• We need to check many, but not all, combinations of features and values.
• Using unlimited feature combinations leads to an unmanageable number of sentences.
Feature Combination Template
((predicatee ((np-general-type pronoun-type common-noun-type)
              (np-person person-first person-second person-third)
              (np-number num-sg num-pl)
              (np-biological-gender bio-gender-male bio-gender-female)))
 {[(predicate ((np-general-type common-noun-type)
               (np-person person-third)))
   (c-copula-type role)]
  [(predicate ((adj-general-type quality-type)
               (c-copula-type attributive)))]
  [(predicate ((np-general-type common-noun-type)
               (np-person person-third)
               (c-copula-type identity)))]}
 (c-secondary-type secondary-copula)
 (c-polarity #all)
 (c-general-type declarative)
 (c-speech-act sp-act-state)
 (c-v-grammatical-aspect gram-aspect-neutral)
 (c-v-lexical-aspect state)
 (c-v-absolute-tense past present future)
 (c-v-phase-aspect durative))
Summarizes 288 feature structures, which are automatically generated.
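A template of this kind can be expanded by taking the cross product of every multi-valued feature and then pairing each combination with each bracketed alternative. The Python sketch below shows that idea on a reduced toy template. The function name `expand`, the dictionary encoding, and the choice of features are assumptions made for illustration; it does not reproduce the real generator, its restrictions, or the count of 288.

```python
from itertools import product


def expand(multi_valued, alternatives):
    """Cross every combination of feature values with every alternative block."""
    names = list(multi_valued)
    structures = []
    for combo in product(*(multi_valued[name] for name in names)):
        base = dict(zip(names, combo))
        for alternative in alternatives:
            # Each alternative contributes its own fixed features.
            structures.append({**base, **alternative})
    return structures


# Reduced toy template: three multi-valued features taken from the slide.
template = {
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
    "c-v-absolute-tense": ["past", "present", "future"],
}

# The three alternatives in braces, reduced to their copula type.
alternatives = [
    {"c-copula-type": "role"},
    {"c-copula-type": "attributive"},
    {"c-copula-type": "identity"},
]

structures = expand(template, alternatives)
print(len(structures))  # 3 * 2 * 3 * 3 = 54
```

Even this reduced template yields 54 structures, which shows why the feature maps restrict which features are allowed to vary together.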
Adding Sentences to Feature Structures
srcsent: Mary was not a leader.
context: Translate this as though it were spoken to a peer co-worker;

((actor ((np-function fn-actor)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type proper-noun-type)(np-identifiability identifiable)(np-specificity specific)…))
(pred ((np-function fn-predicate-nominal)(np-animacy anim-human)(np-biological-gender bio-gender-female)(np-general-type common-noun-type)(np-specificity specificity-neutral)…))
(c-v-lexical-aspect state)(c-copula-type copula-role)(c-secondary-type secondary-copula)(c-solidarity solidarity-neutral)(c-v-grammatical-aspect gram-aspect-neutral)(c-v-absolute-tense past)(c-v-phase-aspect phase-aspect-neutral)(c-general-type declarative-clause)(c-polarity polarity-negative)(c-my-causer-intentionality intentionality-n/a)(c-comparison-type comparison-n/a)(c-relative-tense relative-n/a)(c-our-boundary boundary-n/a)…)
Difficult Issues in Adding Sentences
• Have to remember that the grammatical meanings don’t correspond exactly to English morphemes.
  – Identifiability and specificity vs the and a
  – Modality, tense, aspect vs auxiliary verbs
• The meaning has to be clear to a translator.
  – If English is going to be the source language for translation, the clearest way to say something may not be the most common way it is said in real text or conversation.
Hard Problems
• Expressing meanings that are not grammaticalized in English.
  – Evidentiality:
    • He stole the bread.
    • Context: Translate this as if you do not have first-hand knowledge. In English, we might say, “They say that he stole the bread” or “I hear that he stole the bread.”
Hard Problems
• Reverse annotating things that can be said in several ways in English.
  – Impersonals:
    • One doesn’t smoke here.
    • You don’t smoke here.
    • They don’t smoke here.
    • There’s no smoking here.
    • Credit cards aren’t accepted.
  – Problem in the Reflex corpus because space was limited.
Evaluation
• Current funding has not covered evaluation of the questionnaire.
  – Except for informal observations as it was translated into several languages.
• Does it elicit the meanings it was intended to elicit?
  – Informal observation: usually
• Is it useful for machine translation?
Navigation
• Currently, feature combinations are specified by a human.
• Plan to work in active learning mode.
  – Build seed questionnaire
  – Translate some data
  – Do some learning
  – Identify most valuable pieces of information to get next
  – Generate an RTB for those pieces of information
  – Translate more
  – Learn more
  – Generate more, etc.
Summary
• Feature Specification:
  – Lists features and values
  – Grammatical meanings
• Feature Combinations
• Set of Feature Structures
• Add English or Spanish Sentences
• Get a translation and word alignment from a bilingual, literate informant