Page 1: FROM BITS TO BOTS: Women Everywhere, Leading the Way

FROM BITS TO BOTS: Women Everywhere, Leading the Way

Lenore Blum, Anastassia Ailamaki, Manuela Veloso, Sonya Allin, Bernardine Dias, Ariadna Font Llitjós

School of Computer Science

Carnegie Mellon University

Page 2

AVENUE: Automatic Machine Translation for low-density languages

Ariadna Font Llitjós

Language Technologies Institute

Page 3

Automatic Machine Translation

• Interlingua
• Transfer rules
• Corpus-based methods

[Diagram labels: analysis, interpretation, generation]

Page 4

Low-density languages

• Not endangered languages, but languages with little or no presence on the web and few or no linguistic resources

• AVENUE is currently working with:
  – Mapudungun [Chile]
  – Inupiaq [Alaska]
  – Aymara, Quechua and Aguaruna [Peru]
  – Siona [Colombia]

Page 5

Mapudungun for the Mapuche

Chile
Official Language: Spanish
Population: ~15 million

~1/2 million Mapuche people

Language: Mapudungun

Page 6

The language: Mapudungun

• Oral tradition (170 hours of recorded speech in the medical domain)

• Just a few written texts exist

• Need to standardize the alphabet, determine the phoneme set and writing rules, and develop an electronic dictionary

• We provide them with linguistic and technical advice, plus tools such as a morphological analyzer, a parser and ultimately a machine translation system (MTS)

• We work in collaboration with a local team in Temuco

Page 7

Our last meeting in Temuco, May 2002

Page 8

New approach to MT

• Fully automatic (no human intervention)

• Very little electronic data available → elicitation corpus

• Machine learning techniques
  – Seeded version space algorithm to automatically learn transfer rules
  – Interactive and automatic refinement of transfer rules

Page 9

Elicitation corpus sample

…
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.

\spa Vi una mujer
\map Pen kie domo
\eng I saw one woman.

\spa Hay suficiente comida para una mujer
\map Mley iagel i yochiluwam kie domo
\eng There is enough food for one woman.
…
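
The sample above uses one tagged line per language (\spa, \map, \eng) for each elicited sentence. As an illustration only, here is a minimal Python sketch of how such entries could be read into records. The function name `parse_elicitation` and the assumption that entries are separated by blank lines are hypothetical; the slides do not describe the actual AVENUE file format or tooling.

```python
from typing import Dict, List


def parse_elicitation(text: str) -> List[Dict[str, str]]:
    """Group backslash-tagged lines into one dict per blank-line-separated entry.

    Assumption (not from the slides): each entry is a block of lines of the
    form '\\tag sentence', and entries are separated by blank lines.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current entry, if there is one.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("\\"):
            # Split '\spa Una mujer ...' into tag 'spa' and the sentence.
            tag, _, value = line[1:].partition(" ")
            current[tag] = value.strip()
        # Lines without a tag (such as the '…' markers) are ignored.
    if current:
        records.append(current)
    return records


# Two entries copied from the slide, Mapudungun text kept as transcribed.
SAMPLE = r"""
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.

\spa Hay suficiente comida para una mujer
\map Mley iagel i yochiluwam kie domo
\eng There is enough food for one woman.
"""

records = parse_elicitation(SAMPLE)
for rec in records:
    print(rec["spa"], "|", rec["map"], "|", rec["eng"])
```

Keeping each entry as a small dictionary keyed by language tag makes it easy to pair any two languages later, for example Spanish with Mapudungun when learning transfer rules.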

Page 10

Automatic Learning of a Transfer-based MTS

[Diagram components: elicitation corpus; SVS algorithm; tentative transfer rules; transfer module; rule refinement module; SL sentences; (tentative) TL sentences. Names shown: Kathrin Probst, Erik Peterson, Ariadna Font]

Page 11

Interactive and Automatic rule refinement

1. Given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification)

2. Determine blame assignment

3. Structure learning, as opposed to binary feedback, to automatically refine the existing rules

Page 12

Interactive Learning

• Translation Correction Tool, web application

• Bilingual informants (no knowledge of linguistics assumed)

• User-friendly and intuitive interface

• Can naïve users reliably pinpoint the source of errors? Is the MT error classification realistic?

• Need for user studies:
  – Spanish - English
  – English - Spanish
  – English - Chinese

Page 13
Page 14
Page 15

Structure learning

• Given user feedback (correction + error classification) and blame assignment, modify the appropriate transfer rule(s) to obtain correct translation

• Need to evaluate based on cross-validation, number of sentences it can translate correctly (elicitation corpus)

Learn a mapping between incorrect structures and correct structures:

She saw high woman → She saw the tall woman
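
To make the idea of mapping an incorrect structure to a correct one more concrete, here is a small Python sketch that aligns the MT output with the user's correction at the word level, using the standard library's `difflib.SequenceMatcher`. This is an illustrative stand-in, not the method used in AVENUE; the helper name `word_edits` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_edits(mt_output: str, corrected: str) -> List[Tuple[str, List[str], List[str]]]:
    """Return the non-matching spans between MT output and its correction.

    Each item is (operation, words_in_mt_output, words_in_correction), where
    operation is 'replace', 'insert' or 'delete' as reported by difflib.
    """
    src, tgt = mt_output.split(), corrected.split()
    matcher = SequenceMatcher(None, src, tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, src[i1:i2], tgt[j1:j2]))
    return edits


edits = word_edits("She saw high woman", "She saw the tall woman")
for op, old, new in edits:
    print(op, old, "->", new)
```

For this sentence pair the alignment reports a single span in which "high" is replaced by "the tall". A plain word diff therefore localizes the error but does not by itself separate the missing determiner from the wrong lexical choice; that distinction is what the user's error classification and the blame assignment step are meant to supply.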

Page 16

A simple example

Spanish SLS: Ella vio a la mujer alta
English TLS: She saw high woman
Corrected TLS: She saw the tall woman

• MT error classification: missing determiner + wrong lexical selection

• Blame assignment: the NP rule that generated the direct object + selectional restrictions

• Rule refinement: the Noun Phrase (NP) rule that generated the error,

NP -> Adj N

needs to be refined into 2 different cases:

NP -> Det Adj N[sg] (the tall woman)
NP -> (Det) Adj N[pl] ((the)? tall women)
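
The refinement above can be sketched in Python as a function that takes the faulty rule and returns the two refined cases. The data structures (`Symbol`, `Rule`) and the function `refine_missing_determiner` are hypothetical names introduced for illustration; the slides do not show how AVENUE represents or rewrites transfer rules, and this sketch hard-codes the one refinement from the example rather than learning it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Symbol:
    cat: str                       # category, e.g. "Det", "Adj", "N"
    optional: bool = False         # rendered in parentheses when True
    number: Optional[str] = None   # "sg", "pl", or None if unconstrained

    def __str__(self) -> str:
        text = self.cat + (f"[{self.number}]" if self.number else "")
        return f"({text})" if self.optional else text


@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: Tuple[Symbol, ...]

    def __str__(self) -> str:
        return f"{self.lhs} -> " + " ".join(str(s) for s in self.rhs)


def refine_missing_determiner(rule: Rule) -> List[Rule]:
    """Split a rule into singular and plural cases.

    Assumes the head noun is the last symbol on the right-hand side.
    The determiner is obligatory with a singular head and optional
    with a plural head, as in the example on the slide.
    """
    head, rest = rule.rhs[-1], rule.rhs[:-1]
    singular = Rule(rule.lhs, (Symbol("Det"),) + rest + (Symbol(head.cat, number="sg"),))
    plural = Rule(rule.lhs, (Symbol("Det", optional=True),) + rest + (Symbol(head.cat, number="pl"),))
    return [singular, plural]


original = Rule("NP", (Symbol("Adj"), Symbol("N")))
refined = refine_missing_determiner(original)
print(original)
for r in refined:
    print(r)
```

Printing the refined rules reproduces the two cases listed on the slide. In a real system the choice of which symbol to add and which feature to split on would come from the user's correction and the blame assignment, not from a fixed function.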

Page 17

AVENUE project members

LTI team:

Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown

Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst

AVENUE External Project Coordinator: Rodolfo M Vega

Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura

Page 18

Thanks!

For more information:

http://www.cs.cmu.edu/~aria/avenue/