

Page 1: Language Technologies Institute School of Computer Science Carnegie Mellon University NSF August 6, 2001 NICE: Native language Interpretation and Communication

Language Technologies Institute, School of Computer Science, Carnegie Mellon University

NICE: Native language Interpretation and Communication Environment

Jaime Carbonell, Lori Levin, Alon Lavie

Language Technologies Institute, Carnegie Mellon University

{jgc, lsl, alavie}@cs.cmu.edu


Machine Translation of Indigenous Languages

• Policy makers have access to information about indigenous people.
– Epidemics, crop failures, etc.

• Indigenous people can participate in
– Health care
– Education
– Government
– Internet
without giving up their languages.


History of NICE

• Arose from a series of joint workshops of NSF and OAS.

• Workshop recommendations:
– Create multinational projects using information technology to:
  • provide immediate benefits to governments and citizens
  • develop critical infrastructure for communication and collaborative research
– training researchers and engineers
– advancing science and technology


Architecture Diagram

[Diagram: components as labeled on the slide]

User

Learning Module: Elicitation Process, Learning Process, Transfer Rules

Run-Time Module: SL Input, SL Parser, Transfer Engine, TL Generator, EBMT Engine, Unifier Module, TL Output


EBMT Example

English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.

English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.

English: I would like to meet the tallest man.
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.
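The recombination behind the "new" output in this example can be sketched roughly as follows. This is a minimal illustration, not the NICE EBMT engine: the fragment alignments ("her" with "fey engu", "the tallest man" with "Chi doy fütra chi wentru") are assumptions read off the two stored examples, and the function name `naive_ebmt` is hypothetical.

```python
# Stored example pair taken from the slide.
EXAMPLE_SRC = "I would like to meet her."
EXAMPLE_TGT = "Ayükefun trawüael fey engu."

# ASSUMPTION: these fragment alignments are inferred for illustration only;
# the slide does not state them explicitly.
ALIGNED_FRAGMENTS = {
    "her": "fey engu",
    "the tallest man": "Chi doy fütra chi wentru",
}


def naive_ebmt(new_src, old_fragment, new_fragment):
    """Translate new_src by swapping one aligned fragment in the stored example."""
    # The new sentence must equal the stored example with one fragment replaced.
    expected_src = EXAMPLE_SRC.replace(old_fragment, new_fragment)
    if new_src != expected_src:
        raise ValueError("input does not match the stored example")
    # Substitute the target-side fragment, with no morphological adjustment.
    return EXAMPLE_TGT.replace(
        ALIGNED_FRAGMENTS[old_fragment], ALIGNED_FRAGMENTS[new_fragment]
    )


output = naive_ebmt("I would like to meet the tallest man.", "her", "the tallest man")
print(output)
```

Plain substitution of this kind reproduces the "new" string, which differs from the "correct" translation on both the verb and the noun phrase. That gap is the point of the example: fragment recombination alone does not adjust the surrounding forms.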


NICE Partners

Language (status) | Country | Institutions

Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education

Iñupiaq (advanced discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center

Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior


Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de la Frontera, Chile

• Contributions of IEI
– Native language knowledge and linguistic expertise in Mapudungun
– Experience in bicultural, bilingual education
– Data collection: recording, transcribing, translating
– Orthographic normalization of Mapudungun


Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de la Frontera, Chile

• Contributions of LTI
– Develop MT technology for indigenous languages
– Training for data collection and transcription
– Partial support for data collection effort pending funding from Chilean Ministry of Education
– International coordination, technical and project management


LTI/IEI Agreement

• Continue collaboration on data collection and machine translation technology.

• Pursue focused areas of mutual interest, such as bilingual education.

• Seek additional funding sources in Chile and the US.


The IEI Team

• Coordinator (leader of a bilingual and multicultural education project):

– Eliseo Canulef

• Distinguished native speaker:

– Rosendo Huisca

• Linguists (one native speaker, one near-native)

– Juan Hector Painequeo

– Hugo Carrasco

• Typists/Transcribers

• Recording assistants

• Translators

• Native speaker linguistic informants


MINEDUC/IEI Agreement: Highlights

Based on the LTI/IEI agreement, the Chilean Ministry of Education agreed to fund the data collection and processing team for the year 2001. This agreement will be renewed each year, as needed.


MINEDUC/IEI Agreement: Objectives

To evaluate the NICE/Mapudungun proposal for orthography and spelling.

To collect an oral corpus that represents the four Mapudungun dialects spoken in Chile. The main domain is primary health, both traditional and Western.


MINEDUC/IEI Agreement: Deliverables

An oral corpus of 800 recorded hours, proportional to the demography of each currently spoken dialect

120 hours transcribed and translated from Mapudungun to Spanish

A refined proposal for writing Mapudungun


NICE/Mapudungun: Database

• Writing conventions (Grafemario)
• Glossary Mapudungun/Spanish
• Bilingual newspaper, 4 issues
• Ultimas Familias – memoirs
• Memorias de Pascual Coña
– Publishable product with new Spanish translation
• 35 hours transcribed speech
• 80 hours recorded speech


NICE/Mapudungun: Other Products

• Standardization of orthography: Linguists at UFRO have evaluated the competing orthographies for Mapudungun and written a report detailing their recommendations for a standardized orthography for NICE.

• Training for spoken language collection: In January 2001 native speakers of Mapudungun were trained in the recording and transcription of spoken data.


Underfunded Activities

• Data collection

– Colombia (unfunded)

– Chile (partially funded)

• Travel

– More contact between CMU and Chile (UFRO) and Colombia.

• Training

– Train Mapuche linguists in language technologies at CMU.

– Extend training to Colombia

• Refine MT system for Mapudungun and Siona

– Current funding covers research on the MT engine and data collection, but not detailed linguistic analysis