Speech and Language Technologies in the Next Generation Localisation CSET Prof. Andy Way, School of Computing, DCU

Post on 18-Dec-2015


Speech and Language Technologies in the Next Generation Localisation CSET

Prof. Andy Way, School of Computing, DCU

Overview of Presentation

Speech & Language Technologies in the NGL CSET

Facilitating Optimal Multilingual NGL Applications

Key Research Challenges

Novel Research Tracks

Typical LSP’s Translation Process

Key Integration Challenges

Concluding Remarks

ILT - Integrated Language Technologies

[Diagram labels: Next Generation Localisation; Systems Framework; Enterprise Localisation; Personalised Localisation; Unified Model; Digital Content Management; Integrated Language Technologies]

Prof. Andy Way, ILT Area Coordinator

ILT: Facilitating Optimal Multilingual NGL Applications

Machine Translation

Speech Technologies

Text Processing

Text Input

Speech Input

Text Output

Speech Output

e.g. bulk localisation

e.g. personalisation

Machine Translation: Significance

For our industrial partners, volume of material needing translation increasing, while budgets remain the same

In the EU, now 23 official languages (506 language pairs), and expanding …

In the US, huge investment in translation between Arabic, Chinese, Urdu and English …

Automation the only option (especially for PL) …
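
The figure of 506 language pairs is consistent with counting each translation direction separately: 23 languages give 23 × 22 ordered (source, target) pairs. A quick check, where the function name is ours and purely illustrative:

```python
def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n_languages languages."""
    # Each language can be translated into every other language.
    return n_languages * (n_languages - 1)


print(directed_language_pairs(23))  # 506
```

Every additional official language adds two new directions for each language already present, which is why the number of pairs grows much faster than the number of languages.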

MT: Key Research Challenges

Enhanced Translation Quality

Faster Translation Times

Scalability

Other Modalities (Speech, SMS etc.)

The State-of-the-Art

Source:

Reference: The two sides highlighted the role of the World Trade Organization (WTO)

Baseline: The two sides on the role of the WTO

Improving the State-of-the-Art

Our MT systems have knowledge of syntax

Parts of speech (nouns, verbs etc.)

Roles in sentences (subject, object etc.)

better translation quality

Source:

Reference: The two sides highlighted the role of the World Trade Organization (WTO)

Baseline: The two sides on the role of the WTO

Our System: The two sides reaffirmed the role of the WTO

Improving the State-of-the-Art

better translation quality (especially where end-users are concerned)

DCU Arabic-to-English system ranked first at international MT evaluation in Oct. 2007

Source:

Reference: Mahmoud Abbas: The wall and settlements will not bring Israel security

Baseline: Mahmoud Abbas, the wall and settlements will provide security to Israel

Our System: Mahmoud Abbas, the wall and settlements will not provide security for Israel

MT Novel Research: Handling Different Types of Text

Translating patent applications, or doctors’ prescriptions, or visa applications: different tasks, as the content is different …

So is the form …

Build different MT systems for each different task, using our industrial partners’ documentation

Text Processing: Significance and Challenges

If texts are automatically annotated with:

syntactic information (e.g. subject, object), today’s MT systems can learn syntax required for improved output quality and improved processing of multilingual queries (DCM)

text-type and genre information, this helps our MT systems disambiguate text and improve translation quality

localisation information (e.g. <DNT>Andy Way</DNT>), then the workflows of our industrial partners (currently done manually) can be significantly improved (cf. LOC)
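
One way to picture how a do-not-translate annotation such as `<DNT>Andy Way</DNT>` could be honoured automatically is to mask the tagged span before translation and restore it afterwards. The sketch below is only an illustration under our own assumptions: the `__DNT0__` placeholder format and both function names are hypothetical, and no particular MT engine or CNGL tool is implied.

```python
import re

# Matches a <DNT>...</DNT> span and captures the protected text.
DNT_PATTERN = re.compile(r"<DNT>(.*?)</DNT>", re.DOTALL)


def protect_dnt(text):
    """Replace each <DNT>...</DNT> span with a numbered placeholder.

    Returns the masked text and the list of protected strings.
    """
    protected = []

    def _mask(match):
        protected.append(match.group(1))
        # Hypothetical placeholder format, chosen so MT is unlikely to alter it.
        return f"__DNT{len(protected) - 1}__"

    return DNT_PATTERN.sub(_mask, text), protected


def restore_dnt(text, protected):
    """Put the protected strings back in place of their placeholders."""
    for index, original in enumerate(protected):
        text = text.replace(f"__DNT{index}__", original)
    return text


masked, kept = protect_dnt("Contact <DNT>Andy Way</DNT> at DCU.")
print(masked)  # Contact __DNT0__ at DCU.
print(restore_dnt("Kontakt: __DNT0__", kept))  # Kontakt: Andy Way
```

The masked string is what would be sent for translation; the protected name never passes through the MT system, so it cannot be mistranslated.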

Speech Technology: Significance

Speech interfaces for eyes-busy, hands-busy scenarios

Speech recognition and synthesis systems which can deal with

a potentially unlimited vocabulary

multiple (and non-native) speakers

multiple languages

and can be tightly integrated with MT

localisation & personalisation

volume & scalability

access

Speech Technology: Challenges

themoreitsnowsthemoreitgoes

demoreisnowsdemoregoes

the more it snows the more it goes…

them ore its nows them ore it goes?

themo rei tsn ow sthe mo reitg o es?

linguistic competence of native speaker

“rules” and vocabulary of system

performance of (native) speaker
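
The segmentation ambiguity illustrated above can be reproduced with a toy example. The sketch below enumerates every way of splitting an unsegmented string into words from a small vocabulary; the vocabulary and function name are our own assumptions for illustration, and a real recogniser works on acoustic input with far larger lexicons and probabilistic scoring.

```python
from functools import lru_cache

# Toy vocabulary (an assumption for this example only).
VOCAB = {"the", "them", "more", "ore", "it", "its", "snows", "nows", "goes"}


def segmentations(text, vocab):
    """Return every split of text into a sequence of vocabulary words."""
    vocab = frozenset(vocab)
    max_len = max(map(len, vocab), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # An empty remainder has exactly one segmentation: no words.
        if start == len(text):
            return [[]]
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in vocab:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return [" ".join(words) for words in split_from(0)]


for candidate in segmentations("themoreitsnowsthemoreitgoes", VOCAB):
    print(candidate)
```

Even with nine vocabulary entries the string has several competing readings, including both "the more it snows the more it goes" and "them ore its nows them ore it goes". Choosing between them requires knowledge beyond the word list.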

Speech Technology: Innovations

Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge

themoreitsnowsthemoreitgoes

demoreisnowsdemoregoes

the more it snows the more it goes…

them ore its nows them ore it goes?

themo rei tsn ow sthe mo reitg o es?

linguistic competence of native speaker

“rules” and vocabulary of system

performance of (native) speaker

Innovations: Speech Recognition & MT

Robust & Novel Speech Recognition Engine which integrates explicit linguistic knowledge

Tight coupling with MT Engines

themoreitsnowsthemoreitgoes

detverkarhavaritenstorstormhurmån

Jemehreschneitdestomehresgeht

the more it snows the more it goes…

them ore its nows them ore it goes?

themo rei tsn ow sthe mo reitg o es?

linguistic competence of native speaker

“rules” and vocabulary of system

Innovations: MT & Speech Synthesis

Robust & Novel Speech Synthesis Engine which integrates explicit linguistic knowledge

Tight coupling with MT Engines

themoreitsnowsthemoreitgoes

detverkarhavaritenstorstormhurmån

Jemehreschneitdestomehresgeht

Typical LSP’s Translation Process

Incoming documents (segmented)

Step 1: Translation Memory & Machine Translation (Translation Memory DB)

Partially Translated Documents, with confidence rating for segments

Step 2: Post-editing & translation (In-house Translators, Freelance Translators)

Step 3: Documents Validation & Finalization

TM match score < 50 %: expensive

50 % < TM match score < 70 %: medium

TM match score > 70 %: cheap

Requirement: minimal disruption of this process
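
The three cost bands above amount to a simple threshold rule on the TM match score. A minimal sketch follows; the function name is ours, and because the slide does not say which band scores of exactly 50 % and 70 % fall into, treating both as "medium" is an assumption.

```python
def tm_cost_class(match_score: float) -> str:
    """Map a TM match score (in percent) to a post-editing cost class."""
    if not 0 <= match_score <= 100:
        raise ValueError("match_score must be between 0 and 100")
    if match_score > 70:
        return "cheap"
    # Assumption: boundary values 50 and 70 are counted as "medium".
    if match_score >= 50:
        return "medium"
    return "expensive"


for score in (30, 60, 85):
    print(score, tm_cost_class(score))
```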

Key Integration Challenges

Use MT to automatically upgrade some TM matches to a cheaper cost class, cf. Dynamic Translation Memory [Bicici and Dymetman, 2008]

Linking MT automatic evaluation metrics with post-editing cost

Ensuring that MT omissions are highlighted

Enforcing customer terminology

Deal with markup, tags …

Produce true-cased translations

Integrate into pre-existing workflows!
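
To make one of these challenges concrete, enforcing customer terminology can start with a check that flags translations in which a required target term is missing. The sketch below is an illustration only: the glossary entries and function name are hypothetical, and the case-insensitive substring matching is deliberately naive (it ignores inflection and word boundaries).

```python
def missing_terms(source, translation, glossary):
    """Return (source_term, target_term) pairs the translation fails to honour.

    glossary maps a source-language term to its required target-language term.
    """
    source_lower = source.lower()
    translation_lower = translation.lower()
    return [
        (src_term, tgt_term)
        for src_term, tgt_term in glossary.items()
        # Flag only terms that occur in the source but whose required
        # rendering is absent from the translation.
        if src_term.lower() in source_lower
        and tgt_term.lower() not in translation_lower
    ]


# Hypothetical customer glossary (English to German).
glossary = {"translation memory": "Übersetzungsspeicher"}

print(missing_terms(
    "Update the translation memory.",
    "Aktualisieren Sie das Translation Memory.",
    glossary,
))
```

Segments flagged this way could be routed to a post-editor rather than accepted automatically, which keeps the check compatible with an existing workflow.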

Concluding Remarks

For ILT, ramp-up almost complete: over 30 new researchers in addition to pre-existing PIs, postdoctoral researchers and PhD students

Large interest from industrial partners, both large and small

Input from LOC, DCM and SF

Significant role in CNGL demonstrators

Research tools → Industrial prototypes

Well placed to succeed in going ‘beyond TMs’ …

Speech & Language Technologies in the NGL CSET

Thanks for listening!

Questions?

http://www.cngl.ie
