TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE Moses: The Trusted Translations Experience 15:30-15:45 Monday 4 June Gustavo Lucardi Trusted Translations


DESCRIPTION

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore

TRANSCRIPT

Page 1: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE

Moses: The Trusted Translations Experience, 15:30-15:45, Monday 4 June

Gustavo Lucardi, Trusted Translations

Page 2: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Moses: The Trusted Translations Experience

Gustavo Lucardi, COO, Trusted Translations, Inc. (@glucardi)

TAUS Open Source Machine Translation Pre-Conference Workshops, Localization World Paris, June 4, 2012

Page 3: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Moses from the Point of View of an LSP

• What Exactly Are We Doing?

• Unexpected Experiences and Issues

• Evaluating Complementary Tools

• Our Next Steps with Moses

• Our Conclusions about Moses up to Now

• Looking Back, What Would We Do Differently?

• What Other LSPs Could Learn from Our Experience


Page 4: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

What exactly are we doing?

• The real answer is LEARNING MT ASAP

• If we are asked to choose, we prefer Open Source

• We built two Moses engines (a sketch of the standard baseline training steps follows this list):

▫ Legal English to Spanish (many clients) and Technical English to Spanish (1 client)

▫ We are re-training those engines with our post-editors' feedback

▫ But we are also re-building those engines with different Moses customizations

• Evaluating Complementary Tools

▫ Like SymEval, among others

• Testing other MT solutions

▫ Pangea MT (Pangeanic), DoMY (Translation Precision Tools), Smart MATE (Applied Languages), Language Studio (Asia Online), Tauyou, LetsMT, and others
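To make the engine-building item above concrete, here is a condensed sketch of the standard Moses baseline training recipe, driven from Python. The install path, file names, language-model order and cleaning thresholds are illustrative assumptions, not a description of our production setup.

```python
# Condensed sketch of a standard Moses baseline training run (illustrative
# paths and thresholds; assumes mosesdecoder is built with KenLM and GIZA++).
import subprocess

MOSES = "/home/mt/mosesdecoder"   # assumed install location
SRC, TGT = "en", "es"             # e.g. an English-to-Spanish engine

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# Tokenize and truecase both sides of the parallel corpus.
for lang in (SRC, TGT):
    run(f"{MOSES}/scripts/tokenizer/tokenizer.perl -l {lang} "
        f"< corpus.{lang} > corpus.tok.{lang}")
    run(f"{MOSES}/scripts/recaser/train-truecaser.perl "
        f"--model truecase.{lang} --corpus corpus.tok.{lang}")
    run(f"{MOSES}/scripts/recaser/truecase.perl --model truecase.{lang} "
        f"< corpus.tok.{lang} > corpus.true.{lang}")

# Drop empty, overly long and badly length-mismatched segment pairs.
run(f"{MOSES}/scripts/training/clean-corpus-n.perl "
    f"corpus.true {SRC} {TGT} corpus.clean 1 80")

# Build and binarize a target-side language model with KenLM.
run(f"{MOSES}/bin/lmplz -o 3 < corpus.true.{TGT} > lm.arpa.{TGT}")
run(f"{MOSES}/bin/build_binary lm.arpa.{TGT} lm.blm.{TGT}")

# Train the translation model, then tune the weights on a development set.
run(f"{MOSES}/scripts/training/train-model.perl -root-dir train "
    f"-corpus corpus.clean -f {SRC} -e {TGT} "
    f"-alignment grow-diag-final-and -reordering msd-bidirectional-fe "
    f"-lm 0:3:$PWD/lm.blm.{TGT}:8 -external-bin-dir {MOSES}/tools")
run(f"{MOSES}/scripts/training/mert-moses.pl dev.true.{SRC} dev.true.{TGT} "
    f"{MOSES}/bin/moses train/model/moses.ini --mertdir {MOSES}/bin")
```

Re-training with post-editor feedback then amounts, in essence, to appending the post-edited memories to the corpus and repeating the same run.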


Page 5: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Unexpected Experiences and Issues

• The lack of information and documented procedures on how post-editing should be done (there is a very helpful TAUS paper about this, and we also attended the TAUS pre-conference this morning)

• The difficulty of measuring human post-editing quality and productivity (SymEval was helpful for that)

• The difficulty of measuring the quality of an engine using BLEU (for example, an engine with a BLEU of 70 can produce worse output than an engine with a BLEU of 40; see the sketch after this list)

• The difficulty of aligning existing parallel data without human intervention

• Translators' fear of being displaced by MT

• The focus of MT is improving quality and turnaround time, not costs

• MT + PE is not enough, so we are doing MT + PE + E and/or P
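On the BLEU point above: the score is easy to compute but only measures n-gram overlap with a reference, which is why a higher-scoring engine can still be worse to post-edit. A minimal sketch with NLTK (file names are placeholders):

```python
# Minimal corpus-level BLEU comparison of two engines against the same
# reference translation, using NLTK (file names are placeholders).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def read_tokenized(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip().split() for line in f]

references = [[ref] for ref in read_tokenized("reference.es")]  # one reference per segment
engine_a = read_tokenized("engine_a.out")
engine_b = read_tokenized("engine_b.out")

smooth = SmoothingFunction().method1
print("Engine A BLEU:", corpus_bleu(references, engine_a, smoothing_function=smooth))
print("Engine B BLEU:", corpus_bleu(references, engine_b, smoothing_function=smooth))
# The higher score is not automatically the better engine for post-editing:
# BLEU says nothing about how much effort the remaining errors take to fix.
```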


Page 6: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Evaluating Complementary Tools

• We want to mention SymEval. It has 3 components: Eval, Extract and Diff.

• Eval is used with TMX files and it produces an XML file highlighting the differences between the two versions.

• The Project Score can also be seen as the percentage of the machine translation output that was kept (two identical files would have a score of 100; a rough illustration follows this list)

• We noticed that it can help judge either the quality of the machine translation or the quality of the post-editing. In other words:

▫ If you have a good Engine you could use Eval to test your Post-editor

▫ If you have a good Post-Editor you could use Eval to test your Engines

• SymEval is not very user-friendly, and there is not enough supporting documentation. For example, the help page says that the Eval tool only supports XLIFF files, but it actually supports TMX as well

• We found a more user-friendly tool in a Memory Server we were testing, but we still need to make that Memory Server talk to Moses
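To illustrate the Project Score idea (this is only a rough stand-in for what SymEval computes, with assumed file names), one can compare the raw MT target segments with the post-edited ones and average their similarity:

```python
# Rough illustration of a "project score": average similarity between raw MT
# segments and their post-edited versions, read from two TMX files that share
# the same segment order. Not SymEval itself; file names are assumptions.
import difflib
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def target_segments(tmx_path, lang="es"):
    """Return the target-language segment texts of a TMX file, in document order."""
    segments = []
    for tu in ET.parse(tmx_path).getroot().iter("tu"):
        for tuv in tu.iter("tuv"):
            tuv_lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            if tuv_lang.lower().startswith(lang):
                seg = tuv.find("seg")
                segments.append("".join(seg.itertext()) if seg is not None else "")
    return segments

mt = target_segments("project_mt.tmx")           # raw Moses output
pe = target_segments("project_postedited.tmx")   # post-edited version

ratios = [difflib.SequenceMatcher(None, a, b).ratio() for a, b in zip(mt, pe)]
project_score = 100.0 * sum(ratios) / max(1, len(ratios))
print(f"Project score: {project_score:.1f}")     # 100 means the MT was used unchanged
```

Read one way, such a score grades the engine (given a trusted post-editor); read the other way, it grades the post-editor (given a trusted engine), which is exactly the double use described above.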


Page 7: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Our Next Steps with Moses

• Scripts to train (automate, through scripts, the pre-processing of the data used to train Moses engines: take out placeables such as numbers, acronyms, XML tags, e-mail addresses and URLs; remove short and long segments; spell-check source and target; remove segments where target and source have different structures; a minimal sketch follows this list)

• Scripts to translate (automate, through scripts, the pre- and post-processing for translations with Moses engines: go from TMX to TXT and from TXT back to TMX)

• Plug-ins for Moses interaction with CAT tools (create or find plug-ins to use Moses directly from CAT tools, like the ones for Okapi or GlobalSight; this would let us avoid converting to and from TMX and TXT, much like the Trados plug-ins that connect to FreeTranslation, Systran and others)

• Moses GUI (create or find a user interface that allows less tech-savvy users to interact directly with Moses)

• Interaction of Moses with Cloud Memory Server and Workbench Online

• Add the possibility for the post-editor to choose a different MT engine for each segment
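As a minimal sketch of the training-data pre-processing scripts mentioned in the first item (regex patterns, thresholds and file names are illustrative assumptions; the spell-checking and structural checks are left out for brevity):

```python
# Sketch of training-data pre-processing: mask placeables (URLs, e-mails, XML
# tags, numbers) and drop short, long or badly mismatched segment pairs.
# Patterns, thresholds and file names are illustrative assumptions.
import re

URL     = re.compile(r"https?://\S+|www\.\S+")
EMAIL   = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
XML_TAG = re.compile(r"<[^>]+>")
NUMBER  = re.compile(r"\b\d+(?:[.,]\d+)*\b")

def mask_placeables(text):
    """Replace URLs, e-mail addresses, XML tags and numbers with placeholder tokens."""
    text = URL.sub("__URL__", text)
    text = EMAIL.sub("__EMAIL__", text)
    text = XML_TAG.sub("__TAG__", text)
    return NUMBER.sub("__NUM__", text)

def keep_pair(src, tgt, min_tokens=3, max_tokens=80, max_ratio=2.5):
    """Reject very short/long segments and pairs whose lengths diverge too much."""
    s, t = src.split(), tgt.split()
    if not (min_tokens <= len(s) <= max_tokens and min_tokens <= len(t) <= max_tokens):
        return False
    return max(len(s), len(t)) / max(1, min(len(s), len(t))) <= max_ratio

with open("corpus.en", encoding="utf-8") as f_src, \
     open("corpus.es", encoding="utf-8") as f_tgt, \
     open("corpus.clean.en", "w", encoding="utf-8") as out_src, \
     open("corpus.clean.es", "w", encoding="utf-8") as out_tgt:
    for src, tgt in zip(f_src, f_tgt):
        src, tgt = mask_placeables(src.strip()), mask_placeables(tgt.strip())
        if keep_pair(src, tgt):
            out_src.write(src + "\n")
            out_tgt.write(tgt + "\n")
```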


Page 8: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Our Conclusions about Moses up to Now

• Moses has great potential, but it is not yet optimized for LSPs

• Its interface is not user-friendly (in fact there is no GUI yet)

• There are no reliable sources of information for certain niches (a lot of work is still needed to obtain a reliable corpus in some niches with the tools that exist at the moment)

• We still have to wait for the development of the new features that LSPs require (glossaries, blacklisted words, labels)

• It is necessary to optimize an LSP's internal processes to adapt to the needs of Moses (management of memories, product names and trademarks, languages)

• It is better to work with an engine for a specific client than with one for a specific industry or area of specialization


Page 9: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

Looking Back, What Would We Do Differently

• Start using a Memory Server earlier

• A Memory Server is the best way to make sure that your company always manages its translation memories properly

• Memories are:

▫ The basic input for an LSP to customize an engine

▫ The added value created by an LSP for a certain client or domain

• Start testing with Moses a year earlier (we started in 2010 when we should have started in 2009, but it's never too late!)

• Dedicate more resources to testing the other, more mature implementations

• Build Engines for Clients and not for Domains (???)


Page 10: TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Gustavo Lucardi, Trusted Translations, 4 June 2012

What Other LSPs Could Learn from Our Experience

• LSPs can only profit from running their own MT engines if they have recurring customers with enough data volume to build an engine and perfect it on a specific domain over time

• General-purpose (non-specific) MT (e.g., Google Translate, Microsoft) will provide better results for one-time jobs or for recurring clients with low data volume

• The effort put into overcoming the problems found during the deployment stage of integrated MT customizations was fairly similar to the effort needed to set up Moses

• The proprietary MT customizations that we tested still have some distance to travel before they offer a turnkey solution for an LSP (we prefer Linux to Windows, and for now we prefer Ubuntu to Red Hat)
