TAUS Open Source Machine Translation Showcase, Beijing, Yu Gong, Adobe, 23 April 2012

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

Moses Tool Set: A set of tools based on Adobe technology to simplify your usage of Moses
Yu Gong | Software Engineer

DESCRIPTION

Moses Tool Set is a set of tools that simplifies the use of Moses. With it, the Moses training process becomes easier and more intuitive. It consists of four features: the Corpus Clean Tool, the Corpus Splitting Tool, the Moses Training Harness, and the Moses Scoring Harness. Each feature can work on its own or be combined with the others into a job, which lets users complete the whole training process in one click. This presentation is part of the MosesCore project, which encourages the development and use of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission under Grant Number 288487 of the 7th Framework Programme. Latest news on Twitter: #MosesCore

TRANSCRIPT

Agenda

•  Addressing Moses Pain Points

•  Advantages of Moses Tool Set

•  Moses Tool Set Architecture

•  Moses Tool Set Features

•  Useful Resources

•  Q&A

Addressing Moses Pain Points

1.  Corpus Cleaning

2.  Engine Training

3.  Engine Testing

4.  Integrating Moses With Linguistic Platform

Advantages of Moses Tool Set

•  User Friendly

•  Platform Independent

•  Open Source

Moses Tool Set Architecture

Moses Tool Set Features – Corpus Cleaning

Moses Functionality
•  Tokenizing
•  Casing
•  Long Segments

Adobe Functionality
•  Placeholder Handling
•  URL Handling
•  Number Cleaning
•  Duplicate Line Cleaning
•  Weird Aligned Pairs
•  Cleaning by Regular Expressions
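The slides list the Adobe cleaning steps without showing how they work. The following Python sketch illustrates one way to implement several of them on a list of (source, target) segment pairs. It is an assumption-based illustration, not the Moses Tool Set code: the placeholder tokens __URL__, __PH__ and __NUM__, the regular expressions, and the length-ratio threshold of 3.0 used to spot weird aligned pairs are all hypothetical choices.

```python
import re
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]

# Hypothetical patterns and placeholder tokens; the tool set's actual rules are not given in the slides.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>|\{\d+\}")  # inline tags such as <b> and {0}-style placeholders
NUM_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize(segment: str) -> str:
    """Replace URLs, inline placeholders and numbers with fixed tokens and collapse whitespace."""
    segment = URL_RE.sub("__URL__", segment)
    segment = TAG_RE.sub("__PH__", segment)
    segment = NUM_RE.sub("__NUM__", segment)
    return " ".join(segment.split())


def clean_corpus(pairs: Iterable[Pair], max_ratio: float = 3.0,
                 drop_patterns: Iterable[str] = ()) -> List[Pair]:
    """Return normalized, deduplicated pairs; drop_patterns are matched against the normalized text."""
    drop_res = [re.compile(p) for p in drop_patterns]
    seen = set()
    kept: List[Pair] = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side is empty after normalization
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue  # very different lengths suggest a misaligned ("weird") pair
        if any(r.search(src) or r.search(tgt) for r in drop_res):
            continue  # a user-supplied regular expression matched
        if (src, tgt) in seen:
            continue  # duplicate line
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


pairs = [
    ("Visit http://example.com for 3 tips", "Besuchen Sie http://example.com für 3 Tipps"),
    ("Visit http://example.com for 3 tips", "Besuchen Sie http://example.com für 3 Tipps"),
    ("Hello", "Dies ist ein sehr langer und offensichtlich falsch ausgerichteter Satz"),
    ("Press {0} to save", "Drücken Sie {0} zum Speichern"),
]
cleaned = clean_corpus(pairs)
print(len(cleaned))  # prints 2: the duplicate and the misaligned pair are dropped
```

Because URLs and numbers are replaced before deduplication, pairs that differ only in a URL or a number collapse into a single entry; whether that is desirable depends on the corpus.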

Moses Tool Set Features – Corpus Splitting & Uploading

Split Corpus by Purpose
•  Training
•  Tuning
•  Testing

Upload Split Corpus to Moses Server
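Splitting by purpose amounts to carving disjoint tuning and testing sets out of the cleaned corpus and keeping the rest for training. A minimal Python sketch, assuming a random split with a fixed seed so that the split is repeatable; the function names and the default set sizes of 2000 segments are hypothetical, and uploading to the Moses server is not covered.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]


def split_corpus(pairs: List[Pair], tune_size: int = 2000, test_size: int = 2000,
                 seed: int = 42) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle a parallel corpus and split it into training, tuning and testing sets."""
    if tune_size + test_size >= len(pairs):
        raise ValueError("corpus too small for the requested tuning and testing sizes")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    tune = shuffled[:tune_size]
    test = shuffled[tune_size:tune_size + test_size]
    train = shuffled[tune_size + test_size:]
    return train, tune, test


def write_split(stem: str, pairs: List[Pair], src_lang: str, tgt_lang: str) -> None:
    """Write line-aligned files such as train.fr and train.en, the layout Moses training expects."""
    with open(f"{stem}.{src_lang}", "w", encoding="utf-8") as src_file, \
         open(f"{stem}.{tgt_lang}", "w", encoding="utf-8") as tgt_file:
        for src, tgt in pairs:
            src_file.write(src + "\n")
            tgt_file.write(tgt + "\n")


corpus = [(f"source {i}", f"target {i}") for i in range(10)]
train_set, tune_set, test_set = split_corpus(corpus, tune_size=2, test_size=2)
print(len(train_set), len(tune_set), len(test_set))  # prints 6 2 2
```

The three sets are disjoint, so the tuning and testing segments never leak into training.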

Moses Tool Set Features – Training & Tuning

Command Line Pain

•  Human Unfriendly
•  Highly Detailed
•  Error Prone
•  Difficult to Reproduce

Moses Tool Set Features – Training & Tuning

UI to Simplify Inputs
•  Training Run ID
•  Language Model Parameters
•  Corpus ID
•  Source & Target
•  Default Alignment
•  Default Reordering
•  Remote Server
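The training UI essentially collects the parameters that would otherwise be typed on a long command line. The sketch below maps those inputs onto a call to Moses' train-model.perl script, using flags from the Moses baseline-system instructions (-root-dir, -corpus, -f, -e, -alignment, -reordering, -lm, -external-bin-dir). The TrainingRun class, the /opt/mosesdecoder install path and the ssh wrapper for the remote server are hypothetical; the actual harness may build its command differently.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingRun:
    """Hypothetical container for the inputs collected by the training UI."""
    run_id: str                  # Training Run ID, used here as the working directory
    corpus: str                  # corpus stem: "corpus/train" means train.fr + train.en
    source: str                  # source language extension, e.g. "fr"
    target: str                  # target language extension, e.g. "en"
    lm_path: str                 # language model file for the target language
    lm_order: int = 3            # n-gram order of the language model
    lm_type: int = 8             # 8 selects KenLM in Moses' factor:order:file:type syntax
    alignment: str = "grow-diag-final-and"    # heuristic used in the Moses baseline system
    reordering: str = "msd-bidirectional-fe"  # lexicalized reordering used in the baseline
    moses_dir: str = "/opt/mosesdecoder"      # hypothetical install location
    remote_host: Optional[str] = None         # hypothetical remote Moses server

    def command(self) -> List[str]:
        """Build the train-model.perl argument list for this run."""
        args = [
            f"{self.moses_dir}/scripts/training/train-model.perl",
            "-root-dir", self.run_id,
            "-corpus", self.corpus,
            "-f", self.source,
            "-e", self.target,
            "-alignment", self.alignment,
            "-reordering", self.reordering,
            "-lm", f"0:{self.lm_order}:{self.lm_path}:{self.lm_type}",
            "-external-bin-dir", f"{self.moses_dir}/tools",
        ]
        if self.remote_host:
            # run the same command on the remote server, quoted as one shell string
            return ["ssh", self.remote_host, shlex.join(args)]
        return args


run = TrainingRun("run-001", "corpus/train", "fr", "en", "/data/lm/train.blm.en")
print(shlex.join(run.command()))
```

Keeping the parameters in one object also helps with reproducibility: the same object always yields the same command, which can be stored alongside the trained engine.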

Moses Tool Set Features – Testing

•  How do you know when an engine is good enough?
•  How do you know when it is intrinsically flawed?
•  How do you automate comparing a new engine to old ones?

Moses Tool Set Features – Testing

•  Reliable Scoring
   •  BLEU/NIST/METEOR
•  Simplified UI
•  Dynamic Connection to Existing Engines
•  Repeatable
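As an illustration of what reliable, repeatable scoring involves, here is a compact corpus-level BLEU computation in Python: modified n-gram precisions for n = 1 to 4 combined by a geometric mean and multiplied by a brevity penalty, with one reference per segment and no smoothing. It is a sketch for understanding the metric, not a replacement for Moses' own multi-bleu.perl; NIST and METEOR are not shown, and tokenization is assumed to be plain whitespace splitting.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU, one reference per segment, whitespace tokens, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # each hypothesis n-gram counts at most as often as it occurs in the reference
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # brevity penalty: punish output that is shorter than the references
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Comparing an old and a new engine on the same held-out references.
references = ["the cat sat on the mat"]
old_engine = ["the cat sat on a mat"]
new_engine = ["the cat sat on the mat"]
print(f"old: {corpus_bleu(old_engine, references):.3f}  new: {corpus_bleu(new_engine, references):.3f}")
# prints "old: 0.537  new: 1.000"
```

Scoring every engine against the same fixed test set, as in the last lines, is one simple way to automate the comparison of a new engine with older ones.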

Automation

Corpus Cleaning

Corpus Splitting & Uploading

Training & Tuning

Testing
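The Automation slide chains the four features into a single job, which is what makes one-click training possible. The following Python sketch shows that idea as a job object that runs stages in order and passes a shared context dictionary from one stage to the next. The Job class and the four stage functions are hypothetical stand-ins (the stages here only deduplicate, split and record placeholder results), not the tool set's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A stage takes the job context and returns an updated copy of it.
Stage = Callable[[dict], dict]


@dataclass
class Job:
    """Hypothetical job runner: executes stages in order with a shared context."""
    name: str
    stages: List[Tuple[str, Stage]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, label: str, stage: Stage) -> "Job":
        self.stages.append((label, stage))
        return self  # allow chaining: job.add(...).add(...)

    def run(self, context: dict) -> dict:
        for label, stage in self.stages:
            self.log.append(f"{self.name}: start {label}")
            context = stage(context)  # an exception here aborts the remaining stages
            self.log.append(f"{self.name}: done {label}")
        return context


# Placeholder stages standing in for the real tools.
def clean_stage(ctx: dict) -> dict:
    # drop exact duplicate pairs while keeping the original order
    return {**ctx, "pairs": list(dict.fromkeys(ctx["pairs"]))}


def split_stage(ctx: dict) -> dict:
    # hold out the first pair for testing; a real split would shuffle first
    return {**ctx, "test": ctx["pairs"][:1], "train": ctx["pairs"][1:]}


def training_stage(ctx: dict) -> dict:
    return {**ctx, "engine": f"engine trained on {len(ctx['train'])} pairs"}


def scoring_stage(ctx: dict) -> dict:
    return {**ctx, "tested_segments": len(ctx["test"])}


job = (
    Job("fr-en")
    .add("Corpus Cleaning", clean_stage)
    .add("Corpus Splitting & Uploading", split_stage)
    .add("Training & Tuning", training_stage)
    .add("Testing", scoring_stage)
)
result = job.run({"pairs": [("a", "b"), ("a", "b"), ("c", "d"), ("e", "f")]})
print(result["engine"])  # prints "engine trained on 2 pairs"
```

Because an exception in any stage propagates out of run(), later stages never start on incomplete output; a production harness might also persist the log and context so that a failed job can be inspected.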

Localization Workflow Integration

Moses Tooling Chain

Linguistic Platform

Resources

Source Code: http://code.google.com/p/m4loc

Questions
