TAUS MT Showcase, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013
DESCRIPTION
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter: #MosesCore
TRANSCRIPT
![Page 1: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013](https://reader033.vdocument.in/reader033/viewer/2022052601/558bfdd3d8b42ae47e8b461b/html5/thumbnails/1.jpg)
TAUS MACHINE TRANSLATION SHOWCASE
Moses Past, Present and Future
09:20–09:40, Wednesday, 12 June 2013
Hieu Hoang, University of Edinburgh
Statistical Machine Translation with Moses
Hieu Hoang, Localization World 2013
Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?
Moses by Hieu Hoang, University of Edinburgh 3
What is Statistical Machine Translation?
It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?
Warren Weaver 1949
What is Statistical Machine Translation?
• NLP application
  – search engines, text mining etc.
• Big data
  – bi-text from the Internet
    • e.g. multilingual websites, documents
  – large monolingual data
• Learn to translate
  – from previous translations
  – models of language
What is Statistical Machine Translation?
[Diagram. Training: training data (bi-text, monolingual data, dictionary) and linguistic tools → SMT system (translation model, language model, lots of numbers…). Using: source text → SMT system → translation.]
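Training turns the bi-text into the "lots of numbers" in the diagram. As a minimal sketch of one such step, assuming phrase pairs have already been extracted from a word-aligned bi-text (the alignment itself, e.g. with GIZA++, is not shown), each translation probability can be estimated by relative frequency, p(target | source) = count(source, target) / count(source). The function name `estimate_phrase_table` and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_phrase_table(phrase_pairs):
    """Estimate p(target | source) by relative frequency.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples,
    assumed to be already extracted from a word-aligned bi-text.
    """
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter()
    for (src, _tgt), n in pair_counts.items():
        source_counts[src] += n

    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return dict(table)

# Hypothetical extracted phrase pairs (toy data, not from Europarl)
pairs = [
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the proposal"),
    ("den Vorschlag", "the idea"),
]
table = estimate_phrase_table(pairs)
print(table["den Vorschlag"])  # {'the proposal': 0.75, 'the idea': 0.25}
```

Real training data produces millions of such pairs, which is why the resulting tables are large.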
What is a model?
• Translation model
• Language model (of the target language)
thanks to Precision Translation Tools
What is a model?
• Translation model
  – source → translation
  – probability

| source | target | probability |
|---|---|---|
| den Vorschlag | the proposal | 0.6227 |
| | ’s proposal | 0.1068 |
| | a proposal | 0.0341 |
| | the idea | 0.0250 |
| | this proposal | 0.0227 |
| | proposal | 0.0205 |
| … | … | |
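A translation model can be pictured as a lookup from a source phrase to candidate translations with probabilities. The sketch below stores the rows shown above in a plain dictionary and returns the top candidates; the names `PHRASE_TABLE` and `translation_options` are hypothetical, and because the table is truncated ("…"), the listed probabilities sum to less than 1.

```python
# Rows copied from the slide; the full table has more entries ("…").
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
    },
}

def translation_options(source, k=3):
    """Return the k most probable translations of a source phrase."""
    candidates = PHRASE_TABLE.get(source, {})
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

print(translation_options("den Vorschlag"))
# [('the proposal', 0.6227), ("'s proposal", 0.1068), ('a proposal', 0.0341)]
```

The decoder does not simply take the top entry; it weighs these options against the language model when building the whole sentence.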
What is a model?
• Language model
  – Likelihood of sentence
  – in target language

| text | probability |
|---|---|
| I would like | 0.489 |
| would like to | 0.905 |
| like to commend | 0.002 |
| to commend the | 0.472 |
| commend the rapporteur | 0.147 |
| … | … |
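Assuming each row above is a trigram probability, i.e. the probability of the last word given the two words before it, a language model scores a sentence by multiplying these conditional probabilities. The sketch below does this in log space to avoid underflow. It is an illustration under that assumption: it only uses the listed trigrams, so it ignores the first two words and the sentence end, and a real language model would back off or smooth for unseen trigrams instead of failing.

```python
import math

# Rows from the slide, read (by assumption) as p(w3 | w1 w2).
TRIGRAMS = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}

def trigram_log_prob(sentence):
    """Sum log p(w_i | w_{i-2} w_{i-1}) over the sentence's trigrams.

    Starts at the third word; raises KeyError for an unseen trigram,
    where a real language model would back off to shorter histories.
    """
    words = sentence.split()
    total = 0.0
    for i in range(2, len(words)):
        total += math.log(TRIGRAMS[(words[i - 2], words[i - 1], words[i])])
    return total

lp = trigram_log_prob("I would like to commend the rapporteur")
print(math.exp(lp))  # roughly 6.14e-05
```

The low value for "like to commend" dominates the product, which is typical: a single unusual word choice can pull a whole sentence's score down.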
What is Moses?
• Replacement for Pharaoh
  – Academic software
  – Closed-source
• Open source
• Rewritten, clean code
  – More features
• Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop
Agenda
• What is Statistical Machine Translation?
• What is Moses?
  – Timeline
  – Common misconceptions
• Coming up
• What can we do for you?
What is Moses? Common Misconceptions
• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow
Only works on Linux
• Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OS X 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
• Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OS X
Difficult to use
• Easier compile and install
  – Boost bjam
  – No installation required
• Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends
    • IRSTLM
    • GIZA++ and MGIZA
• Ready-made models trained on Europarl
Unreliable
• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/
Only phrase-based model
• Replacement for Pharaoh
  – extension of Pharaoh
• From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
• Since 2009
  – Hierarchical model
  – Syntactic models
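In a factored model each word carries extra annotation (factors) such as its lemma or part-of-speech tag. Moses' factored text format joins a token's factors with a vertical bar, e.g. `houses|house|NNS`. The sketch below splits such a line into one record per token; the factor order (surface, lemma, POS) and the helper name `parse_factored` are assumptions for illustration, since the order is whatever the training data was prepared with.

```python
def parse_factored(line, factor_names=("surface", "lemma", "pos")):
    """Split a factored sentence like 'houses|house|NNS are|be|VBP'
    into one dict per token. factor_names is an assumed order."""
    tokens = []
    for token in line.split():
        factors = token.split("|")
        if len(factors) != len(factor_names):
            raise ValueError(f"expected {len(factor_names)} factors in {token!r}")
        tokens.append(dict(zip(factor_names, factors)))
    return tokens

sentence = parse_factored("the|the|DT houses|house|NNS are|be|VBP small|small|JJ")
print([t["lemma"] for t in sentence])  # ['the', 'house', 'be', 'small']
```

Translating on lemmas and generating surface forms separately is one way factors help with sparse, morphologically rich data.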
Developed by one person
• ANYONE can contribute
  – 50 contributors
[Chart: ‘git blame’ of Moses repository, percentage of lines per contributor]
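A chart like this can be produced by attributing each line of code to the author who last changed it. The sketch below assumes you have already captured the output of `git blame --line-porcelain` (which repeats an `author` header for every blamed line) and tallies each author's share; the function name and the sample text are made up, and the sample is heavily abbreviated.

```python
from collections import Counter

def author_shares(porcelain_text):
    """Return {author: fraction of lines} from `git blame --line-porcelain` output."""
    counts = Counter(
        line[len("author "):]
        for line in porcelain_text.splitlines()
        if line.startswith("author ")  # skips author-mail, author-time, ...
    )
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}

# Made-up, abbreviated porcelain output for three source lines.
SAMPLE = """\
1111111111111111111111111111111111111111 1 1 1
author Alice
author-mail <alice@example.org>
\tint main() {
2222222222222222222222222222222222222222 2 2 1
author Bob
author-mail <bob@example.org>
\t  return 0;
1111111111111111111111111111111111111111 3 3 1
author Alice
author-mail <alice@example.org>
\t}
"""
print(author_shares(SAMPLE))
```

Source lines in porcelain output are prefixed with a tab, so code that happens to start with "author " is not miscounted. For a whole repository, the same tally would be accumulated over every tracked file.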
Slow
thanks to Ken!!
[Chart: Decoding. Model score vs. CPU seconds/sentence (excluding loading) for Moses, cdec and Joshua]
Slow
• Multithreaded
• Reduced disk IO
  – compress intermediate files
• Reduce disk space requirement

Training (time in minutes; percentage of 1-core time in parentheses)

| Model | 1 core | 2 cores | 4 cores | 8 cores | Size (MB) |
|---|---|---|---|---|---|
| Phrase-based | 60 | 47 (79%) | 37 (63%) | 33 (56%) | 893 |
| Hierarchical | 1030 | 677 (65%) | 473 (45%) | 375 (36%) | 8300 |
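The two training changes on this slide, parallel processing and compressed intermediate files, can be illustrated in a few lines. This is not the Moses training pipeline. It is a sketch under stated assumptions: a toy corpus is split into chunks, each chunk is processed concurrently, and each chunk's intermediate output goes to a gzip-compressed temporary file before the results are merged. The stand-in "extraction" step (writing lowercased tokens) and the names `process_chunk` and `parallel_extract` are hypothetical.

```python
import gzip
import os
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def process_chunk(lines, out_path):
    """Stand-in for an extraction step: write one token per line, gzip-compressed."""
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for line in lines:
            for token in line.split():
                out.write(token.lower() + "\n")
    return out_path

def parallel_extract(corpus, workers=4):
    """Split corpus into `workers` chunks, process them concurrently,
    then merge the compressed intermediate files into one Counter."""
    chunks = [corpus[i::workers] for i in range(workers)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"part{i}.gz") for i in range(workers)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            done = list(pool.map(process_chunk, chunks, paths))
        counts = Counter()
        for path in done:
            with gzip.open(path, "rt", encoding="utf-8") as f:
                counts.update(line.rstrip("\n") for line in f)
    return counts

corpus = ["I would like to commend the rapporteur", "The proposal", "the idea"]
print(parallel_extract(corpus, workers=2).most_common(1))  # [('the', 3)]
```

A thread pool keeps the sketch simple; for CPU-bound Python code, `ProcessPoolExecutor` would be needed for real speed-ups. As the table suggests, adding cores gives diminishing returns once merging and IO dominate.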
What is Moses? Common Misconceptions

| Misconception | Reality |
|---|---|
| Only for Linux | Windows, Linux, Mac |
| Difficult to use | Easier compile and install |
| Unreliable | Multi-stage testing |
| Only phrase-based | Hierarchical, syntax model |
| Developed by one person | Everyone |
| Slow | Fastest decoder, multithreaded training, less IO |
Coming up…
• Code cleanup
• Incremental training
• Better translation
  – smaller model
  – bigger data
  – faster training and decoding
• Applications
  – CAT tools
  – Speech translation
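Incremental training means adding new parallel data to an existing model without retraining from scratch. One simple way to make relative-frequency estimates incremental, sketched below as an assumption rather than as Moses' planned design, is to keep the raw pair and source counts and compute probabilities on demand, so new data only updates counts. The class name `IncrementalPhraseTable` is hypothetical.

```python
from collections import Counter

class IncrementalPhraseTable:
    """Keeps raw counts so p(target | source) can be updated as data arrives."""

    def __init__(self):
        self.pair_counts = Counter()
        self.source_counts = Counter()

    def add(self, phrase_pairs):
        """Fold in newly extracted (source, target) phrase pairs."""
        for src, tgt in phrase_pairs:
            self.pair_counts[(src, tgt)] += 1
            self.source_counts[src] += 1

    def prob(self, src, tgt):
        """Relative frequency count(src, tgt) / count(src); 0.0 if src unseen."""
        total = self.source_counts[src]
        return self.pair_counts[(src, tgt)] / total if total else 0.0

table = IncrementalPhraseTable()
table.add([("den Vorschlag", "the proposal"), ("den Vorschlag", "the idea")])
print(table.prob("den Vorschlag", "the proposal"))  # 0.5
table.add([("den Vorschlag", "the proposal")])  # new data arrives later
print(table.prob("den Vorschlag", "the proposal"))  # 0.6666666666666666
```

Storing counts instead of normalized probabilities is what makes the update cheap; the word alignment of the new data would still have to be computed.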
Applications: Computer-Aided Translation
• EU projects
  – CASMACAT
  – MATECAT
What can we do for you?
• Simpler Moses
• Graphical interface
• Windows compatibility
• Terminology and glossary
• Incremental training
What can you do for us?
• Code
• Data
• Funding