taus mt showcase, sovee smart engine 2.0, a leap beyond base moses technology, scott gaskill, sovee
TRANSCRIPT
Wednesday, 4 June
Sovee Smart Engine 2.0: A Leap Beyond Base Moses Technology
Scott Gaskill, Sovee
TAUS Machine Translation Showcase 2014 Dublin (Ireland)
The research within the project MosesCore leading to these results has received funding from the European Union 7th Framework Programme, grant agreement no 288487
Where is the world going?
CNNTech, “Google boss: Entire world will be online by 2020,” April 2013 http://www.cnn.com/2013/04/15/tech/web/eric-schmidt-internet
Kenya stat from ITU, 2-13. Photo used by permission of Deseret News.
2016 the world will have internet connectivity
By the end of this decade everyone in the world will be on the Web, with mobile access growing as the preferred interface
In Kenya, 99% of Internet connections are mobile
We are entering the Convergence era: translation will be a utility embedded in every app, device and screen. Businesses will prosper by finding new customers in new markets…. Consumers will become world-wise, communicating as if language barriers never existed.
Jaap van der Meer, Director of TAUS, 2013
Translation Memory – Is More Better?
If we simply add an additional 1,000 TM lines to a database of 40-60 billion, will we see better translations?
Knowing how to use the data is key
Challenges Technology, approach & process
Progress in first 60 years | Progress needed by 2016
Engines for < 150 languages | Engines for > 6000 languages
< 3% of the world’s content translated | All content translated
Cloud-based speed providing more servers for translation | 92 billion servers
Statistical translation introduced, but “fuzzy logic” does not deliver quality businesses need | Quality improvement to standards required to meet world commerce demand
Technology Challenge: 6800 languages
Minimum Server Requirements
$4 \left( \frac{n(n-1)}{2} \right)$
92 million
9.2 billion – based on 100 businesses
92 billion – based on 1000 customers
Not valued as practical – infinite servers required
MT Assets (cascades):
Generic SMT
Generic SMT, Domain
Generic SMT, Domain, Customer
Generic SMT, Domain, Customer, Project
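The server figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the expression reads 4 × n(n-1)/2, that n is the 6800 languages named in the slide title, and that the larger figures come from multiplying the base count by 100 businesses and by 1000 customers. The slide does not say what the factor 4 stands for, and the function names are illustrative only.

```python
def language_pairs(n: int) -> int:
    """Number of unordered language pairs among n languages: n(n-1)/2."""
    return n * (n - 1) // 2


def minimum_servers(n_languages: int, factor: int = 4) -> int:
    """Evaluate factor * n(n-1)/2 (assumed reading of the slide's expression)."""
    return factor * language_pairs(n_languages)


base = minimum_servers(6800)
print(f"{base:,}")         # 92,466,400 (about 92 million)
print(f"{base * 100:,}")   # 9,246,640,000 (about 9.2 billion, 100 businesses)
print(f"{base * 1000:,}")  # 92,466,400,000 (about 92 billion, 1000 customers)
```

Under these assumptions the results line up with the 92 million, 9.2 billion and 92 billion quoted on the slide.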
Accuracy Challenge
Relevant Segments General Corpus Adequacy
Accuracy
General MT (30-40%)
TM (40-60%)
Post Editing (up to 100%)
Preparing new project / import TM / CAT Leverage Exact Fuzzy Match Post Edit Review Deliver to customer
Gather past TM
Package and send TM to SMT provider
Clean, tokenize, data (prepare data)
Train-Tune-Test (3Ts)
Repeat until viewed as acceptable (repeat with customer data each time)
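The "Leverage Exact Fuzzy Match" step in the post-editing workflow above can be illustrated with a minimal translation memory lookup. This is only a sketch: the function name `leverage`, the 0.75 threshold and the character-level similarity from Python's standard `difflib` are assumptions made for illustration. CAT tools use their own match scoring, which the slides do not describe.

```python
from difflib import SequenceMatcher


def leverage(segment, tm, fuzzy_threshold=0.75):
    """Return (match_type, translation, score) for a source segment.

    tm maps stored source segments to their translations.
    The threshold is an illustrative assumption, not a standard value.
    """
    # Exact match: the segment is already in the translation memory.
    if segment in tm:
        return "exact", tm[segment], 1.0

    # Fuzzy match: find the most similar stored source segment.
    best_source, best_score = None, 0.0
    for source in tm:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score

    if best_source is not None and best_score >= fuzzy_threshold:
        return "fuzzy", tm[best_source], best_score
    return "none", None, best_score


tm = {"Click the Save button.": "stored translation"}
print(leverage("Click the Save button.", tm)[0])      # exact
print(leverage("Click the Save button now.", tm)[0])  # fuzzy
print(leverage("Hello", tm)[0])                       # none
```

Segments that come back as "none" are the ones that would fall through to MT and post editing.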
Post Editing Learning Engine SMT Workflow
Segments are not just a string of text – they are living, learning entities
Process Real-time Automation and Integration
Sovee Smart Engine 2.0
Smart Engine Advantages
Language from Scratch
Seamless integration to Post Editing workflow
Training / Learning
Efficiency Gains (what we have seen)
Post editing – 50%+ improvement
TM/MT management and training – 100% improvement
Update MT on the fly
Watch it learn before your eyes
Never leave the post editing environment
Learned Translations
Relevant Segments General Corpus Adequacy
Accuracy
General MT
Domain
Organization
Customer
Project
Asset
Tags
Cascading Assets Sovee Smart Engine MT
Learned Segments
Segment output
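One way to picture cascading assets with learned segments is a lookup that tries the most specific asset first and falls back to more general ones. The sketch below is an assumption made for illustration, not Sovee's implementation, which the slides do not describe. The level names follow the cascade on the earlier Technology Challenge slide (generic, domain, customer, project), and the class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Most specific level first (illustrative ordering, assumed from the cascade slide).
CASCADE = ["project", "customer", "domain", "generic"]


class CascadingAssets:
    def __init__(self) -> None:
        # One segment store per cascade level.
        self.levels: Dict[str, Dict[str, str]] = {level: {} for level in CASCADE}

    def learn(self, level: str, source: str, target: str) -> None:
        """Store a post-edited segment at the given level."""
        self.levels[level][source] = target

    def translate(self, source: str) -> Tuple[Optional[str], Optional[str]]:
        """Return (target, level) from the most specific level holding the segment."""
        for level in CASCADE:
            target = self.levels[level].get(source)
            if target is not None:
                return target, level
        return None, None


assets = CascadingAssets()
assets.learn("generic", "source segment", "generic translation")
print(assets.translate("source segment"))  # ('generic translation', 'generic')

# A post edit saved at project level takes precedence on the next request.
assets.learn("project", "source segment", "project translation")
print(assets.translate("source segment"))  # ('project translation', 'project')
```

In this sketch a saved post edit changes the output for the same segment immediately, which is the behaviour the slides describe as updating MT on the fly.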
Asset Synchrony (CAT Tools)
Post editing interface Smart Engine
Asset Push (Past TM)
Real-time progressive translation cycle (Sovee MT, save/push post edits)
Seamless Integration
Apps
Websites
eCommerce
elearning
Videos
Podcasts
Software
Live chat
Text Messages
“Convergence Era”
Apps
Websites
eCommerce
elearning
Videos
Podcasts
Software
Live chat
Text Messages
Japan Sovee Smart Engine Translation
USA
Yukiko (Japan): ホールインワンを決めたよ!
Robert (USA): I just scored a hole-in-one!
Original: ホールインワンを決めたよ!
Japan
SNAG
Jack Nicklaus Learning Leagues
Languages: Spanish and Japanese
In Process: 10 more languages
Video and Training Materials for Golf Instruction