taus machine translation showcase, machine translation at ebay, ebay, 2014
Post on 19-Jun-2015
Machine Translation at eBay
Saša Hasan
eBay Machine Translation Applied Science
TAUS MT Showcase
Vancouver, BC, Oct 29, 2014
Outline
– Introduction to eBay
– MT at eBay:
• Architectural overview
• Quality assurance
– Conclusion
eBay MT 2
Who We Are
– One of the world’s largest online marketplaces
– Connect buyers and sellers globally
– 152+M active users
– 800+M total listings (80% new)
– Enabled Commerce Volume in Q3 2014 was $63 billion
– Cross-border trade grew 27%, representing $14 billion of ECV
– Technical scale:
• 150PB of data storage (Hadoop)
• 10k nodes / 150+k cores
• Log aggregation: 8-10TB per day
• 300+M user queries per day
• 8.6B pages served per day
Cross-Border Trade (CBT) Source: Modern Spice Routes – PayPal report
Transaction Flow
(figure labels: RU, LATAM, FR, IT, ES, DE, PT)
Primary use cases
ebay.com in English
ebay.com in Russian
Query Translation (RU→EN)
= red shoes
Item Title Translation (EN→RU)
Primary use cases (cont’d)
Item Description Translation (EN→RU, on demand)
Supported languages
– QT: Query Translation
– ITT: Item Title Translation
– IDT: Item Description Translation
– M2MT: Member-to-Member Translation
Language QT ITT IDT M2MT
RU↔EN eBay eBay Bing Bing
PT↔EN eBay eBay Bing Bing
ES↔EN eBay eBay Bing Bing
FR↔EN eBay eBay … …
IT↔EN eBay … … …
DE↔EN … … …
EN↔DE … …
Translation modes: Realtime (QT), Near-Realtime (ITT), On-demand (IDT)
Notation, e.g. DE↔EN: query language ↔ inventory language
Architecture
Teams
– MT Engineering:
• Orchestration layer
• Deployment
• Monitoring
– MT Science:
• Data acquisition
• Engine training & analytics
• Quality improvements
– L10N / MTLS:
• Human translations
• Post-editing
• Evaluation & feedback
Technology stack
– Orchestration layer (Java)
– Core MT based on Moses (XMLRPC):
• Phrase-based decoder w/ out-of-the-box features
• Tuned heavily for translation throughput:
– 20 msec per user search query (online, “realtime”)
– <500 msec per item title (offline, “near-realtime”)
– Caching (MongoDB)
– Translated queries:
• 72M per day
• 99%-ile at 19 msec
– Translated titles:
• 30M per day
• 99%-ile at 100 msec
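The caching step in front of the MT core can be sketched roughly as below. This is a minimal illustration, not eBay's implementation: the class name `CachedTranslator`, the `(text, lang_pair)` backend signature, and the in-memory dictionary (standing in for a MongoDB-style store) are all assumptions made for the example. In a real deployment the backend callable would forward the request to the decoder.

```python
from typing import Callable, Dict, Tuple


class CachedTranslator:
    """Cache-aside wrapper around a translation backend (hypothetical sketch)."""

    def __init__(self, backend: Callable[[str, str], str]):
        # backend(text, lang_pair) -> translation; stands in for the MT core.
        self._backend = backend
        # In-memory dict used here in place of an external cache store.
        self._cache: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(text: str) -> str:
        # Collapse whitespace so trivially different inputs share one cache entry.
        return " ".join(text.split())

    def translate(self, text: str, lang_pair: str) -> str:
        normalized = self._normalize(text)
        key = (lang_pair, normalized)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: call the (slow) backend once and remember the result.
        self.misses += 1
        translation = self._backend(normalized, lang_pair)
        self._cache[key] = translation
        return translation
```

Repeated queries are then served from the cache without touching the decoder, which is one plausible way to keep tail latencies low for frequent queries.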
Technology stack (cont’d)
– In-domain data (Teradata):
• Item titles sampled from data warehouse
• Relevant categories based on #impressions
– User behavioral data (Hadoop):
• Analytics based on click-through statistics and other user engagement
eBay MT 16
Quality assurance
– Post-edited eBay-specific data sets for training (100+k)
– Human-translated eBay-specific data sets for tuning and testing (1+k)
– Automatic evaluations:
• Search recall,
• Brand names preservation, and
• English query preservation (e.g. for X→EN)
• Out-Of-Vocabulary rate, position-independent error rate (PER), BLEU
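Two of the automatic metrics listed above are simple enough to sketch directly. The snippet below assumes whitespace-tokenized input and uses one common bag-of-words definition of PER, (max(|hyp|, |ref|) - matches) / |ref|; the function names and the toy vocabulary are illustrative, not taken from the talk.

```python
from collections import Counter
from typing import Iterable, List, Set


def oov_rate(tokens: List[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that are not in the known vocabulary."""
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in vocabulary)
    return unknown / len(tokens)


def per(hypothesis: Iterable[str], reference: Iterable[str]) -> float:
    """Position-independent error rate: word order is ignored.

    Matches are counted as the multiset intersection of hypothesis and
    reference tokens; extra or missing words count as errors.
    """
    hyp, ref = list(hypothesis), list(reference)
    if not ref:
        raise ValueError("reference must contain at least one token")
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)
```

Because PER ignores word order, a hypothesis that contains exactly the reference words in a different order scores 0, which is why it is usually reported alongside an order-sensitive metric such as BLEU.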
Lang  Query        MT
FR    vitesse      speed
PT    duplo        double
PT    rei          king

Lang  Query        MT
PT    e cigarette  and cigarette
PT    cobra        snake
FR    car          because

RU query: сумки из натуральной кожи
Translation              #search results
bags of genuine leather  161
genuine leather bag      47,097
Quality assurance (cont’d)
– Human evaluations:
• Acceptability (QT), 1-5 ratings (ITT), internal release criteria
EN→PT
Item title: Triumph Stag Wind Deflector
Bing translation: Defletor de vento veado de triunfo
eBay translation: Triumph Stag defletor do vento
EN→PT
Item title: Authentic Coach Ladies Purse MEDIUM
Bing translation: Autêntico treinador senhoras bolsa médio
eBay translation: Authentic Coach Senhoras Bolsa Médio
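The two examples above suggest how a brand-name preservation check might work: brand phrases found in the source title should reappear verbatim in the translation. The sketch below is an assumption about how such a check could look, not the evaluation used at eBay; the brand list and function names are hypothetical.

```python
import re
from typing import Set


def brands_in(text: str, brands: Set[str]) -> Set[str]:
    """Return the brand phrases that occur in `text` as whole words (case-insensitive)."""
    found = set()
    for brand in brands:
        # Lookarounds prevent matches inside longer words.
        pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(brand)
    return found


def brand_preservation(source: str, translation: str, brands: Set[str]) -> float:
    """Share of brand phrases in the source that survive untranslated."""
    expected = brands_in(source, brands)
    if not expected:
        return 1.0  # nothing to preserve
    kept = brands_in(translation, brands)
    return len(expected & kept) / len(expected)


# Hypothetical brand list for illustration only.
BRANDS = {"Triumph Stag", "Coach"}
```

With this list, the first Bing output above ("Defletor de vento veado de triunfo") scores 0.0 and the corresponding eBay output scores 1.0, matching the point the slide makes.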
Summary
– eBay Machine Translation:
• Moses core
• Complex orchestration layer
• Optimized for speed & quality
• eBay-specific evaluation criteria
– Coordination among 3 teams:
• MT Engineering
• MT Applied Science
• Localization, MT language specialists
– Analytics and monitoring:
• User behavioral data
• System health
Discussion