H2O World - Machine Translation in Mobile Games - Nikhil Bojja


Upload: srisatish-ambati

Post on 06-Jan-2017


H2O World 2015

MT in Mobile Games:

Social Media Text Normalization with incentivized feedback

An MMORPG Mobile Game

Ø  Game of War: Fire Age
o  A Massively Multiplayer Online Role Playing Game

Ø  Millions of Downloads, Top grossing game on iOS and Android

Why Translation?

Ø  A truly global ‘One World’

Ø  Only 75% of players speak English
o  25%: Japanese, Russian, French, Spanish, German, Turkish, Portuguese and others

Ø  Seamless communication drives Player Engagement

Ø  Tighter community - Long Term Relationships

“Hi guys a cyclone is due to hit us in about 4 hours. Im unsure how electricity will be affected. Im peaced for 2 days. If you see my accounts unpeaced keep an eye on them as it means im unable to access the game. wish me luck”

Realtime MT

Need for Nrmlizatn

Ø  Mobile game players use a lot of slang
o  Hard to type on mobile devices
o  Players are busy playing the game

Ø  Constantly evolving language

Ø  Regional, Linguistic influences

o  Maori Slang
o  Emoticons and Emoji

Ø  Slang text affects Machine Translation accuracy

Ø  Hence MZ Transformer – A Social Media Text Normalization system

The Data Problem

Ø  Hard to get a parallel corpus of Slang-Grammatical text

Ø  Hard to compile Abbreviation lists, Spelling error lists for non-English languages

Ø  Expensive to create

Ø  Domain differs from Microblogs

Avg. Tweet Length     Avg. GoW chat length
73.51 characters      34.43 characters

Crowdsourcing + Game Economy

Ø  In-game currency buys Virtual goods

Ø  Competitive game

Ø  Crowdsource parallel corpus creation

Ø  In-game items and/or currency as rewards

Ø  Steady stream of normalization training data

Ø  Across languages and over time

Ø  Rate modification with game economy


Crowdsourced Training Corpus

Ø  1-best hypothesis selected from data collected

Ø  Incentivized feedback loop

Source Phrase: yo wasup zack .. i just wakey

Response Received                          Num. Players
Yo, what’s up Zack? I just woke up.        1013
Hi, what’s up Zack? I just woke up.        327
Hey, what’s up Zack? I just woke up.       133
What’s up Zack? I just woke up.            61
To what’s up Zack? I just work up.         12
Yo what’s up Zack. I just awoke            3
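
The 1-best selection over the responses above can be sketched as a simple plurality vote. This is only an illustration: the slides say a 1-best hypothesis is selected from the collected data but do not say how, so the voting rule and the `one_best` helper are assumptions. The counts are the ones from the table.

```python
from collections import Counter

# Normalizations submitted by players for the source phrase
# "yo wasup zack .. i just wakey", with the number of players who
# submitted each one (counts taken from the table).
responses = Counter({
    "Yo, what's up Zack? I just woke up.": 1013,
    "Hi, what's up Zack? I just woke up.": 327,
    "Hey, what's up Zack? I just woke up.": 133,
    "What's up Zack? I just woke up.": 61,
    "To what's up Zack? I just work up.": 12,
    "Yo what's up Zack. I just awoke": 3,
})


def one_best(votes):
    """Return the most frequent response and its share of all votes.

    Plurality voting is an assumed selection rule, not one stated
    in the slides.
    """
    best, count = votes.most_common(1)[0]
    return best, count / sum(votes.values())


best, share = one_best(responses)
print(best)
print(f"share of votes: {share:.1%}")
```

Under this rule the noisy minority answers (such as the one with "work up") never reach the training corpus, which is one plausible reason to prefer a 1-best over keeping every response.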

MZ Transformer

Ø  Normalization as a pre-step before MT
o  Abbreviations
o  Spelling Errors
o  Phrase pairs
o  HMM based Text Normalization system
o  Word alignment
o  Phrase based Text normalization system built on a parallel corpus
o  P(gramm_phrase/slang_phrase)
o  HMM decoder to generate target language
o  Language model
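
A minimal sketch of how a normalization table and a language model can be combined in an HMM-style (Viterbi) decoder. It is not the MZ Transformer implementation: it works on single tokens rather than phrases, and every name and probability below (`PHRASE_TABLE`, `BIGRAM`, `UNSEEN`, `normalize`) is a made-up placeholder. In the system described above, the table would be learned from the crowdsourced parallel corpus via word alignment.

```python
import math

# Hypothetical table of P(grammatical | slang); values are invented.
PHRASE_TABLE = {
    "wasup": {"what's up": 0.9, "was up": 0.1},
    "2": {"to": 0.4, "two": 0.4, "too": 0.2},
}

# Hypothetical bigram language model P(word | previous word); invented.
BIGRAM = {
    ("i", "want"): 0.1, ("want", "to"): 0.3, ("to", "go"): 0.2,
    ("want", "two"): 0.01,
    ("i", "have"): 0.1, ("have", "to"): 0.2,
    ("have", "two"): 0.05, ("two", "cats"): 0.1,
}
UNSEEN = 1e-4  # crude floor for bigrams missing from the toy model


def normalize(tokens):
    """Viterbi search for the best-scoring grammatical sequence."""
    # best[state] = (log score, path) of the best hypothesis ending in state
    best = {"<s>": (0.0, [])}
    for tok in tokens:
        # Tokens without a table entry pass through unchanged.
        candidates = PHRASE_TABLE.get(tok, {tok: 1.0})
        new_best = {}
        for cand, p_norm in candidates.items():
            for prev, (score, path) in best.items():
                p_lm = BIGRAM.get((prev, cand), UNSEEN)
                s = score + math.log(p_norm) + math.log(p_lm)
                if cand not in new_best or s > new_best[cand][0]:
                    new_best[cand] = (s, path + [cand])
        best = new_best
    return " ".join(max(best.values(), key=lambda sp: sp[0])[1])


print(normalize("i want 2 go".split()))
print(normalize("i have 2 cats".split()))
```

In the toy table "to" and "two" are equally likely normalizations of "2", so the language model alone decides between them from the neighbouring words.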


BLEU Score Improvements

Source Language   Target Language   w/o MZ Transformer   w/ MZ Transformer
Spanish           English           37.82                39.77
English           Spanish           31.29                32.87
French            English           46.30                47.73
English           French            31.90                33.19
German            English           41.02                43.98
English           German            26.92                26.96
Portuguese        English           50.94                52.13
English           Portuguese        38.09                38.12
Russian           English           38.64                40.17
English           Russian           24.80                25.43
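
The per-pair gains implied by the table can be worked out with a few lines of arithmetic. The scores are copied from the table; the grouping by translation direction and the averages are derived here and are not figures reported in the slides.

```python
# (source, target, BLEU without MZ Transformer, BLEU with MZ Transformer)
rows = [
    ("Spanish", "English", 37.82, 39.77),
    ("English", "Spanish", 31.29, 32.87),
    ("French", "English", 46.30, 47.73),
    ("English", "French", 31.90, 33.19),
    ("German", "English", 41.02, 43.98),
    ("English", "German", 26.92, 26.96),
    ("Portuguese", "English", 50.94, 52.13),
    ("English", "Portuguese", 38.09, 38.12),
    ("Russian", "English", 38.64, 40.17),
    ("English", "Russian", 24.80, 25.43),
]

# Absolute BLEU gain per language pair, rounded to the table's precision.
gains = {(src, tgt): round(after - before, 2) for src, tgt, before, after in rows}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


into_english = mean(g for (src, tgt), g in gains.items() if tgt == "English")
from_english = mean(g for (src, tgt), g in gains.items() if src == "English")
best_pair = max(gains, key=gains.get)

for (src, tgt), g in gains.items():
    print(f"{src:>10} -> {tgt:<10} +{g:.2f}")
print(f"mean gain into English: {into_english:.3f}")
print(f"mean gain from English: {from_english:.3f}")
print("largest gain:", " -> ".join(best_pair))
```

On these numbers every pair improves, with larger average gains when translating into English (about 1.8 BLEU) than out of English (about 0.7).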

Conclusions

Ø  Normalization helps!

Ø  Crowdsourced data collection is low cost and faster

Ø  More Normalization layers and training data – higher improvement

Ø  lol → mdr

Ø  Using 10-best instead of 1-best caused overfitting

Ø  Normalization across languages

Future Work

Ø  Crowdsourcing system can be used for:
o  Text Translation
o  Speech Transcription
o  mTurk style tasks

Ø  Collect data in resource poor languages
o  Bulgarian, Malay, Slovak, Ukrainian

H2O World 2015

Questions?
