Machine Translation: The Neural Frontier
TRANSCRIPT
John Tinsley
GALA, Amsterdam, March 2017
Source: http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf
What we’re actually going to cover this morning!
How does it work?
What’s all the fuss about?
“Neural machine translation is ______.”
What is the status as of today?
Is it really that good?
What does all this mean for the future?
What they actually said... “In some cases human and GNMT translations are nearly indistinguishable on the relatively simplistic and isolated sentences sampled from Wikipedia and news articles for this experiment.”
What was reported...
MT developers around the world
Evolution or Revolution?
Source: (modified from) http://nlp.stanford.edu/projects/nmt/Luong-Cho-Manning-NMT-ACL2016-v4.pdf
A brief history of MT…
[Timeline: Rule-Based → Statistical → Neural]
“State of the Union”
[Chart: MT quality over time. Statistical Machine Translation: 20+ years’ worth of research. Neural Machine Translation: ?]
The initial splash made by statistical MT: “wow, that’s pretty good!” (March 27th, 2007). The initial splash made by neural MT: we’re about here now, and this is where the excitement is coming from.
Neural machine translation is exciting!
Neural machine translation is the future
Neural machine translation is ultimately just another type of MT
Neural machine translation is not going to replace human translators
Neural machine translation is not a silver bullet
Neural Machine Translation: March 27th, 2017
Academia / Industry
• Still early stage
• Language independent
• Fundamental practical considerations not yet addressed
• Generic applications only
• No flexibility for customisation
• Significant hurdles for cost-effective, scalable production performance
Output can be insanely fluent!
Source: https://www.nytimes.com/2016/12/14/magazine/the-great-ai-awakening.html
They needed more computers — “G.P.U.s,” graphics processors reconfigured for neural networks — for training…
“Should we ask for a thousand G.P.U.s?”
“Why not 2,000?”
Ten days later, they had the additional 2,000 processors.
Is it really that good? (Yes, it can be!)
What evaluations are out there?
Anecdotal
• “Yeah, it looks better”
Academic
• Generally, neural is better*
• More obviously so for complex languages
• It falls over badly on long sentences
WIPO
• Stark improvements for Chinese and Arabic
• Comparable performance on other languages
WIPO: large-scale, apples-to-apples comparison
English to Chinese
Arabic to Chinese
Spanish to Chinese
French to Chinese
What evaluations are out there? (continued)
Iconic
• Practical comparison with production MT
• Mixed results depending on content type
• Clear strengths and weaknesses emerging
“Real-world” comparative use case
• Real-world languages and content: Chinese-to-English patents, mature production engine, highly tuned.
• Apples-to-apples comparison: access to the same training data and test data, including all of the ugly parts.
• Effective qualitative evaluation: there is no one-size-fits-all, so where is MT good, and where does it fall down?
[Chart: results on Short Sentences and on All Sentences, comparing Iconic Production MT with Iconic Neural MT]
Neural MT works – and it’s good! It is not a silver bullet.
+ word order
+ agreement
+ terminology
+ error-free output
− omitting phrases
− sentence structure
New Opportunities = New Challenges
• Black Box: “Why is this error happening?”
• Customisation: “Can you fix this error, please?”
• Production: “How much is that GPU??!”
Old Challenges: Data, Evaluation, Pricing. Still needed, now more than ever!
• Do we know how to quantify “quality”?
• How much does it cost now?
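On quantifying “quality”: the standard automatic proxy in MT evaluation is BLEU, which scores n-gram overlap between a system’s output and human reference translations. A minimal sketch, assuming the sacrebleu Python package; the example sentences are invented for illustration:

```python
# Hypothetical example: scoring one MT output against one human
# reference with BLEU, using the sacrebleu package.
import sacrebleu

hypotheses = ["the cat sat on the mat"]           # MT output (invented)
references = [["the cat is sitting on the mat"]]  # one reference stream (invented)

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # more n-gram overlap => higher score
```

Automatic scores like this are cheap to run but only correlate loosely with human judgement, which is why the evaluation question stays open.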
What does this mean for the future?
Short term: research, which takes time; more effective use of general machine translation
2-5 years: emerging use cases, new types of hybrid, and clarity
Longer term: “zero-shot” translation?
[Timeline: Rule-based → Statistical → Neural. You are here.]
[Diagram: Gaelic input “GO RAIBH MAITH AGAT” → 1st recurrent neural network (Encoder) → encoded sentence, a vector of numbers → 2nd recurrent neural network (Decoder) → English output “THANK YOU”. Memory of previously translated words influences the next result.]
Thank you!
P.S. This is kind of how neural machine translation works…
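For the curious, here is a minimal sketch of the encoder-decoder idea from the diagram above, in Python with PyTorch. The toy vocabularies, the Seq2Seq class, and the hidden size are all invented for illustration, and the model is untrained; it only shows the shape of the computation: the first recurrent network encodes the Gaelic sentence into a single vector of numbers, and the second decodes English words one at a time, with the recurrent state acting as the memory of previously translated words.

```python
# A minimal, untrained sketch of a recurrent encoder-decoder.
# All names and sizes are hypothetical; real NMT systems are trained
# on millions of sentence pairs and add attention, beam search, etc.
import torch
import torch.nn as nn

# Toy vocabularies (invented): source is Gaelic, target is English.
SRC = {"<pad>": 0, "go": 1, "raibh": 2, "maith": 3, "agat": 4}
TGT = {"<sos>": 0, "<eos>": 1, "thank": 2, "you": 3}
TGT_INV = {i: w for w, i in TGT.items()}

class Seq2Seq(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(len(SRC), hidden)
        self.tgt_emb = nn.Embedding(len(TGT), hidden)
        # 1st recurrent neural network: the encoder.
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # 2nd recurrent neural network: the decoder.
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, len(TGT))

    def forward(self, src_ids, max_len=5):
        # Encode the whole source sentence into one state vector
        # (the "encoded sentence" in the diagram).
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode one word at a time; the recurrent state carries the
        # memory of previously translated words into the next step.
        token = torch.tensor([[TGT["<sos>"]]])
        words = []
        for _ in range(max_len):
            output, state = self.decoder(self.tgt_emb(token), state)
            token = self.out(output).argmax(dim=-1)
            word = TGT_INV[token.item()]
            if word == "<eos>":
                break
            words.append(word)
        return words

model = Seq2Seq()
src = torch.tensor([[SRC["go"], SRC["raibh"], SRC["maith"], SRC["agat"]]])
print(model(src))  # untrained weights, so the output is random TGT words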
[email protected] @johntins