Georg Rehm
German Research Center for Artificial Intelligence (DFKI) GmbH
Language Technology Lab – Berlin, Germany
META-NET, General Secretary
Towards a Human Language
Project for Multilingual Europe
AI and Interpretation
SCIC Universities Conference (19/20 April 2018) 2
Data Intelligence
Current breakthroughs based on Machine Learning (“Deep Learning”)
Also still in use: symbolic, rule-based methods and systems
Artificial Intelligence
• Huge data sets + powerful algorithms + extremely fast hardware
• Enormous potential for disruptions in all sectors and areas
• Since approx. 2015, with breakthroughs in neural technologies,
Machine Translation has been getting better and better.
• All areas of AI look for “super-human performance” but
language is fundamentally different and much more complex.
• Neural AI approaches cannot understand language, they
process it according to huge underlying data sets.
• In many use cases, mistakes can be tolerated.
• But: translation and interpretation are often mission-critical!
• Mistakes can have serious consequences (politics, medicine).
Translation and Interpretation
• Example: Lecture Translator
– University lectures are automatically transcribed and translated,
in near-real time, into several languages
– Students can follow the translation through a web interface
• Example: Presentation Translator
– Presenter can have the speech automatically translated
– Translations are displayed as subtitles
• Example: Call Translator
– Internet telephony provider offers automatic voice translation
Speech Translation
• The three example applications work surprisingly well for
general-domain language and input. But:
– They are far from being perfect.
– They aren’t robust.
– They cannot cope with unforeseen situations.
– They cannot understand language as humans do.
– They are not (yet?) suited for conference interpretation.
➢ This limits the fields in which they can be applied.
• Interpretation is often mission-critical.
➢ Human interpreters won’t be replaced anytime soon.
Issues and Limitations
https://slator.com/features/ai-interpreter-fail-at-china-summit-sparks-debate-about-future-of-profession/
• LT in Europe: World class research, strong SME base, thousands
of LSPs; immense fragmentation; need for coordination.
• Need for High-Quality LT: translation, interpretation, MDSM etc.
• The European Language Challenge cannot be – it must not be –
abandoned or outsourced!
➢ Need for Language Technology, made in Europe, for Europe!
➢ STOA Workshop in the EP (January 2017): “Language equality in
the digital age – towards a Human Language Project”
LT – Current Developments
• Goal: Deep Natural Language Understanding by 2030
• Vision: EU FET Flagship Project (10+ years)
• Broad coverage, high quality, high precision
• Create approaches, algorithms, data sets, resources
• Across modalities: text, text types, speech, video etc.
Artificial Intelligence
including cognition, perception, vision,
cross-modal, cross-platform, cross-culture etc.
Machine Learning
Language Technology
Linguistics
Human Language Project
Summary & Conclusions
• AI is disrupting all industries – including translation
and, increasingly, also interpretation.
➢ But: perfect, robust, precise language technologies (incl.
written/spoken MT and interpretation) are still far away.
• Linguists are increasingly needed – new profiles emerging
➢ The machine will support human experts and help them
become more efficient – it will not replace them.
• The Human Language Project is still a vision. Its goal:
develop new breakthroughs in Language Technology.
Recommendation
• SCIC Speech Repository
• 4,000 speeches (3,000 public + 1,000 private)
• Extremely interesting data set and language resource for
Language Technology researchers!
• Many R&D groups currently work on TED talk data sets
• Recommendation: establish bridges between SCIC
and research groups for spoken language translation
• Help build the next generation of AI tools for interpreters
• AI tools that are tailored to the needs and wishes, topics
and domains of conference interpreters in the EC/EP
Thank you!
Dr. Georg Rehm
DFKI Berlin
👉🏻 http://de.linkedin.com/in/georgrehm
👉🏻 https://www.slideshare.net/georgrehm
Strategic Research and Innovation Agenda
Language Technologies for
Multilingual Europe
Towards a Human Language Project
SRIA Editorial Team
Version 1.0 – December 2017
• Multilingualism is at the heart of the European idea
• 24 EU languages – all have the same status
• Dozens of regional and minority languages as well as
languages of immigrants and trade partners
• Many economic and social challenges:
– The Digital Single Market needs to be multilingual
– Cross-border, cross-lingual, cross-cultural
communication
60 research centres in 34 countries (founded in 2010)
Chair of Executive Board: Jan Hajic (CUNI)
Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
General Secretary: Georg Rehm (DFKI)
Multilingual Europe Technology Alliance: 826 members in 67 countries
(published in 2013) (31 volumes; published in 2012)
T4ME (META-NET), CESAR, METANET4U, META-NORD
Basque
Bulgarian*
Catalan
Croatian*
Czech*
Danish*
Dutch*
English*
Estonian*
Finnish*
French*
Galician
German*
Greek*
Hungarian*
Icelandic
Irish*
Italian*
Latvian*
Lithuanian*
Maltese*
Norwegian
Polish*
Portuguese*
Romanian*
Serbian
Slovak*
Slovene*
Spanish*
Swedish*
Welsh
* Official EU language
http://www.meta-net.eu/whitepapers
Level of support through Language Technology (LT), by area:
• MT
– excellent: (none)
– good: English
– moderate: French, Spanish
– fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
– weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
• Speech
– excellent: (none)
– good: English
– moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
– fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
– weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
• Text Analytics
– excellent: (none)
– good: English
– moderate: Dutch, French, German, Italian, Spanish
– fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
– weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
• Resources
– excellent: (none)
– good: English
– moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
– fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
– weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg,
New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
We carried out the study in 2010/2012. While support
for many languages has improved in the meantime,
the overall picture remains mostly the same.