Human Language Technologies in a Multilingual Europe
Georg Rehm ([email protected])
DFKI GmbH, Language Technology Lab, Berlin, Germany
META-NET, General Secretary
Outline
• Multilingual Europe
• Analysis I: Technology Support for Europe’s Languages
• Analysis II: Status and Current Developments
• Example: LT for the Digital Single Market
• Missions and Opportunities
• Towards the Human Language Project
EP STOA Workshop: Language Equality in the Digital Age (10 Jan. 2017)

Multilingual Europe
• Multilingualism is at the very heart of the European idea.
• 24 EU languages – all languages have the same status.
• Dozens of regional and minority languages as well as languages of immigrants and trade partners.
• Economic challenges:
  – If the Digital Single Market (DSM) is not multilingual, there will be 20+ isolated markets!
  – Language barriers are market barriers!
• Social and public challenges:
  – Empower all citizens to use their mother tongues.
  – Provide multilingual digital public services.
  – Enable cross-border, cross-lingual and cross-cultural communication, towards a European public sphere and e-participation.
  – Restore trust in media (fake news debate, filter bubble issue, etc.).
Analysis I: Technology Support for Europe’s Languages
META-NET
• 60 research centres in 34 countries (founded in 2010)
• Chair of Executive Board: Jan Hajic (CUNI); Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
• General Secretary: Georg Rehm (DFKI)
• META (Multilingual Europe Technology Alliance): 826 members in 67 countries
• META-NET Strategic Research Agenda (published in 2013)
• META-NET White Paper Series (31 volumes; published in 2012)
• Projects: T4ME (META-NET), CESAR, METANET4U, META-NORD
Languages covered by the META-NET White Paper Series (* = official EU language):
Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh
http://www.meta-net.eu/whitepapers
Level of language technology support per area:

Machine Translation
• Excellent: none
• Good: English
• Moderate: French, Spanish
• Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
• Weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
• Excellent: none
• Good: English
• Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analytics
• Excellent: none
• Good: English
• Moderate: Dutch, French, German, Italian, Spanish
• Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
• Weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
• Excellent: none
• Good: English
• Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
• Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
• Weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
[Figure: overall level of language technology support (weak/none, fragmentary, moderate, good, excellent) per language. Languages shown, in chart order: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English. Languages with names in red have little or no MT support.]
Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
Important: even current state-of-the-art technologies are far from perfect!
Important: 20+ European languages are severely under-supported and face the danger of digital extinction.
[Figure: level of language technology support (weak/no support, fragmentary, moderate, good, excellent) plotted against millions of native speakers worldwide (0 to 400). Languages shown: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English.]
Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014), Reykjavik, Iceland, May 2014.
Analysis II: Status and Current Developments
• Multilingual Europe: our languages enjoy equal status, yet the majority of European languages face a very severe danger of digital extinction.
• Language Technology Research and Innovation in Europe: world-class research and excellent results (examples: the Moses SMT toolkit, recent neural MT results of QT21), a strong SME base and thousands of language service providers (LSPs); but also fragmentation and a need for coordination.
• Big need for high-quality, high-coverage, precise, robust, deployable Language Technologies: translation, conversational interfaces, text and media analytics, personal assistants, multilingual DSM etc.
• Artificial Intelligence: Important breakthroughs and massive investments in R&D and applications (mostly in US and Asia) – huge opportunity for Europe!
• The European Language Challenge cannot be abandoned or outsourced.
  – Europe must not make its digital infrastructure dependent on non-European solutions. This is why the EU is building GALILEO as an alternative to GPS, GLONASS and BeiDou.
• Big need for Language Technologies made in Europe for Europe!
Example: Language Technology for the Digital Single Market
• Top priority in the European Union.
• Expected to add €400 billion to European GDP and to create hundreds of thousands of new jobs.
• Unfortunately, the language topic is not included in the EC's Digital Single Market strategy (published in May 2015).
MDSM: Needed Applications
• Crosslingual SME presales communication and aftersales services
• Multilingual websites, product catalogues, product descriptions
• Crosslingual business intelligence (e.g., based on user-generated content)
• Crosslingual communication for SMEs, public institutions, citizens
• Multilingual (big) data, language and knowledge value chains
• Multilingual knowledge bases and knowledge graphs (and services)
• Multilingual conversational interfaces for connected devices (IoT)
• Crosslingual social media analytics for EU-wide societal issues
• Multilingual text and report generation (knowledge/data to text)
• All services must be domain-adaptable (avoid one size fits all)
• etc.
Multilingual Value Programme
• Multilingual Value Programme
  – Suggested three-year programme
  – Requires modest investment
• “Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content”
• Three components address the main needs of the Multilingual DSM (MDSM) and how to put them into practice:
  1. Multilingual Application Areas
  2. Multilingual Services
  3. Research
Strategic Research and Innovation Agenda: “Language as a Data Type and Key Challenge for Big Data. Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content.” SRIA Editorial Team, Version 0.9, July 2016.
• Version 1.0 to be published in 2017
Missions and Opportunities
• Languages & European Society: Enable all European citizens to communicate and operate in their mother tongues (online & offline).
• Languages & Media: Address, with technology, the massively increasing social, political and commercial relevance of content and communication (fake news debate, filter bubble challenge).
• Languages & Market: Realise the Multilingual DSM, including multilingual content, crosslingual text analytics and multilingual generation.
• Languages & Digital Tech: Future-proof our languages.
• Languages & Devices: Robust, precise, high-quality spoken language interfaces for billions of connected things, in all languages.
• Excellent opportunity for Europe: European research, education, industry, innovation and culture!
• Goal: Move Europe into pole position in this field!
Towards the Human Language Project
Multilingual Europe in January 2017
Areas:
• Multilingual Europe through Technology
• Multilingual Strategy of the EU: more tech support for multilingualism
• Language Technologies for Europe's digital public services (CEF AT, ELRC)
• Technologies for the Multilingual Digital Single Market: shared programme between EU and Member States (suggested MLV Programme)
• Language Technologies for Big Data text analytics
• Language Technologies R&D&I (H2020, WP 2018-20, incl. IoT, I4.0, assistants, robots etc.): open calls and upcoming service contracts
• The Human Language Project: long-term R&D&I, post-H2020
Milestones:
• Dec. 2016: EC brainstorming meeting on future LT priorities in Horizon 2020 and FP9. Need for a new strategy paper?
• Jan. 2017: STOA workshop and study on LT for Europe
• Dec. 2017: LT Session at BDVA Summit in Valencia
• 2017: MDSM SRIA V1.0 (currently: Strategic Research and Innovation Agenda “Language as a Data Type and Key Challenge for Big Data”, Version 0.9, July 2016)
• Policy change and initiative towards a European digital public sphere enabled by MT/LT
Commission services involved: DG CONNECT, DGT
Observations:
• Current initiatives are too small and unbalanced; they concentrate on innovation and technology deployment.
• Danger of losing touch with research and novel, potentially paradigm-shifting developments.
• Difficult to kick-start new, paradigm-shifting research.
• We need a coordinated, concerted and consolidated push in basic research, applied R&D and innovation!
Human Language Project – Interdisciplinary R&D&I Programme
• Basic Research: results in new methods and approaches
• Applied R&D: results in novel technologies
• Innovation: results in novel or improved products or services
Research Themes – Needs and Gaps (market-driven)
• Computational Linguistics
• Artificial Intelligence
• Language Technology
• Linguistics
• Computer Science
• Cognitive Science
• other related fields
• New, groundbreaking methods, paradigms, approaches
• Foster technologies, products, innovation, economy
• Foster education
HLP: Umbrella programme to turbo-charge and coordinate all European R&D&I activities in a systematic way, including EP, EC and Member States.
Human Language Project
• Goal: Deep Natural Language Understanding.
• Breakthroughs in Artificial Intelligence plus a fresh look at Linguistics for the Next Generation of LT!
• All official European languages and many additional languages
• Broad coverage, high quality, high precision
• Across modalities: text, text types, speech, image, video etc.
• Across platforms: messaging, telephony, social, mobile, IoT etc.
• Across cultures: knowledge, customs, formalities, humour, emotion, subjectivity, biases, opinions, filter bubble etc.
Human Language Project
• Collaboration and coordination between EC, EP, Member States and all other stakeholders.
• Mix of funding sources:
  – EU projects: Horizon 2020 (WP 2018-2020) + FP9 (2021+)
  – National/regional funding sources
• Setup: basic research, applied research, innovation and commercialisation, tightly intertwined
• Timeframe: 10 years
• Policy change towards “LT-enabled multilingualism”
• Public procurement: EU/EC and MS administrations should demand certain language technologies
HLP Topics: Key Ingredients for Future European LT Research
Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture, IoT etc.), combining:
• Knowledge Technology: extend knowledge bases; Semantic Web, ontologies, linked data, interoperability; more complex models; multilingual resources that are grounded and extensible; subjectivity, objectivity and further novel dimensions; web-scale reasoning
• Machine Learning: combine DNNs and symbolic processing; ML for knowledge acquisition and extension; DNNs embedded into modular systems including symbolic knowledge bases; make it possible to inspect and also to optimise DNNs (beyond end-to-end)
• Language Technology: (computational) linguistics research towards deep language understanding; from corpora to DNNs to annotated data to highly improved symbolic methods; language portability
Goal: Full and Deep Language Understanding by 2030 (Human Language Project)
Human Language Project
• Truly Multilingual Europe
• European Economy (MDSM)
• Attractive jobs for high potentials
• Education and young researchers
• Massive boost for research
• Foster innovation and new companies
Thank you!
Georg Rehm ([email protected])