META-NET has received funding from the EU’s Horizon 2020 research and innovation programme through the contract CRACKER (grant agreement no.: 645357). META-NET was formerly co-funded by FP7 and ICT PSP through the contracts T4ME (grant agreement no.: 249119), CESAR (grant agreement no.: 271022), METANET4U (grant agreement no.: 270893) and META-NORD (grant agreement no.: 270899).
Multilingualism for Digital Europe
Georg Rehm
General Secretary META-NET, Coordinator CRACKER
DFKI
Lecture series “Digitale Lebenswelten” (Digital Lifeworlds), Universität Hildesheim, 15 November 2016
Outline
• A Multilingual Europe Initiative: META-NET
  ▪ LT Support – META-NET White Paper Series
  ▪ LT Strategy – META-NET SRA
• Continuing the Initiative – Recent Developments
  ▪ The Digital Single Market and Multilingualism
  ▪ Cracking the Language Barrier
  ▪ META-FORUM 2015/2016 – MDSM SRIA V0.5/V0.9
• Goals and Next Steps
http://www.meta-net.eu
META-NET and META: Brief History
Multilingual Europe in 2010
• Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.
• While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years.
• All stakeholders (researchers, LT industries, policy makers, language communities, funding programmes) should team up in a strategic alliance for a major dedicated push.
• META-NET: 60 research centres in 34 countries (founded in 2010)
  Chair of Executive Board: Jan Hajič (CUNI); Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde); General Secretary: Georg Rehm (DFKI)
• META: Multilingual Europe Technology Alliance, 826 members in 67 countries
META-NET Strategic Research Agenda (published in 2013); META-NET White Paper Series (31 volumes; published in 2012)
Projects: T4ME (META-NET), CESAR, METANET4U, META-NORD
META-NET White Paper Series
Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*, Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian, Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh
* Official EU language
http://www.meta-net.eu/whitepapers
Cross-Lingual Comparison
• Four areas: 1. Machine Translation, 2. Text Analytics, 3. Speech Processing/Synthesis, 4. Language Resources
• Ranking: from excellent LT support to weak/no LT support.
• The cross-lingual comparison was discussed and finalised at a network meeting with representatives of all languages (October 2011).
Machine Translation
  Excellent: none
  Good: English
  Moderate: French, Spanish
  Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  Weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
  Excellent: none
  Good: English
  Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analytics
  Excellent: none
  Good: English
  Moderate: Dutch, French, German, Italian, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  Weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
  Excellent: none
  Good: English
  Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  Weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
[Figure: level of support (excellent, good, moderate, fragmentary, weak/none) for each language, ordered from Welsh to English]
Languages with names in red have little or no MT support
Results of the META-NET White Paper Study (2012)
Observations and Results
• When it comes to technology support, there are massive differences between Europe’s languages and technology areas.
• Support for English is ahead of any other language.
• But: even support for English is far from being perfect.
• Several languages get the weakest score in all four areas (e.g., Icelandic, Latvian, Lithuanian, Maltese)!
Digital Language Extinction!
• “At Least 21 European Languages in Danger of Digital Extinction!”
• Press release on the European Day of Languages (Sept. 26, 2012).
• Huge global interest in the topic and our key findings!
• 600+ mentions in the press.
• News from 40+ countries in 35+ different languages.
• 20+ television reports and 30+ broadcast interviews (radio, TV) with META-NET representatives.
• Two Parliamentary Questions in the EP on the “digital extinction of languages” topic.
• These results led to a STOA Workshop in the EP (Dec. 3, 2013).
Cancer patients to get faster treatment
Build-up. The Capital Region (Region Hovedstaden) is spending DKK 32 million to increase treatment capacity.
By Flemming Steen Pedersen
Far more cancer patients in the capital area are to be treated quickly and without delay.
There should be an end to examination and treatment dragging on and exceeding the time limits that professionals have set to give patients the best possible chance of surviving the dreaded disease.
That is the goal as politicians in the Capital Region now propose to allocate a pool of DKK 32 million to increase staffing and expand cancer treatment capacity at a number of the region's hospitals.
The money comes after the region was criticised because far too many cancer patients take too long to get through the system. For example, according to the latest figures, only just over half of women with breast cancer are treated within the 18-day target set in the so-called cancer packages.
“The money means better conditions for cancer patients. It is important that people can be treated quickly so they do not have to walk around worrying,” says Kirsten Lee (R), chair of the quality committee in the Capital Region.
More people get cancer, and more survive
Specifically, the intention is to expand oncological capacity, that is, radiotherapy and chemotherapy, at Rigshospitalet, Herlev Hospital, Hillerød Hospital and Bornholms Hospital.
In addition, money is set aside to increase the number of operations and expand outpatient capacity in urology at Herlev, Bispebjerg and Frederiksberg. Besides the long waiting times for breast cancer patients, there are also prostate cancer patients who wait too long. Also on the agenda is faster treatment for a third group, patients with head and neck cancer, where a large number of patients likewise have to wait longer than the 16-day limit.
Besides adding more money, the region is also considering introducing so-called service targets for the share of patients who must start treatment within the time limits set in the cancer packages. Similar service targets already exist in Region Midtjylland and Region Syddanmark and are seen as a means of putting pressure on hospitals and signalling that certain areas receive particularly close political attention.
In those two regions the target is for 90 and 95 per cent of patients, respectively, to get through the system within the pathway times, and Kirsten Lee expects any service target in the Capital Region to be set at a similar level.
At the Danish Cancer Society (Kræftens Bekæmpelse), director Leif Vestergaard Pedersen welcomes the fact that the Capital Region is now spending DKK 32 million to expand capacity.
“It has turned out that there is room for improvement in this area, so it is good that it is being prioritised. More and more people get cancer, and more and more survive. That means capacity has to be increased gradually all the time. Service targets are a good initiative, and a target of 90-95 per cent is probably realistic, even though the starting point should be 100 per cent,” says Leif Vestergaard Pedersen, adding:
“But then it is also important to stick to that target and not be satisfied with 80 or 85 per cent getting through on time.”
Poor language technology threatens Danish on the internet
Words. Researchers are working to improve Danish translations on the internet.
By Jens Ejsing
The Danish language is having a hard time in the digital world.
That is the conclusion of Danish language researchers and experts in connection with the new international META-NET study, which takes a closer look at how a long list of smaller European languages such as Danish are faring in the digital world.
The researchers, from the University of Copenhagen and the Danish Language Council (Dansk Sprognævn) among others, conclude that Danish may have an even harder time in the digital world in future, because Google Translate, GPS devices, smartphone apps and other language technology programs cannot adequately handle the many nuances of the Danish language.
Bolette Sandford Pedersen, professor of language technology at the University of Copenhagen, believes that a kind of digital Danish language bank filled with data is needed so that translations, among other things, become as accurate and good as possible. With the help of such a language bank, she says, researchers can help companies improve programs that handle linguistic knowledge for machine translation, speech recognition and information retrieval, among other things.
This would make erroneous translations rarer, such as when Google Translate renders »hæld olie på panden« (“pour oil into the pan”) as “pour oil on the forehead” in English. At worst, translations are so inaccurate that Danes end up opting out of their own language in the digital world.
Language help for companies
She acknowledges, however, that “the technology for automatic translation is in many ways fantastic”.
“It just isn't good enough when it comes to Danish,” she says:
“It is as if, to some extent, we leave it to Google or other companies to decide whether Danish is handled well enough or not. But the Danish market is not big for them. The question is therefore whether we should do more ourselves to ensure that the necessary data is available, so that we get good translations and other good language technology. That could be, for example, by making an effort to establish a language bank with a lot of enriched material on Danish.”
“If we keep finding that translations are riddled with errors, we will not dare to trust them,” she says, stressing that “faulty translations can lead to major misunderstandings”.
According to Sabine Kirchmeier-Andersen, director of the Danish Language Council, poor language technology can have consequences for the many Danes who are not very good at English.
“If we have ambitions to use the Danish language in the technological universe of the future, an effort must be made now to retain expertise and build on the knowledge we have,” she says:
“Otherwise we risk that only people who speak fluent English will benefit from the new generations of web, telecom and robot technology that are on the way.”
Cancer treatment drags on
The so-called cancer packages, introduced in 2008 and 2009 to give Danish cancer patients much faster examination and treatment, describe a standard diagnostic and treatment pathway: which examinations and treatments are to be carried out, and the maximum time allowed for each activity. Figures from the Capital Region show, however, that a large share of patients are not treated within the set time limits, and that the problems are concentrated in three cancers: breast cancer, head and neck cancer, and prostate cancer.
Share of patients treated within / outside the target time:
  Breast cancer (target: 18 days): 53% within, 47% outside
  Head and neck cancer (target: 16 days): 60% within, 40% outside
  Prostate cancer (target: 35-39 days): 76% within, 24% outside
INFOGRAPHIC: HENRIK KIÆR / TEXT: FLEMMING STEEN PEDERSEN / SOURCE: REGION HOVEDSTADEN
Facts: Languages in Europe
• There are around 80 languages in the EU. For 21 of them, Danish included, there are major gaps in language technology, for example in machine translation, speech recognition and information retrieval.
• According to an EU survey, a growing number of European internet users buy goods or services online in a language that is not their own. This applies to more than half of users.
• More than one in three use a foreign language to write e-mails or posts online.
EDITED BY JOANNA VALLENTIN. LAYOUT: JACOB FRIIS / NATIONAL / 06. BERLINGSKE / SECTION 1 / SATURDAY 22.09.2012
ELEFTHEROS TYPOS, Thursday 27 September 2012, Life
MANY EUROPEAN LANGUAGES ARE CONSIDERED TECHNOLOGICALLY… OBSOLETE
Greek is at risk of digital extinction
ELENI VERGOU
In the digital age, nobody… speaks Greek, nor several other European languages, according to a pan-European report signed by more than 200 experts. The study was published by the scientific network META-NET on the occasion of yesterday's European Day of Languages.
For their research, linguists from 34 countries of the Old Continent rated the available language services and produced a “White Paper” for each European language. Among other things, the experts looked for four key electronic tools: the availability of machine translation, the possibility of speech interaction and of digital text analysis; the availability of language resources was examined as well.
First, they examined websites that let users translate online, such as the Google Translate service of the IT giant. At the same time, they examined how Greek-speaking users “communicate” with their… devices, for example whether one can “talk” to a GPS device in one's mother tongue. The researchers concluded that such devices exist but are not as widespread as English-language ones. The “gold medal” goes, as one would expect, to English. English-speaking users have the best possible technology support, which favours the further spread of the language. Icelandic, Latvian, Lithuanian and Maltese are most at risk of “technological exclusion”, while Greek, Bulgarian, Hungarian and Polish are slightly better off, with what the study calls “fragmentary” technology support.
Support for users of Dutch, French, German, Italian and Spanish is rated “moderate”. The heads of the research team, Hans Uszkoreit and Georg Rehm, put it this way: “There are dramatic differences in language technology support between the various European languages. The gap between ‘small’ and ‘big’ languages keeps widening. We must make sure that the smaller and less-resourced languages are equipped with the necessary basic technologies. Otherwise, these languages are doomed to digital extinction.”
Indeed, the experts stress that without decisive action these languages will hardly… survive in the digital world of the 21st century. Maria Gavrilidou, a member of the research team from the Institute for Language and Speech Processing of the Athena Research Center, tells Eleftheros Typos: “This study does not say that the Greek language will not live on or that it is in danger of extinction.” She explains that as long as there are people who speak, write and communicate in a language, it will continue to exist. It is important, however, that all users are able to “talk” to machines, such as their GPS devices, in Greek and have computer language tools at their disposal.
These “tools” include spelling and grammar checkers, which are used daily by hundreds of Greek users and are based on language technology. Nevertheless, she stresses that the digital spread of a language is important: “It is not in the hands of the average user. Governments, the European Union and the private sector must fund the development of this technology for all languages,” she says, adding: “Users, however, must demand that these tools also exist in their own language and not settle for English.”
GREEKLISH: The language of alienation…
Most young people in Greece now communicate in Greeklish via text messages or e-mail. Although language tools that allow the use of the Greek alphabet have been available in recent years, teenagers and young adults do not seem to have “embraced” these technologies. Linguistics professor Giorgos Babiniotis tells Eleftheros Typos: “Greeklish is a problem for the Greek language, especially for young people, for a purely linguistic reason. By using Greeklish they become alienated from the form of the word, or what we call its etymological image, which is expressed by the spelling of the word and is linked both to its meaning and to its origin.” The danger young people face is alienation from the written form of the language. Yet this “familiarity” helps in understanding both the meaning and the origin of a word. “This alienation is not without significance,” says the expert, explaining that the act of writing helps to imprint a word and connect it with other words sharing the same root. “When this form of communication is used, they are destroyed, they fade. It is not fatal, but it will do damage,” says Mr Babiniotis, who advises users to choose the Greek alphabet.
Photo: Giorgos Babiniotis.
Date 30 September 2012 Page 16
Sunday 30 September 2012, World section, page 49
They lost my… language
Most European languages are at risk of digital extinction
26 September has been established by the Council of Europe as the European Day of Languages, but according to a new European scientific report, 21 of the 30 languages of Europe, Greek among them, face the risk of digital extinction. The study sounds the alarm, having found that digital support for most European languages is inadequate or entirely non-existent for users.
Swallowed up by the common languages
The report, in the form of a series of White Papers (titled “Languages in the European Information Society”) by the scientific network META-NET, which brings together 60 research centres in 34 countries, points out that languages spoken by relatively small numbers of people are at risk because they lack the technology support that widely used languages enjoy. White Papers have been prepared for the following European languages: English, Basque, Bulgarian, Galician, French, German, Danish, Greek, Estonian, Irish, Icelandic, Spanish, Italian, Catalan, Croatian, Latvian, Lithuanian, Maltese, Norwegian (Bokmål and Nynorsk), Dutch, Hungarian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovene, Swedish, Czech and Finnish. Each White Paper is written in the language it covers and translated into English.
Four at greatest risk
According to the new study, Icelandic, Latvian, Lithuanian and Maltese face the greatest risk of extinction in a European technological society that increasingly promotes the use of particular languages, English above all. But other languages, such as Greek, Bulgarian, Hungarian and Polish, are also at risk in today's digital world. The META-NET study, to which more than 200 experts contributed, assesses the risk for each language against four key technological/digital criteria: the availability of machine translation for the language, the possibility of speech interaction, the possibility of digital text analysis, and the availability of the relevant digital language resources.
The strong ones
The language with the best score on these criteria is, of course, English, which enjoys comparatively the best technology support (though not the best possible), a fact that facilitates its further spread.
Next, with satisfactory or moderate technological/digital support, come Dutch, French, German, Italian and Spanish. Greek, like Basque, Catalan, Polish, Hungarian and others, is classed among the languages with only “fragmentary” support, which is precisely why these are considered at high risk of extinction.
Dramatic differences
According to the study's editors, Hans Uszkoreit and Georg Rehm, “there are dramatic differences in language technology support between the various European languages and technology areas. The gap between ‘small’ and ‘big’ languages keeps widening. We must make sure that the smaller and less-resourced languages are equipped with the necessary basic technologies; otherwise these languages are doomed to digital extinction.” The hope for these languages lies in the improvement and wider use of language technology software, which enables the spoken and written processing of different languages. Examples include electronic spelling and grammar checkers, the interactive personal “assistants” of smartphones (e.g., Siri on the iPhone), machine translation systems, automated dialogue systems in call centres, search engines, and the synthetic voices of car navigation systems.
The basic problem
What matters, according to the report, is that all these capabilities are also offered to users in their mother tongue, even where that language is at risk of extinction. Without decisive action, the gloomy forecast is that these languages will hardly survive in the digital world of the 21st century. One problem is that the software behind these language technology systems relies on statistical methods that require huge amounts of written or spoken data, and that much data is hard to obtain for languages spoken by relatively few people. Moreover, even for widely used languages such as English, the relevant language technology still has weaknesses, visible for example in highly inadequate, error-ridden machine translations. The report proposes a coordinated, large-scale effort in Europe to gradually create or improve the necessary technologies and to help the languages that have been pushed aside digitally.
Update of the Study (2014)
• Study comprised 31 volumes/languages.
• Many languages missing! Need for extension, at least of the comparison.
• We invited three language community bodies to participate in the update:
  ▪ European Federation of National Institutions for Language (EFNIL)
  ▪ Network to Promote Linguistic Diversity (NPLD)
  ▪ Experts Committee of the European Language Charter (Council of Europe)
CCURL 2014 – Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era
An Update and Extension of the META-NET Study “Europe’s Languages in the Digital Age”
Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Varadi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, Sigve Gramstad
META-NET, DFKI GmbH, Berlin, Germany
META-NET, Bar-Ilan University, Tel Aviv, Israel
META-NET, Arax Ltd., Luxembourg
META-NET, Tübitak Bilgem, Gebze, Turkey
EFNIL, META-NET, Hungarian Academy of Sciences, Budapest, Hungary
EFNIL, META-NET, Danish Language Council, Copenhagen, Denmark
EFNIL, Institut für Deutsche Sprache, Mannheim, Germany
NPLD, Network to Promote Linguistic Diversity, Cardiff, Wales
Council of Europe, Committee of Experts, University of Hamburg, Hamburg, Germany
Council of Europe, Committee of Experts, Bergen, Norway
Abstract: This paper extends and updates the cross-language comparison of LT support for European languages as published in the META-NET Language White Paper Series. The updated comparison confirms the original results and paints an alarming picture: it demonstrates that there are even more dramatic differences in LT support between the European languages.
Keywords: LR National/International Projects, Infrastructural/Policy Issues, Multilinguality, Machine Translation
Introduction and Overview
The multilingual setup of our European society imposes societal challenges on political, economic and social integration and inclusion, especially in the creation of the single digital market and unified information space targeted by the Digital Agenda (EC, […]). Language technology is the missing piece of the puzzle; it is the key enabler and solution to boosting growth and strengthening Europe’s competitiveness.
Recognising Europe’s exceptional demand and opportunities, […] leading research centres in […] European countries joined forces in META-NET, a Network of Excellence dedicated to the technological foundations of a multilingual European information society. META-NET was partially supported through four projects funded by the EC: T4ME, CESAR, METANET4U and META-NORD. META-NET is forging the Multilingual Europe Technology Alliance (META) with more than […] organisations and experts representing multiple stakeholders, and signed collaboration agreements with more than […] other projects and initiatives. META-NET’s goal is monolingual, cross-lingual and multilingual technology support for all European languages (Rehm and Uszkoreit, […]). We recommend focusing on three priority research themes connected to application scenarios that will provide European R&D with the ability to compete with other markets and achieve benefits for European society and citizens, as well as opportunities for our economy and future growth.
This paper extends and updates one important result of the work carried out within the META-VISION pillar of the initiative: the cross-language comparison of LT support for European languages as published in the META-NET Language White Paper Series (Rehm and Uszkoreit, […]).
The Language White Paper Series
Answering the question on the current state of a whole R&D field is difficult and complex. For LT, nobody had collected these indicators and provided comparable reports for a substantial number of European languages yet. To arrive at a first comprehensive answer, META-NET prepared the Language White Paper Series “Europe’s Languages in the Digital Age” (Rehm and Uszkoreit, […]) that describes the current state of LT support for European languages (including all official EU languages). This undertaking had been in preparation with more than […] experts since mid-[…] and was published in the summer of […]. The study included a comparison of the support all languages receive in four areas: MT, speech, text analytics, language resources. The differences in technology support between the various languages and areas are dramatic and alarming. In the four areas, English is ahead of the other languages, but even support for English is far from being perfect. While there are good quality software and resources available for a few larger languages and application areas, others, usually smaller languages, have substantial gaps. Many languages lack basic technologies for text […]
Machine Translation
  Excellent: none
  Good: English
  Moderate: French, Spanish
  Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
  Weak or no support: Albanian, Asturian, Basque, Bosnian, Breton, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Frisian, Friulian, Galician, Greek, Hebrew, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Norwegian, Occitan, Portuguese, Romany, Scots, Serbian, Slovak, Slovene, Swedish, Turkish, Vlax Romani, Welsh, Yiddish

Speech
  Excellent: none
  Good: English
  Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish, Turkish
  Weak or no support: Albanian, Asturian, Bosnian, Breton, Croatian, Frisian, Friulian, Hebrew, Icelandic, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romanian, Romany, Scots, Vlax Romani, Welsh, Yiddish

Text Analytics
  Excellent: none
  Good: English
  Moderate: Dutch, French, German, Hebrew, Italian, Spanish
  Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
  Weak or no support: Albanian, Asturian, Bosnian, Breton, Croatian, Estonian, Frisian, Friulian, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romany, Scots, Serbian, Turkish, Vlax Romani, Welsh, Yiddish

Resources
  Excellent: none
  Good: English
  Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
  Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Hebrew, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
  Weak or no support: Albanian, Asturian, Bosnian, Breton, Frisian, Friulian, Icelandic, Irish, Latvian, Limburgish, Lithuanian, Luxembourgish, Macedonian, Maltese, Occitan, Romany, Scots, Turkish, Vlax Romani, Welsh, Yiddish
[Figure: language technology support (excellent, good, moderate, fragmentary, weak/no support) and millions of native speakers worldwide (axis 0 to 400) for each language, ordered from Yiddish to English]
Extension of the META-NET White Paper Study (2013/2014)
META-NET Strategic Research Agenda (SRA)
Three Ingredients
• Appropriate Programme: Vision & Agenda
• Appropriate Actors: Research & Commercialisation
• Appropriate Support: Funding
Planning Process (2010, 2011, 2012)
• Vision Paper
• Vision Group Translation and Localisation Report
• Vision Group Interactive Systems Report
• Vision Group Media and Information Services Report
• Priority Themes Paper
• Expert meeting minutes (three sets)
• Strategic Research Agenda
Planning Process: Documents
www.meta-net.eu, T: +49 30 23895 1833
The Future European Multilingual Information Society
Vision Paper for a Strategic Research Agenda
“People can’t share knowledge if they don’t speak a common language.” Davenport, Thomas H, and Laurence Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School, Boston, 1997, p. 98.
Join the discussion at www.meta-net.eu/forum
LT 2020
Vision and Priority Themes for Language Technology Research in Europe until the Year 2020
Towards the META-NET Strategic Research Agenda
The development of this paper has been funded by the Seventh Framework Programme and the ICT Policy Support Programme of the European Commission under contracts T4ME (Grant Agreement 249119), CESAR (Grant Agreement 271022), METANET4U (Grant Agreement 270893) and META-NORD (Grant Agreement 270899).
Do you have comments, ideas or suggestions with regard to the content of this document? Please send them to [email protected] or discuss them online: http://www.meta-net.eu/sra.
This document is part of the Network of Excellence “Multilingual Europe Technology Alliance (META-NET)”, co-funded by the 7th Framework Programme of the European Commission through the T4ME grant agreement no.: 249119.
A Network of Excellence forging the Multilingual Europe Technology Alliance
Vision Document
Vision Group Translation and Localisation: Results of first two meetings
Editors: Aljoscha Burchardt, Georg Rehm
Dissemination Level: Public
Date: 3 December 2010
Vision Document
Vision Group Media and Information Services: Results of first two meetings
Editors: Maria Koutsombogera, Stelios Piperidis
Dissemination Level: Public
Date: 10 November 2010
Vision Document
Vision Group Interactive Systems: Results of first two meetings
Editors: Joseph Mariani, Bernardo Magnini
Dissemination Level: Public
Date: 28 December 2010
Strategic Research Agenda
• Addresses the problems we identified when preparing the white papers.
• Can put Europe ahead of its competitors in this technology area.
• 200 contributors over more than two years: 54% industry, 46% research, 4% (inter)national institutions.
• Presented and discussed at 90+ conferences and major workshops.
• Published in early 2013.
• http://www.meta-net.eu/sra
Priority Research Themes
• Three priority research themes:
  - Translingual Cloud
  - Social Intelligence and e-Participation
  - Socially-Aware Interactive Assistants
• Two additional themes:
  - European Service Platform for Language Technologies
  - Core Technologies for Language Analysis and Production
Providers of operational and research technologies and services: Research Centres, European Institutions, Other companies (SMEs, startups etc.), National Language Institutions, Language Technology Providers, Language Service Providers, Universities
Beneficiaries/users of the platform: European Institutions, Research Centres, Public Administrations, Enterprises, LT User Industries, Universities, European Citizens
Interfaces (web, speech, mobile etc.)
Priority Research Theme 1: Translingual Cloud
Priority Research Theme 2: Social Intelligence & e-Participation
Priority Research Theme 3: Socially Aware Interactive Assistants
European Service Platform for Language Technologies (Cloud or Sky Computing Platform)
Multilingual technologies, Text analytics, Text generation, Language checking, Sentiment analysis, Named entity recognition, Summarisation, Knowledge access and management, Information and relation extraction
Language Processing, Language Understanding, Knowledge, Emotion/Sentiment
Data protection, Tools, Data Sets, Resources, Components, Metadata, Standards, Interfaces, APIs, Catalogues, Quality Assurance, Data Import/Export, Input/Output, Storage, Performance, Availability, Scalability
[Figure ("Features"): two maps of Europe, each labelled with the same languages: Icelandic, Irish, English, French, Catalan, Basque, Spanish, Galician, Portuguese, Italian, Maltese, Greek, Bulgarian, Romanian, Serbian, Croatian, Slovene, Hungarian, Slovak, Czech, Polish, German, Dutch, Danish, Lithuanian, Latvian, Estonian, Finnish, Swedish, Norwegian]
A concrete result of these activities: one call for proposals on Machine Translation in the Horizon 2020 Work Programme 2015–17.
CRACKER
1. DFKI (Germany): Georg Rehm
2. CUNI (Czech Republic): Jan Hajic
3. ELDA (France): Khalid Choukri
4. FBK (Italy): Marcello Federico
5. ATHENA RC (Greece): Stelios Piperidis
6. UEDIN (UK): Philipp Koehn
7. USFD (UK): Lucia Specia
Coordination and Support Action, H2020-ICT17, 2015–2017, 36 months – http://www.cracker-project.eu
Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research
THREE PRIORITY AREAS FOR ACHIEVING THE MULTILINGUAL DIGITAL SINGLE MARKET
1 Multilingual access to all digital goods and services across Europe
Geo-blocking:
due to nationality, location, or residence
customers
Language-blocking:
languages they do not speak
however, current online translation is insufficient
trying to conduct
common languages
Geo-blocking and language-blocking are barriers to access
Both geo-blocking and language-blocking are daily problems for tens of millions of EU citizens.
Customers are six times more likely to buy from sites in their native language.
Most EU languages address less than 3% of the market, fundamentally limiting SMEs operating in countries where those languages are spoken.
Lack of language technology support (automatic translation, tools to assist human translators, and multilingual support in
European businesses.
Language can be expensive for SMEs
Online businesses face around €5,000 in up-front costs for each new language they translate their websites into, plus similar
and marketing costs.
Even when sites are translated, the vast majority of SMEs cannot respond to support requests or customer feedback in other languages. Such responsiveness is needed to achieve customer satisfaction and build brand loyalty.
English is not the answer
52% of EU customers do not purchase
Adding even a few languages to an SME’s website beyond English can have a major impact on revenue. Large organizations today
to increase market share.
[Figure: Likelihood of purchasing: site in buyer’s native language vs. site in foreign language (6x more likely to purchase)]
Communities
• META-NET incl. META-SHARE and META
• MT evaluation initiatives – WMT, IWSLT, MT Marathons
• MT and other LT industry
• Language resources – META-SHARE, ELRA
• HT/MT evaluation tools – translate5
• Translation industry, translation profession
• MT user communities
Strategic Agenda for the Multilingual Digital Single Market
• Version 0.5 presented at META-FORUM 2015 (Riga)
• Version 0.9 presented at META-FORUM 2016 (Lisbon)
Strategic Research and Innovation Agenda
Language as a Data Type and Key Challenge for Big Data
Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content
SRIA Editorial Team
Version 0.9 – July 2016
Selected Activities
[Figure: timeline 2015 to 2017 (project months M1, M12, M24, M36) showing: Kick-off meeting for all ICT-17 projects; translate5; WMT 2016; WMT 2017; IWSLT 2015; IWSLT 2016; IWSLT 2017; QT Marathon 2015; QT Marathon 2016; Roadmap for European MT Research; Survey on the State of HQMT in Industry and LSPs; SRIA (initial version); SRIA (update); SRIA (final); version 1; version 2]
• Production of resources (e.g., for WMT 2016 and 2017, IWSLT 2015-2017)
• Tools (quality control, evaluations)
• Strategies and roadmaps (SRIA, Roadmap for European MT Research)
• Exchange and sharing facility for resources (META-SHARE)
Recent or Upcoming Events
• LREC Workshop on MT Eval. (May 25)
• META-FORUM 2016 (July 4/5, Lisbon)
• WMT 2016 (Aug. 11/12, Berlin)
• IWSLT 2016 (Dec. 8/9, Seattle)
• Federation of organisations and projects working on technologies for multilingual Europe.
• 10 organisations; 24 projects.
• Areas of collaboration: data management and repositories, tools, shared tasks, evaluations.
• Goal: provide one umbrella organisation for the whole community.
http://www.cracking-the-language-barrier.eu
• META-FORUM 2016 – July 04/05, Lisbon, Portugal: Beyond Multilingual Europe
• META-FORUM 2015 – April 27, Riga, Latvia: Technologies for the Multilingual Digital Single Market
• META-FORUM 2013 – Sept. 19/20, Berlin, Germany: Connecting Europe for New Horizons
• META-FORUM 2012 – June 20/21, Brussels, Belgium: A Strategy for Multilingual Europe
• META-FORUM 2011 – June 27/28, Budapest, Hungary: Solutions for Multilingual Europe
• META-FORUM 2010 – Nov. 17/18, Brussels, Belgium: Challenges for Multilingual Europe
The Multilingual Digital Single Market
• Top priority in the European Union.
• Expected to add €400 billion to European GDP and to create hundreds of thousands of new jobs.
• Unfortunately, the language topic is not included in the EC’s Digital Single Market strategy (published in May 2015).
Facts and Figures
The MDSM Fact Sheet
Current eCommerce growth within Europe is about half that of the US, due partially to a lack of language coverage from European SMEs.
Less than 5% of European SMEs currently sell cross-language.
Why Europe needs a Multilingual Digital Single Market
No single language accounts for more than 20% of the potential Multilingual Digital Single Market.
Most account for less than 3% of the DSM.
Without a solution, the European Digital Single Market will remain fragmented.
Europe’s 24 official languages present a tremendous opportunity for European business
Removing language barriers within Europe would open access to 73% (with >€25 trillion in annual revenue!) of the world’s digitally accessible market to European enterprise.
Europe today is not a single market: it is separated into 20+ small language markets.
www.meta-net.eu
[Figure: Online population by language: Chinese (510 million), World Spanish (165 million), World Portuguese (83 million), English (565 million), Japanese (100 million), Russian (60 million); Europe today (many small markets); Language Technology; The Multilingual Digital Single Market. Source: Internet World Stats (Miniwatts Marketing Group)]
[Figure: Language technology support (Good, Moderate, Fragmentary, Weak/no support) plotted against millions of native speakers worldwide (0 to 400) for the 24 EU official languages: Spanish, English, Portuguese, German, French, Italian, Polish, Romanian, Dutch, Greek, Hungarian, Czech, Swedish, Bulgarian, Danish, Croatian, Slovak, Finnish, Lithuanian, Slovene, Latvian, Estonian, Maltese, Irish; marked area: Language Technology Danger Zone (≈150 million EU citizens)]
140 million EU citizens are in the Language Technology Danger Zone, where language technology is inadequate to support the DSM.
Current online automatic translation provided by US tech giants does not solve
less than 30% of automatically translated content is truly useful for online commerce.
Only three European languages
2 Boosting commerce through multilingual technologies
3 Connecting citizens to European digital public services
Without Language Technology, the European Commission has no way to respond effectively to citizen participation.
Current language technology is inadequate for over half of the EU official languages to help the European Commission solve its citizen engagement problem.
Translation opens 20 times its cost in revenue opportunity. However, translation remains too expensive for many European SMEs, blocking this opportunity and limiting economic growth in Europe. Lowering these costs is a strategic opportunity
[Figure: Translation Costs vs. Increase in Revenue]
[Figure: Online Automatic Translation Quality: good, bad, ugly]
Most local governmental services are monolingual only. This poses a problem for tourists, expatriates, and linguistic minorities. Language technology can provide the
Multilingual eParticipation can help build the European Identity
with one another in their respective native languages with sophisticated machine translation working behind the scenes. Only when EU citizens can interact in their own languages will they truly develop a sense of European identity and community.
Over half of EU citizens are language blocked from interacting with the European Commission’s web resources for citizen participation.
290 million EU citizens excluded
Speakers of other languages are language-blocked from full participation
Speakers of English, French, German can participate fully
Strategic Agenda for the Multilingual Digital Single Market: http://rigasummit2015.eu. META, the Multilingual Europe Technology Alliance, has more than 750 members (http://www.meta-net.eu). LT-Innovate, the European Association of the Language Technology Industry, has 180 corporate members throughout Europe (http://lt-innovate.eu).
Technology support has improved for some languages since this study was completed.
Technology Solutions
Investment in the following solutions will help achieve the Multilingual Digital Single Market
Unified Customer Experience
care, customer relationship, discussion fora,
Multimodal User Experience for Connected Devices
interfaces
household appliances, and consumer
Voice of the Customer
market research
Content Curation and Production
Digital Translation Centre
customers, citizens
The forthcoming Strategic Agenda for the Multilingual Digital Single Market will provide additional details on these and other solutions for the needs of the Multilingual Digital Single Market.
Download this fact sheet from http://cracker-project.eu. For more information contact Dr. Georg Rehm (DFKI) at [email protected].
http://cracker-project.eu/wp-content/uploads/2015/11/mDSM-Fact-Sheet.pdf
META-FORUM 2015 AND MDSM SRIA V0.5
Open Letter to the EC
• On Friday, March 20, 2015, we published an open letter to the EC on http://multilingualeurope.eu.
• On Monday, March 23, 2015, we informed President Juncker and all Commissioners about the campaign and the 1300+ signatures.
• By now more than 3600 signatures!

• 5 Members of the European Parliament
• 150+ high-level representatives from industry (CxO level)
• 1200+ professors
• 400+ project or research managers
• 20+ entrepreneurs and founders
• hundreds of language and language technology professionals, officials, researchers, administrators and representatives from related stakeholder groups
Who signed?
META-FORUM 2015
• April 27 in Riga, Latvia
• Riga Summit 2015 on the Multilingual Digital Single Market
• Two important components:
  - MDSM SRIA Version 0.5
  - Further community fusing
• http://www.meta-forum.eu
Joint EFNIL and NPLD Panel
• Joint EFNIL and NPLD panel at META-FORUM 2015.
• Joint position paper.
Initially presented at META-FORUM 2015 and the Riga Summit 2015 on the Multilingual Digital Single Market, April 27, 2015
www.rigasummit2015.eu
Joint NPLD/EFNIL Position Paper on the Multilingual Digital Single Market
“Languages are not only a means of communication. They also have embedded in them people’s values, aspirations and hopes.” (European Roadmap for Linguistic Diversity 2015, NPLD)
“Many European languages run the risk of becoming victims of the digital age as they are under-represented and under-resourced online. Huge regional market opportunities remain untapped because of language barriers.” (Multilingual Europe: A challenge for language tech. MultiLingual. April/May 2011, page 51/52)
Vision Paper
Vision Group Translation and Localisation Report
Vision Group Interactive Systems Report
Vision Group Media and Information Services Report
Priority Themes Paper
Expert meeting minutes
Expert meeting minutes
Expert meeting minutes
META-NET Strategic Research Agenda for Multilingual Europe 2020
2010 2011 2012 2013 2014 2015
Strategic Research and Innovation Agenda
roadmaps, agendas and any other input from other initiatives
…
Strategic Agenda for the Multilingual Digital Single Market
Technologies for Overcoming Language Barriers towards a truly integrated European Online Market
DRAFT
Version 0.5 – April 22, 2015
Strategic Agenda for MDSM
• Presented at META-FORUM 2015 and the Riga Summit for the first time.
• Version 0.5 – work in progress
• Builds upon many strategy papers and roadmaps prepared by several European projects, incl. the META-NET SRA (2013).
• Input and feedback collected at the Riga Summit 2015 to be used for upcoming versions.
A Strategy for the MDSM
• Strategic R&I Agenda for the Multilingual Digital Single Market
• Core: Technology Solutions
• Data economy is an inherent component – LT for effective multilingual data value chains.
Strategic Agenda for the Multilingual Digital Single Market – Version 0.5 – April 2015
Contents
Executive Summary
1 The Digital Single Market is a Multilingual Challenge
1.1 Overcoming Language Barriers with Technologies
1.2 Language Technologies Made for Europe – in Europe
1.3 Online Use of Languages
1.4 Multilingual Big Data Text Analytics for the European Data Economy
1.5 EC and Language Technology – Past and Present
1.6 The Economic Power of Language Technology and Services
2 A Strategic Programme for the Multilingual Digital Single Market
2.1 Layer 1: Innovative Technology Solutions
2.2 Layer 2: Language Technology Services, Platforms, Infrastructures
2.3 Layer 3: Priority Research Themes
2.4 Related Areas, Applications, and Societal Challenges
2.5 Summary
3 Layer 1: Innovative Technology Solutions for the Multilingual Digital Single Market
3.1 Technology Solutions for Businesses
3.1.1 Unified Customer Experience and Cross-Cultural CRM (E-Commerce)
3.1.2 Digital Translation Centre
3.1.3 Content Curation and Content Production
3.1.4 Virtual and Real Translingual Spaces
3.1.5 Voice of the Customer
3.1.6 Business Intelligence using Big Data
3.1.7 Multimodal User Experience for Connected Devices
3.1.8 Smart Multilingual Assistants
3.2 Technology Solutions for Public Services
3.2.1 Voice of the Citizen – Social Intelligence on Big Data
3.2.2 Online Dispute Resolution
3.2.3 E-Participation
3.2.4 E-Government
3.2.5 E-Health
3.2.6 E-Learning
4 Layer 2: Language Technology Services, Platforms, Infrastructures
5 Layer 3: Priority Research Themes
6 Horizontal Framework Aspects
6.1 Language Policies and Public Procurement
6.2 Standards and Interoperability
6.3 Open Source
6.4 Copyright and Data Protection
7 Conclusions
7.1 Expected Economic Impact
7.2 Relevance to the EC’s Digital Single Market Strategy
7.3 Potential Funding Sources
7.4 Next Steps
Appendix A. Input Documents
Appendix B. Digital Language Extinction in Europe
• Letter from Andrus Ansip (June 2015)
• “We invite the European language technology community to further develop the ideas presented in the draft Strategic Agenda for the multilingual Digital Single Market”
Cracking the Language Barrier
Riga Declaration
• 12 organisations present at META-FORUM 2015 and the Riga Summit 2015 drafted and signed the “Declaration of Common Interests”.
• CRACKER: community building, mostly among projects.
• We combined these into the Cracking the Language Barrier federation.
• Important goal: counteracting community fragmentation.
DECLARATION OF COMMON INTERESTS
We, the undersigned, declare here, at the Riga Summit on the Multilingual Digital Single Market, encouraged by the letter Vice President Andrus Ansip sent to its participants, that we stand united in our goal and interest to:
- support multilingualism in Europe by employing language technology in business, society and governance, to create a truly Multilingual Digital Single Market,
- exchange and share information in our efforts to promote our goals and interests at local, national and European levels,
- raise awareness in society at large using channels available to our associations, alliances and societies.
In the near future, we foresee the establishment of a Memorandum of Understanding among our organisations towards a “Coalition for a Multilingual Europe”, to better serve our members address the language barrier challenges towards establishing a truly integrated Multilingual Digital Single Market.
Riga, 29. April 2015
Signed by (in alphabetical order):
BDVA Laure Le Bars
CITIA Steve Renals
CLARIN Steven Krauwer
EFNIL Sabine Kirchmeier-Andersen, Tamás Váradi
ELEN Davyth Hicks, Claudia Soria
ELRA Nicoletta Calzolari, Khalid Choukri
GALA Laura Brandon, Robert E. Etches, Sergey Gladkov
LT Innovate Jochen Hummel, Philippe Wacker
META-NET Jan Hajic, Josef van Genabith, Georg Rehm, Andrejs Vasiljevs
NPLD Meirion Prys Jones
TAUS Jaap van der Meer
W3C Richard Ishida, Felix Sasaki
For any questions, please contact [email protected].
http://www.cracker-project.eu • http://www.meta-net.eu
• A federation of European projects and organisations working on technologies for a multilingual Europe.
• Multilateral Memorandum of Understanding; 10 organisations and 24 projects already on board (including FP7 and H2020-ICT15 projects).
• New members join on a regular basis.
• Selected areas of collaboration: data management and repositories, tools, shared tasks, evaluations, events.
• Goal: provide one umbrella organisation for the whole community.
Project Members
Organisation Members
• Website: information about the initiative, all projects and organisations
• Downloadable documents
• List of events
• LREC 2016 MT Eval Workshop
• Several new members will join the initiative soon
http://www.cracking-the-language-barrier.eu
META-FORUM 2016 AND MDSM SRIA V0.9
Andrus Ansip’s Blog Post
• Posted on 27 May 2016.
• First public acknowledgement by the EC that the language topic is highly relevant for the Digital Single Market.
• “Overcoming language barriers is vital for building the DSM, which is by definition multilingual. It is now time to reduce and remove the language barriers that are holding back its advance, and turn them into competitive advantages.”
Reorganisation of DG CONNECT (01/07/2016)
[Figure: organisation chart of DG CONNECT (Communications Networks, Content & Technology) as of 01/07/2016, Director-General R. Viola. Directorates: A Digital Industry; B Electronic Communications Networks & Services; C Digital Excellence & Science Infrastructure; D Policy Strategy & Outreach; E Future Networks; F Digital Single Market; G Data; H Digital Society, Trust & Cybersecurity; I Media Policy; R Resources & Support. Directorate G comprises Units G.1 Data Policy & Innovation, G.2 Data Applications & Creativity, G.3 Learning, Multilingualism & Accessibility, and G.4 Administration & Finance.]
Unit G.1 “Data Policy & Innovation”
• Supports the data economy in the Digital Single Market.
• Policy initiatives addressing new and emerging issues.
• Advances the Commission’s open data policy by ensuring the correct implementation of the PSI Directive and the Pan-European Open Data Portal.
• Promotes the emergence of an ecosystem comprising all players of the data value chain.
• Steers the SRIA together with industry.
• Addresses key framework conditions of the data economy.
• Funds research and innovation in data technologies and applications, inter alia by driving the Big Data PPP.
Unit G.3 “Learning, Multilingualism & Accessibility”
• Makes the DSM more accessible, secure and inclusive.
• Supports policy, research, innovation and deployment of learning technologies.
• Supports key enabling digital language technologies and services to allow all European consumers and businesses to fully benefit from the Digital Single Market.
• Is responsible for the Web Accessibility Directive.
• Promotes a better Internet for children by protecting and empowering children online, and by improving the quality of content available to them.
Communities & Stakeholders
... and many more research centres, companies, EU projects etc.
MDSM SRIA
• Version 0.5 unveiled at META-FORUM 2015
• Version 0.9 unveiled at META-FORUM 2016
• Version 1.0 foreseen for Nov./Dec. 2016
• Prepared and presented by the Cracking the Language Barrier federation (editorial team: 13 colleagues)
• The SRIA addresses how the LT community is going to act together to make the DSM multilingual
• Document available on http://www.cracker-project.eu and on http://www.cracking-the-language-barrier.eu
Strategic Agenda for the Multilingual Digital Single Market
Technologies for Overcoming Language Barriers towards a truly integrated European Online Market
DRAFT
Version 0.5 – April 22, 2015
MLV Programme
• Multilingual Value Programme*
  - Three-year programme
  - Requires modest investment
• “Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content”
• Three components address the main needs of the Multilingual DSM (MDSM) and how to put them into practice:
  1. Multilingual Application Areas
  2. Multilingual Services
  3. Research
Strategic Research and Innovation Agenda
Language as a Data Type and Key Challenge for Big Data
Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content
SRIA Editorial Team
Version 0.9 – July 2016
* SRIA V0.9 and the MLV Programme were devised before the reorganisation of DG CONNECT.
MDSM: Goals and Needs
• Crosslingual communication for SMEs, public institutions, citizens
• Crosslingual SME presales communication and aftersales services
• Multilingual (big) data, language and knowledge value chains
• Multilingual websites, product catalogues, product descriptions
• Multilingual knowledge bases and knowledge graphs (and services)
• Multilingual conversational interfaces for connected devices (IoT)
• Crosslingual business intelligence (e.g., based on user-generated content)
• Crosslingual social media analytics for EU-wide societal issues
• Multilingual text and report generation (knowledge/data to text)
• All services must be domain-adaptable (no one size fits all)
• Translation Centre (Cloud) – HQ automated translation for all
[Figure: overview of the MLV Programme for the Multilingual Digital Single Market, serving citizens, the public sector and business. Multilingual Applications: E-Commerce; Content, Media, Verticals; Translation, Language, Knowledge, Data. Further labels for Multilingual Services and Research: Automated Translation; Crosslingual Big Data Language Analytics; Multimodal Interaction; Conversational Technologies; High-Quality Machine Translation; Meaning, Semantics, Knowledge; Knowledge and Data Repositories; Language Processing, Analysis and Production – Language Resources. Actors and instruments: SMEs, CEF DSIs, IT integrators and research provide innovative applications; H2020 RIAs, CSAs, IAs, RAs and national funding; interoperable and standardised; collaboration with member states.]
MLV Programme
Application Areas (Selection)
• Multilingual E-commerce
  - Customer-facing vs. back-office-facing (after-market, after-sales)
  - Crosslingual search, CRM, helpdesks, processes, workflows
  - Semantic, crosslingual product descriptions and catalogues
  - Online dispute resolution
• Multilingual Content, Media, Verticals
  - Content analytics, curation, generation (incl. authoring support)
  - Multimodal communication (conversational, written, IoT)
  - Vertical domains: health, government, mobility, energy, legal
• Translation, Language, Knowledge, Data
  - Translation Cloud – written/spoken, automatic/human
  - Crosslingual public and social intelligence, business intelligence
  - HQ resources, under-resourced languages, domain-specific LRs
Setup – Timeframe – Costs
• Close collaboration with the EC, EP and all other stakeholders (including SMEs, research centres, universities, NGOs etc.).
• Mix of funding sources:
  - Horizon 2020 (WP 2018-2020) for EU projects (RA, RIA, CSA)
  - National/regional funding sources for work on monolingual LTs and LRs, and to support and grow SMEs in this area
  - Include, strengthen and broaden the role of CEF AT (public services)
• Estimated costs for basic MLV implementation: ca. 175–200 M€
  - Includes a set of mission-critical services and applications
  - Timeframe: 2018, 2019, 2020
Conclusions and Next Steps
• There is a lot of traction for the multilingualism/language topic.
• The EU should develop a Multilingual Strategy (incl. technology).
• The strategy must take into account several stakeholders: citizens, business/innovation, DSM, research (multiple communities).
• Most components are in place: communities, SRIAs, STOA Study etc.
• We need the political will to change language policy in support of multilingualism (at both Member State and EU level).
• Some Member States are ahead (DK, IE, EE, ES, LT, LV, NL, SL).
• Coordinate and intensify the push, and keep up the pressure from Member States, EP, EC, research community, businesses etc.
• Goal: a shared programme (EU/MSs) as a concerted action.
Conclusions
• Several tightly interconnected goals:
  - Multilingual Technologies for Europe
  - Technologies for the Multilingual Digital Single Market
  - Multilingual Strategy of the European Union
  - The Human Language Project
Next Steps
1. Discuss and further shape MLV Programme V0.9 with EC
2. Extend the Cracking the Language Barrier federation
3. LT brainstorming meeting at EC, Unit G.3 (Dec. 2016)
4. EP STOA Workshop on Language Technologies (Jan. 2017)
5. MDSM SRIA V1.0 to be finalised (Q1 2017)
Language Technology Topics
• Multilingual Europe – Technologies for all European languages
• Machine Translation, Text Analytics, Semantic Web etc.
• Healthcare, societal challenges (ageing population, refugees etc.)
• IoT, Smart Assistants and Conversational Interaction Technologies
• E-Learning – Language Technology for E-Learning
• Smart Homes, Cities, Manufacturing
• Smart Virtual Assistants
• Social Media Analytics
• E-Participation
• Games
• etc.
Digital Language Extinction
• Many smaller languages are experiencing problems in the digital sphere:
  - Loss of function – other languages take over entire functional areas such as texting, email, search, e-commerce etc.
  - Loss of prestige – if it’s not on the web, the language doesn’t exist
  - Loss of competence – can you raise a digital native in your language?
• András Kornai’s classification, based on the amount of digital communication in a language:
  1. digitally thriving languages (comfort zone languages)
  2. vital languages
  3. heritage languages
  4. still/moribund/dead languages
• Implications for the European/global multilingual web?
potentially facing digital extinction …
• Pan-European infrastructure, bringing together providers and consumers of language data, tools and services.
• LRs are documented, uploaded, stored, catalogued, downloaded and shared – to improve visibility, documentation, identification, availability and interoperability.
• Caters for datasets, tools and services for LT research and development (both academic and commercial); META-SHARE includes repository software, a metadata model, a licensing kit and statistics.
• 29 distributed repositories maintained by 37 organisations in 25 countries.
• 2,600+ resources (corpora: 49%, lexical resources: 38%, tools/services: 12%), covering ca. 100 languages.
• 7,000+ downloads in total; ca. 70% of all LRs have been downloaded.
Preparation of the SRA
• Strategic Research Agendas of other initiatives were screened.
• Many suggestions were received as input from Vision Group members.
• We discussed procedures, input and structure of the SRA in four meetings of the META Technology Council:
  - Brussels, Belgium, November 16, 2010
  - Venice, Italy, May 25, 2011
  - Berlin, Germany, September 30, 2011
  - Brussels, Belgium, June 19, 2012
• Additional input in talks, meetings, workshops, discussions, etc.
  - Example: three HLT Expert Meetings organised by the EC (end of 2011)
• Almost 200 experts contributed to the SRA (54% from industry; 46% from research; 4% from national/international institutions).
• Published in early 2013.
• First strategic research agenda for our field.
• Complex process of collecting and shaping technology visions.
• Hundreds of researchers participated.
• Broad topics around multilingual Europe in general.
PT1: Translingual Cloud
• Europe has a great need for translations of publishable quality.
• Focus on high-quality translation.
• New research paradigms:
  - Inclusion of professional translators in the research process
  - Inclusion of technologists in research on human translation processes
• Different technological approaches:
  - Stronger emphasis on the properties of individual languages
  - A central role for semantics
• Methods for specific genres & domains
Priority Research Theme 1: Translingual Cloud
Target groups: European citizens, language professionals, organisations, companies, European institutions, software applications
Any device, single access point, multiple target formats
- Written input (Twitter, blogs, articles, newspapers, text with/without metadata etc.) or spoken input (spontaneous spoken language, video/audio, multiple speakers)
- Modular combination of analysis, transfer and generation models
- From very fast but lower quality to slower but very high quality (including instant quality upgrades)
- Exploiting strong monolingual analysis and generation methods and resources
- Domain, task and genre specialisation models
- Extending translation with semantic data and linked open data
Services and Technologies:
- Automatic translation and interpretation
- Language checking
- Post-editing
- Workbenches for creative translations
- Novel translation and authoring workflows
- Quality assurance
- Computer-supported human translation
- Multilingual content production and text authoring
- Trusted service centre (privacy, confidentiality, security of source data)
Applications:
- Crosslingual communication, translation and search
- Real-time subtitling, voice-over generation and translating speech from live events
- Mobile interactive interpretation
- Multilingual content production (media, web, technical, legal documents)
- Showcases: translingual spaces for ambient translation
PT2: Social Intelligence
• Better decisions by monitoring social media
• Inclusion of citizens in collective decision processes
• Opinion formation, consensus building, decision making
• Evolution of new solutions
• New forms of democracy: e-democracy, massive participation, transparency
• Dialogues and debates across language boundaries and across parties, political alliances, social classes
• Better than binary voting
• Documented, transparent decision processes
Priority Research Theme 2: Social Intelligence and e-Participation
Target groups: European citizens, European institutions, discussion participants, companies
- From shallow to deep, from coarse-grained to detailed processing techniques
- Making language technologies interoperable with knowledge representation and the Semantic Web
- “Semantification” of the web: tight integration with the Semantic Web and Linked Open Data
- Mapping large, heterogeneous, unstructured volumes of online content to structured, actionable representations
- Unleashing social intelligence by detecting and monitoring opinions, demands, needs and problems
- Making use of the wisdom of the crowds
- Improved efficiency and quality of decision processes
- Understanding influence diffusion across social media
Services and Technologies (partial): … social media, comments, blogs, forums; … decision-relevant information … support; … sentiment analysis and opinion mining, including the temporal dimension; … cues from arbitrary online content; … visualising discussions and opinion statements
Applications (partial): … collective deliberation and e-participation; …-wide deliberation on pressing issues; … and processes; modelling evolution of opinions; … analysis technologies
Priority Research Theme 3: Socially-Aware Interactive Assistants
- Interacting naturally with and in groups
- Learning and forgetting information
- Adaptable to the user’s needs and preferences and the environment
- Includes human-computer, human-artificial agent and computer-mediated human-human communication
- Proactive, self-aware, user-adaptable
- Interacts naturally with humans, in any language and modality
- Can be personalised to individual communication abilities, including special needs
- Can learn incrementally from all interactions and other sources of information
Services and Technologies, Applications (labels partially truncated): … recognition and synthesis, providing expressive voices; … understanding; … incremental conversational speech; … models of human communication; … inter-dependencies; … priority themes; … dialogue systems; … environment; … modalities (visual, tactile, haptic), verbal/non-verbal behaviour, social context; … any vocabulary; … recovery, self-assessment; multilingual capabilities
Strategic Agenda for the Multilingual Digital Single Market – Version 0.5 – April, 2015
Contents
Executive Summary
1 The Digital Single Market is a Multilingual Challenge
1.1 Overcoming Language Barriers with Technologies
1.2 Language Technologies Made for Europe – in Europe
1.3 Online Use of Languages
1.4 Multilingual Big Data Text Analytics for the European Data Economy
1.5 EC and Language Technology – Past and Present
1.6 The Economic Power of Language Technology and Services
2 A Strategic Programme for the Multilingual Digital Single Market
2.1 Layer 1: Innovative Technology Solutions
2.2 Layer 2: Language Technology Services, Platforms, Infrastructures
2.3 Layer 3: Priority Research Themes
2.4 Related Areas, Applications, and Societal Challenges
2.5 Summary
3 Layer 1: Innovative Technology Solutions for the Multilingual Digital Single Market
3.1 Technology Solutions for Businesses
Multilingual Europe Stakeholders
• European Parliament
  - Upcoming STOA Study and Workshop (Jan. 2017)
• European Commission
  - DG CONNECT: Horizon 2020 WP 2018-2020 (G.1)
  - DG CONNECT: New Unit “Learning, Multilingualism & Accessibility” (G.3)
  - DG Translation: Connecting Europe Facility, Automated Translation (CEF AT)
• Language Communities: EFNIL and NPLD
  - Joint position paper (META-FORUM 2015, 2016)
• EU Member States and Non-Member States
  - National and regional funding agencies (ES, NL etc.)
• Research Communities, especially the Big Data community (BDVA SRIA V3.0), the Web community and many others (Robotics, IoT etc.)
• Standardisation – W3C and others
Multilingual Success Stories
• Moses SMT toolkit as well as its research and technology ecosystem
• CEF AT for public online services – good and timely development
• eBay: MT to Russian – 50% increase in sales
• Hugo.lv for Latvian public services – better than Google Translate
• Hundreds of European startups in Language Technology and AI
• Conversational interfaces (Siri, Echo, Cortana): the next big thing
• IBM Watson – a billion-dollar LT business
• Strong Neural MT results reported by European researchers (QT21)
• Very rapid development – many opportunities for European R&D&I