Human Language Technologies & the European Research Area
Joseph Mariani
Former Director, ICT Department, French Ministry of Research & LIMSI-CNRS
November 22, 2006 Multilingualism & Language Technology : a challenge for Europe

LT for a Multilingual Europe
• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
    • Preserve the EU Member States' cultures
      – Preference for native language (Web sites in German (75%)...)
    • Allow for communication across Member States
      – 50% of European citizens speak only one language (97% in Japan)
      – 1650 translators at the EC - 1.4 Mpages translated per year
      – 30% European Parliament budget (300 M€) – 500 translators
      – EU: 25 countries, 20 official languages / 380 language pairs
  – Enormous cost for the EU, while mandatory
  – Need for the assistance of Language Technologies
• Huge effort (# LT * # languages), too large for the EC alone
• Effort should be shared with EU Member States
• Would meet the needs of the European Union, but would also put Europe, and the European industry, in a strong position for providing tools for handling multilingualism worldwide
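
The "380 language pairs" figure above is consistent with counting directed translation pairs (source language to target language) among 20 official languages, and the "# LT * # languages" remark describes an effort that grows as a product. A minimal sketch of that arithmetic follows; the function names and the figure of 10 technologies are hypothetical, chosen only for illustration.

```python
def directed_language_pairs(n_languages: int) -> int:
    """Number of (source, target) translation directions among n languages."""
    return n_languages * (n_languages - 1)


def development_tasks(n_technologies: int, n_languages: int) -> int:
    """Effort in the '# LT * # languages' sense: one task per technology per language."""
    return n_technologies * n_languages


# 20 official languages give the 380 pairs quoted on the slide.
print(directed_language_pairs(20))  # 380

# Hypothetical example: 10 language technologies, each needed for 20 languages.
print(development_tasks(10, 20))  # 200
```

Each additional official language adds two new translation directions per existing language, so the number of pairs grows roughly with the square of the number of languages, while the technology effort grows linearly per language.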

Building the European Research Area
• European Research Area (ERA)
  – Need to coordinate EC (< 15%) and MS (> 85%) research efforts
• The ERA instruments
  – ERA-Net in FP6 (CA & SSA) to coordinate MS national / regional programs (specific action in DG-Research) : EC only funds coordination activities, not R&D
  – ERA-Net+ in FP7 (CSA) to also coordinate with EC programs (thematic action) : EC may also fund R&D activities (?)
  – Article 169 to coordinate EC+MS+industrial efforts
    • Needs a joint European Council and Parliament decision
    • Single experience so far, in infectious diseases (200 M€ * 3 = 600 M€)
    • Topics mentioned for FP7: SMEs, research in Baltic sea, Metrology…
  – ESFRI (European Strategy Forum on Research Infrastructures)
• LT well and naturally fitted with ERA
  – Coordinate the national / regional efforts, mostly devoted to national / regional languages, with the EC effort, mostly addressing the multilingual dimension and the general coordination
  – Show a major value added by EU, in full agreement with subsidiarity principle

Support to LT in France : Techno-langue
• Report to the Prime Minister (November 2000)
  – Need to develop Language Technologies for the French language
• Techno-langue Action launched in 2002
  – Basic Technological Research (RTB)
  – Articulate with related existing programs (RRIT)
• Funded by 3 ministries :
  – Research, Industry, Culture
• Call for Proposals
  – Up to 3-year projects (2003-2006)
  – Set up an infrastructure to conduct research in LT for French
    • Language Resources (Data / Tools)
    • Evaluation (Technology / Applications)
    • Standards
    • Technological survey

Funded projects
• Budget: 20 M€ effort - 7.5 M€ public funding (over 3 years)
• 94 participants (industry, research, public agencies, foreign)
• 21 funded projects:
  – 10 on Language Resources (data and tools)
  – 2 on Standards (Spoken / Written)
  – 1 on Technological survey (Portal) : http://www.technolangue.net
  – 8 on Technology Evaluation (campaigns)
    • Written language processing (5)
      – EASY: Syntactic parsing
      – ARCADE 2: Text alignment
      – CESART: Terminology extraction
      – EQUER: Information query
      – CESTA: Machine translation
    • Spoken Language processing (3)
      – EVASY: Speech synthesis
      – MEDIA: Spoken dialog
      – ESTER: Speech transcription / automatic indexing

Sharing efforts on LT in Europe
• LT well and naturally fitted with ERA
  – The EC would primarily support :
    • the coordination: management, standards, technology evaluation, communication...
  – Each Member State would primarily support the cost for covering its language(s):
    • Language Resources (essential) : (annotated) corpus (spoken / written), lexicon (including pronunciations), dictionaries…
    • Language specific technology development/adaptation
  – EC and MS would support the cost of:
    • Developing core Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation, spoken translation, etc.
    • Developing innovative applications using HLT

Lang-Net proposal
• Build up an ERA-Net proposal of infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Information sharing
    • Strategic activities and Best Practices
    • Implementation of joint activities
    • Transnational research activities
  – Partnership of EU countries or regions having LT programs
    • 11 countries / regions in partnership : Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
    • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)

Situation at the EC
• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Looking forward to Thematic ERA-Net+ in FP7
• DG INFSO + Media
  – « Science & Technology Forum on Multilingualism »
    • June 2005 and February 2006 in Luxembourg
  – Visit of a French delegation to H. Forster & B. Smith (September 2005)
• DG Education, training, culture and multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by EC to Parliament and Council
  – Committee of EU regions (official use of regional Spanish languages)
• TC-Star report : Introduction signed by V. Reding & J. Figel

Situation at the EC
• Memorandum for a Digital Europe (submitted by France to Finnish presidency)
  – Includes « LT for a Multilingual Europe » as a specific research topic
• European Digital Library
  – Stresses the multilingual (crosslingual ?) dimension and need for tools
• ENISA (European Network and Information Security Agency)
  – Create a European multilingual information sharing and alert system
• CLARIN : Common Language Resources & Technology Infrastructure
  – Labelled within ESFRI
  – Easy access to Language Resources and Technology for the Humanities community
  – Well in agreement with the objective of coordinating activities and of setting up a necessary infrastructure
  – But addresses only part of the needs:
    • Considers only the Humanities scientific area, neither the ICT, nor the industrial ones
    • A network of research centers specialized in Humanities, not of national programs

Situation in FP7
• FP7 ICT program (2007-2013)
  – Technology pillar : Simulation, Visualization, Interaction, mixed realities
    • Tools for innovative design, and creativity in products, services and digital media, and for natural, language-enabled and context-rich interaction and communication
  – Workprogram WP1 (2007-2008)
    • Challenge 2 « Cognitive systems, interaction, robotics »
      – Objective 2.1 « Cognitive systems, interaction, robotics »
        » Essentially oriented towards Cognitive robotics
    • Challenge 4 « Digital libraries »
      – Multilingual (crosslingual ?) content, summarization…
    • Strong MS reaction in favor of HLT at ISTC meeting (September 20, 2006) : France asked to add a second objective in Challenge 2 on interaction / LT
    • Similar V3.0 draft WP content (November 17, 2006)

HLT in WP1 (main)
• CHALLENGE 2 : COGNITIVE SYSTEMS, INTERACTION, ROBOTICS
  – Objective 3.2.1.1: Cognitive Systems, Interaction, Robotics
    • Intuitive multimodal interfaces and interpersonal communication systems providing personalized interactivity in real-world and virtual environments, based on improved human interaction modelling and understanding of contextually-referred communication, for example, by signs and signals in all modes (such as sound, vision, touch) and modalities (such as natural language, both spoken and written), through autonomous adaptation and by addressing user needs, intentions and emotions.
    • New markets such as novel functionalities for embedded systems and assistive systems for interpersonal communications, such as support of dynamic translation, and effective medical diagnostics and therapeutics.
    • Explore and validate the use of new ways of combining statistical, knowledge driven and cognitive approaches to language understanding, generation, and translation, by machines.
    • A principled approach to structuring research in relevant areas, addressing in particular learning in artificial systems, the requirements for cognitive capacities of robotic, interactive and language support systems, and including the development of experimental scenarios, the development or construction of resources for experimentation, and the development of performance metrics and definitions of autonomy levels for artificial systems.
    • Co-ordination with related national or regional research programmes or initiatives.
• Indicative budget distribution
  – 193 M€ (Call 1 [96 M€], Call 3 [97 M€])
  – CP 173 M€, NoE 16 M€, CSA 4 M€

HLT in WP1 (international cooperation)
• Development-related ICT research exploitation and cooperation roadmaps (3 sub-themes)
  – Sub-theme 1: « Language and speech technologies with particular focus on Arabic-speaking regions / countries (including Mediterranean Partner Countries and ACP countries). The overall objective is to reduce language barriers and broaden access, usage and interaction between ICT services and applications. This preparatory action will focus on requirements and options for cost-effective natural language systems (written or spoken) in domains such as automated translation, information retrieval and indexing. It will also aim to reinforce collaboration with Arabic research communities on natural language processing (NLP) methods and benchmarking, including for language resources such as corpora and knowledge bases. »
• Indicative budget distribution
  – 2 M€ (for the 3 sub-themes, one action per sub-theme) (CSA)

Conclusions
– Language Technologies needed for a Multilingual Europe,
– Effort too large for the EC alone,
– Programs exist in several EU Member States, at the EC and in various countries worldwide,
– Maybe the most adequate topic for the EC/MS cooperation scheme, promoted in the construction of the European Research Area,
– Need to address permanent infrastructural issues and to install an experimental framework : Language Resources, Evaluation, Standards and Survey.
• A great opportunity & a grand challenge for Europe
• Which is insufficiently present in WP1 of FP7 !!!