Human Language Technologies & the European Research Area, Joseph Mariani



Page 1: Human Language Technologies & the European Research Area Joseph Mariani

Human Language Technologies & the European Research Area
Joseph Mariani
Former Director, ICT Department, French Ministry of Research & LIMSI-CNRS

Page 2: Human Language Technologies & the European Research Area Joseph Mariani

November 22, 2006 Multilingualism & Language Technology : a challenge for Europe


LT for a Multilingual Europe
• Language as a specific issue for Europe
  – Economic, cultural and political challenge with 2 dimensions:
    • Preserve the cultures of the EU Member States
      – Preference for native language (Web sites in German (75%)...)
    • Allow for communication across Member States
      – 50% of European citizens speak only one language (97% in Japan)
      – 1650 translators at the EC, 1.4 million pages translated per year
      – 30% of the European Parliament budget (300 M€), 500 translators
      – EU: 25 countries, 20 official languages / 380 language pairs
  – Enormous cost for the EU, while mandatory
  – Need for the assistance of Language Technologies
• Huge effort (# LT * # languages), too large for the EC alone
• Effort should be shared with EU Member States
• Would meet the needs of the European Union, but would also put Europe, and the European industry, in a strong position for providing tools for handling multilingualism worldwide
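
The figure of 380 language pairs quoted on this slide matches the number of ordered (source, target) translation directions among 20 official languages, i.e. 20 × 19. A minimal sketch of that arithmetic follows; the function name is hypothetical and not taken from the slides, and it assumes every language may be translated into every other one.

```python
def directed_language_pairs(n_languages: int) -> int:
    """Count ordered (source, target) pairs among n distinct languages.

    Assumption: each language can be a source for every other language,
    and A -> B is counted separately from B -> A.
    """
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    # 20 official EU languages, as stated on the slide.
    print(directed_language_pairs(20))  # 380
```

The count grows roughly with the square of the number of languages, which is one way to read the slide's point that the effort is too large for the EC alone.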

Page 3: Human Language Technologies & the European Research Area Joseph Mariani


Building the European Research Area
• European Research Area (ERA)
  – Need to coordinate EC (< 15%) and MS (> 85%) research efforts
• The ERA instruments
  – ERA-Net in FP6 (CA & SSA) to coordinate MS national / regional programs (specific action in DG-Research): EC only funds coordination activities, not R&D
  – ERA-Net+ in FP7 (CSA) to also coordinate with EC programs (thematic action): EC may also fund R&D activities (?)
  – Article 169 to coordinate EC+MS+industrial efforts
    • Needs a joint European Council and Parliament decision
    • Single experience in infectious diseases (200 M€ * 3 = 600 M€)
    • Topics mentioned for FP7: SMEs, research in the Baltic Sea, Metrology…
  – ESFRI (European Strategy Forum on Research Infrastructure)
• LT well and naturally fitted with ERA
  – Coordinate the national / regional efforts, mostly devoted to national / regional languages, with the EC effort, mostly addressing the multilingual dimension and the general coordination
  – Show a major value added by the EU, in full agreement with the subsidiarity principle

Page 4: Human Language Technologies & the European Research Area Joseph Mariani


Support to LT in France: Techno-langue
• Report to the Prime Minister (November 2000)
  – Need to develop Language Technologies for the French language
• Techno-langue Action launched in 2002
  – Basic Technological Research (RTB)
  – Articulate with related existing programs (RRIT)
• Funded by 3 ministries:
  – Research, Industry, Culture
• Call for Proposals
  – Up to 3-year projects (2003-2006)
  – Set up an infrastructure to conduct research in LT for French
    • Language Resources (Data / Tools)
    • Evaluation (Technology / Applications)
    • Standards
    • Technological survey

Page 5: Human Language Technologies & the European Research Area Joseph Mariani


Funded projects
• Budget: 20 M€ effort, 7.5 M€ public funding (over 3 years)
• 94 participants (industry, research, public agencies, foreign)
• 21 funded projects:
  – 10 on Language Resources (data and tools)
  – 2 on Standards (Spoken / Written)
  – 1 on Technological survey (Portal): http://www.technolangue.net
  – 8 on Technology Evaluation (campaigns)
    • Written language processing (5)
      – EASY: Syntactic parsing
      – ARCADE 2: Text alignment
      – CESART: Terminology extraction
      – EQUER: Information query
      – CESTA: Machine translation
    • Spoken language processing (3)
      – EVASY: Speech synthesis
      – MEDIA: Spoken dialog
      – ESTER: Speech transcription / automatic indexing

Page 6: Human Language Technologies & the European Research Area Joseph Mariani


Sharing efforts on LT in Europe
• LT well and naturally fitted with ERA
  – The EC would primarily support:
    • the coordination: management, standards, technology evaluation, communication...
  – Each Member State would primarily support the cost for covering its language(s):
    • Language Resources (essential): (annotated) corpus (spoken / written), lexicon (including pronunciations), dictionaries…
    • Language specific technology development/adaptation
  – EC and MS would support the cost of:
    • Developing core Language Technologies:
      – Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation, spoken translation, etc.
    • Developing innovative applications using HLT

Page 7: Human Language Technologies & the European Research Area Joseph Mariani


Lang-Net proposal
• Build-up ERA-Net proposal of infrastructural nature
  – Language Resources, LT evaluation, Standards, Survey
    • Sharing of information
    • Strategic activities and Best Practices
    • Implementation of joint activities
    • Transnational research activities
  – Partnership of EU countries or regions having LT programs
    • 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
    • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
  – Extendable to other partners
    • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
    • AS (Romania, Bulgaria…)
    • USA, Japan, South Africa, Israel, Canada… (contacts)

Page 8: Human Language Technologies & the European Research Area Joseph Mariani


Situation at the EC
• DG Research (ERA-Net program)
  – Lang-Net proposal submitted in March 2005, not selected
  – Looking forward to the Thematic ERA-Net+ in FP7
• DG INFSO + Media
  – « Science & Technology Forum on Multilingualism »
    • June 2005 and February 2006 in Luxembourg
  – Visit of a French delegation to H. Forster & B. Smith (September 2005)
• DG Education, training, culture and multilingualism
  – « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by EC to Parliament and Council
  – Committee of EU regions (official use of regional Spanish languages)
• TC-Star report: Introduction signed by V. Reding & J. Figel

Page 9: Human Language Technologies & the European Research Area Joseph Mariani


Situation at the EC
• Memorandum for a Digital Europe (submitted by France to the Finnish presidency)
  – Includes « LT for a Multilingual Europe » as a specific research topic
• European Digital Library
  – Stresses the multilingual (crosslingual ?) dimension and need for tools
• ENISA (European Network and Information Security Agency)
  – Create a European multilingual information sharing and alert system
• CLARIN: Common Language Resources & Technology Infrastructure
  – Labelled within ESFRI
  – Easy access to Language Resources and Technology for the Humanities community
  – Well in agreement with the objective of coordinating activities and of setting up a necessary infrastructure
  – But addresses only part of the needs:
    • Considers only the Humanities scientific area, neither the ICT, nor the industrial ones
    • A network of research centers specialized in Humanities, not of national programs

Page 10: Human Language Technologies & the European Research Area Joseph Mariani


Situation in FP7
• FP7 ICT program (2007-2013)
  – Technology pillar: Simulation, Visualization, Interaction, mixed realities
    • Tools for innovative design, and creativity in products, services and digital media, and for natural, language-enabled and context-rich interaction and communication
  – Workprogram WP1 (2007-2008)
    • Challenge 2 « Cognitive systems, interaction, robotics »
      – Objective 2.1 « Cognitive systems, interaction, robotics »
        » Essentially oriented towards Cognitive robotics
    • Challenge 4 « Digital libraries »
      – Multilingual (crosslingual ?) content, summarization…
    • Strong MS reaction in favor of HLT at ISTC meeting (September 20, 2006): France asked to add a second objective in Challenge 2 on interaction / LT
    • Similar V3.0 draft WP content (November 17, 2006)

Page 11: Human Language Technologies & the European Research Area Joseph Mariani


HLT in WP1 (main)
• CHALLENGE 2: COGNITIVE SYSTEMS, INTERACTION, ROBOTICS
  – Objective 3.2.1.1: Cognitive Systems, Interaction, Robotics
    • Intuitive multimodal interfaces and interpersonal communication systems providing personalized interactivity in real-world and virtual environments, based on improved human interaction modelling and understanding of contextually-referred communication, for example, by signs and signals in all modes (such as sound, vision, touch) and modalities (such as natural language, both spoken and written), through autonomous adaptation and by addressing user needs, intentions and emotions.
    • New markets such as novel functionalities for embedded systems and assistive systems for interpersonal communications, such as support of dynamic translation, and effective medical diagnostics and therapeutics.
    • Explore and validate the use of new ways of combining statistical, knowledge driven and cognitive approaches to language understanding, generation, and translation, by machines.
    • A principled approach to structuring research in relevant areas, addressing in particular learning in artificial systems, the requirements for cognitive capacities of robotic, interactive and language support systems, and including the development of experimental scenarios, the development or construction of resources for experimentation, and the development of performance metrics and definitions of autonomy levels for artificial systems.
    • Co-ordination with related national or regional research programmes or initiatives.
• Indicative budget distribution
  – 193 M€ (Call 1 [96 M€], Call 3 [97 M€])
  – CP 173 M€, NoE 16 M€, CSA 4 M€

Page 12: Human Language Technologies & the European Research Area Joseph Mariani


HLT in WP1 (international cooperation)
• Development-related ICT research exploitation and cooperation roadmaps (3 sub-themes)
  – Sub-theme 1: « Language and speech technologies with particular focus on Arabic-speaking regions / countries (including Mediterranean Partner Countries and ACP countries). The overall objective is to reduce language barriers and broaden access, usage and interaction between ICT services and applications. This preparatory action will focus on requirements and options for cost-effective natural language systems (written or spoken) in domains such as automated translation, information retrieval and indexing. It will also aim to reinforce collaboration with Arabic research communities on natural language processing (NLP) methods and benchmarking, including for language resources such as corpora and knowledge bases. »
• Indicative budget distribution
  – 2 M€ (for the 3 sub-themes, one action per sub-theme) (CSA)

Page 13: Human Language Technologies & the European Research Area Joseph Mariani


Conclusions
  – Language Technologies needed for a Multilingual Europe,
  – Effort too large for the EC alone,
  – Programs exist in several EU Member States, at the EC and in various countries worldwide,
  – Maybe the most adequate topic for the EC/MS cooperation scheme, promoted in the construction of the European Research Area,
  – Need to address permanent infrastructural issues and to install an experimental framework: Language Resources, Evaluation, Standards and Survey.
• A great opportunity & a grand challenge for Europe
• Which is insufficiently present in WP1 of FP7 !!!