Spanish Language Technology Plan. David Pérez Fernández, Cabinet of State Secretary for...

Spanish Language Technologies Plan TAUSS Roundtable, Barcelona 12 May 2016


Post on 22-Jan-2018


TRANSCRIPT

Spanish Language Technologies Plan, TAUSS Roundtable, Barcelona

12 May 2016

Spanish Language Technologies Plan

1. Introduction

2. Plan for Language Technologies

3. LT Plan Road Map

4. Governance

5. Implementation


Introduction

OPPORTUNITIES

• High potential for internationalization of the Spanish language and cooperation with Latin America.

• New public services for citizens and enterprises in strategic sectors (health, justice, tourism, security, etc.).

• Strong market growth associated with innovation and development.

STRENGTHS

• Good governance of the Spanish language (RAE, ASALE, Cervantes Institute, BNE).

• High level of NLP research, well coordinated (SEPLN).

• Linguistic resources available in the Administration as a data source for industry and research (RISP open data policy).

WEAKNESSES

• SMEs lack the industrial capacity to:

• Compete in the international market.

• Complete the value chain in Spain.

• Difficulties in transferring knowledge from the research sector to industry.

THREATS

• Loss of economic and industrial competitiveness of Spain and Latin America relative to other countries (United States).

• Digital underdevelopment of Spanish language technologies and digital extinction of the co-official languages.

• Brain drain of researchers and specialized professionals, and damage to the Spanish research sector.

Conclusions:

• The language technologies sector is an emerging cross-cutting sector with the capacity to drive growth, competitiveness and quality jobs.

• Its development is unstoppable, but if we do not take advantage of the opportunity, others will take this place.

• Spain has the means, but the actions of the Public Administration must be driven and coordinated in collaboration with the Autonomous Communities, the European Union and Latin America.

Development of the Plan

Schedule of the development of the Plan:

0. Preliminary study.

1. Kick-off: meeting of the SE, SS and AGE CIO (13/05).

2. Setting up the Steering Committee and Technical Secretariat (SETSI) (1/06).

3. Setting up the Experts Committee (15/06).

4. Presentation of the Experts Report (28/07).

5. Proposal of the Plan from the Steering Committee (16/9).

6. Approval of the Plan by the Steering Committee (7/10).

Plan to Boost the Natural Language Industry in Spanish (Plan de Impulso Industria Lenguaje Natural en Español)

[Diagram: actions and bodies involved in drawing up the Plan.]

Bodies: Steering Committee; Experts Committee; Technical Secretariat (SETSI); Plan Monitoring Committee (Steering Committee + Technical Secretariat).

Roles shown: executive direction of the Plan; technical direction of the Plan; coordination.

Actions: meeting of the S.E., S.G. and DTIC of the AGE (13/May/2015); appointment of the Experts Committee; request for the report; presentation of the draft Experts Report (15/Jul/2015); presentation of the Experts Report; drafting of the Plan; approval of the Plan by the Steering Committee.

Other dates shown: 1/Jun/2015, 15/Jun/2015, 15/Sep/2015, 30/Oct/2015.

Experts Committee: preliminary report for the development of the Plan

Language Technologies Plan: action axes

1. Linguistic infrastructures development.

2. Boost of the language technologies industry

• Improvement of the visibility of the sector and of knowledge transfer (from academia to industry).

• Support for internationalization and commercialization of the sector.

3. Public Administration as a driver of the Language Industry

• Platforms for natural language processing and automatic translation in the public administrations.

• Linguistic resources of the public administrations and reuse policy of public sector information (open ling data on RISP).

4. Flagship projects

• Health

• Justice

• Education

• Tourism, Sectorial Monitoring, Digitised Heritage, etc.

http://www.agendadigital.gob.es/planes-actuaciones/Paginas/plan-impulso-tecnologias-lenguaje.aspx

Axis 1: Linguistic infrastructures development


• Linguistic infrastructure = resources + processors + evaluation campaigns.

• They are the asset of the language industry.

• Draw up and implement a general-purpose linguistic infrastructure development plan for Spanish and the co-official languages.

Budget: 30 M€.

Axis 1: Linguistic infrastructure development

Purposes:

• Boost NLP industry in Spanish and co-official languages.

• Improve public sector and industry LT innovation.

Actions:

• Draw up and implement a linguistic infrastructure development plan. Infrastructure governance and sustainability.

• Technical standards for interoperability, license policies and mechanisms of personal data protection.

• Common tools for resource generation and evaluation.

• Facilitate public access to the linguistic infrastructure.


Axis 2: Boost of the language technologies industry

• Action 1: Improvement of visibility and knowledge transfer between academia and the industrial sector.

• Action 2: Support for internationalization and commercialization of the sector.

Budget: 2 M€.

Axis 2.1: Sector visibility and transfer

Purposes:

• Improve knowledge transfer from the academic sector to industry.

• Increase the visibility of the language technologies sector.

Actions:

• Improve training (MOOCs, training sessions for teachers). Promote studies (university master's degrees, specialised courses).

• Research support (viability of a Network of Centres of Excellence, aid programs for research excellence).

• Promotion and detection of talent (hackathons, university sessions and youth olympics).

• Informative sessions: general (InfoDays), specific domains (Language Technologies applied to health).

• Register of enterprises and products. Create a network of experts.

Axis 2.2: Internationalization and commercialization

Purposes:

• Improve the internationalization of Spanish enterprises in this sector.

Actions:

• Participation in international congresses and events (LT-Summit, TAUSS, LREC, META-FORUM, etc.)

• Spanish participation in European organizations and research infrastructures (CEF, CLARIN, ELRA, META-NET).

• National congresses and events (SEPLN, IODC, MWC).

• ICEX missions on language technologies.

• Latin America: events at the Ibero-American Summit (SEGIB, AECID). Collaboration with ASALE and the BNE network. Domain-specific.

• Others: commercial missions, MOUs, OFECOMES, Invest in Spain.

Axis 3: Administration as a driver of the Language Industry

• Action 1: Development of platforms for natural language processing and automatic translation in the public administrations.

Objectives:

- Promote advanced services for citizens.

- Improve the performance of the Administration.

- CORA: reuse, simplify and achieve economies of scale.

- Improve accessibility for people with special needs.

Budget: 4 M€.


Design and creation of a common platform for natural language processing and automatic translation for the Public Administration:

• Develop a scalable infrastructure built from interoperable, multi-supplier components.

• Maintain the confidentiality guarantees of public services.

• Add different components and linguistic resources to the linguistic processing flow under various licensing models and execution scenarios.

• Multiple instances. Advanced distribution model of interconnected scalable components: extensible, light distribution, multi-cloud.
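
The idea of a processing flow assembled from interoperable components by different suppliers can be illustrated with a small sketch. This is not the Plan's platform or any real API: every name below (Component, Pipeline, the supplier labels, the licence labels and the two toy processing steps) is a hypothetical assumption chosen only to show the shape of such a design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from the Plan.


@dataclass
class Component:
    """One pluggable processing step, tagged with its supplier and licence."""
    name: str
    supplier: str
    license_model: str
    run: Callable[[Dict], Dict]  # receives and returns a shared document dict


@dataclass
class Pipeline:
    """An ordered linguistic processing flow built from independent components."""
    components: List[Component] = field(default_factory=list)

    def add(self, component: Component) -> "Pipeline":
        self.components.append(component)
        return self

    def process(self, text: str) -> Dict:
        doc = {"text": text, "trace": []}
        for component in self.components:
            doc = component.run(doc)
            # Record which supplier's component ran, and under which licence.
            doc["trace"].append(
                f"{component.name} ({component.supplier}, {component.license_model})"
            )
        return doc


def tokenize(doc: Dict) -> Dict:
    # Toy tokenizer: split on whitespace.
    return {**doc, "tokens": doc["text"].split()}


def lowercase(doc: Dict) -> Dict:
    # Toy normalizer: lowercase every token.
    return {**doc, "tokens": [token.lower() for token in doc["tokens"]]}


pipeline = (
    Pipeline()
    .add(Component("tokenizer", "supplier-a", "open", tokenize))
    .add(Component("normalizer", "supplier-b", "commercial", lowercase))
)

result = pipeline.process("Plan de Impulso")
print(result["tokens"])
print(result["trace"])
```

Because each component only reads and writes the shared document dictionary, a step from one supplier can be swapped for another without changing the rest of the flow, which is the interoperability property the bullets describe.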


Axis 3.1: NLP platform


Axis 3: Administration as a driver of the Language Industry

• Action 2: Linguistic resources of the public administrations and reuse policy of public sector information.

Objective:

- Within the framework of the RISP policy, open a new line of open data of linguistic interest, to take advantage of the huge potential of public sector information for the language industry (names, people and enterprises; place names; taxonomies; glossaries; multilingual vocabularies; translation memories; etc.).

Budget: 2 M€.

Flagship projects

Objectives:

• Create new public services, or improve the capacity and quality of existing ones, by applying language technologies.

• Facilitate the work of the Administrations in the internal processing of information and its use for defining and monitoring public policies.

• Demonstrate, in Spain and abroad, the capacities and benefits of language technologies.

• Generate reusable elements for other projects.

• Immediate implementation of the Plan's cross actions, using the general linguistic infrastructure and common platforms.

Budget: 49 M€.

Flagship projects

Selection criteria for flagship projects:

• Commitment of the bodies involved. Ensure leadership by those who know the issue and have the competence to solve it.

• Precision. Respond to already identified problems, justifying the suitability and timing of the project.

• High economic and social impact.

• Generation of reusable resources.

• Establish synergies with the other actions of the Plan.

• Particular attention to the acquisition of experience for future projects.

Main projects WP2016 (I)


Flagship projects:

• Health 1: Electronic Medical Record processing (EMR)

• Health 2: Drug data sheets processing (FTM)

• Health 3: Phenotyping and genomics

• Justice: legal information processing

• Tourism intelligence

• Sectorial monitoring for innovation

• Digitized and online heritage

• Advanced citizen assistance

• Education

Cross projects

• Linguistic infrastructure

• Natural language processing platform for public administrations

• Automatic translation platform for public administrations

Other actions

• Studies and strategies

• Internationalization

• Training

• Open data of linguistic interest

Initial projects 2016 (II)

Road map of the Plan

[Diagram: road map of the Plan. Verticals (Axis 4): Health, Tourism, Education and new verticals. Supporting layers: NLP and automatic translation platform (Axis 3.1); linguistic infrastructure with general resources and domain resources (Axis 1); open ling data (Axis 3.2). Areas served: citizen service, innovation, research.]

[Diagram: governance and implementation of the Plan. Governance: Steering Committee, Experts Committee and General Technical Bureau (OTG), the latter staffed by NLP/automatic translation experts, executives and administrative staff. Plan: projects such as Linguistic Infrastructures, NLP Platform, Automatic Translation Platform, Sectorial Monitoring, Tourism, Health 1 and Health 2, shown with coordinators, work packages (WP1, WP2…, WP3, WP4…) and OTG support.]

Real prototype


Governance

Steering Committee

Members:

• Presidency: SETSI

• State Secretaries and Sub-secretaries

• MAEC, MINHAP, MECD, MINETUR, MPRE, MINECO, MSSSI, MJUS. Future: Interior, Defence …

Functions:

• Strategic planning of the Plan.

• Periodic evaluation of the progress and impact of the Plan.

• Elect and remove members of the Experts Committee.

• Supervise proposals from the OTG.

Experts Committee

Members:

• Research sector.

• Industrial sector.

• Academic and institutional sector.

• Technical staff from the public sector.

Functions:

• Technical advice to the Steering Committee.

• Mechanism of interaction with the sector.

• Facilitate collaboration and exchange of experiences and best practices.

• Dissemination of the Plan's actions.

General Technical Bureau

Members:

• Technical staff from SETSI and supporting staff:

• Legal/administrative profile.

• Executive profile.

• NLP and AT technical profile.

Functions:

• Administrative and technical management of the projects. Interlocutor with the CD.

• Defining projects with vertical

• Setting interoperability standards, license models, etc.

Thank you