ELRC COUNTRY PROFILES: LANGUAGE DATA SHARING IN EUROPEAN PUBLIC SERVICES
Lilli Smal (DFKI)

OUTLINE:

• The European Language Resource Coordination
  • What is ELRC?
  • The actors behind ELRC
  • Our vision, mission and purpose

• The ELRC Country Profiles
  • The idea
  • Main findings exemplified
  • Outlook

META-Forum | Lilli Smal | 8/10/2019

THE EUROPEAN LANGUAGE RESOURCE COORDINATION

• What is ELRC?
• The actors behind ELRC
• Our vision, mission and purpose

WHAT IS ELRC?

• Service contract (SMART 2015/1091) for the European Commission under the Connecting Europe Facility (CEF) programme
• Coordination project to collect language resources in all CEF countries (EU Member States + Norway and Iceland)
• First project started in April 2015; the second project ends in February 2020

THE ACTORS BEHIND ELRC

ELRC consortium
• DFKI – German Research Center for Artificial Intelligence
• ELDA – Evaluations and Language Resources Distribution Agency
• ILSP – Institute for Language and Speech Processing
• Tilde – Language technology company

THE ACTORS BEHIND ELRC

National Anchor Points
• 30 ELRC Technology National Anchor Points
• 30 ELRC Public Services National Anchor Points

Compose the Language Resource Board (LRB) - the governance board of ELRC

Vision: True Digital Single Market

Mission: Create sustainable data pipelines

Purpose: Identify and collect language resources

WHAT DATA ARE WE COLLECTING?

• Ideally, bi- or multilingual text corpora in TMX format
• Non-personal public sector information (according to the PSI Directive (EU) 2019/1024)
• “Public Sector Information is information generated, created, collected, processed, preserved, maintained, disseminated, or funded by or for the Government or public institution” (European Data Portal: Analytical Report 9: The Economic Benefits of Open Data, https://www.europeandataportal.eu/sites/default/files/analytical_report_n9_economic_benefits_of_open_data.pdf, 2017, p. 7.)
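
For readers unfamiliar with TMX (Translation Memory eXchange), the sketch below shows what a small bilingual file looks like and how its aligned segments could be read with Python's standard library. The sample sentences, the `en`/`nb` language codes and the helper name `read_tmx_pairs` are assumptions made for illustration; they are not taken from ELRC data or tooling, and real TMX files usually carry more header metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-segment English-Norwegian sample; not real ELRC data.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The library is open.</seg></tuv>
      <tuv xml:lang="nb"><seg>Biblioteket er åpent.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Thank you.</seg></tuv>
      <tuv xml:lang="nb"><seg>Takk.</seg></tuv>
    </tu>
  </body>
</tmx>
"""

# ElementTree exposes the xml:lang attribute under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for the two requested languages."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    pairs = []
    for tu in root.iter("tu"):  # one translation unit per aligned segment
        segments = {}
        for tuv in tu.findall("tuv"):  # one variant per language
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


pairs = read_tmx_pairs(SAMPLE_TMX, "en", "nb")
```

Each `<tu>` element holds one translation unit, and each `<tuv>` inside it holds the same segment in one language, which is why the format suits the parallel corpora described above. Units that lack either requested language are skipped rather than reported as errors.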

ELRC COUNTRY PROFILES

• The idea
• Main findings exemplified
• Outlook

COUNTRY PROFILES

• Identification of stakeholders
• Language data creation and sharing infrastructure
• Identification of main challenges
• Action plan
• Publication of results

EXAMPLE: NORWAY

Key stakeholders:
• Ministry of Culture and its subordinate bodies:
  • The National Library of Norway
  • The Language Council of Norway
• Agency for Public Management and eGovernment (Difi)

EXAMPLE: NORWAY

Identified challenges:
• Not much language data available in Norwegian Nynorsk and even less parallel data (EN – NO)
• Language data in general and translations specifically are not considered valuable apart from their inherent purpose
• Lack of appropriate language data management structures
• Uncertainties with respect to confidentiality and personal data

EXAMPLE: NORWAY

Action plan:
• Increase the number of bilingual resources in English-Norwegian (Bokmål or Nynorsk), including terminology
• Collect/create a parallel corpus in Norwegian Bokmål and Nynorsk
• Raise awareness of language data as open data
• Increase interest in MT in public services
• Establish good data management practices in public services
• Tackle legal concerns

OUTLOOK

• Publication of results from all participating countries in a White Paper
• Results will be presented and discussed at the Fourth ELRC Conference on 26/27 November in Helsinki
• Laying the groundwork for further collaboration and application for CEF-funded projects

THANK YOU FOR YOUR ATTENTION!

Website: www.lr-coordination.eu
Twitter: @LR_Coordination

Email: [email protected]