TAUS 2.0 and the Game Changers in Localization (Jaap van der Meer, director of TAUS

TAUS Translation Data Landscape Report Authors: Andrew Joscelyne & Anna Samiotou Reviewer: Jaap van der Meer

Upload: taus-enabling-better-translation

Post on 15-Apr-2017

TAUS Translation Data Landscape Report

Authors: Andrew Joscelyne & Anna Samiotou

Reviewer: Jaap van der Meer

The report…

• was published in December 2015

• was written by TAUS in consultation with the EU project LT Observatory, supervised by LT Innovate

• draws its insights from industry surveys and interviews with a broad range of stakeholders

The report attempts to answer:

• Who are the producers and consumers of translation data? How are they changing?

• Is there a viable “market” for translation data, beyond the current informal sharing or web-scraping model?

• What can we do to overcome the legal/technical issues and concerns regarding translation data sharing?

• How could translation data sharing as a natural practice integrate with the European Digital Single Market program?

• Which models of translation data circulation work best? For how long? What could disrupt them?

Methods to obtain Translation data

• Leveraging public and open resources

• Creating one’s own resources by human, semi-automatic or automatic means

• Scraping the web by web crawling: parallel text collections to be used mainly by MT systems

• Sharing or exchanging data

• Paying for data: stakeholders will pay for translation data when it is known to be uniquely valuable in terms of relevance and impact on the task at hand, is affordable, and there is no other solution

Translation data user types

Scenarios for a Translation data Marketplace

• Datasets: Buy data, sell data, exchange data, bid for data, order data, offer specific in-domain translation data.

• Datasets & Tools: A commercial service for translation data together with multilingual enablers and tools that can fingerprint, curate and benchmark the data, and validate its quality and relevance to the task at hand.

• Trained domain MT engines: Deliver in-domain translation engines

• Plug & play model: This is the model in use today, in which a service is accessed in one go.

Translation data provision models SWOT analysis 1/2

Translation data provision models SWOT analysis 2/2

How about a Translation data Marketplace?

Drivers: highly globalized market – providing translation data for a reasonable price – allowing for benchmarking prior to purchase

Inhibitors: using other people’s resources can be a blind guess – current lack of tools – imbalance of high- and low-resource languages

Challenges: enhance language coverage – address the high risk of local markets being edged out by global players and by plug & play technologies

Impact of drivers and inhibitors

Critical determinants of the way ahead

• We are at the beginning of the translation data age.

• Content will be king and queen.

• Innovation will be vital: many different competing solutions will emerge for streamlining the value chain between raw data and specific translation requirements.

• The term “translation data” has two meanings:

– we need the data to drive translation automation.

– we also vitally need data about translation: find good data about global data usage.