Translation and Crowd Sourcing: Opportunity or Heresy? Professional translators’ attitudes towards massive collaboration Alain Désilets Conseil national de recherches du Canada

DESCRIPTION

Alain Désilets's talk at the Translation CrowdSourcing workshop organized by University of Maryland in June 2010

TRANSCRIPT

Page 1: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translation and Crowd Sourcing: Opportunity or Heresy?

Professional translators’ attitudes towards massive collaboration

Alain Désilets

Conseil national de recherches du Canada

Page 2: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

“The most reliable way to forecast the future is to try to understand the present.”

“Trends, like horses, are easier to ride in the direction they are going.”

-- John Naisbitt

"You have to talk to [customers], watch them; this is the only way to understand their interests, their motives, their needs".

-- Donald Norman

Page 3: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Observing Translators

Multi-disciplinary project that includes technology researchers from NRC and translation studies researchers from Université du Québec en Outaouais.

Contextual Inquiry: well known and tried technique in Human Computer Interaction for learning about end users.

• Mix between observation and interviewing.
• Observe potential end users while they work.
• Ask them to think aloud.
• Interrupt with lots of questions.
• Use qualitative and quantitative data analysis to make sense of what you witnessed.

Page 4: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

25 subjects

Organization type: large (250+ employees) LSPs (13), medium (<30 employees) LSPs (6), freelance (2), academic (2), amateur (2)
Type of work: “conventional” translation (15), MT post-editing (8), revision (2)
Language pairs: English-French (13), English-Spanish (6), English-Japanese (2), Portuguese-Spanish (1), Chinese-English (1), English+Italian-Estonian (1), English-Inuktitut (1)
Years of experience: ranged from < 2 years up to 20+ years
Source text domain: Aboriginal affairs, Municipal affairs, Public administration, Education, Legal, Health, Software manual, Politics, Job offers
Source text length: Min: ~20, Max: 7000
Country: Canada (15), Europe (5), US (3), Japan (2)
Professional translators: all but 2

Page 5: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Preaching the Wiki Word to Translators

Involved in world of wikis since 2002
• Chaired WikiSym conference in 2007 (Montreal)

Have been telling professional translators about wikis since 2006
• Keynote at Translating and the Computer 2007: “Translation Wikified”.
• Organized a workshop and panel on those topics.

Co-implemented wiki-based tools to support translation work
• Cross lingual wiki engine: translate in a wiki context where pre-conditions of traditional translation workflows do not apply (ex: master language).
• Tiki-CMT: TikiWiki module to support Collaborative Multilingual Terminology work.

Page 6: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Talk Outline

Are professional translators technology averse?

Professional translators’ attitudes and work practices with respect to:

• collaboratively built linguistic resources
• collaborative translation and crowdsourcing

Please interrupt with questions at any point!

Page 7: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Are professional translators technology averse?

Page 8: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translation Problems

Translators use a lot of technology when trying to resolve translation problems.

Translation problem = any source language word or expression which presents a difficulty for a human translator (not machine) during the process of translation.

Term, idiomatic expression, named entity, etc...

Page 9: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Tools, Tools, more Tools!

private (to the individual) lexicons built using simple office suites (ex: Excel spreadsheets, MS-Word documents)

1 large, public general purpose bi-text (TransSearch)

private (to the individual) or institutional Translation Memories built with 3 different products (Trados, Multitrans, LogiTerm)

2 private (to the individual) or institutional, unaligned archives of previous translations, either stored in a database or the file system

9 unilingual general purpose dictionaries (Multidictionnaire, Petit Robert, Merriam-Webster, Dictionnaire des cooccurrences, dictionary.references.com, Canadian Oxford, Trésor de la langue française, www.dictionary.com, urban dictionary)

2 unilingual thesauri (Dictionnaire analogique, Dictionnaire des synonymes de l'Université de Caen)

2 unilingual specialized dictionaries and lexicons (Dictionnaire de droit québécois, Lexique des noms géographiques)

3 bilingual dictionaries (LexibasePro, René Merteens, Robert & Collins)

the source text being translated, as well as its partial translation

bilingual documents related to the source text (ex: minutes of meetings being discussed in the source text)

2 instances of the client's Web sites

2 large, bilingual Web sites not directly related to the domain of the source text (gc.ca domain, Canadian Broadcasting Corporation)

3 large, bilingual Web sites directly related to the domain of the source text (CanLII, Canadian Federal Court, University of Ottawa)

the whole Web in the source or target language (mined using Google search engine)

2 manuals of style (Guide du rédacteur, Le Ramat de la typographie)

2 spell and grammar checkers (MS-Word, Antidote)

1 database of newspaper articles in the target language

Average of 10 resources in our subjects’ toolboxes!!!

Page 10: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Tools, Tools, more Tools! (2)

Translators use a wide range of tools when resolving translation problems.

Vast majority of those are electronic.

p < 0.001

Page 11: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Adoption of Corpus-Based Tools

• Both types are used equally.
• Corpus-based tools have made it into the mainstream.
• But they have not displaced Termino-lexicographic tools.

Termino-lexicographic =

• Dictionary

• Terminology Database

• Lexicon, etc.

Corpus-based =

• Translation Memory

• Bilingual web site, etc.

p > 0.05

Page 12: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Advanced Google Use

Translators are among the world’s most advanced Google users.

• They know the advanced syntax, and expect it in most search tools they use.
• They use Google in various ways to mine the web-as-a-corpus
– Ex: search bilingual sites for solutions, assessing usage in target language of particular solutions.
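As a rough illustration of the kind of query involved, the sketch below builds an exact-phrase search restricted to one site, using Google's quoted-phrase syntax and its `site:` operator. The helper name `bilingual_site_query` and the sample phrase are hypothetical; the slides do not show the exact queries the subjects typed.

```python
def bilingual_site_query(phrase: str, site: str) -> str:
    """Build an exact-phrase query restricted to a single site.

    Uses two pieces of Google's advanced syntax: double quotes for an
    exact phrase, and the site: operator to limit results to one domain.
    """
    return f'"{phrase}" site:{site}'


# Hypothetical example: look for a source-language expression on a
# large bilingual site, then read the parallel page for a solution.
query = bilingual_site_query("duty of care", "gc.ca")
print(query)
```

The same helper can be reused with a candidate target-language equivalent to see how often, and where, that wording is actually used.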

Page 13: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Searching Bilingual Sites

Page 14: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Searching Bilingual Sites (2)

Page 15: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Hot Buttons

That said, translators strongly resist technology that either:

• disrupts the fair compensation equation, or
• exerts downward pressure on quality of end product

Translation crowdsourcing is likely to press on both these buttons.

Page 16: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Fair Compensation Equation

Translators are paid by the word.

Technology that increases productivity exerts strong downward pressure on per-word rate.

Pressure is not always commensurate with actual productivity gain.

Example:
• 10-word sentence with 80% fuzzy match level.
• Should translator only get paid for two words?
• Even though she still has to read the whole sentence...
• ... and may have to change the rest of the sentence to make it work with the translation of those 2 words?

Once a new fair equilibrium has been reached, this initial resistance may go away.
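The arithmetic behind the fuzzy-match example can be sketched as follows. This is a minimal illustration, assuming a naive pricing scheme in which only the unmatched share of a sentence is paid; the function name `naive_payable_words` is hypothetical and is not taken from any translation memory product.

```python
def naive_payable_words(sentence_words: int, fuzzy_match_percent: int) -> float:
    """Words paid under a naive scheme that pays only the unmatched share.

    Assumption for illustration: payment is proportional to
    (100 - fuzzy match level), regardless of the actual effort required.
    """
    if not 0 <= fuzzy_match_percent <= 100:
        raise ValueError("fuzzy_match_percent must be between 0 and 100")
    return sentence_words * (100 - fuzzy_match_percent) / 100


# The example from the slide: a 10-word sentence with an 80% fuzzy match.
words_paid = naive_payable_words(10, 80)
words_read = 10  # the translator still has to read the whole sentence
print(f"paid for {words_paid:g} words, but read {words_read} words")
```

The gap between words paid and words read is the part of the effort that such a scheme leaves uncompensated.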

Page 17: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Lowering Quality?

Translators
• Craftspeople who take pride in the quality of their end product.
• Quality = original sense is rendered, AND translation reads as though it was an original text written by a native speaker.

Customers
• Translation = cost center, not part of their core business.
• Can’t always tell quality when they see it, nor measure a clear link between translation quality and bottom line.
• Liable to introduce cost-reducing technologies without realizing the impact on quality.

Page 18: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Attitudes and work practices with respect to collaboratively built linguistic resources

Page 19: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Is Wikipedia Useful for Translators?

We witnessed very little use of Wikipedia in our translator observation.

On a few occasions, subjects consulted Wikipedia to get background information on a particular concept, but never to get a solution to a terminology difficulty.

Analysis conducted in June 2007 indicates that coverage of typical terminology difficulties may be insufficient for the latter task (finding equivalents).

Page 20: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Is Wikipedia Useful for Translators (2)?

Wikipedia’s coverage of 42 observed terminology problems (June 2007)

                                         Wikipedia   Wiktionary   TERMIUM
Has English entry                        71.4%       47.6%        80.1%
Has English entry in right sense         57.1%       45.2%        76.2%
Has French equivalent                    33.3%       35.7%        76.2%
Has French equivalent in correct sense   26.1%       33.3%        76.2%

Note: TERMIUM = Terminology DB of the Gov. of Canada.

Page 21: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Is Wikipedia Useful for Translators (3)?

Evolution of translators’ attitudes towards Wikipedia and wikis:

4 yrs ago: “Wikipedia, what’s that?”

3 yrs ago: “I know about Wikipedia and I think it’s crap because any clown can write to it.”

2 yrs ago: “You know, Wikipedia is surprisingly good and I use it all the time in my work now.”

1 yr ago: “This collaborative, wiki stuff is bound to be important for translation, but I am not sure how best to leverage it.”

Page 22: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

What Makes a Good Source?

Professional Translators are trained to:
• only use trusted sources.
• focus on sources that are specialised for their domain or client.
• never use content that may have been translated, or written by non-native speakers.

In theory, that would rule out most collaborative sources.

In practice, translators are pragmatic and will consult sources that do not meet those criteria when necessary.

Page 23: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Use of Public Sources

• Our subjects used significantly more public resources.
• Many of them available for free (ex: customer’s web site).
• Caveat: Situation is different for highly repetitive, technical translation.

p < 0.001

Public =

Anyone can access, possibly at a fee.

Private =

Only accessible to certain translators (ex: those working for particular employer)

Page 24: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Use of General Sources

• Our subjects used significantly more multidomain resources.
• Seemed to prefer casting a wide net, and then sifting the results.
• Caveat: again here, situation is different for highly repetitive, technical translation.

p < 0.05

Multidomain =

Covers multiple domains, and subject searched it without restricting domain

Single domain =

Covers single domain, or, covers many, but subject restricted search by domain

Page 25: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Use of Translated Material

Our subjects frequently searched in bilingual Canadian sites for French equivalents.

Estimated 75% of French content on those sites was translated.

Thus, in an estimated 75% of cases, this strategy ended up yielding solutions taken from translated material.

Frowned upon in Terminology, and, to a lesser extent in Translation.

But our subjects did it anyway. Here’s why…

Page 26: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translator Judgment

Subjects exercised a lot of critical judgment w.r.t. resources.
• Did not blindly trust any source, even highly reputed ones like TERMIUM (Terminology DB of the Gov. of Canada).
• In 35% of the cases, searched in a second resource, after finding some relevant information.
• Subjects adept at rapidly scanning list of suggestions and sifting grain from chaff.
• Problem Coverage (i.e. probability that at least one relevant solution is found in top 10) seemed more important than Precision (i.e. probability that a proposed solution is relevant).
• Recall (i.e. percentage of all relevant solutions that is actually proposed by the resource) also seemed important, but to a much lesser degree.
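The three measures defined above can be made concrete with a small sketch. It assumes each translation problem has a ranked list of proposed solutions and a known set of relevant ones; the function names and the toy data are invented for illustration and are not the study's data or analysis code.

```python
from typing import List, Set


def problem_coverage(results: List[List[str]], relevant: List[Set[str]], k: int = 10) -> float:
    """Share of problems with at least one relevant solution in the top k."""
    if not results:
        return 0.0
    hits = sum(
        1
        for proposed, rel in zip(results, relevant)
        if any(solution in rel for solution in proposed[:k])
    )
    return hits / len(results)


def precision(results: List[List[str]], relevant: List[Set[str]]) -> float:
    """Share of all proposed solutions that are relevant."""
    proposed_total = sum(len(proposed) for proposed in results)
    if proposed_total == 0:
        return 0.0
    relevant_proposed = sum(
        sum(1 for solution in proposed if solution in rel)
        for proposed, rel in zip(results, relevant)
    )
    return relevant_proposed / proposed_total


def recall(results: List[List[str]], relevant: List[Set[str]]) -> float:
    """Share of all relevant solutions that the resource actually proposed."""
    relevant_total = sum(len(rel) for rel in relevant)
    if relevant_total == 0:
        return 0.0
    found = sum(len(rel & set(proposed)) for proposed, rel in zip(results, relevant))
    return found / relevant_total


# Invented toy data: two problems, four suggestions each.
results = [["a", "b", "c", "d"], ["x", "y", "q", "w"]]
relevant = [{"a", "z"}, {"q"}]

print(problem_coverage(results, relevant))  # every problem has a usable hit
print(precision(results, relevant))         # most suggestions are noise
print(recall(results, relevant))            # one relevant solution is missed
```

In this toy case a noisy resource still solves every problem, which matches the observation that subjects were willing to sift through suggestions as long as a usable one was among them.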

Page 27: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Resources Quality Control

• Our subjects preferred more tightly controlled resources.
• But still made non-negligible use of Moderately controlled ones (38% of all consultations).
• Almost no use of completely Open resources.

Tight =

Carefully crafted and revised (linguists, terminologists, revisers). Ex: TERMIUM.

Moderate =

Comes from reputed organizations, but may not be as carefully crafted and revised. Ex: Gov of Canada web sites.

Open =

Could have been produced by anyone. Ex: the whole web.

p < 0.05

p < 0.001

Page 28: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Write Access

• Our subjects were predominantly consumers of resources, as opposed to contributors.
• Many comments about lack of time to contribute.
• But in most collaborative resources, only need a small percentage of contributors.

Read-only =

Subject cannot write, or can only do so through an intermediary. Ex: TERMIUM

Read-Write =

Subject can write directly without an intermediary. Ex: subject’s own lexicon.

p < 0.001

Page 29: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Use of collaborative sources is not that common yet, but growing.

Collaborative resources go against the grain of some translator attitudes, but nothing that can’t be surmounted.

Need to address perception of quality and trustworthiness.

Cannot expect majority of translators to contribute.

Page 30: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Attitudes and work practices with respect to collaborative translation

Page 31: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Flavours of Collaborative Translation

In increasing order of controversy:

Translation teamware
Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.

Online market place for translators
eBay-like platforms for connecting customers and translators with minimal intervention by a middle man.

Translation crowdsourcing
Mechanical Turk-style platform for distributing translation projects across large crowds of mostly amateur translators.

Page 32: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translation Teamware

Allow multidisciplinary teams of translators, terminologists, customers, domain experts to collaborate efficiently on a translation project.

• Relatively uncontroversial.
• Many commercial translation workflow products are along those lines, but follow a somewhat assembly-line model.
• More resistance to wiki-like platforms that break down barriers and open up horizontal communication channels
– Ex: Customer seeing early drafts of translations, and commenting on them.
– Translators like to (need to?) stay in their own bubble.
– Fear of undue interference by non-qualified staff.
– But starting to see more and more case studies of this (ex: using BaseCamp or wikis to coordinate translation teams)

Page 33: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Online Market Place for Translators

eBay-like platforms for connecting customers and translators with minimal intervention by a middle man.

– Ex: ProZ, Translated.net

Usually includes
– automatic reputation management.
– free, open resources for translators (ex: Kudoz, MyMemory).

Somewhat controversial:
– Some freelancers perceive it as empowering (cut out the middle man).
– Others perceive it as an impersonal “Walmart of translation”, i.e. something that encourages low-quality translation.

Page 34: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translation Crowdsourcing

Mechanical Turk style platform for distributing translation projects across large crowds of mostly amateur translators.

This is REALLY controversial.
• So far, only heard one professional translator say that this is a good thing.

Disrupts the fair compensation equation AND exerts downward pressures on quality.
• One crowdsourcing vendor quotes average of $0.0008/word (vs $0.25-0.30/word for your average professional translator).
• Translating out of context is known to be error-prone.
• Amateur translators tend to produce texts that read like translations.
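To give a sense of scale for the per-word rates quoted above, here is a small worked calculation. The rates are the ones on the slide; the 1,000-word job size and the helper name `job_cost` are assumptions made only for this example.

```python
from decimal import Decimal

# Per-word rates quoted on the slide, in dollars.
CROWD_RATE = Decimal("0.0008")
PRO_RATE_LOW = Decimal("0.25")
PRO_RATE_HIGH = Decimal("0.30")


def job_cost(words: int, rate_per_word: Decimal) -> Decimal:
    """Cost of a job priced purely by the word."""
    return words * rate_per_word


words = 1000  # hypothetical job size, chosen for illustration

print("crowd:", job_cost(words, CROWD_RATE))
print("professional:", job_cost(words, PRO_RATE_LOW), "to", job_cost(words, PRO_RATE_HIGH))

# How many times higher the professional rate is than the crowd rate.
print("ratio:", PRO_RATE_LOW / CROWD_RATE, "to", PRO_RATE_HIGH / CROWD_RATE)
```

On these figures the professional rate is between 312.5 and 375 times the quoted crowdsourcing rate, which is why the fair compensation concern is so acute here.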

Page 35: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Translation Crowdsourcing (2)

One hopes that CrowdSourcing technology will be used wisely and in a way that continues to leverage professional translator skills, for example:
- Crowdsourcing used mostly for low-stake or user-generated content that is currently not being translated at all.
- Professionals continue to play a pivotal role, by revising translations produced by the crowd and paying special attention to amateurs’ main weakness: native-sounding translation.

But we, researchers and developers, cannot guarantee that this is how things will unfold.

We need to be sensitive to those issues while we build the future of translation crowdsourcing.

Page 36: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Conclusions

Page 37: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Conclusions

• Professional translators are NOT technology averse, but they will resist technologies that disrupt the fair compensation equation, or exert downward pressures on quality.

• Crowdsourcing of large linguistic resources is compatible with the views of professional translators, although it is not yet part of their mainstream work practices.

• Also non-controversial is the use of online collaboration to facilitate team coordination, or to create “fair” marketplaces for freelance translators.

• Crowdsourcing of translation, on the other hand, is very controversial in translator circles, and we need to be sensitive to that issue in building and designing translation crowdsourcing environments.

Page 38: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Questions?

Page 39: 2010 06-u maryland-crowd_sourcing-workshop-v2010-06-16-10h44

Thank you for your attention.

For more details…

Alain Désilets

National Research Council of Canada

[email protected]
