MT and Post-Editing User-Generated Content, AMTA 2014

MT and Post-Editing User Generated Content (UGC), Elaine O’Curran, Language Tools Team, Welocalize

Upload: welocalize

Post on 29-Jun-2015

DESCRIPTION

Elaine O’Curran of Welocalize presented "MT and Post-Editing User-Generated Content" at AMTA 2014 (Association for Machine Translation in the Americas) in Vancouver. The October 2014 presentation covers machine translation and user-generated content (UGC). As more and more of this content is produced, demand is growing for translating it with language automation tools.

TRANSCRIPT

Page 1: MT and Post-Editing User-Generated Content AMTA 2014

MT and Post-Editing User Generated Content (UGC)

Elaine O’Curran, Language Tools Team, Welocalize

Page 2: MT and Post-Editing User-Generated Content AMTA 2014

Table of Contents
• What is UGC?
• What does it look like?
• Challenges for Localization
• How useful is MT for UGC?
• Evaluation Methods
• Some Results
• Localization Strategies
• Resourcing Strategies
• Quality Assurance
• Conclusions
• Q & A

Page 3: MT and Post-Editing User-Generated Content AMTA 2014

What is UGC? Definitions

Since the Web evolved from static pages to dynamic websites in the Web 2.0 era, many websites include content created by visitors.

A 2007 report* by the OECD defines UGC as i) content made publicly available over the Internet, ii) which reflects a certain amount of creative effort, and iii) which is created outside of professional routines and practices.

Examples of UGC can be found on web forums, wikis, social networking sites, blogs, travel sites, C2C online marketplaces, etc.

*Participative Web: User-Created Content

Page 4: MT and Post-Editing User-Generated Content AMTA 2014

What is UGC?
• Contracts
• Patents
• Annual Reports
• Light Marketing
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization)
• e-Learning Content
• User Guides
• Internal Corporate Communications
• Knowledge Bases
• Proposals / Draft Applications
• User Generated Content

Most of our clients have some UGC in their content portfolio.

Page 5: MT and Post-Editing User-Generated Content AMTA 2014

What is UGC? Impact

User generated content such as customer reviews, web forums and blogs has become a major influence on people’s buying decisions.

Studies show that online consumer reviews are the second most trusted form of advertising after word-of-mouth.

UGC in the form of support forums reduces the cost of supporting customers.

Page 6: MT and Post-Editing User-Generated Content AMTA 2014

What does it look like? Travel Reviews

Legend: Yellow = UGC, Green = Web UI

Page 7: MT and Post-Editing User-Generated Content AMTA 2014

What does it look like? Technical User Forums

Legend: Yellow = UGC, Green = Web UI

Page 8: MT and Post-Editing User-Generated Content AMTA 2014

What does it look like? Online marketplaces for buyers and sellers

Legend: Yellow = UGC, Green = Web UI

Page 9: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization

Page 10: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization: Characteristics of UGC

authored by non-professionals and/or non-native speakers

often similar pattern to oral speech

sometimes authored by power users / techies

often highly perishable content

multitude of authors / lexical and stylistic diversity

(Roturier & Bensadoun, 2011)

Page 11: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization: Multitude of authors / lexical and stylistic diversity

Page 12: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization: Classification of non-standard input

Short forms (nite (night), sayin (saying), gr8 (great)),

Acronyms (lol (laugh out loud), iirc (if I remember correctly)),

Typing errors/misspellings (wouls (would), rediculous (ridiculous)),

Punctuation omissions/errors (im (I’m), dont (don’t)),

Non-dictionary slang (that was well mint (that was very good)),

Wordplay (that was soooooo great (that was so great)),

Censor avoidance (sh1t, f***),

Emoticons (:) (smileys), <3 (heart))

Foreign words used intentionally (al dente, bon voyage)

(Jiang et al, 2012; Clark & Araki, 2011)

Page 13: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization

Examples

Emoticons, typing errors, missing punctuation, grammar errors, authored by power user ‘techies’, slang, highly technical terms… this is our source text

Page 14: MT and Post-Editing User-Generated Content AMTA 2014

Challenges for Localization

Examples

Non-native writers, typos, grammar errors, two authors with completely different styles & opinions, idioms don’t make sense … this is our source text

Page 15: MT and Post-Editing User-Generated Content AMTA 2014

UGC & MT

Page 16: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC?

Raw MT is most often used to publish UGC into other languages

Utility scoring can be used to measure the quality of raw MT

o It rates the comprehensibility & utility of the output

Automatic confidence scoring can also be used to measure the quality of raw MT

Some MT research results for UGC indicate that around 50% of comments and reviews are considered comprehensible

Efforts are focused on normalization and preprocessing steps of UGC in order to improve MT output for this content type

Page 17: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC? Normalization

Normalization is the manual or automated process of taking non-standard input that MT engines do not recognize and pre-translating it using scripts, regular expressions and other processes in order to make the source text more ‘normal’ before machine translation.
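
As an illustration only, a dictionary-plus-regex normalization pass might look like the sketch below. The lookup table reuses examples from the "Classification of non-standard input" slide; the function name, the table and the elongation rule are assumptions for this sketch, not the process used by any particular engine owner.

```python
import re

# Hypothetical lookup table built from the examples on the classification
# slide; a production lexicon would be far larger and domain-specific.
REPLACEMENTS = {
    "nite": "night",
    "sayin": "saying",
    "gr8": "great",
    "wouls": "would",
    "rediculous": "ridiculous",
    "im": "I'm",
    "dont": "don't",
    "iirc": "if I remember correctly",
}

# Three or more repeats of the same letter ("soooooo") collapse to one letter.
# This is a simplification: it would also turn "coool" into "col".
ELONGATION = re.compile(r"([A-Za-z])\1{2,}")

# Word tokens, optionally with one internal apostrophe ("don't").
WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def normalize(text):
    """Rewrite known non-standard tokens before the text is sent to MT."""
    text = ELONGATION.sub(r"\1", text)

    def fix(match):
        token = match.group(0)
        # Unknown tokens pass through unchanged.
        return REPLACEMENTS.get(token.lower(), token)

    return WORD.sub(fix, text)


print(normalize("im sayin that was soooooo gr8"))
# I'm saying that was so great
```

A table like this only covers the categories that can be fixed token by token (short forms, acronyms, misspellings, punctuation omissions); slang and wordplay generally need context.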

Page 18: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC? Examples of Normalization from TripAdvisor

Page 19: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC?

Utility Scoring is a human evaluation method used to rate how understandable and usable the raw MT output is. A score of 3 or higher is usually considered a pass and means the content is publishable.

Scoring Definitions

5 The document is understandable and actionable. Nearly all of the text is well translated. That is to say that you perfectly understood the document context, such as comprehending a property description, or a travel review.

4 The document is understandable and actionable. Most of the text is well translated. That is to say that you properly understood the document context, such as comprehending a property description, or a travel review.

3 The document is not entirely understandable, but it is actionable. The text is stylistically and grammatically odd. Some of the text is well translated. That is to say that the text contains many errors but you are still able to extract from it basic context, such as comprehending essential aspects of a property description, or a travel review.

2 The document is possibly understandable and actionable given enough context and/or time to work it out. That is to say that the text contains many errors and it is difficult to extract from it basic context, but given a lot of time, it could be deciphered to comprehend aspects of a property description, or a travel review.

1 The document is not understandable and it is impossible to understand the information it contains.
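
To show how the pass criterion turns into a percentage, here is a minimal sketch that computes the share of documents scoring 3 or higher. The function name and the sample scores are invented for illustration; they are not data from the evaluations reported in this deck.

```python
PASS_THRESHOLD = 3  # "A score of 3 or higher is usually considered a pass"
VALID_SCORES = {1, 2, 3, 4, 5}


def pass_rate(scores):
    """Return the fraction of utility scores at or above the pass threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    for score in scores:
        if score not in VALID_SCORES:
            raise ValueError(f"utility scores must be 1-5, got {score!r}")
    passed = sum(1 for score in scores if score >= PASS_THRESHOLD)
    return passed / len(scores)


# Made-up scores for six documents, one score per document.
sample = [5, 4, 3, 2, 1, 3]
print(f"{pass_rate(sample):.0%}")  # 67%
```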

Page 20: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC?

Some studies of raw MT evaluation results for UGC found that on average around 50% of the posts are comprehensible (Roturier & Bensadoun, 2011; Mitchell & Roturier, 2012).

In 2013, we saw similar results on our first large scoring exercise for UGC in the travel domain

We used the utility scoring method and found between 26% and 50% of reviews were somewhat understandable or actionable.

Evaluations of Raw MT

• Indonesian: 40.66%
• Korean: 50.33%
• Thai: 48.667%
• Traditional Chinese: 26.33%

Page 21: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC?

In 2014, we performed new evaluations after the engine owner implemented normalization processes and further customizations of the engines.

We found that between 54% and 96% of travel reviews scored between 3 and 5 on the Utility scale.

Chart (y-axis 0% to 100%), labels and values in chart order:

Language pairs: EN-DE, EN-NL, EN-IT, EN-ES, EN-PL, EN-DA, EN-JP, EN-EL, EN-PT, EN-TH, EN-KO, EN-ID, EN-ZH

Values: 84%, 85%, 73%, 89%, 85%, 96%, 56%, 85%, 87%, 55%, 58%, 54%, 65%

Page 22: MT and Post-Editing User-Generated Content AMTA 2014

How useful is MT for UGC?

In 2014, we trained an SMT engine and performed utility scoring on technical forum content in the automotive domain.

We found that between 44% and 68% of cases scored between 3 and 5 on the Utility scale.

Chart (y-axis 0% to 100%), labels and values in chart order:

Labels: de-DE, fr-FR, de-EN, fr-EN

Values: 68%, 64%, 44%, 50%

Page 23: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies

Page 24: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies

The bulk of UGC is published with raw MT, but each customer determines the impact that the UGC has on the business and brand, and this drives the content selection for post-editing.

We generally post-edit the UGC content which is expected to deliver information – forums, reviews, knowledge bases - and only the highest visibility part of this content, or content that meets certain criteria, such as a high number of visits or clicks.

Google does not index content that is identified as machine translated, and as a consequence machine translated content cannot be found in Google searches. This can also be a crucial factor in deciding which content to post-edit.

Content Selection
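
The selection rule described on this slide (post-edit informational content with high visibility, publish the rest as raw MT) could be sketched as a simple filter. The `Post` record, the content-type labels and the `min_visits` threshold are hypothetical; in practice each customer defines its own criteria.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    content_type: str  # hypothetical labels: "forum", "review", "knowledge_base", ...
    visits: int


# Content "expected to deliver information", per the slide.
INFORMATIONAL_TYPES = {"forum", "review", "knowledge_base"}


def select_for_post_editing(posts, min_visits):
    """Split posts into (post-edit, raw MT only) by type and visit count."""
    to_post_edit, raw_mt_only = [], []
    for post in posts:
        if post.content_type in INFORMATIONAL_TYPES and post.visits >= min_visits:
            to_post_edit.append(post)
        else:
            raw_mt_only.append(post)
    return to_post_edit, raw_mt_only


posts = [
    Post("a1", "review", 12000),
    Post("a2", "review", 40),
    Post("a3", "chat", 50000),
]
selected, remaining = select_for_post_editing(posts, min_visits=1000)
print([p.post_id for p in selected])   # ['a1']
print([p.post_id for p in remaining])  # ['a2', 'a3']
```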

Page 25: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies

There are exceptional scenarios where UGC qualifies as high-end, high-visibility content and where a translation approach for Marcom or Transcreation is required.

Human Translation for High-end UGC

LEVEL OF TRANSLATION/PE: HUMAN TRANSLATION

TYPICAL CONTENT TYPES: If the objective of the UGC content is to trigger “emotional impact”, it is best not to post-edit such content at all and to go with human translation. Examples of such content are usually “new marketing”: CEO blogs, first-page product reviews, tweets, anything that is expected to convey the company image.

STRATEGY: Allow the author’s unique voice and personality to shine through and avoid using corporate tone. The aim is to maintain the emotional core of the message.

The main differentiator from our other content types is that this content is C2C, not B2C or B2B.

Page 26: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies: Levels of Post-editing

FULL POST-EDITING

DESCRIPTION: We generally fully post-edit UGC content which meets certain criteria: high number of visits or clicks. This is a client-driven content selection.

STRATEGY: Post-edit to human level, correct for grammar/fluency, terminology, style & voice. MT + Crowd is a viable option to reproduce the multitude of styles & voices found in UGC.

MEDIUM POST-EDITING

DESCRIPTION: A good example for this level is Technical Forum content. The aim is to provide technically accurate translations that will enable readers to solve the problem they are experiencing. Style and fluency are not important.

STRATEGY: Human-level but with style and fluency allowances. Emphasis on meaning and readability, interim between full and light.

LIGHT POST-EDITING

DESCRIPTION: This level can be redefined as Extra Light or Sanity Check when we compare with other non-UGC content types. Emphasis here is on quick turnaround and large volumes.

STRATEGY: Sliding scale depending on the light PE strategy. This can be simply a sanity check to ensure that UGC content is not published with severe misrepresentations or offensive statements.

CORPUS MANAGEMENT

DESCRIPTION: Customized post-editing to create corpus data for MT engine training, for example as we are currently doing for C2C seller listings for an online marketplace.

STRATEGY: Strategy driven by client requirements and corpus purpose. Avoid over-editing.

Page 27: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies: Light Post-editing Examples

Example 1

Source: Delicious food (thanks to the chef!!!), is very pleased with service.

Raw MT: อาหารแสนอร่อย (ต้องขอบคุณพ่อครัว!!!) ไม่พอใจกับการบริการมาก

Post-Edited: n/a (Thai)

NECESSARY CHANGES: Misrepresentation/added negation: Thai means "not very pleased with the service"

Example 2

Source: My experience was absolutely awesome.

Raw MT: ฉันก็มีประสบการณ์น่ากลัวอย่างแน่นอน

Post-Edited: n/a (Thai)

NECESSARY CHANGES: Misrepresentation: awesome has been translated as 'awful/horrific/frightening'

Example 3

Source [German]: Von dem Moment, als wir ankamen, bis zu unserer Abreise, waren wir total hin und weg.

Raw MT: From the moment we arrived until we left, we were totally out and away.

Post-Edited: From the moment we arrived until we left, we were totally blown away.

NECESSARY CHANGES: Misrepresentation (idiomatic expression missed), could be perceived as a negative statement

Example 4

Source [German]: Das Hotel ist zwar etwas zu kolossal konzipiert, riecht diskret nach den vergangenen Zeiten /z.T. auch nicht ohne Charm/ Einkaufsmöglichkeiten vor Ort etwas dürftig, dehalb ist Tagesausflug nach Hamilton empfehlenswert /Ferry oder Bus/. Zimmer sind sehr räumig und mit schönem Ausblick wenn man auch ein Balkon hat.

Raw MT: Although the hotel is designed something colossal, smells discreetly recent times / sometimes not without Charm / Shopping on site a bit poor, dehalb day trip is recommended to Hamilton / Ferry or bus /. Rooms are very räumig and with nice views if you also have a balcony.

Post-Edited: Although the hotel is designed something colossal, exudes bygone times / sometimes not without Charm / Shopping on site a bit poor, day trip is recommended to Hamilton / Ferry or bus /. Rooms are very spacious and with nice views if you also have a balcony.

NECESSARY CHANGES:
o Misrepresentation/offensive statement (idiomatic expression missed)
o Un-translated word deleted to avoid confusing the reader
o Un-translated word translated because it is critical to understanding the positive statement made about the room

Page 28: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies

Post-editing Levels Examples

Source [German]: Fehlermeldung!!! Hilfe schnell Bei mir kommt eine Fehler meldung wo rin drin steht: Folgende Probleme sin aufgetreten: Der Remoteserver reagiert nicht ordnungsgemäß. Versuchen sie es in einigen Minuten erneut. so das war meine Frage

Raw MT: Error message! Help fast With me comes an error message where is rin in it: Following problems sin: The remote server responded incorrectly. Try it again in a few minutes. so that was my question

The slide shows the same source and raw MT post-edited at three levels.

Light PE

English, Post-Edited: Error message! Help fast With me comes an error message where is in it: Following problems: The remote server responded incorrectly. Try it again in a few minutes. so that was my question

NECESSARY CHANGES: Sanity check, untranslated words (in red on the slide: "rin", "sin") were deleted to avoid confusion.

Medium PE

English, Post-Edited: Error message! Help fast I got an error message where it says: Following problems occurred: The remote server responded incorrectly. Try it again in a few minutes. so that was my question

NECESSARY CHANGES: Emphasis on meaning, readability and technical accuracy. [Fit for purpose: Can users follow these steps?]

Full PE

English, Post-Edited: Error message! Please help quickly! I received an error message that reads: The following problems occurred: The remote server responded incorrectly. Try again in a few minutes. So that was my question.

NECESSARY CHANGES: Correctness + Clarity. Accurate meaning, spelling, grammar + punctuation, country standards and Terminology. [Should normalization take place in target such as correct caps and punctuation? This is a client decision]
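
The light PE pass above removes source words that the engine left untranslated ("rin", "sin"). One way a sanity check could surface such candidates for a post-editor is to flag target tokens that also occur verbatim in the source. This is a rough sketch under that assumption; the function name and the `allowed` parameter are hypothetical, and a flagged word still needs human review.

```python
import re

# Runs of Unicode letters; digits, punctuation and underscores split tokens.
TOKEN = re.compile(r"[^\W\d_]+")


def untranslated_candidates(source, target, allowed=frozenset()):
    """List target tokens that also occur verbatim in the source text.

    `allowed` is a hypothetical allow-list for words that legitimately appear
    in both languages (brand names, or short words such as "in" and "so").
    """
    source_tokens = {token.lower() for token in TOKEN.findall(source)}
    flagged = []
    for token in TOKEN.findall(target):
        key = token.lower()
        if key in source_tokens and key not in allowed and key not in flagged:
            flagged.append(key)
    return flagged


source = "Bei mir kommt eine Fehler meldung wo rin drin steht: Folgende Probleme sin aufgetreten"
raw_mt = "With me comes an error message where is rin in it: Following problems sin"
print(untranslated_candidates(source, raw_mt))  # ['rin', 'sin']
```

Run on the full post, the same check would also flag "in" and "so", which exist in both German and English; that is what the allow-list is for.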

Page 29: MT and Post-Editing User-Generated Content AMTA 2014

Localization Strategies: Some Scenarios

Page 30: MT and Post-Editing User-Generated Content AMTA 2014

Quality Models

Page 31: MT and Post-Editing User-Generated Content AMTA 2014

Quality Models

We recommend a simple workflow for UGC in order to reduce the touch points that increase effort and reduce productivity.

Avoid the traditional LQA or quality program for light or medium post-editing or crowd translation. For these quality levels, there are a number of approaches that can be used based on the purpose, platform, impact of the localized content:

• Automated spell-checking pass
• Automatic QA pass (terminology, source/target consistency)
• Community or user feedback
• Level of quality measured by reach (unique visits, clicks, etc.)
• Readability (adequacy & fluency) score of a sample on a 4 or 5 point scale, i.e. TAUS Dynamic Quality Framework definitions.

Page 32: MT and Post-Editing User-Generated Content AMTA 2014

Quality Models: Community Feedback in the User Interface

Page 33: MT and Post-Editing User-Generated Content AMTA 2014

Quality Models: Community Feedback in the User Interface

Page 34: MT and Post-Editing User-Generated Content AMTA 2014

Quality Models: TAUS Adequacy & Fluency Scoring

Adequacy

On a 4-point scale rate how much of the meaning is represented in the translation:

o Everything
o Most
o Little
o None

Fluency

Rate on a 4-point scale the extent to which the translation is well-formed grammatically, contains correct spellings, adheres to common use of terms, titles and names, is intuitively acceptable and can be sensibly interpreted by a native speaker:

o Flawless
o Good
o Dis-fluent
o Incomprehensible
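
To report these ratings for a sample, the labels can be mapped to numbers and averaged. The 4-to-1 mapping and the function name below are assumptions made for this sketch (the slide only gives the labels), and the ratings are invented.

```python
from statistics import mean

# Assumed numeric mapping: best label = 4, worst = 1.
ADEQUACY = {"Everything": 4, "Most": 3, "Little": 2, "None": 1}
FLUENCY = {"Flawless": 4, "Good": 3, "Dis-fluent": 2, "Incomprehensible": 1}


def summarize(ratings):
    """Average adequacy and fluency over (adequacy_label, fluency_label) pairs."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    return {
        "adequacy": mean(ADEQUACY[adequacy] for adequacy, _ in ratings),
        "fluency": mean(FLUENCY[fluency] for _, fluency in ratings),
    }


# Made-up ratings for two sampled segments.
print(summarize([("Everything", "Good"), ("Most", "Flawless")]))
# {'adequacy': 3.5, 'fluency': 3.5}
```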

Page 35: MT and Post-Editing User-Generated Content AMTA 2014

Resourcing Strategies

Page 36: MT and Post-Editing User-Generated Content AMTA 2014

Resourcing Strategies

Highly scalable with short turn-around-times (TATs).

The chunks of content in a review or a thread are perfectly sized for distribution to a large crowd.

The concerns we have in traditional projects about limiting the size of the translation team to ensure consistency across projects do not apply to most UGC content

We value the lexical and stylistic diversity we can achieve through crowdsourcing that would be difficult to attain with a traditional resourcing model.

UGC translation is often well-suited for crowdsourcing

Page 37: MT and Post-Editing User-Generated Content AMTA 2014

Resourcing Strategies

Recent language graduates looking for translation experience to start their careers

Qualified linguists/bilinguals working in other areas (teaching, call centre support, etc.) who do not have professional translation experience or specialist domain expertise, but have the skills to engage in translation in a casual way

Retired translators who want to do some light part-time work, maybe out of touch with the industry but still capable of translating

Bilingual teens, undergrads with sufficient linguistic skills to produce "good enough" translation in social media/community context where information is targeted at youth

Target Groups

Page 38: MT and Post-Editing User-Generated Content AMTA 2014

Resourcing Strategies

Native fluency in target language

Fluent comprehension of source language

Computer literate

Internet savvy

Competent at writing in their own language

Experience with the domain, i.e. travel, retail, technical products, etc.

Vetting Basic Skills

Page 39: MT and Post-Editing User-Generated Content AMTA 2014

Resourcing Strategies

Generic training course in UGC translation and post-editing

Customized UGC translation / post-editing training based on specific content, purpose and target group

o Post-editing level instructions

o Quality expectations

o In-context / out-of-context translation

o Normalization of source and/or target text

Use of CAT tools where appropriate

We provide training to our teams in …

Page 40: MT and Post-Editing User-Generated Content AMTA 2014

Conclusions

Page 41: MT and Post-Editing User-Generated Content AMTA 2014

Conclusions

We can expect the quality of the raw MT output to increase, as more time and research is invested in normalization processes for UGC.

When translating UGC, we have to identify the appropriate content and quality levels based on impact and visibility.

We can use strategies like machine translation and crowdsourcing to address time and cost issues.

We must adapt our quality models to meet the needs of this growing content type.

What we learned …

Page 42: MT and Post-Editing User-Generated Content AMTA 2014

Questions?

Language Tools Team, [email protected]