Exploding the Myth: the gerund in machine translation

Nora Aranberri


Background

• Nora Aranberri
– PhD student at CTTS (Dublin City University)
– Funded by Enterprise Ireland and Symantec (Innovation Partnerships Programme)

• Symantec
– Software publisher
– Localisation requirements
  • Translation: rule-based machine translation system (Systran)
  • Documentation authoring: controlled language (CL checker: acrocheck™)
– Project: CL checker rule refinement


The Myth

• Sources: translators, post-editors, scholars
– Considered a translation issue for MT due to its ambiguity
  • Bernth & McCord, 2000; Bernth & Gdaniec, 2001
– Addressed by CLs
  • Adriaens & Schreurs, 1992; Wells Akis & Sisson, 2003; O’Brien, 2003; Roturier, 2004

“The gerund is handled badly by MT systems and should be avoided”


What is a gerund?

• An -ing word can be a gerund, a participle, or part of a continuous tense; the form is the same in all three cases

• Examples
– GERUND: Steps for auditing SQL Server instances.
– PARTICIPLE: When the job completes, BACKINT saves a copy of the Backup Exec restore logs for auditing purposes.
– CONTINUOUS TENSE: Server is auditing and logging.

• Conclusion: gerunds and participles can be difficult for MT to differentiate.


Methodology: creating the corpus

• Initial corpus
– Risk management components texts
– 494,618 words
– Uncontrolled

• Structure of study
– Preposition or subordinate conjunction + -ing

• Extraction of relevant segments
– acrocheck™: CL checker asked to flag the patterns of the structure
  • IN + VBG|NN|JJ “-ing”
– 1,857 sentences isolated
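The extraction pattern above can be sketched roughly as follows. This is only an illustration, not acrocheck's actual rule syntax: the tag names follow the Penn Treebank convention, the input is assumed to be already POS-tagged by some external tagger, and `find_in_ing` is a hypothetical helper name.

```python
# Hypothetical sketch: flag "IN + -ing" patterns in POS-tagged sentences.
# Assumes Penn Treebank style tags; this is not acrocheck's rule format.

ING_TAGS = {"VBG", "NN", "JJ"}  # tags the -ing word may carry in the pattern


def find_in_ing(tagged):
    """Return (connector, ing_word) pairs where a token tagged IN is
    directly followed by an -ing word tagged VBG, NN or JJ."""
    hits = []
    for (word, tag), (nxt, nxt_tag) in zip(tagged, tagged[1:]):
        if tag == "IN" and nxt_tag in ING_TAGS and nxt.lower().endswith("ing"):
            hits.append((word.lower(), nxt.lower()))
    return hits


# The gerund example from the earlier slide, tagged by hand.
sentence = [("Steps", "NNS"), ("for", "IN"), ("auditing", "VBG"),
            ("SQL", "NNP"), ("Server", "NNP"), ("instances", "NNS")]
print(find_in_ing(sentence))  # [('for', 'auditing')]
```

Allowing NN and JJ next to VBG lets the pattern still catch -ing words that a tagger has labelled as nouns or adjectives, at the cost of more false positives.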


Methodology: translation

• Apply machine translation for target language

– MT used: Systran Server 5.05

– Dictionaries
  • No specific dictionaries created for the project
  • Systran in-built computer science dictionary applied

– Languages
  • Source language: English
  • Target languages: Spanish, French, German and Japanese


Methodology: evaluation (1)

• Evaluators

– one evaluator per target language only

– native speakers of the target languages

– translators / MA students with experience in MT

• Evaluation format


Methodology: evaluation (2)

• Analysis of the relevant structure only

• Questions:

– Q1: is the structure correct?

– Q2: is the error due to the misinterpretation of the source or because the target is poorly generated?

• Both are “yes/no” questions.
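One way to record and tally such judgements is sketched below. It assumes that Q2 is stored as two separate yes/no flags (source error, target error), which is how the later result tables are laid out; the class and function names are hypothetical and the sample judgements are invented for illustration, not study data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Judgement:
    connector: str              # preposition/conjunction, e.g. "by"
    correct: bool               # Q1: is the structure correct?
    source_error: bool = False  # Q2 (assumed flag): source misinterpreted?
    target_error: bool = False  # Q2 (assumed flag): target poorly generated?


def tally(judgements):
    """Count correct/incorrect and source/target errors per connector."""
    counts = Counter()
    for j in judgements:
        counts[(j.connector, "correct" if j.correct else "incorrect")] += 1
        if not j.correct:
            # Booleans add as 0 or 1, so each flag contributes one count.
            counts[(j.connector, "source-error")] += j.source_error
            counts[(j.connector, "target-error")] += j.target_error
    return counts


# Toy judgements, invented for illustration only.
sample = [
    Judgement("by", True),
    Judgement("by", False, target_error=True),
    Judgement("when", False, source_error=True, target_error=True),
]
counts = tally(sample)
print(counts[("by", "correct")], counts[("by", "incorrect")])  # 1 1
```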


Results: prepositions / subordinate conjunctions    

Preposition/conjunction   Examples
by + ing                  377
for + ing                 339
when + ing                256
before + ing              163
after + ing               122
about + ing               96
on + ing                  89
without + ing             75
of + ing                  71
from + ing                68
while + ing               54
in + ing                  36
if + ing                  19
rather than + ing         14
such as + ing             13
TOTAL                     1857


Results: correctness for Spanish

Spanish

Preposition        Examples  Correct  Incorrect
by + ing           377       351      26
for + ing          339       243      96
when + ing         256       205      51
before + ing       163       145      18
after + ing        122       107      15
about + ing        96        82       14
on + ing           89        38       51
without + ing      75        47       28
of + ing           71        65       6
from + ing         68        30       38
while + ing        54        3        51
in + ing           36        27       9
if + ing           19        15       4
rather than + ing  14        0        14
such as + ing      13        9        4
TOTAL              1857      1393     464
%                            75.01%   24.99%


Results: correctness for French

French

Preposition        Examples  Correct  Incorrect
by + ing           377       358      19
for + ing          339       284      55
when + ing         256       2        254
before + ing       163       146      17
after + ing        122       117      5
about + ing        96        82       14
on + ing           89        80       9
without + ing      75        65       10
of + ing           71        65       6
from + ing         68        31       37
while + ing        54        45       9
in + ing           36        9        27
if + ing           19        10       9
rather than + ing  14        0        14
such as + ing      13        9        4
TOTAL              1857      1341     516
%                            72.21%   27.79%


Results: correctness for German

German

Preposition        Examples  Correct  Incorrect
by + ing           377       364      13
for + ing          339       262      77
when + ing         256       213      43
before + ing       163       145      18
after + ing        122       114      8
about + ing        96        88       8
on + ing           89        58       31
without + ing      75        71       4
of + ing           71        60       11
from + ing         68        24       44
while + ing        54        27       27
in + ing           36        23       13
if + ing           19        17       2
rather than + ing  14        0        14
such as + ing      13        9        4
TOTAL              1857      1514     343
%                            81.53%   18.47%


Results: correctness for Japanese

Japanese

Preposition        Examples  Correct  Incorrect
by + ing           377       301      76
for + ing          339       224      115
when + ing         256       161      95
before + ing       163       134      29
after + ing        122       108      14
about + ing        96        88       8
on + ing           89        29       60
without + ing      75        66       9
of + ing           71        57       14
from + ing         68        33       35
while + ing        54        44       10
in + ing           36        9        27
if + ing           19        17       2
rather than + ing  14        1        13
such as + ing      13        8        5
TOTAL              1857      1303     554
%                            70.17%   29.83%


Significant results

                            Spanish           French            German            Japanese
Preposition        Examples Correct  Incorr.  Correct  Incorr.  Correct  Incorr.  Correct  Incorr.
by + ing           377      351      26       358      19       364      13       301      76
for + ing          339      243      96       284      55       262      77       224      115
when + ing         256      205      51       2        254      213      43       161      95
before + ing       163      145      18       146      17       145      18       134      29
after + ing        122      107      15       117      5        114      8        108      14
about + ing        96       82       14       82       14       88       8        88       8
on + ing           89       38       51       80       9        58       31       29       60
without + ing      75       47       28       65       10       71       4        66       9
of + ing           71       65       6        65       6        60       11       57       14
from + ing         68       30       38       31       37       24       44       33       35
while + ing        54       3        51       45       9        27       27       44       10
in + ing           36       27       9        9        27       23       13       9        27
if + ing           19       15       4        10       9        17       2        17       2
rather than + ing  14       0        14       0        14       0        14       1        13
such as + ing      13       9        4        9        4        9        4        8        5
TOTAL              1857     1393     464      1341     516      1514     343      1303     554
%                           75.01%   24.99%   72.21%   27.79%   81.53%   18.47%   70.17%   29.83%
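The percentage row can be recomputed from the TOTAL row. The sketch below does only that; the figures are the totals reported on the slide, while the dictionary and function names are illustrative choices.

```python
# Recompute the correct/incorrect percentages from the reported totals.
TOTAL_EXAMPLES = 1857
CORRECT = {"Spanish": 1393, "French": 1341, "German": 1514, "Japanese": 1303}


def rates(correct, total):
    """Return {language: (pct_correct, pct_incorrect)}, rounded to 2 places."""
    return {
        lang: (round(100 * c / total, 2), round(100 * (total - c) / total, 2))
        for lang, c in correct.items()
    }


for lang, (ok, bad) in rates(CORRECT, TOTAL_EXAMPLES).items():
    print(f"{lang}: {ok}% correct, {bad}% incorrect")
```

The Spanish total gives 75.01% correct, which matches the figure on the Spanish slide.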


Results: correlation of problematic structures

[Chart: results per target language (Spanish, French, German, Japanese) for the structures for, when, from, on, while and by; axis scale 0 to 80]

• The most problematic structures seem to strongly correlate across languages

• The top 6 prepositions/conjunctions account for more than 65% of errors
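The 65% figure can be checked against the correctness tables. The sketch below assumes that the share is measured against each language's reported total of incorrect examples and that the "top 6" are the six structures named in the chart; the counts are copied from the tables, and the variable and function names are illustrative.

```python
# Share of incorrect examples covered by the six structures in the chart.
TOP6 = ["for", "when", "from", "on", "while", "by"]

# Incorrect counts per structure, copied from the correctness tables.
INCORRECT = {
    "Spanish":  {"for": 96,  "when": 51,  "from": 38, "on": 51, "while": 51, "by": 26},
    "French":   {"for": 55,  "when": 254, "from": 37, "on": 9,  "while": 9,  "by": 19},
    "German":   {"for": 77,  "when": 43,  "from": 44, "on": 31, "while": 27, "by": 13},
    "Japanese": {"for": 115, "when": 95,  "from": 35, "on": 60, "while": 10, "by": 76},
}
# Reported totals of incorrect examples (TOTAL row of the tables).
TOTAL_INCORRECT = {"Spanish": 464, "French": 516, "German": 343, "Japanese": 554}


def top_share(lang):
    """Percentage of a language's incorrect examples due to the six structures."""
    covered = sum(INCORRECT[lang][c] for c in TOP6)
    return 100 * covered / TOTAL_INCORRECT[lang]


for lang in INCORRECT:
    print(f"{lang}: {top_share(lang):.1f}%")
```

Under these assumptions the share exceeds 65% for every language, with Spanish the lowest at about 67%.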


Analysis and generation errors

                            Spanish           French            German            Japanese
Preposition        Examples Source   Target   Source   Target   Source   Target   Source   Target
                            error    error    error    error    error    error    error    error
by + ing           377      4        27       10       13       4        9        16       58
for + ing          339      37       120      37       55       33       47       30       82
when + ing         256      13       49       0        256      10       38       3        93
before + ing       163      4        27       4        17       4        14       8        22
after + ing        122      5        12       5        5        1        7        4        11
about + ing        96       7        51       10       13       5        3        4        1
on + ing           89       3        51       0        9        1        30       2        57
without + ing      75       3        26       2        8        2        2        1        8
of + ing           71       4        4        3        7        4        8        7        11
from + ing         68       5        36       1        37       1        43       8        33
while + ing        54       2        50       2        8        3        26       0        10
in + ing           36       5        7        6        27       2        13       12       18
if + ing           19       1        3        1        9        2        0        0        2
rather than + ing  14       0        14       0        14       0        14       0        13
such as + ing      13       3        8        1        4        2        2        3        2
TOTAL              1857     106      523      83       514      85       267      98       459
%                           0.60%    0.63%    0.54%    0.74%    0.61%    0.72%    0.60%    0.72%


Source and target error distribution

• Target errors are more frequent than source errors in all four languages

• The prepositions/conjunctions with the highest error rates that are common to 3 or 4 target languages (for, when, from, on) account for 43-54% of source errors and 48-59% of target errors (69% for French)

                   Spanish           French            German            Japanese
                   Source   Target   Source   Target   Source   Target   Source   Target
                   error    error    error    error    error    error    error    error
for + ing          37       120      37       55       33       47       30       82
when + ing         13       49       0        256      10       38       3        93
from + ing         5        36       1        37       1        43       8        33
on + ing           3        51       0        9        1        30       2        57
SUM                58       256      38       357      45       158      43       265
Total              106      523      83       514      85       267      98       459
%                  54.72%   48.95%   45.78%   69.45%   52.94%   59.18%   43.88%   57.73%


Conclusions

• Overall success rate of roughly 70-82% across the four target languages

• Target language generation errors are more frequent than errors due to misinterpretation of the source.

• Great diversity of prepositions/subordinate conjunctions with varying appearance rates.

• Strong correlation of results across languages.


Next steps

• Further evaluations to consolidate results
– 4 evaluators per language
– Present sentences to the evaluators out of alphabetical order by preposition/conjunction
– Note the results for the French “when”.

• Make these findings available to the writing teams

• Take our prominent issues
– Source issues: controlled language or pre-processing
  • Formulate more specific rules in acrocheck to handle the most problematic structures/prepositions and reduce false positives
  • Standardise structures with low frequencies
– Target issues: post-processing or MT improvements


References

• Adriaens, G. and Schreurs, D. (1992) ‘From COGRAM to ALCOGRAM: Toward a Controlled English Grammar Checker’, 14th International Conference on Computational Linguistics, COLING-92, Nantes, France, 23-28 August, 1992, 595-601.

• Bernth, A. and Gdaniec, C. (2001) ‘MTranslatability’, Machine Translation 16: 175-218.

• Bernth, A. and McCord, M. (2000) ‘The Effect of Source Analysis on Translation Confidence’, in White, J. S., ed., Envisioning Machine Translation in the Information Future: 4th Conference of the Association for Machine Translation in the Americas, AMTA 2000, Cuernavaca, Mexico, 10-14 October, 2000, Springer: Berlin, 89-99.

• O’Brien, S. (2003) ‘Controlling Controlled English: An Analysis of Several Controlled Language Rule Sets’, in Proceedings of the 4th Controlled Language Applications Workshop (CLAW 2003), Dublin, Ireland, 15-17 May, 2003, 105-114.

• Roturier, J. (2004) ‘Assessing a set of Controlled Language rules: Can they improve the performance of commercial Machine Translation systems?’, in ASLIB Conference Proceedings, Translating and the Computer 26, London, 18-19 November, 2004, 1-14.

• Wells Akis, J. and Sisson, R. (2003) ‘Authoring translation-ready documents: is software the answer?’, in Proceedings of the 21st annual international conference on Documentation, SIGDOC 2003, San Francisco, CA, USA, October 12-15, 2003, 38-44.


Thank you!

e-mail: nora.aranberrimonasterioATdcu.ie