Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin

Post on 12-Jan-2016


Can Controlled Language Rules increase the value of MT?

Fred Hollowood & Johann Rotourier

Symantec Dublin

Localisation Challenge

Databases filled with English content
• Large volumes
• Perishable
• Technical

Fast delivery

Cost effective

Goals

Reduce cost of Translation to 30%
• Implement CL within the authoring community
• Foster the use of editor software to police the CL rule set
• Identify the most efficient MT system for each target language
• Develop Post-Editing guidelines
• Refine Symantec glossaries to assist in dictionary preparation

Controlled Language and MT

Controlled Language
• Rule Sets
• Terminology
• Style
• Editors

MT system
• Language Pairs: Jp, De, Fr, It, Es

Post Editing Assessment

Sequence of Events

Identify a corpus

Develop a test suite

Develop terminology

Work with MT engines

Assess results

Two Questions

How effective are CL rules in terms of post-editing effort?

Which CL rules provide the best results?

Corpus Selection

Origin
• Stream of XML messages

Volume
• 30,000 words

Process
• Use TM technology to pre-process raw XML to provide strings for MT
• Use macros to tidy up untranslatable text
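
The pre-processing step above can be illustrated with a minimal sketch. It is not the TM tool or the macros the authors used: it only shows, with Python's standard library, how translatable strings might be pulled out of a stream of XML messages. The `<messages>`/`<message>` element names, the `id` attribute and the sample content are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the <messages>/<message> element names and the "id"
# attribute are assumptions; the slides do not show the real XML schema.
SAMPLE = """<messages>
  <message id="m1">Click <b>Scan</b> to start.</message>
  <message id="m2">   </message>
  <message id="m3">Update failed: %s</message>
</messages>"""


def extract_strings(xml_text, tag="message"):
    """Return (id, text) pairs for every <tag> element with non-empty text."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter(tag):
        # itertext() flattens inline markup such as <b>...</b>;
        # split/join collapses runs of whitespace.
        text = " ".join("".join(elem.itertext()).split())
        if text:
            pairs.append((elem.get("id"), text))
    return pairs


for msg_id, text in extract_strings(SAMPLE):
    print(msg_id, text)
```

Placeholders such as `%s` are left in the string here; masking them before MT would be a separate tidy-up step.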

Terminology Extraction

Extraction
• Tools: Wordsmith Tools 4

Removal of duplicates
• Spelling variants
• Hyphenation variants
• Capitalisation variants
• Symbol/Plain
• Abbreviation/Plain

Removal of synonyms
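
As a rough illustration of the duplicate-removal step, the sketch below groups candidate terms that differ only in capitalisation, hyphenation or spacing. It is an assumption about how such variants could be detected, not a description of Wordsmith Tools or of the authors' procedure; the term list is invented for the example. Spelling, symbol and abbreviation variants, and synonyms, would need a hand-built mapping.

```python
import re
from collections import defaultdict


def variant_key(term):
    """Collapse capitalisation, hyphenation and spacing into one lookup key."""
    key = term.lower()
    return re.sub(r"[-\s]+", "", key)  # "anti-virus", "anti virus" -> "antivirus"


def group_variants(terms):
    """Map each key to its surface forms, keeping only keys with 2+ forms."""
    groups = defaultdict(set)
    for term in terms:
        groups[variant_key(term)].add(term)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Hypothetical extracted terms
terms = ["anti-virus", "Antivirus", "anti virus", "firewall", "Firewall", "e-mail", "scan"]
for key, forms in group_variants(terms).items():
    print(key, forms)
```

A terminologist would then pick one preferred form per group for the MT dictionaries.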

Custom Dictionaries

Current MT systems
• Systran Premium 4.0
• Logomedia Translate Pro
  - Differing capabilities
  - Differing function

Per target language
• Grammars
• Styles

Test Suite

59 rules examined

17 of which already encapsulated in Symantec’s writing guidelines

Classification
• 8 lexical
• 40 syntactic
• 11 textual

Controlled Language Sources

[Pie chart: Breakdown of CL sources (59). Data labels: 17, 18, 4, 1, 1, 5, 13. Legend: Attempto, Bernth & Gdaniec, Personal, PACE, AECMA, Easy English, O'Brien.]

Testing the Rules

Process
• Find an example sentence that does not conform to the rule
• Edit it to conform to all other rules under study
• Minimize the linguistic complexity (single test)
• Apply the CL rule
• Repeat the procedure to obtain 3 test examples

Test Suite
• 59 rules expressed as 177 sentences
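
One way to keep such a test suite consistent is sketched below: each item pairs an uncontrolled sentence with its controlled rewrite, and a helper flags rules that do not have exactly three examples. The `SuitePair` structure, the rule identifiers and the sample sentences are hypothetical; only the counts (8 + 40 + 11 = 59 rules, 3 examples each, 177 sentences) come from the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SuitePair:
    rule_id: str       # hypothetical identifier, e.g. "R1"
    uncontrolled: str  # sentence that breaks the rule under test
    controlled: str    # same sentence after applying the rule


def rules_missing_examples(pairs, per_rule=3):
    """Return the rule ids that do not have exactly `per_rule` examples."""
    counts = Counter(p.rule_id for p in pairs)
    return sorted(rule for rule, n in counts.items() if n != per_rule)


# Figures stated on the slides
assert 8 + 40 + 11 == 59   # lexical + syntactic + textual rules
assert 59 * 3 == 177       # three examples per rule

# Invented sample data: R1 is complete, R2 has only two examples
pairs = [
    SuitePair("R1", "Turn the firewall off.", "Turn off the firewall."),
    SuitePair("R1", "Switch the scanner on.", "Switch on the scanner."),
    SuitePair("R1", "Shut the service down.", "Shut down the service."),
    SuitePair("R2", "Enable/disable the scan.", "Enable or disable the scan."),
    SuitePair("R2", "Open/close the log.", "Open or close the log."),
]
print(rules_missing_examples(pairs))
```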

Post Editing Guidelines

Ensure information transfer

Modify what is grammatically deviant from commercial quality

Modify what is lexically essential for understanding in target.

Avoid the use of synonyms for the sake of originality

Don’t forget that all the words are probably present in the output (possibly in the wrong order)

Remember style does not matter but information accuracy does.

Don’t dally: if an improvement is not obvious, move along

Metrics Generation

Quality levels
• Excellent (4), Good (3), Medium (2), Poor (1)
• Uncontrolled source generates output A
• Controlled source generates output B

Focus is on Usability

Evaluation by native speakers

Further study is being done to link into other systems of quality evaluation
• Blackjack
• SAE J2450
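
The four-level scale lends itself to a simple numeric comparison of output A and output B. The sketch below is one possible aggregation, assuming each example receives a single rating per output; the slides do not say how ratings were combined per rule, so `rule_gain` and the sample ratings are illustrative only.

```python
# Scale from the slide: Excellent (4), Good (3), Medium (2), Poor (1)
SCALE = {"E": 4, "G": 3, "M": 2, "P": 1}


def mean_score(labels):
    """Average numeric score for a list of P/M/G/E ratings."""
    return sum(SCALE[label] for label in labels) / len(labels)


def rule_gain(ratings_a, ratings_b):
    """Sum of (B - A) score differences over the examples of one rule.

    Assumed aggregation, not taken from the slides.
    """
    return sum(SCALE[b] - SCALE[a] for a, b in zip(ratings_a, ratings_b))


# Hypothetical ratings for the three examples of one rule
output_a = ["P", "M", "G"]   # MT of the uncontrolled source
output_b = ["G", "G", "E"]   # MT of the controlled source

print(mean_score(output_a), mean_score(output_b), rule_gain(output_a, output_b))
```

A positive gain means the controlled source produced better-rated MT output for that rule; zero means the rule made no measurable difference on these examples.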

Overall evaluation (French)

[Bar chart: Overall evaluation of 177 examples (Systran French). X axis: Scores (P, M, G, E). Y axis: Number of examples. Series: MT output A, MT output B. Data labels: 38, 46, 44, 49, 2, 2, 58, 115.]

Overall evaluation (Japanese)

[Bar chart: Overall evaluation of 177 examples (Logomedia Japanese). X axis: Scores (P, M, G, E). Y axis: Number of examples. Series: MT output A, MT output B. Data labels: 32, 72, 42, 30, 13, 50, 52, 62.]

Overall evaluation (German)

[Bar chart: Overall evaluation of 177 examples (Systran German). X axis: Scores (P, M, G, E). Y axis: Number of examples. Series: MT output A, MT output B. Data labels: 25, 53, 57, 42, 0, 20, 71, 86.]

Preliminary Results

CL has a significant impact

Benefit varies by language
• Lots of scope for further study

Some rules are more effective than others (score range: 0-17)

Symantec’s implied rules have mixed effectiveness

Recommend 7 additional rules

Additional rules

Rules with an impact in all languages
• Do not omit words within lexical items, even when the term has already been used in the sentence (12). Repeat the head noun with conjoined articles or prepositions (15).
• Do not use slashes to list lexical items (except for product names) (14).
• Always write a verb next to its particle (17).
• Only use the modal ‘could’ when the sentence contains ‘if’; otherwise use ‘can’ (10).
• Be very careful with -ing words. If it is a gerund, use an article in front of it (7). If it is introducing a new clause, use ‘by’ in front of it (8). If it is modifying a noun in a non-finite clause, replace it with a relative clause (5).
• Make sure that every segment can stand syntactically alone (11).
• Avoid footnotes in the middle of a segment. Turn footnotes into independent segments (11).
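
Two of the rules above are simple enough to approximate with pattern matching. The sketch below is a toy checker, not the language checker named in the deck: it flags a slash between two words and a ‘could’ in a sentence that has no ‘if’. The patterns and messages are assumptions, and the product-name exception for slashes is not handled.

```python
import re

# Word/word with no spaces, e.g. "enable/disable" (product names not exempted)
SLASH_LIST = re.compile(r"\b[A-Za-z]+/[A-Za-z]+\b")
COULD = re.compile(r"\bcould\b", re.IGNORECASE)
IF = re.compile(r"\bif\b", re.IGNORECASE)


def check_sentence(sentence):
    """Return a list of rule violations found in one sentence."""
    issues = []
    if SLASH_LIST.search(sentence):
        issues.append("slash used to list lexical items")
    if COULD.search(sentence) and not IF.search(sentence):
        issues.append("'could' without 'if': consider 'can'")
    return issues


# Invented example sentences
for s in [
    "Enable/disable the scanner.",
    "This could damage the file.",
    "If the scan fails, you could lose data.",
]:
    print(s, "->", check_sentence(s))
```

Rules that depend on syntax, such as the -ing cases or verb-particle adjacency, would need part-of-speech information rather than regular expressions.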

Next Steps

Apply subsets of rules to a larger corpus.
• Language checker Acrolinx

Increase the number of MT engines studied
• Comprendium/Prompt (European languages)
• Fujitsu/Nova’s PC Transer (Japanese)

Further refine Post Editing guidelines

Keep abreast of upgrades in current systems
• Bugs fixed
• New versions of software

Move to a production pilot project