![Page 1: IRF Symposium 2007 Vienna, Austria November 8-9, 2007, Mariott Hotel Presentation: Machine Translation Chinese-English Some experiments Dr. Barrou DIALLO,](https://reader036.vdocument.in/reader036/viewer/2022070307/551aa98f550346856e8b4a3a/html5/thumbnails/1.jpg)
IRF Symposium 2007, Vienna, Austria
November 8-9, 2007, Marriott Hotel
Presentation: Machine Translation Chinese-English
Some experiments
Dr. Barrou DIALLO, Head of Research, EPO
EPO Research: The case of Machine Translation
• Our Vision & Mission
• MT versus Patents
• The Chinese language case
• Our Experiments
• Our Accomplishments
• Perspectives
Our Vision & Mission (1/3)
R&D center as a source of Efficiency:
• Efficient Reading
• Accurate Searching
• Fast Granting
Our Vision: Turning Technology into IP Business
Our Vision & Mission (2/3)
The EPO Research Department
• Merged in March 2007 into a new Information Management structure; became "horizontal"
• Located in The Hague, Netherlands
• Large portfolio of academic contacts (labs, universities)
• Entry point for testing and evaluating industrial solutions since 1990
• Partnerships with international institutions (WIPO, EC)
• Strong background in mathematics, algorithms, and data structures
• Network of active users and testers inside the EPO
Our Vision & Mission (3/3)
Helping to address challenges:
• Coordinating research initiatives across departments
• Technology watch and green-field research
• Performing quantitative analysis
• Identifying and communicating business opportunities
• Providing users with sensible options and courses of action
• Ensuring a smooth transition from research to development
• Communicating practices and experiences
• Reporting to and advising decision-makers on technical solutions
MT versus Patents: A strategic domain foreseen 5 years ago
Lessons learned from the European Machine Translation Programme:
• Needs less investment than expected
• Can re-use existing data and knowledge
• Mature enough to improve efficiency
• Satisfies patent professionals
• Offers a key technology for future language challenges
Chinese language case (1/)
• Issue 1: Sentence + Word Segmentation
• Issue 2: Text Reordering
• Issue 3: Alignment + System training
• Issue 4: Translation with proper terms
• Issue 5: Regeneration
Example: The Re-ordering Issue
Many years of research on the subject:
• [Brown & al. 93] set the foundations of the SMT approach (use of Bayes' theorem).
• The [Knight 99] approach (Model 3) to word re-ordering does bring some improvement in the target sentence, but it is rather oriented towards French or English structures.
• [Chiang 05] proposes to re-order Chinese sentences by using hierarchical phrase pairs, which are phrases that contain subphrases. This produces better results than the traditional phrase-based approach.
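The Bayes' theorem foundation attributed to [Brown & al. 93] is usually written as the noisy-channel decision rule below. This is the standard textbook statement rather than a formula taken from the slide; the symbols $f$ (source sentence, here Chinese) and $e$ (English target sentence) are notation chosen for this note.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\,P(f \mid e)
```

$P(e)$ is the language model and $P(f \mid e)$ the translation model; $P(f)$ is dropped because it does not depend on $e$. Re-ordering is handled inside the translation model.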
The Re-ordering Issue
Re-ordering: the phrase-based approach
"Australia is diplomatic relations with North Korea is one of the few countries"
Re-ordering: Hierarchical-phrase approach (1/2)
[Figure: Step 1 and Step 2 of the hierarchical-phrase derivation]
Re-ordering: Hierarchical-phrase approach (2/2)
[Figure: Step 3 of the hierarchical-phrase derivation]
"Australia is one of the few countries that have diplomatic relations with North Korea."
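As a rough illustration of why hierarchical phrase pairs help with re-ordering, the sketch below applies three synchronous rules to the North Korea sentence. The rules and pinyin glosses follow the running example in [Chiang 05] as recalled here. Assumptions: the mapping of the rules onto the slide's Step 1 to Step 3 is a guess, the derivation is hard-coded rather than found by a decoder, and all names (`RULES`, `LEXICON`, `realize`, `translate`) are hypothetical.

```python
# Synchronous rules: (source pattern, target template). X1 and X2 mark gaps
# that are filled by sub-phrases, possibly in a different order on each side.
RULES = {
    "yu-you": ("yu X1 you X2", "have X2 with X1"),
    "de": ("X1 de X2", "the X2 that X1"),
    "zhiyi": ("X1 zhiyi", "one of X1"),
}

# Plain phrase pairs (pinyin gloss -> English), as in an ordinary phrase table.
LEXICON = {
    "Aozhou": "Australia",
    "shi": "is",
    "Bei Han": "North Korea",
    "bangjiao": "diplomatic relations",
    "shaoshu guojia": "few countries",
}

SOURCE, TARGET = 0, 1


def realize(node, side):
    """Render a derivation node on the source or the target side."""
    if isinstance(node, str):  # leaf: an ordinary phrase pair
        return node if side == SOURCE else LEXICON[node]
    rule, *children = node
    words = []
    for token in RULES[rule][side].split():
        if token in ("X1", "X2"):  # gap: recurse into the matching child
            words.append(realize(children[int(token[1]) - 1], side))
        else:
            words.append(token)
    return " ".join(words)


def translate(sentence):
    """Translate a left-to-right sequence of derivation nodes."""
    return " ".join(realize(node, TARGET) for node in sentence)


# Hard-coded derivation (assumed to mirror the three steps on the slides).
step1 = ("yu-you", "Bei Han", "bangjiao")
step2 = ("de", step1, "shaoshu guojia")
step3 = ("zhiyi", step2)
sentence = ["Aozhou", "shi", step3]

print(" ".join(realize(node, SOURCE) for node in sentence))
print(translate(sentence))
```

Each rule swaps or moves its gaps as a unit, so the long-distance re-ordering that flat phrase pairs miss falls out of the nesting.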
Solution? A semi-automatic approach
• Computer-Assisted Translation (CAT)
• Using high-quality manually-aligned texts based on international organizations' bi-text repositories and translation memories.
• Using a bilingual ontology to align words or phrases which are not present in the training corpora. There are available ontologies of patent vocabulary in English; a manual Chinese translation of the central concepts could be gradually added by IPC category.
• Using syntactic rules to improve lexical choices and collocation processing, e.g. the Univ. of Geneva process (Chomsky syntactic parser for English) to guarantee a well-formed final English sentence.
Comparison of MT systems: An empirical approach (1/3)
3 systems on the test bench:
• Rule-based system (Systran)
• Statistical system (Language Weaver)
• Hybrid system (CCID prototype)
1 evaluation grid:
• Scores of 1-4
• Usability & readability criteria
Comparison of MT systems (2/3)
Scale: Poor (1), Medium (2), Good (3), Excellent (4)
[Figure: Rule-based MT, Hybrid MT and Statistical MT rated on this scale]
Comparison of MT systems: An empirical approach (3/3)
• No MT system performs properly; CAT (Computer-Aided Translation) seems necessary
• The hybrid system seems more promising
• Post-editors needed for checking outputs?
• No statistical significance can be reported; further investigations needed!
Readability Tests on Human Translations: Flesch et al.
Designed to indicate how difficult a reading passage is to understand.
There are two tests: Flesch Reading Ease and Flesch–Kincaid Grade Level.
These tests have become a standard and are bundled with popular word-processing programs.
Readability Tests on Human Translations: Flesch et al.
Flesch Reading Ease score: 206.835 – (1.015 × ASL) – (84.6 × ASW)
Rates text on a 100-point scale; the higher the score, the easier it is to understand the document (60 to 70 for standard docs).
Flesch–Kincaid Grade Level score: (0.39 × ASL) + (11.8 × ASW) – 15.59
Rates text on a U.S. school grade level. A score of 8.0 means that an eighth grader can understand the document (7.0 to 8.0 for standard docs).
Where:
• ASL = average sentence length (# of words / # of sentences)
• ASW = average number of syllables per word (# of syllables / # of words)
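The two formulas can be checked with a few lines of Python. This is a minimal sketch: the function name `flesch_scores` is hypothetical, and the word, sentence and syllable counts are taken as inputs because counting syllables reliably needs a dictionary or heuristic that the slides do not specify.

```python
def flesch_scores(words, sentences, syllables):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) from raw counts."""
    asl = words / sentences   # ASL: average sentence length
    asw = syllables / words   # ASW: average syllables per word
    reading_ease = 206.835 - (1.015 * asl) - (84.6 * asw)
    grade_level = (0.39 * asl) + (11.8 * asw) - 15.59
    return reading_ease, grade_level


# Illustrative counts only: 100 words, 5 sentences, 150 syllables.
ease, grade = flesch_scores(words=100, sentences=5, syllables=150)
print(f"Reading Ease: {ease:.1f}, Grade Level: {grade:.1f}")
```

Both scores depend on the same two ratios with opposite signs, so very long sentences push Reading Ease down and Grade Level up together. That is why a single-sentence patent claim can score below zero on the first scale.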
Human Translation assessment: Example (1/2)
CN1926077 The Making and Using Methods of Plant/Soil Activated Liquid
Abstract
In the mineral composition ion water of concentrated sulfuric acid, which add the vegetal leavening confected by enzyme and microbe used to produce enzyme and the muscovado made by sugarcane together, under the aerobic condition, the selective preference is, do the commensalisms cultivation at about 25 Centigrade. After decomposing the sugar, before rot and ferment, the selective preference is, spreading on the leaf surface or pouring in the soil during the alcohol fermenting stage.
Flesch Reading Ease score: 13/100
Flesch–Kincaid Grade Level: 17
Score: 7/10
Comments: The abstract and parts of the claims are convoluted or badly structured, and there are some spelling mistakes.
What's important? Figures or comments?
Human Translation assessment: Example (2/2)
CN2354381
Claims
1. A time switch of gas appliances, composing of mechanical gear timer and fuel
gas valve, wherein it also comprises round upper cover board subassembly and lower cover board subassembly, a valve switch knob (4) fixed on the upper end of the valve switch spigot shaft (7) is installed on the front of the upper cover board, the valve switch spigot shaft (7) penetrates through the upper cover board (6) and the lower cover board (29), a timer hollow shaft (8) is installed out of the valve switch spigot shaft (7), the timer hollow shaft (8) penetrates through uthe pper cover board (6), a round time knob (5) is installed between the upper end valve switch knob of the timer hollow shaft and the upper cover board (6), a time indicating dial (3) interlocking with the timer hollow shaft (8) is installed between the round time knob (5) and the upper cover board (6); a mechanical gear timer is installed on the reverse side of the upper cover board (6), an unlocking cam(9) is installed out of the timer hollow shaft (8) in the central part;
Flesch–Kincaid Grade Level: 49
Flesch Reading Ease score: -45
Score: 9/10
Comments: Long, convoluted sentences. Diagrammatical explanations. Minor grammatical errors and typos.
Human vs machine: unfair competition?
Original text:
一种利用相位锁定一捷变频率调制输出信号到梳式发生器形成输出的任何选定信道的装置和方法。跟踪输入信号的相位误差,该输入信号被调制成载波输出频率,和该调制过的输出频率,利用减去该输入信号的方法锁定到梳式发生器输出,并消除该相位误差。
Systran:
One kind to combs the type generator using a phase lock agility frequency modulation output signal to form the output any to designate channel's installment and the method. The track input signal's phase error, this input signal is modulated the carrier output frequency, with should modulate the output frequency, the use subtracts this input signal the method to lock combs the type generator output, and eliminates this phase error
Human translation:
An apparatus and method is disclosed which phase locks a frequency-agile modulated output signal to any selected channel of a comb generated output. The phase error of an input signal is tracked, the input signal is modulated up to a carrier output frequency, and the modulated output frequency is locked to the comb generator output by subtracting the input signal and negating the phase error.
Is such an MT useful?
Our Accomplishments (June 2006)
Chinese patents showing priority documents:
• 105000 CN documents with US priorities
• 15000 CN documents with EP priorities
• 15000 CN documents with GB priorities
• 15000 CN documents with EP priorities
• 400 CN documents with WO priorities
A sufficient source for starting up an alignment?
[Figure: # of aligned sentences]
Manual Data cleaning: Dirty texts generate XML failures
CN86103346
Spherical particles of vinyl resins having high bulk density can be prepared by the suspension polymerization process by using as a dispersant an alkyl hydroxy cellulose having a viscosity of from about 1000 to about 100,000 cps. A suitable dispersant is a hydroxypropyl methyl cellulose polymer having the formula: <IMAGE> +TR <IMAGE> where n is from about 300 to about 1500.
Use of XMLSpy Professional to check text
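To see why an abstract such as CN86103346 breaks XML processing, the sketch below runs a well-formedness check with Python's standard library. It only illustrates the failure mode and is not the XMLSpy workflow used in the project. The wrapper element name `abstract` and the helper `is_well_formed` are hypothetical, and escaping the stray markers is just one possible cleaning step.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment):
    """Return True if the fragment parses once wrapped in a single root element."""
    try:
        ET.fromstring(f"<abstract>{fragment}</abstract>")
        return True
    except ET.ParseError:
        return False


# Excerpt from the CN86103346 abstract: the <IMAGE> markers are never closed.
dirty = ("having the formula: <IMAGE> +TR <IMAGE> "
         "where n is from about 300 to about 1500.")

# One possible fix: escape the markers so they become plain text.
clean = escape(dirty)

print(is_well_formed(dirty))
print(is_well_formed(clean))
```

A check of this kind can flag records for manual cleaning before they reach the aligner.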
Methodology of Word Alignment
[OCH93]
First Example of alignment
Second example of alignment
TMX Formatting of aligned texts
    <?xml version="1.0" ?>
    <!DOCTYPE tmx SYSTEM "tmx14.dtd">
    <tmx version="1.4">
      <header creationtoolversion="1.0.0" datatype="plaintext" segtype="sentence" adminlang="EN-US" srclang="EN" o-tmf="txt" creationtool="MetaReadAlign">
      </header>
      <body>
        <tu>
          <tuv xml:lang="EN"><seg> In a preferred embodiment, a low-band isolator network, coupled to the antenna element, provides signal isolation between high-band and low-band signal paths during high-band operation.</seg></tuv>
          <tuv xml:lang="ZH"><seg> NOT DISPLAYABLE </seg></tuv>
        </tu>
      </body>
    </tmx>
Provides compatibility with industry standards
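A TMX file of this shape can be produced from aligned sentence pairs with a few lines of Python. This is a minimal sketch using only the standard library, not the MetaReadAlign implementation. The function name `build_tmx` is hypothetical, the header attributes are copied from the slide, and the Chinese segment is the slide's own "NOT DISPLAYABLE" placeholder.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="EN", tgt_lang="ZH"):
    """Serialize (source, target) sentence pairs as a TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "MetaReadAlign",
        "creationtoolversion": "1.0.0",
        "datatype": "plaintext",
        "segtype": "sentence",
        "adminlang": "EN-US",
        "srclang": src_lang,
        "o-tmf": "txt",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


pairs = [(
    "In a preferred embodiment, a low-band isolator network, coupled to the "
    "antenna element, provides signal isolation between high-band and "
    "low-band signal paths during high-band operation.",
    "NOT DISPLAYABLE",
)]
tmx_text = build_tmx(pairs)
print(tmx_text)
```

The XML declaration and DOCTYPE line are not emitted by `ET.tostring` here and would need to be prepended when writing the file.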
Quality control panel before alignment
Evaluation record CN85108669 ("Welcome EvaluatorX")
Evaluation options (radio buttons, multiple entries possible, e.g. partial translation and 100% match; default value "100% match"; entries saved on server):
• 100% match
• >70% match
• <50% match
• partial translation
• bad translation
• total mismatch
Controls:
• Save Status: saves the status for next time
• Transmit Evaluation
• Reset: resets the complete evaluation process (everything is reset and lost)
• Record Evaluated, Proceed with next: saves the selected buttons for this record and jumps to the next record
• Record Status: evaluated / not evaluated; allows browsing
Acknowledgments
EPO Staff experts in Research & Development
Jan Mannekens
Betty Yang
CrossLanguage
Metaread
University of Geneva
Questions?
References
[Brown & al. 93] Brown, Della Pietra, Della Pietra, Mercer: The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, vol. 19, no. 2, 1993
[Knight 99] Kevin Knight: A Statistical MT Tutorial Workbook, April 1999
[Chiang 05] David Chiang: A Hierarchical Phrase-Based Model for Statistical Machine Translation, Proceedings of the 43rd Annual Meeting of the ACL, 2005