Optimized Big Data Approach to Machine Translation @ TAUS
DESCRIPTION
The volume of available multilingual data has exploded. One option is to build machine translation systems from previously translated segments, using as much data as possible. This works in many cases, but it is well known in the market that the cleaner the data, the better the results in terms of productivity, cost, and even quality. tauyou has taken an additional step by optimizing the translation engines on a per-document basis, which has been shown to significantly increase the quality of the machine translation output. This approach, combined with joint source content optimization and summarization, leads to significant savings in multilingual communications.
TRANSCRIPT
Data for Machine Translation
Some techniques might work...
Baseline engines
In-domain + out-of-domain balance
Domain-specific engines
But ...
Baseline engines
In-domain + out-of-domain balance
Domain-specific engines
Big Data Approach
Unilingual Texts
Small Data
Glossaries/Dictionaries
Translation Memories
...
External Data
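The slides do not say how data from these sources is matched to a given document. As one generic illustration (not tauyou's method), a minimal sketch could rank translation-memory segments by term-frequency cosine similarity to the document to be translated and keep the closest ones. The function names, the sample pool, and the `top_k` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_segments(document, pool, top_k):
    # pool: list of (source, target) pairs, e.g. from a translation memory.
    # Returns the top_k pairs whose source side is most similar to the document.
    doc_vec = Counter(tokenize(document))
    scored = [(cosine(doc_vec, Counter(tokenize(src))), src, tgt) for src, tgt in pool]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(src, tgt) for score, src, tgt in scored[:top_k] if score > 0]


# Hypothetical data, for illustration only.
pool = [
    ("press the power button", "pulse el botón de encendido"),
    ("shareholders approved the annual report", "los accionistas aprobaron el informe anual"),
    ("hold the reset button for five seconds", "mantenga pulsado el botón de reinicio cinco segundos"),
]
document = "To restart the device, press the reset button."
for src, tgt in select_segments(document, pool, top_k=2):
    print(src, "|", tgt)
```

Raw term frequency lets common words such as "the" inflate the scores, so a production setup would more likely use IDF weighting or a language-model-based selection criterion.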
Average Improvement
+21%
Sample Results
Important topics
Supervised Data Classification
Data Clustering
Parameter optimization
Key Performance Indicators (KPIs)
Predictive MT quality estimation
Measure + Measure + Measure
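To make the "measure" point concrete, here is a minimal sketch of one possible KPI: the average relative improvement of an optimized engine over a baseline across several documents. The slides do not state which metric is used, so this assumes a higher-is-better quality score. The function names and the scores are hypothetical and are not results from the talk.

```python
from statistics import mean


def relative_improvement(baseline, optimized):
    # Percentage change of the optimized score relative to the baseline score.
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (optimized - baseline) / baseline


def average_improvement(score_pairs):
    # score_pairs: iterable of (baseline, optimized) scores, one pair per document.
    return mean(relative_improvement(b, o) for b, o in score_pairs)


# Hypothetical per-document scores, for illustration only.
scores = [(50.0, 60.0), (40.0, 44.0)]
print(f"{average_improvement(scores):.1f}%")
```

Averaging per-document percentages weights every document equally, regardless of length. Whether that is the right aggregation depends on what the KPI is meant to track.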
Not everything that can be counted counts, and not everything that counts can be counted.
William Bruce Cameron