
TAUS Review - No. 1 - October 2014

Reviews of Language Business & Technology in: • Europe • Africa • Asia • Americas

‘Perfect storm’ conditions for MT

Conversation with Facebook’s Alex Waibel about Speech Translation

Call for the Human Language Project

Life of a Translator who developed his own tool (AmigoCat)

Plus... columns by Nicholas Ostler, Lane Greene, Jost Zetzsche and Luigi Muzii

TAUS Review of language business and technology


Magazine with a Mission

How do we communicate in an ever more globalizing world? Will we all learn to speak the same language? A lingua franca, English, Chinese, Spanish? Or will we rely on translators to help us bridge the language divides?

Language business and technology are core to the world economy and to the prevailing trend of globalization of business and governance. And yet, the language sector, its actors and innovations do not get much visibility in the media. Since 2005 TAUS has published numerous articles on translation automation and language business innovation on its web site. Now we are bundling them in TAUS Review, an online quarterly magazine.

TAUS Review is a magazine with a mission. We believe that a vibrant language and translation industry helps the world communicate better, become more prosperous and more peaceful. Communicating across hundreds – if not thousands – of languages requires adoption of technology. In the age of the Internet of Things and the internet of you, translation – in every language – becomes embedded in every app, on every screen, on every web site, in every thing.

In TAUS Review reporters and columnists worldwide monitor how machines and humans work together to help the world communicate better. We tell the stories about the successes and the excitement, but also about the frustrations, failures and shortcomings of technologies and innovative models. We are conscious of the pressure on the profession, but convinced that language and translation technologies lead to greater opportunities.

TAUS Review follows a simple and straightforward structure. In every issue we publish reports from four different continents – Africa, Americas, Asia and Europe – on new technologies, use cases and developments in language business and technology from these regions. In every issue we also publish perspectives from four different ‘personas’ – researcher, journalist, translator and language – by well-known writers from the language sector. This is complemented by features and conversations that are different in each issue.

The knowledge we share in TAUS Review is part of the ‘shared commons’ that TAUS develops as a foundation for the global language and translation market to lift itself up to the level of a high-tech sector. TAUS is a think tank and resource center for the global translation industry, offering access to best practices, shared translation data, metrics and tools for quality evaluation, training and research.

Colophon

TAUS Review is a free online magazine, published four times per year. TAUS members and non-members may distribute the magazine through their web sites and online media by embedding this code …… Readership in 2014 is estimated at 5,000.

Publisher and managing editor: Jaap van der Meer

Editor and publication manager: Anne-Maj van der Meer

Distribution and advertisements: Yulia Korobova

Enquiries about distribution and advertisements: [email protected].

Editorial contributions and feedback can be sent to:

General: [email protected].

Continental reviews of language business and technology:

• Africa review: [email protected]

• Americas review: [email protected]

• Asia review: [email protected]

• Europe review: [email protected]

Personas’ perspectives of language business and technology:

• Translator: [email protected]

• Research: [email protected]

• Language: [email protected]

• Journalist: [email protected]



Contents

Leader
Jaap van der Meer

Reviews of language business & technologies
In Europe, by Andrew Joscelyne
In the Americas, by Brian McConnell
In Asia, by Mike Tian-Jian Jiang
In Africa, by Amlaku Eshetie

Columns
The Language Perspective, by Nicholas Ostler
The Journalist’s Perspective, by Lane Greene
The Translator’s Perspective, by Jost Zetzsche
The Research Perspective, by Luigi Muzii

Conversation
Speech Translation Technology: Interview with Mark Seligman & Alex Waibel

Features
Call for the Human Language Project
Life of a Translator who Developed his own Tools, by Nicolas Gregoire
Contributors
Directory of Distributors
Industry Agenda


Leader
by Jaap van der Meer

‘Perfect Storm’ Conditions for Machine Translation

Around the world some 7,000 languages are spoken, four hundred of which have more than one million speakers. Businesses – large and small – need to ‘speak’ the languages of their customers. The translation services industry is struggling to keep up with the demand for capacity and speed.

Whether the need is becoming too great or the technology is reaching a level of maturity – that remains an interesting question – it is a fact that machine translation (MT) has made a definite entrance into enterprises and governments. After some sixty years of research, high hopes and deep disappointments, we now see a rapid adoption of a technology that is not perfect but apparently good enough.

Some industry analysts and market researchers estimate a billion-dollar-plus MT technology sector. However, in the MT Market Report that TAUS recently published we concluded that the size of the MT market is relatively small compared to its innovation power and impact. MT technology is a key enabler and a force multiplier for new services and growth. It finds a high adoption rate among language service providers. Innovative companies in information technology and other sectors are converging MT technology with new applications and products, or they use MT to enhance their existing products. Yet revenues from MT licenses, subscriptions to hosted services and professional services add up to just 250 million US dollars in 2014.

For the longest time in the history of the modern world lingua francas have served politicians, traders and scientists very well. Latin was the language of the church and governments in Europe until late in the nineteenth century. German was the language of science until the twentieth century. During the colonial phase of globalization the Portuguese, Spanish, French, Dutch and British all ‘exported’ their languages to new parts of the world. Those who had the knowledge and the power ‘dictated’ the languages of trade and business.

Until well into the nineteen sixties and seventies global communications were one-directional. Radio and television broadcast the news and the advertising industry pushed the fancy new products of a modernizing society to consumers. It took until the nineteen eighties before businesses started to communicate with each other using more advanced means than telex and fax machines.

In the last few decades, circumstances have been adding up to ‘perfect storm’ conditions for MT. The incredible growth of the internet plays a crucial role in most of these circumstances. First of all, there is the ease of communications. Since the 1990s, businesses – large and small – and governments have experienced consumers and citizens communicating back to them: this is the end of one-directional mass communication. The term ‘call deflection’ was coined for a number-one business concern: avoid these expensive incoming calls and emails, and deflect them to efficient call and support centers. Consumers and citizens, however, turn ever more often to bulletin boards and peer review sites as alternative ways to find answers to their questions. The immense popularity of social media is the culmination of the emancipation of the consumer, who now takes at least an equal role in the economy. Producers and exporters no longer dictate the languages of trade and business.

Now what is coming is the Internet of Things. Communications are no longer limited to businesses, governments, consumers and citizens. Add to these machines and sensors that broadcast and communicate messages. Self-driving cars, wearable technology, implantable medical devices, intelligent home appliances and apps for everything will all ‘talk’ to each other and to humans. The twenty-twenties mark the ‘big bang of content and communications’. MT is a crucial enabling technology in this evolution.

A second circumstance that influences the adoption of MT is hyperglobalization. According to a report published by the Peterson Institute for International Economics, the world has seen an unprecedented spread of welfare in the last few decades. Seventy-five percent of developing nations are catching up with the economic frontier. World trade is growing rapidly. McKinsey expects the global flow of goods and services to double or triple in the next decade. Growth comes from new markets, or as some global companies say: “from the next billion users”. And that will be true for many smaller and medium-sized companies that see their trading going online. Inevitably this leads to the need to translate more information into more languages.

Thirdly, there is the democratization of knowledge, a trend that started five hundred years ago with the invention of movable type printing. Now the democratization of knowledge is suddenly accelerating thanks to the success of Wikipedia and MOOCs (Massive Open Online Courses).

The spread of knowledge will amplify a fourth circumstance that we see as a major influencer for the growth of MT, i.e. linguistic diversity. It is no longer self-evident that customers and audiences will adapt and speak ‘our’ language. English is losing its dominant role on the internet. Companies trading internationally very quickly find themselves confronted with the need to translate into 20-30 languages. Global IT companies support up to 100 languages. Wikipedia breaks the record with 285 languages being supported. To reach 95% of the world population, a goal of 400 languages must be set.

The spread of knowledge also turned the world of inventions upside down. In 2012 – for the first time – China topped the ranking of new patent applications being filed in the world. It is no surprise that MT has been applied for years already by patent offices and commercial patent service providers. It is the only way to explore and discover prior art and stay abreast.

Zooming in on the translation industry we see paradigm shifts every decade, with increases in content volumes and in the number of languages that need to be covered. Now most translation operators find themselves in what we at TAUS call the Globalization or Integration phase. Translation tools and resources (translation memories) are centralized and integrated in a workflow system. Enterprises start to think strategically about translation, which means that the focus is shifting from business necessity to opportunity. Understanding and speaking the languages of customers in expanding global markets leads to more revenue and more insight into how to develop better products and services. Preparing for what we call the Convergence Era, businesses explore how translation or linguistic expertise can be embedded in products and services. MT technology becomes an integral part of the business solution. There is no other way to keep up with the demands of producing volumes, supporting languages and listening to customers in so many markets.

In the TAUS Machine Translation Market Report, we zoom in on the different types of offerings and players in the realm of machine translation. We talked to more than fifty operators of MT technology: developers, users, value-added resellers, consultants. We ran several surveys. Our conclusion is that MT is on an irreversible journey. Despite the many challenges – documented in this report as well – we expect massive adoption. However, the biggest growth will not be in pure MT-related revenue. MT technology is an enabler, a crucial component in what we at TAUS call the Convergence Era. MT is not a goal on its own. It empowers many new forms of communications built into wearable technology, search, social media, the Internet of Things and apps. Moreover, MT has entered human translation production as well, complementary to or replacing outdated translation memory tools. Post-editing MT output is likely to overtake translation memory leveraging as the primary production environment in industrial translation in the next five years.

Another take-away from this report: MT technology itself is on its way to becoming a commodity, shifting the Holy Grail to the data, i.e. the language data (pairs, speech and monolingual) that train the engines.

The TAUS quarterly review of language business and technology will track the perfect storm of MT, but only for plain business reasons. In other words: to understand the challenges and opportunities for the profession and for the buyers and providers of translation.

When it comes to understanding the world and our fellow global citizens, we agree with our columnists Lane Greene and Nicholas Ostler: there is nothing better than learning another language.


Review of language business & technologies in Europe
by Andrew Joscelyne

This summer has seen a decidedly growth-free Europe commemorate some of its most violent military savagery ever, with the centenary of the outbreak of the First World War and the 70th anniversary of the finale of the Second.

The Language of War?

Bogged down in Flanders’ mud, the first war ended in stalemate and recrimination following the Versailles conference, at which French was used as the diplomatic lingua franca. The second war, on the other hand, was won with the aid of massive US military might, marking the start of English-language domination of post-war international organizations. Yet between these two moments of linguistic realpolitik, the first years of the 20th century saw one of the more interesting experiments in language engineering: the championing of Esperanto by European intellectuals and cosmopolitan politicians.

Created initially by one man rather than as an EC-funded project as might be the case today, Esperanto was motivated literally by the ‘hope’ that a shared artificial tongue would enable nations to avoid the kind of nationalistic fanaticism that led to the killing fields of the autumn of 1914, when French vignerons marched enthusiastically off to the front hoping to snatch victory before returning home for that year’s grape harvest.

In his quest for world peace through unilingualism, Esperanto’s inventor Zamenhof was possibly inspired by scenes from the first large-scale multilingual parliament of the Austro-Hungarian Empire, where deputies would shout at each other in their own tongues, achieving nothing and disagreeing on which language should predominate. When the world’s leaders decided to construct the first genuinely international talking shop in the 1920s – the League of Nations in Geneva – ostensibly to prevent such wars in the future, there was a strong lobby for choosing Esperanto as the preferred language of business for nations as linguistically and typographically different as China, the new Soviet Union, Spain, France and Great Britain. But natural language won the day.

Would the adoption of Esperanto have helped extinguish the flames of global fascism and avoid a second world war? Of course not. Language can mediate, but as a cognitive structure it carries no particular ideology. And although there’s a large cohort of unilingualists in favor of English as a lingua franca in various European institutions today, all the evidence suggests that the best bet for sharing our ideas in any language will be the emerging pact between humans and their software-driven automats. But it’s the content that ultimately counts when it comes to winning battles.

Here are some of the stories that have emerged in the last quarter testifying to the concerns and hopes among companies and developers in Europe in the field of language automation.

Measure the quality

Measuring translation quality to ensure compliance with collective or individual standards is a constant preoccupation in the translation industry: some sort of quality level is in a real sense the be-all and end-all of the communicative success of the act of translation. Some translation suppliers use existing quality frameworks to guarantee their service; others have developed their own.

This year in February, TAUS launched its online Dynamic Quality Framework (http://tinyurl.com/ohhefor) after three years of development. The DQF consists of a knowledge base, online tools and reporting for translation quality evaluation, and a content profiling tool, available on a simple monthly subscription basis to all stakeholders in the global translation industry: buyers of translation, students and academia.

In July, the Finnish-owned translation technology supplier Multilizer issued yet another take on quality evaluation with its MT Qualifier (http://tinyurl.com/kvk7whb), which will provide an ongoing campaign of assessments of the quality of free online MT engine output, in this case for automatic translation of English to Spanish. The services evaluated comprised Apertium, Microsoft Translator, Google Translate, Linguatec, PROMT Translator, and Systran. Microsoft Translator won the highest score this time round by a small margin, at 68.16%.

In August, the Czech Republic-based localization supplier Moravia released its own Language Quality Services (MLQS) (http://tinyurl.com/q9as2wv), offering a comprehensive localization quality-management solution designed to identify and resolve translation quality errors, and their root causes, with an eye on what large global brands need to watch to keep local customers happy.

Delivering the service

In the realm of statistical MT systems, the whole industry is aware of the ongoing MOSES program (originally an EU-funded project) that has already spawned operational systems for a number of translation suppliers worldwide, especially in Europe. The job of fine-tuning and optimizing the various MOSES builds is now underway, and Precision Tools, a Thai company led by Tom Hoar, recently released an enterprise DoMT server version of MOSES (http://tinyurl.com/o4ntqbl) that aims to cut the cost and streamline the process for translation professionals.

In yet another move to speed up and simplify the delivery of localization services for enterprises, SDL launched its own Language Cloud (http://tinyurl.com/n34c3o5) in June as a new component of its existing CEX cloud service.

Business partnering

One of the more intriguing business stories in the MT sector in Europe has been the South Korean company CSLI’s bid for a majority share in the grand old lady of machine translation companies - Systran, one of the few pure-play language tech companies to be listed on a stock exchange. In fact CSLI is trying to gain 95% of the capital in order to delist the company from the French bourse and own it outright. It is still not clear whether CSLI has achieved this, but in any case the company will henceforth be called Systran International and CSLI has promised to devote $2 million a year to R&D in France, as a sort of compensation.

Systran started out back in the late 1960s, when Peter Toma decided to build a commercial service out of the wave of academic and government experiments in MT in the United States. In 1986, the French industrial valve manufacturer Jean Gachot acquired the company to support his export drive and celebrate his general fascination with language. By 1990, Systran was powering the very first online translation solution in the world when it offered self-service translation on the French Minitel (http://en.wikipedia.org/wiki/Minitel), a local precursor to the internet that allowed people to use online services via a nation-wide videotext terminal.

In 2012, Samsung contracted Systran to build the S-Translator, with the support of CSLI’s services. This apparently started the M&A process in which CSLI had the idea of building a large, R&D-driven company to beat the likes of Google Translate and Bing. The then CEO of Systran, Dimitris Sabatakakis, has now retired. He ruffled feathers during the Snowden spy document controversy in 2013, when he was reported to have said that “without the help of Systran, the American NSA wouldn’t exist.”


This suggested to many that the company’s technology was also central to France’s own military intelligence facilities, and clearance was no doubt required from Western agencies of all stripes that have used Systran since the Cold War epoch. Guillaume Naigeon, previously deputy CEO, is now heading the company.

Commercialized as different language pairs (Systran was originally a rule-based MT system that hand-crafted transposition rules between phrases in two languages), one part of Systran was picked up by a Luxembourg company that pioneered an application that translated documents for the European Commission back in the 1990s. The EC has been a source of legal problems as well as income. In 2010, the European Court in Luxembourg awarded Systran $12 million in damages because the EC infringed Systran’s copyright and patent rights. But the EC has apparently decided to appeal against the lawfulness of the fine, and, once again, it is not clear that Systran is out of the legal woods. Bon voyage to Systran International.

In Europe, there are plenty of examples of success stories for growing LSPs from scratch, but very few examples of building bigger European-scale businesses through task-sharing and partnerships. So it was interesting to hear in August that the UK firm Linguagloss and the French LSP Lingua Custodia had agreed to cooperate (http://tinyurl.com/nejxrnw) around one of the more logical divisions of labor in the current translation landscape: post-editing machine-generated translation output, in this case for the financial market.

Translation for the education market

The education sector is not usually much of a focus for the translation industry. But the rapid arrival of online education and training platforms, and the huge growth in videos of lectures and training content, is creating new demand to localize digital content that was previously geared exclusively to a single university classroom or company training facility.

In mid-July, the so-called transLectures platform (http://tinyurl.com/po2eqlq) was released following three years of development as part of an EU-funded project led by the Universitat Politecnica de Valencia (Spain). Basically it provides a software solution for transcribing and translating videos of lectures, and is already being evaluated at both VideoLectures.NET and Valencia’s own poliMedia. It includes a transLectures Player for editing automatic subtitles.

To get an idea of the power of this kind of translation technology, it’s worth listening to Alex Waibel’s talk (https://www.youtube.com/watch?v=wKgRCRzBg3c) at the European Parliament a year ago. Waibel’s speech translation company Jibbigo was sold to Facebook a couple of years ago. Waibel is also speaking at the TAUS Annual Conference in Vancouver, Canada on October 27-28. See also the conversation between Alex Waibel and Mark Seligman further on in this issue.

Meanwhile in June, ABBYY Language Services announced (http://www.pr.com/press-release/566608) that its unified cloud environment for translation automation had completed the first million words of crowdsourced localization of Coursera content into Russian subtitles. Coursera is the for-profit educational technology company offering so-called “massive open online courses.” The demand for subtitling looks set to be equally massive.

New semantics company to watch

Lots of companies in Europe and elsewhere are trying to provide the keys to automating language understanding. They are using a combination of semantics, linked data, parsing, taxonomies and more to build the kind of skills that machines need to analyze vast quantities of text for sentiment analysis or mine databases for unexpected new facts about chemical molecules or predictions about behavior patterns. One new European company working in this space is the Austrian company Cortical (http://www.cortical.io/news.html#lead_1), which has developed a system for turning any kind of text content into a semantic fingerprint. Their emphasis is on simplicity and low power requirements, providing semantic understanding as a service via an API.

Yes, there’s an app for that

A Welsh theatre company has released an app called Sibrwd (http://www.bbc.com/news/entertainment-arts-28522598), meaning ‘whisper’, which allows you to follow plays in languages you don’t know by hearing selected key sentences or descriptions of speeches via earphones from your smartphone. Not a complete translation, but a summary guide to the play.

If you prefer subtitles, the UK company Giojax uses 3D technology to create invisible subtitles (http://iheartsubtitles.wordpress.com/2014/07/24/invisible-subtitled-live-theatre-trial-in-the-uk/) for use by cinemas and will test it in a theatre. Audience members who wish to see the captions running during the live performance can wear 3D glasses and view the subtitles via a box situated on the theatre stage.


Review of language business & technologies in the Americas
by Brian McConnell

Client Side (Javascript) Translation For Web sites

Until recently, building a content-rich multilingual web site meant investing in a multilingual content management system. CMS platforms, whose primary purpose is to enable a mix of technical and non-technical people to collaboratively edit, publish and update content, have typically not been designed with multilingual operation in mind.

Popular platforms, such as WordPress (www.wordpress.com), which is used to publish millions of web sites and blogs, including some of the world’s leading media properties, were never designed for multilingual operation (although there are third party extensions that do this, none of them are especially easy to work with).

Building a truly multilingual web site, until recently, meant investing time and money in a multilingual content management system. None of them are enjoyable to work with, and most of them are extremely expensive, especially compared to platforms like WordPress, which is inexpensive, and a joy to work with.

This isn’t really the fault of the software producers, whether they are web publishing specialists, or industry leading translation companies like SDL (www.sdl.com). The problem is that it is very rare for a company to excel in more than one domain. Simply put, a company can either be competent at producing publishing software, or at producing translation software, but it’s very unlikely they can do both.

The problem with multilingual publishing systems is several-fold, but basically boils down to the difficulty of dealing with translation workflow in an environment where things are constantly being published and updated. Traditional translation management systems were not designed with this sort of process in mind, and neither were many monolingual publishing systems. Typically, one has to duct-tape one system to the other, and then have staff manually keep track of what articles have been translated, and import and export those translations back and forth to the publishing system. Sometimes this can be partially automated, but it rarely works very smoothly.

Two products have recently emerged that make this process a lot easier to deal with, and enable publishers to continue using their existing publishing platform with minimal changes. The solution is to employ Javascript to dynamically translate web content within each user’s browser, using a cloud-based translation memory as the source of these translations.

The way this process works is pretty straightforward. The web site includes a block of Javascript code within the page as it is served. This code then executes when the page is finished loading, crawls through the page, and replaces each human-readable block of text with its best available translation (served from a cloud-based translation memory). The original content is “always” served in its original language, but typically within a fraction of a second the Javascript widget redraws the page into the user’s preferred language. With modern computing devices and browsers, this works quite well.

Two companies have begun offering this type of solution in recent weeks. Transifex (www.transifex.com), which has offered a robust localization management platform for about five years now, recently rolled out Transifex Live. Localize.JS provides a similar offering. From the content producer’s perspective, both products are easy to work with. You paste some Javascript into your web site’s master HTML template and then manage translations behind the scenes via each company’s translation management tools (also web-hosted).
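For readers who like to see the moving parts, here is a minimal sketch of such a widget in TypeScript. The endpoint and response format are hypothetical, invented for illustration; the real Transifex Live and Localize.JS snippets differ in detail but follow the same pattern of redrawing the page from a cloud-hosted translation memory.

```typescript
// Minimal client-side translation widget (illustrative sketch only).
// Assumes a hypothetical cloud endpoint that returns a JSON map of
// { "source text": "best available translation", ... } for this page.
async function translatePage(targetLang: string): Promise<void> {
  const res = await fetch(`https://tm.example.com/translations?lang=${targetLang}`);
  const memory: Record<string, string> = await res.json();

  // Crawl every text node in the served page and swap in translations.
  // Text with no translation yet simply stays in the source language.
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  let node: Node | null;
  while ((node = walker.nextNode())) {
    const source = node.textContent?.trim();
    if (source && memory[source]) {
      node.textContent = memory[source];
    }
  }
}

// Run once the page has finished loading; the visitor's preferred
// language is usually detected from browser settings.
window.addEventListener("DOMContentLoaded", () => {
  const lang = navigator.language.split("-")[0];
  if (lang !== "en") void translatePage(lang);
});
```

In practice the commercial widgets also preserve inline markup, batch their lookups and cache results, but the redraw-after-load behaviour above is the core of the approach.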

What’s great about this approach is that no major changes to the server environment are necessary. You make some fairly trivial changes to your HTML templates, usually no more involved than updating a style sheet, and then whenever a visitor needs translation (usually detected automatically via browser settings), the widget kicks in and loads the available translations. It’s similar to the way machine translation widgets work, except the translations are coming from a repository of human translations versus a translation bot. Meanwhile, you simply publish your content in your source language, and you don’t need to worry about translation workflows and procedures within your normal publishing and editing process.

Another big plus of these tools is that they automatically pick up incremental changes to content. If someone edits an existing blog post, for example, and adds a paragraph to it, the translation widget senses the new text and sends it to the translation cloud to be queued for translation. It doesn’t force the retranslation of an entire static document, only the part that has changed. This is important because it is normal in online publishing to make small, incremental changes to articles. Handling these types of incremental changes within a traditional CMS, especially when there are many target languages, is a nightmare.
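A sketch of how a widget might notice such incremental edits is below; the queueing endpoint is again hypothetical, and the actual products implement this in their own ways.

```typescript
// Watch for content added after the initial page load and queue only the
// new text for translation. The /queue endpoint is made up for illustration.
const observer = new MutationObserver((mutations) => {
  const newSegments: string[] = [];
  for (const mutation of mutations) {
    mutation.addedNodes.forEach((added) => {
      const text = added.textContent?.trim();
      if (text) newSegments.push(text); // only the changed part, not the whole document
    });
  }
  if (newSegments.length > 0) {
    void fetch("https://tm.example.com/queue", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ segments: newSegments }),
    });
  }
});
observer.observe(document.body, { childList: true, subtree: true });
```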

That said, the idea of a translation overlay is not new. Translation proxy servers, such as Smartling (www.smartling.com) and Motion Point, have been around for several years. These solutions do essentially the same thing, except the proxy server is a network resource that sits in between the end user and the originating server. What’s new with these Javascript-based approaches is that most of the computing work is moved over to the end user’s computer (which generally has lots of idle computing time at its disposal), rather than being done by a server in a data center, which needs to be paid for one way or another. As a result, this client-side approach tends to be cheaper, cheap enough that tiny operations can utilize it, whereas proxy-based solutions have relatively steep entry costs.


Client-side web technologies, from jQuery to HighCharts, have become popular because they solve several problems for web publishers at once. Firstly, they let the publisher serve content rather simply and assume the browser will handle putting the finishing touches on it (this is true whether you are visiting a financial stats site or your local newspaper). Secondly, they enable the content producer to shift computing and bandwidth costs over to end users – not entirely, but enough to make a difference in their cost structure. Lastly, the technique is equally usable by both tiny and large operations, with very low entry costs. Client-side translation technology fits right in with these trends.

So if you’re looking at ways to go from single language to multi-language publishing, it’s definitely a good idea to look at solutions like Transifex Live and Localize.JS (and for larger operations, proxy-based solutions like Smartling and others).


Review of language business & technologies in Asia
by Mike Tian-Jian Jiang

Of whom, by whom, and for whom?

The importance of the Asian market for translation and localization is relatively well known, as one may easily find a vast amount of market research statistics in terms of population, supply and demand, purchasing power parity, and the number of languages, among other things. Therefore, this review will provide a different perspective on the translation market by examining the “of whom, by whom, and for whom” of language business and technology in Asia, qualitatively.

Language business and technology in Asia depend on bilingual people. Crowd-translation pioneers Gengo and Conyac, which intriguingly both started in Tokyo (Japan), invite multilingual speakers to translate content such as subtitles in order to introduce local culture to the world, and vice versa. The major difference between Gengo and Conyac is their quality assurance approach: the former evaluates a translator’s ability by exams, while the latter utilizes peer review to build up the community.

Recently, DuoLingo launched English learning courses in Japanese and Chinese. It will be interesting to observe how far this online education can reach towards its goals to cultivate translators and choose better works by voting. Flitto from South Korea is also worth noting for its social network mechanism. It creates an incentive for consumers by rewarding them for their translations. For instance, a Korean native speaker who is fluent in English can localize an American mobile phone app’s menu into Korean and get gift cards in return.

When it comes to crowd translation, Gengo and Conyac both encourage customers to order their services via API. MemSource, a cloud-based translation project management and CAT platform, even partners with Gengo and utilizes the crowd-translation API as a pre-translation service. This partnership has changed the conception that pre-translation always equals translation done by machine. To push the boundaries of the API even further, a viable direction on which to embark could be software as a service (SaaS). Instead of selling translation services or computer-assisted translation (CAT) software, SaaS in the language business has begun exploring the potential of selling value-added products on demand, with various technologies. For example, PIJIN just launched QR Translator, which enables access to localized information via a QR code, while NTT Docomo just integrated speech recognition and optical character recognition APIs to create an augmented-reality approach to translation similar to, but not limited to, WordLens and Waygo’s visual-only approaches. SoftBank Technology, on the other hand, is promoting FonTrans, which adds the perk of open web fonts alongside translation APIs for web site localization.

As for localization, through the New York-based Smartling and startups such as the Jordanian Dakwak, OneSky from Hong Kong, and the Japanese companies WOVN and Yaraku, Asian businesses can spread through either web or mobile channels. While Dakwak emphasizes Search Engine Optimization (SEO) ability and OneSky provides mobile-app-specific functions such as translation length limits, WOVN sticks to a JavaScript one-liner solution like Tolq and Google. Yaraku is bringing translation management systems from professionals to ordinary businessmen. This particular angle towards ordinary people could relate to the latest projects by the creator of the Moses Machine Translation (MT) toolkit: Cognitive Analysis and Statistical Methods for Advanced Computer Aided Translation (CASMACAT), and MateCat, although they are still quite research-oriented for the time being. The ultimate goal of MateCat resembles that of the other companies mentioned above: to increase translators’ productivity with the help of computation, and eventually to benefit communication between speakers of different languages.

In terms of computation, or more specifically computational linguistics, perhaps because of the linguistic distance between Japanese and Western languages – think SOV vs. SVO word order and agglutinative vs. fusional morphology – Kyoto University still leads the field of research exploring Example-Based Machine Translation (EBMT). EBMT matches and extends the translation memory’s ability, and the border between EBMT and statistical machine translation (SMT) is getting less and less distinct. For example, the Baidu-I2R Research Centre in Singapore has just completed the most accurate Vietnamese-to-English linguistics-statistics hybrid MT system in the world.

On the linguistic side, Nanyang Technological University is concentrating on growing the resources for cross-language semantics and grammars, including Wordnet Bahasa and the Open Multilingual Wordnet, as well as head-driven phrase structure grammar parsers for Japanese and Korean. To get an idea of these systems, readers are kindly invited to test their English parser at http://erg.delph-in.net/logon.

At the very least, these resources can help generate paraphrases in Asian languages and then indirectly increase the coverage of translation memory in the near future.

Currently, only Microsoft provides a similar English paraphrase API. Concerning translation memory coverage, Minna no Hon’yaku and the Tatoeba Project are nurturing their potential to become, for Asian languages, what TAUS or MyMemory are for Western languages. Of course, proprietary companies are also accumulating private translation memories, which leads to the next question: how to find the right leverage to apply translation memory without leaking confidential information or violating intellectual property laws. As a baby step towards an answer, EBMT-like technologies can take educated guesses at decomposing and recombining translation memories.

To date, Déjà Vu and MemoQ offer features called AutoAssemble and Fragment Assembly, respectively. A reasonable next step would be to combine paraphrasing with such assembly, improving both the precision and the recall of translation memory.
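As a toy illustration of this decompose-and-recombine idea (not how Déjà Vu’s or MemoQ’s features actually work), here is a greedy fragment lookup against a tiny in-memory translation memory with made-up entries:

```typescript
// Toy fragment assembly: translate a new sentence by recombining fragments
// already stored in a translation memory. Entries are invented examples.
const memory = new Map<string, string>([
  ["please restart", "veuillez redémarrer"],
  ["the application", "l'application"],
]);

function assembleTranslation(source: string): string {
  const words = source.toLowerCase().split(" ");
  const output: string[] = [];
  let i = 0;
  while (i < words.length) {
    // Greedily take the longest known fragment starting at position i.
    let matched = false;
    for (let len = words.length - i; len > 0; len--) {
      const hit = memory.get(words.slice(i, i + len).join(" "));
      if (hit) {
        output.push(hit);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      output.push(words[i]); // unknown words are copied through untranslated
      i++;
    }
  }
  return output.join(" ");
}

console.log(assembleTranslation("Please restart the application"));
// "veuillez redémarrer l'application"
```

Real assembly features also have to reorder, inflect and score candidate fragments, which is exactly where the paraphrasing mentioned above could help.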

A self-disclosure here: the author, who works for Yaraku, is also working on this very topic, concerning numbers and classifiers in particular. Composing a new translation out of a translation memory automatically can be difficult, even if the only difference from an existing entry is a number. For example, besides the plural form issue, one may translate “12” in a French sentence into “a dozen” in English for fluency reasons, not to mention that a classifier is usually required in Chinese. In software localization, a possible compromise is preparing templates like “next {number_goes_here} page(s)” and replacing the variables in braces with numbers later.
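A minimal sketch of that template idea follows, with made-up strings, showing why the same numeric placeholder needs different surrounding material per language (a plural form in English, a classifier in Chinese):

```typescript
// Toy numeral templates: the placeholder is filled with a number later,
// and each language supplies its own plural form or classifier around it.
// All strings here are invented for illustration.
const templates: Record<string, (n: number) => string> = {
  en: (n) => `next ${n} page${n === 1 ? "" : "s"}`, // the plural form issue
  zh: (n) => `下${n}页`,                             // 页 acts as the classifier
};

console.log(templates.en(1));  // "next 1 page"
console.log(templates.en(12)); // "next 12 pages"
console.log(templates.zh(12)); // "下12页"
```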

Despite the fact that this kind of template approach is not pretty, it creates certain possibilities: one may deduce patterns from numeral/classifier phrases, and those patterns could be captured by regular or context-free grammars in terms of the Chomsky hierarchy. In other words, a computational treatment is feasible. One may argue that numeral phrase treatments will not help much due to their low occurrence, but they can be crucial in many domains. In the patent MT task at NTCIR-10 (the 10th conference of NII Testbeds and Communities for Information access Research) in 2013, BBN Technologies, the first-place winner of Chinese-English patent MT, reported that their numeral phrase treatments efficiently gained a better BLEU (Bilingual Evaluation Understudy) score.

Finally, there is the dream of pursuing computational semantics, pragmatics, and the related concept of interlingual knowledge representation; not too long ago, the only application was the pivot language of MT. However, the current trend of deep learning has brought a certain kind of word sense disambiguation (disambiguating the sense of “bank” between stream bank and financial bank) back into the spotlight. For instance, since Google released the word2vec source code, there have already been follow-up experiments in Asian languages.
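The arithmetic behind these experiments is simple enough to sketch, as in the examples that follow. The three-dimensional vectors below are invented purely so the snippet runs; real embeddings are trained from large corpora and have hundreds of dimensions.

```typescript
// Toy word-vector arithmetic in the spirit of word2vec's famous
// vector(king) - vector(man) + vector(woman) ≈ vector(queen) example.
// Vectors are made up for illustration, not taken from a trained model.
const vectors: Record<string, number[]> = {
  king:  [0.9, 0.8, 0.1],
  man:   [0.5, 0.1, 0.0],
  woman: [0.5, 0.1, 0.9],
  queen: [0.9, 0.8, 0.9],
};

const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
const cosine = (a: number[], b: number[]) =>
  a.reduce((s, x, i) => s + x * b[i], 0) / (norm(a) * norm(b));

// king - man + woman ...
const target = vectors.king.map((x, i) => x - vectors.man[i] + vectors.woman[i]);

// ... lands closest to queen. The same cosine measure is what lets surface
// variants such as 唐揚げ / から揚げ / 空揚げ fall into one group.
const ranked = Object.keys(vectors).sort(
  (a, b) => cosine(vectors[b], target) - cosine(vectors[a], target)
);
console.log(ranked[0]); // "queen"
```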

For instance, one official example of word2vec is that the result of the vector operation “vector(king) – vector(man) + vector(woman)” is close to “vector(queen)” in English. As for Japanese, it turns out that one may easily group “唐揚 (karaage /KAH-rah-AH-ge/; a specific type of deep frying)” with “唐揚げ (karaage),” “から揚げ (karaage),” and “空揚げ (karaage)” in terms of vector similarity without knowing they are the same kind of deep-fried cuisine. With other similar words “揚げ (/AH-ge/; deep frying)” and “鶏肉 (chicken)” in the same group, it might further imply that the dish is usually limited to chicken. Imagine working on a Japanese restaurant menu translation with word2vec: the previously confusing Chinese characters and Hiragana could cross-reference each other and become useful suggestions to foreign tourists trying to get a sense of what is in a meal.

Besides the obvious demand for tourism-related information about restaurants, sightseeing spots, transportation, and accommodation, Asia presents various other opportunities. For joint training of ASEAN armies, translation is crucial, yet major machine translation providers, including Yandex from Russia, do not serve Burmese. The National Electronics and Computer Technology Center of Thailand therefore funded a Network-based ASEAN Language Translation Public Service to fill this gap. As unfortunate as it sounds, other funding sources could come from Australia or the U.S., due to the stereotype that terrorism is planned in Arabic, Malay, Indonesian, and so on.

Another stereotype would be serving typical outsourcing countries like India, where, surprisingly, English does not suffice for team building or customer service. International pharmaceutical companies in Singapore want drug names localized, hence the Special Interest Group on Transliteration & Transcription (SIG-T&T) under the Asian Federation of Natural Language Processing (AFNLP).

Meanwhile, Ginger, OKpanda, and WritePath are focused on English skills for Asians. Ginger helps people write better articles in English by employing a grammar checker and other natural language processing tools in its freemium plan, and interestingly, only the Japanese version of its web site is different from the others. This could be a hint to other language businesses that the Japanese market could be a special target. Unlike Ginger’s pure machine approach to writing assistance, WritePath is focused on professional proofreading of essay and paper writing, and OKpanda uses both speech recognition technology and human tutors to teach English conversation.

As a final example, allow me to introduce myself in Japanese: “八楽のマイクと申します (I am Mike from Yaraku; Ya-raku, of, Mike, is stating).” Firstly, in many other languages the expression could be translated as “my name is...” or “this is … speaking” depending on usage, especially for readers who see the variety of expressions for the same meaning as a matter of aesthetics. In a controlled language such as Japanese, however, consistency almost always ranks above fluency, unless the pragmatics is about honorifics, such as using “と申します” rather than “です” as the sentence-final particle. Secondly, “八” may look like something to do with “eight” at first glance, but it is actually a name made from the Japanese for “8 million spirits,” so it has to be transliterated with the uncommon pronunciation “ya” and excluded from numeral treatment by translation memory. Lastly, “マイク”, being my transliterated first name, is neither Japanese nor Mandarin (different from my “real” Chinese name), so it could be a cost-benefit issue for someone to decide whether to back-transliterate it to English or to search for my Chinese name.

Apart from the common big players in language business and technology, Asia is shaped more from the bottom up, at the grassroots level, by relatively small or non-profit organizations, or anyone who is willing to understand others who live in different places, eat different foods, and speak different tongues, especially in regard to long misunderstood cultures such as the Muslim world or the mysterious Far East. These “culture shocks” range from right-to-left scripts and writing systems without word separators to honorific connotations and spiritual interpretations. Considering the situations discussed here, strategies in language business and technology may vary drastically, and represent challenges and opportunities for all of us.


Review of language business & technologies in Africa
by Amlaku Eshetie

Africa houses many tribes speaking many different vernaculars. According to the Association for Computational Linguistics (http://tshwanedje.com/publications/AfLaT2009.pdf), the number of African languages ranges between 1,000 and 2,000. This is quite a crude estimate and does not reflect the actual number of languages on this multilingual continent.

Similarly, UNESCO states that “[t]he number of languages spoken in Africa varies between 1,000 and 2,500, depending on different estimates and definitions” (http://unesdoc.unesco.org/images/0018/001886/188642e.pdf). Another source that does provide a single figure, Tucker Childs, states that Africa is one of the most diverse continents, with its 2,000 languages.

However, for the majority of African countries this diversity has been dominated by the colonialists’ languages. I’m not going to discuss the role and dominance of English, French and Portuguese in the respective past colonies of Britain, France, Portugal, and others. Suffice it to say that English and French have curtailed the use of the local languages of many African nations; they are being used as either official or working/business languages. A map taken from Wikipedia (http://en.wikipedia.org/wiki/Languages_of_Africa) illustrates the major official languages across the continent.

Furthermore, not many of the indigenous African languages have been studied. Perhaps they have been studied for their historical, linguistic, cultural or educational values, but certainly not for their application in technology and technological development, or in the context of multilingual data exchange and communication. It is only very recently that some of the more widely spoken languages came to be the focus of technological development, more particularly in multilingual internet communication technology (ICT). This concerns a mere six languages, according to Childs, namely Swahili, Hausa, Amharic, Yoruba, Igbo and Fula.

For instance, the study and application of Amharic, of which I am a native speaker, for computer and computer-based applications started only recently, in the late 1990s and early 2000s, while the studies of the now technology-friendly languages started in the 1950s and 60s. Localization business into Amharic is a new phenomenon. Companies such as Microsoft and Samsung that produce computer and mobile applications have now started localizing their products into Amharic.


The introduction and development of information technology in Africa

Most parts of Africa are traditional and agrarian. Information spreads through word of mouth. The application of mobile and computer technologies is a luxury. The problem is not just the availability of the technologies; the majority of the people also lack awareness and the skills to understand the benefits and to use the applications. Adegbola concludes:

“The development of language technology for African languages is at a rather embryonic stage. Apart from the efforts in South Africa, there are little or no coherent programmes on language technology in African universities. National language policies where they exist do not accommodate language technology issues and there is a generally low level of awareness of the benefits derivable from language technology.”

Adegbola further stresses the need to change this situation and to bring the benefits of ICT to the majority of the African community. It appears that the situation is now improving: the coverage of mobile telephones as well as internet connectivity is gradually spreading. A Kenyan web site, PivotEast (http://www.pivoteast.com/use-mobile-phones-encourage-information-sharing-within-agricultural-value-chains/), quotes a GSMA report: “Mobile penetration across the region was 54% in 2012 and contributed over 6% of the region’s GDP.” For people in the farming and trading industry who would not otherwise have access to information, mobile telephones mean a lot. They can get information about pricing of goods and products, about disease outbreaks, industry developments, and more.

Nevertheless, the availability of hardware technology with foreign-language software is useless: farmers and the less educated community won’t be able to understand it. The information should be accessible in their mother tongue. Therefore, translation and localization become an indispensable industry in the development endeavour of Africa.

Translation and localization practices in Africa

Translation has been practised for a long time. In the beginning, religious scriptures and legal documents were the main areas of translation. The Bible and the Quran have been translated into innumerable languages. Constitutions and parliamentary procedures of different African countries have been adopted from Western languages, mainly from French and English.

In the current era of ‘information’, translation should play, and really does play, an essential role in Africa – not only in religious and legal matters, but also in socioeconomic affairs. Findings of the study on the ‘Need for Translation in Africa’ by CSA (http://www.commonsenseadvisory.com/portals/0/downloads/africa.pdf) confirm this: “…translation has the potential to affect nearly every aspect of human rights, safety, and wellbeing for citizens of Africa”. Information, be it for business, education, social wellbeing, or personal consumption, is mostly obtained through translations from English, French, Arabic or other developed languages (mainly Japanese and Chinese).

Localization, adapting a technological product (hardware or software) into a local language and culture, is a new and infant phenomenon in Africa for several reasons. Firstly, localization itself only emerged, according to Esselink, in the 1980s along with the introduction of desktop computers. Secondly, localization practice has been limited to a few Asian and Western languages such as Japanese, English, French and Spanish. A third, and probably not the last, reason is the low and slow penetration of computer technology into Africa, coupled with people’s illiteracy and lack of the technical knowledge needed to easily access and use computer technology. This can be substantiated by the findings of the Common Sense Advisory studies. CSA’s reports of 2012 and 2011 show that Africa is the least participating as well as the least benefiting continent in the localization industry. According to the reports, the market share of Africa in the localization industry is just 0.26%, far less than even 1% and the least of all the continents. None of the top 50 language service provider (LSP) companies reported in the 2011 study were from Africa.

Conclusion

The African continent is home to over one-fourth of the languages spoken in the world. Yet these numerous languages have not developed enough to be awarded the status of official languages, languages of education, business and technology. Many of them are overshadowed by a couple of colonial languages and a few dominant local languages.

Currently, however, a revival of many of the vernaculars, as well as Western interest in investing in Africa, seems to be underway. This has direct implications for the cultures and languages of the continent. Gradually, computer and mobile technologies are penetrating Africa, and translation and localization have become crucial if these technological products and services are to spread further and actually benefit the people.

In the next issue I will try to look at the translation and localization practices in Africa by providing some practical examples and experiences.

Get your insights, tools, metrics, data, benchmarking, contacts and knowledge from a neutral and independent industry organization.

Join TAUS!

TAUS is a think tank and resource center for the global translation industry.

Open to all translation buyers and providers, from individual translators, language service providers & buyers to governments and NGOs.

taus.net


The Language Perspective
by Nicholas Ostler

The Crux of the Matter

Why are foreign languages hard to learn? More to the point, why do people resist learning them? Why – at least in my country (the UK) – do children increasingly prefer to focus their school effort on other challenges? Why is the task of learning a foreign language harder to enjoy, and clearly more difficult to motivate, than other subjects?

Of course, we know the worthy reasons why we ought to take them seriously. Greater openness to different views on what it is to be human. Better access to customers in far and growing markets. But why is this a hard sell? There must be some things which resist these arguments.

It is not enough to note that there is a dominant lingua franca in the world at the moment, which happens to be English. After all, that just gives English native-speakers an unfair advantage in the language game. Why should it put them off learning other languages, which remain valuable for all the above worthy reasons?

Are we, perhaps, riding that donkey imagined by mediaeval logician John of Buridan? Placed equally far from two equal piles of hay – both requiring a short walk to reach – the poor ass then starved to death. He needed a reason to choose one over the other, and without that was unable to come down in favour of either. Similarly, as speakers of the language that everyone else now chooses to learn first, we have a problem that they do not. Which other language to learn first? All so tempting, but each requiring an investment of effort, which might be better rewarded in one rather than another. How to know which? But the ass was still an ass, even if ruled by reason. He should have seen that the result of not choosing was worse than giving in, just a little, to the random.

In fact, I think there are two factors that particularly discourage students. In a phrase, they are hard slog and humiliation.

The crucial early stages of language learning are made up of learning arbitrary items: new sounds, new words and meanings, new phrase-types and discourse strategies – and perhaps a new writing-system too. Inevitably they are arbitrary – hence hard to grasp – early on, before any system has become apparent; and they are unnatural – hence hard to reproduce.

They go against the grain because they are alien, marshalled by rules that are still unknown, and which, as they emerge, are quite different from the ones the learner had absorbed in childhood. In the most direct sense, to a monolingual person, they are perverse.

The effect of this is to humiliate the learner, to cut her down to size as no other subject does. From beginning to end, the learner of a foreign language is at a disadvantage, and the more she identifies with the language, the smaller she feels, failing to recall, or even to understand, what is clearly child's play for a person who has grown up with it. Practice no doubt brings improvement, but it is practice in self-mortification, so it is more like penance than healthy exercise. One needs to concentrate fiercely, and even then cannot do what one can see must really be easy, because one does the equivalent so naturally in one's own language. Why choose to live so far outside one's comfort zone?

Yet these characteristic ordeals are the flip-side of the acknowledged joys of language-learning – joys that become evident, once you have done it.

Intellectually, nothing is more stimulating than picking out a hidden pattern revealed in foreign speech, a meaning articulated out of apparent noise – perhaps even more stimulating because the speaker (unlike the learner) does not realize how complicated it all is! What had been a pile of arbitrary bits and pieces has become a complicated dance.

And the humiliation which comes with self-adjudged failure at understanding and expressing meanings through this resistant code is turned inside out, in unreasoned joy, when one actually begins to make sense of anything – not yet, not yet for a long time, everything – in the foreign language. Simple jokes are funnier when understood – if darkly – through a foreign language. And simple good-heartedness is warmer when it penetrates the language barrier.

Is there any way to speed the transition from frustration to fruition, and from embarrassment to embrassement?

I have tended to be pessimistic about this, e.g. in my book The Last Lingua Franca. Everywhere and always, the native speakers of large-scale lingua francas (Spanish, French, Russian…) have been notorious as bad language learners. The obstacles of hard slog and humiliation remain, even if one makes the effort, especially when the alternative is to luxuriate in a superior command of a medium that has cost them nothing to acquire, and which they like to think is a magnificent gift that their nation, their ancestors, have given to the world as a whole. Cicero, speaking in another era of another lingua franca, said that it was not so much a credit to speak Latin as a disgrace not to.

But there does seem to be one force which may overcome this language-learning inertia. That is loyalty. Surprisingly, nowadays, it does seem possible to recruit new speakers to a language which is not a medium of wider communication, if it is seen as a root of one’s own local identity, nationality even. The techniques of language-teaching developed to introduce people to English can then be re-purposed to give people a head-start with Kaurna in South Australia, Wampanoag in Massachusetts, Māori in New Zealand, Manx in the Isle of Man. A pragmatic retreat to use of the easiest medium of expression can be overcome by a romantic yearning to re-assert an ancient identity through the way we speak.

And while we’re at it, we can jump off that indifferent ass of Buridan. There may not be a good reason to choose it, but what could matter more than speaking the language of the place and the people that we come from?


The Journalist's Perspective, by Lane Greene

At TAUS's recent conference in Dublin, Nicholas Ostler won our debate, "What is the lingua franca of the future? English or MT?", with Nick plumping for MT. But we agreed on one crucial point: English will remain the dominant lingua franca for decades to come.

MT has great promise, and it also has its practical uses already today. But it has huge challenges to overcome before people can safely skip foreign-language learning, knowing that MT will be fast, accurate, and cheap enough to replace the cumbersome learning of a foreign language. Most crucially, the difference between speech and writing must be borne in mind at all times.

Only a couple of hundred of the world's 6,000 languages are written in any serious way. Writing is an artificial and difficult skill. Just 85% of the world can read, and many of those who read can barely write. Of those who can write, a still smaller fraction can do so competently, and of those, a smaller subset still can write at a high, professional level. And this has always been so. Despite the complaints of every generation of older adults that young people can't write anymore, writing has always been a minority skill, especially skillful writing.

This matters for MT. Writing well is precisely the art of crafting sentences that are concise, unambiguous and easy to understand. The more a piece of writing has short and unambiguous sentences, the easier it will be to translate. I expect MT will make steady but real progress in making readable and accurate translations of good writing, especially when guided by human editors.

Speech is completely different. If we were ever to be able to rely on MT for speech (and so to replace English’s lingua franca function), we would need a system that would work in real time—obviously without human editing. And it would have to deal with things like this:

We need to have a much more intentional explicit plan for NATO to engage with African countries and regional organizations, uh, not because the United States is not prepared to invest in security efforts in Africa, but rather to ensure that, uh, we are not perceived as trying to uh, dominate the continent. Rather we wanna make sure that we’re prep-, uh, seen as, uh, a reliable partner, and there are some advantages to some European countries with historical ties, uh, being engaged, uh, in uh, and uh, in ha-, in, taking advantage of relationships. The francophile countries obviously is gonna to be able to do certain things better than we can, uh, and, uh, you know, one of, one of the, uh, things we, we wanna make sure of, though is that, uh, when, when the average African thinks about US, uh, engagement in Africa, I don’t want them to think our only interest is avoiding terrorists from spilling out into, uh, the world stage.

Who is responsible for this word salad?

It is Barack Obama, speaking to the editor-in-chief of The Economist on Air Force One, knowing that there was a tape recorder sitting in front of him. (The interview is on The Economist’s web site.) This is what one of the most silver-tongued politicians alive “sounds” like on the page, even when he knows what he is saying will be scrutinized.


Note how many things have gone wrong in this randomly chosen minute-long passage. He says “rather” one too many times. He says “francophile” in a mess of a sentence in which he probably means that France will be able “to do certain things better than we can” with the francophone countries of Africa. He fails a subject-verb agreement, saying “the countries…is.” He talks of terrorism spilling “into” rather than “onto” the world stage. He says “uh” fifteen times. This is not a knock on Obama (though he really does say “uh” a lot). This is simply what educated adult conversational English looks like on the page.

It is amazing that we understand each other at all. But we do—shared knowledge and the great redundancy in natural languages mean that even mangled-looking messages like this are not only understood, but understood perfectly. (Indeed, our editors cleaned up the text, per standard journalistic practice, and published a perfectly readable version online.)

By contrast, MT professionals don’t need me to tell them what would happen if you plugged this text into an MT system today. Try it anyway. And if you’re using a system like Google Translate, click on the little speaker icon to hear the translation synthesized aloud. The results do not impress.
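For the curious, here is a minimal sketch of what "plugging this text into an MT system" might look like in code. The endpoint, parameters and key below are hypothetical placeholders, not any particular vendor's API; substitute whatever MT service you actually use.

```python
# Minimal sketch of sending a disfluent speech transcript to a generic MT web service.
# The endpoint and key are hypothetical placeholders; swap in a real MT API.
import requests

TRANSCRIPT = (
    "We need to have a much more intentional explicit plan for NATO to engage "
    "with African countries and regional organizations, uh, not because the "
    "United States is not prepared to invest in security efforts in Africa..."
)

def translate(text, source="en", target="fr",
              endpoint="https://example-mt-service.invalid/translate",
              api_key="YOUR_KEY"):
    """Send raw, unedited speech text to an MT service and return its output."""
    response = requests.post(
        endpoint,
        json={"q": text, "source": source, "target": target, "key": api_key},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("translation", "")

if __name__ == "__main__":
    # Expect the disfluencies ("uh", "wanna", false starts) to survive into the output.
    print(translate(TRANSCRIPT))
```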

Even granting that all technologies involved will improve, there will always be information loss at each stage. The original signal has mistakes ("the Francophile countries"). Speech recognition will add errors. MT will add its own errors. And the final speech synthesis will never convey the original pitch, emphasis, pauses and so forth that help speakers understand each other in real speech.

But indulge a thought-experiment. Imagine that speech recognition were perfect—that software could render every one of Obama's words without error. It would know to strip out the "uhs", and would know just what to do with "wanna" and "gonna". Then imagine that MT is perfect; the output of a top-flight system would render a fully accurate translation of the input. Then imagine a speech synthesizer rendering this all not in today's robotic monotone, but rather in natural, human-like French. Then (we're in science-fiction land) imagine it can even correct obvious mistakes of meaning, like "francophile" for "francophone".

We would still not have faithful translation. We would be left with a weird, sanitized version of the original, an aural version of the "uncanny valley" problem. When animators make animated humans look too real, the avatars are still obviously non-human—and they creep actual humans out. And this, remember, would be what success looks like. It's a far cry from a friendly, winding, engaging talk with an old friend, an exciting new contact, or even Barack Obama.

I predict a long and healthy future for real language-learning.


The Translator's Perspective, by Jost Zetzsche

Telling Stories

I was privileged to be asked to give a keynote presentation at the 2008 meeting of the Association for Machine Translation in the Americas (AMTA), and again in 2014 at the meeting of the European Association for Machine Translation (EAMT).

Each time I happily accepted the invitation in the spirit in which it was offered: as a request to a representative of the translation community. Naturally, a community that speaks in as many languages as the translation community does -- both figuratively and literally -- cannot be represented by a lone voice, but I tried to be as inclusive as possible by querying the larger community about the topics I should present on.

For my most recent talk I ended up revisiting the topic that I had explored six years ago, the “tasks” I had “assigned” both to the translation community and the MT community that would allow us to forge sensible avenues of communication.

Here’s what I originally challenged the translation community to do:

1. Look back at its responses to translation technology in the 1990s (responses that were not particularly productive) and assess whether something could be learned from those past mistakes.

2. Put into perspective what machine translation is in relation to other translation technologies (such as translation memory).

3. Distinguish between the different forms of machine translation technology and their various kinds of application.

4. Employ machine translation as a sales tool (if only to differentiate non-MT services from it) and use it as a productivity tool where appropriate.

While I think it's fair to say that the translation community's employment of machine translation has risen significantly over the last six years (one strong indicator is the wide availability of MT plugins in CAT tools), it's also true that the first three points still need a lot of work. And it seems to me that while the first task -- the attempt to put the current response to technology developments into a historical perspective -- needs to happen internally within the translation community, tasks 2 and 3 present an opportunity and a challenge for the machine translation community to provide the necessary information in a creative and well-digestible manner. This dovetails nicely into the original challenges that were thrown down to the MT community:

1. Acknowledge the origin of the data that is being used to build machine translation engines (hint: it comes from the efforts of professional translators).

2. Engage the translation community in challenging and meaningful ways.

3. Listen to the needs of the translation community.

4. Communicate with comprehensible and honest statements about what machine translation technology can and cannot do (don't use jargon like the "pre-beta of magic" that Microsoft recently employed when describing Skype's real-time interpretation feature).


I’m not sure I’m qualified to dispense grades on how well those tasks have been completed, but my feeling is that almost anyone in the MT community would evaluate those tasks in 2014 in much the same way as they did in 2008: with the sinking feeling that a lot still needs to be done on all fronts.

As a translator, I want to encourage members of the MT community to re-evaluate how they communicate. You will have noticed that each of these points involves communication in some way. Sure, there might be data available on all these topics in the form of research and white papers, but that kind of data doesn't capture the heart and the imagination like well-told stories do. And I don't mean fairy tales. I mean truthful stories that change the perspective of both the storyteller and the listener.

Here is one of those stories. Post-editing the output of a mediocre machine translation engine (the only kind of engine the very large majority of translators have access to) is counter-intuitive and frustrating to virtually every professional translator. Why? Because it sidelines the translator to a secondary role. Rather than being a translator-centric workflow, this is an MT-centric workflow in which the translator merely responds to a suggestion that comes from a nebulous place that is difficult to influence. Yet there could be value in even the most mediocre of MT engines if the translator were put back into the center of the workflow, fully in control of the process. This can happen in a number of ways, including AutoComplete suggestions of subsegments coming from MT engines or the MT-based repair of fuzzy TM matches (I've described how these processes are already being used by some CAT tools here: http://tinyurl.com/TranslatorMT).

There are plenty of other methods to make machine translation translator-centric, including dynamic improvements in machine translation suggestions, termbase- or TM-derived repairs of MT suggestions, a deep integration of morphology into any of these processes, and so on. Some of these processes are already being worked on in academic settings, but a good story only comes to life when we can actually envision it. Even better, if we can actually participate in shaping the story into something that's relevant and suitable to our professional lives, its impact could be tremendous.
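To make the translator-centric idea concrete, here is a toy sketch (not a description of any particular CAT tool) of MT-assisted repair of a fuzzy TM match: the translator's own TM text stays in charge, and a stubbed MT engine is consulted only for the sub-segment of the source that changed.

```python
# Toy sketch of "MT-repaired fuzzy matches": the translation memory stays in charge,
# and a (stubbed) MT engine is only asked about the part of the new sentence that
# differs from the stored TM source. All names here are illustrative, not a real API.
from difflib import SequenceMatcher

def fuzzy_score(a, b):
    """Similarity ratio between two source sentences (0.0 - 1.0)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def mt_translate(fragment):
    """Stand-in for a real MT call; replace with whatever engine you use."""
    return f"<MT suggestion for: '{fragment}'>"

def repair_fuzzy_match(new_source, tm_source, tm_target, threshold=0.75):
    """If the TM match is good enough, keep its target text and attach an MT
    suggestion only for the words that changed in the source."""
    score = fuzzy_score(new_source, tm_source)
    if score < threshold:
        return None  # no usable match; fall back to MT or translation from scratch
    matcher = SequenceMatcher(None, tm_source.split(), new_source.split())
    changed = [" ".join(new_source.split()[j1:j2])
               for op, i1, i2, j1, j2 in matcher.get_opcodes()
               if op in ("replace", "insert")]
    return {
        "score": round(score, 2),
        "tm_target": tm_target,                        # translator's own legacy text
        "patches": [mt_translate(c) for c in changed]  # MT consulted only here
    }

if __name__ == "__main__":
    print(repair_fuzzy_match(
        "Press the red button to stop the engine",
        "Press the green button to stop the engine",
        "Drücken Sie die grüne Taste, um den Motor zu stoppen"))
```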


The Research Perspective, by Luigi Muzii

Breakthroughs from Research

The TAUS Review will report news and ideas from the world of language business and technology, with columnists being confronted with the shortcomings of the technology, as we constantly are in all fields of our everyday lives.

I am a veteran of the translation industry, old enough to remember when FAHQMT (Fully Automatic High Quality Machine Translation) was promised as imminent – more than a quarter of a century ago. I have seen almost all the stages of research, the rage against the machines, the teasing and the frustration.

The magic of FAHQMT is yet to come. Five years ago, President Obama called for advanced language technology and machine translation as one of the enablers to improve "our quality of life and establish the foundation for the industries and jobs of the future." Immediately afterwards, the President of the American Translators Association wrote a letter to President Obama urging the government to invest instead in human language skills and to promote greater awareness of and expertise in foreign languages. Two years later, Ray Kurzweil predicted that advances in translation technology would enable us to live in a society free of language barriers by the year 2029, when machines reach human levels of translation quality.

We are about to enter the convergence era. With the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure – the so-called Internet of Things (IoT) – translation will be embedded in everything we do, on every screen, in every app.

The TAUS Review will deal with the challenges and opportunities that this new era is going to bring about, with columnists zooming in on this and on the evolution of the translation industry.

As I said, I am a veteran of the translation industry, but I am hopefully young enough to be confident in technology and research, and to see what they are going to bring us, even in the immediate future.

Recently, two books have impressed me more than any others in the last few years. One is Robert Fogel's The Escape from Hunger and Premature Death, 1700–2100 and the other is The Second Machine Age, by Erik Brynjolfsson and Andrew McAfee. Both books tell us we are on the exponential side of the growth curve.

In this column, we will discuss the latest news from academic circles related to all fields of translation automation. I will keep asking myself the same million-dollar questions I have been asking for the last thirty years and more: What breakthrough can we expect from research? May any breakthrough come from the academic world of translation?

For the first issue of the TAUS Review, two important things came to the forefront: the release of the proceedings of this year's meeting of the Association for Computational Linguistics (ACL conference: http://acl2014.org/), and two papers attesting to the growing interest in machine translation within the healthcare community in the USA and Canada.

Computational linguistics is an interdisciplinary field that is gaining more and more attention, shifting from purely theoretical work to applied research on the practical outcomes of modeling human language use.

Researchers in machine translation have recently been focusing on automatic learning. Not surprisingly, the awards at this year’s ACL conference went to papers on neural networks. Neural networks were very popular in the field of artificial intelligence research in the 1990s. Today, many see the combination of neural networks and big data as the new frontier in Natural Language Processing, as a method to help improve the representation of semantic information and learning from corpora by incorporating semantic roles into probabilistic models. Eventually, statistical and symbolic systems together could provide robustness and ease of automatic training.

Research in the realm of artificial intelligence is also gaining momentum with respect to machine translation, where the pure probabilistic/stochastic approach has lately been showing its limits. Today researchers are trying to modernize the traditional rule-based approach by applying neural network models for more efficient 'hybrid' solutions.

The Best Long Paper Award at this year's ACL conference went to "Fast and Robust Neural Network Joint Models for Statistical Machine Translation" (http://acl2014.org/acl2014/P14-1/pdf/P14-1129.pdf), which presents a neural network joint model for integration in MT decoders that requires no linguistic resources, no feature engineering, and only a handful of hyper-parameters.
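As a drastically simplified illustration of the joint-model idea (a toy, not the model described in the paper), one can score the next target word from a window of source words and the previous target words using nothing more than word embeddings and a single hidden layer; the vocabulary sizes, dimensions and window lengths below are the "handful of hyper-parameters".

```python
# Toy illustration of a neural joint model for translation scoring: probability of
# the next target word given a source window and the target history. Parameters are
# randomly initialised here; in practice they would be trained on parallel data.
import numpy as np

rng = np.random.default_rng(0)
V_SRC, V_TGT = 1000, 1000      # toy vocabulary sizes
EMB, HID = 32, 64              # embedding and hidden-layer sizes
SRC_WIN, TGT_HIST = 5, 3       # source window and target history lengths

E_src = rng.normal(size=(V_SRC, EMB))
E_tgt = rng.normal(size=(V_TGT, EMB))
W_h = rng.normal(size=((SRC_WIN + TGT_HIST) * EMB, HID)) * 0.01
W_o = rng.normal(size=(HID, V_TGT)) * 0.01

def score_next_word(src_window, tgt_history):
    """Return a probability distribution over the next target word."""
    x = np.concatenate([E_src[src_window].ravel(), E_tgt[tgt_history].ravel()])
    h = np.tanh(x @ W_h)
    logits = h @ W_o
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Example: probability of target word 42 given a toy source window and target history.
p = score_next_word(src_window=[3, 17, 99, 4, 8], tgt_history=[7, 21, 5])
print(p[42])
```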

Another interesting paper is "Recurrent Neural Networks for Word Alignment Model" (http://acl2014.org/acl2014/P14-1/pdf/P14-1138.pdf). The authors propose a word alignment model in which an unlimited alignment history is represented by recurrently connected hidden layers.

The ACL conference proceedings are freely available for download (http://acl2014.org/Program.htm). More news from the machine translation research field is expected from the Ninth MT Marathon (http://www.statmt.org/mtm14/). The program of talks and lectures will focus mostly on models and on practical applications. On the speculative side, deep learning for MT will possibly match the neural network model, while post-editing and quality evaluation will remain hot topics.

In this respect, two recent papers have contributed to reviving interest in the practical application of machine translation.

In 2011, Katrin Kirchhoff and Anne Turner of the University of Washington in Seattle ran a thorough feasibility study on the application of statistical machine translation to public health information (http://jamia.bmj.com/content/18/4/473.full), to fulfill federal mandates to provide multilingual materials to individuals with limited English proficiency. The results indicated that machine translation with appropriate post-editing could be an effective method of producing multilingual health materials of equivalent quality but at lower cost compared to manual translation.

Three years later, Kirchhoff and Turner published a new study comparing human and machine translation of health promotion materials for public health practice (http://www.ncbi.nlm.nih.gov/pubmed/24084391), which confirmed those results. This study took into account 25 machine and human translations, with public health translators queried about workflow, quality, costs, and time. Human translation via local health department methods took 17 hours to 6 days, while human translation post-editing ranged from 1.58 to 5.88 words per minute. Machine translation with human post-editing ranged from 10 to 30 words per minute. The cost of human translation ranged from $130 to $1,220; machine translation required no additional costs. A quality comparison by bilingual public health professionals showed that machine translation and human translation were equally preferred.

Finally, another study on using machine translation in clinical practice (http://www.cfp.ca/content/59/4/382.full) was recently run at the Department of Family Medicine and the Department of Epidemiology and Community Medicine of the University of Ottawa in Canada. The limited availability of professionally trained medical interpreters in community-based practices has led Canadian physicians to turn to machine translation to supplement their communication with patients in clinical encounters. Although the results showed that performance remains imperfect and can vary greatly between language pairs, the study reports that machine translation is increasingly used to clarify patient histories, review a clinical diagnosis, or restate the recommended treatment plan and follow-up to facilitate comprehension.



Speech Translation Technology: A conversation between Alexander Waibel and Mark Seligman

Mark Seligman from Spoken Translation met with Alexander Waibel this week to discuss their topic of speech translation technology. Together they will host the keynote session at the TAUS Annual Conference in Vancouver on October 27 and 28. Jaap van der Meer gave them a few questions to get the conversation going.

When will speech translation technology be ready for general use in business?

Alex: It’s already happening. Jibbigo (now Facebook), Google, and Microsoft all offer speech translation services and operate platforms through which third party app developers have built speech translation apps (that use ASR, MT, and/or TTS from some other vendor, usually over the Internet). This trend will continue.

Mark: Yes, several apps and APIs are already available and in increasing use. (Full disclosure: I represent two of the third-party developers that Alex mentions.) The crucial point is that, as of a couple of years ago, two gating technology sets have passed the usability thresholds and converged: (1) the core (speech recognition, machine translation, and text-to-speech); and (2) the infrastructure (networking, the cloud, mobile computing, sufficient computational power and speed, the app market, etc.) Perhaps surprisingly, readiness of the infrastructure turned out to be as important as readiness of the core. However, overall readiness of the combination for prime time business use is a relative matter. Availability, usability, reliability, and customization issues certainly remain, and will work themselves out steadily but unevenly during the coming decade. Critical mass has been reached; and, with multiple deep-pockets competitors and agile camp-followers now in the game, rapid evolution, in Kevin Kelly’s sense (What Technology Wants), seems inevitable.

Alex: For Jibbigo, 2009 was a sweet spot. Speech translation has always offered two conveniences: first, of course, that you can translate languages you don’t speak; and second, that you don’t have to type in mobile environments. So you have to make sure that speech recognition is faster than typing to be useful. Else why put up with speech recognition errors? So the key advantage of ASR is increased input speed. Speech and translation should also capture enough of what you want to say in the relevant scenario, say travel, without too much limitation. To us that has meant at least a 40k-word vocabulary and near-real-time performance. So we worked to make the engines fast and small.

Then came the other factor: with the development of iPhone and iTunes, you had a distribution channel. The app market really was transformational because such a wide distribution network was previously difficult to create. Now people can use translation in all kinds of circumstances. When translation needs arise, they arise in unusual, geographically distributed situations. So having translation on the phone really makes a difference. And the hardware was fast enough to pack everything on the device.

We could sense by the summer of 2009 that things had come together and we had it made. In October of 2009 we rolled our product out.
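By way of illustration only (this is not Jibbigo's code), the chain Alex describes can be sketched as three pluggable stages; the component functions below are dummy placeholders for real ASR, MT and TTS engines, whether on-device or in the cloud.

```python
# Minimal sketch of the classic speech-translation chain (ASR -> MT -> TTS).
# The three engine functions are dummy placeholders, not any vendor's API.
from dataclasses import dataclass

@dataclass
class Utterance:
    audio: bytes          # raw audio captured from the microphone
    source_lang: str      # e.g. "en"
    target_lang: str      # e.g. "es"

def recognize(audio: bytes, lang: str) -> str:
    """ASR placeholder: audio in, source-language text out."""
    return "where is the train station"          # pretend recognition result

def translate(text: str, source: str, target: str) -> str:
    """MT placeholder: source-language text in, target-language text out."""
    return f"[{source}->{target}] {text}"        # pretend translation

def synthesize(text: str, lang: str) -> bytes:
    """TTS placeholder: target-language text in, audio out."""
    return text.encode("utf-8")                  # pretend synthesized audio

def speech_to_speech(utt: Utterance) -> bytes:
    """End-to-end chain. Each stage adds latency and potential errors, which is why
    keeping every engine fast, small and wide-vocabulary (e.g. ~40k words) matters."""
    text = recognize(utt.audio, utt.source_lang)
    translated = translate(text, utt.source_lang, utt.target_lang)
    return synthesize(translated, utt.target_lang)

if __name__ == "__main__":
    print(speech_to_speech(Utterance(audio=b"...", source_lang="en", target_lang="es")))
```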

What are the hurdles still to be taken?

Alex: First off, from a commercial point of view, the volume is inherently small now, because people who don't speak the same language usually don't communicate with each other. I believe this will only take time, though. Eventually, we will be in a world where the tech offering is so pervasive that people will commonly interact with people whose language they don't speak.

Technically, the three biggest hurdles are:

• The long tail of languages: how do we build the technology not just for twenty languages but for 6,000? This will require a rethinking of how we build speech translators, from the traditional engineering/lab work to methods that involve the language communities themselves – that is, technology that will adapt to new languages without requiring technologists with expertise in each of them.

• Noise: speech translation is used in many situations where environmental noise adds considerable challenges (restaurants, bars, ...). Dealing with these types of noise robustly remains a challenge for ASR.

• Disfluent language: most of our tech so far works on reasonably well-behaved language, that is, either single sentences (e.g., Jibbigo), lectures (e.g., our Lecture Translator service), or broadcast news. When we deal with highly disfluent speech, such as meetings and telephone conversations, language becomes so fragmentary and syntactically ill-formed that standard translation methods cannot produce satisfactory results. New methods of content extraction, summarization, and semantic and discourse modeling will need to be considered to have a successful impact in that space as well.

Mark: I agree with all of Alex's points (though where noise is concerned, good progress has been made to date: I've used speech recognition on iOS platforms in very noisy rooms with good results). I'd add three more:

• A serious reliability gap remains. Users won’t trust automatic translation for serious uses unless effective verification and correction capability is provided. (Full disclosure again: that’s where my own company has focused its efforts.) Equally important, once correction facilities are available, they can enable even monolingual users to collectively provide big data in support of continual machine learning (training).

• Knowledge source integration is due for a rebirth. Integration of prosodic, discourse, domain, and other knowledge was tried in the Verbmobil and related projects as a way of moving past accuracy plateaus, but the effort came much too early, when everything had to be handmade and machines were too slow. Now that statistical methods have come of age and Moore’s Law has had another twenty years, it’s time for another try, in which both statistical and symbolic methods contribute according to their abilities.

• As a related matter, semantic processing needs, and I bet will undergo, much further development. I think this expansion will proceed both top-down (through exploitation of manually created categories and ontologies) and bottom-up (through perceptually grounded learning of categories and ontologies from videos, graphics, audio, etc.).

Alex: True, noise handling is making progress when you’re speaking directly into a device. What still lags is speech into a device on a meeting table, etc., in which the ambient noise is harder to distinguish from the main signal.

What kind of applications do you see for speech translation technology? First for consumers?

Alex: Travelers; health care; government services.

Mark: Language instruction (huge!).

And second for business?

Alex: Teleconferencing; health care; government; broadcast media; social media.

Mark: Defense and intelligence; professional (e.g. legal); re: government more specifically, immigration, customs, emergency response, welfare; re: teleconferencing more specifically, B2B (e.g. internal corporate communications), B2C (e.g. customer service in retail outlets, call centers).

What do you think? Will MT become the new Lingua Franca, or will English grow as a world language and become the new Lingua Franca? Nicholas Ostler thinks that English will play a diminishing role and that linguistic diversity will be the new reality. What do you think?

Alex: I believe the world will converge on a smaller set of Linguae Francae (English, Spanish, Mandarin Chinese, ...) that serve as backup languages between speakers of a region or cultural group. MT will serve to bridge between them. Regional languages will also converge, from 6,000 to perhaps ~500 that remain as stable languages.

Mark: Opposing forces will probably seek balances as time goes on. On one hand, the advantages of a single Lingua Franca for interaction unmediated by human or artificial interpreters will remain in force, and English has achieved an insurmountable lead. To add momentum, multimedia, networked, high-tech instruction (including speech translation) will make learning English progressively easier. On the other hand, the drags imposed by nationalism, educational gaps, personal preferences, tradition, literary and practical backlogs, etc. will also persist, while progress in speech and text translation will diminish the requirement for foreign language mastery, at least for straightforward interactions. So I'd expect that English will remain as the Lingua Franca for many decades, still a prerequisite for career advancement, international socializing, and prestige, but that its momentum will be braked by opposing forces. Just as skydivers (like Alex!) reach terminal velocity when acceleration due to gravity is balanced by wind drag, I think the juggernaut of English will reach temporary maxima when the forces for and against its spread reach a balance for a given state of the translation art.

Alex: I still think we’ll see two or three or four Linguae Francae: English, Spanish, Mandarin, and maybe Russian; and then there will be stable local languages. I don’t think people will stop speaking German, French, etc. But the current 6,000 or so languages will no doubt reduce to 500 or so. And technology will fill the remaining gaps. I doubt that every English speaker visiting China will take the trouble to learn Chinese. Nor will every taxi driver in Hong Kong learn Spanish etc.; but he will learn Mandarin, as a bridge to the other languages, e.g. for tourists. And maybe there will even be Cantonese to Spanish machine translation. The technology will complement all of the surviving languages.

Any thoughts that you want to share in reflection on my talk about the Human Language Project?

Alex: Very cool! We need more such dialogs.

Mark: Amen! Only I suspect that commercial interest and Wikipedia-style open-source volunteerism will be more significant in furthering the project(s) than government funding. I’d also like your audience and the TAUS community to be aware of, and make common cause with, the PanLex project at the Long Now Foundation, headed by Jonathan Pool and colleagues.

Alex: With respect to language preservation efforts, they’re great for historical reasons; but, practically speaking, if a community no longer wants to speak its language and has no cultural impetus, then you can’t force or alter the course of language evolution. Language is fundamentally a communication tool.

Can you tell us about MT and Speech translation at Facebook?

Alex: Nope ... other than: we are working on it. ;-) (Anyway, not yet.)

Mark: Awaiting news with bated breath. Whatever form the work takes at Facebook, the company's interest in speech translation is one more indication of the awakening of the giants. Natural language processing will be ever more important to their bottom lines, and signs of awareness are accumulating.

Alexander Waibel
Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh, and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT), a joint center at eight international research institutions worldwide. The Center develops multimodal and multilingual human communication technologies that attempt to improve human-human and human-machine communication. Prof. Waibel's team developed and demonstrated the first speech translation systems in Europe and the USA in 1990/1991 (ICASSP'91), the first simultaneous lecture translation system in 2005, and Jibbigo, the first commercially available speech translator product on a phone, in 2009 (www.jibbigo.com). Dr. Waibel was one of the founders and chairmen of C-STAR, the Consortium for Speech Translation Research, in 1991. Since then he has directed and coordinated many research programs in the field in the US, Europe and Asia. He currently serves as director of EU-Bridge, a large-scale multi-site European research initiative aimed at developing speech translation services in Europe, and of several US programs aimed at improving language portability and performance.

Mark Seligman
Dr. Mark Seligman is founder, president, and CEO of Spoken Translation, Inc., established in 2002. His background is a unique mixture of ivory tower and school of hard knocks. He is both an established researcher (with a PhD in computational linguistics from UC Berkeley, granted in 1991) and a Silicon Valley veteran (with technical and managerial participation in four high-tech start-ups under his belt).

Call for the Human Language Project

Global communication is becoming a matter of data and technology. Data, in this context, means collections of text and speech corpora; technology means translation automation technology. Organizations – both businesses and governments – that do not have access to data and technology risk being left out of global communication. The future of English, or of any other language, as a lingua franca is most uncertain. Linguistic diversity on the internet keeps rising.

Around the world some 7,000 languages are being spoken, four hundred of which have more than one million speakers. The vast majority of businesses and public institutions in the world ‘speak’ only one or two languages. Businesses – large and small – need to communicate in the languages of their customers. The translation services industry is struggling to keep up with the demand for capacity and speed. Translation in ten or more languages is a luxury that only the largest enterprises and rich NGOs can afford.

TAUS calls for the Human Language Project. The Human Language Project is intended to be a global collaboration between business, government, academia and individuals with the goal of making language data and technology accessible to all stakeholders in the world. The Human Language Project will be instrumental and crucial for:

• Economic growth of nations and communities with 'smaller' languages.
• Preserving the cultural heritage of lesser-resourced language communities.
• International trade and growth of the world economy.
• Supporting many UN and NGO programs and institutions in securing and protecting health, peace and welfare around the world.
• Growing the global translation industry.

TAUS is only one of many language and translation industry organizations in the world concerned about the future of the translation profession and industry. We invite other industry organizations, governments, NGOs and private businesses and enterprises to join in this ambitious plan to form a global organization. We envision the creation of the Global Language Organization (GLO) as a not-for-profit organization that unites the interests of all stakeholders. GLO will undertake all necessary activities to stimulate access to, and the creation and sharing of, language data and technology as a shared commons. These activities will include, among others:

Page 37: TAUS Review - No.1 - October 2014

37

• Guidance on language policies for public and private sectors
• Interoperability of language data and technology
• Copyright and legal guidance on language data

GLO will have a global span through a network of ‘ambassadors’ in most language communities and countries.

We invite all interested parties to join a brainstorming discussion about this Call for the Human Language Project at the TAUS Annual Conference on October 27-28 in Vancouver, where a timeline and an initial plan of action will be plotted.


Life of a translator who developed his own tools, by Nicolas Gregoire

Something has to give

Linguists are being hampered by lack of innovation and crippling business models. After years of utter frustration, I decided to act.

When I was a translator, I remember having to pay €750 for an SDL Studio license. It was buggy, slow and incredibly hard to use. It had a ton of "advanced" functions, but I couldn't do the simple stuff I needed to do. The language communities I used were poorly designed, subscriptions were expensive, and it was almost impossible to get jobs at European rates because the market was pushing prices down, sometimes completely disregarding quality.

Collection agencies would not take me on as a client because I was not big enough. And as the years went by and rates were pushed down, my hourly income was in free fall. I was spending more and more time working, instead of being with my family and doing the things I love. When I wasn't translating, I was taking care of accounting, sending invoices, looking for clients, changing diapers… My home, from which I loved to work, was turning into a sweatshop. I remember having to work forty hours straight, ingesting massive amounts of caffeine, to deliver an impromptu order. When I finally went to bed, my heart started racing so erratically that I had to call an ambulance. Then it struck me: I was putting everything aside, even damaging my health, just to pay the rent. Something had to give.

I looked everywhere for a solution to my problems, to no avail. Then I realized most translators were in the same position. This is why I founded amigoCAT. Alone and mostly devoid of technical skills, I drew mockups and conceptualized the essential features. Then I looked for financing. It took me a year to raise money and put together our dream team. Then we did in one year, with five developers, what other CAT tool publishers had done in three, with two or three times our staff. It was not easy to fine-tune the balance between simplicity of use and functionality.

We quickly decided against implementing the advanced functions provided by the big publishers. Because 98% of translators never use them anyway. Because most of these features were created as a crutch, to fix broken, monstrous software. And most importantly, because selling new versions, with advanced functions nobody needs, cosmetic adjustments and bug fixes, is how traditional publishers make their money. That's why SDL protects its file formats. That's why its backward compatibility is poor. SDL wants translators to keep using its tools, even if they don't like them, and it wants them to upgrade when a client, a provider or a colleague does.

Existing publishers do not understand that translators need more than a CAT tool. Language professionals want a tool that makes their lives better. To automate tedious administrative tasks. To protect translators. And most importantly, to help them find jobs, make more money and get their personal lives back. To migrate to the cloud, to the browser. This should have been done by traditional software publishers a long time ago. Why was it put aside?


Translators should not have to worry about sending files and invoices, getting paid or managing their schedule. They should be able to find colleagues and clients very easily. Or better yet, clients should find them. They should be able to email every client they haven't heard from in the last six months, to calculate their average order value, their conversion rate. They should be able to calculate how much money will come in next month. Nobody programs this for them, because complicated is good. In this industry, complicated sells. In a market that portrays itself as led by innovation and constant improvement, linguists cannot easily find answers to simple questions like "am I being productive this week?" or "could I improve something?". Translators don't just deserve to match language service providers' productivity gains; they need to. Modern translation is killing its children: a whole generation is plagued by economic insecurity and low wages. This constant fee decrease is pushing some of the best people out of the industry, towards better-paying jobs.

Translation is an ecosystem, not an archipelago

Another fundamental mistake in CAT tool development is to ignore LSP workflows and constraints. Let's imagine the following scenario: an LSP has a project for translation. They upload their files or statistics to a web application. The application generates a list of the best-suited and available translators for the job. After matching client and translator, the application automatically creates the project, uploads the files and the TMs, and sets the deadline. When the translator is ready to deliver, the project is closed, and an invoice is automatically generated and sent with the files and a personalized message. If the client has already been created in the system, the application knows their payment terms, their contact person, address, company number, etc. When payment is due, the application notifies the client and directs them to a payment platform. If the client refuses to pay, the translator can open a dispute in the system and send the client to collection, anywhere in the world. LSPs and translators can rate each other. During and after the project, translators can generate simple and powerful analytical graphs of their performance. They know if they're late, if they're early, how many hours they need to finish. Translators can also generate analytical accounting reports that can be shared.
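Purely as an illustration (the names, fields and states here are hypothetical, not amigoCAT's actual data model), the scenario above boils down to a few project states and automated steps:

```python
# Illustrative sketch only: a handful of project states and automated steps
# corresponding to the workflow described above.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Status(Enum):
    CREATED = auto()
    ASSIGNED = auto()
    DELIVERED = auto()
    INVOICED = auto()
    PAID = auto()
    IN_DISPUTE = auto()

@dataclass
class Project:
    client: str
    word_count: int
    deadline: str
    rate_per_word: float
    status: Status = Status.CREATED
    translator: Optional[str] = None

    def assign(self, translator: str) -> None:
        """Match the job to an available translator."""
        self.translator = translator
        self.status = Status.ASSIGNED

    def deliver(self) -> dict:
        """Closing the project automatically generates the invoice."""
        self.status = Status.DELIVERED
        return self.invoice()

    def invoice(self) -> dict:
        self.status = Status.INVOICED
        return {"client": self.client,
                "amount": round(self.word_count * self.rate_per_word, 2),
                "due": self.deadline}

# Example: a 2,000-word job at EUR 0.10 per word yields a EUR 200.00 invoice on delivery.
job = Project(client="Example LSP", word_count=2000, deadline="2014-11-15", rate_per_word=0.10)
job.assign("translator_42")
print(job.deliver())
```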

Now is the time to create new interfaces that increase ease of use, boost productivity and lower physical strain; that simplify and accelerate terminology lookup; that share massive translation memories; that integrate machine translation flawlessly; and that provide a real-time collaboration system that works. Translators want more time, more money. And we want amigoCAT to have a meaningful impact on their lives. Every week, the average translator spends a day looking for jobs, half a day invoicing and doing other administrative tasks, and at least half a day looking for terminology with the infamous "double click, copy, go to browser, paste, click on search, copy, go back to CAT tool, click in active segment, paste". When someone finally decides to listen to these very basic needs, CAT tool users will enjoy a 20% productivity boost. That's a day and a half gained, every week. They will spend their free time working more, enjoying time with their family, or simply getting their lives back. This is what gets me out of bed in the morning. This is what the industry needs, and this is what amigoCAT aims to achieve, with our limited capabilities. We are not big, we are not rich, but that's probably for the best. Nobody needs more of the same.

Nicolas Gregoire

After working in marketing and PR, Nicolas started translating in 2008. Freshly divorced, he needed to spend a lot of time at home to take care of his two year old daughter who was suffering from osteochondritis, a degenerative bone disease. It took him 2-3 years to develop a clientele but even after that, Nicolas was making so little money that his home turned into a sweatshop.

As a freelance translator, Nicolas worked with CAT tools like SDL Trados. The usability frustrated him. The simple stuff he wanted to do was not available. Then one day, while in the shower, it struck him. He leaped out and typed frantically. He showed his project to numerous translators: the feedback was great, and amigoCAT was born.

amigoCAT is a free, online CAT tool made for professionals, with simplicity in mind. The platform also automates repetitive and administrative tasks, matches LSPs with translators, pushes jobs to translators in real time and helps language professionals get the most out of their time. amigoCAT’s goal is a 20% productivity boost for translators. What would you do with all that free time?



Contributors

Reviewers

Amlaku Eshetie
Amlaku earned a BA degree in Foreign Languages & Literature (English & French) in 1997, and an MA in Teaching English as a Foreign Language (TEFL) in 2005, both at Addis Ababa University, Ethiopia. He was a teacher of English at various levels until he switched to translation and localisation in 2009. Currently, Amlaku is the founder and manager of KHAABBA International Training and Language Services, through which he has built a large base of clients for services such as localisation, translation, editing & proofreading, interpretation, voiceovers and copywriting.

Andrew Joscelyne
Andrew Joscelyne has been reporting on language technology in Europe for well over 20 years. He has also been a market watcher for European Commission support programs devoted to mapping language technology progress and needs. Andrew has been especially interested in the changing translation industry, and began working with TAUS from its beginnings as part of the communication team. Today he sees language technologies (and languages themselves) as a collection of silos – translation, spoken interaction, text analytics, semantics, NLP and so on. Tomorrow, these will converge and interpenetrate, releasing new energies and possibilities for human communication.

Mike Tian-Jian Jiang
Mike was the core developer of GOING (Natural Input Method, http://iasl.iis.sinica.edu.tw/goingime.htm), one of the most famous intelligent Chinese phonetic input method products. He was also one of the core committers of OpenVanilla, one of the most active text input method and processing platforms. He has over 12, 10, and 8 years of experience with C++, Java, and C#, respectively, and is also familiar with Lucene and Lemur/Indri. His most important skill set is natural language processing, especially Chinese word segmentation based on pattern generation/matching, n-gram statistical language modeling with SRILM, and conditional random fields with CRF++ or Wapiti. Specialties: natural language processing, especially pattern analysis and statistical language modeling; information retrieval, especially tuning Lucene and Lemur/Indri; text entry (input methods).

Brian McConnell
An inventor, author and entrepreneur, Brian has founded four technology companies since moving to California in the mid 1990s. His current company, Worldwide Lexicon, focuses on translation and localization technology. In September 2012, his company launched xlatn.com, an online buyers guide and consultancy for translation and localization technology and services. In March 2013, they launched www.dermundo.com, a multilingual link sharing service that enables users to curate and share interesting content across language barriers. Specialties: telecommunications system and software design with emphasis on IVR, wireless and multi-modal communications; translation and localization technology.

Perspectives

Nicholas Ostler
Nicholas Ostler is the author of three books on language history: Empires of the Word (2005), Ad Infinitum (on Latin, 2007), and The Last Lingua Franca (2010). He is also Chairman of the Foundation for Endangered Languages, a global charitable organization registered in England and Wales. A research associate at the School of Oriental and African Studies, University of London, he has also been a visiting professor at Hitotsubashi University in Tokyo and L.N. Gumilev University in Astana, Kazakhstan. He holds an M.A. from Oxford University in Latin, Greek, philosophy and economics, and a 1979 Ph.D. in linguistics from M.I.T. He is an academician of the Russian Academy of Linguistics.

Lane Greene
Lane Greene is a business and finance correspondent for The Economist based in Berlin, and he also writes frequently about language for the newspaper and online. His book on the politics of language around the world, You Are What You Speak, was published by Random House in spring 2011. He contributed a chapter on culture to the Economist book Megachange, and his writing has also appeared in many other publications. He is an outside advisor to Freedom House, and from 2005 to 2009 was an adjunct assistant professor in the Center for Global Affairs at New York University.

Jost Zetzsche
Jost Zetzsche is a certified English-to-German technical translator, a translation technology consultant, and a widely published author on various aspects of translation. Originally from Hamburg, Germany, he earned a Ph.D. in the field of Chinese translation history and linguistics. His computer guide for translators, A Translator's Tool Box for the 21st Century, is now in its eleventh edition, and his technical newsletter for translators goes out to more than 10,000 translation professionals. In 2012, Penguin published his co-authored Found in Translation, a book about translation and interpretation for the general public. His Twitter handle is @jeromobot.

Luigi Muzii
Luigi Muzii has been working in the language industry for more than 30 years as a translator, localizer, technical writer, author, trainer, university teacher of terminology and localization, and consultant. He has authored books on technical writing and translation quality systems, and is a regular speaker at conferences.

Directory of Distributors

Safaba Translation Solutions, Inc.
A technology leader providing automated translation solutions that deliver superior quality and simplify the path to global presence unlike any other solution.

Welocalize
Welocalize offers innovative translation & localization solutions helping global brands grow & reach audiences around the world.

Lingo24
Lingo24 delivers a range of professional language services, using technologies to help our clients & linguists work more effectively.

Appen
Appen is an award-winning, global leader in language, search and social technology. Appen helps leading technology companies expand into new global markets.

Lionbridge
Lionbridge is the largest translation company and #1 localization provider in marketing services in the world, ensuring global success for over 800 leading brands.

SYSTRAN
SYSTRAN is the historic market provider of language translation software solutions for global corporations, public agencies and LSPs.

Moravia
Flexible thinking. Reliable delivery. Under this motto, Moravia delivers multilingual language services for the world's brand leaders.

Cloudwords
Cloudwords accelerates content globalization at scale, dramatically reducing the cost, complexity and turnaround time required for localization.


Industry Agenda

Upcoming TAUS Events

TAUS Moses Industry Roundtable
Vancouver, BC (Canada)
26 October 2014

TAUS Annual Conference 2014
Vancouver, BC (Canada)
27 & 28 October 2014

TAUS MT Showcase
Vancouver, BC (Canada)
29 October 2014

TAUS Quality Evaluation Summit
Vancouver, BC (Canada)
29 October 2014

Upcoming TAUS Webinars

TAUS Translation Technology Showcase

Conyac and Unbabel
1 October 2014

Interverbum and CSOFT
5 November 2014

Iconic Translation Machines
3 December 2014

TAUS Post-Editing Webinar
Turkish Language
11 November 2014

TAUS Translation Quality Webinar
Integrating QE in CAT Tools and the DQF API
25 November 2014

TAUS Speaks at

AMTA
Vancouver, BC (Canada)
22 - 24 October 2014

Tekom 2014
Stuttgart (Germany)
11 - 13 November 2014

Translating and the Computer (ASLIB)
London (United Kingdom)
27 & 28 November 2014

Industry Events

Brand2Global
London (United Kingdom)
1 & 2 October 2014

Localization World
Vancouver, BC (Canada)
30 & 31 October 2014

Do you want to have your event listed here? Write to [email protected] for information.
