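Role of Machine Translation for Multilingual Social Media

Technical Trends

Hardik A Gohel, Assistant Professor, AITS, Rajkot and active Member of CSI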



CSI Communications | March 2015 | 13

Introduction to Machine Translation

Machine Translation (MT) is translation carried out by a computer. It is a subfield of computational linguistics that studies the use of software to translate text or speech from one natural language to another. The translation is performed entirely by the computer, with no human involvement. The technique dates from the 1950s and is also known as automated, automatic or instant translation.

In fact, the concept of machine translation can be traced back to the 17th century, when Rene Descartes proposed a "Universal Language" in which equivalent ideas in different tongues would share the same symbol. Machine translation first became a field of research in 1950, and the first public demonstration, by a Georgetown University MT research team working with IBM, took place in 1954.

Terminologies

It is necessary not only to retrieve target-language terms automatically but also to place them correctly in the resulting document. This is only possible when the right terminology has been supplied to the machine translation system. Now let us see how machine translation works. Basically, there are two types of machine translation: rule-based machine translation (RBMT) and statistical machine translation (SMT).

A rule-based machine translation system uses a combination of language and grammar rules together with dictionaries for common words. It is also known as knowledge-based machine translation. Specialized dictionaries can be created that focus on particular industries or disciplines. Such systems typically deliver reliable translations with accurate terminology, provided they have been properly trained with these specialized dictionaries.
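To make the rule-based idea concrete, here is a minimal sketch of a toy English-to-Spanish translator that combines a dictionary lookup with a single grammar rule. The tiny `LEXICON`, its part-of-speech tags and the one reordering rule (English adjective-noun becomes Spanish noun-adjective) are illustrative assumptions, not taken from any real RBMT product, which would use far larger dictionaries and grammars.

```python
# Toy rule-based MT sketch (hypothetical lexicon and rule, for illustration only).

# Bilingual dictionary: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "is": ("es", "VERB"),
    "fast": ("rápido", "ADJ"),
}


def translate_rbmt(sentence: str) -> str:
    """Translate word by word, then apply one grammar (reordering) rule."""
    # Step 1: dictionary lookup; unknown words pass through with an "UNK" tag.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: reordering rule: ADJ followed by NOUN becomes NOUN ADJ.
    out = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN"):
            out.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)


print(translate_rbmt("The red car is fast"))  # el coche rojo es rápido
```

The rule only fires on word pairs the dictionary knows, which illustrates why RBMT output is predictable but only as good as the coverage of its dictionaries: `translate_rbmt("the blue car")` leaves "blue" untranslated and unreordered.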

The other type is the statistical machine translation system. It has no knowledge of language rules; instead, it learns by analysing large amounts of data for each language pair. It can be trained for a specialized industry sector or discipline using additional data relevant to that sector. Typically, statistical systems deliver more fluent-sounding but less reliable translations.
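As an illustration of "learning from data without rules", the sketch below implements IBM Model 1, one of the earliest statistical translation models; it is shown as a textbook example, not as a description of how any particular product works. Using expectation-maximisation, it estimates word-translation probabilities t(f|e) purely from sentence-aligned pairs. The three-pair English-German corpus is the usual invented toy example; the NULL word is omitted for brevity.

```python
from collections import defaultdict

# Invented toy parallel corpus: (English, German) sentence pairs.
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate t[(f, e)] = P(foreign word f | English word e) with EM."""
    pairs = [(e.split(), f.split()) for e, f in corpus]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Uniform initialisation over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among the English words of its sentence.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


t = train_ibm_model1(CORPUS)
for e in ["house", "book"]:
    best = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best)  # prints "house -> haus", then "book -> buch"
```

No rule says that "haus" means "house"; the model infers it because "das" is already explained by "the" in the other sentence pairs. Real systems apply the same principle to millions of sentence pairs.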

Both statistical and rule-based translation work well for languages such as French and Spanish, whereas statistical translation in particular is suited to minority languages. Rule-based translation can perform better on languages including Korean, Japanese, Russian and German.

The differences between statistical machine translation and rule-based machine translation are given below:

Statistical Based Machine Translation (SMT) | Rule Based Machine Translation (RBMT)
Well suited to user-generated content and broad-domain material such as patents | Well suited to records and even software
May translate software tags | Protects software tags
Better suited to on-the-fly translation of short-shelf-life content | Good for later editing and for changes during translation
Uses the most likely terms, which are not necessarily the ones an individual prefers | Retains modifications to terms and applies correct grammar
Not predictable | Predictable
Longer update cycles (once or twice a year) | Faster to update (can be daily)
Can be free or open source | Expensive to license
Very heavy on processing resources | Very heavy on linguistic resources
Produces more fluent sentences | Produces less fluent sentences
Handles poor grammar but does not improve much with controlled authoring | Does significantly better when controlled authoring is in place
Supports over 50 languages out of the box (e.g. Google Translate and Bing Translator) | Supports 20 target languages out of the box

Table 1: Difference between SMT & RBMT

Fig. 1: People from Different Regions Communicating with the Help of Machine Translation

The best way to understand machine translation is to analyse how Google Translate works. It does not rest on the intellectual assumptions of early machine translation efforts. Nor is it an algorithm intended to extract the meaning of an expression from its syntax and vocabulary; in fact, it does not deal with meaning at all.

Instead of treating a linguistic expression as something to be decoded, Google Translate treats it as something that has probably been said before. It uses vast computing power to search the internet, in the blink of an eye, for the expression in some text that exists alongside its translation. The corpus it scans includes all the documents published by the European Union in 24 languages, everything the United Nations and its agencies have ever produced in writing in their six official languages, and a large quantity of other material, from the records of international tribunals to company reports, as well as all the books and bilingual articles put on the web by individuals, booksellers, libraries, authors and academic departments. Drawing on the patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what has been submitted to it. Most of the time, it works. It is quite spectacular and is largely responsible for the new mood of optimism about the prospects for "fully automated high-quality machine translation". Google Translate could not work without a very large pre-existing body of translations.
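The idea of reusing matches from paired documents can be illustrated with a maximum-likelihood "phrase table": count how often a source phrase was translated each way in a collection of aligned segments, then pick the translation that maximises P(target | source) = count(source, target) / count(source). The aligned pairs and function names below are invented for illustration; this is a simplification, not Google Translate's actual pipeline, which combines many such statistics.

```python
from collections import Counter, defaultdict

# Invented aligned phrase pairs, standing in for matches mined from paired documents.
ALIGNED_PAIRS = [
    ("thank you", "merci"),
    ("thank you", "merci"),
    ("thank you", "je vous remercie"),
    ("good morning", "bonjour"),
]


def build_phrase_table(pairs):
    """Map each source phrase to a Counter of its observed translations."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_probable(table, source):
    """Return (translation, P(translation | source)), or None if never seen."""
    if source not in table:
        return None  # no earlier human translation to draw on
    counts = table[source]
    target, n = counts.most_common(1)[0]
    return target, n / sum(counts.values())


table = build_phrase_table(ALIGNED_PAIRS)
print(most_probable(table, "thank you"))   # ('merci', 0.6666666666666666)
print(most_probable(table, "good night"))  # None
```

The `None` case mirrors the point made in the text: without a large body of existing human translations, this kind of system has nothing to draw on.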

The system is built upon millions of hours of work by the human translators who produced the texts that Google Translate searches. At present, Google offers two-way machine translation among 58 languages, that is, 3,306 separate translation services, more than have ever existed in all of human history.
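The figure of 3,306 services can be checked by counting ordered language pairs: with n languages, each can be translated into each of the other n - 1, and because translation is offered in both directions, each direction counts as a separate service.

$$
N_{\text{services}} = n(n-1), \qquad n = 58 \;\Rightarrow\; 58 \times 57 = 3306
$$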

With the help of machine translation, Google Translate also provides voice recognition for Hindi and seven other Indian languages. The latest version of Google Translate supports Hindi, Gujarati, Bengali, Marathi, Punjabi, Kannada, Tamil and Telugu, covering the major languages of India. Recently, Google introduced advertisements in Hindi on its network, as there are more than 500 million Hindi speakers worldwide.


Multilingualism, Social Media and Machine Translation

Columbia Business School Centre conducted a research study on global brand leadership, in which it examined companies' recent marketing tools and found that social network accounts are the preferred tool of 85% of corporations. These include brand accounts on Facebook, Twitter, Google+, Foursquare and others. The problem, however, is that companies want to market in the native languages of their business markets rather than in English only. Since the number of non-English-speaking users is rising day by day, it is necessary to communicate with them in their native languages. According to Search Engine Journal, establishing a worldwide presence across all social media platforms will help boost your brand awareness. Ideally, an organization's presence on social media will serve as a portal to its website. Social media thus helps companies achieve their goal of marketing across the globe.

By analysing social media statistics further, we found more than 6,000 multilingual posts. In these posts, the languages of one or more comments, or of the thread starter, differed, apparently indicating that people were able to communicate with others in other languages through machine translation. The following statistics describe these multilingual comments.

Sr. No. | Number of Languages | Percentage
1 | Two languages in comments on thread | 85%
2 | Three languages in comments on thread | 15%
3 | Four or more languages in comments on thread | 3%

Table 2: Number of Languages in Comments on Social Media Threads

Moreover, most multilingual posts involved English together with another language, with English-Portuguese and English-Spanish being the most frequent combinations among bilingual threads (Table 3).

The above study concerns multilingualism worldwide. Now let us look at the many languages of incredible India and their connection to social media through machine translation. India is culturally diverse, and many languages are spoken by its more than 1.2 billion people. It is true that 200 million Indians are able to understand English, but according to 2001 records, half a billion Indians reported Hindi as their mother tongue. Furthermore, in rural India, 43% of citizens said they would readily adopt social media if content were available in their local languages. Table 4 shows the positions of Indian languages among the top 10 most commonly spoken languages worldwide, according to figures from Accredited Language Services (ALS).

Fig. 2: Multilingualism of India with Machine Translation


Twitter is a popular social network, about which there is a saying: the world may like to tweet, but the Japanese love to! The problem is that the Japanese tweet in their native language. If a multinational company wants to market its product on Twitter in Japan, it must tweet in Japanese to gain the maximum number of followers. But if the company tweets only in Japanese, people in other countries will not understand it. The solution, therefore, is to create, and keep updated, multiple Twitter accounts.

Another popular social network is Facebook. Companies using Facebook for global marketing have had to create separate pages, as with Twitter. In 2012, however, Facebook provided a new tool to streamline global page creation for companies. With this tool, an organization can set up localized versions of its cover photos, Page apps, profile photos, news feed stories and About information. For example, the English version might say “Hello” to welcome visitors, whereas users visiting from Spanish-speaking countries would see “Hola”. In short, globally available pages allow corporations to create a distinct brand identity.
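The localized-page behaviour described above amounts to choosing content by the visitor's locale, with a default when no localized version exists. The sketch below assumes a hypothetical `LOCALIZED_GREETINGS` table keyed by language code and a hypothetical `greeting_for` helper; it is not Facebook's API, only an illustration of the fallback logic.

```python
# Hypothetical localized versions of one page element, keyed by language code.
LOCALIZED_GREETINGS = {
    "en": "Hello",
    "es": "Hola",
}
DEFAULT_LANGUAGE = "en"


def greeting_for(locale: str) -> str:
    """Pick the greeting for a locale such as 'es_MX', falling back to English."""
    # Normalise 'en-GB' / 'es_MX' style locales to a bare language code.
    language = locale.replace("-", "_").split("_")[0].lower()
    return LOCALIZED_GREETINGS.get(language, LOCALIZED_GREETINGS[DEFAULT_LANGUAGE])


print(greeting_for("es_MX"))  # Hola
print(greeting_for("en-GB"))  # Hello
print(greeting_for("gu_IN"))  # Hello (no Gujarati version defined)
```

Falling back to a default keeps every visitor served, while each added entry in the table extends the page to one more market.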

Facebook currently supports 13 Indian languages and is committed to enabling all major Indic languages and actively promoting them on various platforms. Mark Zuckerberg, co-founder and chairman of Facebook, recently met PM Modi to discuss his plan to develop Facebook in more Indian languages by applying advanced machine translation.

The growth rate of local-language usage is estimated to be more than four times that of English. - Google

For the last three years, social media platforms have been rolling out Machine Translation (MT) in the hope of facilitating multilingual interactions. People who know one another well and share a common language can already interact through social media. But what about people who have common interests but not a common language? As discussed above, companies are also working to create a distinct brand identity through multilingual social media.

Facebook was the first social network to offer a multilingual facility using machine translation; Google+ and Twitter started providing it later. The machine translation tool Facebook launched is known as the New In-Line translation tool. It automatically translates conversations and posts on Facebook pages. It differs from Google's machine translation tool: it uses services licensed from Microsoft and works on Facebook posts on an individual's profile as well as on pages. For example, if you speak and understand only English and find a comment in Gujarati, you will see a Translate button next to that comment, which opens a pop-up window with the English translation. Furthermore, Facebook offers a learning facility to improve accuracy: a user can enter a human translation in that pop-up window. If that translation receives enough positive votes from other users for its accuracy, it replaces the existing translation the next time the post is translated. Page administrators can manage all these translations by using the “Manage Translations” link beneath posts on the pages they manage.

Sr. No. | Language Pair | No. of Threads (Approx.)
1 | English & Portuguese | 2500
2 | English & Spanish | 1150
3 | English & French | 650
4 | English & Italian | 400
5 | English & Turkish | 300
6 | Catalan & English | 250
7 | German & English | 200
8 | English & Vietnamese | 150
9 | English & Japanese | 100
10 | English & Russian | 100

Table 3: Top 10 Social Media Threads by Multilingual Language Pair

Indian Language | Rank among Languages Spoken Worldwide
Hindi | 4th
Urdu | 5th
Bengali | 7th
Punjabi | 10th

Table 4: Rank of Indian Languages Spoken Worldwide

Fig. 3: Multilingualism & Social Media Interaction
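The vote-based replacement of machine translations by human suggestions can be sketched as follows. The class name, method names and the threshold of three net positive votes are all hypothetical; the source does not say how many votes Facebook requires or how it counts them.

```python
from dataclasses import dataclass, field


@dataclass
class PostTranslation:
    """Displayed translation of one post, plus community-suggested alternatives."""
    machine_text: str
    vote_threshold: int = 3  # hypothetical number of net positive votes required
    current_text: str = ""
    suggestions: dict = field(default_factory=dict)  # suggested text -> net votes

    def __post_init__(self):
        # Initially the machine translation is the one shown.
        self.current_text = self.machine_text

    def suggest(self, human_text: str) -> None:
        """Register a human translation entered in the pop-up window."""
        self.suggestions.setdefault(human_text, 0)

    def vote(self, human_text: str, positive: bool) -> None:
        """Record a vote; expects human_text to have been suggested first."""
        self.suggestions[human_text] += 1 if positive else -1
        # Once a suggestion has enough net positive votes, it replaces the
        # existing translation for future views of the post.
        if self.suggestions[human_text] >= self.vote_threshold:
            self.current_text = human_text


post = PostTranslation(machine_text="Good morning, friend of mine")
post.suggest("Good morning, my friend")
for _ in range(3):
    post.vote("Good morning, my friend", positive=True)
print(post.current_text)  # Good morning, my friend
```

Counting net votes (positive minus negative) is one simple design choice; a real system might also weight votes by the voter's language skills or require a minimum number of voters.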

Twitter is the second-largest social network in the world, and only 50% of tweets are in English; the rest are in various other languages. Since Twitter allows only 140 characters, it is not difficult to translate these few characters, and the likelihood of anyone ordering a human translation of a tweet is also very low. Machine translation would therefore be users' first choice. Machine translation for Twitter can be considered a domain adaptation problem, as there is no large bilingual corpus of tweets. Domain adaptation is considered significant because the performance of a statistical machine translation system degrades when it is faced with tasks from a different domain. However, most work has looked at adaptation to domains that are similar to the training data. Twitter therefore calls for domain adaptation research and experiments, which are feasible because huge amounts of monolingual in-domain data are freely available through its streaming application programming interface (API).
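One common way to exploit monolingual in-domain data, offered here as an illustration rather than as what Twitter or any specific system does, is to interpolate an in-domain language model with a general one: P(w) = λ·P_in(w) + (1 - λ)·P_out(w). The unigram models, the toy sentences and λ = 0.7 below are assumptions chosen for the example; real systems use much richer n-gram or neural models.

```python
from collections import Counter


def unigram_model(texts):
    """Maximum-likelihood unigram probabilities from a list of sentences."""
    counts = Counter(w for text in texts for w in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_in, p_out, lam):
    """Linear interpolation: lam * P_in(w) + (1 - lam) * P_out(w)."""
    vocab = set(p_in) | set(p_out)
    return {w: lam * p_in.get(w, 0.0) + (1 - lam) * p_out.get(w, 0.0) for w in vocab}


# Invented examples: general (out-of-domain) text vs. tweet-like in-domain text.
general = ["the committee approved the report", "the report was published"]
tweets = ["lol the match was great", "great match tonight lol"]

p_out = unigram_model(general)
p_in = unigram_model(tweets)
p_mix = interpolate(p_in, p_out, lam=0.7)

# "lol" never appears in the general text but gets probability from in-domain data.
print(round(p_out.get("lol", 0.0), 3), round(p_mix["lol"], 3))  # 0.0 0.156
```

The weight λ controls how strongly the model leans towards tweet-style language; in practice it is tuned on held-out in-domain data.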

Accomplishments of Machine Translation in Social Media

• Quick translations are possible using the in-line translation available in multilingual social media.

• It is free for all social media users who share the same interests but speak different languages.

• The most significant accomplishment of machine translation in multilingual social media is that it is supported by, and can be used from, all internet browsers.

• Using machine translation, individuals can achieve global communication through social media.

• At present, multilingual social media supports a large number of languages, so people have a wide variety of languages to choose from for translation.

• It offers links for suggesting human translations if needed, and once a suggestion receives enough votes, it replaces the existing translation.

• It is very useful for multinational companies for better branding of their products in local markets.

Challenges with Machine Translation in Social Media

• To date, the machine translation used in social media is not 100% accurate, so further effort is required to make it more accurate.

• It is quite difficult to decide which translation is accurate and which is not, and this is a very big challenge.

• At present, if you want a human translation on social media, there is a charge for it.

• Each type of machine translation has its own drawbacks, and these also apply to multilingual social media.

• Some languages, such as Vietnamese and a few others, do not have enough content online from which a machine can learn to translate.

Conclusion

Multilingual social media enabled by machine translation is a very innovative idea for extending the use of social media, not only for social interaction but also for branding multinational products and services worldwide. Social media is one of the most significant ways to promote any product and to extend one's network, but it was limited by the English language, which can be understood by only 50% of people worldwide. So although multilingual social media with machine translation faces some challenges, it is the most important way to give a personal touch to communication.


About the Author

Hardik A Gohel, an academician and researcher, is an assistant professor at AITS, Rajkot and an active member of CSI. His research spans Artificial Intelligence and Intelligent Web Applications and Services. He also focuses on how to make Artificial Intelligence popular in the study of Computer Science, for various reasons. He has 28 publications in journals and in the proceedings of national and international conferences. He also works as a Research Consultant. He contributed cover stories to CSI Communications last year and a Technical Trends article last month. He can be reached at [email protected]
