CSI Communications | March 2015 | 13
Introduction to Machine Translation
Machine Translation (MT) is translation carried out by a computer. It is a subfield of computational linguistics that studies the use of software to translate text or speech from one natural language to another. The translation is performed by the computer, without human involvement. The technique dates from the 1950s and is also known as automated, automatic or instant translation.
In fact, the concept of machine translation can be traced back to the 17th century, when René Descartes proposed the idea of a “universal language” in which different tongues share the same symbols. It became a field of research only in the 1950s. The first public demonstration, by the Georgetown University MT research team with IBM, took place in 1954.
Terminologies
It is necessary not only to retrieve target-language terms automatically but also to place them correctly in the resulting document. This is possible only when the right terminology has been supplied to the machine translation system. Now let us see how machine translation works. Basically, there are two different types of machine translation system: rule-based machine translation and statistical machine translation.
A rule-based machine translation (RBMT) system uses a combination of language and grammar rules together with dictionaries of common words. It is also known as knowledge-based machine translation. Specialized dictionaries can be created for particular industries or disciplines. Systems of this type typically deliver consistent translations with accurate terminology, provided they have been properly trained with such specialized dictionaries.
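The rule-based approach described above can be sketched in a few lines of Python. This is only a toy illustration: the English-to-Spanish vocabulary, the single adjective-noun reordering rule and the names `GENERAL`, `SOFTWARE` and `translate` are assumptions made for the example, not part of any real RBMT product.

```python
# Toy rule-based translation: a general dictionary, an optional domain
# dictionary, and one grammar rule (adjective-noun reordering).
# All vocabulary and names here are illustrative assumptions.

ADJECTIVES = {"red", "new"}
NOUNS = {"car", "bug"}

GENERAL = {"the": "el", "red": "rojo", "new": "nuevo",
           "car": "coche", "bug": "insecto"}
SOFTWARE = {"bug": "error"}  # specialized dictionary for one discipline


def reorder_adjective_noun(words):
    """Grammar rule: English 'adjective noun' becomes 'noun adjective'."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES and out[i + 1] in NOUNS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def translate(sentence, domain_dictionary=None):
    """Apply the reordering rule, then look each word up in the lexicon."""
    # Domain entries override general ones, giving domain-accurate terms.
    lexicon = {**GENERAL, **(domain_dictionary or {})}
    words = reorder_adjective_noun(sentence.lower().split())
    # Unknown words are passed through unchanged.
    return " ".join(lexicon.get(word, word) for word in words)


print(translate("the red car"))
print(translate("the new bug", SOFTWARE))
```

Passing the specialized dictionary changes the term chosen for "bug" while the grammar rule stays the same, which is why such systems give predictable terminology once the dictionaries are in place.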
The other type is the statistical machine translation (SMT) system. It has no knowledge of language rules. Instead, it learns by analysing large amounts of data for each language pair. It can be trained for a specialized industry sector or discipline using additional data relevant to that sector. Typically, statistical systems deliver translations that sound more fluent but are less reliable.
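To make the idea of learning from paired data concrete, here is a minimal sketch of relative-frequency estimation over aligned phrase pairs. The tiny corpus and the function names `train` and `best_translation` are invented for illustration. Real statistical systems also need alignment, language models and a decoder, none of which are shown.

```python
from collections import Counter, defaultdict


def train(aligned_pairs):
    """Count how often each target phrase is paired with a source phrase."""
    counts = defaultdict(Counter)
    for source, target in aligned_pairs:
        counts[source][target] += 1
    return counts


def best_translation(counts, source):
    """Return the most frequent target phrase and its relative frequency."""
    candidates = counts.get(source)
    if not candidates:
        return None  # phrase never seen in the training data
    target, frequency = candidates.most_common(1)[0]
    return target, frequency / sum(candidates.values())


# Invented toy corpus of aligned English-Spanish phrases.
corpus = [
    ("thank you", "gracias"),
    ("thank you", "gracias"),
    ("thank you", "muchas gracias"),
    ("good morning", "buenos días"),
]

model = train(corpus)
print(best_translation(model, "thank you"))
print(best_translation(model, "good night"))
```

The model knows no grammar at all. It returns whatever was most common in its data, and it has nothing to offer for a phrase it has never seen, which is why the amount of training data per language pair matters so much.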
Statistical and rule-based translation are both well matched to languages such as French and Spanish, whereas statistical translation in particular is suited to minority languages. Rule-based translation can perform better on languages such as Korean, Japanese, Russian and German.
The differences between statistical machine translation and rule-based machine translation are given below:
The best way to understand machine translation is to analyse Google’s translation service. It does not rest on the intellectual assumptions of early machine translation efforts. Nor is it just an algorithm that has been
Role of Machine Translation for Multilingual Social Media
Technical Trends
Hardik A Gohel
Assistant Professor, AITS, Rajkot and active Member of CSI
Statistical Machine Translation (SMT) | Rule-Based Machine Translation (RBMT)
Better for user-generated content and broad-domain material such as patents. | Better for documentation and software.
May translate software tags. | Protects software tags.
More suitable for on-the-fly translation of content with a short shelf life. | Well suited to later editing and to changes during translation.
Uses the most likely terms, which are not necessarily the ones an individual would prefer. | Applies modifications to terms and uses the correct grammar.
Not predictable. | Predictable.
Longer update cycles (once or twice a year). | Faster to update (can be on a daily basis).
Can be free or open source. | Expensive to license.
Very heavy on processing resources. | Very heavy on linguistic resources.
Creates more fluent sentences. | Creates less fluent sentences.
Handles bad grammar well, but does not improve much with unnatural authoring. | Performs appreciably better when unnatural authoring is in place.
Supports over 50 languages out of the box, e.g. Google and Bing Translator. | Supports 20 targeted languages out of the box.
Table 1: Differences between SMT and RBMT
Fig. 1: People from Different Regions Communicating with the Help of Machine Translation
designed to extract the meaning of an expression from its syntax and vocabulary. In fact, it does not deal with meaning at all. Instead of treating a linguistic expression as something to be decoded, Google’s machine translation works on the principle that it has probably been said before. It uses huge computing power to search the internet in the blink of an eye, looking for the expression in some text that exists alongside its matching translation. The content it scans includes all the papers put out by the European Union in 24 languages, everything the United Nations and its agencies have ever written in their 6 official languages, and large quantities of other material, from the records of international tribunals to company reports and all the books and bilingual articles that have been put up on the web by individuals, booksellers, libraries, authors and academic departments. Drawing on the patterns that already exist in the matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what has been submitted to it. Most of the time, it works. It is quite spectacular, and largely responsible for the new mood of optimism about the prospects for “fully automated high-quality machine translation”. Google Translate could not work without a very large pre-existing body of translation. It is built upon the millions of hours of work by the human translators who produced the texts that Google Translate searches. At present, Google offers two-way translation, using machine translation, among 58 languages, that is, 3,306 separate translation services, more than have ever existed in all human history to date.
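The figure of 3,306 follows from counting ordered language pairs: each of the 58 languages can be translated into each of the other 57. A short sketch of that arithmetic, with a brute-force cross-check (the function name `translation_directions` is illustrative):

```python
from itertools import permutations


def translation_directions(language_count):
    """Number of ordered (source, target) pairs with source != target."""
    return language_count * (language_count - 1)


# Formula: 58 source languages, each with 57 possible targets.
print(translation_directions(58))  # 3306

# Cross-check by enumerating every ordered pair explicitly.
print(len(list(permutations(range(58), 2))))  # 3306
```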
Google Translate, with the help of machine translation, also provides voice recognition for Hindi and seven other Indian languages. The latest version of Google Translate supports Hindi, Gujarati, Bengali, Marathi, Punjabi, Kannada, Tamil and Telugu, covering the major languages of India. Google has recently introduced advertisements in Hindi on its network, as more than 500 million people speak Hindi worldwide.
Multilingualism, Social Media and Machine Translation
A centre at Columbia Business School conducted a research study on global brand leadership, which examined the marketing tools that companies currently use. Social network accounts turned out to be the most preferred tool, used by 85% of corporations. These include brand accounts on Facebook, Twitter, Google+, Foursquare and others. The problem is that companies want to market in the native languages of their business markets rather than in English only. Since the number of non-English-speaking users is rising day by day, it is necessary to communicate with them in their native languages. According to Search Engine Journal, establishing a worldwide presence across all social media platforms will help boost your brand awareness. Ideally, an organization’s presence on social media will serve as a portal to its website. Social media helps companies achieve their goal of marketing across the globe.
Analysing social media statistics further, we found more than 6,000 multilingual posts. In these threads the comments, or the thread starter, were in more than one language, apparently indicating that people are able to communicate with people who use other languages through machine translation. Table 2 gives the statistics for multilingual comments.
Moreover, most multilingual posts involved English and one other language, with English-Spanish and English-Portuguese being the most frequent combinations among bilingual threads (Table 3).
The above study concerns multilingualism worldwide. Now let us look at the many languages of India and their connection to social media through machine translation. India is culturally diverse, with many languages spoken by the more than 1.2 billion people who live in the country. It is true that 200 million Indians can understand English, but according to the 2001 records, half a billion Indians identified Hindi as their mother tongue. Furthermore, in rural India, 43% of citizens said that they would readily adopt social media if content were available in their local languages. With
Sr. No. | Number of Languages | Percentage
1 | Two languages in comments on thread | 85%
2 | Three languages in comments on thread | 15%
3 | Four+ languages in comments on thread | 3%
Table 2: Number of Languages in Comments on Social Media Threads
Fig. 2: Multilingualism of India with Machine Translation
the figures given by Accredited Language Services (ALS) for the top 10 most commonly spoken languages worldwide, the positions of the Indian languages are as follows (Table 4).
A popular social media platform is Twitter, of which it is said that the world may like to tweet, but the Japanese love to! The problem is that the Japanese tweet in their native language. If a multinational company wants to market its products through Twitter in Japan, it is mandatory to tweet in Japanese to gain the maximum number of followers. But if the company tweets only in Japanese, other nations will not understand it. So the solution is to create, and keep updated, multiple Twitter accounts.
Another popular social media platform is Facebook. Companies using Facebook for global marketing had to create separate pages, much as on Twitter. But in 2012 Facebook provided a new tool to streamline global page creation for companies. With this tool, any organization can set up localized versions of its cover photos, Page apps, profile photos, news feed stories and About information. The English version might welcome users with “Hello”, whereas users visiting from Spanish-speaking countries would see “Hola”. In short, globally available pages allow a corporation to create a distinct brand identity.
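The “Hello”/“Hola” behaviour amounts to choosing content by the visitor's language with a fallback to a default. The sketch below is a generic illustration of that lookup, not Facebook's actual API. The locale strings, the `LOCALIZED_GREETINGS` table and the `greeting_for` function are assumptions made for the example.

```python
# Hypothetical per-language content for one global page.
LOCALIZED_GREETINGS = {"en": "Hello", "es": "Hola"}
DEFAULT_LANGUAGE = "en"


def greeting_for(locale):
    """Pick the greeting for a locale such as 'es_MX' or 'en-GB'."""
    # Keep only the language part of the locale, ignoring the region.
    language = locale.replace("-", "_").split("_")[0].lower()
    # Fall back to the default language when no localized version exists.
    return LOCALIZED_GREETINGS.get(language,
                                   LOCALIZED_GREETINGS[DEFAULT_LANGUAGE])


print(greeting_for("es_MX"))
print(greeting_for("en-GB"))
print(greeting_for("fr_FR"))
```

Keeping one page with per-language content, rather than one page per country, is what lets a company maintain a single brand identity while still addressing each market in its own language.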
Facebook currently supports 13 Indian languages and is determined to support all major Indic languages and to promote them actively on various platforms. Mark Zuckerberg, co-founder and chairman of Facebook, recently met PM Modi to discuss his plan to develop Facebook in other Indian languages by applying advanced machine translation.
The growth rate of local-language usage is estimated to be more than four times that of English. - Google
Over the last three years, social media platforms have been rolling out Machine Translation (MT) in the hope of facilitating multilingual interactions. People who interact with each other through social media usually know each other well and share a common language. But what about people who have common interests but not a common language? As discussed above, companies are also working to create a distinct brand identity through multilingual social media.
By using machine translation, Facebook became the first social media platform with a multilingual facility; Google+ and Twitter started providing this facility later on. The machine translation tool that Facebook launched is known as the New In-Line Translation Tool. It automatically translates conversations and posts on Facebook pages. It is different from Google’s machine translation tool: it is a service licensed from Microsoft, and it works on posts on any individual’s profile as well as on pages. For example, if you speak and understand only English and come across a comment in Gujarati, you will see a Translate button next to that comment, which opens a pop-out window showing the comment in English. Furthermore, Facebook uses machine learning to improve translation accuracy. A user can enter a human translation in that pop-out window. If it gets enough votes
Sr. No. | Language Pair | No. of Threads (Approx.)
1 | English & Portuguese | 2500
2 | English & Spanish | 1150
3 | English & French | 650
4 | English & Italian | 400
5 | English & Turkish | 300
6 | Catalan & English | 250
7 | German & English | 200
8 | English & Vietnamese | 150
9 | English & Japanese | 100
10 | English & Russian | 100
Table 3: Top 10 Multilingual Language Pairs in Social Media Threads
Indian Language | Rank among Languages Spoken Worldwide
Hindi | 4th
Urdu | 5th
Bengali | 7th
Punjabi | 10th
Table 4: Rank of Indian Languages Spoken Worldwide
Fig. 3: Multilingualism & Social Media Interaction
from other users confirming its accuracy, it will replace the existing translation the next time the post is translated. Page administrators can manage all these translations using the “Manage Translations” link beneath posts on the pages they manage.
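The vote-based replacement described above can be sketched as follows. This is a hypothetical model, not Facebook's implementation: the article does not state how many votes are required, so `VOTE_THRESHOLD`, the class `PostTranslation` and its methods are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

VOTE_THRESHOLD = 3  # assumed value; the real threshold is not given


@dataclass
class PostTranslation:
    """Machine translation of a post plus user-suggested alternatives."""
    machine_text: str
    suggestions: dict = field(default_factory=dict)  # text -> net votes

    def suggest(self, text):
        """Register a human translation entered in the pop-out window."""
        self.suggestions.setdefault(text, 0)

    def vote(self, text, positive=True):
        """Record one up-vote or down-vote for a suggested translation."""
        change = 1 if positive else -1
        self.suggestions[text] = self.suggestions.get(text, 0) + change

    def current(self):
        """Text to show the next time the post is translated."""
        if self.suggestions:
            best, votes = max(self.suggestions.items(),
                              key=lambda item: item[1])
            if votes >= VOTE_THRESHOLD:
                return best  # human translation has enough support
        return self.machine_text


post = PostTranslation("How you are?")
post.suggest("How are you?")
print(post.current())  # still the machine translation
post.vote("How are you?")
post.vote("How are you?")
post.vote("How are you?")
print(post.current())  # now the human translation
```

A threshold of this kind keeps a single user from overwriting a translation, while still letting a community correct the machine output over time.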
Twitter is the second-largest social media platform in the world; only 50% of tweets are in English, and the rest are in various other languages. Twitter allows only 140 characters, so translating such short texts is not difficult, and there is little scope for reordering by a human translator. Machine translation would therefore be users’ first choice. Machine translation for Twitter can be regarded as a domain adaptation problem, as there is no large bilingual corpus of tweets. Domain adaptation is considered important because the performance of a statistical machine translation system degrades when it is faced with tasks from a different domain. However, existing work has mainly looked at adaptation to domains similar to the type of the training data. Twitter calls for domain adaptation research and tests of its own, since a huge amount of monolingual in-domain data is freely available through its streaming application programming interface.
Accomplishments of Machine Translation in Social Media
• Quick language translation is possible using the in-line translation available in multilingual social media.
• It is free for all social media users who share the same interests but speak different languages.
• The most significant accomplishment of machine translation in multilingual social media is that it works in all internet browsers.
• Using machine translation, an individual can communicate globally through social media.
• At present, multilingual social media supports a large number of languages, so people have a wide variety of languages to translate between.
• It offers links for suggesting human translations where needed, and a human translation that receives enough votes replaces the existing translation.
• It is very useful for multinational companies seeking better branding of their products in local markets.
Challenges with Machine Translation in Social Media
• To date, the machine translation used in social media is not 100% accurate, so further effort is required to make it more accurate.
• It is quite difficult to decide which translation is accurate and which is not, and this is a very big challenge.
• At present, if you opt for human translation in social media, there is a charge for it.
• Each type of machine translation has its own drawbacks, which apply to multilingual social media as well.
• Some languages, such as Vietnamese and a few others, do not have enough content online from which a machine can learn to translate.
Conclusion
Multilingual social media enabled by machine translation is a very innovative idea for extending the use of social media. It serves not only social interaction but also the branding of multinational products and services worldwide. Social media is one of the most significant ways to promote any product and to extend a network, but it was limited by the English language, which can be understood by 50% of the people in the world. Multilingual social media with machine translation faces some challenges, but it is the most important way to give communication a personal touch.
About the Author
Hardik A Gohel, an academician and researcher, is an assistant professor at AITS, Rajkot and an active member of CSI. His research spans Artificial Intelligence and Intelligent Web Applications and Services. He also focuses on how to make Artificial Intelligence popular, for various reasons, in the study of Computer Science. He has 28 publications in journals and in the proceedings of national and international conferences. He also works as a Research Consultant. He contributed cover stories to CSI Communications last year and a Technical Trends article last month. He can be reached at [email protected]