Linkage of Language Specific Synset Resource Center for Indian Language Technology Solutions http://www.cfilt.iitb.ac.in Computer Science and Engineering Department, IIT Bombay

Upload: cornelius-owen

Post on 23-Dec-2015


Page 1

Linkage of Language Specific Synset

Resource Center for Indian Language Technology Solutions

http://www.cfilt.iitb.ac.in

Computer Science and Engineering Department, IIT Bombay

Page 2

Outline

• What is a Language Specific Synset (LSS)
• What is the need for LSS
• Linkage of LSS
• Problems related to it
• Solutions to the problems

Page 3

What is Language specific synset

• A Language Specific Synset (LSS) is a synset based on a concept that exists only in a particular language and has no conceptual match in other languages.

e.g.,
सेल रोटी (in Nepali)
sela rotii
A ring-shaped, deep-fried sweet roti made of rice flour.

Page 4

What is the need of LSS

• The need for LSS arises from the need to capture the following types of lexical items in a particular language, so that the uniqueness of the language is retained:
– Lexical Uniqueness
– Lexical Gap
– Cultural Gap
– Pragmatic Gap
– Lexical Mismatch

Page 5

Lexical Uniqueness

• Every language possesses a set of unique lexical items that refer to unique concepts and ideas for which no conceptual equivalents are available in other languages.

e.g.,
भुत्या (in Marathi)
Bhutyaa
A devotee of Bhavaani devii.

Page 6

Lexical Gap

• This refers to the lack of lexical equivalence between two or more languages: the meanings of words in one language do not exactly fit the meanings of words in the other language.

e.g.,
Challenge (in English)
There is no word, phrase, or multiword expression in Bangla that conveys its meaning.

Page 7

Cultural Gap

• A cultural gap may originate from socio-cultural differences between languages. A particular language community may observe socio-cultural rites, rituals, festivals, practices, etc., which are not known to the members of another language community.

e.g.,
राजा
raajaa
A unique socio-cultural ritual practiced by Oriya language groups.

Page 8

Pragmatic Gap

• This is caused by differences in lexicalization between languages. The basic concept is known to both languages, but it is not expressed in the same manner: one language expresses it in a single lexicalized form, while the other expresses it as a multiword expression (i.e., a phrase, idiom, etc.).

e.g.,
भानवस, भाणवस (in Marathi)
Bhaanavasa
चूल्हे का पाट (in Hindi)
A platform behind the village cooking stove.

Page 9

Lexical mismatch

• This is a linguistic phenomenon in which a lexical item refers to a particular concept in one language, while the same lexical item refers to a different concept in another language.

e.g.,
शिक्षा (Shikshaa), in Marathi: punishment
शिक्षा (Shikshaa), in Hindi: education, preachment, moral, etc.

Page 10

Linkage of LSS

• A language group selects words that express language-specific (LS) concepts and creates synsets for these concepts in its own language. In parallel, the group creates Hindi synsets for the same concepts.

• The LSSs created in this manner are sent to IITB, where the Hindi WordNet (HWN) group verifies the Hindi synsets, corrects grammatical and other errors, and deletes duplicate synsets.

• After verification and correction, the Hindi synsets are sent back to the language group, which checks whether the corrected synsets are right.

• If the language group gives the green signal, the synsets are loaded into the repository along with their relations.
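The hand-offs in this workflow can be pictured as a small state machine. The following is only an illustrative sketch, not the actual IITB tooling: the class and function names are hypothetical, duplicate detection is reduced to an exact match on the Hindi gloss, and sending a rejected correction back for rework is an assumption, since the slide does not say what happens without a green signal.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Stage(Enum):
    CREATED = auto()   # drafted by the language group, with a Hindi synset
    VERIFIED = auto()  # Hindi synset checked and corrected at IITB
    APPROVED = auto()  # language group confirmed the corrected Hindi synset
    LOADED = auto()    # stored in the repository with its relations
    REJECTED = auto()  # dropped as a duplicate


@dataclass
class LssSubmission:
    language: str
    words: list
    hindi_gloss: str
    stage: Stage = Stage.CREATED


def verify(sub, known_hindi_glosses):
    """IITB step: reject duplicates, otherwise mark the synset as verified.

    Assumption: a duplicate is an exact match on the Hindi gloss.
    """
    if sub.hindi_gloss in known_hindi_glosses:
        sub.stage = Stage.REJECTED
    else:
        sub.stage = Stage.VERIFIED
    return sub


def review(sub, approved):
    """Language-group step: give the green signal or send back for rework."""
    if sub.stage is not Stage.VERIFIED:
        raise ValueError("only verified synsets can be reviewed")
    # Assumption: without approval the synset returns to the drafting stage.
    sub.stage = Stage.APPROVED if approved else Stage.CREATED
    return sub


def load(sub, repository):
    """Final step: store an approved synset in the repository."""
    if sub.stage is not Stage.APPROVED:
        raise ValueError("only approved synsets can be loaded")
    repository.append(sub)
    sub.stage = Stage.LOADED
    return sub
```

A submission then passes through `verify`, `review`, and `load` in that order, and each function refuses to run if the previous step has not happened.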

Page 11

Problems in linkage

• Synsets may be duplicated, since a concept may also exist in other languages and the lexicographer may not be familiar with them.

• Linkage of lexical relations, e.g., the antonymy relation.

• Linking LSSs with the hypernymy-hyponymy relation.

Page 12

Solutions

• Duplicate synsets will be nulled, as we have been doing.

• An interface will be created to link lexical relations such as antonymy.

• Brijesh will suggest a solution for the third problem.

Page 13

Linking of WordNets

Language specific synsets

Culture specific
• Food Items
• Places
• Traditions

Same concept in different languages?

Lexical gap
• Kashmiri doesn’t have a lexeme for ‘Water’; however, there is a lexeme for ‘Drinking Water’.

Modification in hierarchy?

Page 14

Creating Language Specific Synsets

• Use the hypernym to describe the gloss
• Try to distinguish between co-hyponyms
• Define the domain (Food Items, Places, etc.)
• Translate the gloss into Hindi and English
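These guidelines can be read as a checklist over a synset record. The sketch below is one possible way to encode that checklist; the field names and the `guideline_problems` helper are hypothetical and do not describe the actual repository schema. The example record reuses the Nepali "sela rotii" entry from earlier in the slides and leaves the Hindi gloss empty on purpose.

```python
from dataclasses import dataclass


@dataclass
class LanguageSpecificSynset:
    language: str
    words: list
    hypernym: str            # parent concept used to anchor the gloss
    domain: str              # e.g. "Food Items", "Places"
    gloss_english: str = ""
    gloss_hindi: str = ""


def guideline_problems(synset):
    """Return the guidelines that the record does not yet satisfy."""
    problems = []
    if synset.hypernym.lower() not in synset.gloss_english.lower():
        problems.append("English gloss does not mention the hypernym")
    if not synset.domain:
        problems.append("domain is not defined")
    if not synset.gloss_english:
        problems.append("English gloss is missing")
    if not synset.gloss_hindi:
        problems.append("Hindi gloss is missing")
    return problems


sel_roti = LanguageSpecificSynset(
    language="Nepali",
    words=["sela rotii"],
    hypernym="roti",
    domain="Food Items",
    gloss_english="ring shaped deep fried sweet roti made of rice flour",
)

# The Hindi translation has not been supplied, so one problem is reported.
print(guideline_problems(sel_roti))
```

The second guideline (distinguishing co-hyponyms) needs a human lexicographer and is not checked here.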

Page 15

Common Concept Hierarchy

English    Gujarati                 Kannada
Uncle      Kaka (Paternal Uncle)    ‘Doddappa’ (Father’s elder brother)
           Mama (Maternal Uncle)    ‘Chikkappa’ (Father’s younger brother)

Hierarchy (from the slide diagram):

Uncle
  Kaka
    Doddappa
    Chikkappa
  Mama
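A common concept hierarchy of this kind can be stored as child-to-parent (hypernym) links. The sketch below is illustrative only: it assumes that the two Kannada terms sit under "Kaka" because both denote a father's brother, which is a reading of the flattened diagram rather than something the slide states, and the function names are hypothetical.

```python
# Child -> parent (hypernym) links for the kinship example.
# Assumption: Doddappa and Chikkappa are placed under Kaka (paternal uncle).
HYPERNYM = {
    "Kaka": "Uncle",
    "Mama": "Uncle",
    "Doddappa": "Kaka",
    "Chikkappa": "Kaka",
}


def hypernym_chain(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def nearest_common_concept(first, second):
    """Return the most specific concept that covers both inputs, if any."""
    ancestors = hypernym_chain(first)
    for concept in hypernym_chain(second):
        if concept in ancestors:
            return concept
    return None


print(hypernym_chain("Doddappa"))                       # ['Doddappa', 'Kaka', 'Uncle']
print(nearest_common_concept("Doddappa", "Chikkappa"))  # Kaka
print(nearest_common_concept("Doddappa", "Mama"))       # Uncle
```

Walking up the chain is how a language without a matching lexeme can still be linked: the Kannada terms map to the Gujarati concept one level up, and to the English concept two levels up.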

Page 16

Common Index

• Creating a common concept hierarchy for all languages
– Use the concept hierarchy of Hindi as the starting point
– Add concepts and modify the hierarchy for each language
– Translate the gloss into Hindi and English to compare synsets of two different languages.
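Once glosses are available in a shared language, candidate duplicates can be flagged by comparing them. The sketch below uses word-overlap (Jaccard) similarity, which is a stand-in technique chosen for illustration and not the method used by the project. The 0.6 threshold, the synset identifiers, and the second gloss are all hypothetical, and any flagged pair would still need a lexicographer's judgment.

```python
def tokens(gloss):
    """Lower-cased words of a gloss with surrounding punctuation stripped."""
    words = {word.strip(".,;:()").lower() for word in gloss.split()}
    words.discard("")
    return words


def jaccard(first, second):
    """Share of distinct words that the two glosses have in common."""
    a, b = tokens(first), tokens(second)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duplicate_candidates(glosses, threshold=0.6):
    """Return (id, id, score) for gloss pairs at or above the threshold."""
    ids = sorted(glosses)
    pairs = []
    for index, first in enumerate(ids):
        for second in ids[index + 1:]:
            score = jaccard(glosses[first], glosses[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# English glosses keyed by hypothetical synset identifiers.
english_glosses = {
    "nepali:sela_rotii": "ring shaped deep fried sweet roti made of rice flour",
    "hypothetical:entry": "deep fried ring shaped sweet roti made of rice flour",
    "marathi:bhutyaa": "A devotee of Bhavaani devii.",
}

for first, second, score in duplicate_candidates(english_glosses):
    print(first, second, round(score, 2))
```

Word overlap ignores synonyms and word order, so it can only shortlist pairs for manual review.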

Page 17

Thank you