what the አማርኛ isdemo.clab.cs.cmu.edu/sp2016-11731/slides/langin10/amharic.pdf ·...


What the አማርኛ is...


What the አማርኛ is Amharic?


Amharic basics

● Ethiopia's only official language

○ Other speakers in Eritrea, Canada, US, Israel, Sweden

● Originates from the Amhara region and ethnic group in Ethiopia

● ~22 million speakers, 14.8 million monolingual

● Second-most widely spoken Semitic language, after Arabic


Fidel

● abugida

○ consonant + vowel = character

● 36 consonants × 7 vowels = 252 characters
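
The consonant-plus-vowel structure is visible in how the characters are encoded. The following is a minimal sketch, assuming the regular rows of the Unicode Ethiopic block (starting at U+1200, eight code points per consonant row, with the first seven slots holding the seven vowel orders). The helper names `decompose` and `orders` and the vowel labels are illustrative choices, not part of the slides, and irregular rows are not handled.

```python
# Sketch: split Ethiopic (fidel) characters into consonant row and vowel order.
# Assumption: the character sits in a regular 8-slot row of the Unicode
# Ethiopic block starting at U+1200; irregular rows are not handled.

ETHIOPIC_START = 0x1200
ROW_WIDTH = 8  # 7 vowel orders plus one extra slot per row in Unicode
VOWELS = ["ä", "u", "i", "a", "e", "ə", "o"]  # conventional labels, orders 1-7


def decompose(ch):
    """Return (first-order form of the consonant, vowel order 1-7)."""
    offset = ord(ch) - ETHIOPIC_START
    if offset < 0:
        raise ValueError(f"not a fidel syllable: {ch!r}")
    row, slot = divmod(offset, ROW_WIDTH)
    if slot >= len(VOWELS):
        raise ValueError(f"not one of the seven regular orders: {ch!r}")
    base = chr(ETHIOPIC_START + row * ROW_WIDTH)
    return base, slot + 1


def orders(base):
    """List the seven vowel orders of a consonant given its first-order form."""
    return [chr(ord(base) + i) for i in range(len(VOWELS))]


if __name__ == "__main__":
    print(36 * 7)  # the character count quoted on the slide
    for ch in "አማርኛ":
        base, order = decompose(ch)
        print(ch, base, order, VOWELS[order - 1])
```

Running it on አማርኛ pairs each character with its consonant row and vowel order, which is the "consonant + vowel = character" idea expressed as code point arithmetic.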


Characteristics

እሱ ወደ ከተማ መጣ

Ǝssu wädä kätäma mäṭṭa.

he to city came

'He came to the city.'

● SOV

● prepositions, genitives, articles precede noun heads

○ head-final, left-branching

● Three-radical system typical of Semitic languages

○ Vowel patterns are interleaved with the 3 root consonants, e.g. to nominalize a verb
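
The three-radical idea can be sketched as filling a vowel template with root consonants. This is only an illustration on romanized forms: the root s-b-r ('break') and the two templates are commonly cited textbook examples supplied here as assumptions, not taken from the slides, and the function name `interdigitate` is hypothetical.

```python
# Sketch of root-and-pattern morphology on romanized forms.
# Assumption: digits 1-3 in a template stand for the three root consonants;
# the templates below are illustrative, not a full grammar of Amharic.

def interdigitate(root, template):
    """Fill the digit slots of `template` with the consonants of `root`."""
    if len(root) != 3:
        raise ValueError("expected a three-consonant root")
    return "".join(root[int(c) - 1] if c in "123" else c for c in template)


PERFECTIVE = "1ä22ä3ä"  # 3rd person masculine singular perfective
VERBAL_NOUN = "mä12ä3"  # infinitive / verbal noun

if __name__ == "__main__":
    root = "sbr"  # 'break'
    print(interdigitate(root, PERFECTIVE))   # verb form
    print(interdigitate(root, VERBAL_NOUN))  # nominalized form
```

The same root yields a verb or a noun depending only on the template, which is why surface-level stemming is hard for such languages.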


Challenges

● no standard romanization

● reordering

● gemination

○ Consonant doubling is not written, though it is contrastive (yielding homographs)

● implicit articles

● rich morphology

○ Affixes express much of the meaning
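
The gemination problem can be made concrete with a small sketch on romanized forms. The pair alä 'he said' / allä 'there is', both written አለ, is a commonly cited example supplied here as an assumption rather than taken from the slides. The function `degeminate` is a hypothetical helper that imitates what the script leaves out.

```python
import unicodedata
from itertools import groupby

# Vowel letters used in the romanization; everything else alphabetic is
# treated as a consonant (an assumption for this sketch).
VOWELS = set("aeiouäəǝ")


def degeminate(word):
    """Collapse doubled consonants, mimicking what the script leaves unwritten."""
    word = unicodedata.normalize("NFC", word)
    out = []
    for ch, run in groupby(word):
        n = len(list(run))
        is_consonant = ch.isalpha() and ch.lower() not in VOWELS
        # Reduce consonant runs to one letter; keep everything else intact.
        out.append(ch if is_consonant else ch * n)
    return "".join(out)


if __name__ == "__main__":
    for word in ["alä", "allä", "mäṭṭa"]:
        print(word, "->", degeminate(word))
```

Once gemination is dropped, the two distinct words fall onto the same written key, so a system working from the script alone has to disambiguate them from context.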


Previous work

● Word alignment with distributional approach

● Phrase-based MT with word segmentation

● Teaching NLP in Addis Ababa (future…?)

Resources

● 232,653-word corpus from the European Language Resources Association

○ legal and news domains, nicely transliterated

● 219,430-word corpus from the Ethiopian Parliament

● Quran, Bible


References

Amharic. Ethnologue. http://www.ethnologue.com/language/amh. Accessed 26 January 2016.

Amharic alphabet, pronunciation and language. Omniglot. http://www.omniglot.com/writing/amharic.htm. Accessed 26 January 2016.

Amsalu, S. 2006. Data-driven Amharic-English bilingual lexicon acquisition. LREC (Genoa, 2006), 281-286.

Amsalu, S. & Gibbon, D. 2005. Finite state morphology of Amharic. In Proceedings of RANLP.

Argaw, A. A. & Asker, L. 2007. An Amharic stemmer: reducing words to their citation forms. In Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources (Semitic '07). Association for Computational Linguistics, Stroudsburg, PA, USA, 104-110.

Gambäck, B., Eriksson, G. & Fourla, A. 2005. Natural language processing at the school of information studies for Africa. In Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 49-56.

"Language Amharic." The World Atlas of Language Structures Online. http://wals.info/languoid/lect/wals_code_amh. Accessed 26 January 2016.

OPUS: The Open Parallel Corpus. http://opus.lingfil.uu.se/. Accessed 26 January 2016.

Teshome, M. G. & Besacier, L. 2012. Preliminary experiments on English-Amharic statistical machine translation. In SLTU, 36-41.