Coreference Recognition in Arabic
Alshunabier, Atheer; Aldakheel, Bushra
Supervisor: Alsaif, Amal
Overview
Introduction, Characteristics of the Arabic Language, Related work, Methodology, Discussion, Conclusion
Agenda
In linguistics, coreference occurs when multiple expressions in a sentence or document refer to the same thing. For example,
"Mary said she would help me"
Here "Mary" and "she" refer to the same person.
Introduction
The Arabic language differs greatly from English and from other Germanic and Latin-based languages. There are certain grammatical differences you must know before you begin to understand the language.
Arabic is a synthetic language, as opposed to an analytic one.
Written Arabic usually leaves out short vowels, so it is difficult to read unless you have a vast knowledge of the written language.
Characteristics of the Arabic Language
Additional consonants and vowels are attached to a three-consonant root verb to create a different meaning.
In English :        In Arabic :
"He wrote"          "kataba"
"He dictated"       "aktaba"
Tri-Consonantal Root Verb
The active participle identifies the person who performs an action.
In English :        In Arabic :
"drink"             "sh-r-b"
"drinker"           "sharib"
Active Participle
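The two slides above both describe root-and-pattern word formation, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name apply_pattern and the digit-slot notation for patterns are hypothetical, the forms are rough Latin transliterations without long vowels, and "kataba" is assumed as the usual transliteration of "he wrote".

```python
def apply_pattern(root, pattern):
    """Fill the slots 1, 2, 3 of a pattern with the three root consonants.

    root    -- sequence of exactly three consonants, e.g. ("k", "t", "b")
    pattern -- template in which the digits 1, 2, 3 stand for the root
               consonants and every other character is copied as-is
               (hypothetical notation, used only for this sketch)
    """
    if len(root) != 3:
        raise ValueError("a tri-consonantal root needs exactly three consonants")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


ktb = ("k", "t", "b")      # root associated with writing
shrb = ("sh", "r", "b")    # root associated with drinking

wrote = apply_pattern(ktb, "1a2a3a")      # basic verb form
dictated = apply_pattern(ktb, "a12a3a")   # derived verb form with a prefix
drinker = apply_pattern(shrb, "1a2i3")    # active participle pattern

print(wrote, dictated, drinker)
```

The same root yields different words depending only on the pattern, which is why surface forms of related Arabic words can look quite different from one another.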
Related work
Kehler (1997) used maximum entropy modeling to assign a probability distribution to alternative sets of coreference relationships among noun phrase entity templates, whereas we used decision tree learning.
Ge, Hale, and Charniak (1998) used a statistical model for resolving pronouns, whereas we used a decision tree learning algorithm and resolved general noun phrases, not just pronouns.
The work of Cardie and Wagstaff (1999) also falls under the machine learning approach.
MT-based stemmer
Partition the Arabic words into clusters based on their English translations. Arabic words whose English translations match, after English stopwords are removed, are placed in the same cluster.
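The clustering idea behind the MT-based stemmer can be sketched as follows. This is a minimal sketch, not the stemmer itself: the stopword list, the transliterated toy words, their translations, and the function name cluster_by_translation are all assumptions made for illustration, and a real system would take its translations from a machine translation system or a bilingual lexicon.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}


def cluster_by_translation(translations):
    """Group Arabic words whose English translations match once
    English stopwords have been removed.

    translations -- dict mapping an Arabic word to its English translation
    Returns a dict mapping the remaining content words (as a tuple)
    to the list of Arabic words that share them.
    """
    clusters = defaultdict(list)
    for arabic_word, english in translations.items():
        content = tuple(w for w in english.lower().split() if w not in STOPWORDS)
        clusters[content].append(arabic_word)
    return dict(clusters)


# Hypothetical toy data: transliterated Arabic words with English glosses.
toy_translations = {
    "kitab": "a book",
    "alkitab": "the book",
    "wakitab": "and a book",
    "walad": "a boy",
    "alwalad": "the boy",
}

clusters = cluster_by_translation(toy_translations)
for key, words in clusters.items():
    print(key, words)
```

Each cluster then plays the role of a stem class: words that differ only by attached particles such as the article or a conjunction end up together because their translations differ only in stopwords.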
Coreference Resolution
Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world.
Anaphora
Anaphora is a linguistic relation between two textual entities.
Pronominal anaphora in Quran
Why is Arabic Information Extraction difficult?
The Arabic alphabet consists of 28 letters that can be extended to 90 by additional shapes, marks, and vowels.
There are two genders (masculine and feminine), three numbers (singular, dual, and plural), and three grammatical cases (nominative, genitive, and accusative).
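The gender and number distinctions listed above are commonly used as an agreement filter when looking for the antecedent of a pronoun. The sketch below illustrates that general idea and is not the method of this presentation; the Mention class, the function compatible_antecedents, and the transliterated toy mentions are hypothetical.

```python
from dataclasses import dataclass

GENDERS = {"masculine", "feminine"}
NUMBERS = {"singular", "dual", "plural"}


@dataclass(frozen=True)
class Mention:
    """A noun phrase or pronoun with its agreement features."""
    text: str
    gender: str
    number: str

    def __post_init__(self):
        # Reject feature values outside the two genders and three numbers.
        if self.gender not in GENDERS or self.number not in NUMBERS:
            raise ValueError(f"unknown gender or number for {self.text!r}")


def compatible_antecedents(pronoun, candidates):
    """Keep only the candidates that agree with the pronoun
    in both gender and number."""
    return [
        c for c in candidates
        if c.gender == pronoun.gender and c.number == pronoun.number
    ]


# Hypothetical toy example with transliterated mentions.
pronoun = Mention("hiya", "feminine", "singular")            # "she"
candidates = [
    Mention("Maryam", "feminine", "singular"),               # a woman's name
    Mention("al-waladan", "masculine", "dual"),              # "the two boys"
    Mention("al-mu'allimat", "feminine", "plural"),          # "the (female) teachers"
]

result = compatible_antecedents(pronoun, candidates)
print([m.text for m in result])
```

Agreement alone only narrows the candidate set. Choosing among the remaining candidates still needs further evidence, and extracting the features in the first place is harder in Arabic because of its rich morphology.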
Annotated Corpora
The number of corpora annotated both anaphorically and coreferentially has increased.
For English, there are resources such as the Lancaster Anaphoric Treebank and others.
For Arabic, there has been no comparable expansion in anaphoric or coreferential corpus annotation.
Annotating Tools
Annotating anaphoric or coreferential relations requires considerable effort from the human annotator.
Annotation tools such as Callisto, MMAX2, and PALinkA support this task. These tools are written in Java.
Conclusion