Research Article
Exploring the Performance of Tagging for the Classical and the Modern Standard Arabic

Dia AbuZeina 1 and Taqieddin Mostafa Abdalbaset 2

1 College of Information Technology and Computer Engineering, Palestine Polytechnic University, Hebron, State of Palestine
2 Palestine Technical University–Kadoorie, AL-Aroub Branch, Hebron, State of Palestine

Correspondence should be addressed to Dia AbuZeina; abuzeina@ppu.edu

Received 7 August 2018; Accepted 23 October 2018; Published 23 January 2019

Guest Editor: Omar Abu Arqub

Hindawi, Advances in Fuzzy Systems, Volume 2019, Article ID 6254649, 10 pages, https://doi.org/10.1155/2019/6254649

Copyright © 2019 Dia AbuZeina and Taqieddin Mostafa Abdalbaset. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Part of speech (PoS) tagging is a core component in many natural language processing (NLP) applications. PoS taggers serve as a preprocessing step in various NLP tasks, such as syntactic parsing, information extraction, machine translation, and speech synthesis. In this paper, we examine the performance of a modern standard Arabic (MSA) based tagger on classical (i.e., traditional or historical) Arabic. We employed the Stanford Arabic model tagger to evaluate the imperative verbs in the Holy Quran. The Stanford tagger contains 29 tags; however, this work experimentally evaluates just one, namely VB ≡ imperative verb. The testing set contains 741 imperative verbs, which appear in 1,848 positions in the Holy Quran. Despite the previously reported accuracy of the Arabic model of the Stanford tagger, which is 96.26% for all tags and 80.14% for unknown words, the experimental results show that the accuracy is only 7.28% for the imperative verbs. This result calls for further research to explain why the tagging is so inaccurate for classical Arabic. The performance decline might indicate the necessity of distinguishing between classical and MSA training data for NLP tasks.



1. Introduction

Part of speech (PoS) tagging, also known as word-category disambiguation, is the process of determining the tag of each word in a given input text. The tagging process uses the context to label words with syntactic tags, such as noun, adjective, verb, or preposition, which are also known as parts of speech, word classes, grammatical categories, lexical class markers, or syntactic categories. Tagging is performed either manually by linguistic experts or automatically by machine learning algorithms; this work considers the computational track. Word tags are mainly used to describe the words and their roles according to the context for further processing. That is, each word has a particular role based on its position and the adjacent words in the sentence. The tagset is a predefined list that generally includes symbols for categories such as nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, and the definite and indefinite articles (sometimes called “determiners”). The tagset is prepared by linguistic scholars to describe the word families of the language. The size of the tagset is variable and depends on the requirements or the capacity of the applications being developed. In any case, the tagset should best fit and efficiently serve the intended purposes. Hence, there is no predefined tagset for all languages, and there is no standard (i.e., unique) tagset for a given language; rather, it is a debatable matter.

PoS tagging is increasingly becoming a vital factor in natural language processing (NLP) applications. One objective of PoS tagging is to create knowledge base resources (e.g., tag relationships) that can later be used in other NLP tools. PoS tagging also has many roles in the field of NLP as a basic preprocessing step. For instance, PoS tagging based applications include syntactic parsing, information extraction, machine translation, speech synthesis, and named entity recognition (NER). This work aims to explore the performance of PoS tagging for classical Arabic using a modern standard Arabic (MSA) tagger, namely the Stanford tagger [1]. Since it is difficult to evaluate the Stanford tagger for all 29 tags, as this requires a large annotated corpus, the Quranic imperative verbs were chosen for the evaluation process. The Stanford tagger uses the label VB to mark imperative verbs. That is, this work is restricted to a testing dataset that contains a list of all imperative verbs in the Holy Quran, obtained from [2]. This work is distinguished by presenting an experimental study of tagging performance on classical Arabic using one of the freely available taggers, which makes the results available for comparison purposes. This work also aims to present the tagging problem from different points of view, such as the benefits and challenges of Arabic PoS tagging, tagset capacities, tagging algorithms, and the recent studies in this field.

In spite of the importance of taggers’ performance for both classical and MSA Arabic, few studies have explored the accuracy for classical Arabic. Instead, most of the previous studies focused on the tagsets and the tagging approaches. For instance, one study [3] proposed an Arabic tagset with detailed hierarchical levels of the categories and their relationships (i.e., a tree of different levels). As indicated, this study focuses on the imperative verbs in the Holy Quran. The reason for choosing the imperative is that it is easier to find such annotated testing collections, owing to the previous efforts of Arabic scholars to serve Quranic studies. In addition, Arabic has a stand-alone form for imperative verbs, whereas in English the imperative shares its form with the present tense. For instance, English uses the verb “go” as both an imperative and a present verb, while in Arabic the imperative “[Arabic]” and the present “[Arabic]” are completely different words in terms of transcription and tense.

Even though the documentation of the Stanford tagger [4] indicates that the accuracy of the Arabic model is 96.26% on an MSA test portion, as described in [5], and 80.14% for unknown words, our measurement shows far lower accuracy. In this work, the Stanford tagger scored only 7.28% accuracy on a collection of Arabic imperative verbs. It is worth noting that the Stanford tagger works at the word level (i.e., the tag is given to the whole word instead of its parts, such as prefixes, stems, and suffixes, as some other taggers do). Although diacritics play an important role in the tagging process, they are discarded in this work since the Stanford tagger does not consider the diacritics of the input text. However, we do keep the Hamza (e.g., [Arabic]) and the Madd ([Arabic]) symbols on the corresponding characters. That is, the testing dataset is a nonvocalized Arabic text. The outcome of this work highlights the importance of reinvestigating the tagging problem for the Arabic language, since many previous studies report accuracies in the nineties. Reinvestigation covers different aspects, such as whether the training data is classical or MSA, the tagsets, and the corpora sizes.
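The diacritic removal described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' preprocessing script: the set of code points treated as diacritics (U+064B to U+0652 plus U+0670) and the function name `strip_diacritics` are our own choices. Hamza-bearing letters and Alef with Madda are separate base letters in Unicode, so they survive the filter, as do the combining Madda and Hamza marks (U+0653 to U+0655), which lie outside the removed range.

```python
import re

# Assumed definition of "diacritics": the short-vowel, tanween, shadda and sukun
# marks (U+064B-U+0652) plus the superscript alef (U+0670). Hamza letters
# (e.g. U+0623) and Alef with Madda (U+0622) are base letters and are untouched.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text with the assumed Arabic diacritic marks removed."""
    return DIACRITICS.sub("", text)


if __name__ == "__main__":
    vocalized = "\u0630\u064E\u0647\u064E\u0628\u064E"  # three letters, each with a fatha
    print(strip_diacritics(vocalized))                   # the same three letters, no marks
```

Keeping the filter to an explicit code-point range makes it easy to audit which marks are dropped and which characters are preserved.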

The rest of this paper is organized as follows. In the next section, we present the benefits of tagging for various NLP applications. In Section 3, we explain why tagging is a challenging task. We present the literature review in Section 4, followed by the Stanford tagset in Section 5. The proposed method is described in Section 6 and the experimental results in Section 7. Finally, we conclude in Section 8.

2. The Benefits of Tagging

PoS tagging is at the core of many NLP algorithms because of the useful information it gives about a word and its neighbors. NLP applications employ the output of PoS tagging for different purposes, such as checking the correctness of the syntactic structure around a word. For instance, in Arabic, adjectives are preceded by nouns, while in English, nouns are preceded by adjectives, as in “A beautiful school ↔ [Arabic]”. Similarly, in English the verb is preceded by its subject, as in “He runs fast”, while Arabic allows both orders, as in “[Arabic] ↔ the teacher writes the lesson” and “[Arabic] ↔ the teacher writes the lesson”. Accordingly, the Google translator gives the same translation for two Arabic sentences with different word orders. Hence, knowing the syntax of word order is extremely important for some NLP applications, since it limits the output candidates and increases the probability of correct answers. The following are some NLP applications that utilize PoS tagging:

(i) Capturing common syntactical rules: [6] presents a data mining based method to extract the common syntactical rules in the Holy Quran. The study reported that a common relationship between the words’ tags (i.e., a common rule) is tag1=RP, tag2=NN, tag3=WP 91 ⇒ tag4=VBD 90, accuracy (0.97912). For more information on the tags, the reader is referred to Section 5 of this paper.

(ii) Enhancing the performance in speech recognition: [7] employs the PoS tags to generate new words based on the neighboring word tags. The study formed compound words from nouns that are followed by adjectives and from prepositions followed by any word. After recognition, the compound words were split back into their original states (i.e., two parts). This method shows a performance enhancement.

(iii) Named Entity Recognition (NER): [8] employs a tagger for named entity recognition. NER aims at extracting names, such as people, organizations, locations, cities, or companies. NER is beneficial for certain applications, such as classifying content for news providers, which facilitates categorization and content discovery. NER also speeds up the search process in sizeable data that contains, for instance, millions of articles. Other applications include powering content recommendations, customer support, and research papers.

(iv) Syntactic Parsing: [9] employs PoS tagging for syntactic parsing. Syntactic parsing is a process to confirm that the input sentence follows the language’s formal grammar. Figure 1 shows a parsing tree for a simple sentence. The parsing tree represents the syntactic structure of the text and is mainly used for analyzing the input sentence.

Figure 1: An example of a parsing tree. (Node labels: sentence; noun phrase, verb phrase; article, noun, verb. Leaves: “the cat ate mouse”.)

(v) Other PoS tagging based applications include semantic role labeling [10], speech synthesis [11], speech recognition [12], information extraction [13], summarization [14], sentiment analysis, also called opinion mining [15], diacritization [16], software engineering [17], question answering [18], translation [19], plagiarism detection [20], key phrases extraction [21], ontology [22], and extracting Arabic noun compounds [23].

3. The Challenge of Tagging

The fact that a word can take different tags makes PoS tagging a challenging task. That is, a word can be labeled with different tags depending on the context. Therefore, the goal of PoS tagging algorithms is to remove such ambiguity and label the words correctly. Table 1 shows some examples of words that take different tags based on the context. As shown in the table, the word “gold ↔ ذهب” is tagged as VBD (verb, past tense) in sentence 1, where it means “went”, while it is tagged as NN (noun, singular or mass) in sentence 2. Similarly, the word “Said ↔ سعيد” in the first sentence is tagged as NNP (proper noun, singular), while it is tagged as JJ (adjective) in sentence 3. This shows how a particular word can have different labels, which is the challenge of the PoS tagging process. Hence, the problem of PoS tagging is to resolve ambiguities by choosing the proper tag considering the surrounding words. The absence of diacritics in the Arabic formal writing system adds even more ambiguity. For instance, there is no ambiguity in recognizing that the diacritized word “gold ↔ [diacritized Arabic]” is a noun and the diacritized word “went ↔ [diacritized Arabic]” is a verb. The table also shows the tagging output for the translated sentences using the English model of the Stanford tagger.
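The ambiguity discussed in this section can be made concrete with a small sketch that collects the set of tags observed for each surface form. The word/tag pairs below are the four cases mentioned in the text; the undiacritized spellings ذهب and سعيد are our assumption about the two ambiguous words, and the function names are hypothetical.

```python
from collections import defaultdict

# (word, tag) pairs for the two ambiguous words discussed above; only these
# words are listed, not the full sentences of Table 1.
tagged_sentences = [
    [("ذهب", "VBD"), ("سعيد", "NNP")],  # sentence 1: "went", "Said"
    [("ذهب", "NN")],                     # sentence 2: "gold"
    [("سعيد", "JJ")],                    # sentence 3: "happy"
]


def tags_per_word(sentences):
    """Map each surface form to the set of tags it received."""
    observed = defaultdict(set)
    for sentence in sentences:
        for word, tag in sentence:
            observed[word].add(tag)
    return observed


def ambiguous_words(sentences):
    """Return only the words that were seen with more than one tag."""
    return {w: sorted(t) for w, t in tags_per_word(sentences).items() if len(t) > 1}


if __name__ == "__main__":
    for word, tags in ambiguous_words(tagged_sentences).items():
        print(word, tags)
```

A tagger has to pick one element of each such set per occurrence, which is why the surrounding words matter.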

4. Literature Review

Despite the importance of PoS tagging for both MSA and classical Arabic, most of the previous tagging studies have mainly focused on MSA. In addition, the literature shows that there is active research on suitable tagsets that truly reflect the linguistic items of Arabic as one of the morphologically rich languages. In this review, we present the up-to-date Arabic tagging research, which has focused on the main aspects and components, such as the type of the training text (i.e., MSA, classical, tweets), tagsets, tagging algorithms, unknown words, and stemming. The study in [30] indicated that stemming (i.e., removing prefixes and postfixes or suffixes) enhances the tagging performance. The study in [31] presented a method to tag tweets, which are usually written outside the formal and proper spelling of the language. The study in [28] considered a method to handle “unknown words”, which are the words that did not appear in the training corpus. The study in [26] considered the problem that arises when estimating the transition probabilities from limited amounts of training data; it proposed a decision tree based method to handle this problem, which generally occurs in the hidden Markov model (HMM) tagging technique. The study in [32] implemented the master-slave technique for PoS tagging, using an HMM as the master tagger and maximum match (MM) and Brill taggers as slaves. There are many approaches to PoS tagging; the most widely used is the statistical approach based on the HMM. Another approach is nonstatistical and relies on a number of hand-crafted disambiguation rules to find the most appropriate tag for each word, as in [33].

The recent studies of part of speech tagging cover different aspects. For instance, [34] developed a part of speech tagger for the Arabic heritage and scored an accuracy of 96.22%. They also reported that most of the tagging errors result from segmentation. Reference [35] employs part of speech tagging to enhance the performance of Arabic text classification. Reference [36] demonstrates part of speech tagging for the Arabic Gulf dialect; for the tagging process, the authors employ a Support Vector Machine (SVM) classifier and bidirectional Long Short-Term Memory (Bi-LSTM). Reference [37] presents a tagging based study regarding the identification of Arabic dialects. Reference [38] uses part of speech and semantic tagging to extract features for training Neural Machine Translation.

Table 2 presents some information regarding tagging systems, such as tagging algorithms, tagsets, corpora, and accuracies. We are aware that the accuracies are not directly comparable, since each work has its own corpus; nevertheless, reporting these measures might give an indication of the overall accuracy of Arabic PoS tagging. Similarly, although the tagset size is important, it is more important to have a training set large enough to cover the tags used; otherwise, zero values might be assigned to the HMM transition probabilities, which raises a tagging problem.
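The zero-probability issue mentioned above can be illustrated with a minimal sketch of how HMM transition probabilities are estimated from tag bigrams. The toy tag sequences, the function name `transition_probs`, and the add-alpha smoothing option are our own assumptions for illustration; they are not taken from any of the systems in Table 2.

```python
from collections import Counter


def transition_probs(tag_sequences, tagset, alpha=0.0):
    """Estimate P(next_tag | tag) from tag bigrams.

    alpha = 0 gives the plain relative-frequency estimate; alpha > 0 applies
    add-alpha smoothing (an assumed remedy, used here only for contrast).
    """
    bigrams = Counter()
    prev_counts = Counter()
    for seq in tag_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            prev_counts[prev] += 1
    probs = {}
    for prev in tagset:
        denom = prev_counts[prev] + alpha * len(tagset)
        for nxt in tagset:
            probs[(prev, nxt)] = (bigrams[(prev, nxt)] + alpha) / denom if denom else 0.0
    return probs


# Toy training data in which the imperative tag VB never occurs.
toy_corpus = [["DTNN", "VBD", "IN", "NN"], ["NN", "IN", "DTNN"]]
tagset = ["DTNN", "VBD", "IN", "NN", "VB"]

unsmoothed = transition_probs(toy_corpus, tagset)
smoothed = transition_probs(toy_corpus, tagset, alpha=1.0)

if __name__ == "__main__":
    print(unsmoothed[("IN", "VB")])  # zero: the transition was never observed
    print(smoothed[("IN", "VB")])    # small but nonzero after smoothing
```

With the unsmoothed estimate, any tag sequence containing an unseen transition receives probability zero, so a larger tagset needs proportionally more training data to avoid this.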

5. Stanford Tagset

As indicated in the literature review, many tagsets have been used in previous studies. The tags are mainly divided into two classes (i.e., categories): the closed class and the open class. The closed class has a fixed membership, such as prepositions, while the open class can accept new words, especially in technology fields, such as “to fax”. Table 3 shows the 29 tags of the Arabic model of the Stanford tagger.

Table 1: Some Stanford tagged Arabic and English sentences.

Input sentences and the translation using Google translator
  Sentence 1: [Arabic] ↔ Said went to school
  Sentence 2: [Arabic] ↔ The winner got a gold necklace
  Sentence 3: [Arabic] ↔ Happy day we wish you

Stanford tagger outputs (Arabic model); Arabic tokens not recoverable, tags in printed order
  Sentence 1: DTNN IN NNP VBD
  Sentence 2: NN IN NN IN DTNN VBD
  Sentence 3: VBG VBP JJ NN

Stanford tagger outputs (English model)
  Sentence 1: Said/NNP went/VBD to/TO school/NN
  Sentence 2: The/DT winner/NN got/VBD a/DT gold/JJ necklace/NN
  Sentence 3: Happy/JJ day/NN we/PRP wish/VBP you/PRP

Algorithm 1: Evaluating imperative verbs using the Stanford tagger.

The proposed algorithm:
1. Obtain the text of the Holy Quran from [24] and remove the diacritics.
2. Install the full version of the Stanford Arabic model tagger from [25].
3. Have the text of the Holy Quran tagged.
4. Obtain a list of all imperative verbs in the Holy Quran from [2].
5. Find all words that have the tag VB ≡ imperative verb.
6. Compare the two lists, the one obtained in step 5 and the one obtained in step 4, to find the correctly tagged imperative verbs.
7. Find the accuracy based on the information obtained in step 6.

Table 2: Some of the literature tagging research.

No.  Ref.  Tagging method                  Tagset size (tags)  Corpus size (words)  Accuracy (%)
1    [26]  A decision tree based tagger    110                 78K & 500K           91.65 → 97.18
2    [27]  Support Vector Machines (SVM)   24                  140K                 95.49
3    [28]  Hidden Markov Models            24                  29,300               95.0 → 97.1
4    [29]  SVM and a Neural network        21                  6,844                91.0
5    [1]   Maximum Entropy based tagger    29                  588,244              96.1

6. The Proposed Method

This section presents the steps that we followed to measure the performance of the Stanford tagger on the Quranic imperative verbs. The first step is the tagging process, which produces an annotated text file of all the Quranic sentences. Then, we used a number of Python programs to extract the correctly tagged imperative verbs, the wrongly tagged imperative verbs, and so on. The textual version of the Holy Quran was obtained from the website of the Quran Printing Complex, Saudi Arabia [24]. Algorithm 1 summarizes the implemented steps.

The input testing set is the nondiacritized textual form of the Holy Quran. Figure 2 shows what the testing set looks like. The figure contains the first chapter (Surah) of the Holy Quran (Surat Al-Fatihah, The Opening) in addition to the first three sentences of the second chapter (Surat Al-Baqarah, The Cow). Figure 3 shows the output of the Stanford tagger for the Quranic sentences that appear in Figure 2. As can be observed, Figure 3 shows some correctly tagged words, such as [Arabic]/DTJJ, [Arabic]/VBD, [Arabic]/WP, and [Arabic]/VBP. The figure also shows some wrongly tagged words, such as [Arabic]/NNP, [Arabic]/VBD, and [Arabic]/VBD.

The tagger output shown in Figure 3 is the main content that can be used for further analysis of the tagger’s behavior. Of course, the correct tags of the testing words are required in order to measure the accuracy, which adds more difficulty to this kind of research. In other words, to measure the accuracy for the “entire” Holy Quran, we would have to prepare an annotated version of the Holy Quran, which is a difficult task. This is why we chose a subset that contains only the imperative verbs.
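Steps 5 to 7 of Algorithm 1 can be sketched as below. This is a hedged illustration rather than the authors' Python programs: the function names are hypothetical, the `word/TAG` token format and the `/` separator are assumptions about the tagger's output file, and the comparison is done on unique word forms, which is our reading of how the 741-verb list is used.

```python
def parse_tagged(text, sep="/"):
    """Split tagger output into (word, tag) pairs.

    `sep` is the assumed separator between a word and its tag.
    """
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if found and word:
            pairs.append((word, tag))
    return pairs


def evaluate_imperatives(tagged_text, gold_imperatives, target_tag="VB", sep="/"):
    """Return the correctly tagged imperatives and the resulting accuracy.

    Accuracy = |words tagged VB that are in the gold list| / |gold list|.
    """
    predicted = {w for w, t in parse_tagged(tagged_text, sep) if t == target_tag}
    gold = set(gold_imperatives)
    correct = predicted & gold
    accuracy = len(correct) / len(gold) if gold else 0.0
    return correct, accuracy


if __name__ == "__main__":
    # Transliterated placeholder tokens, not real tagger output.
    sample_output = "qul/VB kataba/VBD udkhul/NN ism/VB"
    gold_list = ["qul", "udkhul", "ihdi"]
    correct, accuracy = evaluate_imperatives(sample_output, gold_list)
    print(sorted(correct), round(accuracy, 3))
```

Working with sets keeps the comparison independent of how often a verb occurs, so a verb counts as correct if it is tagged VB at least once.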

7. The Experimental Results

For the evaluation, we used the full Stanford tagger (129 MB), which is freely available at the website of the Stanford natural language processing group through the link [25]. It is relatively simple to execute the tagger by running the command shown in Figure 4 in the Windows Command Prompt program. That is, the tagger does not require special systems, as we ran it in the Command Prompt of the Windows 10 Home operating system. The figure shows that 77,749 words are tagged in a very short time.
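The exact command of Figure 4 is not reproduced here, so the sketch below only follows the general command-line pattern of the Stanford tagger (a Java call to the `MaxentTagger` class with `-model` and `-textFile` arguments). The jar path, the model path `models/arabic.tagger`, the memory setting, and the input and output file names are all assumptions that should be adapted to the local installation.

```python
import subprocess


def build_tagger_command(jar="stanford-postagger.jar",
                         model="models/arabic.tagger",
                         text_file="quran_plain.txt"):
    """Build the Java command line; all default paths are assumed, not taken from the paper."""
    return [
        "java", "-mx300m", "-cp", jar,
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", model,
        "-textFile", text_file,
    ]


def run_tagger(output_file="quran_tagged.txt", **kwargs):
    """Run the tagger and write its standard output to output_file."""
    with open(output_file, "wb") as out:
        subprocess.run(build_tagger_command(**kwargs), stdout=out, check=True)


if __name__ == "__main__":
    # Printing the command instead of running it keeps the sketch side-effect free.
    print(" ".join(build_tagger_command()))
```

Building the command as a list avoids shell quoting issues on Windows and makes it easy to swap in a different model or input file.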

The experimental results are presented in Table 4. The table gives the information regarding the imperative verbs; however, this work can be expanded to measure the performance for other tags, such as nouns or other verb forms. Similarly, the same steps can be followed to measure the performance of the Stanford tagger on the prepositions in the Holy Quran or the overall accuracy for all tags. Finally, exploring the performance of the Stanford tagger, as well as that of other taggers, will help uncover more weak points to be avoided in future NLP systems.
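As a quick check of the arithmetic behind the reported accuracy, the counts given in the abstract and in Table 4 (741 unique imperative verbs, 54 of them correctly tagged) can be plugged in directly. The variable names are ours, and the number of wrongly tagged verbs is derived as the difference rather than read from the table.

```python
unique_imperatives = 741   # imperative verbs after removing duplicates
correctly_tagged = 54      # verbs from the list that received the VB tag

wrongly_tagged = unique_imperatives - correctly_tagged
accuracy_percent = 100 * correctly_tagged / unique_imperatives

if __name__ == "__main__":
    print(wrongly_tagged)
    print(f"{accuracy_percent:.3f}")
```

The ratio is about 7.287%, which matches the reported 7.28% when cut to two decimal places.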

8. Conclusions

This work explored the performance of the Stanford tagger for the Arabic language. The experimental results show the importance of distinguishing between types of training data when preparing taggers. That is, a tagger prepared for poetry is different from a tagger prepared for prose. Similarly, a tagger for old texts is different from one prepared for MSA, and tweets are also different from MSA. This is the main observation of this study, as the performance of the MSA based tagger declines sharply for the classical text. The study also shows the differences between the tagsets in the literature, which calls for further work towards a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires an extensive check of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM based taggers. As future work, it might be useful to combine hand-crafted rules and statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words carry more than one tag, such as a preposition and a noun, as in the word “[Arabic] ≡ at school”. Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which requires continuing effort towards optimal NLP systems. It is worth pointing to [39, 40], as they provide a thorough discussion of the Arabic challenges as well as some recent Arabic NLP contributions, such as stemming, corpora, and classifiers.

Table 3: The tagset of the Stanford Arabic model tagger (Arabic examples omitted).

No.  Tag         Meaning
1    DTJJ        DT + Adjective
2    DTJJR       DT + Adjective, comparative
3    DTNN        DT + Noun, singular or mass
4    DTNNP       DT + Proper noun, singular
5    DTNNS       DT + Noun, plural
6    IN          Preposition or subordinating conjunction
7    JJ          Adjective
8    JJR         Adjective, comparative
9    NN          Noun, singular or mass
10   NNP         Proper noun, singular
11   NNS         Noun, plural
12   NOUN_QUANT  Noun, quantity
13   CC          Coordinating conjunction
14   CD          Cardinal number
15   DT          Demonstrative pronouns
16   PRP         Personal pronoun
17   PRP$        Possessive pronoun
18   RB          Adverb
19   RP          Particle
20   VB          Verb, the imperative form
21   VBD         Verb, past tense
22   VBG         Verb, gerund or present participle
23   VBN         Verb, past participle
24   VBP         Verb, non-3rd person singular present
25   VN          Verb, 3rd person singular present
26   WP          Wh-pronoun
27   WRB         Wh-adverb
28   ADJ_NUM     Adjective, numeric
29   UH          Interjection, unusual kind of word

Table 4: The experimental results.

Measure                                                                          Total
The number of imperative verbs in the Holy Quran                                 1,848 verbs
The number of imperative verbs after removing duplicates                         741 verbs
The number of words tagged as imperative verbs (i.e., VB)                        282 words
The correctly tagged imperative verbs, found by comparing with the correct list  54 words
The rest, i.e., the wrongly classified verbs (classified outside the VB tag)     687 verbs (= 741 − 54)

The Arabic word lists that accompany the last two rows are omitted. The list of wrongly classified verbs intentionally contains words that have the same root “[Arabic] ≡ enter”, which gives an indication of the richness and the derivative nature of the Arabic language.

Accuracy = (correctly tagged imperative verbs) / (all imperative verbs) = 54/741 = 7.28%

Figure 2: A part of the Quranic testing set.

Figure 3: A part of the Stanford tagger output.

Figure 4: The running command of the Stanford tagger.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Palestine Polytechnic University (PPU) and the Palestine Technical University–Kadoorie for their support to conduct this research.


References

[1] K. Toutanova and C. D. Manning, “Enriching the knowledge sources used in a maximum entropy part-of-speech tagger,” in Proceedings of the 2000 Joint SIGDAT Conference, pp. 63–70, Hong Kong, October 2000.

[2] “The Quran Imperative Verbs,” http://jamharah.net/showthread.php?p=51814.

[3] I. Zeroual, A. Lakhouaja, and R. Belahbib, “Towards a standard Part of Speech tagset for the Arabic language,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 171–178, 2017.

[4] “Stanford tagger,” https://nlp.stanford.edu/software/tagger.shtml.

[5] D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, “Parsing arabic dialects,” in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2006, pp. 369–376, Italy, April 2006.

[6] D. E. M. A. Abuzeina and M. H. Alsaheb, “Capturing the Common Syntactical Rules for the Holy Quran: A Data Mining Approach,” in Proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013, pp. 670–680, Saudi Arabia, December 2013.

[7] D. AbuZeina, W. Al-Khatib, M. Elshafei, and H. Al-Muhtaseb, “Toward enhanced Arabic speech recognition using part of speech tagging,” International Journal of Speech Technology, vol. 14, no. 4, pp. 419–426, 2011.

[8] B. Farber, D. Freitag, N. Habash, and O. Rambow, “Improving NER in Arabic using a morphological tagger,” in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, pp. 2509–2514, Morocco, May 2008.

[9] A. Shahrour et al., “Camelparser: A system for arabic syntactic analysis and morphological disambiguation,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.

[10] D. Gildea and D. Jurafsky, “Automatic labeling of semantic roles,” Computational Linguistics, vol. 28, no. 3, pp. 245–288, 2002.

[11] J. R. Bellegarda, Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis, vol. 719, 006, 6, 2014, US Patent No. 8719006.

[12] R. Beutler, Improving speech recognition through linguistic knowledge, Diss., ETH Zurich, 2007.

[13] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, “Open information extraction from the web,” Communications of the ACM, vol. 51, no. 12, pp. 68–74, 2008.

[14] A. Z. Arifin, M. Z. Abdullah, A. W. Rosyadi, D. I. Ulumi, A. Wahib, and R. W. Sholikah, “Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 16, no. 2, p. 843, 2018.

[15] E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, “Sentiment Analysis Is a Big Suitcase,” IEEE Intelligent Systems, vol. 32, no. 6, pp. 74–80, 2017.

[16] A. Shahrour, S. Khalifa, and N. Habash, “Improving Arabic diacritization through syntactic analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1309–1315, Portugal, September 2015.

[17] N. Ibrahim and F. Khamayseh, A Semi-Automated Generation of Activity Diagrams from Arabic User Requirements, 2015.

[18] Q. Zhonghua and Y. Liu, “Sentence Dependency Tagging in Online Question Answering Forums,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 554–562, Jeju, Republic of Korea, 2012.

[19] P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 48–54, Edmonton, Canada, May 2003.

[20] A. S. Hussein, “A plagiarism detection system for Arabic documents,” Advances in Intelligent Systems and Computing, vol. 323, pp. 541–552, 2015.

[21] M. Nabil, A. F. Atiya, and M. Aly, “New approaches for extracting Arabic keyphrases,” in Proceedings of the 1st International Conference on Arabic Computational Linguistics, ACLing 2015, pp. 133–137, Egypt, April 2015.

[22] A. Al-Arfaj and A. Al-Salman, “Arabic NLP tools for ontology construction from Arabic text: An overview,” in Proceedings of the 1st International Conference on Electrical and Information Technologies, ICEIT 2015, pp. 246–251, Morocco, March 2015.

[23] M. Al-Mashhadani and N. Omar, “Extraction of arabic nested noun compounds based on a hybrid method of linguistic approach and statistical methods,” Journal of Theoretical and Applied Information Technology, vol. 76, no. 3, pp. 408–416, 2015.

[24] “Quran Printing Complex,” https://www.qurancomplex.org.

[25] “The Stanford natural language processing group,” https://nlp.stanford.edu/software/tagger.shtml.

[26] I. Zeroual and L. Abdelhak, “Adapting a decision tree based tagger for Arabic,” in Proceedings of the International Conference on Information Technology for Organizations Development, IT4OD 2016, Morocco, April 2016.

[27] M. Diab, K. Hacioglu, and D. Jurafsky, “Automatic tagging of Arabic text,” in Proceedings of the HLT-NAACL 2004: Short Papers, pp. 149–152, Boston, Massachusetts, May 2004.

[28] A. Mohammed et al., “Probabilistic arabic part of speech tagger with unknown words handling,” Journal of Theoretical & Applied Information Technology, 2016.

[29] R. Alharbi et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2018.

[30] I. Zeroual, M. Boudchiche, A. Mazroui, and A. Lakhouaja, “Developing and performance evaluation of a new Arabic heavy/light stemmer,” in Proceedings of the 2nd International Conference, pp. 1–6, Tetouan, Morocco, March 2017.

[31] M. Abdulkareem and S. Tiun, “Comparative analysis of ML POS on Arabic tweets,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 2, pp. 403–411, 2017.

[32] A. H. Aliwy, “Combining POS taggers in master-slaves technique for highly inflected languages as Arabic,” in Proceedings of the 2015 1st International Conference on Cognitive Computing and Information Processing, CCIP 2015, India, March 2015.

[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2000.

[34] E. Mohamed, “Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 17, no. 3, pp. 1–13, 2018.

[35] A. Al-Thubaity, A. Alqarni, and A. Alnafessah, “Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?” in Proceedings of the 2nd International Conference, pp. 155–161, Lakeland, FL, USA, April 2018.

[36] S. Ramakrishnan et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2012.

[37] M. Zampieri, S. Malmasi, N. Ljubesic, et al., “Findings of the VarDial Evaluation Campaign 2017,” in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1–15, Valencia, Spain, April 2017.

[38] Y. Belinkov et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, 2018, arXiv preprint arXiv:1801.07772.

[39] F. S. Al-Anzi and D. Abuzeina, “Stemming impact on Arabic text categorization performance: A survey,” in Proceedings of the 5th International Conference on Information and Communication Technology and Accessibility, ICTA 2015, Morocco, December 2015.

[40] F. S. Al-Anzi and D. AbuZeina, “Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 189–195, 2017.


a large annotated corpus, the Quranic imperative verbs were chosen in the evaluation process. The Stanford tagger uses the label VB to mark the imperative verbs. That is, this work is restricted to a testing dataset that contains a list of all imperative verbs in the Holy Quran, obtained from [2]. This work is distinguished by presenting an experimental study of tagging performance for classical Arabic using one of the freely available taggers, which makes the results easy to compare against. This work also aims to examine the tagging problem from different points of view, such as the benefits and challenges of Arabic PoS tagging, tagset capacities, tagging algorithms, and the recent studies in this field.

In spite of the importance of taggers’ performance for both classical Arabic and MSA, few studies have explored the accuracy for classical Arabic. Instead, most of the previous studies focused on the tagsets and the tagging approaches. For instance, one study [3] proposed an Arabic tagset with detailed hierarchical levels of the categories and their relationships (i.e., a tree of different levels). As indicated, this study focuses on the imperative verbs in the Holy Quran. The reason for choosing the imperative is that such annotated testing collections are easier to find, thanks to the previous efforts of Arabic scholars to serve Quranic studies. In addition, the Arabic language has a stand-alone form for imperative verbs, whereas in English the imperative shares its form with the present tense. For instance, English uses the verb “go” as both an imperative and a present verb, while in Arabic the imperative is “اذهب” and the present is “يذهب”, which are different words in terms of transcription and tense.

Even though the documentation of the Stanford tagger [4] indicates that the accuracy of the Arabic model is 96.26% on an MSA test portion, as described in [5], and 80.14% for unknown words, our measurements show a far lower accuracy. In this work, the Stanford tagger scored only 7.28% accuracy on a collection of Arabic imperative verbs. It is worth noting that the Stanford tagger works at the word level (i.e., the tag is given to the whole word rather than to its parts, such as prefixes, stems, and suffixes, as some other taggers do). Although diacritics play an important role in the tagging process, they are discarded in this work, since the Stanford tagger does not consider the diacritics of the input text. However, we do keep the Hamza (e.g., أ) and the Madd (آ) symbols on the corresponding characters. That is, the testing dataset is a nonvocalized Arabic text. The output of this work highlights the importance of reinvestigating the tagging problem for the Arabic language, since many previous studies report accuracies in the nineties (percent). Such a reinvestigation should cover different aspects, including whether the training data are classical or MSA, the tagsets, the corpora sizes, etc.
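The preprocessing described above (discarding diacritics while keeping the Hamza and Madd forms) can be sketched in Python as follows. This is a minimal illustration, not the code used in the study: the exact set of code points treated as diacritics (U+064B to U+0652, plus the superscript alef U+0670) is an assumption, and the function name `strip_diacritics` is hypothetical.

```python
import re

# Arabic tashkeel marks: fathatan (U+064B) through sukun (U+0652), plus the
# superscript alef (U+0670). This range is an assumption about what counts
# as a "diacritic"; the paper does not list the exact code points.
_TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove tashkeel marks; letters carrying Hamza or Madd are left as-is."""
    return _TASHKEEL.sub("", text)


if __name__ == "__main__":
    # A fully vocalized three-letter word is reduced to its bare letters.
    print(strip_diacritics("\u0630\u064E\u0647\u064E\u0628\u064E"))
    # Alef with Madd (U+0622) and alef with Hamza (U+0623) are preserved.
    print(strip_diacritics("\u0622\u0645\u064E\u0646\u064E \u0623\u064E\u0645\u064E\u0631\u064E"))
```

Because the Hamza and Madd letters are separate code points outside the stripped range, they pass through unchanged, which matches the nonvocalized form of the testing set.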

The rest of this paper is organized as follows. In the next section, we demonstrate the benefits of tagging for various NLP applications. In Section 3, we present why tagging is a challenging task. We review the literature in Section 4, followed by the Stanford tagset in Section 5. The proposed method is described in Section 6 and the experimental results in Section 7. Finally, we conclude in Section 8.

2. The Benefits of Tagging

PoS tagging is at the core of many NLP algorithms due to the useful information it gives about a word and its neighbors. In fact, NLP applications employ the output of PoS tagging for different purposes, such as checking the correctness of the syntactic structure around a word. For instance, in the Arabic language adjectives are preceded by nouns, while in the English language nouns are preceded by adjectives, such as “A beautiful school ↔ مدرسة جميلة”. Similarly, verbs are preceded by nouns in English, such as “He runs fast”, while Arabic allows both directions, such as “يكتب المعلم الدرس ↔ the teacher writes the lesson” and “المعلم يكتب الدرس ↔ the teacher writes the lesson”. Accordingly, the Google translator gives the same translation for two Arabic sentences with different word orders. Hence, knowing the syntax of word order is extremely important for some NLP applications, since it limits the output candidates and increases the probability of correct answers. The following are some of the NLP applications that utilize PoS tagging:

(i) Capturing common syntactical rules: [6] presents a data-mining-based method to extract the common syntactical rules in the Holy Quran. The study reported that one common relationship between the words’ tags (i.e., a common rule) is tag1=RP, tag2=NN, tag3=WP (91) ⟹ tag4=VBD (90), with an accuracy of 0.97912. For more information on the tags, the reader is referred to Section 5 of this paper.

(ii) Enhancing the performance of speech recognition: [7] employs PoS tags to generate new words based on the tags of neighboring words. The study formed compound words from nouns that are followed by adjectives and from prepositions followed by any word. After recognition, the compound words were split back into their original state (i.e., two parts). This method shows a performance enhancement.

(iii) Named Entity Recognition (NER): [8] employs a tagger for named entity recognition. NER aims at extracting names, such as people, organizations, locations, cities, or companies. NER is beneficial for certain applications, such as classifying content for news providers, which facilitates categorization and content discovery. NER also speeds up the search process in sizeable data that contains, for instance, millions of articles. Other applications include content recommendation, customer support, and research papers.

(iv) Syntactic parsing: [9] employs PoS tagging for syntactic parsing. Syntactic parsing is a process that confirms that the input sentence follows the language’s formal grammar. Figure 1 shows a parsing tree for a simple sentence. The parsing tree represents the syntactic structure of the text and is mainly used for analyzing the input sentence.

(v) Other PoS-tagging-based applications include semantic role labeling [10], speech synthesis [11], speech recognition [12], information extraction [13], summarization [14], sentiment analysis, also called opinion mining [15], diacritization [16], software engineering [17], question answering [18], translation [19], plagiarism detection [20], key phrase extraction [21], ontology [22], and extracting Arabic noun compounds [23].

Figure 1: An example of a parsing tree. [Tree for the words “the cat ate mouse”, with the node labels sentence, noun phrase, verb phrase, article, noun, and verb.]

3. The Challenge of Tagging

The fact that a word can take different tags makes PoS tagging a challenging task. That is, a word can be labeled with different tags depending on the context. Therefore, the goal of PoS tagging algorithms is to remove such ambiguity and label the words correctly. Table 1 shows some examples of words that take different tags based on the context. As shown in the table, the word “gold ↔ ذهب” in sentence 1 is tagged as VBD (verb, past tense), while it is tagged as NN (noun, singular or mass) in sentence 2. Similarly, the word “Said ↔ سعيد” in the first sentence is tagged as NNP (proper noun, singular), while it is tagged as JJ (adjective) in sentence 3. This shows how a particular word can have different labels, which is the challenge of the PoS tagging process. Hence, the problem of PoS tagging is to resolve ambiguities by choosing the proper tag in light of the surrounding words. Of course, the absence of diacritics in the formal Arabic writing system adds even more ambiguity. For instance, there is no ambiguity in knowing that the diacritized word “gold ↔ ذَهَبٌ” is a noun and the diacritized word “went ↔ ذَهَبَ” is a verb. The table also shows the tagging output for the translated sentences using the English model of the Stanford tagger.
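To make the ambiguity concrete, the following minimal Python sketch collects, for each word, the set of tags it receives across a tagged corpus and reports the words with more than one tag. It is illustrative only: the function name `ambiguous_words` is hypothetical, and the toy input lists just the two ambiguous words discussed above with the tags the text attributes to them, not full sentences.

```python
from collections import defaultdict


def ambiguous_words(tagged_sentences):
    """Return {word: set_of_tags} for words that occur with more than one tag."""
    tags_by_word = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            tags_by_word[word].add(tag)
    return {word: tags for word, tags in tags_by_word.items() if len(tags) > 1}


if __name__ == "__main__":
    # Toy fragments (assumed for illustration): only the ambiguous words of
    # the three example sentences are listed, with the tags named in the text.
    corpus = [
        [("ذهب", "VBD"), ("سعيد", "NNP")],  # sentence 1
        [("ذهب", "NN")],                    # sentence 2
        [("سعيد", "JJ")],                   # sentence 3
    ]
    for word, tags in ambiguous_words(corpus).items():
        print(word, sorted(tags))
```

A tagger has to pick one tag per occurrence from such a set, using the surrounding words as evidence.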

4. Literature Review

Despite the importance of PoS tagging for both MSA and classical Arabic, most of the previous tagging studies have mainly focused on MSA. In addition, the literature shows that there is active research on suitable tagsets that truly reflect the linguistic items of Arabic as one of the morphologically rich languages. In this review, we present up-to-date Arabic tagging research, which has focused on the main aspects and components, such as the type of the training text (i.e., MSA, classical, tweets), tagsets, tagging algorithms, unknown words, and stemming. In [30], the study indicated that stemming (i.e., removing prefixes and postfixes, or suffixes) enhances the tagging performance. In [31], the study presented a method to tag tweets, which are usually written outside the formal and proper spelling of the language. In [28], the study considered a method to handle the “unknown words”, which are the words that did not appear in the training corpus. In [26], the study considered the problem that arises when estimating the transition probabilities from limited amounts of training data. The study proposed a decision-tree-based method to handle this problem, which generally occurs in the hidden Markov model (HMM) tagging technique. In [32], the study implemented the master-slave technique for PoS tagging, using an HMM as the master tagger and maximum match (MM) and Brill taggers as slaves. There are many approaches to PoS tagging; the most widely used is the statistical approach based on the HMM. Another approach is nonstatistical and relies on a number of hand-crafted disambiguation rules to find the most appropriate tag for each word, as in [33].

The recent studies of part of speech tagging cover different aspects. For instance, [34] developed a part of speech tagger for the Arabic heritage. They scored an accuracy of 96.22%. They also reported that most of the tagging errors result from segmentation. Reference [35] employs part of speech tagging to enhance the performance of Arabic text classification. Reference [36] demonstrates part of speech tagging for the Arabic Gulf dialect. For the tagging process, they employ a Support Vector Machine (SVM) classifier and a bidirectional Long Short-Term Memory (Bi-LSTM) network. Reference [37] presents a tagging-based study regarding the identification of Arabic dialects. Reference [38] uses part of speech and semantic tagging to extract features for training Neural Machine Translation.

Table 2 presents some information regarding tagging systems, such as tagging algorithms, tagsets, corpora, and accuracies. We are aware that the accuracies are not directly comparable, since each work has its own corpus; nevertheless, reporting these measures might give an indication of the overall accuracy of Arabic PoS tagging. Similarly, even though the tagset size is important, it is more important to have a training set large enough to cover the tags used; otherwise, zero values might be assigned to the HMM transition probabilities, which raises a tagging problem.
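The zero-probability problem mentioned above can be illustrated with a short Python sketch that estimates HMM tag-transition probabilities from tag sequences. It uses add-alpha (Laplace) smoothing, a standard remedy chosen here for illustration; it is not the decision-tree method of [26] nor a technique used in this study. The function name and the toy tag sequences are assumptions.

```python
from collections import Counter


def transition_probabilities(tag_sequences, tagset, alpha=1.0):
    """Estimate P(next_tag | tag) with add-alpha smoothing.

    With alpha > 0, every transition between tags in `tagset` receives a
    nonzero probability, even if it never occurs in the training sequences.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive to avoid zero probabilities")
    bigrams = Counter()   # counts of (tag, next_tag) pairs
    outgoing = Counter()  # counts of tag occurrences that have a successor
    for tags in tag_sequences:
        for prev, nxt in zip(tags, tags[1:]):
            bigrams[(prev, nxt)] += 1
            outgoing[prev] += 1
    size = len(tagset)
    return {
        (prev, nxt): (bigrams[(prev, nxt)] + alpha) / (outgoing[prev] + alpha * size)
        for prev in tagset
        for nxt in tagset
    }


if __name__ == "__main__":
    # Toy training data (assumed): two short tag sequences.
    sequences = [["VBD", "NNP", "IN", "DTNN"], ["NN", "JJ", "VBP"]]
    tagset = ["VBD", "NNP", "IN", "DTNN", "NN", "JJ", "VBP"]
    probs = transition_probabilities(sequences, tagset)
    print(probs[("VBD", "NNP")])  # seen transition
    print(probs[("VBD", "JJ")])   # unseen transition, still nonzero
```

The larger the tagset relative to the training data, the more transitions are unseen, which is why a tagset of 110 tags needs far more training text than one of 24 tags to give reliable estimates.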

5. Stanford Tagset

As indicated in the literature review, many tagsets have been used in previous studies. The tags are mainly divided into two classes (i.e., categories): the closed class and the open class. The closed class has a fixed membership, such as prepositions, while the open class can accept new words, especially in the technology fields, such as “to fax”. Table 3 shows the 29 tags of the Arabic model of the Stanford tagger.

Table 1: Some Stanford-tagged Arabic and English sentences. [The Arabic input sentences are not reproduced; the Arabic-model tags are listed in reading order.]

Sentence 1
  Translation using the Google translator: Said went to school
  Stanford tagger output (Arabic model): VBD NNP IN DTNN
  Stanford tagger output (English model): Said/NNP went/VBD to/TO school/NN

Sentence 2
  Translation using the Google translator: The winner got a gold necklace
  Stanford tagger output (Arabic model): VBD DTNN IN NN IN NN
  Stanford tagger output (English model): The/DT winner/NN got/VBD a/DT gold/JJ necklace/NN

Sentence 3
  Translation using the Google translator: Happy day we wish you
  Stanford tagger output (Arabic model): NN JJ VBP VBG
  Stanford tagger output (English model): Happy/JJ day/NN we/PRP wish/VBP you/PRP

Algorithm 1: Evaluating imperative verbs using the Stanford tagger.

The proposed algorithm:
1. Obtain the text of the Holy Quran from [24] and remove the diacritics.
2. Install the full version of the Stanford Arabic model tagger from [25].
3. Have the text of the Holy Quran tagged.
4. Obtain a list of all imperative verbs in the Holy Quran from [2].
5. Find all words that have the tag VB ≡ imperative verb.
6. Compare the two lists, the one obtained in step 5 and the one obtained in step 4, to find the correctly tagged imperative verbs.
7. Find the accuracy based on the information obtained in step 6.

Table 2: Some of the literature tagging research.

No.  Ref.  Tagging method                   Tagset size (tags)  Corpus size (words)  Accuracy (%)
1    [26]  A decision tree based tagger     110                 78K & 500K           91.65 → 97.18
2    [27]  Support Vector Machines (SVM)    24                  140K                 95.49
3    [28]  Hidden Markov Models             24                  29300                95.0 → 97.1
4    [29]  SVM and a Neural network         21                  6844                 91.0
5    [1]   Maximum Entropy based tagger     29                  588244               96.1

6. The Proposed Method

This section presents the steps that we followed to measure the performance of the Stanford tagger on the Quranic imperative verbs. The first step is the tagging process, which produces an annotated text file of all the Quranic sentences. We then used a number of Python programs to extract the correctly tagged imperative verbs, the wrongly tagged imperative verbs, and so on. The textual version of the Holy Quran was obtained from the website of the Quran Printing Complex, Saudi Arabia [24]. Algorithm 1 summarizes the implemented steps.
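The comparison at the heart of these steps (collect the words tagged VB, intersect them with the reference list of imperative verbs, and divide by the size of the reference list) might look like the following Python sketch. It is not the authors' program: the function names are hypothetical, the `word/TAG` token format and its separator are assumptions about the tagger's output file, and the toy words are transliterated placeholders.

```python
def parse_tagged(text, sep="/"):
    """Split tagger output into (word, tag) pairs.

    `sep` is the character between a word and its tag; "/" is assumed here
    and should be changed to match the actual output file.
    """
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if found and word:  # skip tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def evaluate_imperatives(tagged_text, gold_imperatives, sep="/"):
    """Return (words tagged VB, correctly tagged words, accuracy).

    Accuracy is the number of distinct reference verbs that the tagger
    labeled VB, divided by the number of distinct reference verbs.
    """
    gold = set(gold_imperatives)
    predicted = {word for word, tag in parse_tagged(tagged_text, sep) if tag == "VB"}
    correct = predicted & gold
    accuracy = len(correct) / len(gold) if gold else 0.0
    return predicted, correct, accuracy


if __name__ == "__main__":
    # Toy data (assumed): transliterated stand-ins for tagged Quranic words.
    tagged = "qul/VB kataba/VBD udkhul/NN idhhab/VB huwa/VB"
    gold = ["qul", "udkhul", "idhhab", "uktub"]
    predicted, correct, accuracy = evaluate_imperatives(tagged, gold)
    print(sorted(predicted), sorted(correct), accuracy)
```

Working with sets means that repeated occurrences of the same verb are counted once, which corresponds to evaluating against the list of imperative verbs after duplicates are removed.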

The input testing set is the nondiacritized textual form of the Holy Quran. Figure 2 shows what the testing set looks like. The figure contains the first chapter, or Surah, of the Holy Quran (Surat al-Fatihah, The Opening) in addition to the first three sentences of the second chapter (Surat Al-Baqarah, The Cow). Figure 3 shows the output of the Stanford tagger for the Quranic sentences that appear in Figure 2. As can be observed, Figure 3 shows some correctly tagged words, with the tags DTJJ, VBD, WP, and VBP, as well as some wrongly tagged words, which received the tags NNP and VBD.

The tagger output shown in Figure 3 is the main content that can be used for further analysis of the tagger’s behavior. Of course, measuring the accuracy requires the correct labels of the testing words, which adds more difficulty to this kind of research. In other words, if we want to measure the accuracy for the “entire” Holy Quran, we have to prepare an annotated version of the Holy Quran, which is a difficult task. This is why we chose a subset that contains only the imperative verbs.

7. The Experimental Results

For the evaluation, we used the full Stanford tagger (129 MB), which is freely available on the website of the Stanford Natural Language Processing Group [25]. The tagger is relatively simple to execute by running the command shown in Figure 4 in the Windows Command Prompt. That is, the tagger does not require special systems; we ran it from the Command Prompt of the Windows 10 Home operating system. The figure shows that 77,749 words are tagged in a very short time.
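For readers who prefer to script the run, the sketch below assembles a typical command line for the tagger's `edu.stanford.nlp.tagger.maxent.MaxentTagger` class with its `-model` and `-textFile` options and executes it from Python. The exact command of Figure 4 is not known here, so the jar name, model path, input and output file names, and memory setting are all hypothetical placeholders to be adapted to the local installation.

```python
import subprocess


def build_tagger_command(classpath, model_path, text_file, memory="2g"):
    """Build the argument list for one run of the Stanford tagger."""
    return [
        "java",
        f"-mx{memory}",  # JVM heap size
        "-classpath", classpath,
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", model_path,
        "-textFile", text_file,
    ]


def run_tagger(classpath, model_path, text_file, output_file):
    """Run the tagger and write its standard output to `output_file`."""
    command = build_tagger_command(classpath, model_path, text_file)
    with open(output_file, "w", encoding="utf-8") as out:
        subprocess.run(command, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical paths; printing the command avoids requiring Java here.
    print(" ".join(build_tagger_command(
        "stanford-postagger.jar", "models/arabic.tagger", "quran.txt")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths that contain spaces on Windows.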

The experimental results are presented in Table 4. The table reports the information regarding the imperative verbs; however, this work can be expanded to measure the performance for other tags, such as nouns or other verb forms. Similarly, the same steps can be followed to find the accuracy of the Stanford tagger for the prepositions in the Holy Quran, or the overall accuracy for all tags. Finally, exploring the performance of the Stanford tagger, as well as of other taggers, will help uncover more weaknesses to be avoided in future NLP systems.
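As a worked check of the headline figure, the short Python sketch below recomputes the accuracy from the counts reported in Table 4 (54 correctly tagged verbs out of 741 distinct imperative verbs). The helper name `accuracy_percent` is hypothetical; the only inputs are the two counts from the table.

```python
import math


def accuracy_percent(correct, total):
    """Accuracy as a percentage: correctly tagged items over all items."""
    return 100.0 * correct / total


if __name__ == "__main__":
    correct, total = 54, 741  # counts reported in Table 4
    value = accuracy_percent(correct, total)
    print(f"accuracy = {value:.4f}%")
    # Truncating (not rounding) to two decimals gives the reported 7.28.
    print(math.floor(value * 100) / 100)
    # The remaining verbs are the wrongly classified ones.
    print(total - correct)
```

The ratio is slightly above 7.287%, so the reported 7.28% corresponds to truncation to two decimal places.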

8. Conclusions

This work explored the performance of the Stanford tagger for the Arabic language. The experimental results show the importance of distinguishing between types of training data when preparing taggers. That is, a tagger that is prepared for poetry is different from a tagger that is prepared for prose.

Table 3: The tagset of the Stanford Arabic model tagger. [The Arabic examples of each tag are not reproduced.]

No.  Tag         Meaning
1    DTJJ        DT + Adjective
2    DTJJR       DT + Adjective, comparative
3    DTNN        DT + Noun, singular or mass
4    DTNNP       DT + Proper noun, singular
5    DTNNS       DT + Noun, plural
6    IN          Preposition or subordinating conjunction
7    JJ          Adjective
8    JJR         Adjective, comparative
9    NN          Noun, singular or mass
10   NNP         Proper noun, singular
11   NNS         Noun, plural
12   NOUN_QUANT  Noun, quantity
13   CC          Coordinating conjunction
14   CD          Cardinal number
15   DT          Demonstrative pronouns
16   PRP         Personal pronoun
17   PRP$        Possessive pronoun
18   RB          Adverb
19   RP          Particle
20   VB          Verb, the imperative form
21   VBD         Verb, past tense
22   VBG         Verb, gerund or present participle
23   VBN         Verb, past participle
24   VBP         Verb, non-3rd person singular present
25   VN          Verb, 3rd person singular present
26   WP          Wh-pronoun
27   WRB         Wh-adverb
28   ADJ_NUM     Adjective, numeric
29   UH          Interjection, unusual kind of word

Table 4: The experimental results. [The Arabic word lists of the original table are not reproduced.]

Measure: The number of imperative verbs in the Holy Quran.
Total: 1,848 verbs

Measure: The number of imperative verbs after removing duplicates.
Total: 741 verbs

Measure: The number of words tagged as imperative verbs (i.e., VB).
Total: 282 words

Measure: The correctly tagged imperative verbs, found by comparing with the correct list. This leads to the correctly tagged list.
Total: 54 words

Measure: The rest are the wrongly classified verbs (i.e., classified outside the VB tag). The examples listed for this group intentionally contain words that have the same root “دخل ≡ enter”. This gives an indication of the richness and the derivative nature of the Arabic language.
Total: 687

Accuracy = (Correctly tagged imperative verbs) / (All imperative verbs)
Accuracy = 54 / 741 = 7.28%

Figure 2: A part of the Quranic testing set.

Figure 3: A part of the Stanford tagger output.

Figure 4: The running command of the Stanford tagger.

Similarly, a tagger used for old text is different from one that is prepared for MSA. Tweets are also different from MSA. This is the main observation of this study, as the performance of the MSA-based tagger sharply declines for classical text. The study also shows the differences between the tagsets in the literature, which calls for further work towards a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires an extensive double check of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM-based taggers. As future work, it might be good to combine hand-crafted rules with statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words contain parts with different tags, such as a preposition and a noun, as in the word “بالمدرسة ≡ at school”. Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which demands continuous effort towards optimal NLP systems. We also refer the reader to [39, 40], which offer a thorough discussion of the Arabic challenges as well as some recent Arabic NLP contributions, such as stemming, corpora, and classifiers.


[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016

[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018

[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017

[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017

[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015

[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000

[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018

[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018

10 Advances in Fuzzy Systems

[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012

[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017

[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772

[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015

[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017


Figure 1: An example of a parsing tree (nodes: sentence, noun phrase, verb phrase, article, noun, verb; words: the, cat, ate, mouse).

syntactic structure of the text and is mainly used for analyzing the input sentence.

(v) Other PoS tagging based applications include semantic role labeling [10], speech synthesis [11], speech recognition [12], information extraction [13], summarization [14], sentiment analysis, also called opinion mining [15], diacritization [16], software engineering [17], question answering [18], translation [19], plagiarism detection [20], key phrase extraction [21], ontology construction [22], and extraction of Arabic noun compounds [23].

3. The Challenge of Tagging

The fact that a word can take different tags makes PoS tagging a challenging task. That is, a word can be labeled with different tags depending on the context. Therefore, the goal of PoS tagging algorithms is to remove such ambiguity and label the words correctly. Table 1 shows some examples of words that take different tags based on the context. As shown in the table, the word “gold ↔ ذهب” in sentence 1 is tagged as VBD (verb, past tense), while it is tagged as NN (noun, singular or mass) in sentence 2. Similarly, the word “Said ↔ سعيد” in the first sentence is tagged as NNP (proper noun, singular), while it is tagged as JJ (adjective) in sentence 3. This shows how a particular word can have different labels, which is the challenge of the PoS tagging process. Hence, the problem of PoS tagging is to resolve ambiguities by choosing the proper tag in light of the surrounding words. Of course, the absence of diacritics in the formal Arabic writing system adds even more ambiguity. For instance, there is no ambiguity in recognizing that the diacritized word “gold ↔ ذَهَب” is a noun and the diacritized word “went ↔ ذَهَبَ” is a verb. The table also shows the tagging output for the translated sentences using the English model of the Stanford tagger.

4. Literature Review

Despite the importance of PoS tagging for both MSA and classical Arabic, most previous tagging studies have focused mainly on MSA. In addition, the literature shows active research into suitable tagsets that truly reflect the linguistic items of Arabic as one of the morphologically rich languages. In this review, we present up-to-date Arabic tagging research, which has focused on the main aspects and components, such as the type of the training text (i.e., MSA, classical, tweets), tagsets, tagging algorithms, unknown words, and stemming. In [30], the study indicated that stemming (i.e., removing prefixes and postfixes or suffixes) enhances the tagging performance. In [31], the study presented a method to tag tweets, which are usually written outside the formal and proper spelling of the language. In [28], the study considered a method to handle the “unknown words”, which are the words that did not appear in the training corpus. In [26], the study considered the problem that arises when estimating the transition probabilities from limited amounts of training data. The study proposed a decision tree based method to handle this problem, which generally occurs in the hidden Markov model (HMM) tagging technique. In [32], the study implemented the master-slave technique for PoS tagging; it used an HMM as the master tagger and maximum match (MM) and Brill taggers as slaves. There are many approaches to PoS tagging; the most widely used is the statistical approach based on the HMM. Another approach is nonstatistical and relies on a number of hand-crafted disambiguation rules to find the most appropriate tag for each word, as in [33].

Recent studies of part of speech tagging cover different aspects. For instance, [34] developed a part of speech tagger for the Arabic heritage and reported an accuracy of 96.22%. The authors also reported that most of the tagging errors result from segmentation. Reference [35] employs part of speech tagging to enhance the performance of Arabic text classification. Reference [36] demonstrates part of speech tagging for the Arabic Gulf dialect; for the tagging process, it employs a Support Vector Machine (SVM) classifier and a bidirectional Long Short-Term Memory (Bi-LSTM) network. Reference [37] presents a tagging based study regarding Arabic dialect identification. Reference [38] uses part of speech and semantic tagging to extract features for training Neural Machine Translation.

Table 2 presents some information regarding tagging systems, such as tagging algorithms, tagsets, corpora, and accuracies. We are aware that the accuracies are not directly comparable, since each work has its own corpus; nevertheless, reporting these measures might give an indication of the overall accuracy of Arabic PoS tagging. Similarly, although the tagset size is important, it is more important to have a training set large enough to cover the tags used; otherwise, zero values might be assigned to the HMM transition probabilities, which raises a tagging problem.
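To make the zero-probability issue concrete, the following is a minimal sketch of how HMM transition probabilities could be estimated from a tagged corpus with add-alpha (Laplace) smoothing. It is an illustration only and is not the method of any of the cited taggers; the function name `transition_probs`, the toy corpus, and the choice of `alpha` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def transition_probs(tagged_sentences, tagset, alpha=1.0):
    """Estimate P(next_tag | tag) with add-alpha smoothing.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    tagset: list of all tags the tagger may emit.
    alpha: smoothing constant (hypothetical default); alpha=0 gives raw
           relative frequencies, which may contain zeros.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            counts[prev][nxt] += 1

    probs = {}
    for prev in tagset:
        total = sum(counts[prev].values()) + alpha * len(tagset)
        probs[prev] = {
            nxt: (counts[prev][nxt] + alpha) / total for nxt in tagset
        }
    return probs


# Toy corpus (illustrative only); VB never occurs in it.
toy_corpus = [
    [("Said", "NNP"), ("went", "VBD"), ("to", "IN"), ("school", "NN")],
    [("we", "PRP"), ("wish", "VBP"), ("you", "PRP")],
]
tagset = ["NNP", "VBD", "IN", "NN", "PRP", "VBP", "VB"]
probs = transition_probs(toy_corpus, tagset)
```

With smoothing, a tag pair that never occurs in a small training set (here any transition involving VB) still receives a small nonzero probability, so a single unseen transition does not rule out an otherwise plausible tag sequence.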

5. Stanford Tagset

As indicated in the literature review, many tagsets have been used in previous studies. The tags are mainly divided into two classes (i.e., categories): the closed class and the open class. The closed class has a fixed membership, such as prepositions, while the open class can accept new words, especially in technology fields, such as “to fax”. Table 3 shows the 29 tags of the Arabic model of the Stanford tagger.

Table 1: Some Stanford-tagged Arabic and English sentences.

Input sentences (English translation using Google Translate):
Sentence 1: Said went to school
Sentence 2: The winner got a gold necklace
Sentence 3: Happy day we wish you

Stanford tagger outputs (Arabic model), tags listed in Arabic word order:
Sentence 1: VBD NNP IN DTNN
Sentence 2: VBD DTNN IN NN IN NN
Sentence 3: NN JJ VBP VBG

Stanford tagger outputs (English model):
Sentence 1: Said/NNP went/VBD to/TO school/NN
Sentence 2: The/DT winner/NN got/VBD a/DT gold/JJ necklace/NN
Sentence 3: Happy/JJ day/NN we/PRP wish/VBP you/PRP

Algorithm 1: Evaluating imperative verbs using the Stanford tagger.

The proposed algorithm:
(1) Obtain the text of the Holy Quran from [24] and remove the diacritics.
(2) Install the full version of the Stanford Arabic model tagger from [25].
(3) Have the text of the Holy Quran tagged.
(4) Obtain a list of all imperative verbs in the Holy Quran from [2].
(5) Find all words that have the tag VB ≡ imperative verb.
(6) Compare the two lists, the one obtained in step 5 and the one obtained in step 4, to find the correctly tagged imperative verbs.
(7) Find the accuracy based on the information obtained in step 6.

Table 2: Some of the tagging research in the literature.

No. | Ref. | Tagging method | Tagset size (tags) | Corpus size (words) | Accuracy (%)
1 | [26] | A decision tree based tagger | 110 | 78K & 500K | 91.65 → 97.18
2 | [27] | Support Vector Machines (SVM) | 24 | 140K | 95.49
3 | [28] | Hidden Markov Models | 24 | 29300 | 95.0 → 97.1
4 | [29] | SVM and a neural network | 21 | 6844 | 91.0
5 | [1] | Maximum Entropy based tagger | 29 | 588244 | 96.1

6. The Proposed Method

This section presents the steps that we follow to measure the performance of the Stanford tagger on the Quranic imperative verbs. The first step is the tagging process, which produces an annotated text file of all the Quranic sentences. We then used a number of Python programs to extract the correctly tagged imperative verbs, the wrongly tagged imperative verbs, and so on. The textual version of the Holy Quran is obtained from the website of the Quran Printing Complex, Saudi Arabia [24]. Algorithm 1 summarizes the implemented steps.
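The comparison and accuracy steps of Algorithm 1 (steps 5 to 7) can be sketched as follows. This is a minimal illustration, not the authors' actual programs, which are not given in the paper; the function name `evaluate_imperatives`, the returned dictionary keys, and the placeholder words are assumptions. Accuracy is computed over the unique imperative verbs in the reference list, which matches how the paper reports it.

```python
def evaluate_imperatives(gold_imperatives, tagged_words, target_tag="VB"):
    """Compare a reference list of imperative verbs with tagger output.

    gold_imperatives: iterable of unique imperative verbs (reference list).
    tagged_words: iterable of (word, tag) pairs produced by the tagger.
    target_tag: tag that denotes an imperative verb (VB in the Stanford
                Arabic tagset).
    """
    gold = set(gold_imperatives)
    predicted = {word for word, tag in tagged_words if tag == target_tag}
    correct = gold & predicted   # imperative verbs tagged as VB
    missed = gold - predicted    # imperative verbs given another tag
    accuracy = len(correct) / len(gold) if gold else 0.0
    return {
        "predicted": predicted,
        "correct": correct,
        "missed": missed,
        "accuracy": accuracy,
    }


# Placeholder tokens stand in for Arabic words (illustrative only).
gold_list = ["w1", "w2", "w3", "w4"]
tagger_output = [("w1", "VB"), ("w2", "NN"), ("w5", "VB"), ("w3", "VBD")]
result = evaluate_imperatives(gold_list, tagger_output)
```

Using sets means repeated occurrences of the same verb are counted once, so the denominator corresponds to the number of imperative verbs after removing duplicates.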

The input testing set is the nondiacritized textual form of the Holy Quran. Figure 2 shows what the testing set looks like. The figure contains the first chapter, or Surah, of the Holy Quran (Surat al-Fatihah, The Opening) in addition to the first three sentences of the second chapter (Surat Al-Baqarah, The Cow). Figure 3 shows the output of the Stanford tagger for the Quranic sentences that appear in Figure 2. As can be observed, Figure 3 contains some correctly tagged words, with tags such as DTJJ, VBD, WP, and VBP. The figure also contains some wrongly tagged words, which are labeled with tags such as NNP and VBD.
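Removing the diacritics (step 1 of Algorithm 1) could be done as in the sketch below. The paper does not describe how the authors performed this step, so this is only one possible approach: it assumes that every Unicode combining mark should be dropped, which covers the common Arabic short-vowel marks but may also remove other combining signs found in Quranic text. The function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text):
    """Drop every Unicode combining mark from the text.

    Assumption: all combining characters (for example fatha U+064E,
    damma U+064F, kasra U+0650, shadda U+0651, sukun U+0652) count as
    diacritics. Base letters are left unchanged.
    """
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# Dhal + fatha, heh + fatha, beh + fatha (a diacritized three-letter word)
diacritized = "\u0630\u064E\u0647\u064E\u0628\u064E"
plain = strip_diacritics(diacritized)
```

After stripping, only the three base letters remain, so forms that differ only in their diacritics become indistinguishable, which is the source of the ambiguity discussed in Section 3.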

The tagger output that is shown in Figure 3 is the main content that can be used for further analysis of the behavior of the tagger. Of course, the correct labels of the testing words are required in order to measure the accuracy, which adds more difficulty to this kind of research. In other words, if we want to measure the accuracy for the “entire” Holy Quran, we have to prepare an annotated version of the Holy Quran, which is a difficult task. This is why we chose a subset that contains only the imperative verbs.
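Collecting the words that carry a given tag from the tagger output (step 5 of Algorithm 1) might look like the sketch below. It assumes the output is whitespace-separated `word<sep>TAG` tokens and treats the separator as a parameter, because the exact separator depends on how the tagger is configured; the function names `parse_tagged` and `words_with_tag` are hypothetical.

```python
def parse_tagged(text, sep="/"):
    """Split tagger output into (word, tag) pairs.

    Assumption: tokens are separated by whitespace and each token has the
    form word<sep>TAG. Tokens without the separator are skipped.
    """
    pairs = []
    for token in text.split():
        # rpartition splits on the last separator, so a word that itself
        # contains the separator is still handled.
        word, found, tag = token.rpartition(sep)
        if found and word and tag:
            pairs.append((word, tag))
    return pairs


def words_with_tag(pairs, tag="VB"):
    """Return the sorted unique words labeled with the given tag."""
    return sorted({word for word, t in pairs if t == tag})


sample = "Said/NNP went/VBD to/TO school/NN"
pairs = parse_tagged(sample)
past_verbs = words_with_tag(pairs, tag="VBD")
```

The resulting pairs can be passed to a comparison step against the reference list of imperative verbs.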

7. The Experimental Results

For the evaluation, we used the full Stanford tagger (129 MB), which is freely available at the website of the Stanford natural language processing group [25]. It is relatively simple to execute the tagger by running the command shown in Figure 4 in the Windows Command Prompt. That is, the tagger does not require a special system; we ran it in the Command Prompt of the Windows 10 Home operating system. The figure shows that 77,749 words are tagged in a very short time.

The experimental results are presented in Table 4. The table reports the results for the imperative verbs only; however, this work can be extended to measure the performance for other tags, such as nouns or other verb forms. Similarly, the performance of the Stanford tagger on the prepositions in the Holy Quran can be measured by following the same steps, which yield the accuracy for prepositions or the overall accuracy for all tags. Finally, exploring the performance of the Stanford tagger, as well as of other taggers, will reveal more weak points to be avoided in future NLP systems.

8. Conclusions

This work explored the performance of the Stanford tagger for the Arabic language. The experimental results show the importance of distinguishing between types of training data when preparing taggers. That is, a tagger that is prepared for poetry is different from a tagger that is prepared for prose.

Table 3: The tagset of the Stanford Arabic model tagger.

No. | Tag | Meaning
1 | DTJJ | DT + Adjective
2 | DTJJR | DT + Adjective, comparative
3 | DTNN | DT + Noun, singular or mass
4 | DTNNP | DT + Proper noun, singular
5 | DTNNS | DT + Noun, plural
6 | IN | Preposition or subordinating conjunction
7 | JJ | Adjective
8 | JJR | Adjective, comparative
9 | NN | Noun, singular or mass
10 | NNP | Proper noun, singular
11 | NNS | Noun, plural
12 | NOUN_QUANT | Noun, quantity
13 | CC | Coordinating conjunction
14 | CD | Cardinal number
15 | DT | Demonstrative pronouns
16 | PRP | Personal pronoun
17 | PRP$ | Possessive pronoun
18 | RB | Adverb
19 | RP | Particle
20 | VB | Verb, the imperative form
21 | VBD | Verb, past tense
22 | VBG | Verb, gerund or present participle
23 | VBN | Verb, past participle
24 | VBP | Verb, non-3rd person singular present
25 | VN | Verb, 3rd person singular present
26 | WP | Wh-pronoun
27 | WRB | Wh-adverb
28 | ADJ_NUM | Adjective, numeric
29 | UH | Interjection, unusual kind of word

Table 4: The experimental results.

Measure | Total
The number of imperative verbs in the Holy Quran | 1848 verbs
The number of imperative verbs after removing duplicates | 741 verbs
The number of words tagged as imperative verbs (i.e., VB) | 282 words
The correctly tagged imperative verbs, found by comparing with the correct list | 54 words
The rest, which are the wrongly classified verbs (i.e., classified outside the VB tag). The examples listed for this row intentionally contain words that share the same root (≡ “enter”), which gives an indication of the richness and the derivative nature of the Arabic language | 687

Accuracy = (correctly tagged imperative verbs) / (all imperative verbs)
Accuracy = 54/741 = 7.28%

Figure 2: A part of the Quranic testing set.

Figure 3: A part of the Stanford tagger output.

Figure 4: The running command of the Stanford tagger.

Similarly, a tagger used for old text is different from one that is prepared for MSA. Tweets are also different from MSA. This is the main observation of this study, as the performance of the MSA based tagger declines sharply for the classical text. The study also shows the differences between the tagsets in the literature, which calls for further work toward a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires an extensive double check of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM based taggers. As future work, it might be good to combine hand-crafted rules and statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words contain different tags, such as a preposition and a noun, as in the single Arabic word that means “at school”. Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which requires continuous effort towards optimal NLP systems. It is worth pointing to [39, 40], as they provide a thorough discussion of the Arabic challenges as well as some recent Arabic NLP contributions, such as stemming, corpora, and classifiers.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Palestine Polytechnic University (PPU) and the Palestine Technical University–Kadoorie for their support to conduct this research.

References

[1] K. Toutanova and C. D. Manning, “Enriching the knowledge sources used in a maximum entropy part-of-speech tagger,” in Proceedings of the 2000 Joint SIGDAT Conference, pp. 63–70, Hong Kong, October 2000.

[2] “The Quran Imperative Verbs,” http://jamharah.net/showthread.php?p=51814.

[3] I. Zeroual, A. Lakhouaja, and R. Belahbib, “Towards a standard Part of Speech tagset for the Arabic language,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 171–178, 2017.

[4] “Stanford tagger,” https://nlp.stanford.edu/software/tagger.shtml.

[5] D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, “Parsing arabic dialects,” in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2006, pp. 369–376, Italy, April 2006.

[6] D. E. M. A. Abuzeina and M. H. Alsaheb, “Capturing the Common Syntactical Rules for the Holy Quran: A Data Mining Approach,” in Proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013, pp. 670–680, Saudi Arabia, December 2013.

[7] D. AbuZeina, W. Al-Khatib, M. Elshafei, and H. Al-Muhtaseb, “Toward enhanced Arabic speech recognition using part of speech tagging,” International Journal of Speech Technology, vol. 14, no. 4, pp. 419–426, 2011.

[8] B. Farber, D. Freitag, N. Habash, and O. Rambow, “Improving NER in Arabic using a morphological tagger,” in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, pp. 2509–2514, Morocco, May 2008.

[9] A. Shahrour et al., “Camelparser: A system for arabic syntactic analysis and morphological disambiguation,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.

[10] D. Gildea and D. Jurafsky, “Automatic labeling of semantic roles,” Computational Linguistics, vol. 28, no. 3, pp. 245–288, 2002.

[11] J. R. Bellegarda, Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis, US Patent No. 8,719,006, 2014.

[12] R. Beutler, Improving speech recognition through linguistic knowledge, Diss., ETH Zurich, 2007.

[13] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, “Open information extraction from the web,” Communications of the ACM, vol. 51, no. 12, pp. 68–74, 2008.

[14] A. Z. Arifin, M. Z. Abdullah, A. W. Rosyadi, D. I. Ulumi, A. Wahib, and R. W. Sholikah, “Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization,” TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 16, no. 2, p. 843, 2018.

[15] E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, “Sentiment Analysis Is a Big Suitcase,” IEEE Intelligent Systems, vol. 32, no. 6, pp. 74–80, 2017.

[16] A. Shahrour, S. Khalifa, and N. Habash, “Improving Arabic diacritization through syntactic analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1309–1315, Portugal, September 2015.

[17] N. Ibrahim and F. Khamayseh, A Semi-Automated Generation of Activity Diagrams from Arabic User Requirements, 2015.

[18] Q. Zhonghua and Y. Liu, “Sentence Dependency Tagging in Online Question Answering Forums,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 554–562, Jeju, Republic of Korea, 2012.

[19] P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 48–54, Edmonton, Canada, May 2003.

[20] A. S. Hussein, “A plagiarism detection system for Arabic documents,” Advances in Intelligent Systems and Computing, vol. 323, pp. 541–552, 2015.

[21] M. Nabil, A. F. Atiya, and M. Aly, “New approaches for extracting Arabic keyphrases,” in Proceedings of the 1st International Conference on Arabic Computational Linguistics, ACLing 2015, pp. 133–137, Egypt, April 2015.

[22] A. Al-Arfaj and A. Al-Salman, “Arabic NLP tools for ontology construction from Arabic text: An overview,” in Proceedings of the 1st International Conference on Electrical and Information Technologies, ICEIT 2015, pp. 246–251, Morocco, March 2015.

[23] M. Al-Mashhadani and N. Omar, “Extraction of arabic nested noun compounds based on a hybrid method of linguistic approach and statistical methods,” Journal of Theoretical and Applied Information Technology, vol. 76, no. 3, pp. 408–416, 2015.

[24] “Quran Printing Complex,” https://www.qurancomplex.org.

[25] “The Stanford natural language processing group,” https://nlp.stanford.edu/software/tagger.shtml.

[26] I. Zeroual and L. Abdelhak, “Adapting a decision tree based tagger for Arabic,” in Proceedings of the International Conference on Information Technology for Organizations Development, IT4OD 2016, Morocco, April 2016.

[27] M. Diab, K. Hacioglu, and D. Jurafsky, “Automatic tagging of Arabic text,” in Proceedings of the HLT-NAACL 2004: Short Papers, pp. 149–152, Boston, Massachusetts, May 2004.

[28] A. Mohammed et al., “Probabilistic arabic part of speech tagger with unknown words handling,” Journal of Theoretical & Applied Information Technology, 2016.

[29] R. Alharbi et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2018.

[30] I. Zeroual, M. Boudchiche, A. Mazroui, and A. Lakhouaja, “Developing and performance evaluation of a new Arabic heavy/light stemmer,” in Proceedings of the 2nd International Conference, pp. 1–6, Tetouan, Morocco, March 2017.

[31] M. Abdulkareem and S. Tiun, “Comparative analysis of ML POS on Arabic tweets,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 2, pp. 403–411, 2017.

[32] A. H. Aliwy, “Combining POS taggers in master-slaves technique for highly inflected languages as Arabic,” in Proceedings of the 2015 1st International Conference on Cognitive Computing and Information Processing, CCIP 2015, India, March 2015.

[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2000.

[34] E. Mohamed, “Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 17, no. 3, pp. 1–13, 2018.

[35] A. Al-Thubaity, A. Alqarni, and A. Alnafessah, “Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?” in Proceedings of the 2nd International Conference, pp. 155–161, Lakeland, FL, USA, April 2018.

[36] S. Ramakrishnan et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2012.

[37] M. Zampieri, S. Malmasi, N. Ljubesic et al., “Findings of the VarDial Evaluation Campaign 2017,” in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1–15, Valencia, Spain, April 2017.

[38] Y. Belinkov et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, 2018, arXiv preprint arXiv:1801.07772.

[39] F. S. Al-Anzi and D. Abuzeina, “Stemming impact on Arabic text categorization performance: A survey,” in Proceedings of the 5th International Conference on Information and Communication Technology and Accessibility, ICTA 2015, Morocco, December 2015.

[40] F. S. Al-Anzi and D. AbuZeina, “Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 189–195, 2017.


Tabl

e1So

meS

tanf

ordtagg

edAra

bica

ndEn

glish

sent

ence

s

Inpu

tsen

tenc

esan

dth

etra

nslatio

nus

ingGoo

glet

rans

lator

Sent

ence

1Se

nten

ce2

Sent

ence

313

13 $

13 amp

( )

+

( -13

Said

wen

ttoscho

olTh

ewin

nerg

otag

oldne

cklace

Hap

pyda

ywew

ishyo

uStan

ford

tagg

erou

tput

s(Ara

bicm

odel)

DTN

N13

IN N

NP

VBD

NN

IN

N

N13

$13 IN

amp

DTN

N

( ) V

BD

+VBG

V

BP(

-13 JJ

NN

Stan

ford

tagg

erou

tput

s(En

glish

mod

el)Sa

idN

NPwen

tVBD

toT

Oscho

olN

NTh

eDT

win

nerNN

gotV

BDaDT

goldJJ

neck

lace

NN

Hap

pyJJ

dayNN

wePR

Pwish

VBP

you

PRP

Advances in Fuzzy Systems 5

The proposed algorithm:
(1) Obtain the text of the Holy Quran from [24] and remove the diacritics.
(2) Install the full version of the Stanford Arabic model tagger from [25].
(3) Have the text of the Holy Quran tagged.
(4) Obtain a list of all imperative verbs in the Holy Quran from [2].
(5) Find all words that have the tag VB ≡ imperative verb.
(6) Compare the two lists, the one obtained in step 5 and the one obtained in step 4, to find the correctly tagged imperative verbs.
(7) Find the accuracy based on the information obtained in step 6.

Algorithm 1: Evaluating imperative verbs using the Stanford tagger.
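Steps 5 to 7 of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the programs used in this work: the function names, the `word/TAG` token format, and the transliterated toy tokens are all assumptions made for the example.

```python
def parse_tagged(text, sep="/"):
    """Split tagger output into (word, tag) pairs.

    Assumes whitespace-separated tokens of the form word<sep>TAG;
    the separator is an assumption and may differ per tagger setup.
    """
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if found:  # skip tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def evaluate_imperatives(tagged_text, gold_imperatives, target_tag="VB", sep="/"):
    """Steps 5-7: collect words tagged VB, intersect with the gold list,
    and divide the number of matches by the size of the gold list."""
    gold = set(gold_imperatives)
    tagged_vb = {w for w, t in parse_tagged(tagged_text, sep) if t == target_tag}
    correct = tagged_vb & gold
    accuracy = len(correct) / len(gold) if gold else 0.0
    return tagged_vb, correct, accuracy


if __name__ == "__main__":
    # Hypothetical toy data (transliterated placeholders, not real tagger output).
    toy_output = "qul/VB huwa/PRP udkhulu/VBD"
    toy_gold = ["qul", "udkhulu"]
    tagged_vb, correct, accuracy = evaluate_imperatives(toy_output, toy_gold)
    print(sorted(tagged_vb), sorted(correct), accuracy)
```

Working on sets mirrors the evaluation over the verb list after removing duplicates, so a verb that occurs many times is counted once.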

Table 2: Some of the literature tagging research.

No.  Ref.  Tagging Method                  Tagset Size (tags)  Corpus Size (words)  Accuracy (%)
1    [26]  A decision tree based tagger    110                 78K & 500K           91.65 → 97.18
2    [27]  Support Vector Machines (SVM)   24                  140K                 95.49
3    [28]  Hidden Markov Models            24                  29,300               95.0 → 97.1
4    [29]  SVM and a Neural network        21                  6,844                91.0
5    [1]   Maximum Entropy based tagger    29                  588,244              96.1

words, especially in the technology fields, such as "to fax". Table 3 shows the 29 tags of the Arabic model of the Stanford tagger.

6. The Proposed Method

This section presents the steps we followed to measure the performance of the Stanford tagger on the Quranic imperative verbs. The first step is the tagging process, which produces an annotated text file covering all the Quranic sentences. We then used a number of Python programs to extract the correctly tagged and the wrongly tagged imperative verbs. The text of the Holy Quran was obtained from the website of the Quran Printing Complex, Saudi Arabia [24]. Algorithm 1 summarizes the implemented steps.

The input testing set is the nondiacritized textual form of the Holy Quran. Figure 2 shows what the testing set looks like. The figure contains the first chapter (Surah) of the Holy Quran (Surat al-Fatihah, The Opening) in addition to the first three sentences of the second chapter (Surat Al-Baqarah, The Cow). Figure 3 shows the output of the Stanford tagger for the Quranic sentences that appear in Figure 2. Figure 3 contains some correctly tagged words, for example words tagged DTJJ, VBD, WP, and VBP. The figure also contains some wrongly tagged words, for example words tagged NNP and VBD.
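Producing the nondiacritized form can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the function name is hypothetical, and the set of removed code points (the short-vowel and related marks U+064B to U+0652, plus the superscript alef U+0670) is an assumption that may need extending for a particular Quranic text edition.

```python
import re

# Assumed diacritic set: tanween, short vowels, shadda, sukun (U+064B-U+0652)
# and the superscript alef (U+0670). Other Quranic annotation marks are not covered.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def strip_diacritics(text):
    """Return the text with the assumed Arabic diacritics removed."""
    return _DIACRITICS.sub("", text)


if __name__ == "__main__":
    # "bismi" written with diacritics: ba + kasra, sin + sukun, mim + kasra.
    diacritized = "\u0628\u0650\u0633\u0652\u0645\u0650"
    plain = strip_diacritics(diacritized)
    print(len(diacritized), len(plain))
```

An explicit code-point range is used here rather than a generic Unicode category filter so that the set of removed marks stays visible and easy to adjust.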

The tagger output shown in Figure 3 is the main content that can be used for further analysis of the behavior of the tagger. Of course, measuring the accuracy requires the correct labels of the testing words, which makes this kind of research more difficult. In other words, measuring the accuracy for the "entire" Holy Quran would require a fully annotated version of the Holy Quran, which is difficult to prepare. This is why we chose a subset that contains only the imperative verbs.

7. The Experimental Results

For the evaluation, we used the full Stanford tagger (129 MB), which is freely available on the website of the Stanford Natural Language Processing Group [25]. The tagger is relatively simple to execute: we ran the command shown in Figure 4 in the Windows Command Prompt. That is, the tagger does not require a special system, as we ran it from the Command Prompt of the Windows 10 Home operating system. The figure shows that 77,749 words were tagged in a very short time.
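As an illustration of such an invocation, the sketch below assembles a command line in Python. It follows the usage pattern documented in the tagger's README (the `MaxentTagger` class with the `-model` and `-textFile` options); the jar path, the model path, the input file name, and the heap size are assumptions for this example and are not taken from Figure 4.

```python
import shlex


def build_tagger_command(jar="stanford-postagger.jar",
                         model="models/arabic.tagger",
                         text_file="quran_plain.txt",
                         heap="-Xmx1g"):
    """Assemble a command line for the Stanford tagger.

    All default paths are hypothetical and depend on where the
    distribution was unpacked and how the input file was named.
    """
    return [
        "java", heap,
        "-cp", jar,
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", model,
        "-textFile", text_file,
    ]


if __name__ == "__main__":
    cmd = build_tagger_command()
    # Print the command for inspection; to run it, pass the list to
    # subprocess.run and redirect standard output to a file.
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids quoting problems with paths that contain spaces, which are common on Windows.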

The experimental results are presented in Table 4. The table reports the results for the imperative verbs; however, this work can be extended to measure the performance for other tags, such as noun or verb. Similarly, the same steps can be followed to measure the accuracy of the Stanford tagger for the prepositions in the Holy Quran, or the overall accuracy over all tags. Finally, exploring the performance of the Stanford tagger, as well as of other taggers, should help to uncover more weaknesses to be avoided in future NLP systems.

8. Conclusions

This work explored the performance of the Stanford tagger for the Arabic language. The experimental results show the importance of distinguishing between kinds of training data when preparing taggers. That is, a tagger prepared for poetry is different from a tagger prepared for prose.

Table 3: The tagset of the Stanford Arabic model tagger. [Arabic example words not recoverable]

No.  Tag         Meaning
1    DTJJ        DT + Adjective
2    DTJJR       DT + Adjective, comparative
3    DTNN        DT + Noun, singular or mass
4    DTNNP       DT + Proper noun, singular
5    DTNNS       DT + Noun, plural
6    IN          Preposition or subordinating conjunction
7    JJ          Adjective
8    JJR         Adjective, comparative
9    NN          Noun, singular or mass
10   NNP         Proper noun, singular
11   NNS         Noun, plural
12   NOUN_QUANT  Noun, quantity
13   CC          Coordinating conjunction
14   CD          Cardinal number
15   DT          Demonstrative pronouns
16   PRP         Personal pronoun
17   PRP$        Possessive pronoun
18   RB          Adverb
19   RP          Particle
20   VB          Verb, the imperative form
21   VBD         Verb, past tense
22   VBG         Verb, gerund or present participle
23   VBN         Verb, past participle
24   VBP         Verb, non-3rd person singular present
25   VN          Verb, 3rd person singular present
26   WP          Wh-pronoun
27   WRB         Wh-adverb
28   ADJ_NUM     Adjective, numeric
29   UH          Interjection, unusual kind of word

Table 4: The experimental results.

Measure: Total
The number of imperative verbs in the Holy Quran: 1,848 verbs
The number of imperative verbs after removing duplicates: 741 verbs
The number of words tagged as imperative verbs (i.e., VB): 282 words
The correctly tagged imperative verbs, found by comparing with the correct list: 54 words [Arabic word list not recoverable]
The rest, i.e., the wrongly classified verbs (classified out of the VB tag): 687 verbs [Arabic examples not recoverable]. This list intentionally contains words that have the same root, "[Arabic root] ≡ enter". This gives an indication of the richness and the derivative nature of the Arabic language.

$$\text{Accuracy} = \frac{\text{Correctly tagged imperative verbs}}{\text{All imperative verbs}} = \frac{54}{741} = 7.28\%$$
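The arithmetic behind Table 4 can be checked with a few lines of Python. The variable names are illustrative; the counts are the ones reported in the table.

```python
import math

# Counts reported in Table 4.
total_occurrences = 1848     # imperative verb positions in the Holy Quran
unique_imperatives = 741     # imperative verbs after removing duplicates
tagged_as_vb = 282           # words the tagger labeled VB
correctly_tagged = 54        # VB-tagged words found in the correct list

# Verbs from the correct list that did not receive the VB tag.
wrongly_tagged = unique_imperatives - correctly_tagged

# Accuracy as defined in the table: correct / all (unique) imperative verbs.
accuracy = correctly_tagged / unique_imperatives

# Truncating (not rounding) to two decimals gives the reported figure.
truncated_percent = math.floor(accuracy * 10000) / 100

print(wrongly_tagged)            # 687
print(f"{accuracy:.2%}")         # 7.29%
print(truncated_percent)         # 7.28
```

The ratio is 0.07287..., so the reported 7.28% appears to correspond to truncation at two decimals, while ordinary rounding gives 7.29%.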


Figure 2: A part of the Quranic testing set.

Figure 3: A part of the Stanford tagger output.

Figure 4: The running command of the Stanford tagger.

Similarly, a tagger used for old text is different from one that is prepared for MSA. Tweets also differ from MSA. This is the main observation of this study, as the performance of the MSA-based tagger sharply declines for classical text. The study also shows the differences between the tagsets in the literature, which calls for further work toward a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires an extensive double check of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM-based taggers. As future work, it might be good to combine hand-crafted rules with statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words carry more than one tag, such as a preposition and a noun, as in the word "[Arabic word] ≡ at school". Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which requires continuing effort towards optimal NLP systems. We refer the reader to [39, 40], which give a thorough discussion of the challenges of Arabic as well as some recent Arabic NLP contributions, such as stemming, corpora, and classifiers.
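The remark about zero transition probabilities can be made concrete with a small sketch. Additive (add-alpha) smoothing is one common remedy; it is used here only as an illustration and is not a method evaluated in this study. The function name and the toy tag sequences are hypothetical.

```python
from collections import Counter


def transition_probs(tag_sequences, tagset, alpha=1.0):
    """Estimate P(cur | prev) with additive smoothing.

    With alpha > 0, every tag pair receives a nonzero probability,
    including pairs never observed in the training sequences.
    """
    bigrams = Counter()
    prev_counts = Counter()
    for seq in tag_sequences:
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            prev_counts[prev] += 1
    size = len(tagset)
    return {
        (prev, cur): (bigrams[(prev, cur)] + alpha) / (prev_counts[prev] + alpha * size)
        for prev in tagset
        for cur in tagset
    }


if __name__ == "__main__":
    tags = ["VB", "NN", "VBD", "NNP"]
    toy_sequences = [["VBD", "NNP"], ["VB", "NN"]]  # hypothetical training data
    probs = transition_probs(toy_sequences, tags)
    # The pair (VB, VBD) was never observed, yet its probability is nonzero.
    print(probs[("VB", "NN")], probs[("VB", "VBD")])
```

The larger the tagset, the more unseen tag pairs there are for a fixed corpus size, which is why a comprehensive tagset makes this check more important.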

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Palestine Polytechnic University (PPU) and the Palestine Technical University–Kadoorie for their support to conduct this research.


References

[1] K. Toutanova and C. D. Manning, "Enriching the knowledge sources used in a maximum entropy part-of-speech tagger," in Proceedings of the 2000 Joint SIGDAT Conference, pp. 63–70, Hong Kong, October 2000.

[2] "The Quran Imperative Verbs," http://jamharah.net/showthread.php?p=51814.

[3] I. Zeroual, A. Lakhouaja, and R. Belahbib, "Towards a standard Part of Speech tagset for the Arabic language," Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 171–178, 2017.

[4] "Stanford tagger," https://nlp.stanford.edu/software/tagger.shtml.

[5] D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, "Parsing Arabic dialects," in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2006, pp. 369–376, Italy, April 2006.

[6] D. E. M. A. Abuzeina and M. H. Alsaheb, "Capturing the Common Syntactical Rules for the Holy Quran: A Data Mining Approach," in Proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013, pp. 670–680, Saudi Arabia, December 2013.

[7] D. AbuZeina, W. Al-Khatib, M. Elshafei, and H. Al-Muhtaseb, "Toward enhanced Arabic speech recognition using part of speech tagging," International Journal of Speech Technology, vol. 14, no. 4, pp. 419–426, 2011.

[8] B. Farber, D. Freitag, N. Habash, and O. Rambow, "Improving NER in Arabic using a morphological tagger," in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, pp. 2509–2514, Morocco, May 2008.

[9] A. Shahrour et al., "Camelparser: A system for Arabic syntactic analysis and morphological disambiguation," in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.

[10] D. Gildea and D. Jurafsky, "Automatic labeling of semantic roles," Computational Linguistics, vol. 28, no. 3, pp. 245–288, 2002.

[11] J. R. Bellegarda, Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis, U.S. Patent No. 8,719,006, 2014.

[12] R. Beutler, Improving speech recognition through linguistic knowledge, Diss., ETH Zurich, 2007.

[13] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, "Open information extraction from the web," Communications of the ACM, vol. 51, no. 12, pp. 68–74, 2008.

[14] A. Z. Arifin, M. Z. Abdullah, A. W. Rosyadi, D. I. Ulumi, A. Wahib, and R. W. Sholikah, "Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization," TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 16, no. 2, p. 843, 2018.

[15] E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, "Sentiment Analysis Is a Big Suitcase," IEEE Intelligent Systems, vol. 32, no. 6, pp. 74–80, 2017.

[16] A. Shahrour, S. Khalifa, and N. Habash, "Improving Arabic diacritization through syntactic analysis," in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1309–1315, Portugal, September 2015.

[17] N. Ibrahim and F. Khamayseh, A Semi-Automated Generation of Activity Diagrams from Arabic User Requirements, 2015.

[18] Q. Zhonghua and Y. Liu, "Sentence Dependency Tagging in Online Question Answering Forums," in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 554–562, Jeju, Republic of Korea, 2012.

[19] P. Koehn, F. J. Och, and D. Marcu, "Statistical phrase-based translation," in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 48–54, Edmonton, Canada, May 2003.

[20] A. S. Hussein, "A plagiarism detection system for Arabic documents," Advances in Intelligent Systems and Computing, vol. 323, pp. 541–552, 2015.

[21] M. Nabil, A. F. Atiya, and M. Aly, "New approaches for extracting Arabic keyphrases," in Proceedings of the 1st International Conference on Arabic Computational Linguistics, ACLing 2015, pp. 133–137, Egypt, April 2015.

[22] A. Al-Arfaj and A. Al-Salman, "Arabic NLP tools for ontology construction from Arabic text: An overview," in Proceedings of the 1st International Conference on Electrical and Information Technologies, ICEIT 2015, pp. 246–251, Morocco, March 2015.

[23] M. Al-Mashhadani and N. Omar, "Extraction of Arabic nested noun compounds based on a hybrid method of linguistic approach and statistical methods," Journal of Theoretical and Applied Information Technology, vol. 76, no. 3, pp. 408–416, 2015.

[24] "Quran Printing Complex," https://www.qurancomplex.org.

[25] "The Stanford natural language processing group," https://nlp.stanford.edu/software/tagger.shtml.

[26] I. Zeroual and L. Abdelhak, "Adapting a decision tree based tagger for Arabic," in Proceedings of the International Conference on Information Technology for Organizations Development, IT4OD 2016, Morocco, April 2016.

[27] M. Diab, K. Hacioglu, and D. Jurafsky, "Automatic tagging of Arabic text," in Proceedings of the HLT-NAACL 2004: Short Papers, pp. 149–152, Boston, Massachusetts, May 2004.

[28] A. Mohammed et al., "Probabilistic Arabic part of speech tagger with unknown words handling," Journal of Theoretical & Applied Information Technology, 2016.

[29] R. Alharbi et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2018.

[30] I. Zeroual, M. Boudchiche, A. Mazroui, and A. Lakhouaja, "Developing and performance evaluation of a new Arabic heavy/light stemmer," in Proceedings of the 2nd International Conference, pp. 1–6, Tetouan, Morocco, March 2017.

[31] M. Abdulkareem and S. Tiun, "Comparative analysis of ML POS on Arabic tweets," Journal of Theoretical and Applied Information Technology, vol. 95, no. 2, pp. 403–411, 2017.

[32] A. H. Aliwy, "Combining POS taggers in master-slaves technique for highly inflected languages as Arabic," in Proceedings of the 2015 1st International Conference on Cognitive Computing and Information Processing, CCIP 2015, India, March 2015.

[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2000.

[34] E. Mohamed, "Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage," ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 17, no. 3, pp. 1–13, 2018.

[35] A. Al-Thubaity, A. Alqarni, and A. Alnafessah, "Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?" in Proceedings of the 2nd International Conference, pp. 155–161, Lakeland, FL, USA, April 2018.

[36] S. Ramakrishnan et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2012.

[37] M. Zampieri, S. Malmasi, N. Ljubesic et al., "Findings of the VarDial Evaluation Campaign 2017," in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1–15, Valencia, Spain, April 2017.

[38] Y. Belinkov et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, 2018, arXiv preprint arXiv:1801.07772.

[39] F. S. Al-Anzi and D. Abuzeina, "Stemming impact on Arabic text categorization performance: A survey," in Proceedings of the 5th International Conference on Information and Communication Technology and Accessibility, ICTA 2015, Morocco, December 2015.

[40] F. S. Al-Anzi and D. AbuZeina, "Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing," Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 189–195, 2017.


[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012

[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017

[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772

[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015

[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017


Table 3: The tagset of the Stanford Arabic model tagger.

No.  Tag          Meaning
1    DTJJ         DT + Adjective
2    DTJJR        DT + Adjective, comparative
3    DTNN         DT + Noun, singular or mass
4    DTNNP        DT + Proper noun, singular
5    DTNNS        DT + Noun, plural
6    IN           Preposition or subordinating conjunction
7    JJ           Adjective
8    JJR          Adjective, comparative
9    NN           Noun, singular or mass
10   NNP          Proper noun, singular
11   NNS          Noun, plural
12   NOUN_QUANT   Noun quantity
13   CC           Coordinating conjunction
14   CD           Cardinal number
15   DT           Demonstrative pronouns
16   PRP          Personal pronoun
17   PRP$         Possessive pronoun
18   RB           Adverb
19   RP           Particle
20   VB           Verb, the imperative form
21   VBD          Verb, past tense
22   VBG          Verb, gerund or present participle
23   VBN          Verb, past participle
24   VBP          Verb, non-3rd person singular present
25   VN           Verb, 3rd person singular present
26   WP           Wh-pronoun
27   WRB          Wh-adverb
28   ADJ_NUM      Adjective, numeric
29   UH           Interjection, unusual kind of word

Table 4: The experimental results.

Measure                                                        Total
Imperative verbs in the Holy Quran                             1,848 verbs
Imperative verbs after removing duplicates                     741 verbs
Words tagged as imperative verbs (i.e., VB)                    282 words
Correctly tagged imperative verbs, found by comparing
the tagged words with the correct list                         54 words
Wrongly classified verbs (i.e., tagged outside the VB tag)     687 verbs

The list of wrongly classified verbs intentionally contains words that share the same root (meaning “enter”). This gives an indication of the richness and the derivational nature of the Arabic language.

Accuracy = (correctly tagged imperative verbs) / (all imperative verbs)
Accuracy = 54 / 741 = 7.28%
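The accuracy measure in Table 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tagged_as` and `imperative_accuracy`, the `word/TAG` token format, and the transliterated toy words are hypothetical and are not the authors' script or data; only the counts 54 and 741 come from Table 4.

```python
def tagged_as(tagged_text, tag="VB", sep="/"):
    """Return the set of distinct words carrying `tag` in a tagged text.

    Assumes whitespace-separated tokens of the form word<sep>TAG.
    The separator is an assumption; adjust it to the tagger output.
    """
    words = set()
    for token in tagged_text.split():
        word, found_sep, token_tag = token.rpartition(sep)
        if found_sep and token_tag == tag:
            words.add(word)
    return words


def imperative_accuracy(gold_verbs, tagged_text, tag="VB", sep="/"):
    """Accuracy = correctly tagged imperative verbs / all imperative verbs."""
    gold = set(gold_verbs)
    correct = gold & tagged_as(tagged_text, tag, sep)
    return len(correct) / len(gold)


# Toy data: hypothetical transliterated placeholders, not from the paper.
gold = ["qul", "udkhul", "iqra", "ikhruj"]
output = "qul/VB udkhul/NN iqra/VBP ikhruj/NN kitab/VB"
print(imperative_accuracy(gold, output))

# Counts reported in Table 4: 54 correct out of 741 distinct verbs.
print(f"{54 / 741:.4f}")
```

Words tagged VB that are not in the gold list (such as the placeholder `kitab` above) do not count as correct, which mirrors the step from 282 VB-tagged words to 54 correctly tagged verbs. Evaluating 54 / 741 gives roughly 0.0729, which the paper reports as 7.28%.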


Figure 2: A part of the Quranic testing set.

Figure 3: A part of the Stanford tagger output.

Figure 4: The running command of the Stanford tagger.

Similarly, a tagger built for old text differs from one prepared for MSA, and tweets also differ from MSA. This is the main observation of this study: the performance of the MSA-based tagger declines sharply on classical text. The study also shows the differences between the tagsets in the literature, which motivates further work toward a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires an extensive double check of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM-based taggers. As future work, it might be good to combine hand-crafted rules with statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words carry more than one tag, such as a preposition and a noun; for example, a single Arabic word can be equivalent to “at school”. Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which requires continuous effort toward optimal NLP systems. References [39, 40] are worth consulting, as they give a thorough discussion of the challenges of Arabic as well as some recent Arabic NLP contributions such as stemming, corpora, and classifiers.
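The remark about zero transition probabilities can be illustrated with a small sketch. The paper does not say how such zeros should be handled; add-one (Laplace) smoothing is used here only as one common remedy, and the function name `transition_probs`, the three-tag tagset, and the toy training sequences are all assumptions made for illustration.

```python
from collections import Counter


def transition_probs(tag_sequences, tagset, alpha=1.0):
    """Estimate P(next_tag | tag) with add-alpha smoothing.

    alpha=0 gives the unsmoothed maximum-likelihood estimate, where
    unseen tag pairs get probability zero; alpha>0 removes the zeros.
    """
    bigrams = Counter()
    unigrams = Counter()
    for seq in tag_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            bigrams[(prev, nxt)] += 1
            unigrams[prev] += 1

    size = len(tagset)
    probs = {}
    for prev in tagset:
        denom = unigrams[prev] + alpha * size
        for nxt in tagset:
            num = bigrams[(prev, nxt)] + alpha
            # A tag never seen as a predecessor has denom == 0 when alpha == 0.
            probs[(prev, nxt)] = num / denom if denom else 0.0
    return probs


# Toy training data: hypothetical tag sequences, not from the paper.
tags = ["NN", "VB", "IN"]
train = [["VB", "NN", "IN", "NN"], ["NN", "VB", "NN"]]

mle = transition_probs(train, tags, alpha=0.0)
smoothed = transition_probs(train, tags, alpha=1.0)
print(mle[("IN", "VB")], smoothed[("IN", "VB")])
```

In the toy data the pair (IN, VB) never occurs, so the unsmoothed estimate is zero and any tag path through it would be ruled out; with smoothing the pair keeps a small non-zero probability. A larger tagset makes such unseen pairs more frequent, which is why the double check becomes more important as the tagset grows.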

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Palestine Polytechnic University (PPU) and the Palestine Technical University–Kadoorie for their support in conducting this research.


References

[1] K. Toutanova and C. D. Manning, “Enriching the knowledge sources used in a maximum entropy part-of-speech tagger,” in Proceedings of the 2000 Joint SIGDAT Conference, pp. 63–70, Hong Kong, October 2000.

[2] “The Quran Imperative Verbs,” http://jamharah.net/showthread.php?p=51814.

[3] I. Zeroual, A. Lakhouaja, and R. Belahbib, “Towards a standard Part of Speech tagset for the Arabic language,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 171–178, 2017.

[4] “Stanford tagger,” https://nlp.stanford.edu/software/tagger.shtml.

[5] D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, “Parsing Arabic dialects,” in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2006, pp. 369–376, Italy, April 2006.

[6] D. E. M. A. Abuzeina and M. H. Alsaheb, “Capturing the Common Syntactical Rules for the Holy Quran: A Data Mining Approach,” in Proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013, pp. 670–680, Saudi Arabia, December 2013.

[7] D. AbuZeina, W. Al-Khatib, M. Elshafei, and H. Al-Muhtaseb, “Toward enhanced Arabic speech recognition using part of speech tagging,” International Journal of Speech Technology, vol. 14, no. 4, pp. 419–426, 2011.

[8] B. Farber, D. Freitag, N. Habash, and O. Rambow, “Improving NER in Arabic using a morphological tagger,” in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, pp. 2509–2514, Morocco, May 2008.

[9] A. Shahrour et al., “Camelparser: A system for Arabic syntactic analysis and morphological disambiguation,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.

[10] D. Gildea and D. Jurafsky, “Automatic labeling of semantic roles,” Computational Linguistics, vol. 28, no. 3, pp. 245–288, 2002.

[11] J. R. Bellegarda, Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis, US Patent No. 8,719,006, 2014.

[12] R. Beutler, Improving speech recognition through linguistic knowledge, Diss., ETH Zurich, 2007.

[13] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, “Open information extraction from the web,” Communications of the ACM, vol. 51, no. 12, pp. 68–74, 2008.

[14] A. Z. Arifin, M. Z. Abdullah, A. W. Rosyadi, D. I. Ulumi, A. Wahib, and R. W. Sholikah, “Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 16, no. 2, p. 843, 2018.

[15] E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, “Sentiment Analysis Is a Big Suitcase,” IEEE Intelligent Systems, vol. 32, no. 6, pp. 74–80, 2017.

[16] A. Shahrour, S. Khalifa, and N. Habash, “Improving Arabic diacritization through syntactic analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1309–1315, Portugal, September 2015.

[17] N. Ibrahim and F. Khamayseh, A Semi-Automated Generation of Activity Diagrams from Arabic User Requirements, 2015.

[18] Q. Zhonghua and Y. Liu, “Sentence Dependency Tagging in Online Question Answering Forums,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 554–562, Jeju, Republic of Korea, 2012.

[19] P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 48–54, Edmonton, Canada, May 2003.

[20] A. S. Hussein, “A plagiarism detection system for Arabic documents,” Advances in Intelligent Systems and Computing, vol. 323, pp. 541–552, 2015.

[21] M. Nabil, A. F. Atiya, and M. Aly, “New approaches for extracting Arabic keyphrases,” in Proceedings of the 1st International Conference on Arabic Computational Linguistics, ACLing 2015, pp. 133–137, Egypt, April 2015.

[22] A. Al-Arfaj and A. Al-Salman, “Arabic NLP tools for ontology construction from Arabic text: An overview,” in Proceedings of the 1st International Conference on Electrical and Information Technologies, ICEIT 2015, pp. 246–251, Morocco, March 2015.

[23] M. Al-Mashhadani and N. Omar, “Extraction of Arabic nested noun compounds based on a hybrid method of linguistic approach and statistical methods,” Journal of Theoretical and Applied Information Technology, vol. 76, no. 3, pp. 408–416, 2015.

[24] “Quran Printing Complex,” https://www.qurancomplex.org.

[25] “The Stanford natural language processing group,” https://nlp.stanford.edu/software/tagger.shtml.

[26] I. Zeroual and L. Abdelhak, “Adapting a decision tree based tagger for Arabic,” in Proceedings of the International Conference on Information Technology for Organizations Development, IT4OD 2016, Morocco, April 2016.

[27] M. Diab, K. Hacioglu, and D. Jurafsky, “Automatic tagging of Arabic text,” in Proceedings of the HLT-NAACL 2004: Short Papers, pp. 149–152, Boston, Massachusetts, May 2004.

[28] A. Mohammed et al., “Probabilistic Arabic part of speech tagger with unknown words handling,” Journal of Theoretical & Applied Information Technology, 2016.

[29] R. Alharbi et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2018.

[30] I. Zeroual, M. Boudchiche, A. Mazroui, and A. Lakhouaja, “Developing and performance evaluation of a new Arabic heavy/light stemmer,” in Proceedings of the 2nd International Conference, pp. 1–6, Tetouan, Morocco, March 2017.

[31] M. Abdulkareem and S. Tiun, “Comparative analysis of ML POS on Arabic tweets,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 2, pp. 403–411, 2017.

[32] A. H. Aliwy, “Combining POS taggers in master-slaves technique for highly inflected languages as Arabic,” in Proceedings of the 2015 1st International Conference on Cognitive Computing and Information Processing, CCIP 2015, India, March 2015.

[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2000.

[34] E. Mohamed, “Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 17, no. 3, pp. 1–13, 2018.

[35] A. Al-Thubaity, A. Alqarni, and A. Alnafessah, “Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?” in Proceedings of the 2nd International Conference, pp. 155–161, Lakeland, FL, USA, April 2018.

[36] S. Ramakrishnan et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2012.

[37] M. Zampieri, S. Malmasi, N. Ljubešić et al., “Findings of the VarDial Evaluation Campaign 2017,” in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1–15, Valencia, Spain, April 2017.

[38] Y. Belinkov et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, 2018, arXiv preprint arXiv:1801.07772.

[39] F. S. Al-Anzi and D. Abuzeina, “Stemming impact on Arabic text categorization performance: A survey,” in Proceedings of the 5th International Conference on Information and Communication Technology and Accessibility, ICTA 2015, Morocco, December 2015.

[40] F. S. Al-Anzi and D. AbuZeina, “Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 189–195, 2017.

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 7: Exploring the Performance of Tagging for the Classical and ...downloads.hindawi.com/journals/afs/2019/6254649.pdf · ResearchArticle Exploring the Performance of Tagging for the Classical

Advances in Fuzzy Systems 7

Tabl

e4

Thee

xper

imen

talr

esults

Mea

sure

Total

Then

umbe

rofim

perativ

everbs

inth

eHolyQur

an18

48ve

rbs

Then

umbe

rofim

perativ

everbs

after

rem

ovin

gdu

plicates

741v

erbs

Then

umbe

rofw

ords

tagg

edas

impe

rativ

everbs

(ie

VB)

282wor

dsTh

ecor

rectly

tagg

edim

perativ

everbs

byco

mpa

ringwith

thec

orrect

list

This

lead

stoth

ecor

rectly

tagg

edlis

ttha

tinc

lude

s

54wor

dsA

7

= 7

- 13 )13

( 7[ aJ 3 47

(

7[ aJ 3 4(

7 (

7

7

[C 6

7N

( 7[

(

713 13

7[

C 6(

7 (

7P7[

C 13 amp(

7

713

7N

7

( 7[

V 7

amp(

7 2 (

7 ) amp7

7

13 7

7[

( 6[

77Y

7 13 (

7 V

7

7

V7

7

(

7

7 = 47

1

7 amp

713 [

7

(

7

[ = 47

6

7[

bG(

7[

amp(

7

13 7 amp

7N

6(

7 X )

713 )

13 c(

713

7-

amp(

7

13 C amp(

Ther

estist

hewro

ngly

classifi

edve

rbs(

iec

lassifi

edou

tofV

Btag)

This

listi

nclude

sfori

nstanc

e68

7

[7

( 7

7(

(

7(

7

[

7$ 7

7

E W

[7

0 7

( [

7 [

This

listi

nten

tiona

llyco

ntains

wor

dsth

atha

veth

esam

eroo

tldquo equiv

enterrdquoTh

isgive

sanin

dica

tionof

ther

ichn

essa

ndth

ederivative

natu

reof

theA

rabicl

angu

age

Accu

racy=

Cor

rectly

tagg

ged

impe

rativ

eve

rbs

All

impe

rativ

eve

rbs

Accu

racy=54

741=728

8 Advances in Fuzzy Systems

Figure 2 A part of Quranic testing set

Figure 3 A part of the Stanford tagger output

Figure 4 The running command of the Stanford tagger

Similarly the tagger used in the old text is different thanone that is prepared for MSA The tweets are also differentfrom MSA This is the main observation of this study as theperformance of theMSAbased tagger sharply declines for theclassical text The study also shows the differences betweenthe literature tagsets which promotes a better study and workfor a standard tagset that thoroughly covers the languageHowever preparing a comprehensive tagset requires anextensive double check of the transition probabilities betweenall tags since zero probabilities might give errors especially inHMM based taggers As a future work it might be good tomerge between hand-crafted rules and statistical approachesfor the PoS tagging It is also important to consider wordsegmentation before tagging as many Arabic words containdifferent tags such as a preposition and a noun for exampleas in the wordldquo 13 ( equiv at schoolrdquo Finally the Arabiclanguage is characterized by sizeable vocabulary as well asan extremely rich morphology that requires an endless effort

towards optimal NLP systems It is worth indicating [39 40]as they have a thorough discussion of the Arabic challengesas well as some recent Arabic NLP contribution such asstemming corpora and classifiers

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The authors would like to thank the Palestine PolytechnicUniversity (PPU) and the Palestine Technical UniversityndashKadoorie for their support to conduct this research

Advances in Fuzzy Systems 9

References

[1] K Toutanova and C D Manning ldquoEnriching the knowledgesources used in a maximum entropy part-of-speech taggerrdquo inProceedings of the the 2000 Joint SIGDAT conference pp 63ndash70Hong Kong October 2000

[2] ldquoTheQuran ImperativeVerbsrdquo httpjamharahnetshowthreadphpp=51814

[3] I Zeroual A Lakhouaja and R Belahbib ldquoTowards a standardPart of Speech tagset for the Arabic languagerdquo Journal of KingSaud University - Computer and Information Sciences vol 29no 2 pp 171ndash178 2017

[4] ldquoStanford taggerrdquo httpsnlpstanfordedusoftwaretaggershtml[5] D Chiang M Diab N Habash O Rambow and S Shareef

ldquoParsing arabic dialectsrdquo in Proceedings of the 11th Conferenceof the European Chapter of the Association for ComputationalLinguistics EACL 2006 pp 369ndash376 Italy April 2006

[6] D E M A Abuzeina and M H Alsaheb ldquoCapturing theCommon Syntactical Rules for the Holy Quran A Data MiningApproachrdquo in Proceedings of the Taibah University InternationalConference on Advances in Information Technology for the HolyQuran and Its Sciences NOORIC 2013 pp 670ndash680 SaudiArabia December 2013

[7] D AbuZeina W Al-Khatib M Elshafei and H Al-MuhtasebldquoToward enhanced Arabic speech recognition using part ofspeech taggingrdquo International Journal of Speech Technology vol14 no 4 pp 419ndash426 2011

[8] B Farber D Freitag N Habash and O Rambow ldquoImprovingNER in Arabic using a morphological taggerrdquo in Proceedingsof the 6th International Conference on Language Resources andEvaluation LREC 2008 pp 2509ndash2514 Morocco May 2008

[9] A Shahrour et al ldquoCamelparser A system for arabic syntacticanalysis and morphological disambiguationrdquo in Proceedingsof the of COLING 2016 the 26th International Conference onComputational Linguistics System Demonstrations 2016

[10] D Gildea and D Jurafsky ldquoAutomatic labeling of semanticrolesrdquo Computational Linguistics vol 28 no 3 pp 245ndash2882002

[11] J R Bellegarda Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis vol 719 006 6 2014US Patent No 8719006

[12] R Beutler Improving speech recognition through linguisticknowledge Diss ETH Zurich 2007

[13] O Etzioni M Banko S Soderland and D S Weld ldquoOpeninformation extraction from the webrdquo Communications of theACM vol 51 no 12 pp 68ndash74 2008

[14] A Z Arifin M Z Abdullah A W Rosyadi D I Ulumi AWahib and R W Sholikah ldquoSentence Extraction Based onSentence Distribution and Part of Speech Tagging for Multi-Document Summarizationrdquo TELKOMNIKA Telecommunica-tion Computing Electronics and Control vol 16 no 2 p 8432018

[15] E Cambria S Poria A Gelbukh and M Thelwall ldquoSentimentAnalysis Is a Big Suitcaserdquo IEEE Intelligent Systems vol 32 no6 pp 74ndash80 2017

[16] A Shahrour S Khalifa and N Habash ldquoImproving Arabicdiacritization through syntactic analysisrdquo in Proceedings ofthe Conference on Empirical Methods in Natural LanguageProcessing EMNLP 2015 pp 1309ndash1315 Portugal September2015

[17] N Ibrahim and F Khamayseh A Semi-Automated Generationof Activity Diagrams from Arabic User Requirements 2015

[18] Q Zhonghua and Y Liu ldquoSentence Dependency Tagging inOnline Question Answering Forumsrdquo in Proceedings of the 50thAnnualMeeting of the Association for Computational Linguisticspp 554ndash562 Jeju Republic of Korea 2012

[19] P Koehn F J Och and D Marcu ldquoStatistical phrase-basedtranslationrdquo inProceedings of the Conference of theNorth Ameri-can Chapter of the Association for Computational Linguistics pp48ndash54 Edmonton Canada May 2003

[20] A S Hussein ldquoA plagiarism detection system for ArabicdocumentsrdquoAdvances in Intelligent Systems andComputing vol323 pp 541ndash552 2015

[21] M Nabil A F Atiya and M Aly ldquoNew approaches for extract-ing Arabic keyphrasesrdquo in Proceedings of the 1st InternationalConference on Arabic Computational Linguistics ACLing 2015pp 133ndash137 Egypt April 2015

[22] A Al-Arfaj and A Al-Salman ldquoArabic NLP tools for ontologyconstruction from Arabic text An overviewrdquo in Proceedings ofthe 1st International Conference on Electrical and InformationTechnologies ICEIT 2015 pp 246ndash251 Morocco March 2015

[23] M Al-Mashhadani and N Omar ldquoExtraction of arabic nestednoun compounds based on a hybrid method of linguisticapproach and statistical methodsrdquo Journal of Theoretical andApplied InformationTechnology vol 76 no 3 pp 408ndash416 2015

[24] ldquoQuran Printing Complexrdquo httpswwwqurancomplexorg[25] ldquoThe Stanford natural language processing grouprdquo httpsnlp

stanfordedusoftwaretaggershtml[26] I Zeroual and L Abdelhak ldquoAdapting a decision tree based tag-

ger for Arabicrdquo in Proceedings of the International Conference onInformation Technology for Organizations Development IT4OD2016 Morocco April 2016

[27] M Diab K Hacioglu and D Jurafsky ldquoAutomatic tagging ofArabic textrdquo in Proceedings of the HLT-NAACL 2004 ShortPapers pp 149ndash152 Boston Massachusetts May 2004

[28] A Mohammed et al ldquoProbabilistic arabic part of speechtagger with unknown words handlingrdquo Journal of Theoretical ampApplied Information Technology 2016

[29] R Alharbi12 et al Part-of-Speech Tagging for Arabic Gulf DialectUsing Bi-LSTM 2018

[30] I Zeroual M Boudchiche A Mazroui and A LakhouajaldquoDeveloping and performance evaluation of a new Arabicheavylight stemmerrdquo in Proceedings of the the 2nd internationalConference pp 1ndash6 Tetouan Morocco March 2017

[31] M Abdulkareem and S Tiun ldquoComparative analysis of MLPOS on Arabic tweetsrdquo Journal of Theoretical and AppliedInformation Technology vol 95 no 2 pp 403ndash411 2017

[32] A H Aliwy ldquoCombining POS taggers in master-slaves tech-nique for highly inflected languages as Arabicrdquo in Proceedingsof the 2015 1st International Conference on Cognitive Computingand Information Processing CCIP 2015 India March 2015

[33] D Jurafsky and J H Martin Speech and Language ProcessingAn Introduction to Natural Language Processing ComputationalLinguistics and Speech Recognition Prentice-Hall New Jersey2000

[34] E Mohamed ldquoMorphological Segmentation and Part-of-Speech Tagging for the Arabic Heritagerdquo ACM Transactions onAsian and Low-Resource Language Information Processing vol17 no 3 pp 1ndash13 2018

[35] A Al-Thubaity A Alqarni and A Alnafessah ldquoDo Words withCertain Part of Speech Tags Improve the Performance of ArabicText Classificationrdquo in Proceedings of the the 2nd InternationalConference pp 155ndash161 Lakeland FL USA April 2018

10 Advances in Fuzzy Systems

[36] S Ramakrishnan et al Part-of-Speech Tagging for Arabic GulfDialect Using Bi-LSTM 2012

[37] M Zampieri S Malmasi N Ljubesic et al ldquoFindings of theVarDial EvaluationCampaign 2017rdquo in Proceedings of the FourthWorkshop on NLP for Similar Languages Varieties and Dialects(VarDial) pp 1ndash15 Valencia Spain April 2017

[38] Y Belinkov et al Evaluating layers of representation in neuralmachine translation on part-of-speech and semantic taggingtasks 2018 arXiv preprint arXiv180107772

[39] F S Al-Anzi andDAbuzeina ldquoStemming impact onArabic textcategorization performance A surveyrdquo in Proceedings of the 5thInternational Conference on Information and CommunicationTechnology and Accessibility ICTA 2015 Morocco December2015

[40] F S Al-Anzi and D AbuZeina ldquoToward an enhanced Arabictext classification using cosine similarity and Latent SemanticIndexingrdquo Journal of King Saud University - Computer andInformation Sciences vol 29 no 2 pp 189ndash195 2017

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

Page 8: Exploring the Performance of Tagging for the Classical and ...downloads.hindawi.com/journals/afs/2019/6254649.pdf · ResearchArticle Exploring the Performance of Tagging for the Classical

8 Advances in Fuzzy Systems

Figure 2 A part of Quranic testing set

Figure 3 A part of the Stanford tagger output

Figure 4 The running command of the Stanford tagger

Similarly, a tagger built for older texts differs from one prepared for MSA, and tweets also differ from MSA. This is the main observation of this study: the performance of the MSA-based tagger declines sharply on classical text. The study also highlights the differences among the tagsets used in the literature, which motivates further work towards a standard tagset that thoroughly covers the language. However, preparing a comprehensive tagset requires extensive checking of the transition probabilities between all tags, since zero probabilities might cause errors, especially in HMM-based taggers. As future work, it might be worthwhile to combine hand-crafted rules with statistical approaches for PoS tagging. It is also important to consider word segmentation before tagging, as many Arabic words carry more than one tag, such as a preposition and a noun, as in the single word meaning “at school”. Finally, the Arabic language is characterized by a sizeable vocabulary as well as an extremely rich morphology, which demands continuing effort towards optimal NLP systems. We refer the reader to [39, 40], which thoroughly discuss the challenges of Arabic as well as some recent Arabic NLP contributions such as stemming, corpora, and classifiers.
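The concern about zero transition probabilities can be made concrete with a minimal sketch. It is not the procedure used in this study: the three-tag tagset, the toy tag sequences, and the helper names `transition_probs` and `zero_transitions` are all assumptions introduced for illustration. The sketch estimates tag-to-tag transition probabilities by counting bigrams, lists the transitions whose maximum-likelihood estimate is zero, and shows that add-k smoothing (one common remedy, swapped in here as an example) removes them.

```python
from collections import Counter
from itertools import product

# Toy tag sequences; hypothetical data, not taken from the study's corpus.
TAGGED_SENTENCES = [
    ["IN", "NN", "VB"],
    ["NN", "VB", "NN"],
    ["IN", "NN", "NN"],
]
# Hypothetical three-tag tagset, far smaller than any real Arabic tagset.
TAGSET = ["IN", "NN", "VB"]


def transition_probs(sentences, tagset, k=0.0):
    """Estimate P(next_tag | prev_tag) with add-k smoothing (k=0 is plain MLE)."""
    bigrams = Counter()
    prev_counts = Counter()
    for tags in sentences:
        for prev, nxt in zip(tags, tags[1:]):
            bigrams[(prev, nxt)] += 1
            prev_counts[prev] += 1
    probs = {}
    for prev, nxt in product(tagset, repeat=2):
        denom = prev_counts[prev] + k * len(tagset)
        # A tag never seen as a predecessor gets probability 0 when k == 0.
        probs[(prev, nxt)] = (bigrams[(prev, nxt)] + k) / denom if denom else 0.0
    return probs


def zero_transitions(probs):
    """Return the tag pairs whose estimated transition probability is zero."""
    return sorted(pair for pair, p in probs.items() if p == 0.0)


mle = transition_probs(TAGGED_SENTENCES, TAGSET)
smoothed = transition_probs(TAGGED_SENTENCES, TAGSET, k=1.0)
print("zero MLE transitions:", zero_transitions(mle))
print("zero smoothed transitions:", zero_transitions(smoothed))
```

With a larger tagset the number of tag pairs grows quadratically, so a check of this kind is one way to see how many transitions a given training corpus leaves unobserved before a tagset is adopted.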

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Palestine Polytechnic University (PPU) and the Palestine Technical University–Kadoorie for their support in conducting this research.

References

[1] K. Toutanova and C. D. Manning, “Enriching the knowledge sources used in a maximum entropy part-of-speech tagger,” in Proceedings of the 2000 Joint SIGDAT Conference, pp. 63–70, Hong Kong, October 2000.

[2] “The Quran Imperative Verbs,” http://jamharah.net/showthread.php?p=51814.

[3] I. Zeroual, A. Lakhouaja, and R. Belahbib, “Towards a standard Part of Speech tagset for the Arabic language,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 171–178, 2017.

[4] “Stanford tagger,” https://nlp.stanford.edu/software/tagger.shtml.

[5] D. Chiang, M. Diab, N. Habash, O. Rambow, and S. Shareef, “Parsing Arabic dialects,” in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2006, pp. 369–376, Italy, April 2006.

[6] D. E. M. A. Abuzeina and M. H. Alsaheb, “Capturing the Common Syntactical Rules for the Holy Quran: A Data Mining Approach,” in Proceedings of the Taibah University International Conference on Advances in Information Technology for the Holy Quran and Its Sciences, NOORIC 2013, pp. 670–680, Saudi Arabia, December 2013.

[7] D. AbuZeina, W. Al-Khatib, M. Elshafei, and H. Al-Muhtaseb, “Toward enhanced Arabic speech recognition using part of speech tagging,” International Journal of Speech Technology, vol. 14, no. 4, pp. 419–426, 2011.

[8] B. Farber, D. Freitag, N. Habash, and O. Rambow, “Improving NER in Arabic using a morphological tagger,” in Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008, pp. 2509–2514, Morocco, May 2008.

[9] A. Shahrour et al., “Camelparser: A system for Arabic syntactic analysis and morphological disambiguation,” in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations, 2016.

[10] D. Gildea and D. Jurafsky, “Automatic labeling of semantic roles,” Computational Linguistics, vol. 28, no. 3, pp. 245–288, 2002.

[11] J. R. Bellegarda, Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis, US Patent No. 8,719,006, 2014.

[12] R. Beutler, Improving speech recognition through linguistic knowledge, Diss., ETH Zurich, 2007.

[13] O. Etzioni, M. Banko, S. Soderland, and D. S. Weld, “Open information extraction from the web,” Communications of the ACM, vol. 51, no. 12, pp. 68–74, 2008.

[14] A. Z. Arifin, M. Z. Abdullah, A. W. Rosyadi, D. I. Ulumi, A. Wahib, and R. W. Sholikah, “Sentence Extraction Based on Sentence Distribution and Part of Speech Tagging for Multi-Document Summarization,” TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 16, no. 2, p. 843, 2018.

[15] E. Cambria, S. Poria, A. Gelbukh, and M. Thelwall, “Sentiment Analysis Is a Big Suitcase,” IEEE Intelligent Systems, vol. 32, no. 6, pp. 74–80, 2017.

[16] A. Shahrour, S. Khalifa, and N. Habash, “Improving Arabic diacritization through syntactic analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pp. 1309–1315, Portugal, September 2015.

[17] N. Ibrahim and F. Khamayseh, A Semi-Automated Generation of Activity Diagrams from Arabic User Requirements, 2015.

[18] Q. Zhonghua and Y. Liu, “Sentence Dependency Tagging in Online Question Answering Forums,” in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pp. 554–562, Jeju, Republic of Korea, 2012.

[19] P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics, pp. 48–54, Edmonton, Canada, May 2003.

[20] A. S. Hussein, “A plagiarism detection system for Arabic documents,” Advances in Intelligent Systems and Computing, vol. 323, pp. 541–552, 2015.

[21] M. Nabil, A. F. Atiya, and M. Aly, “New approaches for extracting Arabic keyphrases,” in Proceedings of the 1st International Conference on Arabic Computational Linguistics, ACLing 2015, pp. 133–137, Egypt, April 2015.

[22] A. Al-Arfaj and A. Al-Salman, “Arabic NLP tools for ontology construction from Arabic text: An overview,” in Proceedings of the 1st International Conference on Electrical and Information Technologies, ICEIT 2015, pp. 246–251, Morocco, March 2015.

[23] M. Al-Mashhadani and N. Omar, “Extraction of Arabic nested noun compounds based on a hybrid method of linguistic approach and statistical methods,” Journal of Theoretical and Applied Information Technology, vol. 76, no. 3, pp. 408–416, 2015.

[24] “Quran Printing Complex,” https://www.qurancomplex.org.

[25] “The Stanford natural language processing group,” https://nlp.stanford.edu/software/tagger.shtml.

[26] I. Zeroual and L. Abdelhak, “Adapting a decision tree based tagger for Arabic,” in Proceedings of the International Conference on Information Technology for Organizations Development, IT4OD 2016, Morocco, April 2016.

[27] M. Diab, K. Hacioglu, and D. Jurafsky, “Automatic tagging of Arabic text,” in Proceedings of HLT-NAACL 2004: Short Papers, pp. 149–152, Boston, Massachusetts, May 2004.

[28] A. Mohammed et al., “Probabilistic Arabic part of speech tagger with unknown words handling,” Journal of Theoretical & Applied Information Technology, 2016.

[29] R. Alharbi et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2018.

[30] I. Zeroual, M. Boudchiche, A. Mazroui, and A. Lakhouaja, “Developing and performance evaluation of a new Arabic heavy/light stemmer,” in Proceedings of the 2nd International Conference, pp. 1–6, Tetouan, Morocco, March 2017.

[31] M. Abdulkareem and S. Tiun, “Comparative analysis of ML POS on Arabic tweets,” Journal of Theoretical and Applied Information Technology, vol. 95, no. 2, pp. 403–411, 2017.

[32] A. H. Aliwy, “Combining POS taggers in master-slaves technique for highly inflected languages as Arabic,” in Proceedings of the 2015 1st International Conference on Cognitive Computing and Information Processing, CCIP 2015, India, March 2015.

[33] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice-Hall, New Jersey, 2000.

[34] E. Mohamed, “Morphological Segmentation and Part-of-Speech Tagging for the Arabic Heritage,” ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 17, no. 3, pp. 1–13, 2018.

[35] A. Al-Thubaity, A. Alqarni, and A. Alnafessah, “Do Words with Certain Part of Speech Tags Improve the Performance of Arabic Text Classification?” in Proceedings of the 2nd International Conference, pp. 155–161, Lakeland, FL, USA, April 2018.

[36] S. Ramakrishnan et al., Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM, 2012.

[37] M. Zampieri, S. Malmasi, N. Ljubesic et al., “Findings of the VarDial Evaluation Campaign 2017,” in Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), pp. 1–15, Valencia, Spain, April 2017.

[38] Y. Belinkov et al., Evaluating layers of representation in neural machine translation on part-of-speech and semantic tagging tasks, 2018, arXiv preprint arXiv:1801.07772.

[39] F. S. Al-Anzi and D. Abuzeina, “Stemming impact on Arabic text categorization performance: A survey,” in Proceedings of the 5th International Conference on Information and Communication Technology and Accessibility, ICTA 2015, Morocco, December 2015.

[40] F. S. Al-Anzi and D. AbuZeina, “Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing,” Journal of King Saud University - Computer and Information Sciences, vol. 29, no. 2, pp. 189–195, 2017.
