An Empirical Evaluation of Arabic-specific Embeddings for Sentiment Analysis
Amira Barhoumi, Nathalie Camelin, Chafik Aloulou, Yannick Estève, Lamia Belguith
LIUM, Le Mans University, France; MIRACL, Sfax University, Tunisia
Outline:
- Introduction
- Arabic-specific embeddings
- Sentiment analysis systems
- Experimental Framework
- Conclusion and Future works
Introduction
Sentiment Analysis (SA) framework:
- given a textual statement,
- identify its polarity: positive or negative.
An Empirical Evaluation of Arabic-specific Embeddings for Sentiment Analysis 1 / 22
State of the art
Recent works use deep learning techniques:
- based on Convolutional Neural Networks (CNN) ([Dahou et al., 2016], [Alayba et al., 2018], [Dahou et al., 2019]);
- based on Long Short-Term Memory (LSTM) networks ([Hassan, 2017], [Heikal et al., 2018], [Al-Smadi et al., 2018]).

Network inputs = word embeddings, where words are space-separated units. These approaches do not take into account the specificities of the Arabic language (agglutination and morphological richness) in the embedding space.
Specificity of Arabic language: agglutination and morphological richness

Agglutination: a whole phrase can be composed of a single word, e.g. one Arabic word meaning "Do you like it?".

Word = inflected form + clitics (proclitics and enclitics). The slide's examples, each decomposed into an inflected form plus clitics (Arabic script not recoverable from this transcript; translations kept):
- "will he like it?"
- "you will like it"
- "and they like it"

Morphological richness: words are built from a root and schemes, e.g. the different conjugated forms of the verb "to like" share the same root (Arabic forms omitted).
6 different lexical units:
- word: space-separated unit.
- token: inflected form and clitics (Farasa tool 1).
- token\clitics: inflected form only (Farasa tool); clitics do not usually affect the polarity of words.
- lemma: canonical form (Farasa tool).
- stem: root of the word (Tashaphyne tool 2).
- light stem: stem + infixes (Arabic light stemmer 3).

The granularity of the lexical unit could impact the embedding quality.
1. http://qatsdemo.cloudapp.net/farasa/
2. https://pypi.org/project/Tashaphyne/
3. https://github.com/motazsaad/arabic-light-stemming-py
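As a toy illustration of how the token\clitics unit differs from the token unit, the sketch below strips one layer of proclitics and enclitics. The transliterated clitic inventories and word forms are hypothetical; a real pipeline would rely on Farasa rather than this heuristic.

```python
# Toy clitic stripping; the transliterated clitic lists below are
# illustrative, not the real Farasa segmentation inventory.
PROCLITICS = ("wa", "fa", "sa", "bi", "li")  # e.g. conjunctions, prepositions
ENCLITICS = ("hum", "ha", "hu", "ka")        # e.g. attached pronouns

def strip_clitics(token):
    """Approximate the token\\clitics unit: drop one proclitic and one
    enclitic layer, keeping only the inflected form."""
    for p in PROCLITICS:
        # Only strip when enough material remains for an inflected form.
        if token.startswith(p) and len(token) > len(p) + 2:
            token = token[len(p):]
            break
    for e in ENCLITICS:
        if token.endswith(e) and len(token) > len(e) + 2:
            token = token[:-len(e)]
            break
    return token
```

For instance, a (hypothetical) transliteration like "wauhibbuha" ("and I like it") reduces to "uhibbu", while a clitic-free word is returned unchanged.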
Embedding Models
Word2vec (w2v) [Mikolov et al., 2013b]: words sharing a common context are located close to each other in the embedding space.
Figure – Skip-gram and CBOW architectures [Mikolov et al., 2013a].
Skip-gram outperforms CBOW [Dahou et al., 2016, Barhoumi et al., 2018].
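For concreteness, the training pairs used by the skip-gram objective (each word predicting the words in its context window) can be enumerated with a few lines of Python. This is a sketch of the data preparation only, not of the negative-sampling training loop:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the skip-gram
    model: each word predicts every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

CBOW inverts the direction: the context words jointly predict the center word from the same windows.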
Fasttext (ft) [Bojanowski et al., 2016]:
- extension of word2vec;
- provides embeddings for unseen and rare words;
- uses sub-word (character n-gram) information.
e.g. for the word "where", add the two boundary symbols < and > to obtain <where>; if n = 2, the sub-words are {<w, wh, he, er, re, e>}.
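The sub-word extraction can be reproduced exactly as in the example above (a minimal sketch; fastText itself uses a range of n values, typically 3 to 6, and sums the n-gram vectors to form the word vector):

```python
def char_ngrams(word, n=2):
    """Character n-grams with boundary symbols, as in fastText:
    the word is wrapped in '<' and '>' before extracting n-grams."""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
```

Because an unseen word still shares n-grams with known words, its embedding can be composed from the n-gram vectors.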
Document representation
Document length n:
- Hypothesis: the document length distribution follows a Gaussian law.

n = mean + 2 × standard deviation    (1)

- Empirical rule: the probability that a document is covered ≈ 0.9545.
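Equation (1) can be computed directly from the corpus; in this sketch, `doc_lengths` is a hypothetical list of token counts, one per document:

```python
from statistics import mean, stdev

def fixed_length(doc_lengths):
    """Fixed document length of Equation (1): n = mean + 2 * std.
    Under the Gaussian hypothesis this length covers most documents."""
    return int(mean(doc_lengths) + 2 * stdev(doc_lengths))
```

Documents shorter than n are padded and longer ones truncated, as discussed next.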
What type of padding/truncating (beginning, end, or both extremities)?

Protocol:
- Split each document into 3 parts.
- Analyse the polar words and negation terms in each part.
- Decide according to the informativeness of each part:
  - first part most informative: post padding/truncating;
  - second part most informative: padding/truncating at both extremities;
  - last part most informative: pre padding/truncating.
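The three strategies can be sketched with a hypothetical helper (Keras's `pad_sequences` offers the 'pre'/'post' modes; the 'both' mode, acting on both extremities, is written by hand here):

```python
def pad_or_truncate(seq, n, mode="post", pad=0):
    """Bring a token sequence to fixed length n.
    mode: 'pre' (act at the beginning), 'post' (act at the end),
    or 'both' (split the change between the two extremities)."""
    if len(seq) >= n:
        extra = len(seq) - n
        if mode == "pre":
            return seq[extra:]
        if mode == "post":
            return seq[:n]
        left = extra // 2  # 'both': drop half at each end
        return seq[left:left + n]
    missing = n - len(seq)
    if mode == "pre":
        return [pad] * missing + seq
    if mode == "post":
        return seq + [pad] * missing
    left = missing // 2  # 'both': pad half at each end
    return [pad] * left + seq + [pad] * (missing - left)
```

With 'post', the informative beginning of a review survives truncation; with 'pre', the end does.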
DNN Component
- CNN-based system.
- BiLSTM-based system.
CNN-based system
Architecture similar to [Dahou et al., 2016].
The review (an Arabic sentence in the figure) is represented as an n × k matrix (non-static channel), passed through a convolutional layer with multiple filters, then max-pooling and an output layer deciding positive :) or negative :(.

Figure – CNN architecture for a given review.
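The core of the convolutional layer, one filter producing one max-pooled feature over the n × k review matrix, can be sketched in pure Python. This is illustrative only; the real system learns many filters of several heights and trains them end to end:

```python
def conv_max_pool(emb, filt):
    """One CNN feature as in the figure: slide a full-width filter of
    height h over the n x k review matrix, then take the max over all
    positions (max-over-time pooling).
    emb:  list of n word vectors (each of length k)
    filt: list of h weight vectors (each of length k)"""
    h = len(filt)
    scores = []
    for i in range(len(emb) - h + 1):
        # Dot product of the filter with the h-word window starting at i.
        s = sum(emb[i + r][c] * filt[r][c]
                for r in range(h) for c in range(len(filt[0])))
        scores.append(s)
    return max(scores)
```

Stacking such features for all filters yields the fixed-size vector fed to the output layer.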
BiLSTM-based system
Architecture similar to [Ma and Hovy, 2016].
[Figure: a forward LSTM and a backward LSTM run over the sequence of unit embeddings, each unit embedding being combined with a character-based word embedding (character embeddings → convolution → max pooling → character-based word embedding); the outputs are combined by majority vote into positive :) or negative :(.]
Corpora for embedding training
Global dataset = fusion of existing Arabic SA and newspaper corpora:
- BRAD [Elnagar et al., 2018b] (510k book reviews);
- HARD [Elnagar et al., 2018a] (373k hotel reviews);
- LABR [Mahmoud et al., 2014] (only the train set, composed of 23k book reviews);
- AbuELkhair [El-Khair, 2016] (5 222k news articles).

The Global dataset is cleaned and normalised.

Vocabulary size of the Global dataset per lexical unit:

Lexical unit   word   token  token\clitics  lemma  light stem  stem
Size           3000k  1980k  1555k          950k   1997k       280k

Embedding dimension = 300 [Dahou et al., 2016, Soliman et al., 2017, Bojanowski et al., 2017].
Corpus LABR
- LABR corpus (Large-scale Arabic Book Reviews) [Mahmoud et al., 2014]: 63 257 Arabic book reviews (rating scale from 1 to 5):

LABR    *      **     ***    ****    *****   total
train   4 195  8 554  9 561  10 136  17 960  50 606
test    1 090  2 001  2 440  3 864   3 256   12 651

- Dataset construction in a binary classification framework:
  - negative reviews: 1 or 2 stars;
  - positive reviews: 4 or 5 stars.

- Dataset partition:
  - train set: 90% of the considered LABR train set;
  - validation set: 10% of the considered LABR train set;
  - test set: the considered LABR test set.
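The dataset construction and the 90/10 split above can be sketched as follows (hypothetical helper; `reviews` is assumed to be a list of (text, stars) pairs, and 3-star reviews are left out, consistent with the binary framework):

```python
import random

def build_binary_dataset(reviews, seed=0):
    """Binary dataset from star ratings: 1-2 stars -> negative (0),
    4-5 stars -> positive (1), 3-star reviews dropped.
    The train portion is then split 90% train / 10% validation."""
    labelled = [(text, 1 if stars >= 4 else 0)
                for text, stars in reviews if stars != 3]
    random.Random(seed).shuffle(labelled)  # fixed seed for reproducibility
    cut = int(0.9 * len(labelled))
    return labelled[:cut], labelled[cut:]
```

The held-out LABR test set is kept separate and never enters this split.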
Results
                      CNN                       BiLSTM
Unit           Emb.   A     P     R     F1      A     P     R     F1
word           w2v    91.1  85.7  76.3  80.7    90.9  83.5  78.3  80.8
               ft     91.2  85.2  77.2  81.0    90.9  83.1  79.0  81.0
token          w2v    91.2  85.8  76.7  81.0    90.8  83.8  77.3  80.4
               ft     91.2  85.8  76.7  81.0    90.7  83.8  76.7  80.1
token\clitics  w2v    91.2  86.4  76.0  80.9    90.8  83.8  77.2  80.4
               ft     91.2  85.7  76.7  80.9    90.9  84.6  76.7  80.4
lemma          w2v    91.5  85.8  78.0  81.7    91.0  84.3  76.9  80.4
               ft     91.4  87.0  76.4  81.4    91.0  83.6  78.5  81.0
light stem     w2v    91.2  85.6  76.8  81.0    90.8  83.0  78.0  80.4
               ft     91.4  86.3  76.8  81.3    90.7  83.3  77.4  80.3
stem           w2v    89.0  81.2  69.8  75.1    89.3  80.2  73.9  76.9
               ft     88.8  81.3  69.0  74.6    89.1  79.9  73.4  76.5

Table – System evaluations with Arabic-specific embeddings (A = accuracy, P = precision, R = recall, F1 = F1 measure); the 1st and 2nd best performances were highlighted on the slide.
System combination
Combination protocols: Oracle (ideal combination) and Consensus (agreement).

                       CNN                    BiLSTM
Protocol               Coverage  Accuracy     Coverage  Accuracy
All\stem  Oracle       100%      100%         100%      100%
          Consensus    86.57%    100%         81.70%    100%
2 Best    Oracle       100%      92.3%        100%      92.5%
          Consensus    98.25%    92.2%        96.96%    92.2%
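The two protocols can be sketched with a hypothetical helper, where `predictions` holds one label list per embedding system and `gold` the reference labels:

```python
def oracle_and_consensus(predictions, gold):
    """Oracle: a review counts as correct if ANY system is right
    (an upper bound on combination performance).
    Consensus: only reviews where ALL systems agree are covered;
    accuracy is measured on that covered subset."""
    n = len(gold)
    oracle_acc = sum(any(p[i] == gold[i] for p in predictions)
                     for i in range(n)) / n
    agree = [i for i in range(n)
             if all(p[i] == predictions[0][i] for p in predictions)]
    coverage = len(agree) / n
    cons_acc = (sum(predictions[0][i] == gold[i] for i in agree) / len(agree)
                if agree else 0.0)
    return oracle_acc, coverage, cons_acc
```

A 100% Oracle accuracy, as in the All\stem row, means every review is classified correctly by at least one embedding system.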
2 Best CNN combination: non-consensus analysis
Number of non-consensus reviews = 146:
- 68 positive (19 with 5 stars and 49 with 4 stars);
- 78 negative (20 with 1 star and 58 with 2 stars).
Examples (Arabic originals not recoverable from this transcript; translations as given):

- Mixed review:
  - "Traditional topics but the style is beautiful."
  - "Strong style and weak events."
- Author evaluation:
  - "Aly Zaydan has to leave writing to stay great in people's eyes."
  - "No comment, any comment reduces the author's value."
- Star number:
  - "I have been very generous giving two stars."
  - "It lost one star in the evaluation."
Conclusion
- Arabic sentiment analysis framework.
- Specificity of the Arabic language: agglutination and morphological richness.
- 12 Arabic-specific embedding sets, unit_Emb, where:
  - unit ∈ {word, token, token\clitics, lemma, light stem, stem};
  - Emb ∈ {w2v, ft}.
- 2 neural systems: CNN-based and BiLSTM-based.
- Lemma ≈ the most appropriate lexical unit for Arabic SA.
- Best performance in this work = 91.5% accuracy, against 89.3% for [Barhoumi et al., 2018]:
  - choice of document length and padding/truncating type: +1 point;
  - similar-domain corpora for training the embeddings: +0.8;
  - lemma embeddings: +0.4.
- Note that the word embedding vocabulary counts 3000k entries, against only 950k for lemmas.
Future works
- Evaluation of the different Arabic-specific embeddings on other NLP tasks (POS tagging, NER, syntactic and semantic analogies).
- Sentiment embeddings [Yu et al., 2017].
- Contextualized embeddings: ELMo [Peters et al., 2018].
Thank you for your attention :)
References

Al-Smadi, M., Talafha, B., Al-Ayyoub, M., and Jararweh, Y. (2018). Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews. International Journal of Machine Learning and Cybernetics, pages 1–13.

Alayba, A. M., Palade, V., England, M., and Iqbal, R. (2018). Improving sentiment analysis in Arabic using word representation. In 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), pages 13–18. IEEE.

Barhoumi, A., Camelin, N., and Estève, Y. (2018). Des représentations continues de mots pour l'analyse d'opinions en arabe : une étude qualitative [Continuous word representations for opinion analysis in Arabic: a qualitative study]. In 25e conférence sur le Traitement Automatique des Langues Naturelles (TALN 2018), Rennes, France.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2016). Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Dahou, A., Elaziz, M. A., Zhou, J., and Xiong, S. (2019). Arabic sentiment classification using convolutional neural network and differential evolution algorithm. Computational Intelligence and Neuroscience, 2019.

Dahou, A., Xiong, S., Zhou, J., Haddoud, M. H., and Duan, P. (2016). Word embeddings and convolutional neural network for Arabic sentiment classification. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2418–2427.

El-Khair, I. A. (2016). 1.5 billion words Arabic corpus. arXiv preprint arXiv:1611.04033.

Elnagar, A., Khalifa, Y. S., and Einea, A. (2018a). Hotel Arabic-reviews dataset construction for sentiment analysis applications. In Intelligent Natural Language Processing: Trends and Applications, pages 35–52. Springer.

Elnagar, A., Lulu, L., and Einea, O. (2018b). An annotated huge dataset for standard and colloquial Arabic reviews for subjective sentiment analysis. Procedia Computer Science, 142:182–189.

Hassan, A. (2017). Sentiment analysis with recurrent neural network and unsupervised neural language model.

Heikal, M., Torki, M., and El-Makky, N. (2018). Sentiment analysis of Arabic tweets using deep learning. Procedia Computer Science, 142:114–122.

Ma, X. and Hovy, E. (2016). End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1064–1074.

Mahmoud, N., Aly, M., and Atiya, A. (2014). LABR 2.0: large scale Arabic sentiment analysis benchmark. arXiv e-print (arXiv:1411.6718).

Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proc. of NAACL.

Soliman, A. B., Eissa, K., and El-Beltagy, S. R. (2017). AraVec: a set of Arabic word embedding models for use in Arabic NLP. Procedia Computer Science, 117:256–265.

Yu, L.-C., Wang, J., Lai, K. R., and Zhang, X. (2017). Refining word embeddings for sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 534–539.