Sentiment analysis for Serbian language

Sentiment analysis of sentences in Serbian language Nikola Milošević

Upload: nikola-milosevic

Post on 11-May-2015


DESCRIPTION

Sentiment analysis of the Serbian language, with a stemmer for Serbian.

TRANSCRIPT

Page 1: Sentiment analysis for Serbian language

Sentiment analysis of sentences in Serbian language

Nikola Milošević

Page 2: Sentiment analysis for Serbian language

Why analyze sentiment in Serbian?

● Great industrial need
  – Ads websites
  – Automated market research
  – Customer satisfaction

● NLP tools for Serbian are not developed
  – Need for tools and resources
  – Almost no tools accessible through an API

Page 3: Sentiment analysis for Serbian language

Serbian language

● Belongs to the Indo-European language family
● Slavic language
● Highly inflectional
● 3 pronunciation types
● 3 dialect groups
● Written as it is spoken
● Latin and Cyrillic writing systems

Page 4: Sentiment analysis for Serbian language

Sentiment analysis work-flow

Page 5: Sentiment analysis for Serbian language

Tokenization and preprocessing

● Process of breaking a stream of text up into words

● Stop-word filtering
● Negation handling
  – Adding the NE_ prefix to words that follow a negation
  – Applies to all words up to the next punctuation mark

● Irregular verbs
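
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the slides: the names `STOP_WORDS`, `NEGATIONS` and `preprocess` are hypothetical, the word lists are tiny placeholders, and dropping the negation word itself is an assumption.

```python
import re

# Hypothetical, deliberately tiny word lists, for illustration only.
STOP_WORDS = {"i", "je", "u", "na", "da", "se", "su", "za", "ali"}
NEGATIONS = {"ne", "nije", "nisam", "nisu", "nema"}
PUNCTUATION = set(".,!?;:")

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[.,!?;:]")


def preprocess(text):
    """Tokenize, drop stop words, and mark negated words with NE_."""
    result = []
    negated = False
    for tok in TOKEN_RE.findall(text.lower()):
        if tok in PUNCTUATION:
            negated = False  # punctuation closes the negation scope
            continue
        if tok in NEGATIONS:
            negated = True  # assumption: the negation word itself is dropped
            continue
        if tok in STOP_WORDS:
            continue
        result.append("NE_" + tok if negated else tok)
    return result


print(preprocess("Film nije dobar, ali muzika je odlična."))
```

With this scheme "dobar" and "NE_dobar" become different features, so a bag-of-words classifier can learn opposite polarities for them.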

Page 6: Sentiment analysis for Serbian language

Stemming

● Process for reducing inflected words to their stem, base or root form

● Kešelj and Šipka (2008)
● Hand-crafted rule-based stemmer
● ~300 rules
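
To make the idea of a rule-based stemmer concrete, here is a minimal suffix-stripping sketch. The suffix list, the minimum stem length and the `stem` function are assumptions chosen for illustration; they are not the rules of the stemmer described on this slide.

```python
# Illustrative suffixes only; NOT the hand-crafted rule set from the slides.
SUFFIXES = ["ovima", "ama", "ima", "om", "em", "a", "e", "i", "o", "u"]
# Try longer suffixes first so "ama" wins over "a".
SUFFIXES.sort(key=len, reverse=True)

# Assumed guard: never cut a word down to fewer than 3 characters.
MIN_STEM_LEN = 3


def stem(word):
    """Strip the longest matching suffix, if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: len(word) - len(suffix)]
    return word


for w in ["knjiga", "knjigama", "knjigom", "gradovima", "oko"]:
    print(w, "->", stem(w))
```

The minimum-length guard hints at why short words are hard for this kind of stemmer: with little material left, stripping a suffix easily destroys the word.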

Page 7: Sentiment analysis for Serbian language

Sentiment analysis

● Aim to build binary sentiment analysis
● General Serbian language
● No annotated corpus for Serbian
● Annotation work (~1000 small texts)
● Supervised machine learning

Page 8: Sentiment analysis for Serbian language

Naive Bayes

● Algorithm that learns fast
● Bag-of-words approach
● Assumption of conditional independence
● Laplace smoothing
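
A bag-of-words Naive Bayes classifier with Laplace smoothing can be sketched as below. This is a generic multinomial formulation, assumed here for illustration; the class name `NaiveBayes`, the training examples and the choice to skip words never seen in training are not taken from the slides.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over bags of words with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # alpha = 1 is Laplace smoothing
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def train(self, docs):
        for tokens, label in docs:
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # Log prior plus the sum of log likelihoods; the sum reflects
            # the conditional independence assumption.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # assumption: ignore words unseen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set (already tokenized and negation-marked).
model = NaiveBayes()
model.train([
    (["odličan", "film"], "pos"),
    (["dobar", "hotel"], "pos"),
    (["loš", "film"], "neg"),
    (["NE_dobar", "hotel"], "neg"),
])
print(model.predict(["odličan", "hotel"]))
print(model.predict(["loš", "hotel"]))
```

Smoothing keeps a word that never occurred in one class from driving that class's probability to zero, which matters when the training set is small.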

Page 9: Sentiment analysis for Serbian language

Implementation

● Web API with presentation layer
● JSON communication
● Secured page for annotating
● Built using PHP and MySQL
● Web & Android

Page 10: Sentiment analysis for Serbian language

Results

● Stemmer
  – Smallest and most precise stemmer
  – 90% correct on news articles
  – Problems: small words, irregular inflections, voice changes

● Sentiment analyzer
  – 80% correct
  – Problems: irony, ambiguity, small training data

Page 11: Sentiment analysis for Serbian language

Future work

● Stemmer
  – Use the Snowball framework
  – Build a multi-step stemmer

● Sentiment analyzer
  – POS tagging
  – Complex negation handling
  – SVM algorithm

Page 12: Sentiment analysis for Serbian language

Thank you

● Available from http://inspiratron.org

● Contact: [email protected]