An Overview of Text Mining and Sentiment Analysis
- for Decision Support System
Gan Keng Hoon
School of Computer Sciences
Universiti Sains Malaysia
12 May 2015
Outline
1. Decision Support Systems
2. Overview of Text Mining & Sentiment Analysis
- Techniques in Text Mining
- Techniques in Sentiment Analysis
3. Applications and Challenges Ahead
Decision Support System
As an end user, every day, we need to make decisions ..
What to eat for lunch?
What subject to choose?
Which hotel to stay at?
Decision Support System
As a business provider, crucial decisions need to be made every hour/minute/second ..
Source: http://attunelive.com/blog/how-a-screening-prompted-by-clinical-decision-support-system-helped-save-a-patients-life/
Decision Support System
Source: http://www.informationbuilders.com/decision-support-systems-dss
A decision maker in a company checks the sales before deciding which product to promote ..
Decision Support System
A hotelier wants to know why ..
If location is good, how can I take advantage ..
Why are they/we using Decision Support Systems?
Business provider
- Improve customer experience
- Improve products and services
- More returns …
End user
- Better purchasing choice
- Better value
- Happier ..
Sample Decision Support System
Looks good, 155 people say Very Good…
Not bad, customers rated 4* and above for location, cleanliness ..
http://www.tripadvisor.com.my
The Truth ?
http://www.tripadvisor.com.my
Many Questions …
Mr X: How is the condition of Wifi?
Miss Y: Is the toilet really dirty?
Family Z: Any convenience store nearby?
Manager of Hotel: I want to know all the complaints about toilet!
Harnessing Web and Social Texts
Very influential.
Latest and most updated.
The truth (but sometimes not).
Free (most of the time).
Source: Hotel Review Sites: What’s the ‘Truth’ About Fairness? http://www.hospitalitynet.org/news/4056065.html
However, Without Automated Methods
It is impossible to scan through each of them. Important details could be missed.
It is hard to visualize or summarize all the texts via manual effort.
It is impossible to digest new reviews generated each day.
*There are 344 reviews (as of 10/5/2015) for the mentioned hotel.
Overview of Text Mining & Sentiment Analysis
Is the toilet really dirty?
Text Mining- Let’s mine some texts to answer the question.
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
Sentiment Analysis- Let’s find some sentiments about these texts.
Techniques in Text Mining
What is text mining?
To exploit information contained in textual documents in various ways.
Natural Language Processing
Information Retrieval
Information Retrieval- Find relevant sentences.
Document Collection Processing
1. Text Preprocessing
- Sentence Tokenizer
- Stop Word Removal
2. Feature Selection
- Bag of Words Approach
- Term Frequency - Inverse Document Frequency (TF-IDF)
3. Inverted Index Creation
- Term – Doc Posting
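The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the talk: the stop-word list, the regular expressions and the function names (split_sentences, tokenize, build_index) are assumptions made for the example, and each review sentence is treated as one "document".

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "in", "and", "were", "not", "it", "s", "is", "a", "very"}

def split_sentences(text):
    """Naive sentence tokenizer: split on whitespace that follows . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stop words (bag of words)."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower())
            if t not in STOP_WORDS]

def build_index(sentences):
    """Return (inverted index, TF-IDF weights) for a list of sentences."""
    bags = [Counter(tokenize(s)) for s in sentences]
    doc_freq = Counter(term for bag in bags for term in bag)
    n_docs = len(sentences)
    index = defaultdict(list)   # term -> list of sentence ids (postings)
    weights = {}                # (term, sentence id) -> tf * idf
    for doc_id, bag in enumerate(bags):
        for term, tf in bag.items():
            index[term].append(doc_id)
            weights[(term, doc_id)] = tf * math.log(n_docs / doc_freq[term])
    return dict(index), weights

sentences = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown "
    "and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
index, weights = build_index(sentences)
print(index["dirty"])    # postings for the term "dirty"
print(index["shower"])   # postings for the term "shower"
```

Note that a term occurring in every sentence (here "shower") gets a TF-IDF weight of zero, which is the intended effect of the inverse document frequency factor.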
Information Retrieval- Find relevant sentences.
Query Processing
1. Intention as Query
2. Query Preprocessing
- Tokenization
- Expansion using Synonyms
3. Query-Doc Matching
- Ranking
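A minimal sketch of these query steps, assuming a tiny hand-made synonym table and a very simple ranking (number of distinct query terms found in each sentence). The names SYNONYMS, QUERY_STOP_WORDS, expand_query and rank are hypothetical and chosen only for this example; a real system would use a proper thesaurus and a weighted ranking function.

```python
import re

# Hypothetical synonym table and stop words, for illustration only.
SYNONYMS = {"toilet": {"bathroom", "shower"}, "dirty": {"filthy", "unclean"}}
QUERY_STOP_WORDS = {"is", "the", "really"}

def tokenize(text):
    """Lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def expand_query(query):
    """Tokenize the query, drop stop words, then add synonyms of each term."""
    terms = {t for t in tokenize(query) if t not in QUERY_STOP_WORDS}
    for term in list(terms):
        terms |= SYNONYMS.get(term, set())
    return terms

def rank(query, sentences):
    """Return ids of matching sentences, best match first."""
    terms = expand_query(query)
    scored = []
    for doc_id, sentence in enumerate(sentences):
        score = len(terms & set(tokenize(sentence)))
        if score > 0:
            scored.append((score, doc_id))
    # Higher score first; ties keep the original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

sentences = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown "
    "and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
print(rank("Is the toilet really dirty?", sentences))
```

Without the synonym expansion, the query word "toilet" would match none of the three sentences, since the reviewers wrote "bathroom" and "shower" instead.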
Information Retrieval- Find relevant sentences.
Simple and fast
- Quickly retrieves all relevant sentences or documents given some keywords.
But loses detail such as sentence structure and word order.
- Context is not captured.
- E.g. the term “cold” may refer to the air conditioning being cold or to the receptionist being cold.
Natural Language Processing
Source: Cheng Xiang Zhai, Text Retrieval and Search Engine, Coursera Slide.
Natural Language Processing
Difficult because we assume the hearer has some background knowledge.
Surface analysis of the text alone is not enough.
Common sense analysis is needed. E.g. "I can write words on that dusty table top."
Techniques in Sentiment Analysis
[Figure: part of the Summarev framework for entity text processing and sentiment analysis. Stages: Pre-processing, Entity Detection, Post-processing. Labels shown: Sentence Extractor, Tokenization, Boundary Detection, Sentence Selector, Entity Dictionary, Sentence Categorization, Sentiment Dictionary, Sentiment Extraction, Entity Extraction, Prediction, Rating, MySQL Database, Browser.]
Source: http://ir.cs.usm.my/siir/project_summarev.php
Entity Detection (or Aspect Selection)
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect
1. Bathroom
2. Toiletries
3. Shower area
4. Sink
5. Shower
6. Hair dryer
7. Wifi
8. Bed
...
- POS Tagging
- Noun Phrase Selection
- Term Weighting
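A rough sketch of these three steps on the sample texts. It assumes a tiny hand-made part-of-speech lexicon (POS_LEXICON is hypothetical; a real system would use a trained tagger), takes runs of consecutive nouns as noun phrases, and uses plain occurrence counts as the term weight. None of this is claimed to be the method behind the slide.

```python
import re
from collections import Counter

# Hypothetical hand-made POS lexicon; unknown tokens are tagged OTHER.
POS_LEXICON = {
    "bathroom": "NOUN", "toiletries": "NOUN", "shampoo": "NOUN",
    "soap": "NOUN", "shower": "NOUN", "area": "NOUN", "sink": "NOUN",
    "glass": "NOUN", "wall": "NOUN", "dirty": "ADJ", "clean": "ADJ",
}

def pos_tag(sentence):
    """Tag words and punctuation; punctuation is kept so it breaks phrases."""
    tokens = re.findall(r"[a-z]+|[^\sa-z]", sentence.lower())
    return [(t, POS_LEXICON.get(t, "OTHER")) for t in tokens]

def noun_phrases(tagged):
    """Select maximal runs of consecutive nouns as candidate aspects."""
    phrases, current = [], []
    for token, tag in tagged:
        if tag == "NOUN":
            current.append(token)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

def candidate_aspects(sentences):
    """Weight each candidate aspect by how often it occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(noun_phrases(pos_tag(sentence)))
    return counts

sentences = [
    "in the bathroom, used toiletries (shampoo & soap) were not thrown "
    "and were left in the shower area",
    "dirty sink, and very very dirty shower glass wall.",
    "the shower, it's clean...",
]
aspects = candidate_aspects(sentences)
for aspect, weight in aspects.most_common():
    print(aspect, weight)
```

With only three sentences every candidate has the same weight; on a full review collection the counts would separate frequently mentioned aspects from one-off nouns.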
Sentiment Extraction
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect - Sentiment
1. Sink – dirty
2. Shower – clean
3. Shower glass wall – dirty
- POS Tagging
- Adjective Phrase Selection
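One simple way to pair aspects with sentiment words, sketched under assumptions: a hypothetical hand-made POS lexicon stands in for a real tagger, and each adjective is attached to the noun phrase right after it, or otherwise to the nearest noun phrase before it in the same sentence. The function names are illustrative only.

```python
import re

# Hypothetical hand-made POS lexicon; unknown tokens are tagged OTHER.
POS_LEXICON = {
    "bathroom": "NOUN", "toiletries": "NOUN", "shampoo": "NOUN",
    "soap": "NOUN", "shower": "NOUN", "area": "NOUN", "sink": "NOUN",
    "glass": "NOUN", "wall": "NOUN", "dirty": "ADJ", "clean": "ADJ",
}

def pos_tag(sentence):
    """Tag words and punctuation using the toy lexicon."""
    tokens = re.findall(r"[a-z]+|[^\sa-z]", sentence.lower())
    return [(t, POS_LEXICON.get(t, "OTHER")) for t in tokens]

def chunk(tagged):
    """Merge consecutive nouns into one noun-phrase (NP) chunk."""
    chunks = []
    for token, tag in tagged:
        if tag == "NOUN" and chunks and chunks[-1][1] == "NP":
            chunks[-1] = (chunks[-1][0] + " " + token, "NP")
        elif tag == "NOUN":
            chunks.append((token, "NP"))
        else:
            chunks.append((token, tag))
    return chunks

def extract_pairs(sentence):
    """Return (aspect, sentiment word) pairs found in one sentence."""
    chunks = chunk(pos_tag(sentence))
    pairs = []
    for i, (text, tag) in enumerate(chunks):
        if tag != "ADJ":
            continue
        if i + 1 < len(chunks) and chunks[i + 1][1] == "NP":
            # Attributive use: "dirty sink"
            pairs.append((chunks[i + 1][0], text))
        else:
            # Predicative use: "the shower, it's clean"
            previous = [c for c, t in chunks[:i] if t == "NP"]
            if previous:
                pairs.append((previous[-1], text))
    return pairs

print(extract_pairs("dirty sink, and very very dirty shower glass wall."))
print(extract_pairs("the shower, it's clean..."))
```

The first text yields no pair under this rule because it contains no adjective from the lexicon, even though a human reader sees a complaint there. That gap is one reason surface rules alone are not enough.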
Sentiment Scoring
Texts
1. in the bathroom, used toiletries (shampoo & soap) were not thrown and were left in the shower area
2. dirty sink, and very very dirty shower glass wall.
3. the shower, it's clean...
…
Aspect - Sentiment
1. Sink – dirty (N:0.75)
2. Shower – clean (P:0.5)
3. Shower glass wall – dirty (N:0.75)
Source: sentiwordnet.isti.cnr.it
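The scoring step can be sketched as a dictionary lookup. The table below is a hypothetical stand-in for a sentiment lexicon such as SentiWordNet and contains only the two scores shown on the slide; the names SENTIMENT_SCORES, score_pairs and net_score are assumptions for this example, and the net score is just one possible way to aggregate.

```python
# Stand-in lexicon: word -> (polarity, strength), values taken from the slide.
# "N" = negative, "P" = positive. A real lexicon has many more entries.
SENTIMENT_SCORES = {"dirty": ("N", 0.75), "clean": ("P", 0.5)}

def score_pairs(pairs):
    """Attach polarity and strength to each (aspect, sentiment word) pair."""
    scored = []
    for aspect, word in pairs:
        # Words missing from the lexicon are treated as objective ("O", 0.0).
        polarity, strength = SENTIMENT_SCORES.get(word, ("O", 0.0))
        scored.append((aspect, word, polarity, strength))
    return scored

def net_score(scored):
    """Sum positive strengths and subtract negative strengths."""
    total = 0.0
    for _, _, polarity, strength in scored:
        if polarity == "P":
            total += strength
        elif polarity == "N":
            total -= strength
    return total

pairs = [("sink", "dirty"), ("shower", "clean"), ("shower glass wall", "dirty")]
scored = score_pairs(pairs)
for aspect, word, polarity, strength in scored:
    print(f"{aspect} - {word} ({polarity}:{strength})")
print("net score:", net_score(scored))
```

A negative net score suggests that the bathroom-related comments lean negative overall, which is the kind of summary a hotel manager could use to answer "Is the toilet really dirty?".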
Applications
Source: http://www.twtbase.com/twitrratr/
Challenges Ahead
How to detect more in-depth sentiment.
How to differentiate the spam from the credible.
Language problems
- Usage of mixed languages.
- Usage of non-standard languages.
Challenges Ahead
Last but not least, the challenge is to put the research and solutions into real use.