opinion mining on hotel reviews
![Page 1: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/1.jpg)
Opinion Mining on Hotel Reviews
CSE-4250: PROJECT AND THESIS-I
Presented By
Md. Rafeedul Bar Chowdhury, ID: 11.02.04.088
Zahidul Haque, ID: 11.02.04.086
Mahmud Hossain, ID: 11.02.04.082
Soumik Das Bibon, ID: 11.02.04.003
![Page 2: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/2.jpg)
Course Teacher
MIR TAFSEER NAYEEM
Lecturer, Dept. of Computer Science & Engineering
Ahsanullah University of Science & Technology
18-Jun-15
![Page 3: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/3.jpg)
What is Opinion?
Opinions are thoughts that influence decision making.
• Which schools should I apply to?
• Which professor should I work for?
• Whom should I vote for?
• Which hotel should I book?
![Page 4: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/4.jpg)
Whom shall I ask for opinions?
Pre Web
• Friends and relatives
• Acquaintances
Post Web
• Blogs (google blogs, livejournal)
• E-commerce sites (amazon, ebay)
• Review sites (CNET, PC Magazine)
• Discussion forums (forums.craigslist.org, forums.macrumors.com)
![Page 5: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/5.jpg)
Have enough opinions!
Now that I have enough opinions, I can make decisions…
Is it really enough?
![Page 6: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/6.jpg)
…Having opinions is not enough
• Searching for reviews is quite difficult
• Searching for opinions is not as convenient as general web search
• Huge amounts of information are available on one topic
• It is difficult, if not impossible, to analyze each and every review separately
• Reviews are expressed in many different ways:
“overall, this hotel is my first choice at Cox’s Bazar…”
“the facilities are good but the service isn’t so…”
“best in Cox’s Bazar by quality and value”
“…disappointing”
![Page 7: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/7.jpg)
What is Opinion Mining?
Computational study of opinions, sentiments and emotions expressed in text.
Detects the contextual polarity of text (positive, neutral or negative)
Derives the opinion, or the attitude, of an opinion holder.
![Page 8: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/8.jpg)
Mining opinions…
It’s about finding out what people think…
![Page 9: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/9.jpg)
Topic of our research
Mining traveler experiences/reviews of the services provided by hotels in Bangladesh
![Page 10: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/10.jpg)
Our research domain
We are mining reviews from the travel guide “TripAdvisor” (www.tripadvisor.com/Bangladesh)
We are starting our work with the district that attracts the most tourists, Cox’s Bazar
Primarily, we are mining reviews by travelers who visited various hotels in this tourist area
![Page 11: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/11.jpg)
TripAdvisor
[Screenshot of TripAdvisor with numbered callouts 1 to 3]
![Page 12: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/12.jpg)
TripAdvisor(Cont.)
1) Search panel for hotels situated in different locations/areas
e.g. Search Cox’s Bazar, Bangladesh, Asia for Long Beach Hotel
2) Panel for booking hotels based on prices
3) Search panel for a sorted list of hotels based on particular preferences
e.g. types of hotel, most rated hotels etc.
![Page 13: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/13.jpg)
TripAdvisor(Cont.)
[Screenshot of TripAdvisor with numbered callouts 1 and 2]
![Page 14: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/14.jpg)
TripAdvisor(Cont.)
1) List of hotels sorted by traveler ratings
2) Panel where travelers check vacancy and book hotels
e.g. Check-In Date/Check-Out Date of travelers
![Page 15: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/15.jpg)
TripAdvisor(Cont.)
[Screenshot of TripAdvisor with numbered callouts 1 to 3]
![Page 16: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/16.jpg)
TripAdvisor(Cont.)
1) Hotel rating in stars (out of 5) based on the type of reviews (positive or negative) given by travelers
e.g. Rating: 4.5 out of 5 stars; 95 Reviews
2) Position of the hotel among other hotels in an area, depending on positive feedback from travelers
e.g. No.1 of 24 hotels in Cox’s Bazar
3) Special services provided by the hotel
e.g. Free parking, Free breakfast etc.
![Page 17: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/17.jpg)
Reviewers of TripAdvisor
[Screenshot of a TripAdvisor review with numbered callouts 1 to 5]
![Page 18: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/18.jpg)
Reviewers of TripAdvisor(Cont.)
1) Name and residence of the reviewing traveler
e.g. MUHassan, Dhaka City, Bangladesh
2) Information about the reviewer’s profile on the “TripAdvisor” site
e.g. Top Contributor, 50 Reviews, 19 helpful votes etc.
3) Rating provided by the reviewer and the time of the review
e.g. 5 out of 5 stars, Reviewed 12 June 2015 etc.
4) The traveler’s review, written as a long opinionated paragraph
5) Voting panel where other users can vote on whether a review is helpful
![Page 19: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/19.jpg)
Reviewers of TripAdvisor(Cont.)
[Screenshot of TripAdvisor with numbered callouts 1 and 2]
![Page 20: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/20.jpg)
Reviewers of TripAdvisor(Cont.)
1) All the reviews provided by a certain reviewer on the site
2) Feedback given by a hotel manager in response to a traveler’s review
![Page 21: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/21.jpg)
Problem domain of our research
Integrity of the reviews cannot be ensured
Competition among hotels creates a high possibility of false reviews
The number of reviews varies widely between hotels, which sometimes results in inaccurate hotel ratings
![Page 22: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/22.jpg)
Solution to these problems…
Reviews given by verified travelers will be prioritized, e.g. Top Contributor, Contributor etc.
For the rating system to work, a hotel must have at least 10 reviews
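The minimum-review rule above can be sketched as a simple filter. This is only an illustration: the function name `is_rankable` and the hotel names and review counts are hypothetical, and only the threshold of 10 comes from the slide.

```python
MIN_REVIEWS = 10  # threshold stated on the slide


def is_rankable(review_count, min_reviews=MIN_REVIEWS):
    """Return True when a hotel has enough reviews to be rated."""
    return review_count >= min_reviews


# Hypothetical review counts, for illustration only.
hotels = {"Hotel A": 95, "Hotel B": 4, "Hotel C": 10}

# Keep only the hotels that meet the minimum-review rule.
rankable = sorted(name for name, count in hotels.items() if is_rankable(count))
print(rankable)
```

Hotels below the threshold are left out of the ranking instead of being given a rating based on too few reviews.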
![Page 23: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/23.jpg)
Goal of our research
Rating the hotels according to reviews, weighted by potential opinion holders
Finding out which hotels have the most impact on the traveling tax
Mining each individual review, then summarizing the reviews by polarity to get an accurate opinion about a hotel
![Page 24: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/24.jpg)
Works we have done so far…
Primarily, we have parsed data from popular hotels in Cox’s Bazar
We have mined and stored data of 15 hotels initially
This data, which contains reviews and reviewer information, will serve as the data set for our further research
We have collected another data set from the National Board of Revenue (NBR) on the revenue from traveling tax
![Page 25: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/25.jpg)
Tools for Parsing data
The language we used for parsing data: Python (2.7.9)
https://www.python.org/
The library we used for extracting data from HTML and XML files: Beautiful Soup (version 4)
![Page 26: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/26.jpg)
Flowchart for parsing data
![Page 27: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/27.jpg)
Code for parsing data
![Page 28: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/28.jpg)
Code for parsing data(Cont.)
1) ‘head’ : stores the headline of the review
2) ‘date’ : time when the review was given
3) ‘member_info’ : name and residence of the reviewer
4) ‘member_badge’ : information about the reviewer’s profile, e.g. Top Contributor, 50 Reviews etc.
5) ‘mngr_com’ : feedback on the review from the hotel manager
6) ‘review’ : the full text of the review
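The original parsing code was shown as an image and is not part of this transcript. As a rough sketch of how the six variables above could be filled, the example below uses Python's standard-library `html.parser` instead of Beautiful Soup, so that it runs without third-party packages. The sample markup, the `review-item` class and the use of the variable names as CSS classes are assumptions made for illustration; they do not reflect TripAdvisor's real page structure.

```python
from html.parser import HTMLParser

# Hypothetical markup: these class names are invented for this sketch.
SAMPLE_HTML = """
<div class="review-item">
  <span class="head">Great stay</span>
  <span class="date">Reviewed 12 June 2015</span>
  <span class="member_info">MUHassan, Dhaka City, Bangladesh</span>
  <span class="member_badge">Top Contributor</span>
  <p class="review">The facilities are good.</p>
  <p class="mngr_com">Thank you for your feedback.</p>
</div>
"""

FIELDS = ("head", "date", "member_info", "member_badge", "mngr_com", "review")


class ReviewParser(HTMLParser):
    """Collect one dict per review, keyed by the field names on the slide."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record currently being filled
        self._field = None    # field whose text is being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if css_class == "review-item":
            # A new review starts: create an empty record for it.
            self._current = dict.fromkeys(FIELDS, "")
            self.records.append(self._current)
        elif css_class in FIELDS and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Text is stored only while inside one of the known fields.
        if self._field is not None:
            self._current[self._field] += data.strip()

    def handle_endtag(self, tag):
        self._field = None


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
records = parser.records
print(records[0]["head"], "|", records[0]["member_badge"])
```

Each record is a plain dictionary, so the list of records can be written to a file to build the data set of reviews and reviewer information.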
![Page 29: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/29.jpg)
Data set created from the parsed data
[Screenshot of the data set with numbered callouts 1 to 5]
![Page 30: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/30.jpg)
Data set on traveling tax
This is the data set that will be used to calculate the impact of hotels on traveling tax
It contains data on revenue collected from the tourism sector from fiscal year ’13 - ’14 to ’14 - ’15
![Page 31: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/31.jpg)
Data set on traveling tax(Cont.)
![Page 32: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/32.jpg)
Why Prioritize Opinions?
Prioritizing opinions is necessary in order to distinguish the opinions of mature, professional reviewers from those of amateur reviewers.
A professional’s point of view and interests will make a difference when ranking a hotel.
There will always be some integrity/accuracy issues with opinions, as one reviewer may state that a feature is satisfactory while another states otherwise (Whose opinion to choose?)
![Page 33: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/33.jpg)
Prioritizing Opinions(Cont.)
In TripAdvisor, there is already a priority rating system available for the reviewers!
1. Reviewer: (3 – 5) reviews
2. Senior Reviewer: (6 – 10) reviews
3. Contributor: (11 – 20) reviews
4. Senior Contributor: (21 – 49) reviews
5. Top Contributor: (50+) reviews
![Page 34: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/34.jpg)
Prioritizing Opinions(Cont.)
For prioritizing, we assigned a Potential Opinion Holders value to each class of reviewers.
1. Reviewer: 2
2. Senior Reviewer: 4
3. Contributor: 5
4. Senior Contributor: 7
5. Top Contributor: 10
This value indicates which reviewers get more priority than others.
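The badge thresholds and the values assigned to them can be written as one lookup table. The sketch below is illustrative: the names `BADGES`, `badge_for` and `holder_value` are hypothetical, and giving a value of 0 to reviewers without a badge is an assumption, since the slides do not say how such reviewers are treated.

```python
# (minimum reviews, badge, Potential Opinion Holders value), as on the slides.
BADGES = [
    (50, "Top Contributor", 10),
    (21, "Senior Contributor", 7),
    (11, "Contributor", 5),
    (6, "Senior Reviewer", 4),
    (3, "Reviewer", 2),
]

HOLDER_VALUE = {badge: value for _, badge, value in BADGES}


def badge_for(review_count):
    """Return the badge for a reviewer with this many reviews, or None."""
    for minimum, badge, _ in BADGES:  # checked from highest threshold down
        if review_count >= minimum:
            return badge
    return None


def holder_value(badge):
    """Return the weight for a badge; 0 for no badge (an assumption)."""
    return HOLDER_VALUE.get(badge, 0)


print(badge_for(50), holder_value(badge_for(50)))
print(badge_for(12), holder_value(badge_for(12)))
```

Keeping thresholds and weights in a single table means a change to either one only has to be made in one place.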
![Page 35: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/35.jpg)
Opinion Subjectivity
Opinion Subjectivity is categorizing reviews as positive, negative or objective.
Methods we are using for calculating opinion subjectivity:
- SentiWordNet
- Semi Supervised Learning Approach
- Naive Bayes Method
![Page 36: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/36.jpg)
SentiWordNet
It is a publicly available lexical resource for opinion mining
It assigns every word in the database positive (+), negative (-) and objective/neutral scores.
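A lexicon of this kind can be used to score a review by adding up word scores. The sketch below uses a tiny made-up lexicon, not real SentiWordNet data; the numbers, the function names and the rule of summing positive minus negative scores are assumptions for illustration. SentiWordNet itself scores word senses (synsets) rather than plain words, and the objective score is what remains after the positive and negative scores are subtracted from 1.

```python
# Toy lexicon in the spirit of SentiWordNet: word -> (positive, negative).
# The numbers are invented for illustration, not taken from SentiWordNet.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "best": (0.75, 0.0),
    "disappointing": (0.0, 0.625),
    "hotel": (0.0, 0.0),
}


def objective_score(pos, neg):
    """Objective score is the remainder once pos and neg are removed from 1."""
    return 1.0 - pos - neg


def review_polarity(text, lexicon=TOY_LEXICON):
    """Sum (positive - negative) over the words of a review."""
    total = 0.0
    for raw in text.lower().split():
        word = raw.strip(".,!?…\"'“”")      # drop surrounding punctuation
        pos, neg = lexicon.get(word, (0.0, 0.0))  # unknown words count as 0
        total += pos - neg
    return total


def label(score):
    """Map a numeric score to positive, negative or objective."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "objective"


for sentence in ("best in Cox’s Bazar by quality and value", "…disappointing"):
    print(sentence, "->", label(review_polarity(sentence)))
```

Words that are missing from the lexicon contribute nothing here, which is the gap the semi supervised approach is meant to fill.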
![Page 37: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/37.jpg)
Semi Supervised Learning Approach
In the beginning, synsets are labeled manually, i.e. SentiWordNet in our case
Labeling is then expanded by a machine learning approach, i.e. we are using the Naive Bayes Method
This process is efficient and helps avoid labeling errors.
![Page 38: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/38.jpg)
Naive Bayes Method
This method is used to predict the values of words that are not in the database or synsets.
The main formula is:
$$P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}$$
Where,
P(c) = Probability of Unlabeled data (c)
P(d) = Probability of Labeled data (d)
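As a generic illustration of the Naive Bayes idea, and not necessarily the authors' exact formulation, the sketch below trains a small text classifier on a few labeled sentences and predicts the label of a new one. The training sentences, the function names and the use of add-one (Laplace) smoothing are assumptions made for this example. The denominator P(d) is the same for every class, so it is left out when the classes are compared.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count documents per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the class with the highest log P(c) + sum of log P(w | c)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior, log P(c)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            # Add-one smoothing keeps unseen words from giving probability 0.
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training sentences, for illustration only.
TRAINING = [
    ("clean room and friendly staff", "positive"),
    ("great location and great food", "positive"),
    ("dirty room and rude staff", "negative"),
    ("poor service and noisy room", "negative"),
]

model = train(TRAINING)
print(predict("friendly staff and great food", model))
print(predict("rude staff and noisy room", model))
```

Logarithms are used so that multiplying many small probabilities does not underflow.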
![Page 39: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/39.jpg)
Final Calculation
The formula we will use for ranking hotels is:
$$Z = (\text{Potential Opinion Holders value}) \times (\text{Opinion Subjectivity})$$
Where, Z is the deciding value for an individual hotel, which will place it in the appropriate position in the ranks.
![Page 40: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/40.jpg)
For instance, let’s assume:
Hotel 1: Long Beach Hotel
Potential Opinion Holders value: 7
Opinion Subjectivity: +15.7
Hotel 2: Resort Labiba Bilash
Potential Opinion Holders value: 2
Opinion Subjectivity: +12.3
For Hotel 1: Z = 7 * (+15.7) = 109.9
For Hotel 2: Z = 2 * (+12.3) = 24.6
So, Hotel 1 will be ranked above Hotel 2 (assuming there is only one review for each hotel).
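The worked example above can be reproduced in a few lines. The function name `hotel_score` is hypothetical, and summing the per-review products when a hotel has several reviews is an assumption; the slides only show the single-review case.

```python
def hotel_score(reviews):
    """Compute Z for one hotel.

    reviews: list of (holder_value, opinion_subjectivity) pairs.
    Assumption: with several reviews, the per-review products are summed.
    """
    return sum(value * subjectivity for value, subjectivity in reviews)


# Values taken from the example on the slide (one review per hotel).
hotels = {
    "Long Beach Hotel": [(7, 15.7)],
    "Resort Labiba Bilash": [(2, 12.3)],
}

scores = {name: hotel_score(reviews) for name, reviews in hotels.items()}

# A higher Z places a hotel higher in the ranking.
ranking = sorted(scores, key=scores.get, reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(rank, name, round(scores[name], 1))
```

Scores are rounded only for display, because floating-point multiplication may not give exactly the decimal values shown on the slide.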
![Page 41: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/41.jpg)
![Page 42: Opinion Mining on Hotel Reviews](https://reader034.vdocument.in/reader034/viewer/2022042605/5870c5431a28ab0b4a8b80b3/html5/thumbnails/42.jpg)