cos326 big data analytics lecture19
TRANSCRIPT
-
7/23/2019 COS326 Big Data Analytics Lecture19
COS 326 Database Systems
Lecture 19
Big Data and
Big Data Analytics (2)
Notes
14 October 2015
-
Admin matters: next 3 weeks
Week  Date    Day   Topic
10    13 Oct  Tues  L18: Big Data Analytics
                    Presentation for topic 15
      14 Oct  Wed   L19: Big Data Analytics
                    Presentation for topic 16
      16 Oct  Fri   No prac
11    20 Oct  Tues  L20: Guest lecture: SAP
      21 Oct  Wed   L21: Data analytics: Data mining
                    Presentation for topic 17
      23 Oct  Fri   No prac
                    Project day for Computer Science
12    27 Oct  Tues  L22: Class test 3: data analytics
                    Presentation for topic 18
      28 Oct  Wed   L23: Data analytics: Data mining
                    Presentation for topics 19, 20
-
Outline
Last lecture:
1. Technologies supporting Big Data storage & analytics
  MapReduce computation framework
  NoSQL big database management systems (BDMSes)
  NewSQL big database management systems (BDMSes)
2. What types of analytics for big data?
This lecture:
Case study:
  Analysis of microblogs data: Twitter
  Sentiment analysis of microblogs
Pearson Education Limited 1995, 2005
-
RECAP: on Big Data
Sources of Big Data:
Web-generated structured & unstructured data, e.g.
e-commerce purchasing histories
social media: Facebook, Twitter, LinkedIn, YouTube etc.
Some processing activities for big data:
(1) descriptive analytics
(2) predictive analytics
e.g. sentiment analysis for microblogs (e.g. Twitter)
-
Case study: Analytics for Twitter
Twitter: http://www.twitter.com
1. Why do people tweet?
Notable users of Twitter:
Pope Francis: 78.4 million followers
Barack Obama: 640 thousand followers
2. Format of a tweet: max 140 characters, with possible inclusion of
emoticons, e.g. smiley :-) or sad face :-(, to express sentiment
4. Value of tweets to businesses: used by market researchers in business organisations
(a) What are customers saying about our products & services?
(b) What are customers saying about our competitors' products &
services?
-
Twitter statistics
Twitter was launched in 2006.
Twitter statistics (source: Twitter, April 2010):
106 million registered users
180 million unique visitors every month
300,000 new users signing up every day
600 million queries received daily via Twitter's search engine
3 billion requests per day based on the Twitter API
37% of active users used mobile phones to send requests
approx. 200 million tweets per day (big data)
More recently:
the number of regular Twitter users has been estimated at more
than 200 million.
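To get a feel for why roughly 200 million tweets per day counts as big data, the short calculation below converts that figure into an average rate and an upper bound on raw text volume. It is only a back-of-envelope sketch: the one-byte-per-character figure is an assumption, and metadata such as user details and timestamps is ignored, so the real stored volume would be larger.

```python
# Rough scale estimate based on the figure of ~200 million tweets per day.
TWEETS_PER_DAY = 200_000_000
MAX_CHARS = 140       # maximum length of one tweet
BYTES_PER_CHAR = 1    # assumption: plain ASCII text, no metadata

SECONDS_PER_DAY = 24 * 60 * 60

# Average arrival rate over a whole day
tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY

# Upper bound: every tweet uses all 140 characters
max_text_bytes_per_day = TWEETS_PER_DAY * MAX_CHARS * BYTES_PER_CHAR

print(f"{tweets_per_second:.0f} tweets per second on average")
print(f"{max_text_bytes_per_day / 1e9:.0f} GB of tweet text per day at most")
```

Under these assumptions the average works out to a little over 2,300 tweets per second and at most 28 GB of text per day.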
-
Twitter adoption in SA
Adoption in South Africa:
Businesses, governments and non-governmental
organisations have a Twitter & Facebook presence.
Adoption statistics for 2014
(source: Fuseware and World Wide Worx, 2014):
9.4 million active users of Facebook
5.5 million users of Twitter in South Africa
93% of major South African brands use Facebook
and 79% use Twitter.
-
Twitter analytics
Two approaches to analysis:
(1) Online analytics:
(i) Subscribe to a service for social media data analytics
(ii) use service to obtain analysis reports & Twitter data
(2) Offline analytics:
(i) register with Twitter
(ii) use Twitter APIs to obtain data & store it in a DB
e.g. NoSQL DB
(iii) conduct analysis on the data
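The storage step of the offline approach can be pictured with a small in-memory stand-in for a document collection. This is only a sketch: the `TweetStore` class is hypothetical and is not a real NoSQL product, the sample tweets are invented, and the field names (`id_str`, `text`, `user.screen_name`) are assumed to follow the layout of Twitter's JSON responses.

```python
import json


class TweetStore:
    """Hypothetical in-memory stand-in for a document collection keyed by tweet id."""

    def __init__(self):
        self._docs = {}

    def insert(self, tweet_json):
        """Parse one tweet (a JSON string) and store it; a repeated id overwrites."""
        doc = json.loads(tweet_json)
        self._docs[doc["id_str"]] = doc

    def find(self, predicate):
        """Return all stored documents for which predicate(doc) is true."""
        return [doc for doc in self._docs.values() if predicate(doc)]

    def __len__(self):
        return len(self._docs)


# Invented sample data; the third tweet is a duplicate of the first
raw_tweets = [
    '{"id_str": "1", "text": "Loving the new phone :-)", "user": {"screen_name": "thabo"}}',
    '{"id_str": "2", "text": "Battery died again :-(", "user": {"screen_name": "lerato"}}',
    '{"id_str": "1", "text": "Loving the new phone :-)", "user": {"screen_name": "thabo"}}',
]

store = TweetStore()
for raw in raw_tweets:
    store.insert(raw)

# Query the stored documents with an arbitrary condition
about_battery = store.find(lambda doc: "battery" in doc["text"].lower())
print(len(store), [doc["user"]["screen_name"] for doc in about_battery])
```

Keying the documents by tweet id means that a tweet fetched twice is stored only once, which matters when the same query is run against the API repeatedly.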
-
Online analytics: Twitter data (1)
(i) Subscribe to a service for social media data analytics
(ii) use service to obtain analysis reports
Service name (and purpose): Sentiment140 (sentiment analysis)
URL: http://www.sentiment140.com
Examples of services provided / report types:
Performs sentiment analysis on the tweets returned for a query
supplied by the user (for free).

Service name (and purpose): Twitonomy (get overall view of Twitter account)
URL: http://www.twitonomy.com
Examples of services provided / report types:
Analyse a Twitter account. Provides the following for free:
1. number of: tweets per day, mentions, retweets, favourited tweets (for a given period)
2. charts showing tweet frequencies by day of the week
and time of day
3. platforms most tweeted from
(e.g. Twitter for iPhone, Twitter web client)
-
4.2 Analysis of social network data: Twitter (2)
Online tools for analysis of Twitter data
Sentiment140: http://www.sentiment140.com
Performs sentiment analysis on the tweets returned for a query supplied by the user (for free), e.g.
[Screenshot of Sentiment140; surviving label: available languages]
-
4.2 Analysis of Twitter data (3)
Twitonomy URL: http://www.twitonomy.com
Analyse a Twitter account. Provides the following for free:
1. number of: tweets per day, mentions,
retweets, favourited tweets
ORSSA 2015 presentation, 15 September 2015
-
4.2 Analysis of social network data: Twitter (4)
Twitonomy: Analyse a Twitter account. Provides the following for free:
2. Charts showing tweet frequencies by day of the week and time of day
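The counts behind charts of this kind are simple descriptive statistics. The sketch below tallies tweets by day of the week and by hour of the day. The timestamps are invented, and their format is assumed to match the `created_at` field of Twitter's JSON tweet objects; it is not a description of how Twitonomy itself computes its charts.

```python
from collections import Counter
from datetime import datetime

# Invented timestamps, assumed to be in the "created_at" style of Twitter's JSON
created_at_values = [
    "Tue Oct 13 08:15:00 +0000 2015",
    "Tue Oct 13 17:40:00 +0000 2015",
    "Wed Oct 14 08:05:00 +0000 2015",
    "Fri Oct 16 21:30:00 +0000 2015",
]

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"

by_weekday = Counter()   # e.g. "Tuesday" -> number of tweets
by_hour = Counter()      # e.g. 8 -> number of tweets sent between 08:00 and 08:59

for value in created_at_values:
    moment = datetime.strptime(value, TIMESTAMP_FORMAT)
    by_weekday[moment.strftime("%A")] += 1
    by_hour[moment.hour] += 1

print(by_weekday.most_common())
print(sorted(by_hour.items()))
```

The two counters hold exactly the values that would be plotted as bar charts of tweet frequency by weekday and by time of day.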
-
4.2 Analysis of social network data: Twitter (5)
Twitonomy: Analyse a Twitter account. Provides the following
for free:
3. platforms most tweeted from (e.g. Twitter for iPhone,
Twitter web client)
Can download tweets in MS Excel format for further analysis
-
Offline analysis of Twitter data
Twitter: http://www.twitter.com
(2) Offline analytics:
(i) register with Twitter
(ii) use Twitter APIs to obtain data & store it in a DB
e.g. NoSQL DB
(iii) conduct analysis on the data
e.g. of analysis
a. descriptives
b. sentiment analysis
c. graph mining, e.g. for community discovery
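As a rough illustration of graph mining on tweets, the sketch below builds an undirected graph from @-mentions and groups users into connected components. Connected components are a much cruder notion than the communities found by dedicated community-discovery algorithms, so this should be read as a first approximation only. The tweets and user names are invented.

```python
import re
from collections import defaultdict

# Invented (author, text) pairs
tweets = [
    ("anna", "@ben did you see the match?"),
    ("ben", "@carl yes, great game"),
    ("dina", "@emma lunch tomorrow?"),
    ("frank", "no mentions here"),
]

MENTION = re.compile(r"@(\w+)")

# Undirected graph: an edge joins an author and each user they mention
neighbours = defaultdict(set)
for author, text in tweets:
    neighbours[author]  # touching the key makes users without mentions appear as nodes
    for mentioned in MENTION.findall(text):
        neighbours[author].add(mentioned)
        neighbours[mentioned].add(author)


def connected_components(graph):
    """Group nodes that can reach one another, using an iterative depth-first search."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        components.append(group)
    return components


groups = connected_components(neighbours)
print(sorted(sorted(group) for group in groups))
```

Here anna, ben and carl end up in one group because ben links the other two, while frank, who mentions nobody, forms a group of his own.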
-
Twitter: Facilities available for developers
https://dev.twitter.com/overview/documentation
Twitter APIs:
(1) REST APIs
provide programmatic access to read & write Twitter data
responses available in JSON
identifies Twitter applications & users using OAuth
(2) Streaming APIs
continuously deliver new responses to REST API queries over a
long-lived HTTP connection
receive updates on the latest tweets matching a search query
OAuth: applications send secure, authorised requests to Twitter APIs
an application must be registered before it can access Twitter APIs
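The difference between a one-off REST response and a stream can be sketched without touching the network. The example below assumes that a stream delivers one JSON object per line, with blank lines sent to keep the connection alive. An `io.StringIO` object stands in for the long-lived HTTP connection, and the function name `matching_tweets` and the sample tweets are invented for illustration. OAuth signing and application registration are left out entirely.

```python
import io
import json


def matching_tweets(stream, keyword):
    """Yield the text of each tweet in a line-delimited JSON stream that mentions keyword."""
    for line in stream:
        line = line.strip()
        if not line:  # skip blank keep-alive lines
            continue
        tweet = json.loads(line)
        if keyword.lower() in tweet.get("text", "").lower():
            yield tweet["text"]


# Stand-in for the long-lived connection; a real client would read from a socket
fake_stream = io.StringIO(
    '{"text": "COS326 lecture on big data today"}\n'
    '\n'
    '{"text": "Traffic is terrible"}\n'
    '{"text": "Big Data analytics is fun"}\n'
)

for text in matching_tweets(fake_stream, "big data"):
    print(text)
```

Because `matching_tweets` is a generator, each tweet is handled as soon as its line arrives, so nothing has to wait for the stream to end.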
-
Sentiment Analysis for microblogs
Sentiment Analysis (defined):
Given a tweet on a topic of interest (e.g. to a market researcher), determine if the sentiment (opinion) of the tweet is:
positive,
negative, or
neutral.
The effect of one tweet may be small, but the effect of many is
significant.
Analysis methods: use text mining methods to create a predictive (classification) model
to classify tweets as having +ve, -ve or neutral sentiment.
Traditionally, text mining has been used for document
classification.
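One common text-mining classifier is multinomial naive Bayes over word counts. The sketch below trains such a model on a handful of invented, hand-labelled tweets and uses it to label new ones. The lecture does not say which classifier to use, so naive Bayes is an assumption chosen for brevity, and the tiny training set is purely illustrative; a usable model would need thousands of labelled tweets.

```python
import math
from collections import Counter, defaultdict

# Invented hand-labelled training tweets
TRAINING = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible battery hate this phone", "negative"),
    ("awful service very slow", "negative"),
    ("the phone is on sale today", "neutral"),
    ("store opens at nine today", "neutral"),
]


def tokenize(text):
    """Split a tweet into lower-case words."""
    return text.lower().split()


def train(examples):
    """Count words per class and tweets per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability under naive Bayes."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # add-one (Laplace) smoothing avoids zero probabilities
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("great phone very happy", model))
print(classify("hate the slow service", model))
```

Working in log-probabilities avoids numerical underflow when many small word probabilities are multiplied together.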
-
Sentiment Analysis for microblogs
Using a predictive model to classify tweets
[Diagram: tweets to be classified -> predictive (classification) model -> +ve sentiment tweets / neutral sentiment tweets / -ve sentiment tweets]
-
Essay presentation
Topic 16
-
References
1. IBM Global Business Services (2012) Analytics: the real-world use of big data.
How innovative enterprises extract value from uncertain data. IBM Institute for Business Value.
2. Moniruzzaman, A.B.M. & Hossain, S.A. (2013) NoSQL database: new era of
databases for big data analytics - classification, characteristics and
comparison. International Journal of Database Theory and Application, vol. 6,
no. 4.
3. Wakade, S., Shekar, C., Liszka, K.J. & Chan, C.-C. (2012) Text mining for
sentiment analysis of Twitter data. International Conference on Information
and Knowledge Engineering (IKE'12), pp. 109-114.