Analyzing real-time news
CREARE LA NOTIZIA (Creating the news)
CONTEXT This project was carried out during the 2015-2016 master's program “Business Intelligence and Big Data Analytics” at Università di Milano-Bicocca.
#RateMe
IT achieves a substantial reduction in operating costs by modernizing its data architectures. The innovation includes implementing an Active Archive for cold data, offloading ETL processes, and enriching data.
BIG DATA What are the technologies and the potential of Big Data?
TWITTER Twitter as an example of new media and real-time news sharing
TIMELINE
NEWS LIFECYCLE How news spreads on Twitter and other new media
[Figure, shown in four animation steps: a news item on a timeline is followed by a growing number of tweets; in the last step the tweets precede the news item.]
CREATE Twitter is an easy way to create and share news and opinions. It is a new flow of content and information that brings huge opportunities.
ANALYZE The collected data supports statistical analyses that yield quantitative and qualitative indicators for identifying trends, correlations, flows, and sentiment.
FOLLOW Follow how the news evolves over time by analyzing it, placing it in its real-world context, and comparing it with the external events that can help generate or modify the news itself.
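As an illustration of the ANALYZE and FOLLOW steps, the sketch below counts tweets per day and hashtag, one simple quantitative indicator for following a topic over time. It is only a minimal example: the tweet structure (`created_at`, `text`), the sample tweets, and the function name `daily_volume` are assumptions made here, not taken from the project.

```python
from collections import Counter
from datetime import datetime


def daily_volume(tweets):
    """Count tweets per (day, hashtag); a tweet counts once per hashtag it contains."""
    counts = Counter()
    for tweet in tweets:
        day = datetime.fromisoformat(tweet["created_at"]).date().isoformat()
        # Keep words that start with '#', drop trailing punctuation, ignore case.
        hashtags = {
            word.strip(".,!?:;").lower()
            for word in tweet["text"].split()
            if word.startswith("#")
        }
        for hashtag in hashtags:
            counts[(day, hashtag)] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample_tweets = [
    {"created_at": "2016-03-01T09:15:00", "text": "Trying out #Hadoop today"},
    {"created_at": "2016-03-01T18:40:00", "text": "#hadoop and #Spark together!"},
    {"created_at": "2016-03-02T08:05:00", "text": "More #spark experiments"},
]

volume = daily_volume(sample_tweets)
for (day, hashtag), count in sorted(volume.items()):
    print(day, hashtag, count)
```

Comparing such daily counts with the dates of external events is one way to check whether an event actually moved the tweet volume.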
ARCHITECTURE Main Components
ARCHITECTURE The Lambda Architecture
[Figure: Lambda Architecture diagram with the labels DATA SOURCES, BATCH LAYER, SPEED LAYER, Machine Learning, and PRESENTATION LAYER.]
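The idea behind the Lambda Architecture can be sketched in a few lines: a batch layer periodically recomputes a view from the complete dataset, a speed layer keeps an incremental view of recent data, and queries merge the two. The class below is a toy, in-memory illustration under that assumption. The name `HashtagCounts` and the hashtag-counting use case are hypothetical, and the batch run here is instantaneous, whereas in a real system it runs in the background.

```python
from collections import Counter


class HashtagCounts:
    """Toy Lambda Architecture: hashtag counts served from a batch and a speed view."""

    def __init__(self):
        self.master_dataset = []     # append-only record of every event
        self.batch_view = Counter()  # recomputed from the whole master dataset
        self.speed_view = Counter()  # incremental counts since the last batch run

    def ingest(self, hashtag):
        """Data source: each event goes to the master dataset and to the speed layer."""
        self.master_dataset.append(hashtag)
        self.speed_view[hashtag] += 1

    def run_batch(self):
        """Batch layer: rebuild the view from scratch, then reset the speed view."""
        self.batch_view = Counter(self.master_dataset)
        self.speed_view.clear()

    def query(self, hashtag):
        """Serving side: merge the batch result with the recent increments."""
        return self.batch_view[hashtag] + self.speed_view[hashtag]


counts = HashtagCounts()
counts.ingest("#spark")
counts.ingest("#spark")
before_batch = counts.query("#spark")  # answered by the speed view alone
counts.run_batch()
counts.ingest("#spark")
after_batch = counts.query("#spark")   # batch view plus one recent event
print(before_batch, after_batch)
```

The query result stays up to date between batch runs, while the batch recomputation keeps the long-term view derived from the full dataset.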
Big Data Ecosystem
SENTIMENT ANALYSIS
From the text of a tweet it is possible to compute a measure of the sentiment associated with it. In this project we built two different models.
[Figure labels: DICTIONARY ALGORITHM, BIG DATA BACKEND; CLUSTER THEN PREDICT, BIG DATA FRONTEND.]
SENTIMENT ANALYSIS
DICTIONARY ALGORITHM
BIG DATA BACKEND
This model splits a tweet into single-word tokens and assigns a score to each word by looking it up in a dictionary table that contains positive and negative words, each with a numerical score.
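The dictionary approach can be sketched as follows. The dictionary entries and their scores are hypothetical, and summing the word scores into a tweet score is an assumption, since the slides do not say how the per-word scores are combined. A tweet with no word in the dictionary is returned as unscored (`None`).

```python
import re

# Hypothetical dictionary table: word -> sentiment score (not the project's dictionary).
SENTIMENT_DICTIONARY = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "slow": -1.0,
    "bad": -1.0,
    "terrible": -2.0,
}


def tokenize(text):
    """Split a tweet into lowercase single-word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text, dictionary=SENTIMENT_DICTIONARY):
    """Sum the scores of the tokens found in the dictionary.

    Returns None when no token is in the dictionary, so the tweet stays unscored.
    Summation is an assumption made for this sketch.
    """
    scores = [dictionary[token] for token in tokenize(text) if token in dictionary]
    return sum(scores) if scores else None


print(score_tweet("I love Spark, great tool"))
print(score_tweet("Hadoop is slow but good"))
print(score_tweet("Reading the docs"))
```

Because unscored tweets are kept distinct from neutral ones, the share of tweets the dictionary can actually score is easy to measure.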
SENTIMENT ANALYSIS
CLUSTER THEN PREDICT
BIG DATA FRONTEND
This model clusters tweets that contain similar words and then applies a Random Forest algorithm to each cluster.
Reference: “Improved Twitter Sentiment prediction through Cluster then Predict Model”, International Journal of Computer Science and Network, August 2015.
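The two-stage structure of cluster-then-predict can be sketched with the standard library alone. This is not the project's implementation: k-means over bag-of-words vectors is an assumed clustering method, and a 1-nearest-neighbour lookup inside each cluster stands in for the Random Forest to keep the example dependency-free. The training tweets and the `pos`/`neg` labels are hypothetical.

```python
from collections import defaultdict


def words(text):
    return set(text.lower().split())


def vectorize(text, vocabulary):
    """Binary bag-of-words vector over a fixed vocabulary."""
    present = words(text)
    return [1.0 if word in present else 0.0 for word in vocabulary]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, candidates):
    """Index of the candidate vector closest to `vector`."""
    return min(range(len(candidates)),
               key=lambda i: squared_distance(vector, candidates[i]))


def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors are the initial centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = defaultdict(list)
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, members in groups.items():
            centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train(texts, labels, k=2):
    """Step 1: cluster the tweets. Step 2: keep labelled examples per cluster."""
    vocabulary = sorted(set().union(*(words(t) for t in texts)))
    vectors = [vectorize(t, vocabulary) for t in texts]
    centroids = kmeans(vectors, k)
    clusters = defaultdict(list)  # cluster index -> [(vector, label), ...]
    for vector, label in zip(vectors, labels):
        clusters[nearest(vector, centroids)].append((vector, label))
    return vocabulary, centroids, clusters


def predict(model, text):
    """Route the tweet to its cluster, then classify within that cluster only."""
    vocabulary, centroids, clusters = model
    vector = vectorize(text, vocabulary)
    members = clusters[nearest(vector, centroids)]
    if not members:
        return None
    # Stand-in for the per-cluster Random Forest: 1-nearest-neighbour.
    return members[nearest(vector, [v for v, _ in members])][1]


# Hypothetical labelled tweets, for illustration only.
texts = [
    "hadoop cluster is slow", "spark streaming is fast",
    "hadoop cluster is stable", "spark streaming is buggy",
    "hadoop cluster is slow again", "spark streaming is fast again",
]
labels = ["neg", "pos", "pos", "neg", "neg", "pos"]

model = train(texts, labels, k=2)
print(predict(model, "hadoop is stable"))
print(predict(model, "spark streaming is buggy today"))
```

Swapping the nearest-neighbour step for one Random Forest trained per cluster would bring the sketch closer to the model described above.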
CONCLUSIONS
• The Lambda Architecture seems a good approach because it balances the need for real-time analysis with batch computation.
• The Big Data ecosystem is made up of heterogeneous technologies, each of which solves only one part of the whole problem.
• Many technologies are easily interoperable and composable.
• There are many first movers in the Big Data market, but also established technologies that are nowadays a must-have in a Big Data architecture.
Big Data Ecosystem - Architecture
CONCLUSIONS
• The most tweeted-about technologies are not always the ones with the largest market share.
• There seems to be no correlation between real Big Data events and tweet volumes.
• In this case study, the sentiment analysis produced by the cluster-then-predict model is worse than the one produced by the dictionary algorithm.
• The dictionary algorithm depends heavily on a good dictionary with many words. With the dictionary we used, only 42% of the tweets were scored.
• The analysis of senders and mentioned users showed that many influencers are closely connected to the technologies, or are even the official accounts of those technologies.
• 45% of the tweets were sent from the official apps for the Web platform, Android, and iOS.
Big Data Ecosystem – Data Analysis
Tweet to @masterbibda
Reference the keyword by using a hashtag: #datascientistprofiles
Vote: alto (high), medio (medium), or basso (low)
Example #RateMe
CREIAMO LA NOTIZIA (Let's create the news)
and…
Feel free to tweet your thoughts to @masterbibda!
Every Tweet will be analyzed!
Enjoy #RateMe