lecture 4 social web 2017
Social Web 2017
Lecture 4: How do we MINE, ANALYSE & VISUALISE the Social Web?
Davide Ceolin (credits to: Lora Aroyo)
The Network Institute
VU University Amsterdam
• 200 billion tweets on Twitter in 2015, by 1.3 billion registered users
• 4.5 billion likes generated on Facebook in 2015, by 1.55 billion different users
• 300 hours of videos uploaded to YouTube every minute
• 60.7 million photos uploaded to flickr per month
The Age of BIG Data
Social Web 2017, Davide Ceolin
Science with BIG Data
BIG Data Challenges
Big Data vs. Deep Data
• Social Web data often follow a long-tail distribution
(Figure labels: Big, Deep)
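To make the long-tail point concrete, here is a minimal Python sketch. It counts posts per user in an invented activity log and measures how much of the total activity comes from the most active 20% of users. The log, the `head_share` helper and the 20% cut-off are illustrative assumptions, not taken from the lecture.

```python
from collections import Counter

# Hypothetical activity log: one entry per post, naming its author.
# The counts are invented purely to illustrate a skewed distribution.
posts = (["alice"] * 50 + ["bob"] * 20 + ["carol"] * 10
         + ["dave"] * 5 + ["erin"] * 5 + ["frank"] * 4
         + ["grace"] * 3 + ["heidi"] * 1 + ["ivan"] * 1 + ["judy"] * 1)

def head_share(counts, head_fraction=0.2):
    """Fraction of all activity produced by the most active `head_fraction` of users."""
    ranked = sorted(counts.values(), reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_size]) / sum(ranked)

counts = Counter(posts)
print(counts.most_common(3))       # the few very active users ("big" head)
print(round(head_share(counts), 2))
```

In this toy log, two of ten users produce 70 of the 100 posts, while the remaining users form the long, thin tail.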
enormous wealth of data = lots of insights
• insights into users’ daily lives and activities
• insights into history
• insights into politics
• insights into communities
• insights into trends
• insights into businesses & brands
Why?
enormous wealth of data = lots of insights
• who uploads/talks? (age, gender, nationality, community, etc.)
• what are the trending topics? when?
• what else do these users like? on which platform?
• who are the most/least active users?
• …
Why?
Web Source Criticism?
Source criticism checklist (https://en.wikipedia.org/wiki/Source_criticism)
• Who is the author and what are the qualifications of the author in regard to the topic that is discussed?
• When was the information published?
• What is the reputation of the publisher?
• Does the source show a particular cultural or political bias?
• Does the source contain a bibliography?
• …
How does this apply to Web sources?
Image: http://www.co.olmsted.mn.us/prl/propertyrecords/RecordingDocuments/PublishingImages/forms.jpg
This doesn’t work
How about this?
Side note - check this out: http://guessthecorrelation.com
Web of Trust
https://www.mywot.com/en/scorecard/pulse.seattlechildrens.org
Who uses it?
Politicians
Governmental institutions
Whole society
Whole society
repurposing data
danger of second-order effects
Whole society
Repurposing data
discoveries & correlations
Web-Scale Pharmacovigilance: Listening to Signals from the Crowd, R.W. White et al. (2013)
Scientists
Bibliometrics
Culture, History
Culture
Bill Howe, University of Washington
Entertainment
You?
https://klout.com/#/measure
Companies
Who does it?
The Rise of the Data Scientist
Data geeks’ skills:
• Statistics & math
• Data munging
• Visualisation
http://radar.oreilly.com/2010/06/what-is-data-science.html
The Rise of the Data Scientist
• Data Science enables the creation of data products
• Data products are applications that acquire their value from the data, and create more data as a result.
• Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.
Data Science
Data Science Venn Diagram
Drew Conway
Popular Data Products
Data Science is about building products, not just answering questions
Popular Data Products
empower others to use the data
empower others to do their own analysis
Popular Data Products
http://www.metacog.com/resources/banner3.jpg
(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides)
Data mining is the exploration & analysis of large quantities of data
in order to discover valid, novel, potentially useful, & ultimately understandable patterns in data
http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.jpg
Data Mining 101
Databases
Statistics / Numerical methods
Artificial Intelligence
Data Mining 101
• Data input & exploration
• Preprocessing
• Data mining algorithms
• Evaluation & interpretation
• What data do I need to answer question X?
• What variables are in the data?
• Basic stats of my data?
Data Input & Exploration
“LikeMiner”
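Answering “basic stats of my data?” can start as simply as the sketch below. It assumes a hypothetical dictionary of likes per page and uses Python’s standard `statistics` module; the page names and counts are invented.

```python
import statistics

# Hypothetical sample: number of likes collected per page.
likes_per_page = {"page_a": 120, "page_b": 15, "page_c": 8, "page_d": 3, "page_e": 4}

def describe(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(list(likes_per_page.values()))
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```

Here the mean (30) lies far above the median (8). On Social Web data this gap is often a first hint of a long-tail distribution.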
• Cleanup!
• Choose a suitable data model
• What happens if you integrate data from multiple sources?
• Reformat your data
Preprocessing
“LikeMiner”
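The integration question above is where much of the cleanup effort goes. The sketch below assumes two hypothetical sources that identify the same users differently (“user” vs. “username”, mixed case, stray whitespace). It shows three preprocessing steps: dropping records without a usable key, reformatting values, and merging on a normalised identifier. All field names and helpers are invented for illustration.

```python
# Two hypothetical sources describing overlapping users with different conventions.
source_a = [
    {"user": " Alice ", "likes": "12"},
    {"user": "BOB", "likes": "7"},
    {"user": "", "likes": "3"},          # missing identifier: unusable record
]
source_b = [
    {"username": "alice", "city": "Amsterdam"},
    {"username": "carol", "city": "Utrecht"},
]

def normalise_name(name):
    """Trim whitespace and lower-case so identifiers from both sources match."""
    return name.strip().lower()

def clean_a(rows):
    """Cleanup and reformat source A: drop keyless rows, convert likes to int."""
    cleaned = {}
    for row in rows:
        name = normalise_name(row["user"])
        if not name:
            continue
        cleaned[name] = {"likes": int(row["likes"])}
    return cleaned

def integrate(a_rows, b_rows):
    """Merge both sources on the normalised user name (an outer join)."""
    merged = clean_a(a_rows)
    for row in b_rows:
        name = normalise_name(row["username"])
        if not name:
            continue
        merged.setdefault(name, {})["city"] = row["city"]
    return merged

users = integrate(source_a, source_b)
print(users)
```

After the merge, some users have fields from only one source (here `bob` has no city and `carol` has no likes), so downstream analysis has to handle missing values.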
• Classification: generalising a known structure and applying it to new data
• Association: Finding relationships between variables
• Clustering: Discovering groups and structures in data
Data Mining Algorithms
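As a sketch of the association task, the code below computes support (the share of baskets containing both items) and confidence (how often the second item appears given the first) for pairs of liked pages. The baskets and thresholds are hypothetical. Restricting rules to single-item pairs is a simplification of full frequent-itemset algorithms such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: the set of pages each user liked.
baskets = [
    {"football", "beer", "bbq"},
    {"football", "beer"},
    {"football", "bbq"},
    {"beer", "bbq"},
    {"football", "beer", "pizza"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules (x, y, support, confidence) meaning x -> y, over item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        frozenset(pair) for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for pair, count in pair_counts.items():
        support = count / n                      # share of baskets containing both items
        if support < min_support:
            continue
        for x in pair:
            (y,) = pair - {x}
            confidence = count / item_counts[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

for x, y, support, confidence in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```

Confidence is not symmetric. In this toy data, football -> bbq (0.50) misses the threshold, but bbq -> football (0.67) passes it.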
• Filter users by interests
• Construct user graphs
• PageRank on graphs to mine representativeness
• Result: set of influential users
• Compare page topics to user interests to find pages most representative for topics
Mining in “LikeMiner”
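The PageRank step can be illustrated with a plain power-iteration implementation on a small, hypothetical user graph. This is the textbook PageRank with a damping factor of 0.85. It is not necessarily the exact variant used in LikeMiner, and the graph and parameter values are assumptions.

```python
# Hypothetical directed user graph: an edge u -> v means "u likes/follows v".
graph = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes without out-links spread their rank uniformly."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share, then rank flowing along edges.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

ranks = pagerank(graph)
for user, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```

Users that receive links from many, well-ranked users end up with the highest scores (here `carol`). A user nobody links to (`dave`) keeps only the minimum random-jump share.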
Evaluation & Interpretation
What does the pattern I found mean?
Pitfalls:
• Meaningless discoveries
• Implication ≠ causality (intensive care -> death: patients in intensive care die more often, but intensive care does not cause their death)
• Simpson’s paradox
• Data dredging
• Redundancy
• No new information
• Overfitting
• Bad experimental setup
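Simpson’s paradox is easiest to see with numbers. The counts below are hypothetical, invented for illustration. Treatment A has the higher success rate within each severity group, yet B looks better once the groups are pooled. The reversal happens because A was applied mostly to severe, harder cases.

```python
# Hypothetical (successes, trials) per treatment, split by case severity,
# which acts as a confounding variable.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def success_rates(groups):
    """Return per-group success rates and the pooled (overall) rate."""
    per_group = {name: s / t for name, (s, t) in groups.items()}
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(t for _, t in groups.values())
    return per_group, total_successes / total_trials

for treatment, groups in data.items():
    per_group, overall = success_rates(groups)
    rounded = {name: round(rate, 3) for name, rate in per_group.items()}
    print(treatment, rounded, "overall:", round(overall, 3))
```

Within each group, A wins (about 0.931 vs. 0.867 for mild cases, 0.730 vs. 0.688 for severe cases). Pooled, B wins (about 0.826 vs. 0.780). Looking only at the aggregate would therefore lead to the opposite conclusion.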
Data Mining is not easy
Popular ML – Deep learning
http://www.kdnuggets.com/wp-content/uploads/deep-learning-small-big-data.jpg
http://scyfer.nl/wp-content/uploads/2014/05/Deep_Neural_Network.png
Deep learning frameworks
https://code.facebook.com/posts/1687861518126048/facebook-to-open-source-ai-hardware-design/
Data Journalism
source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png
Mining Social Web Data
Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg
Single Person
http://www.brandrants.com/brandrants/obama/
Populations
Brand Sentiment via Twitter
http://flowingdata.com/2011/07/25/brand-sentiment-showdown/
Sentiment Analysis as a Service
http://www.crowdflower.com/type-sentiment-analysis
http://text-processing.com/demo/sentiment/
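Hosted services like the ones above hide their models, but the basic idea of sentiment scoring can be sketched with a tiny word lexicon. The lexicon, the one-word negation rule and the example tweets are assumptions for illustration. This lexicon approach is a deliberately simple stand-in, not a description of how the listed services work.

```python
import re

# A tiny, hypothetical sentiment lexicon (word -> score).
# Real systems use much larger lexicons or trained classifiers.
LEXICON = {"love": 2, "great": 2, "good": 1, "bad": -1, "awful": -2, "hate": -2}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Sum word scores, flipping the sign of a word directly preceded by a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = [
    "I love my new phone, great camera!",
    "The battery is not good",
    "Waiting for the bus",
]
for tweet in tweets:
    print(sentiment(tweet), "|", tweet)
```

Aggregating such labels over all tweets that mention a brand gives a rough brand-sentiment signal, though sarcasm and context easily fool a lexicon this small.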
http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf
Recommended Reading
image source: http://www.flickr.com/photos/bionicteaching/1375254387/
Hands-on Teaser
• Build your own recommender system 101
• Recommend pages on del.icio.us
• Recommend pages to your Facebook friends
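As a warm-up for the hands-on session, here is a minimal user-based collaborative-filtering sketch. It recommends bookmarked pages to a user by weighting other users’ bookmarks with their cosine similarity to that user. The bookmark data, function names and scoring rule are hypothetical, and the sketch does not call the del.icio.us or Facebook APIs.

```python
import math

# Hypothetical bookmarks: which pages each user saved.
bookmarks = {
    "ann":  {"python.org", "numpy.org", "scipy.org"},
    "ben":  {"python.org", "numpy.org", "pandas.pydata.org"},
    "cara": {"bbc.co.uk", "nytimes.com"},
    "dan":  {"python.org", "pandas.pydata.org", "matplotlib.org"},
}

def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend(user, data, top_n=3):
    """Score pages the user has not saved by the summed similarity of users who saved them."""
    seen = data[user]
    scores = {}
    for other, pages in data.items():
        if other == user:
            continue
        similarity = cosine(seen, pages)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for page in pages - seen:
            scores[page] = scores.get(page, 0.0) + similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda page: (-scores[page], page))[:top_n]

print(recommend("ann", bookmarks))
```

Pages saved by several similar users rise to the top, while users with no overlap (here `cara`) have no influence on the recommendations.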