Social Web 2017 Lecture 4: How do we MINE, ANALYSE & VISUALISE the Social Web? Davide Ceolin (credits to: Lora Aroyo) The Network Institute VU University Amsterdam

Upload: davide-ceolin

Post on 03-Mar-2017


Page 1: Lecture 4 Social Web 2017

Social Web 2017

Lecture 4: How do we MINE, ANALYSE & VISUALISE the Social Web?

Davide Ceolin (credits to: Lora Aroyo)

The Network Institute, VU University Amsterdam

Page 2: Lecture 4 Social Web 2017

The Age of BIG Data

• 200 billion tweets on Twitter in 2015, by 1.3 billion registered users

• 4.5 billion likes generated on Facebook in 2015, by 1.55 billion different users

• 300 hours of video uploaded to YouTube every minute

• 60.7 million photos uploaded to Flickr per month

Social Web 2017, Davide Ceolin

Page 3: Lecture 4 Social Web 2017

Science with BIG Data


Page 4: Lecture 4 Social Web 2017

BIG Data Challenges


Page 5: Lecture 4 Social Web 2017

Big Data vs. Deep Data

• Social Web data often follow a long-tail distribution

[Figure labels: Big, Deep]
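To make the long-tail claim concrete, here is a minimal sketch that generates a Zipf-like popularity curve and measures how much of the total activity sits in the top 10% of items. The numbers are synthetic, and the function names `zipf_counts` and `head_share` are made up for this illustration; nothing here is real platform data.

```python
# Illustrative only: a synthetic Zipf-like popularity curve, not real platform data.

def zipf_counts(n_items, top_count=1000.0, exponent=1.0):
    """Return hypothetical activity counts for items ranked 1..n_items."""
    return [top_count / (rank ** exponent) for rank in range(1, n_items + 1)]


def head_share(counts, head_fraction=0.1):
    """Fraction of total activity contributed by the top `head_fraction` of items."""
    ordered = sorted(counts, reverse=True)
    head_size = max(1, int(len(ordered) * head_fraction))
    return sum(ordered[:head_size]) / sum(ordered)


counts = zipf_counts(1000)
share = head_share(counts, 0.1)
print(f"Top 10% of items account for {share:.0%} of all activity")
```

A few very popular items dominate the total, while the many rarely used items in the tail still add up to a substantial remainder.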

Page 6: Lecture 4 Social Web 2017

Why?

Enormous wealth of data = lots of insights

• insights into users’ daily lives and activities

• insights into history

• insights into politics

• insights into communities

• insights into trends

• insights into businesses & brands

Page 7: Lecture 4 Social Web 2017

Why?

Enormous wealth of data = lots of insights

• who uploads/talks? (age, gender, nationality, community, etc.)

• what are the trending topics? when?

• what else do these users like? on which platform?

• who are the most/least active users?

• …

Page 8: Lecture 4 Social Web 2017

Web Source Criticism?

Source criticism checklist (https://en.wikipedia.org/wiki/Source_criticism)

• Who is the author and what are the qualifications of the author in regard to the topic that is discussed?

• When was the information published?

• What is the reputation of the publisher?

• Does the source show a particular cultural or political bias?

• Does the source contain a bibliography?

• …

How does this apply to Web sources?

Page 10: Lecture 4 Social Web 2017

How about this?


Side note - check this out: http://guessthecorrelation.com

Page 11: Lecture 4 Social Web 2017

Web of Trust

https://www.mywot.com/en/scorecard/pulse.seattlechildrens.org

Page 12: Lecture 4 Social Web 2017

Who uses it?


Page 13: Lecture 4 Social Web 2017

Politicians

Governmental institutions

Page 14: Lecture 4 Social Web 2017

Whole society


Page 15: Lecture 4 Social Web 2017

Whole society

Repurposing data

Danger of second-order effects

Page 16: Lecture 4 Social Web 2017

Whole society

Repurposing data

discoveries & correlations

Web-Scale Pharmacovigilance: Listening to Signals from the Crowd, R.W. White et al (2013)


Page 17: Lecture 4 Social Web 2017

Scientists

Bibliometrics


Page 18: Lecture 4 Social Web 2017

Culture

History

Page 19: Lecture 4 Social Web 2017

Culture

History

Page 20: Lecture 4 Social Web 2017

Culture

Bill Howe, University of Washington


Page 21: Lecture 4 Social Web 2017

Entertainment


Page 22: Lecture 4 Social Web 2017

You?


https://klout.com/#/measure

Page 23: Lecture 4 Social Web 2017

Companies


Page 24: Lecture 4 Social Web 2017


Page 25: Lecture 4 Social Web 2017

Who does it?


Page 26: Lecture 4 Social Web 2017

The Rise of the Data Scientist

Data Geeks Skills:

• Statistics & Math

• Data munging

• Visualisation

Page 27: Lecture 4 Social Web 2017

The Rise of the Data Scientist

http://radar.oreilly.com/2010/06/what-is-data-science.html

Page 28: Lecture 4 Social Web 2017

Data Science

• Data Science enables the creation of data products

• Data products are applications that acquire their value from the data, and create more data as a result.

• Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.

Page 29: Lecture 4 Social Web 2017

Data Science Venn Diagram

Drew Conway


Page 30: Lecture 4 Social Web 2017

Data Science Venn Diagram


Page 31: Lecture 4 Social Web 2017


Page 32: Lecture 4 Social Web 2017

Popular Data Products

Data Science is about building products, not just answering questions

Page 33: Lecture 4 Social Web 2017

Popular Data Products

Empower others to use the data

Empower others to do their own analysis

Page 34: Lecture 4 Social Web 2017

Popular Data Products

http://www.metacog.com/resources/banner3.jpg

Page 35: Lecture 4 Social Web 2017

Data Mining 101

Data mining is the exploration & analysis of large quantities of data in order to discover valid, novel, potentially useful, & ultimately understandable patterns in data

(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides)

http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.jpg

Page 36: Lecture 4 Social Web 2017

Data Mining 101

Databases

Statistics / Numerical methods

Artificial Intelligence

• Data input & exploration

• Preprocessing

• Data mining algorithms

• Evaluation & Interpretation

Page 37: Lecture 4 Social Web 2017

Data Input & Exploration

• What data do I need to answer question X?

• What variables are in the data?

• Basic stats of my data?

“LikeMiner”
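The exploration questions above can be sketched in a few lines of Python. The `likes` records below are invented for illustration (they are not data from LikeMiner), and the choice of statistics is an assumption about what "basic stats" might cover.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical (user, page) "like" events, invented for this sketch.
likes = [
    ("ann", "jazz"), ("ann", "cycling"), ("bob", "jazz"),
    ("cat", "jazz"), ("cat", "cooking"), ("cat", "cycling"),
    ("dan", "cooking"),
]

# Which variables are in the data? Here: user and page.
likes_per_user = Counter(user for user, _ in likes)
likes_per_page = Counter(page for _, page in likes)

# Basic stats: sizes, central tendency, and the most popular page.
print("users:", len(likes_per_user), "pages:", len(likes_per_page))
print("mean likes per user:", mean(likes_per_user.values()))
print("median likes per user:", median(likes_per_user.values()))
print("most liked page:", likes_per_page.most_common(1))
```

Even counts this simple show whether the data can answer the question at hand, for example whether enough users have more than one like.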

Page 38: Lecture 4 Social Web 2017

Preprocessing

• Cleanup!

• Choose a suitable data model

• What happens if you integrate data from multiple sources?

• Reformat your data

“LikeMiner”
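As a rough sketch of these preprocessing steps, the snippet below cleans up names, maps two differently shaped sources onto one data model, and merges duplicates. The two sources, their field names, and the helper `to_common_model` are hypothetical. Matching users across platforms by name alone is a simplifying assumption; in practice identity resolution is much harder.

```python
def normalise(name):
    """Cleanup: collapse whitespace and lowercase so 'Jazz ' and 'JAZZ' compare equal."""
    return " ".join(name.split()).lower()


# Hypothetical exports from two platforms with different field names.
source_a = [{"user": "Ann", "likes": "Jazz "}, {"user": "Bob", "likes": "jazz"}]
source_b = [{"username": "ann", "page": "Cycling"}, {"username": "ann", "page": "JAZZ"}]


def to_common_model(record, user_key, item_key, source):
    """Reformat one source-specific record into a shared (user, item, source) model."""
    return {
        "user": normalise(record[user_key]),
        "item": normalise(record[item_key]),
        "source": source,
    }


records = [to_common_model(r, "user", "likes", "a") for r in source_a]
records += [to_common_model(r, "username", "page", "b") for r in source_b]

# Integration: the same (user, item) pair seen in both sources is stored once,
# while remembering which sources reported it.
merged = {}
for rec in records:
    merged.setdefault((rec["user"], rec["item"]), set()).add(rec["source"])

for key, sources in sorted(merged.items()):
    print(key, sorted(sources))
```

Keeping the list of sources per record makes it possible to spot conflicts or double counting later.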

Page 39: Lecture 4 Social Web 2017

Data Mining Algorithms

• Classification: Generalising a known structure and applying it to new data

• Association: Finding relationships between variables

• Clustering: Discovering groups and structures in data
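Of the three families above, association is the easiest to show in a few lines. The sketch below computes the support and confidence of a rule such as "users who like cycling also like jazz". The `baskets` data is invented for illustration, and the example covers only association, not classification or clustering.

```python
# Hypothetical sets of pages liked by four users, invented for this sketch.
baskets = [
    {"jazz", "cycling"},
    {"jazz", "cooking"},
    {"jazz", "cycling", "cooking"},
    {"cooking"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


print("support(cycling, jazz):", support({"cycling", "jazz"}, baskets))
print("confidence(cycling -> jazz):", confidence({"cycling"}, {"jazz"}, baskets))
print("confidence(jazz -> cycling):", confidence({"jazz"}, {"cycling"}, baskets))
```

The two confidence values differ, which is a reminder that an association rule is directional and says nothing about causation.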

Page 40: Lecture 4 Social Web 2017

Mining in “LikeMiner”

• Filter users by interests

• Construct user graphs

• PageRank on graphs to mine representativeness

• Result: set of influential users

• Compare page topics to user interests to find pages most representative for topics
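The PageRank step can be sketched with plain power iteration. This is a generic textbook formulation under stated assumptions (damping factor 0.85, a fixed number of iterations, dangling nodes spread evenly); it is not LikeMiner's actual implementation, and the small `user_graph` is hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank. `graph` maps each node to the nodes it links to;
    every link target must also appear as a key."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the same "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical user graph: an edge u -> v means "u endorses v".
user_graph = {
    "ann": ["cat"],
    "bob": ["cat"],
    "cat": ["ann"],
    "dan": ["cat", "ann"],
}

ranks = pagerank(user_graph)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

Users with many incoming edges from well-ranked users end up at the top, which is one way to read "influential" in the list above.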

Page 41: Lecture 4 Social Web 2017

Evaluation & Interpretation

What does the pattern I found mean?

Pitfalls:

• Meaningless Discoveries

• Implication ≠ Causality (Intensive care -> death)

• Simpson’s paradox

• Data Dredging

• Redundancy

• No New Information

• Overfitting

• Bad Experimental Setup
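Simpson's paradox is easiest to grasp with numbers: a trend that holds in every subgroup can reverse once the subgroups are pooled. The counts below are illustrative (successes, trials) pairs chosen to show the effect; they are not results from this lecture.

```python
# Illustrative counts: (successes, trials) per treatment and subgroup.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def overall_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(t for _, t in groups.values())
    return successes / trials


for subgroup in ("small", "large"):
    a = rate(*data["A"][subgroup])
    b = rate(*data["B"][subgroup])
    print(f"{subgroup}: A={a:.3f} B={b:.3f}")

print(f"pooled: A={overall_rate(data['A']):.3f} B={overall_rate(data['B']):.3f}")
```

Treatment A wins in both subgroups yet loses in the pooled data, because A was applied mostly to the harder "large" cases. Which number to trust depends on how the subgroups came about.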

Page 42: Lecture 4 Social Web 2017

Data Mining is not easy


Page 43: Lecture 4 Social Web 2017

Popular ML – Deep learning

http://www.kdnuggets.com/wp-content/uploads/deep-learning-small-big-data.jpg

http://scyfer.nl/wp-content/uploads/2014/05/Deep_Neural_Network.png

Page 44: Lecture 4 Social Web 2017

Deep learning frameworks

https://code.facebook.com/posts/1687861518126048/facebook-to-open-source-ai-hardware-design/

Page 45: Lecture 4 Social Web 2017


Page 46: Lecture 4 Social Web 2017

Data Journalism


Page 47: Lecture 4 Social Web 2017


Page 48: Lecture 4 Social Web 2017


Page 50: Lecture 4 Social Web 2017

Single Person

Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html

See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg

Page 51: Lecture 4 Social Web 2017

Populations

http://www.brandrants.com/brandrants/obama/

Page 52: Lecture 4 Social Web 2017

Brand Sentiment via Twitter

http://flowingdata.com/2011/07/25/brand-sentiment-showdown/


Page 53: Lecture 4 Social Web 2017

Sentiment Analysis as Service

http://www.crowdflower.com/type-sentiment-analysis

Page 54: Lecture 4 Social Web 2017

http://text-processing.com/demo/sentiment/

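For a feel of what brand sentiment analysis does at its simplest, here is a toy lexicon-based scorer. It is only a sketch: the word lists, the example tweets, and "BrandX" are all invented, and it makes no claim about how the services linked above work, which are likely far more sophisticated.

```python
import re

# Tiny hand-made lexicons, invented for this sketch.
POSITIVE = {"love", "great", "good", "happy", "awesome"}
NEGATIVE = {"hate", "bad", "terrible", "sad", "awful"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def brand_sentiment(tweets):
    """Average score over a collection of tweets (0.0 if there are none)."""
    scores = [sentiment_score(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical tweets about a made-up brand.
tweets = [
    "I love BrandX, great phone",
    "BrandX support is terrible",
    "meh, BrandX",
]

print([sentiment_score(t) for t in tweets])
print(round(brand_sentiment(tweets), 2))
```

Word counting ignores negation, sarcasm, and context, which is why practical systems rely on trained classifiers or human annotation.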

Page 55: Lecture 4 Social Web 2017

Recommended Reading

http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf

Page 56: Lecture 4 Social Web 2017

Hands-on Teaser

• Build your own recommender system 101

• Recommend pages on del.icio.us

• Recommend pages to your Facebook friends

image source: http://www.flickr.com/photos/bionicteaching/1375254387/
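As a taste of the hands-on session, a recommender can be sketched as user-based collaborative filtering: find users with similar bookmarks and suggest what they saved that you have not. The `bookmarks` data and function names are hypothetical, cosine similarity over sets is one possible choice among several, and the actual exercise may use a different approach.

```python
from math import sqrt

# Hypothetical bookmark sets per user, invented for this sketch.
bookmarks = {
    "ann": {"python.org", "xkcd.com"},
    "bob": {"python.org", "xkcd.com", "arxiv.org"},
    "cat": {"flickr.com"},
}


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, data, top_n=3):
    """Rank pages the user has not bookmarked by the summed similarity
    of the other users who did bookmark them."""
    mine = data[user]
    scores = {}
    for other, pages in data.items():
        if other == user:
            continue
        sim = cosine(mine, pages)
        if sim == 0.0:
            continue  # users with nothing in common contribute nothing
        for page in pages - mine:
            scores[page] = scores.get(page, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("ann", bookmarks))
print(recommend("cat", bookmarks))
```

A user with no overlap with anyone else gets no recommendations, which is the cold-start problem in miniature.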