social networks at scale

Social Networks @ Scale Eoin Hurrell, PhD Data Lead, Cohort @eoinhurrell

Upload: eoin-hurrell

Post on 08-Jan-2017

TRANSCRIPT

Page 1: Social Networks at Scale

Social Networks @ Scale

Eoin Hurrell, PhD Data Lead, Cohort

@eoinhurrell

Page 2: Social Networks at Scale

Cohort as a use-case

Page 3: Social Networks at Scale

Social Networks

👤 👤 👤 👤 👤 👤 👤 👤

Page 4: Social Networks at Scale

Social Network Analysis

📚 ☕📜

• Provides many tools to solve problems

• Consider your problem before you consider your tools!

• SNA has a long history in sociology

What do you want to know?

Page 5: Social Networks at Scale

Let's talk Graphs

Page 6: Social Networks at Scale

Let's talk Graphs

source: http://www.nltk.org/book_1ed/ch04.html Fig 4.16

Page 7: Social Networks at Scale

Social Networks as Big Data

Page 8: Social Networks at Scale

Options for Getting Networks

• Start a new social network from scratch

• Ahead-of-time scrape a bunch of data from target social networks.

Page 9: Social Networks at Scale

Options for Examining Networks

• networkx

• Graph database like Neo4j

• pandas, dask and other standard PyData tools are not focused on networks, or cause problems in a production service

Page 10: Social Networks at Scale

Cohort as a use-case

We want to understand friend-of-a-friend relationships and the knowledge of the people in them, so any existing data is important to us. Python is excellent because sklearn, networkx and other data science libraries exist. It also allows for Spark and Kafka usage as we scale.

We need to get existing data from social networks and be able to process large amounts of data intelligently.

Over half a billion relationships, 72+ million people
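
The friend-of-a-friend question can be sketched in plain Python. This is a minimal illustration, not Cohort's implementation: the names and the tiny edge list are made up, and at the scale quoted above the same idea would need a graph store or a distributed job rather than an in-memory dict.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbour sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def friends_of_friends(adjacency, person):
    """People exactly two hops away: friends of friends who are not already friends."""
    direct = adjacency.get(person, set())
    two_hop = set()
    for friend in direct:
        two_hop |= adjacency.get(friend, set())
    return two_hop - direct - {person}


# Hypothetical data for illustration only.
edges = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve"), ("eve", "cat")]
adjacency = build_adjacency(edges)
fof_of_ann = friends_of_friends(adjacency, "ann")
fof_of_dan = friends_of_friends(adjacency, "dan")
```

Here `cat` is reachable from `ann` through both `bob` and `eve`, so counting the number of connecting friends would be a natural next step for ranking introductions.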

Page 11: Social Networks at Scale

Streaming architecture

[Diagram: 👤 👤👤 👤👤, Single Source of Truth, 🤖 = Realised Views]

Source: www.kappa-architecture.com
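
The idea of a single source of truth feeding realised views can be sketched as an append-only log that is replayed to build whatever view is needed. This is an assumption-laden toy: the event shapes, function names and in-memory list are hypothetical stand-ins for a real log such as Kafka, which the talk mentions only in passing.

```python
from collections import defaultdict

# The append-only log is the single source of truth.
# Event field names ("type", "a", "b") are assumptions for this sketch.
log = []


def append(event):
    """Record an event; existing entries are never modified."""
    log.append(event)


def build_friend_view(events):
    """Replay the log from the start to realise a 'who is connected to whom' view."""
    view = defaultdict(set)
    for event in events:
        a, b = event["a"], event["b"]
        if event["type"] == "connected":
            view[a].add(b)
            view[b].add(a)
        elif event["type"] == "disconnected":
            view[a].discard(b)
            view[b].discard(a)
    return view


def build_degree_view(events):
    """A second view derived from the same log: number of connections per person."""
    return {person: len(friends) for person, friends in build_friend_view(events).items()}


append({"type": "connected", "a": "ann", "b": "bob"})
append({"type": "connected", "a": "ann", "b": "cat"})
append({"type": "disconnected", "a": "ann", "b": "bob"})

friend_view = build_friend_view(log)
degree_view = build_degree_view(log)
```

Because every view is derived by replaying the same log, a new view can be added later, or a buggy one rebuilt, without touching the recorded events.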

Page 12: Social Networks at Scale

Streaming architecture

[Diagram: 👤 👤👤 👤👤, 🤖🤖🤖🤖]

Source: www.kappa-architecture.com

Page 13: Social Networks at Scale

In production

Page 14: Social Networks at Scale

Batch calculation

👤 👤 👤 👤 👤 👤 👤 👤 👤

Community detection
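
Community detection groups people who are more densely connected to each other than to the rest of the network. The slide does not say which algorithm was used; the sketch below uses label propagation as one possible choice. The deterministic tie-break (the largest label wins) is an assumption made to keep the example reproducible, whereas library implementations typically break ties randomly.

```python
from collections import Counter, defaultdict


def label_propagation(adjacency, max_rounds=20):
    """Each node repeatedly adopts the most common label among its neighbours."""
    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            neighbours = adjacency[node]
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            top = max(counts.values())
            # Assumed tie-break: among equally common labels, take the largest.
            best = max(label for label, count in counts.items() if count == top)
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(communities.values(), key=min)


# Hypothetical graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

communities = label_propagation(adjacency)
```

On this toy graph the two triangles come out as separate communities. The result depends on visiting order and tie-breaking, so it should be read as an illustration rather than a robust method.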

Page 15: Social Networks at Scale

Batch calculation

👤 👤 👤 👤 👤 👤 👤 👤 👤

Popularity models (e.g. PageRank)
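
As a rough illustration of a popularity model, PageRank can be computed by power iteration over a "who follows whom" mapping. This is a minimal single-machine sketch with made-up data and conventional defaults (damping 0.85, fixed iteration count), not the batch job described in the talk.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank. Every link target must also be a key of out_links."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


# Hypothetical data: each person maps to the people they follow.
follows = {"ann": ["cat"], "bob": ["cat"], "cat": ["ann"], "dan": ["cat"]}
ranks = pagerank(follows)
most_popular = max(ranks, key=ranks.get)
```

Each iteration touches every edge once, which is why this kind of calculation is usually run as a batch job on large graphs.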

Page 16: Social Networks at Scale

Handling Batch calculation

One Trillion Edges: Graph Processing at Facebook-Scale, A. Ching et al., VLDB '15

Page 17: Social Networks at Scale

How to handle messages like Twitter

SELECT * FROM posts WHERE user_id IN :friend_list ORDER BY timestamp DESC LIMIT 100;

This does not scale 💀
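
The query above builds the timeline at read time, so every page view scans the posts of every friend. A minimal sketch with the standard-library `sqlite3` module, assuming a hypothetical `posts` table with the columns named in the query. A list cannot be bound to a single `:friend_list` parameter in `sqlite3`, so the placeholders are expanded explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT, timestamp INTEGER)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO posts (user_id, body, timestamp) VALUES (?, ?, ?)",
    [(2, "hello", 10), (3, "lunch?", 20), (4, "not a friend", 30), (2, "bye", 40)],
)


def timeline_on_read(conn, friend_ids, limit=100):
    """Fetch the newest posts by the given friends, computed at request time."""
    if not friend_ids:
        return []
    placeholders = ", ".join("?" for _ in friend_ids)  # one "?" per friend id
    query = (
        "SELECT id, user_id, body, timestamp FROM posts "
        f"WHERE user_id IN ({placeholders}) "
        "ORDER BY timestamp DESC LIMIT ?"
    )
    return conn.execute(query, [*friend_ids, limit]).fetchall()


rows = timeline_on_read(conn, [2, 3])
bodies = [row[2] for row in rows]
```

The work done per request grows with the number of friends and the volume of their posts, and it is repeated on every read.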

Page 18: Social Networks at Scale

How to handle messages like Twitter

Redis

[Diagram: Redis timelines, one row of messages (✉) per user key such as 👤:1; a user posts a new 📨; Single Source of Truth]

Page 19: Social Networks at Scale

How to handle messages like Twitter

SELECT * FROM posts WHERE id IN :timeline_ids

This scales 😻
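
The alternative is to do the work at write time: when someone posts, the post id is pushed onto each follower's precomputed timeline, and a read becomes a lookup by id. The sketch below uses in-memory dicts as stand-ins for Redis and for the posts table. All names and the cap of 100 entries are assumptions for illustration, not the production design.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 100  # assumed cap, mirroring LIMIT 100 in the earlier query

posts = {}                    # stand-in for the posts table (the source of truth)
followers = defaultdict(set)  # author id -> ids of users who follow them
# Stand-in for one Redis list per user; the oldest ids drop off when the cap is reached.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))


def publish(post_id, author, body):
    """Store the post once, then push its id onto every follower's timeline."""
    posts[post_id] = {"id": post_id, "user_id": author, "body": body}
    for follower in followers[author]:
        timelines[follower].appendleft(post_id)  # newest first


def read_timeline(user):
    """Equivalent in spirit to: SELECT * FROM posts WHERE id IN :timeline_ids."""
    return [posts[post_id] for post_id in timelines[user]]


# Hypothetical follow graph: user 1 follows 2 and 3, user 4 follows 3.
followers[2] = {1}
followers[3] = {1, 4}
publish(10, 2, "hello")
publish(11, 3, "lunch?")

timeline_1 = [post["body"] for post in read_timeline(1)]
timeline_4 = [post["body"] for post in read_timeline(4)]
```

With Redis itself, the per-user list would typically be maintained with commands such as `LPUSH` and `LTRIM` and read with `LRANGE`. The trade-off is that one post by a widely followed author triggers many writes.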

Page 20: Social Networks at Scale

Conclusion

• Networks are dense but useful data

• Scalable data science depends on usage, not just traditional form

• Python is useful and powerful at every level of this stack

Page 21: Social Networks at Scale

Thank You!

🔬

Cohort helps you find what you need through the people you know and trust

cohort.is

Eoin Hurrell, PhD Data Lead, Cohort

@eoinhurrell