social’mediaand’social’ compung’dmml.asu.edu/cdm/slides/chapter1.pdf ·...
TRANSCRIPT
Social Media and Social Compu0ng
Chapter 1
1 Chapter 1, Community Detec0on and Mining in Social Media. Lei Tang and Huan Liu,
Morgan & Claypool, September, 2010.
Tradi0onal Media
Broadcast Media: One-‐to-‐Many
Communica0on Media: One-‐to-‐One
2
Social Media: Many-‐to-‐Many
Social Media
Social Networking
Blogs
Microblogging
Wiki
Forum
Content
Sharing
3
Various forms of Social Media
• Blog: Wordpress, blogspot, LiveJournal • Forum: Yahoo! Answers, Epinions
• Media Sharing: Flickr, YouTube, Scribd
• Microblogging: TwiSer, FourSquare
• Social Networking: Facebook, LinkedIn, Orkut • Social Bookmarking: Del.icio.us, Diigo
• Wikis: Wikipedia, scholarpedia, AskDrWiki
4
Characteris0cs of Social Media • “Consumers” become “Producers” • Rich User Interac0on • User-‐Generated Contents • Collabora0ve environment
• Collec0ve Wisdom
• Long Tail
Broadcast Media Filter, then Publish
Social Media Publish, then Filter
5
Top 20 Websites at USA 1 Google.com 11 Blogger.com
2 Facebook.com 12 msn.com
3 Yahoo.com 13 Myspace.com
4 YouTube.com 14 Go.com
5 Amazon.com 15 Bing.com
6 Wikipedia.org 16 AOL.com
7 Craigslist.org 17 LinkedIn.com
8 TwiSer.com 18 CNN.com
9 Ebay.com 19 Espn.go.com
10 Live.com 20 Wordpress.com
40% of websites are social media sites 6
7
8
Networks and Representa0on
• Graph Representa0on • Matrix Representa0on
9
Social Network: A social structure made of nodes (individuals or organiza0ons) and edges that connect nodes in various rela0onships like friendship, kinship etc.
Basic Concepts
• A: the adjacency matrix • V: the set of nodes • E: the set of edges • vi: a node vi • e(vi, vj): an edge between node vi and vj • Ni: the neighborhood of node vi • di: the degree of node vi • geodesic: a shortest path between two nodes – geodesic distance
10
Proper0es of Large-‐Scale Networks
• Networks in social media are typically huge, involving millions of actors and connec0ons.
• Large-‐scale networks in real world demonstrate similar paSerns – Scale-‐free distribu0ons – Small-‐world effect
– Strong Community Structure
11
Scale-‐free Distribu0ons
• Degree distribu0on in large-‐scale networks ojen follows a power law.
• A.k.a. long tail distribu0on, scale-‐free distribu0on
12
log-‐log plot
• Power law distribu0on becomes a straight line if plot in a log-‐log scale
13
100
101
102
103
104
105
100
101
102
103
104
105
106
Node Degree
Nu
mb
er
of
No
de
s
100
101
102
103
104
105
100
101
102
103
104
105
106
Node Degree
Num
ber
of N
odes
Friendship Network in Flickr Friendship Network in YouTube
Small-‐World Effect
• “Six Degrees of Separa0on”
• A famous experiment conducted by Travers and Milgram (1969) – Subjects were asked to send a chain leSer to his acquaintance in order
to reach a target person
– The average path length is around 5.5
• Verified on a planetary-‐scale IM network of 180 million users (Leskovec and Horvitz 2008) – The average path length is 6.6
14
Diameter
• Measures used to calibrate the small world effect – Diameter: the longest shortest path in a network – Average shortest path length
15
• The shortest path between two nodes is called geodesic. • The number of hops in the geodesic is the geodesic distance.
• The geodesic distance between node 1 and node 9 is 4. • The diameter of the network is 5, corresponding to the geodesic distance between nodes 2 and 9.
Community Structure
• Community: People in a group interact with each other more frequently than those outside the group
• Friends of a friend are likely to be friends as well • Measured by clustering coefficient:
– density of connec0ons among one’s friends
16
Clustering Coefficient
• d6=4, N6= {4, 5, 7,8} • k6=4 as e(4,5), e(5,7), e(5,8), e(7,8) • C6 = 4/(4*3/2) = 2/3 • Average clustering coefficient
C = (C1 + C2 + … + Cn)/n
• C = 0.61 for the lej network • In a random graph, the expected
coefficient is 14/(9*8/2) = 0.19.
17
Challenges
• Scalability – Social networks are ojen in a scale of millions of nodes and connec0ons
– Tradi0onal Network Analysis ojen deals with at most hundreds of subjects
• Heterogeneity – Various types of en00es and interac0ons are involved
• Evolu0on – Timeliness is emphasized in social media
• Collec0ve Intelligence – How to u0lize wisdom of crowds in forms of tags, wikis, reviews
• Evalua0on – Lack of ground truth, and complete informa0on due to privacy
18
Social Compu0ng Tasks
• Social Compu0ng: a young and vibrant field • Many new challenges
• Tasks – Network Modeling – Centrality Analysis and Influence Modeling – Community Detec0on
– Classifica0on and Recommenda0on – Privacy, Spam and Security
19
Network Modeling • Large Networks demonstrate sta0s0cal paSerns:
– Small-‐world effect (e.g., 6 degrees of separa0on)
– Power-‐law distribu0on (a.k.a. scale-‐free distribu0on) – Community structure (high clustering coefficient)
• Model the network dynamics – Find a mechanism such that the sta0s0cal paSerns observed in large-‐
scale networks can be reproduced.
– Examples: random graph, preferen0al aSachment process, WaSs and Strogatz model
• Used for simula0on to understand network proper0es – Thomas Shelling’s famous simula0on: What could cause the
segrega0on of white and black people
– Network robustness under aSack
Comparing Network Models
observa0ons over various real-‐word large-‐scale networks
outcome of a network model
(Figures borrowed from “Emergence of Scaling in Random Networks”)
Centrality Analysis and Influence Modeling
• Centrality Analysis: – Iden0fy the most important actors or edges – Various criteria
• Influence modeling: – How is informa0on diffused?
– How does one influence each other? • Related Problems
– Viral marke0ng: word-‐of-‐mouth effect
– Influence maximiza0on
22
Community Detec0on
• A community is a set of nodes between which the interac0ons are (rela0vely) frequent – A.k.a., group, cluster, cohesive subgroups, modules
• Applica0ons: Recommenda<on based communi<es, Network Compression, Visualiza<on of a huge network
• New lines of research in social media – Community Detec0on in Heterogeneous Networks – Community Evolu0on in Dynamic Networks – Scalable Community Detec0on in Large-‐Scale Networks
23
Classifica0on and Recommenda0on
• Common in social media applica0ons – Tag sugges0on, Friend/Group Recommenda0on, Targe0ng
24
Link predic0on
Network-‐Based Classifica0on
Privacy, Spam and Security
• Privacy is a big concern in social media
– Facebook, Google buzz ojen appear in debates about privacy
– NetFlix Prize Sequel cancelled due to privacy concern – Simple annoymiza0on does not necessarily protect privacy
• Spam blog (splog), spam comments, Fake iden0ty, etc., all requires new techniques
• As private informa0on is involved, a secure and trustable system is cri0cal
• Need to achieve a balance between sharing and privacy
25
Book Available at • Morgan & claypool Publishers • Amazon
If you have any comments, please feel free to contact:
• Lei Tang, Yahoo! Labs, ltang@yahoo-‐inc.com
• Huan Liu, ASU [email protected]