Social Media and Social Computing
dmml.asu.edu/cdm/slides/chapter1.pdf

Chapter 1, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September 2010.




Traditional Media

Broadcast Media: One-to-Many

Communication Media: One-to-One


Social Media: Many-to-Many

[Diagram: Social Media at the center, connected to Social Networking, Blogs, Microblogging, Wiki, Forum, and Content Sharing]


Various Forms of Social Media

• Blog: WordPress, Blogspot, LiveJournal
• Forum: Yahoo! Answers, Epinions
• Media Sharing: Flickr, YouTube, Scribd
• Microblogging: Twitter, FourSquare
• Social Networking: Facebook, LinkedIn, Orkut
• Social Bookmarking: Del.icio.us, Diigo
• Wikis: Wikipedia, Scholarpedia, AskDrWiki


Characteristics of Social Media

• “Consumers” become “Producers”
• Rich User Interaction
• User-Generated Content
• Collaborative Environment
• Collective Wisdom
• Long Tail

Broadcast Media: Filter, then Publish
Social Media: Publish, then Filter


Top 20 Websites in the USA

Rank  Site              Rank  Site
 1    Google.com        11    Blogger.com
 2    Facebook.com      12    msn.com
 3    Yahoo.com         13    Myspace.com
 4    YouTube.com       14    Go.com
 5    Amazon.com        15    Bing.com
 6    Wikipedia.org     16    AOL.com
 7    Craigslist.org    17    LinkedIn.com
 8    Twitter.com       18    CNN.com
 9    Ebay.com          19    Espn.go.com
10    Live.com          20    Wordpress.com

40% of these top 20 websites are social media sites.


Networks and Representation

Social Network: A social structure made of nodes (individuals or organizations) and edges that connect nodes in various relationships such as friendship, kinship, etc.

• Graph Representation
• Matrix Representation


Basic Concepts

• A: the adjacency matrix
• V: the set of nodes
• E: the set of edges
• v_i: a node
• e(v_i, v_j): an edge between nodes v_i and v_j
• N_i: the neighborhood of node v_i
• d_i: the degree of node v_i
• geodesic: a shortest path between two nodes
  – geodesic distance
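The notation above can be made concrete with a small sketch in Python. The four-node network below is hypothetical and chosen only for illustration; the helper names `neighborhood` and `degree` are likewise assumptions, not part of the slides.

```python
# Hypothetical toy network (not from the slides), undirected.
V = [1, 2, 3, 4]                      # V: the set of nodes
E = [(1, 2), (1, 3), (2, 3), (3, 4)]  # E: the set of edges

# Map each node label to a row/column index of the adjacency matrix.
index = {v: i for i, v in enumerate(V)}
n = len(V)

# A: the adjacency matrix; A[i][j] = 1 if there is an edge e(v_i, v_j).
A = [[0] * n for _ in range(n)]
for u, v in E:
    A[index[u]][index[v]] = 1
    A[index[v]][index[u]] = 1  # symmetric because the network is undirected


def neighborhood(v):
    """N_i: the set of nodes adjacent to node v."""
    return {V[j] for j, a in enumerate(A[index[v]]) if a}


def degree(v):
    """d_i: the number of edges incident to node v."""
    return sum(A[index[v]])


print(sorted(neighborhood(3)))  # [1, 2, 4]
print(degree(3))                # 3
```

A dense matrix is convenient for a toy example; for networks with millions of nodes a sparse structure such as an adjacency list would usually be preferred.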


Properties of Large-Scale Networks

• Networks in social media are typically huge, involving millions of actors and connections.
• Large-scale networks in the real world demonstrate similar patterns
  – Scale-free distributions
  – Small-world effect
  – Strong community structure


Scale-free Distributions

• Degree distribution in large-scale networks often follows a power law.
• A.k.a. long tail distribution, scale-free distribution


Log-log Plot

• A power law distribution becomes a straight line if plotted on a log-log scale

[Figure: two log-log plots of Number of Nodes versus Node Degree, one for the friendship network in Flickr and one for the friendship network in YouTube]
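To see why a power law looks like a straight line on log-log axes: if the number of nodes with degree k is proportional to k^(-γ), then log(count) = log(C) − γ·log(k), which is linear in log(k) with slope −γ. The Python sketch below fits that line by ordinary least squares. The data is synthetic and exact by construction (an assumption for illustration, not the Flickr or YouTube data), and the values of `gamma` and `C` are arbitrary.

```python
import math


def fit_loglog_slope(degrees, counts):
    """Least-squares fit of log10(count) = intercept + slope * log10(degree)."""
    xs = [math.log10(k) for k in degrees]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic power-law counts (assumed values, for illustration only).
gamma, C = 2.0, 1e6
degrees = [1, 10, 100, 1000]
counts = [C * k ** -gamma for k in degrees]

slope, intercept = fit_loglog_slope(degrees, counts)
print(slope, intercept)  # slope is close to -gamma, intercept close to log10(C)
```

On real degree data the tail is noisy, so a plain least-squares fit like this is only a rough diagnostic rather than a rigorous test for a power law.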


Small-World Effect

• “Six Degrees of Separation”
• A famous experiment conducted by Travers and Milgram (1969)
  – Subjects were asked to send a chain letter to an acquaintance in order to reach a target person
  – The average path length is around 5.5
• Verified on a planetary-scale IM network of 180 million users (Leskovec and Horvitz 2008)
  – The average path length is 6.6


Diameter

• Measures used to calibrate the small-world effect
  – Diameter: the longest shortest path in a network
  – Average shortest path length
• The shortest path between two nodes is called a geodesic.
• The number of hops in the geodesic is the geodesic distance.
• In the example network, the geodesic distance between node 1 and node 9 is 4.
• The diameter of the example network is 5, corresponding to the geodesic distance between nodes 2 and 9.
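Geodesic distance and diameter can be computed with a breadth-first search from each node. The following is a minimal Python sketch under one loud assumption: the example figure is not part of this transcript, so the edge list below is a reconstruction of a 9-node, 14-edge network chosen to agree with the numbers quoted on these slides. The function names are illustrative.

```python
from collections import deque

# ASSUMED edge list: a 9-node network reconstructed to match the slide's
# numbers; the original figure is not available in the transcript.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Adjacency sets for an undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def geodesic_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Longest geodesic distance over all node pairs (connected network)."""
    return max(max(geodesic_distances(adj, s).values()) for s in adj)


def average_shortest_path_length(adj):
    """Mean geodesic distance over all ordered pairs of distinct nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in geodesic_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


adj = build_adjacency(EDGES)
print(geodesic_distances(adj, 1)[9])  # 4
print(diameter(adj))                  # 5
```

Running a search from every node costs on the order of |V|·(|V| + |E|) steps, which is fine for a toy network but is typically replaced by sampling or approximation on networks with millions of nodes.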


Community Structure

• Community: People in a group interact with each other more frequently than with those outside the group
• Friends of a friend are likely to be friends as well
• Measured by clustering coefficient:
  – density of connections among one’s friends


Clustering Coefficient

• d_6 = 4, N_6 = {4, 5, 7, 8}
• k_6 = 4, the number of edges among the neighbors of node 6: e(4,5), e(5,7), e(5,8), e(7,8)
• C_6 = 4/(4*3/2) = 2/3
• Average clustering coefficient: C = (C_1 + C_2 + … + C_n)/n
• C = 0.61 for the network on the left
• In a random graph with the same 9 nodes and 14 edges, the expected coefficient is 14/(9*8/2) = 0.39.
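The per-node and average clustering coefficients can be reproduced with a short Python sketch. As a loud assumption, the edge list below is a reconstruction of the 9-node, 14-edge example network that is consistent with the quantities on this slide (d_6, N_6, k_6); the figure itself is not in the transcript. Treating nodes with fewer than two neighbors as having coefficient 0 is also an assumed convention.

```python
from itertools import combinations

# ASSUMED edge list: reconstructed to match the slide's numbers; the
# original figure is not available in the transcript.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Adjacency sets for an undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, i):
    """C_i = k_i / (d_i * (d_i - 1) / 2), with k_i edges among i's neighbors."""
    neighbors = adj[i]
    d = len(neighbors)
    if d < 2:
        return 0.0  # assumed convention: no neighbor pairs, so coefficient 0
    k = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return k / (d * (d - 1) / 2)


adj = build_adjacency(EDGES)
n = len(adj)

c6 = clustering_coefficient(adj, 6)
average_c = sum(clustering_coefficient(adj, i) for i in adj) / n

# Edge density: the fraction of possible node pairs that are connected,
# i.e. the ratio 14 / (9 * 8 / 2) from the slide.
density = len(EDGES) / (n * (n - 1) / 2)

print(round(c6, 2), round(average_c, 2), round(density, 2))  # 0.67 0.61 0.39
```

The average coefficient (about 0.61) is well above the edge density (about 0.39), which is the sense in which the example network shows community structure.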


Challenges

• Scalability
  – Social networks are often on a scale of millions of nodes and connections
  – Traditional network analysis often deals with at most hundreds of subjects
• Heterogeneity
  – Various types of entities and interactions are involved
• Evolution
  – Timeliness is emphasized in social media
• Collective Intelligence
  – How to utilize the wisdom of crowds in the form of tags, wikis, and reviews
• Evaluation
  – Lack of ground truth and of complete information due to privacy


Social Computing Tasks

• Social Computing: a young and vibrant field
• Many new challenges
• Tasks
  – Network Modeling
  – Centrality Analysis and Influence Modeling
  – Community Detection
  – Classification and Recommendation
  – Privacy, Spam and Security


Network Modeling

• Large networks demonstrate statistical patterns:
  – Small-world effect (e.g., 6 degrees of separation)
  – Power-law distribution (a.k.a. scale-free distribution)
  – Community structure (high clustering coefficient)
• Model the network dynamics
  – Find a mechanism such that the statistical patterns observed in large-scale networks can be reproduced.
  – Examples: random graph, preferential attachment process, Watts and Strogatz model
• Used for simulation to understand network properties
  – Thomas Schelling’s famous simulation: what could cause the segregation of white and black people
  – Network robustness under attack
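As one illustration of such a mechanism, here is a minimal Python sketch of a preferential attachment process, in which each new node links to existing nodes with probability proportional to their current degree. The function name, the parameters `n`, `m`, and `seed`, and the choice to start from a small clique are all assumptions made for this sketch; the slides name the process but do not specify an implementation.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Grow an undirected network to n nodes; each new node adds m edges.

    Targets are chosen with probability proportional to current degree.
    """
    rng = random.Random(seed)
    # Assumed starting point: a clique on the first m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # m distinct existing nodes
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = preferential_attachment(n=200, m=2, seed=1)
degrees = Counter(node for edge in edges for node in edge)
print(len(edges), max(degrees.values()), min(degrees.values()))
```

Early nodes tend to accumulate many more links than late ones, which is how this kind of process can produce a heavy-tailed degree distribution; the exact degrees depend on the random seed.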


Comparing Network Models

[Figures: observations over various real-world large-scale networks, compared with the outcome of a network model]

(Figures borrowed from “Emergence of Scaling in Random Networks”)


Centrality Analysis and Influence Modeling

• Centrality Analysis:
  – Identify the most important actors or edges
  – Various criteria
• Influence modeling:
  – How is information diffused?
  – How do actors influence one another?
• Related Problems
  – Viral marketing: word-of-mouth effect
  – Influence maximization


Community Detection

• A community is a set of nodes between which the interactions are (relatively) frequent
  – A.k.a. group, cluster, cohesive subgroup, module
• Applications: recommendation based on communities, network compression, visualization of a huge network
• New lines of research in social media
  – Community Detection in Heterogeneous Networks
  – Community Evolution in Dynamic Networks
  – Scalable Community Detection in Large-Scale Networks


Classification and Recommendation

• Common in social media applications
  – Tag suggestion, Friend/Group Recommendation, Targeting

[Figure panels: Link Prediction; Network-Based Classification]


Privacy, Spam and Security

• Privacy is a big concern in social media
  – Facebook and Google Buzz often appear in debates about privacy
  – The Netflix Prize sequel was cancelled due to privacy concerns
  – Simple anonymization does not necessarily protect privacy
• Spam blogs (splogs), spam comments, fake identities, etc. all require new techniques
• As private information is involved, a secure and trustworthy system is critical
• Need to achieve a balance between sharing and privacy


Book Available at

• Morgan & Claypool Publishers
• Amazon

If you have any comments, please feel free to contact:

• Lei Tang, Yahoo! Labs, ltang@yahoo-inc.com
• Huan Liu, ASU [email protected]