cs#6604:#data#mining#large# …people.cs.vt.edu/~badityap/classes/cs6604-fall15/... ·  ·...

58
CS 6604: Data Mining Large Networks and Timeseries B. Aditya Prakash Lecture #1: Introduc/on

Upload: hakhuong

Post on 30-Apr-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

CS  6604:  Data  Mining  Large  Networks  and  Time-­‐series  

B.  Aditya  Prakash  Lecture  #1:  Introduc/on  

Networks  are  everywhere!  

   

Human  Disease  Network  [Barabasi  2007]  

Gene  Regulatory  Network  [Decourty  2008]  

Facebook  Network  [2010]  

The  Internet  [2005]  

2  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

What  else  do  they  have  in  common?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   3  

High  School  DaDng  Network  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   4  

Bearman  et.  al.  Am.  Jnl.  of  Sociology,  2004.  Image:  Mark  Newman   Blue:  Male  

Pink:  Female  

Interes/ng  observa/ons?  

The  Internet  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   5  

Skewed  Degrees  Robustness  

Karate  Club  Network  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   6  

7  

Dynamical  Processes  over  networks  are  also  everywhere!

 

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?  §  Social  collabora/on  §  Informa/on  Diffusion  §  Viral  Marke/ng  §  Epidemiology  and  Public  Health  §  Cyber  Security  §  Human  mobility    §  Games  and  Virtual  Worlds    §  Ecology  ........   CS  6604:  DM  Large  Networks  &  Time-­‐Series   8  Prakash  2015  

Why  do  we  care?  (1:  Epidemiology)  

§  Dynamical  Processes  over  networks  [AJPH  2007]  

CDC  data:  Visualiza/on  of  the  first  35  tuberculosis  (TB)  pa/ents  and  their  1039  contacts    

Diseases  over  contact  networks  

9  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?  (1:  Epidemiology)  

§  Dynamical  Processes  over  networks  

•   Each  circle  is  a  hospital  •   ~3000  hospitals  •   More  than  30,000  pa/ents  transferred      

[US-­‐MEDICARE  NETWORK  2005]  

10  

Problem:  Given  k  units  of  disinfectant,  whom  to  immunize?  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?  (1:  Epidemiology)  

CURRENT  PRACTICE   OUR  METHOD  

~6x  fewer!  

[US-­‐MEDICARE  NETWORK  2005]  

11  Hospital-­‐acquired  inf.  took  99K+  lives,  cost  $5B+  (all  per  year)  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?  (2:  Online  Diffusion)  

12  

>  800m  users,  ~$1B  revenue  [WSJ  2010]  

 ~100m  acEve  users  

>  50m  users  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?  (2:  Online  Diffusion)    

§  Dynamical  Processes  over  networks  

Celebrity  

Buy  Versace™!  

Followers  

13  Social  Media  Marke/ng  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Why  do  we  care?    (3:  To  change  the  world?)  

§  Dynamical  Processes  over  networks  

Social  networks  and  CollaboraEve  AcEon  14  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

High  Impact  –  MulDple  SeWngs  

Q.  How  to  squash  rumors  faster?    Q.  How  do  opinions  spread?      Q.  How  to  market  bePer?    

15  

epidemic  out-­‐breaks  

products/viruses  

transmit  s/w  patches  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Dynamical  Processes    =  (a  lot  of)  Networks    +  (some)  Time-­‐Series  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   16  

Research  Theme  

DATA  Large  real-­‐world  

networks  &  processes  17  

ANALYSIS  Understanding  

POLICY/  ACTION  Managing  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Research  Theme  –  Public  Health  

DATA  Modeling  #  pa/ent  

transfers  

ANALYSIS  Will  an  epidemic  

happen?  

POLICY/  ACTION  

How  to  control  out-­‐breaks?  18  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

Research  Theme  –  Social  Media  

DATA  Modeling  Tweets  

spreading  

POLICY/  ACTION  

How  to  market  beoer?  

ANALYSIS  #  cascades  in  

future?  

19  CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

ML  &  Stats.  

Comp.  Systems  

Theory  &  Algo.  

Biology  

Econ.  

Social  Science  

Physics  

20  

Networks  and  Time-­‐Series  

CS  6604:  DM  Large  Networks  &  Time-­‐Series  Prakash  2015  

COURSE  LOGISTICS  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   21  

Course  InformaDon  §  Instructor    B.  Aditya  Prakash,  Torgersen  Hall  3160  F,  [email protected]    –  Office  Hours:  10-­‐11am  Tuesdays  –  Include  string  CS  6604  in  subject    

 §  Class  MeeDng  Time    Tuesdays,  Thursdays,  11:00am-­‐12:15pm,  McBryde  Hall  655      (/me  table  shows  McB  133C,  ignore!)  

 §  Syllabus:  Models,  Theory,  Algorithms,  and  ApplicaDons  for  Graph  

and  Time-­‐Series  Mining  –  Special  Focus  on  how  they  relate  to  Cascade/Propaga/on-­‐like  Processes  

   

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   22  

Course  InformaDon  §  Keeping  in  Touch    Course  web  site    

                               hop://www.cs.vt.edu/~badityap/classes/cs6604-­‐Fall15/    updated  regularly  through  the  semester  –  Piazza  link  on  the  website  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   23  

Textbook  §  NO  required  textbook  §  Recommended      David  Easley  and  Jon  Kleinberg:  Networks,  Crowds  and  Markets:  Reasoning  about  a  highly  connected  world.  Cambridge  University  Press.  2010  

   Web  page  for  the  book  (with  FREE  PDF!)    hop://www.cs.cornell.edu/home/kleinber/networks-­‐book/  

§  Other  (excellent)  related  books:  –  See  the  course  webpage  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   24  

Pre-­‐reqs  (A) Should  enjoy  the  course    J  (B) Ability  to  read  research  papers  (C) Background  in    

1.  Algorithms    2.  Probability  and  Stats  3.  Linear  Algebra  (helps)  4.  Graph  theory  (helps)  

(D) Graduate-­‐level  Programming  Skills  (i.e.  ability  to  use  unfamiliar  sosware,  picking  up  new  languages,  comfortable  with  at  least  one  of  Python/C/C++/Ruby/Java  etc.  (Matlab/R  a  plus))  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   25  

Rough  outline  

§  Part  (A)  Networks:  Structure  and  Evolu/on  §  Part  (B)  Processes  and  Dynamics:  Focus  on  Propaga/on  

§  Part  (C)  Graph  analy/cs:  Hadoop,  Anomaly  detec/on  

§  Part  (D)  Time  Series  Overview:  Mining  and  Forecas/ng  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   26  

Course  Grading:  TentaDve      

   

§  Class  parDcipaDon  necessary!  –  To  get  at  least  5%    

•  Non-­‐trivial  response  to  ‘reac/on’  posts  [see  next  slide]  at  least  6  /mes  

•  Remember  Piazza  gives  post  sta/s/cs  J  

Homework     10%   One  

Presenta/on/Grading   15%   Presen/ng  a  paper  in  class/grading  

Course  project     60%   BIGGEST  component  

Class  Par/cipa/on   10%   !=  aoendance  J  

‘Scribe’   5%    3  paragraph  reac/on  post  on  Piazza  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   27  

Scribe:  ReacDon  Post  §  A  3  paragraph  post  on  piazza  aser  each  class  –  Due  10pm  the  day  of  the  lecture  –  Summary  (4  lines)    

•  What  was  the  main  technical  content?  •  How  did  it  relate  to  other  topics?  •  What  were  the  main  doubts  raised  by  students?  

–  Expanding  (3  lines)  •  Describe  least  one  non-­‐trivial  related  aspect  not  covered  in  the  lecture    

–  Brainstorming  (4  lines)  •  Any  open  interes/ng  ques/ons  (e.g.  which  readings  missed)?Extensions?    

•  What  possible  applica/ons  could  benefit  from  the  proposed  material?    

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   28  

Scribe:  ReacDon  Post  

§  1  scribe  per  lecture  – Everyone  would  scribe  one  lecture  

§ Will  start  aser  a  few  lectures  §  Readings  will  be  posted  on  Friday  for  the  following  week.    

§  The  schedule  will  be  posted  on  Piazza  soon  – Essen/ally  will  follow  class-­‐order  alphabe/cally  as  on  Hokiespa  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   29  

Project  (Groups  of  2-­‐3)  §  Has  to  be  substanDal    §  Can  be  

–  Experimental:  evalua/on  of  different  algorithms  and  models  on  an  interes/ng  dataset(s)  

–  TheoreDcal:  considers  a  model  (can  be  novel),  or  an  algorithm,  or  a  metric  and  derives  a  rigorous  result  about  it  (e.g.  /ghter  bounds,  surprising  proper/es  etc)    

–  Extension:  an  extension  or  improvement  of  a  method  or  model  covered  in  class  to  a  different  or  more  general  seyng  (eg.  /me-­‐varying,  distributed,  any/me,  scalable,  etc.)  and  experiments  that  jus/fy  the  new  proposal  

§  Can  NOT  be  just  a  survey  §  (no  double  dipping:  should  not  get  double  credit  for  same  

work)  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   30  

Project  

§  Deliverables                        Weight    –  Project  Proposal                  15%    –  Project  Milestone  Report            10%  –  Final  Report                  20%    –  Final  Presenta/on/Poster  in  class  (TBD)  15%  

§  Proposal  should  contain  a  detailed  survey  of  3-­‐4  pages  –  6-­‐8  papers  in  a  topic  of  your  interest  

•  Discussed  in  or  related  to  the  course  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   31  

Project  

§  Deliverables                      Rough  due  Dates  – Project  Proposal                      ~Oct.  6    – Project  Milestone  Report                ~Nov.  5  – Final  Report                                                    ~Dec.  2      – Final  Presenta/on/Poster  in  class                            Dec  3/8  

§  Start  EARLY!  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   32  

One  Homework  

§  To  be  released  on  ~Sept  15  §  Due  on  ~Sept.  29  (beginning  of  class)  §  Theore/cal  +  Hands-­‐on  

§  Start  EARLY!    

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   33  

Other…  

§  Class  moves  at  a  fast  pace!  I’ll  assume  this  is  the  only  (or  one  of  two)  graduate  level  classes  this  semester  for  you  

§  Graduate  class-­‐-­‐-­‐so  no  spoonfeeding!    

§  Announcement:  NO  class  on  Thursday  Aug  27  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   34  

WARM-­‐UP  AND  BASICS  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   35  

A  QuesDon  

§  How  many  of  you  think  your  friends  have  more  friends  than  you?  J  

§  A  recent  Facebook  study  –  Examined  all  of  FB’s  users:  721  million  people  with  69  billion  friendships.    •  about  10  percent  of  the  world’s  popula/on!  

–  Found  that  user’s  friend  count  was  less  than  the  average  friend  count  of  his  or  her  friends,  93  percent  of  the  /me.    

–  Users  had  an  average  of  190  friends,  while  their  friends  averaged  635  friends  of  their  own.  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   36  

Possible  Reasons?  

§  You  are  a  loner?  §  Your  friends  are  extroverts?    §  There  are  more  extroverts  than  introverts  in  the  world?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   37  

Example  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   38  

Source:  S.  Strogatz,  NYT  2012  

Average  number  of  friends?  

Example  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   39  

Source:  S.  Strogatz,  NYT  2012  

Average  number  of  friends  =  (  1  +  3  +  2  +  2  )  /  4  =  2  

Example  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   40  

Source:  S.  Strogatz,  NYT  2012  

Average  number  of  friends  =  (  1  +  3  +  2  +  2  )  /  4  =  2  

Average  number  of  friends  of  friends  

Example  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   41  

Source:  S.  Strogatz,  NYT  2012  

Average  number  of  friends  =  (  1  +  3  +  2  +  2  )  /  4  =  2  

Average  number  of  friends  of  friends  =  (3  +  1  +  2  +  2  +  3  +  2  +  3  +  2)/8  =  ((1x1)  +  (3x3)  +  (2x2)  +  (2x2))/8  

Example  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   42  

Source:  S.  Strogatz,  NYT  2012  

Average  number  of  friends  =  (  1  +  3  +  2  +  2  )  /  4  =  2  

Average  number  of  friends  of  friends  =  (3  +  1  +  2  +  2  +  3  +  2  +  3  +  2)/8  =  ((1x1)  +  (3x3)  +  (2x2)  +  (2x2))/8  =  2.25!  

Actually  it  is  (almost)  always  true!    

§  Proof?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   43  

Actually  it  is  (almost)  always  true!    

§  Proof?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   44  

E[X]= xi / N∑

Actually  it  is  (almost)  always  true!    

§  Proof?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   45  

E[X]= xi / N∑Var[X]= E[(X −E[X])2 ]

= E[X 2 ]−E[X]2

Actually  it  is  (almost)  always  true!    

§  Proof?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   46  

E[X]= xi / N∑Var[X]= E[(X −E[X])2 ]

= E[X 2 ]−E[X]2

E[X 2 ]E[X]

= E[X]+Var[X]E[X]

Actually  it  is  (almost)  always  true!    

§  Proof?  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   47  

E[X]= xi / N∑Var[X]= E[(X −E[X])2 ]

= E[X 2 ]−E[X]2

E[X 2 ]E[X]

= E[X]+Var[X]E[X]

EssenDally,  it  is  true  if  there  is  any  spread  in  #  of  friends  (non-­‐zero  variance)!  

ImplicaDons  

§  Immuniza/on    – We  will  see  later,  acquaintance  immuniza/on  

•  Immunize  friend-­‐of-­‐friend  

§  Early  warning  of  outbreaks  – Again,  monitor  friends  of  friends  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   48  

we expect a set of nominated friends to get infected earlier than aset of randomly chosen individuals (who represent the populationas a whole). More specifically, a random sample of individualsfrom a social network will have a mean degree of m (the meandegree for the population); but the friends of these randomindividuals will have a mean degree of m plus a quantity defined bythe variance of the degree distribution divided by m. Hence, whenthere is variance in degree in a population, and especially whenthere is high variance, the mean number of contacts for the friendswill be greater (and potentially much greater) than the mean forthe random sample. This is sometimes known as the ‘‘friendshipparadox’’ (‘‘your friends have more friends than you do’’) [15–19].While the idea of immunizing such friends of randomly chosen

people has previously been explored in a stimulating theoreticalpaper [12], to our knowledge, a method that uses nominatedfriends as sensors for early detection of an outbreak has notpreviously been proposed, nor has it been tested on any sort of realoutbreak. To evaluate the effectiveness of nominated friends associal network sensors, we therefore monitored the spread of flu atHarvard College from September 1 to December 31, 2009. In thefall of 2009, both seasonal flu (which typically kills 41,000Americans each year [20]) and the H1N1 strain were prevalent inthe US, though the great majority of cases in 2009 have beenattributed to the latter.[1] It is estimated that this H1N1 epidemic,which began roughly in April 2009, infected over 50 millionAmericans. Unlike seasonal flu, which typically affects individualsolder than 65, H1N1 tends to affect young people. Nationally,according to the CDC, the epidemic peaked in late October 2009,and vaccination only became widely available in December 2009.Whether another outbreak of H1N1 will occur (for example, inareas and populations that have heretofore been spared) is a

Figure 1. Network Illustrating Structural Parameters. This realnetwork of 105 students shows variation in structural attributes andtopological position. Each circle represents a person and each linerepresents a friendship tie. Nodes A and B have different ‘‘degree,’’ ameasure that indicates the number of ties. Nodes with higher degreealso tend to exhibit higher ‘‘centrality’’ (node A with six friends is morecentral than B and C who both only have four friends). If contagionsinfect people at random at the beginning of an epidemic, centralindividuals are likely to be infected sooner because they lie a shorternumber of steps (on average) from all other individuals in the network.Finally, although nodes B and C have the same degree, they differ in‘‘transitivity’’ (the probability that any two of one’s friends are friendswith each other). Node B exhibits high transitivity with many friendsthat know one another. In contrast, node C’s friends are not connectedto one another and therefore they offer more independent possibilitiesfor becoming infected earlier in the epidemic.doi:10.1371/journal.pone.0012948.g001

Figure 2. Theoretical expectations of differences in contagion between central individuals and the population as a whole. Acontagious process passes through two phases, one in which the number of infected individuals exponentially increases as the contagion spreads,and one in which incidence exponentially decreases as susceptible individuals become increasingly scarce. These dynamics can be modeled by alogistic function. Central individuals lie on more paths in a network compared to the average person in a population and are therefore more likely tobe infected early by a contagion that randomly infects some individuals and then spreads from person to person within the network. This shifts the S-shaped logistic cumulative incidence function forward in time for central individuals compared to the population as a whole (left panel). It also shiftsthe peak infection rate forward (right panel).doi:10.1371/journal.pone.0012948.g002

Social Network Sensors

PLoS ONE | www.plosone.org 2 September 2010 | Volume 5 | Issue 9 | e12948

Network  structure  is  important!!  

§  A  network  is  a  collec/ons  of  nodes  with  rela/ons  between  some  nodes  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   49  

Object:  nodes,  ver/ces  N  RelaDons:  links,  edges  E  System:  graphs,  networks  G(N,  E)  

Networks  and  Graphs  

§  Networks:  typically  a  real  system  – Metabolic  Network,  Social  Network,  etc.  

§  Graphs:  typically  the  mathema/cal  representa/on  – Web  graph,  Planar  graphs  etc.  

                     But  we  use  it  interchangeably    

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   50  

Networks:  Which  representaDon?  

§  Connect  people  who  work  together:  professional  network  

§  Connect  authors  and  papers:  co-­‐authorship  network  

§  Connect  all  people  whose  name  is  John  Smith?      

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   51  

Networks:  Which  representaDon?  

§  Choice  is  important  –  In  some  cases  there  is  a  unique  unambiguous  representa/on  

–  In  most  others,  YOU  have  to  choose  •  Depends  on  what  you  want  to  ask  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   52  

Undirected  vs  Directed  Graphs  

§  Undirected  – Links  are  symmetrical  

–   Examples  •  Friendships  (on  FB!)    •  Collaborators  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   53  

Undirected � Links: undirected

(symmetrical)

� Examples:

� Collaborations

� Friendship on Facebook

Directed � Links: directed

(arcs)

� Examples:

� Phone calls

� Following on Twitter

9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 52

A

B

D

C

L

M F

G

H

I

A G

F

B C

D

E

Undirected  vs  Directed  Graphs  

§  Directed  – Links  are  directed  

–   Examples  •  Following  on  Twioer  •  Phone  calls  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   54  

Undirected � Links: undirected

(symmetrical)

� Examples:

� Collaborations

� Friendship on Facebook

Directed � Links: directed

(arcs)

� Examples:

� Phone calls

� Following on Twitter

9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 52

A

B

D

C

L

M F

G

H

I

A G

F

B C

D

E

Graph  connecDvity  

§  Connected  (undirected)  graphs  – There  is  a  path  between  any  two  ver/ces  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   55  

Bridge edge: If we erase it, the graph becomes disconnected. Articulation point: If we erase it, the graph becomes disconnected.

Largest Component: Giant Component Isolated node (node F)

� Connected (undirected) graph: � Any two vertices can be joined by a path.

� A disconnected graph is made up by two or more connected components

D C

A

B

H

F

G

D C

A

B

H

F

G

9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 53

Mul/ple  components  

Graph  connecDvity  

§  Connected  (undirected)  graphs  – There  is  a  path  between  any  two  ver/ces  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   56  

Bridge edge: If we erase it, the graph becomes disconnected. Articulation point: If we erase it, the graph becomes disconnected.

Largest Component: Giant Component Isolated node (node F)

� Connected (undirected) graph: � Any two vertices can be joined by a path.

� A disconnected graph is made up by two or more connected components

D C

A

B

H

F

G

D C

A

B

H

F

G

9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 53

Largest  component:  GIANT  component  

Graph  connecDvity  

§  Extension  to  directed  graphs  – Strongly  connected:  has  a  path  from  every  node  to  every  other  node  

– Weakly  connected:  is  connected  if  edge  direc/ons  are  disregarded  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   57  

E

C

A

B

G

F

D

� Strongly connected directed graph � has a path from each node to every other node

and vice versa (e.g., A-B path and B-A path) � Weakly connected directed graph

� is connected if we disregard the edge directions

Graph on the left is connected but not strongly connected (e.g., there is no way to get from F to G by following the edge directions).

9/25/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 54

Weakly  connected,  but  not  strongly  connected    Why?    

Next  Class  

§  The  Web  as  a  graph  § More  network  proper/es  §  The  G(n,p)  model  

Prakash  2015   CS  6604:  DM  Large  Networks  &  Time-­‐Series   58