TRANSCRIPT
Network Models and Data Analysis
Stephen E. Fienberg
Department of Statistics
Machine Learning Department
Machine Learning 10-701/15-781, Fall 2008
Making Pretty Pictures — Visualizing Networks — Is Easy
October 22, 2008
Example 1: 9/11 Terrorists
Lots of Probabilistic/Statistical Models
• Types of models:
  – Descriptive vs. generative.
  – Static vs. dynamic.
• Origin of social network models in 1930s, integrated with graph representation in 1950s.
• Erdős-Rényi random graph models.
  – Generalized random graph models.
  – Stochastic process reinterpretations.
• Sociometric models such as p1 and ERGMs.
• Machine learning / latent-variable models:
  – Stochastic block models for mixed membership.
Applications Galore
• Small world studies
• Social networks:
  – Sampson’s monks
  – Classroom friendship
• Organization theory:
  – Branch banks
• Homeland security
• Politics:
  – Voting behavior
  – Bill co-sponsorship
• Public health:
  – Needle sharing
  – Spread of AIDS
  – Obesity
• Computer science:
  – Email networks (Enron)
  – Internet / WWW routing systems
• Biology:
  – Protein-protein interactions
  – Zebras
But Doing Careful Statistical Analysis is Difficult
• Claims for network behavior are often based on casual empiricism:
  – Power laws are everywhere, yet nowhere once we look closely at the data.
• Inferential issues are usually buried:
  – Algorithms, simulations, and “experiments” are not substitutes for formal statistical representation and theory.
Power Laws & Internet Graph
Framework for Networks Evolving over Time
• Our representation for a network will be a graph: G_t = {N_t; E_t}.
  – Nodes and edges can be created and can die.
  – Edges can be directed or undirected.
  – Data are available to be observed beginning at time t_0.
• There exists a stochastic process, evolving over time, which, combined with initial conditions, describes the network’s structure and evolution.
  – May involve more than dyadic relationships.
Forms of Network Data
1. Observe the formation (or removal) of each edge, with a time stamp indicating when this occurs.
  • Can see how the entire network or a sub-network changes with each transaction.
2. Observe the status of the network or a sub-network at T epochs.
  • Represents snapshots of the network.
  • Corresponds to information on the incidence of links and information on relationships.
3. Observe cumulative network links over time.
  • “Prevalence” approach.
Example 3: Enron E-mail Database
• Attributes of nodes (including the organization chart!) and full text of all e-mail messages.
• Multiple addressees and cc’s, so observations produce a structure different from dyadic edges.
• Messages contain time stamps, so we are in situation 3.
• Question: Who was party to fraudulent transactions and when?
Enron – Threshold 5 (151 employees)
Enron – Threshold 30 (151 employees)
Example 4: The Framingham “Obesity” Study
• Original Framingham “sample” cohort with offspring cohort of N_0 = 5,124 individuals, measured beginning in 1971 for T = 7 epochs centered at 1971, 1981, 1985, 1989, 1992, 1997, 1999.
• Link information on family members and one “close friend.” Total number of individuals on whom we have obesity measures is N=12,067.
• NEJM, July 2007.
Animation
Erdos-Renyi Random Graph Model
• Two versions:
  – In the G(n, M) model, a graph is chosen uniformly at random from the collection of all graphs with n nodes and M edges.
  – In the G(n, p) model, each edge is included in the graph with probability p, with the presence or absence of distinct edges being independent.
• As p increases from 0 to 1, the model becomes more and more likely to include graphs with more edges.
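Both variants take only a few lines to generate. Below is a minimal sketch in Python; the function names `gnp` and `gnm` and the seeding convention are my own, not from the lecture:

```python
import random
from itertools import combinations

def gnp(n, p, seed=0):
    """G(n, p): include each of the C(n, 2) possible edges
    independently with probability p."""
    rng = random.Random(seed)
    return [e for e in combinations(range(n), 2) if rng.random() < p]

def gnm(n, m, seed=0):
    """G(n, M): choose M edges uniformly at random from the
    set of all C(n, 2) possible edges."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)

edges = gnp(100, 0.05)
# Expected edge count is C(100, 2) * 0.05 = 247.5.
print(len(edges))
```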
Erdos-Renyi Random Graph Model
• G(n, p) has on average C(n, 2)·p edges, and the degree of any given node is distributed Binomial(n-1, p).
  – If np < 1, G(n, p) will almost surely have no connected component of size larger than O(log n).
  – If np = 1, G(n, p) will almost surely have a largest component whose size is of order n^(2/3).
  – If np tends to a constant c > 1, G(n, p) will almost surely have a unique “giant” component containing a positive fraction of the nodes; no other component will contain more than O(log n) nodes.
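The phase transition at np = 1 is easy to check by simulation. The sketch below compares the largest component in the subcritical and supercritical regimes; the union-find implementation and parameter choices are illustrative, not from the lecture:

```python
import random
from itertools import combinations

def largest_component(n, p, seed=0):
    """Size of the largest connected component of one G(n, p) draw,
    tracked with a union-find structure."""
    rng = random.Random(seed)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(i)] = find(j)
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())

n = 2000
sub = largest_component(n, 0.5 / n)  # np = 0.5 < 1: only small components
sup = largest_component(n, 3.0 / n)  # np = 3 > 1: a giant component appears
print(sub / n, sup / n)
```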
Preferential Attachment Model
• Encourages the formation of hubs in the graph.
• Degree distribution follows a power law:
  – The fraction of nodes having k edges to other nodes, for large values of k, behaves as P(k) ~ k^(−γ).
  – Linear on a log-log scale.
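A minimal preferential-attachment growth process can be sketched as follows; the `m`-edges-per-new-node scheme and the stub-list sampling trick are standard Barabási-Albert-style choices, assumed here rather than taken from the slides:

```python
import random

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph node by node; each new node attaches m edges to
    existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Start from a complete core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears in `stubs` once per incident edge, so uniform
    # sampling from `stubs` is degree-proportional sampling.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((t, new))
            stubs += [t, new]
    return edges

edges = preferential_attachment(5000, m=2)
deg = {}
for i, j in edges:
    deg[i] = deg.get(i, 0) + 1
    deg[j] = deg.get(j, 0) + 1
# Heavy tail: the maximum degree far exceeds the mean degree (about 2m).
print(max(deg.values()), sum(deg.values()) / len(deg))
```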
Small World Model
• Designed to produce local clustering and triadic closure by interpolating between an ER graph and a regular ring lattice.
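A Watts-Strogatz-style construction makes the interpolation concrete: start from a ring lattice and rewire each edge with probability beta. This is a sketch under my own parameter names and seeding, not the canonical algorithm from the lecture:

```python
import random

def watts_strogatz(n, k, beta, seed=0):
    """Ring lattice on n nodes, each joined to its k nearest neighbors
    (k even); each edge is then rewired with probability beta to a new
    uniformly chosen endpoint (rewires avoid self-loops and repeats)."""
    rng = random.Random(seed)
    lattice = [(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)]
    edges = set()
    for i, j in lattice:
        if rng.random() < beta:
            j = rng.randrange(n)
            while j == i or (i, j) in edges or (j, i) in edges:
                j = rng.randrange(n)
        edges.add((i, j))
    return edges

ring = watts_strogatz(100, 4, 0.0)   # beta = 0: the pure lattice
g = watts_strogatz(100, 4, 0.1)      # light rewiring adds shortcuts
print(len(ring), len(g))
```

With beta = 0 the graph is the regular lattice (high clustering, long paths); as beta grows toward 1 it approaches an ER-like graph.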
Example 5: Monks in a Monastery
• 18 novices observed over two years.
  – Network data gathered at 4 time points, and on multiple relationships, e.g., friendship.
  – Airoldi et al. (2007, 2008).
Holland-Leinhardt’s p1 Model
• n nodes; occurrence of “directed” links is random.
• Consider dyads D_ij = (X_ij, X_ji) to be independent with
  – Pr(D_ij = (1,1)) = m_ij, i < j
  – Pr(D_ij = (1,0)) = a_ij, i ≠ j
  – Pr(D_ij = (0,0)) = n_ij, i < j
  where m_ij + a_ij + a_ji + n_ij = 1, for all i < j.
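Since dyads are independent, simulating from this parameterization is straightforward. The sketch below draws dyads from made-up probabilities (m_ij = n_ij = 0.4, a_ij = a_ji = 0.1, chosen only for illustration):

```python
import random

def sample_dyad(m_ij, a_ij, a_ji, n_ij, rng):
    """Draw one dyad D_ij = (X_ij, X_ji) from its four-outcome
    distribution; requires m_ij + a_ij + a_ji + n_ij = 1."""
    u = rng.random()
    if u < m_ij:
        return (1, 1)          # mutual tie
    if u < m_ij + a_ij:
        return (1, 0)          # asymmetric i -> j
    if u < m_ij + a_ij + a_ji:
        return (0, 1)          # asymmetric j -> i
    return (0, 0)              # null dyad

# Illustrative (made-up) parameters with strong reciprocity:
rng = random.Random(0)
draws = [sample_dyad(0.4, 0.1, 0.1, 0.4, rng) for _ in range(10000)]
print(draws.count((1, 1)) / len(draws))   # should be near 0.4
```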
p1 Model
• If we let
  – ρ_ij = log{m_ij n_ij / (a_ij a_ji)}, i < j
  – θ_ij = log{a_ij / n_ij}, i ≠ j
then p1 assumes the probability of observing x is:
  p1(x) = Pr(X = x) = K exp[Σ_{i<j} ρ_ij X_ij X_ji + Σ_{i≠j} θ_ij X_ij]
  – K = Π_{i<j} 1/k_ij
  – k_ij({θ_ij}, {ρ_ij}) is the normalizing constant for D_ij.
Three Common Forms of p1
• If we add the restrictions:
  – θ_ij = θ + α_i + β_j, i ≠ j
  – (i) ρ_ij = 0, (ii) ρ_ij = ρ, or (iii) ρ_ij = ρ + ρ_i + ρ_j
• Then for case (ii):
  p1(x) = K exp[ρM + θL + Σ_i α_i X_i+ + Σ_j β_j X_+j]
  where ρ captures reciprocity, θ density, α_i expansiveness, and β_j popularity.
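The four statistics in this expression, M (mutual dyads), L (total edges), the out-degrees X_i+, and the in-degrees X_+j, are simple counts on the directed adjacency matrix. A minimal sketch:

```python
def p1_sufficient_stats(X):
    """Sufficient statistics for the p1 model, case (ii): number of
    mutual dyads M, total edge count L, out-degrees X_i+, and
    in-degrees X_+j, from a 0/1 adjacency matrix X."""
    n = len(X)
    M = sum(X[i][j] * X[j][i] for i in range(n) for j in range(i + 1, n))
    L = sum(X[i][j] for i in range(n) for j in range(n) if i != j)
    out_deg = [sum(X[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(X[i][j] for i in range(n) if i != j) for j in range(n)]
    return M, L, out_deg, in_deg

# Toy 3-node digraph: 1->2, 2->1 (a mutual dyad), and 1->3.
X = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
M, L, out_deg, in_deg = p1_sufficient_stats(X)
print(M, L, out_deg, in_deg)   # 1 3 [2, 1, 0] [1, 1, 1]
```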
Estimation for p1
• Exponential family form.
  – Set the minimal sufficient statistics (MSSs) equal to their expectations.
  – Iterate.
• Holland and Leinhardt explored goodness of fit of p1:
  – Comparing ρ_ij = 0 vs. ρ_ij = ρ.
  – The usual chi-square results don’t apply.
  – How to test ρ_ij = ρ against a more complex model?
p1 As a Log-Linear Model
• p1 is expressible as a log-linear model on an “incomplete” 4-way contingency table:
  – Y_ijkl = 1 if X_ij = k and X_ji = l, and 0 otherwise.
• p1 with ρ_ij = ρ corresponds to the log-linear model on Y with all two-way interactions: [12][13][14][23][24][34].
• p1 with ρ_ij = ρ + ρ_i + ρ_j corresponds to [12][134][234].
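The recoding from the adjacency matrix X into the 4-way array Y can be sketched directly (pure-Python nested lists here, purely for illustration):

```python
def dyad_table(X):
    """Recode a directed adjacency matrix X into the "incomplete" 4-way
    array Y, where Y[i][j][k][l] = 1 iff X_ij = k and X_ji = l (i != j);
    the diagonal cells i == j are left empty."""
    n = len(X)
    Y = [[[[0, 0], [0, 0]] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Y[i][j][X[i][j]][X[j][i]] = 1
    return Y

# Toy 3-node digraph: 1->2, 2->1 (a mutual dyad), and 1->3.
X = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
Y = dyad_table(X)
print(Y[0][1])   # [[0, 0], [0, 1]] — X_01 = 1 and X_10 = 1 (mutual dyad)
```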
p1 Analysis of Monk Data
Sampson’s Monks – 3 Blocks?
K=3 SBMM for Friendship
• Friendship relationship among novices measured at 3 successive times.
• K=3 stochastic blocks + mixed membership:
Example 6: MIPS-Curated PPI in Yeast
• 871 proteins participate in 15 high-level functions
• 2,119 functional annotations (binary)
M = 871 nodes; M² ≈ 750K entries
The Data: Interaction Graphs
• M proteins in a graph (nodes)
• M² observations on pairs of proteins
  – Edges are random quantities, Y[n, m]
  – Interactions are not independent.
  – Interacting proteins form a protein complex.
• T graphs on the same set of proteins
• Partial annotations for each protein, X[n]
Modeling Ideas
• Hierarchical Bayes:
  – Latent variables encode semantic elements
  – Assume structure on observable-latent elements
1. Models of mixed membership
2. Network models (block models)
Combined: stochastic block models of mixed membership.
Graphical Model Representation
[Figure: graphical model combining stochastic blocks with mixed membership. The interactions y_ij are observed; the mixed-membership vectors π_i, π_j and the group-to-group interaction patterns B (e.g., B_23 = 0.9) are latent.]

Pr(y_ij = 1 | π_i, π_j, B) = π_i^T B π_j
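The bilinear form π_i^T B π_j is a one-line computation. A sketch with made-up K = 3 parameters (the strongly diagonal B is only illustrative):

```python
def link_prob(pi_i, pi_j, B):
    """Marginal link probability under the mixed-membership blockmodel:
    Pr(y_ij = 1) = pi_i^T B pi_j, for membership vectors pi_i, pi_j
    and a K x K block-interaction matrix B."""
    K = len(B)
    return sum(pi_i[g] * B[g][h] * pi_j[h] for g in range(K) for h in range(K))

# Illustrative (made-up) example: mostly within-block interaction.
B = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
pi_i = [1.0, 0.0, 0.0]           # pure member of block 1
pi_j = [0.5, 0.5, 0.0]           # mixed between blocks 1 and 2
print(link_prob(pi_i, pi_j, B))  # 0.5*0.9 + 0.5*0.05 = 0.475
```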
Hierarchical Likelihood
Interactions in Yeast (MIPS)
Do PPI contain information about functions?
[Figure: binary functional-annotation profile for protein YLD014W across the 15 high-level categories.]
Results: Functional Annotations
Results: Stochastic Block Model
Some Results
• K = 50 blocks works well under 5-fold cross-validation and is consistent with the 15 functional categories.
• Our predictions of functional annotations are superior to others in the literature on the same database.
• Lots of technical details.
Example 7: Social Network of Zebras
Dynamical Representation
• What is the stochastic model for group formation and change?
• Groups of females and shifting males who are mating?
Summary
• Lots of networks and their graph representations.
• Erdős-Rényi random graph models, preferential attachment models, small world models.
• p1 and log-linear models.
  – Generalization to exponential random graph models (ERGMs).
• Stochastic block models with mixed membership.
Some References
• Holland, P.W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association, 76:33–65.
• Fienberg, S.E. and Wasserman, S.S. (1981). Categorical data analysis of single sociometric relations. Sociological Methodology, 156–192.
• Fienberg, S.E., Meyer, M.M., and Wasserman, S.S. (1985). Statistical analysis of multiple sociometric relations. Journal of the American Statistical Association, 80:51–67.
• Airoldi, E.M., Blei, D.M., Fienberg, S.E., Goldenberg, A., Xing, E.P., and Zheng, A.X., eds. (2007). Statistical Network Analysis: Models, Issues and New Directions. LNCS 4503, Springer-Verlag, Berlin.
• Newman, M., Barabási, A.-L., and Watts, D.J., eds. (2006). The Structure and Dynamics of Networks. Princeton University Press.
• Airoldi, E.M., Blei, D.M., Fienberg, S.E., and Xing, E.P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014.