April 12, 2023 Data Mining: Concepts and Techniques 1
Social Network Analysis
Social Network Introduction
Statistics and Probability Theory
Models of Social Network Generation
Networks in Biological System
Society
Nodes: individuals
Links: social relationships (family, work, friendship, etc.)
Social networks: many individuals with diverse social interactions between them.
S. Milgram (1967); John Guare, "Six Degrees of Separation"
Communication networks
The Earth is developing an electronic nervous system, a network with diverse nodes and links.
Nodes are:
- computers
- routers
- satellites
Links are:
- phone lines
- TV cables
- EM waves
Communication networks: many non-identical components with diverse connections between them.
Complex systems
Made of many non-identical elements connected by diverse interactions.
NETWORK
“Natural” Networks and Universality
Consider many kinds of networks: social, technological, business, economic, content,…
These networks tend to share certain informal properties:
- large scale; continual growth
- distributed, organic growth: vertices “decide” who to link to
- interaction restricted to links
- mixture of local and long-distance connections
- abstract notions of distance: geographical, content, social,…
Do natural networks share more quantitative universals?
- What would these “universals” be?
- How can we make them precise and measure them?
- How can we explain their universality?
This is the domain of social network theory, sometimes also referred to as link analysis.
Some Interesting Quantities
Connected components:
- how many, and how large?
Network diameter:
- maximum (worst-case) or average?
- exclude infinite distances? (disconnected components)
- the small-world phenomenon
Clustering:
- to what extent do links tend to cluster “locally”?
- what is the balance between local and long-distance connections?
- what roles do the two types of links play?
Degree distribution:
- what is the typical degree in the network?
- what is the overall distribution?
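As a rough illustration of how some of these quantities can be measured, the sketch below computes connected components, the maximum and average finite distance, and the degree distribution with plain breadth-first search. The adjacency-dictionary representation, the function names, and the toy graph are assumptions made for this example; infinite distances between disconnected components are simply excluded.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, components = set(), []
    for u in adj:
        if u not in seen:
            component = set(bfs_distances(adj, u))
            seen |= component
            components.append(component)
    return components


def path_length_stats(adj):
    """(maximum, average) distance over all connected pairs of distinct vertices."""
    finite = []
    for u in adj:
        dist = bfs_distances(adj, u)
        finite.extend(d for v, d in dist.items() if v != u)
    if not finite:
        return 0, 0.0
    return max(finite), sum(finite) / len(finite)


def degree_distribution(adj):
    """Counter mapping degree -> number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy graph: a path a-b-c-d plus a separate edge e-f.
toy = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}

components = connected_components(toy)
diameter, average_distance = path_length_stats(toy)
degrees = degree_distribution(toy)
print(len(components), diameter, average_distance, dict(degrees))
```

Running one breadth-first search per vertex is fine for small graphs; large networks usually call for sampling source vertices instead.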
A “Canonical” Natural Network has…
Few connected components:
- often only 1 or a small number, independent of network size
Small diameter:
- often a constant independent of network size (like 6)
- or perhaps growing only logarithmically with network size, or even shrinking?
- typically exclude infinite distances
A high degree of clustering:
- considerably more so than for a random network
- in tension with small diameter
A heavy-tailed degree distribution:
- a small but reliable number of high-degree vertices
- often of power law form
Probabilistic Models of Networks
All of the network generation models we will study are probabilistic or statistical in nature.
They can generate networks of any size.
They often have various parameters that can be set:
- size of network generated
- average degree of a vertex
- fraction of long-distance connections
The models generate a distribution over networks.
Statements are always statistical in nature:
- with high probability, diameter is small
- on average, degree distribution has heavy tail
Thus, we’re going to need some basic statistics and probability theory.
Zipf’s Law
Look at the frequency of English words:
- “the” is the most common, followed by “of”, “to”, etc.
- claim: frequency of the n-th most common word ~ 1/n (power law, α = 1)
General theme:
- rank events by their frequency of occurrence
- resulting distribution often is a power law!
Other examples:
- North American city sizes
- personal income
- file sizes
- genus sizes (number of species)
People seem to dither over the exact form of these distributions (e.g. value of α), but not over their heavy tails.
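A power law with exponent α is a straight line of slope -α when rank and frequency are both plotted on logarithmic scales. The sketch below checks this on synthetic, ideal Zipf data with 300 ranks; the helper name `loglog_slope` and the constant 1000.0 are assumptions for illustration, and a least-squares fit on log-log axes is only a rough estimator when applied to real, noisy counts.

```python
import math


def loglog_slope(ranks, freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: an ideal Zipf distribution, frequency proportional to 1/n.
ranks = list(range(1, 301))
freqs = [1000.0 / n for n in ranks]

# The estimated exponent is the negated slope of the log-log line.
alpha = -loglog_slope(ranks, freqs)
print(f"estimated alpha = {alpha:.3f}")
```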
Zipf’s Law
[Figure: the same data plotted with linear scales on both axes and with logarithmic scales on both axes. Both plots show a Zipf distribution with 300 data points.]
Social Network Analysis
Social Network Introduction
Statistics and Probability Theory
Models of Social Network Generation
Networks in Biological System
Summary
Some Models of Network Generation
Random graphs (Erdős-Rényi models):
- give few components and small diameter
- do not give high clustering or heavy-tailed degree distributions
- are the mathematically most well-studied and understood model
Watts-Strogatz models:
- give few components, small diameter and high clustering
- do not give heavy-tailed degree distributions
Scale-free networks:
- give few components, small diameter and heavy-tailed degree distribution
- do not give high clustering
Hierarchical networks:
- few components, small diameter, high clustering, heavy-tailed degree distribution
Affiliation networks:
- model group-actor formation
The Clustering Coefficient of a Network
Let nbr(u) denote the set of neighbors of u in a graph:
- all vertices v such that the edge (u,v) is in the graph
The clustering coefficient of u:
- let k = |nbr(u)| (i.e., number of neighbors of u)
- choose(k,2): max possible # of edges between vertices in nbr(u)
- c(u) = (actual # of edges between vertices in nbr(u)) / choose(k,2)
- 0 <= c(u) <= 1; measure of cliquishness of u’s neighborhood
Clustering coefficient of a graph:
- average of c(u) over all vertices u
Example: k = 4, choose(k,2) = 6, c(u) = 4/6 = 0.666…
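The definition can be sketched directly in code. The example graph below is a hypothetical one chosen to match the numbers on the slide (a vertex u with k = 4 neighbors and 4 edges among them); the adjacency-dictionary representation and the convention c(u) = 0 for vertices with fewer than two neighbors are assumptions, since choose(k,2) is zero in that case.

```python
from itertools import combinations


def clustering_coefficient(adj, u):
    """c(u): fraction of possible edges among u's neighbors that are present."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: c(u) is undefined here, so report 0
    possible = k * (k - 1) // 2  # choose(k, 2)
    actual = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return actual / possible


def average_clustering(adj):
    """Clustering coefficient of the graph: average of c(u) over all vertices."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)


# Hypothetical graph: u is joined to a, b, c, d, which form a 4-cycle a-b-c-d-a.
graph = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "d"},
    "b": {"u", "a", "c"},
    "c": {"u", "b", "d"},
    "d": {"u", "c", "a"},
}

c_u = clustering_coefficient(graph, "u")  # 4 of the 6 possible edges exist
c_graph = average_clustering(graph)
print(c_u, c_graph)
```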
The Clustering Coefficient of a Network
Clustering: My friends will likely know each other!
Probability to be connected: C ≫ p
C = (# of links between the 1, 2, …, n neighbors) / (n(n-1)/2)
Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)].

Network        C          Crand     L          N
WWW            0.1078     0.00023   3.1        153127
Internet       0.18-0.3   0.001     3.7-3.76   3015-6209
Actor          0.79       0.00027   3.65       225226
Coauthorship   0.43       0.00018   5.9        52909
Metabolic      0.32       0.026     2.9        282
Foodweb        0.22       0.06      2.43       134
C. elegans     0.28       0.05      2.65       282
Erdős-Rényi: Clustering Coefficient
Generate a network G according to G(N,p).
Examine a “typical” vertex u in G:
- choose u at random among all vertices in G
- what do we expect c(u) to be?
Answer: exactly p!
In G(N,m), expect c(u) to be 2m/(N(N-1)).
Both cases: c(u) entirely determined by overall density.
Baseline for comparison with “more clustered” models:
- Erdős-Rényi has no bias towards clustered or local edges
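A small simulation can make this baseline concrete. The sketch below samples one graph from G(N,p) and compares the average c(u) with p and with the realized edge density 2m/(N(N-1)). The parameter values (N = 200, p = 0.1), the fixed seed, and the function names are assumptions for illustration; vertices with fewer than two neighbors are counted as c(u) = 0.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, rng):
    """Sample G(n, p): each of the n(n-1)/2 possible edges appears independently with probability p."""
    adj = {u: set() for u in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj, u):
    """c(u) as defined earlier; 0 when u has fewer than two neighbors (assumption)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    actual = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return actual / (k * (k - 1) // 2)


rng = random.Random(0)  # fixed seed so the run is repeatable
n, p = 200, 0.1
g = gnp_random_graph(n, p, rng)

avg_c = sum(clustering_coefficient(g, u) for u in g) / n
m = sum(len(nbrs) for nbrs in g.values()) // 2  # number of edges
density = 2 * m / (n * (n - 1))
print(f"p = {p}, density = {density:.4f}, average c(u) = {avg_c:.4f}")
```

The average stays close to p because, given u's neighbors, each pair of them is still connected independently with probability p.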
Scale-free Networks
The number of nodes (N) is not fixed:
- networks continuously expand by the addition of new nodes
- WWW: addition of new nodes
- Citation: publication of new papers
The attachment is not uniform:
- a node is linked with higher probability to a node that already has a large number of links
- WWW: new documents link to well-known sites (CNN, Yahoo, Google)
- Citation: well-cited papers are more likely to be cited again
Scale-Free Networks
Start with (say) two vertices connected by an edge.
For i = 3 to N:
- for each 1 <= j < i, d(j) = degree of vertex j so far
- let Z = Σ d(j) (sum of all degrees so far)
- add new vertex i with k edges back to {1, …, i-1}: i is connected back to j with probability d(j)/Z
Vertices j with high degree are likely to get more links! “Rich get richer”
Natural model for many processes:
- hyperlinks on the web
- new business and social contacts
- transportation networks
Generates a power law distribution of degrees:
- exponent depends on value of k
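The growth procedure above can be sketched as follows. The slide does not say how repeated targets are handled, so this version assumes the k targets are drawn independently with probability d(j)/Z and that duplicate draws are collapsed into a single edge; the function name, N = 2000, k = 2, and the seed are likewise choices made for this example.

```python
import random
from collections import Counter


def preferential_attachment(n, k, rng):
    """Grow a graph on vertices 1..n by degree-proportional attachment."""
    edges = [(1, 2)]  # start with two vertices connected by an edge
    # Vertex j appears d(j) times in this list, so a uniform draw from it
    # picks j with probability d(j)/Z.
    endpoints = [1, 2]
    for i in range(3, n + 1):
        # Assumption: k independent draws, duplicates collapsed.
        targets = {rng.choice(endpoints) for _ in range(k)}
        for j in targets:
            edges.append((i, j))
            endpoints.extend((i, j))
    return edges


rng = random.Random(0)  # fixed seed so the run is repeatable
edges = preferential_attachment(2000, 2, rng)

degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree -> number of vertices: many low-degree vertices, a few hubs.
histogram = Counter(degree.values())
print("max degree:", max(degree.values()))
print("vertices per degree (smallest degrees):", sorted(histogram.items())[:5])
```

Keeping one list entry per edge endpoint avoids recomputing Z and the probabilities d(j)/Z at every step.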
Scale-Free Networks
Preferential attachment explains:
- heavy-tailed degree distributions
- small diameter (~log(N), via “hubs”)
Will not generate high clustering coefficient:
- no bias towards local connectivity, but towards hubs
Social Network Analysis
Social Network Introduction
Statistics and Probability Theory
Models of Social Network Generation
Networks in Biological System
Mining on Social Network
Summary
Bio-Map
[Figure: three layers of the bio-map. GENOME: protein-gene interactions. PROTEOME: protein-protein interactions. METABOLISM: bio-chemical reactions (Citrate Cycle).]
Metabolic Network
[Figure: METABOLISM, bio-chemical reactions (Citrate Cycle).]
Metabolic Network
Nodes: chemicals (substrates)
Links: bio-chemical reactions
Metabolic Network
The metabolic networks of organisms from all three domains of life are scale-free!
[Figure panels: Archaea, Bacteria, Eukaryotes]
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature 407, 651 (2000)
Protein Network
[Figure: PROTEOME, protein-protein interactions.]
Yeast Protein Network
Nodes: proteins
Links: physical interactions (binding)
P. Uetz et al., Nature 403, 623-7 (2000).
Topology of the Protein Network
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)
p53 Network
Nature 408, 307 (2000)
…
“One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.”
p53 Network (mammals)