Eric Xing
ML/LTI/CSD, School of Computer Science, Carnegie Mellon University
Time-Varying Networks Inference and Structured Input-Output Learning
http://www.sailing.cs.cmu.edu/
Overview: Learning and Reasoning under Uncertainty
Learning in structured input/output space
– Semi-supervised and unsupervised maximum margin learning
– Theory and algorithm for optimization, inference and active learning
– Applications in genomics, machine translation, and multi-media analysis
Nonparametric Bayesian models for "open worlds"
– Domain-closure, unique name and stationarity assumptions are not always valid:
  how many clusters/states/objects/relations are out there? Ambiguous data association. Birth/death/evolution of possible worlds.
– Infinite-capacity models based on Dirichlet process (Polya urn schemes)
– Applications in genetics and evolution, tracking and email filtering
Statistical modeling and inference of relational data
– Modeling the formation, evolution, and dynamics of networks
– Inferring their semantic aspects, missing links, and node attributes
– Biological and social network analysis
(Xing, et al. ICML 04,06, Ahmed SDM08)
(zen, et al. KDD 07, Zhu, et al, ICML08, Kim, UAI08)
(Guo, et al. ICML 07)
Overview: Computational Biology and Statistical Genetics
Genomics and regulatory evolution
– Statistical models for genome evolution and natural selection
– Functional effects on gene regulation and morphogenesis
– Gene finding and functional prediction via comparative genomic analysis
Computational Developmental Biology of Flies
– Image analysis and database
  Feature processing, segmentation, and pattern representation
  Recovering 3D structure from 2D images
  Shape and deformation modeling and categorization
– Spatial-temporal modeling of gene regulation
  Temporal shape evolution and models for morphogenesis
  The genetics of pattern polymorphism and divergence
Genetic variation and disease association
– Genealogy/evolution models: how many founders, migration and evolution history...
– Models for linkages between variations and phenotypes
– Clinical and forensic applications
9/8/2006, CALD IC
Novel Statistical Models and Algorithms for Network Modeling, Mining, and Reverse Engineering
NSF IIS-0713379
PI: Eric Xing
Inferring Time-Varying Networks
Disease Spread
Social Network
Food Web
Citation Network
Internet
Network and Relational Data
Changing Social Networks in Washington
Cooperativity, antagonism, cliques, … over time?
[Figure: network snapshots from T0 to TN]
"Rewiring" Pathways in Biology
Problem: Network reconstruction
Reverse engineer "rewiring" networks
Temporal Exponential Random Graph Models
Time-evolving network model under a Markov assumption:
$$P(A^1, A^2, \ldots, A^{t-1}, A^t) = P(A^t \mid A^{t-1}) \cdots P(A^2 \mid A^1)\, P(A^1)$$
with transition distribution
$$P(A^t \mid A^{t-1}) = \exp\left\{ \theta \cdot \Psi(A^t, A^{t-1}) - \ln Z(\theta, A^{t-1}) \right\}$$
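"Dynamic" Potentials
$$P(A^t \mid A^{t-1}) = \exp\left\{ \theta \cdot \Psi(A^t, A^{t-1}) - \ln Z(\theta, A^{t-1}) \right\}$$
“Continuity”:
$$\Psi_1(A^t, A^{t-1}) = \sum_{ij} \left[ A_{ij}^{t} A_{ij}^{t-1} + (1 - A_{ij}^{t})(1 - A_{ij}^{t-1}) \right]$$
“Reciprocity”:
$$\Psi_2(A^t, A^{t-1}) = \sum_{ij} A_{ij}^{t} A_{ji}^{t-1}$$
“Transitivity”:
$$\Psi_3(A^t, A^{t-1}) = \frac{\sum_{ijk} A_{ij}^{t} A_{ik}^{t-1} A_{kj}^{t-1}}{\sum_{ijk} A_{ik}^{t-1} A_{kj}^{t-1}}$$
“Density”:
$$\Psi_4(A^t, A^{t-1}) = \sum_{ij} A_{ij}^{t}$$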
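The four potentials can be computed directly from two consecutive adjacency matrices. The following is a minimal sketch, not the authors' code: the function name `potentials`, the exclusion of self-loops from the continuity count, and the toy matrices are all assumptions made for illustration.

```python
import numpy as np

def potentials(A_t, A_prev):
    """Return [continuity, reciprocity, transitivity, density] for one transition."""
    A_t = np.asarray(A_t, dtype=float)
    A_prev = np.asarray(A_prev, dtype=float)
    # Assumption: self-loops are not part of the model, so the continuity
    # count skips the diagonal.
    off_diag = ~np.eye(A_t.shape[0], dtype=bool)

    # Continuity: dyads whose state (edge / no edge) did not change.
    unchanged = A_t * A_prev + (1 - A_t) * (1 - A_prev)
    continuity = unchanged[off_diag].sum()

    # Reciprocity: edge i->j at time t paired with edge j->i at time t-1.
    reciprocity = (A_t * A_prev.T).sum()

    # Transitivity: share of two-paths i->k->j at t-1 closed by i->j at t.
    two_paths = A_prev @ A_prev
    total = two_paths.sum()
    transitivity = (A_t * two_paths).sum() / total if total > 0 else 0.0

    # Density: number of edges at time t.
    density = A_t.sum()
    return np.array([continuity, reciprocity, transitivity, density])

# Toy example (hypothetical data): 0->1->2 at t-1; edges 0->2 and 1->0 appear at t.
A_prev = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
A_t = np.array([[0, 1, 1], [1, 0, 1], [0, 0, 0]])
psi = potentials(A_t, A_prev)
```

On this toy pair, four of the six dyads are unchanged, one edge is reciprocated, the single two-path is closed, and the new graph has four edges.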
Degeneracy (Handcock et al.)
Some estimators can yield an ERGM that places most of its probability mass on a subset of the sample space containing networks that bear no resemblance to the observed networks.
For such models an MLE does not exist, resulting in poor fit, e.g., when the observed statistics do not lie inside the convex hull of the set of all realizable u(A).
A tERGM is non-degenerate
Theorem: when the transition distribution factors over the edges, a tERGM is non-degenerate.
Straightforward: the transition model is tractable, and the partition function is the product of per-edge terms.
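The remark about per-edge terms can be checked numerically on a tiny graph. The sketch below is an illustration only, not a proof of the theorem: it assumes a hypothetical transition model that uses just the continuity and density statistics (both of which are sums over individual edges), and compares the factored log-partition function against brute-force enumeration of all graphs on three nodes. The function names and parameter values are made up.

```python
import itertools
import numpy as np

def log_z_factored(theta_c, theta_d, A_prev):
    """log Z as a sum of per-edge log-terms (product of per-edge terms)."""
    off_diag = ~np.eye(A_prev.shape[0], dtype=bool)
    p = A_prev[off_diag].astype(float)  # previous state of each dyad
    # For each dyad, sum over a in {0, 1} of exp(theta_c * [a == p] + theta_d * a).
    per_edge = np.logaddexp(theta_c * (1 - p), theta_c * p + theta_d)
    return float(per_edge.sum())

def log_z_bruteforce(theta_c, theta_d, A_prev):
    """log Z by enumerating every possible next graph (only feasible for tiny n)."""
    off_diag = ~np.eye(A_prev.shape[0], dtype=bool)
    p = A_prev[off_diag].astype(float)
    scores = []
    for bits in itertools.product([0, 1], repeat=p.size):
        a = np.array(bits, dtype=float)
        continuity = np.sum(a * p + (1 - a) * (1 - p))
        scores.append(theta_c * continuity + theta_d * a.sum())
    return float(np.logaddexp.reduce(np.array(scores)))

A_prev = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
factored = log_z_factored(0.7, -0.3, A_prev)
brute = log_z_bruteforce(0.7, -0.3, A_prev)
```

The factored form costs one term per dyad, while the enumeration grows as 2 to the number of dyads, which is why the factorization matters for tractability.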
Inference (1)
P(Network | Data)?
Computation is non-trivial.
Given the graphical structure, run variable elimination algorithms; this works well only for small graphs.
Gibbs sampling:
– Need to evaluate the log-odds
– Difficulty: evaluating the ratio of partition functions, $Z(A') = \sum_{A} \exp(\theta \cdot \Phi(A, A'))$
– So far scales to ~20 genes
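To make the log-odds step concrete, here is a hedged sketch of a Gibbs sweep for the simpler case in which the previous network is fully observed. In that case the log-odds of one edge is the parameter vector times the change in the statistics when the edge is toggled, and the partition function cancels. The harder case named on the slide, where networks are hidden and a ratio of partition functions Z(A') must be evaluated, is not handled here. The choice of statistics (continuity, reciprocity, density), the function names, and the parameter values are assumptions for illustration.

```python
import numpy as np

def stats(A_t, A_prev):
    """Continuity, reciprocity and density statistics of one transition."""
    off_diag = ~np.eye(A_t.shape[0], dtype=bool)
    unchanged = A_t * A_prev + (1 - A_t) * (1 - A_prev)
    return np.array([unchanged[off_diag].sum(),
                     (A_t * A_prev.T).sum(),
                     A_t.sum()], dtype=float)

def edge_log_odds(theta, A_t, A_prev, i, j):
    """log P(a_ij = 1 | rest) - log P(a_ij = 0 | rest); Z cancels in the difference."""
    on, off = A_t.copy(), A_t.copy()
    on[i, j], off[i, j] = 1, 0
    return float(theta @ (stats(on, A_prev) - stats(off, A_prev)))

def gibbs_sweep(theta, A_t, A_prev, rng):
    """Resample every off-diagonal edge once from its conditional distribution."""
    A = A_t.copy()
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p_on = 1.0 / (1.0 + np.exp(-edge_log_odds(theta, A, A_prev, i, j)))
            A[i, j] = int(rng.random() < p_on)
    return A

theta = np.array([1.0, 0.5, -0.2])            # hypothetical parameters
A_prev = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
A_start = np.zeros((3, 3), dtype=int)
A_sample = gibbs_sweep(theta, A_start, A_prev, np.random.default_rng(0))
```

Recomputing all statistics for every toggle is wasteful; a practical sampler would update only the terms that involve the toggled edge.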
Inference (2)
TESLA: Temporally Smoothed L1-regularized logistic regression
– Constrained convex optimization
– Now scales to ~5000 genes; how about 20K+?
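The slide gives only the name of the method, so the following is a sketch of what a temporally smoothed, L1-regularized logistic regression objective could look like, under the assumption that it combines a per-time-step logistic loss, an L1 sparsity penalty on each weight vector, and an L1 penalty on the change between consecutive weight vectors. The function name, array shapes, and the symbols `lam_sparse` and `lam_smooth` are hypothetical.

```python
import numpy as np

def tesla_objective(thetas, X, y, lam_sparse, lam_smooth):
    """Assumed objective: logistic loss + L1 sparsity + L1 temporal smoothness.

    thetas: (T, p) one weight vector per time step
    X:      (T, N, p) features at each time step
    y:      (T, N) labels in {-1, +1}
    """
    margins = y * np.einsum("tnp,tp->tn", X, thetas)
    log_loss = np.logaddexp(0.0, -margins).sum()        # sum of log(1 + exp(-margin))
    sparsity = np.abs(thetas).sum()                     # sum_t ||theta^t||_1
    smoothness = np.abs(np.diff(thetas, axis=0)).sum()  # sum_t ||theta^t - theta^(t-1)||_1
    return float(log_loss + lam_sparse * sparsity + lam_smooth * smoothness)

# Toy check with all-zero features: only the penalties distinguish the two settings.
X = np.zeros((2, 3, 2))
y = np.ones((2, 3))
flat = tesla_objective(np.zeros((2, 2)), X, y, lam_sparse=0.5, lam_smooth=2.0)
jump = tesla_objective(np.array([[1.0, 0.0], [0.0, 0.0]]), X, y,
                       lam_sparse=0.5, lam_smooth=2.0)
```

Every term is convex in `thetas`, which is consistent with the slide's description of the problem as convex optimization; the smoothness term is what couples neighboring time steps.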
[Figures: network snapshots at T=1 through T=23; legend: molecular function, biological process, cellular component]
Transient Interaction
NIPS academic social network
[Figure: network snapshots for 1987, 1988, 1998, 1999]
Open Issues
Open theoretical issues and ongoing and future work
– Approximating Z in hTERGM
– Scalability of network inference algorithm
– Statistical guarantees on the estimates
  – Consistency (pattern, value, …)
  – Confidence
  – Stability
  – Sample complexity
Applications:
– Reconstructing temporally rewiring genetic interactions during the life cycle of Drosophila melanogaster
– Author-paper networks in scientific literature
Other Problems: Dynamic Node Clustering
Other Problems: Network Alignment
Corporate merging: Company A + Company B = ?
Our Goals
Develop new methods for latent theme distillation and data integration for network data.
Mixed Membership of Stochastic Blocks [Airoldi, Blei, Fienberg and Xing, 2005, 2006, 2007]
Develop new formalisms for modeling network evolution over time; and techniques for reverse engineering unobserved temporally rewiring networks from time series of entity attributes.
Temporal Exponential Random Graph Model [Hanneke and Xing, 2006]
Hidden Temporal Exponential Random Graph Model [Guo, Hanneke, Fu and Xing, 2007]
Develop new algorithms for predicting the global topology of very large networks based on randomly sampled subnetworks; and investigate confidence guarantees.
Genome-Phenome Association and Structured Input-Output Learning
Traditional view: causal SNP; a univariate phenotype, i.e., disease/control
Modern view: causal SNP networks; multivariate complex syndrome (e.g., asthma): age at onset, presence/absence of eosinophilic inflammation, history of eczema, genome-wide expression profile, …
[Figure: example sequence ACGTTTTACTGTACAATT under both views]
Genome and Phenome Structures
Example: the asthma phenotype network
[Figure: traditional view vs. modern view of the sequence ACGTTTTACTGTACAATT]
Goal: Inferring Genome-Phenome Association
• Pair-wise association tests?
  – Ignore SNP dependencies
  – Many false positives (FPs)
• Regression?
  – Over-stringent on coupled SNPs
• Structured regularized regression
  – Explicitly captures structures
  – Efficient
  – Sparse (parsimonious)
  – Provable guarantees
General Structured Prediction
Inputs: a set of training samples
Outputs: a predictive function
Application examples
– Part-of-speech (POS) tagging: “Do you want sugar in it?” ⇒ <verb pron verb noun prep pron>
– Image segmentation
Graph-Weighted Fused Lasso
Step 1: Thresholded correlation graph of phenotypes, with weights
Step 2: Graph-weighted fused lasso
Overall effect of the weighted fusion penalty
[Figure: weighted fusion over the sequence ACGTTTTACTGTACAATT]
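The two steps can be sketched as follows. The slide does not give the formula, so the penalty used here is an assumption: a lasso term on all coefficients plus, for each edge of the phenotype graph, the absolute correlation times the L1 distance between the two traits' coefficient vectors, with the sign of the correlation deciding whether coefficients are fused toward equal or opposite values. The names `correlation_graph`, `gw_fused_lasso_penalty`, `rho`, `lam` and `gamma`, and the toy data, are hypothetical.

```python
import numpy as np

def correlation_graph(Y, rho):
    """Step 1: connect phenotypes whose absolute correlation exceeds rho.

    Y: (N, K) matrix of K phenotypes for N individuals.
    Returns a list of (m, l, r_ml) edges carrying the correlation as weight.
    """
    R = np.corrcoef(Y, rowvar=False)
    K = R.shape[0]
    return [(m, l, float(R[m, l]))
            for m in range(K) for l in range(m + 1, K)
            if abs(R[m, l]) > rho]

def gw_fused_lasso_penalty(B, edges, lam, gamma):
    """Step 2 (assumed form): lasso term plus correlation-weighted fusion term.

    B: (J, K) regression coefficients, J SNPs by K phenotypes.
    """
    lasso = np.abs(B).sum()
    fusion = sum(abs(r) * np.abs(B[:, m] - np.sign(r) * B[:, l]).sum()
                 for m, l, r in edges)
    return float(lam * lasso + gamma * fusion)

# Toy phenotypes: trait 1 is exactly twice trait 0; trait 2 is weakly related.
Y = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, -1.0], [3.0, 6.0, 1.0], [4.0, 8.0, -1.0]])
edges = correlation_graph(Y, rho=0.9)

B = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]])
toy_edges = [(0, 1, 0.8), (0, 2, -0.5)]
penalty = gw_fused_lasso_penalty(B, toy_edges, lam=0.5, gamma=2.0)
```

In the toy penalty, the strongly correlated traits 0 and 1 share identical coefficients and so add nothing to the fusion term, while the negatively correlated pair is charged for not having opposite-signed coefficients.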
Asthma Multiple-trait Association
[Figure panels: phenotype correlation structure; single-marker single-trait test; lasso; graph-weighted fused lasso]
Margin-Based Discriminative Learning Paradigms
SVM, M3N, MED
MED-MN = SMED + Bayesian M3N
[Figure: example structured output "b r a c e"]
Maximum Entropy Discrimination Markov Networks
Structured MaxEnt Discrimination (SMED):
Feasible subspace of weight distribution:
Bayesian M3N
Generalization Guarantee
MaxEntNet is an averaging model; we also call it a Bayesian Max-Margin Markov Network
Theorem (PAC-Bayes Bound)
If
Then
Key Challenges
Extremely high dimensionality and low data volume
– d ~ 1M, N ~ 1K
– Sample complexity with bounded error?
Sparsity bias of the model
– Often <100 features out of the 1M are relevant
– Regularization schemes to enforce sparsity
Structures and hidden variables
– Inputs and outputs often bear intricate structures (e.g., chain or graphical dependencies)
– How to capture other latent structures between unobserved variables
Generalizability and scalability
– More efficient convex optimization solvers and Bayesian inference algorithms
Provable theoretical guarantees
– Consistency and sparsistency
– Stability, convergence rate, etc.