TRANSCRIPT
Latent Topic Networks:
A Versatile Probabilistic Programming Framework for Topic Models
James Foulds, Shachi Kumar, Lise Getoor
Jack Baskin School of Engineering, University of California, Santa Cruz
Probabilistic latent variable modeling
• Data: complicated, noisy, high-dimensional
• A latent variable model maps the data to low-dimensional, semantically meaningful representations
• Goal: understand, explore, predict
Topic models
• Topic models are foundational building blocks for powerful latent variable models
– Authorship (Rosen-Zvi et al., 2004)
– Conversational influence (Nguyen et al., 2014)
– Knowledge base construction (Movshovitz-Attias and Cohen, 2015)
– Machine translation (Mimno et al., 2009)
– Political analysis (Grimmer, 2010; Gerrish and Blei, 2011, 2012)
– Recommender systems (Wang and Blei, 2011; Diao et al., 2014)
– Scientific impact (Dietz et al., 2007; Foulds and Smyth, 2013)
– Social network analysis (Chang et al., 2009)
– Word-sense disambiguation (Boyd-Graber et al., 2007)
– …
Custom topic models
• Custom latent variable topic models are useful for data mining and computational social science
• One challenge is scalability: sparse, stochastic, collapsed, distributed algorithms, …
– Max Welling: “There’s no end to speeding up LDA!”
• But the real bottleneck is human effort and expertise: design time >> run time
Custom topic models
• Traditionally, each custom model requires an (algorithm, model) pair carefully co-designed for tractability, followed by an evaluate-and-iterate loop
• Goal: replace per-model co-design with a general-purpose modeling framework
Our contribution
• We introduce latent topic networks
– A versatile, general-purpose framework for specifying custom topic models
– Models and domain knowledge specified using a simple logical probabilistic programming language
– A highly parallelizable EM training algorithm
Latent topic networks
[Diagram: the LDA likelihood (topics 𝚽, per-document topic distributions 𝛉, topic assignments Z, words W), augmented with networks of dependencies between topics and between distributions over topics, and optionally connected to observed covariates X, labeled data Y, and additional latent variables Z.]
Latent topic networks
• Traditionally: grad student + ≈6 months = new custom topic model
• With latent topic networks: 1 weekend (Shachi Kumar, Master’s student, UCSC)
Related work
[Table comparing systems along five dimensions: correlations/dependencies, observed covariates, additional latent variables, constraints, probabilistic programming.]
• Systems for encoding domain knowledge, covariates, and correlations:
– CTM (Blei and Lafferty, 2007)
– DMR (Mimno and McCallum, 2008)
– Dirichlet Forests (Andrzejewski et al., 2009)
– xLDA (Wahabzada et al., 2010)
– SAGE (Eisenstein et al., 2011)
– STM (Roberts et al., 2013)
• Graphical modeling and probabilistic programming systems:
– CTRF (Zhu and Xing, 2010)
– Fold.all (Andrzejewski et al., 2011)
– Logic LDA (Mei et al., 2014)
– Latent Topic Networks (this work)
Example: modeling influence in citation networks
Foulds and Smyth (2013), EMNLP
• Which are the most important articles?
• What are the influence relationships between articles?
Topical influence regression
Foulds and Smyth (2013), EMNLP
• Latent variables for document influence and citation-edge influence
• Probabilistic dependencies along the citation graph
Encoding dependencies via logical rules
• Restrict dependencies to the citation graph
• If influence and topic value are both high, the citing document also has the topic
• Entire model with just 5 rules!
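To make the rule idea concrete, here is a minimal sketch of how one such rule can be relaxed to continuous truth values; the predicate names and numbers are hypothetical illustrations, not the paper's exact rules.

```python
def luk_and(a, b):
    # Lukasiewicz soft conjunction of truth values in [0, 1].
    return max(0.0, a + b - 1.0)

def distance_to_satisfaction(body, head):
    # A soft rule body -> head is satisfied when head >= body;
    # otherwise it is penalized by how far the head falls short.
    return max(0.0, body - head)

# One grounding of a hypothetical influence rule:
#   Influence(d1, d2) AND HasTopic(d1, t) -> HasTopic(d2, t)
influence = 0.9      # soft truth of Influence(d1, d2)
cited_topic = 0.9    # soft truth of HasTopic(d1, t)
citing_topic = 0.3   # soft truth of HasTopic(d2, t)

body = luk_and(influence, cited_topic)                   # ~0.8
penalty = distance_to_satisfaction(body, citing_topic)   # ~0.5
```

When the citing document's topic value rises to match the body's truth value, the penalty falls to zero, which is how satisfied rules stop pushing on the model.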
Statistical relational learning
• An “interface layer for AI”
– Programming languages for specifying models and encoding domain knowledge
– Typically based on first-order logic
Probabilistic soft logic (PSL)
• A first-order logic-based SRL language
• Specifies a class of highly scalable continuous graphical models called hinge-loss MRFs
[Example rule, annotated on the slide: a rule weight (e.g. 5.0) attached to predicates joined by logical operators; ground atoms are continuous random variables!]
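The annotated rule on the slide is an image that did not survive transcription; what can be stated safely is that PSL interprets the logical operators with the Łukasiewicz relaxation over [0, 1] truth values. A small sketch:

```python
def l_and(a, b):
    return max(0.0, a + b - 1.0)    # soft conjunction

def l_or(a, b):
    return min(1.0, a + b)          # soft disjunction

def l_not(a):
    return 1.0 - a                  # soft negation

def l_implies(a, b):
    return min(1.0, 1.0 - a + b)    # soft implication

# On {0, 1} inputs these reduce to ordinary Boolean logic:
print(l_and(1.0, 1.0), l_or(0.0, 0.0), l_implies(1.0, 0.0))  # 1.0 0.0 0.0
```

Because each operator is a piecewise-linear function of its arguments, rules built from them yield the hinge-shaped potentials described next.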
Hinge-loss MRFs
• A conditional random field over continuous random variables between 0 and 1
• Feature functions are hinge losses: max{ℓ(y), 0}^p, where ℓ is a linear function and the exponent p is 1 or 2
• Hinge losses encode the distance to satisfaction for each instantiated rule
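As a concrete toy illustration of that density: the unnormalized log probability is minus a weighted sum of hinge features. The weights and linear functions below are made up for demonstration, not taken from the talk.

```python
def hinge_energy(y, rules):
    # rules: (weight, linear_fn, exponent) triples; exponent is 1 or 2.
    # Each feature is max(0, linear_fn(y)) ** exponent.
    return sum(w * max(0.0, ell(y)) ** p for (w, ell, p) in rules)

def unnorm_log_density(y, rules):
    # log P(y) = -energy(y) + const, for y in [0, 1]^n.
    return -hinge_energy(y, rules)

toy_rules = [
    (2.0, lambda y: y[0] - y[1], 1),  # linear penalty when y0 exceeds y1
    (1.0, lambda y: 0.5 - y[0], 2),   # squared penalty when y0 < 0.5
]

e1 = hinge_energy([0.8, 0.3], toy_rules)  # ~1.0 (only the first rule fires)
e2 = hinge_energy([0.2, 0.2], toy_rules)  # ~0.09 (only the second rule fires)
```

Since each ℓ is linear and max{·, 0}^p is convex and non-decreasing, the energy is convex in y, which is what makes MAP inference in these models scalable.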
Log posterior objective function
• The objective combines the LDA log posterior with the hinge loss terms
• Tractability from convexity, instead of conjugacy!
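The formula itself appears only as an image on the slide; in the notation used elsewhere in the talk, a plausible reconstruction (with rule weights $\lambda_r$, linear functions $\ell_r$, and exponents $p_r \in \{1, 2\}$) is:

```latex
\log p(\boldsymbol{\theta}, \boldsymbol{\Phi} \mid \mathbf{w})
  = \underbrace{\log p_{\mathrm{LDA}}(\mathbf{w}, \boldsymbol{\theta}, \boldsymbol{\Phi})}_{\text{LDA log posterior}}
  \;-\; \underbrace{\sum_{r} \lambda_r \, \max\{\ell_r(\boldsymbol{\theta}, \boldsymbol{\Phi}), 0\}^{p_r}}_{\text{hinge loss terms}}
  \;+\; \text{const}.
```

Each hinge term is convex in $(\boldsymbol{\theta}, \boldsymbol{\Phi})$, so the penalty side of the objective is tractable by convexity rather than by conjugate priors.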
Training algorithm
• Expectation Maximization
– E-step: the same as for LDA
– M-step: maximize the LDA EM lower bound minus the hinge loss terms
• The M-step is a convex optimization problem! Solve in parallel using consensus ADMM
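Since the E-step is unchanged from LDA, it can be sketched directly. This toy example (made-up vocabulary and values) computes the topic responsibilities for a single token:

```python
def token_responsibilities(theta_d, phi, word):
    # E-step for one token: gamma_k is proportional to theta_dk * phi_k[word],
    # exactly as in standard EM for LDA.
    scores = [theta_d[k] * phi[k][word] for k in range(len(theta_d))]
    total = sum(scores)
    return [s / total for s in scores]

theta_d = [0.5, 0.5]                       # document's topic proportions
phi = [{"union": 0.09, "economy": 0.01},   # topic 1's word probabilities (partial)
       {"union": 0.01, "economy": 0.09}]   # topic 2's word probabilities (partial)
gamma = token_responsibilities(theta_d, phi, "union")  # ~[0.9, 0.1]
```

The M-step then maximizes the usual LDA lower bound minus the hinge terms; that convex problem is what the paper parallelizes with consensus ADMM (not sketched here).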
Weight learning
• Optimize a pseudo-likelihood approximation
• The gradient is estimated by importance sampling from the implied Dirichlet prior
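A minimal sketch of the importance-sampling idea, under simplifying assumptions (a single Dirichlet-distributed parameter vector, one hinge feature f, and the Dirichlet prior itself as the proposal); the actual gradient expression is in the paper and is not reproduced here.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    # Draw from Dirichlet(alpha) via normalized Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def expected_feature(f, alpha, lam, n=5000, seed=0):
    # Self-normalized importance sampling of E[f(theta)] under the
    # unnormalized target Dir(theta; alpha) * exp(-lam * f(theta)),
    # using Dir(alpha) as the proposal distribution.
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        theta = sample_dirichlet(alpha, rng)
        w = math.exp(-lam * f(theta))  # importance weight
        num += w * f(theta)
        den += w
    return num / den

f = lambda theta: max(0.0, 0.5 - theta[0])  # a toy hinge feature
estimate = expected_feature(f, [1.0, 1.0], lam=2.0)
```

Raising the rule weight lam downweights samples where the feature fires, so the estimated expectation shrinks, which is the behavior the weight-learning gradient relies on.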
Case study: modeling US Presidential State of the Union addresses
• The US President updates Congress on the state of the Union, roughly annually
• Do these addresses depict the true, underlying state of the Union?
• Are they biased by political agendas?
[Diagram: a topic model in which each address’s topic distribution 𝛉 reflects the underlying state of the Union plus a Republican or Democratic party bias, with the state evolving over time (years).]
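The diagram's quantitative form is not in the transcript; purely as a toy illustration of the idea (an address's topic proportions equal the underlying state plus the incumbent party's bias), with made-up numbers and a simplistic additive parameterization:

```python
def address_topics(state, party_bias):
    # Toy additive model: nudge the underlying state-of-the-Union topic
    # proportions by the incumbent party's per-topic bias, then renormalize.
    mixed = [max(s + b, 0.0) for s, b in zip(state, party_bias)]
    total = sum(mixed)
    return [m / total for m in mixed]

state = [0.4, 0.6]             # hypothetical underlying topic proportions
republican_bias = [0.2, -0.2]  # hypothetical per-topic party bias
theta = address_topics(state, republican_bias)  # ~[0.6, 0.4]
```

In the actual model these quantities are tied together by PSL rules, including dependencies along the time axis, rather than by a fixed formula.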
Case study: modeling US Presidential State of the Union addresses (results)
[Figure: example Democrat and Republican topics recovered by the model.]
Perplexity (lower is better):
Model                   Document completion   Fully held-out
Latent topic networks   2.33 × 10³            2.43 × 10³
LDA topic model         2.36 × 10³            2.59 × 10³
Dynamic topic model     2.43 × 10³            2.55 × 10³
Conclusion
• We introduce latent topic networks, a versatile general-purpose framework for building and inferring custom topic models.
• Our experimental results show that models specified in our framework, with just a few lines of code in a logical language, can be competitive with state-of-the-art special-purpose models.
• Future directions:
– Using our framework to answer substantive questions in social science
– New language primitives, non-parametric Bayesian models, algorithmic advances, …
Code: coming soon at psl.cs.umd.edu!
Thank you for your attention.