graphical models for the internet
DESCRIPTION
TRANSCRIPT
![Page 1: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/1.jpg)
Graphical Models for the InternetAlexander Smola & Amr Ahmed
Yahoo! Research & Australian National UniversitySanta Clara, CA
[email protected] blog.smola.org
![Page 2: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/2.jpg)
Outline• Part 1 - Motivation
• Automatic information extraction • Application areas
• Part 2 - Basic Tools• Density estimation / conjugate distributions• Directed Graphical models and inference
• Part 3 - Topic Models (our workhorse)• Statistical model• Large scale inference (parallelization, particle filters)
• Part 4 - Advanced Modeling• Temporal dependence• Mixing clustering and topic models• Social Networks• Language models
![Page 3: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/3.jpg)
Part 1 - Motivation
![Page 4: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/4.jpg)
Data on the Internet• Webpages (content, graph)• Clicks (ad, page, social)• Users (OpenID, FB Connect)• e-mails (Hotmail, Y!Mail, Gmail)• Photos, Movies (Flickr, YouTube, Vimeo ...)• Cookies / tracking info (see Ghostery)• Installed apps (Android market etc.)• Location (Latitude, Loopt, Foursquared)• User generated content (Wikipedia & co)• Ads (display, text, DoubleClick, Yahoo)• Comments (Disqus, Facebook)• Reviews (Yelp, Y!Local)• Third party features (e.g. Experian)• Social connections (LinkedIn, Facebook)• Purchase decisions (Netflix, Amazon)• Instant Messages (YIM, Skype, Gtalk)• Search terms (Google, Bing)• Timestamp (everything)• News articles (BBC, NYTimes, Y!News)• Blog posts (Tumblr, Wordpress)• Microblogs (Twitter, Jaiku, Meme)
Finite resources• Editors are expensive• Editors don’t know users• Barrier to i18n• Abuse (intrusions are novel)• Implicit feedback• Data analysis (find interesting stuff
rather than find x)• Integrating many systems• Modular design for data integration• Integrate with given prediction tasks
Invest in modeling and naming rather than data generation
unlimited amounts of data
![Page 5: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/5.jpg)
Clustering documents
![Page 6: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/6.jpg)
Clustering documents
airline
restaurant
university
![Page 7: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/7.jpg)
Today’s mission
Find hidden structure in the data
Human understandableImproved knowledge for estimation
![Page 8: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/8.jpg)
Some applications
![Page 9: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/9.jpg)
Hierarchical Clustering
NIPS 2010Adams,Ghahramani,Jordan
![Page 10: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/10.jpg)
Topics in text
Latent Dirichlet Allocation; Blei, Ng, Jordan, JMLR 2003
![Page 11: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/11.jpg)
Word segmentation
Mochihashi, Yamada, Ueda, ACL 2009
![Page 12: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/12.jpg)
Language model
automatically synthesizedfrom Penn Treebank
Mochihashi, Yamada, UedaACL 2009
![Page 13: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/13.jpg)
User model over time
0 10 20 30 400
0.1
0.2
0.3
Prop
otio
nDay
Baseball
Finance
Jobs
Dating
0 10 20 30 400
0.1
0.2
0.3
0.4
0.5
Prop
otio
n
Day
Baseball
Dating
Celebrity
Health
SnookiTom CruiseKatie
Holmes PinkettKudrow
Hollywood
League baseball basketball, doubleheadBergesenGriffeybullpen Greinke
skinbody fingers cells toes
wrinkle layers
women mendating singles
personals seeking match
Dating Baseball Celebrity Health
job careerbusinessassistanthiring
part-‐timereceptionist
financial Thomson chart real StockTradingcurrency
Jobs Finance
Ahmed et al., KDD 2011
![Page 14: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/14.jpg)
Face recognition from captions
Jain, Learned-Miller, McCallum, ICCV 2007
![Page 15: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/15.jpg)
Storylines from news
Ahmed et al, AISTATS 2011
![Page 16: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/16.jpg)
Ideology detection
Ahmed et al, 2010; Bitterlemons collection
![Page 17: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/17.jpg)
Hypertext topic extraction
Gruber, Rosen-Zvi, Weiss; UAI 2008
![Page 18: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/18.jpg)
Alternatives
![Page 19: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/19.jpg)
Ontologies
• continuous maintenance
• no guarantee of coverage
• difficult categories
expensive, small
![Page 20: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/20.jpg)
Face Classification
• 100-1000 people
• 10k faces• curated
(not realistic)• expensive to
generate
![Page 21: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/21.jpg)
Topic Detection & Tracking
• editorially curated training data
• expensive to generate
• subjective in selection of threads
• language specific
![Page 22: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/22.jpg)
Advertising Targeting
• Needs training data in every language• Is it really relevant for better ads?• Does it cover relevant areas?
![Page 23: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/23.jpg)
Challenges• Scale
• Millions to billions of instances(documents, clicks, users, messages, ads)
• Rich structure of data (ontology, categories, tags)• Model description typically larger than memory of single workstation
• Modeling• Usually clustering or topic models do not solve the problem• Temporal structure of data
• Side information for variables • Solve problem. Don’t simply apply a model!
• Inference• 10k-100k clusters for hierarchical model• 1M-100M words• Communication is an issue for large state space
![Page 24: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/24.jpg)
Summary - Part 1• Essentially infinite amount of data• Labeling is prohibitively expensive• Not scalable for i18n• Even for supervised problems unlabeled data
abounds. Use it.• User-understandable structure for
representation purposes• Solutions are often customized to problem
We can only cover building blocks in tutorial.
![Page 25: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/25.jpg)
Part 2 - Basic Tools
![Page 26: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/26.jpg)
Statistics 101
![Page 27: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/27.jpg)
Probability• Space of events X
• server status (working, slow, broken)• income of the user (e.g. $95,000)• search queries (e.g. “graphical models”)
• Probability axioms (Kolmogorov)
• Example queries• P(server working) = 0.999• P(90,000 < income < 100,000) = 0.1
Pr(X) ∈ [0, 1], Pr(X ) = 1Pr(∪iXi) =
�i Pr(Xi) if Xi ∩Xj = ∅
![Page 28: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/28.jpg)
(In)dependence• Independence
• Login behavior of two users (approximately)• Disk crash in different colos (approximately)
• Dependent events• Emails• Queries• News stream / Buzz / Tweets• IM communication• Russian Roulette
Pr(x, y) = Pr(x) · Pr(y)
Pr(x, y) �= Pr(x) · Pr(y)
Everywhere!
![Page 29: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/29.jpg)
Independence
0.3 0.2
0.3 0.2
![Page 30: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/30.jpg)
Dependence
0.45 0.05
0.05 0.45
![Page 31: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/31.jpg)
A Graphical Model
Spam Mail
p(spam, mail) = p(spam) p(mail|spam)
![Page 32: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/32.jpg)
Bayes Rule
• Joint Probability
• Bayes Rule
• Hypothesis testing• Reverse conditioning
Pr(X|Y ) =Pr(Y |X) · Pr(X)
Pr(Y )
Pr(X,Y ) = Pr(X|Y ) Pr(Y ) = Pr(Y |X) Pr(X)
![Page 33: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/33.jpg)
AIDS test (Bayes rule)• Data
• Approximately 0.1% are infected• Test detects all infections• Test reports positive for 1% healthy people
• Probability of having AIDS if test is positive
Pr(a = 1|t) =Pr(t|a = 1) · Pr(a = 1)
Pr(t)
=Pr(t|a = 1) · Pr(a = 1)
Pr(t|a = 1) · Pr(a = 1) + Pr(t|a = 0) · Pr(a = 0)
=1 · 0.001
1 · 0.001 + 0.01 · 0.999= 0.091
![Page 34: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/34.jpg)
Improving the diagnosis• Use a follow-up test
• Test 2 reports positive for 90% infections• Test 2 reports positive for 5% healthy people
• Why can’t we use Test 1 twice?Outcomes are not independent but tests 1 and 2 are conditionally independent
0.01 · 0.05 · 0.9991 · 0.9 · 0.001 + 0.01 · 0.05 · 0.999
= 0.357
p(t1, t2|a) = p(t1|a) · p(t2|a)
![Page 35: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/35.jpg)
Application: Naive Bayes
![Page 36: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/36.jpg)
Naive Bayes Spam Filter• Key assumption
Words occur independently of each other given the label of the document
• Spam classification via Bayes Rule
• Parameter estimationCompute spam probability and word distributions for spam and ham
p(w1, . . . , wn|spam) =n�
i=1
p(wi|spam)
p(spam|w1, . . . , wn) ∝ p(spam)n�
i=1
p(wi|spam)
![Page 37: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/37.jpg)
A Graphical Model
spam
w1 w2 . . . wn
p(w1, . . . , wn|spam) =n�
i=1
p(wi|spam)
spam
wi
i=1..n
how to estimate p(w|spam)
![Page 38: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/38.jpg)
spam probability
Naive NaiveBayes Classifier• Two classes (spam/ham)• Binary features (e.g. presence of $$$, viagra)• Simplistic Algorithm
• Count occurrences of feature for spam/ham• Count number of spam/ham mails
p(xi = TRUE|y) = n(i, y)
n(y)and p(y) =
n(y)
n
feature probability
p(y|x) ∝ n(y)
n
�
i:xi=TRUE
n(i, y)
n(y)
�
i:xi=FALSE
n(y)− n(i, y)
n(y)
![Page 39: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/39.jpg)
Naive NaiveBayes Classifier
p(y|x) ∝ n(y)
n
�
i:xi=TRUE
n(i, y)
n(y)
�
i:xi=FALSE
n(y)− n(i, y)
n(y)
what if n(i,y)=0?
what if n(i,y)=n(y)?
![Page 40: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/40.jpg)
Estimating Probabilities
![Page 41: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/41.jpg)
Two outcomes (binomial)• Example: probability of ‘viagra’ in spam/ham• Data likelihood
• Maximum Likelihood Estimation• Constraint• Taking derivatives yields
p(X;π) = πn1(1− π)n0
π ∈ [0, 1]
π =n1
n0 + n1
![Page 42: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/42.jpg)
n outcomes (multinomial)
• Example: USA, Canada, India, UK, NZ• Data likelihood
• Maximum Likelihood Estimation• Constrained optimization problem • Using log-transform yields
p(X;π) =�
i
πnii
�
i
πi = 1
πi =ni�j nj
![Page 43: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/43.jpg)
Tossing a Dice
24
12060
12
![Page 44: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/44.jpg)
Conjugate Priors• Unless we have lots of data estimates are weak• Usually we have an idea of what to expect
we might even have ‘seen’ such data before• Solution: add ‘fake’ observations
• Inference (generalized Laplace smoothing)
p(θ|X) ∝ p(X|θ) · p(θ)
1n
n�
i=1
φ(xi) −→1
n + m
n�
i=1
φ(xi) +m
n + mµ0
p(θ) ∝ p(Xfake|θ) hence p(θ|X) ∝ p(X|θ)p(Xfake|θ) = p(X ∪Xfake|θ)
fake mean
fake count
![Page 45: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/45.jpg)
Conjugate Prior in action• Discrete Distribution
• Tossing a dice
• Rule of thumbneed 10 data points (or prior) per parameter
p(x = i) =ni
n−→ p(x = i) =
ni + mi
n + m
Outcome 1 2 3 4 5 6
Counts 3 6 2 1 4 4
MLE 0.15 0.30 0.10 0.05 0.20 0.20
MAP (m0 = 6) 0.15 0.27 0.12 0.08 0.19 0.19
MAP (m0 = 100) 0.16 0.19 0.16 0.15 0.17 0.17
mi = m · [µ0]i
![Page 46: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/46.jpg)
Honest dice
MLE
MAP
![Page 47: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/47.jpg)
Tainted dice
MLE
MAP
![Page 48: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/48.jpg)
Exponential Families
![Page 49: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/49.jpg)
Exponential Families
• Density function
• Log partition function generates cumulants
• g is convex (second derivative is p.s.d.)
p(x; θ) = exp (�φ(x), θ� − g(θ))
where g(θ) = log�
x�
exp (�φ(x�), θ�)
∂θg(θ) = E [φ(x)]
∂2θg(θ) = Var [φ(x)]
![Page 50: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/50.jpg)
Examples
• Binomial Distribution• Discrete Distribution
(ex is unit vector for x)• Gaussian• Poisson (counting measure 1/x!)• Dirichlet, Beta, Gamma, Wishart, ...
φ(x) = x
φ(x) = ex
φ(x) =�
x,12xx�
�
φ(x) = x
![Page 51: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/51.jpg)
Normal Distribution
![Page 52: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/52.jpg)
Poisson Distribution
p(x;λ) =λxe−λ
x!
![Page 53: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/53.jpg)
Beta Distribution
p(x;α,β) =xα−1(1− x)β−1
B(α,β)
![Page 54: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/54.jpg)
Dirichlet Distribution
... this is a distribution over distributions ...
![Page 55: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/55.jpg)
Maximum Likelihood• Negative log-likelihood
• Taking derivatives
We pick the parameter such that the distribution matches the empirical average.
− log p(X; θ) =n�
i=1
g(θ) − �φ(xi), θ�
−∂θ log p(X; θ) = m
�E[φ(x)]− 1
m
n�
i=1
φ(xi)
�
empiricalaveragemean
![Page 56: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/56.jpg)
Example: Gaussian Estimation• Sufficient statistics: • Mean and variance given by
• Maximum Likelihood Estimate
• Maximum a Posteriori Estimate
x, x2
µ = Ex[x] and σ2 = Ex[x2]−E2
x[x]
µ̂ =1
n
n�
i=1
xi and σ2 =1
n
n�
i=1
x2i − µ̂2
µ̂ =1
n+ n0
n�
i=1
xi and σ2 =1
n+ n0
n�
i=1
x2i +
n0
n+ n01− µ̂2
smoother
smoother
![Page 57: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/57.jpg)
Collapsing• Conjugate priors
Hence we know how to compute normalization• Prediction
p(θ) ∝ p(Xfake|θ)
p(x|X) =
�p(x|θ)p(θ|X)dθ
∝�
p(x|θ)p(X|θ)p(Xfake|θ)dθ
=
�p({x} ∪X ∪Xfake|θ)dθ
look up closedform expansions
(Beta, binomial)(Dirichlet, multinomial)
(Gamma, Poisson)(Wishart, Gauss)
http://en.wikipedia.org/wiki/Exponential_family
![Page 58: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/58.jpg)
Directed Graphical Models
![Page 59: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/59.jpg)
... some Web 2.0 service
• Joint distribution (assume a and m are independent)
• Explaining away
a and m are dependent conditioned on w
MySQL Apache
Website
p(m, a,w) = p(w|m, a)p(m)p(a)
p(m, a|w) =p(w|m, a)p(m)p(a)�
m�,a� p(w|m�, a�)p(m�)p(a�)
![Page 60: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/60.jpg)
... some Web 2.0 service
is working
MySQL is workingApache is working
is broken
At least one of the two services is broken
(not independent)
MySQL Apache
Website
![Page 61: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/61.jpg)
Directed graphical model
• Easier estimation• 15 parameters for full joint distribution• 1+1+3+1 for factorizing distribution
• Causal relations• Inference for unobserved variables
m a
w
m a
w
m a
w
uuser
action
![Page 62: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/62.jpg)
No loops allowed
p(c|e)p(e) or p(e|c)p(c)
p(c|e)p(e|c)
![Page 63: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/63.jpg)
Directed Graphical Model
• Joint probability distribution
• Parameter estimation• If x is fully observed the likelihood breaks up
• If x is partially observed things get interestingmaximization, EM, variational, sampling ...
p(x) =�
i
p(xi|xparents(i))
log p(x|θ) =�
i
log p(xi|xparents(i), θ)
![Page 64: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/64.jpg)
ClusteringDensity Estimation
Clustering
x
x
θ
y
p(x, θ) = p(θ)n�
i=1
p(xi|θ)
p(x, y, θ) = p(π)K�
k=1
p(θk)n�
i=1
p(yi|π)p(xi|θ, yi) θ
![Page 65: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/65.jpg)
ChainsMarkov Chain
past past present future future
Plate
Hidden Markov Chain
observeduser action
user’smindset
user model for traversal through search results
![Page 66: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/66.jpg)
ChainsMarkov Chain
Hidden Markov Chain
observeduser action
user’smindset
user model for traversal through search results
p(x, y; θ) = p(x0; θ)n−1�
i=1
p(xi+1|xi; θ)n�
i=1
p(yi|xi)
p(x; θ) = p(x0; θ)n−1�
i=1
p(xi+1|xi; θ)
Plate
![Page 67: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/67.jpg)
Factor Graphs
• Observed effectsClick behavior, queries, watched news, emails
• Latent factorsUser profile, news content, hot keywords, social connectivity graph, events
Latent Factors
ObservedEffects
![Page 68: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/68.jpg)
Recommender Systems
• Users u• Movies m• Ratings r (but only for a subset of users)
r
u m
... intersecting plates ...(like nested for loops)
news, SearchMonkey
answerssocial
rankingOMG
personals
![Page 69: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/69.jpg)
Challenges
• How to design models• Common (engineering) sense• Computational tractability
• Inference• Easy for fully observed situations• Many algorithms if not fully observed• Dynamic programming / message passing
domain expert
statistics
![Page 70: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/70.jpg)
Summary - Part 2
• Probability theory to estimate events• Conjugate priors and Laplace smoothing• Conjugate = phantasy data• Collapsing• Laplace smoothing• Directed graphical models
![Page 71: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/71.jpg)
Part 3 - Clustering & Topic Models
![Page 72: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/72.jpg)
Inference Algorithms
![Page 73: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/73.jpg)
ClusteringDensity Estimation
Clustering
x
x
θ
y
p(x, θ) = p(θ)n�
i=1
p(xi|θ)
p(x, y, θ) = p(π)K�
k=1
p(θk)n�
i=1
p(yi|π)p(xi|θ, yi) θ
find θfind θ
log-concave
general nonlinear
![Page 74: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/74.jpg)
Clustering• Optimization problem
• Options• Direct nonconvex optimization (e.g. BFGS)• Sampling (draw from the joint distribution)• Variational approximation
(concave lower bounds aka EM algorithm)
maximizeθ
�
y
p(x, y, θ)
maximizeθ
log p(π) +K�
k=1
log p(θk) +n�
i=1
log�
yi∈Y[p(yi|π)p(xi|θ, yi)]
![Page 75: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/75.jpg)
Clustering• Integrate out θ
• Y is coupled• Sampling• Collapsed p
x
y
θ• Integrate out y
• Nonconvex optimization problem
• EM algorithm
x
θ
x
Y
p(y|x) ∝ p({x} | {xi : yi = y} ∪Xfake)p(y|Y ∪ Yfake)
![Page 76: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/76.jpg)
Gibbs sampling• Sampling:
Draw an instance x from distribution p(x)• Gibbs sampling:
• In most cases direct sampling not possible• Draw one set of variables at a time
0.45 0.050.05 0.45
(b,g) - draw p(.,g)(g,g) - draw p(g,.)(g,g) - draw p(.,g)(b,g) - draw p(b,.)(b,b) ...
![Page 77: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/77.jpg)
Gibbs sampling for clustering
![Page 78: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/78.jpg)
Gibbs sampling for clustering
randominitialization
![Page 79: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/79.jpg)
Gibbs sampling for clustering
sample cluster labels
![Page 80: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/80.jpg)
Gibbs sampling for clustering
resamplecluster model
![Page 81: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/81.jpg)
Gibbs sampling for clustering
resamplecluster labels
![Page 82: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/82.jpg)
Gibbs sampling for clustering
resamplecluster model
![Page 83: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/83.jpg)
Gibbs sampling for clustering
resamplecluster labels
![Page 84: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/84.jpg)
Gibbs sampling for clustering
resamplecluster model e.g. Mahout Dirichlet Process Clustering
![Page 85: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/85.jpg)
Inference Algorithm ≠ ModelCorollary: EM ≠ Clustering
![Page 86: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/86.jpg)
Topic models
![Page 87: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/87.jpg)
Grouping objects
Singapore
![Page 88: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/88.jpg)
Grouping objects
airline
restaurant
university
![Page 89: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/89.jpg)
Grouping objects
Australia
Singapore
USA
![Page 90: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/90.jpg)
Topic Models
USA airline
Singaporeairline
Singapore food
USA food
Singapore university
Australia university
![Page 91: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/91.jpg)
Clustering & Topic ModelsClustering Topics
?
group objectsby prototypes
decompose objectsinto prototypes
![Page 92: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/92.jpg)
Clustering & Topic Models
x
y
θ
prior
cluster probability
cluster label
instance x
y
θ
prior
topic probability
topic label
instance
clustering Latent Dirichlet Allocation
α α
![Page 93: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/93.jpg)
Clustering & Topic Models
DocumentsmembershipCluster/
topicdistributions
x =
clustering: (0, 1) matrixtopic model: stochastic matrixLSI: arbitrary matrices
![Page 94: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/94.jpg)
Topics in text
Latent Dirichlet Allocation; Blei, Ng, Jordan, JMLR 2003
![Page 95: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/95.jpg)
Collapsed Gibbs Sampler
![Page 96: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/96.jpg)
sample zindependently
sample θindependently
Joint Probability Distribution
xij
zij
θi
language prior
topic probability
topic label
instance
α
ψkβ
p(θ, z,ψ, x|α,β)
=K�
k=1
p(ψk|β)m�
i=1
p(θi|α)
m,mi�
i,j
p(zij |θi)p(xij |zij ,ψ)
sample Ψindependently slow
![Page 97: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/97.jpg)
sample zsequentially
Collapsed Sampler
xij
zij
θi
language prior
topic probability
topic label
instance
α
ψkβ
p(z, x|α,β)
=m�
i=1
p(zi|α)k�
k=1
p({xij |zij = k} |β)fast
![Page 98: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/98.jpg)
Collapsed Sampler
xij
zij
θi
language prior
topic probability
topic label
instance
α
ψkβ
p(z, x|α,β)
=m�
i=1
p(zi|α)k�
k=1
p({xij |zij = k} |β)fast
n−ij(t, w) + βt
n−i(t) +�
t βt
n−ij(t, d) + αt
n−i(d) +�
t αt
Griffiths & Steyvers, 2005
![Page 99: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/99.jpg)
Sequential Algorithm• Collapsed Gibbs Sampler
• For 1000 iterations do• For each document do
• For each word in the document do• Resample topic for the word• Update local (document, topic) table• Update global (word, topic) table
this kills parallelism
![Page 100: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/100.jpg)
• For 1000 iterations do• For each document do
• For each word in the document do• Resample topic for the word• Update local (document, topic) table• Update CPU local (word, topic) table
• Update global (word, topic) table
State of the artUMass Mallet, UC Irvine, Google
p(t|wij) ∝ βwαt
n(t) + β̄+ βw
n(t, d = i)n(t) + β̄
+n(t, w = wij) [n(t, d = i) + αt]
n(t) + β̄
slow
changes rapidly
moderately fast
table out of sync
blocking
network bound
memoryinefficient
![Page 101: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/101.jpg)
Our Approach
table out of sync blocking
network bound
memoryinefficient
continuoussync
barrier free
concurrentcpu hdd net
minimal view
• For 1000 iterations do (independently per computer)• For each thread/core do
• For each document do• For each word in the document do
• Resample topic for the word• Update local (document, topic) table• Generate computer local (word, topic) message
• In parallel update local (word, topic) table• In parallel update global (word, topic) table
![Page 102: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/102.jpg)
Architecture details
![Page 103: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/103.jpg)
Multicore Architecture
• Decouple multithreaded sampling and updating (almost) avoids stalling for locks in the sampler
• Joint state table• much less memory required• samplers syncronized (10 docs vs. millions delay)
• Hyperparameter update via stochastic gradient descent• No need to keep documents in memory (streaming)
tokens
topics
file
combiner
count
updater
diagnostics
&
optimization
output to
filetopics
samplersampler
samplersampler
sampler
Intel Threading Building Blocks
joint state table
![Page 104: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/104.jpg)
Cluster Architecture
• Distributed (key,value) storage via memcached• Background asynchronous synchronization
• single word at a time to avoid deadlocks• no need to have joint dictionary• uses disk, network, cpu simultaneously
sampler sampler sampler sampler
iceiceiceice
![Page 105: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/105.jpg)
Cluster Architecture
• Distributed (key,value) storage via ICE• Background asynchronous synchronization
• single word at a time to avoid deadlocks• no need to have joint dictionary• uses disk, network, cpu simultaneously
sampler sampler sampler sampler
iceiceiceice
![Page 106: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/106.jpg)
Making it work• Startup
• Randomly initialize topics on each node (read from disk if already assigned - hotstart)
• Sequential Monte Carlo for startup much faster• Aggregate changes on the fly
• Failover• State constantly being written to disk
(worst case we lose 1 iteration out of 1000)• Restart via standard startup routine
• Achilles heel: need to restart from checkpoint if even a single machine dies.
![Page 107: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/107.jpg)
Easily extensible• Better language model (topical n-grams)
can process millions of users (vs 1000s)• Conditioning on side information (upstream)
estimate topic based on authorship, source, joint user model ...
• Conditioning on dictionaries (downstream)integrate topics between different languages
• Time dependent sampler for user modelapproximate inference per episode
![Page 108: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/108.jpg)
Google LDA Mallet Irvine’08 Irvine’09 Yahoo LDA
Multicore no yes yes yes yes
Cluster MPI no MPI point 2 point memcached
State table dictionarysplit
separatesparse separate separate joint
sparse
Schedule synchronousexact
synchronousexact
synchronousexact
asynchronousapproximate
messages
asynchronousexact
![Page 109: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/109.jpg)
Speed• 1M documents per day on 1 computer
(1000 topics per doc, 1000 words per doc)• 350k documents per day per node
(context switches & memcached & stray reducers)• 8 Million docs (Pubmed)
(sampler does not burn in well - too short doc)• Irvine: 128 machines, 10 hours• Yahoo: 1 machine, 11 days • Yahoo: 20 machines, 9 hours
• 20 Million docs (Yahoo! News Articles) • Yahoo: 100 machines, 12 hours
![Page 110: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/110.jpg)
Scalability
0
10
20
30
40
1 10 20 50 100
200k documents/computer
Runtime (hours) Initial topics per word x10
Likelihood even improves with parallelism!-3.295 (1 node) -3.288 (10 nodes) -3.287 (20 nodes)
CPUs
![Page 111: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/111.jpg)
The Competition
0
12500
25000
37500
50000
Google Irvine Yahoo
05
101520
Google Irvine Yahoo
032.5
6597.5130
Google Irvine Yahoo
Dataset size (millions)
Cluster size
Throughput/h
1506.4k
50k
![Page 112: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/112.jpg)
Design Principles
![Page 113: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/113.jpg)
Variable Replication• Global shared variable
• Make local copy• Distributed (key,value) storage table for global copy• Do all bookkeeping locally (store old versions)• Sync local copies asynchronously using message passing
(no global locks are needed)• This is an approximation!
x y z x y y’ z
local copysynchronize
computer
![Page 114: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/114.jpg)
Asymmetric Message Passing• Large global shared state space
(essentially as large as the memory in computer)• Distribute global copy over several machines
(distributed key,value storage)
old copy current copy
global state
![Page 115: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/115.jpg)
Out of core storage• Very large state space
• Gibbs sampling requires us to traverse the data sequentially many times (think 1000x)
• Stream local data from disk and update coupling variable each time local data is accessed
• This is exact
x y z
tokens
topics
file
combiner
count
updater
diagnostics
&
optimization
output to
filetopics
samplersampler
samplersampler
sampler
![Page 116: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/116.jpg)
Summary - Part 3
• Inference in graphical models• Clustering• Topic models• Sampling• Implementation details
![Page 117: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/117.jpg)
Part 4 - Advanced Modeling
![Page 118: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/118.jpg)
Chinese Restaurant Process
φ2φ1 φ3
![Page 119: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/119.jpg)
Problem• How many clusters should we pick?• How about a prior for infinitely many clusters?• Finite model
• Infinite modelAssume that the total smoother weight is constant
p(y|Y,α) = n(y) + αy
n+�
y� αy�
p(y|Y,α) = n(y)
n+�
y� αy�and p(new|Y,α) = α
n+ α
![Page 120: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/120.jpg)
Chinese Restaurant Metaphor
-‐For data point xi
-‐ Choose table j ∝ mj and Sample xi ~ f(φj)-‐ Choose a new table K+1 ∝ α
-‐ Sample φK+1 ~ G0 and Sample xi ~ f(φK+1)
GeneraBve Process
φ2φ1 φ3
the rich get richer
Pitman; Antoniak; Ishwaran; Jordan et al.; Teh et al.;
![Page 121: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/121.jpg)
Evolutionary Clustering
• Time series of objects, e.g. news stories• Stories appear / disappear• Want to keep track of clusters automatically
![Page 122: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/122.jpg)
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2m'1,1=2 m'2,1=3 m'3,1=1
![Page 123: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/123.jpg)
φ2,1φ1,1 φ3,1
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2m'1,1=2 m'2,1=3 m'3,1=1
![Page 124: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/124.jpg)
φ2,1φ1,1 φ3,1
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2m'1,1=2 m'2,1=3 m'3,1=1
![Page 125: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/125.jpg)
φ2,1φ1,1 φ3,1
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2m'1,1=2 m'2,1=3 m'3,1=1
Sample φ1,2 ~ P(.| φ1,1)
![Page 126: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/126.jpg)
φ2,1φ1,1 φ3,1
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2m'1,1=2 m'2,1=3 m'3,1=1
![Page 127: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/127.jpg)
Recurrent Chinese Restaurant Process
φ2,1φ1,1 φ3,1
T=1
T=2
φ4,2
m'1,1=2 m'2,1=3 m'3,1=1
φ2,2φ1,2 φ3,1
dead cluster new cluster
![Page 128: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/128.jpg)
Longer History
φ2,1φ1,1 φ3,1
T=1
T=2
φ2,2φ1,2 φ3,1
m'1,1=2 m'2,1=3 m'3,1=1
φ4,2
T=3φ2,2φ1,2 φ4,2
m'2,3
![Page 129: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/129.jpg)
TDPM Generative PowerW= ∞λ = ∞
DPM
W=4λ = .4
TDPM
W= 0λ = ? (any)
Independent DPMs
Powerlaw
37
![Page 130: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/130.jpg)
User modeling
0 10 20 30 400
0.1
0.2
0.3
Prop
otio
n
Day
Baseball
Finance
Jobs
Dating
![Page 131: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/131.jpg)
Buying a camera
show ads now too latetime
![Page 132: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/132.jpg)
Problem formulaBon
CarDealsvan
jobHiringdiet
HiringSalaryDietcalories
AutoPriceUsedinspecBon
FlightLondonHotelweather
DietCaloriesRecipechocolate
MoviesTheatreArtgallery
SchoolSuppliesLoancollege
User modeling
![Page 133: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/133.jpg)
Problem formulaBon
CarDealsvan
jobHiringdiet
HiringSalaryDietcalories
AutoPriceUsedinspecBon
FlightLondonHotelweather
DietCaloriesRecipechocolate
MoviesTheatreArtgallery
SchoolSuppliesLoancollege
User modelingCARS Art
DietJobs
Travel
Collegefinance
![Page 134: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/134.jpg)
Problem formulaBon
FlightLondonHotelweather
SchoolSuppliesLoancollege
User modeling
Travel
Collegefinance
Input
• Queries issued by the user or Tags of watched content
• Snippet of page examined by user
• Time stamp of each acBon (day resoluBon)
Output
• Users’ daily distribuBon over intents• Dynamic intent representaBon
![Page 135: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/135.jpg)
Time dependent models
• LDA for topical model of users where• User interest distribution changes over time• Topics change over time
• This is like a Kalman filter except that• Don’t know what to track (a priori)• Can’t afford a Rauch-Tung-Striebel smoother• Much more messy than plain LDA
![Page 136: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/136.jpg)
Graphical Model
wij
zij
θi
α
φk
αt
β
θti
zij
wij
φtk
βt
αt−1 αt+1
θt−1i
θt+1i
φt−1k
βt−1
φt+1k
βt+1
plainLDA
time dependentuser interest
user actions
actions per topic
![Page 137: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/137.jpg)
All
week
job CareerBusinessAssistantHiring
Part-‐BmeRecepBonist
CarBlueBookKelleyPricesSmallSpeedlarge
BankOnlineCreditCarddebt
por_olioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilkBuaerPowder
month
Time t t+1
FoodChickenpizza
recipejobhiring
Part-‐BmeOpeningsalary
foodchickenPizzamillage
Kellyrecipecuisine
Diet Cars Job Finance
Prior for user acBons at Bme t
μ
μ2
μ3
Long-‐termshort-‐term
![Page 138: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/138.jpg)
Food ChickenPizza mileage
Car speed offerCamry accord career
At 0me t At 0me t+1
• For each user interacBon• Choose an intent from local distribuBon• Sample word from the topic’s word-‐distribuBon
•Choose a new intent ∝ α • Sample a new intent from the global distribuBon• Sample word from the new topic word-‐distribuBon
GeneraBve Process
short-‐termpriors
job CareerBusinessAssistantHiring
Part-‐BmeRecepBoni
st
CarAlBmaAccordBlueBookKelleyPricesSmallSpeed
BankOnlineCreditCarddebt
por_olioFinanceChase
RecipeChocolate
PizzaFood
ChickenMilkBuaerPowder
![Page 139: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/139.jpg)
At 0me t At 0me t+1 At 0me t+2 At 0me t+3
User 1process
User 2process
User 3process
Globalprocess
mm'
nn'
![Page 140: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/140.jpg)
Sample users
0 10 20 30 400
0.1
0.2
0.3
Prop
otio
nDay
Baseball
Finance
Jobs
Dating
0 10 20 30 400
0.1
0.2
0.3
0.4
0.5
Prop
otio
n
Day
Baseball
Dating
Celebrity
Health
SnookiTom CruiseKatie
Holmes PinkettKudrow
Hollywood
League baseball basketball, doubleheadBergesenGriffeybullpen Greinke
skinbody fingers cells toes
wrinkle layers
women mendating singles
personals seeking match
Dating Baseball Celebrity Health
job careerbusinessassistanthiring
part-‐timereceptionist
financial Thomson chart real StockTradingcurrency
Jobs Finance
![Page 141: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/141.jpg)
DatasetsData
![Page 142: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/142.jpg)
ROC score improvement
![Page 143: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/143.jpg)
ROC score improvement
50
52
54
56
58
60
62
Dataset−2
>100
0
[1000
,600]
[600,4
00]
[400,2
00]
[200,1
00]
[100,6
0]
[60,40
]
[40,20
]<2
0
baselineTLDATLDA+Baseline
![Page 144: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/144.jpg)
LDA for user profilingSample ZFor users
Sample ZFor users
Sample ZFor users
Sample ZFor users
Barrier
Write counts to
memcached
Write counts to
memcached
Write counts to
memcached
Write counts to
memcached
Collect counts and sample
Do nothing Do nothing Do nothing
Barrier
Read from memcached
Read from memcached
Read from memcached
Read from memcached
![Page 145: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/145.jpg)
News
![Page 146: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/146.jpg)
News Stream
![Page 147: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/147.jpg)
News Stream• Over 1 high quality news article per second • Multiple sources (Reuters, AP, CNN, ...)• Same story from multiple sources• Stories are related
• Goals• Aggregate articles into a storyline• Analyze the storyline (topics, entities)
![Page 148: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/148.jpg)
Clustering / RCRP• Assume active story
distribution at time t• Draw story indicator• Draw words from story
distribution • Down-weight story counts for
next day
Ahmed & Xing, 2008
![Page 149: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/149.jpg)
Clustering / RCRP• Pro
• Nonparametric model of story generation(no need to model frequency of stories)
• No fixed number of stories• Efficient inference via collapsed sampler
• Con• We learn nothing!• No content analysis
![Page 150: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/150.jpg)
Latent Dirichlet Allocation
• Generate topic distribution per article
• Draw topics per word from topic distribution
• Draw words from topic specific word distribution
Blei, Ng, Jordan, 2003
![Page 151: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/151.jpg)
Latent Dirichlet Allocation
• Pro• Topical analysis of stories• Topical analysis of words (meaning, saliency)• More documents improve estimates
• Con• No clustering
![Page 152: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/152.jpg)
• Named entities are special, topics less(e.g. Tiger Woods and his mistresses)
• Some stories are strange(topical mixture is not enough - dirty models)
• Articles deviate from general story(Hierarchical DP)
More Issues
![Page 153: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/153.jpg)
StorylinesAmr Ahmed, Quirong Ho, Jake Eisenstein,
Alex Smola, Choon Hui Teo, 2011
![Page 154: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/154.jpg)
Storylines Model• Topic model• Topics per cluster• RCRP for cluster• Hierarchical DP for
article• Separate model
for named entities• Story specific
correction
![Page 155: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/155.jpg)
Tightly-focused High-level concepts
46
Storylines Model
![Page 156: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/156.jpg)
The Graphical Model
Tightly-‐focused High-‐level concepts
Storylines Model
![Page 157: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/157.jpg)
The Graphical ModelStorylines Model
Each story has:•DistribuBon over words•DistribuBon over topics•DistribuBon over named enBtes
![Page 158: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/158.jpg)
• Document’s topic mix is sampled from its story prior• Words inside a document either global or story specific
The Graphical Model
49
Storylines Model
![Page 159: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/159.jpg)
The GeneraBve Process
50
Generative process
![Page 160: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/160.jpg)
The GeneraBve Process
51
Generative process
![Page 161: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/161.jpg)
The GeneraBve Process
52
Generative process
![Page 162: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/162.jpg)
The GeneraBve Process
53
Generative process
![Page 163: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/163.jpg)
Estimation• Sequential Monte Carlo (Particle Filter)
• For new time period draw stories s, topics z
using Gibbs Sampling for each particle• Reweight particle via
• Regenerate particles if l2 norm too heavy
p(xt+1|x1...t, s1...t, z1...t)
p(st+1, zt+1|x1...t+1, s1...t, z1...t)
![Page 164: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/164.jpg)
Numbers ...• TDT5 (Topic Detection and Tracking)
macro-averaged minimum detection cost: 0.714
This is the best performance on TDT5!• Yahoo News data
... beats all other clustering algorithms
time entities topics story words
0.84 0.90 0.86 0.75
![Page 165: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/165.jpg)
Stories������
���������������
���������� �����
�����
����������������
� ��������������������������������� �
��� ��
����������� ������� �
�����������
�������������� ����
������������������ �������������������������������
����������������� ���������������������� ����������
������� �
���� �������������������������� ������
���������������� ������ ������!�"�# #���#!�#��$$
��������
������������ ����������������������
%���&�����'&�#�( ���)����*�� +�����,#��� ������*��
��
���
����
���
��
![Page 166: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/166.jpg)
���������������� �
�������
������������������� ������ �� ���������������
����������������� ���������������������� ����������
����������� �����
���������������������������� ����� �������
�� ���������������������������� ������� ����
� ���� �����������
�������
���� �������������������������
�� ���� ��������� ��!"�#�������$���$
������������� ��������� �����
������������� ��������������� ���
�������������
Related Stories
![Page 167: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/167.jpg)
Detecting Ideologies
Ahmed and Xing, 2010
![Page 168: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/168.jpg)
Problem Statement
Build a model to describe both collecBons of data
VisualizaBon
• How does each ideology view mainstream events?• On which topics do they differ?• On which topics do they agree?
Ideologies
![Page 169: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/169.jpg)
Problem Statement
Build a model to describe both collecBons of data
VisualizaBon
Ideologies
ClassificaBon
•Given a new news arBcle or a blog post, the system should infer• From which side it was wriaen• JusBfy its answer on a topical level (view on aborBon, taxes, health care)
![Page 170: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/170.jpg)
Problem Statement
Build a model to describe both collecBons of data
VisualizaBon
Ideologies
ClassificaBon
Structured browsing
•Given a new news arBcle or a blog post, the user can ask for :•Examples of other arBcles from the same ideology about the same topic•Documents that could exemplify alterna0ve views from other ideologies
![Page 171: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/171.jpg)
Ω1 Ω2
β1
β1
βk-‐1
βk
φ1,1
φ1,2
φ1,k
φ2,1
φ2,2
φ2,k
Ideology 1Views
Ideology 2Views
Topics
Building a factored model
![Page 172: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/172.jpg)
Ω1 Ω2
β1
β2
βk-‐1
βk
φ1,1
φ1,2
φ1,k
φ2,1
φ2,2
φ2,k
Ideology 1Views
Ideology 2Views
Topics
λ
1−λ
λ
1−λ
Building a factored model
![Page 173: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/173.jpg)
Datasets
• BiAerlemons:
• Middle-‐east conflict, document wriaen by Israeli and PalesBnian authors.
• ~300 documents form each view with average length 740
• MulB author collecBon
• 80-‐20 split for test and train
• Poli0cal Blog-‐1:
• American poliBcal blogs (Democrat and Republican)
• 2040 posts with average post length = 100 words• Follow test and train split as in (Yano et al., 2009)
• Poli0cal Blog-‐2 (test generalizaBon to a new wriBng style)
• Same as 1 but 6 blogs, 3 from each side
• ~14k posts with ~200 words per post• 4 blogs for training and 2 blogs for test
Data
![Page 174: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/174.jpg)
Example: Biaerlemons corpus
palestinian israelipeace
year political process
state end right
government need conflict
waysecurity
palestinian israeliPeacepolitical
occupation process
end security conflict
way government
people time year
force negotiation
bush US president american sharon administration prime pressure policy washington
powell minister colin visit internal policy statement
express pro previous package work transfer
european
arafat state leader roadmap election month iraq yasir
senior involvement clinton terrorism
US role
PalesBnianView
IsraeliView
roadmap phase security ceasefire state plan
international step authority
end settlement implementation obligation
stop expansion commitment fulfill unit illegal present
previous assassination meet forward
process force terrorism unit provide confidence element
interim discussion union succee point build positive
recognize present timetable
Roadmap process
syria syrian negotiate lebanon deal conference concession
asad agreement regional october initiative relationship
track negotiation official leadership position
withdrawal time victory present second stand
circumstance represent sense talk strategy issue
participant parti negotiator
peace strategic plo hizballah islamic neighbor territorial radical iran relation think obviou countri mandate
greater conventional intifada affect jihad time
Arab Involvement
Bitterlemons dataset
![Page 175: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/175.jpg)
ClassificaBonClassification accuracy
![Page 176: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/176.jpg)
GeneralizaBon to New BlogsGeneralization to new blogs
![Page 177: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/177.jpg)
Geqng AlternaBve View
-‐ Given a document wriaen in one ideology, retrieve the equivalent-‐ Baseline: SVM + cosine similarity
144
Finding alternate views
![Page 178: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/178.jpg)
Can We use Unlabeled data?
• In theory this is simple•Add a step that samples the document view (v)•Doesn’t mix in pracBce because Bght coupling between v and (x1,x2,z)
•SoluBon•Sample v and (x1,x2,z) as a block using a Metropolis-‐HasBng step• This is a huge proposal!
Unlabeled data
![Page 179: graphical models for the Internet](https://reader031.vdocument.in/reader031/viewer/2022020217/54c25e524a7959ff3d8b45ce/html5/thumbnails/179.jpg)
Summary - Part 4
• Chinese Restaurant Process• Recurrent CRP• User modeling• Storylines• Ideology detection