Plans on “Latent Topic Model”
![Page 1: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/1.jpg)
Plans on “Latent Topic Model”
![Page 2: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/2.jpg)
High-Level Architecture
[Figure components: Users, Ads, User Encoding, User Clustering, Prediction (eCTR / FB Prediction)]
![Page 3: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/3.jpg)
Existing Pipeline
• Encoding
  – Auto-encoder for dimension reduction
  – Political affiliation clustering
  – Output: Hive table (user id + low-dim representation)
• eCTR prediction
  – Optional: user clustering stage
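The encoding stage compresses each user's high-dimensional features into a short code and emits (user id, code) rows. Below is a minimal sketch, assuming dense numeric user features and a single-hidden-layer *linear* auto-encoder trained by full-batch gradient descent in pure Python. The helper names (`train_linear_autoencoder`, `encode_users`, etc.) and hyperparameters are hypothetical, and writing to Hive is replaced by returning Python tuples.

```python
import random

def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def center(X):
    """Subtract the per-feature mean from every row."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def reconstruction_error(X, W_enc, W_dec):
    """Mean squared reconstruction error per user: (1/n) * sum ||x - x W_enc W_dec||^2."""
    Xhat = matmul(matmul(X, W_enc), W_dec)
    return sum((xh - x) ** 2 for rh, rx in zip(Xhat, X) for xh, x in zip(rh, rx)) / len(X)

def train_linear_autoencoder(X, k, lr=0.05, epochs=200, seed=0):
    """Fit x -> x W_enc (k dims) -> x W_enc W_dec by full-batch gradient descent on squared error."""
    rng = random.Random(seed)
    d = len(X[0])
    W_enc = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]   # d x k encoder
    W_dec = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(k)]   # k x d decoder
    scale = 2.0 * lr / len(X)
    for _ in range(epochs):
        H = matmul(X, W_enc)                                   # codes, n x k
        R = [[xh - x for xh, x in zip(rh, rx)] for rh, rx in zip(matmul(H, W_dec), X)]
        # Both gradients are computed from the current weights before either is updated.
        g_dec = matmul(transpose(H), R)                        # k x d
        g_enc = matmul(transpose(X), matmul(R, transpose(W_dec)))  # d x k
        W_dec = [[w - scale * g for w, g in zip(rw, rg)] for rw, rg in zip(W_dec, g_dec)]
        W_enc = [[w - scale * g for w, g in zip(rw, rg)] for rw, rg in zip(W_enc, g_enc)]
    return W_enc, W_dec

def encode_users(user_ids, X, k):
    """Return (user_id, k-dim code) rows, mirroring the 'user id + low-dim representation' table."""
    Xc = center(X)
    W_enc, _ = train_linear_autoencoder(Xc, k)
    codes = matmul(Xc, W_enc)
    return [(uid, code) for uid, code in zip(user_ids, codes)]
```

For example, `encode_users(user_ids, feature_rows, k=2)` yields one `(user_id, [c1, c2])` row per user. A linear auto-encoder with squared error recovers the same subspace as PCA; adding nonlinear hidden layers is the usual way to go beyond that.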
![Page 4: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/4.jpg)
Approaches to use encoding in eCTR prediction
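One simple way to use the encoding in eCTR prediction, assumed here rather than taken from the slide, is to append each user's low-dimensional code to the ad's features and fit a logistic-regression click model. This is a minimal sketch with stochastic gradient descent on the L2-regularized log loss; the feature layout and the names `train_ectr` and `predict_ectr` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def features(user_code, ad_feats):
    """Concatenate the user encoding, the ad features, and a bias term."""
    return list(user_code) + list(ad_feats) + [1.0]

def train_ectr(examples, lr=0.1, epochs=50, l2=1e-3, seed=0):
    """examples: list of (user_code, ad_feats, clicked) with clicked in {0, 1}."""
    rng = random.Random(seed)
    data = list(examples)
    dim = len(features(data[0][0], data[0][1]))
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(data)
        for user_code, ad_feats, clicked in data:
            x = features(user_code, ad_feats)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Per-example log-loss gradient is (p - y) * x, plus the L2 term l2 * w.
            w = [wi - lr * ((p - clicked) * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w

def predict_ectr(w, user_code, ad_feats):
    """Estimated click-through probability for one (user, ad) pair."""
    x = features(user_code, ad_feats)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
```

Because the encoding enters only as extra input columns, the same model can also take one-hot cluster ids from the optional user clustering stage in place of, or alongside, the raw codes.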
![Page 5: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/5.jpg)
Social Networks
![Page 6: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/6.jpg)
Information on a social network
• Social graph
  – Friendship networks
  – User-ads network …
• Text
  – News feed
  – Messages
  – Ads text …
• Images
  – Album
  – Random posts
  – Ads figures …
• Demographics
  – Age, occupation …
• Very high-dimensional
• Non-independent
• Insufficient training data (true even if we use the whole web)
• Hard to optimize and interpret
[Figure label: eCTR]
![Page 7: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/7.jpg)
Essentials of a good user-ads representation
• Distill all local attribute semantics
  – Social roles
  – Topical contents
  – Ideology/sentiment
• Capture relational information
  – Long-range indirect influence
  – Social environments and contexts
• Capture dynamic trends
  – e.g., changes in strength of interest
  – New/dying interests
• Discriminative
  – Optimize against a well-defined predictive task rather than vague intermediate goals such as clustering
• Low-dimensional and (perhaps) interpretable
![Page 8: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/8.jpg)
Example:
![Page 9: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/9.jpg)
Proposed Models
…
…
![Page 10: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/10.jpg)
Dynamic tomography
• How to model dynamics in a simplex?
Project an individual/stock in a network into a “tomographic” space
Trajectory of an individual/stock in the “tomographic” space
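A mixed-membership vector lives on the probability simplex, so it cannot follow a plain Gaussian random walk. One standard answer, assumed here (the logistic-normal construction used in state-space mixed-membership models, not necessarily the exact model on the slide), is to let an unconstrained latent vector follow a Gaussian random walk and map it onto the simplex with a softmax at each time step. `simulate_trajectory`, its `sigma` step size, and `dominant_role` are hypothetical names.

```python
import math
import random

def softmax(v):
    m = max(v)                          # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def simulate_trajectory(num_roles, steps, sigma=0.5, seed=0):
    """gamma_t = gamma_{t-1} + Gaussian noise in R^K; pi_t = softmax(gamma_t) lies on the simplex."""
    rng = random.Random(seed)
    gamma = [0.0] * num_roles           # start at the simplex center (uniform membership)
    trajectory = []
    for _ in range(steps):
        gamma = [g + rng.gauss(0.0, sigma) for g in gamma]
        trajectory.append(softmax(gamma))
    return trajectory

def dominant_role(pi):
    """Index of the role with the largest membership weight."""
    return max(range(len(pi)), key=lambda k: pi[k])
```

Usage: `for t, pi in enumerate(simulate_trajectory(4, 5), start=1): print(t, dominant_role(pi))` traces which role dominates at each step, which is the kind of reading applied to senators #28 and #75 below.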
![Page 11: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/11.jpg)
Senate Network: role trajectories
[Figure: role trajectories, with cluster legend]
Jon Corzine’s seat (#28, Democrat, New Jersey) was taken over by Bob Menendez from t=5 onwards.
Corzine was especially left-wing, so much so that his views did not align with the majority of Democrats (t=1 to 4).
Once Menendez took over, the latent space vector for senator #28 shifted towards role 4, corresponding to the main Democratic voting clique.
Ben Nelson (#75, Nebraska) is a right-wing Democrat whose views are more consistent with the Republican Party.
Observe that as the 109th Congress proceeds into 2006, Nelson’s latent space vector includes more of role 3, corresponding to the main Republican voting clique.
This coincides with Nelson’s re-election as Senator from Nebraska in late 2006, during which a high proportion of Republicans voted for him.
![Page 12: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/12.jpg)
Visualization
![Page 13: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/13.jpg)
Visualization
![Page 14: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/14.jpg)
Algorithm Details
![Page 15: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/15.jpg)
Data
![Page 16: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/16.jpg)
Learning System
Given: a network of users/documents
Perform the E-step (Gibbs sampling) in parallel; collect sufficient statistics
Perform the M-step in parallel
Repeat until convergence
[Figure: “Single Program”, with repeated parameter sets α, β, η, μ and latent variables z]
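The learning loop is an EM scheme with a sampling-based E-step. A minimal sketch, assuming an LDA-style text model (the exact model is not specified in these slides): documents are split into shards; each shard runs Gibbs sweeps over topic assignments z with the topic-word matrix `beta` held fixed; shards return topic-word counts as sufficient statistics, which are summed; the M-step re-estimates `beta` with Dirichlet smoothing `eta`. Here `alpha` is the document-topic prior and the slide's μ is not modeled. Threads stand in for cluster workers, and all function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def e_step_shard(docs, beta, alpha, sweeps, seed):
    """Gibbs-sample z for one shard with beta fixed (doc-topic proportions collapsed).
    Returns the shard's sufficient statistics: topic-word counts after the last sweep."""
    rng = random.Random(seed)
    K, V = len(beta), len(beta[0])
    counts = [[0] * V for _ in range(K)]
    for doc in docs:                        # doc: list of word ids in [0, V)
        z = [rng.randrange(K) for _ in doc]
        n_dk = [0] * K
        for k in z:
            n_dk[k] += 1
        for _ in range(sweeps):
            for i, w in enumerate(doc):
                n_dk[z[i]] -= 1             # take token i out of its current topic
                # p(z_i = k | rest) is proportional to (n_dk + alpha) * beta[k][w]
                weights = [(n_dk[k] + alpha) * beta[k][w] for k in range(K)]
                z[i] = rng.choices(range(K), weights=weights)[0]
                n_dk[z[i]] += 1
        for w, k in zip(doc, z):
            counts[k][w] += 1
    return counts

def m_step(counts, eta):
    """Smoothed estimate: beta[k][w] proportional to counts[k][w] + eta."""
    beta = []
    for row in counts:
        total = sum(row) + eta * len(row)
        beta.append([(c + eta) / total for c in row])
    return beta

def fit(docs, K, V, alpha=0.1, eta=0.01, num_shards=4, sweeps=2, max_iters=50, tol=1e-4, seed=0):
    rng = random.Random(seed)
    beta = m_step([[rng.random() for _ in range(V)] for _ in range(K)], eta)  # random start
    shards = [docs[s::num_shards] for s in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for it in range(max_iters):
            # E-step: every shard samples z independently given the same beta.
            futures = [pool.submit(e_step_shard, shard, beta, alpha, sweeps, seed + 1000 * it + s)
                       for s, shard in enumerate(shards)]
            # Reduce: sum the shards' sufficient statistics.
            counts = [[0] * V for _ in range(K)]
            for f in futures:
                for k, row in enumerate(f.result()):
                    for w, c in enumerate(row):
                        counts[k][w] += c
            new_beta = m_step(counts, eta)
            change = max(abs(a - b) for ra, rb in zip(new_beta, beta) for a, b in zip(ra, rb))
            beta = new_beta
            # Sampling noise keeps change above zero, so max_iters also bounds the loop.
            if change < tol:
                break
    return beta
```

In CPython, threads do not run this CPU-bound sampling truly in parallel; they only illustrate the shard (E-step), sum (sufficient statistics), update (M-step) structure. A real deployment would put each shard on a separate process or machine.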
![Page 17: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/17.jpg)
Project Plans and Milestones
• Scalable implementation of baseline user text model (M1)
• Discriminative M1
• M1 + network model (M2)
• M2 + history + time (M3)
• Parallel work on downstream utility
  – eCTR prediction
  – Visualization
  – User/ads clustering
![Page 18: Plans on “Latent Topic Model”](https://reader035.vdocument.in/reader035/viewer/2022062423/56814e07550346895dbb75af/html5/thumbnails/18.jpg)
Resources
• CMU:
  – First intern, Keisuke, will come in mid-Oct to implement M1
  – Second intern, Qirong Hu, will come in late Dec to implement M2 and M3
• FB:
  – Rajat Raina
  – Rong Yang
  – System support