
Semantic History Embedding in Online Generative Topic Models

Pu Wang (presenter)

Authors: Loulwah AlSumait (lalsumai@gmu.edu), Daniel Barbará (dbarbara@gmu.edu), Carlotta Domeniconi (carlotta@cs.gmu.edu)

Department of Computer Science, George Mason University

SDM 2009

Outline
• Introduction and related work
• Online LDA (OLDA)
• Parameter generation
  – Sliding history window
  – Contribution weights
• Experiments
• Conclusion and future work

Introduction
• When a topic is observed at a certain time, it is more likely to appear in the future
• Previously discovered topics hold important information about the underlying structure of the data
• Incorporating such information in future knowledge discovery can enhance the inferred topics

Related Work
• Q. Sun, R. Li et al., ACL 2008: an LDA-based Fisher kernel to measure the semantic similarity between blocks of text under an LDA model
• X. Wang et al., ICDM 2007: a Topical N-Gram model that automatically identifies feasible N-grams based on the context that surrounds them
• X. Phan et al., IW3C2 2008: a classifier built on a small set of labeled documents together with an LDA topic model estimated from Wikipedia

Tracking Topics

[Figure: two plate diagrams of the model, one for stream S^t (M^t documents, N_d words each, K topics, topic assignments z_i^t and tokens w_i^t with their stream-specific parameters) and one for stream S^{t+1}; the time between t and t+1 is ε. Three stages connect the streams: topic evolution tracking, priors construction, and emerging topic detection, which outputs an emerging topic list at t+1.]
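As an illustration of the loop in the figure, here is a minimal sketch; the helper names are hypothetical stand-ins for the paper's components, not the authors' code.

```python
# Sketch of the OLDA tracking loop. Hypothetical helpers (not from the paper):
#   fit_lda_gibbs(docs, K, alpha, beta) -> K x V topic-word matrix phi
#   construct_priors(history, omega)    -> semantic priors for the next stream
#   detect_emerging(phi, beta)          -> topics poorly explained by their priors
from collections import deque

def olda_track(streams, K, V, alpha, b0, delta, omega):
    history = deque(maxlen=delta)                  # sliding window of past models
    beta = [[b0] * V for _ in range(K)]            # symmetric prior for the first stream
    for docs in streams:                           # one document stream per epoch
        phi = fit_lda_gibbs(docs, K, alpha, beta)  # infer topics for this stream
        emerging = detect_emerging(phi, beta)      # emerging topic detection
        history.append(phi)                        # topic evolution tracking
        beta = construct_priors(history, omega)    # priors construction for t+1
        yield phi, emerging
```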

Online LDA (OLDA)

Inference process: collapsed Gibbs sampling over the current stream,

$$P(z_i^t = j \mid \mathbf{z}_{-i}^t, \mathbf{w}^t, \alpha^t, \beta^t) \propto \frac{C^{VK}_{w_i j, -i} + \beta^t_{w_i j}}{\sum_{v=1}^{V} \left( C^{VK}_{v j, -i} + \beta^t_{v j} \right)} \cdot \frac{C^{DK}_{d_i j, -i} + \alpha^t_j}{\sum_{k=1}^{K} \left( C^{DK}_{d_i k, -i} + \alpha^t_k \right)}$$

where the count matrices C^{VK} (word-topic) and C^{DK} (document-topic) are computed from the current stream, and the priors β^t and α^t carry the historic observations.
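A small numpy sketch of this full conditional for a single token; the variable names are illustrative, and this is not the toolbox's code.

```python
import numpy as np

def sample_topic(v, d, C_vk, C_dk, beta_t, alpha_t, rng):
    """Draw z_i for word v in document d; counts already exclude token i.

    C_vk: V x K word-topic counts from the current stream
    C_dk: D x K document-topic counts from the current stream
    beta_t: V x K semantic priors carrying the historic observations
    alpha_t: length-K document-topic prior
    """
    word_term = (C_vk[v] + beta_t[v]) / (C_vk.sum(axis=0) + beta_t.sum(axis=0))
    doc_term = (C_dk[d] + alpha_t) / (C_dk[d].sum() + alpha_t.sum())
    p = word_term * doc_term
    return rng.choice(len(p), p=p / p.sum())   # sample j with prob. proportional to p_j
```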

Parameter Generation
• Inference remains a simple problem, handled by Gibbs sampling
• In the sampling equation, the counts come from the current stream while the priors embed the historic observations

Topic Evolution Tracking
• Topic alignment over time
• Handles changes in lexicon and topic drift

Aligned topics over time, with P(topic) and P(word|topic):

  Time t:
    Topic 1 (0.65): Bank (0.44), money (0.35), loan (0.21)
    Topic 2 (0.35): Factory (0.53), production (0.34), labor (0.13)
  Time t+1:
    Topic 1 (0.43): Bank (0.5), credit (0.32), money (0.18)
    Topic 2 (0.57): Factory (0.48), cost (0.32), manufacturing (0.2)

Sliding History Window
• Consider all topic-word distributions within a "sliding history window" of size δ
• Alternatives for keeping track of history at time t:
  – Full memory: δ = t
  – Short memory: δ = 1
  – Intermediate memory: δ = c

[Figure: the evolution matrix of a topic, with the dictionary as rows and the topic's distribution over time as columns]
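In code, the window is naturally a fixed-length queue; a minimal sketch follows, using a dummy K x V matrix in place of the model inferred at each stream.

```python
from collections import deque
import numpy as np

delta = 3                          # intermediate memory (delta = c)
window = deque(maxlen=delta)       # oldest model drops out automatically
# full memory:  deque() with no maxlen (delta grows with t)
# short memory: deque(maxlen=1)    (delta = 1)

for t in range(5):                 # stand-in for the stream loop
    phi_t = np.random.dirichlet(np.ones(4), size=2)  # dummy K x V model at time t
    window.append(phi_t)           # keep only the last delta models
print(len(window))                 # 3: only the three most recent models remain
```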

Contribution Control
• Evolution tuning parameters ω
• Individual weights of the models:
  – Decaying history: ω_1 < ω_2 < … < ω_δ
  – Equal contributions: ω_1 = ω_2 = … = ω_δ
• Total weight of history (vs. weight of new observations):
  – Balanced weights (sum = 1)
  – Biased toward the past (sum > 1)
  – Biased toward the future (sum < 1)
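A sketch of how the two schemes and the total-weight bias might be generated; the linear decay shape is an assumption, since the slides only require ω_1 < … < ω_δ.

```python
import numpy as np

def history_weights(delta, scheme="equal", total=1.0):
    """Contribution vector omega for a window of size delta.

    scheme: "equal" (omega_1 = ... = omega_delta) or "decaying"
            (omega_1 < ... < omega_delta; older models weigh less).
    total:  sum of the weights; 1 = balanced, >1 biased toward the past,
            <1 biased toward the new observations.
    """
    w = np.ones(delta) if scheme == "equal" else np.arange(1.0, delta + 1)
    return total * w / w.sum()

print(history_weights(3, "decaying", total=0.5))  # -> approx [0.083 0.167 0.25]
```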

Parameter Generation
• Priors of the topic distribution over words at time t+1:

$$\beta_k^{t+1} = B_k^t \, \omega$$

where $B_k^t = \left[ \phi_k^{t-\delta+1}, \ldots, \phi_k^{t-1}, \phi_k^{t} \right]$ is the evolution matrix of topic k over the history window and ω is the vector of contribution weights.

• Generate the topic distribution:

$$\phi_k^{t+1} \sim \mathrm{Dirichlet}\left( \beta_k^{t+1} \right)$$
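A numpy sketch of this construction; the small base smoothing term b0 is an assumption added so the Dirichlet parameters stay positive, and the example values are illustrative.

```python
import numpy as np

def semantic_beta(evolution, omega, b0=0.01):
    """Compute beta_k^{t+1} = B_k^t omega for all topics at once.

    evolution: K x V x delta array; evolution[k] is B_k^t, whose columns are
               phi_k^{t-delta+1}, ..., phi_k^t
    omega:     length-delta contribution weights
    b0:        small base smoothing (assumption; keeps Dirichlet parameters > 0)
    """
    return np.einsum("kvd,d->kv", evolution, omega) + b0

rng = np.random.default_rng(0)
K, V, delta = 2, 5, 3
evolution = rng.dirichlet(np.ones(V), size=(K, delta)).transpose(0, 2, 1)  # K x V x delta
beta_next = semantic_beta(evolution, np.array([0.2, 0.3, 0.5]))
phi_next = np.array([rng.dirichlet(beta_next[k]) for k in range(K)])  # generate phi^{t+1}
```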

Experimental Design
• "Matlab Topic Modeling Toolbox", by Mark Steyvers and Tom Griffiths
• Datasets:
  – NIPS proceedings from 1988-2000: 1,740 papers, 13,649 unique words, 2,301,375 word tokens; 13 streams of 90 to 250 documents each
  – Reuters-21578: news from 26-FEB-1987 to 19-OCT-1987: 10,337 documents, 12,112 unique words, 793,936 word tokens; 30 streams (29 of 340 documents, 1 of 517)
• Baselines:
  – OLDA_fixed: no memory
  – OLDA (ω(1)): short memory
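For concreteness, a trivial sketch of how documents might be grouped into yearly streams; the docs/years inputs are hypothetical, and the Reuters split by date works the same way.

```python
from collections import defaultdict

def make_streams(docs, years):
    """Group documents into one stream per year (e.g., NIPS 1988-2000 -> 13 streams)."""
    by_year = defaultdict(list)
    for doc, year in zip(docs, years):
        by_year[year].append(doc)
    return [by_year[y] for y in sorted(by_year)]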

Performance
• Evaluation measure: perplexity
• Test set: the documents of the next year (NIPS) or next stream (Reuters)
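A standard perplexity computation, sketched under the assumption that document-topic proportions θ for the test documents have already been estimated (e.g., by folding in); this is not the toolbox's exact routine.

```python
import numpy as np

def perplexity(test_docs, phi, theta):
    """exp(-log-likelihood per token) of held-out documents.

    test_docs: list of word-id lists
    phi:       K x V topic-word probabilities inferred at time t
    theta:     D x K document-topic proportions for the test documents
    """
    log_lik, n = 0.0, 0
    for d, doc in enumerate(test_docs):
        for w in doc:
            log_lik += np.log(theta[d] @ phi[:, w])  # p(w|d) = sum_k theta_dk phi_kw
            n += 1
    return np.exp(-log_lik / n)
```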

Reuters: OLDA with Fixed β vs. OLDA with Semantic β

[Plot: perplexity per stream for the no-memory baseline (fixed β) against OLDA with semantic β]

Reuters: OLDA with Different Window Sizes and Weights
• Increasing the window size enhanced prediction
• Incremental history information (δ > 1, sum > 1) did not improve topic estimation at all

[Plot: perplexity per stream for short memory, equal contribution, and incremental history information, with increasing window size]

NIPS: OLDA with Different Window Sizes

[Plot: perplexity per stream for the no-memory and short-memory baselines against larger window sizes]

• Increasing the window size enhanced prediction w.r.t. short memory
• A window size greater than 3 enhanced prediction
• The effect of the total weight is examined next

NIPS: OLDA with Different Total Weights

[Plot: perplexity per stream for no memory, sum of weights = 1, and decreasing sums of weights]

• Models with a lower total weight resulted in better prediction

NIPS & Reuters: OLDA with Different Total Weights
• Variable sum(ω), δ = 2

[Plot: perplexity per stream as the total sum of weights is decreased or increased]

NIPS: OLDA with Equal vs. Decaying History Contributions

Conclusions
• Studied the effect of embedding semantic information in LDA topic modeling of text streams
• Parameter generation is based on topical structures inferred in the past
• Semantic embedding enhances OLDA prediction
• Examined the effects of the total influence of history, the history window size, and equal vs. decaying contributions

Future Work
• Use of prior knowledge
• Effect of embedded historic semantics on detecting emerging and/or periodic topics
