Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

Post on 27-Jan-2015


Aim› Find trends in document collections (academic papers, patents, blog entries…)

Idea› Construct timestamp arrays as new observed data

Method› Modify latent Dirichlet allocation (LDA)

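Timestamp array for each document

[Figure: a document's word tokens (“test”, “group”, “group”, “group”, “effect”, “space”, “space”) shown alongside its timestamp array, which holds the document timestamp t and its neighbors t−1 and t+1]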
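A minimal sketch of how such an array might be built. The function name, the half-and-half split, and the clipping to a timestamp range are illustrative assumptions; the poster shows only that the array mixes t with t−1 and t+1, and that the array length is an input parameter.

```python
def build_timestamp_array(t, length, t_min, t_max):
    """Return `length` timestamps centred on the document timestamp t.

    Assumption (not from the source): about half of the slots hold t itself,
    the rest alternate between t-1 and t+1, clipped to [t_min, t_max].
    """
    n_center = (length + 1) // 2
    array = [t] * n_center
    for i in range(length - n_center):
        neighbor = t - 1 if i % 2 == 0 else t + 1
        array.append(min(max(neighbor, t_min), t_max))
    return array


example_array = build_timestamp_array(2005, 6, 2000, 2010)
print(example_array)
```

A longer array gives the timestamp data more weight relative to the word tokens of the same document.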

Modify LDA

› Draw a topic multinomial Multi(θd) from a Dirichlet prior

› For each word token

  Draw a topic z from Multi(θd)

  Draw a word from the multinomial Multi(φz)

› For each timestamp token

  Draw a topic z from Multi(θd)

  Draw a timestamp from the multinomial Multi(ψz)
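The generative steps above can be sketched with NumPy. This is an illustration under assumptions: the function name, the toy sizes K, W, S, and the prior values are hypothetical, and topic assignments are named z here to keep them apart from timestamps.

```python
import numpy as np


def generate_document(n_words, n_stamps, alpha, phi, psi, rng):
    """Sample word tokens and timestamp tokens for one document.

    phi: (K, W) topic-word probabilities.
    psi: (K, S) topic-timestamp probabilities.
    """
    K = phi.shape[0]
    # Per-document topic proportions, shared by words and timestamps.
    theta = rng.dirichlet(np.full(K, alpha))
    # One topic draw per word token, then a word from that topic.
    z_words = rng.choice(K, size=n_words, p=theta)
    words = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z_words])
    # One topic draw per timestamp token, then a timestamp from that topic.
    z_stamps = rng.choice(K, size=n_stamps, p=theta)
    stamps = np.array([rng.choice(psi.shape[1], p=psi[k]) for k in z_stamps])
    return words, stamps


rng = np.random.default_rng(0)
K, W, S = 3, 8, 5                               # toy sizes (assumed)
phi = rng.dirichlet(np.full(W, 0.5), size=K)    # word multinomials
psi = rng.dirichlet(np.full(S, 0.5), size=K)    # timestamp multinomials
words, stamps = generate_document(10, 4, alpha=0.5, phi=phi, psi=psi, rng=rng)
```

Words and timestamps share θd but have separate multinomials, so a topic is described by both a word distribution and a distribution over time.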

[Graphical model: α → θ → z; z → w ← φ ← β for word tokens; z → t ← ψ ← γ for timestamp tokens]

Different Dirichlet priors for word and timestamp multinomials

› Taking a Bayesian approach also for timestamps

› Not just introducing a new vocabulary

Topics over Time

› Modification of LDA (Beta distribution for continuous timestamps)

› O(NK) time, O(N) space (N: number of word tokens)

› Non-Bayesian term in updating formula for Gibbs sampling

Bag of Timestamps

› Modification of LDA (Dirichlet-multinomial for discrete timestamps)

› O((N+L)K) time, O(N+L) space (L: sum of timestamp array lengths)

› Additional input parameter for timestamp array lengths

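[Graphical model of Topics over Time: α → θ → z; z → w ← φ ← β for word tokens; z → t, with timestamps governed by Beta parameters ψ1, ψ2]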
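A sketch of collapsed Gibbs sampling for the Bag of Timestamps side of the comparison, assuming the standard LDA update applied to word and timestamp tokens alike, with each kind using its own count matrix and prior. The function name, data layout, and toy inputs are hypothetical, and the update form is the usual Dirichlet-multinomial one rather than a formula quoted from the poster.

```python
import numpy as np


def gibbs_bot(docs_w, docs_t, K, W, S, alpha, beta, gamma, n_iter, seed=0):
    """Collapsed Gibbs sampling over word and timestamp tokens.

    docs_w[d]: word ids in [0, W); docs_t[d]: timestamp ids in [0, S).
    Returns the doc-topic, topic-word and topic-timestamp count matrices.
    """
    rng = np.random.default_rng(seed)
    D = len(docs_w)
    n_dk = np.zeros((D, K))               # topic counts per doc (words + timestamps)
    n_kw, nw_k = np.zeros((K, W)), np.zeros(K)   # word counts and their topic totals
    n_ks, ns_k = np.zeros((K, S)), np.zeros(K)   # timestamp counts and topic totals
    # One record per token: (doc, item id, count matrix, topic totals, prior).
    tokens = [(d, w, n_kw, nw_k, beta) for d in range(D) for w in docs_w[d]]
    tokens += [(d, s, n_ks, ns_k, gamma) for d in range(D) for s in docs_t[d]]
    z = rng.integers(K, size=len(tokens))        # random initial topics
    for (d, v, n_kv, n_k, _), k in zip(tokens, z):
        n_dk[d, k] += 1
        n_kv[k, v] += 1
        n_k[k] += 1
    for _ in range(n_iter):
        for i, (d, v, n_kv, n_k, prior) in enumerate(tokens):
            k = z[i]
            # Remove the token's current assignment from all counts.
            n_dk[d, k] -= 1
            n_kv[k, v] -= 1
            n_k[k] -= 1
            # O(K) work per token: doc-topic term times topic-item term.
            V = n_kv.shape[1]
            p = (n_dk[d] + alpha) * (n_kv[:, v] + prior) / (n_k + V * prior)
            k = rng.choice(K, p=p / p.sum())
            z[i] = k
            n_dk[d, k] += 1
            n_kv[k, v] += 1
            n_k[k] += 1
    return n_dk, n_kw, n_ks


docs_w = [[0, 1, 1, 2], [3, 4, 4, 5]]   # toy word ids (N = 8)
docs_t = [[0, 0, 1], [2, 2, 1]]         # toy timestamp arrays (L = 6)
n_dk, n_kw, n_ks = gibbs_bot(docs_w, docs_t, K=2, W=6, S=3,
                             alpha=0.5, beta=0.1, gamma=0.1, n_iter=20)
```

Each sweep touches N + L tokens and does O(K) work per token, which matches the O((N+L)K) time listed above; the stored assignments and token records account for the O(N+L) space.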

Pros

› Bayesian also for timestamps

› Simple in updating computations

Cons

› No principled way to determine timestamp array lengths

› Weak for fine-grained timestamps

Determining timestamp array lengths› Controlling strength of timestamp data

Parallelization› OpenMP

› CUDA

› MPICH2
