Transcript
  • Decoupling Sparsity and Smoothness in the Discrete Hierarchical Dirichlet Process. Chong Wang and David M. Blei, NIPS 2009. Discussion led by Chunping Wang, ECE, Duke University, March 26, 2010.

  • Outline (1/16): Motivations; LDA and HDP-LDA; Sparse Topic Models; Inference Using Collapsed Gibbs Sampling; Experiments; Conclusions.

  • Motivations (2/16): Topic modeling with the bag-of-words assumption; an extension of the HDP-LDA model. In LDA and HDP-LDA, the topics are drawn from an exchangeable Dirichlet distribution with a single scale parameter. As this parameter approaches zero, topics become sparse (most probability mass on only a few terms) but less smooth (empirical counts dominate). Goal: decouple sparsity and smoothness so that both properties can be achieved at the same time. How: introduce a Bernoulli variable for each term and each topic.
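    A minimal numpy sketch (an illustration under assumed values, not from the paper) of why a single Dirichlet scale parameter ties sparsity and smoothness together: small values concentrate mass on a few terms but provide little smoothing, while large values smooth but spread mass over the whole vocabulary.

        import numpy as np

        rng = np.random.default_rng(0)
        V = 20  # illustrative vocabulary size

        for scale in (0.01, 1.0):
            topic = rng.dirichlet(np.full(V, scale))
            # small scale: most mass on a few terms (sparse), little smoothing;
            # large scale: smooth, but mass spread over all V terms
            print(scale, np.sort(topic)[::-1][:5].round(3))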

  • LDA and HDP-LDA (3/16): graphical models for LDA and HDP-LDA, with the topic, document, and word variables labeled. HDP-LDA is the nonparametric form of LDA, with the number of topics unbounded; the figure also labels the base measure and the topic weights.

  • Sparse Topic Models (4/16): The size of the vocabulary is V. In LDA/HDP-LDA each topic is defined on a (V-1)-simplex; in the sparse TM each topic is defined on a sub-simplex specified by a V-length binary vector composed of V Bernoulli variables, with one selection proportion for each topic. Sparsity: the pattern of ones in the binary vector, controlled by the selection proportion. Smoothness: enforced over the terms with non-zero selectors through the Dirichlet smoothing parameter. Decoupled!
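    A hedged sketch of this construction (variable names such as pi_k and gamma_smooth are assumptions, not the paper's notation): a Bernoulli selector vector picks the sub-simplex, and a Dirichlet over only the selected terms provides the smoothing.

        import numpy as np

        rng = np.random.default_rng(1)
        V = 20
        pi_k = 0.2          # selection proportion for this topic (assumed value)
        gamma_smooth = 1.0  # Dirichlet smoothing over selected terms (assumed value)

        b = rng.binomial(1, pi_k, size=V)   # V Bernoulli selector variables
        if b.sum() == 0:                    # guard so the sub-simplex is non-empty
            b[rng.integers(V)] = 1
        sel = np.flatnonzero(b)
        beta = np.zeros(V)
        beta[sel] = rng.dirichlet(np.full(sel.size, gamma_smooth))
        # sparsity: the pattern of ones in b, controlled by pi_k
        # smoothness: gamma_smooth, applied only to the selected terms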

  • Sparse Topic Models (5/16)

  • Inference Using Collapsed Gibbs Sampling (6/16): As in HDP-LDA, the topic proportions and topic distributions are integrated out, and the direct-assignment method based on the Chinese restaurant franchise (CRF) is used for the topic assignments together with an augmented variable, the table counts.

  • Inference Using Collapsed Gibbs Sampling (7/16). Notation: n_dk: # of customers (words) in restaurant d (document) eating dish k (topic); m_dk: # of tables in restaurant d serving dish k; marginal counts are represented with dots; K, u: current # of topics and the new-topic index, respectively; n_kv: # of times that term v has been assigned to topic k; n_k.: # of times that all terms have been assigned to topic k; f_k(w_dn): conditional density of w_dn under topic k given all data except w_dn.
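    A small sketch of the count bookkeeping behind this notation (the array layout and names are assumptions for illustration):

        import numpy as np

        D, K, V = 3, 2, 5                   # documents, current topics, vocabulary
        n_dk = np.zeros((D, K), dtype=int)  # words in document d assigned to topic k
        m_dk = np.zeros((D, K), dtype=int)  # tables in restaurant d serving dish k
        n_kv = np.zeros((K, V), dtype=int)  # times term v is assigned to topic k

        d, v, k = 0, 4, 1                   # assign one word of type v in doc d to topic k
        n_dk[d, k] += 1
        n_kv[k, v] += 1
        n_k = n_kv.sum(axis=1)              # n_k. : marginal count over all terms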

  • Inference Using Collapsed Gibbs Sampling (8/16): Recall the direct-assignment sampling method for HDP-LDA. Sampling topic assignments:

    If a new topic is sampled, then sample a stick-breaking weight b ~ Beta(1, gamma), where gamma is the top-level concentration parameter, and let beta_{K+1} = b * beta_u and beta_u <- (1 - b) * beta_u. Also: sampling the stick lengths beta; sampling the table counts m.
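    A hedged numpy sketch of this stick-breaking step when a new topic is instantiated (gamma_conc stands in for the top-level concentration parameter; names are assumed):

        import numpy as np

        rng = np.random.default_rng(2)
        gamma_conc = 1.0
        beta = np.array([0.5, 0.3, 0.2])   # [beta_1, ..., beta_K, beta_u]

        b = rng.beta(1.0, gamma_conc)      # split the remaining stick
        beta_new = b * beta[-1]            # weight of the newly created topic
        beta_u = (1.0 - b) * beta[-1]      # weight left for future new topics
        beta = np.concatenate([beta[:-1], [beta_new, beta_u]])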

    For HDP-LDA this step is straightforward; for the sparse TM, the authors instead integrate out the selectors b for faster convergence. Since there are 2^V possible values of b in total, this is the central computational challenge for the sparse TM.
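    For reference, a hedged sketch of the "straightforward" HDP-LDA draw in direct-assignment form (the sparse TM replaces the per-term probability f_k with the expression obtained after integrating out b, which is not reproduced here; all names are assumptions):

        import numpy as np

        def sample_topic(v, d, n_dk, n_kv, beta, alpha, gamma_smooth, rng):
            """Sample z_dn for a word of type v in document d; counts exclude this word."""
            K, V = n_kv.shape
            # smoothed per-term probability under each existing topic
            f_k = (n_kv[:, v] + gamma_smooth) / (n_kv.sum(axis=1) + V * gamma_smooth)
            p = np.empty(K + 1)
            p[:K] = (n_dk[d] + alpha * beta[:K]) * f_k   # existing topics
            p[K] = alpha * beta[K] * (1.0 / V)           # new topic: uniform term probability
            p /= p.sum()
            return rng.choice(K + 1, p=p)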

  • Inference Using Collapsed Gibbs Sampling (9/16): After integrating out b, the conditional probability of a topic assignment is expressed in terms of the vocabulary and the set of terms that currently have word assignments in topic k. This conditional probability depends on the selector proportions.

  • Inference Using Collapsed Gibbs Sampling (10/16)


  • Inference Using Collapsed Gibbs Sampling (11/16): Sampling the Bernoulli (selector) proportion for each topic, using the binary selector vector b as an auxiliary variable.

    Sampling hyper-parameters: the concentration parameters have Gamma(1,1) priors; the Dirichlet smoothing parameter is updated by Metropolis-Hastings with a symmetric Gaussian proposal (see the sketch below). Estimating topic distributions from any single sample of z and b: define the set of terms whose selector is on in b; sample b conditioned on the topic assignments (sparsity), and sample the topic distribution conditioned on b (smoothness on the selected terms).
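    A hedged sketch of one such Metropolis-Hastings update for a positive hyperparameter with a Gamma(1,1) prior and a symmetric Gaussian random-walk proposal (log_lik is a stand-in for the model's log-likelihood term; names are assumed):

        import numpy as np

        def mh_step(x, log_lik, rng, step=0.1):
            log_prior = lambda t: -t          # Gamma(1,1) log-density, up to a constant
            prop = x + rng.normal(0.0, step)  # symmetric Gaussian proposal
            if prop <= 0:                     # outside the support: reject
                return x
            log_accept = log_lik(prop) + log_prior(prop) - log_lik(x) - log_prior(x)
            return prop if np.log(rng.uniform()) < log_accept else x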

  • Experiments (12/16). Four datasets:
    arXiv: online research abstracts, D = 2500, V = 2873
    Nematode Biology: research abstracts, D = 2500, V = 2944
    NIPS: NIPS articles from 1988-1999, V = 5005; 20% of the words of each paper are used
    Conf. abstracts: abstracts from CIKM, ICML, KDD, NIPS, SIGIR, and WWW, 2005-2008, V = 3733
    Two predictive quantities: held-out perplexity and the topic complexity (a sketch of the perplexity computation follows below).
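    A hedged sketch of the usual held-out perplexity computation, exp(-sum_d log p(w_d) / sum_d N_d), with estimated topic proportions theta and topics beta as assumed inputs:

        import numpy as np

        def perplexity(test_docs, theta, beta):
            """test_docs: list of arrays of word ids; theta: D x K; beta: K x V."""
            log_lik, n_words = 0.0, 0
            for d, words in enumerate(test_docs):
                p_w = theta[d] @ beta          # per-term probabilities for document d
                log_lik += np.log(p_w[words]).sum()
                n_words += len(words)
            return np.exp(-log_lik / n_words)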

  • Experiments (13/16): better perplexity with simpler models; a larger smoothing parameter gives smoother topics; fewer topics, with a similar # of terms.

  • Experiments (14/16)

  • Experiments (15/16): Infrequent words populate noise topics.
  • Conclusions (16/16): A new topic model in the HDP-LDA framework, based on the bag-of-words assumption. Main contributions: decoupling the control of sparsity and smoothness by introducing binary selectors for term assignments in each topic; developing a collapsed Gibbs sampler in the HDP-LDA framework. Held-out performance is better than that of HDP-LDA.

