MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz Presented by: Qiaozhu Mei, UIUC 2008.08.25 1 University of Illinois at Urbana-Champaign

TRANSCRIPT

  • Slide 1
  • MINING MULTI-FACETED OVERVIEWS OF ARBITRARY TOPICS IN A TEXT COLLECTION
      • Authors: Xu Ling, Qiaozhu Mei, ChengXiang Zhai, Bruce Schatz
      • Presented by: Qiaozhu Mei, UIUC
      • Date: 2008.08.25
      • University of Illinois at Urbana-Champaign
  • Slide 2
  • Motivation
      • The common task: mining and extracting information from a text collection with ad hoc information needs
      • Structured, faceted summarization
      • Clustering search results
      • Integrating expert/customer reviews
      • Semi-structured summarization of scientific literature
      • Etc.
  • Slide 3
  • Multifaceted Text Overview
      • Even if relevant information is found, there is too much of it:
          • 10^3 research papers
          • 10^4 customer reviews
          • 10^5 web search results
      • [Figure: a multifaceted overview. Facets: Facet 1: Price; Facet 2: Design; Facet 3: Driving experience. The figure shows sentences (Sentence 1, Sentence 2, ..., Sentence k) and a word distribution (price 0.4, finance 0.3, cheap 0.05, interest 0.05).]
  • Slide 4
  • Multi-Faceted Overview Mining
      • Unsupervised: a topic clustering problem
          • Limitation: topics do not necessarily reflect users' preferences
          • Limitation: summarizing a topic cluster is still challenging
      • Supervised: a categorization problem with training examples
          • Limitation: predefined facets may not fit the needs of a particular user
          • Limitation: only works for a predefined domain and topics
          • Limitation: training examples for each facet are often unavailable
      • What is missing here? User interactions
  • Slide 5
  • More Realistic New Setup
      • Allow a user to flexibly describe each facet with keywords (1-2)
      • Let the user determine what they want
      • Mine a multi-faceted overview in a semi-supervised way
      • No need for training examples
      • Technical challenge: how to cast it as a semi-supervised learning problem
  • Slide 6
  • Example (1): Consumer vs. Editor
      • Topic: Honda Accord 2006
      • Columns: Facets; Generated Overview (10k customer reviews); Editor's Review (1)
      • Facet: Body Styles, Exterior Design
          • Generated overview: "Like the minor exterior styling changes from 2005 to 2006. Tried the Camry XLE first, nice ride, but lacked a few features i wanted, like dual zone A/C, and didn't like the wood trim...."
          • Editor's review: "Available trim levels include... The VP provides air conditioning, power windows..."
      • Facet: Powertrains
      • Facet: Safety
      • Facet: Interior Design
          • Generated overview: "The interior is beautiful - I got all of the features and the navigation is extremely easy to use. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space"
          • Editor's review: "The seating arrangements are top-notch, and the interior design and materials quality continue the high-caliber standards... The car's backseat is among the roomiest in the segment..."
      • Facet: Driving Impressions
  • Slide 7
  • Example (2): Different Facets
      • What if the users want an overview with different facets?
      • Columns: Facets; User Input; Generated Overview
      • Facet: Design (user input: design, style)
          • Generated overview: "Like the minor exterior styling changes from 2005 to 2006. Accord's interior is top notch, nice design, clear gauges, comfy seats, lots of storage space"
      • Facet: Engine (user input: engine, fuel)
      • Facet: Finance (user input: finance, price)
          • Generated overview: "When I bought it I was amazed at the trim level for the price. It is extremely fun to drive, fit and finish is fantastic, the oversteer could easily be corrected, at the price, it has no peer and is 10k less then a comparable BMW"
      • Facet: Safety (user input: safety)
      • Facet: Driving (user input: comfort, fun)
  • Slide 8
  • Approach
      • Two-stage framework, using probabilistic topic models
      • Model each facet with a language model (word distribution)
      • Facet model initialization: a bootstrapping method expands the original facet keywords with additional correlated words from the document collection
      • Facet model estimation: guide a generative topic model with user-defined facets
          • Propose probabilistic mixture models to estimate the word distribution of every facet
          • Meanwhile, constrain each facet model to stay close to the user specification
      • Generate the overview: apply the estimated facet models to categorize the sentences into a semi-structured overview
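The last step above, applying facet models to sort sentences into facets, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the smoothing weight `lam`, the whitespace tokenizer, and the toy "design" model are all assumptions. The "price" word probabilities are the ones shown on Slide 3.

```python
import math
from collections import Counter

def sentence_score(sentence, facet_model, background, lam=0.5):
    """Average per-word log-likelihood under a facet model mixed with a background model."""
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    total = 0.0
    for w in words:
        # Mixing with the background keeps every probability strictly positive.
        p = (1 - lam) * facet_model.get(w, 0.0) + lam * background.get(w, 1e-6)
        total += math.log(p)
    return total / len(words)

def build_overview(sentences, facet_models, background, k=2):
    """Assign each sentence to its best-scoring facet and keep the top-k per facet."""
    buckets = {name: [] for name in facet_models}
    for s in sentences:
        scores = {name: sentence_score(s, model, background)
                  for name, model in facet_models.items()}
        best = max(scores, key=scores.get)
        buckets[best].append((scores[best], s))
    return {name: [s for _, s in sorted(items, reverse=True)[:k]]
            for name, items in buckets.items()}

# Toy data (hypothetical sentences; "design" probabilities are made up).
sentences = ["nice design and style", "great price and cheap finance"]
counts = Counter(w for s in sentences for w in s.lower().split())
total_words = sum(counts.values())
background = {w: c / total_words for w, c in counts.items()}
facet_models = {
    "price": {"price": 0.4, "finance": 0.3, "cheap": 0.05, "interest": 0.05},
    "design": {"design": 0.5, "style": 0.5},
}
overview = build_overview(sentences, facet_models, background)
```

Scoring by average log-likelihood rather than the raw sum is one simple way to avoid penalizing longer sentences; other normalizations would work as well.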
  • Slide 9
  • Bootstrapped facet model initialization
      • [Figure: words from the collection (design, feature, fun, drive, comfortable, price, horsepower, smooth, performance, fuel, safety, reliability, exterior, roof, seat, cheap, engine); an initial facet model built from the keywords (performance 0.5, fuel 0.5); and the expanded facet model (performance 0.4, fuel 0.3, horsepower 0.05, engine 0.03, smooth 0.03).]
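One simple way to expand seed keywords with correlated words is sentence-level co-occurrence counting, sketched below. The slide does not say which correlation measure is used, so the co-occurrence heuristic, the function name `bootstrap_facet_model`, the mixing weight `alpha`, the stopword list, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "and", "is", "are", "has", "a", "of"}

def bootstrap_facet_model(seeds, sentences, alpha=0.7, top_n=5):
    """Spread mass alpha uniformly over the seed keywords and the remaining
    1 - alpha over the top_n words that co-occur with a seed in a sentence."""
    seed_list = sorted({s.lower() for s in seeds})
    seed_set = set(seed_list)
    cooc = Counter()
    for sent in sentences:
        words = set(sent.lower().split())
        if words & seed_set:
            cooc.update(words - seed_set - STOPWORDS)
    top = cooc.most_common(top_n)
    total = sum(c for _, c in top)
    if total == 0:
        # No correlated words found: fall back to the uniform seed model.
        return {s: 1.0 / len(seed_list) for s in seed_list}
    model = {s: alpha / len(seed_list) for s in seed_list}
    for w, c in top:
        model[w] = (1 - alpha) * c / total
    return model

sentences = [
    "the engine has great horsepower and fuel economy",
    "performance and horsepower are strong",
    "the seat is comfortable",
]
model = bootstrap_facet_model(["performance", "fuel"], sentences)
```

With these toy sentences, "horsepower" co-occurs with both seeds and so receives more mass than "engine", while "seat" never co-occurs with a seed and stays out of the model.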
  • Slide 10
  • Semi-supervised facet model estimation
      • Guide facet model estimation with Dirichlet priors
      • The Dirichlet prior can be interpreted as pseudo word counts
      • [Figure label: initialized distribution]
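The pseudo-count reading of a Dirichlet prior can be made concrete with a small sketch. Under the common pseudo-count convention, each word receives `mu * prior[w]` extra counts before normalization. The name `map_update`, the prior strength `mu`, and the example counts are assumptions; in a mixture model the `counts` argument would be the expected counts from an E-step.

```python
def map_update(counts, prior, mu):
    """Re-estimate a word distribution from (expected) counts plus
    mu * prior[w] pseudo-counts contributed by the prior."""
    vocab = set(counts) | set(prior)
    denom = sum(counts.values()) + mu * sum(prior.values())
    return {w: (counts.get(w, 0.0) + mu * prior.get(w, 0.0)) / denom
            for w in vocab}

# Prior taken from the initial facet model on Slide 9; counts are made up.
prior = {"performance": 0.5, "fuel": 0.5}
counts = {"fuel": 3.0, "seat": 1.0}
theta = map_update(counts, prior, mu=4.0)
```

A larger `mu` keeps the estimate closer to the initialized distribution; `mu = 0` reduces to the plain maximum-likelihood estimate.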
  • Slide 11
  • Semi-supervised facet model estimation
      • Guide facet model estimation with regularization
      • [Formula annotations: one term is the log likelihood of the text collection; one term propagates the constraint through the entire collection according to document similarities; one term constrains the estimated facet models to be close to the initial facet models]
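The transcript keeps only the annotations of the objective, not the formula itself. Purely as an illustrative assumption, and not the authors' formula, one generic way to write an objective with these three parts is shown below. Every symbol is hypothetical: $\mathcal{L}(C \mid \Theta)$ is the log likelihood of collection $C$, $w(d, d')$ is a document similarity, $p(\theta_j \mid d)$ is the weight of facet $j$ in document $d$, $\theta_j^{(0)}$ is the initial facet model, $D$ is a divergence between word distributions, and $\lambda_1, \lambda_2 \ge 0$ are trade-off weights.

```latex
O(\Theta) \;=\; \mathcal{L}(C \mid \Theta)
  \;-\; \lambda_1 \sum_{d, d' \in C} w(d, d') \sum_{j=1}^{k}
        \bigl( p(\theta_j \mid d) - p(\theta_j \mid d') \bigr)^2
  \;-\; \lambda_2 \sum_{j=1}^{k} D\bigl( \theta_j^{(0)} \,\big\|\, \theta_j \bigr)
```

In this sketch the second term makes similar documents prefer similar facet mixtures, and the third term penalizes facet models that drift from their initialization.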
  • Slide 12
  • Experimental Results
      • Tasks: the gene summarization task in biomedical literature; the car review mining task for online customer reviews
      • Our proposed system, especially the regularized topic model, is quite effective in mining multi-faceted overviews
      • Metrics named on the slide: ROUGE-1 average R scores; Precision@5
      • First table:
          Facets  Prior  Reg    MQR
          SI      0.44   0.45   0.47
          GI      0.51   0.47   0.41
          GP      0.20   0.22   0.20
          EL      0.22   0.25   0.18
          MP      0.25          0.20
          WFPI    0.09   0.19   0.15
          Avg.    0.29   0.31   0.27
      • Second table:
          Facets  Prior  Reg    MQR
          BS      0.193  0.200  0.174
          PP      0.273  0.278  0.207
          SF      0.235  0.243  0.208
          IF      0.309  0.324  0.294
          DI      0.316  0.319  0.264
          Avg.    0.265  0.273  0.229
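The two metrics named on the slide can be computed roughly as follows. This is a simplified sketch under stated assumptions: whitespace tokenization, lowercasing, and a single reference text. Standard ROUGE tooling adds options such as stemming and multiple references that are not modeled here. The function names and example inputs are hypothetical.

```python
from collections import Counter

def precision_at_k(ranked_relevance, k=5):
    """Fraction of the top-k ranked items that are judged relevant."""
    return sum(1 for r in ranked_relevance[:k] if r) / k

def rouge1_recall(candidate, reference):
    """Unigram recall: clipped word overlap divided by reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[w]) for w, count in ref.items())
    return overlap / sum(ref.values())

p5 = precision_at_k([True, False, True, True, False, True])
r1 = rouge1_recall(
    "the interior design is top notch",
    "the interior design and materials are top notch",
)
```

In the example, three of the first five ranked sentences are relevant, and five of the eight reference words appear in the candidate.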
  • Slide 13
  • Please stop by our poster on Tuesday