turning text into insights: an introduction to topic models


Upload: datascience

Post on 18-Feb-2017


Page 1: Turning Text Into Insights: An Introduction to Topic Models

Turning Text into Insight: An Introduction to Topic Modeling


Page 3: Turning Text Into Insights: An Introduction to Topic Models

Handling Raw, Unlabeled Text

- Common Datasets:
  - Product / Customer Reviews
  - Call Center Transcripts
  - Newspaper Articles
  - Legal Documents

- Common Tasks:
  - Find documents we're interested in?
  - Categorize documents?
  - Retrieve information?

- The Challenge:
  - Standard quantitative methods don't work directly on raw text.
  - Datasets are large, complicated, sparse, and unwieldy.
  - Data is often unlabeled.

Page 4: Turning Text Into Insights: An Introduction to Topic Models

Example: Understanding Customer Reviews

- Mon Ami Gabi is a restaurant in the Paris Las Vegas Hotel and Casino.
- Thousands of customer reviews for the restaurant over the last 8 years.

What are customers saying?

- "Excellent breakfast menu. They just need to hire more staff to have a better service."
- "Great place for brunch!"
- "Highly recommend the steak and fries and sitting outside."
- "Had a great meal with a great atmosphere"
- "Food was ok… What it has going for it is the view from the outside terrace."

Page 5: Turning Text Into Insights: An Introduction to Topic Models

Topic Modeling: Framework

- Documents: "Excellent breakfast menu. They just need to hire more staff to have a better service"
- Topics: Breakfast, Quality of Service
- Words and Phrases: breakfast, better, service, staff

Page 6: Turning Text Into Insights: An Introduction to Topic Models

Topic Modeling: Preprocessing

- Tokenize: Extract meaningful units from sentences
  - I ordered a french toast -> {I, ordered, a, french toast}
  - Regular expression cleanup, end-of-line hyphenation, contraction, and sentence-initial capitalization rules.

- Stemming Algorithm: Consolidate feature space into word stems or lemmas
  - {I, ordered, a, french toast} -> {I, order, a, french toast}
  - Suffix stripping, part of speech tagging

- Matrix Factorization: Convert text into data structure for learning algorithms.
  - Word-document matrices often have 1,000,000,000,000+ values. Need special compression algorithms to make data manageable.
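The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the slides: the phrase list, the toy suffix stripper, and the function names `tokenize`, `stem`, and `sparse_counts` are all assumptions made for the example. A real system would more likely use an established stemmer or lemmatizer and a compressed sparse matrix format.

```python
import re
from collections import Counter

# Hypothetical phrase list: multi-word units to keep as a single token.
PHRASES = ("french toast",)

def tokenize(text, phrases=PHRASES):
    """Lowercase the text, protect known phrases, and split into tokens."""
    text = text.lower()
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    return [tok.replace("_", " ") for tok in re.findall(r"[a-z_']+", text)]

def stem(token):
    """Toy suffix stripper (an assumption; not a full stemming algorithm)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def sparse_counts(docs):
    """Store only non-zero term counts per document (a sparse representation)."""
    return [Counter(stem(tok) for tok in tokenize(doc)) for doc in docs]

tokens = tokenize("I ordered a french toast")
stems = [stem(tok) for tok in tokens]
rows = sparse_counts(["I ordered a french toast", "The staff was great"])
```

Storing only the non-zero counts is what keeps the data manageable: a dense word-document matrix allocates one cell per (document, vocabulary word) pair, while each review uses only a handful of distinct words.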

Page 7: Turning Text Into Insights: An Introduction to Topic Models

Topic Modeling: Estimation with Gibbs Sampler

- Use Markov Chain Monte Carlo methods to simulate our document-topic and topic-word probability distributions.
- Results:

Topic-Word

  Breakfast          Service
  Breakfast: 0.31    Service: 0.28
  Eggs: 0.27         Staff: 0.24
  Coffee: 0.24       Friendly: 0.21

Document-Topic

  "The french toast was great"    "The staff was great, but the outdoor patio was a bit noisy."
  French Toast: 0.71              Service: 0.51
  Breakfast: 0.25                 Environment: 0.44
  Service: 0.03                   Breakfast: 0.02
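The slide does not name the underlying model. Assuming it is latent Dirichlet allocation (LDA), a collapsed Gibbs sampler could look roughly like the sketch below. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are assumptions for illustration, and the code favors readability over speed.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]   # times word v is assigned to topic k
    topic_total = [0] * n_topics                            # total tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + n_words * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return theta, phi, vocab

# Toy documents built from words on the slide (not the actual review data).
toy_docs = [
    ["breakfast", "eggs", "coffee", "breakfast"],
    ["service", "staff", "friendly", "staff"],
    ["breakfast", "coffee", "service"],
]
theta, phi, vocab = lda_gibbs(toy_docs, n_topics=2, n_iters=100)
```

Here `theta[d]` plays the role of the Document-Topic column for document `d`, and `phi[k]` plays the role of the Topic-Word column for topic `k`. The sampler only produces numbered topics; labels such as "Breakfast" or "Service" are assigned by a person reading the top words.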

Page 8: Turning Text Into Insights: An Introduction to Topic Models

Harnessing the Model: Topic Frequency

What are my customers talking about?
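One simple way to answer this question, assuming each review already has a document-topic distribution, is to average the topic probabilities over all reviews. The sketch below reuses the two example distributions from the Gibbs sampler slide. The function name `topic_frequency` is hypothetical, and the original chart may have used a different measure.

```python
from collections import defaultdict

# Document-topic probabilities copied from the Gibbs sampler slide.
doc_topics = [
    {"French Toast": 0.71, "Breakfast": 0.25, "Service": 0.03},
    {"Service": 0.51, "Environment": 0.44, "Breakfast": 0.02},
]

def topic_frequency(doc_topics):
    """Average each topic's probability across documents, most frequent first."""
    if not doc_topics:
        return {}
    totals = defaultdict(float)
    for dist in doc_topics:
        for topic, prob in dist.items():
            totals[topic] += prob
    n_docs = len(doc_topics)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {topic: total / n_docs for topic, total in ranked}

freq = topic_frequency(doc_topics)
for topic, share in freq.items():
    print(f"{topic}: {share:.3f}")
```

With thousands of reviews, the same average gives the share of customer conversation devoted to each topic.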

Page 9: Turning Text Into Insights: An Introduction to Topic Models

Harnessing the Model: Evaluate Products and Verticals

How do customers feel about my products?

Page 10: Turning Text Into Insights: An Introduction to Topic Models

Harnessing the Model: Temporal Insights

How has customer sentiment evolved among my product lines over time?

Page 11: Turning Text Into Insights: An Introduction to Topic Models

Harnessing the Model: Deep Product Insights

Which properties of French Toast drive satisfaction (or dissatisfaction)?

Page 12: Turning Text Into Insights: An Introduction to Topic Models

Thank you.