DSS + Stan - Dataiku · 2017-10-10
TRANSCRIPT
DSS + Stan: Bayesian Data Science
Equancy training, 11, 18, 19/2/2016
A data science workflow
Stages: Business / Scientific Understanding; Data Acquisition & Understanding; Data Preparation; Model Creation; Evaluation; Deployment
Inputs: Dataset 1, Dataset 2, ..., Dataset n
Outputs: scored datasets
Repeated over Iteration 1, Iteration 2, ..., Iteration n
Adapted from the CRISP-DM methodology
Today’s Demo: Topic Modeling of Tweets
Topic Modeling
Topic Modeling: A brief intro
Diagram columns: Document, Topics, Words
Topics: News, Pop culture, Marketing
Words: Trump, Award, Video, Report, Product
Bayesian Topic Modeling
Diagram columns: Document, Topics, Words
Topics: News, Pop culture, Marketing
Words: Trump, Award, Video, Report, Product
Percentages shown: 20%, 70%, 10% and 40%, 40%, 20%, 5%, 5%
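A minimal sketch of what the percentages could mean, assuming the three values 20%, 70%, 10% are one document's weights over the three topics and that each topic carries its own weights over words. The assignment of weights to topics and every topic-to-word value below are hypothetical, chosen only so that each distribution sums to 1; the names `doc_topics`, `topic_words`, `word_probabilities` and `sample_document` are illustrative and not from the deck.

```python
import random

# Assumption: the slide's 20%, 70%, 10% are this document's topic weights.
# Which topic receives which weight is a guess made for illustration.
doc_topics = {"News": 0.2, "Pop culture": 0.7, "Marketing": 0.1}

# Hypothetical topic-to-word weights (invented; each row sums to 1).
topic_words = {
    "News": {"Trump": 0.4, "Report": 0.4, "Video": 0.1, "Award": 0.05, "Product": 0.05},
    "Pop culture": {"Award": 0.4, "Video": 0.4, "Trump": 0.1, "Report": 0.05, "Product": 0.05},
    "Marketing": {"Product": 0.4, "Video": 0.2, "Report": 0.2, "Award": 0.1, "Trump": 0.1},
}


def word_probabilities(doc_topics, topic_words):
    """Probability of each word in the document:
    sum over topics of P(topic | document) * P(word | topic)."""
    probs = {}
    for topic, p_topic in doc_topics.items():
        for word, p_word in topic_words[topic].items():
            probs[word] = probs.get(word, 0.0) + p_topic * p_word
    return probs


def sample_document(doc_topics, topic_words, n_words, rng):
    """Generate words one at a time: draw a topic, then draw a word from it."""
    words = []
    for _ in range(n_words):
        topic = rng.choices(list(doc_topics), weights=list(doc_topics.values()))[0]
        dist = topic_words[topic]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return words


probs = word_probabilities(doc_topics, topic_words)
tweet = sample_document(doc_topics, topic_words, 8, random.Random(0))
```

Under these assumed values, "Video" ends up as the most likely word in the document (about 0.32), because the dominant topic weights it heavily. Topic modeling runs this logic in reverse: only the words are observed, and the weights are inferred.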
Topics are “hidden”
Diagram columns: Document, Topics, Words
Topics: Topic 1, Topic 2, Topic 3
Words: Trump, Award, Video, Report, Product
Percentages shown: 20%, 70%, 10% and 40%, 40%, 20%, 5%, 5%
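One way to write "hidden" down, as a sketch in notation that is assumed here rather than taken from the deck: let $\theta_d$ be the topic weights of document $d$, $\phi_k$ the word weights of topic $k$, $z_{d,n}$ the unobserved topic of the $n$-th word, and $w_{d,n}$ the observed word. Since $z_{d,n}$ is never seen, it is summed out of the word probability, and the Bayesian task is to infer $\theta$ and $\phi$ from the words alone.

```latex
% Probability of an observed word, with the hidden topic summed out
p(w_{d,n} = v \mid \theta_d, \phi)
  = \sum_{k=1}^{K} p(z_{d,n} = k \mid \theta_d)\, p(w_{d,n} = v \mid z_{d,n} = k, \phi)
  = \sum_{k=1}^{K} \theta_{d,k}\, \phi_{k,v}

% Posterior over the unknown weights given all observed words
p(\theta, \phi \mid w) \;\propto\; p(w \mid \theta, \phi)\, p(\theta)\, p(\phi)
```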
Workflow
Documents: X (Tweets)
Algorithm: Latent Dirichlet Allocation
Outputs: distribution of topics for each document; distribution of words for each topic
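The workflow above can be sketched end to end in plain Python. This is not the deck's Stan model or its DSS recipe: it swaps in a small collapsed Gibbs sampler, a different inference technique for the same Latent Dirichlet Allocation model, so that the two outputs are visible without extra libraries. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy tweets are all hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of tokenized documents.

    Returns (theta, phi): topic weights per document, word weights per topic,
    both estimated from the counts of the final sample.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # words in doc d assigned to topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # words assigned to topic k overall
    assignments = []

    # Start from a random topic for every word.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the word's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta) / (topic_total[j] + n_vocab * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {vocab[v]: (topic_word[k][v] + beta) / (topic_total[k] + n_vocab * beta)
         for v in range(n_vocab)}
        for k in range(n_topics)
    ]
    return theta, phi


# Toy "tweets", invented for illustration and already tokenized.
tweets = [
    ["trump", "report", "trump", "report"],
    ["report", "trump", "report"],
    ["award", "video", "award", "video"],
    ["video", "award", "video"],
    ["product", "video", "product", "product"],
    ["product", "product", "report"],
]
theta, phi = lda_gibbs(tweets, n_topics=3)
```

Here `theta[d]` plays the role of "distribution of topics for each document" and `phi[k]` that of "distribution of words for each topic". The topics come back as unlabeled indices, which matches the "Topic 1, Topic 2, Topic 3" view: names such as News or Marketing are an interpretation added afterwards.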
To DSS!