text analytics on google app

25
Text Analytics on Google App Nan (Miya) Wang, Chang Yin, Ellin Iqra, Anshu Vyas

Upload: nan-wang

Post on 22-Jan-2017

112 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Text Analytics on Google App

Text Analytics on Google App

Nan (Miya) Wang, Chang Yin, Ellin Iqra, Anshu Vyas

Page 2: Text Analytics on Google App

INTRODUCTION

2

Page 3: Text Analytics on Google App

ANALYSIS OVERVIEW

TEXT CLUSTER

ING(Python)

TEXT CLASSIFICAT

ION(Python)

SUCCESSANALYSIS(Tableau)

3

Page 4: Text Analytics on Google App

● March 1st - April 21● 2227 Google apps● 12 variables

● Two Text Variables

DATASET

Source: https://play.google.com/store/apps/category/GAME/collection/topselling_new_free4

Page 5: Text Analytics on Google App

EXPLANATORY ANALYSIS

5

Page 6: Text Analytics on Google App

EXPLANATORY ANALYSIS

Word Count for Description Text

Average: 1400

6

Page 7: Text Analytics on Google App

• Remove Numbers/Stopwords

• Lowercase

• Tokenize

• Stemming

• Feature Engineering

TEXT PREPARATION

7

Page 8: Text Analytics on Google App

TEXT CLUSTERINGFeature

Engineering-----TFIDF “min_df= 4”“ngram range” =(1,5)135 Features

Top 20 TFIDF Terms

8

Page 9: Text Analytics on Google App

TEXT CLUSTERING 'DT' 'JJ''JJS', 'JJR''NN''NNP' 'RB''VB''VBP''VBZ' 'RBR''VBD''VBN'

Feature Engineering-----POS

9

Page 10: Text Analytics on Google App

TEXT CLUSTERING

90 Principals can explain all the variance

Feature Engineering-----PCA

10

Page 11: Text Analytics on Google App

TEXT CLUSTERING

KMeans

Only 197 unique text about update for 2227 APPs.

Model Building

11

Page 12: Text Analytics on Google App

TEXT CLUSTERINGEvaluation----Cluster

0

12

Page 13: Text Analytics on Google App

TEXT CLUSTERINGEvaluation----Cluster

1

13

Page 14: Text Analytics on Google App

TEXT CLUSTERINGEvaluation----Cluster

2

14

Page 15: Text Analytics on Google App

TEXT CLUSTERINGEvaluation----Cluster

3

15

Page 16: Text Analytics on Google App

TEXT CLUSTERINGEvaluation----Cluster

4

16

Page 17: Text Analytics on Google App

TEXT CLASSIFICATIONPreprocessing

Choose Just 10 out of 23 categories

17

Page 18: Text Analytics on Google App

TEXT CLASSIFICATIONFeature Engineering ----

Parameter Setting

18

Page 19: Text Analytics on Google App

TEXT CLASSIFICATIONFeature EngineeringFeature Engineering ----- POS

Feature Counts by POS

19

Page 20: Text Analytics on Google App

TEXT CLASSIFICATIONFeature EngineeringFeature Engineering -----

TFIDFDrop Features with TFIDF

larger than 0.074 and smaller than 0.3

About 51000 features reduced to 20040

20

Page 21: Text Analytics on Google App

TEXT CLASSIFICATIONModel Building

21

Page 22: Text Analytics on Google App

SUCCESS ANALYSIS

● Average of Num of star and review num for each operating sys.

● The data is filtered on Num Install over 50,000

Varies with

Device 22

Page 23: Text Analytics on Google App

● Average of Num of star and review num for each subcategory.

● Data is filtered on Num Install over 50,000

Apps with largest installations are all in Strategy Group.

SUCCESS ANALYSIS

23

Page 24: Text Analytics on Google App

● Average of Num of star and review num for Top Developer or not.

● Data is filtered on Num Install over 50,000

SUCCESS ANALYSIS

24

Page 25: Text Analytics on Google App

CONCLUSION

● Five labels available for application updation:➢ Add New Features➢ GamePlay Modification➢ Improve Levels➢ Fix Bugs & New Versions➢ Fix Minor Bugs & Optimization

● 71% of Accuracy for classify apps into 10 subcategories.

● App Subcategory, Developer, Operating System have correlation relationship with APP success.

25