text analytics on google app
TRANSCRIPT
Text Analytics on Google App
Nan (Miya) Wang, Chang Yin, Ellin Iqra, Anshu Vyas
INTRODUCTION
2
ANALYSIS OVERVIEW
TEXT CLUSTER
ING(Python)
TEXT CLASSIFICAT
ION(Python)
SUCCESSANALYSIS(Tableau)
3
● March 1st - April 21● 2227 Google apps● 12 variables
● Two Text Variables
DATASET
Source: https://play.google.com/store/apps/category/GAME/collection/topselling_new_free4
EXPLANATORY ANALYSIS
5
EXPLANATORY ANALYSIS
Word Count for Description Text
Average: 1400
6
• Remove Numbers/Stopwords
• Lowercase
• Tokenize
• Stemming
• Feature Engineering
TEXT PREPARATION
7
TEXT CLUSTERINGFeature
Engineering-----TFIDF “min_df= 4”“ngram range” =(1,5)135 Features
Top 20 TFIDF Terms
8
TEXT CLUSTERING 'DT' 'JJ''JJS', 'JJR''NN''NNP' 'RB''VB''VBP''VBZ' 'RBR''VBD''VBN'
Feature Engineering-----POS
9
TEXT CLUSTERING
90 Principals can explain all the variance
Feature Engineering-----PCA
10
TEXT CLUSTERING
KMeans
Only 197 unique text about update for 2227 APPs.
Model Building
11
TEXT CLUSTERINGEvaluation----Cluster
0
12
TEXT CLUSTERINGEvaluation----Cluster
1
13
TEXT CLUSTERINGEvaluation----Cluster
2
14
TEXT CLUSTERINGEvaluation----Cluster
3
15
TEXT CLUSTERINGEvaluation----Cluster
4
16
TEXT CLASSIFICATIONPreprocessing
Choose Just 10 out of 23 categories
17
TEXT CLASSIFICATIONFeature Engineering ----
Parameter Setting
18
TEXT CLASSIFICATIONFeature EngineeringFeature Engineering ----- POS
Feature Counts by POS
19
TEXT CLASSIFICATIONFeature EngineeringFeature Engineering -----
TFIDFDrop Features with TFIDF
larger than 0.074 and smaller than 0.3
About 51000 features reduced to 20040
20
TEXT CLASSIFICATIONModel Building
21
SUCCESS ANALYSIS
● Average of Num of star and review num for each operating sys.
● The data is filtered on Num Install over 50,000
Varies with
Device 22
● Average of Num of star and review num for each subcategory.
● Data is filtered on Num Install over 50,000
Apps with largest installations are all in Strategy Group.
SUCCESS ANALYSIS
23
● Average of Num of star and review num for Top Developer or not.
● Data is filtered on Num Install over 50,000
SUCCESS ANALYSIS
24
CONCLUSION
● Five labels available for application updation:➢ Add New Features➢ GamePlay Modification➢ Improve Levels➢ Fix Bugs & New Versions➢ Fix Minor Bugs & Optimization
● 71% of Accuracy for classify apps into 10 subcategories.
● App Subcategory, Developer, Operating System have correlation relationship with APP success.
25