Post on 15-Oct-2020


Go data diving with

Exploratory Web Analytics Project of Sample data from an online retailer

Lily Qian Zhao

*The ‘flower’ was generated from the online data with Tableau

Content

• Intro
• Data Analytics
  -Exploratory Analysis
  -Cluster Analysis
  -Logistic Regression
  -Multiple Regression
• Targeted Recommendations
• Moving Forward
• Technical Appendix

Project Overview

Sample Data (2013)

• 9 Months (March-May excluded)
• 21061 Rows of Records
• 12 Possible Predictor Variables

Project stages (from the slide diagram): Data Cleansing, Exploratory Analysis, Predictive Models, Model Analysis, Recommendations

1.Introduction

Business Objective

• Find out how to achieve better performance on visits, orders and, most importantly, sales.
• Find out what factors are related to visits, orders and sales.

1.Introduction

Technical Supports

• Explore and summarize the data’s characteristics: Exploratory Data Analysis
• Segment customers based on shared or distinct features and analyze each segment accordingly: Cluster Analysis
• Summarize what factors may influence whether a purchase occurs and predict who would purchase: Logistic Regression
• Summarize what factors may influence how much customers purchase, predict how many purchases would occur, and find how to achieve better performance on visits, orders and sales: Multiple Regression

1.Introduction

Summary of Original Data

Data was recorded per day for each site, broken down by platform and by type of customer.

3 Categorical Variables

• Site: Acme, Botly, Pinnacle, Sortly, Tabular, Widgetry
• Platform: Android, BlackBerry, iOS, ······, ChromeOS
• New Customers: New, Returning, Neither

1.Introduction

Summary of Original Data (Continued)

Day: Jan. 1st to Feb. 28th and June 1st to Dec. 31st, 2013

Discrete Variables

Variable             Min.   Max.     Median   Mean
Visits               0      136057   24       1935
Distinct Sessions    0      107104   19       1515
Orders               0      4916     0        62.38
Bounces              0      54512    5        743.3
Add to Cart          0      7924     4        166.3
Product Page Views   0      187501   53       4358
Search Page Views    0      506629   82       8584
Gross Sales          1      707642   851      16473

• The distributions of all variables (except Month) are heavily right skewed.
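The four statistics in the table, and the quick "mean far above median" check for right skew, can be sketched as below. The visit counts are made-up illustration values, not the project's data, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

def summarize(values):
    """Return the four statistics reported in the table for one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
    }

# Hypothetical daily visit counts: many small values and a few very large ones.
visits = [0, 3, 10, 24, 30, 55, 120, 4000, 136057]
stats = summarize(visits)

# A mean far above the median is a quick sign of a right-skewed distribution.
right_skewed = stats["mean"] > stats["median"]
```

In the table above every variable shows this pattern, for example Visits with a median of 24 and a mean of 1935.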

1.Introduction

Exploratory Analysis

• In the exploratory step, the basic characteristics of the variables, the interactions between them and the corresponding trends are studied.

• The study focuses mainly on visits, orders and sales.

• Some further data inconsistencies were found, and the data was cleansed and transformed accordingly. (Please refer to the Technical Appendix for details.)

2. Data Analytics

Exploratory Analysis

• For instance, the trends of Visits, Orders and Gross Sales by platform over the months show breakpoints on Feb 9th. Referring to the original data, it is very likely that ‘iPad’ and ‘iPhone’ were recorded as ‘iOS’ after that date and therefore disappear from the table. The same applies to ‘Macintosh’ (recorded as ‘MacOSX’ after Feb 9th). The data was therefore transformed to make the platform labels consistent.
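A minimal sketch of that label consolidation, assuming the mapping is exactly the one described on this slide plus the blank-to-‘Unknown’ rule from the Technical Appendix. The names `PLATFORM_ALIASES` and `normalize_platform` are illustrative, not taken from the project's code.

```python
# Labels used before Feb 9th mapped to the labels used afterwards.
PLATFORM_ALIASES = {"iPad": "iOS", "iPhone": "iOS", "Macintosh": "MacOSX"}

def normalize_platform(label):
    """Map an old platform label to its later equivalent; blank labels become 'Unknown'."""
    label = label.strip()
    if not label:
        return "Unknown"
    return PLATFORM_ALIASES.get(label, label)

normalized = [normalize_platform(p) for p in ["iPad", "Macintosh", "Windows", ""]]
```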

2. Data Analytics

Exploratory Analysis

The Exploratory Analysis has five major parts. In each part, the numbers of Visits, Orders and Gross_Sales are studied.

• Zero Visits

• Platforms

• Weekdays

• New Customers

• Sites

2. Data Analytics

Zero Visit

“Nearly 20% of the records for Site Sortly have 0 visits, and the same holds for Pinnacle”

4.6% 19.3% 18.5%

2. Data Analytics

Zero Visit

“‘Unknown’ and ‘WindowsPhone’ have great influence because of their 0 visits”

Platform       0-visit rate   Influence Score
SymbianOS      45.94%         3
Blackberry     29.39%         2
ChromeOS       28.54%         5
Unknown        23.99%         1080
WindowsPhone   22.59%         590
Linux          19.59%         402
Other          13.46%         0.005
MacOSX         4.73%          4
Android        4.35%          23
Windows        2.62%          0.3
iOS            0.85%          1

$$\text{0\_VisitRate}(\text{Platform}) = \frac{\#\,\text{records with 0 visits}}{\#\,\text{total records}}$$

$$\text{Influence\_Score}(\text{Platform}) = \frac{\#\,\text{Platform's visits}}{\#\,\text{total visits}} \times \text{0\_VisitRate} \times 10^{4}$$

A platform with a high 0_VisitRate and a high proportion of total visits has greater influence because of its 0-visit records.
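The two formulas can be sketched as follows. This assumes the rate is computed per platform over that platform's own records, which is how the formula reads; the sample records are invented and `zero_visit_stats` is a hypothetical name.

```python
from collections import defaultdict

def zero_visit_stats(records):
    """records: iterable of (platform, visits) pairs, one per daily record.
    Returns {platform: (zero_visit_rate, influence_score)}."""
    n_records = defaultdict(int)
    n_zero = defaultdict(int)
    visits_sum = defaultdict(int)
    for platform, visits in records:
        n_records[platform] += 1
        visits_sum[platform] += visits
        if visits == 0:
            n_zero[platform] += 1
    total_visits = sum(visits_sum.values())
    result = {}
    for platform in n_records:
        rate = n_zero[platform] / n_records[platform]
        share = visits_sum[platform] / total_visits if total_visits else 0.0
        # Influence score = share of total visits x 0-visit rate x 10^4.
        result[platform] = (rate, share * rate * 10**4)
    return result

# Invented records for illustration only.
sample = [("Windows", 100), ("Windows", 0), ("Windows", 300), ("Windows", 200),
          ("Linux", 0), ("Linux", 400)]
stats = zero_visit_stats(sample)
```

In this toy sample Windows has a 0-visit rate of 0.25 and 60% of all visits, so its score is 0.6 × 0.25 × 10^4 = 1500.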

2. Data Analytics

Distributions of Visits/Orders/Sales Based on Different Platforms

Panels: Visits, Gross Sales, Orders

2. Data Analytics

Trend of Visits/Orders/Sales Over Time Based on Different Platforms

• The holiday season saw fast growth of visits/orders/sales (v/o/s), while there was a trough during the summer.
• The trend lines for the different platforms and for visits/orders/sales are similar over time.
• Windows kept the lead in Visits, Orders and Sales.

2. Data Analytics

Week Days

• Mondays perform best in V/O/S, followed by Tuesdays.
• Visits, Orders and Sales are almost evenly distributed across the days of the week.

2. Data Analytics

Week Days

• The holiday season saw fast growth of v/o/s.
• During the summer, there was a trough for Wednesday, Thursday, Friday and the weekend.
• The trend lines for different platforms and for visits/orders/sales are similar over time.
• Trends from June to December zig-zag strongly, yet some days share similar trends.

2. Data Analytics

Distributions of Visits/Orders/Sales Based on Different Types of Customers

• The majority (83%) of the orders and sales are from Returning Customers
• The majority (85%) of the visits are from Visitors

2. Data Analytics

Trends of Visits/Orders/Sales Through Time Based on Different Types of Customers

• The holiday season saw fast growth of v/o/s.
• During the summer, there was a trough for Wednesday, Thursday, Friday and the weekend.
• The majority of sales and orders come from returning customers, while the majority of visits come from visitors.
• Although the highest visits belong to visitors and the highest sales and orders belong to returning customers, the two groups share a very similar trend.

2. Data Analytics

Distributions of Visits/Orders/Sales Based on Different Types of Sites

Panels: Visit, Orders, Sales

The majority of visits, orders and sales all come from Site Acme.

2. Data Analytics

Distributions of Visits and Sales in SearchPageViews/ProductPageViews/DistinctSessions Based on Different Types of Sites

Panels: Visit, Sales

• The majority of visits and sales all come from Site Acme.
• Although Widgetry has fewer visits/sales than the others, it performs well on Product Page views.

2. Data Analytics

Trends of Visits/Orders/Sales Through Time Based on Different Types of Sites

• The holiday season saw fast growth of v/o/s.
• During the summer, there was a trough for Acme.
• The majority of sales and orders come from returning customers, while the majority of visits come from visitors.
• This time trend is very similar to the time trends by ‘customer’ and ‘platform’.

2. Data Analytics

Cluster Analysis

Why Cluster Analysis?

Heterogeneity is the central concept in marketing because in almost all situations customers have different wants, needs and preferences. Whenever such heterogeneity exists, a website that recognizes and accommodates the differences can achieve an advantage over competitors in its category.

Clustering is a major approach for addressing heterogeneity. Customers with similar wants and needs are grouped into segments so that the website can better meet the different needs.

2. Data Analytics

Cluster Analysis

Aim:

• Cluster the records based on their numeric values, profile the clusters and find the differences between them to understand marketing/business performance.

Method: K-Means

Number of Clusters: 3-5

Assumptions

• Platform is coded as Personal Computer (0.5) or Others (-0.5)
• Days are segmented into Weekdays and Weekends
• new_customer is coded as Visitors -0.5; New Customers 0; Returning Customers 0.5.
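The coding assumptions above can be sketched as a small encoder. The slides do not list which platforms count as "Personal Computer", so the `PC_PLATFORMS` set below is a guess for illustration; all names are hypothetical.

```python
# Assumption: the slides do not say which platforms are "Personal Computer";
# this set is an illustrative guess.
PC_PLATFORMS = {"Windows", "MacOSX", "Linux", "ChromeOS"}
CUSTOMER_CODES = {"Visitors": -0.5, "New Customers": 0.0, "Returning Customers": 0.5}
WEEKEND_DAYS = {"Saturday", "Sunday"}

def encode(platform, customer, weekday):
    """Return (platform_code, customer_code, is_weekend) for one record."""
    platform_code = 0.5 if platform in PC_PLATFORMS else -0.5
    return platform_code, CUSTOMER_CODES[customer], weekday in WEEKEND_DAYS

example = encode("iOS", "Returning Customers", "Monday")
```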

2. Data Analytics

Cluster Analysis

• Predictor Unadjusted

• Group 1: High Visits, High Search_Page_Views, High Product_Page_Views; Low Gross_Sales
• Group 2: Low Visits, Low Search_Page_Views, Low Product_Page_Views; Low Gross_Sales
• Group 3: Low Visits, High Search_Page_Views, High Product_Page_Views; High Gross_Sales

So if a new record has Low Visits, High Search_Page_Views and High Product_Page_Views, it is very likely to also have high Gross_Sales.

2. Data Analytics

Cluster Analysis

• Predictor Adjusted

Because the predictors have different scales and deviations, they are standardized and then re-entered into the cluster analysis. Here is the new clustering result.

• Group 1: High Visits, High Search_Page_Views, High Product_Page_Views; Returning Customer, Use computer. High Gross_Sales
• Group 2: Low Visits, Low Search_Page_Views, Low Product_Page_Views, New customer or Visitors. Low Gross_Sales
• Group 3: High add_to_cart rate, Low Visits, Low Search_Page_Views, Low Product_Page_Views, Returning Customer. Low Gross_Sales
• Group 4: High add_to_cart rate, Low Visits, High Conversion_Rates, Returning or new customers. Low Gross_Sales
• Group 5: High Visits, High Search_Page_Views, High Product_Page_Views, Use computer. Low Gross_Sales

So if a new record has High Visits, High Search_Page_Views and High Product_Page_Views, comes from a Returning Customer and uses a computer platform, it is very likely to also have high Gross_Sales.
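For readers who want to see what K-Means does with standardized records, here is a minimal sketch of the standard Lloyd iteration in plain Python. It is not the project's code (the deck lists RStudio and other tools), and the four sample points are invented.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.
    Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Invented standardized records: (visits, search_page_views, gross_sales).
points = [(-1.0, -1.1, -0.9), (-0.9, -1.0, -1.0), (1.0, 1.1, 0.9), (1.1, 0.9, 1.0)]
centroids, labels = kmeans(points, k=2)
```

Profiling the groups, as done on the slide, then amounts to reading each centroid as "high" or "low" per variable.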

2. Data Analytics

Other Possible Clusters

Interesting findings

Clustering on add-to-cart rate, gross sales and site:

• The majority of gross sales comes from Acme (site).
• Add-to-cart rates around 0.25 and 0.7 are the two values that point to possibly good gross sales.

2. Data Analytics

Other Possible Clusters

Interesting findings

Clustering on conversion rate, gross sales and new customer:

• The majority of gross sales comes from the low-conversion-rate region.
• Returning customers have low conversion rates; new customers have medium conversion rates; visitors have high conversion rates.
• Returning customers tend to spend more per record than new customers.

2. Data Analytics

Other Possible Clusters

Interesting findings

Clustering on bounce rate, gross sales and new customer:

• Returning customers have a higher bounce rate than new customers.
• Visitors have a wide range of bounce rates (0~1).
• The highest gross sales are contributed by returning customers within the bounce rate range of 0.12-0.3.

2. Data Analytics

Other Possible Clusters

Interesting findings

Clustering on product page views, search page views and platform:

• Product page views and search page views are positively related.
• Customers using computers have relatively more search page views than phone users, while they also have relatively fewer product page views than phone users.
• On average, product page views are about three times the search page views.

2. Data Analytics

Other Possible Clusters

Interesting findings

Clustering on visits, gross sales and new customer:

• The majority of gross sales comes from the low-conversion-rate region.
• For returning customers, the more they visit the website, the more they purchase.
• For the same number of visits, new customers purchase more than returning customers within a price range (less than 200,000).

2. Data Analytics

Logistic Regression

Purposes of modeling

• Summarize what factors may influence whether a purchase occurs
• Predict who would purchase
• Why Logistic Regression?

① It predicts a binary response variable: purchase or not
② The quality of the model can be measured: randomly divide the data into a training set and a test set, fit the model on the training set, validate it on the test set, and compare the actual outcomes with the predicted ones to get the correct rate.

2. Data Analytics

Logistic Regression

Transformation before modeling

Categorical variables were transformed into numbers

• Add a binary response variable “sales”: if gross sales > 0, sales = 1; if gross sales = 0, sales = 0
• Platform: Personal Computer (0.5) and Others (-0.5)
• Customer: Returning customer (0.5), new customer (0) and visitors (-0.5)

The ratio of training set to test set is 7:3
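The response flag and the 7:3 split can be sketched as below. The function names, the fixed random seed and the sample values are assumptions for illustration; the slides do not say how the random split was drawn.

```python
import random

def add_sales_flag(rows):
    """Add the binary response: sales = 1 if gross_sales > 0, else 0."""
    return [dict(row, sales=1 if row["gross_sales"] > 0 else 0) for row in rows]

def train_test_split(rows, train_share=0.7, seed=42):
    """Shuffle a copy of the rows and cut it 7:3 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

# Invented gross sales values for ten records.
rows = add_sales_flag([{"gross_sales": g} for g in (0, 851, 0, 120, 16473, 0, 40, 0, 5, 300)])
train, test = train_test_split(rows)
```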

2. Data Analytics

Logistic Regression

Model

• Final model:

Sales (0-1) is related to

① Search page views
② Product page views
③ Bounce rate
④ Platform
⑤ Add to cart rate
⑥ New customer

Given the search page views, product page views, bounce rate, platform, add to cart rate and new customer values, the model correctly predicts whether there would be a sale in 95% of cases.

2. Data Analytics

Logistic Regression

Model Interpretation

Please refer to the technical appendix for R’s output

Final model: (Positive or Negative relation between predictors and sales, followed by the weight)

Sales (0-1) is related to

• Search page views (Positive, less than 0.1)
• Product page views (Negative, magnitude less than 0.1)
• Bounce rate (Negative, -0.67)
• Platform
Blackberry(N,-0.33); ChromeOS(N,-0.82); iOS(N,-0.26); Linux(N,-0.61); MacOSX(P,0.67)
Other(N,-0.75); SymbianOS(N,-1.4); Unknown(P,1.16); Windows(P,0.46); WinPhone(P,-0.1)
• Add to cart rate (Positive, 1.24)
• Returning customer (Positive, the longer the better, 3.5)

So records with high search page views, low product page views, a low bounce rate, a Windows or MacOSX device, a high add to cart rate and a returning customer are likely to generate sales.
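One common way to read such weights, assuming they are log-odds coefficients as in a standard logistic regression fit (the slides do not state this explicitly), is to exponentiate them into odds ratios. The dictionary below copies three weights from the slide; the helper name is hypothetical.

```python
import math

# Weights quoted on the slide; treating them as log-odds coefficients is an assumption.
weights = {"bounce_rate": -0.67, "add_to_cart_rate": 1.24, "returning_customer": 3.5}

# exp(coefficient) is the multiplicative change in the odds of a sale
# for a one-unit increase in the predictor, other predictors held fixed.
odds_ratios = {name: math.exp(w) for name, w in weights.items()}

def sale_probability(log_odds):
    """Logistic function: convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under that assumption the odds ratios come out at roughly 0.51 for bounce rate, 3.5 for add to cart rate and 33 for returning customer, which matches the slide's reading that returning customers weigh the most.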

2. Data Analytics

Logistic Regression

Interesting findings

Please refer to the technical appendix for R’s output

• A higher bounce rate is a killer for sales (the most negative weight among the non-platform predictors)
• Returning customers weigh the most for sales
• Among platforms, Windows, MacOSX and Unknown perform well, while ChromeOS and SymbianOS do not.

2. Data Analytics

Logistic Regression

Model Diagnostics

• The ROC curve: the area under the curve (AUC) turns out to be 0.98, which is a good indicator of how accurately the model fits the dataset.
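As a sketch of what the ROC figure measures, the area under the curve can be computed as the share of (positive, negative) record pairs in which the positive record receives the higher predicted score. The labels and scores below are invented; `roc_auc` is a hypothetical helper, not the R routine used in the project.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive is scored above a random
    negative (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Invented outcomes (1 = sale) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9; a value of 0.98 means almost every such pair is ordered correctly.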

2. Data Analytics

Logistic Regression

Interesting findings

Important Points (statistically, influential points and outliers):

These outliers are extremely important and should be taken very seriously.

Four records have relatively great influence on the model.

• All of them happened on December 2nd 2013 (Cyber Monday)
• All of them are from Site Acme

2. Data Analytics

Logistic Regression

Interesting findings

Platform       NewCustomer   Visits   Orders   G_Sales   P_P_V   S_P_V    C-rate   B-rate   A-rate
Windows        New           3694     2390     247384    10728   19441    0.65     0.16     0.71
Windows        Returning     26347    4916     707642    78159   175488   0.19     0.17     0.3
MacOSX         Returning     18044    2766     458546    52556   111936   0.15     0.26     0.25
iOS            Returning     8283     1225     184423    21933   44212    0.15     0.23     0.26
Data Average   /             1935     62       16473     4358    8584

G_Sales = Gross Sales; P_P_V = Product Page Views; S_P_V = Search Page Views; C-rate, B-rate and A-rate = conversion, bounce and add to cart rates.

2. Data Analytics

Multiple Linear Regression

Purposes of modeling

• Summarize what factors may influence how much customers purchase
• Predict how many purchases would occur and how to achieve better performance on visits, orders and sales.
• Why Multiple Linear Regression?

Because it predicts a numeric response variable: gross sales

The quality of the model can also be measured: besides the statistical criteria of the model (R-squared, P-values, etc.), randomly divide the data into a training set and a test set, fit the model on the training set, validate it on the test set, and compare the actual outcomes with the predicted ones to get the correct rate.

2. Data Analytics

Multiple Linear Regression

Variable selection

The table indicates the correlations between variables.

The darker the color, the more strongly the two variables are correlated.

-visits and product_page_views, and visits and search_page_views, are highly correlated;
-orders and sales are highly correlated.
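Each cell of such a correlation table is typically a Pearson correlation coefficient (an assumption; the slide does not name the measure). A minimal sketch with invented numbers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented daily values: product page views grow almost linearly with visits.
visits = [10, 20, 30, 40, 50]
product_page_views = [22, 41, 59, 83, 101]
r = pearson(visits, product_page_views)
```

Highly correlated predictors such as these carry nearly the same information, which is why a correlation table is a useful first step in variable selection.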

2. Data Analytics

Multiple Linear Regression

Model

Final model: (Positive or Negative relation between predictors and sales, followed by the weight)

Sales amount is related to

• Weekday (Positive for weekends, 30)
• Distinct Sessions (Negative, -3.25)
• Visits (Positive, 2.65)
• Platform
Blackberry(N,-315); ChromeOS(N,-187); iOS(P,522); Linux(N,-113); MacOSX(P,1248)
Other(P,380); SymbianOS(P,26); Unknown(N,-391); Windows(N,-3262); WinPhone(N,-220)
• Add to cart rate (Negative, -1195)
• Orders (Positive, 144)
• Returning customer (Positive, the longer the better, 1235)

So records on weekends, with fewer distinct sessions, more visits, a MacOSX device, a lower add to cart rate and a returning customer are likely to generate higher sales.

2. Data Analytics

Multiple Linear Regression

Model Diagnostics

• R-squared: 0.9857 (excellent)
• P-value < 2.2*10^(-16) (great)
• Correct rate: 66% (a prediction counts as correct when the absolute value of (predicted value - actual sales) is less than the actual sales)
• The incorrect cases are very likely related to the platform coefficients: the Windows coefficient is negative.
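The correct-rate criterion stated on this slide can be sketched directly. The predicted and actual values below are invented, and `correct_rate` is a hypothetical name.

```python
def correct_rate(predicted, actual):
    """Share of records where |predicted - actual| < actual, the slide's criterion."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < a)
    return hits / len(actual)

# Invented values: two of the four predictions meet the criterion.
actual = [100.0, 50.0, 0.0, 200.0]
predicted = [150.0, 120.0, 10.0, 390.0]
rate = correct_rate(predicted, actual)
```

Note that under this criterion a record with actual sales of 0 can never count as correct, since an absolute error is never below zero.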

2. Data Analytics


Multiple Linear Regression

Used 3-fold (cross-validation) to avoid overfitting. This is how it

Model diagnostics
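A 3-fold split of the kind mentioned above can be sketched as follows: the rows are shuffled once, cut into three folds, and each fold serves as the test set once. The function name and the fixed seed are illustrative assumptions.

```python
import random

def k_fold_indices(n_rows, k=3, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=3))
```

The model would be fitted on each training part and scored on the matching test part, and the three scores averaged.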

2. Data Analytics

Since returning customers are the backbone…

• Try to turn more new customers into returning customers

- Discounts/Coupons/Loyalty cards for returning customers
- Targeted ads for new customers
- Build email lists
- Regular greeting emails/messages
- Better customer service

… so we are trying to turn everyone into a returning customer!

3. Targeted Recommendations

Since returning customers are the backbone…

• Make returning customers stay longer with the website

- Discounts/Coupons for returning customers who stay longer and spend more

- Targeted ads for returning customers

- Better customer service

- Seasonal greetings and birthday gift/coupons (“We care about you!”)

… and we’ve got them connected!

3. Targeted Recommendations

Since returning customers are the backbone…

• Attract more visitors to become new customers

Our customers are like Ping-Pong balls in a basket with a hole in the bottom: we need to make the hole smaller as well as get more balls into the basket.

So we also need to get more visitors to become new customers.

- Better online shopping experience
- Make it easy for visitors to register as new customers
- First-time shopping discount

… and we need more potential returning customers as well!

3. Targeted Recommendations

4. Moving Forward

Data Collection

Good Model: Recency, Frequency, Monetary

A good model for forecasting customers’ future value should be related to the customers’ Recency, Frequency and Monetary behaviors. Based on that, data collection could be improved to provide better support for analytics.

Data Collection

• More demographic data on individuals
- Gender
- Geographic information
- Age group
- Family size (single/married/number of kids)

4. Moving Forward

Data Collection

• More shopping-related data
- Sales category
- Source
- Order history for each customer
- Promotional history (promotional response)
- Length of membership (loyalty)

4. Moving Forward

Data Collection

• Others
- More information about the sites: why does Acme stand out while the others do not perform well?
- Find the March-May data and fit the trends again
- Compare with data from 2012/2014 to see whether time series/seasonal problems exist
- Conduct further study of the high-purchase individuals and make targeted marketing suggestions accordingly
- Distinguish lapsed customer groups and develop a relevant strategy

4. Moving Forward

Technical Appendix

Data Cleansing: Missing Value

1. There are 17835 missing values in the original dataset. 8259 of them are in the new_customer category and the rest (9576) are in gross_sales.

2. According to the instructions, NA in new_customer means neither a new customer nor a returning customer. These should be window-shopping customers, i.e. visitors, so the 8259 records are assigned “Visitors” in the new_customer category.

3. 8031 rows are missing both fields. Assuming that visitors need to log in to purchase, the gross_sales of these rows should be zero. The other missing gross_sales values are filled with the average gross_sales.
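The three rules can be sketched as one pass over the rows. This assumes missing values are represented as `None`, that the customer labels are strings, and that "average gross_sales" means the mean of the observed values; none of these details are given in the slides.

```python
from statistics import mean

def fill_missing(rows):
    """rows: list of dicts with 'new_customer' and 'gross_sales' (None when missing).
    Returns new dicts with the three rules applied."""
    observed = [r["gross_sales"] for r in rows if r["gross_sales"] is not None]
    average_sales = mean(observed)  # assumption: mean of the observed values
    cleaned = []
    for r in rows:
        row = dict(r)
        was_visitor = row["new_customer"] is None
        if was_visitor:
            row["new_customer"] = "Visitors"
        if row["gross_sales"] is None:
            # Visitors are assumed unable to purchase without logging in.
            row["gross_sales"] = 0 if was_visitor else average_sales
        cleaned.append(row)
    return cleaned

# Invented rows covering the three cases plus one complete record.
rows = [
    {"new_customer": None, "gross_sales": None},
    {"new_customer": "New", "gross_sales": None},
    {"new_customer": "Returning", "gross_sales": 100},
    {"new_customer": "New", "gross_sales": 300},
]
cleaned = fill_missing(rows)
```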

Technical Appendix

Data Cleansing: Data Inconsistency

1. There are 2465 records whose orders are zero but whose gross_sales are not zero. In this case, the orders are set to the mean number of orders.

2. After the first step of filling in missing data, there are still 9 records whose gross sales are zero but whose orders are not zero. In this case, the gross_sales are set to the mean gross sales.

3. There is a blank category under “platform”; it is treated as “Unknown”.

Technical Appendix

Data Cleansing: New Variables

Three new variables were requested: conversion_rate, bounce_rate and add_to_cart_rate.

Because some days have no visits, which may cause errors in the rates, the three rates are set as follows when visits are zero, based on their definitions:

conversion_rate = 0;
bounce_rate = 1;
add_to_cart_rate = 0.

Other new variables are also created for further use (mostly for the time series):

Weekday: the day of the week of the date (Monday, Tuesday etc.);
Month: the month of the date;
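A sketch of the new variables with the zero-visit defaults. Taking conversion_rate as orders divided by visits agrees with the Cyber Monday table (4916 / 26347 ≈ 0.19); defining the other two rates as bounces and add-to-cart counts divided by visits is an assumption, since the slides do not spell the definitions out. The bounce and add-to-cart counts in the example are invented.

```python
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def rates(visits, orders, bounces, add_to_cart):
    """Return (conversion_rate, bounce_rate, add_to_cart_rate) with the zero-visit defaults."""
    if visits == 0:
        return 0.0, 1.0, 0.0
    return orders / visits, bounces / visits, add_to_cart / visits

def calendar_features(day):
    """Weekday name and month number for a datetime.date."""
    return WEEKDAY_NAMES[day.weekday()], day.month

# Visits and orders from the Cyber Monday table; the other two counts are invented.
conversion, bounce, cart = rates(visits=26347, orders=4916, bounces=4479, add_to_cart=7904)
weekday, month = calendar_features(date(2013, 12, 2))
```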

Technical Appendix

Selections for Clusters

• Predictor Unadjusted

For the 4-cluster and 5-cluster models, I found that the high gross_sales groups are not distinct from the other groups.

For example, both cluster 2 and cluster 3 in the 4-cluster model have high values in search_page_views and product_page_views, yet cluster 3 has high gross sales while cluster 2 has very low gross sales. So when we get a group of records whose search_page_views and product_page_views are high, we cannot tell whether the gross sales would be high or not.

The 3-cluster model does its job: a group that is low in visits and high in search_page_views and product_page_views has high gross sales. So I dropped the 4-cluster and 5-cluster models and chose the 3-cluster model.

Technical Appendix

Selections for Clusters

• Why adjust the predictors?

The values of the predictors are standardized before applying the final cluster analysis.

There are four main reasons for standardization:

① Commensurate units
② Skewed distribution
③ Variable weighting
④ Facilitating interpretation
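Standardization is commonly done with z-scores, which is assumed here since the slides do not name the method. The sketch uses the population standard deviation; R's `scale()` uses the sample standard deviation, which differs only by a constant factor per variable. The visit counts are invented.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-scores: subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sigma for v in values]

# Invented visit counts on very different scales.
visits = [24, 1935, 136057, 0, 19]
z = standardize(visits)
```

After this step every predictor has mean 0 and standard deviation 1, so no single large-scale variable such as search page views dominates the distance calculation in K-Means.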

Technical Appendix

Selections for Clusters

• Predictor Adjusted

After standardizing the predictors, all models worked well. So I chose the 5-cluster model to make the high gross sales group more distinct.

Technical Appendix

Selections for Logistic Regression Model

Figures: Final selection for Logit Model; Accuracy Rate; Weight (coefficient) of the predictors


Selections for Multiple Linear Regression Model

Technical Appendix

Data summary for Multi-Model


Selections for Multiple Linear Regression Model

Technical Appendix

Outlier Detection

Distribution of Residuals


Selections for Multiple Linear Regression Model

Technical Appendix

Accurate rate

Summary of final Multi-Model

Technical Appendix

Software used

Analytics: RStudio | SAS | MySQL | Python

Data Visualization: Tableau | RStudio | Photoshop

Code is available if needed

Extra part I: A little more about Lily

Teehee

I love painting, designing and doing creative innovations.

check here: http://lilyqianz.com/projects/smart-mailbox/

Extra icons for new_customer category


Thanks for reading.

Hope you enjoyed it and were wowed :)

Data is beautiful.
