

Analyzing Customer Behavior at Amazon.com

Andreas S. Weigend
Chief Scientist, Amazon.com

September 2, 2003

Contact information is at http://www.weigend.com

KDD: August 2003
SAS: October 2003

Source: dbs.uni-leipzig.de/files/Research/WeigendSAS2003.pdf

Agenda

• 1. Data
Sources
Characterizations

• 2. Actions
E.g., Personalization, Pricing, Promotions…

• 3. Two data sets for research
Share the Love network
Ratings

• 4. Some reflections

• 5. Questions

1. Sources of Data

• Customer behavior
Overall use of the site
Buying vs selling
Community features
Purchase information
Session information
Individual click information
Responses (and non-responses!) to links, ad campaigns, emails, …
Customer service contacts
Email, phone, product returns

• Amazon.com performance
Page generation time
Search results
Delivery date relative to promised date

Customer Satisfaction

How Many Sessions at Amazon.com per Day?

Definition of session (also called visit):
Begin: With first HTTP request from that day (state kept via cookie)
End: Midnight (Pacific time)

Q: Number of sessions per day? 4 – 5 M
Recognized (know customer ID): 1M
Unrecognized (don’t know who): 2M
Robots: 1 – 2M

Q: How long is a “typical session”?
What shape of distribution would you expect?

Less than 30% of all sessions are associated with a specific customer!
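The session definition above (one session per cookie per Pacific calendar day) can be sketched as follows. This is a minimal illustration, not Amazon.com's implementation: the `(cookie_id, timestamp)` hit format, the function names, and the fixed UTC-8 offset standing in for Pacific time are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for Pacific time.
# A production pipeline would need daylight-saving-aware handling.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))


def session_key(cookie_id, ts, tz=PACIFIC_STANDARD):
    """A session is identified by the cookie and the local calendar day."""
    return (cookie_id, ts.astimezone(tz).date())


def sessionize(hits, tz=PACIFIC_STANDARD):
    """Group hits into sessions and count hits per session.

    hits: iterable of (cookie_id, timezone-aware datetime) pairs.
    Returns a dict mapping (cookie_id, local date) to the number of hits.
    """
    counts = defaultdict(int)
    for cookie_id, ts in hits:
        counts[session_key(cookie_id, ts, tz)] += 1
    return dict(counts)
```

Because the session ends at local midnight, two hits one minute apart can fall into different sessions if they straddle midnight.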

Session Length Distribution

[Figure: histogram of session length. x-axis: session length (number of hits, log base 10), 0 to 2.5, with markers at 10, 30, 100, and 300 hits; y-axis: counts, log scale from 1 to 10,000,000. Chart annotations: 32%, 10%]

32% of sessions* have a single hit only, more than expected by smooth continuation
This indicates a mixture of processes

*Non-robot and non-internal sessions only
February 19, 2003
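The two quantities on this slide, the share of single-hit sessions and the counts per bin of log10(hits), could be computed along these lines. This is only a sketch: the function names and the bin width of 0.25 are assumptions, and the input is taken to be a plain list of hits per session with robot and internal sessions already removed.

```python
import math
from collections import Counter


def single_hit_share(session_lengths):
    """Fraction of sessions that consist of exactly one hit."""
    lengths = list(session_lengths)
    return sum(1 for n in lengths if n == 1) / len(lengths)


def log10_histogram(session_lengths, bin_width=0.25):
    """Count sessions per bin of log10(number of hits).

    Keys are the left edges of the bins on the log10 axis.
    """
    bins = Counter()
    for n in session_lengths:
        left_edge = math.floor(math.log10(n) / bin_width) * bin_width
        bins[round(left_edge, 10)] += 1
    return dict(sorted(bins.items()))
```

A spike in the first bin relative to its neighbors is what the slide reads as a sign of a mixture of processes.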

From Individual Sessions to Customers

• Analyze customer behavior over a period spanning 12 months
From Aug 1, 2002 until July 31, 2003
Based on internal research data set created for longitudinal studies
100k customers selected randomly via 3 digits of their customer ID

Q: Of the customers who visited in the last 12 months, how many had made a purchase prior to that period?
About 50%

Q’s: What is the number of visits, and what is the number of purchases, of previous customers in the last 12 months?
“Previous customer”: To avoid bias due to new accounts, condition on accounts with at least one purchase before Aug 1, 2002
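One way to read "selected randomly via 3 digits of their customer ID" is to keep every customer whose ID ends in a chosen three-digit value, which yields roughly a 1-in-1000 sample. The sketch below assumes integer IDs, uses the last three digits, and picks the suffix 427 arbitrarily; none of these details come from the slides. It also encodes the "previous customer" condition.

```python
from datetime import date

PERIOD_START = date(2002, 8, 1)  # start of the 12-month window on the slide


def in_research_sample(customer_id, chosen_suffix=427):
    """Keep customers whose ID ends in the chosen three digits.

    Assumption: IDs are integers whose last digits are effectively random.
    """
    return customer_id % 1000 == chosen_suffix


def is_previous_customer(purchase_dates, period_start=PERIOD_START):
    """True if the account has at least one purchase before the window."""
    return any(d < period_start for d in purchase_dates)
```

Selecting on ID digits keeps the same customers in the sample every time it is rebuilt, which is what a longitudinal study needs.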


How often did previous customers visit in the past 12 months?

How often did previous customers purchase in the past 12 months?

36.5% of previous customers did visit but not purchase in past 12 months

Median number of purchases per year: Between 4 and 5 (chart markers at 4 purchases and 5 purchases)

How much does each group contribute to the total purchases?

Findings

• A typical typical customer? NO!
Traditional approaches: Segmentation / Clustering

• A typical random sample? NO!
E.g., sample of sessions ≠ sample of customers
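The point that a sample of sessions is not a sample of customers can be made concrete with a small calculation. The population below is hypothetical and the function names are illustrative; the only claim is the arithmetic of size-biased sampling.

```python
def mean_visits_customer_sample(visits_per_customer):
    """Average visits when every customer is equally likely to be sampled."""
    return sum(visits_per_customer) / len(visits_per_customer)


def mean_visits_session_sample(visits_per_customer):
    """Average visits of the customer behind a uniformly sampled session.

    A customer with v visits owns v sessions, so the chance of picking
    that customer is v / total; the expectation is sum(v * v) / sum(v).
    """
    total_sessions = sum(visits_per_customer)
    return sum(v * v for v in visits_per_customer) / total_sessions


# Hypothetical population: 90 customers with 1 visit, 10 with 100 visits.
visits = [1] * 90 + [100] * 10
print(mean_visits_customer_sample(visits))  # 10.9
print(mean_visits_session_sample(visits))   # about 91.8
```

Sampling sessions over-represents frequent visitors, so statistics computed on a session sample describe heavy users far more than the customer base as a whole.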

Insights into the Shopping Process

Goal: Build models and obtain intuitions, in order to drive actions

• What’s easy? What’s hard?
Conceptually easy…: Data a lot richer
Data richer both in quantity, and in quality (e.g., relational data)
… but implementation is hard: Clean data, access data, legacy, …

• An example of building up an intuition about the online shopping process
Q: How long does it take a customer to make a purchase decision?


How long ago did a customer first look at the detail page of an item she purchased? (Conditioned on purchase)
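The question on this slide amounts to computing, for each purchased item, the lag between the customer's first detail-page view and the purchase. A minimal sketch, assuming each record is a hypothetical pair of (list of view dates, purchase date) for one customer and one item:

```python
from datetime import date


def days_since_first_view(view_dates, purchase_date):
    """Days between the earliest detail-page view and the purchase.

    Views after the purchase are ignored; returns None if no view
    on or before the purchase date was recorded.
    """
    prior = [d for d in view_dates if d <= purchase_date]
    if not prior:
        return None
    return (purchase_date - min(prior)).days


def share_viewed_on_earlier_day(records):
    """Among purchases with a recorded view, the fraction first viewed
    on a day before the purchase day."""
    lags = [days_since_first_view(views, bought) for views, bought in records]
    known = [lag for lag in lags if lag is not None]
    return sum(1 for lag in known if lag > 0) / len(known)
```

Conditioning on purchase, as the slide does, means items that were viewed but never bought do not enter this calculation.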

Creating and Maintaining Product Space Awareness

• Q: Time scale of decision making process in shopping?

• Insights
20% of items bought today were looked at before
Shopping process extends significantly across time
Session not a good atomic construct

Other relationships (not presented here)
Product group
Price
Gender

Levels of Analysis: Time Scales and Amount of Data

Level of analysis                   New data per day
Customer level                      1 MB
Purchase level                      10 … 100 MB
Session level (daily aggregates)    1 … 10 GB
Click level                         100 GB … 1 TB
Presentation level*                 10+ TB

*What was displayed, whether or not it was clicked on

Summary: Dimensions of Models

• Time scales

• Flat vs relational

• Static vs dynamic

• Observable vs hidden

• Multi-scale / multi-level models

• Next: Two examples from the ends of the spectrum
No information from the current session
Only information from the current session

No Information from Current Session: Customer Profiling and Segmentation

Navigational style (e.g., Searcher vs browser / clicker)

Level of playfulness, of interest in exploring

Leader vs follower

Degree of focus

Degree of price sensitivity

Degree of time sensitivity

Degree of sophistication

Attitude to complexity

Brand conscious

Early adopter

...

Information Before the First Click: Where Does a Visitor Come From?

Direct: No HTTP-referrer, no Associates tag

Associates: Companies and individuals (1M) (Associates tag)

Megadeals: AOL, MSN,…

Data: Wednesday February 19, 2003

Percentage of all sessions by referrer page

[Chart: referrer categories, as labeled: Other, E-Mail, Search Engines, Megadeals, Associates, Direct. Value labels, as printed: 10%, 4.7%, 1.1%, 11%, 31%, 43%]
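The referrer categories on this slide suggest a simple rule-based classifier. The sketch below follows the slide's definitions of Direct and Associates; everything else is an assumption made for illustration, including the domain lists, the `from_email` flag, and the order in which the rules are checked.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumptions: the slide names AOL and MSN as Megadeals; the domains
# below and the search-engine list are illustrative only.
MEGADEAL_DOMAINS = ("aol.com", "msn.com")
SEARCH_DOMAINS = ("google.com", "yahoo.com")


def _host_matches(host: str, domains) -> bool:
    return any(host == d or host.endswith("." + d) for d in domains)


def classify_referrer(referrer: Optional[str],
                      associates_tag: Optional[str],
                      from_email: bool = False) -> str:
    """Assign a session to one referrer category."""
    if associates_tag:
        return "Associates"
    if from_email:
        return "E-Mail"
    if not referrer:
        return "Direct"  # no HTTP referrer and no Associates tag
    host = urlparse(referrer).hostname or ""
    if _host_matches(host, MEGADEAL_DOMAINS):
        return "Megadeals"
    if _host_matches(host, SEARCH_DOMAINS):
        return "Search Engines"
    return "Other"
```

Because this information is available before the first click, it can inform what is shown on the very first page of a session.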


Only Information from Current Session: Predict Intentions and Modalities of Current Session

• Examples of sessions

Planned vs impulse session

Personal vs job-related session

At home vs at work

Is-in-a-hurry vs has-time-to-kill

Ready to make a decision

• Task: Make dynamic predictions

Prediction about remaining number of pages

Prob (next page is last page)

Prob (buy in this session with coupon), vs
Prob (buy in this session without coupon)

Evaluation:
• Off-line analysis of past data
• On-line experiments
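As a baseline for the dynamic predictions listed above, two of them can be estimated directly from past session lengths, in the spirit of the off-line analysis. This is an assumed empirical baseline, not a model from the talk; it ignores everything about the session except the number of pages seen so far.

```python
def prob_next_page_is_last(session_lengths, pages_so_far):
    """Empirical P(session ends at page k+1 | session reaches page k+1),
    where k = pages_so_far. Returns None if no past session got that far."""
    reached_next = sum(1 for n in session_lengths if n > pages_so_far)
    ended_there = sum(1 for n in session_lengths if n == pages_so_far + 1)
    return ended_there / reached_next if reached_next else None


def expected_remaining_pages(session_lengths, pages_so_far):
    """Mean number of further pages among past sessions that reached
    at least pages_so_far pages. Returns None if there are none."""
    remaining = [n - pages_so_far for n in session_lengths if n >= pages_so_far]
    return sum(remaining) / len(remaining) if remaining else None
```

A model that also uses click content or customer history would be judged against this kind of baseline, first on past data and then in on-line experiments.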

From Observation to Action

[Diagram: Customer Behavior; Customer Intentions, State; Machine Learning, Game Theory; Predictive Models and Algorithms; Company Strategic Goals; Company Actions]

2. From Observation to Action: Remarks on Personalization

• Two “projections”
Targeting: Find customer for product, store, site feature,…
Recommending: Find product etc. for customer

• “Collaborative filtering”
E.g., NetPerceptions
Sad news from August 2003…

How Does Amazon.com Make Recommendations?

Recommendations algorithm now published:

Greg Linden, Brent Smith and Jeremy York: Amazon.com Recommendations: Item-to-Item Collaborative Filtering IEEE Internet Computing (January/February 2003) 7 (1) 76-80

Similarity measure: cosine
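To illustrate the cosine similarity measure in an item-to-item setting, the sketch below treats each item as the set of customers who bought it. It is a toy version for intuition, not the published algorithm's implementation: the `(customer_id, item_id)` input format and the function names are assumptions, and nothing here addresses the scale at which the real system runs.

```python
import math
from collections import defaultdict


def buyers_by_item(purchases):
    """purchases: iterable of (customer_id, item_id) pairs."""
    buyers = defaultdict(set)
    for customer_id, item_id in purchases:
        buyers[item_id].add(customer_id)
    return dict(buyers)


def cosine(buyers_a, buyers_b):
    """Cosine between two binary item vectors, given as sets of buyers."""
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / math.sqrt(len(buyers_a) * len(buyers_b))


def similar_items(purchases, item_id, top_n=3):
    """Items ranked by cosine similarity to item_id (ties broken by ID)."""
    buyers = buyers_by_item(purchases)
    target = buyers.get(item_id, set())
    scored = [(cosine(target, others), other)
              for other, others in buyers.items() if other != item_id]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: (-s[0], s[1]))
    return [other for _, other in ranked[:top_n]]
```

Dividing by the square root of both item counts keeps very popular items from dominating every "also bought" list.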

• Purchase similarities vs Session similarities
Customers who bought … also bought …
Customers who shopped for … also shopped for …

• Some examples that use Amazon.com data
Visualization: The Hive Group (Ben Shneiderman)
Clustering, based on “Customers who bought … also bought …”: orgnet.com (Valdis Krebs)
Relational probabilistic models: CleverSet, Inc. (Bruce D’Ambrosio)

• Use Amazon Web Services SOAP/XML interface to extract data, build models, create visualizations, build stores


Book Network derived from “People who bought … also bought …” data

Source: Valdis Krebs, orgnet.com

Agenda

1. Data
Sources
Characterizations

2. Some remarks on personalization

• 3. Two data sets for the research community
A: Share-the-Love Network
B: Ratings

• 4. Some reflections

• 5. Questions?

3. Two Data Sets for the Research Community

Have you ever received an email like this one:

Subject: Claudia Perlich has sent you a 10% discount

Claudia Perlich (your thoughtful pal) just bought the following item at Amazon.com and is using our Share the Love program to pass along an additional 10% discount to you. Click the links below to see more product information on your discount list and purchase the following item by ...

• Amazon Data Set A: Social Network
“Each time you place an order for books, music, DVDs, or videos with us, we'll offer you the chance to e-mail your friends and give them an additional 10% off the items you bought. (You select which items, of course.)

If any of those people purchases one of those items within a week, you'll receive a credit to use the next time you shop with us!

Your credit will equal the dollar amount of your friend's 10% discount.”

• Amazon Data Set B: Ratings


Amazon Data Set A: Share-the-Love (STL) Network

• Size
4.0M nodes
3.2M edges
1.5M items (distinct)

• Fields
IDs (obfuscated)
  Sender ID
  Receiver ID
ZIP codes
  Sender ZIP
  Receiver ZIP
Date(s)
  When item was bought by sender and STL email was sent
  If purchased, when that item was bought by receiver
Item
  Product ID (“ASIN” = Amazon Standard Identification Number)
  Product group
  Price (as of STL date)
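The fields listed above map naturally onto one record per edge of the network. The sketch below is one possible in-memory representation: the class name, field names, and types are assumptions, since the slides describe the fields but not a file format. The helper applies the program's one-week rule, read here as zero to seven days after the email.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ShareTheLoveEdge:
    """One STL email: a directed edge from sender to receiver for one item."""
    sender_id: str                      # obfuscated
    receiver_id: str                    # obfuscated
    sender_zip: str
    receiver_zip: str
    sent_on: date                       # sender bought item, STL email sent
    receiver_bought_on: Optional[date]  # None if the receiver did not buy
    asin: str
    product_group: str
    price: float                        # as of the STL date


def converted_within_week(edge: ShareTheLoveEdge) -> bool:
    """True if the receiver bought the item within a week of the email."""
    if edge.receiver_bought_on is None:
        return False
    return 0 <= (edge.receiver_bought_on - edge.sent_on).days <= 7
```

With records in this shape, questions such as conversion rate by product group or by sender-to-receiver distance reduce to simple aggregations over edges.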

Rating of Item (1 … 5 stars)

Helpfulness of Review (by other customers)

Amazon Data Set B: Ratings

• Ratings
When a customer writes a review about an item, she is also asked to rate the item by giving it between 1 and 5 stars
Amazon.com makes available a random sample of 4M of these ratings

• Fields
Rater ID (obfuscated)
Date (when rating was submitted)
Item
  Product ID (ASIN)
  Price (as of August 6, 2003)
  Product group
Rating of item
  Number of stars (given by this Rater to this Item)
Helpfulness of review
  Feedback from customers who found this Review “helpful” / “not helpful”, computed from:

How prolific are Amazon.com reviewers?

Some reviewers really are prolific!

More than 1M customers reviewed a single item

What is the distribution over the number of reviews received for an item?

3,800 reviews for Harry Potter 5

Ranking / Ordering / Surfacing / Presenting of reviews

What is the Shape of the Distribution of Number of Stars?

[Figure: four candidate histograms, each with x-axis 1 to 5 stars and y-axis counts]

Guess?!


Amazon Data Set B: Ratings

• Size of training set (Release date: September 30, 2003)
3.5M ratings through August 2003

• Sizes of test sets (Release date: January 31, 2004)
0.5M ratings from same time period as training set
0.5M ratings after end of training set

• Amazon Cup deadline: March 31, 2004

[Figure: distribution of ratings (counts for 1 to 5 stars). Will be revealed at the presentation]

Tasks and Evaluations

For each test point, given:

Rater ID (obfuscated, consistent with training set)

Date (of submission of rating)

Item (ASIN, Price, Product Group)

• Task 1:

Predict distribution across number of stars

Probabilities of observing 1 star, 2 stars, 3 stars, 4 stars, 5 stars

Evaluated by mean log likelihood of observed data given prediction

1/N Σ log (prob (observed number of stars))

• Task 2:

Predict number of stars (point prediction, e.g., 3.27)

Evaluated by (1) mean absolute error and by (2) mean squared error
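The evaluation measures for both tasks are short enough to write out. The sketch below follows the definitions on the slide; the function names and the input layout (a list of five probabilities per test point, star values from 1 to 5) are assumptions.

```python
import math


def mean_log_likelihood(predicted_distributions, observed_stars):
    """Task 1: (1/N) * sum of log P(observed number of stars).

    predicted_distributions: one list of 5 probabilities per test point,
    for 1 to 5 stars in that order. observed_stars: integers 1..5.
    """
    total = sum(math.log(probs[stars - 1])
                for probs, stars in zip(predicted_distributions, observed_stars))
    return total / len(observed_stars)


def mean_absolute_error(predicted_stars, observed_stars):
    """Task 2, measure (1)."""
    return sum(abs(p - o) for p, o in zip(predicted_stars, observed_stars)) / len(observed_stars)


def mean_squared_error(predicted_stars, observed_stars):
    """Task 2, measure (2)."""
    return sum((p - o) ** 2 for p, o in zip(predicted_stars, observed_stars)) / len(observed_stars)
```

Note that a prediction assigning probability zero to the observed rating makes the log likelihood undefined, so a Task 1 submission would need to keep every probability strictly positive.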

Discovery and Data Mining

• Your idea here _____________________________________

• Some suggestions
Characterize / cluster reviewers, items
Find (opinion) leaders, followers
Predict helpfulness of review
Predict which item / product group a customer is likely to review
Understand effect of earlier ratings on later ratings

Note: Might make the text of reviews available at a later stage

Use Amazon Web Services to access product table and other information

Agenda

1. Data sources and characterizations

2. Some remarks on personalization

3. Contest: Two data sets for the research community

• 4. Some reflections

• 5. Questions

4. Some Reflections

• Data
Synthetic, Benchmark, Real (not cleaned), Real-time system

• Stages
Measure/Collect, Describe/Characterize, Predict (+Eval), Act/Control

• Role of experiments
Short-term vs long-term effects

• Goal: Computational marketing
Requires multi-disciplinary effort on behavioral analytics
Machine learning, Data mining, Statistics, Control theory
Decision analytics (normative and descriptive)
Behavioral economics
Game theory

5. Questions?

• Thank you:
Dave Liu (Amazon.com)

Bruce D’Ambrosio (CleverSet, Inc.)

Jimmy Pang (Amazon.com and Stanford)

• Questions?

• Further information, slides, data sets etc.:

Web: www.weigend.com

Email: [email protected], or [email protected]

Mobile phone: +1 (917) 697-3800

This presentation: http://www.weigend.com/WeigendSAS2003.pdf