chapter 12

Data Analysis

Goal today: a briefing. What does the manager who will commission, and pay for, market research need to know on this topic?

Data analysis itself is a specialty, typically conducted by Ph.D.s or by specialists with an M.S.

If limited to tabulations and routinized analyses, it may be handled by MBAs who ‘learned on the job’.

Overview

Preparation of data
Common types of data analysis
Esoteric (but prevalent) analyses
Example of statistical software

Data Preparation

Correct errors: early is better

Keep the raw paper forms (or the record of input)

Decide what to do about missing data; sometimes, discard the whole form

Translate paper entries to numbers. Professionally, paper forms are in the process of disappearing; on a shoestring, paper forms will remain a staple.

Let Excel be the first step in computerizing paper forms, but do the analysis in a specialized program

Data cleaning is mandatory
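The preparation steps above can be sketched in Python. Everything specific here is a hypothetical assumption for illustration, not a prescribed procedure: forms keyed in as dictionaries of 1-5 ratings, out-of-range entries treated as keying errors (and therefore missing), and a form discarded when more than half its items are missing. The function and form names are made up.

```python
# Hypothetical rules for illustration: a 1-5 rating scale, and a form is
# discarded when more than half of its items are missing or invalid.
VALID_RATINGS = range(1, 6)
MAX_MISSING_SHARE = 0.5


def clean_form(form):
    """Return a cleaned copy; the raw record of input is left untouched."""
    # Out-of-range entries (keying errors such as a 7 on a 1-5 scale) become missing.
    return {item: (v if v in VALID_RATINGS else None) for item, v in form.items()}


def clean_forms(forms):
    """Clean every form and set aside those with too much missing data."""
    kept, discarded = [], []
    for form_id, form in forms.items():
        cleaned = clean_form(form)
        missing = sum(v is None for v in cleaned.values())
        if missing / len(cleaned) > MAX_MISSING_SHARE:
            discarded.append(form_id)  # "sometimes, discard the whole form"
        else:
            kept.append((form_id, cleaned))
    return kept, discarded


raw_forms = {
    "form-001": {"q1": 4, "q2": 5, "q3": 3, "q4": 2},
    "form-002": {"q1": 7, "q2": None, "q3": None, "q4": 1},  # 7 is a keying error
    "form-003": {"q1": 2, "q2": 9, "q3": 4, "q4": 4},        # one bad item, kept
}
kept, discarded = clean_forms(raw_forms)
print(kept)
print(discarded)
```

Returning cleaned copies rather than editing in place mirrors the advice to keep the raw record of input.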

Common Types of Data Analysis

1. Cross-tabulation; comparison of proportions. Three-way is about the limit for eyeballing, but n-way tables can be statistically analyzed.

2. Comparison of means. Not limited to two groups, occasions, etc.; can include n groups.

3. Prediction via regression. Includes association via correlation (the two-variable case).

Note on Statistical Significance

Familiar phrases: “This difference is significant at the .05 level,” “p < .01,” “NS” (not significant)

Many research questions reduce to, “Is there a real difference here?”

Statistical significance is a set of conventions designed to determine whether a nominal difference represents a real difference. ‘Real’ here means a difference that makes a difference to some conclusion or decision; a difference that matters; a trustworthy difference.

Statistical Significance (cont’d)

Virtually any kind of difference found in data can be tested using a ‘test statistic’: for means, the t test, F test, or Z score; for proportions, chi-square.

Test statistics have known sampling (probability) distributions, so we can judge how likely a given value would be if only chance were at work.

‘Statistically significant’ means ‘unlikely (improbable) to have occurred by chance’, that is, a difference this large would rarely arise from sampling variation alone if there were no true difference.

By convention, the threshold for ‘unlikely’ is ‘fewer than 5 times out of 100’, i.e., .05.

‘Real differences’, therefore, are those unlikely to have occurred by chance.
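To make ‘occurred by chance’ concrete, here is a Python sketch of a permutation (randomization) test. This is a swapped-in technique, not one of the t, F, Z, or chi-square statistics named above: it reshuffles the group labels many times and counts how often chance alone produces a mean difference at least as large as the one observed. The function name, the ratings, and the 10,000-shuffle default are hypothetical.

```python
import random

ALPHA = 0.05  # the conventional "fewer than 5 times out of 100" threshold


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often randomly relabeling the respondents yields a mean
    difference at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend group membership is pure chance
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical 1-10 ratings of two ad versions.
version_a = [7, 8, 6, 9, 8, 7, 9, 8]
version_b = [5, 6, 4, 6, 5, 7, 5, 6]
p = permutation_p_value(version_a, version_b)
print(f"p = {p:.4f}; significant at .05: {p < ALPHA}")
```

The returned share plays the role of the p value: if it falls below .05, the observed difference would arise by chance fewer than 5 times out of 100.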

Cross-Tabulations & Proportions

Examples:
Agreement percentages by group
Demographic composition by group (e.g., gender)
Presence/absence of a behavior by group (readership, attendance, usage, possession)
Correspondence across samples (e.g., whether the sample has the same proportion of small and large families as the Census)

Test statistics: chi-square (for two-way tabulations); log odds or logit models (for multi-way tabulations)
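For a two-way tabulation, the chi-square statistic compares each observed cell count with the count expected if the row and column variables were unrelated (row total × column total ÷ grand total). A minimal Python sketch follows; the readership-by-gender counts are hypothetical. For a 2 × 2 table (1 degree of freedom), the .05 critical value is 3.841.

```python
CRITICAL_05_DF1 = 3.841  # chi-square critical value at .05 with 1 degree of freedom


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical readership by gender: columns are (reads, does not read).
readership = [
    [30, 70],  # men
    [50, 50],  # women
]
chi2, df = chi_square(readership)
print(f"chi-square = {chi2:.2f}, df = {df}, significant at .05: {chi2 > CRITICAL_05_DF1}")
```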

Comparison of Means

Examples: quantities that vary across a wide range
Income, age
Expenditure
Time and distance (hours of TV, miles commuting)
Repeated behaviors (glasses of wine per week)
Ratings (especially averages of multiple items)

Test statistics: t test (two groups or two occasions); F test [ANOVA] (multiple groups or occasions)

Caveat: never work with means without also examining the distribution; averages can be badly misleading.
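A Python sketch of both comparison-of-means statistics, plus a quick distribution summary in the spirit of the caveat above. It assumes Student's pooled-variance t test (one common form; Welch's unequal-variance version is another) and a one-way ANOVA. The function names and all data are hypothetical.

```python
import statistics


def describe(values):
    """Summarize the distribution, not just the mean."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    std_err = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, n_a + n_b - 2


def one_way_f(*groups):
    """One-way ANOVA F statistic and its (between, within) degrees of freedom."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    group_means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)


# Hypothetical weekly hours of TV for two customer segments.
hours_a = [10, 12, 9, 14, 11, 13]
hours_b = [8, 9, 7, 10, 9, 8]
print(describe(hours_a), describe(hours_b))
t, df = pooled_t(hours_a, hours_b)
f, dfs = one_way_f(hours_a, hours_b)
print(f"t = {t:.3f} (df = {df}); F = {f:.3f} (df = {dfs})")

# Why the caveat matters: one large value drags the mean far from the median.
incomes = [30, 35, 40, 45, 400]  # hypothetical incomes in $000s
print(describe(incomes))
```

With exactly two groups, the ANOVA F equals the square of the pooled t, so the two tests reach the same conclusion; F earns its keep when there are three or more groups.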

Regression Analysis

Examples: market sales as a function of prerequisites

Video game units = f(installed base of players, # new players sold, # tweener males, GDP)

Any quantity of interest expressed as an outcome of other factors, of whatever provenance

Customer satisfaction = f(service contract, size of firm, # changes on account team, cost of system)

The goal may be to discover drivers (causal analysis) or to find precursors (prediction)

Test statistics: R² and beta coefficients (each tested for significance). Here, scatter plots serve as a safety check, just as distributions do in the case of means.
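In the two-variable case, a least-squares regression and a correlation can be computed directly. The Python sketch below fits y = intercept + slope × x and reports R² (the share of variance in y explained). The slope is the unstandardized coefficient; in this two-variable case the standardized beta equals the correlation r, and R² equals r². Function names and data are hypothetical.

```python
import statistics


def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns R-squared."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, 1 - ss_resid / ss_total


def pearson_r(x, y):
    """Correlation coefficient: the two-variable measure of association."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: advertising spend ($000s) vs. unit sales (000s).
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
intercept, slope, r_squared = simple_regression(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend, R^2 = {r_squared:.2f}")
print(f"r = {pearson_r(spend, sales):.3f}")  # r squared equals R^2 here
```

These numbers summarize fit but say nothing about the shape of the data; very different scatter plots can share the same R², which is why the plot is the safety check.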

More Esoteric Types of Data Analysis

Discriminant analysis: what factors best differentiate these two groups? E.g., product users vs. non-users

Cluster analysis: what clumps are there, based on multiple characteristics? E.g., how many different types of customers does a bank have?

Multidimensional scaling: what’s similar and what’s dissimilar, and what underlying dimensions account for this? E.g., which brands are associated/disassociated?
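Cluster analysis covers many algorithms and the slide does not name one. As one common example, here is a Python sketch of k-means (Lloyd's algorithm): assign each customer to the nearest cluster center, move each center to the mean of its members, and repeat until nothing changes. The customer data are hypothetical, and the variables are left unscaled for simplicity; in practice they are usually standardized first so that no single variable dominates the distances. The analyst chooses k, typically by comparing solutions for several values.

```python
import math
import random


def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins the nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical bank customers: (age in years, average balance in $000s).
customers = [(25, 2), (27, 3), (23, 1), (26, 2), (60, 50), (62, 55), (58, 48), (65, 60)]
centroids, labels = k_means(customers, k=2)
print(labels)
print(centroids)
```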

Content Analysis

Content analysis is a means of quantifying qualitative data

Judges rate or count qualities in accordance with coding rules, e.g.: favorable or unfavorable comments about the brand; attributes or problems referenced; social or personal use of the product

Using multiple judges allows the reliability of their judgments to be calculated

The result is frequency counts that can be analyzed statistically, assuming you have a probability sample
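The slide does not say how judge reliability is calculated. One widely used measure for two judges is Cohen's kappa, sketched below in Python: observed agreement corrected for the agreement expected if each judge assigned codes independently at their own rates (1 means perfect agreement, 0 means no better than chance). The codes and judges are hypothetical.

```python
from collections import Counter


def cohens_kappa(judge_a, judge_b):
    """Two judges' agreement on categorical codes, corrected for chance."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    # Agreement expected if each judge assigned codes independently at their own rates.
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n ** 2
    if expected == 1:  # both judges used one identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight brand comments: F = favorable, U = unfavorable, N = neutral.
judge_1 = ["F", "F", "U", "N", "F", "U", "U", "N"]
judge_2 = ["F", "U", "U", "N", "F", "U", "F", "N"]
print(f"kappa = {cohens_kappa(judge_1, judge_2):.2f}")
print(Counter(judge_1))  # frequency counts for later statistical analysis
```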

Managerial Perspective on Data Analysis

The key skills are:
The ability to spot patterns in data (as in the exhibits attached to Harvard cases)
The ability to infer an explanation for a perceived pattern
Having a mental model of what causes what (so that you can identify what data has to be obtained)

Statistical analysis will generally be outsourced. The relevant managerial skills are:
Knowing how to pick good consultants
A healthy skepticism, and enough grounding in statistics to be able to ask pointed questions
Never taking nominal differences at face value


Technology