chapter 12
Data Analysis
Goal today: a briefing. What does the manager who will commission, and pay for, market research need to know on this topic?
Data analysis itself is a specialty, typically conducted by Ph.D.s or specialized M.S. holders.
If limited to tabulations and routinized analyses, it may be handled by MBAs who ‘learned on the job’.
Overview
Preparation of data
Common types of data analysis
Esoteric (but prevalent) analyses
Example of statistical software
Data Preparation
Correct errors: early is better.
Keep the raw paper forms (or the record of input).
Decide what to do about missing data; sometimes, discard the whole form.
Translate paper entries to numbers.
  Professionally, paper forms are in the process of disappearing.
  On a shoestring, paper forms will remain a staple.
  Let Excel be the first step in computerization of paper, but do the analysis in a specialized program.
Data cleaning is mandatory.
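A minimal sketch of the cleaning step, assuming survey forms have already been keyed in as text. The field names (`q1`..`q3`), the 1-5 rating scale, and the "discard if more than one answer is missing" rule are hypothetical choices for illustration, not rules from this briefing. The raw rows are left untouched, in the spirit of keeping the record of input.

```python
def clean_responses(rows, rating_fields, max_missing=1, valid_range=(1, 5)):
    """Return (kept, discarded) copies of rows; the raw rows are not modified."""
    low, high = valid_range
    kept, discarded = [], []
    for row in rows:
        cleaned = dict(row)  # work on a copy so the raw record survives
        for field in rating_fields:
            raw = str(cleaned.get(field, "")).strip()
            if raw.isdigit() and low <= int(raw) <= high:
                cleaned[field] = int(raw)  # translate the entry to a number
            else:
                cleaned[field] = None  # blank, non-numeric, or out of range
        missing = sum(cleaned[f] is None for f in rating_fields)
        # Hypothetical rule: discard the whole form if too much is missing.
        (discarded if missing > max_missing else kept).append(cleaned)
    return kept, discarded


# Hypothetical keyed-in forms.
raw_rows = [
    {"id": "001", "q1": "4", "q2": "5", "q3": "3"},
    {"id": "002", "q1": "", "q2": "7", "q3": "2"},    # one blank, one out of range
    {"id": "003", "q1": "2", "q2": " 3 ", "q3": ""},  # one blank only
]
kept, discarded = clean_responses(raw_rows, ["q1", "q2", "q3"])
```

Form 002 has two unusable answers and is discarded; form 003 is kept with `q3` recorded as missing.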
Common Types of Data Analysis
1. Cross-tabulation; comparison of proportions. Three-way is about the limit for eyeballing, but n-way can be statistically analyzed.
2. Comparison of means. Not limited to two groups, occasions, etc., but can include n groups.
3. Prediction via regression. Includes association via correlation (the two-variable case).
Note on Statistical Significance
Familiar phrases: “This difference is significant at the .05 level,” “p < .01,” “NS” (not significant).
Many research questions reduce to, “Is there a real difference here?”
Statistical significance is a set of conventions designed to determine whether a nominal difference represents a real difference.
‘Real’ = a difference that makes a difference to some conclusion or decision; a difference that matters; a trustworthy difference.
Statistical Significance (cont’d)
Virtually any kind of difference found in data can be tested using a ‘test statistic’.
  Means: t test, F test, Z score
  Proportions: chi-square
Test statistics have sampling or probability distributions.
‘Statistically significant’ means ‘unlikely (improbable) to have occurred by chance’.
By convention, the threshold for ‘unlikely’ is ‘fewer than 5 times out of 100’, i.e., .05.
‘Real differences’, therefore, are those unlikely to have occurred by chance.
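To make ‘unlikely to have occurred by chance’ concrete, here is a minimal sketch of an exact permutation test. This is a swapped-in illustration, not one of the test statistics named above: it reshuffles the group labels every possible way and counts how often chance alone produces a difference in means at least as large as the observed one. The two groups of scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every way of relabeling len(group_a) of the pooled values as "A".
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores for two groups of five respondents.
scores_a = [12, 15, 14, 16, 13]
scores_b = [9, 8, 11, 10, 7]
p_value = permutation_p_value(scores_a, scores_b)
significant = p_value < 0.05  # the conventional .05 threshold
```

Only 2 of the 252 possible relabelings are as extreme as the observed split, so the difference would be called significant at the .05 level.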
Cross-Tabulations & Proportions
Examples
  Agreement percentages by group
  Demographic composition by group (e.g., gender)
  Presence/absence of a behavior by group (readership, attendance, usage, possession)
  Correspondence across samples (e.g., whether the sample has the same proportion of small and large families as the Census)
Test statistics
  Chi-square (for 2-way tabulations)
  Log odds or logit (for multi-way tabulations)
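A minimal sketch of the chi-square test for a 2-way tabulation, worked by hand for the 2×2 case. The counts (readers vs. non-readers in two groups) are hypothetical, and no continuity correction is applied, which is a simplifying assumption. For a 2×2 table (one degree of freedom) the p-value can be computed from the standard library's complementary error function.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square and p-value for a 2x2 table of counts (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if group and behavior were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are groups, columns are readers / non-readers.
readership = [[30, 20],
              [15, 35]]
chi2, p_value = chi_square_2x2(readership)
```

Here 60% of the first group are readers versus 30% of the second; the test asks whether a gap that size is plausible by chance with 50 respondents per group.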
Comparison of Means
Examples: quantities that vary across a wide range
  Income, age
  Expenditure
  Time and distance (hours of TV, miles commuting)
  Repeated behaviors (glasses of wine per week)
  Ratings (especially averages of multiple items)
Test statistics
  t test (two groups or two occasions)
  F test [ANOVA] (multiple groups or occasions)
Caveat: never work with means without also examining the distribution. Averages can be so misleading …
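A minimal sketch of both points: a t statistic for two groups, and why the distribution needs a look first. The data (glasses of wine per week) are hypothetical, and the t statistic shown is Welch's unequal-variance version, which is an assumption; the briefing does not say which variant is meant. Turning t into a p-value needs the t distribution from a statistics package, so that step is omitted here.

```python
import math
from statistics import mean, median, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical glasses of wine per week; group_b contains one heavy drinker.
group_a = [1, 2, 2, 3, 2]
group_b = [1, 2, 2, 3, 22]

mean_a, mean_b = mean(group_a), mean(group_b)
median_a, median_b = median(group_a), median(group_b)
t_stat = welch_t(group_a, group_b)
```

The means (2 vs. 6) suggest group B drinks three times as much, yet the medians are identical: a single respondent accounts for the entire gap, and the t statistic is correspondingly small.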
Regression Analysis
Examples
  Market sales as a function of prerequisites:
    Video game units = f(# players installed base, # new players sold, # tweener males, GDP)
  Any quantity of interest expressed as an outcome of other factors, of whatever provenance:
    Customer satisfaction = f(service contract, size of firm, # changes on account team, cost of system)
Goal may be to discover drivers (causal analysis), or to find precursors (prediction).
Test statistics: R², beta coefficient
Here, scatter plots, like distributions in the case of means, serve as a safety check.
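A minimal sketch of regression in the simplest (one-predictor) case, fitted by ordinary least squares. The variable names and numbers are hypothetical; a real model such as the video game example would use several predictors and a statistics package. With a single predictor, R² equals the squared correlation between the two variables.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares slope, intercept, and R-squared for y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)  # squared correlation
    return slope, intercept, r_squared


# Hypothetical data: installed base (millions) and game units sold (millions).
installed_base = [1, 2, 3, 4, 5]
units_sold = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = simple_regression(installed_base, units_sold)
```

The slope is the estimated change in units sold per additional million of installed base; R² is the share of variation in sales that the straight line accounts for.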
More Esoteric Types of Data Analysis
Discriminant analysis: What factors best differentiate these two groups?
  e.g., product users vs. non-users
Cluster analysis: What clumps are there, based on multiple characteristics?
  e.g., how many different types of customers does a bank have?
Multidimensional scaling: What’s similar and what’s dissimilar, and what underlying dimensions account for this?
  e.g., which brands are associated/disassociated?
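To give a feel for what "finding clumps" involves in cluster analysis, here is a minimal sketch of k-means, one common clustering method among many; the briefing does not name a specific algorithm. The bank-customer measurements, the choice of two clusters, and the starting centroids are all hypothetical.

```python
import math


def k_means(points, centroids, iterations=20):
    """Assign points to the nearest centroid, then recompute; repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = average of its cluster (unchanged if cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (balance in $10k, branch visits per month).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = k_means(customers, centroids=[(1, 1), (8, 8)])
```

With real data the number of clusters is itself a judgment call, which is one reason to ask the analyst how it was chosen.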
Content Analysis
Content analysis is a means of quantifying qualitative data.
Judges rate or count qualities in accordance with coding rules:
  Brand favorable or unfavorable comments
  Attributes or problems referenced
  Social or personal use of the product
Use of multiple judges allows calculation of the reliability of judgments.
The result is frequency counts that can be analyzed statistically, assuming you have a probability sample.
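A minimal sketch of how the reliability of judgments can be calculated for two judges, using Cohen's kappa. Kappa is one common agreement measure and is an assumption here; the briefing does not specify one. It corrects raw agreement for the agreement expected by chance. The codes assigned to the ten comments are hypothetical.

```python
from collections import Counter


def cohens_kappa(judge_a, judge_b):
    """Chance-corrected agreement between two judges coding the same items."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    counts_a, counts_b = Counter(judge_a), Counter(judge_b)
    categories = set(judge_a) | set(judge_b)
    # Agreement expected if each judge coded at random at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten comments: brand favorable / unfavorable.
judge_a = ["fav"] * 5 + ["unfav"] * 5
judge_b = ["fav"] * 4 + ["unfav"] * 5 + ["fav"]
kappa = cohens_kappa(judge_a, judge_b)
```

The judges agree on 8 of 10 comments, but half of that agreement would be expected by chance, so kappa comes out lower than the raw 80%.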
Managerial Perspective on Data Analysis
The key skills are:
  The ability to spot patterns in data (as in the exhibits attached to Harvard cases)
  The ability to infer an explanation for a perceived pattern
  Having a mental model of what causes what (so that you can identify what data has to be obtained)
Statistical analysis will generally be outsourced. The relevant managerial skills are:
  Knowing how to pick good consultants
  A healthy skepticism, and enough grounding in statistics to be able to ask pointed questions
  Never taking nominal differences at face value