abdm4064 week 11 data analysis

Data Analysis: ABDM4064 BUSINESS RESEARCH, by Stephen Ong, Principal Lecturer (Specialist), Visiting Professor, Shenzhen University

Upload: stephen-ong

Post on 27-Jan-2015


Page 1: Abdm4064 week 11 data analysis

Data Analysis

ABDM4064 BUSINESS RESEARCH

by Stephen Ong

Principal Lecturer (Specialist), Visiting Professor, Shenzhen University

Page 2: Abdm4064 week 11 data analysis

LEARNING OUTCOMES

After studying this chapter, you should be able to:

1. Know when a response is really an error and should be edited

2. Appreciate coding of pure qualitative research

3. Understand the way data are represented in a data file

4. Understand the coding of structured responses including a dummy variable approach

5. Appreciate the ways that technological advances have simplified the coding process

Page 3: Abdm4064 week 11 data analysis

6. Know what descriptive statistics are and why they are used

7. Create and interpret simple tabulation tables

8. Understand how cross-tabulations can reveal relationships

9. Perform basic data transformations

10. List different computer software products designed for descriptive statistical analysis

11. Understand a researcher’s role in interpreting the data

12. Implement the hypothesis-testing procedure

13. Use p-values to assess statistical significance


Page 4: Abdm4064 week 11 data analysis

14. Test a hypothesis about an observed mean compared to some standard

15. Know the difference between Type I and Type II errors

16. Know when a univariate χ2 test is appropriate and how to conduct one

17. Recognize when a bivariate statistical test is appropriate

18. Calculate and interpret a χ2 test for a contingency table

19. Calculate and interpret an independent samples t-test comparing two means

20. Understand the concept of analysis of variance (ANOVA)

21. Interpret an ANOVA table


Page 5: Abdm4064 week 11 data analysis

22. Apply and interpret simple bivariate correlations

23. Interpret a correlation matrix

24. Understand simple (bivariate) regression

25. Understand the least-squares estimation technique

26. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients

27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis

28. Interpret results from multiple regression analysis

29. Interpret results from multivariate analysis of variance (MANOVA)


Page 6: Abdm4064 week 11 data analysis

30. Interpret basic exploratory factor analysis results

31. Know what multiple discriminant analysis can be used to do

32. Understand how cluster analysis can identify market segments


Page 7: Abdm4064 week 11 data analysis

Remember this:

Garbage in, garbage out! If data are collected improperly or coded incorrectly, then the research results are “garbage”.

Page 8: Abdm4064 week 11 data analysis

Stages of Data Analysis

Raw Data: The unedited responses from a respondent exactly as indicated by that respondent.

Nonrespondent Error: Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.

Data Integrity: The notion that the data file actually contains the information that the researcher is trying to obtain to adequately address research questions.

Page 9: Abdm4064 week 11 data analysis

EXHIBIT 19.1 Overview of the Stages of Data Analysis

Page 10: Abdm4064 week 11 data analysis

Editing

Editing: The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.

E.g. “How long have you stayed at your current address?” Answer: “45”.

The researcher may need to adjust or reconstruct such responses.

Field Editing (useful in personal interviews): Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.

Page 11: Abdm4064 week 11 data analysis

In-House Editing

A rigorous editing job performed by a centralized office staff.

Page 12: Abdm4064 week 11 data analysis

Editing – what to do?

Checking for Consistency

Respondents match the defined population – e.g. SBS?

Check for consistency within the data collection framework – e.g. items listed by the respondents are within the definition.

Taking Action When Response is Obviously in Error

Change or correct responses only when there are multiple pieces of evidence for doing so.

Editing Technology

Computer routines can check for consistency automatically.

Page 13: Abdm4064 week 11 data analysis

Editing for Completeness

Item Nonresponse: The technical term for an unanswered question on an otherwise complete questionnaire, resulting in missing data.

Most of the time the researcher does nothing about it. Sometimes, however, the question is linked to another question, so the researcher has to fill in the blank.

Plug Value: An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.

The choice of value is based on a predetermined decision rule, e.g. take an average value or a neutral value.

Several choices:

Leave it blank.

Plug in alternate choices.

Randomly select an answer.

Impute a missing value.

Page 14: Abdm4064 week 11 data analysis

Impute

To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information.

I.e. based on the respondent’s choices to other questions.

Page 15: Abdm4064 week 11 data analysis

Editing for Completeness (cont’d)

What about missing data? Many statistical software programs require complete data for an analysis to take place.

List-wise deletion: The entire record for a respondent who has left a response missing is excluded from use in statistical analysis.

Pair-wise deletion: Only the actual variables for a respondent that do not contain information are eliminated from use in statistical analysis.
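The difference between list-wise deletion, pair-wise deletion, and a mean plug value can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the records, the variable names, and the helper functions `listwise`, `pairwise`, and `plug_mean` are all hypothetical, and `None` is assumed to mark an item nonresponse.

```python
from statistics import mean

# Hypothetical survey records; None marks an item nonresponse.
records = [
    {"age": 34, "income": 52, "satisfaction": 4},
    {"age": 41, "income": None, "satisfaction": 5},
    {"age": 29, "income": 38, "satisfaction": None},
    {"age": 50, "income": 60, "satisfaction": 3},
]

def listwise(rows):
    """Keep only respondents with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, var_a, var_b):
    """Keep respondents who answered both variables used in one analysis."""
    return [r for r in rows if r[var_a] is not None and r[var_b] is not None]

def plug_mean(rows, var):
    """Replace blanks on one variable with the mean of the observed answers."""
    fill = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: fill if r[var] is None else r[var]} for r in rows]

complete = listwise(records)                     # drops both incomplete records
age_income = pairwise(records, "age", "income")  # drops only the record missing income
plugged = plug_mean(records, "income")           # fills the blank income with the mean

print(len(complete), len(age_income), plugged[1]["income"])
```

List-wise deletion keeps fewer cases than pair-wise deletion here, which is the usual trade-off: a consistent sample across all analyses versus a larger sample for each one.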

Page 16: Abdm4064 week 11 data analysis

Please take note:

When a questionnaire has too many missing answers, it may not be suitable for the planned data analysis. In such a situation, that particular questionnaire has to be dropped from the sample.

Page 17: Abdm4064 week 11 data analysis

Facilitating the Coding Process

Editing and Tabulating “Don’t Know” Answers

Legitimate don’t know (no opinion)

Reluctant don’t know (refusal to answer)

Confused don’t know (does not understand)

Page 18: Abdm4064 week 11 data analysis

Editing (cont’d)

Pitfalls of Editing

Allowing subjectivity to enter into the editing process.

Data editors should be intelligent, experienced, and objective.

A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.

Pretesting Edit

Editing during the pretest stage can prove very valuable for improving questionnaire format and for identifying poor instructions or inappropriate question wording.

Page 19: Abdm4064 week 11 data analysis

Coding Qualitative Responses

Coding: The process of assigning a numerical score or other character symbol to previously edited data.

Codes: Rules for interpreting, classifying, and recording data in the coding process. Also, the actual numerical or other character symbols assigned to raw data.

Dummy Coding: Numeric “1” or “0” coding where each number represents an alternate response such as “female” or “male.”

If k is the number of categories for a qualitative variable, k-1 dummy variables are needed.
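The k-1 rule can be illustrated with a short sketch. The function `dummy_code`, the `regions` data, and the choice of reference category are hypothetical and serve only to show that a three-category variable needs two 0/1 columns.

```python
def dummy_code(values, reference):
    """Return k-1 dummy (0/1) columns, omitting the reference category."""
    categories = sorted(set(values))
    kept = [c for c in categories if c != reference]
    return {c: [1 if v == c else 0 for v in values] for c in kept}

# Hypothetical qualitative variable with k = 3 categories.
regions = ["north", "south", "east", "south", "north"]
dummies = dummy_code(regions, reference="east")

# A respondent coded 0 on every dummy belongs to the reference category.
print(dummies)
```

The omitted category is not lost: it is the case where all dummy variables equal 0.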

Page 20: Abdm4064 week 11 data analysis

Data File Terminology

Field: A collection of characters that represents a single type of data, usually a variable.

String Characters: Computer terminology to represent formatting a variable using a series of alphabetic characters (nonnumeric characters) that may form a word.

Record: A collection of related fields that represents the responses from one sampling unit.

Page 21: Abdm4064 week 11 data analysis

Data File Terminology (cont’d)

Data File: The way a data set is stored electronically in spreadsheet-like form in which the rows represent sampling units and the columns represent variables.

Value Labels: Unique labels assigned to each possible numeric code for a response.

Page 22: Abdm4064 week 11 data analysis

Code Construction

Two Basic Rules for Coding Categories:

1. They should be exhaustive, meaning that a coding category should exist for all possible responses.

2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.

Test Tabulation – especially useful for open-ended questions

Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.

Purpose is to preliminarily identify the stability and distribution of answers that will determine a coding scheme.

Page 23: Abdm4064 week 11 data analysis

Test Tabulation

E.g.

1st respondent: I don’t like to use Facebook because it wastes time.

2nd respondent: I don’t know what Facebook is.

3rd respondent: Facebook takes a lot of my time.

Based on the above 3 answers, you can have 2 groups of answers:

1st group: Time factor

2nd group: No knowledge of Facebook

Page 24: Abdm4064 week 11 data analysis

Devising the Coding Scheme

A coding scheme should not be too elaborate.

The coder’s task is only to summarize the data.

Categories should be sufficiently unambiguous that coders will not classify items in different ways.

Code book: Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.

Page 25: Abdm4064 week 11 data analysis

The Nature of Descriptive Analysis

Descriptive Analysis: The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.

Histogram: A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.

Page 26: Abdm4064 week 11 data analysis

EXHIBIT 20.1 Levels of Scale Measurement and Suggested Descriptive Statistics

Page 27: Abdm4064 week 11 data analysis

Creating and Interpreting Tabulation

Tabulation: The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.

Tallying is the term used when the process is done by hand.

Frequency Table: A table showing the different ways respondents answered a question. Sometimes called a marginal tabulation.
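A simple tabulation can be sketched as follows. The `answers` list is invented for illustration; the table reports the frequency, percentage, and cumulative percentage for each response category.

```python
from collections import Counter

# Hypothetical answers to a single closed-ended question.
answers = ["yes", "no", "yes", "yes", "don't know", "no", "yes", "yes"]

counts = Counter(answers)
n = len(answers)
cumulative = 0.0
table = []
for category, freq in counts.most_common():
    pct = 100 * freq / n      # share of all respondents
    cumulative += pct         # running total of the percentages
    table.append((category, freq, pct, cumulative))
    print(f"{category:<12}{freq:>5}{pct:>8.1f}{cumulative:>8.1f}")
```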

Page 28: Abdm4064 week 11 data analysis

Frequency Table Example

Page 29: Abdm4064 week 11 data analysis

Cross-Tabulation

Cross-Tabulation: Addresses research questions involving relationships among multiple less-than-interval (nominal or ordinal) variables.

Results in a combined frequency table displaying one variable in rows and another variable in columns.

Contingency Table: A data matrix that displays the frequency of some combination of responses to multiple variables.

Marginals: Row and column totals in a contingency table, which are shown in its margins.
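A contingency table with its marginals can be built from paired responses, as in the sketch below. The `responses` data are hypothetical; row percentages are computed with each row total as the base.

```python
from collections import Counter

# Hypothetical (sex, shops_at_store) answers, invented for illustration.
responses = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("male", "no"),
]

cells = Counter(responses)                    # frequency of each combination
rows = sorted({sex for sex, _ in responses})
cols = sorted({ans for _, ans in responses})

# Marginals: row and column totals.
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}

# Row percentages: each cell as a share of its row total.
row_pct = {(r, c): 100 * cells[(r, c)] / row_totals[r] for r in rows for c in cols}

print(f"{'':<8}" + "".join(f"{c:>6}" for c in cols) + f"{'total':>7}")
for r in rows:
    print(f"{r:<8}" + "".join(f"{cells[(r, c)]:>6}" for c in cols) + f"{row_totals[r]:>7}")
print(f"{'total':<8}" + "".join(f"{col_totals[c]:>6}" for c in cols) + f"{len(responses):>7}")
```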

Page 30: Abdm4064 week 11 data analysis

EXHIBIT 20.2 Cross-Tabulation Tables from a Survey Regarding AIG and Government Bailouts

Page 31: Abdm4064 week 11 data analysis

EXHIBIT 20.3 Different Ways of Depicting the Cross-Tabulation of Biological Sex and Target Patronage

Page 32: Abdm4064 week 11 data analysis

Cross-Tabulation (cont’d)

Percentage Cross-Tabulations

Statistical base – the number of respondents or observations (in a row or column) used as a basis for computing percentages.

Elaboration and Refinement

Elaboration analysis – an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.

Moderator variable – a third variable that changes the nature of a relationship between the original independent and dependent variables.

Page 33: Abdm4064 week 11 data analysis

EXHIBIT 20.4 Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”

Page 34: Abdm4064 week 11 data analysis

Cross-Tabulation (cont’d)

How Many Cross-Tabulations?

Every possible response becomes a possible explanatory variable.

When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.

Quadrant Analysis: An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.

Importance-performance analysis

Page 35: Abdm4064 week 11 data analysis

EXHIBIT 20.5 An Importance-Performance or Quadrant Analysis of Hotels

Page 36: Abdm4064 week 11 data analysis

Data Transformation

Data Transformation:

Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.

Bimodal

Page 37: Abdm4064 week 11 data analysis

Problems with Data Transformations

Median Split: Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another.

The approach is best applied only when the data do indeed exhibit bimodal characteristics.

Inappropriate collapsing of continuous variables into categorical variables ignores the information contained within the untransformed values.
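The information loss from a median split can be seen in a small sketch. The `scores` are hypothetical and deliberately unimodal; after the split, two nearly identical respondents land in different groups while two very different respondents land in the same one.

```python
from statistics import median

# Hypothetical interval-scaled scores (no clear bimodal pattern).
scores = [2, 3, 3, 4, 5, 6, 7, 8, 9, 9]

cut = median(scores)
groups = ["low" if s < cut else "high" for s in scores]

for s, g in zip(scores, groups):
    print(s, g)

# Scores 5 and 6 differ by one point but fall in different groups;
# scores 6 and 9 differ by three points but are treated as identical.
```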

Page 38: Abdm4064 week 11 data analysis

EXHIBIT 20.6 Bimodal Distributions Are Consistent with Transformations into Categorical Values

Page 39: Abdm4064 week 11 data analysis

EXHIBIT 20.7 The Problem with Median Splits with Unimodal Data

Page 40: Abdm4064 week 11 data analysis

Index Numbers

Index Numbers: Scores or observations recalibrated to indicate how they relate to a base number.

Price indexes: Represent simple data transformations that allow researchers to track a variable’s value over time and compare a variable(s) with other variables.

Recalibration allows scores or observations to be related to a certain base period or base number.
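Recalibrating a series to a base period can be sketched as below. The `prices` series and the helper `to_index` are hypothetical; the base period is set to 100, a common convention that is assumed here rather than taken from the slides.

```python
def to_index(series, base_period):
    """Express every value as a percentage of the base-period value."""
    base = series[base_period]
    return {period: round(100 * value / base, 1) for period, value in series.items()}

# Hypothetical yearly prices.
prices = {2010: 40.0, 2011: 42.0, 2012: 46.0, 2013: 50.0}
index = to_index(prices, base_period=2010)

print(index)
```

An index of 125.0 for 2013 reads as "25 percent above the base period", which makes series with different units directly comparable.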

Page 41: Abdm4064 week 11 data analysis

EXHIBIT 20.8 Hours of Television Usage per Week

Page 42: Abdm4064 week 11 data analysis

Calculating Rank Order

Rank Order: Ranking data can be summarized by performing a data transformation.

The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
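The frequency-times-rank transformation can be sketched as follows. The options and their rank frequencies are invented, and the sketch assumes that rank 1 means "most preferred", so the lowest total score indicates the most preferred option.

```python
# Hypothetical data: for each option, how many respondents gave it rank 1, 2, 3.
frequencies = {
    "Option A": [5, 3, 2],
    "Option B": [3, 5, 2],
    "Option C": [2, 2, 6],
}

# Multiply each frequency by its ranking score and sum per option.
scores = {
    option: sum(rank * count for rank, count in enumerate(counts, start=1))
    for option, counts in frequencies.items()
}

# Lowest total = most preferred (assuming rank 1 is best).
ordered = sorted(scores, key=scores.get)
print(scores, ordered)
```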

Page 43: Abdm4064 week 11 data analysis

EXHIBIT 20.9 Executive Rankings of Potential Conference Destinations

Page 44: Abdm4064 week 11 data analysis

EXHIBIT 20.10 Frequencies of Conference Destination Rankings

Page 45: Abdm4064 week 11 data analysis

EXHIBIT 20.11 Pie Charts Work Well with Tabulations and Cross-Tabulations

Page 46: Abdm4064 week 11 data analysis

Computer Programs for Analysis

Statistical Packages

Spreadsheets: Excel

Statistical software: SAS, SPSS (Statistical Package for Social Sciences), MINITAB

Page 47: Abdm4064 week 11 data analysis

Computer Graphics and Computer Mapping

Box and Whisker Plots: Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.

Interquartile Range: A measure of variability.

Outlier: A value that lies outside the normal range of the data.
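The quantities behind a box and whisker plot can be computed directly. The sketch below uses hypothetical data and assumes the common 1.5 × IQR fence rule for flagging outliers; the slides do not specify a rule, so treat that threshold as an illustrative choice.

```python
from statistics import quantiles

# Hypothetical observations with one unusually large value.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]

# Quartiles; "inclusive" treats the data as the whole population of interest.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

# Assumed outlier rule: beyond 1.5 * IQR from the quartiles.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(q1, q2, q3, iqr, outliers)
```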

Page 48: Abdm4064 week 11 data analysis

EXHIBIT 20.15 Computer Drawn Box and Whisker Plot

Page 49: Abdm4064 week 11 data analysis

SPSS Windows

The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.

If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.

The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.

Page 50: Abdm4064 week 11 data analysis

SPSS Windows

To select these procedures, click:

Analyze>Descriptive Statistics>Frequencies

Analyze>Descriptive Statistics>Descriptives

Analyze>Descriptive Statistics>Explore

The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.

To select this procedure, click:

Analyze>Descriptive Statistics>Crosstabs

Page 51: Abdm4064 week 11 data analysis

SPSS Windows

The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples. To select these procedures using SPSS for Windows, click:

Analyze>Compare Means>Means …

Analyze>Compare Means>One-Sample T Test …

Analyze>Compare Means>Independent-Samples T Test …

Analyze>Compare Means>Paired-Samples T Test …

Page 52: Abdm4064 week 11 data analysis

SPSS Windows

The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.

To select these procedures using SPSS for Windows, click:

Analyze>Nonparametric Tests>Chi-Square …

Analyze>Nonparametric Tests>Binomial …

Analyze>Nonparametric Tests>Runs …

Analyze>Nonparametric Tests>1-Sample K-S …

Analyze>Nonparametric Tests>2 Independent Samples …

Analyze>Nonparametric Tests>2 Related Samples …

Page 53: Abdm4064 week 11 data analysis


Page 54: Abdm4064 week 11 data analysis

SPSS Windows: Frequencies

1. Select ANALYZE on the SPSS menu bar.

2. Click DESCRIPTIVE STATISTICS and select FREQUENCIES.

3. Move the variable “Familiarity [familiar]” to the VARIABLE(s) box.

4. Click STATISTICS.

5. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.

Page 55: Abdm4064 week 11 data analysis

SPSS Windows: Frequencies

6. Click CONTINUE.

7. Click CHARTS.

8. Click HISTOGRAMS, then click CONTINUE.

9. Click OK.

Page 56: Abdm4064 week 11 data analysis

Introduction of a Third Variable in Cross-Tabulation

Page 57: Abdm4064 week 11 data analysis


Page 58: Abdm4064 week 11 data analysis

SPSS Windows: Cross-tabulations

1. Select ANALYZE on the SPSS menu bar.

2. Click on DESCRIPTIVE STATISTICS and select CROSSTABS.

3. Move the variable “Internet Usage Group [iusagegr]” to the ROW(S) box.

4. Move the variable “Sex [sex]” to the COLUMN(S) box.

5. Click on CELLS.

6. Select OBSERVED under COUNTS and COLUMN under PERCENTAGES.

Page 59: Abdm4064 week 11 data analysis

SPSS Windows: Cross-tabulations

7. Click CONTINUE.

8. Click STATISTICS.

9. Click on CHI-SQUARE, PHI AND CRAMER’S V.

10. Click CONTINUE.

11. Click OK.

Page 60: Abdm4064 week 11 data analysis

Interpretation

Interpretation:

The process of drawing inferences from the analysis results.

Inferences drawn from interpretations lead to managerial implications and decisions.

From a management perspective, the qualitative meaning of the data and their managerial implications are an important aspect of the interpretation.

Page 61: Abdm4064 week 11 data analysis

Hypothesis Testing

Types of Hypotheses

Relational hypotheses: Examine how changes in one variable vary with changes in another.

Hypotheses about differences between groups: Examine how some variable varies from one group to another.

Hypotheses about differences from some standard: Examine how some variable differs from some preconceived standard. These tests typify univariate statistical tests.

Page 62: Abdm4064 week 11 data analysis

Types of Statistical Analysis

Univariate Statistical Analysis: Tests of hypotheses involving only one variable.

Testing of statistical significance

Bivariate Statistical Analysis: Tests of hypotheses involving two variables.

Multivariate Statistical Analysis: Statistical analysis involving three or more variables or sets of variables.

Page 63: Abdm4064 week 11 data analysis

The Hypothesis-Testing Procedure

Process:

1. The specifically stated hypothesis is derived from the research objectives.

2. A sample is obtained and the relevant variable is measured.

3. The measured sample value is compared to the value either stated explicitly or implied in the hypothesis.

If the value is consistent with the hypothesis, the hypothesis is supported.

If the value is not consistent with the hypothesis, the hypothesis is not supported.

Page 64: Abdm4064 week 11 data analysis

Statistical Analysis: Key Terms

Hypothesis: Unproven proposition: a supposition that tentatively explains certain facts or phenomena.

An assumption about the nature of the world.

Null Hypothesis: Statement about the status quo. No difference in sample and population.

Alternative Hypothesis: Statement that indicates the opposite of the null hypothesis.

Page 65: Abdm4064 week 11 data analysis

Significance Levels and p-values

Significance Level: A critical probability associated with a statistical hypothesis test that indicates how likely an inference supporting a difference between an observed value and some statistical expectation is true.

The acceptable level of Type I error.

p-value: Probability value, or the observed or computed significance level.

p-values are compared to significance levels to test hypotheses.

Higher p-values mean the data are more consistent with the null hypothesis; lower p-values give more support to a hypothesis that a difference exists.

Page 66: Abdm4064 week 11 data analysis

EXHIBIT 21.1 p-Values and Statistical Tests

Page 67: Abdm4064 week 11 data analysis

EXHIBIT 21.2

As the observed mean gets further from the standard (proposed population mean), the p-value decreases. The lower the p-value, the more confidence you have that the sample mean is different.
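The relationship between distance from the standard and the p-value can be sketched numerically. This is an illustration under simplifying assumptions: a normal (z) approximation is used instead of the t-distribution, and the standard error of 0.25 and the sample means are hypothetical; only the standard of 3.0 echoes the example in this deck.

```python
from statistics import NormalDist

def two_tailed_p(sample_mean, standard, std_error):
    """Two-tailed p-value for a z statistic (normal approximation)."""
    z = (sample_mean - standard) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

standard = 3.0     # hypothesized population mean
std_error = 0.25   # assumed standard error of the mean

# Hypothetical sample means that move progressively further from 3.0.
sample_means = (3.1, 3.3, 3.5, 3.7)
p_values = [two_tailed_p(m, standard, std_error) for m in sample_means]

for m, p in zip(sample_means, p_values):
    print(f"mean={m:.1f}  p={p:.4f}  significant at 0.05: {p < 0.05}")
```

Under these assumed figures the p-value falls below 0.05 only once the sample mean is about two standard errors from the standard.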

Page 68: Abdm4064 week 11 data analysis

An Example of Hypothesis Testing

The null hypothesis: the mean is equal to 3.0:

$H_0: \mu = 3.0$

The alternative hypothesis: the mean is not equal to 3.0:

$H_1: \mu \neq 3.0$

Page 69: Abdm4064 week 11 data analysis

An Example of Hypothesis Testing

Page 70: Abdm4064 week 11 data analysis

EXHIBIT 21.3 A Hypothesis Test Using the Sampling Distribution of $\bar{X}$ under the Hypothesis $\mu = 3.0$

Critical Values: Values that lie exactly on the boundary of the region of rejection.

Page 71: Abdm4064 week 11 data analysis

Type I and Type II Errors

Type I Error: An error caused by rejecting the null hypothesis when it is true.

Has a probability of alpha (α).

Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.

“There really are no monsters under the bed.”

Page 72: Abdm4064 week 11 data analysis

Type I and Type II Errors (cont’d)

Type II Error: An error caused by failing to reject the null hypothesis when the alternative hypothesis is true.

Has a probability of beta (β).

Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.

“There really are monsters under the bed.”

Page 73: Abdm4064 week 11 data analysis

EXHIBIT 21.4 Type I and Type II Errors in Hypothesis Testing

Page 74: Abdm4064 week 11 data analysis

Choosing the Appropriate Statistical Technique

Choosing the correct statistical technique requires considering:

Type of question to be answered

E.g. ranking question – rank order test

Number of variables involved

One variable – univariate statistical analysis

Two variables – bivariate statistical analysis

More than two variables – multivariate analysis

Level of scale measurement

E.g. with a nominal scale, the mean and median are meaningless.

Page 75: Abdm4064 week 11 data analysis

Parametric versus Nonparametric Tests

Parametric Statistics: Involve numbers with known, continuous distributions.

Appropriate when:

Data are interval or ratio scaled.

Sample size is large.

Nonparametric Statistics: Appropriate when the variables being analyzed do not conform to any known or continuous distribution.

Page 76: Abdm4064 week 11 data analysis

EXHIBIT 21.5 Univariate Statistical Choice Made Easy

Page 77: Abdm4064 week 11 data analysis

The t-Distribution

t-test: A hypothesis test that uses the t-distribution.

A univariate t-test is appropriate when the variable being analyzed is interval or ratio.

Degrees of freedom (d.f.): The number of observations minus the number of constraints or assumptions needed to calculate a statistical term.

Page 78: Abdm4064 week 11 data analysis

EXHIBIT 21.6 The t-Distribution for Various Degrees of Freedom

Page 79: Abdm4064 week 11 data analysis

Calculating a Confidence Interval Estimate Using the t-Distribution

Page 80: Abdm4064 week 11 data analysis

Calculating a Confidence Interval Estimate Using the t-Distribution (cont’d)

$3.89 + 2.12\left(\frac{2.81}{\sqrt{18}}\right) = 5.28$

$3.89 - 2.12\left(\frac{2.81}{\sqrt{18}}\right) = 2.49$

Page 81: Abdm4064 week 11 data analysis

One-Tailed Univariate t-Tests

One-tailed Test: Appropriate when a research hypothesis implies that an observed mean can only be greater than or less than a hypothesized value.

E.g. “Females score higher than males in the English test.”

Only one of the “tails” of the bell-shaped normal curve is relevant.

A one-tailed test can be determined from a two-tailed test result by taking half of the observed p-value, provided the observed difference lies in the hypothesized direction.

When there is any doubt about whether a one- or two-tailed test is appropriate, opt for the more conservative two-tailed test.

Page 82: Abdm4064 week 11 data analysis

Two-Tailed Univariate t-Tests

Two-tailed Test
Tests for differences from the population mean in either direction (greater or less), i.e. identifies whether there is any difference.
E.g. The English test scores of females are different from the scores of males.
Extreme values of the normal curve (or tails) on both the right and the left are considered.
When a research question does not specify whether a difference should be greater than or less than, a two-tailed test is most appropriate.
When the researcher has any doubt about whether a one- or two-tailed test is appropriate, he or she should opt for the more conservative two-tailed test.

Page 83: Abdm4064 week 11 data analysis

Univariate Hypothesis Test Utilizing the t-Distribution

Example:
Suppose a Pizza Inn manager believes the average number of returned pizzas each day to be 20.
The store records the number of returned pizzas for each of the 25 days it was open in a given month.
The mean was calculated to be 22, and the standard deviation to be 5.

Page 84: Abdm4064 week 11 data analysis

Univariate Hypothesis Test Utilizing the t-Distribution: An Example

$$H_0: \mu = 20$$
The population mean is equal to 20.

$$H_1: \mu \neq 20$$
The population mean is not equal to 20.

$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1$$

Page 85: Abdm4064 week 11 data analysis

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

The researcher desired 95 percent confidence; the significance level becomes 0.05.
The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection. Thus, the value of t is needed.
For 24 degrees of freedom (n − 1 = 25 − 1), the t-value is 2.064.

Page 86: Abdm4064 week 11 data analysis

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

$$\text{Lower limit} = \mu - t_{c.l.} S_{\bar{X}} = 20 - 2.064\left(\frac{5}{\sqrt{25}}\right) = 20 - 2.064(1) = 17.936$$

$$\text{Upper limit} = \mu + t_{c.l.} S_{\bar{X}} = 20 + 2.064\left(\frac{5}{\sqrt{25}}\right) = 20 + 2.064(1) = 22.064$$

Page 87: Abdm4064 week 11 data analysis

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

Univariate Hypothesis t-Test

$$t_{obs} = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{22 - 20}{1} = \frac{2}{1} = 2$$

This is less than the critical t-value of 2.064 at the 0.05 level with 24 degrees of freedom, so the null hypothesis is not rejected: the data do not support a mean different from 20.
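The Pizza Inn example can be retraced in a few lines of code. This is only a minimal sketch using the summary figures given on the slides (mean 22, standard deviation 5, n = 25, hypothesized mean 20, critical t of 2.064); the function name `one_sample_t` is a hypothetical helper, not part of any statistics package.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (standard error of the mean, observed t) from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    t_obs = (sample_mean - hypothesized_mean) / standard_error
    return standard_error, t_obs

# Values from the Pizza Inn example on the slides.
MU_0 = 20           # hypothesized population mean
X_BAR = 22          # observed sample mean
S = 5               # sample standard deviation
N = 25              # number of days observed
T_CRITICAL = 2.064  # two-tailed critical t, alpha = 0.05, 24 d.f. (from the slide)

se, t_obs = one_sample_t(X_BAR, MU_0, S, N)
lower = MU_0 - T_CRITICAL * se   # lower limit of the acceptance region
upper = MU_0 + T_CRITICAL * se   # upper limit of the acceptance region
reject_null = abs(t_obs) > T_CRITICAL

print(f"standard error = {se:.3f}")
print(f"t observed     = {t_obs:.3f}")
print(f"limits         = {lower:.3f} to {upper:.3f}")
print(f"reject H0      = {reject_null}")
```

The observed mean of 22 falls inside the limits, which is the same conclusion as comparing the observed t with the critical t.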

Page 88: Abdm4064 week 11 data analysis

The Chi-Square Test for Goodness of Fit

Chi-square (χ²) test
Tests for statistical significance.
Is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.

Goodness-of-Fit (GOF)
A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.

Page 89: Abdm4064 week 11 data analysis

The Chi-Square Test for Goodness of Fit: An Example

Page 90: Abdm4064 week 11 data analysis

The Chi-Square Test for Goodness of Fit: An Example (cont’d)

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

Page 91: Abdm4064 week 11 data analysis

Chi-Square Test: Estimation for Expected Number for Each Cell

$$E_{ij} = \frac{R_i C_j}{n}$$

Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size
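As a hedged illustration of the goodness-of-fit calculation, the sketch below computes χ² for a one-way frequency table. The counts (60 and 40 against an expected 50/50 split) and the function name `chi_square_gof` are hypothetical, chosen only to keep the arithmetic easy to follow; 3.84 is the standard critical value at the 0.05 level with 1 d.f.

```python
def chi_square_gof(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all cells of a one-way table."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, not taken from the slides.
observed = [60, 40]
expected = [50, 50]      # equal split under the null hypothesis
CRITICAL_05_1DF = 3.84   # chi-square critical value, alpha = 0.05, 1 d.f.

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1
significant = chi2 > CRITICAL_05_1DF

print(f"chi-square = {chi2:.2f} with {df} d.f.")
print(f"significant at 0.05: {significant}")
```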

Page 92: Abdm4064 week 11 data analysis

Hypothesis Test of a Proportion

Is conceptually similar to the hypothesis test used when the mean is the characteristic of interest, but differs in the mathematical formulation of the standard error of the proportion.

$$Z_{obs} = \frac{p - \pi}{S_p}$$

π is the population proportion
p is the sample proportion
π is estimated with p
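A short sketch of the proportion test follows. It assumes the usual standard error S_p = sqrt(pq/n) with q = 1 − p, which is consistent with estimating π by p but is not spelled out on the slide; the survey figures and the name `proportion_z` are hypothetical.

```python
import math

def proportion_z(p, pi_0, n):
    """Observed Z for a sample proportion p against a hypothesized pi_0."""
    q = 1 - p
    s_p = math.sqrt(p * q / n)   # assumed standard error, pi estimated by p
    return s_p, (p - pi_0) / s_p

# Hypothetical survey figures, not taken from the slides.
s_p, z_obs = proportion_z(p=0.20, pi_0=0.15, n=1200)

print(f"S_p        = {s_p:.4f}")
print(f"Z observed = {z_obs:.2f}")
print(f"significant at 0.05 (two-tailed): {abs(z_obs) > 1.96}")
```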

Page 93: Abdm4064 week 11 data analysis

What Is the Appropriate Test of Difference?

Test of Differences
An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.
Measures may capture behaviour, characteristics, beliefs, opinions, emotions, or attitudes.

Bivariate Tests of Differences
Involve only two variables: a variable that acts like a dependent variable and a variable that acts as a classification variable.
Examine differences in mean scores between groups, or compare how two groups’ scores are distributed across possible response categories.

Page 94: Abdm4064 week 11 data analysis

EXHIBIT 22.1 Some Bivariate Hypotheses

Page 95: Abdm4064 week 11 data analysis

Cross-Tabulation Tables: The χ² Test for Goodness-of-Fit

Cross-Tabulation (Contingency) Table
A joint frequency distribution of observations on two or more variables.

χ² Distribution
Provides a means for testing the statistical significance of a contingency table.
Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.
Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.

Page 96: Abdm4064 week 11 data analysis

Chi-Square Test

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

$$E_{ij} = \frac{R_i C_j}{n}$$

Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size

Page 97: Abdm4064 week 11 data analysis

Degrees of Freedom (d.f.)

d.f. = (R − 1)(C − 1)

R = number of rows and C = number of columns in the contingency table
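The three pieces above (expected cell counts, the χ² sum, and the degrees of freedom) can be combined as in the following sketch. The 2 × 2 table is hypothetical and the function names are illustrative only; this is not the Papa John’s data.

```python
def expected_counts(table):
    """E_ij = R_i * C_j / n for every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_contingency(table):
    """Return (chi-square, degrees of freedom, expected counts)."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)   # (R - 1)(C - 1)
    return chi2, df, expected

# Hypothetical 2 x 2 table: rows are groups, columns are response categories.
table = [[30, 10],
         [20, 40]]

chi2, df, expected = chi_square_contingency(table)
print(f"expected counts = {expected}")
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```

With 1 d.f. the result would be compared against the 0.05 critical value of 3.84.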

Page 98: Abdm4064 week 11 data analysis

Example: Papa John’s Restaurants

Univariate Hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location or in a shopping center.

Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.

Page 99: Abdm4064 week 11 data analysis

Example: Papa John’s Restaurants (cont’d)

In this example, χ² = 22.16 with 1 d.f.
From Table A.4, the critical value at the 0.05 level with 1 d.f. is 3.84.
Thus, we are 95 percent confident that the observed values do not equal the expected values.

But are the deviations from the expected values in the hypothesized direction?

Page 100: Abdm4064 week 11 data analysis

χ² Test for Goodness-of-Fit Recap

Testing the hypothesis involves two key steps:

1. Examine the statistical significance of the observed contingency table.

2. Examine whether the differences between the observed and expected values are consistent with the hypothesized prediction.

Page 101: Abdm4064 week 11 data analysis

The t-Test for Comparing Two Means

Independent Samples t-Test
A test for hypotheses stating that the mean scores for some interval- or ratio-scaled variable grouped based on some less-than-interval classificatory variable are not the same.

$$t = \frac{\text{Sample Mean 1} - \text{Sample Mean 2}}{\text{Variability of random means}}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$

Page 102: Abdm4064 week 11 data analysis

The t-Test for Comparing Two Means (cont’d)

Pooled Estimate of the Standard Error
An estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal.

$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
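To make the pooled standard error concrete, the sketch below computes an independent samples t from summary statistics under the equal-variance assumption. The group means, standard deviations and sample sizes are hypothetical, and the helper names are illustrative.

```python
import math

def pooled_standard_error(n1, s1, n2, s2):
    """Pooled estimate of the standard error, assuming equal group variances."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

def independent_t(mean1, s1, n1, mean2, s2, n2):
    """Return (observed t, degrees of freedom) for two independent samples."""
    se = pooled_standard_error(n1, s1, n2, s2)
    return (mean1 - mean2) / se, n1 + n2 - 2

# Hypothetical summary statistics for two groups.
t_obs, df = independent_t(mean1=16.5, s1=2.1, n1=21,
                          mean2=12.2, s2=2.6, n2=14)

print(f"t observed = {t_obs:.3f} with {df} d.f.")
```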

Page 103: Abdm4064 week 11 data analysis

© 2010 South-Western/Cengage Learning. All rights reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

EXHIBIT 22.2 Independent Samples t-Test Results

Page 104: Abdm4064 week 11 data analysis

Comparing Two Means (cont’d)

Paired-Samples t-Test

Compares the scores of two interval variables drawn from related populations.

Used when means need to be compared that are not from independent samples.

Page 105: Abdm4064 week 11 data analysis


EXHIBIT 22.4 Example Results for a Paired Samples t-Test

Page 106: Abdm4064 week 11 data analysis

A Classification of Hypothesis Testing Procedures for Examining Differences

Page 107: Abdm4064 week 11 data analysis


Page 108: Abdm4064 week 11 data analysis

SPSS Windows: One Sample t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE SAMPLE T TEST.
3. Move “Familiarity [familiar]” into the TEST VARIABLE(S) box.
4. Type “4” in the TEST VALUE box.
5. Click OK.

Page 109: Abdm4064 week 11 data analysis

SPSS Windows: Two Independent Samples t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then INDEPENDENT SAMPLES T TEST.
3. Move “Internet Usage Hrs/Week [iusage]” into the TEST VARIABLE(S) box.
4. Move “Sex [sex]” to the GROUPING VARIABLE box.
5. Click DEFINE GROUPS.
6. Type “1” in the GROUP 1 box and “2” in the GROUP 2 box.
7. Click CONTINUE.
8. Click OK.

Page 110: Abdm4064 week 11 data analysis

SPSS Windows: Paired Samples t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then PAIRED SAMPLES T TEST.
3. Select “Attitude toward Internet [iattitude]” and then select “Attitude toward technology [tattitude].” Move these variables into the PAIRED VARIABLE(S) box.
4. Click OK.

Page 111: Abdm4064 week 11 data analysis

Relationship Amongst t Test, Analysis of Variance, Analysis of Covariance, & Regression

Metric Dependent Variable
  One Independent Variable
    Binary: t Test
  One or More Independent Variables
    Categorical (Factorial): Analysis of Variance
      One Factor: One-Way Analysis of Variance
      More than One Factor: N-Way Analysis of Variance
    Categorical and Interval: Analysis of Covariance
    Interval: Regression

Page 112: Abdm4064 week 11 data analysis

The Z-Test for Comparing Two Proportions

Z-Test for Differences of Proportions
Tests the hypothesis that proportions are significantly different for two independent samples or groups.
Requires a sample size greater than thirty.
The hypothesis is: H0: π1 = π2
may be restated as: H0: π1 − π2 = 0

Page 113: Abdm4064 week 11 data analysis

The Z-Test for Comparing Two Proportions

Z-Test statistic for differences in large random samples:

$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$

p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2
Sp1−p2 = pooled estimate of the standard error of differences of proportions

Page 114: Abdm4064 week 11 data analysis

The Z-Test for Comparing Two Proportions

To calculate the standard error of the differences in proportions:

$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

p̄ = pooled estimate of the proportion of successes, (n1p1 + n2p2)/(n1 + n2)
q̄ = 1 − p̄
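A minimal sketch of the two-proportion Z-test is given below. It assumes the pooled proportion is the sample-size-weighted average of the two sample proportions; the sample figures and the name `two_proportion_z` are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2, hypothesized_diff=0.0):
    """Return (Z, standard error) for the difference between two proportions."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)   # pooled proportion of successes
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    z = ((p1 - p2) - hypothesized_diff) / se
    return z, se

# Hypothetical samples: 35% successes in group 1, 40% in group 2, n = 100 each.
z, se = two_proportion_z(p1=0.35, n1=100, p2=0.40, n2=100)

print(f"standard error = {se:.4f}")
print(f"Z = {z:.3f}")
print(f"significant at 0.05 (two-tailed): {abs(z) > 1.96}")
```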

Page 115: Abdm4064 week 11 data analysis

One-Way Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA)
An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.
A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
A method of comparing variances to make inferences about the means.
The substantive hypothesis tested is: At least one group mean is not equal to another group mean.

Page 116: Abdm4064 week 11 data analysis

Partitioning Variance in ANOVA

Total Variability
Grand Mean: The mean of a variable over all observations.
SST = Total of (Observed Value − Grand Mean)²

Page 117: Abdm4064 week 11 data analysis

Partitioning Variance in ANOVA

Between-Groups Variance
The sum of squared differences between the group mean and the grand mean, summed over all groups for a given set of observations.
SSB = Total of ngroup(Group Mean − Grand Mean)²

Within-Group Error or Variance
The sum of squared differences between observed values and the group mean for a given set of observations.
Also known as total error variance.
SSE = Total of (Observed Value − Group Mean)²

Page 118: Abdm4064 week 11 data analysis

The F-Test

F-Test
Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
Variance components (SSE, SSB, SST) are used to compute F-ratios.

$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
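The partitioning of variance and the F-ratio can be sketched as follows. The three small groups are hypothetical, and the mean squares use the standard degrees of freedom (k − 1 between groups, n − k within groups), which the slides do not state explicitly.

```python
def one_way_anova(groups):
    """Return (SST, SSB, SSE, F) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Total, between-groups and within-group sums of squares.
    sst = sum((v - grand_mean) ** 2 for v in all_values)
    ssb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    sse = sum((v - m) ** 2
              for g, m in zip(groups, group_means) for v in g)

    # F = variance between groups / variance within groups.
    f_ratio = (ssb / (k - 1)) / (sse / (n - k))
    return sst, ssb, sse, f_ratio

# Hypothetical scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
sst, ssb, sse, f_ratio = one_way_anova(groups)

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSE = {sse:.2f}")
print(f"F = {f_ratio:.2f}")
```

Note that SST equals SSB plus SSE, which is the sense in which the total variability is partitioned.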

Page 119: Abdm4064 week 11 data analysis

EXHIBIT 22.6 Interpreting ANOVA

Page 120: Abdm4064 week 11 data analysis


Page 121: Abdm4064 week 11 data analysis

SPSS Windows

One-way ANOVA can be efficiently performed using the program COMPARE MEANS and then One-way ANOVA. To select this procedure using SPSS for Windows, click:

Analyze>Compare Means>One-Way ANOVA …

N-way analysis of variance and analysis of covariance can be performed using GENERAL LINEAR MODEL. To select this procedure using SPSS for Windows, click:

Analyze>General Linear Model>Univariate …

Page 122: Abdm4064 week 11 data analysis

SPSS Windows: One-Way ANOVA

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion [promotion]” to the FACTOR box.
5. Click OPTIONS.
6. Click Descriptive.
7. Click CONTINUE.
8. Click OK.

Page 123: Abdm4064 week 11 data analysis

SPSS Windows: Analysis of Covariance

1. Select ANALYZE from the SPSS menu bar.
2. Click GENERAL LINEAR MODEL and then UNIVARIATE.
3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.
4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” also to the FIXED FACTOR(S) box.
5. Move “Clientel [clientel]” to the COVARIATE(S) box.
6. Click OK.

Page 124: Abdm4064 week 11 data analysis

The Basics

Measures of Association
Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.
The chi-square (χ²) test provides information about whether two or more less-than interval variables are interrelated.
Correlation analysis is most appropriate for interval or ratio variables.
Regression can accommodate either less-than interval or interval independent variables, but the dependent variable must be continuous.

Page 125: Abdm4064 week 11 data analysis

EXHIBIT 23.1 Bivariate Analysis: Common Procedures for Testing Association

Page 126: Abdm4064 week 11 data analysis

Simple Correlation Coefficient (continued)

Correlation coefficient
A statistical measure of the covariation, or association, between two at-least interval variables.

Covariance
Extent to which two variables are associated systematically with each other.

$$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$

Page 127: Abdm4064 week 11 data analysis

Simple Correlation Coefficient

Correlation coefficient (r)
Ranges from +1 to -1
Perfect positive linear relationship = +1
Perfect negative (inverse) linear relationship = -1
No correlation = 0

Correlation coefficient for two variables (X,Y)
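The correlation coefficient can be computed directly from its definition, as in this small sketch. The five (X, Y) pairs are hypothetical and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    covariation = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return covariation / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
print(f"r squared = {r ** 2:.2f}")
```

Swapping the two lists gives the same value, which reflects the symmetry of the coefficient for X and Y.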

Page 128: Abdm4064 week 11 data analysis

EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns

Page 129: Abdm4064 week 11 data analysis

Correlation, Covariance, and Causation

When two variables covary (i.e. vary systematically), they display concomitant variation.
This systematic covariation does not in and of itself establish causality.
e.g., a rooster’s crow and the rising of the sun: the rooster does not cause the sun to rise.

Page 130: Abdm4064 week 11 data analysis

Coefficient of Determination

Coefficient of Determination (R²)
A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
Measures that part of the total variance of Y that is accounted for by knowing the value of X.

$$R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Page 131: Abdm4064 week 11 data analysis

Correlation Matrix

Correlation matrix
The standard form for reporting correlation coefficients for more than two variables.

Statistical Significance
The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.

Page 132: Abdm4064 week 11 data analysis

EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example

Page 133: Abdm4064 week 11 data analysis

Regression Analysis

Simple (Bivariate) Linear Regression
A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

The Regression Equation (Y = α + βX)
Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (where the regression line intercepts the Y axis)
β = the slope coefficient (rise over run)

Page 134: Abdm4064 week 11 data analysis

Regression Line and Slope

[Figure: Y (axis from 80 to 130) plotted against X (axis from 80 to 170), showing the fitted regression line Ŷ = â + β̂X and its slope.]

Page 135: Abdm4064 week 11 data analysis

The Regression Equation

Parameter Estimate Choices
β is indicative of the strength and direction of the relationship between the independent and dependent variable.
α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).

Standardized Regression Coefficient (β)
Estimated coefficient of the strength of relationship between the independent and dependent variables.
Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).

Page 136: Abdm4064 week 11 data analysis

The Regression Equation (cont’d)

Parameter Estimate Choices

Raw regression estimates (b1)
Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
This is another way of saying when the researcher is interested only in prediction.

Standardized regression estimates (β)
Standardized regression estimates have the advantage of a constant scale.
Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.

Page 137: Abdm4064 week 11 data analysis

EXHIBIT 23.5 The Advantage of Standardized Regression Weights

Page 138: Abdm4064 week 11 data analysis

EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued

Page 139: Abdm4064 week 11 data analysis

EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins

Page 140: Abdm4064 week 11 data analysis

Ordinary Least-Squares (OLS) Method of Regression Analysis

OLS
Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.
Generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.
No straight line can completely represent every dot in the scatter diagram.
There will be a discrepancy between most of the actual scores (each dot) and the predicted score.
Uses the criterion of attempting to make the least amount of total error in prediction of Y from X.
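The least-squares criterion described above can be illustrated with the closed-form estimates for a single predictor. This is a sketch only: the data are hypothetical, and the formulas used (slope as the sum of cross-products over the sum of squared X deviations, intercept as Ȳ minus slope times X̄) are the standard OLS solutions rather than anything quoted from the slides.

```python
def ols_fit(x, y):
    """Return (intercept, slope, r_squared) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Sum of squared errors around the fitted line, and total sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - sse / sst

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope, r_squared = ols_fit(x, y)
print(f"Y-hat = {intercept:.2f} + {slope:.2f} X")
print(f"R squared = {r_squared:.2f}")
```

Any other straight line through these points would give a larger sum of squared errors than the fitted one.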

Page 141: Abdm4064 week 11 data analysis

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Page 142: Abdm4064 week 11 data analysis

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

The equation means that the predicted value for any value of X (Xi) is determined as a function of the estimated slope coefficient, plus the estimated intercept coefficient + some error.

Page 143: Abdm4064 week 11 data analysis


Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Page 144: Abdm4064 week 11 data analysis


Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Statistical Significance Of Regression Model

F-test (regression)
Determines whether more variability is explained by the regression or unexplained by the regression.

Page 145: Abdm4064 week 11 data analysis

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Statistical Significance Of Regression Model
ANOVA Table:

Page 146: Abdm4064 week 11 data analysis

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

R²
The proportion of variance in Y that is explained by X (or vice versa).
A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.

$$R^2 = \frac{3{,}398.49}{3{,}882.40} = 0.875$$

Page 147: Abdm4064 week 11 data analysis

EXHIBIT 23.8 Simple Regression Results for Building Permit Example

Page 148: Abdm4064 week 11 data analysis

EXHIBIT 23.9 OLS Regression Line

Page 149: Abdm4064 week 11 data analysis

Simple Regression and Hypothesis Testing

The explanatory power of regression lies in hypothesis testing. Regression is often used to test relational hypotheses.
The outcome of the hypothesis test involves two conditions that must both be satisfied:
The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient and negative relationships require a negative coefficient.
The t-test associated with the regression weight must be significant.

Page 150: Abdm4064 week 11 data analysis

What is Multivariate Data Analysis?

Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis.
Methods analyze multiple variables or even multiple sets of variables simultaneously.
Business problems involve multivariate data analysis:
most employee motivation research
customer psychographic profiles
research that seeks to identify viable market segments

Page 151: Abdm4064 week 11 data analysis

The “Variate” in Multivariate

Variate
A mathematical way in which a set of variables can be represented with one equation.
A linear combination of variables, each contributing to the overall meaning of the variate based upon an empirically derived weight.
A function of the measured variables involved in an analysis: Vk = f (X1, X2, . . . , Xm )

Page 152: Abdm4064 week 11 data analysis

EXHIBIT 24.1 Which Multivariate Approach Is Appropriate?

Page 153: Abdm4064 week 11 data analysis

Classifying Multivariate Techniques

Dependence Techniques
Explain or predict one or more dependent variables.
Needed when hypotheses involve distinction between independent and dependent variables.
Types:
Multiple regression analysis
Multiple discriminant analysis
Multivariate analysis of variance
Structural equations modeling

Page 154: Abdm4064 week 11 data analysis

Classifying Multivariate Techniques (cont’d)

Interdependence Techniques
Give meaning to a set of variables or seek to group things together.
Used when researchers examine questions that do not distinguish between independent and dependent variables.
Types:
Factor analysis
Cluster analysis
Multidimensional scaling

Page 155: Abdm4064 week 11 data analysis

Classifying Multivariate Techniques (cont’d)

Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is appropriate for the data.
Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
Nominal and ordinal scales are nonmetric.
Interval and ratio scales are metric.

Page 156: Abdm4064 week 11 data analysis

EXHIBIT 24.2 Which Multivariate Dependence Technique Should I Use?

Page 157: Abdm4064 week 11 data analysis

EXHIBIT 24.3 Which Multivariate Interdependence Technique Should I Use?

Page 158: Abdm4064 week 11 data analysis

Analysis of Dependence

General Linear Model (GLM)
A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables.

$$\hat{Y}_i = \mu + \Delta X + \Delta F + \Delta XF$$

μ = a constant (overall mean of the dependent variable)
∆X and ∆F = changes due to main effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)
∆XF = the change due to the combination (interaction effect) of those variables

Page 159: Abdm4064 week 11 data analysis

Interpreting Multiple Regression

Multiple Regression Analysis
An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.

$$Y_i = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \cdots + b_nX_n + e_i$$

Dummy variable
The way a dichotomous (two group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.
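Dummy coding is simple enough to show in a few lines. The sketch below is only illustrative: the variable (whether a community has a sales office), the labels, and the helper name `dummy_code` are all hypothetical.

```python
def dummy_code(labels, reference):
    """Code a two-group variable as 0 (reference group) and 1 (other group)."""
    groups = set(labels)
    if len(groups) != 2 or reference not in groups:
        raise ValueError("expected exactly two groups including the reference")
    return [0 if label == reference else 1 for label in labels]

# Hypothetical dichotomous variable: does the community have a sales office?
sales_office = ["no", "yes", "yes", "no", "yes"]
coded = dummy_code(sales_office, reference="no")

print(coded)
```

The coded column can then enter a regression as an ordinary independent variable; its coefficient is interpreted as the difference between the group coded 1 and the group coded 0.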

Page 160: Abdm4064 week 11 data analysis

Multiple Regression Analysis

A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
Several hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no sales office is present.
H3: Grammar school enrollment in a community is related positively to sales.

Page 161: Abdm4064 week 11 data analysis

Multiple Regression Analysis (cont’d)

Statistical Results of the Multiple Regression
Regression Equation:

$$\hat{Y} = 102.18 + 0.387X_1 + 115.2X_2 + 6.73X_3$$

Coefficient of multiple determination (R²) = 0.845
F-value = 14.6, p < 0.05

Page 162: Abdm4064 week 11 data analysis

Multiple Regression Analysis (cont’d)

Regression Coefficients in Multiple Regression
Partial correlation
The correlation between two variables after taking into account the fact that they are correlated with other variables too.

R² in Multiple Regression
The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.

Page 163: Abdm4064 week 11 data analysis

Multiple Regression Analysis (cont’d)

Statistical Significance in Multiple Regression
F-test
Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).

$$F = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE}$$

Page 164: Abdm4064 week 11 data analysis

Multiple Regression Analysis (cont’d)

Degrees of Freedom (d.f.)

k = number of independent variables

n = number of observations or respondents

Calculating Degrees of Freedom (d.f.)

d.f. for the numerator = k

d.f. for the denominator = n - k - 1
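The F-ratio and its degrees of freedom can be computed directly from the sums of squares. The Python sketch below assumes the usual decomposition in which total variation equals SSR plus SSE; the sums of squares and the sample size n = 12 are hypothetical values chosen for illustration, not figures from the toy manufacturer example.

```python
def f_statistic(ssr, sse, k, n):
    """Return (F, numerator d.f., denominator d.f.) for a multiple regression.

    ssr: sum of squares due to the regression
    sse: error sum of squares
    k:   number of independent variables
    n:   number of observations
    """
    df_num = k
    df_den = n - k - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    msr = ssr / df_num  # mean square due to regression
    mse = sse / df_den  # mean square error
    return msr / mse, df_num, df_den


def r_squared(ssr, sse):
    """Share of total variation (SSR + SSE) explained by the regression."""
    return ssr / (ssr + sse)


# Hypothetical sums of squares with three predictors and twelve stores.
f_value, df_num, df_den = f_statistic(ssr=84.5, sse=15.5, k=3, n=12)
print(round(f_value, 2), df_num, df_den)
print(r_squared(84.5, 15.5))
```

The resulting F is compared with the critical value of the F distribution with (k, n - k - 1) degrees of freedom, or converted to a p-value by statistical software.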

Page 165: Abdm4064 week 11 data analysis

F-test

$$F = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$$

Page 166: Abdm4064 week 11 data analysis

EXHIBIT 24.4 Interpreting Multiple Regression Results

Page 167: Abdm4064 week 11 data analysis

ANOVA (n-way) and MANOVA

Multivariate Analysis of Variance (MANOVA): a multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

Page 168: Abdm4064 week 11 data analysis

ANOVA (n-way) and MANOVA (cont’d)

Interpreting N-way (Univariate) ANOVA

1. Examine overall model F-test result. If significant, proceed.

2. Examine individual F-tests for individual variables.

3. For each significant categorical independent variable, interpret the effect by examining the group means.

4. For each significant, continuous covariate, interpret the parameter estimate (b).

5. For each significant interaction, interpret the means for each combination.

Page 169: Abdm4064 week 11 data analysis

Discriminant Analysis

A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.

To calculate discriminant scores, the linear function used is:

$$Z_i = b_1X_{1i} + b_2X_{2i} + \cdots + b_nX_{ni}$$

Page 170: Abdm4064 week 11 data analysis

Discriminant Analysis Example

$$Z = b_1X_1 + b_2X_2 + b_3X_3 = 0.069X_1 + 0.013X_2 + 0.0007X_3$$
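A discriminant score is simply the weighted sum of an object's values on the independent variables. The Python sketch below uses the weights 0.069, 0.013 and 0.0007 read from the slide, assuming each term is added; the input values are hypothetical, and no cutoff score is given in the slide, so none is applied here.

```python
# Discriminant weights as read from the slide (assumed to be all positive).
COEFFICIENTS = (0.069, 0.013, 0.0007)


def discriminant_score(x, b=COEFFICIENTS):
    """Z = b1*X1 + b2*X2 + ... + bn*Xn for one object."""
    if len(x) != len(b):
        raise ValueError("one value is needed per discriminant weight")
    return sum(b_j * x_j for b_j, x_j in zip(b, x))


# Hypothetical object measured on X1, X2 and X3.
print(discriminant_score((10, 20, 1000)))
```

In practice each object's score is compared with a cutoff, or with the group centroids, to predict which category it belongs to.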

Page 171: Abdm4064 week 11 data analysis

EXHIBIT 24.5 Multivariate Dependence Techniques Summary

Page 172: Abdm4064 week 11 data analysis

Factor Analysis

Statistically identifies a reduced number of factors from a larger number of measured variables.

Types:

Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.

Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.

Page 173: Abdm4064 week 11 data analysis

EXHIBIT 24.6 A Simple Illustration of Factor Analysis

Page 174: Abdm4064 week 11 data analysis

Factor Analysis (cont’d)

How Many Factors

Eigenvalues are a measure of how much variance is explained by each factor.

Common rule: base the number of factors on the number of eigenvalues greater than 1.0.

Factor Loading

Indicates how strongly a measured variable is correlated with a factor.
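The eigenvalue rule can be applied mechanically once the eigenvalues are known. The Python sketch below counts the eigenvalues above 1.0; the six eigenvalues are hypothetical values for a six-variable analysis, not output from any data set in these slides.

```python
def factors_to_retain(eigenvalues, cutoff=1.0):
    """Count the factors whose eigenvalue exceeds the cutoff."""
    return sum(1 for value in eigenvalues if value > cutoff)


# Hypothetical eigenvalues from a factor analysis of six variables.
eigenvalues = [2.73, 2.22, 0.44, 0.34, 0.18, 0.09]
print(factors_to_retain(eigenvalues))  # two eigenvalues exceed 1.0
```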

Page 175: Abdm4064 week 11 data analysis

Factor Analysis (cont’d)

Factor Rotation

A mathematical way of simplifying factor analysis results to better identify which variables “load on” which factors.

Most common procedure is varimax rotation.

Data Reduction Technique

Approaches that summarize the information from many variables into a reduced set of variates formed as linear combinations of measured variables.

The rule of parsimony: an explanation involving fewer components is better than one involving many more.

Page 176: Abdm4064 week 11 data analysis

Factor Analysis (cont’d)

Creating Composite Scales with Factor Results

When a clear pattern of loadings exists, the researcher may take a simpler approach by summing the variables with high loadings and creating a summated scale.

Very low loadings suggest a variable does not contribute much to the factor.

The reliability of each summated scale is tested by computing a coefficient alpha estimate.
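Coefficient alpha for a summated scale can be computed from the item variances and the variance of the summed score. The Python sketch below uses the standard formula alpha = (k / (k - 1)) × (1 - sum of item variances / variance of the total); the three items and five respondents are hypothetical.

```python
from statistics import variance


def coefficient_alpha(items):
    """Coefficient (Cronbach's) alpha for a summated scale.

    items: one list of respondent scores per scale item, all the same length.
    """
    k = len(items)
    if k < 2:
        raise ValueError("a summated scale needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # summated scale score
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical ratings from five respondents on three high-loading items.
items = [
    [4, 5, 3, 2, 4],
    [5, 5, 3, 1, 4],
    [4, 4, 2, 2, 5],
]
print(round(coefficient_alpha(items), 3))
```

Values closer to 1 indicate that the items move together and the summated scale is more reliable.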

Page 177: Abdm4064 week 11 data analysis

Factor Analysis (cont’d)

Communality

A measure of the percentage of a variable’s variation that is explained by the factors.

A relatively high communality indicates that a variable has much in common with the other variables taken as a group.

Communality for any variable is equal to the sum of the squared loadings for that variable.
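The last point translates into a one-line calculation. In the Python sketch below, the loadings of one variable on two factors are hypothetical.

```python
def communality(loadings):
    """Sum of a variable's squared loadings across the retained factors."""
    return sum(loading ** 2 for loading in loadings)


# Hypothetical variable loading 0.8 on factor 1 and 0.3 on factor 2.
print(communality([0.8, 0.3]))  # 0.64 + 0.09, about 0.73
```

Here roughly 73 percent of the variable's variation would be explained by the two factors.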

Page 178: Abdm4064 week 11 data analysis

Factor Analysis (cont’d)

Total Variance Explained

Squaring and totaling the loadings on a factor, then dividing the total by the number of measured variables, provides an estimate of the share of variance in the set of variables explained by that factor.

This explanation of variance is much the same as R² in multiple regression.

Page 179: Abdm4064 week 11 data analysis


Page 180: Abdm4064 week 11 data analysis

SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze>Data Reduction>Factor …

Page 181: Abdm4064 week 11 data analysis

SPSS Windows: Principal Components

1. Select ANALYZE from the SPSS menu bar.

2. Click DATA REDUCTION and then FACTOR.

3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],” “Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive Teeth [v6]” into the VARIABLES box.

4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED. Click CONTINUE.

5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGEN VALUE OVER 1 (default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.

6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE.

7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE.

8. Click OK.

Page 182: Abdm4064 week 11 data analysis

Cluster Analysis

Cluster analysis: a multivariate approach for grouping observations based on similarity among measured variables.

Cluster analysis is an important tool for identifying market segments.

Cluster analysis classifies individuals or objects into a small number of mutually exclusive and exhaustive groups.

Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups.

The cluster should have high internal (within-cluster) homogeneity and external (between-cluster) heterogeneity.

Page 183: Abdm4064 week 11 data analysis

EXHIBIT 24.7 Clusters of Individuals on Two Dimensions

Page 184: Abdm4064 week 11 data analysis


EXHIBIT 24.8 Cluster Analysis of Test-Market Cities

Page 185: Abdm4064 week 11 data analysis


Page 186: Abdm4064 week 11 data analysis

SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze>Classify>Hierarchical Cluster …

Analyze>Classify>K-Means Cluster …

Analyze>Classify>Two-Step Cluster

Page 187: Abdm4064 week 11 data analysis

SPSS Windows: Hierarchical Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then HIERARCHICAL CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.

4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).

5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.

6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.

7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.

8. Click OK.

Page 188: Abdm4064 week 11 data analysis

SPSS Windows: K-Means Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then K-MEANS CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.

4. For NUMBER OF CLUSTERS, select 3.

5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.

6. Click OK.
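For readers who want to see what a k-means procedure does behind the menu clicks, the Python sketch below implements the basic idea: assign each case to its nearest cluster center, recompute the centers as cluster means, and repeat until nothing changes. It is a simplified illustration and not the SPSS algorithm; the two-variable data, the two starting centers and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centers, max_iter=100):
    """Basic k-means: returns (cluster label per point, final centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: the centers no longer move
        centers = new_centers
    return labels, centers


# Hypothetical respondents measured on two variables, with two starting centers.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(points, centers=[(1, 1), (8, 8)])
print(labels)
print(centers)
```

The result depends on the starting centers, which is one reason the initial cluster centers are worth inspecting in the output.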

Page 189: Abdm4064 week 11 data analysis

SPSS Windows: Two-Step Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then TWO-STEP CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box.

4. For DISTANCE MEASURE, select EUCLIDEAN.

5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.

6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC).

7. Click OK.

Page 190: Abdm4064 week 11 data analysis

Multidimensional Scaling

Multidimensional scaling: measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.

Page 191: Abdm4064 week 11 data analysis

EXHIBIT 24.9 Perceptual Map of Six Graduate Business Schools: Simple Space

Page 192: Abdm4064 week 11 data analysis


Page 193: Abdm4064 week 11 data analysis


Page 194: Abdm4064 week 11 data analysis

SPSS Windows

The multidimensional scaling program allows individual differences as well as aggregate analysis using ALSCAL. The level of measurement can be ordinal, interval or ratio. Both the direct and the derived approaches can be accommodated.

To select multidimensional scaling procedures using SPSS for Windows, click:

Analyze>Scale>Multidimensional Scaling …

The conjoint analysis approach can be implemented using regression if the dependent variable is metric (interval or ratio).

This procedure can be run by clicking:

Analyze>Regression>Linear …

Page 195: Abdm4064 week 11 data analysis

SPSS Windows: MDS

First convert similarity ratings to distances by subtracting each value of Table 21.1 from 8. The form of the data matrix has to be square symmetric (diagonal elements zero and distances above and below the diagonal). See SPSS file Table 21.1 Input.

1. Select ANALYZE from the SPSS menu bar.

2. Click SCALE and then MULTIDIMENSIONAL SCALING (ALSCAL).

3. Move “Aqua-Fresh [AquaFresh],” “Crest [Crest],” “Colgate [Colgate],” “Aim [Aim],” “Gleem [Gleem],” “Ultra Brite [UltraBrite],” “Ultra-Brite [var00007],” “Close-Up [CloseUp],” “Pepsodent [Pepsodent],” and “Sensodyne [Sensodyne]” into the VARIABLES box.

Page 196: Abdm4064 week 11 data analysis

SPSS Windows: MDS

4. In the DISTANCES box, check DATA ARE DISTANCES. SHAPE should be SQUARE SYMMETRIC (default).

5. Click on MODEL. In the pop-up window, in the LEVEL OF MEASUREMENT box, check INTERVAL. In the SCALING MODEL box, check EUCLIDEAN DISTANCE. In the CONDITIONALITY box, check MATRIX. Click CONTINUE.

6. Click on OPTIONS. In the pop-up window, in the DISPLAY box, check GROUP PLOTS, DATA MATRIX and MODEL AND OPTIONS SUMMARY. Click CONTINUE.

7. Click OK.
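The data preparation that precedes these steps, converting similarity ratings to distances by subtracting each rating from 8 and arranging the result as a square symmetric matrix with a zero diagonal, can be sketched in Python as follows. The sketch assumes the ratings are on a 1-to-7 similarity scale, which is why 8 is used; the three-brand ratings shown are hypothetical and are not the values of Table 21.1.

```python
def similarities_to_distances(lower_triangle, subtract_from=8):
    """Build a square symmetric distance matrix from pairwise similarity ratings.

    lower_triangle[i][j] holds the similarity between objects i and j for j < i,
    so row 0 is empty, row 1 has one value, row 2 has two values, and so on.
    """
    n = len(lower_triangle)
    distances = [[0] * n for _ in range(n)]  # diagonal elements stay zero
    for i in range(n):
        for j in range(i):
            d = subtract_from - lower_triangle[i][j]
            distances[i][j] = d  # below the diagonal
            distances[j][i] = d  # mirrored above the diagonal
    return distances


# Hypothetical similarity ratings (7 = very similar) for three brands.
ratings = [
    [],
    [5],
    [2, 6],
]
for row in similarities_to_distances(ratings):
    print(row)
```

The more similar two objects are rated, the smaller the distance entered for them, so similar objects end up close together on the perceptual map.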

Page 197: Abdm4064 week 11 data analysis


EXHIBIT 24.10 Summary of Multivariate Techniques for Analysis of Interdependence

Page 198: Abdm4064 week 11 data analysis

Further Reading

Cooper, D.R. and Schindler, P.S. (2011) Business Research Methods, 11th edn, McGraw Hill.

Zikmund, W.G., Babin, B.J., Carr, J.C. and Griffin, M. (2010) Business Research Methods, 8th edn, South-Western.

Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students, 6th edn, Prentice Hall.

Saunders, M. and Lewis, P. (2012) Doing Research in Business & Management, FT Prentice Hall.