
AMS 572 ANOVA: One-Way, Two-Way, and Multiway.

Upload: dustin-anderson

Post on 16-Dec-2015


Page 1: 1 1 – Intro & Hist. - Na Chan 2 – Basics of ANOVA - Alla Tashlitsky 3 - Data Collection - Bryan Rong 4 - Checking Assumptions in SAS - Junying Zhang

1

AMS 572

ANOVA: One-Way, Two-Way, and Multiway.


2

Group 3

1 – Intro & Hist. – Na Chan
2 – Basics of ANOVA – Alla Tashlitsky
3 – Data Collection – Bryan Rong
4 – Checking Assumptions in SAS – Junying Zhang
5 – 1-Way ANOVA Derivation – Yingying Lin and Wenyi Dong
6 – 1-Way ANOVA in SAS – Yingying Lin and Wenyi Dong
7 – 2-Way ANOVA Derivation – Peng Yang
8 – 2-Way ANOVA in SAS – Phil Caffrey and Yin Diao
9 – Multi-Way ANOVA Derivation – Michael Biro
10 – ANOVA and Regression – Cris (Jiangyang) Liu


3

Intro & History

Na Chen


4

USES OF T-TEST

• A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal.


5

USES OF T-TEST

• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero (the paired t-test).

• A test of whether the slope of a regression line differs significantly from 0.


6

BACKGROUND

• If comparing means among more than 2 groups, 3 or more t-tests are needed (k groups require k(k − 1)/2 pairwise tests)

-Time-consuming (the number of t-tests increases)

-Inherently flawed (the probability of making a Type I error increases)
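The Type I error point can be made concrete. The sketch below counts the pairwise t-tests needed for k groups. It then computes 1 − (1 − α)^m, the chance of at least one false rejection among m tests run at level α. It treats the tests as independent, which pairwise comparisons that share groups are not, so the numbers are only an illustration.

```python
from math import comb

def pairwise_tests(k: int) -> int:
    """Number of distinct pairwise t-tests needed to compare k group means."""
    return comb(k, 2)

def familywise_error(alpha: float, m: int) -> float:
    """P(at least one Type I error) over m tests, assuming the tests are independent."""
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5, 10):
    m = pairwise_tests(k)
    print(f"k={k:2d}  t-tests={m:2d}  P(at least one Type I error)={familywise_error(0.05, m):.3f}")
```

For example, with k = 3 groups and α = 0.05, the three tests together give about 0.143 rather than 0.05.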


7

RONALD A. FISHER

• Biologist
• Eugenicist
• Geneticist
• Statistician

“A genius who almost single-handedly created the foundations for modern statistical science” - Anders Hald

“The greatest of Darwin's successors” - Richard Dawkins

Informally used by researchers in the 1800s

Formally proposed by Ronald A. Fisher in 1918


8

HISTORY

• Fisher proposed a formal analysis of variance in his 1918 paper "The Correlation Between Relatives on the Supposition of Mendelian Inheritance".

• His first application of the analysis of variance was published in 1921.

• It became widely known after being included in Fisher's 1925 book Statistical Methods for Research Workers.


9

DEFINITION

• An abbreviation for: ANalysis Of VAriance

• A procedure for comparing the means of k independent groups, where k is 2 or greater.


10

ANOVA and T-TEST

• ANOVA and the t-test are similar: both compare means between groups.

• With 2 groups, both work (the ANOVA F-test is then equivalent to the pooled two-sample t-test).

• With more than 2 groups, ANOVA is better.


11

TYPES

• ANOVA - analysis of variance
  – One-way (F-ratio for 1 factor)
  – Two-way (F-ratio for 2 factors)

• ANCOVA - analysis of covariance

• MANOVA - multivariate analysis of variance


12

APPLICATION

• Biology
• Microbiology
• Medical Science
• Computer Science
• Industry
• Finance


13

Basics of ANOVA

Alla Tashlitsky


14

Definition

• ANOVA can determine whether there is a significant relationship between variables. It is also used to determine whether a measurable difference exists between two or more sample means.

• Objective: To identify important independent variables (predictor variables, the xi’s) and determine how they affect the response variable.

• Whether one-way, two-way, or multi-way ANOVA is used depends on the number of independent variables (factors) in the experiment.


15

Model & Assumptions

• (Simple Model)

• E(εi) = 0

• Var(ε1) = Var(ε2) = … = Var(εk): homoscedasticity

• All εi’s are independent.

• εi ~ N(0, σ²)


16

Classes of ANOVA

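1. Fixed Effects: the factor levels are specific categories of direct interest (e.g. sex, age)

2. Random Effects: the factor levels are a representative random sample from a larger population of levels (e.g. treatments, locations, tests)

3. Mixed Effects: a combination of fixed and random effects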


17

Procedure

• H0: µ1 = µ2 = … = µk vs.

Ha: at least one of the equalities does not hold

• F = MSR/MSE, which under H0 follows an F distribution with k − 1 and n − k degrees of freedom. When there are only 2 means, F = t², where the mean square regression is MSR = SSR/1 and the mean square error is MSE = SSE/(n − 2).

• The rejection region for a given significance level α is $F > f_{k-1,\,n-k,\,\alpha}$.
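The slide says that F = t² when only 2 means are compared. Below is a small Python check of that identity on made-up data. It computes the pooled-variance two-sample t statistic and the one-way F = MSR/MSE with k − 1 and n − k degrees of freedom. The helper names and the data are illustrative only.

```python
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F = MSR/MSE with k - 1 and n - k degrees of freedom."""
    obs = [v for g in groups for v in g]
    n, k, grand = len(obs), len(groups), mean(obs)
    ssr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between groups
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)     # within groups
    return (ssr / (k - 1)) / (sse / (n - k))

# Hypothetical data, for illustration only
a = [4.1, 5.0, 5.6, 4.8, 5.3]
b = [5.9, 6.4, 5.5, 6.8, 6.1]
t = pooled_t(a, b)
f = one_way_f([a, b])
print(t ** 2, f)  # the two numbers agree up to floating-point rounding
```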


18

Regression

• SST (sum of squares total) = SSR (sum of squares regression) + SSE (sum of squares error)

• Sample variance: S² = MSE = SSE/(n − k), an unbiased estimator of σ²

$$SST=\sum_{i=1}^{n}(y_i-\bar{y})^2=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2+\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=SSR+SSE$$
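The decomposition can be verified numerically. This minimal sketch assumes a one-way layout in which each fitted value ŷ is the mean of its group. The three groups are hypothetical. The code computes SST, SSR and SSE directly from their definitions and then forms S² = SSE/(n − k).

```python
from statistics import mean

def anova_sums_of_squares(groups):
    """Return (SST, SSR, SSE) when each fitted value y-hat is its group mean."""
    obs = [(v, mean(g)) for g in groups for v in g]   # (y_i, y-hat_i) pairs
    ybar = mean(v for v, _ in obs)
    sst = sum((y - ybar) ** 2 for y, _ in obs)
    ssr = sum((yhat - ybar) ** 2 for _, yhat in obs)
    sse = sum((y - yhat) ** 2 for y, yhat in obs)
    return sst, ssr, sse

groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [1.0, 1.5, 2.5]]  # hypothetical data
sst, ssr, sse = anova_sums_of_squares(groups)
n, k = sum(map(len, groups)), len(groups)
mse = sse / (n - k)   # S^2, the unbiased estimator of sigma^2
print(sst, ssr + sse, mse)
```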


19

Mean Variation


20

Data Collection

Bryan Rong


21

Data Collection

• 3 industries: Application Software, Credit Service, Apparel Stores

• Sample 15 stocks from each industry

• For each stock, we observed the last 30 days and calculated
  – Mean daily percentage change
  – Mean daily percentage range
  – Mean volume


22

Application software

• CA, Inc. [CA]
• Compuware Corporation [CPWR]
• Deltek, Inc. [PROJ]
• Epicor Software Corporation [EPIC]
• Fundtech Ltd. [FNDT]
• Intuit Inc. [INTU]
• Lawson Software, Inc. [LWSN]
• Microsoft Corporation [MSFT]
• MGT Capital Investments, Inc. [MGT]
• Magic Software Enterprises Ltd. [MGIC]
• SAP AG [SAP]
• Sonic Foundry, Inc. [SOFO]
• RealPage, Inc. [RP]
• Red Hat, Inc. [RHT]
• VeriSign, Inc. [VRSN]


23

Credit Service

• Advance America, Cash Advance Centers, Inc. [AEA]
• Alliance Data Systems Corporation [ADS]
• American Express Company [AXP]
• Asset Acceptance Capital Corp. [AACC]
• Capital One Financial Corporation [COF]
• CapitalSource Inc. [CSE]
• Cash America International, Inc. [CSH]
• Discover Financial Services [DFS]
• Equifax Inc. [EFX]
• Global Cash Access Holdings, Inc. [GCA]
• Federal Agricultural Mortgage Corporation [AGM]
• Intervest Bancshares Corporation [IBCA]
• Manhattan Bridge Capital, Inc. [LOAN]
• MicroFinancial Incorporated [MFI]
• Moody's Corporation [MCO]


24

APPAREL STORES

• Abercrombie & Fitch Co. [ANF]
• American Eagle Outfitters, Inc. [AEO]
• bebe stores, inc. [BEBE]
• DSW Inc. [DSW]
• Express, Inc. [EXPR]
• J. Crew Group, Inc. [JCG]
• New York & Company, Inc. [NWY]
• Nordstrom, Inc. [JWN]
• Pacific Sunwear of California, Inc. [PSUN]
• The Gap, Inc. [GPS]
• The Buckle, Inc. [BKE]
• The Children's Place Retail Stores, Inc. [PLCE]
• The Dress Barn, Inc. [DBRN]
• The Finish Line, Inc. [FINL]
• Urban Outfitters, Inc. [URBN]


25


26


27

Final Data look


28

Checking Assumptions

Zhang Junying


29

Major Assumptions of Analysis of Variance

• The Assumptions
  – Normal populations
  – Independent samples
  – Equal (unknown) population variances

• Our Purpose
  – Examine these assumptions by graphical analysis of the residuals


30

Residual plot

• Violations of the basic assumptions and model adequacy can be easily investigated by examining the residuals.

• We define the residual for observation j in treatment i as

$$e_{ij}=y_{ij}-\hat{y}_{ij}$$

• If the model is adequate, the residuals should be structureless; that is, they should contain no obvious patterns.
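A minimal sketch of the residual definition, assuming the one-way model. For a one-way layout, the fitted value ŷ_ij is the mean of treatment i, which is what PROC GLM's OUTPUT statement saves as P=. The data below are hypothetical.

```python
from statistics import mean

def residuals(groups):
    """e_ij = y_ij - yhat_ij, where the one-way fitted value yhat_ij is the group mean."""
    return [[y - mean(g) for y in g] for g in groups]

groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [1.0, 1.5, 2.5]]  # hypothetical data
res = residuals(groups)
# In the one-way model the residuals sum to zero within every treatment,
# so any remaining pattern (trend, funnel shape) points to a model problem.
print(res)
```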


31

Normality

• Why normal?
  – ANOVA is an analysis of variance; more specifically, it analyzes the ratio of two variances
  – Statistical inference is based on the F distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by its degrees of freedom
  – So it is no surprise that each variance in the ANOVA ratio comes from a parent normal distribution

• Normality is only needed for statistical inference.


32

SAS code for getting residuals

PROC IMPORT DATAFILE='C:\Users\junyzhang\Desktop\mydata.xls' OUT=stock REPLACE;
RUN;
PROC PRINT DATA=stock;
RUN;

/* One-way model: save fitted values (yhat) and residuals (resid) */
PROC GLM DATA=stock;
  CLASS indu;
  MODEL adpcdata = indu;
  OUTPUT OUT=stock1 P=yhat R=resid;
RUN;
QUIT;

PROC PRINT DATA=stock1;
RUN;


33

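Normality test

A normal probability plot of the residuals is used to check the normality assumption.

/* NORMAL requests tests for normality; PLOT requests stem-and-leaf, box and normal probability plots */
proc univariate data=stock1 normal plot;
  var resid;
run;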
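A normal probability plot pairs each sorted residual with the standard-normal quantile expected at its rank. This Python sketch shows that computation using Blom's plotting positions (i − 0.375)/(n + 0.25). Blom's formula is one common choice, assumed here; it is not necessarily the exact rule behind the SAS line-printer plot. The residuals are hypothetical. If they are roughly normal, the (quantile, residual) points lie close to a straight line.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each sorted residual with its expected standard-normal quantile.

    Uses Blom's plotting positions (i - 0.375) / (n + 0.25).
    """
    n = len(residuals)
    z = NormalDist()  # standard normal N(0, 1)
    return [
        (z.inv_cdf((i - 0.375) / (n + 0.25)), r)
        for i, r in enumerate(sorted(residuals), start=1)
    ]

resid = [-1.0, 0.0, 1.0, 1.0, -0.5, -0.5]   # hypothetical residuals
for q, r in normal_scores(resid):
    print(f"{q:+.3f}  {r:+.2f}")
```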


Normality Tests

Tests for Normality

Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W      0.731203     Pr < W      <0.0001
Kolmogorov-Smirnov    D      0.206069     Pr > D      <0.0100
Cramer-von Mises      W-Sq   1.391667     Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq   7.797847     Pr > A-Sq   <0.0050

Tests for Normality

Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W      0.989846     Pr < W       0.6521
Kolmogorov-Smirnov    D      0.057951     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.03225      Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq   0.224264     Pr > A-Sq   >0.2500

[Figure: Normal Probability Plot (values from about -2.1 to 2.3)]

[Figure: Normal Probability Plot (values from about 0.25 to 8.25)]

34


Normality Tests

35


36

Independence

• Independent observations
  – No correlation between error terms
  – No correlation between independent variables and error

• Positively correlated data inflates the standard error
  – The estimates of the treatment means are more accurate than the standard error suggests.


37

SAS code for independence test

A plot of the residuals against the factor is used to check independence.

proc plot data=stock1;
  plot resid*indu;
run;


Independence Tests

38


39

Homogeneity of Variances

• Eisenhart (1947) describes the problem of unequal variances as follows:
  – The ANOVA model is based on the ratio of the mean squares of the factors to the residual mean square.
  – The residual mean square is an unbiased estimator of σ², the variance of a single observation.
  – The between-treatment mean square takes into account not only the differences between observations, σ², just like the residual mean square, but also the variance between treatments.
  – If there were non-constant variance among treatments, we could replace the residual mean square with some overall variance, σa², and a treatment variance, σt², which is some weighted version of σa².
  – The “neatness” of ANOVA is lost.


40

SAS code for Homogeneity of Variances test

A plot of the residuals against the fitted values is used to check the constant variance assumption.

proc plot data=stock1;
  plot resid*yhat;
run;
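The slides check constant variance graphically. A common numeric complement is Levene's test, which runs a one-way ANOVA on the absolute deviations |y_ij − center of group i|. A large F suggests unequal spreads. Using the median as the center gives the Brown–Forsythe variant. This Python sketch computes only the statistic, not a p-value. The function names and both data sets are hypothetical.

```python
from statistics import mean, median

def one_way_f(groups):
    """One-way ANOVA F statistic, MSR/MSE with (k - 1, n - k) degrees of freedom."""
    obs = [v for g in groups for v in g]
    n, k, grand = len(obs), len(groups), mean(obs)
    ssr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ssr / (k - 1)) / (sse / (n - k))

def levene_f(groups, center=mean):
    """Levene's statistic: one-way ANOVA on |y_ij - center(group i)|.

    center=mean gives Levene's original test; center=median gives Brown-Forsythe.
    """
    return one_way_f([[abs(v - center(g)) for v in g] for g in groups])

similar = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [0.5, 1.5, 2.5, 3.5]]  # equal spreads
spread = [[1.0, 2.0, 3.0, 4.0], [0.0, 3.0, 6.0, 9.0], [2.4, 2.5, 2.5, 2.6]]   # unequal spreads
print(levene_f(similar), levene_f(spread), levene_f(spread, center=median))
```

The statistic would be compared with the F distribution with k − 1 and n − k degrees of freedom.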


Data with homogeneity of Variances

41


Tests for Homogeneity of Variances

42


Results for our data

– Normal populations

– Nearly independent samples

– Equal (unknown) population variances

So we can employ ANOVA to analyze our data.

43


44

1-Way ANOVA

Yingying Lin &

Wenyi Dong

Derivation and SAS


45

Derivation – 1-Way ANOVA

• Hypotheses
  – H0: μ = μ1 = μ2 = μ3 = … = μn
  – H1: μi ≠ μj for some i, j

• We assume that the jth observation in group i is related to the mean by xij = μ+ (μi – μ) + εij, where εij is a random noise term.

• We wish to separate the variability of the individual observations into parts due to differences between groups and individual variability


46

Derivation – 1-Way ANOVA – Cont’


47

Derivation – 1-Way ANOVA – Cont’

• We can show that

• Using the above equation, we define
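The equations on these derivation slides did not survive extraction. Assuming the standard one-way notation (n groups, group i containing n_i observations x_ij, N observations in total, group means x̄_i· and grand mean x̄_··), the usual identity and the mean squares derived from it are as follows. The cross term vanishes because the deviations within each group sum to zero.

$$\sum_{i=1}^{n}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{\cdot\cdot})^2=\sum_{i=1}^{n}n_i(\bar{x}_{i\cdot}-\bar{x}_{\cdot\cdot})^2+\sum_{i=1}^{n}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{i\cdot})^2$$

$$MS_{between}=\frac{SS_{between}}{n-1},\qquad MS_{within}=\frac{SS_{within}}{N-n},\qquad F=\frac{MS_{between}}{MS_{within}}\sim F_{n-1,\,N-n}\ \text{under } H_0$$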


48

Derivation – 1-Way ANOVA – Cont’

• Given the distributions of the MSS values, we can reject the null hypothesis if the between group variance is significantly higher than the within group variance. That is,

• We reject the null hypothesis if $F > f_{n-1,\,N-n,\,\alpha}$, where n is the number of groups and N is the total number of observations.


49

Brief Summary Statistics

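• Code

proc sort data=stock;   /* BY-group processing requires the data sorted by industry */
  by industry;
run;
proc means data=stock maxdec=5 n mean std;
  by industry;
  var ADPC;
run;

Get simple summary statistics (sample size, sample mean and SD of each industry) with at most 5 decimal places.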


50

Brief Summary Statistics

• Output

Industry               N   Mean      Std Dev
Apparel Stores        15   0.00253   0.00356
Application Software  15   0.00413   0.00742
Credit Service        15   0.00135   0.00443
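These summary statistics are enough to reproduce the one-way ANOVA by hand. SSB = Σ nᵢ(ȳᵢ − ȳ)² and SSW = Σ (nᵢ − 1)sᵢ². This Python sketch plugs in the N, Mean and Std Dev values printed above. Those values are rounded to 5 decimals, so the result matches the PROC GLM output only approximately. The function name is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA sums of squares and F statistic from per-group n, mean and SD."""
    n_total, k = sum(ns), len(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ssb = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))   # between groups
    ssw = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))           # within groups
    df_b, df_w = k - 1, n_total - k
    return ssb, ssw, (ssb / df_b) / (ssw / df_w), (df_b, df_w)

# Values copied from the PROC MEANS output above (rounded to 5 decimals)
ns = [15, 15, 15]                  # Apparel Stores, Application Software, Credit Service
means = [0.00253, 0.00413, 0.00135]
sds = [0.00356, 0.00742, 0.00443]
ssb, ssw, f, df = anova_from_summary(ns, means, sds)
print(f"SSB={ssb:.8f}  SSW={ssw:.8f}  F={f:.2f}  df={df}")
```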


51

Data Plot

• Code

proc plot data=stock;
  plot industry*ADPC;
run;

Produces a crude (line-printer) plot of industry against ADPC.


52

Data Plot

• Output

[Figure: Plot of industry*ADPC; vertical axis: industry (CreditSe, Applicat, ApparelS); horizontal axis: ADPC, from -0.015 to 0.025. Legend: A = 1 obs, B = 2 obs, D = 4 obs.]


53

One Way ANOVA Test

• Code

proc anova data=stock;
  class industry;                  /* "industry" is a factor */
  model ADPC = industry;           /* assumes industry influences average daily percentage change */
  means industry / tukey cldiff;   /* Tukey multiple comparisons: actual confidence intervals */
  means industry / tukey lines;    /* pictorial display of the comparisons */
run;
quit;


54

GLM analysis

• Code

proc glm data=stock;
  class industry;
  model ADPC = industry;
  output out=stockfit p=yhat r=resid;   /* save fitted values and residuals */
run;
quit;

This procedure is similar to PROC ANOVA, but PROC GLM can save fitted values and residuals for residual plots; it also produces more output.


55

One Way ANOVA Test

• Output

Dependent Variable: ADPC

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2   0.00005833       0.00002916    1.00      0.3757
Error             42   0.00122217       0.00002910
Corrected Total   44   0.00128050

R-Square   Coeff Var   Root MSE   ADPC Mean
0.045552   201.8054    0.005394   0.002673

Source     DF   Anova SS     Mean Square   F Value   Pr > F
industry    2   0.00005833   0.00002916    1.00      0.3757
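Every derived entry in this table follows from the sums of squares and degrees of freedom. Mean square = SS/DF, F = MS(Model)/MSE, R-Square = SS(Model)/SS(Total), Root MSE = √MSE and Coeff Var = 100 × Root MSE / mean. This Python sketch recomputes them from the printed values. Because the inputs are rounded, agreement is only to the printed precision.

```python
from math import sqrt

ss_model, df_model = 0.00005833, 2
ss_error, df_error = 0.00122217, 42
ss_total = 0.00128050
ybar = 0.002673                       # ADPC Mean

ms_model = ss_model / df_model        # Mean Square for the model
mse = ss_error / df_error             # Mean Square Error
f_value = ms_model / mse
r_square = ss_model / ss_total
root_mse = sqrt(mse)
coeff_var = 100 * root_mse / ybar     # Coeff Var is a percentage of the mean
print(f"MS={ms_model:.8f} MSE={mse:.8f} F={f_value:.2f} R2={r_square:.6f} "
      f"RootMSE={root_mse:.6f} CV={coeff_var:.2f}")
```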


56

One Way ANOVA Test

Tukey's Studentized Range (HSD) Test for ADPC

Alpha                                   0.05
Error Degrees of Freedom                42
Error Mean Square                       0.000029
Critical Value of Studentized Range     3.43582
Minimum Significant Difference          0.0048


57

One Way ANOVA Test

Industry               Difference Between   Simultaneous 95%
Comparison             Means                Confidence Limits
Applicat - ApparelS     0.001601            -0.003184    0.006387
Applicat - CreditSe     0.002778            -0.002008    0.007563
ApparelS - Applicat    -0.001601            -0.006387    0.003184
ApparelS - CreditSe     0.001177            -0.003609    0.005962
CreditSe - Applicat    -0.002778            -0.007563    0.002008
CreditSe - ApparelS    -0.001177            -0.005962    0.003609
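With equal group sizes, Tukey's minimum significant difference is MSD = q × √(MSE/n). Each simultaneous interval is then the difference in means ± MSD. This Python sketch reuses the critical value 3.43582 printed by SAS, since the studentized range distribution is not in Python's standard library. It takes MSE from the ANOVA table and n = 15 stocks per industry, and it reproduces the limits above to the printed precision. An interval that contains 0 means that pair is not significantly different.

```python
from math import sqrt

q_crit = 3.43582          # Critical Value of Studentized Range (alpha = 0.05, 3 groups, 42 error df)
mse = 0.00122217 / 42     # Error Mean Square from the ANOVA table
n_per_group = 15

# Tukey's minimum significant difference for equal group sizes
msd = q_crit * sqrt(mse / n_per_group)

def tukey_interval(diff):
    """Simultaneous 95% confidence limits for a difference between two group means."""
    return diff - msd, diff + msd

print(f"MSD = {msd:.4f}")
for label, diff in [("Applicat - ApparelS", 0.001601),
                    ("Applicat - CreditSe", 0.002778),
                    ("ApparelS - CreditSe", 0.001177)]:
    lo, hi = tukey_interval(diff)
    print(f"{label}: ({lo:.6f}, {hi:.6f})  contains 0: {lo < 0 < hi}")
```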


58

Univariate Procedure

• Code

proc univariate data=stockfit plot normal;
  var resid;
run;

We use PROC UNIVARIATE to produce the stem-and-leaf plot and the normal probability plot; the stem-and-leaf plot visualizes the overall distribution of a variable.


59

Univariate Procedure

• Output

Moments

N                 45            Sum Weights        45
Mean              0             Sum Observations   0
Std Deviation     0.00527035    Variance           0.00002778
Skewness          1.33008795    Kurtosis           5.46395169
Uncorrected SS    0.00122217    Corrected SS       0.00122217
Coeff Variation   .             Std Error Mean     0.00078566
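The Moments table connects back to the ANOVA fit. The residuals have mean 0, so their Corrected SS equals the Error SS from the ANOVA table (0.00122217). Variance, Std Deviation and Std Error Mean all follow from it. A short Python check using the printed values:

```python
from math import sqrt

n = 45                               # number of residuals
corrected_ss = 0.00122217            # equals the ANOVA Error SS because the residuals have mean 0
variance = corrected_ss / (n - 1)    # sample variance
std_dev = sqrt(variance)
std_error_mean = std_dev / sqrt(n)   # standard error of the mean
print(f"Variance={variance:.8f}  Std Deviation={std_dev:.8f}  Std Error Mean={std_error_mean:.8f}")
```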


60

Tests for Location: Mu0=0

Test          -Statistic-    -----p Value------
Student's t   t      0       Pr > |t|     1.0000
Sign          M     -1.5     Pr >= |M|    0.7660
Signed Rank   S    -43.5     Pr >= |S|    0.6288


61

Basic Statistical Measures

Location                Variability
Mean      0.00000       Std Deviation         0.00527
Median   -0.00048       Variance              0.0000278
Mode      .             Range                 0.03389
                        Interquartile Range   0.00623


62

Tests for Normality

Test                  --Statistic---      -----p Value------
Shapiro-Wilk          W      0.904256     Pr < W       0.0013
Kolmogorov-Smirnov    D      0.112584     Pr > D      >0.1500
Cramer-von Mises      W-Sq   0.096018     Pr > W-Sq    0.1266
Anderson-Darling      A-Sq   0.781507     Pr > A-Sq    0.0410


63

Quantiles

Quantile        Estimate
100% Max         0.021509105
99%              0.021509105
95%              0.007261567
90%              0.005106613
75% Q3           0.002667399
50% Median      -0.000477723
25% Q1          -0.003565176
10%             -0.004824061
5%              -0.005444811
1%              -0.012376248
0% Min          -0.012376248


64

Extreme Observations

-------Lowest-------     -------Highest------
Value          Obs       Value          Obs
-0.01237625    41        0.00510661      6
-0.00807339    25        0.00596875     34
-0.00544481    13        0.00726157     29
-0.00483936     3        0.00814126     27
-0.00482406    28        0.02150911     22


65

Stem Leaf Plot and Boxplot

Stem Leaf        #   Boxplot
  20 5           1      *
  18
  16
  14
  12
  10
   8 1           1      |
   6 03          2      |
   4 4561        4      |
   2 0027922     7   +-----+
   0 334669      6   |  +  |
  -0 9809753     7   *-----*
  -2 97688551    8   +-----+
  -4 4888772     7      |
  -6                    |
  -8 1           1      |
 -10                    |
 -12 4           1      |
     ----+----+----+----+
Multiply Stem.Leaf by 10**-3


66

Plot

• Code

proc plot data=stockfit;
  plot resid*industry;
  plot resid*yhat;
run;

Plot the residuals against industry, and the residuals against the fitted ADPC values (yhat).


67

Normal Probability Plot

[Figure: Normal probability plot of the residuals; resid from -0.013 to 0.021 on the vertical axis, normal quantiles from -2 to +2 on the horizontal axis.]


68

Graph

[Figure: Plot of resid*industry; resid from -0.015 to 0.025 on the vertical axis, industry (ApparelS, Applicat, CreditSe) on the horizontal axis. Legend: A = 1 obs, B = 2 obs, D = 4 obs.]


69

[Figure: Plot of resid*yhat; resid from -0.015 to 0.025 on the vertical axis, yhat from 0.0010 to 0.0035 on the horizontal axis. Legend: A = 1 obs, B = 2 obs, D = 4 obs.]


70

Conclusion

• The one-way ANOVA gives F = 1.00 and p = 0.3757. Since the p-value is larger than the significance level α = 0.05, we fail to reject the null hypothesis: there is no significant difference between the mean average daily percentage changes of stocks in the different industries. Thus, based on these data, buying stocks in one industry rather than another makes no detectable difference in the long term.


71

2-Way ANOVA

Peng Yang

Phil Caffrey

Yin Diao

Derivation and SAS


72

2-Way ANOVA Derivation

We now have two factors (A & B):

Factor A levels: a1, a2, …, ai, …
Factor B levels: b1, b2, …, bj, …
Totaling n

Tests to Conduct


73

2-Way ANOVA Derivation

Linear Model

$$X_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\epsilon_{ijk}$$

Dot Notation

letting

.
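The definitions that followed "letting" were lost in extraction. Assume a balanced layout with a levels of A, b levels of B and n replicates per cell (N = abn). The usual dot notation, which the least-squares decomposition on the next slide relies on, is then:

$$\bar{x}_{\cdot\cdot\cdot}=\frac{1}{abn}\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}x_{ijk},\qquad \bar{x}_{i\cdot\cdot}=\frac{1}{bn}\sum_{j=1}^{b}\sum_{k=1}^{n}x_{ijk},\qquad \bar{x}_{\cdot j\cdot}=\frac{1}{an}\sum_{i=1}^{a}\sum_{k=1}^{n}x_{ijk},\qquad \bar{x}_{ij\cdot}=\frac{1}{n}\sum_{k=1}^{n}x_{ijk}$$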


74

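2-Way ANOVA Derivation

Least Square Method

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\hat{\mu})^2=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot\cdot\cdot})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_{\cdot j\cdot}-\bar{x}_{\cdot\cdot\cdot})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x}_{\cdot\cdot\cdot})^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(x_{ijk}-\bar{x}_{ij\cdot})^2$$

$$=bn\sum_{i=1}^{a}\hat{\tau}_i^2+an\sum_{j=1}^{b}\hat{\beta}_j^2+n\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\tau\beta)}_{ij}^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}e_{ijk}^2$$

where $\hat{\mu}=\bar{x}_{\cdot\cdot\cdot}$, $\hat{\tau}_i=\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot\cdot\cdot}$, $\hat{\beta}_j=\bar{x}_{\cdot j\cdot}-\bar{x}_{\cdot\cdot\cdot}$, $\widehat{(\tau\beta)}_{ij}=\bar{x}_{ij\cdot}-\bar{x}_{i\cdot\cdot}-\bar{x}_{\cdot j\cdot}+\bar{x}_{\cdot\cdot\cdot}$ and $e_{ijk}=x_{ijk}-\bar{x}_{ij\cdot}$.

SST = SSA + SSB + SSAB + SSE

$$\begin{array}{lll}
\text{Source} & \text{Sum of squares} & \text{D.F.}\\
\hline
A & SSA=bn\sum_{i=1}^{a}\hat{\tau}_i^2 & a-1\\
B & SSB=an\sum_{j=1}^{b}\hat{\beta}_j^2 & b-1\\
AB & SSAB=n\sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{(\tau\beta)}_{ij}^2 & (a-1)(b-1)\\
\text{Error} & SSE=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}e_{ijk}^2 & N-ab\\
\text{Total} & SST & N-1
\end{array}$$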
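The decomposition and the degrees of freedom can be checked numerically. This Python sketch computes each sum of squares directly from the dot-notation means for a small, hypothetical balanced layout (a = 2, b = 3, n = 2). It also returns the degrees of freedom, using N − ab = ab(n − 1). For a balanced layout, SSA + SSB + SSAB + SSE should reproduce SST up to rounding. The function name and data are illustrative only.

```python
from itertools import product
from statistics import mean

def two_way_ss(x):
    """Sums of squares and D.F. for a balanced two-way layout x[i][j] = list of n replicates."""
    a, b, n = len(x), len(x[0]), len(x[0][0])
    grand = mean(v for row in x for cell in row for v in cell)                  # x-bar_{...}
    row_m = [mean(v for cell in x[i] for v in cell) for i in range(a)]          # x-bar_{i..}
    col_m = [mean(v for i in range(a) for v in x[i][j]) for j in range(b)]      # x-bar_{.j.}
    cell_m = [[mean(x[i][j]) for j in range(b)] for i in range(a)]              # x-bar_{ij.}
    ssa = b * n * sum((m - grand) ** 2 for m in row_m)
    ssb = a * n * sum((m - grand) ** 2 for m in col_m)
    ssab = n * sum((cell_m[i][j] - row_m[i] - col_m[j] + grand) ** 2
                   for i, j in product(range(a), range(b)))
    sse = sum((v - cell_m[i][j]) ** 2
              for i, j in product(range(a), range(b)) for v in x[i][j])
    sst = sum((v - grand) ** 2 for row in x for cell in row for v in cell)
    dfs = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst}, dfs

x = [[[3.0, 4.0], [5.0, 6.0], [4.0, 5.0]],     # hypothetical data, factor A level 1
     [[6.0, 8.0], [5.0, 7.0], [9.0, 10.0]]]    # factor A level 2
ss, dfs = two_way_ss(x)
print(ss, dfs)
```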


75

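2-Way ANOVA Derivation

Test Criteria

H0: τ1 = τ2 = … = τa = 0 vs. Ha: at least one τi ≠ 0

H0: β1 = β2 = … = βb = 0 vs. Ha: at least one βj ≠ 0

H0: (τβ)ij = 0 for all i, j vs. Ha: at least one (τβ)ij ≠ 0

Rejection Conditions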
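The rejection conditions themselves did not survive extraction. For the fixed-effects model, the standard F tests at significance level α are given below. They are consistent with MSA = SSA/(a − 1) and MSE = SSE/(N − ab), as defined in the pivotal-quantity derivation. MSB and MSAB are defined analogously, by dividing each sum of squares by its D.F.

$$F_A=\frac{MSA}{MSE}>f_{a-1,\,N-ab,\,\alpha},\qquad F_B=\frac{MSB}{MSE}>f_{b-1,\,N-ab,\,\alpha},\qquad F_{AB}=\frac{MSAB}{MSE}>f_{(a-1)(b-1),\,N-ab,\,\alpha}$$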


76

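2-Way ANOVA Derivation

Pivotal Quantity

H0: τ1 = … = τa = 0 vs. Ha: at least one τi ≠ 0

Under H0, the full model

$$X_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\epsilon_{ijk}$$

reduces to

$$X_{ijk}=\mu+\beta_j+(\tau\beta)_{ij}+\epsilon_{ijk},$$

which is fitted as

$$x_{ijk}=\hat{\mu}+\hat{\beta}_j+\widehat{(\tau\beta)}_{ij}+e'_{ijk}$$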


77

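2-Way ANOVA Derivation

Pivotal Quantity (Cont’)

$$SSE'=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}(e_{ijk}+\hat{\tau}_i)^2=\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}e_{ijk}^2+\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\hat{\tau}_i^2+2\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}e_{ijk}\hat{\tau}_i=SSE+SSA$$

since the cross term vanishes ($\sum_{j=1}^{b}\sum_{k=1}^{n}e_{ijk}=0$ for every $i$).

$$MSA=\frac{SSA}{a-1},\qquad MSE=\frac{SSE}{N-ab}$$

$$s_E^2=\frac{SSE}{N-ab}\sim\frac{\sigma^2\,\chi^2_{N-ab}}{N-ab}$$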


78

Two-Way ANOVA in SAS

By: Philip Caffrey &

Yin Diao


79

Model

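• Two-way ANOVA is an extension of one-way ANOVA. It provides more insight into how the two independent variables (IVs) individually affect the dependent variable (DV) and how they interact. Thus, both the main effects of the two IVs and their interaction effect on the DV need to be tested.

• Model:

$$X_{ijk}=\mu+\tau_i+\beta_j+(\tau\beta)_{ij}+\epsilon_{ijk}$$

• Null hypotheses:
  – H0: τ1 = … = τa = 0 vs. Ha: at least one τi ≠ 0
  – H0: β1 = … = βb = 0 vs. Ha: at least one βj ≠ 0
  – H0: (τβ)ij = 0 for all i, j vs. Ha: at least one (τβ)ij ≠ 0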


80

Sum of Squares

Each effect's mean square, divided by the error mean square, follows an F distribution under that effect's null hypothesis. In this way, we can conclude whether there is a main effect or an interaction effect.

SSTOTAL = SSA + SSB + SSINTERACTION + SSERROR


81

Example

Using the same data from the One-Way analysis, we will now separate the data further by introducing a second factor, Average Daily Volume.


82

Example

Factor 1: Industry
• Apparel Stores
• Application Software
• Credit Services

Factor 2: Average Daily Volume
• Low
• Medium
• High


Two-Way Design

                      INDUSTRY
VOLUME      Credit    Apparel    Software
Low         n = 5     n = 5      n = 5
Medium      n = 5     n = 5      n = 5
High        n = 5     n = 5      n = 5

(Each of the 9 cells is repeated 5 times.)

83


84

Using SAS

SAS code:

PROC IMPORT DATAFILE='G:\Stony Brok Univ Text Books\AMS Project\Data.xls'
    OUT=TWOWAY DBMS=XLS REPLACE;
RUN;

PROC ANOVA DATA=TWOWAY;
    TITLE "ANALYSIS OF STOCK DATA";
    CLASS INDUSTRY VOLUME;
    MODEL ADPC = INDUSTRY | VOLUME;           /* main effects plus interaction */
    MEANS INDUSTRY | VOLUME / TUKEY CLDIFF;   /* Tukey intervals for mean differences */
RUN;


85

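/* PLOT THE CELL MEANS */
PROC MEANS DATA=TWOWAY NWAY NOPRINT;
    CLASS INDUSTRY VOLUME;
    VAR ADPC;
    OUTPUT OUT=MEANS MEAN=;   /* one row per INDUSTRY x VOLUME cell */
RUN;

PROC GPLOT DATA=MEANS;
    PLOT ADPC*VOLUME=INDUSTRY;   /* cell means of ADPC by volume, one series per industry */
RUN;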

Using SAS


86

ANOVA Table: Tests of Between-Subjects Effects

Source              Sum of Squares   df   Mean Square   F       Sig.
Corrected Model     .000a             8   3.335E-5      1.184   .335
Industry            6.906E-5          2   3.453E-5      1.226   .305
Volume              9.534E-5          2   4.767E-5      1.693   .198
Industry * Volume   7.950E-5          4   1.988E-5      .706    .593
Error               .001             36   2.816E-5
Corrected Total     .001             44

No significant results: every p-value (Sig.) is above 0.05.
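Each F value in the table is the effect's mean square divided by the error mean square (2.816E-5). The short sketch below recomputes these F values from the transcribed, rounded mean squares. Computing the p-values (Sig.) would also require the CDF of the F distribution, which this standard-library-only sketch does not attempt. The helper name `f_ratios` is hypothetical.

```python
# Mean squares and d.f. transcribed (rounded) from the ANOVA table above
effects = {
    "Industry": (3.453e-5, 2),
    "Volume": (4.767e-5, 2),
    "Industry * Volume": (1.988e-5, 4),
}
ms_error, df_error = 2.816e-5, 36


def f_ratios(effects, ms_error):
    """F = MS_effect / MS_error for each effect (hypothetical helper)."""
    return {name: ms / ms_error for name, (ms, _df) in effects.items()}


for name, f in f_ratios(effects, ms_error).items():
    df = effects[name][1]
    print(f"{name}: F({df}, {df_error}) = {f:.3f}")
```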


87

To test the main effect of one IV, we pool the data over all levels of the other IV and compare the resulting marginal means, as in a one-way ANOVA.

From the ANOVA table, there are no significant main effects and no significant interaction effect of the two IVs.

To check visually for an interaction effect, we can plot the means of the cells formed by combining the levels of the two IVs. Roughly parallel lines suggest no interaction.
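Numerically, "parallel lines" means that every interaction estimate (cell mean − row mean − column mean + grand mean) is zero. The minimal sketch below computes these estimates from a table of cell means. It assumes a balanced design, and the function name is hypothetical.

```python
def interaction_estimates(cell_means):
    """Estimate (tau*beta)_ij = cell mean - row mean - column mean + grand mean.

    cell_means[i][j] is the mean of cell (i, j) in a balanced two-way design.
    All estimates near zero correspond to parallel lines in a cell-means plot.
    """
    a, b = len(cell_means), len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (a * b)
    row_mean = [sum(row) / b for row in cell_means]
    col_mean = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    return [[cell_means[i][j] - row_mean[i] - col_mean[j] + grand
             for j in range(b)] for i in range(a)]


parallel = interaction_estimates([[1, 2, 3], [4, 5, 6]])  # row 2 = row 1 shifted by 3
crossing = interaction_estimates([[1, 2], [2, 5]])        # lines that are not parallel
print(parallel)
print(crossing)
```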

Using SAS


88

[Figure: Plot of cell means, Industry by Average Daily Volume]


89

Interpreting the Output

Given that the F tests were not significant, we would normally stop our analysis here.

If an F test is significant, we would want to know exactly which means differ from each other.

In that case, use Tukey's test, requested in PROC ANOVA with: MEANS INDUSTRY | VOLUME / TUKEY CLDIFF;
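For equal group sizes, a Tukey interval for a difference of two means is diff ± q·√(MS_E / n). Here q is the upper-α critical value of the studentized range for the number of means compared and the error d.f., and n is the number of observations behind each mean.

The sketch below assumes q is looked up from a table. The value 3.457 is an approximate tabled value for 3 means, 36 error d.f. and α = 0.05, which we supply; it does not come from the slides. Setting n = 15 assumes that each volume mean pools 3 industries × 5 replicates. The function name is hypothetical.

```python
import math


def tukey_interval(diff, q_crit, ms_error, n_i, n_j):
    """Tukey (Tukey-Kramer) confidence interval for a difference of two means.

    q_crit is the upper-alpha studentized range critical value for the number
    of means compared and the error d.f.; it must be looked up separately.
    """
    half_width = q_crit / math.sqrt(2) * math.sqrt(ms_error * (1 / n_i + 1 / n_j))
    return diff - half_width, diff + half_width


# Medium vs. low volume: each volume mean pools 3 industries x 5 replicates = 15 values.
# q = 3.457 is an approximate table value (3 means, 36 error d.f., alpha = 0.05).
low, high = tukey_interval(-0.003698, 3.457, 2.816e-5, 15, 15)
print(f"[{low:.6f}, {high:.6f}]")
```

With these inputs, the sketch gives roughly (−0.00843, 0.00104). This is close to the medium-versus-low volume interval reported below.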


90

Interpreting the Output

Comparing Means

Comparison              Diff. b/w Means   95% CI
Software - Apparel       0.001601         [-0.003184, 0.006387]
Software - Credit        0.002778         [-0.002008, 0.007563]
Credit - Apparel        -0.001177         [-0.005962, 0.003609]
Med. Vol. - Low Vol.    -0.003698         [-0.008435, 0.001038]
Med. Vol. - High Vol.   -0.001252         [-0.005989, 0.003484]
High Vol. - Low Vol.    -0.002446         [-0.007182, 0.002290]

Every interval contains 0, so no pair of means differs significantly.


91

Conclusion

• We cannot conclude that there is a significant difference between any of the group means.

• The data provide no evidence that either IV, or their interaction, affects the DV.


92

Multi-Way ANOVA

Michael Biro & Cris Liu

Derivation


93

M-way ANOVA (Derivation)

• Let us have m factors, $A_1, A_2, \dots, A_m$, with 2 or more levels each: $a_1, a_2, \dots, a_m$ levels, respectively. Then there are $N = a_1 a_2 \cdots a_m$ treatments to conduct, and treatment $i_1 i_2 \cdots i_m$ has sample size $n_{i_1 i_2 \cdots i_m}$. Let $x_{i_1 i_2 \cdots i_m k}$ be the kth observation from treatment $i_1 i_2 \cdots i_m$.

• By the ANOVA assumptions, each $x_{i_1 i_2 \cdots i_m k}$ is a normally distributed random variable. We use the model $x_{i_1 i_2 \cdots i_m k} = \mu_{i_1 i_2 \cdots i_m} + \varepsilon_{i_1 i_2 \cdots i_m k}$, where the residuals $\varepsilon_{i_1 i_2 \cdots i_m k}$ are i.i.d. $N(0, \sigma^2)$.


94

M-way ANOVA (Derivation)

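Using “dot notation”, let $\mu_{\cdot\cdot\cdots\cdot}$ be the grand mean. Let $\mu_{i_1\cdot\cdots\cdot}, \mu_{\cdot i_2\cdot\cdots\cdot}, \dots, \mu_{\cdot\cdots\cdot i_m}$ be the factor-level means, where a dot in place of an index means that index has been averaged over.

Let $\mu = \mu_{\cdot\cdot\cdots\cdot}$, $\tau^{(1)}_{i_1} = \mu_{i_1\cdot\cdots\cdot} - \mu$, …, $\tau^{(m)}_{i_m} = \mu_{\cdot\cdots\cdot i_m} - \mu$. So $\mu$ is the grand mean, and each $\tau^{(k)}_{i_k}$ is the mean of factor $A_k$ at level $i_k$ minus the grand mean. Then we can model each treatment mean $\mu_{i_1 i_2 \cdots i_m}$ as a linear equation in $\mu$, these main effects, and interaction terms for every combination of two or more factors.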


95

M-way ANOVA (Derivation)

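Applying least squares estimation, the total sum of squares splits into one term per main effect, one per interaction, and one for error:

$$SS_T = \sum_{k} SS_{A_k} + \sum_{k<l} SS_{A_k A_l} + \cdots + SS_{A_1 A_2 \cdots A_m} + SS_E,$$

which is the ANOVA identity.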


96

M-way ANOVA (Derivation)

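• These sums of squares are all distributed as independent χ² random variables when multiplied by the correct constant, 1/σ², and when the corresponding null hypotheses hold. Their degrees of freedom satisfy the following equation, where $n_T$ is the total number of observations:

$$n_T - 1 = \sum_{\emptyset \neq S \subseteq \{1,\dots,m\}} \prod_{k \in S} (a_k - 1) + (n_T - a_1 a_2 \cdots a_m)$$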
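This degrees-of-freedom bookkeeping can be checked numerically. The sketch below assumes a balanced design in which the effect for a set S of factors has ∏_{k∈S}(a_k − 1) degrees of freedom. It sums these over all effects and adds the error d.f. For the 3 × 3 stock example with 5 replicates per cell, it reproduces the 2, 2, 4, 36 and 44 d.f. from the two-way ANOVA table. The helper name `anova_dfs` is hypothetical, and `math.prod` needs Python 3.8 or later.

```python
from itertools import combinations
from math import prod


def anova_dfs(levels, n_per_cell):
    """Degrees of freedom for every effect in a balanced m-way ANOVA (hypothetical helper).

    levels[k] is the number of levels a_k of factor k; each of the
    a_1 * ... * a_m treatments has n_per_cell observations.
    """
    m = len(levels)
    dfs = {}
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            # product over the factors in S of (a_k - 1)
            dfs[subset] = prod(levels[k] - 1 for k in subset)
    n_total = prod(levels) * n_per_cell
    df_error = n_total - prod(levels)
    return dfs, df_error, n_total - 1


effect_df, df_error, df_total = anova_dfs([3, 3], 5)  # 3 x 3 stock example, 5 per cell
print(effect_df, df_error, df_total)
```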


97

M-way ANOVA (Derivation)

• There are a total of $2^m$ hypotheses in an m-way ANOVA:

– The null hypothesis, which states that there is no difference or interaction between factors.

– For k from 1 to m, there are $\binom{m}{k}$ alternative hypotheses, one for each collection of k factors: main effects for k = 1 and interactions for k ≥ 2.

– Then we have $1 + \binom{m}{1} + \binom{m}{2} + \cdots + \binom{m}{m} = 2^m$ by the binomial identity.
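One way to see this count is to list every effect term. Each nonempty subset of factors gives one hypothesis, and the overall null hypothesis adds one more. The small sketch below uses `itertools.combinations` and `math.comb` (Python 3.8 or later). The factor names and the helper `effect_terms` are hypothetical.

```python
from itertools import combinations
from math import comb


def effect_terms(factors):
    """One term per nonempty subset of factors: main effects first, then interactions."""
    terms = []
    for k in range(1, len(factors) + 1):
        for subset in combinations(factors, k):
            terms.append("*".join(subset))
    return terms


factors = ["A1", "A2", "A3"]
terms = effect_terms(factors)
print(terms)
# number of terms involving k factors, for k = 1..m: C(m, k)
print([comb(len(factors), k) for k in range(1, len(factors) + 1)])
# overall null hypothesis plus one hypothesis per term equals 2**m
print(1 + len(terms), 2 ** len(factors))
```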


98

M-way ANOVA (Derivation)

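• These hypotheses are:

– H0: every main effect of factor A1 is zero vs. H1: at least one main effect of A1 is nonzero

– ...

– H0: every main effect of factor Am is zero vs. H1: at least one main effect of Am is nonzero

– H0: every A1 × A2 interaction effect is zero vs. H1: at least one A1 × A2 interaction effect is nonzero

– ...

– Test for all combinations of factors, up to the m-way interaction A1 × A2 × ⋯ × Am.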


99

M-way ANOVA (Derivation)

• We want to see whether the variability between groups is larger than the variability within the groups.

• To do this, we use the F distribution as our pivotal quantity, and then we can derive the proper tests, very similar to the 1-way and 2-way tests.


100

M-way ANOVA (Derivation)

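For each main effect or interaction, the test statistic is

$$F = \frac{MS_{\text{effect}}}{MS_E} \sim F_{df_{\text{effect}},\, df_E} \quad \text{under } H_0,$$

and $H_0$ is rejected at level $\alpha$ when $F > F_{df_{\text{effect}},\, df_E,\, \alpha}$.

Continue in this way for all combinations of factors.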


101

Relationship between ANOVA and Regression

Presenter: Cris J.Y. Liu


102

• What we know:

– Regression is the statistical model used to predict a continuous outcome from one or more continuous predictor variables.

– ANOVA compares several groups, usually defined by categorical predictor variables, in terms of a continuous dependent variable. (If the predictors are a mixture of categorical and continuous variables, ANCOVA is an alternative method.)

• Take a second look: they are just different sides of the same coin!


103

Review of ANOVA

• Compare the means of different groups.
• n groups, with n_i elements in the ith group and N elements in total.
• SST = SSbetween + SSwithin

What about only two groups, X and Y, each with n observations?
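A minimal sketch of the one-way identity SST = SSbetween + SSwithin is shown below. Groups are plain Python lists, and the helper name `one_way_ss` is hypothetical.

```python
def one_way_ss(groups):
    """Return (SST, SS_between, SS_within) for a one-way layout (hypothetical helper)."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # between: group sizes times squared distance of each group mean from the grand mean
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # within: squared distance of each observation from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand) ** 2 for x in values)
    return ss_total, ss_between, ss_within


sst, ssb, ssw = one_way_ss([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sst, ssb, ssw)  # SST = SSbetween + SSwithin
```

The same function also handles the two-group case, X and Y, raised above.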


104

Review of Simple Linear Regression

• We try to find the line y = β0 + β1x that best fits our data, so that we can calculate the best estimate of y from x.

• Least squares finds the β0 and β1 that minimize the sum of squared distances Q between the actual and estimated scores:

$$Q = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

• Treat the predicted values ŷ_i as one group and the original values y_i as the other group.

• Then simple linear regression is a special (and also simple) case of ANOVA!


105

Review of Regression

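$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

Total = Model (Between) + Error (Within)

d.f.: n − 1 = (2 − 1 = 1) + (n − 2)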
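The sketch below fits the least squares line and checks the decomposition numerically. SSR has 1 d.f. and SSE has n − 2 d.f., so the regression F statistic is (SSR/1) / (SSE/(n − 2)). The data and the helper name `simple_regression` are hypothetical.

```python
def simple_regression(x, y):
    """Least squares line y = b0 + b1*x and its sum-of-squares decomposition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # slope minimizing Q
    b0 = y_bar - b1 * x_bar          # intercept: the line passes through (x_bar, y_bar)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)               # Total, d.f. n - 1
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # Model (between), d.f. 1
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Error (within), d.f. n - 2
    f_stat = (ssr / 1) / (sse / (n - 2))
    return b0, b1, sst, ssr, sse, f_stat


b0, b1, sst, ssr, sse, f_stat = simple_regression([1, 2, 3, 4], [2, 4, 5, 8])
print(b0, b1, sst, ssr, sse, f_stat)
```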


106

ANOVA table of Regression


107

How are they alike?

• If we predict each Y from its group membership, so that every prediction is the mean of that observation's group, we can see that ANOVA and regression are the same!

• Within each group, the group mean is the best (least squares) prediction of a Y-score.
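One way to make this concrete is dummy coding, a standard technique that the slides list under "dummy variable" but do not spell out. Regress Y on an intercept plus one 0/1 indicator for each group except a baseline group. The fitted values are then exactly the group means, the regression sum of squares equals SSbetween, and the error sum of squares equals SSwithin. The sketch below solves the normal equations by Gauss-Jordan elimination to stay dependency-free. Both helper names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def anova_via_regression(groups):
    """Regress y on an intercept plus k-1 group dummies (group 0 is the baseline)."""
    k = len(groups)
    X, y = [], []
    for g, values in enumerate(groups):
        for v in values:
            X.append([1.0] + [1.0 if g == j else 0.0 for j in range(1, k)])
            y.append(float(v))
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    Xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = solve_linear(XtX, Xty)
    fitted = [sum(bp * xp for bp, xp in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # regression SS = SS_between
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # error SS = SS_within
    return beta, fitted, ssr, sse


beta, fitted, ssr, sse = anova_via_regression([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(beta)    # intercept = mean of group 0; dummy coefficients = differences from it
print(fitted)  # every fitted value is its group's mean
print(ssr, sse)
```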


108

Term comparison

Regression ANOVA Dependent variable

Explanatory variable

total mean

SSR SSbetween

SSE SSwithin


109

Term comparison

If there is more than one predictor:

Regression ↔ ANOVA
Multiple regression ↔ Multi-way ANOVA
Dummy variable ↔ Categorical variable
Other related terms: interaction effect, covariance, …


110

Notes:

• Both of them are applicable only when outcome variables are continuous.

• They share basically the same procedure for checking the underlying assumptions.


111

Robust ANOVA

-Taguchi Method


112

What is Robustness?

• The term “robustness” is often used to refer to methods designed to be insensitive to distributional assumptions (such as normality) in general, and unusual observations (“outliers”) in particular.

• Why robust ANOVA?
– There is always the possibility that some observations contain excessive noise.
– Excessive noise during experiments might lead to incorrect inferences.
– Robust methods are widely used in quality control.


113

Robust ANOVA

• What do we want from robust ANOVA? Robust ANOVA methods should withstand non-ideal conditions while being no more difficult to perform than ordinary ANOVA.

• The standard technique, the least squares method, is highly sensitive to unusual observations.


114

Robust ANOVA

Our aim is to choose β to minimize

$$\sum_{i} \rho\left(y_i - \mathbf{x}_i^{\top}\beta\right).$$

In standard ANOVA (least squares), we let $\rho(x) = x^2$. We can also try some other ρ(x).
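To see how the choice of ρ changes the answer, consider the simplest case: a single group, where $\mathbf{x}_i^{\top}\beta$ reduces to a common location β. The sketch below minimizes Σρ(y_i − β) by ternary search. This search is our choice of minimizer, not the slides', and it assumes ρ is convex. That holds for x² and |x| but not for every robust ρ. The helper name is hypothetical.

```python
def m_estimate_location(y, rho, iters=200):
    """Choose beta to minimise sum(rho(y_i - beta)) for a single group.

    Uses ternary search on [min(y), max(y)], which assumes rho is convex
    (true for x**2 and abs(x), not for every robust rho).
    """
    def objective(beta):
        return sum(rho(yi - beta) for yi in y)

    lo, hi = min(y), max(y)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2   # a minimiser lies in [lo, m2]
        else:
            lo = m1   # a minimiser lies in [m1, hi]
    return (lo + hi) / 2


data = [1.0, 2.0, 3.0, 10.0]
print(m_estimate_location(data, lambda r: r ** 2))  # least squares: approximately the mean
print(m_estimate_location(data, abs))               # absolute loss: a median, in [2, 3]
```

With ρ(x) = x², the outlier 10 pulls the estimate to the mean of 4. With ρ(x) = |x|, the estimate stays between the two middle values.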


115

Least absolute deviation

• It is well-known that the median is much more robust to outliers than the mean.

• The least absolute deviation (LAD) estimate takes $\rho(x) = |x|$.

• How is LAD related to the median? The LAD estimator determines the “center” of the data set by minimizing the sum of the absolute deviations from the estimated center, and this minimizer turns out to be the median.

• It has been shown to be quite effective in the presence of fat-tailed data.
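The LAD-median connection can be checked on a small made-up sample with one gross outlier. The sketch below compares the mean and the median. It then scans a grid of candidate centers and confirms that none beats the median on the LAD criterion. The helper name `sum_abs_dev` is hypothetical.

```python
import statistics


def sum_abs_dev(y, center):
    """LAD objective: sum of absolute deviations from a proposed center."""
    return sum(abs(v - center) for v in y)


# Made-up sample: five values near 2 plus one gross outlier
y = [2.1, 1.9, 2.0, 2.2, 1.8, 25.0]
mean, median = statistics.mean(y), statistics.median(y)
print(mean, median)  # the outlier drags the mean far more than the median

# No candidate center on the grid does better than the median under the LAD criterion
candidates = [c / 100 for c in range(0, 3001)]  # grid from 0.00 to 30.00
best = min(candidates, key=lambda c: sum_abs_dev(y, c))
print(best, sum_abs_dev(y, median), sum_abs_dev(y, mean))
```

With an even number of observations, any point between the two middle values, 2.0 and 2.1, minimizes the LAD criterion. The conventional median, 2.05, is one of them.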


116

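M-estimation

• M-estimation is based on replacing ρ(·) with a function that is less sensitive to unusual observations than the quadratic $\rho(x) = x^2$.

• The M stands for “maximum likelihood type”: choosing ρ as the negative log-density of the errors makes the estimate an MLE.

• LAD, with $\rho(x) = |x|$, is an example of a robust M-estimator.

• Another popular choice of ρ is the Tukey bisquare:

$$\rho(r; c) = 1 - \left[1 - (r/c)^2\right]^3 \ \text{ if } |r| \le c, \qquad \rho(r; c) = 1 \ \text{ otherwise},$$

where r is the residual and c is a constant.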
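A bisquare M-estimate is usually computed by iteratively reweighted least squares (IRLS). This is a common approach, not necessarily the presenters'.

The minimal sketch below handles the single-group (location) case only. It rests on several assumptions:
- residuals are standardized by the MAD scale estimate, median |y − median| / 0.6745;
- c = 4.685 is a commonly used tuning constant for standardized residuals;
- each weight is ψ(r)/r, which for the bisquare is proportional to (1 − (r/c)²)² inside |r| ≤ c and 0 outside.

All function names are hypothetical.

```python
import statistics


def bisquare_rho(r, c=4.685):
    """Normalised Tukey bisquare loss: 1 - (1 - (r/c)^2)^3 if |r| <= c, else 1."""
    if abs(r) <= c:
        return 1.0 - (1.0 - (r / c) ** 2) ** 3
    return 1.0


def bisquare_weight(r, c=4.685):
    """IRLS weight psi(r)/r for the bisquare, up to a constant: (1 - (r/c)^2)^2, else 0."""
    if abs(r) <= c:
        return (1.0 - (r / c) ** 2) ** 2
    return 0.0


def bisquare_location(y, c=4.685, tol=1e-10, max_iter=100):
    """Bisquare M-estimate of a single group's center via iteratively reweighted means."""
    beta = statistics.median(y)                                      # robust starting value
    scale = statistics.median([abs(v - beta) for v in y]) / 0.6745   # MAD estimate of sigma
    if scale == 0:
        return beta
    for _ in range(max_iter):
        # observations far from the current center (|r| > c) get weight 0
        w = [bisquare_weight((v - beta) / scale, c) for v in y]
        new_beta = sum(wi * v for wi, v in zip(w, y)) / sum(w)
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


y = [2.1, 1.9, 2.0, 2.2, 1.8, 25.0]  # made-up data with one gross outlier
print(statistics.mean(y), bisquare_location(y))
```

Here the outlier receives zero weight, so the estimate settles near the center of the five clustered values, while the ordinary mean is pulled above 5.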


117

Suggestion

• These robust analyses are not meant to take the place of standard ANOVA analyses in this context.

• Rather, we believe that the robust analyses should be undertaken as an adjunct to the standard analyses.


118

Questions?


119

Thank You