Two sample Test A.Ramesh Department of Management Studies Indian institute of Technology Roorkee


  • Two sample Test

    A. Ramesh, Department of Management Studies

    Indian Institute of Technology Roorkee

  • Two-Sample Tests Overview

    The means of two independent populations

    The means of two related populations

    The proportions of two independent populations

    The variances of two independent populations

  • Two-Sample Tests Overview

    Two Sample Tests, with examples:

    Independent population means: Group 1 vs. Group 2

    Means, related populations: Same group before vs. after treatment

    Independent population proportions: Proportion 1 vs. Proportion 2

    Independent population variances: Variance 1 vs. Variance 2

  • Two-Sample Tests

    Independent population means: $\sigma_1$ and $\sigma_2$ known, or $\sigma_1$ and $\sigma_2$ unknown

    Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$.

    The point estimate for this difference is the difference between the sample means, $\bar X_1 - \bar X_2$.

  • Sampling Distribution of the Difference Between

    Two Sample Means

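    Population 1 (mean $\mu_1$, standard deviation $\sigma_1$): draw a sample of size $n_1$ and compute $\bar X_1$, where $\mu_{\bar X_1} = \mu_1$ and $\sigma_{\bar X_1} = \sigma_1/\sqrt{n_1}$.

    Population 2 (mean $\mu_2$, standard deviation $\sigma_2$): draw a sample of size $n_2$ and compute $\bar X_2$, where $\mu_{\bar X_2} = \mu_2$ and $\sigma_{\bar X_2} = \sigma_2/\sqrt{n_2}$.

    Repeating this for every pair of samples gives the sampling distribution of $\bar X_1 - \bar X_2$.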

  • Sampling Distribution of the Difference

    between Two Sample Means

    $$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$$

    $$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

  • Z Formula for the Difference in Two Sample Means

    $$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

    Conditions: $n_1 \ge 30$, $n_2 \ge 30$, and independent samples.

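  • Confidence Interval to Estimate $\mu_1 - \mu_2$ When $n_1$ and $n_2$ Are Large and $\sigma_1$, $\sigma_2$ Are Unknown

    $$(\bar X_1 - \bar X_2) - Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar X_1 - \bar X_2) + Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

    $$\text{Prob}\left[(\bar X_1 - \bar X_2) - Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar X_1 - \bar X_2) + Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\right] = 1 - \alpha$$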

  • Two-Sample Tests

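    Independent Populations

    Lower-tail test:

    $H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$

    Upper-tail test:

    $H_0: \mu_1 \le \mu_2$, $H_1: \mu_1 > \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$

    Two-tail test:

    $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$, i.e., $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$

    Two Independent Populations, Comparing Means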

  • Two-Sample Tests

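    Independent Populations

    Two Independent Populations, Comparing Means

    Lower-tail test: $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$. Reject $H_0$ if $Z < -Z_\alpha$.

    Upper-tail test: $H_0: \mu_1 - \mu_2 \le 0$, $H_1: \mu_1 - \mu_2 > 0$. Reject $H_0$ if $Z > Z_\alpha$.

    Two-tail test: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$. Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ (area $\alpha/2$ in each tail).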

  • Problem 1: Two Sample Z test

    A random sample of 32 advertising managers from across the United States is taken. The advertising managers are contacted by telephone and asked what their annual salary is.

    A similar random sample is taken of 34 auditing managers. The resulting salary data are listed in the table below, along with the sample means, the population standard deviations, and the population variances.

  • Hypothesis Testing for Differences Between

    Means: The Wage Example

    Advertising Managers (annual salary, thousands of dollars)

    74.256 57.791 71.115
    96.234 65.145 67.574
    89.807 96.767 59.621
    93.261 77.242 62.483
    103.030 67.056 69.319
    74.195 64.276 35.394
    75.932 74.194 86.741
    80.742 65.360 57.351
    39.672 73.904
    45.652 54.270
    93.083 59.045
    63.384 68.508

    $n_1 = 32$, $\bar X_1 = 70.700$, $\sigma_1 = 16.253$, $\sigma_1^2 = 264.164$

    Auditing Managers (annual salary, thousands of dollars)

    69.962 77.136 43.649
    55.052 66.035 63.369
    57.828 54.335 59.676
    63.362 42.494 54.449
    37.194 83.849 46.394
    99.198 67.160 71.804
    61.254 37.386 72.401
    73.065 59.505 56.470
    48.036 72.790 67.814
    60.053 71.351 71.492
    66.359 58.653
    61.261 63.508

    $n_2 = 34$, $\bar X_2 = 62.187$, $\sigma_2 = 12.900$, $\sigma_2^2 = 166.411$

  • Hypothesis Testing for Differences Between Means:

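    The Wage Example

    $H_0: \mu_1 - \mu_2 = 0$

    $H_a: \mu_1 - \mu_2 \ne 0$

    $\alpha = .05$, so there is a rejection region with area $\alpha/2 = .025$ in each tail of the distribution of $\bar X_1 - \bar X_2$, separated by the critical values from a central non-rejection region.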

  • Hypothesis Testing for Differences Between Means:

    The Wage Example

    Critical values: $Z_c = \pm 1.96$ ($\alpha/2 = .025$ in each rejection region).

    If $Z < -1.96$ or $Z > 1.96$, reject $H_0$.

    If $-1.96 \le Z \le 1.96$, do not reject $H_0$.

  • Hypothesis Testing for Differences between

    Means: The Wage Example

    $$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(70.700 - 62.187) - 0}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35$$

    If $Z < -1.96$ or $Z > 1.96$, reject $H_0$. If $-1.96 \le Z \le 1.96$, do not reject $H_0$.

    Since $Z = 2.35 > 1.96$, reject $H_0$.
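    The wage-example calculation can be reproduced with a minimal Python sketch that uses only the standard library. It assumes, as the problem states, that the population variances 264.164 and 166.411 are known. It recomputes the two sample means from the salary lists and uses the two-tailed critical value 1.96. The helper name `two_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import mean

# Annual salaries (thousands of dollars) from the wage example.
advertising = [
    74.256, 57.791, 71.115, 96.234, 65.145, 67.574, 89.807, 96.767, 59.621,
    93.261, 77.242, 62.483, 103.030, 67.056, 69.319, 74.195, 64.276, 35.394,
    75.932, 74.194, 86.741, 80.742, 65.360, 57.351, 39.672, 73.904, 45.652,
    54.270, 93.083, 59.045, 63.384, 68.508,
]
auditing = [
    69.962, 77.136, 43.649, 55.052, 66.035, 63.369, 57.828, 54.335, 59.676,
    63.362, 42.494, 54.449, 37.194, 83.849, 46.394, 99.198, 67.160, 71.804,
    61.254, 37.386, 72.401, 73.065, 59.505, 56.470, 48.036, 72.790, 67.814,
    60.053, 71.351, 71.492, 66.359, 58.653, 61.261, 63.508,
]

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, hyp_diff=0.0):
    """Z statistic for mu1 - mu2 when both population variances are known."""
    standard_error = sqrt(var1 / n1 + var2 / n2)
    return ((xbar1 - xbar2) - hyp_diff) / standard_error

xbar1, xbar2 = mean(advertising), mean(auditing)
z = two_sample_z(xbar1, xbar2, 264.164, 166.411, len(advertising), len(auditing))
z_crit = 1.96  # two-tailed critical value for alpha = 0.05
print(f"means = {xbar1:.3f}, {xbar2:.3f}; Z = {z:.2f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```

    Computing the means from the raw lists gives 70.700 and 62.187, matching the slide, and Z rounds to 2.35.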

  • Problem 2: Two Sample Z test

    Greystone Department Stores, Inc., operates two stores in Buffalo, New York: one is in the inner city and the other is in a suburban shopping center.

    The regional manager noticed that products that sell well in one store do not always sell well in the other.

    The manager believes this situation may be attributable to differences in customer demographics at the two locations. Customers may differ in age, education, income, and so on.

    Suppose the manager asks us to investigate the difference between the mean ages of the customers who shop at the two stores.

  • Data

    $\sigma_1 = 10$ and $\sigma_2 = 10$

    $\alpha = .05$

    $n_1 = 30$, $n_2 = 40$

    $\bar X_1 = 82$ and $\bar X_2 = 78$

  • Solution

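    With $\sigma_1$ and $\sigma_2$ known, the margin of error is $Z_{.025}\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2} = 1.96\sqrt{10^2/30 + 10^2/40} = 1.96(2.415) \approx 4.73$ years.

    The 95% confidence interval estimate of the difference between the two population means is $(82 - 78) \pm 4.73$, that is, from $-0.73$ years to $8.73$ years.

    The interval contains 0 (equivalently, $Z = 4/2.415 \approx 1.66 < 1.96$), so do not reject $H_0$.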
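    Here is a short sketch of the $\sigma$-known confidence interval, computed directly from the values on the Data slide ($\sigma_1 = \sigma_2 = 10$, $n_1 = 30$, $n_2 = 40$, means 82 and 78). The function `z_interval_diff_means` is a hypothetical helper. Checking whether 0 lies inside the 95% interval gives the same decision as a two-tailed Z test at $\alpha = .05$.

```python
from math import sqrt

def z_interval_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, z_crit=1.96):
    """Confidence interval for mu1 - mu2 when both population sigmas are known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    margin = z_crit * standard_error
    diff = xbar1 - xbar2
    return diff - margin, diff + margin, margin

# Greystone data: sigma1 = sigma2 = 10, n1 = 30, n2 = 40, means 82 and 78.
low, high, margin = z_interval_diff_means(82, 78, 10, 10, 30, 40)
print(f"margin = {margin:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
# An interval containing 0 means H0: mu1 - mu2 = 0 is not rejected.
print("do not reject H0" if low <= 0 <= high else "reject H0")
```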

  • Two-Sample Tests

    Independent Populations: $\sigma_1$ and $\sigma_2$ unknown

    Independent population means, with $\sigma_1$ and $\sigma_2$ unknown

    Assumptions:

    Samples are randomly and independently drawn

    Populations are normally distributed

    Population variances are unknown but assumed equal

  • Two-Sample Tests

    Independent Populations

    Independent population means, with $\sigma_1$ and $\sigma_2$ unknown

    Forming interval estimates: the population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$.

    The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom.

  • The t Test for Differences in Population Means

    Each of the two populations is normally distributed.

    The two samples are independent.

    At least one of the samples is small, n < 30.

    The values of the population variances are unknown.

    The variances of the two populations are equal: $\sigma_1^2 = \sigma_2^2$.

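  • t Formula to Test the Difference in Means Assuming $\sigma_1^2 = \sigma_2^2$

    $$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$$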

  • Problem 1: Independent Populations, $\sigma_1$ and $\sigma_2$ unknown and equal

    At the Hernandez Manufacturing Company, an application of this test arises.

    New employees are expected to attend a three-day seminar to learn about the company. At the end of the seminar, they are tested to measure their knowledge about the company.

    The traditional training method has been lecture and a question-and-answer session. Management decided to experiment with a different training procedure, which processes new employees in two days by using DVDs and having no question-and-answer session.

    If this procedure works, it could save the company thousands of dollars over a period of several years. However, there is some concern about the effectiveness of the two-day method, and company managers would like to know whether there is any difference in the effectiveness of the two training methods.

  • Hernandez Manufacturing Company: Test Scores for New Employees After Training

    Training Method A

    56 51 45
    47 52 43
    42 53 52
    50 42 48
    47 44 44

    Training Method B

    59 52 53 54 57 56 55 64 53 65 53 57

    Method A: $n_1 = 15$, $\bar X_1 = 47.73$, $S_1^2 = 19.495$

    Method B: $n_2 = 12$, $\bar X_2 = 56.50$, $S_2^2 = 18.273$

  • Hernandez Manufacturing Company

    $H_0: \mu_1 - \mu_2 = 0$

    $H_a: \mu_1 - \mu_2 \ne 0$

    $\alpha = .05$, $\alpha/2 = .025$

    $df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25$

    Critical values: $t_{.025,25} = \pm 2.060$ (area .025 in each rejection region)

    If $t < -2.060$ or $t > 2.060$, reject $H_0$.

    If $-2.060 \le t \le 2.060$, do not reject $H_0$.

  • Hernandez Manufacturing Company

    $$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\dfrac{19.495(14) + 18.273(11)}{15 + 12 - 2}}\sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = -5.20$$

    If $t < -2.060$ or $t > 2.060$, reject $H_0$. If $-2.060 \le t \le 2.060$, do not reject $H_0$.

    Since $t = -5.20 < -2.060$, reject $H_0$.
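    The Hernandez computation can be checked from the raw test scores with a minimal sketch that pools the two sample variances. Like the slide, it assumes equal population variances. The critical value 2.060 is copied from the t table because Python's standard library has no t-distribution quantile function. `pooled_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance

# Test scores after training (Hernandez Manufacturing example).
method_a = [56, 51, 45, 47, 52, 43, 42, 53, 52, 50, 42, 48, 47, 44, 44]
method_b = [59, 52, 53, 54, 57, 56, 55, 64, 53, 65, 53, 57]

def pooled_t(sample1, sample2, hyp_diff=0.0):
    """Pooled-variance t statistic and its df, assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances (n - 1 divisor)
    df = n1 + n2 - 2
    pooled_var = (s1_sq * (n1 - 1) + s2_sq * (n2 - 1)) / df
    standard_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = ((mean(sample1) - mean(sample2)) - hyp_diff) / standard_error
    return t, df

t, df = pooled_t(method_a, method_b)
t_crit = 2.060  # t_{.025, 25} from the t table
print(f"t = {t:.2f} with {df} df")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

    The unrounded statistic is about -5.199, which rounds to the slide's -5.20.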

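  • Confidence Interval to Estimate $\mu_1 - \mu_2$ with Small Samples and $\sigma_1^2 = \sigma_2^2$

    $$(\bar X_1 - \bar X_2) \pm t_{\alpha/2}\sqrt{\frac{S_1^2(n_1 - 1) + S_2^2(n_2 - 1)}{n_1 + n_2 - 2}}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad \text{where } df = n_1 + n_2 - 2$$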

  • Problem 2: Independent Populations, $\sigma_1$ and $\sigma_2$ unknown and equal

    You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ (National Association of Securities Dealers Automated Quotations)? You collect the following data:

                       NYSE    NASDAQ
    Number             21      25
    Sample mean        3.27    2.53
    Sample std dev     1.30    1.16

    Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?

  • Solution

    $H_0: \mu_1 - \mu_2 = 0$, i.e. $(\mu_1 = \mu_2)$

    $H_1: \mu_1 - \mu_2 \ne 0$, i.e. $(\mu_1 \ne \mu_2)$

  • Two-Sample Tests

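    Independent Populations

    The pooled variance is:

    $$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)(1.30)^2 + (25 - 1)(1.16)^2}{(21 - 1) + (25 - 1)} = 1.5021$$

    The test statistic is:

    $$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021\left(\dfrac{1}{21} + \dfrac{1}{25}\right)}} = 2.040$$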

  • Two-Sample Tests

    Independent Populations

    $H_0: \mu_1 - \mu_2 = 0$, i.e. $(\mu_1 = \mu_2)$

    $H_1: \mu_1 - \mu_2 \ne 0$, i.e. $(\mu_1 \ne \mu_2)$

    $\alpha = 0.05$

    $df = 21 + 25 - 2 = 44$

    Critical values: $t = \pm 2.0154$ (area .025 in each rejection region)

    Test statistic: $t = 2.040$

    Decision: Reject $H_0$ at $\alpha = 0.05$, since $2.040 > 2.0154$.

    Conclusion: There is evidence of a difference in the means.
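    When only summary statistics are available, as in the NYSE/NASDAQ problem, the same pooled-variance calculation can be sketched as follows. The sketch also evaluates the small-sample confidence interval for $\mu_1 - \mu_2$, using $t_{.025,44} = 2.0154$ taken from the slide. `pooled_t_summary` is a hypothetical name.

```python
from math import sqrt

def pooled_t_summary(xbar1, s1, n1, xbar2, s2, n2, t_crit, hyp_diff=0.0):
    """Pooled-variance t statistic and confidence interval from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    t = (diff - hyp_diff) / standard_error
    ci = (diff - t_crit * standard_error, diff + t_crit * standard_error)
    return pooled_var, t, df, ci

# NYSE vs. NASDAQ dividend yields; t_crit = t_{.025, 44} = 2.0154 from the slide.
sp2, t, df, ci = pooled_t_summary(3.27, 1.30, 21, 2.53, 1.16, 25, t_crit=2.0154)
print(f"Sp^2 = {sp2:.4f}, t = {t:.3f}, df = {df}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

    The interval runs from roughly 0.009 to 1.471. Because it excludes 0, it agrees with the decision to reject $H_0$ at $\alpha = 0.05$.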

  • Two-Sample Tests: Dependent Samples

    Before and after measurements on the same individual

    Studies of twins

    Studies of spouses

    Individual   Before   After
    1            32       39
    2            11       15
    3            21       35
    4            17       13
    5            30       41
    6            38       39
    7            14       22

  • Two-Sample Tests

    Related Populations

    Tests means of 2 related populations:

    Paired or matched samples

    Repeated measures (before/after)

    Use the difference between paired values: $D = X_1 - X_2$

    Assumptions: Both populations are normally distributed

  • Two-Sample Tests

    Related Populations

    The ith paired difference is $D_i$, where $D_i = X_{1i} - X_{2i}$.

    The point estimate for the population mean paired difference is $\bar D$:

    $$\bar D = \frac{\sum_{i=1}^{n} D_i}{n}$$

  • Two-Sample Tests

    Related Populations

    Suppose the population standard deviation of the difference scores, $\sigma_D$, is known.

    The test statistic for the mean difference is a Z value:

    $$Z = \frac{\bar D - \mu_D}{\sigma_D/\sqrt{n}}$$

    where

    $\mu_D$ = hypothesized mean difference

    $\sigma_D$ = population standard deviation of differences

    $n$ = the sample size (number of pairs)

  • Two-Sample Tests

    Related Populations

    If $\sigma_D$ is unknown, you can estimate the unknown population standard deviation with a sample standard deviation:

    $$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar D)^2}{n - 1}}$$

  • Two-Sample Tests

    Related Populations

    The test statistic for $\mu_D$ is now a t statistic:

    $$t = \frac{\bar D - \mu_D}{S_D/\sqrt{n}}$$

    where t has $n - 1$ d.f. and $S_D$ is:

    $$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar D)^2}{n - 1}}$$

  • Two-Sample Tests

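    Related Populations

    Lower-tail test: $H_0: \mu_D \ge 0$, $H_1: \mu_D < 0$. Reject $H_0$ if $t < -t_\alpha$.

    Upper-tail test: $H_0: \mu_D \le 0$, $H_1: \mu_D > 0$. Reject $H_0$ if $t > t_\alpha$.

    Two-tail test: $H_0: \mu_D = 0$, $H_1: \mu_D \ne 0$. Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$ (area $\alpha/2$ in each tail).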

  • Problem 1: Two-Sample Tests

    Related Populations

    Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:

    Salesperson   Complaints Before (1)   Complaints After (2)   Difference, Di (2 - 1)
    C.B.          6                       4                      -2
    T.F.          20                      6                      -14
    M.H.          3                       2                      -1
    R.K.          0                       0                      0
    M.O.          4                       0                      -4

  • Two-Sample Tests

    Related Populations Example

    $$\bar D = \frac{\sum_{i=1}^{n} D_i}{n} = -4.2$$

    $$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar D)^2}{n - 1}} = 5.67$$


  • Two-Sample Tests

    Related Populations Example

    Has the training made a difference in the number of complaints (at the $\alpha = 0.01$ level)?

    $H_0: \mu_D = 0$

    $H_1: \mu_D \ne 0$

    Critical values = $\pm 4.604$, with d.f. = $n - 1 = 4$

    Test statistic:

    $$t = \frac{\bar D - \mu_D}{S_D/\sqrt{n}} = \frac{-4.2 - 0}{5.67/\sqrt{5}} = -1.66$$

  • Two-Sample Tests

    Related Populations Example

    Rejection regions: $t < -4.604$ or $t > 4.604$ ($\alpha/2 = .005$ in each tail). The test statistic is $t = -1.66$.

    Decision: Do not reject $H_0$ (the t statistic is not in the rejection region).

    Conclusion: There is no evidence of a significant change in the number of complaints.
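    Here is a minimal sketch of the paired t test on the complaints data. It takes differences as After minus Before, the slide's "2 - 1". It also evaluates the $\sigma_D$-unknown interval $\bar D \pm t_{\alpha/2,\,n-1} S_D/\sqrt{n}$ at the 99% level, to match $\alpha = 0.01$. The critical value 4.604 comes from the slide, and `paired_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, stdev

# Complaints per salesperson before (1) and after (2) the workshop.
before = [6, 20, 3, 0, 4]
after = [4, 6, 2, 0, 0]

def paired_t(first, second, hyp_mean_diff=0.0):
    """Paired t statistic on D_i = second_i - first_i, with n - 1 df."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation, n - 1 divisor
    t = (d_bar - hyp_mean_diff) / (s_d / sqrt(n))
    return d_bar, s_d, t, n - 1

d_bar, s_d, t, df = paired_t(before, after)
t_crit = 4.604  # t_{.005, 4} from the t table (two-tailed, alpha = 0.01)
n_pairs = len(before)
half_width = t_crit * s_d / sqrt(n_pairs)  # D-bar +/- t * S_D / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t:.2f}, df = {df}")
print(f"99% CI for mu_D: ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

    With the unrounded $S_D \approx 5.675$, t is about -1.655. That still rounds to -1.66 and lies well inside $\pm 4.604$.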

  • Two-Sample Tests

    Related Populations

    The confidence interval for $\mu_D$ ($\sigma_D$ known) is:

    $$\bar D \pm Z_{\alpha/2}\frac{\sigma_D}{\sqrt{n}}$$

    where $n$ = the sample size (number of pairs in the paired sample)

  • Two-Sample Tests

    Related Populations

    The confidence interval for $\mu_D$ ($\sigma_D$ unknown) is:

    $$\bar D \pm t_{\alpha/2,\,n-1}\frac{S_D}{\sqrt{n}}$$

    where

    $$S_D = \sqrt{\frac{\sum_{i=1}^{n}(D_i - \bar D)^2}{n - 1}}$$

  • Sampling Distribution of Differences in Sample Proportions

    For large samples, that is, when

    1. $n_1 \hat p_1 > 5$,

    2. $n_1 \hat q_1 > 5$,

    3. $n_2 \hat p_2 > 5$, and

    4. $n_2 \hat q_2 > 5$, where $q = 1 - p$,

    the difference in sample proportions is normally distributed with

    $$\mu_{\hat p_1 - \hat p_2} = P_1 - P_2 \quad \text{and} \quad \sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$

  • Z Formula for the Difference in Two Population Proportions

    $$Z = \frac{(\hat p_1 - \hat p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$

    where

    $\hat p_1$ = proportion from sample 1

    $\hat p_2$ = proportion from sample 2

    $n_1$ = size of sample 1

    $n_2$ = size of sample 2

    $P_1$ = proportion from population 1

    $P_2$ = proportion from population 2

    $Q_1 = 1 - P_1$

    $Q_2 = 1 - P_2$

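  • Z Formula to Test the Difference in Population Proportions

    $$Z = \frac{(\hat p_1 - \hat p_2) - (P_1 - P_2)}{\sqrt{\bar P \bar Q\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    where

    $$\bar P = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2} \quad \text{and} \quad \bar Q = 1 - \bar P$$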

  • Two Population Proportions

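    Hypotheses for Population Proportions

    Lower-tail test:

    $H_0: \pi_1 \ge \pi_2$, $H_1: \pi_1 < \pi_2$, i.e., $H_0: \pi_1 - \pi_2 \ge 0$, $H_1: \pi_1 - \pi_2 < 0$

    Upper-tail test:

    $H_0: \pi_1 \le \pi_2$, $H_1: \pi_1 > \pi_2$, i.e., $H_0: \pi_1 - \pi_2 \le 0$, $H_1: \pi_1 - \pi_2 > 0$

    Two-tail test:

    $H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \ne \pi_2$, i.e., $H_0: \pi_1 - \pi_2 = 0$, $H_1: \pi_1 - \pi_2 \ne 0$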

  • Two Population Proportions

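    Hypotheses for Population Proportions

    Lower-tail test: $H_0: \pi_1 - \pi_2 \ge 0$, $H_1: \pi_1 - \pi_2 < 0$. Reject $H_0$ if $Z < -Z_\alpha$.

    Upper-tail test: $H_0: \pi_1 - \pi_2 \le 0$, $H_1: \pi_1 - \pi_2 > 0$. Reject $H_0$ if $Z > Z_\alpha$.

    Two-tail test: $H_0: \pi_1 - \pi_2 = 0$, $H_1: \pi_1 - \pi_2 \ne 0$. Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$ (area $\alpha/2$ in each tail).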

  • Two Independent Population

    Proportions: Example

    Is there a significant difference between the proportion of men and the proportion of women who will vote Yes on Proposition A?

    In a random sample of 72 men, 36 indicated they would vote Yes, and in a sample of 50 women, 31 indicated they would vote Yes.

    Test at the .05 level of significance.

  • Two Independent Population

    Proportions: Example

    $H_0: \pi_1 - \pi_2 = 0$ (the two proportions are equal)

    $H_1: \pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)

    The sample proportions are:

    Men: $p_1 = 36/72 = .50$

    Women: $p_2 = 31/50 = .62$

    The pooled estimate for the overall proportion is:

    $$\bar p = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549$$

  • Two Independent Population

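    Proportions: Example

    The test statistic for $\pi_1 - \pi_2$ is:

    $$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar p(1 - \bar p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.50 - .62) - 0}{\sqrt{.549(1 - .549)\left(\dfrac{1}{72} + \dfrac{1}{50}\right)}} = -1.31$$

    Critical values: $\pm 1.96$ for $\alpha = .05$ (area .025 in each rejection region)

    Decision: Do not reject $H_0$, since $-1.96 \le -1.31 \le 1.96$.

    Conclusion: There is no evidence of a significant difference between men and women in the proportion who will vote Yes.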

  • Two Independent Population

    Proportions

    The confidence interval for $\pi_1 - \pi_2$ is:

    $$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
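    The Proposition A test and the confidence-interval formula for $\pi_1 - \pi_2$ can be sketched together. The test uses the pooled proportion $\bar p$, while the interval uses the unpooled standard error. Both helper names are hypothetical, and 1.96 is assumed as the two-tailed critical value for $\alpha = .05$.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return p1, p2, p_bar, (p1 - p2) / standard_error

def two_proportion_interval(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for pi1 - pi2, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * standard_error, diff + z_crit * standard_error

# Proposition A example: 36 of 72 men and 31 of 50 women would vote Yes.
p1, p2, p_bar, z = two_proportion_z(36, 72, 31, 50)
low, high = two_proportion_interval(36, 72, 31, 50)
print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, pooled p = {p_bar:.3f}, z = {z:.2f}")
print(f"95% CI for pi1 - pi2: ({low:.3f}, {high:.3f})")
```

    The 95% interval, roughly -0.297 to 0.057, contains 0, consistent with not rejecting $H_0$.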

  • F Test for Two Population Variances

    $$F = \frac{S_1^2}{S_2^2}, \qquad df_{\text{numerator}} = \nu_1 = n_1 - 1, \qquad df_{\text{denominator}} = \nu_2 = n_2 - 1$$

  • F Distribution with $\nu_1 = 10$ and $\nu_2 = 8$

    (Figure: density curve of this F distribution, plotted for F values from 0 to 6.)

  • A Portion of the F Distribution Table

    for $\alpha = 0.025$

    Rows give the denominator degrees of freedom and columns the numerator degrees of freedom. For example, $F_{.025,9,11} = 3.59$ (row 11, column 9).

    df 1 2 3 4 5 6 7 8 9

    1 647.79 799.48 864.15 899.60 921.83 937.11 948.20 956.64 963.28

    2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39

    3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47

    4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90

    5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68

    6 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52

    7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82

    8 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36

    9 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03

    10 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78

    11 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59

    12 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44

  • Testing Population Variances

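    Purpose: To determine if two independent populations have the same variability.

    Two-tail test: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$

    Lower-tail test: $H_0: \sigma_1^2 \ge \sigma_2^2$, $H_1: \sigma_1^2 < \sigma_2^2$

    Upper-tail test: $H_0: \sigma_1^2 \le \sigma_2^2$, $H_1: \sigma_1^2 > \sigma_2^2$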

  • Suppose a machine produces metal sheets that are specified to be 22 millimeters thick.

    Because of the machine, the operator, the raw material, the manufacturing environment, and other factors, there is variability in the thickness.

    Two machines produce these sheets. Operators are concerned about the consistency of the two machines. To test consistency, they randomly sample 10 sheets produced by machine 1 and 12 sheets produced by machine 2.

    The thickness measurements of sheets from each machine are given in the table on the following page. Assume sheet thickness is normally distributed in the population.

    How can we test to determine whether the variance from each sample comes from the same population variance (population variances are equal) or from different population variances (population variances are not equal)?

  • Sheet Metal Example: Hypothesis Test for

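    Equality of Two Population Variances

    $H_0: \sigma_1^2 = \sigma_2^2$

    $H_a: \sigma_1^2 \ne \sigma_2^2$

    $\alpha = .05$, $n_1 = 10$, $n_2 = 12$

    $$F = \frac{S_1^2}{S_2^2}, \qquad df_{\text{numerator}} = n_1 - 1 = 9, \qquad df_{\text{denominator}} = n_2 - 1 = 11$$

    $F_{.025,9,11} = 3.59$

    $F_{.975,11,9} = \dfrac{1}{F_{.025,9,11}} = \dfrac{1}{3.59} = 0.28$

    If $F < 0.28$ or $F > 3.59$, reject $H_0$.

    If $0.28 \le F \le 3.59$, do not reject $H_0$.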

  • Sheet metal Manufacturer

    Rejection regions: $F < 0.28$ and $F > 3.59$. Non-rejection region: $0.28 \le F \le 3.59$.

    Critical values: $F_{.025,9,11} = 3.59$ and $F_{.975,11,9} = 0.28$

  • Sheet Metal Example

    Machine 1 (thickness, mm)

    22.3 21.8 22.2
    21.8 21.9 21.6
    22.3 22.4
    21.6 22.5

    Machine 2 (thickness, mm)

    22.0 22.1 21.8 21.9 22.2 22.0 21.7 21.9 22.0 22.1 21.9 22.1

    $n_1 = 10$, $S_1^2 = 0.1138$

    $n_2 = 12$, $S_2^2 = 0.0202$

    $$F = \frac{S_1^2}{S_2^2} = \frac{0.1138}{0.0202} = 5.63$$

    Since $F = 5.63 > F_c = 3.59$, reject $H_0$.
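    The sheet-metal F statistic can be recomputed from the raw thickness data with a short sketch. It takes the upper critical value $F_{.025,9,11} = 3.59$ from the table rather than computing it, because the Python standard library has no F quantile function. Machine 1 has the larger sample variance, so the sketch compares F only against that upper critical value.

```python
from statistics import variance

# Sheet thickness (mm) from the two machines.
machine1 = [22.3, 21.8, 22.2, 21.8, 21.9, 21.6, 22.3, 22.4, 21.6, 22.5]
machine2 = [22.0, 22.1, 21.8, 21.9, 22.2, 22.0, 21.7, 21.9, 22.0, 22.1, 21.9, 22.1]

s1_sq, s2_sq = variance(machine1), variance(machine2)  # sample variances (n - 1 divisor)
f_stat = s1_sq / s2_sq
df_num, df_den = len(machine1) - 1, len(machine2) - 1
f_upper = 3.59  # F_{.025, 9, 11} read from the F table
print(f"S1^2 = {s1_sq:.4f}, S2^2 = {s2_sq:.4f}, F = {f_stat:.2f} with ({df_num}, {df_den}) df")
print("reject H0" if f_stat > f_upper else "F is not in the upper rejection region")
```

    Without rounding the variances first, F is about 5.62. The slide's 5.63 comes from dividing the rounded values 0.1138 and 0.0202. Either way, F exceeds 3.59.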