lect w7 t_test_amp_chi_test

FFT 2074 WEEK 7: T-TEST & CHI-SQUARE

Prepared by: Mdm. Yusrina Amin

T-test

This test is used for comparing the means of two samples (or treatments), even if they have different numbers of replicates. The t test is used when the population standard deviation is unknown and must be estimated by the sample standard deviation. Since the population standard deviation is generally unknown, this is the more common test statistic.

Assumptions for the t test

a. The population standard deviation is unknown and is estimated by the sample standard deviation.

b. Numerical data are independently and randomly drawn from a normal distribution.

c. If the population is not normal, but not very skewed and the sample size is large (> 30), the t distribution provides a good approximation to the sampling distribution of the sample mean.

T-test formula
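
Assuming the one-sample form used in the worked height example later in these notes, the t statistic can be written as follows, where X̄ is the sample mean, μ is the hypothesized population mean, s is the sample standard deviation, n is the sample size and v is the number of degrees of freedom:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}, \qquad v = n - 1$$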

Example

A healthcare actuary has been investigating the cost of maintaining cancer patients within its plan. These people have typically been running up costs at the rate of $1240 per month. A sample of 15 cases for October had an average cost of $1080 with a standard deviation of $180. Is there any evidence of a significant change?

Solution
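
The following is a minimal sketch of one way to work this example, not necessarily the lecturer's own solution. It assumes a two-tailed test at α = 0.05 (the example does not state a significance level) and uses the t-table value 2.145 for 14 degrees of freedom; the variable names are illustrative.

```python
import math

# Values given in the example
mu0 = 1240.0    # typical monthly cost ($)
x_bar = 1080.0  # sample mean cost for October ($)
s = 180.0       # sample standard deviation ($)
n = 15          # number of cases sampled

# One-sample t statistic: t = (x_bar - mu0) / (s / sqrt(n))
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu0) / standard_error
df = n - 1

# Assumption: two-tailed test at alpha = 0.05; 2.145 is the table value for 14 d.f.
t_critical = 2.145
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

Under these assumptions |t| is about 3.44, which exceeds 2.145, so the sample suggests a significant change in the monthly cost.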

Chi-Square

It is mainly used with two types of data: numerical and categorical. It is used to investigate whether distributions of categorical variables differ from one another (that is, whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories).

The chi-square test is used in two similar but distinct circumstances:

✓ for estimating how closely an observed distribution matches an expected distribution - we'll refer to this as the goodness-of-fit test

✓ for estimating whether two random variables are independent.

Goodness of fit

The hypothesis test takes only one form

H0 : The observed frequency distribution is the same as the hypothesized frequency distribution

H1 : The observed and hypothesized frequency distributions are different

Test statistic

Let Oi denote the observed frequency of the i-th category and Ei the corresponding expected frequency, for k categories.

The test statistic is based on the difference between the observed and expected frequencies, Oi - Ei.

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

If the observed and expected frequencies are almost equal in each category, the values of both Oi – Ei and χ² will be small.

If the χ² value is small, fail to reject H0. Reject H0 if χ² is large. The test is always right tailed.

Expected Frequencies

To find the value for chi-square, determine whether the observed frequencies differ significantly from the expected frequencies. Find the expected frequencies for chi-square in two ways:

1. Set the hypothesis that ALL frequencies are equal in each category. For example, in a sample of 200 prawns you might expect half to be identified as male and half as female. The expected frequency is the number of samples divided by the number of categories (200/2), giving 100 as the expected frequency in each category.

2. Determine the expected frequencies on the basis of some prior knowledge. For example, using the same sample, suppose previous samples showed that 60% of the prawns were male and 40% were female. For the new sample you would then expect 60% of the total to be male and 40% female. If the total prawn sample is 200, you would expect 120 males (60% x 200) and 80 females (40% x 200).
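
The two ways of obtaining expected frequencies can be sketched in a few lines of Python. The function names are hypothetical, chosen only for this illustration; the numbers are the prawn figures from the text.

```python
def expected_equal(total, n_categories):
    """Expected frequency per category when all categories are assumed equally likely."""
    return [total / n_categories] * n_categories


def expected_from_proportions(total, proportions):
    """Expected frequencies from prior proportions, which must sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [total * p for p in proportions]


# Way 1: 200 prawns, two categories (male, female) assumed equally frequent
equal_split = expected_equal(200, 2)

# Way 2: prior knowledge of 60% male and 40% female
prior_split = expected_from_proportions(200, [0.6, 0.4])

print(equal_split)  # 100 expected in each category
print(prior_split)  # 120 males and 80 females expected
```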

Four steps to a chi-square hypothesis test

STEP 1: Establish hypotheses.

STEP 2: Calculate the chi-square statistic. Doing so requires knowing
▪ The number of observations
▪ Expected values
▪ Observed values

STEP 3: Assess the significance level. Doing so requires knowing the number of degrees of freedom.

STEP 4: Finally, decide whether to reject or fail to reject the null hypothesis.

Decision and conclusion

If the calculated chi-square value for the set of data you are analyzing (for example, 26.95) is equal to or greater than the table value (for example, 9.49), reject the null hypothesis. There IS a significant difference between the data sets that cannot be due to chance alone.

If the number you calculate is LESS than the number you find in the table, then you can probably say that any differences are due to chance alone.
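
The decision rule can be written as a small helper. This is only a sketch; `chi_square_decision` is a hypothetical name, and the table value still has to be looked up for the chosen α and degrees of freedom.

```python
def chi_square_decision(chi_sq_calculated, chi_sq_table):
    """Right-tailed rule: reject H0 when the calculated value reaches the table value."""
    if chi_sq_calculated >= chi_sq_table:
        return "reject H0"
    return "fail to reject H0"


# The illustrative figures quoted in the text
decision_large = chi_square_decision(26.95, 9.49)

# The goodness-of-fit example later in these notes (2.44 against 5.99)
decision_small = chi_square_decision(2.44, 5.99)

print(decision_large)
print(decision_small)
```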

Example (t-test & chi-square)

A geneticist interested in human population has been studying growth patterns in US males since 1900. A monograph written in 1902 states that the mean height of adult US males is 67.0 inch with a standard deviation of 3.5 inch.

In the 20th century the geneticist measured a random sample of 28 adult US males and found that the sample mean X̄ = 69.4 inch and s = 4.0 inch. Are these values significantly different from the values published in 1902?

There are two questions here:

i) about the mean (t-test)

ii) about the standard deviation or variance (chi-square)

Therefore, two sets of hypotheses and two test statistics are required.

i) For means, the hypotheses are

H0 : μ = 67.0 inch

H1 : μ ≠ 67.0 inch

Solution

Given n = 28 and α = 0.01,

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{69.4 - 67.0}{4.0 / \sqrt{28}} = \frac{2.4}{0.76} = 3.16$$

From the table, where v = n – 1 = 27, the critical values are ± 2.771. Therefore, reject H0 since 3.16 > 2.771.

The modern mean is significantly different from that reported in 1902 and is higher than the reported value (because the t-value falls in the right-hand tail). P(Type I error) < 0.01.

ii) For variance, the hypotheses are

H0 : σ² = 12.25 inch²

H1 : σ² ≠ 12.25 inch²

Let n = 28. The question about variability is answered with a chi-square statistic:

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(28 - 1)(16)}{12.25} = 35.3$$

The χ² value is expected to be close to 27 (n - 1).

From the table, at α = 0.01 for v = 27, the critical values for χ² are 11.8 and 49.6.

Since 11.8 < 35.3 < 49.6, fail to reject H0.

In conclusion, the mean height of adult US males is higher now than reported in 1902, but the variability in heights is not significantly different today than in 1902.
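
Both test statistics in the height example can be checked with a short script. This is a sketch using only the figures given in the example and the critical values quoted from the tables; the variable names are illustrative.

```python
import math

# Values from the height example
n = 28
x_bar, s = 69.4, 4.0      # modern sample mean and standard deviation (inch)
mu0, sigma0 = 67.0, 3.5   # 1902 mean and standard deviation (inch)

# i) t statistic for the mean: (x_bar - mu0) / (s / sqrt(n))
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# ii) chi-square statistic for the variance: (n - 1) * s^2 / sigma0^2
chi_sq = (n - 1) * s**2 / sigma0**2

# Critical values quoted in the text for v = 27 and alpha = 0.01 (two-tailed)
t_crit = 2.771
chi_lo, chi_hi = 11.8, 49.6

reject_mean = abs(t_stat) > t_crit
reject_variance = not (chi_lo < chi_sq < chi_hi)

print(f"t = {t_stat:.2f}, reject H0 for the mean: {reject_mean}")
print(f"chi-square = {chi_sq:.1f}, reject H0 for the variance: {reject_variance}")
```

Without intermediate rounding the t value comes out at about 3.17 rather than 3.16, because the worked solution rounds the standard error to 0.76 before dividing. The decision is the same either way.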

Example goodness of fit

The progeny of self-fertilized four-o’clocks were expected to flower red, pink and white in the ratio of 1:2:1. There were 240 progeny produced with 55 red plants, 132 pink plants, and 53 white plants. Are these data reasonably consistent with the Mendelian 1:2:1 ratio?

Solution

The hypotheses are

H0: The data are consistent with a Mendelian model (1:2:1)

H1: The data are inconsistent with a Mendelian model (1:2:1)

The THREE colours are the THREE categories. In order to calculate expected frequencies, no parameters need to be estimated. The Mendelian ratios are given: 25% red, 50% pink and 25% white.

Using the fact that there are 240 observations, the expected number of red four-o’clocks is 0.25 × 240 = 60, i.e. Ei = 60.

Similar calculations for pink and white yield the following table:

Category    Oi     Ei     (Oi - Ei)²/Ei
Red         55     60     0.42
Pink        132    120    1.20
White       53     60     0.82
Total       240    240    2.44

$$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = 0.42 + 1.20 + 0.82 = 2.44$$

v = d.f. = no. of categories - 1 = 3 - 1 = 2

Let α = 0.05. Because the test is right tailed, the critical value occurs when

$$P(\chi^2 > \chi^2_{1-\alpha}) = \alpha$$

From the Chi-Square table, with d.f. = 2 and p = 1 - α = 0.95, the critical value is found to be 5.99.

Since 2.44 < 5.99, fail to reject H0. The data do not differ significantly from the Mendelian 1:2:1 ratio.
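
The four-o'clock example can be reproduced in a few lines. This sketch uses only the standard library. It relies on the fact that, for 2 degrees of freedom specifically, the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(α); for other degrees of freedom a table or a statistics library would be needed.

```python
import math

# Observed counts and the Mendelian ratio from the example
observed = {"red": 55, "pink": 132, "white": 53}
ratio = {"red": 1, "pink": 2, "white": 1}

total = sum(observed.values())
ratio_sum = sum(ratio.values())

# Expected frequencies: total * (share of each colour in the 1:2:1 ratio)
expected = {colour: total * ratio[colour] / ratio_sum for colour in observed}

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

alpha = 0.05
# Valid for d.f. = 2 only: upper tail is exp(-x / 2), so critical value = -2 ln(alpha)
critical = -2 * math.log(alpha)
p_value = math.exp(-chi_sq / 2)

print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
print(f"critical value = {critical:.2f}, p-value = {p_value:.2f}")
print("reject H0" if chi_sq >= critical else "fail to reject H0")
```

The unrounded statistic is about 2.43; the table above gives 2.44 because each term is rounded to two decimals before summing. Either way the value is well below 5.99.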
