ugcmodule7 new2

8/7/2019 ugcmodule7 new2


TESTING OF HYPOTHESIS

• The greatest tragedy of research is the slaying of a beautiful hypothesis by an ugly fact.

• Statistics can be broadly classified into two categories, namely

  (i) Descriptive Statistics and
  (ii) Inferential Statistics.

• This unit deals with inferential statistics, in particular, testing of hypothesis.



STATISTICAL INFERENCE

Statistical Inference means making inferences about the population from a sample taken from it. It is classified into two broad categories, namely:

1. Estimation Theory
2. Testing of Hypothesis.

The overview of Statistical Inference is furnished below.



Hypothesis: A statistical hypothesis is a statement made about a population characteristic in terms of the parameter value(s) of the parent distribution.

In testing of hypothesis we either reject or accept (more precisely, do not reject) the null hypothesis.

• The basic idea is to take a random sample from the population. If the sample supports your hypothesis, accept the hypothesis (i.e., the data are consistent with it); otherwise, reject the hypothesis.



• Statistical testing involves verifying or refuting statements concerning properties of the population, with some probability of error, based on data from a subset of that population.

• The subset of the population, which is selected randomly, is termed a sample.

• From the sample data, a test statistic is calculated, which allows us to make a decision regarding the null hypothesis.



Test procedure involves the following 5 steps

Step 1. Framing of Hypotheses

A. Null hypothesis

• Null hypotheses are presumptions about the status quo in the population and are either upheld or disallowed depending upon the result of a statistical test.

• Null Hypothesis:

  • Version 1: Null = neutral (Text Book)
  • Version 2: Null = nullify (R. A. Fisher: reject / discard)



ILLUSTRATION: Null hypothesis is a neutral or null-state hypothesis

Horlicks vs Complan

Hypo 1: H > C
Hypo 2: H < C
Hypo 3: H = C

Among them, which one is in the null state? How to guarantee the null state?

Begin your hypothesis with: There is no significant difference between _____________ and ______________



Version 2

• According to R. A. Fisher, the null hypothesis is a hypothesis which is tested for possible rejection under the assumption that it is true.

• He insists that we select the easily rejectable hypothesis as the null hypothesis.

• As a consequence, instead of simply writing the inference as "the null hypothesis is accepted", it is expected to be written as "since the data do not support the rejection of the null hypothesis, it is not rejected".



B. ALTERNATIVE HYPOTHESIS

The researcher must state an Alternative Hypothesis (H1) in the test procedure.

Whenever H0 is rejected, H1 is accepted automatically.

Based on the alternative hypothesis, the test is called a one-tailed test or a two-tailed test. For example, H1: μ ≠ μ0 leads to a two-tailed test, while H1: μ > μ0 or H1: μ < μ0 leads to a one-tailed test.



• Unless there are good reasons about the direction of the hypothesis, a two-tailed test should be postulated.

• One-tailed tests are appropriate when a theoretical perspective based on previous research suggests that the difference should occur in one specific direction.
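As a small sketch (Python, standard library only), the practical difference between the two kinds of test shows up in the table value: a two-tailed test splits the level of significance α equally between both tails, while a one-tailed test puts all of α in one tail. The function name `z_critical` is our own, chosen for illustration.

```python
from statistics import NormalDist


def z_critical(alpha: float, tails: int) -> float:
    """Standard normal critical (table) value for a one- or two-tailed test at level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed: alpha/2 in each tail. One-tailed: all of alpha in one tail.
    return NormalDist().inv_cdf(1 - alpha / tails)


for tails in (1, 2):
    print(f"{tails}-tailed, alpha = 0.05: critical z = {z_critical(0.05, tails):.3f}")
```

This prints 1.645 and 1.960, the familiar 5% table values for one- and two-tailed Z tests.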



Step 2. Fixing level of significance based on Type I and Type II errors

                 ACCEPT H0           REJECT H0
H0 TRUE          Correct decision    Type I error
H0 FALSE         Type II error       Correct decision



EDUCATION:

                 PASS                FAIL
Good Student     Correct decision    Type I error
Poor Student     Type II error       Correct decision



INDUSTRY:

                 ACCEPT              REJECT
Good Product     Correct decision    Type I error
Poor Quality     Type II error       Correct decision



Our aim is to minimize both Type I and Type II errors simultaneously.

But unfortunately, for a fixed sample size, they work in opposite directions, i.e., if you reduce the Type I error, then the Type II error will increase, and vice versa.
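The trade-off can be seen numerically in a minimal sketch. It assumes a hypothetical one-tailed setting: H0: μ = 0 against H1: μ = 1, σ = 1 known, n = 4, with the rule "reject H0 if x̄ > c". The helper `error_rates` and all parameter values are illustrative assumptions, not taken from the module.

```python
from statistics import NormalDist


def error_rates(c, mu0=0.0, mu1=1.0, sigma=1.0, n=4):
    """Type I and Type II error probabilities for the rule 'reject H0 if x_bar > c'.

    Hypothetical setting: H0: mu = mu0 versus H1: mu = mu1 (> mu0), sigma known.
    """
    se = sigma / n ** 0.5                     # standard error of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(c)    # P(reject H0 | H0 true)  = Type I error
    beta = NormalDist(mu1, se).cdf(c)         # P(accept H0 | H1 true)  = Type II error
    return alpha, beta


for c in (0.6, 0.8, 1.0):
    alpha, beta = error_rates(c)
    print(f"critical value {c:.1f}: Type I = {alpha:.3f}, Type II = {beta:.3f}")
```

Moving the critical value c upward lowers the Type I error and raises the Type II error; with n held fixed, no choice of c reduces both.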



What can at best be achieved is a reasonable balance between these two errors.

In testing of hypothesis procedures, it is conventionally assumed that the Type I error is much more severe than the Type II error and so needs to be controlled. Hence, what we do in a statistical testing procedure is fix the Type I error at a specified level, usually 5% or 1%; this level is the level of significance, denoted α. But this is not mandatory; you can fix any level of your choice. Hence fixing the level of significance is the work of the scholar.



Step 3. Construction of Test Statistic:

The general formula for the test statistic is given by

$$\text{Test Statistic} = \frac{\text{Statistic} - E(\text{Statistic})}{\text{Standard Error}(\text{Statistic})}$$

Example: Large Sample:

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$



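Small Sample:

$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$

where S is the sample standard deviation (computed with divisor n - 1).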



Step 4: To find the Calculated value

In the case of problems, we calculate the required entities from the sample, substitute them in the test statistic and simplify. Ultimately we get a single value known as the Calculated Value.



STEP 5. INFERENCE:

Inference can be done using any one of three approaches. All of them lead to the same conclusion, provided the confidence level of the interval matches the level of significance (e.g., a 95% interval for a 5% two-tailed test).

The three approaches are:

1. Critical value approach.
2. P-value approach.
3. Confidence interval approach.

Note: In the SPSS output, one can see the provisions for all three. The researcher can select any one of the approaches; select the p-value approach due to its advantages.



Classical approach: Class room teaching

• Under the classical approach, with a fixed predetermined value, namely the critical value or table value, a test will produce a decision as to whether or not to reject H0.

• Decision rule:

If the calculated value is less than the table value (in absolute value, for a two-tailed Z or t test), the sample does not provide any evidence to reject the null hypothesis and hence H0 is accepted. Otherwise, it is rejected.
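The decision rule of the classical approach can be sketched for a Z test as follows. The function name is hypothetical, and for the one-tailed case an upper-tailed alternative (H1: μ > μ0) is assumed.

```python
from statistics import NormalDist


def critical_value_decision(calculated, alpha=0.05, two_tailed=True):
    """Classical (critical value) approach for a Z test.

    Two-tailed: compare |Z| with the table value z_(alpha/2).
    One-tailed (upper tail assumed here): compare Z with z_alpha.
    """
    tails = 2 if two_tailed else 1
    table_value = NormalDist().inv_cdf(1 - alpha / tails)
    value = abs(calculated) if two_tailed else calculated
    # Rule from the text: calculated < table value -> do not reject H0; otherwise reject.
    return "do not reject H0" if value < table_value else "reject H0"


print(critical_value_decision(2.30))
print(critical_value_decision(1.50))
print(critical_value_decision(1.80, two_tailed=False))
```

With α = 0.05 the table values are about 1.96 (two-tailed) and 1.645 (one-tailed), so the three calls reject, do not reject, and reject respectively.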



Remark

• But merely comparing the observed test statistic with some critical value and concluding as
  • "using a 5% test, reject H0", or
  • "reject H0 with significance level 5%", or
  • "result is significant at 5%"
  (all equivalent statements)

• does not provide the recipient of the results with clear, detailed information on the strength of the evidence against H0.



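P-value

• A more informative approach is to calculate and quote the probability value (p-value) of the observed test statistic: the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed.

• The p-value is the lowest level of significance at which H0 can be rejected. The smaller the p-value, the stronger the evidence against the null hypothesis.

• Decision Rule:

If the p-value is less than the level of significance, the null hypothesis H0 is rejected. Otherwise it is accepted (not rejected).

Note the difference between these two approaches: in the critical value approach H0 is rejected when the calculated value is large, whereas in the p-value approach H0 is rejected when the p-value is small. The researcher is likely to be confused here, so be clear.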
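To see that the three approaches of Step 5 agree, here is a sketch for a two-tailed one-sample Z test. The summary figures (σ = 5, n = 25) and function names are hypothetical; the confidence interval is the usual x̄ ± z_(α/2)·σ/√n.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p_value(z):
    """P(|Z| >= |z|) under H0 for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def three_approaches(x_bar, mu0, sigma, n, alpha=0.05):
    """Answer 'reject H0?' for a two-tailed one-sample Z test in three equivalent ways."""
    se = sigma / n ** 0.5
    z = (x_bar - mu0) / se
    z_table = STD_NORMAL.inv_cdf(1 - alpha / 2)
    by_critical_value = abs(z) > z_table
    by_p_value = two_sided_p_value(z) < alpha
    lower, upper = x_bar - z_table * se, x_bar + z_table * se   # (1 - alpha) confidence interval
    by_confidence_interval = not (lower <= mu0 <= upper)
    return by_critical_value, by_p_value, by_confidence_interval


# Hypothetical summary data: sigma = 5, n = 25, so the standard error is 1.
print(round(two_sided_p_value(2.0), 4))
print(three_approaches(x_bar=52.0, mu0=50.0, sigma=5.0, n=25))
print(three_approaches(x_bar=51.0, mu0=50.0, sigma=5.0, n=25))
```

For z = 2 the p-value is about 0.0455, so H0 is rejected at 5% but would not be rejected at 1%; this is the sense in which the p-value is the lowest level at which H0 can be rejected.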



Let us revisit the statistical testing procedure, which requires the following steps:

1. Statements (hypotheses) regarding the population are formulated,
2. A sample, randomly selected from the population, containing information pertinent to the hypothesis, is drawn,
3. An acceptable significance level (alpha) is established,
4. A test statistic and its observed significance level (p-value) are computed from the sample.



SHAPE OF THE DISTRIBUTION; NORMALITY

An important aspect of the description of a variable is the shape of its distribution, which tells you the frequency of values from different ranges of the variable.

Typically a researcher is interested in how well the distribution can be approximated by the normal distribution (see also the Elementary Concepts section). Simple descriptive tools (e.g., a histogram, or the skewness and kurtosis of the sample) can provide some information relevant to this issue.
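One common descriptive check, sketched here as an assumption rather than the module's own method, is to compute the moment-based sample skewness and excess kurtosis; both are about 0 for normal data. The helper name and the data are hypothetical, and this is a rough screen, not a formal normality test.

```python
from statistics import mean


def skewness_kurtosis(data):
    """Moment-based sample skewness and excess kurtosis (both about 0 for normal data)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


print(skewness_kurtosis([1, 2, 3, 4, 5]))      # symmetric: skewness 0
print(skewness_kurtosis([1, 1, 1, 2, 10]))     # long right tail: positive skewness
```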



TESTS OF MEANS: t TESTS:

Although there are a variety of parametric tests, this module concentrates on three commonly used tests of means.

All three of these procedures assume that the data are measured on an interval scale (quantitative data) in random samples selected from a normally distributed population.

Each of these tests employs a statistic which follows a t-distribution.



One-Sample t-test:

The first procedure is the one-sample t-test.

In this case the objective is to test whether the population mean is equal to a specific value.

The null hypothesis, which is a statement of the status quo, states that the population mean is equal to the hypothesized value.

Another way to state this null hypothesis is to say that we are testing whether the difference between the population mean and the hypothesized value is equal to zero.



$$H_0: \mu = \mu_0 \quad \text{or} \quad H_0: \mu - \mu_0 = 0$$

The two-sided alternate hypothesis states that this mean difference is not zero. This procedure is termed a one-sample test since only one sample from the population is involved.

$$H_1: \mu \neq \mu_0 \quad \text{or} \quad H_1: \mu - \mu_0 \neq 0$$
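A minimal one-sample t-test sketch follows, using hypothetical data (n = 10) and the two-tailed 5% table value t = 2.262 for 9 d.f. from a standard t table; the constant is only valid for that sample size.

```python
import math
from statistics import mean, stdev

# Two-tailed 5% table value of t with 9 degrees of freedom (standard t table).
T_TABLE_5PCT_DF9 = 2.262


def one_sample_t(sample, mu0):
    """t = (x_bar - mu0) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))


# Hypothetical data: 10 measurements, H0: mu = 50 against H1: mu != 50.
sample = [50.2, 49.8, 51.1, 50.5, 49.6, 50.9, 51.3, 50.0, 50.7, 50.4]
t_cal = one_sample_t(sample, mu0=50.0)
decision = "reject H0" if abs(t_cal) >= T_TABLE_5PCT_DF9 else "do not reject H0"
print(f"t = {t_cal:.3f} with {len(sample) - 1} d.f.: {decision}")
```

Here the calculated value is about 2.54, which exceeds 2.262, so H0 is rejected at the 5% level.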



• Another procedure is the dependent-sample t-test.

• In this instance the purpose is to test whether the population means of two different variables are equal.

• Here the null hypothesis states that the two population means are equal, which is the same as stating that the mean of the difference between the two variables is equal to zero.



Continue:

• The two-sided alternate hypothesis specifies that this difference is unequal to zero.

• This procedure is so named because both variables being tested come from the same observations and are therefore statistically dependent.



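Assumption: the differences constitute a random sample from a normal distribution.

$$H_0: \mu_D \;(= \mu_1 - \mu_2) = \delta$$

Under H0, the test statistic is

$$t = \frac{\bar{D} - \delta}{S_D/\sqrt{n}} \sim t_{n-1}$$

where $\bar{D}$ and $S_D$ are the mean and standard deviation of the n paired differences (δ = 0 when testing equality of the two means).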



Why Paired t-test?

In testing for a difference between two population means, the use of independent samples can have a major drawback.

Even if a real difference does exist, the variability among the responses within each sample can be large enough to obscure it.

The random variation within the samples will mask the real difference between the populations from which they come.



CONTINUE:

One way to control this variability, external to the issue in question, is to use a pair of responses from each subject and then work with the differences within the pairs.

The aim is to remove as far as possible the subject-to-subject variation from the analysis, and thus to home in on any real difference between the populations.
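The masking effect can be illustrated with a sketch on hypothetical before/after scores for 5 subjects: large subject-to-subject spread, but every subject improves slightly. The pooled (independent-samples) t is computed on the same data only for contrast; applying it to paired data ignores the pairing and is not the correct test here.

```python
import math
from statistics import mean, stdev, variance


def paired_t(x, y):
    """Paired t: work with within-pair differences d = x - y; n - 1 d.f."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def pooled_t(x, y):
    """Independent-samples t with pooled variance; n1 + n2 - 2 d.f."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical scores of 5 subjects before and after a treatment.
before = [60, 75, 90, 105, 120]
after = [62, 78, 91, 108, 122]
print(f"paired t      = {paired_t(after, before):.2f}")   # uses the within-pair differences
print(f"independent t = {pooled_t(after, before):.2f}")   # ignores the pairing (for contrast only)
```

The paired statistic is large because the differences vary little, while the independent-samples statistic is swamped by the spread between subjects.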



• The final procedure is the independent-samples t-test.

• The goal here is to test whether two population means are equal.

• For this test, two samples are drawn (or one sample is divided into two mutually exclusive groups), and the test is performed using a variable common to both groups.



The null hypothesis states that the population means of the two groups are equal.

The two-sided alternate hypothesis states that the means are unequal.

The two samples, or sample groups, are independent of each other because no observation is present in both groups, hence the name independent-samples t-test.



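Continue:

Test statistic, when $\sigma_1^2, \sigma_2^2$ are known:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

where δ is the value of μ1 - μ2 specified under H0 (δ = 0 when testing equality of means).

(b) $\sigma_1^2, \sigma_2^2$ unknown: much the more usual situation. With large samples, the sample variances $s_1^2, s_2^2$ are used in place of $\sigma_1^2, \sigma_2^2$.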



Further, for large samples, the Central Limit Theorem justifies the use of a normal approximation for the distribution of the test statistic in sampling from any reasonable populations, so the requirement that we are sampling from normal distributions is not necessary.
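A sketch of the large-sample two-sample Z statistic, with hypothetical summary figures and a hypothetical function name. The variances passed in are σ1², σ2² when known; with large samples, s1² and s2² can be passed in their place.

```python
import math
from statistics import NormalDist


def two_sample_z(x_bar1, x_bar2, var1, var2, n1, n2, delta=0.0):
    """Large-sample Z for H0: mu1 - mu2 = delta.

    var1, var2 are the population variances when known; for large samples the
    sample variances s1^2, s2^2 are used in their place.
    """
    se = math.sqrt(var1 / n1 + var2 / n2)
    return ((x_bar1 - x_bar2) - delta) / se


# Hypothetical summary figures.
z = two_sample_z(x_bar1=68.0, x_bar2=66.0, var1=16.0, var2=25.0, n1=64, n2=100)
p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```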



Small Samples:

• Under the assumption $\sigma_1^2 = \sigma_2^2$ (= $\sigma^2$, say), this common variance is estimated by $S_p^2$, and the statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is distributed as t with n1 + n2 - 2 degrees of freedom under H0.
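The small-sample procedure can be sketched as below, using the usual pooled variance S_p² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2). The data are hypothetical, and the table value 2.306 (two-tailed 5%, 8 d.f.) from a standard t table applies only because n1 = n2 = 5 here.

```python
import math
from statistics import mean, variance

# Two-tailed 5% table value of t with 8 degrees of freedom (standard t table).
T_TABLE_5PCT_DF8 = 2.306


def pooled_variance(x, y):
    """S_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    n1, n2 = len(x), len(y)
    return ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)


def independent_t(x, y, delta=0.0):
    """Return (t, degrees of freedom) for H0: mu1 - mu2 = delta, equal variances assumed."""
    n1, n2 = len(x), len(y)
    sp = math.sqrt(pooled_variance(x, y))
    t = ((mean(x) - mean(y)) - delta) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


group1 = [12, 15, 11, 14, 13]   # hypothetical data
group2 = [10, 9, 12, 11, 8]
t, df = independent_t(group1, group2)
decision = "reject H0" if abs(t) >= T_TABLE_5PCT_DF8 else "do not reject H0"
print(f"t = {t:.3f} with {df} d.f.: {decision}")
```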



Continue:

• Remember that

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

• Problem

In an experiment using identical twin babies, one baby in each of 10 sets was fed food stuff A, while the other twin was fed food stuff B. The gains in weight (in kg) are given below:

A: 24 28 31 32 25 27 37 31 26 29
B: 19 24 32 28 28 29 31 33 29 27

Identify which type of t-test has to be applied.



Analysis of variance is one of the powerful tools in testing of hypothesis.

If one is interested in testing the equality of several means at a time, the technique of Analysis of Variance is used.

There are many situations where data are classified in one, two or more ways.



One way classification:

Let there be three treatments A, B, C, replicated n1, n2 and n3 times respectively.

          A        B        C
          X11      X21      X31
          X12      X22      X32
          .        .        .
          .        .        .
          .        .        .
          X1n1     X2n2     X3n3
Total     T1       T2       T3



ANOVA continue

• $H_0: \mu_A = \mu_B = \mu_C$

• H1: At least one treatment is different from the others.

• Grand total $G = T_1 + T_2 + T_3 = \sum_i \sum_j x_{ij}$

Correction Factor $CF = \dfrac{G^2}{N}$, where $N = \sum_i n_i$ is the number of observations.



Source of     Sum of     d.f.     Mean sum of            F-ratio
variation     squares             squares
Treatment     SST        K - 1    MST = SST/(K - 1)      F = MST/MSE
Error         SSE        N - K    MSE = SSE/(N - K)
Total         TSS        N - 1

where K is the number of treatments and N the total number of observations.

Inference: If Fcal < Ftab, accept H0; otherwise reject.



ILLUSTRATION: YIELD IN GRAMS

          Catalyst 1   Catalyst 2   Catalyst 3
             100           76          108
              96           80          100
              92           75           96
              96           84           98
              92           82          100
Total        476          397          502

Based on the above data, can it be inferred that the effects of the three catalysts differ significantly with respect to the yield of the chemical product?



• H0: Average yield of the chemical product under the three catalysts is the same.

• H1: Average yield of the chemical product for at least one catalyst is different from the yield corresponding to the remaining catalysts.

• Following the general practical steps, we find the various sums of squares, noting that here the number of levels is 3 (which is equal to k) and $n_i = 5$ (number of batches tested for each level).



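1. $G = \sum_i \sum_j x_{ij} = (100 + 96 + \dots + 92) + (76 + 80 + \dots + 82) + (108 + 100 + \dots + 100) = 1375$

2. $\sum_i \sum_j x_{ij}^2 = (100^2 + 96^2 + \dots + 92^2) + (76^2 + 80^2 + \dots + 82^2) + (108^2 + 100^2 + \dots + 100^2) = 127425$

3. $N = \sum_i n_i = n_1 + n_2 + n_3 = 5 + 5 + 5 = 15$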



Anova calculations

4. $CF = G^2/N = 1375^2/15 = 126041.67$

5. $TSS = \sum_i \sum_j x_{ij}^2 - CF = 127425 - 126041.67 = 1383.33$

6. $SST = \sum_i \dfrac{T_i^2}{n_i} - CF = \dfrac{476^2 + 397^2 + 502^2}{5} - 126041.67 = 1196.13$

7. $SSE = TSS - SST = 1383.33 - 1196.13 = 187.20$
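The correction-factor calculations above can be reproduced, and carried through to the F-ratio of the ANOVA table, with a short sketch. The function name is hypothetical; the table value 3.89 is the 5% point of F with (2, 12) d.f. from a standard F table.

```python
def one_way_anova(groups):
    """One-way ANOVA by the correction-factor method; returns the ANOVA-table quantities."""
    observations = [x for g in groups for x in g]
    N, k = len(observations), len(groups)
    G = sum(observations)                                   # grand total
    cf = G ** 2 / N                                         # correction factor
    tss = sum(x ** 2 for x in observations) - cf            # total sum of squares
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf    # treatment sum of squares
    sse = tss - sst                                         # error sum of squares
    mst, mse = sst / (k - 1), sse / (N - k)
    return {"CF": cf, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse, "df": (k - 1, N - k)}


catalysts = [
    [100, 96, 92, 96, 92],    # Catalyst 1
    [76, 80, 75, 84, 82],     # Catalyst 2
    [108, 100, 96, 98, 100],  # Catalyst 3
]
F_TABLE_5PCT_2_12 = 3.89      # 5% table value of F with (2, 12) d.f. (standard F table)

result = one_way_anova(catalysts)
for key, value in result.items():
    print(key, value if key == "df" else round(value, 2))
print("reject H0" if result["F"] > F_TABLE_5PCT_2_12 else "do not reject H0")
```

With these data, MST ≈ 598.07 and MSE = 15.6, giving F ≈ 38.34 on (2, 12) d.f., well above 3.89.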



Inference:

• Here the calculated value of F is greater than the tabulated value of F, and hence the null hypothesis is rejected. Therefore, we may conclude that the three catalysts differ significantly.




Problem:

• The gains in weight of two random samples of rats fed on two different diets A and B are given below. Examine whether the difference in mean increases in weight is significant.

Diet A: 13 14 10 11 12 16 10 8
Diet B: 7 10 12 8 10 11 9 10

Identify the appropriate t-test.



Problem:

Two horses A and B were tested according to the time (in seconds) to run a particular track, with the following results.

Horse A: 28 30 32 33 33 29 34
Horse B: 29 30 30 24 27 29

Test whether the two horses have the same running capacity. Find the appropriate t-test.



• A test was given to five students taken at random from the fifth class of each of three schools of a town. The individual scores are:

School 1: 9 7 6 5 8
School 2: 7 4 5 4 5
School 3: 6 5 6 7 6



Carry out analysis of variance and state your conclusion.

Problem: Yields of three varieties of wheat are given below.

Variety   Yield
I         10   9   8
II         7   7   6
III        8   5   4



• Parametric tests are more powerful when the following conditions are satisfied:

  • Normality
  • Measurement on an interval scale
  • Reasonable sample size



Note:

If any of these conditions are violated, the parametric tests may not be appropriate. In such cases one should go for nonparametric tests.

That is the focal theme of the next module.
