3. parametric assumptions

Steve Saffhill Research Methods in Sport & Exercise Basic Statistics


Post on 22-Feb-2017


Page 1: 3. parametric assumptions

Steve Saffhill

Research Methods in Sport & Exercise: Basic Statistics

Page 2: 3. parametric assumptions

Refresher

• Data handling assessment – VLE

• Basic Stats – 2 types:

  • Descriptives (figures & tables)

  • Inferentials (accept/reject hypotheses)

• We test hypotheses with stats, i.e., H0: There will be no significant correlation between attendance % and exam %.

• We set an alpha (0.05) and compare it with the p (probability) value SPSS gives us to make inferences about what we have found (95% confidence).
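The comparison of p with alpha can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the example p values are hypothetical, and the sketch assumes the common convention that H0 is rejected only when p is strictly below alpha.

```python
ALPHA = 0.05  # significance level chosen before running the test


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Return the decision on the null hypothesis (H0) for a given p value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    # Assumed convention: reject H0 only when p is strictly below alpha.
    return "reject H0" if p_value < alpha else "retain H0"


# Hypothetical p values of the kind SPSS reports in its "Sig." column.
print(decide(0.004))  # reject H0
print(decide(0.090))  # retain H0
```

A p value below alpha means a result this extreme would be unlikely if H0 were true, so H0 is rejected.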

Page 3: 3. parametric assumptions

Parametric Assumptions for Deciding on Inferential Statistics

• The aim of research is to make factual descriptive statements about a group of people.

• E.g., the ingestion of creatine monohydrate over a 6 week period before competition will enhance power output by 10%.

• However, it would be nonsense to measure every single person who might be related to this statement.

• We therefore look at a selected sample of participants and make an educated guess about the whole of the population.

• But first you need to test if the selected sample is representative of that population.

Page 4: 3. parametric assumptions

• Karl Gauss found that if you take a survey of most groups of people they will behave in certain set ways.

• However you measure them, most people are average.

• There are usually some extremes at either end, but most people behave in an average, predictable way.

• This is what he called the normal distribution.

Page 5: 3. parametric assumptions

• If any group of students take an exam or write a piece of coursework, most of them will score around the average (40% - 60%).

• There are always a few who score highly...

• But, to compensate, there are always a few who do badly...

Page 6: 3. parametric assumptions

[Figure: Normal Distribution Curve. Bell-shaped curve showing marks along the x-axis (Score (%), in bands from <20 to >80) and number of students on the y-axis (Frequency, 0 to 18).]

Page 7: 3. parametric assumptions

• Karl Gauss showed how most groups of people will behave in this predictable way.

• It is therefore not necessary to measure a whole population to make a true statement about it. A sample will be sufficient.

• That is, providing certain criteria are met, the results can then be extrapolated to the population.

• These criteria are what we call parametric properties.

Page 8: 3. parametric assumptions

= most common inferential tests for you!

Page 9: 3. parametric assumptions

What are these criteria?

• These criteria must be explored before running ANY inferential statistics!

• The descriptive statistics (last week) allow us to determine if the criteria have been met!

• If we run the wrong inferential test there is a risk of errors (type I or type II error)!

Page 10: 3. parametric assumptions

Errors in statistics

• Type I (false-positive result) occurs if the null hypothesis is rejected when it is actually true (e.g., the effects of the training are interpreted as being significantly different when they are not).

• Type II (false-negative result) occurs if the null hypothesis is accepted when it is actually false (e.g., the effects of the training are interpreted as being equal when they are actually significantly different).
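The two definitions above amount to a small truth table. The sketch below encodes it; the function name `classify_outcome` is hypothetical and the code is only meant as a memory aid, not as part of any SPSS procedure.

```python
def classify_outcome(null_is_true: bool, null_rejected: bool) -> str:
    """Classify a test decision against the (normally unknown) truth about H0."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"


# Training really has no effect, but the test came out significant:
print(classify_outcome(null_is_true=True, null_rejected=True))
# Training really has an effect, but the test came out non-significant:
print(classify_outcome(null_is_true=False, null_rejected=False))
```

In real research the truth about H0 is not known, which is why choosing the correct test matters.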

Page 11: 3. parametric assumptions

So What Exactly Are these Criteria that help Us Choose the Correct Inferential Test?

• Called “the parametric assumptions”

1. Random sampling – the sample must be randomly selected.

2. Level of data being used – must be interval or ratio (high level data).

3. Normal distribution – the data must be normally distributed (2 checks: 3a and 3b).

4. Equal variance in scores – the larger variance of the two variables should be less than twice the smaller variance.

Page 12: 3. parametric assumptions

1. Random Sampling

• Suppose I wish to make a statement about plyometric training enhancing sprint times for all athletes.

• If I just measured a top athletics team, their sprint times might very well improve due to the best nutrition, coaching and training facilities. Plyometric training might have nothing to do with their success.

• I need to look at an unbiased sample, which I have selected by chance (i.e., randomly).

• If we have used a random sample, we have met this 1st criterion for using parametric inferential statistics and go on to check the other 3 criteria.

• If not, we can't run a parametric test, so we run a non-parametric test (we still check the other 3 criteria for our report).
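Selecting participants by chance can be sketched with Python's standard `random` module. The list of athlete IDs, the sample size and the helper name `draw_random_sample` are all hypothetical; in practice the list would be whatever register of the population is available.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Pick n distinct members of the population purely by chance."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(list(population), n)


# Hypothetical sampling frame: 200 athletes identified by ID.
athletes = [f"athlete_{i:03d}" for i in range(1, 201)]
sample = draw_random_sample(athletes, 20, seed=42)
print(len(sample), "athletes selected at random")
```

Every athlete on the list has the same chance of selection, so a top team cannot dominate the sample by design.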

Page 13: 3. parametric assumptions

2. High Level Data

• Low level data is so called because it is quite crude, which makes it hard to make good judgements about a population from a sample.

• Simply putting athletes in rank order does not tell us much about them either.

• Supposing I told you I had been in a friendly tennis tournament and that I had come third.

• You might not be impressed……..

Until I told you that Roger Federer had come first and Kim Clijsters second

Page 14: 3. parametric assumptions

• The person who came fourth was actually the cleaner who I had persuaded to join us to make up four so the tournament could take place!!!

• You might again revise your opinion then!!!

• This is why ordinal or ranking data is low level.

Page 15: 3. parametric assumptions

Nominal/categorical - categories

• E.g., male/female; rugby/football/cricket players; netball/hockey/tennis/badminton players.

Ordinal - rank order

• E.g., 1st, 2nd, 3rd.

• High level data is much better to use. If I gave you a set of high level data, you could tell me a great deal about the group.

Page 16: 3. parametric assumptions

• Interval - equal distances between numbers, but zero is just another point on the scale, so zero and minus values are meaningful (e.g., a rating scale or Fahrenheit).

• E.g., 28ºF, 0ºF.

• Many psychological questionnaires are interval data (e.g., IQ, GEQ, CSAI-2).

• Ratio - equal distances between numbers with a true zero (zero means nothing, minus has no meaning): 0 kg = no kilograms; a negative weight is meaningless.

• A good deal of physiological data is ratio (e.g., height, weight, time in minutes and seconds).

Page 17: 3. parametric assumptions

So...(after checking random sampling)

• If you have interval/ratio data, you have met the 2nd criterion of the parametric assumptions and go on to check the other assumptions (no. 3 & 4).

• If not (i.e., you have ordinal/nominal data), you cannot run a parametric test and must run a non-parametric test (but still check the other 2 assumptions for your report).
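The level-of-data rule can be written as a simple lookup. This is a sketch only: the function name `meets_level_of_data` and the set names are hypothetical, and deciding which level a variable actually has remains a judgement for the researcher.

```python
HIGH_LEVEL = {"interval", "ratio"}
LOW_LEVEL = {"nominal", "ordinal"}


def meets_level_of_data(level: str) -> bool:
    """True if the data level satisfies the 2nd parametric assumption."""
    level = level.strip().lower()
    if level not in HIGH_LEVEL | LOW_LEVEL:
        raise ValueError(f"unknown level of data: {level!r}")
    return level in HIGH_LEVEL


print(meets_level_of_data("ratio"))    # True  (e.g., time in seconds)
print(meets_level_of_data("ordinal"))  # False (e.g., finishing position)
```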

Page 18: 3. parametric assumptions

3. Normal distribution

• Think back to the beginning of the lecture to Karl Gauss and how he found that if you take a survey of most groups of people, they will behave in certain set ways.

• However you measure them, most people are average, with some extremes but mainly average.

• This is what he called the normal distribution.

Page 19: 3. parametric assumptions

[Figure: Normal Distribution Curve. Bell-shaped curve showing marks along the x-axis (Score (%), in bands from <20 to >80) and number of students on the y-axis (Frequency, 0 to 20).]

Page 20: 3. parametric assumptions

Normal distribution

• As the 3rd parametric criterion, we must check whether the data actually fit the pattern of the normal distribution.

• It may be that if the data were plotted on a graph, they would not fit the normal pattern.

• It would then be dangerous to assume that the population behaves in the same way: running a test that assumes normality could lead to a type I or II error!

Page 21: 3. parametric assumptions

• As the graphs we are working with are 2-D, the data can shift away from normality in two ways: along the horizontal axis or along the vertical axis.

• On the horizontal axis, the data may shift towards the left extreme or the right extreme. This is known as positive or negative skewness.

[Figure: two histograms of frequency (0 to 12) against heart rate (BPM, in bands from <50 to >100): "Resting Heart Rates of Athletes 10 Minutes After Exercise" and "Resting Heart Rates of Sedentary Non-Athletes 10 Minutes After Exercise".]

Page 22: 3. parametric assumptions

Or, the shift may produce a very peaked or very flat curve.

[Figure: two histograms of frequency against heart rate (BPM, in bands from <50 to >100), one very peaked and one very flat.]

Peaked = Leptokurtic

Flat = Platykurtic

Mesokurtic = normal

Page 23: 3. parametric assumptions

Normal distribution Curve

• Can be visualised in SPSS when you run the descriptives:

[Figure: SPSS histogram of AGE (x-axis from 21.0 to 29.0, frequency from 0 to 6). Std. Dev = 2.46, Mean = 25.4, N = 22.]

Page 24: 3. parametric assumptions

• While a graph is very helpful in deciding if a sample is normally distributed, it does not actually tell the researcher how skewed or kurtotic the data are.

• Numbers are therefore needed to measure skewness and kurtosis accurately.

• These TWO numbers are calculated from your descriptive statistics. First, divide the skewness statistic by the skewness std. error.

• The answer is the Z skewness score (Z skew).

• Then repeat for kurtosis: divide the kurtosis statistic by the kurtosis std. error to get the Z kurtosis score (Z kurt).

• If both figures are between -1.96 and 1.96, the data are said not to be skewed or kurtotic = Normally Distributed!

• These numbers are called Z scores.
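SPSS prints the two standard errors for you, but it can help to see that they depend only on the sample size N. The sketch below uses the standard large-sample formulas for the standard errors of skewness and kurtosis; that SPSS uses exactly these is an assumption here, although they do reproduce the .687 and 1.334 shown later in these slides for groups of N = 10. The function names are hypothetical.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of skewness for a sample of size n (assumed formula)."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of kurtosis for a sample of size n (assumed formula)."""
    if n < 4:
        raise ValueError("need at least 4 cases")
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


print(round(se_skewness(10), 3))  # 0.687
print(round(se_kurtosis(10), 3))  # 1.334
```

Because the standard errors shrink as N grows, the same amount of skewness gives a larger Z score in a larger sample.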

Page 25: 3. parametric assumptions

Z Scores

• The information you need to work out these Z scores is provided by your descriptive statistics.

Descriptives: Anxiety

                                                Statistic   Std. Error
Mean                                               1.50        .073
95% Confidence Interval for Mean   Lower Bound     1.35
                                   Upper Bound     1.65
5% Trimmed Mean                                    1.50
Median                                             1.50
Variance                                            .255
Std. Deviation                                      .505
Minimum                                            1
Maximum                                            2
Range                                              1
Interquartile Range                                1
Skewness                                            .000       .343
Kurtosis                                          -2.089       .674

Divide .000 by .343 to give you your Z skewness score = 0

Divide -2.089 by .674 to give you your Z kurtosis score = -3.10

Both the Z skewness and Z kurtosis scores need to fall between -1.96 and 1.96 for that variable to be deemed normally distributed. Here the Z kurtosis score falls outside that range, so this variable is not normally distributed.

You would need to test this for ALL your variables!!!
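The two divisions above can be checked with a short Python sketch. The skewness and kurtosis figures are the ones from the Anxiety output; the function names `z_score` and `within_limits` are hypothetical.

```python
def z_score(statistic: float, std_error: float) -> float:
    """Divide a skewness or kurtosis statistic by its standard error."""
    return statistic / std_error


def within_limits(z: float, limit: float = 1.96) -> bool:
    """True if the Z score lies between -1.96 and 1.96."""
    return -limit <= z <= limit


z_skew = z_score(0.000, 0.343)   # skewness statistic / skewness std. error
z_kurt = z_score(-2.089, 0.674)  # kurtosis statistic / kurtosis std. error

print(round(z_skew, 2), within_limits(z_skew))  # 0.0 True
print(round(z_kurt, 2), within_limits(z_kurt))  # -3.1 False

# The variable counts as normally distributed only if BOTH checks pass.
normally_distributed = within_limits(z_skew) and within_limits(z_kurt)
print(normally_distributed)  # False
```

Running the same two lines for every variable in the data set covers the requirement to test them all.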

Page 26: 3. parametric assumptions

So.....

• If all of our Z scores (skewness and kurtosis) are between -1.96 and 1.96, we meet the 3rd assumption and can run a parametric test (we still check the 4th assumption, but it is the least important).

• If not, we must run a non-parametric test (we still check the final 4th assumption).

Page 27: 3. parametric assumptions

4. Equal (Homogeneity) Variance

• This criterion compares the variances of two sets of scores to see if they are reasonably similar or very different.

• You take the smaller variance, double it and see whether the result is smaller or larger than the larger variance.

• If the smaller variance doubled is larger than the larger variance, the two data sets are said to be homogeneous and this criterion is met.

• If the smaller variance doubled is still smaller than the larger variance, the two data sets are said to be heterogeneous and this criterion is not met.

Page 28: 3. parametric assumptions

• For example, suppose our data sets have the variances 15 and 28.

• You take the smaller variance and double it: 15 x 2 = 30.

• Is 30 bigger or smaller than the larger variance?

• 30 is bigger than 28, so the two variables show homogeneity of variance. We have met this criterion and can run a parametric test (pending rules 1-3)!

• By the same rule, 15 and 32 are heterogeneous (30 is smaller than 32), so this criterion is not met (what we do next depends on rules 1-3).
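The doubling rule is easy to express in code. The sketch below uses the two worked examples above; the function name `variances_homogeneous` is hypothetical, and treating an exact tie (smaller variance doubled equals the larger) as not homogeneous is an assumption, since the slides do not cover that case.

```python
def variances_homogeneous(var_a: float, var_b: float) -> bool:
    """Doubling rule: twice the smaller variance must exceed the larger one."""
    smaller, larger = sorted((var_a, var_b))
    # Assumption: an exact tie (2 * smaller == larger) counts as not homogeneous.
    return 2 * smaller > larger


print(variances_homogeneous(15, 28))  # True:  15 x 2 = 30 is bigger than 28
print(variances_homogeneous(15, 32))  # False: 15 x 2 = 30 is smaller than 32
```

The order of the two arguments does not matter because the variances are sorted first.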

Page 29: 3. parametric assumptions

The final parametric assumption….

Descriptives: TIME by GROUP

GROUP     Measure                                          Statistic   Std. Error
elite     Mean                                              11.8010      .30010
          95% Confidence Interval for Mean   Lower Bound    11.1221
                                             Upper Bound    12.4799
          5% Trimmed Mean                                   11.7350
          Median                                            11.7600
          Variance                                            .901
          Std. Deviation                                      .94900
          Minimum                                           10.76
          Maximum                                           14.03
          Range                                              3.27
          Interquartile Range                                1.2200
          Skewness                                           1.448       .687
          Kurtosis                                           2.890      1.334
amateur   Mean                                              15.5680     1.38841
          95% Confidence Interval for Mean   Lower Bound    12.4272
                                             Upper Bound    18.7088
          5% Trimmed Mean                                   15.0956
          Median                                            13.9050
          Variance                                          19.277
          Std. Deviation                                     4.39053
          Minimum                                           12.56
          Maximum                                           27.08
          Range                                             14.52
          Interquartile Range                                3.6800
          Skewness                                           2.388       .687
          Kurtosis                                           6.132      1.334

Some inferential tests (e.g., the independent t-test) double-check the variance in the actual SPSS output (e.g., Levene's test of equality of variance), as it is vital that the two groups are of equal variance for the test to run!

Page 30: 3. parametric assumptions

Levene's Test

Independent Samples Test: CS1

Levene's Test for Equality of Variances

                                 F        Sig.
Equal variances assumed        8.311     .004

t-test for Equality of Means (95% Confidence Interval of the Difference: Lower, Upper)

                                 t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower     Upper
Equal variances assumed        1.704    193          .090              .2542             .14919                 -.04009   .54843
Equal variances not assumed    1.720    185.961      .087              .2542             .14778                 -.03737   .54571

• If Levene's Sig. is less than .05 then the variances are not equal!

• If it is more than .05 then the variances are equal!
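Reading Levene's test comes down to one comparison. The sketch below applies it to the Sig. value of .004 from the output above. The function names are hypothetical, and choosing which row of the t-test table to read is standard SPSS practice rather than something stated on the slide.

```python
def equal_variances(levene_sig: float, alpha: float = 0.05) -> bool:
    """True if Levene's test gives no evidence that the variances differ."""
    # Assumption: a Sig. value of exactly .05 is treated as equal variance.
    return levene_sig >= alpha


def t_test_row(levene_sig: float, alpha: float = 0.05) -> str:
    """Name the row of the SPSS t-test table that should be reported."""
    if equal_variances(levene_sig, alpha):
        return "Equal variances assumed"
    return "Equal variances not assumed"


print(equal_variances(0.004))  # False: .004 is less than .05
print(t_test_row(0.004))       # Equal variances not assumed
```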

Page 31: 3. parametric assumptions

• These 4 assumptions are listed in order of importance, from most important (#1) to least important (#4).

• If you do not meet #1 then use non-parametric inferential tests.

• Some can be violated, but you must justify doing so with supporting evidence!
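One way to keep track of the four checks is to record them together and list whichever ones failed. The sketch below is one reading of the slides, not a definitive rule: it assumes that failing any of assumptions 1 to 3 means a non-parametric test, while a failure of assumption 4 alone is only reported so that it can be justified in the write-up. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssumptionChecks:
    random_sample: bool         # 1. sample selected by chance
    high_level_data: bool       # 2. interval or ratio data
    normally_distributed: bool  # 3. Z skew and Z kurt between -1.96 and 1.96
    equal_variance: bool        # 4. smaller variance doubled exceeds the larger


def review(checks: AssumptionChecks) -> tuple:
    """Return (suggested test family, list of assumptions not met)."""
    results = [
        ("1. random sampling", checks.random_sample),
        ("2. level of data", checks.high_level_data),
        ("3. normal distribution", checks.normally_distributed),
        ("4. equal variance", checks.equal_variance),
    ]
    failed = [name for name, met in results if not met]
    # Assumed reading: assumptions 1-3 decide the test family; a failure of
    # assumption 4 is listed so it can be justified rather than ignored.
    core_met = (checks.random_sample and checks.high_level_data
                and checks.normally_distributed)
    return ("parametric" if core_met else "non-parametric", failed)


print(review(AssumptionChecks(True, True, False, True)))
```

The list of failed assumptions is what goes into the report, whichever family of test is run.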

Page 32: 3. parametric assumptions

Some Assumptions Can be Re-Run after Re-Checking the Data

• For example, if your data are not normally distributed and you have lots of cases, you could check for and remove the outliers/extremes!

• Remember to justify it! (Small v large sample)

• You must then re-check all the assumptions.
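Outliers and extremes can also be flagged outside SPSS. The sketch below uses the boxplot convention of 1.5 and 3 box lengths (interquartile ranges) beyond the edges of the box. The scores are hypothetical, the function name is hypothetical, and the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the hinges SPSS calculates.

```python
import statistics


def flag_unusual_scores(scores):
    """Split scores lying far outside the box into outliers and extremes."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    outliers, extremes = [], []
    for x in scores:
        distance = max(q1 - x, x - q3)  # how far x lies beyond the box
        if distance > 3 * iqr:
            extremes.append(x)
        elif distance > 1.5 * iqr:
            outliers.append(x)
    return {"outliers": outliers, "extremes": extremes}


# Hypothetical sprint times in seconds, not taken from the slides.
times = [10.8, 11.2, 11.4, 11.6, 11.8, 11.9, 12.0, 12.3, 14.0, 19.5]
print(flag_unusual_scores(times))  # {'outliers': [14.0], 'extremes': [19.5]}
```

After removing any flagged cases, the skewness, kurtosis and variance checks need to be run again on the reduced data set.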

Page 33: 3. parametric assumptions

[Figure: SPSS boxplot of TIME (0 to 30) by GROUP (elite, N = 10; amateur, N = 10). Two flagged cases (labelled 18 and 6) are annotated "Extreme Score" and "Outlier Score".]

Page 34: 3. parametric assumptions

You MUST go through this process every time you analyse data!

Or…risk running the wrong tests & getting a type I or II error!