POPULATION DYNAMICS POPULATION DYNAMICS Required background knowledge: Data and variability concepts Data collection Measures of central tendency (mean, median, mode, variance, stdev) Normal distribution and SE Student’s t-test and 95% confidence intervals Chi-Square tests MS Excel

Upload: elijah-carpenter

Post on 25-Dec-2015

TRANSCRIPT

Page 1: POPULATION DYNAMICS Required background knowledge: Data and variability concepts  Data collection Measures of central tendency (mean, median, mode, variance,

POPULATION DYNAMICS

Required background knowledge:

• Data and variability concepts

• Data collection

• Measures of central tendency (mean, median, mode, variance, stdev)

• Normal distribution and SE

• Student’s t-test and 95% confidence intervals

• Chi-Square tests

• MS Excel

Page 2:

STATISTICS: z-DISTRIBUTION

If n is very, very large, we use the Z distribution to calculate normal deviates:

$$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

If n is not large, we must use the t distribution:

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} \qquad \text{(Equation 3)}$$

Page 3:

But first... WHY do we do all this?

HYPOTHESIS TESTING: an integral part of science

Pattern: observation, rigorously described

Model: explanation or theory (maybe more than one)

Hypothesis: prediction deduced from the model. Generate a null hypothesis (H0) as a falsification test

Test: experiment
• IF H0 rejected – model supported
• IF H0 accepted – model wrong

Page 4:

HYPOTHESIS TESTING

[Figure: p-value scale, a measure of certainty running from 1.0 down to 0, with the significance level α marked at 0.05 and 0.01. Values above α are labelled "Not significant"; values below α are labelled "Significant".]

At p < 0.05 you can say with 95% certainty that the pattern you have observed is not due to chance alone.

At p < 0.01 you can say with 99% certainty that the pattern you have observed is not due to chance alone.

These are proportions (0.05 and 0.01); expressed as percentages they are 5% and 1%.

1. Collect data

2. Analyse data

3. Set up hypotheses:

• H0 = results are due to CHANCE alone

• H1 = results are significant and are not due to chance alone

4. Test hypotheses:

Determine significance level for hypothesis testing (α) ~ termed ‘Alpha’

Usually either α = 0.05 or α = 0.01

Calculate probability value (p)

If p < α then reject H0 ; accept H1 (i.e results are significant and are NOT due to chance alone)

If p > α then reject H1; accept H0 (i.e results are not significant and ARE due to chance alone)
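The decision rule in step 4 can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the returned strings are assumptions made for the example, not part of the slides.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule from the slide: reject H0 when p < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be a probability between 0 and 1")
    if p_value < alpha:
        return "reject H0 (significant)"
    return "accept H0 (not significant)"


# The same p-value can be significant at alpha = 0.05 but not at alpha = 0.01.
print(decide(0.03))
print(decide(0.03, alpha=0.01))
```

Choosing α before calculating p matters: the same result can lead to different conclusions at the two usual levels.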

Page 6:

First, some important concepts about t-tests…

Page 7:

STATISTICS: t-DISTRIBUTION

Because it is based on the normal distribution, the t distribution has all the attributes of the normal distribution:
• Completely symmetrical
• Area under any part of the curve reflects proportion of t values involved
• etc.

[Figure: histogram of frequency (%) against height (mm)]

The shape of the t distribution varies with v (degrees of freedom: n-1): the bigger the n, the less spread the distribution.

[Figure: t distribution curves for v = 100, v = 10, v = 5 and v = 1, plotted for t from -9 to 9]
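To see how the shape changes with v, the density of the t distribution can be evaluated directly. The sketch below uses the standard textbook formula for the t density (it is not given on the slide) and only the Python standard library; the function name `t_pdf` is an assumption for illustration.

```python
import math


def t_pdf(t: float, v: int) -> float:
    """Density of Student's t distribution with v degrees of freedom."""
    coeff = math.gamma((v + 1) / 2) / (math.sqrt(v * math.pi) * math.gamma(v / 2))
    return coeff * (1 + t * t / v) ** (-(v + 1) / 2)


# Larger v gives a taller peak at t = 0 and thinner tails (less spread).
for v in (1, 5, 10, 100):
    print(f"v = {v:3d}: density at t = 0 is {t_pdf(0, v):.4f}, at t = 3 is {t_pdf(3, v):.4f}")
```

With v = 1 the tails are heavy; by v = 100 the curve is very close to the normal distribution.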

Page 8:

STATISTICS: t-DISTRIBUTION CONCEPTS

Tails of the t-distribution

Example: if our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed? There are two possible t-values.

One-tailed hypothesis testing:
H0 : μ = 25
H1 : μ < 25
[Figure: two t curves (t from -4 to 4), each with 0.1 of the area in one tail, α (1)]

OR

Two-tailed hypothesis testing:
H0 : μ = 25
H1 : μ ≠ 25
[Figure: t curve (t from -4 to 4) with 0.05 of the area in each tail, α (2)]

Page 9:

STATISTICS: T-DISTRIBUTION: CONCEPTS

Critical values

[Figure: two parallel scales of certainty running from 1.0 to 0. On the p-value scale, α (0.05 or 0.01) separates "Not significant" from "Significant". On the t-statistic scale, the critical t-value plays the same role.]

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

[Figure: two-tailed t curve, α (2) = 0.05, with critical values at -2.064 and 2.064]

T-statistic compared with critical value

If t-statistic > 2.064 OR < -2.064 then reject H0; accept H1 (i.e. results are significant and are NOT due to chance alone)

Page 10:

STATISTICS: T-DISTRIBUTION: CONCEPTS

Critical values are found on the t-tables

v α (2) = 0.5 0.2 0.1 0.05 0.02
v α (1) = 0.25 0.1 0.05 0.025 0.01
1 1.000 3.078 6.314 12.706
2 0.816 1.886 2.920 4.303
3 0.765 1.638 2.353 3.182
4 0.741 1.533 2.132 2.776
5 0.727 1.476 2.015 2.571
6 0.718 1.440 1.943 2.447
7 0.711 1.415 1.895 2.365
8 0.706 1.397 1.860 2.306
9 0.703 1.383 1.833 2.262
10 0.700 1.372 1.812 2.228
11 0.697 1.363 1.796 2.201
12 0.695 1.356 1.782 2.179
13 0.694 1.350 1.771 2.160
14 0.692 1.345 1.761 2.145
15 0.691 1.341 1.753 2.131
16 0.690 1.337 1.746 2.120
17 0.689 1.333 1.740 2.110
18 0.688 1.330 1.734 2.101
19 0.688 1.328 1.729 2.093
20 0.687 1.325 1.725 2.086
21 0.686 1.323 1.721 2.080
22 0.686 1.321 1.717 2.074
23 0.685 1.319 1.714 2.069
24 0.685 1.318 1.711 2.064
25 0.684 1.316 1.708 2.060

If our sample size is 11 (v = 10), what is the value of t beyond which 10% (0.1) of the curve is enclosed (i.e. what is the critical value of t)?

One-tailed, v = 10: α (1) = 0.1, critical value -1.372
[Figure: t curve (t from -4 to 4) with 0.1 of the area in one tail beyond -1.372]

Two-tailed, v = 10: α (2) = 0.1, critical values -1.812 and 1.812
[Figure: t curve (t from -4 to 4) with 0.05 of the area in each tail beyond -1.812 and 1.812]
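The one-tailed versus two-tailed lookup can be sketched as a small table lookup. The dictionary below copies only the v = 10 row of the t-table; the names `T_TABLE_V10` and `critical_t_v10` are assumptions for illustration.

```python
# One-tailed critical values for v = 10, copied from the t-table row above.
# Keys are the area in ONE tail, i.e. the alpha (1) header.
T_TABLE_V10 = {0.25: 0.700, 0.1: 1.372, 0.05: 1.812, 0.025: 2.228}


def critical_t_v10(alpha: float, tails: int) -> float:
    """Look up the critical t for v = 10; a two-tailed alpha is split between both tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    one_tail_area = round(alpha if tails == 1 else alpha / 2, 3)
    return T_TABLE_V10[one_tail_area]


print(critical_t_v10(0.1, tails=1))  # 10% of the curve in one tail
print(critical_t_v10(0.1, tails=2))  # 10% of the curve shared between two tails
```

The same 10% therefore gives two different critical values, depending on whether H1 is one-sided or two-sided.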

Page 11:

Steps of Student t-tests:

1. Establish hypotheses (determine if one-tailed or two-tailed test)

• One tail: H1 has > or < in it

• Two tail: H1 has ≠ in it

2. Determine: n, x̄, μ, s and v (n-1)

3. Calculate the t-statistic using

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

4. Determine significance level for hypothesis testing (α), termed "alpha"

• Usually either α = 0.05 or α = 0.01 (for a two-tailed test, α is split equally between the two tails)

5. Calculate the critical value of t

• use the t-table, looking up the value for t [significance level (α 1 or 2), v]

6. Compare t-statistic with critical value to know if you should accept or reject H0

Page 12:

STATISTICS: T-DISTRIBUTION: EXAMPLE

OBSERVATION MADE:

The mean nitrate concentration of water in all the upstream tributaries of a large river prior to intensive agriculture is 22 mg.l⁻¹. Afterwards the mean nitrate concentration in 25 of these tributaries is 24.23 mg.l⁻¹ and s = 4.24 mg.l⁻¹.

Nitrate (before agriculture): μ = 22 mg.l⁻¹, n = ALL tributaries

Nitrate (after agriculture): x̄ = 24.23 mg.l⁻¹, n = 25 sample tributaries

Based on this observation we want to determine if the intensification of agricultural practices has resulted in a significant change to the nitrate concentration of the freshwater resources.

HOW? We need to determine the probability that the sample (n = 25, x̄ = 24.23 mg.l⁻¹) could be randomly generated from a population with μ = 22 mg.l⁻¹.

Page 13:

Student t-tests: steps for calculation

What is the probability that the sample (n = 25, x̄ = 24.23 mg.l⁻¹) could be randomly generated from a population with μ = 22 mg.l⁻¹?

1. Establish hypotheses

H0: μ = 22

H1: μ ≠ 22

2. Determine: n, x̄, μ, s and v (n-1)

n = 25, x̄ = 24.23, μ = 22.00, s = 4.24, v = 24

3. Calculate the t-statistic

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{4.24}{\sqrt{25}} = \frac{4.24}{5} = 0.848$$

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{24.23 - 22}{0.848} = \frac{2.23}{0.848} = 2.629$$

4. Determine significance level (α)

Either α = 0.05 or α = 0.01. Here α = 0.05.

5. Calculate the critical value of t

• use the t-table, looking up the value for t [significance level (α 1 or 2), v]

• One tail or two tail? Go to the hypotheses: H0: μ = 22, H1: μ ≠ 22, so the test is two-tailed and we need t 0.05 (α 2), 24

[Figure: one-tailed t curve with 0.05 in one tail, α (1); two-tailed t curve with 0.025 in each tail, α (2)]

Page 14:

The critical value of t 0.05 (α 2), 24 = 2.064

[Figure: two-tailed t curve (t from -4 to 4) with 0.025 of the area in each tail and critical values at -2.064 and 2.064]

[t-table as on Page 10: row v = 24, column α (2) = 0.05]

Page 15:

STATISTICS: T-DISTRIBUTION: EXAMPLE

What is the probability that the sample (n = 25, x̄ = 24.23 mg.l⁻¹) could be randomly generated from a population with μ = 22 mg.l⁻¹?

1. Establish hypotheses: H0: μ = 22; H1: μ ≠ 22

2. Determine n, x̄, μ, s and v (n-1): n = 25, x̄ = 24.23, μ = 22.00, s = 4.24, v = 24

3. Calculate the t-statistic: t = 2.629

4. Determine significance level (α): α = 0.05

5. Calculate the critical value of t: critical value = 2.064

6. Compare t-statistic with critical value: t = 2.629 > critical value

[Figure: two-tailed t curve (t from -4 to 4) with 0.025 of the area in each tail and critical values at -2.064 and 2.064; the t-statistic 2.629 lies beyond the upper critical value]

SO... this means it is very unlikely that a random sample (size 25) would generate a mean of 24.23 mg.l⁻¹ from a population with a mean of 22 mg.l⁻¹.

So unlikely, in fact, that we don't believe it can happen by chance: reject H0 and accept H1.
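The nitrate example can be checked with a short Python sketch. The numbers are those on the slides; the function name `one_sample_t` is an assumption for illustration, and the critical value is simply copied from the t-table rather than computed.

```python
import math


def one_sample_t(sample_mean: float, mu: float, s: float, n: int) -> float:
    """t = (sample mean - mu) / standard error, where standard error = s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu) / standard_error


t_stat = one_sample_t(sample_mean=24.23, mu=22.0, s=4.24, n=25)
critical = 2.064  # t-table value for alpha (2) = 0.05, v = 24

# Two-tailed test: compare the absolute value of t with the critical value.
reject_h0 = abs(t_stat) > critical
print(f"t = {t_stat:.4f}, critical = {critical}, reject H0: {reject_h0}")
```

Taking the absolute value of t covers both tails, which matches H1: μ ≠ 22.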

Page 16:

STATISTICS: T-DISTRIBUTION: EXAMPLES

Nitrate (before agriculture): μ = 22 mg.l⁻¹, n = ALL tributaries

Nitrate (after agriculture): x̄ = 24.23 mg.l⁻¹, n = 25 sample tributaries

What we can then say is that the before and after nitrate levels in the water are (statistically) significantly different from each other (p < 0.05).

We are not making any judgment about whether there is more nitrate in the water after than before, only that the concentrations are different... though some things are self-evident!

Page 17:

Now you try… 25 intertidal crabs were exposed to air at 24.3 °C, and their body temperatures were measured.

Q: Is the mean body temperature of this species of crab the same as the ambient air temperature of 24.3 °C?

Student-t steps to follow:

1. Establish hypotheses

2. Determine: n, x̄, μ, s and v (n-1)

3. Calculate the t-statistic

4. Determine significance level (α)

5. Calculate the critical value of t

6. Compare t-statistic with critical value

H0: μ = 24.3 °C (i.e. crab body temp is NOT different from ambient temp)

H1: μ ≠ 24.3 °C (i.e. crab body temp IS different from ambient temp)

Page 18:

Now you try… Switch to Excel and do the calculations

Crab ID  Body temp (°C)
1  25.80
2  24.60
3  26.10
4  22.90
5  25.10
6  27.30
7  24.00
8  24.50
9  23.90
10  26.20
11  24.30
12  24.60
13  23.30
14  25.50
15  28.10
16  24.80
17  23.50
18  26.30
19  25.40
20  25.50
21  23.90
22  27.00
23  24.80
24  22.90
25  25.40
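The slide asks for the calculation in Excel; as an alternative sketch, the same steps can be done in Python with the standard library. The 25 body temperatures are taken from the table, listed in crab ID order; the variable names are assumptions for illustration.

```python
import math
import statistics

# Body temperatures (°C) for crab IDs 1 to 25, from the table above.
body_temps = [
    25.8, 24.6, 26.1, 22.9, 25.1, 27.3, 24.0, 24.5, 23.9, 26.2,
    24.3, 24.6, 23.3, 25.5, 28.1, 24.8, 23.5, 26.3, 25.4, 25.5,
    23.9, 27.0, 24.8, 22.9, 25.4,
]
ambient = 24.3  # hypothesised mean (air temperature, °C)

n = len(body_temps)
mean = statistics.mean(body_temps)
s = statistics.stdev(body_temps)  # sample standard deviation (n - 1 in the denominator)
standard_error = s / math.sqrt(n)
t_stat = (mean - ambient) / standard_error

print(f"n = {n}, mean = {mean:.3f}, s = {s:.3f}, SE = {standard_error:.3f}, t = {t_stat:.4f}")
```

`statistics.stdev` uses n - 1 in the denominator, which is the sample standard deviation the t-test needs.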

Page 19:

Now you try… (crab example, continued)

3. Calculate the t-statistic: t = 2.7128

4. Determine significance level (α): α = 0.05

5. Calculate the critical value of t: t [significance level (α 1 or 2), v]

Page 20:

Critical value: t 0.05 (α 2), v, with v = n - 1 = 24

[t-table as on Page 10: row v = 24, column α (2) = 0.05]

Page 21:

Now you try… (crab example, continued)

1. Establish hypotheses

H0: μ = 24.3 °C [i.e. crab body temp is NOT different from ambient temp]

H1: μ ≠ 24.3 °C [i.e. crab body temp IS different from ambient temp]

3. Calculate the t-statistic: t = 2.713

4. Determine significance level (α): α = 0.05

5. Calculate the critical value of t: critical value = 2.064

6. Compare t-statistic with critical value: t = 2.7128 > critical value = 2.064, so REJECT H0

[Figure: two-tailed t curve (t from -4 to 4) with 0.025 of the area in each tail and critical values at -2.064 and 2.064; the t-statistic 2.713 lies beyond the upper critical value]

Page 23:

STATISTICS: 95 % CONFIDENCE INTERVALS

When we express dispersion around some measure of central tendency, we normally use Standard Deviation: x̄ ± s

The t-distribution allows us to calculate the 95% (or 99%) confidence intervals around an estimate of the population mean.

In other words, what are the limits around our estimate of the population mean WITHIN which we can be 95% (or 99%) confident that the REAL value of the population mean lies?

To do this, we need a set of t-tables, s_x̄ and v (n-1).

[Figure: two-tailed t curve with 0.025 of the area in each tail, α (2)]

Page 24:

STATISTICS: 95 % CONFIDENCE INTERVALS

To do this, we need a set of t-tables, s_x̄ and v (n-1).

IF x̄ = 42.3 mm, n = 26 (v = 25) and s_x̄ = 2.15

Then the 95% Confidence Interval (CI) around the mean is calculated as:

$$s_{\bar{x}} \times t_{\alpha(2)} = 2.15 \times 2.06 = 4.429$$

[t-table as on Page 10: row v = 25, column α (2) = 0.05 gives 2.060]

The Confidence Interval expression is then written as: 42.3 mm ± 4.43 mm, i.e. we are 95% confident that μ lies between 37.87 and 46.73 mm.

[Figure: two-tailed t curve with 0.025 of the area in each tail, α (2), centred on x̄ = 42.3 mm with limits at - 4.43 mm and + 4.43 mm]
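The confidence interval arithmetic can be sketched as follows. The inputs are the slide's values; the function name `confidence_interval` is an assumption for illustration, and the critical t is copied from the t-table rather than computed.

```python
def confidence_interval(mean: float, standard_error: float, t_critical: float) -> tuple[float, float]:
    """Return (lower, upper) limits: mean -/+ standard_error * t_critical."""
    half_width = standard_error * t_critical
    return mean - half_width, mean + half_width


# Slide values: mean = 42.3 mm, standard error = 2.15, t for alpha (2) = 0.05 and v = 25 is 2.060.
lower, upper = confidence_interval(mean=42.3, standard_error=2.15, t_critical=2.060)
print(f"95% CI: {lower:.2f} mm to {upper:.2f} mm")
```

For a 99% interval, only `t_critical` changes: it would be read from the α (2) = 0.01 column, which is not shown in the table on these slides.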

Page 26:

Understanding stats…

Types of Data

The type of data collected influences their statistical analysis.

Nominal data – gender, colour, species, genus, class, town, country, model etc.

Continuous data – concentration, depth, height, weight, temperature, rate etc.

Discrete data – numbers per unit space, numbers per entity etc.

[Figure labels: Male / Female; Blue / Red / Black / White; 100 g / 200 g; 121.34 g / 162.18 g / 180.01 g; 5 people]

Page 27:

Understanding stats…

1 DATA type: Nominal, Continuous, Discrete

+

2 Distribution: Normal, Binomial, Poisson… etc.

3 Choice of statistical test: z-tests, t-tests, ANOVA… etc.

Chi-squared: data do NOT have to be normally distributed

Page 29:

Testing Patterns in Discrete (count) Data: the Chi-Square Test

Examples of count data:
• Number of petals per flower
• Number of segments per insect leg
• Number of worms per quadrat
• Number of white cars on campus… etc.

You can convert continuous data to discrete data by assigning data to data classes.

Raw data, height (m):

1.85 1.65 1.55 1.9
1.6 1.95 1.7 1.7
1.95 1.75 1.8 1.7
1.65 1.55 1.65 1.75
1.45 1.85 1.85 1.8
1.9 1.75 1.7 2.05
1.4 2 1.35 2
1.8 1.65 1.5 1.8
1.9 2.1 1.8 1.5
1.75 1.2 1.5 2.15
1.3 1.7 1.6 1.55
1.85 1.45 1.8 1.85
1.5 1.75 1.75 1.25
1.8 1.95 1.75 2
1.9 1.7 1.8 1.9
1.75 1.85 1.8 1.75
1.7 1.9 1.45 1.65
1.35 1.65 1.7 1.6
1.75 1.5 1.55 1.55
1.6 1.8 1.75 1.85
2.05 1.6 1.85 1.7
1.65 1.7 1.4 1.75
1.95 1.9 1.65 1.6
1.75 1.65 1.7 1.85
1.8 1.75 1.95 1.65
1.55 2.2 1.75
1.7 1.6 1.6

[Figure: histogram of frequency (0 to 16) against height (m), classes from 1.2 to 2.2]

Height (m) Frequency
1.2 1
1.25 1
1.3 1
1.35 2
1.4 2
1.45 3
1.5 5
1.55 6
1.6 8
1.65 10
1.7 12
1.75 15
1.8 11
1.85 9
1.9 7
1.95 5
2 3
2.05 2
2.1 1
2.15 1
2.2 1
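Assigning continuous measurements to classes can be sketched with a counter. The eight heights below are only the first few raw values from the slide, and the 10 cm class width is an assumption chosen for illustration (the slide's own table uses 5 cm steps).

```python
from collections import Counter

# First eight raw heights (m) from the slide; a small subset used for illustration.
heights_m = [1.85, 1.65, 1.55, 1.9, 1.6, 1.95, 1.7, 1.7]


def height_class_cm(height_m: float, width_cm: int = 10) -> int:
    """Return the lower bound (in whole cm) of the class a height falls into."""
    height_cm = round(height_m * 100)  # work in whole centimetres to avoid float artefacts
    return (height_cm // width_cm) * width_cm


counts = Counter(height_class_cm(h) for h in heights_m)
for lower_bound in sorted(counts):
    print(f"{lower_bound / 100:.2f} m to {(lower_bound + 10) / 100:.2f} m: {counts[lower_bound]}")
```

Once the data are counts per class, they can be treated as discrete data.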

Page 30:

STATISTICS: CHI-SQUARED TESTS

Often we want to determine if the population from which you have obtained count data conforms to a certain prediction.

A geneticist raises a progeny of 134 flowers from a cross.

Hypothesised (EXPECTED) ratio: 3 : 1 OR ¾ : ¼ OR 0.75 : 0.25

n = 134

Observed numbers: 113 yellow, 21 green

OBSERVED ratio: 113 : 21 OR 5.4 : 1

Expected numbers: 100.5 yellow (= 134 * 0.75), 33.5 green (= 134 * 0.25)

Q: Does the OBSERVED ratio differ (SIGNIFICANTLY) from the EXPECTED ratio?

$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right] \qquad \text{(Equation 4)}$$

Where O = Observed, E = Expected

The bigger the difference between O and E, the greater the χ2.

When there is no difference, χ2 will be ZERO = Goodness of Fit.

Page 31:

STATISTICS: CHI-SQUARED TESTS

Steps of χ2 tests:

1. Establish hypotheses

2. Determine Observed and Expected frequencies

3. Calculate the χ2-statistic using

$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$$

4. Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)

5. Calculate the critical value of χ2

• use the χ2-statistic table. Critical value: χ2 [significance level, v], where v = number of categories (K) - 1

6. Compare χ2-statistic with critical value

7. If χ2-statistic > critical value reject H0 (significant differences between O and E)

8. If χ2-statistic < critical value accept H0 (no significant differences between O and E)

NB: must always use counts (frequencies) NOT percentages or proportions

Page 32:

STATISTICS: CHI-SQUARED TESTS

Does the OBSERVED ratio (113:21) differ (SIGNIFICANTLY) from the Expected (100.5:33.5) ratio?

1. Establish hypotheses

• H0: Observed and expected ratios are not significantly different

• H1: Observed and expected ratios are significantly different

2. Determine Observed and Expected frequencies

• Yellow flowers: Observed = 113 ; Expected = 100.5

• Green flowers: Observed = 21 ; Expected = 33.5

3. Calculate the χ2-statistic (first term: yellow flowers; second term: green flowers)

$$\chi^2 = \left[ \frac{(113 - 100.5)^2}{100.5} \right] + \left[ \frac{(21 - 33.5)^2}{33.5} \right] = 1.55 + 4.66 = 6.22$$

4. Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)

5. Calculate the critical value of χ2. Critical value: χ2 [significance level, v]

Page 33:

v α = 0.999 0.995 0.99 0.975 0.95 0.9 0.75 0.5 0.25 0.1 0.05 0.025 0.01

1 0.000 0.000 0.000 0.001 0.004 0.016 0.102 0.455 1.323 2.706 3.841 5.024 6.635

2 0.002 0.010 0.020 0.051 0.103 0.211 0.575 1.386 2.773 4.605 5.991 7.378 9.21

3 0.024 0.072 0.115 0.216 0.352 0.584 1.213 2.366 4.108 6.251 7.815 9.348 11.345

4 0.091 0.207 0.297 0.484 0.711 1.064 1.923 3.357 5.385 7.779 9.488 11.143 13.277

5 0.210 0.412 0.554 0.831 1.145 1.610 2.675 4.351 6.626 9.236 11.07 12.833 15.086

6 0.381 0.676 0.872 1.237 1.635 2.204 3.455 5.364 7.841 10.645 12.592 14.449 16.812

7 0.590 0.989 1.239 1.690 2.167 2.833 4.255 6.346 9.037 12.017 14.067 16.013 18.475

8 0.857 1.344 1.646 2.180 2.733 3.490 5.071 7.344 10.219 13.362 15.507 17.535 20.09

9 1.152 1.735 2.088 2.700 3.325 4.168 5.899 8.343 11.389 14.684 16.919 19.023 21.666

10 1.479 2.156 2.558 3.247 3.940 4.865 6.737 9.342 12.549 15.987 18.307 20.483 23.209

11 1.834 2.603 3.053 3.816 4.575 5.578 7.584 10.341 13.701 17.275 19.675 21.92

12 2.214 3.074 3.571 4.404 5.226 6.304 8.438 11.340 14.845 18.549 21.026

13 2.617 3.565 4.107 5.009 5.892 7.042 9.299 12.340 15.984 19.812

14 3.041 4.075 4.660 5.629 6.571 7.790 10.165 13.339 17.117

15 3.483 4.601 5.229 6.262 7.261 8.547 11.037 14.339

16 3.942 5.142 5.812 6.908 7.962 9.312 11.912

17 4.416 5.697 6.408 7.564 8.672 10.085

18 4.905 6.265 7.015 8.231 9.390

19 5.407 6.844 7.633 8.907

20 5.921 7.434 8.260

21 6.447 8.034

22 6.983

• Degrees of Freedom (v) = K – 1, where K = number of categories

• in this case two categories (yellow-flowering and green-flowering): v = (2 – 1)

• …therefore v = 1

Critical value: χ2 0.05, v = χ2 0.05, 1

Critical value = 3.841

Page 34:

STATISTICS: CHI-SQUARED TESTS

1.Establish hypotheses

• H0: Observed and expected ratios are not significantly different

• H1: Observed and expected ratios are significantly different

2.Determine Observed and Expected frequencies

• Yellow flowers: Observed = 113 ; Expected = 100.5

• Green flowers: Observed = 21 ; Expected = 33.5

3.X2-statistic = 6.22

4.Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)

5.Critical value = 3.841

6.X2-statistic > critical value therefore reject H0

Q: Does the OBSERVED ratio (113:21) differ (SIGNIFICANTLY) from the Expected (100.5:33.5) ratio?

A: the observed ratio is significantly different from the expected ratio
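The flower example can be reproduced with a short sketch of Equation 4. The function name `chi_square` is an assumption for illustration; the critical value is copied from the χ2 table (α = 0.05, v = 1) rather than computed.

```python
def chi_square(observed: list[float], expected: list[float]) -> float:
    """Sum of (O - E)^2 / E over all categories (Equation 4)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [113, 21]                  # yellow, green (counts, not percentages)
expected = [134 * 0.75, 134 * 0.25]   # 3:1 ratio applied to n = 134
chi2 = chi_square(observed, expected)

critical = 3.841  # chi-square table, alpha = 0.05, v = K - 1 = 1
reject_h0 = chi2 > critical
print(f"chi-square = {chi2:.2f}, critical = {critical}, reject H0: {reject_h0}")
```

Note that the inputs are counts, as the NB on the steps slide requires.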

Page 35:

STATISTICS: CHI-SQUARED TESTS

Now you try…

A plant geneticist has done some crossing between plants and come up with the following numbers of different seeds.

Q: Has the geneticist sampled from a population having a ratio of 9:3:3:1?

H0: Population sampled has YS:YW:GS:GW seeds in the ratio 9:3:3:1

H1: Population sampled does not have YS:YW:GS:GW seeds in the ratio 9:3:3:1

Seed Type  Observed Count  Expected Ratio  Expected Count  (O-E)  (O-E)2  (O-E)2 / E
Yellow - Smooth  152  9  140.63  11.38  129.39  0.92
Yellow - Wrinkled  39  3  46.88  -7.88  62.02  1.32
Green - Smooth  53  3  46.88  6.13  37.52  0.80
Green - Wrinkled  6  1  15.63  -9.63  92.64  5.93
Total  250  16  250  8.97

Follow the steps of the χ2 test:

1. Establish hypotheses

2. Determine Observed and Expected frequencies

3. Calculate the χ2-statistic using

$$\chi^2 = \sum \left[ \frac{(O - E)^2}{E} \right]$$

4. Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)

5. Calculate the critical value of χ2

• use the χ2-statistic table. Critical value: χ2 [significance level, v]

6. Compare χ2-statistic with critical value

7. If χ2-statistic > critical value reject H0 (significant differences between O and E)

8. If χ2-statistic < critical value accept H0 (no significant differences between O and E)

Page 36:

Now you try… (seed example, continued)

Switch to Excel

Page 37:

Now you try… (seed example, continued)

3. Calculate the χ2-statistic: χ2 = 8.97

4. Determine significance level for hypothesis testing: α = 0.05

5. Calculate the critical value of χ2

• use the χ2-statistic table. Critical value: χ2 [significance level, v]

Page 38:

What is the critical value of χ2?

[χ2 table as on Page 33]

Critical value: χ2 0.05, 3 (K = 4 seed types, so v = 3)

Page 39:

Now you try… (seed example, continued)

Q: Has the geneticist sampled from a population having a ratio of 9:3:3:1?

3. Calculate the χ2-statistic: χ2 = 8.97

4. Determine significance level for hypothesis testing (α = 0.05 or α = 0.01)

5. Calculate the critical value: critical value = 7.815

6. Compare χ2-statistic with critical value

7. χ2-statistic > critical value, so reject the Null Hypothesis that the sample was drawn from a population showing a 9:3:3:1 ratio of YS:YW:GS:GW
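For the seed example, the expected counts come from applying the hypothesised ratio to the total count. A minimal sketch, assuming the helper name `expected_counts` (not from the slides) and taking the critical value from the χ2 table rather than computing it:

```python
def expected_counts(total: float, ratio: list[float]) -> list[float]:
    """Split a total count according to a hypothesised ratio such as 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [152, 39, 53, 6]  # YS, YW, GS, GW
expected = expected_counts(sum(observed), [9, 3, 3, 1])
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

v = len(observed) - 1  # degrees of freedom = K - 1
critical = 7.815       # chi-square table, alpha = 0.05, v = 3
print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, v = {v}, reject H0: {chi2 > critical}")
```

Most of the χ2 total comes from the Green - Wrinkled class, where only 6 seeds were observed against about 15.6 expected.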

Page 40:

STATISTICS: CHI-SQUARED TESTS… final word…

IF Expected Counts are LESS than ONE, then you must combine the categories.

Before combining:

Seed Type  Observed Count  Expected Ratio  Expected Count  (O-E)  (O-E)2  (O-E)2 / E
A  17  50  43.17  -26.17  684.80  15.86
B  21  25  21.58  -0.58  0.34  0.02
C  21  12.5  10.79  10.21  104.20  9.66
D  23  6.25  5.40  17.60  309.90  57.43
F  2  3.125  2.70  -0.70  0.49  0.18
G  1  1.563  1.35  -0.35  0.12  0.09
H  1  0.781  0.67  0.33  0.11  0.16
I  0  0.391  0.34  -0.34  0.11  0.34
Total  86  100  86  83.73

After combining H and I:

Seed Type  Observed Count  Expected Ratio  Expected Count  (O-E)  (O-E)2  (O-E)2 / E
A  17  50  43.17  -26.17  684.79  15.86
B  21  25  21.58  -0.58  0.34  0.02
C  21  12.5  10.79  10.21  104.20  9.66
D  23  6.25  5.40  17.60  309.90  57.43
F  2  3.125  2.70  -0.70  0.49  0.18
G  1  1.563  1.35  -0.35  0.12  0.09
H + I  1  1.172  1.01  -0.01  0.00  0.00
Total  86  100  86  83.24

NB: By combining data you reduce the value of K and also v.
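Combining categories with small expected counts can be sketched as below. This is one simple strategy, pooling from the end of the list until the last expected count reaches the minimum; the helper name `pool_small_tail` and that pooling order are assumptions for illustration. The expected counts are rebuilt from the slide's expected ratios.

```python
def pool_small_tail(labels, observed, expected, minimum=1.0):
    """Merge the last category into its neighbour while its expected count is below minimum."""
    labels, observed, expected = list(labels), list(observed), list(expected)
    while len(expected) > 1 and expected[-1] < minimum:
        last_label, last_obs, last_exp = labels.pop(), observed.pop(), expected.pop()
        labels[-1] = f"{labels[-1]} + {last_label}"
        observed[-1] += last_obs
        expected[-1] += last_exp
    return labels, observed, expected


labels = ["A", "B", "C", "D", "F", "G", "H", "I"]
observed = [17, 21, 21, 23, 2, 1, 1, 0]
ratio = [50, 25, 12.5, 6.25, 3.125, 1.563, 0.781, 0.391]
expected = [sum(observed) * r / sum(ratio) for r in ratio]

pooled_labels, pooled_observed, pooled_expected = pool_small_tail(labels, observed, expected)
v = len(pooled_labels) - 1  # K falls from 8 to 7, so v falls from 7 to 6
print(pooled_labels[-1], pooled_observed[-1], round(pooled_expected[-1], 2), "v =", v)
```

After pooling, the critical value must be looked up with the reduced v.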

Page 42:

Which stats test to use?

DATA

Continuous:
• Looking for probabilities: Z-TESTS
• Comparing two means: T-TESTS

Discrete:
• Chi-squared

Use Getting started with data.xls for further advice