
Posted on 18-Jan-2016


Statistical analysis

Why?? (besides making your life difficult …)

Scientists must collect data AND analyze it

Does your data support your hypothesis? Is it valid?

Statistics helps us find relationships between sets of data.

You are the scientist now: you must be comfortable analyzing your data.

Let’s look at two sets of data

Sample 1: -10, 0, 10, 20, 30

Sample 2: 8, 9, 10, 11, 12

What can you tell me about this data?

Mean: the “average” of the data, a measure of central tendency

Sample 1: -10, 0, 10, 20, 30
Mean = (-10 + 0 + 10 + 20 + 30) / 5 = 50 / 5 = 10

Sample 2: 8, 9, 10, 11, 12
Mean = (8 + 9 + 10 + 11 + 12) / 5 = 50 / 5 = 10
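The same arithmetic as a short Python sketch. The function name `mean` is our own choice for illustration; Python's built-in `statistics.mean` gives the same result.

```python
def mean(values):
    # Mean: add up all the values, then divide by how many there are
    return sum(values) / len(values)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(mean(sample_1))  # 10.0
print(mean(sample_2))  # 10.0
```

Both samples have the same mean, which is exactly why the mean alone cannot tell them apart.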

Is this analysis complete? NO!

Range: how far is the spread? Largest number - smallest number

Sample 1: -10, 0, 10, 20, 30
Range = 30 - (-10) = 40

Sample 2: 8, 9, 10, 11, 12
Range = 12 - 8 = 4
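A minimal Python sketch of the range calculation. The name `data_range` is hypothetical, chosen so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    # Range = largest value minus smallest value
    return max(values) - min(values)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(data_range(sample_1))  # 40
print(data_range(sample_2))  # 4
```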

Does this data help? Yes: Sample 1 is more dispersed. Obvious? Perhaps, but now it is shown mathematically.

Still not enough!

Something more … standard deviation

Standard deviation (SD) measures how widely individual data points are dispersed around the mean.

Assuming a normal data distribution (bell curve):

About 68% of all collected values lie within ±1 SD of the mean

About 95% of all collected values lie within ±2 SD of the mean
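The 68% and 95% figures are rounded. For an ideal normal distribution, the fraction of values within ±k SD of the mean is erf(k/√2). The sketch below checks this with Python's `math.erf`; it assumes perfectly normal data, which real lab data only approximates. The helper name `fraction_within` is our own.

```python
import math

def fraction_within(k):
    """Fraction of a normal (bell-curve) distribution lying within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within +/- {k} SD: {fraction_within(k):.1%}")
# within +/- 1 SD: 68.3%
# within +/- 2 SD: 95.4%
```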

So what???

Standard deviation

A small SD indicates that the data values are clustered around the mean. It may also indicate few extreme data points.

A large SD indicates that the data values are spread out. It may also indicate extreme data points (outliers?).

Standard deviation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where $x$ = each data point, $\bar{x}$ = the mean, $n$ = the total number of data points, and $\Sigma$ = the sum of all the values.

Let’s practice … Sample 1: -10, 0, 10, 20, 30

Remember x̄ = 10

(-10 - 10)² + (0 - 10)² + (10 - 10)² + (20 - 10)² + (30 - 10)²
= (-20)² + (-10)² + (0)² + (10)² + (20)²
= 400 + 100 + 0 + 100 + 400 = 1000

Divide by n - 1 (5 - 1 = 4): 1000 / 4 = 250

Now take the square root: √250 ≈ 15.8

Let’s practice … Sample 2: 8, 9, 10, 11, 12

Remember x̄ = 10

(8 - 10)² + (9 - 10)² + (10 - 10)² + (11 - 10)² + (12 - 10)²
= (-2)² + (-1)² + (0)² + (1)² + (2)²
= 4 + 1 + 0 + 1 + 4 = 10

Divide by n - 1 (5 - 1 = 4): 10 / 4 = 2.5

Now take the square root: √2.5 ≈ 1.58

Let’s compare …

Sample 1: SD = 15.8

Sample 2: SD = 1.58
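The worked examples above follow the formula step by step: square each deviation from the mean, add them up, divide by n - 1, then take the square root. A minimal Python sketch of those same steps (the name `sample_sd` is our own), cross-checked against the standard library's `statistics.stdev`, which also divides by n - 1:

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: square root of sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(round(sample_sd(sample_1), 2))  # 15.81
print(round(sample_sd(sample_2), 2))  # 1.58

# statistics.stdev uses the same n - 1 formula, so the results agree
assert math.isclose(sample_sd(sample_1), statistics.stdev(sample_1))
```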

How can I use this in my lab?

Error bars

Error bars represent the variability of your data: STANDARD DEVIATION, range, or measurement uncertainties.

Error bars

On a bar graph, the bar represents the mean of your data and the error bars represent ±1 SD.

Error bars

On a line graph, the point represents the mean of your data and the error bars represent ±1 SD.
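To draw ±1 SD error bars, you need the bottom and top of each bar: mean - SD and mean + SD. A minimal sketch using Python's `statistics` module for Samples 1 and 2 (the helper `error_bar` is hypothetical). A plotting library such as matplotlib accepts the SD values directly through the `yerr` argument of `bar` and `errorbar`.

```python
import statistics

def error_bar(values):
    """Return (bottom, mean, top) for an error bar spanning +/- 1 SD around the mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return m - sd, m, m + sd

for name, values in [("Sample 1", [-10, 0, 10, 20, 30]), ("Sample 2", [8, 9, 10, 11, 12])]:
    bottom, m, top = error_bar(values)
    print(f"{name}: bar/point at {m:.1f}, error bar from {bottom:.2f} to {top:.2f}")
# Sample 1: bar/point at 10.0, error bar from -5.81 to 25.81
# Sample 2: bar/point at 10.0, error bar from 8.42 to 11.58
```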

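t-test

A t-test determines whether the difference between 2 sample means is statistically significant. Is the difference significant? Is the difference due to your variable, or is it random chance? How valid is your data?

The t-test gives the probability (p value) of getting a difference this large by random chance alone, if your variable really had no effect. A p value of 0.05 (5%) means there is only a 5% chance that random variation alone would produce a difference this large; this is usually described as 95% confidence.

Key point: you want p ≤ 0.05 (95% confidence or higher). Then your difference is unlikely to be due to chance, and you can conclude that it IS DUE TO YOUR VARIABLE.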

t-test: for tests, you do NOT need to calculate t-values, but you must be able to read a t-table!

For internal assessments, you may use calculators or Excel to calculate t-values.

On the t-table, this is the range you are hoping for: the difference between your samples has a HIGH probability of being due to your variable (and not chance).

You need to be able to calculate degrees of freedom (df) to use the table.

Calculating degrees of freedom

$$df = (n_1 + n_2) - 2$$

where $n_1$ = size of sample 1, $n_2$ = size of sample 2, and 2 = the number of samples.

Calculating degrees of freedom: $df = (n_1 + n_2) - 2$

Sample 1: -10, 0, 10, 20, 30, so $n_1 = 5$

Sample 2: 8, 9, 10, 11, 12, so $n_2 = 5$

df = (5 + 5) - 2 = 8
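The degrees-of-freedom formula as a one-line Python sketch (the function name is our own):

```python
def degrees_of_freedom(sample_a, sample_b):
    # df = (n1 + n2) - 2, where the 2 is the number of samples being compared
    return (len(sample_a) + len(sample_b)) - 2

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]
print(degrees_of_freedom(sample_1, sample_2))  # 8
```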

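Using the t-table: if df = 8 and t = 3.5, is this a significant difference?

Yes. At df = 8, t = 3.5 is larger than the critical value for p = 0.01 (about 3.36 in a standard two-tailed table), so there is less than a 1% probability that a difference this large in the data is due to chance alone.

Therefore, we can be more than 99% confident that the difference in the data is due to our variable.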
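Reading a t-table means finding the smallest p value whose critical t your result still exceeds. A minimal sketch of that lookup for df = 8; the critical values below are copied from a standard two-tailed t-table, and only a few columns are included, so treat them as an assumption and check them against your class table. `CRITICAL_T_DF8` and `smallest_p_exceeded` are hypothetical names.

```python
# Two-tailed critical t values for df = 8 (assumed from a standard t-table)
CRITICAL_T_DF8 = {0.10: 1.860, 0.05: 2.306, 0.01: 3.355, 0.001: 5.041}

def smallest_p_exceeded(t, table):
    """Return the smallest p level whose critical value |t| exceeds, or None if it exceeds none."""
    passed = [p for p, critical in table.items() if abs(t) > critical]
    return min(passed) if passed else None

p = smallest_p_exceeded(3.5, CRITICAL_T_DF8)
print(f"p < {p}")                     # p < 0.01
print(p is not None and p <= 0.05)    # True: significant at the 5% level
```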

Other options, less commonly used in our class

Median: the middle number, when the data are arranged in numeric order

Sample 1: -10, 0, 10, 20, 30; Median = 10

Sample 2: 8, 9, 10, 11, 12; Median = 10

Mode: the number that occurs most often

Sample 1: -10, 0, 10, 20, 30; no mode (every value occurs once)

Sample 2: 8, 9, 10, 11, 12; no mode
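A minimal Python sketch of the median and mode rules above; the function names are our own. The "no mode" convention is coded by hand here, because Python's `statistics.multimode` would instead list every value as a mode when all values occur once.

```python
from collections import Counter

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

def modes(values):
    """Most frequent value(s); an empty list means 'no mode' (every value occurs once)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]
print(median(sample_1), modes(sample_1))  # 10 []
print(median(sample_2), modes(sample_2))  # 10 []
```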

Some practice: looking at plant height

Height in sun (cm)    Height in shade (cm)
124                   131
120                   60
153                   131
98                    160
124                   212
141                   117
156                   131
128                   95
139                   145
117                   118

Calculate the mean for both samples

Sun = 130 cm, Shade = 130 cm

Calculate the range for both samples

Sun = 58 cm, Shade = 152 cm

Calculate the median for both samples

Sun = 126 cm, Shade = 131 cm

If there is an even number of data points, find the average of the two middle numbers: Sun (124 + 128) / 2 = 126, Shade (131 + 131) / 2 = 131.

Calculate the mode for both samples

Sun = 124 cm, Shade = 131 cm

Calculate the SD for both samples

Sun = 17.56 cm, Shade = 39.85 cm
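To check all of the practice answers at once, here is a minimal sketch using Python's `statistics` module on the plant-height table (the `summarize` helper is our own):

```python
import statistics

sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

def summarize(heights):
    """Descriptive statistics for one sample of plant heights (cm)."""
    return {
        "mean": statistics.mean(heights),
        "range": max(heights) - min(heights),
        "median": statistics.median(heights),  # averages the two middle values for an even count
        "mode": statistics.mode(heights),      # most common value
        "sd": statistics.stdev(heights),       # sample SD, divides by n - 1
    }

for name, heights in [("Sun", sun), ("Shade", shade)]:
    s = summarize(heights)
    print(f"{name}: mean = {s['mean']:g} cm, range = {s['range']} cm, "
          f"median = {s['median']:g} cm, mode = {s['mode']} cm, sd = {s['sd']:.2f} cm")
# Sun: mean = 130 cm, range = 58 cm, median = 126 cm, mode = 124 cm, sd = 17.56 cm
# Shade: mean = 130 cm, range = 152 cm, median = 131 cm, mode = 131 cm, sd = 39.85 cm
```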

What does this mean?

Sun: SD = 17.56 cm. A low SD indicates an even (close) distribution of data points: more valid.

Shade: SD = 39.85 cm. A high SD indicates a wide spread of data points, which MAY indicate a problem with your experimental design.

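If t = 1.5, is this a significant difference? No. With df = (10 + 10) - 2 = 18, t must be greater than about 2.10 to be significant at p = 0.05 (two-tailed), and 1.5 is below that.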
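A minimal sketch of calculating t yourself, as you might for an internal assessment, using Student's two-sample t statistic with a pooled variance. This assumes the two populations have equal variances, which the very different SDs here call into question; Welch's t-test is the usual alternative. The critical value 2.101 (df = 18, p = 0.05, two-tailed) is taken from a standard t-table, and `pooled_t` is a hypothetical helper name.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal population variances."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

df = (len(sun) + len(shade)) - 2   # 18
CRITICAL_T_DF18 = 2.101            # two-tailed, p = 0.05 (assumed from a standard t-table)

t = pooled_t(sun, shade)
print(df, round(t, 2), abs(t) > CRITICAL_T_DF18)  # 18 0.0 False
print(abs(1.5) > CRITICAL_T_DF18)                 # False
```

Computed directly from the table, t comes out as 0 because both means are exactly 130 cm; the slide's question uses t = 1.5, and either value falls below the critical value, so neither difference is significant.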

Be careful: correlation vs. cause

Observations (and carefully chosen data) may imply a CORRELATION, but they do NOT necessarily demonstrate a cause.

The average global temperature has increased over the past 100 years.

The number of pirates in the world has decreased over the past 100 years.

Therefore, a decreased number of pirates causes increased global temperatures?

NO!


To discern a CAUSE, a valid EXPERIMENT must be done.

Other scientists must also be able to repeat your experiment

Last word … Remember, it is ALWAYS better to PROVE that your experiment failed to support your hypothesis than to lie about it being a success!

Any questions?
