Statistics allow biologists to support the findings of their experiments


Upload: margaretmargaret-oconnor

Post on 04-Jan-2016


TRANSCRIPT

Page 1: Statistics allow biologists to support the findings of their experiments

Statistics allow biologists to support the findings of their experiments.

Page 2: Statistics allow biologists to support the findings of their experiments

“Why is this Biology?”

Variation in populations causes variability in results, which affects our confidence in conclusions.

The key methodology in Biology is hypothesis testing through experimentation.

Carefully-designed and controlled experiments and surveys give us quantitative (numeric) data that can be compared.

We can use the data collected to test our hypothesis and form explanations of the processes involved… but only if we can be confident in our results.

We therefore need to be able to evaluate the reliability of a set of data and the significance of any differences we have found in the data.

Image: 'Transverse section of part of a stem of a Dead-nettle (Lamium sp.) showing a vascular bundle and part of the cortex' http://www.flickr.com/photos/71183136@N08/6959590092 Found on flickrcc.net

Page 4: Statistics allow biologists to support the findings of their experiments

“Which medicine should I prescribe?”

Image from: http://www.msf.org/international-activity-report-2010-sierra-leone

Donate to Médecins Sans Frontières through Biology4Good: http://i-biology.net/about/biology4good/

Generic drugs are out-of-patent, and are much cheaper than the proprietary (brand-name) equivalents. Doctors need to balance needs with available resources. Which would you choose?

Page 5: Statistics allow biologists to support the findings of their experiments

Hummingbirds are nectarivores (herbivores that feed on the nectar of some species of flower).

In return for food, they pollinate the flower. This is an example of mutualism – benefit for all.

As a result of natural selection, hummingbird bills have evolved.

Birds with a bill best suited to their preferred food source have the greater chance of survival.

Photo: Archilochus colubris, from wikimedia commons, by Dick Daniels.

Page 6: Statistics allow biologists to support the findings of their experiments

Researchers studying comparative anatomy collect data on bill length in two species of hummingbirds: Archilochus colubris (ruby-throated hummingbird) and Cynanthus latirostris (broad-billed hummingbird).

To do this, they need to collect sufficient relevant, reliable data so they can test the null hypothesis (H0) that:

“there is no significant difference in bill length between the two species.”

Photo: Archilochus colubris (male), wikimedia commons, by Joe Schneid

Page 7: Statistics allow biologists to support the findings of their experiments

The Null hypothesis presumes that there is NO STATISTICAL DIFFERENCE between the two samples.

The ALTERNATIVE hypothesis presumes that there is a STATISTICAL DIFFERENCE between the two samples.

The t-test provides a probability (P): how likely it is to see a difference at least this large between the two samples if the null hypothesis were true.

By convention, P < 0.05 is accepted as a low enough probability to reject the null hypothesis.

Page 8: Statistics allow biologists to support the findings of their experiments

The sample size must be large enough to provide sufficient reliable data and for us to carry out relevant statistical tests for significance.

We must also be mindful of uncertainty in our measuring tools and error in our results.

Photo: Broadbilled hummingbird (wikimedia commons).

Page 10: Statistics allow biologists to support the findings of their experiments

The mean is a measure of the central tendency of a set of data.

 

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1 mm)
n       A. colubris    C. latirostris
1       13.0           17.0
2       14.0           18.0
3       15.0           18.0
4       15.0           18.0
5       15.0           19.0
6       16.0           19.0
7       16.0           19.0
8       18.0           20.0
9       18.0           20.0
10      19.0           20.0
Mean
s

Calculate the mean using:

• Your calculator (sum of values / n)

• Excel: =AVERAGE(highlight raw data)

n = sample size. The bigger the better. In this case n=10 for each group.

All values should be centred in the cell, with decimal places consistent with the measuring tool uncertainty.
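The same calculation can be sketched outside Excel. This is a minimal Python example, assuming the Table 1 values are typed in as two lists; the variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# Bill lengths (mm, ±0.1 mm) copied from Table 1.
# The variable names are illustrative, not part of the original slides.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# mean = sum of values / n, the same calculation as Excel's =AVERAGE
mean_a = mean(a_colubris)
mean_c = mean(c_latirostris)

# Report to one decimal place, consistent with the ±0.1 mm measuring uncertainty.
print(f"A. colubris:    n={len(a_colubris)}, mean={mean_a:.1f} mm")
print(f"C. latirostris: n={len(c_latirostris)}, mean={mean_c:.1f} mm")
```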

Page 13: Statistics allow biologists to support the findings of their experiments

Standard deviation is a measure of the spread of most of the data.

 

Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1 mm)
n       A. colubris    C. latirostris
1       13.0           17.0
2       14.0           18.0
3       15.0           18.0
4       15.0           18.0
5       15.0           19.0
6       16.0           19.0
7       16.0           19.0
8       18.0           20.0
9       18.0           20.0
10      19.0           20.0
Mean    15.9           18.8
s       1.91           1.03

Standard deviation can have one more decimal place than the raw data. In Excel: =STDEV(highlight raw data).
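As a cross-check on the s row of Table 1, here is a minimal Python sketch. It assumes the sample standard deviation (dividing by n - 1), which is what Excel's =STDEV calculates; the variable names are illustrative.

```python
from statistics import stdev

# Bill lengths (mm, ±0.1 mm) copied from Table 1; names are illustrative.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# stdev() is the sample standard deviation (divides by n - 1).
s_a = stdev(a_colubris)
s_c = stdev(c_latirostris)

# Two decimal places: one more than the raw data.
print(f"A. colubris:    s={s_a:.2f} mm")
print(f"C. latirostris: s={s_c:.2f} mm")
```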

Which of the two sets of data has:

a. The longest mean bill length?

b. The greatest variability in the data?

Page 14: Statistics allow biologists to support the findings of their experiments


Which of the two sets of data has:

a. The longest mean bill length? C. latirostris

b. The greatest variability in the data? A. colubris

Page 15: Statistics allow biologists to support the findings of their experiments

Standard deviation is a measure of the spread of most of the data. Error bars are a graphical representation of the variability of data.

Which of the two sets of data has:

a. The highest mean?

b. The greatest variability in the data?

A

B

Error bars could represent standard deviation, range or confidence intervals.

Page 16: Statistics allow biologists to support the findings of their experiments

The overlap of a set of error bars gives a clue as to the significance of the difference between two sets of data.

Large overlap:

• Lots of shared data points within each data set.

• Results are not likely to be significantly different from each other.

• Any difference is most likely due to chance.

No overlap:

• No (or very few) shared data points within each data set.

• Results are more likely to be significantly different from each other.

• The difference is more likely to be ‘real’.

Page 20: Statistics allow biologists to support the findings of their experiments

Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation). x-axis: Species of hummingbird. y-axis: Mean bill length (±0.1 mm). Bars: A. colubris, 15.9 mm (n=10); C. latirostris, 18.8 mm (n=10).

Our results show a very small overlap between the two sets of data.
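The size of that overlap can be checked numerically. This is a minimal Python sketch, assuming the error bars are mean ± one sample standard deviation as in Graph 1; the function and variable names are illustrative.

```python
from statistics import mean, stdev

# Bill lengths (mm, ±0.1 mm) copied from Table 1; names are illustrative.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

def error_bar(values):
    """Return the (lower, upper) limits of mean ± one sample standard deviation."""
    m, s = mean(values), stdev(values)
    return m - s, m + s

low_a, high_a = error_bar(a_colubris)
low_c, high_c = error_bar(c_latirostris)

# Positive value: the bars share that many mm. Zero or negative: no overlap.
overlap = min(high_a, high_c) - max(low_a, low_c)

print(f"A. colubris error bar:    {low_a:.2f} to {high_a:.2f} mm")
print(f"C. latirostris error bar: {low_c:.2f} to {high_c:.2f} mm")
print(f"Overlap: {overlap:.2f} mm")
```

An overlap this small is only a visual clue and says nothing definite about significance.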

So how do we know if the difference is significant or not?

We need to use a statistical test.

The t-test is a statistical test that helps us determine the significance of the difference between the means of two sets of data.

Page 22: Statistics allow biologists to support the findings of their experiments

The Null Hypothesis (H0):

“There is no significant difference.”

This is the ‘default’ hypothesis that we always test. In our conclusion, we either accept the null hypothesis or reject it.

A t-test can be used to test whether the difference between two means is significant.

• If we accept H0, then the means are not significantly different.

• If we reject H0, then the means are significantly different.

Remember:

• We are never ‘trying’ to get a difference. We design carefully-controlled experiments and then analyse the results using statistical analysis.

Page 23: Statistics allow biologists to support the findings of their experiments

Excel can jump straight to a value of P for our results. One function (=TTEST) compares both sets of data.

As it calculates P directly (the probability of seeing a difference this large by chance alone), we can determine significance directly.

In this case, P=0.00051

This is much smaller than 0.05, so we are confident that we can reject H0.

The difference is unlikely to be due to chance.

Conclusion: There is a significant difference in bill length between A. colubris and C. latirostris.

Page 24: Statistics allow biologists to support the findings of their experiments

Two tails: we assume the data are normally distributed, with two ‘tails’ moving away from the mean.

Type 2 (unpaired): we are comparing one whole population with the other whole population. Excel's type 2 also assumes the two groups have equal variance.

(Type 1 pairs the results of each individual in set A with the same individual in set B).
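The two-tailed, unpaired test described above can be sketched in Python using only the standard library. This is not Excel's internal method: it assumes the equal-variance (pooled) form of the t-test and gets P by numerically integrating the t distribution with Simpson's rule. All function and variable names are illustrative.

```python
import math
from statistics import mean, variance

# Bill lengths (mm, ±0.1 mm) copied from Table 1; names are illustrative.
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Area in both tails beyond |t|, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

def unpaired_t_test(a, b):
    """Two-sample t-test assuming equal variances; returns (t, df, two-tailed P)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df, two_tailed_p(t, df)

t_stat, df, p_value = unpaired_t_test(a_colubris, c_latirostris)
print(f"t = {t_stat:.2f}, df = {df}, P = {p_value:.5f}")
print("Reject H0" if p_value < 0.05 else "Accept H0")
```

In Excel the equivalent call takes the two data ranges followed by the number of tails (2) and the test type (2). The sign of t only reflects which group is listed first; the two-tailed P is the same either way.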

Page 26: Statistics allow biologists to support the findings of their experiments

Cartoon from: http://www.xkcd.com/552/

Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there."

Page 31: Statistics allow biologists to support the findings of their experiments

Correlation does not imply causality.

Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming

Where correlations exist, we must then design solid scientific experiments to determine the cause of the relationship. Sometimes a correlation exists because of confounding variables: conditions that affect both of the correlated variables, even though those variables do not directly affect each other.
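A confounding variable can be illustrated with a small simulation. The sketch below uses entirely synthetic data, not measurements from any experiment: a hypothetical confounder drives two variables, x and y, that never influence each other. The names, coefficients and noise levels are all assumptions chosen for illustration.

```python
import math
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(1)  # fixed seed so the synthetic data are reproducible

# Synthetic confounder: x and y each depend on it, but not on each other.
confounder = [random.uniform(15, 35) for _ in range(200)]
x = [2.0 * z + random.gauss(0, 3) for z in confounder]
y = [0.5 * z + random.gauss(0, 1) for z in confounder]

# x and y look strongly correlated...
r_raw = pearson_r(x, y)

# ...but once the confounder's (known, simulated) contribution is subtracted,
# only independent noise is left and the correlation largely disappears.
x_resid = [xi - 2.0 * z for xi, z in zip(x, confounder)]
y_resid = [yi - 0.5 * z for yi, z in zip(y, confounder)]
r_controlled = pearson_r(x_resid, y_resid)

print(f"r between x and y:             {r_raw:.2f}")
print(f"r after removing confounder:   {r_controlled:.2f}")
```

In a real experiment the confounder's effect is not known in advance, which is why strict control of other variables matters.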

To be able to determine causality through experimentation we need:

• One clearly identified independent variable

• Carefully measured dependent variable(s) that can be attributed to change in the independent variable

• Strict control of all other variables that might have a measurable impact on the dependent variable.

We need: sufficient relevant, repeatable and statistically significant data.

Some known causal relationships:

• Atmospheric CO2 concentrations and global warming

• Atmospheric CO2 concentrations and the rate of photosynthesis

• Temperature and enzyme activity