Introductory Statistics for Laboratorians Dealing with High Throughput Data Sets

Centers for Disease Control

Effects of Antihistamine use on Driving

• Antihistamines cause drowsiness, and the box says you should not operate machinery while using them.

• We will design an experiment to directly test the effects of Antihistamine Dose on driving performance.

• We will inject the drug to control for absorption rates.

• Adjust dose by body weight.

• Use 10 different dosages (dosages are in milligrams per kilogram body weight).

– 0, 10, 15, 20, 25, 30, 35, 40, 45, 50

Method

• Testing will be done at a nice resort hotel.

• Individuals will be given a 10 minute practice session in the simulator and a review of traffic laws and the instructions.

• Individuals will be injected with the drug.

• After 20 minutes they will enter the driving simulator and perform a 30 minute driving test.

• The number of accidents and the number of concentration faults will be recorded.

• Their driving score is printed out by the simulator at the end of the run.

• The bigger the number, the worse they did.

• For safety, they will then be held at the test facility overnight before being released.

• Placebo Control Group

– The group that gets 0 mcg/kg is the control group.

– They will be injected with saline (placebo control group); they will not know it wasn’t the antihistamine.

• Double Blind

– The injections for each test will be prepared in a secure location and transported to the test site. Neither the person giving the injection nor the person receiving the injection will know the dose.

• We will have 10 groups of subjects.

• Each group will consist of 15 individuals.

– n = 15, N = 150

• Subjects will be randomly assigned to groups by persons at the secure site to maintain the double blind.

• The study is funded.

• We order the antihistamine from the drug company that produces the drug.

• We make a big deal of the fact that one of the vials should be a placebo that is indistinguishable (color, viscosity, etc.) from the real drug. The drug company has a fit and charges extra to develop the placebo.

• 10 vials come in and are stored in the safe at the lab

• Through some misunderstanding at the drug company, they send us 10 vials of the placebo and none of the drug.

• The mistake goes completely undetected.

• That means that in this special case we know for a fact that the Null Hypothesis is true.

Hypotheses

• H0: The injected chemical had no effect on the driving scores

• HA: The injected chemical had a dose related effect on the driving scores (higher dose > worse scores)

• Region of Rejection: Alpha = .05

Results

• In a standardization study run by the driving simulator manufacturer, several thousand individuals were run through the simulator.

• They were not on any drug.

• Driving Score in the population

– Mean = 35

– Standard Deviation = 10

• Our Null Hypothesis is basically the hypothesis that our 10 groups could all be samples from a population with a mean of 35 and a standard deviation of 10.

Effect of Antihistamine on Driving: Data Table

                 Grp1   Grp2   Grp3   Grp4   Grp5   Grp6   Grp7   Grp8   Grp9  Grp10
Dose (mcg/kg)       0     10     15     20     25     30     35     40     45     50

                   22     43     23     33     46     40     47     16     26     41
                   23     36     33     39     33     39     34     32     23     28
                   33     57     49     44     34     24     20     38     41     25
                   42     25     47     28     25     28     30     40     53     40
                   23     27     32     33     46     38     58     21     43     47
                   28     29     48     44     44     33     27     38     49     37
                   41     28     28     34     24     44     34     18     33     56
                   22     22     40     37     35     28     32     33     30     25
                   40     38     30     30     28     41     20     25     44     49
                   21     43     26      6     28     46     37     34     26     21
                   24     51     29     47     39     26     34     43     27     42
                   32     34     51     41     45     31     33     31     50     24
                   28     33     40     33     26     26     27     28     25     30
                   29     41     39     35     39     43     47     36     44     34
                   27     28     46     13     44     40     31     29     40     43

Mean            29.00  35.67  37.40  33.13  35.73  35.13  34.07  30.80  36.93  36.13
SD               7.19   9.93   9.31  11.09   8.16   7.45  10.15   8.04  10.26  10.46

Error

• What accounts for the differences among the 10 groups, since they all got exactly the same injection?
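The question above can be explored with a small simulation. The sketch below draws 10 random samples of 15 scores from a single population with mean 35 and standard deviation 10 and prints the sample means. It assumes the population is normally distributed, which the slides do not state; the function name and the seed are illustrative only.

```python
import random
import statistics


def simulate_group_means(n_groups=10, n_per_group=15, mu=35.0, sigma=10.0, seed=1):
    """Draw n_groups samples from ONE population and return their means.

    Assumption: scores are normally distributed (not stated in the slides).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_groups):
        sample = [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        means.append(statistics.mean(sample))
    return means


group_means = simulate_group_means()
for i, m in enumerate(group_means, start=1):
    print(f"Group {i:2d}: mean = {m:5.2f}")
print(f"Range of means: {max(group_means) - min(group_means):.2f}")
```

Every simulated group receives the same "treatment", yet the means still differ from one another. That spread is sampling error, which is the kind of difference seen in the data table.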

Error Rate

• We already know how to use a t-test for independent samples to test whether two means are significantly different.

• Here we have 10 independent samples. • Can we just use t-tests to test them in pairs?

Types of Error

                            Truth about Population from which sample came
Decision Based on Sample    H0 True                  H0 False
Reject H0                   Type I Error (Alpha)     Correct Decision
Fail To Reject H0           Correct Decision         Type II Error (Beta)

Error Rate

• With 10 groups, there would be 45 pairs of means to test.

• If we test each pair of means at alpha = .05 we are allowing 5 chances in 100 of being wrong (type I error) on each test.

• Across 45 tests, that would virtually guarantee that some of the means test significantly different by chance alone.

• This is very bad.
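The arithmetic behind these bullets can be sketched as follows. The code counts the pairs of means and estimates the chance of at least one Type I error across all pairwise tests. It assumes the tests are independent, which pairwise t-tests sharing groups are not, so the figure is only a rough guide; the function name is illustrative.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, approximate chance of >= 1 Type I error).

    Assumption: the pairwise tests are treated as independent.
    """
    n_pairs = comb(n_groups, 2)           # 10 groups -> 45 pairs
    fwer = 1 - (1 - alpha) ** n_pairs     # 1 - P(no false positive in any test)
    return n_pairs, fwer


n_pairs, fwer = familywise_error_rate(10)
print(f"{n_pairs} pairs, chance of at least one Type I error = {fwer:.2f}")
```

Under the independence assumption the chance of at least one false positive comes out at roughly 0.90, even though each individual test is run at alpha = .05.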

Error Rate

• One of the big problems in Science is to control the error rate in our experiments

• Experimentwise Error Rate

– Ethical standards of Science require researchers to keep the error rate for each experiment to a reasonable level.

• We need an overall test that would tell us if anything is going on

This is our situation (except 10 samples)

• If the Null Hypothesis is true, what we have is 10 random samples all from the same population.

• What does the Central Limit Theorem tell us about this situation?

Central Limit Theorem

• Given any population (with any distribution, normal or otherwise) with mean $\mu$ and variance $\sigma^2$, as the sample size $N$ increases the sampling distribution of the mean

1. Approaches a normal distribution with

2. Mean $\mu$ and

3. Variance $\sigma^2 / N$

• We have 10 samples from a population with a mean of 35 and a standard deviation of 10.

• Specify the mean and standard error for the sampling distribution of means of samples of size 15.

• Central Limit Theorem Tells Us:

• $\mu_{\bar{X}} = 35$

• $\sigma^2_{\bar{X}} = 100 / 15 = 6.67$

• $\sigma_{\bar{X}} = \sqrt{6.67} = 2.58$
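The mean and standard error of the sampling distribution can be checked with a few lines of arithmetic. This is a minimal sketch using the population values given earlier (mean 35, standard deviation 10) and the group size of 15; the variable names are illustrative.

```python
import math

pop_mean = 35.0   # population mean from the standardization study
pop_sd = 10.0     # population standard deviation
n = 15            # subjects per group

mean_of_means = pop_mean              # sampling distribution is centred on mu
var_of_means = pop_sd ** 2 / n        # sigma^2 / N
se_of_means = math.sqrt(var_of_means)  # standard error of the mean

print(f"mean = {mean_of_means:.0f}, variance = {var_of_means:.2f}, "
      f"standard error = {se_of_means:.2f}")
```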

Compute the Following

• Use Excel to compute:

• The Mean of our 10 means

• The Variance of our 10 means

• The Standard Deviation (standard error) of our 10 means.
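For readers without Excel, the same three quantities can be computed in Python. The sketch below uses the ten group means as printed in the data table (rounded to two decimals) and the sample (n - 1) formulas, which is an assumption about which variance the exercise intends.

```python
import statistics

# Group means copied from the "Mean" row of the data table (rounded values).
group_means = [29.00, 35.67, 37.40, 33.13, 35.73,
               35.13, 34.07, 30.80, 36.93, 36.13]

mean_of_means = statistics.mean(group_means)
var_of_means = statistics.variance(group_means)  # sample variance (n - 1 denominator)
sd_of_means = statistics.stdev(group_means)      # sample standard deviation

print(f"mean of means     = {mean_of_means:.2f}")
print(f"variance of means = {var_of_means:.2f}")
print(f"SD of means       = {sd_of_means:.2f}")
```

The variance and standard deviation come out at about 7.35 and 2.71, which can be set beside the 6.67 and 2.58 obtained from the Central Limit Theorem.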

H0: True

• Group means should be clustered close around population mean

H0: False

• Some of the Group means will be large, not clustered closely around the mean.

• If the Null Hypothesis is false, the variance of the means will be significantly larger than predicted by the central limit theorem.

• We can test this using Analysis of Variance

Analysis of Variance (ANOVA)

Driving Score

                  Sum of Squares     df    Mean Square       F     Sig.
Between Groups           992.667      9        110.296   1.276    0.255
Within Groups          12103.333    140         86.452
Total                  13096        149

There is no significant difference between the variance of our means and the variance that would be predicted on the basis of error alone, i.e., the variance predicted from the central limit theorem.

The ANOVA estimates the error variance to be 86.452 (the within-groups mean square). The between-groups mean square, which is the variance of the group means scaled by the group size of 15, is 110.296. If the Null Hypothesis is true, an F ratio this large or larger (F = 1.276) occurs about 25.5 times in 100. Our region of rejection was 5 times in 100. We cannot reject the Null Hypothesis.
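The sums of squares, mean squares, and F ratio in the ANOVA table can be reproduced from the raw scores. The following is a minimal sketch in plain Python, with the scores transcribed from the data table; the function and variable names are illustrative. It does not compute the significance value, which requires the F distribution from a statistics package.

```python
# Driving scores from the data table: 15 rows (subjects) x 10 columns (Grp1..Grp10).
ROWS = [
    [22, 43, 23, 33, 46, 40, 47, 16, 26, 41],
    [23, 36, 33, 39, 33, 39, 34, 32, 23, 28],
    [33, 57, 49, 44, 34, 24, 20, 38, 41, 25],
    [42, 25, 47, 28, 25, 28, 30, 40, 53, 40],
    [23, 27, 32, 33, 46, 38, 58, 21, 43, 47],
    [28, 29, 48, 44, 44, 33, 27, 38, 49, 37],
    [41, 28, 28, 34, 24, 44, 34, 18, 33, 56],
    [22, 22, 40, 37, 35, 28, 32, 33, 30, 25],
    [40, 38, 30, 30, 28, 41, 20, 25, 44, 49],
    [21, 43, 26, 6, 28, 46, 37, 34, 26, 21],
    [24, 51, 29, 47, 39, 26, 34, 43, 27, 42],
    [32, 34, 51, 41, 45, 31, 33, 31, 50, 24],
    [28, 33, 40, 33, 26, 26, 27, 28, 25, 30],
    [29, 41, 39, 35, 39, 43, 47, 36, 44, 34],
    [27, 28, 46, 13, 44, 40, 31, 29, 40, 43],
]


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: how far each score lies from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


groups = [list(col) for col in zip(*ROWS)]  # transpose rows into 10 groups
result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name:>10}: {value:.3f}")
```

The F ratio is the between-groups mean square divided by the within-groups mean square, so a value near 1 is what error alone would be expected to produce.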

Error Rate

• We have made a single comparison.

– Compare variance of means to an estimate of the variance of the population

– At the alpha = .05 level.

– The experimentwise error rate is 5 chances in 100.

Post-hoc / follow-up tests

• When an Analysis of Variance is significant it tells us there is more variance in the means than there should be if the null hypothesis is true but it doesn’t tell us which means are significantly different from which others.

• Use Post-hoc or follow-up tests to find out.

• Must control the error rate for the follow-up tests

Bonferroni Adjustment

• Control the error rate for multiple comparisons by making sure the total error rate for all comparisons adds up to the selected alpha.

• Example: to compare the means of 4 groups using t-tests with a total alpha of .05.

• 4 groups give 6 comparisons: 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4

• Use alpha = .05/6 = .00833 for each.
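The Bonferroni adjustment in the example above can be sketched in a few lines. The code lists every pair of groups and divides the total alpha evenly among them; the function name is illustrative, and the group labels are simply numbered from 1.

```python
from itertools import combinations


def bonferroni_alpha(n_groups, total_alpha=0.05):
    """Return (list of pairwise comparisons, per-comparison alpha)."""
    pairs = list(combinations(range(1, n_groups + 1), 2))
    return pairs, total_alpha / len(pairs)


pairs, per_test_alpha = bonferroni_alpha(4)
print(pairs)                    # the 6 pairs of groups
print(f"{per_test_alpha:.5f}")  # alpha to use for each t-test

# The same adjustment for the 10 groups in the driving study (45 pairs).
pairs10, per_test_alpha10 = bonferroni_alpha(10)
print(len(pairs10), f"{per_test_alpha10:.5f}")
```

With 10 groups the per-comparison alpha drops to about .0011, which shows how quickly the adjustment tightens as the number of comparisons grows.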
