Introductory Statistics for Laboratorians dealing with High Throughput Data sets Centers for Disease Control

Upload: jerod

Post on 23-Feb-2016

Page 1: Introductory Statistics  for  Laboratorians  dealing with High Throughput Data sets

Introductory Statistics for Laboratorians dealing with High Throughput Data sets

Centers for Disease Control

Page 2

Effects of Antihistamine use on Driving

• Antihistamines cause drowsiness, and the box says you should not operate machinery while using them.

• We will design an experiment to directly test the effects of Antihistamine Dose on driving performance.

• We will inject the drug to control for absorption rates.

• Adjust dose by body weight.

• Use 10 different dosages (dosages are in micrograms per kilogram body weight, mcg/kg).

– 0, 10, 15, 20, 25, 30, 35, 40, 45, 50

Page 3

Method

• Testing will be done at a nice resort hotel.

• Individuals will be given a 10-minute practice session in the simulator and a review of traffic laws and the instructions.

• Individuals will be injected with the drug.

• After 20 minutes they will enter the driving simulator and perform a 30-minute driving test.

Page 4

• The number of accidents and the number of concentration faults will be recorded.

• Their driving score is printed out by the simulator at the end of the run.

• The bigger the number, the worse they did.

• For safety, they will then be held at the test facility overnight before being released.

Page 5

• Placebo Control Group

– The group that gets 0 mcg/kg is the control group.

– They will be injected with saline (placebo control group); they will not know it wasn’t the antihistamine.

• Double Blind

– The injections for each test will be prepared in a secure location and transported to the test site. Neither the person giving the injection nor the person receiving the injection will know the dose.

Page 6

• We will have 10 groups of subjects.

• Each group will consist of 15 individuals.

– n = 15, N = 150

• Subjects will be randomly assigned to groups by persons at the secure site to maintain the double blind.
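The random assignment described above could be sketched as follows. This is only an illustration, not the procedure the study actually used: the subject IDs (1 to 150), the function name `assign_groups`, and the seed are all hypothetical.

```python
import random

def assign_groups(subject_ids, doses, per_group, seed=None):
    """Shuffle the subjects, then deal them into equal-sized dose groups."""
    if len(subject_ids) != len(doses) * per_group:
        raise ValueError("subject count must equal number of groups x group size")
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Slice the shuffled list into consecutive blocks, one block per dose.
    return {
        dose: shuffled[i * per_group:(i + 1) * per_group]
        for i, dose in enumerate(doses)
    }

doses = [0, 10, 15, 20, 25, 30, 35, 40, 45, 50]  # mcg/kg, from the design
subjects = list(range(1, 151))                   # hypothetical subject IDs
groups = assign_groups(subjects, doses, per_group=15, seed=2016)

for dose, members in groups.items():
    print(dose, sorted(members))
```

Because every subject is shuffled before the list is cut into blocks, each person has the same chance of landing in any dose group, which is the property the design relies on.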

Page 7

• The study is funded.

• We order the antihistamine from the drug company that produces the drug.

• We make a big deal of the fact that one of the vials should be a placebo that is indistinguishable (color, viscosity, etc.) from the real drug. The drug company has a fit and charges extra to develop the placebo.

• 10 vials come in and are stored in the safe at the lab.

Page 8

• Through some misunderstanding at the drug company, they send us 10 vials of the placebo and none of the drug.

• The mistake goes completely undetected.

• That means that in this special case we know for a fact that the Null Hypothesis is true.

Page 9

Hypotheses

• H0: The injected chemical had no effect on the driving scores

• HA: The injected chemical had a dose-related effect on the driving scores (the higher the dose, the worse the scores)

• Region of Rejection: Alpha = .05

Page 10

Results

• In a standardization study run by the driving simulator manufacturer, several thousand individuals were run through the simulator.

• They were not on any drug.

• Driving Score in the population

– Mean = 35

– Standard Deviation = 10

• Our Null Hypothesis is basically the hypothesis that our 10 groups could all be samples from a population with a mean of 35 and a standard deviation of 10.

Page 11

Effect of Antihistamine on Driving: Data Table

                 Grp1   Grp2   Grp3   Grp4   Grp5   Grp6   Grp7   Grp8   Grp9  Grp10
Dose (mcg/kg)       0     10     15     20     25     30     35     40     45     50

                   22     43     23     33     46     40     47     16     26     41
                   23     36     33     39     33     39     34     32     23     28
                   33     57     49     44     34     24     20     38     41     25
                   42     25     47     28     25     28     30     40     53     40
                   23     27     32     33     46     38     58     21     43     47
                   28     29     48     44     44     33     27     38     49     37
                   41     28     28     34     24     44     34     18     33     56
                   22     22     40     37     35     28     32     33     30     25
                   40     38     30     30     28     41     20     25     44     49
                   21     43     26      6     28     46     37     34     26     21
                   24     51     29     47     39     26     34     43     27     42
                   32     34     51     41     45     31     33     31     50     24
                   28     33     40     33     26     26     27     28     25     30
                   29     41     39     35     39     43     47     36     44     34
                   27     28     46     13     44     40     31     29     40     43

Mean            29.00  35.67  37.40  33.13  35.73  35.13  34.07  30.80  36.93  36.13
SD               7.19   9.93   9.31  11.09   8.16   7.45  10.15   8.04  10.26  10.46

Page 12

Error

• What accounts for the differences among the 10 groups, since they all got exactly the same injection?

Page 13

Error Rate

• We already know how to use a t-test for independent samples to test whether two means are significantly different.

• Here we have 10 independent samples.

• Can we just use t-tests to test them in pairs?

Page 14

Types of Error

                            Truth about Population from which sample came
Decision Based on Sample    H0 True                  H0 False
Reject H0                   Type I Error (Alpha)     Correct Decision
Fail To Reject H0           Correct Decision         Type II Error (Beta)

Page 15

Error Rate

• With 10 groups, there would be 45 pairs of means to test.

• If we test each pair of means at alpha = .05 we are allowing 5 chances in 100 of being wrong (type I error) on each test.

• Across 45 tests we would expect about 2 of the pairs (45 × .05 = 2.25) to test significantly different by chance alone, even when the Null Hypothesis is true.

• This is very bad.
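The arithmetic behind this slide can be checked with a few lines of Python. The chance of at least one false positive is computed here under the simplifying assumption that the 45 tests are independent; pairwise t-tests share groups, so treat that figure as a rough guide rather than an exact value.

```python
from math import comb

alpha = 0.05   # per-test Type I error rate
groups = 10    # number of groups in the study

# Number of distinct pairs of means: "10 choose 2".
pairs = comb(groups, 2)

# Expected number of false positives when H0 is true for every pair.
expected_false_positives = pairs * alpha

# Chance of at least one false positive, ASSUMING independent tests
# (an approximation: pairwise comparisons are not truly independent).
familywise_error = 1 - (1 - alpha) ** pairs

print(f"pairs of means:            {pairs}")
print(f"expected false positives:  {expected_false_positives:.2f}")
print(f"P(at least one), approx.:  {familywise_error:.3f}")
```

Under that independence assumption the chance of at least one spurious "significant" difference is roughly 90%.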

Page 16

Error Rate

• One of the big problems in Science is to control the error rate in our experiments

• Experimentwise Error Rate

– Ethical standards of Science require researchers to keep the error rate for each experiment to a reasonable level.

• We need an overall test that would tell us if anything is going on

Page 17

This is our situation (except 10 samples)

Page 18

• If the Null Hypothesis is true, what we have is 10 random samples all from the same population.

• What does the Central Limit Theorem tell us about this situation?

Page 19

Central Limit Theorem

• Given any population (with any distribution, normal or otherwise) with mean μ and variance σ², as the sample size N increases, the sampling distribution of the mean

1. Approaches a normal distribution with

2. Mean μ and

3. Variance $\sigma^2_{\bar{X}} = \sigma^2_{pop} / N$
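A small simulation can make the theorem concrete. The sketch below is an illustration and not part of the original slides: it assumes an exponential population (chosen only because it is clearly non-normal, with mean 1 and variance 1), a sample size of 15 to match the study, and an arbitrary 20,000 repetitions.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 15                   # sample size, as in the driving study
num_samples = 20000      # number of simulated samples (arbitrary choice)

# Assumed population: exponential with rate 1, so mean = 1 and variance = 1.
pop_mean, pop_var = 1.0, 1.0

# Draw many samples of size n and keep the mean of each one.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_var = statistics.variance(sample_means)
predicted_var = pop_var / n   # Central Limit Theorem: sigma^2 / N

print(f"mean of sample means:     {observed_mean:.4f} (predicted {pop_mean:.4f})")
print(f"variance of sample means: {observed_var:.4f} (predicted {predicted_var:.4f})")
```

The observed variance of the sample means should land close to the population variance divided by the sample size, even though the population itself is strongly skewed.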

Page 20

• We have 10 samples from a population with a mean of 35 and a standard deviation of 10.

• Specify the mean and standard error for the sampling distribution of means of samples of size 15.

Page 21

• Central Limit Theorem Tells Us:

$\mu_{\bar{X}} = 35$

$\sigma^2_{\bar{X}} = 100 / 15 = 6.67$

$\sigma_{\bar{X}} = \sqrt{6.67} = 2.58$

Page 22

Compute the Following

• Use Excel to compute:

• The Mean of our 10 means

• The Variance of our 10 means

• The Standard Deviation (standard error) of our 10 means.
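For readers without Excel at hand, the same three quantities can be computed in Python. This is a sketch using the rounded group means from the data table; the Excel function names in the comments are the usual equivalents, and the predicted values assume the population standard deviation of 10 from the standardization study.

```python
import statistics
from math import sqrt

# Group means as printed in the data table (rounded to two decimals).
group_means = [29.00, 35.67, 37.40, 33.13, 35.73,
               35.13, 34.07, 30.80, 36.93, 36.13]

mean_of_means = statistics.mean(group_means)     # Excel: AVERAGE
var_of_means = statistics.variance(group_means)  # Excel: VAR.S (n - 1 divisor)
sd_of_means = statistics.stdev(group_means)      # Excel: STDEV.S

# Central Limit Theorem prediction for samples of size 15
# from a population with standard deviation 10.
predicted_var = 10 ** 2 / 15
predicted_se = sqrt(predicted_var)

print(f"mean of means:     {mean_of_means:.2f}")
print(f"variance of means: {var_of_means:.2f} (predicted {predicted_var:.2f})")
print(f"SD of means:       {sd_of_means:.2f} (predicted {predicted_se:.2f})")
```

The observed variance of the means (about 7.35) is a little larger than the predicted 6.67; whether that excess is more than chance would produce is the question the Analysis of Variance answers.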

Page 23

H0: True

• Group means should be clustered close around population mean

Page 24

H0: False

• Some of the Group means will be large, not clustered closely around the mean.

Page 25

• If the Null Hypothesis is false, the variance of the means will be significantly larger than predicted by the central limit theorem.

• We can test this using Analysis of Variance

Page 26

Analysis of Variance (ANOVA)

Driving Score

                  Sum of Squares     df   Mean Square       F    Sig.
Between Groups           992.667      9       110.296   1.276   0.255
Within Groups          12103.333    140        86.452
Total                  13096        149

There is no significant difference between the variance of our means and the variance that would be predicted on the basis of error alone, i.e., the variance predicted from the central limit theorem.

The ANOVA estimates the error variance to be 86.452 (the Within Groups mean square). The variance of the group means, multiplied by the group size of 15 so that it estimates the same population variance, is 110.296 (the Between Groups mean square). If the Null Hypothesis is true, an F ratio as large as 1.276 or larger would occur by chance about 25.5 times in 100. Our region of rejection was 5 times in 100. We cannot reject the Null Hypothesis.
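The sums of squares, mean squares, and F ratio in the table can be reproduced from the raw scores with plain Python. This is a sketch of the standard one-way ANOVA arithmetic, not the software the authors used; the data are copied from the data table, one row per line of the table and one column per group. The significance value is not computed here because it needs the F distribution (for example from a statistics package).

```python
# Driving scores from the data table: each inner list is one table row,
# with columns in the order Grp1 ... Grp10.
rows = [
    [22, 43, 23, 33, 46, 40, 47, 16, 26, 41],
    [23, 36, 33, 39, 33, 39, 34, 32, 23, 28],
    [33, 57, 49, 44, 34, 24, 20, 38, 41, 25],
    [42, 25, 47, 28, 25, 28, 30, 40, 53, 40],
    [23, 27, 32, 33, 46, 38, 58, 21, 43, 47],
    [28, 29, 48, 44, 44, 33, 27, 38, 49, 37],
    [41, 28, 28, 34, 24, 44, 34, 18, 33, 56],
    [22, 22, 40, 37, 35, 28, 32, 33, 30, 25],
    [40, 38, 30, 30, 28, 41, 20, 25, 44, 49],
    [21, 43, 26, 6, 28, 46, 37, 34, 26, 21],
    [24, 51, 29, 47, 39, 26, 34, 43, 27, 42],
    [32, 34, 51, 41, 45, 31, 33, 31, 50, 24],
    [28, 33, 40, 33, 26, 26, 27, 28, 25, 30],
    [29, 41, 39, 35, 39, 43, 47, 36, 44, 34],
    [27, 28, 46, 13, 44, 40, 31, 29, 40, 43],
]

# Transpose so that each list holds the 15 scores of one dose group.
groups = [list(col) for col in zip(*rows)]
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total
group_means = [sum(g) / len(g) for g in groups]

# Between groups: spread of the group means around the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))
# Within groups: spread of the scores around their own group mean.
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"Between Groups: SS={ss_between:.3f} df={df_between} MS={ms_between:.3f}")
print(f"Within Groups:  SS={ss_within:.3f} df={df_within} MS={ms_within:.3f}")
print(f"F = {f_ratio:.3f}")
```

The between-groups mean square is 15 times the variance of the 10 group means, which is why it can be compared directly with the within-groups estimate of the population variance.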

Page 27

Error Rate

• We have made a single comparison

– Compare variance of means to an estimate of the variance of the population

– At the alpha = .05 level.

– The experimentwise error rate is 5 chances in 100.

Page 28

Post-hoc / follow-up tests

• When an Analysis of Variance is significant it tells us there is more variance in the means than there should be if the null hypothesis is true but it doesn’t tell us which means are significantly different from which others.

• Use Post-hoc or follow-up tests to find out.

• Must control the error rate for the follow-up tests.

Page 29

Bonferroni Adjustment

• Control the error rate for multiple comparisons by making sure the total error rate for all comparisons adds up to the selected alpha.

• Example: to compare the means of 4 groups using t-tests with a total alpha of .05.

• 4 groups is 6 comparisons

– 1 vs. 2, 1 vs. 3, 1 vs. 4, 2 vs. 3, 2 vs. 4, 3 vs. 4

• Use alpha = .05/6 = .00833 for each.
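The adjustment is simple enough to sketch in a few lines. The helper name `bonferroni_alpha` is hypothetical, introduced only for this illustration; it lists every pair of groups and divides the total alpha equally among them.

```python
from itertools import combinations

def bonferroni_alpha(num_groups, total_alpha=0.05):
    """Return all pairwise comparisons and the per-comparison alpha."""
    pairs = list(combinations(range(1, num_groups + 1), 2))
    return pairs, total_alpha / len(pairs)

pairs, per_test_alpha = bonferroni_alpha(4)
print(pairs)                        # the 6 pairs of groups
print(f"{per_test_alpha:.5f}")      # alpha to use for each t-test

# The same rule applied to the 10 groups of the driving study.
pairs10, per_test_alpha10 = bonferroni_alpha(10)
print(len(pairs10), f"{per_test_alpha10:.5f}")
```

With 10 groups there are 45 comparisons, so each t-test would have to reach roughly the .0011 level, which shows how quickly the per-test threshold tightens as the number of groups grows.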