Post on 22-Dec-2015
Statistics
in
Science
Statistical Analysis & Design in Research
Structure in the Experimental Material
PGRM 10
Blocking – the idea
Detecting differences between treatments depends on the background noise (BN)
• BN is:
  – caused by inherent differences between the experimental units
  – measured by the residual (error) mean square, RMS (alternatively, MSE)
• Comparing treatments on similar units would reduce background noise
• With blocks of units that differ in the contributing characteristics, we measure the variation due to blocks and so reduce the residual variation
Blocking – the benefit
Reducing background noise:
• Gives more precise estimates
• Allows a reduction in replication, without loss of power (the probability of detecting an effect of a specified size)
• Reduces cost!
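The link between background noise and replication can be sketched numerically. This is a minimal Python illustration, not part of the course material. It assumes the usual standard error of a difference between two treatment means, SED = sqrt(2 × RMS / r) with r replicates per treatment, and it ignores the change in residual degrees of freedom that blocking brings. The function names and the two RMS values are made up for the illustration.

```python
import math

def sed(rms: float, r: int) -> float:
    """Standard error of the difference between two treatment means,
    each based on r replicates, for residual mean square rms."""
    return math.sqrt(2.0 * rms / r)

def replicates_needed(rms: float, target_sed: float) -> int:
    """Smallest number of replicates per treatment with SED <= target_sed.
    The small tolerance guards against floating-point round-off."""
    return math.ceil(2.0 * rms / target_sed ** 2 - 1e-9)

# Made-up residual mean squares: suppose blocking halves the RMS.
unblocked_rms, blocked_rms = 20.0, 10.0

# Precision achieved by 8 replicates per treatment without blocking ...
target = sed(unblocked_rms, 8)

# ... and the replication that gives the same precision with blocking.
print(replicates_needed(blocked_rms, target))
```

Under these assumptions, halving the residual mean square halves the replication needed for the same SED.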
Blocking and experimental material
Examples
1. A field, with fertility increasing from top to bottom
   With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom.
   Randomise treatments within each block
Block Design
How many replicates per treatment?
What is the experimental unit?
       Treat
Blk    A    B    C
1      T1   T3   T2
2      T3   T2   T1
3      T2   T1   T3
4      T1   T2   T3
5      T3   T1   T2
6      T1   T2   T3
What is the block?
Example
• 2 drugs (A, B) to control blood pressure
• 100 subjects – randomly assign 50 each to A and B
• Valid - but is it efficient?
• If subjects are heterogeneous, there is likely to be a large variation (σ²) in the responses within each group.
• Design may not be very efficient.
Blocking and experimental material
1. 100 subjects are selected to compare a new drug to control BP with a Control
   Block into pairs by age & weight (believed to affect BP)
   In each pair one is selected at random to receive the new drug; the other receives Control
Alternatively – see next slide
Groups (Blocks)
Age    Sex      Weight    #     T1   T2
>50    Male     H         15
>50    Male     N         11
>50    Male     L         12
>50    Female   H         11
>50    Female   N         9
>50    Female   L         13
<50    Male     H         7
<50    Male     N         2
<50    Male     L         5
<50    Female   H         4
<50    Female   N         8
<50    Female   L         3
Total                     100
Groups (Blocks)
Age    Sex      Weight    #     T1   T2
>50    Male     H         15    8    7
>50    Male     N         11    5    6
>50    Male     L         12    6    6
>50    Female   H         11    5    6
>50    Female   N         9     5    4
>50    Female   L         13    6    7
<50    Male     H         7     4    3
<50    Male     N         2     1    1
<50    Male     L         5     2    3
<50    Female   H         4     2    2
<50    Female   N         8     4    4
<50    Female   L         3     2    1
Total                     100   50   50
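One way to arrive at an allocation like the one in the table is sketched below in Python. The slides do not say how the T1/T2 counts were chosen, so the scheme here is an assumption: even groups are split in half, and the one leftover subject in each odd-sized group is given to T1 in a randomly chosen half of those groups. The function name is hypothetical. The code only fixes the counts; which subjects within a group receive T1 would still need a random draw.

```python
import random

def allocate_within_groups(sizes, seed=None):
    """Return (n_T1, n_T2) for each group, as evenly as possible.

    Assumed scheme: odd-sized groups have one leftover subject; the
    leftovers go to T1 in a random half of those groups and to T2 in
    the rest, which keeps the overall totals balanced.
    """
    rng = random.Random(seed)
    odd_groups = [i for i, n in enumerate(sizes) if n % 2 == 1]
    rng.shuffle(odd_groups)
    extra_to_t1 = set(odd_groups[: len(odd_groups) // 2])
    allocation = []
    for i, n in enumerate(sizes):
        n_t1 = n // 2 + (1 if i in extra_to_t1 else 0)
        allocation.append((n_t1, n - n_t1))
    return allocation

# Group sizes from the table, in the same row order.
group_sizes = [15, 11, 12, 11, 9, 13, 7, 2, 5, 4, 8, 3]
alloc = allocate_within_groups(group_sizes, seed=1)
for size, (n_t1, n_t2) in zip(group_sizes, alloc):
    print(size, n_t1, n_t2)
print("Total", sum(a for a, _ in alloc), sum(b for _, b in alloc))
```

With 8 odd-sized groups among the 12, four leftovers go to each treatment, so the totals come out at 50 and 50 as on the slide.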
Blocking and experimental material
Examples
1. A field, with fertility increasing from top to bottom
   With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom.
   Randomise treatments within each block
2. 100 subjects are selected to compare a new drug to control BP with a Control
   Block into pairs by age & weight (believed to affect BP)
   In each pair one is selected at random to receive the new drug; the other receives Control
3. 3 products to be compared in 15 supermarkets
   All 3 compared in each supermarket; the supermarkets are regarded as BLOCKS
Blocking and experimental material
Examples (contd)
4. A crop experiment will take 5 days to harvest.
   The material is blocked into 5 sets of plots, and treatments are assigned at random within each set
   A BLOCK of plots is harvested each day
Here, day effects such as rain will be allowed for in the ANOVA table, so they do not cloud the estimation of treatment effects, and the residual variation is reduced.
Reasons to BLOCK
1. Reduce BN (as above)
2. Material is naturally blocked (e.g. identical twins), so using this as part of the design may reduce BN
3. To protect against factors that may influence the experimental outcomes, and so cloud comparison of treatments
4. To assess block variation itself
   e.g. large day-to-day variation may indicate a process that is not well controlled.
Typical Randomised Block Design (RBD) Layout
4 treatments T1 – T4
BLOCKS of size 4
Example of random allocation within blocks:
Block
1      T3   T1   T2   T4
2      T2   T3   T1   T4
3      T1   T2   T3   T4
4      T2   T4   T1   T3
5      T4   T2   T3   T1
6      T3   T1   T4   T2
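A layout of this kind can be generated by shuffling the treatment list independently within each block. The following Python sketch illustrates the idea; the function name and the seed are arbitrary choices for the illustration, not something taken from the slides.

```python
import random

def randomised_block_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatment list per
    block, so each treatment occurs exactly once in every block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)   # copy, so the input is left untouched
        rng.shuffle(order)         # random order within this block only
        layout.append(order)
    return layout

treatments = ["T1", "T2", "T3", "T4"]
layout = randomised_block_layout(treatments, n_blocks=6, seed=2015)
for block, order in enumerate(layout, start=1):
    print(block, *order)
```

Because every block is shuffled separately, two blocks can by chance receive the same order. That is acceptable in a randomised block design.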
ANOVA table
Source       DF           SS    MS    F         Pr > F
Treatments   t - 1        TSS   TMS   TMS/RMS   Small?
Blocks       b - 1        BSS   BMS   BMS/RMS   Small?
Residual     (t-1)(b-1)   RSS   RMS
Total        tb - 1
t treatments
b blocks
tb experimental units
each treatment occurs once in each block
MS = SS/DF
Example
PGRM pg 10-2
Compare effect of washing solution used in retarding bacterial growth in food processing containers.
Only 3 trials can be run each day, and temperature is not controlled so day to day variability is expected.
BLOCKS: day
Treatments: 2%, 4%, 6% of active ingredient
Randomisation: 3 containers randomly allocated to 3 treatments on each of 4 days.
Response: bacterial count on each container each day (low score = cleaner)
Example (contd)
Excel
Day   Solution (%)   Count
1     2              13
1     4              10
1     6              5
2     2              18
2     4              20
2     6              6
3     2              18
3     4              17
3     6              7
4     2              30
4     4              31
4     6              10
csv
Day,Solution(%),Count
1,2,13
1,4,10
1,6,5
2,2,18
2,4,20
...
Note:
Response values in a single column
Extra column to identify
  BLOCK (day)
  TREATMENT (solution)
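SAS GLM code
/* Randomised block analysis: solution = treatment, day = block */
proc glm data = randb;
  class solution day;
  model score = solution day;
  lsmeans solution;
  lsmeans day;
  /* contrast coefficients follow the level order 2, 4, 6 */
  estimate '2-6' solution 1 0 -1;
  estimate 'linear ok?' solution 1 -2 1;
run;
quit;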
GLM OUTPUT: ANOVA
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    748.08           149.6         11.68     0.0048
Error             6    76.8             12.8
Corrected Total   11   824.9

Source     DF   Type I SS   Mean Square   F Value   Pr > F
solution   2    425.17      212.58        16.60     0.0036
Day        3    322.92      107.64        8.41      0.0144

425.17 + 322.92 = 748.09
So the Model SS has been partitioned into TREATMENT (solution) and BLOCK (Day)
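The partition can be checked by hand from the 12 counts on the data slide. The Python sketch below uses the standard sums-of-squares formulas for a randomised block design with one observation per block and treatment. It is a cross-check of the arithmetic, not a reproduction of what PROC GLM does internally, and the function and variable names are illustrative.

```python
# counts[day][solution %], copied from the data slide
counts = {
    1: {2: 13, 4: 10, 6: 5},
    2: {2: 18, 4: 20, 6: 6},
    3: {2: 18, 4: 17, 6: 7},
    4: {2: 30, 4: 31, 6: 10},
}

def rbd_anova(table):
    """Sums of squares and F ratios for a randomised block design with
    one observation per (block, treatment) cell."""
    blocks = sorted(table)
    treatments = sorted(table[blocks[0]])
    b, t = len(blocks), len(treatments)
    values = [table[blk][trt] for blk in blocks for trt in treatments]
    correction = sum(values) ** 2 / (b * t)          # (grand total)^2 / tb
    total_ss = sum(v * v for v in values) - correction
    treat_totals = [sum(table[blk][trt] for blk in blocks) for trt in treatments]
    block_totals = [sum(table[blk][trt] for trt in treatments) for blk in blocks]
    treat_ss = sum(x * x for x in treat_totals) / b - correction
    block_ss = sum(x * x for x in block_totals) / t - correction
    resid_ss = total_ss - treat_ss - block_ss        # what is left over
    rms = resid_ss / ((t - 1) * (b - 1))
    return {
        "treat_ss": treat_ss, "block_ss": block_ss,
        "resid_ss": resid_ss, "total_ss": total_ss, "rms": rms,
        "f_treat": (treat_ss / (t - 1)) / rms,
        "f_block": (block_ss / (b - 1)) / rms,
    }

result = rbd_anova(counts)
for name, value in result.items():
    print(f"{name:9s} {value:8.2f}")
```

The treatment and block sums of squares computed this way add up to the Model SS, and the remainder of the Corrected Total is the Error SS.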
GLM OUTPUT: means
solution   score LSMEAN
2          19.75
4          19.5
6          7.0

Parameter    Estimate   Standard Error   t Value   Pr > |t|
2-6          12.75      2.530            5.04      0.0024
linear ok?   -12.25     4.383            -2.80     0.0314
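The two ESTIMATE rows can be approximated from the LSMEANs and the error mean square. The Python sketch below assumes the usual contrast formulas for a balanced design: estimate = Σ cᵢ × meanᵢ and SE = sqrt(RMS × Σ cᵢ² / r), with r = 4 observations per solution. It uses the rounded RMS of 12.8 from the ANOVA slide, so the standard errors agree with the SAS output only to about two decimals. The function name is illustrative. For three equally spaced levels, the coefficients 1 -2 1 form the usual quadratic contrast, so a value far from zero suggests the response is not linear in concentration.

```python
import math

means = {2: 19.75, 4: 19.5, 6: 7.0}   # solution LSMEANs from the output
rms = 12.8                            # error mean square (rounded)
r = 4                                 # observations per solution (4 days)

def contrast(coeffs, means, rms, r):
    """Estimate, standard error and t value of a contrast among
    treatment means; coeffs follow the sorted level order."""
    levels = sorted(means)
    estimate = sum(c * means[lvl] for c, lvl in zip(coeffs, levels))
    se = math.sqrt(rms * sum(c * c for c in coeffs) / r)
    return estimate, se, estimate / se

diff_2_6 = contrast([1, 0, -1], means, rms, r)    # the '2-6' row
curvature = contrast([1, -2, 1], means, rms, r)   # the 'linear ok?' row
print("2-6        %.2f %.3f %.2f" % diff_2_6)
print("linear ok? %.2f %.3f %.2f" % curvature)
```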
ANOVA table
Source     SS     df   MS     F       P
Solution   425    ?    213    16.60   0.004
Days       323    ?    108    8.41    0.014
Residual   76.8   ?    12.8

Solution   2      4      6      SED
           19.8   19.5   7.0    2.53

Day   1     2      3      4      SED
      9.3   14.7   14.0   23.7   2.92
Latin Square design – blocking by 2 Sources of variation
Variation in milk yield among cows is large (CV% = 25)
Variation in Yield across lactation is large
Use different treatments in sequence on each cow
Need to allow for a standardisation period (1-2 weeks) between treatments
[Figure: Lactation yield pattern. Yield (kg) plotted against Month.]
Data
Treatment allocation
         Cow
Period   1    2    3    4
1        T2   T1   T3   T4
2        T4   T2   T1   T3
3        T3   T4   T2   T1
4        T1   T3   T4   T2

Milk yield (kg/day)
         Cow
Period   1      2      3      4
1        9.7    14.0   20.2   20.9
2        15.1   20.3   17.8   24.3
3        16.4   20.1   21.3   21.5
4        11.8   19.1   21.3   20.6

Columns for period, cow and treatment codes
Period   Cow   Treat   yield
1        1     2       9.7
2        1     4       15.1
3        1     3       16.4
4        1     1       11.8
1        2     1       14.0
2        2     2       20.3
….
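SAS GLM code
/* Latin square analysis: period and cow are the two blocking factors */
proc glm data = latinsq;
  class period cow treat;
  model yield = period cow treat;
  lsmeans treat;
  lsmeans period;
  lsmeans cow;
  /* compare treatment 1 with treatment 2 */
  estimate '1v2' treat 1 -1 0 0;
run;
quit;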
Results
Means
       Treat   Period   Cow
1      16.28   16.21    13.24
2      17.98   19.37    18.38
3      20.01   19.82    20.16
4      19.33   18.18    21.82
SED    0.775   0.775    0.775

Source   DF   SS      MS      F       P
Period   3    31.2    10.41   8.68    0.013
Cow      3    165.8   55.28   46.06   0.000
Treat    3    32.5    10.82   9.01    0.012
Error    6    7.2     1.20

Cow and Period removed much variation
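The Latin square partition can be sketched in Python from the yields shown on the data slide. The formulas are the standard ones for an n × n Latin square, and the function and variable names are illustrative. The yields on the slide are printed to one decimal place, so the values computed here differ a little from the table above (an error SS of about 7.1 rather than 7.2, for example), presumably because the original analysis used unrounded data.

```python
import math

# Rows are periods 1-4, columns are cows 1-4 (from the data slide).
design = [
    [2, 1, 3, 4],
    [4, 2, 1, 3],
    [3, 4, 2, 1],
    [1, 3, 4, 2],
]
yields = [
    [9.7, 14.0, 20.2, 20.9],
    [15.1, 20.3, 17.8, 24.3],
    [16.4, 20.1, 21.3, 21.5],
    [11.8, 19.1, 21.3, 20.6],
]

def latin_square_anova(design, yields):
    """Partition the total SS into row (period), column (cow),
    treatment and error components for an n x n Latin square."""
    n = len(design)
    grand_total = sum(sum(row) for row in yields)
    correction = grand_total ** 2 / (n * n)
    total_ss = sum(y * y for row in yields for y in row) - correction
    row_totals = [sum(row) for row in yields]
    col_totals = [sum(row[j] for row in yields) for j in range(n)]
    treat_totals = {}
    for i in range(n):
        for j in range(n):
            trt = design[i][j]
            treat_totals[trt] = treat_totals.get(trt, 0.0) + yields[i][j]

    def ss(totals):
        return sum(x * x for x in totals) / n - correction

    period_ss, cow_ss = ss(row_totals), ss(col_totals)
    treat_ss = ss(treat_totals.values())
    error_ss = total_ss - period_ss - cow_ss - treat_ss
    error_ms = error_ss / ((n - 1) * (n - 2))
    return {
        "period_ss": period_ss, "cow_ss": cow_ss, "treat_ss": treat_ss,
        "error_ss": error_ss, "error_ms": error_ms,
        "sed": math.sqrt(2 * error_ms / n),
        "cv_percent": 100 * math.sqrt(error_ms) / (grand_total / (n * n)),
    }

ls = latin_square_anova(design, yields)
for name, value in ls.items():
    print(f"{name:10s} {value:8.3f}")
```

The residual CV computed this way is close to 6%, against the 25% quoted for variation among cows.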
Conclusions on Latin square design
CV greatly reduced, to 6%: when the effect of period is allowed for, repeated measurements within a cow are not very variable.
Here periods and cows are nuisance variables. Sometimes the row and column variables are of interest in themselves, and then the design is very efficient, giving information on 3 factors (e.g. treatments, machines, operators).
Useful for screening, but it is questionable whether short-term results would apply in the long term.