department of mathematics and computer science
2DS01
Statistics 2 for Chemical
Engineering
http://www.win.tue.nl/~sandro/2DS01
Lecturers
• Marko Boon ([email protected])
• Dr. A. Di Bucchianico ([email protected])
• Ir. G.D. Mooiweer ([email protected])
• Drs. C.M.J. Rusch – Groot ([email protected])
Important to remember
• Web site for this course: http://www.win.tue.nl/~sandro/2DS01/
• No textbook, but handouts + PowerPoint sheets through the web site
• Bring your laptop (notebook) to the fourth lecture (12th of April) and to self-study
• Software:
– Statgraphics (version 5.1). If not installed, install through
http://w3.tue.nl/nl/diensten/dienst_ict/organisatie/groepen/wins/campus_software/
– Java (at least version 1.4). Install through http://java.com.
Java is needed to run Statlab (http://www.win.tue.nl/statlab).
Important: In order to run Statlab during the exams, security settings have to
be adjusted!
Goals of this course
• teach students the need for a statistical basis of experimentation
• teach students statistical tools for experimentation
– design of experiments (factorial designs, optimal designs)
– analysis of experiments (ANOVA)
– use of statistical software
• give students a short introduction to recent developments
Week schedule
Week 1: Introduction to Analysis of Variance
(ANOVA)
Week 2: Factorial designs: screening
Week 3: Factorial designs: optimisation
Week 4: Optimal experimental design
and mixture designs (by A. Di Bucchianico
– Bring your laptop!)
Detailed contents of week 1
• statistics and experimentation
• short recapitulation of regression analysis
• one-way ANOVA
• one-way ANOVA with blocks
• multiple comparisons
Statistics and experimentation
Chemical experiments often depend on several
factors (pressure, catalyst, temperature, reaction
time, ...)
Two important questions:
• which factors are really important?
• what are optimal settings for important factors?
Use of statistical experimentation in chemical engineering
• Chemical synthesis (synthetic steps; work-up and separation; reagents, solvents, catalysts; structure, reactivity and properties, ...)
• Biotech industry (drug design, analytical biochemistry, process optimization: fermentation, purification, ...)
• Process industry (process optimization and control: yield, purity, throughput time, pollution, energy consumption; product quality and performance: material strength, warp, color, taste, odour; ...)
• ...
Short history of statistics and experimentation
• 1920’s - ... introduction of statistical methods in
agriculture by Fisher and co-workers
• 1950’s - ... introduction in chemical engineering
(Box, ...)
• 1980’s - ... introduction in Western industry of Japanese
approach (Taguchi, robust design)
• 1990’s - ... combinatorial chemistry, high-throughput processing
Link to Statistics 1 for Chemical Engineering
• introduction to measurements
– data analysis
– error propagation
• regression analysis
• use of statistical software (Statgraphics)
Types of regression analysis
Linear means linear in coefficients, not linear functions!
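• Simple linear regression: $Y = \beta_0 + \beta_1 x + \varepsilon$
• Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \varepsilon$
• Non-linear regression (the model is not linear in its coefficients)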
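The remark that "linear" refers to the coefficients can be illustrated with a small Python sketch. It fits a quadratic in x by ordinary least squares, treating 1, x and x^2 as three regressors. The helper `fit_least_squares` and the data are made up for this illustration (they are not course material or Statgraphics output), and the normal equations are solved directly only because the example is tiny.

```python
def fit_least_squares(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    p = len(columns)
    # Build the augmented matrix [X'X | X'y].
    a = []
    for r in range(p):
        row = [sum(u * v for u, v in zip(columns[r], columns[c])) for c in range(p)]
        row.append(sum(u * v for u, v in zip(columns[r], y)))
        a.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[r][p] / a[r][r] for r in range(p)]


# Made-up, noise-free data generated from y = 1 + 2x + 3x^2 (assumption).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1 + 2 * xi + 3 * xi ** 2 for xi in x]

# The regressors are 1, x and x^2: non-linear in x, linear in the coefficients.
columns = [[1.0] * len(x), x, [xi ** 2 for xi in x]]
coef = fit_least_squares(columns, y)
print(coef)
```

Because the data are noise-free, the fitted coefficients should come out close to 1, 2 and 3, up to floating-point error.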
Linear regression
Model: $Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \varepsilon_i$
Assumptions:
• the model is linear (+ enough terms)
• the $\varepsilon_i$'s are normally distributed with $\mu = 0$ and variance $\sigma^2$
• the $\varepsilon_i$'s are independent.
Specific heat
• specific heat of vapour at constant pressure as a function of temperature
• data set from Perry’s Chemical Engineers’ Handbook
• thermodynamic theory suggests that a quadratic relation between temperature and specific heat usually suffices:
$C_p = \beta_0 + \beta_1 T + \beta_2 T^2 + \varepsilon$
Scatter plot of specific heat data
[Figure: plot of Cp vs T; T axis from 250 to 400, Cp axis from 1800 to 2200]
Regression output specific heat data
Polynomial Regression Analysis
-----------------------------------------------------------------------------
Dependent variable: Cp
-----------------------------------------------------------------------------
                             Standard        T
Parameter    Estimate        Error           Statistic    P-Value
-----------------------------------------------------------------------------
CONSTANT     3590.36         76.3041         47.0533      0.0000
T            -12.1386        0.454369        -26.7153     0.0000
T^2          0.0213415       0.000670762     31.8169      0.0000
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source           Sum of Squares    Df    Mean Square    F-Ratio    P-Value
-----------------------------------------------------------------------------
Model            169252.0          2     84626.2        6227.13    0.0000
Residual         285.388           21    13.5899
-----------------------------------------------------------------------------
Total (Corr.)    169538.0          23
R-squared = 99.8317 percent
R-squared (adjusted for d.f.) = 99.8156 percent
Standard Error of Est. = 3.68645
Mean absolute error = 2.94042
Durbin-Watson statistic = 0.310971 (P=0.0000)
Lag 1 residual autocorrelation = 0.640511
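As a reading aid, the derived quantities in this output can be recomputed from the sums of squares and degrees of freedom. The Python sketch below uses only numbers printed in the output above; the variable names are ours, and small deviations from the printed values are expected because the printed sums of squares are rounded.

```python
import math

# Values copied from the Analysis of Variance table above.
ss_model, df_model = 169252.0, 2
ss_resid, df_resid = 285.388, 21
ss_total, df_total = 169538.0, 23

ms_model = ss_model / df_model          # mean square of the model
ms_resid = ss_resid / df_resid          # mean square of the residuals
f_ratio = ms_model / ms_resid           # F-ratio for significance of the model

r_squared = 1 - ss_resid / ss_total                   # fraction of variation explained
r_squared_adj = 1 - ms_resid / (ss_total / df_total)  # adjusted for degrees of freedom
std_error_est = math.sqrt(ms_resid)                   # standard error of the estimate

print(f"F = {f_ratio:.2f}")
print(f"R-squared = {100 * r_squared:.4f} percent")
print(f"R-squared (adj.) = {100 * r_squared_adj:.4f} percent")
print(f"Standard error of est. = {std_error_est:.5f}")
```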
Issues in regression output
• significance of model
• significance of individual regression parameters
• residual plots:
– normality (density trace, normal probability plot)
– constant variance (against predicted values + each independent
variable)
– model adequacy (against predicted values)
– outliers
– independence
• influential points
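One common numerical check for the independence item is the Durbin-Watson statistic, which Statgraphics reports in its regression output. A minimal Python sketch of its definition follows; the function name and the residual sequences are made up for illustration. Values near 2 suggest no lag-1 autocorrelation, while values near 0 point to positive autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positive autocorrelation).
drifting = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical residuals that alternate in sign (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
print(dw_drifting, dw_alternating)
```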
Residual plot specific heat data
This behaviour is visible in the plot of the fitted line only after rescaling!
[Figure: residual plot; Studentized residual (axis from -3.8 to 4.2) against predicted Cp (axis from 1800 to 2200)]
Plot of fitted quadratic model for specific heat data
[Figure: plot of fitted model; Cp (axis from 1800 to 2200) against T (axis from 250 to 450)]
Conclusion regression models for specific heat data
• we need a third-order model (polynomial of degree 3)
• be careful with extrapolation
• original data set contains influential points
• original data set contains potential outliers
Analysis of variance
• name refers to mathematical technique, not to
goal
• comparison of means (!!) using variances
(extension of t-test to more than 2 samples)
• samples usually are groups of measurements with
constant factor settings
Example: ANOVA
production of yarns: influence of fibre composition on
breaking tension
simplification:
one factor: % cotton
fixed factor levels: 15%, 20%, 25%, 30%, 35%
experimental design: produce on the same machine 5
threads of each type of fibre composition in random
order
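Statistical setting
Basic model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
• $\mu$: overall mean
• $\alpha_i$: influence of factor level $i$, with $i = 1, 2, \ldots, k$
• $\varepsilon_{ij}$: error term, normal with mean 0 and variance $\sigma^2$, independent
• replications: $j = 1, 2, \ldots, n$
Basic hypotheses:
H0: $\alpha_i = 0$ for all $i$
H1: $\alpha_i \neq 0$ for at least one $i$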
Expectation under H0 (= no effect of factor level)
• spread of observations with respect to group means: chance
• spread of group means with respect to overall mean: chance
Expectation under H1
• spread of observations with respect to group means: chance
• spread of group means with respect to overall mean: systematic (in addition to chance)
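Illustration of group means
[Figure: observations in three groups, with group means $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ and overall mean $\bar{y}_{..}$]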
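Group means versus overall mean
[Figure: group means $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ and overall mean $\bar{y}_{..}$; for an observation $y_{3j}$ in group 3 the deviation from the overall mean splits as $y_{3j} - \bar{y}_{..} = (\bar{y}_{3.} - \bar{y}_{..}) + (y_{3j} - \bar{y}_{3.})$]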
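Conclusion
Comparison of both spreads yields an indication for H0 vs H1.
Spreads are converted into sums of squares:
$\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$
total = treatment (between groups) + rest (within groups)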
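The split of the total sum of squares into a between-groups part and a within-groups part can be checked numerically. The Python sketch below does this for a small made-up data set with k = 3 groups of n = 4 observations; the group labels and values are assumptions chosen only for illustration.

```python
# Hypothetical measurements: k = 3 groups, n = 4 replications each.
groups = {
    "A": [10.0, 12.0, 11.0, 11.0],
    "B": [14.0, 15.0, 13.0, 14.0],
    "C": [9.0, 8.0, 10.0, 9.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Total: spread of all observations around the overall mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_values)
# Treatment (between groups): spread of group means around the overall mean.
ss_between = sum(
    len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items()
)
# Rest (within groups): spread of observations around their own group mean.
ss_within = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)

print(ss_total, ss_between, ss_within)
```

Up to rounding, `ss_total` equals `ss_between + ss_within`, which is the identity stated above.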
Mean Sums of Squares
Sums of squares differ with respect to the number of contributions.
For a fair comparison: divide by degrees of freedom.
• we expect under H0: MSbetween ≈ MSwithin
• we expect under H1: MSbetween >> MSwithin
Summary in ANOVA table.
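A minimal Python sketch of how the entries of a one-way ANOVA table follow from the data, assuming a balanced design with k groups of n observations. The function `one_way_anova` and the data are hypothetical; the P-value is left out because it needs the F distribution, which a statistics package such as Statgraphics supplies.

```python
def one_way_anova(groups):
    """Return degrees of freedom, mean squares and F for balanced one-way data."""
    k = len(groups)
    n = len(groups[0])
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]

    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    df_between = k - 1          # k group means, one overall mean estimated
    df_within = k * (n - 1)     # n - 1 per group
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Hypothetical data: 3 factor levels, 4 replications each.
table = one_way_anova([
    [10.0, 12.0, 11.0, 11.0],
    [14.0, 15.0, 13.0, 14.0],
    [9.0, 8.0, 10.0, 9.0],
])
print(table)
```

A large F-ratio means that MSbetween is much larger than MSwithin, which is the pattern expected under H1.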
Completely Randomized One-factor Design
Experiment in which one factor is varied over k levels.
At each level n measurements are taken.
The order of all measurements is random.
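A random run order for such a design can be generated as in the Python sketch below. It uses the factor levels and replication count from the yarn example (5 cotton percentages, 5 threads each); the seed is an arbitrary assumption, set only to make the order reproducible.

```python
import random

levels = [15, 20, 25, 30, 35]   # % cotton, the k = 5 factor levels
n = 5                           # measurements per level

# One entry per run: each level appears n times.
runs = [level for level in levels for _ in range(n)]

rng = random.Random(2024)       # arbitrary seed (assumption)
rng.shuffle(runs)               # randomize the order of all k * n runs

for run_number, level in enumerate(runs, start=1):
    print(f"run {run_number:2d}: {level}% cotton")
```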
Multiple comparisons
• ANOVA only indicates whether there are significantly different
group means
• ANOVA does not indicate which groups have different means
(although we may construct confidence intervals for differences)
• various methods exist for correctly performing pairwise
comparisons:
– LSD (Least Significant Difference) method
– HSD (Honestly Significant Difference) method
– Duncan
– Newman – Keuls
– ...
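Why pairwise comparisons need special methods can be made concrete with a rough calculation. The Python sketch below counts the pairs among k = 5 groups and estimates the chance of at least one false positive if each pair is tested at level 0.05. It assumes, for simplicity only, that the pairwise tests are independent, which is not exactly true in practice.

```python
from math import comb

k = 5            # number of groups
alpha = 0.05     # significance level per pairwise comparison

pairs = comb(k, 2)                         # number of pairwise comparisons
familywise = 1 - (1 - alpha) ** pairs      # chance of at least one false positive

print(pairs, round(familywise, 3))
```

Under this simplification the overall error rate is far above 0.05, which is the motivation for the adjusted procedures listed above.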
Randomized one-factor block design
Experiment with one factor and observations in blocks.
Blocks are levels of a noise factor.
In each block all treatments occur equally often; randomization within blocks.
Example
testing method for material hardness:
[Figure: a force presses a pressure pin (tip) into a strip of testing material]
practical problem: there are 4 types of pressure pins; do these yield the same results?
Experimental design 1
pin 1: testing strips 1, 2, 3, 4
pin 2: testing strips 5, 6, 7, 8
pin 3: testing strips 9, 10, 11, 12
pin 4: testing strips 13, 14, 15, 16
Problem: if the measurements of strips 5 through 8 differ, is
this caused by the strips or by pin 2?
Experimental design 2
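Take 4 strips and measure each pressure pin once on every strip, in random order:
strip 1: pressure pins in order 1, 3, 2, 4
strip 2: pressure pins in order 1, 4, 3, 2
strip 3: pressure pins in order 4, 3, 2, 1
strip 4: pressure pins in order 2, 3, 1, 4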
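The randomization within blocks can be sketched in Python as follows: every strip (block) receives all four pressure pins once, in an independently shuffled order. The seed is an arbitrary assumption, and the resulting orders will generally differ from those on the slide.

```python
import random

pins = [1, 2, 3, 4]      # treatments: pressure pins
strips = [1, 2, 3, 4]    # blocks: testing strips

rng = random.Random(1)   # arbitrary seed (assumption)

# For each strip, draw a random permutation of all pins.
design = {strip: rng.sample(pins, k=len(pins)) for strip in strips}

for strip, order in design.items():
    print(f"strip {strip}: pins in order {order}")
```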
Blocking
Advantage of blocked experimental design 2: differences between strips are filtered out
Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
• $\alpha_i$: factor (pressure pin)
• $\beta_j$: block effect (strip)
• $\varepsilon_{ij}$: error term
• Primary goal: reduction of the error term
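How blocking reduces the error term can be seen by splitting the sums of squares. The Python sketch below uses made-up hardness readings for 4 pins on 4 strips (not real measurements) and compares the error sum of squares of the block model with the within-group sum of squares that a one-way analysis ignoring strips would use.

```python
# Rows: pressure pins (treatments); columns: strips (blocks). Made-up values.
y = [
    [5.0, 6.0, 8.0, 9.0],
    [6.0, 7.0, 9.0, 10.0],
    [4.0, 6.0, 7.0, 9.0],
    [7.0, 8.0, 10.0, 11.0],
]
k = len(y)       # number of pins
b = len(y[0])    # number of strips
cells = [(i, j) for i in range(k) for j in range(b)]

grand = sum(y[i][j] for i, j in cells) / (k * b)
pin_means = [sum(row) / b for row in y]
strip_means = [sum(y[i][j] for i in range(k)) / k for j in range(b)]

ss_total = sum((y[i][j] - grand) ** 2 for i, j in cells)
ss_pins = b * sum((m - grand) ** 2 for m in pin_means)
ss_strips = k * sum((m - grand) ** 2 for m in strip_means)
# Error of the block model: what is left after removing pin and strip effects.
ss_error = sum(
    (y[i][j] - pin_means[i] - strip_means[j] + grand) ** 2 for i, j in cells
)
# Within-group spread when the strips are ignored (one-way analysis).
ss_within_unblocked = sum((y[i][j] - pin_means[i]) ** 2 for i, j in cells)

print(ss_pins, ss_strips, ss_error, ss_within_unblocked)
```

The strip sum of squares is exactly the part that moves out of the error term: `ss_within_unblocked` equals `ss_strips + ss_error`, up to rounding.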
Summary
• completely randomized design
• randomized block design
• multiple comparisons
Reading material:
• Statgraphics lecture notes section 4.1 through 4.3
• http://www.acc.umu.se/~tnkjtg/chemometrics/editorial/aug2002.html