department of mathematics and computer science 2DS01 Statistics 2 for Chemical Engineering http://www.win.tue.nl/~sandro/2DS01

Upload: arron-james

Post on 11-Jan-2016



Page 2:

Lecturers

• Marko Boon

• Dr. A. Di Bucchianico

• Ir. G.D. Mooiweer

• Drs. C.M.J. Rusch-Groot

Page 3:

Important to remember

• Web site for this course: http://www.win.tue.nl/~sandro/2DS01/

• No textbook, but handouts + Powerpoint sheets through web site

• Bring a notebook (laptop) to the fourth lecture (12th of April) and to self-study

• Software:

– Statgraphics (version 5.1). If not installed, install through http://w3.tue.nl/nl/diensten/dienst_ict/organisatie/groepen/wins/campus_software/

– Java (at least version 1.4). Install through http://java.com. Java is needed to run Statlab (http://www.win.tue.nl/statlab).

Important: in order to run Statlab during the exams, security settings have to be adjusted!

Page 4:

Goals of this course

• teach students the need for a statistical basis of experimentation

• teach students statistical tools for experimentation

– design of experiments (factorial designs, optimal designs)

– analysis of experiments (ANOVA)

– use of statistical software

• give students a short introduction to recent developments

Page 5:

Week schedule

Week 1: Introduction to Analysis of Variance (ANOVA)

Week 2: Factorial designs: screening

Week 3: Factorial designs: optimisation

Week 4: Optimal experimental design and mixture designs (by A. Di Bucchianico; bring your laptop!)

Page 6:

Detailed contents of week 1

• statistics and experimentation

• short recapitulation of regression analysis

• one-way ANOVA

• one-way ANOVA with blocks

• multiple comparisons

Page 7:

Statistics and experimentation

Chemical experiments often depend on several factors (pressure, catalyst, temperature, reaction time, ...)

Two important questions:

• which factors are really important?

• what are optimal settings for important factors?

Page 8:

Use of statistical experimentation in chemical engineering

• Chemical synthesis (synthetic steps; work-up and separation; reagents, solvents, catalysts; structure, reactivity and properties, ...)

• Biotech industry (drug design, analytical biochemistry, process optimization: fermentation, purification, ...)

• Process industry (process optimization and control: yield, purity, throughput time, pollution, energy consumption; product quality and performance: material strength, warp, color, taste, odour; ...)

• ...

Page 9:

Short history of statistics and experimentation

• 1920’s - ... introduction of statistical methods in agriculture by Fisher and co-workers

• 1950’s - ... introduction in chemical engineering (Box, ...)

• 1980’s - ... introduction in Western industry of Japanese approach (Taguchi, robust design)

• 1990’s - ... combinatorial chemistry, high throughput processing

Page 10:

Link to Statistics 1 for Chemical Engineering

• introduction to measurements

– data analysis

– error propagation

• regression analysis

• use of statistical software (Statgraphics)

Page 11:

Types of regression analysis

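Linear means linear in coefficients, not linear functions!

• Simple linear regression: $Y = \beta_0 + \beta_1 x + \varepsilon$

• Multiple linear regression: $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$

• Non-linear regression: e.g. $Y = f(C; \beta_1, \beta_2) + \varepsilon$ with $f$ non-linear in $\beta_1, \beta_2$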
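The remark that "linear" refers to the coefficients can be illustrated with a small sketch: a quadratic curve is fitted with ordinary linear least squares, because the unknown coefficients enter the model linearly. Everything in the example is an assumption made for illustration: the helper names `solve_linear_system` and `fit_polynomial`, the use of the normal equations, and the made-up data points. It is not the procedure Statgraphics uses.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix; rows are copied so the inputs stay untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree=2):
    """Least-squares fit of y = b0 + b1*x + ... + bd*x^d.

    The design matrix has columns 1, x, x^2, ...; the model is a
    polynomial in x but linear in the coefficients b0..bd.
    """
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    size = degree + 1
    # Normal equations (X'X) b = X'y. Fine for this tiny example, but
    # ill-conditioned for large raw x values such as temperatures in K.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print(coefficients)
```

With noise-free data the fit should recover the generating coefficients up to rounding error. For real measurements one would normally centre or scale the independent variable first, or rely on a statistics package.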

Page 12:

Linear regression

Model: $Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \varepsilon_i$

Assumptions:

• the model is linear (+ enough terms)

• the $\varepsilon_i$'s are normally distributed with $\mu = 0$ and variance $\sigma^2$

• the $\varepsilon_i$'s are independent.

Page 13:

Specific warmth

• specific warmth (specific heat capacity, $C_p$) of vapour at constant pressure as a function of temperature

• data set from Perry’s Chemical Engineers’ Handbook

• thermodynamic theories say that a quadratic relation between temperature and specific warmth usually suffices: $C_p = \beta_0 + \beta_1 T + \beta_2 T^2$

Page 14:

Scatter plot of specific warmth data

[Figure: plot of Cp vs T]

Page 15:

Regression output specific warmth data

Polynomial Regression Analysis
Dependent variable: Cp

Parameter   Estimate     Standard Error   T Statistic   P-Value
CONSTANT    3590.36      76.3041          47.0533       0.0000
T           -12.1386     0.454369         -26.7153      0.0000
T^2         0.0213415    0.000670762      31.8169       0.0000

Analysis of Variance

Source          Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model           169252.0         2    84626.2       6227.13   0.0000
Residual        285.388          21   13.5899
Total (Corr.)   169538.0         23

R-squared = 99.8317 percent
R-squared (adjusted for d.f.) = 99.8156 percent
Standard Error of Est. = 3.68645
Mean absolute error = 2.94042
Durbin-Watson statistic = 0.310971 (P=0.0000)
Lag 1 residual autocorrelation = 0.640511
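Several summary statistics in this output follow directly from the sums of squares and degrees of freedom in the Analysis of Variance table. The sketch below recomputes them with the textbook formulas; the variable names are chosen here for illustration, and small differences in the last digits are to be expected because the printed sums of squares are rounded.

```python
from math import sqrt

# Values copied from the Analysis of Variance table in the output.
ss_model, df_model = 169252.0, 2
ss_resid, df_resid = 285.388, 21
ss_total, df_total = 169538.0, 23

ms_model = ss_model / df_model          # mean square of the model
ms_resid = ss_resid / df_resid          # mean square of the residuals
f_ratio = ms_model / ms_resid           # F-ratio for model significance

# R-squared: fraction of the total variation explained by the model.
r_squared = 100 * (1 - ss_resid / ss_total)
# Adjusted R-squared compares mean squares instead of sums of squares.
r_squared_adj = 100 * (1 - ms_resid / (ss_total / df_total))
# Standard error of the estimate: square root of the residual mean square.
std_error_est = sqrt(ms_resid)

print(f"MS residual    = {ms_resid:.4f}")
print(f"F-ratio        = {f_ratio:.2f}")
print(f"R-squared      = {r_squared:.4f} percent")
print(f"R-squared adj. = {r_squared_adj:.4f} percent")
print(f"Std. error     = {std_error_est:.5f}")
```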

Page 16:

Issues in regression output

• significance of model

• significance of individual regression parameters

• residual plots:

– normality (density trace, normal probability plot)

– constant variance (against predicted values + each independent variable)

– model adequacy (against predicted values)

– outliers

– independence

• influential points
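For the independence check, one commonly reported number is the Durbin-Watson statistic, which compares successive residuals. The sketch below uses the standard definition (sum of squared successive differences divided by the sum of squared residuals); the function name and the residual values are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order."""
    numerator = sum((later - earlier) ** 2
                    for earlier, later in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly from negative to positive:
# neighbouring residuals resemble each other (positive autocorrelation).
drifting = [-2.0, -1.0, 0.0, 1.0, 2.0]
# Hypothetical residuals that flip sign every observation
# (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]

print(durbin_watson(drifting))
print(durbin_watson(alternating))
```

Values near 2 are consistent with independent residuals, values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation.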

Page 17:

Residual plot specific warmth data

[Figure: residual plot, studentized residual versus predicted Cp]

This behaviour is visible in the plot of the fitted line only after rescaling!

Page 18:

Plot of fitted quadratic model for specific warmth data

[Figure: plot of fitted model, Cp versus T]

Page 19:

Conclusion regression models for specific warmth data

• we need a third-order model (polynomial of degree 3)

• be careful with extrapolation

• original data set contains influential points

• original data set contains potential outliers

Page 20:

Analysis of variance

• name refers to the mathematical technique, not to the goal

• comparison of means (!!) using variances (extension of t-test to more than 2 samples)

• samples usually are groups of measurements with constant factor settings

Page 21:

Example: ANOVA

production of yarns: influence of fibre composition on breaking tension

simplification:

one factor: % cotton

fixed factor levels: 15%, 20%, 25%, 30%, 35%

experimental design: produce on the same machine 5 threads of each type of fibre composition in random order

Page 22:

Statistical setting

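Basic model: $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$

• $\mu$: overall mean

• $\alpha_i$: influence of factor level $i$, $i = 1, 2, \dots, k$

• $\varepsilon_{ij}$: error term, normal with mean 0 and variance $\sigma^2$, independent

• $j = 1, 2, \dots, n$: replications

Basic hypotheses:

H0: $\alpha_i = 0$ for all $i$

H1: $\alpha_i \neq 0$ for at least one $i$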

Page 23:

Expectation under H0 (= no effect of factor level)

• spread of observations with respect to group means: chance

• spread of group means with respect to overall mean: chance

Page 24:

Expectation under H1

• spread of observations with respect to group means: chance

• spread of group means with respect to overall mean: systematic

Page 25:

Illustration of group means

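[Figure: observations per group with group means $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ and overall mean $\bar{y}_{..}$]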

Page 26:

Group means versus overall mean

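[Figure: group means $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ and overall mean $\bar{y}_{..}$; for an observation $y_{3j}$ in group 3, the deviation from the overall mean $y_{3j} - \bar{y}_{..}$ is split into $y_{3j} - \bar{y}_{3.}$ and $\bar{y}_{3.} - \bar{y}_{..}$]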

Page 28:

Conclusion

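Comparison of both spreads yields an indication for H0 vs H1.

Spreads are converted into sums of squares:

$$\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2$$

total = treatment (between groups) + rest (within groups)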
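Why the total sum of squares splits into exactly a between-groups part and a within-groups part can be sketched in a few lines. This is the standard textbook argument, not taken from the slides; $\bar{y}_{i.}$ denotes the mean of group $i$ and $\bar{y}_{..}$ the overall mean.

```latex
\begin{align*}
y_{ij} - \bar{y}_{..}
  &= (y_{ij} - \bar{y}_{i.}) + (\bar{y}_{i.} - \bar{y}_{..}) \\
\sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2
   + n \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})^2
   + 2 \sum_{i=1}^{k} (\bar{y}_{i.} - \bar{y}_{..})
       \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.}) \\
\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})
  &= n\,\bar{y}_{i.} - n\,\bar{y}_{i.} = 0
  \quad \text{for every } i
\end{align*}
```

The last line shows that the cross term vanishes, so only the within-groups and between-groups sums of squares remain.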

Page 29:

Mean Sums of Squares

Sums of squares differ with respect to the number of contributions. For a fair comparison: divide by degrees of freedom.

• we expect under H0: MSbetween ≈ MSwithin

• we expect under H1: MSbetween >> MSwithin

Summary in ANOVA table.
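The quantities that go into such an ANOVA table can be computed by hand for a small data set. The following is a minimal sketch assuming equal group sizes; the function name `one_way_anova` and the three groups of numbers are made up for illustration and are not course data.

```python
from statistics import mean


def one_way_anova(groups):
    """Sums of squares, mean squares and F-ratio for k groups of size n."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes n observations in every group")

    overall_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]

    # Spread of group means around the overall mean (between groups).
    ss_between = n * sum((m - overall_mean) ** 2 for m in group_means)
    # Spread of observations around their own group mean (within groups).
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    ss_total = sum((y - overall_mean) ** 2 for g in groups for y in g)

    df_between, df_within = k - 1, k * (n - 1)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


# Hypothetical measurements: k = 3 factor levels, n = 3 replications.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
for name, value in table.items():
    print(f"{name:>11}: {value}")
```

A large F-ratio (MSbetween much larger than MSwithin) speaks against H0; the p-value that statistical software reports comes from the F-distribution with the two degrees of freedom shown.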

Page 30:

Completely Randomized One-factor Design

Experiment in which one factor is varied over k levels.

At each level n measurements are taken.

The order of all measurements is random.
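A random run order for such a design can be generated by listing every level n times and shuffling the whole list. The sketch below does this for the yarn example (5 cotton percentages, 5 threads each); the function name and the fixed seed are assumptions made here so that the example is reproducible.

```python
import random


def completely_randomized_order(levels, n_replicates, seed=None):
    """Return all k * n runs in one random order."""
    rng = random.Random(seed)
    # Every level appears n_replicates times before shuffling.
    runs = [level for level in levels for _ in range(n_replicates)]
    rng.shuffle(runs)
    return runs


cotton_levels = [15, 20, 25, 30, 35]   # % cotton, from the yarn example
run_order = completely_randomized_order(cotton_levels, 5, seed=1)
for run_number, level in enumerate(run_order, start=1):
    print(f"run {run_number:2d}: {level}% cotton")
```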

Page 31:

Multiple comparisons

• ANOVA only indicates whether there are significantly different group means

• ANOVA does not indicate which groups have different means (although we may construct confidence intervals for differences)

• various methods exist for correctly performing pairwise comparisons:

– LSD (Least Significant Difference) method

– HSD (Honestly Significant Difference) method

– Duncan

– Newman-Keuls

– ...
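As an illustration of the first method in the list, the sketch below applies the usual textbook form of the LSD rule for equal group sizes: two group means are declared different when their absolute difference exceeds t * sqrt(2 * MSwithin / n). The function name, the group means and the mean square are hypothetical; the critical value 2.447 is the two-sided 5% point of the t-distribution with 6 degrees of freedom, taken from a t-table and passed in as an input.

```python
from itertools import combinations
from math import sqrt


def lsd_pairs(group_means, ms_within, n, t_crit):
    """Pairwise comparisons with the Least Significant Difference rule.

    t_crit is the two-sided critical t-value for the within-groups
    degrees of freedom; it has to be looked up by the caller.
    """
    lsd = t_crit * sqrt(2 * ms_within / n)
    results = []
    indexed = enumerate(group_means, start=1)
    for (i, mean_i), (j, mean_j) in combinations(indexed, 2):
        difference = abs(mean_i - mean_j)
        results.append((i, j, difference, difference > lsd))
    return lsd, results


# Hypothetical input: 3 groups of n = 3, MSwithin = 1.0 on 6 d.f.
lsd, comparisons = lsd_pairs([2.0, 3.0, 7.0], ms_within=1.0, n=3,
                             t_crit=2.447)
print(f"LSD = {lsd:.3f}")
for i, j, difference, differs in comparisons:
    verdict = "different" if differs else "not shown to differ"
    print(f"group {i} vs group {j}: |diff| = {difference:.1f} -> {verdict}")
```

The LSD rule does not correct for the number of comparisons, which is why it is normally applied only after a significant F-test, and why methods such as HSD exist.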

Page 32:

Randomized one-factor block design

In each block all treatments occur equally often; randomization within blocks.

Experiment with one factor and observations in blocks.

Blocks are levels of a noise factor.

Page 33:

Example

testing method for material hardness:

[Figure: a force presses a pressure pin (tip) into a strip of testing material]

practical problem: there are 4 types of pressure pins; do these yield the same results?

Page 34:

Experimental design 1

Assignment of testing strips to pins:

• pin 1: strips 1, 2, 3, 4

• pin 2: strips 5, 6, 7, 8

• pin 3: strips 9, 10, 11, 12

• pin 4: strips 13, 14, 15, 16

Problem: if the measurements of strips 5 through 8 differ, is this caused by the strips or by pin 2?

Page 35:

Experimental design 2

Take 4 strips, and on each strip measure each pressure pin once (in random order).

Order of pressure pins per strip:

• strip 1: pins 1, 3, 2, 4

• strip 2: pins 1, 4, 3, 2

• strip 3: pins 4, 3, 2, 1

• strip 4: pins 2, 3, 1, 4
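Randomization within blocks means that each strip gets its own independently shuffled order of the four pins. A minimal sketch, with a hypothetical function name and a fixed seed chosen only for reproducibility:

```python
import random


def randomized_block_order(treatments, blocks, seed=None):
    """For every block, return all treatments in an independent random order."""
    rng = random.Random(seed)
    design = {}
    for block in blocks:
        order = list(treatments)   # every treatment occurs once per block
        rng.shuffle(order)         # randomization within the block
        design[block] = order
    return design


pins = [1, 2, 3, 4]
strips = ["strip 1", "strip 2", "strip 3", "strip 4"]
design = randomized_block_order(pins, strips, seed=1)
for strip, order in design.items():
    print(f"{strip}: pins in order {order}")
```

Unlike the completely randomized design, the shuffle never moves a pin from one strip to another, so every pin is tested exactly once on every strip.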

Page 36:

Blocking

Advantage of blocked experimental design 2: differences between strips are filtered out

Model: $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$

• $\alpha_i$: factor (pressure pin)

• $\beta_j$: block effect (strip)

• $\varepsilon_{ij}$: error term

• Primary goal: reduction of the error term
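How blocking reduces the error term can be seen by splitting the total sum of squares into a treatment part, a block part and a remainder. The sketch below uses the standard decomposition for a randomized block design with one observation per treatment and block; the function name and the small data table (2 pins on 3 strips) are invented for illustration.

```python
from statistics import mean


def randomized_block_sums_of_squares(table):
    """table[i][j] is the observation for treatment i in block j."""
    a = len(table)        # number of treatments
    b = len(table[0])     # number of blocks
    overall = mean(y for row in table for y in row)
    treatment_means = [mean(row) for row in table]
    block_means = [mean(table[i][j] for i in range(a)) for j in range(b)]

    ss_total = sum((y - overall) ** 2 for row in table for y in row)
    ss_treatment = b * sum((m - overall) ** 2 for m in treatment_means)
    ss_block = a * sum((m - overall) ** 2 for m in block_means)
    # What is left after removing treatment and block effects.
    ss_error = sum(
        (table[i][j] - treatment_means[i] - block_means[j] + overall) ** 2
        for i in range(a) for j in range(b)
    )
    return ss_total, ss_treatment, ss_block, ss_error


# Hypothetical hardness readings: rows are pins, columns are strips.
readings = [
    [10, 12, 14],
    [11, 15, 16],
]
ss_total, ss_treatment, ss_block, ss_error = \
    randomized_block_sums_of_squares(readings)
print("total    :", ss_total)
print("treatment:", ss_treatment)
print("block    :", ss_block)
print("error    :", ss_error)
# Without blocking, the block part would stay in the error term.
print("error if blocks are ignored:", ss_total - ss_treatment)
```

In this made-up table most of the variation sits between strips, so removing the block sum of squares leaves a much smaller error term against which to judge the pins.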

Page 37:

Summary

• completely randomized design

• randomized block design

• multiple comparisons

Reading material:

• Statgraphics lecture notes section 4.1 through 4.3

• http://www.acc.umu.se/~tnkjtg/chemometrics/editorial/aug2002.html