
Model Discrimination and Parameter Estimation for Complex Reactive Systems

Yajun Wang, Weifeng Chen, Yisu Nie, L. T. Biegler

Chemical Engineering Department, Carnegie Mellon University

Pittsburgh, PA

Overview

Introduction

How? Model Building Tools
- Direct Transcription
- Parameter Estimation - Nonlinear Programming
- Inference - NLP Sensitivity

What? Industrial Case Studies
- Solid-Liquid Reactions
- Chemical Kinetics from Spectra

Why? Process Optimization

Summary and Conclusions

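[Figure: four-stage process (stages I-IV) with initial states z_{i,I}^0 ... z_{i,IV}^0, final states z_{i,I}^f ... z_{i,IV}^f and inputs B_i, for the reaction network A + B → C, C + B → P + E, P + C → G]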

Model Building and Optimization for Complex Reactive Systems

Model Building
- Formulation of First Principles Models
- Parameter Estimation
- Model Discrimination and Validation

Control
- Optimal reference trajectories
- Real-time optimization

Operations
- Transitions
- Upsets
- Integration with logistics

Optimization Models based on Physics and Chemistry (First Principles)

Goal: establish predictive capability that extrapolates beyond observed conditions

Apply conservation laws at macroscopic and microscopic levels

Apply constitutive relationships at smallest available time/length scale.

Assess assumptions and adjust for missing information through parameter estimation and model validation

All models are wrong; some are useful. (G.E.P. Box)

ExxonMobil PROPRIETARY

Work Process for Model Development (www.eurokin.org)

- Fundamentals: thermodynamics, kinetic databases
- Microkinetic models, ab initio calculations
- Create reaction network, construct rate expressions
- Initialize parameter estimation
- Incorporate into reaction system
- Parameter estimation
- Model discrimination and testing
- Design of experiments
- Uncertainty quantification
- Experimental design and data

Decision-making for model development
- Versatile, interactive user interface
- Fast, reliable numerical tools
- Integrated data, tasks and results

Dynamic Optimization Model for Reactive Systems

- t, time
- z(t), differential variables
- y(t), algebraic variables
- tf, final time
- u(t), control variables
- p, time-independent parameters
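As a hedged sketch, a generic problem of this class in the variables above is commonly written as follows. The objective Φ, the right-hand sides f and g, the initial state z_0 and the control bounds u_L, u_U are assumed placeholders, not the specific model used in the case studies.

```latex
\begin{aligned}
\min_{u(t),\,p,\,t_f} \quad & \Phi\big(z(t_f), y(t_f), p\big) \\
\text{s.t.} \quad & \frac{dz(t)}{dt} = f\big(z(t), y(t), u(t), p\big), \quad z(0) = z_0, \\
& g\big(z(t), y(t), u(t), p\big) = 0, \\
& u_L \le u(t) \le u_U, \quad t \in [0, t_f].
\end{aligned}
```

For parameter estimation, Φ would typically be a least-squares or likelihood term in the measured states, and p would hold the kinetic parameters.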

Nonlinear Programming Problem

$$
\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad c(x) = 0, \quad x_L \le x \le x_u
$$

9

Full-space NLP Formulation for Parameter Estimation

Original Formulation

Barrier Approach

Can generalize for

As μ → 0, x*(μ) → x* (Fiacco and McCormick, 1968)
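The limit x*(μ) → x* can be illustrated on a one-variable toy problem. The sketch below assumes the hypothetical problem min (x + 1)^2 s.t. x ≥ 0 (solution x* = 0) and the logarithmic barrier (x + 1)^2 - μ ln x; it is not taken from the case studies.

```python
import math

def barrier_minimizer(mu):
    """Minimizer of (x + 1)**2 - mu*ln(x) over x > 0.

    Setting the derivative 2*(x + 1) - mu/x to zero gives
    2*x**2 + 2*x - mu = 0, whose positive root is returned.
    """
    return (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0

# Toy problem (hypothetical): min (x + 1)**2  s.t.  x >= 0, with solution x* = 0.
mus = [10.0 ** (-k) for k in range(0, 9, 2)]  # 1, 1e-2, ..., 1e-8
path = [barrier_minimizer(mu) for mu in mus]
for mu, x in zip(mus, path):
    print(f"mu = {mu:.0e}   x*(mu) = {x:.3e}")
```

Each barrier solution stays strictly inside the bound, and the sequence approaches the constrained solution as μ is reduced.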

10

Solution of the Barrier Problem - IPOPT

Newton Directions (KKT System)

$$
\nabla f(x) + A(x)\lambda - v = 0, \qquad XVe = \mu e, \qquad c(x) = 0
$$

Solve Reducing the System

What are the Benefits for Parameter Estimation?

11

Inertia Corrections for Factorization of KKT Matrix

Modify the KKT matrix to preserve the correct inertia at each Newton iteration:

$$
\begin{bmatrix} W_k + \Sigma_k + \delta_1 I & A_k \\ A_k^T & -\delta_2 I \end{bmatrix}
$$

- δ1: correct inertia to guarantee a descent direction (SSOSC)
- δ2: correct rank-deficient A_k (LICQ)

KKT matrix factored by sparse L^T B L factorization

Solution with δ1 = 0: primal variables unique

Solution with δ1 = δ2 = 0: primal and dual variables unique

Estimation result with δ1 = 0: unique (observable) parameter estimates, necessary for a predictive model

Reduced Hessian available for confidence regions
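A minimal sketch of an inertia correction loop on a tiny dense KKT matrix is given below. It counts pivot signs of an unpivoted LDL^T factorization (valid only when no pivot vanishes, which is assumed here) instead of the sparse symmetric indefinite factorization used in practice. The matrices W and A, the starting value and the growth factor of δ1 are hypothetical choices.

```python
def inertia(matrix, tol=1e-12):
    """Count (positive, negative) pivots of an unpivoted LDL^T factorization.

    By Sylvester's law of inertia these counts equal the numbers of positive
    and negative eigenvalues, provided no pivot vanishes (assumed here).
    """
    a = [row[:] for row in matrix]
    n = len(a)
    pos = neg = 0
    for k in range(n):
        pivot = a[k][k]
        if abs(pivot) <= tol:
            raise ZeroDivisionError("zero pivot; a pivoted factorization is needed")
        pos, neg = (pos + 1, neg) if pivot > 0 else (pos, neg + 1)
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return pos, neg

def kkt_matrix(W, A, delta1, delta2):
    """Assemble [[W + delta1*I, A], [A^T, -delta2*I]] with A of size n x m."""
    n, m = len(W), len(A[0])
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = W[i][j] + (delta1 if i == j else 0.0)
        for j in range(m):
            K[i][n + j] = A[i][j]
            K[n + j][i] = A[i][j]
    for j in range(m):
        K[n + j][n + j] = -delta2
    return K

def correct_inertia(W, A, first=1e-2, growth=4.0, max_tries=20):
    """Increase delta1 until the KKT matrix has n positive and m negative pivots."""
    n, m = len(W), len(A[0])
    delta1 = 0.0
    for _ in range(max_tries):
        if inertia(kkt_matrix(W, A, delta1, 0.0)) == (n, m):
            return delta1
        delta1 = first if delta1 == 0.0 else growth * delta1
    raise RuntimeError("inertia could not be corrected")

# Hypothetical example: negative curvature in the null space of the constraint.
W = [[1.0, 0.0], [0.0, -1.0]]
A = [[1.0], [0.0]]
delta1 = correct_inertia(W, A)
print("delta1 =", delta1, "inertia =", inertia(kkt_matrix(W, A, delta1, 0.0)))
```

In this example the uncorrected matrix has one positive and two negative pivots; the loop stops once δ1 is large enough to make the reduced Hessian positive.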

Sensitivity of KKT Conditions

Analyze sensitivity of estimates wrt changes in data

At solution we have linearized optimality conditions

Introduce perturbations and Obtain Covariance of parameters

http://www.cheme.cmu.edu/

For normal, unbiased distributions, linear models and known V, this probability follows a χ² distribution, so that the region can be defined by:

$$
(\theta_{\text{true}} - \theta^*)^T V^{-1} (\theta_{\text{true}} - \theta^*) \le c(\alpha)
$$

c(α) is the χ² value for confidence level α with n degrees of freedom.

Elliptical confidence regions are correct if the model is linear or for small levels of confidence, α.

Elliptical confidence regions are commonly used for parameter screening; nonlinear confidence regions are more expensive.

[Figure: nested elliptical confidence regions (90%, 95%, 99%) centered at θ*, aligned with the principal axes of V]
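The elliptical region test can be sketched for two parameters, where the χ² quantile has the closed form c(α) = -2 ln(1 - α). The estimate θ* and covariance V below are hypothetical values chosen for illustration.

```python
import math

def chi2_threshold_2dof(alpha):
    """Chi-square quantile for 2 degrees of freedom: c(alpha) = -2 ln(1 - alpha)."""
    return -2.0 * math.log(1.0 - alpha)

def quadratic_form_2x2(theta, theta_star, V):
    """(theta - theta*)^T V^{-1} (theta - theta*) for a 2x2 covariance V."""
    d0, d1 = theta[0] - theta_star[0], theta[1] - theta_star[1]
    (a, b), (c, d) = V
    det = a * d - b * c
    # Explicit inverse of a 2x2 matrix: [[d, -b], [-c, a]] / det
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det

def in_confidence_region(theta, theta_star, V, alpha):
    """True if theta lies inside the elliptical region at confidence level alpha."""
    return quadratic_form_2x2(theta, theta_star, V) <= chi2_threshold_2dof(alpha)

# Hypothetical estimate and covariance, for illustration only.
theta_star = (1.0, 2.0)
V = ((0.04, 0.01), (0.01, 0.09))
print(chi2_threshold_2dof(0.95))
print(in_confidence_region((1.2, 2.3), theta_star, V, 0.95))
```

For more than two parameters the quantile has no such closed form and would come from a statistics library.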

Model Discrimination

Postulate first-principles models, Mj
- Rate-controlling mechanisms? Slow reactions?
- What are the competing models/mechanisms?

Occam's Razor: balance model simplicity with best fit

Case Study I: Solid-Liquid Reactions (Y. Wang)

- Surface reaction, dissolution, diffusion - reaction on solid or in liquid phase?
- Different particle shapes and sizes - reaction surface?
- Product effects - products growing on surface or breaking off?

[Figure: batch process flowsheet with preparation and reaction steps: solvent, solid W and liquid X feeds, jacketed reactor with agitator, vent, cooling water inlet and outlet, reactor discharge]

W(s) + X(l) → Y(s/l) + Z(s)

Shrinking particle model

- Liquid reactant diffuses through the fluid film onto the particle surface (bulk concentration c_bl, surface concentration c_sl)
- Solid-liquid reaction on the particle surface
- Solid product breaks off from the reaction surface

[Equations: solid and liquid mole balances dN/dt of the form (surface area) × (reaction rate), with Arrhenius temperature dependence]

Surface reaction rate depends on the surface concentration of the liquid reactant.

Dissolution model

- Solid particles dissolve into the solvent
- Liquid-liquid reaction
- Products precipitate into the solid phase

[Equations: solid and liquid mole balances dN/dt of the form (surface area) × (dissolution rate), with Arrhenius temperature dependence]

Rate independent of surface concentration of the liquid reactant.

Batch Reactor Model

Surface concentration of liquid reactant

Model indicating factor F
- F = 0: dissolution model
- F > 0: shrinking particle model

Lots of Data - Too Few Informative Measurements (NS = 9 Data Batches)

- Jacket temperatures (Tcw)
- Inlet flowrates (Fc)
- Reactor weight (WR)
- Reactor temperatures (TR)
- Endpoint concentrations (Ci(tf))

Errors in Variables Measured (EVM)

Measured output errors: reactor temperatures, end-point concentrations

Measured input errors: jacket temperatures, weights and flowrates

+ Simultaneous parameter estimation and model solution
+ Better than output data fitting
- Additional inputs as decision variables
- EVM has 15771 variables and 13830 equation constraints
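The idea of treating measured inputs as decision variables can be sketched on a scalar toy model y = θx, where both x and y carry measurement error. The model, the data and the standard deviations below are hypothetical. For a fixed θ the optimal input estimates have a closed form, so only θ is searched numerically.

```python
def evm_objective(theta, xs, ys, sigma_x, sigma_y):
    """EVM objective for the toy model y = theta * x.

    For fixed theta, the best input estimate x_hat has a closed form,
    so the inputs are eliminated and only theta remains.
    """
    wx, wy = 1.0 / sigma_x**2, 1.0 / sigma_y**2
    total = 0.0
    for x, y in zip(xs, ys):
        x_hat = (wx * x + wy * theta * y) / (wx + wy * theta**2)
        total += wx * (x - x_hat) ** 2 + wy * (y - theta * x_hat) ** 2
    return total

def golden_section(fun, lo, hi, tol=1e-10):
    """Minimize a scalar function assumed unimodal on [lo, hi]."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if fun(c) < fun(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Hypothetical data: inputs and outputs both carry measurement error.
xs = [1.02, 1.97, 3.05, 3.96]
ys = [2.05, 3.90, 6.10, 8.05]
theta_evm = golden_section(lambda th: evm_objective(th, xs, ys, 0.05, 0.1), 0.0, 5.0)
print(theta_evm)
```

In the full problem the input estimates cannot be eliminated in closed form, which is why they appear as additional variables in the NLP.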

Estimation results

Estimation results of the full model by EVM:
- Large reliability factors for parameters D and F
- Parameter UA is estimated at its upper bound

[Figure: reactor temperature, scaled temperature versus scaled time, model prediction compared with data]

Estimation Quality Analysis

D: diffusion coefficient
- Data are mainly temperatures, with less information on bulk and surface concentrations
- Set Cb,X = Cs,X

F: model indicating factor
- Zero is contained in the confidence interval
- Set F = 0 (dissolution model)

Heat transfer coefficient
- UA: the largest value of the heat transfer coefficient
- UB: the highest value of temperature
- UC: the shape/spread
- Fix parameter UA
- Use a linearly temperature-dependent heat transfer coefficient

Posterior Probability Share

24

According to previous analysis, the batch reactor model can be simplified step by step and 5 candidate models are generated

Estimations are conducted for all candidate models and posterior probability shares are evaluated by

Model 4 has the largest posterior probability share and requires the least computational time

25

Estimation Results of Selected Model

Model 4
- Fixed parameters D, F and UA
- Nonlinear heat transfer coefficient

Results
- Large reliability factors are avoided
- Estimability of the kinetic parameters is enhanced, with even smaller variances

Estimation Results of Selected Model

[Figure: Model 4 data fitting, with panels for the measured input and the measured outputs]

Model Cross Validation

9-fold cross validation:
a) Randomly split the measured output data into 9 sets
b) In each iteration, estimate parameters from 8 sets of data and use the remaining set for model validation
c) Repeat step b) 9 times, so that each dataset is used once in model validation and 8 times in estimation
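Steps a) to c) can be sketched as follows. The estimator here is a stand-in (a sample mean) and the measurements are hypothetical; in the case study each iteration would solve the full parameter estimation problem instead.

```python
import random

def k_fold_splits(items, k, seed=0):
    """Randomly split items into k sets and yield (estimation, validation) pairs."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for held_out in range(k):
        estimation = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield estimation, folds[held_out]

def estimate(data):
    """Stand-in estimator (sample mean); a real study would solve the estimation NLP."""
    return sum(data) / len(data)

# Hypothetical measurements, for illustration only.
measurements = [0.101, 0.098, 0.104, 0.097, 0.102, 0.099, 0.103, 0.100, 0.096,
                0.105, 0.098, 0.101, 0.099, 0.102, 0.100, 0.097, 0.103, 0.101]
estimates = []
for estimation_set, validation_set in k_fold_splits(measurements, k=9):
    theta = estimate(estimation_set)
    error = sum((y - theta) ** 2 for y in validation_set) / len(validation_set)
    estimates.append(theta)
    print(f"theta = {theta:.4f}, validation MSE = {error:.2e}")
print("average of 9 estimates:", sum(estimates) / len(estimates))
```

Comparing the per-iteration estimates with their average and with the estimate from all data indicates how sensitive the parameters are to the choice of data.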

[Figure: estimates of parameter A and parameter UB over the 9 cross-validation iterations, showing the estimated value in each iteration, the average of the 9 estimations, and the estimated value using all data]

Case Study II: Measured Spectra and Reaction Models (Chen, B., 2016)

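Measurement Model

$$
D = C S^T + E, \qquad D \in \mathbb{R}^{ntp \times nwp}, \quad C \in \mathbb{R}^{ntp \times nc}, \quad S \in \mathbb{R}^{nwp \times nc}
$$

$$
E \in \mathbb{R}^{ntp \times nwp}, \qquad E_{i,j} \sim N(0, \delta^2)
$$

Noise sources: instrument precision, instrument ageing, background noise

Reaction Model

$$
\frac{dc(t)}{dt} = f\big(c(t), y(t), \theta\big), \qquad g\big(c(t), y(t)\big) = 0
$$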

Beer-Lambert Law (D = C S^T)

[Figure: real spectra; the absorbance matrix (time × wavelength) equals the concentration matrix (columns c1, c2, ...) times the pure-component spectra (s1, s2, ...)]

$$
d(t_i, \lambda_j) = c_1(t_i)\, s_1(\lambda_j) + c_2(t_i)\, s_2(\lambda_j) + \dots + c_{nc}(t_i)\, s_{nc}(\lambda_j)
$$

- UV-Visible: 190-700 nm
- Near Infrared: 700-3000 nm
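A small sketch of the bilinear Beer-Lambert model D = C S^T is shown below for two absorbing species. The concentration profiles and pure-component spectra are hypothetical numbers, and measurement noise is left out.

```python
def absorbance_matrix(C, S):
    """Return D = C S^T, with C of size ntp x nc and S of size nwp x nc."""
    return [[sum(c_k * s_k for c_k, s_k in zip(c_row, s_row)) for s_row in S]
            for c_row in C]

# Hypothetical two-component example: 3 time points, 4 wavelengths.
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
S = [[0.2, 0.0],
     [0.8, 0.1],
     [0.3, 0.9],
     [0.0, 0.4]]
D = absorbance_matrix(C, S)
for row in D:
    print(row)
```

Each row of D is the spectrum at one time point, a concentration-weighted sum of the pure-component spectra.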

Multivariate Curve Resolution (MCR)

Typical Current Methods
- Non-iterative methods: Window Factor Analysis (WFA), Subwindow Factor Analysis (SFA)
- Iterative approaches: Iterative Target Transformation Factor Analysis (ITTFA), Multivariate Curve Resolution Alternating Least Squares (MCR-ALS)

Model-Free MCR Combined with Model-Based Kinetics

Goals
- Develop a method for simultaneous estimation of concentrations and kinetic parameters directly from spectra
- Deconvolute instrument noise and system disturbances
- Obtain confidence regions for estimated parameters

Reaction Model with Disturbances (SDEs)

Deterministic reaction model:

$$
\frac{dc(t)}{dt} = f\big(c(t), y(t), \theta\big), \qquad g\big(c(t), y(t)\big) = 0
$$

Reaction model with disturbances:

$$
dc(t) = f\big(c(t), y(t), \theta\big)\,dt + dW(t), \qquad W(t) = [W_1, W_2, \dots, W_{nc}]^T, \qquad g\big(c(t), y(t)\big) = 0
$$

W_k: standard Brownian motion or Wiener process on [0, T]
(a) W_k(0) = 0 with probability 1
(b) For 0 ≤ s < t ≤ T: W_k(t) − W_k(s) ~ sqrt(t − s) N(0, 1)
(c) For 0 ≤ s < t < u < v ≤ T: W_k(t) − W_k(s) is independent of W_k(v) − W_k(u)

[Figure: sample path of a Wiener process]
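Properties (a) to (c) translate directly into a sampling scheme, and an Euler discretization turns the SDE into a recursion. The sketch below assumes a hypothetical first-order decay dc = -rate·c dt + sigma dW; the noise scale sigma, the rate constant and the step size are illustrative assumptions.

```python
import math
import random

def wiener_path(n_steps, dt, rng):
    """Sample W(0), W(dt), ..., W(n_steps*dt) with W(0) = 0."""
    path = [0.0]
    for _ in range(n_steps):
        # Increment W(t + dt) - W(t) ~ sqrt(dt) * N(0, 1), independent of the past.
        path.append(path[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path

def euler_maruyama(c0, rate, sigma, n_steps, dt, rng):
    """Integrate dc = -rate*c dt + sigma dW for a first-order decay (toy model)."""
    c = [c0]
    for _ in range(n_steps):
        dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        c.append(c[-1] - rate * c[-1] * dt + sigma * dW)
    return c

rng = random.Random(42)
W = wiener_path(1000, 0.01, rng)
c = euler_maruyama(c0=1.0, rate=0.5, sigma=0.02, n_steps=1000, dt=0.01, rng=rng)
print(W[-1], c[-1])
```

With sigma set to zero the recursion reduces to the explicit Euler method for the deterministic reaction model.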

SDE Model Description

$$
\begin{aligned}
dc(t) &= f\big(c(t), y(t), \theta\big)\,dt + dW(t), \qquad g\big(c(t), y(t)\big) = 0 \\
C_{i,j} &= c_j(t_i), \quad i = 1, \dots, ntp; \; j = 1, \dots, nc \\
D &= C S^T + E
\end{aligned}
$$

- Convert stochastic DAEs to DAEs through Euler discretization
- Recover an independent Wiener process with (small) Gaussian noise
- Compare to exact DAE solution to extract linear perturbation terms on disturbance distribution
- Simplify Jacobian terms
- Apply maximum likelihood principles

Problem Transformation

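Original Problem Description

$$
\begin{aligned}
dc(t) &= f\big(c(t), y(t), \theta\big)\,dt + dW(t), \qquad g\big(c(t), y(t)\big) = 0 \\
C_{i,j} &= c_j(t_i), \quad i = 1, \dots, ntp; \; j = 1, \dots, nc \\
D &= C S^T + E
\end{aligned}
$$

Transformed problem, with state disturbances ω_k and measurement errors ε_{i,j}:

$$
\begin{aligned}
\frac{dz(t)}{dt} &= f\big(z(t), y(t), \theta\big), \qquad g\big(z(t), y(t)\big) = 0 \\
c_k(t_i) &= z_k(t_i) + \omega_k(t_i) \\
D_{i,j} &= \sum_{k=1}^{nc} c_k(t_i)\, s_k(\lambda_j) + \varepsilon_{i,j}, \quad i = 1..ntp, \; j = 1..nwp, \; k = 1..nc
\end{aligned}
$$

Change of variables F(c) = c − z = ω:

$$
p(c \mid z) = p(\omega) \left| \frac{\partial F}{\partial c} \right|, \qquad \frac{\partial F}{\partial c} = 1 \;\Rightarrow\; p(c \mid z) = p(\omega)
$$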

Problem Transformation

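Measurement Independence Assumption

$$
p(D - C S^T = E) = \prod_{i=1}^{ntp} \prod_{j=1}^{nwp} p(\varepsilon_{i,j})
$$

Disturbance and Measurement Independence Assumption

$$
p(D - C S^T = E,\; c - z = \omega \mid \theta) = \prod_{i=1}^{ntp} \prod_{j=1}^{nwp} p(\varepsilon_{i,j}) \prod_{i=1}^{ntp} \prod_{k=1}^{nc} p\big(\omega_k(t_i)\big)
$$

Maximum Likelihood Principle with Assumed Variance

$$
\begin{aligned}
\min \quad & \sum_{i=1}^{ntp} \sum_{j=1}^{nwp} \frac{\Big( D_{i,j} - \sum_{k=1}^{nc} c_k(t_i)\, s_k(\lambda_j) \Big)^2}{\delta^2} + \sum_{i=1}^{ntp} \sum_{k=1}^{nc} \frac{\big( c_k(t_i) - z_k(t_i) \big)^2}{\sigma_k^2} \\
\text{s.t.} \quad & \frac{dz(t)}{dt} = f\big(z(t), y(t), \theta\big), \\
& g\big(z(t), y(t)\big) = 0, \\
& C \ge 0, \quad S \ge 0
\end{aligned}
$$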

Variance Initialization/Estimation Roadmap

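[Flowchart: Solve TP1, convergence test (yes/no), Solve TP2, Solve TP3, variance equations, yielding the variances σ_k², δ² and the profiles z_k(t_i), c_k(t_i), s_k(λ_j)]

Apply optimality conditions to the NLP for c_k(t_i), s_k(λ_j) and z_k(t_i), and substitute to get transformed problems:
- P1 for σ_k² → TP1
- P2 for ν_j² = Σ_k s_k(λ_j) σ_k² + δ² → TP2
- P3 for δ² → TP3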


Reaction scheme:

- SA + AA → ASA + HA (k1)
- ASA + AA → ASAA + HA (k2)
- ASAA + H2O → ASA + HA (k3)
- AA + H2O → 2 HA (k4)
- SA(s) → SA(l) (kd)
- ASA(l) → ASA(s) (kc)

$$
\begin{aligned}
r_1 &= k_1\, c_{SA}(t)\, c_{AA}(t) \\
r_2 &= k_2\, c_{ASA}(t)\, c_{AA}(t) \\
r_3 &= k_3\, c_{ASAA}(t)\, c_{H_2O}(t) \\
r_4 &= k_4\, c_{AA}(t)\, c_{H_2O}(t) \\
r_d &= \begin{cases} k_d \big( c_{SA}^{sat}(T) - c_{SA}(t) \big)^d, & m_{SA}(t) \ge 0 \\ 0, & m_{SA}(t) < 0 \end{cases} \\
r_g &= k_c \Big( \max\big( c_{ASA}(t) - c_{ASA}^{sat}(T),\, 0 \big) \Big)^c
\end{aligned}
$$
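As a hedged sketch, the rate expressions of this reaction scheme (four mass-action reactions, SA dissolution and ASA crystallization) can be evaluated as below. The rate constants are the "exact" values listed in the parameter table of the aspirin case; the concentrations, saturation values and exponents are hypothetical, and the dissolution term is switched on only while solid SA remains (m_SA > 0), which is an assumption about the switching condition.

```python
def aspirin_rates(conc, k, c_sat, m_SA, d_exp, c_exp):
    """Evaluate the six rates for concentrations in `conc` (mol/L)."""
    r1 = k["k1"] * conc["SA"] * conc["AA"]
    r2 = k["k2"] * conc["ASA"] * conc["AA"]
    r3 = k["k3"] * conc["ASAA"] * conc["H2O"]
    r4 = k["k4"] * conc["AA"] * conc["H2O"]
    # Dissolution stops once no solid SA is left.
    r_d = k["kd"] * (c_sat["SA"] - conc["SA"]) ** d_exp if m_SA > 0 else 0.0
    # Crystallization only occurs above saturation.
    r_g = k["kc"] * max(conc["ASA"] - c_sat["ASA"], 0.0) ** c_exp
    return r1, r2, r3, r4, r_d, r_g

# Rate constants: "exact" values from the parameter table of the aspirin case.
k = {"k1": 0.036031, "k2": 0.15961, "k3": 6.8032, "k4": 1.8029,
     "kc": 0.75669, "kd": 7.1109}
# Hypothetical state, saturation concentrations and exponents.
conc = {"SA": 1.0, "AA": 2.0, "ASA": 0.5, "ASAA": 0.1, "H2O": 0.2}
c_sat = {"SA": 2.0, "ASA": 1.0}
rates = aspirin_rates(conc, k, c_sat, m_SA=1.0, d_exp=1.0, c_exp=1.0)
print(rates)
```

The max(·, 0) term makes the crystallization rate nonsmooth at saturation, so a smoothed approximation may be preferable inside a gradient-based NLP solver.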

Aspirin Synthesis Case

Comparison between exact and estimated parameters

Parameter   Exact      Estimated   Abs Error    Rel Error   Std Deviation
k1:         0.036031   0.036011    2.0×10^-5    0.056%      9.6×10^-6
k2:         0.15961    0.15967     6.8×10^-5    0.043%      1.7×10^-4
k3:         6.8032     7.0390      0.24         3.5%        0.13
k4:         1.8029     1.8560      0.053        2.9%        0.037
kc:         0.75669    0.76021     0.0035       0.46%       2.2×10^-3
kd:         7.1109     7.1073      0.0035       0.049%      4.6×10^-3
:           2.0627     2.0629      2.4×10^-4    0.012%      3.6×10^-4

- dim(D) = 471 × 111
- CPU time = 83 s, 8 iterations for variance estimation
- IPOPT = 9.63 CPU s for parameter estimation

Aspirin Synthesis Case

Typical profile (ASA) with estimated parameter values and profile bands corresponding to standard deviations

[Figure: c_ASA (mol/L) versus time (min) over 0-200 min, with three magnified insets showing the profile bands]

Recipe Optimization with Validated Model
Semi-Batch Polymer Process (Nie et al., 2013)

Semi-batch polyether polyol process
- Comprehensive population balance models for MWD properties
- Moment models implemented and compared
- Operating strategies validated in plant

45

Polyol Dynamic Process Validation

46

Process Recipe Optimization

47

Recipe Optimization Results

48

Optimal Constraint Profiles

49

Satisfaction of Product Specifications

50

Summary and Conclusions

Parameter Estimation and Model Discrimination with First-Principles Models

Maximum Likelihood Formulations
- Normal measurement error distributions

Optimization-based tools
- Parameter estimation
- Statistical inference
- Probability shares

Challenging case studies
- Model discrimination with non-informative data
- Deconvolute spectral distributions
- Validation to ensure predictive optimization models

Extracting Reduced Hessian from IPOPT

Interior point solvers do not form the reduced Hessian; it can be extracted from the optimality conditions [1].

KKT conditions at the optimal solution:
- x_j is the j-th column of the inverted reduced Hessian
- In IPOPT the KKT matrix is already factorized
- One back-solve per column of the covariance

If the dynamic system is linear with Gaussian noise, this reduces to the Kalman smoothing equations.

[1] Zavala, V. M.; Laird, C. D. & Biegler, L. T.; Journal of Process Control, 2008, 18, 876-884
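The back-solve idea can be sketched with a short derivation. It assumes an equality-constrained problem with Lagrangian Hessian H (including any barrier terms), a constraint Jacobian partitioned as A^T = [C  N] with C square and nonsingular, and the parameters taken as the independent variables x_I. These symbols are assumptions introduced for illustration.

```latex
\begin{aligned}
&\begin{bmatrix} H & A \\ A^T & 0 \end{bmatrix}
 \begin{bmatrix} x \\ \lambda \end{bmatrix}
 = \begin{bmatrix} r \\ 0 \end{bmatrix},
 \qquad r = \begin{bmatrix} 0 \\ e_j \end{bmatrix},
 \qquad Z = \begin{bmatrix} -C^{-1} N \\ I \end{bmatrix},
 \quad A^T Z = 0 \\[4pt]
&A^T x = 0 \;\Rightarrow\; x = Z w \\
&Z^T (H Z w + A \lambda) = Z^T r
 \;\Rightarrow\; (Z^T H Z)\, w = e_j
 \;\Rightarrow\; x_I = w = (Z^T H Z)^{-1} e_j
\end{aligned}
```

Under these assumptions, one solve with the already factorized KKT matrix per unit vector e_j returns the j-th column of the inverse reduced Hessian without forming Z or Z^T H Z explicitly.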

Apply Collocation on Finite Elements NLP

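$$
\begin{aligned}
\min \quad & \sum_{i=1}^{ntp} \sum_{j=1}^{nwp} \frac{\Big( D_{i,j} - \sum_{k=1}^{nc} c_k(t_i)\, s_k(\lambda_j) \Big)^2}{\delta^2} + \sum_{i=1}^{ntp} \sum_{k=1}^{nc} \frac{\big( c_k(t_i) - z_k(t_i) \big)^2}{\sigma_k^2} \\
\text{s.t.} \quad & \sum_{m=0}^{K} \dot{\ell}_m(\tau_{m'})\, z_{jm} - h_j\, f(z_{jm'}, y_{jm'}, \theta) = 0, \quad j = 1..ne, \; m' = 1..K \\
& g(z_{jm}, y_{jm}) = 0, \quad j = 1..ne, \; m = 1..K \\
& z^K(t_i) = \sum_{m=0}^{K} \ell_m(\tau)\, z_{jm}, \quad \tau = (t_i - t_{j-1}) / h_j \\
& C \ge 0, \quad S \ge 0
\end{aligned}
$$

How to get the variances, σ_k and δ?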

Posterior Probability Share

Choose from candidate models by Bayes' theorem

Posterior probability [6], with terms for:
- Prior
- Penalty for the number of parameters
- Penalty for the accumulated squared errors

Normalized posterior probability share
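The three terms above can be combined into a normalized share as sketched below. The sketch assumes the form p(M_j | Y) ∝ p(M_j) · 2^(-p_j/2) · exp(-S_j / (2σ²)), a commonly used expression attributed to Stewart, Henson and Box; that this is the formula used here is an assumption, and the priors, parameter counts and squared errors are hypothetical.

```python
import math

def posterior_shares(priors, n_params, sq_errors, sigma2=1.0):
    """Normalized posterior probability shares of candidate models.

    Assumed form: p(M_j | Y) ~ prior_j * 2**(-p_j / 2) * exp(-S_j / (2 * sigma2)).
    """
    log_post = [math.log(p) - 0.5 * n * math.log(2.0) - 0.5 * s / sigma2
                for p, n, s in zip(priors, n_params, sq_errors)]
    shift = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical inputs: five candidate models with equal priors.
priors = [0.2] * 5
n_params = [8, 7, 6, 5, 6]
sq_errors = [12.0, 12.1, 12.3, 12.2, 14.0]
shares = posterior_shares(priors, n_params, sq_errors)
for j, share in enumerate(shares, start=1):
    print(f"Model {j}: share = {share:.3f}")
```

A model with slightly larger squared error can still obtain the largest share when it has fewer parameters, which is the Occam's razor trade-off.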

Estimation results of the full model by EVM

54


Full model estimation results

Full Model Unscaled results

[Tables: inverse reduced Hessian HR-1 of the full model, its eigenvectors and its eigenvalues]

*Order of eigenvalue/eigenvector is the same as parameter order in the result table

Selected model estimation results

Model 4
- Fixed parameters D, F and UA
- Nonlinear heat transfer coefficient

Results
- Large reliability factors are avoided
- Estimability of the kinetic parameters is enhanced, with even smaller variances

Model 4 Unscaled results

[Tables: inverse reduced Hessian HR-1 of Model 4 and its eigenvectors]

Eigenvalues of inverse reduced Hessian HR-1:

diag(2.58E-07, 0.84, 1.14E-04, 2.86E-02, 1.78E+07)

Dynamic Optimization Approaches

DAE Optimization Problem

Variational Approach
- Pontryagin et al. (1956)

Apply a NLP solver
- Efficient for constrained problems

Single Shooting
- Hasdorff (1977), Sullivan (1977), Vassiliadis (1994)
- Discretize controls

Multiple Shooting
- Bock and coworkers
- Embeds DAE solvers/sensitivity
- Handles instabilities

Simultaneous Approach
- Discretize state, control variables
- Larger NLP

Simultaneous Collocation (Direct Transcription)
- Large/Sparse NLP - Betts; B
- Take full advantage of open structure
- Many degrees of freedom
- Periodic boundary conditions
- Multi-stage formulations

Reduced Hessian and Covariance

We can show that the inverse of the reduced Hessian is the smoothed covariance [1],

where Z is the null space basis of the constraint Jacobian

Changing variables for simplicity

[1] Pirnay, Lopez-Negrete, & Biegler, Optimal Sensitivity with IPOPT, Math Prog Comp, 2014

Simultaneous Estimation Comparison of WLS and EVM

[Figure: reactor temperature, scaled temperature versus scaled time, prediction compared with data, with one panel for WLS and one for EVM]

- The fit by EVM is much better than the fit by WLS
- The accumulated squared error of EVM is 44% lower than that of WLS
