TRANSCRIPT
Optimal Design of Dynamic Experiments
Julio R. Banga
IIM-CSIC, Vigo, Spain
“The Systems Biology Modelling Cycle (supported by BioPreDyn)” EMBL-EBI (Cambridge, UK), 12-15 May 2014
Optimal Experimental Design (OED)
Introduction
OED: Why, what and how?
Model building cycle
OED to improve model calibration
Formulation and examples
OED software and references
Modelling – considerations
Models start with questions (purpose) and a level of detail
We need a priori data & knowledge to build a 1st model
We then plan and perform new experiments, obtain new data and refine the model
We repeat until stopping criterion satisfied
Modelling – considerations
We plan and perform new experiments, obtain new data and refine the model
But, how do we plan these experiments?
Optimal experimental design (OED)
Model-based OED
OED – Why?
We want to build a model
We want to use the model for specific purposes (“models start with questions”)
OED allows us to plan experiments that will produce data with rich information content
OED - What ?
We need to take into account:
(i) the purpose of the model,
(ii) the experimental degrees of freedom and constraints,
(iii) the objective of the OED:
• Parameter estimation
• Model discrimination
• Model reduction
• …
OED – simple example
We want to build a 3D model of an object from 2D pictures
OED – simple example
We want to build a 3D model of an object from 2D pictures
(i) purpose of the model: a rough 3D representation of the object,
(ii) experimental constraints: we can only take 2D snapshots,
(iii) degrees of freedom: pictures from any angle
“Experiment” with minimum number of pictures?
OED – simple example
OED – basic considerations
There is a minimum amount of information in the data needed to build a model
This depends on:
• how detailed we want the model to be
• how complex the original object (system) is
• safe assumptions we can make (e.g. symmetry in a 3D object -> fewer pictures needed)
Static model
Dynamic model
OED for dynamic models
We need time-series data with enough information to build a model
Data from different contexts, with enough time resolution
Reverse engineering (dynamic) Mario
Main characteristics:
Non-linear, dynamic models (i.e. batch or semi-batch processes)
Nonlinear constraints (safety and/or quality demands)
Distributed systems (T, c, etc.)
Coupled transport phenomena
Thus, mathematical models consist of sets of ODEs, DAEs, PDAEs, or even IPDAEs, with possible logic conditions (transitions, i.e. hybrid systems)
PDAE models are usually transformed into DAEs (i.e. by discretization methods such as FEM, NMOL, etc.)
Dynamic process models
[Diagram: Experiment, Data, Model, Solver, Fitted Model]
Model building
[Diagram: Experiment, Data, Model, Solver, Fitted Model, with Identifiability Analysis, Parameter Estimation and Optimal Experimental Design]
Model building
Model building cycle
[Diagram: prior information, model selection and discrimination, parameter estimation, OED, new experiments, new data]
Experimental degrees of freedom and constraints
• Initial conditions
• Dynamic stimuli: type and number of perturbations
• Measurements: What? When? (sampling times, experiment duration) How many replicates?
• How many experiments?
• Etc.
Experimental design
Examples
Bacterial growth in batch culture
3-step pathway
Oregonator
Concentration of microorganisms
Concentration of growth limiting substrate
Example: Bacterial growth in batch culture
Concentration of microorganisms
Concentration of growth limiting substrate
Yield coefficient
Decay rate coefficient
Maximum growth rate
Michaelis-Menten constant
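The model equations on this slide did not survive in the transcript. A minimal sketch, assuming the standard Monod growth model with a first-order decay term (an assumption, not necessarily the exact form used in the talk), shows how the two states and four parameters listed above fit together. The parameter values and initial conditions are illustrative only, loosely based on the estimates reported later in the talk, and all function and key names are hypothetical.

```python
def monod_rhs(state, p):
    """Right-hand side of the assumed batch-growth model.

    state = (B, S): concentrations of microorganisms and substrate.
    p: dict with keys mumax, ks, kd, yield_ (hypothetical names).
    """
    B, S = state
    mu = p["mumax"] * S / (p["ks"] + S)   # Monod specific growth rate
    dB = mu * B - p["kd"] * B             # growth minus decay
    dS = -mu * B / p["yield_"]            # substrate consumed per unit growth
    return (dB, dS)


def rk4_step(state, p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(k, factor):
        return tuple(x + factor * dx for x, dx in zip(state, k))
    k1 = monod_rhs(state, p)
    k2 = monod_rhs(shifted(k1, h / 2), p)
    k3 = monod_rhs(shifted(k2, h / 2), p)
    k4 = monod_rhs(shifted(k3, h), p)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(p, B0, S0, duration=10.0, n_samples=11, substeps=100):
    """Return (times, B values, S values) at equidistant sampling times."""
    dt = duration / (n_samples - 1)
    h = dt / substeps
    state = (B0, S0)
    times, Bs, Ss = [0.0], [B0], [S0]
    for i in range(1, n_samples):
        for _ in range(substeps):
            state = rk4_step(state, p, h)
        times.append(i * dt)
        Bs.append(state[0])
        Ss.append(state[1])
    return times, Bs, Ss


# Illustrative values only (assumed, not taken from the slide's equations).
params = {"mumax": 0.4, "ks": 5.0, "kd": 0.05, "yield_": 0.5}
times, B, S = simulate(params, B0=2.0, S0=30.0)
for t, b, s in zip(times, B, S):
    print(f"t={t:4.1f} h  B={b:6.2f}  S={s:6.2f}")
```

The 11 equidistant samples over 10 hours mirror the sampling scheme used in the cases that follow; changing `B0`, `S0` or `duration` corresponds to changing the experimental design.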
Example: Bacterial growth in batch culture
Example: Bacterial growth in batch culture
Experimental design:
Initial conditions?
What to measure? (concentration of microorganisms and substrate?)
When to measure? (sampling times, experiment duration)
How many experiments?
How many replicates?
Etc.
Example: Bacterial growth in batch culture
Case A: 1 experiment, 11 equidistant sampling times, duration 10 hours, measurements of S and B
[Figure: observed B (obsB) and observed S (obsS) versus time, 0 to 10 h]
Example: Bacterial growth in batch culture
[Figure: model fit versus data for B (cb) and S (cs) over time, 0 to 10 h]
Parameter estimation using a global optimization (GO) method (eSS)
Example: Bacterial growth in batch culture
Case B: 1 experiment, 11 equidistant sampling times, duration 10 hours, measurements of S only
[Figure: observed S (obsS) versus time, 0 to 10 h]
Example: Bacterial growth in batch culture
[Figure: fit to observed S (obsS) versus time, and predicted versus real B (cb) versus time, 0 to 10 h]
Good fit for substrate! But bad predictions for bacteria…
Example: Bacterial growth in batch culture
[Figure: pairwise parameter plots: kd vs ks, kd vs mumax, ks vs mumax, yield vs kd, yield vs ks, yield vs mumax]
Measuring both B and S…
Example: Bacterial growth in batch culture
[Figure: pairwise parameter plots: kd vs ks, kd vs mumax, ks vs mumax, yield vs kd, yield vs ks, yield vs mumax]
Measuring only S…
Example: Bacterial growth in batch culture
So, for this case of 1 experiment, we should measure both B and S
But confidence intervals are rather large…
What happens if we consider a second experiment? (Same experiment, but with a different initial condition for S.)
mumax : 4.0940e-001 ± 9.4155e-002 (23%)
Ks : 6.6525e+000 ± 3.3475e+000 (50%)
Kd : 3.9513e-002 ± 7.0150e-002 (177%)
Y : 4.8276e-001 ± 1.6667e-001 (34%)
Example: Bacterial growth in batch culture
[Figure: observed B (obsB) and S (obsS) versus time for the 1st experiment and the 2nd experiment]
mumax : 3.9542e-001 ± 3.4730e-002 (9%)
Ks : 5.3551e+000 ± 9.1440e-001 (17%)
Kd : 4.1657e-002 ± 2.5753e-002 (62%)
Y : 4.8529e-001 ± 6.1227e-002 (13%)
Great improvement with a second experiment! But can we do even better?
The answer: optimal experimental design.
Example: simple biochemical pathway
C.G. Moles, P. Mendes and J.R. Banga (2003) Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research 13:2467-2474.
Kinetics described by set of 8 ODEs with 36 parameters
Parameter estimation:
36 parameters
measurements: concentrations of 8 species
16 experiments (different values of S and P)
Example: simple biochemical pathway
Example: simple biochemical pathway
Initial conditions
for all the experiments
Experiments (S, P values)
21 measurements per experiment , tf = 120 s
Example: simple biochemical pathway
Multi-start local methods fail…
Multi-start SQP
Example: simple biochemical pathway
Parameter estimation: again (some) global methods can fail too…
[Figure: concentrations of E1 and M2 versus time, 0 to 120 s]
Example: simple biochemical pathway
Parameter estimation: best fit looks pretty good but…
Contours [p1, p6]
Contours [p1, p4]
Example: simple biochemical pathway
Identifiability problems…
Practical identifiability problems are often due to data with poor information content
Need: more informative experiments (data sets)
Solution: optimal design of (dynamic) experiments
What about identifiability?
I.e. can the parameters be estimated in a unique way?
Identifiability:
A. Global a priori (theoretical, structural)
B. Local a priori (local)
C. Local a posteriori (practical)
(A) is hard to evaluate for realistic nonlinear models
(B) and (C) can be estimated via the FIM and other indexes...
(C) takes into account noise etc.
Parametric sensitivities
$$S_{ij} = \left. \frac{\partial x_i}{\partial p_j} \right|_{x = x(t, p^0),\ p = p^0}$$
Fisher information matrix (FIM)
$$FIM = \sum_{i=1}^{N} S^{T}(t_i)\, W^{-1}(t_i)\, S(t_i)$$
Checking identifiability and other indexes…
Compute sensitivities (direct decoupled method) :
Build FIM , covariance and correlation matrices
Analyse possible correlations among parameters
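$$S_{ij} = \left. \frac{\partial x_i}{\partial p_j} \right|_{x = x(t, p^0),\ p = p^0}$$
$$FIM = \sum_{i=1}^{N} S^{T}(t_i)\, W^{-1}(t_i)\, S(t_i)$$
$$C = FIM^{-1}$$
$$R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}},\ i \neq j; \qquad R_{ij} = 1,\ i = j$$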
Sensitivities w.r.t. p1 and p6 are highly correlated
(i.e. The system exhibits rather similar responses to changes in p1 and p6 for the given experimental design)
p1 & p6 p1 & p4
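The steps listed on this slide (sensitivities, FIM, covariance, correlation) can be sketched on a toy problem. This is a minimal sketch, assuming a hypothetical two-parameter model y(t) = a·exp(−b·t) with analytic sensitivities and a constant measurement variance `sigma2` playing the role of W; none of these names or values come from the talk.

```python
import math


def sensitivities(t, a, b):
    """Analytic sensitivities of y(t) = a*exp(-b*t) w.r.t. (a, b)."""
    e = math.exp(-b * t)
    return (e, -a * t * e)


def fisher_information(times, a, b, sigma2):
    """FIM = sum_i S(t_i)^T W^-1 S(t_i), with W = sigma2 (scalar output)."""
    f11 = f12 = f22 = 0.0
    for t in times:
        sa, sb = sensitivities(t, a, b)
        f11 += sa * sa / sigma2
        f12 += sa * sb / sigma2
        f22 += sb * sb / sigma2
    return [[f11, f12], [f12, f22]]


def invert_2x2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0.0:
        raise ValueError("singular FIM: parameters are not identifiable")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def correlation(cov):
    """R_ij = C_ij / sqrt(C_ii * C_jj); the diagonal is 1 by construction."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical design: 11 equidistant samples over 10 time units.
times = [float(i) for i in range(11)]
a, b, sigma2 = 2.0, 0.3, 0.01
fim = fisher_information(times, a, b, sigma2)
cov = invert_2x2(fim)                 # Cramér-Rao lower bound on the covariance
corr = correlation(cov)
std = [math.sqrt(cov[i][i]) for i in range(2)]
print("correlation(a, b) =", round(corr[0][1], 3))
print("approx. 95% half-widths:", [round(1.96 * s, 4) for s in std])
```

An off-diagonal correlation close to ±1 would signal the kind of practical identifiability problem described above for p1 and p6.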
Checking identifiability and other indexes…
Compute sensitivities (direct decoupled method) :
Build FIM , covariance and correlation matrices
Analyse possible correlations among parameters
Compute confidence intervals
Check FIM-based criteria (practical identifiability)
• Singular FIM: unidentifiable parameters, non-informative experiments
• Large condition number of FIM means lower practical identifiability
Rank parameters
Finally, use OED to improve experimental design
Model structure & parameters
Parametric sensitivities
Ranking of parameters
Practical Identifiability analysis
Optimal experimental design
(New) Experiments
Model calibration
Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for dynamic modeling of biochemical networks. BMC Systems Biology 4:11
Optimal (dynamic) experimental design
Design the most informative experiments, facilitating parameter estimation and improving identifiability
How?
Define information criterion
Optimize it modulating experimental conditions
“If you want to truly understand something, try to change it.” Kurt Lewin, circa 1951
Optimal (dynamic) experimental design
Computational approaches that are applicable to support the optimal design of experiments in terms of
• how to manipulate the degrees of freedom (controls) of experiments,
• what variables to measure,
• why to measure them,
• when to take measurements.
[Diagram: Experiment, Data]
$$FIM = \sum_{i=1}^{N} S^{T}(t_i)\, W^{-1}(t_i)\, S(t_i)$$
Optimal (dynamic) experimental design
Information content measured with the FIM
We will use scalar functions of the FIM (“alphabetical” criteria)
Find experiments which maximize information content
Some FIM-based criteria...
• D-criterion (determinant of F), which measures the global accuracy of the estimated parameters: $J = \max \det(F)$
• E-criterion (smallest eigenvalue of F), which measures the largest error: $J = \max \lambda_{\min}(F)$
• Modified E-criterion (condition number of F), which measures the parameter decorrelation: $J = \min \lambda_{\max}(F) / \lambda_{\min}(F)$
• A-criterion (trace of inverse of F), which measures the arithmetic mean of estimation error: $J = \min \mathrm{trace}(F^{-1})$
A criterion = $\min \mathrm{trace}(FIM^{-1})$
D criterion = $\max \det(FIM)$
E criterion = $\max \lambda_{\min}(FIM)$
Modified-E criterion = $\min \lambda_{\max}(FIM) / \lambda_{\min}(FIM)$
[Figure: A-, E- and D-optimality illustrated in the plane of two parameters (axes 1 and 2)]
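To make the four criteria concrete, here is a minimal sketch that evaluates them for a symmetric 2×2 FIM using closed-form eigenvalues. The function name and the example matrices are hypothetical; for larger matrices a linear-algebra library would normally be used instead.

```python
import math


def fim_criteria(fim):
    """Scalar 'alphabetical' criteria for a symmetric 2x2 FIM [[a, b], [b, c]]."""
    a, b, c = fim[0][0], fim[0][1], fim[1][1]
    det = a * c - b * b
    trace = a + c
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    root = math.sqrt((a - c) ** 2 + 4.0 * b * b)
    lam_min = (trace - root) / 2.0
    lam_max = (trace + root) / 2.0
    return {
        "D": det,                          # determinant, to be maximized
        "E": lam_min,                      # smallest eigenvalue, to be maximized
        "modified_E": lam_max / lam_min,   # condition number, to be minimized
        "A": trace / det,                  # trace of the inverse, to be minimized
    }


# Two hypothetical designs: the second yields strongly correlated parameters.
well_conditioned = [[4.0, 0.0], [0.0, 4.0]]
correlated = [[4.0, 3.9], [3.9, 4.0]]
for name, fim in [("well conditioned", well_conditioned),
                  ("correlated", correlated)]:
    print(name, fim_criteria(fim))
```

In the second matrix the large off-diagonal term drives the smallest eigenvalue towards zero, so the E-criterion drops and the modified E-criterion (condition number) grows, which is the situation OED tries to avoid.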
E criterion = $\max \lambda_{\min}(FIM)$
E-optimality: maximize the minimum eigenvalue of the FIM
(minimizes the largest error)
[Figure: A-, E- and D-optimality illustrated in the plane of two parameters (axes 1 and 2)]
Modified-E criterion = $\min \lambda_{\max}(FIM) / \lambda_{\min}(FIM)$
Maximize decorrelation between parameters
(make contours as circular as possible)
[Figure: A-, E- and D-optimality illustrated in the plane of two parameters (axes 1 and 2)]
Calculate the dynamic scheme of measurements so as to
generate the maximum amount and quality of information
for model calibration purposes.
OED as a dynamic optimization problem
When to measure? (Optimal sampling times)
Which type of dynamic stimuli?
Calculate time-varying control profiles (u(t)), sampling
times, experiment duration and initial conditions (v) to
optimize a performance index (scalar measure of the FIM):
System dynamics (ODEs, PDEs):
Experimental constraints:
OED as a dynamic optimization problem
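As a heavily simplified illustration of this formulation, the sketch below chooses the experiment duration that maximizes the E-criterion for a hypothetical two-parameter model y(t) = a·exp(−b·t), sampled at 11 equidistant times. The model, the nominal parameter values, the noise variance and the candidate durations are all assumptions made for illustration, and a plain grid search stands in for the dynamic optimization solver that a real OED problem would need.

```python
import math


def fim_for_duration(tf, a, b, sigma2, n_samples=11):
    """FIM entries for y(t) = a*exp(-b*t), n_samples equidistant times in [0, tf]."""
    f11 = f12 = f22 = 0.0
    for i in range(n_samples):
        t = tf * i / (n_samples - 1)
        e = math.exp(-b * t)
        sa, sb = e, -a * t * e          # sensitivities w.r.t. a and b
        f11 += sa * sa / sigma2
        f12 += sa * sb / sigma2
        f22 += sb * sb / sigma2
    return f11, f12, f22


def e_criterion(tf, a, b, sigma2):
    """Smallest eigenvalue of the 2x2 FIM (larger is better)."""
    f11, f12, f22 = fim_for_duration(tf, a, b, sigma2)
    root = math.sqrt((f11 - f22) ** 2 + 4.0 * f12 * f12)
    return (f11 + f22 - root) / 2.0


# Nominal parameter guess and candidate designs (all hypothetical).
a0, b0, sigma2 = 2.0, 0.3, 0.01
candidates = [0.5 * k for k in range(2, 61)]      # durations from 1 to 30
scores = {tf: e_criterion(tf, a0, b0, sigma2) for tf in candidates}
best_tf = max(scores, key=scores.get)

print(f"arbitrary design tf=30.0: E-criterion = {scores[30.0]:.2f}")
print(f"best design      tf={best_tf}: E-criterion = {scores[best_tf]:.2f}")
```

Because the FIM is evaluated at nominal parameter values, a design found this way is only locally optimal; in practice it is re-computed as the parameter estimates improve.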
Back to example: Bacterial growth in batch culture
Experimental design:
Initial conditions?
What to measure? (concentration of microorganisms and substrate?)
When to measure? (sampling times, experiment duration)
How many experiments?
How many replicates?
Etc.
Example: Bacterial growth in batch culture
[Figure: observed B (obsB) and S (obsS) versus time for the 1st experiment and the 2nd experiment]
mumax : 3.9542e-001 ± 3.4730e-002 (9%)
Ks : 5.3551e+000 ± 9.1440e-001 (17%)
Kd : 4.1657e-002 ± 2.5753e-002 (62%)
Y : 4.8529e-001 ± 6.1227e-002 (13%)
Great improvement with a second experiment! But can we do even better? Optimal experimental design
Example: Bacterial growth in batch culture
Let us design the second experiment in an optimal way:
Criteria: E-optimality (minimize the largest error)
Degrees of freedom we can ‘manipulate’ in the second experiment:
• Initial concentrations of S and B
• Duration of experiment
Example: Bacterial growth in batch culture
OED of second experiment
[Figure: observed B (obsB) and S (obsS) versus time for the 1st experiment (0 to 10 h) and for the 2nd experiment after OED (0 to 15 h)]
mumax : 3.9950e-001 ± 1.7133e-002 (4.3%)
Ks : 4.9530e+000 ± 2.9647e-001 (6%)
Kd : 5.0859e-002 ± 2.9936e-003 (6%)
Y : 5.0544e-001 ± 1.4074e-002 (2.8%)
Free initial conditions: cb0 in [1, 5], cs in [5, 40]; free experiment duration in [6, 15] h; ns = 11
Example: Bacterial growth in batch culture
Two arbitrary experiments:
mumax : 3.9542e-001 ± 3.4730e-002 (9%)
Ks : 5.3551e+000 ± 9.1440e-001 (17%)
Kd : 4.1657e-002 ± 2.5753e-002 (62%)
Y : 4.8529e-001 ± 6.1227e-002 (13%)
After OED of second experiment:
mumax : 3.9950e-001 ± 1.7133e-002 (4.3%)
Ks : 4.9530e+000 ± 2.9647e-001 (6%)
Kd : 5.0859e-002 ± 2.9936e-003 (6%)
Y : 5.0544e-001 ± 1.4074e-002 (2.8%)
Free initial conditions: cb0 in [1, 5], cs in [5, 40]; free experiment duration in [6, 15] h; ns = 11
Example: Bacterial growth in batch culture
Correlation matrix after OED of second experiment
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
[Figure: two Cramér-Rao based correlation matrices for the global unknowns (mumax, ks, kd, yield), colour scale from -1 to 1]
Example: Bacterial growth in batch culture
After OED of second experiment…
free initial conditions cb0:[1,5] cs: [5 40], free experiment duration [6,15] h, ns=11
[Figure: pairwise plots of mumax vs kd, ks vs kd and kd vs yield, shown at two scales (axes from -1 to 3 and from 0.95 to 1.05)]
> Correlation between parameters has substantially improved (in general)
> Kd and Y are still highly correlated but the size of the confidence ellipse is much smaller
Example: Bacterial growth in batch culture
[Figure: histograms of Monte-Carlo based confidence intervals for mumax, ks, kd and yield, annotated with 5%, 7%, 6% and 3%]
Confidence intervals obtained from the FIM:
mumax : 3.9950e-001 ± 1.7133e-002 (4.3%)
Ks : 4.9530e+000 ± 2.9647e-001 (6%)
Kd : 5.0859e-002 ± 2.9936e-003 (6%)
Y : 5.0544e-001 ± 1.4074e-002 (2.8%)
Robust confidence intervals are similar to those obtained by the FIM
Example: OED for the simple biochemical pathway
Moles, C. G., Pedro Mendes and Julio R. Banga (2003) Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research 13(11):2467-2474
Original experimental design:
16 experiments (different S and P values)
Result: large E-criterion and modified E-criterion
Some FIM-based criteria for the original design...
Very large modified E-criterion indicates large correlation among (some) parameters, making the identification of the system hard
Can we improve this by an alternative (optimal) design of experiments?
Improve experimental design by solving OED
Find the values of S and P for a set of Nexp experiments
which e.g. maximize E-criterion s.t. constraints
(dynamics plus bounds)
Improved experimental design:
16 experiments with optimal S and P
E-criterion improved (> one order of magnitude)
Other criteria also improved
Improved design by solving OED problem:
Original Design (E-crit= 60, logD=161) Improved Design (E-crit= 320, logD=181)
Example: OED for Oregonator reaction
The Oregonator is the simplest realistic model of the chemical dynamics of the oscillatory Belousov-Zhabotinsky (BZ) reaction (Zhabotinsky, 1991; Gray and Scott, 1991; Epstein and Pojman, 1998)
Oregonator reaction: highly nonlinear, oscillatory kinetics
Oregonator reaction: identifiability problems
Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical Kinetics. Journal of Physical Chemistry A 115(30):8426-8436.
Oregonator reaction: OED
E-optimality criterion
Example: OED for Oregonator reaction
E-optimality criterion: improved 3 orders of magnitude
Villaverde, A., J. Ross, F. Morán, E. Balsa-Canto, J.R. Banga (2011) Use of a Generalized Fisher Equation for Global Optimization in Chemical Kinetics. Journal of Physical Chemistry A 115(30):8426-8436.
Example: OED for Oregonator reaction
OED conclusions
Currently, most experiments are designed based on intuition of experimentalists and modellers
Model-based OED can be used to:
• Improve model calibration
• Discriminate between rival models
OED is a systematic and optimal approach
OED can take into account practical limitations and constraints by incorporating them into the formulation
Check identifiability
Use proper optimization methods for parameter estimation
Use optimal experimental design
Main tips for dynamic model building
Take-home messages
“All models are wrong, but some are useful” (statistician George E. P. Box)
The practical question is:
How wrong do they have to be to not be useful?
Main tips for dynamic model building
Software for dynamic model building and OED
http://www.iim.csic.es/~amigo/
A few selected references…
Ashyraliyev M, Fomekong-Nanfack Y, Kaandorp JA & Blom JG (2009a). Systems biology: parameter estimation for biochemical models. FEBS J 276: 886–902.
Balsa-Canto, E. and Julio R. Banga (2011) AMIGO, a toolbox for Advanced Model Identification in systems biology using Global Optimization. Bioinformatics 27(16):2311-2313.
Balsa-Canto, E., Alonso, A. A., & Banga, J. R. (2010). An iterative identification procedure for dynamic modeling of biochemical networks. BMC Systems Biology 4:11.
Bandara, S., Schlöder, J. P., Eils, R., Bock, H. G., & Meyer, T. (2009). Optimal experimental design for parameter estimation of a cell signaling model. PLoS computational biology, 5(11), e1000558.
Banga, J.R. and E. Balsa-Canto (2008) Parameter estimation and optimal experimental design. Essays in Biochemistry 45:195–210.
Balsa-Canto, E., A.A. Alonso and J.R. Banga (2008) Computational Procedures for Optimal Experimental Design in Biological Systems. IET Systems Biology 2(4):163-172.
Chen BH, Asprey SP (2003) On the Design of Optimally Informative Dynamic Experiments for Model Discrimination in Multiresponse Nonlinear Situations. Ind Eng Chem Res 2003, 42:1379-1390.
Jaqaman K., Danuser G. Linking data to models: data regression. Nat. Rev. Mol. Cell Bio. 7:813-819.
Kremling A, Saez-Rodriguez J: Systems Biology - An engineering perspective. J Biotechnol 2007, 129:329-351
Mélykúti, B., E. August, A. Papachristodoulou and H. El-Samad (2010) Discriminating between rival biochemical network models: three approaches to optimal experiment design. BMC Systems Biology 4:38.
van Riel N (2006) Dynamic modelling and analysis of biochemical networks: Mechanism-based models and model-based experiments. Brief Bioinform 7(4):364-374.
Villaverde, A.F. and J.R. Banga (2014) Reverse engineering and identification in systems biology: strategies, perspectives and challenges. J. Royal Soc. Interface 11(91):20130505
http://www.iim.csic.es/~gingproc/software.html