Research Collection
Doctoral Thesis
Modeling of steam consumption in chemical batch plants
Author(s): Pereira, Cecilia Mónica
Publication Date: 2013
Permanent Link: https://doi.org/10.3929/ethz-a-010060560
Rights / License: In Copyright - Non-Commercial Use Permitted
ETH Library
DISS. ETH NO. 21480
Modeling of Steam Consumption
in Chemical Batch Plants
Cecilia Pereira
A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZURICH
(Dr. sc. ETH Zurich)
presented by
Cecilia Mónica Pereira
M Sc UZH, Universität Zürich
born on September 15th, 1979
citizen of Uruguay and Italy
accepted on the recommendation of
Prof. Dr. Konrad Hungerbühler, examiner
Prof. Dr. Rudiyanto Gunawan, co-examiner
2013
Acknowledgements
During my PhD thesis at the Safety and Environmental Technology Group at ETH Zürich, I received generous support and encouragement from many people. Konrad Hungerbühler offered me the opportunity to carry out my work in his group, where I greatly benefited from his experience at the interface of academia and industry. Very special thanks to Konrad. The advice and guidance of Stavros Papadokonstantakis contributed enormously to my work, motivating and supporting me to explore new ideas and challenges. I deeply appreciate the suggestions and feedback offered by Stefanie Hellweg, who participated in all progress meetings and Zieldialog (goal-setting) meetings during my thesis.
Without the collaboration of my industry partners, this dissertation
would not have been possible. They have provided me with the
main data source for my project, contributing their expertise
and hospitality during the data collection campaigns. I would also
like to express my gratitude to the Bundesamt für Energie (BFE)
and Bundesamt für Umwelt (BAFU) for their financial support.
I appreciate the valuable work carried out by Ines Hauner during her master's thesis under my supervision. Claude Rerat, my office mate for several years, contributed his expertise in energy consumption modeling to my work and significantly helped me improve my programming skills. I would also like to thank Jürgen
Sutter and Peter Mumenthaler for their technical support. Many
other current and former members of the Safety and
Environmental Technology Group have supported and shared
interesting discussions with me. Special thanks go to Prisca Rohr,
Isabelle Lendvai, Martin Scheringer, Matthew MacLeod, Asif
Qureshi, Andreas Buser, Andrej Szijjarto, Andrea Bumann,
Sebastien Cap, Elisabet Capón and Annelle Gutiérrez.
Finally, I owe my deepest gratitude to my parents. They have
always supported me unconditionally in all aspects of life. Gracias mamá, gracias papá! (Thank you, Mom; thank you, Dad!)
Summary
The minimization of energy consumption in the chemical industry is one of the key principles of green chemistry. This has led, in academia as well as in industry, to the development of evaluation tools that include energy use as a metric. The use of these evaluation tools requires process-specific energy consumption data, which are usually scarce, particularly in multiproduct and multipurpose batch plants. In this thesis we developed shortcut models of steam consumption, since steam is typically the energy utility with the highest consumption.
First, we introduced a new methodology for modeling the steam consumption in chemical batch plants based on standard process documentation, rules of thumb, expert opinion, and thermodynamic principles. Additionally, we proposed uncertainty intervals for the model outputs based on fuzzy set theory. Three case studies using production data from multiproduct and multipurpose batch plants of three different chemical companies were carried out to parameterize and validate the proposed methodology. In the first two case studies, the validation against reference values considered the steam consumption of several pieces of equipment or vessels involved in reaction and work-up processes (kilograms of steam per piece of equipment), whereas in the third case study the steam consumption was modeled and validated for entire synthesis paths (kilograms of steam per product). The validation results showed that the documentation-based models provide acceptable estimates of steam consumption in chemical batch plants, and that the uncertainty intervals agree with the batch-to-batch variability of the steam consumption.
Second, statistical models, namely probability density functions (PDFs) and classification trees, were fitted to real production data. These models take the form of generic intervals, defined as interquartile ranges derived from the PDF parameters and as classes derived from the classification trees. The models can be used at different stages of process design, the minimum required information being the reaction type. The PDF models were assessed with diverse goodness-of-fit metrics and the classification trees by means of cross-validation. The predictive performance of both types of models was further evaluated in two case studies. The validation results show that the statistical models proposed in this work provide satisfactory interval estimates of steam consumption.
Depending on the application target, one model might be more convenient than the other, meaning that a compromise between modeling time and accuracy has to be made. While the documentation-based approach is a more detailed procedure that delivers a deterministic estimate with an uncertainty range, the statistical models, which result in generic intervals, are much faster to use. Even though the PDF models allow reasonable predictions of steam consumption, their most interesting applications are benchmarking and uncertainty analysis. Additionally, the transparency of the classification trees facilitates the analysis of the effect of their predictor variables (reaction type and operational parameters) on steam consumption, resulting in a set of dominating classification rules. Consequently, besides their predictive capabilities, the statistical models serve as descriptive and explanatory tools. Both modeling approaches to steam consumption proposed in this thesis are of high importance in early phases of process design, in the field of Life Cycle Assessment (LCA), and for benchmarking.
Zusammenfassung
Minimizing energy consumption in the chemical industry is one of the main principles of green chemistry. In research as well as in industry, this has led to the development of assessment methods that take energy consumption into account as a metric. Applying these methods requires specific energy consumption data, which are usually not available, especially in multiproduct and multipurpose batch plants. In this doctoral thesis, shortcut models were developed for steam consumption, which represents the largest share of energy consumption.
First, a new method for modeling steam consumption in chemical batch plants was developed, based on operating procedures, heuristics, expert knowledge, and thermodynamic principles. In addition, uncertainty intervals were developed based on fuzzy set theory. Three case studies with production data from multiproduct and multipurpose batch plants of three different chemical companies were carried out to parameterize and validate the new methodology. In the first two cases, the steam consumption of several apparatuses used for the reaction step (reactor) and for the separation processes (kilograms of steam per apparatus) was modeled and validated against reference values. In the third case study, the steam consumption was modeled and validated for entire synthesis routes (kilograms of steam per product). The validation results showed that these new models provide reasonable estimates of the energy consumption in chemical batch plants, and that the uncertainty intervals agree with the batch-to-batch variability of the steam consumption.
In the second step, statistical models (probability density functions and decision trees) were fitted to real production data. These models can be represented as interquartile intervals derived from the probability density functions and as classes derived from the decision trees, respectively. The models can be applied at different stages of process design, the minimum required input information being the reaction type. The probability density function models were assessed using various goodness-of-fit criteria, and the decision trees using cross-validation. The predictive power of both models was additionally evaluated in two case studies. The validation results indicate that the statistical models developed in this doctoral thesis provide adequate interval estimates of steam consumption.
Depending on the purpose of the application, one of the two modeling approaches is more suitable than the other. This means that a trade-off between modeling time and accuracy must be considered. While the approach based on operating procedures is a more detailed method that delivers a deterministic value with an uncertainty interval, the statistical models can be applied much faster. Although the probability density function models allow reasonable estimates of steam consumption, their most interesting applications are benchmarking and uncertainty analysis. The transparency of the decision trees additionally allows the effect of the predictor variables (reaction type and operating parameters) on steam consumption to be analyzed. This leads to a set of classification rules. Besides being usable to predict steam consumption, both types of statistical models can be applied for descriptive and explanatory purposes.
The two modeling approaches proposed in this thesis are of high importance in early phases of process design, as well as for Life Cycle Assessment and benchmarking.
Resumen
Reducing energy consumption in the chemical industry is one of the key principles of green chemistry. This has fostered the development of evaluation tools that include energy consumption as a metric, not only in academia but also in industry. The use of these evaluation tools requires specific energy consumption data, which are usually scarce, especially in multiproduct and multipurpose batch plants. In this thesis we developed shortcut models of steam consumption, since steam generally represents the energy source with the highest consumption.
First, we introduced a methodology for modeling energy consumption in chemical batch plants based on standard process documentation, rules of thumb, expert opinion, and thermodynamic principles. In addition, we proposed uncertainty intervals for the model results based on fuzzy set theory. To parameterize and validate the proposed methodology, three case studies were carried out using production data from multiproduct and multipurpose plants. In the first two cases, the steam consumed in different reactors or vessels during the reaction and separation processes was considered (kilograms of steam per reactor), whereas in the third case study the steam consumption was modeled and validated for complete synthesis routes (kilograms of steam per product). The validation results showed that the documentation-based models provide satisfactory estimates of steam consumption in chemical batch plants, and that the uncertainty intervals correspond to the batch-to-batch variability in energy consumption.
Second, statistical models, namely probability density functions (PDFs) and classification trees, were fitted to real production data. These models take the form of generic intervals, defined as interquartile ranges derived from the parameters of the probability density functions, and of classes derived from the classification trees. These models can be used at different levels of process design, the reaction type being the minimum required information. The PDF models were evaluated with various goodness-of-fit metrics, and the classification trees by means of cross-validation. The predictive performance of both types of models was further evaluated in two case studies. The validation results show that the statistical models proposed in this work provide satisfactory interval estimates of steam consumption.
Depending on the objective of the application, one model may be more convenient than the other, which means that a compromise must be reached between the time required for modeling and accuracy. While the documentation-based approach is a more detailed procedure that provides deterministic values with uncertainty intervals, the statistical models result in generic intervals that are much faster to apply. Although the PDF models allow reasonable predictions of steam consumption, their most interesting application lies in benchmarking and uncertainty analysis. In addition, the transparency of the classification trees facilitates the analysis of the effect of their predictor variables (reaction type and operational parameters) on steam consumption, resulting in a set of dominant classification rules. Therefore, besides their predictive capabilities, the statistical models serve as descriptive and explanatory tools. The two approaches to modeling steam consumption proposed in this thesis are of significant importance in early stages of process design, in the field of life cycle assessment, and in benchmarking.
Table of Contents
1 Introduction
1.1 Energy Consumption in the Chemical Industry
1.2 State of the Art
1.3 Goal of the Thesis
1.4 Structure of the Thesis
2 Documentation based Models of Steam Consumption
2.1 Bottom-up Modeling
2.2 Standard Operating Procedures (SOPs)
2.3 Model Development
2.3.1 Application Example
2.4 Model Uncertainty
2.4.1 Fuzzy Intervals
2.4.2 Application Example
3 Statistical Models
3.1 System Boundaries
3.2 Training and Validation Datasets
3.3 Stages of Process Design
3.4 Selection and Classification of Chemical Reactions
3.5 Classification Models
3.5.1 Selection of Predictor Variables and Discretization of Target Attribute
3.5.2 Model Selection and Evaluation
3.5.3 Selection of Important Rules
3.6 Probability Density Function Models
4 Results Documentation based Approach
4.1 Case Study I
4.1.1 Dataset
4.1.2 Theoretical Energy Consumption
4.1.3 Energy Losses
4.1.4 Sensitivity and Uncertainty Analysis
4.1.5 Total Energy Consumption
4.2 Case Study II
4.2.1 Dataset
4.2.2 Total Energy Consumption
4.2.3 Top-down Energy Modeling
4.3 Case Study III
5 Results Classification Trees
5.1 Model Selection and Evaluation
5.2 Selection of Important Rules
6 Results Probability Density Function Models
6.1 Model Development
6.2 Model Evaluation per Reaction Type
6.3 Further Parameterization of the Models
7 Application of the Statistical Models
7.1 Case Study I
7.2 Case Study II
8 Conclusions and Outlook
8.1 Practical Relevance and Applications
8.2 Outlook
8.2.1 Extension of the Modeling Approaches to other Process Parameters
8.2.2 Optimization Problem for Selection of Classification Trees
Nomenclature
Appendix
A Supporting Information to Chapter 2
B Supporting Information to Chapter 3
C Supporting Information to Chapter 4
D Supporting Information to Chapter 5
E Supporting Information to Chapter 6
F Supporting Information to Chapter 7
Bibliography
1 Introduction
1.1 Energy Consumption in the Chemical Industry
The environmental dimension of sustainable development is significantly affected by energy consumption and management (Patterson, 1996). In this context, the chemical industry has been identified as one of the largest energy consumers among the manufacturing sectors (Steinmeyer, 2000, Vandecasteele et al., 2007). In addition, recent environmental assessment studies of the production of common chemicals in developed countries have shown that energy-related impacts often account for more than 50% of the total environmental impact (Wernet et al., 2011). Therefore, improving energy efficiency in chemical production has been recognized as a key target for the chemical sector and for environmental regulations (Jenck et al., 2004). Moreover, minimizing energy use is one way of changing the nature of a chemical product or process to reduce its risk to the environment and human health (Paul T. Anastas and Warner, 1998).
The energy efficiency of chemical production processes can be improved in different ways, such as process control and optimization to select the best operating conditions (Le Lann et al., 1999), efficient heat transfer (Vaklieva-Bancheva et al., 1996, Phillips et al., 1997, Oppenheimer and Sorensen, 1997), pinch analysis (Smith, 1995, Shenoy, 1995, Linnhoff, 1993), the use of catalysts to lower the reaction activation energy (Paul T. Anastas and Warner, 1998), the design of processes that minimize separation and purification steps, and the recycling of waste (Capello et al., 2008). These options lead to different design alternatives, which can be compared within multi-objective decision-making frameworks (Cano-Ruiz and McRae, 1998, Sugiyama et al., 2008b, Sugiyama et al., 2008a). This is especially relevant in early phases of process design, when modifications and improvements are less costly and time-consuming to implement than in later stages.
To compare the environmental impacts of the different alternatives, methodologies such as Life Cycle Assessment (LCA) (Burgess and Brennan, 2001) can be used. LCA is a method for assessing the environmental impact of products and processes over the entire product life cycle; it can therefore be used in process design to compare and select options (G. E. Kniel et al., 1996, Bauer and Maciel, 2004, Hellweg et al., 2004). One of the Life Cycle Impact Assessment (LCIA) indicators is the Cumulative Energy Demand (CED), which accounts for the total amount of primary energy used over the production life cycle. Recent studies have shown that the CED correlates well with other LCIA indicators and can therefore serve as a proxy for overall environmental impact (Huijbregts et al., 2006, Wernet et al., 2009). Other evaluation methodologies developed by industry, such as the eco-efficiency analysis by BASF (Peter Saling et al., 2002) and the green technology guide by GlaxoSmithKline (Concepción Jiménez-González et al., 2001, Concepción Jiménez-González et al., 2002), also consider energy use as an impact indicator.
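As a rough illustration of how a CED-type indicator aggregates an inventory, the sketch below multiplies each inventory flow by a cumulative primary-energy factor and sums the products. The flow names and the factor values are hypothetical placeholders chosen only to make the arithmetic visible; they are not data from this thesis or from any LCA database.

```python
# Minimal sketch of a CED-style aggregation (all names and numbers hypothetical).
# Hypothetical inventory for 1 kg of product: flow name -> amount in the flow's unit.
inventory = {"steam_kg": 30.0, "electricity_kWh": 2.0, "solvent_kg": 5.0}

# Hypothetical cumulative primary-energy factors (MJ-eq per unit of each flow).
ced_factors = {"steam_kg": 3.5, "electricity_kWh": 10.0, "solvent_kg": 60.0}


def cumulative_energy_demand(inventory, factors):
    """Sum of flow amount x primary-energy factor over all flows (MJ-eq)."""
    missing = set(inventory) - set(factors)
    if missing:
        raise KeyError(f"no CED factor for: {sorted(missing)}")
    return sum(amount * factors[flow] for flow, amount in inventory.items())


total = cumulative_energy_demand(inventory, ced_factors)
print(f"CED = {total:.1f} MJ-eq per kg product")
```

Because the indicator is a plain weighted sum, the contribution of steam (here 105 of 425 MJ-eq) can be read off directly. This is one reason process-specific steam data matter for such assessments.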
Before energy targets are set, and in addition to the different approaches for improving energy efficiency and evaluating environmental impacts, an understanding of energy use and a comparison against standards of similar processes (benchmarking) are required. However, process energy consumption data are usually scarce. This is mainly because energy accounts for only a small share of the costs of batch plants, namely 5-10% of the total costs (Vaklieva-Bancheva et al., 1996), compared with the significant contribution of raw material costs. Thus, efforts to achieve high energy efficiency in batch plants are usually limited (Bieler et al., 2003). Additionally, energy flow measurements are more complicated than measurements of reagent material flows, since they require mass flow, temperature and pressure measurements of the energy utilities, and they are therefore more costly.
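A mass meter alone is not enough to measure energy flows, because the heat delivered by a steam flow depends on the enthalpy difference between the incoming steam and the leaving condensate, and both enthalpies depend on pressure and temperature. The sketch below assumes complete condensation to saturated liquid and uses approximate textbook steam-table enthalpies for saturated conditions at 100 °C. In a plant, these values would instead be looked up from the measured pressure and temperature of the utility.

```python
def steam_heat_duty_kj(steam_mass_kg, h_steam_kj_per_kg, h_condensate_kj_per_kg):
    """Heat released by steam condensing to condensate: Q = m * (h_steam - h_cond)."""
    if steam_mass_kg < 0:
        raise ValueError("steam mass must be non-negative")
    return steam_mass_kg * (h_steam_kj_per_kg - h_condensate_kj_per_kg)


# Approximate steam-table enthalpies at 100 degC / 1.013 bar (assumed operating point).
H_SAT_STEAM_100C = 2676.0   # kJ/kg, saturated vapour
H_SAT_LIQUID_100C = 419.0   # kJ/kg, saturated liquid

q = steam_heat_duty_kj(100.0, H_SAT_STEAM_100C, H_SAT_LIQUID_100C)
print(f"Q = {q / 1000:.1f} MJ")  # Q = 225.7 MJ
```

If the steam is superheated, or the condensate is subcooled, the enthalpy difference changes. This is why temperature and pressure have to be measured alongside the mass flow.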
Most of the energy used in chemical production processes is in
the form of thermal energy (heating and cooling) or mechanical
energy (pumping, compression) (Steinmeyer, 2000). While the
most common energy utilities used for cooling during reaction and
separation processes are water and brine, the main energy utility
used for heating is steam. Electricity is the energy utility used to
generate mechanical energy. Among all these different energy
utilities used in the chemical industry, steam is the most prevalent
one, with the highest potential for improvement of energy
efficiency (Bieler, 2004).
1.2 State of the Art
While continuous production processes have been extensively investigated for energy optimization by means of pinch analysis and process integration (Linnhoff, 1993), the documentation of energy flows in multiproduct and multipurpose batch plants has traditionally been neglected. This is due to the additional complexity caused by the dynamic nature of the processes and to the minor contribution of energy to plant economics. Consequently, energy data, especially for steam, the energy utility with the highest consumption and saving potential, often have to be modeled or estimated from empirical know-how.
To fill this gap, different methodologies have been proposed:
- energy estimations based on rigorous process simulation (Concepción Jiménez-González et al., 2000);
- in-house technical knowledge (Rolf Bretz and Frankhauser, 1996);
- a top-down approach that correlates the total energy utility consumption with the total amount of chemicals produced in one production building (Bieler et al., 2003);
- a bottom-up approach based on energy balances and estimates of the thermal losses of single unit operations, which can be further aggregated for different levels of analysis (Bieler et al., 2004).

The last two studies showed that the top-down approach is more suitable for dedicated monoproduct batch plants. The bottom-up methodology is also suitable for multipurpose batch plants with highly variable production processes, and it is more comprehensive but also more time-consuming. Extended versions of the bottom-up approach, which use higher-resolution data for dynamic modeling of energy consumption, have reported an average modeling error of 10% at the unit operation level and of less than 30% at the production building level (Szijjarto et al., 2008). This level of accuracy has been shown to be adequate for allocating and monitoring energy consumption and for highlighting energy saving potential in multipurpose batch production buildings (Andrej Szijjarto et al., 2008, Rerat et al., 2013). However, since the high-resolution bottom-up approach requires extensive dynamic process data from the control system as model input, it is not suitable for fast screening unless it is embedded in automated plant monitoring systems. When this is not the case and energy consumption has to be allocated to many different products in different production buildings, or when high resolution is not required (e.g., when estimating inventories for life cycle assessment), a fast screening methodology based on standard process documentation and rules of thumb can be a valuable tool.
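The top-down idea can be illustrated with an ordinary least-squares line that relates a building's total steam consumption to its total production. The linear form, the function name and the monthly figures below are assumptions made for illustration. They are not the model or the data of Bieler et al. (2003).

```python
def fit_top_down(production_t, steam_t):
    """Ordinary least-squares fit of steam = a + b * production for one building."""
    n = len(production_t)
    if n != len(steam_t) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(production_t) / n
    mean_y = sum(steam_t) / n
    sxx = sum((x - mean_x) ** 2 for x in production_t)
    if sxx == 0:
        raise ValueError("production values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(production_t, steam_t))
    b = sxy / sxx            # marginal steam per tonne of product
    a = mean_y - b * mean_x  # production-independent base load
    return a, b


# Hypothetical monthly totals for one production building (tonnes).
production = [100.0, 150.0, 200.0, 250.0]
steam = [520.0, 720.0, 920.0, 1120.0]
a, b = fit_top_down(production, steam)
print(f"steam = {a:.1f} + {b:.2f} * production")  # steam = 120.0 + 4.00 * production
```

A single building-level line like this works well when one product dominates. In a multipurpose building with changing product mixes, it cannot distinguish between products, which matches the limitation of the top-down approach described above.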
Although the bottom-up models of Bieler (Bieler et al., 2004) can
serve as the basis for a methodology of this kind, a procedure for
systematic data extraction and filling of data gaps present in
standard process documentation has not been proposed yet.
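The structure of a bottom-up estimate can be sketched in a simplified form. For each unit operation, the theoretical heat demand from an energy balance is added to the thermal losses, the results are summed, and the total is converted into a steam mass. The sketch below assumes deliberately simple terms: sensible heat m·cp·ΔT, latent heat of the evaporated solvent, and losses modeled as UA·ΔT over the step duration. The class, its parameters, the steam latent heat default and all numbers are hypothetical and do not reproduce the models of Bieler et al. (2004).

```python
from dataclasses import dataclass


@dataclass
class HeatingStep:
    """One heated unit operation with hypothetical, simplified parameters."""
    name: str
    mass_kg: float            # mass of the vessel contents
    cp_kj_per_kg_k: float     # mean specific heat capacity of the contents
    t_start_c: float
    t_end_c: float
    evaporated_kg: float      # solvent evaporated (0 if no distillation)
    dh_vap_kj_per_kg: float   # latent heat of the evaporated solvent
    ua_kw_per_k: float        # assumed heat-loss coefficient x area
    duration_h: float
    t_ambient_c: float = 20.0

    def theoretical_kj(self):
        sensible = self.mass_kg * self.cp_kj_per_kg_k * (self.t_end_c - self.t_start_c)
        latent = self.evaporated_kg * self.dh_vap_kj_per_kg
        return sensible + latent

    def losses_kj(self):
        # Losses at the mean vessel temperature over the step; kW * s = kJ.
        t_mean = 0.5 * (self.t_start_c + self.t_end_c)
        return self.ua_kw_per_k * (t_mean - self.t_ambient_c) * self.duration_h * 3600.0


def steam_kg(steps, dh_steam_kj_per_kg=2257.0):
    """Aggregate (theoretical + losses) over all steps and convert to kg of steam."""
    total_kj = sum(s.theoretical_kj() + s.losses_kj() for s in steps)
    return total_kj / dh_steam_kj_per_kg


steps = [
    HeatingStep("heat-up", 2000.0, 2.0, 20.0, 80.0, 0.0, 0.0, 0.05, 1.0),
    HeatingStep("distillation", 2000.0, 2.0, 80.0, 80.0, 500.0, 850.0, 0.05, 2.0),
]
total = steam_kg(steps)
print(f"estimated steam: {total:.1f} kg")
```

Keeping the per-step terms separate is what allows the same results to be aggregated per vessel, per product or per building.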
In this work we propose a standard process documentation
based approach for modeling the steam consumption of single
unit operations in chemical batch plants, starting from previously
developed bottom-up models (Bieler et al., 2004) enhanced by
thermodynamic principles and rules of thumb for filling data gaps.
This new approach allows a systematic identification of the
different unit operations described in standard process
documentation by means of specific keywords, provides default
values and options for filling data gaps, and proposes uncertainty
intervals for the energy consumption models.
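The keyword-based identification of unit operations and the gap-filling with default values can be sketched as follows. The keyword patterns, operation categories and default values are hypothetical stand-ins chosen for illustration. The actual keyword lists and defaults of the approach are not reproduced here.

```python
import re

# Hypothetical keyword map: SOP wording (regex) -> unit operation type.
KEYWORDS = {
    "distillation": [r"\bdistil", r"\bevaporat", r"\bconcentrat"],
    "heating": [r"\bheat", r"\bwarm"],
    "drying": [r"\bdry", r"\bdried"],
    "reflux": [r"\breflux"],
}

# Hypothetical default values used when an SOP step omits a parameter.
DEFAULTS = {"t_ambient_c": 20.0, "duration_h": 1.0}


def classify_step(text):
    """Return the unit operation types whose keywords appear in one SOP step."""
    lowered = text.lower()
    return [op for op, patterns in KEYWORDS.items()
            if any(re.search(p, lowered) for p in patterns)]


def fill_gaps(params):
    """Complete a parameter dict with defaults for missing (None) entries."""
    given = {k: v for k, v in params.items() if v is not None}
    return {**DEFAULTS, **given}


sop = [
    "Heat the mixture to 80 °C within 1 h.",
    "Stir under reflux for 3 h.",
    "Distil off 500 L of ethanol at reduced pressure.",
]
for step in sop:
    print(classify_step(step), "-", step)
print(fill_gaps({"duration_h": 3.0, "t_ambient_c": None}))
```

Matching on word stems such as `distil` keeps the lists short. A step may also match several operations, and in that case an expert would need to decide how to treat it.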
Despite their simplicity, these documentation-based models are not generally applicable, since they still require detailed process information as input, which is usually available only in later design stages and may be partially confidential. A second modeling approach is therefore needed when standard operating procedures (SOPs) are not available, for example in early phases of process design, or when very fast estimates are needed for screening purposes, as is typical for environmental assessments. In addition to the documentation-based approach, this thesis proposes models of steam consumption based on statistical analysis of production data made available by a consortium of industrial partners representing leading companies in fine chemical and pharmaceutical production.
There are several examples of the use of statistical analysis in the fields of Life Cycle Assessment (LCA) and process design, such as modeling of relationships between design and inventory parameters (Mueller and Besant, 1999, Mueller et al., 2004), evaluation of distribution functions of emission factors (Cooper et al., 2008), scenario analysis of process and material alternatives by means of decision trees (Cooper et al., 2008), stochastic LCA inventory modeling (Canter et al., 2002), characterization of relationships between technologies and pollutants by means of hierarchical cluster analysis and principal component analysis (Cosmi et al., 2004), development of a greenness metric for synthetic processes of active pharmaceutical ingredients using hierarchical cluster analysis and principal component analysis (Curzons et al., 2007), improvement of the quality of Life Cycle Inventories (LCI) by means of data reconciliation applying analysis of covariance (Hau et al., 2007), and uncertainty analysis using stochastic models (B. Maurice et al., 2000, Sugiyama et al., 2005).
1.3 Goal of the Thesis
The goal of this work is to provide shortcut models of the steam consumption of production processes in chemical batch plants that allow fast predictions for screening purposes. Two types of models are developed. The first, based on production documentation, yields a deterministic value with an uncertainty interval. The second, based on statistical analysis of production data, yields probability density functions and classification trees, which can take the form of generic intervals.
The standard process documentation based approach estimates the steam consumption of single unit operations in chemical batch plants, starting from previously developed bottom-up models (Bieler et al., 2004) enhanced by thermodynamic principles and rules of thumb for filling data gaps. This new approach allows a systematic identification of the different unit operations described in standard process documentation by means of specific keywords, provides default values and options for filling data gaps, and proposes uncertainty intervals for the energy consumption models. Furthermore, the approach is validated in two case studies, demonstrating the general applicability of the bottom-up approach for modeling energy consumption in batch production plants. Besides being a prediction tool in itself, the documentation-based approach is used to generate the dataset for building the statistical models.
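The uncertainty intervals of the documentation-based approach are based on fuzzy set theory. A common way to build such intervals is the alpha-cut method: each uncertain input is described by a triangular fuzzy number, and for every membership level alpha the corresponding input intervals are propagated through the model. The sketch below assumes triangular inputs and a simple sensible-heat model Q = m·cp·ΔT. Because this model is monotonically increasing in each non-negative input, the interval endpoints can simply be multiplied. The inputs and their spreads are hypothetical, and the fuzzy scheme actually used in this thesis may differ.

```python
def alpha_cut(tri, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at membership level alpha."""
    low, mode, high = tri
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def product_interval(*intervals):
    """Product of intervals with non-negative bounds (model increasing in each input)."""
    lo, hi = 1.0, 1.0
    for a, b in intervals:
        if a < 0:
            raise ValueError("sketch assumes non-negative intervals")
        lo *= a
        hi *= b
    return lo, hi


# Hypothetical triangular fuzzy inputs (low, most likely, high).
mass = (1800.0, 2000.0, 2200.0)   # kg
cp = (1.8, 2.0, 2.2)              # kJ/(kg K)
delta_t = (55.0, 60.0, 65.0)      # K

for alpha in (0.0, 0.5, 1.0):
    cuts = [alpha_cut(x, alpha) for x in (mass, cp, delta_t)]
    lo, hi = product_interval(*cuts)
    print(f"alpha={alpha:.1f}: Q in [{lo:,.0f}, {hi:,.0f}] kJ")
```

At alpha = 1 the interval collapses to the deterministic estimate, and at alpha = 0 it spans the widest plausible range. This mirrors the pairing of a deterministic value with an uncertainty interval.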
The probability density function (PDF) models describe the variability of the gate-to-gate steam consumption for the production of one kilogram of product for a particular reaction type. Fitting distributions to data consists of finding the type of distribution and the parameter values that maximize the probability (likelihood) of generating the sample data. Here, the fitting was performed with the well-known maximum likelihood estimation (MLE) method (Myung, 2003), and the goodness of fit was evaluated using standard statistical tests and the Akaike information criterion (AIC) (Akaike, 1974). The generic interval models derived here correspond to the interquartile ranges of the fitted distributions, i.e., the intervals between the lower and upper quartiles (the 25th and 75th percentiles of a PDF), which contain the middle 50% of the distribution. Interquartile ranges are therefore judged to be useful as predictive models.
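These fitting steps can be sketched with the Python standard library alone. The sketch assumes two candidate families, log-normal and normal, because this section does not state which families were tested. It uses the closed-form maximum likelihood estimates for both, ranks them by the Akaike information criterion (AIC, lower is better), and reports the 25th to 75th percentile interval of the log-normal fit. The steam values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_lognormal(data):
    """Maximum likelihood estimates (mu, sigma) of a log-normal distribution."""
    if len(data) < 2 or min(data) <= 0:
        raise ValueError("need at least two strictly positive observations")
    logs = [math.log(x) for x in data]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(y - mu) ** 2 for y in logs]))  # MLE divides by n
    return mu, sigma


def lognormal_loglik(data, mu, sigma):
    """Log-likelihood under LogNormal(mu, sigma): log f(x) = log pdf_N(ln x) - ln x."""
    nd = NormalDist(mu, sigma)
    return sum(math.log(nd.pdf(math.log(x))) - math.log(x) for x in data)


def fit_normal(data):
    mu = fmean(data)
    return mu, math.sqrt(fmean([(x - mu) ** 2 for x in data]))


def normal_loglik(data, mu, sigma):
    nd = NormalDist(mu, sigma)
    return sum(math.log(nd.pdf(x)) for x in data)


def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik


def interquartile_interval(mu, sigma):
    """25th and 75th percentiles of LogNormal(mu, sigma)."""
    z = NormalDist()
    return (math.exp(mu + sigma * z.inv_cdf(0.25)),
            math.exp(mu + sigma * z.inv_cdf(0.75)))


# Hypothetical gate-to-gate steam consumption (kg steam / kg product), one reaction type.
steam = [4.2, 6.8, 9.5, 11.0, 13.7, 18.4, 25.1, 31.9]
mu, sigma = fit_lognormal(steam)
ll = lognormal_loglik(steam, mu, sigma)
ll_normal = normal_loglik(steam, *fit_normal(steam))
print(f"AIC log-normal = {aic(ll, 2):.2f}, AIC normal = {aic(ll_normal, 2):.2f}")
q1, q3 = interquartile_interval(mu, sigma)
print(f"interquartile interval: [{q1:.1f}, {q3:.1f}] kg steam / kg product")
```

Both candidates have two parameters, so in this sketch the AIC comparison reduces to comparing log-likelihoods. The penalty term matters only when families with different numbers of parameters are compared.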
Classification trees can serve not only as predictive models, but also as descriptive models that distinguish between objects from different classes and explain, in the same way as logistic regression, which features determine that an object belongs to a particular class. In this work, the models based on classification trees assign categories (in the form of predefined intervals) to the gate-to-gate steam consumption for the production of one kilogram of product, given a particular set of attributes. Besides the reaction type, which is in principle the only parameter considered in the PDF models, the set of attributes of the classification trees may also include information about process characteristics and operation parameters, depending on the stage of process design.
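To make the target attribute categorical, each continuous steam consumption value is mapped onto one of several predefined intervals. A minimal Python sketch follows; the interval boundaries and labels are hypothetical placeholders, not the ones used in this work.

```python
from bisect import bisect_right

# Hypothetical class boundaries [kg steam / kg product]
EDGES = [2.0, 5.0, 10.0]
LABELS = ["< 2", "[2, 5)", "[5, 10)", ">= 10"]


def to_interval_class(steam_per_kg):
    """Return the label of the predefined interval that contains the value."""
    # bisect_right counts how many edges are <= value, i.e. the interval index
    return LABELS[bisect_right(EDGES, steam_per_kg)]


for value in (1.5, 2.0, 7.3, 12.0):
    print(value, "->", to_interval_class(value))
```

Each boundary value belongs to the interval above it (left-closed intervals), which is one possible convention.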
1.4 Structure of the Thesis
In Chapter 2 the documentation based approach is introduced,
including model development, uncertainty analysis and
application examples. Chapter 3 presents the selection and
evaluation procedure of the two types of statistical models
developed in this work, namely the classification trees and
probability density functions. In Chapter 4 the three case studies
for the validation of the documentation based approach are
presented. Chapter 5 presents the results of the cross-validation
of the classification trees for model selection and evaluation, as
well as the selection of important rules. Chapter 6 includes the
results of the probability density function models per reaction type
and discusses the further parameterization of these models. In
Chapter 7, the performance of the classification trees and
probability density function models is evaluated in a first case
study. Additionally, the probability density function models are
applied to a second case study, where the steam consumption of
different industrial synthesis routes for the production of a
pharmaceutical intermediate is estimated and a ranking of the
different alternatives is provided. Chapter 8 presents the
conclusions and outlook of this thesis.
2 Documentation based Models of Steam Consumption
2.1 Bottom-up Modeling
The documentation based models rely on a bottom-up approach, defined as the summation of the energy consumption of the single parts of a system (Werbos, 1990). In this context, the bottom-up modeling of steam consumption starts with the identification of the relevant unit operations (UOs) described in the standard operating procedure (SOP), as shown in Figure 2.2.1. Here, a unit operation refers to an individual process step such as heating, evaporation or maintaining a constant temperature. Secondly, steam measurements are collected, if available. In the worst case, where neither measurements nor process documentation are available, as is often the case in early phases of process design, empirical or statistical models based on similar processes, or process simulations, can be used. Once the measurements or estimated values for the single unit operations are collected, they can be summed up to the desired level of analysis (e.g., all unit operations in a vessel, during the reaction step or work-up processes, or along a whole production path).
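The final summation step of the bottom-up approach can be sketched as a grouping operation over unit-operation estimates. In the Python example below, the records and their energy values are hypothetical, and the conversion to steam uses the condensation enthalpy of 5-bar steam (2350 kJ/kg) applied in the application example of Section 2.3.1.

```python
from collections import defaultdict

H_STEAM = 2350.0  # kJ/kg, condensation enthalpy of 5-bar steam

# Hypothetical unit-operation estimates: (production step, vessel, unit operation, energy [kJ])
unit_operations = [
    ("preparation", "vessel 1", "heat", 850_000.0),
    ("solvent recovery", "vessel 4", "heat", 11_500_000.0),
    ("solvent recovery", "vessel 4", "reflux", 8_750_000.0),
]


def aggregate(records, level):
    """Sum unit-operation energies up to the chosen level: 'step' or 'vessel'."""
    position = {"step": 0, "vessel": 1}[level]
    totals = defaultdict(float)
    for record in records:
        totals[record[position]] += record[3]
    return dict(totals)


by_vessel = aggregate(unit_operations, "vessel")
total_kj = sum(by_vessel.values())
print(by_vessel, f"total: {total_kj / H_STEAM:.0f} kg steam")
```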
2.2 Standard Operating Procedures (SOPs)
The data for the model development were acquired from two different chemical companies in Switzerland in the form of standard operating procedures (SOPs). SOPs are written instructions that document the way the unit operations are performed in the production plants. In many batch plants in Europe, at least, these standard procedures follow the DIN norm. Normally, the first chapter of an SOP consists of a short description, which includes the reaction synthesis path, the working principle, the mass flow diagram, some characteristics of the raw materials (e.g., molecular weight and purity), and other process related information (e.g., product yield, capacity of the plant and product use).
Figure 2.2.1. Documentation based approach (shaded box) for modeling steam consumption of unit operations in batch plants facilitating bottom-up modeling of steam consumption in multipurpose production buildings.
The second chapter generally includes process safety related information, while the third chapter describes in more detail how each unit operation is performed, together with the corresponding production parameters (e.g., temperatures and operation times). Furthermore, an SOP normally includes equipment characteristics (e.g., equipment volume and construction material) as well as quality and environmental requirements. However, although SOPs are rich in process related information, they are not complete enough for constructing energy balances, because “low volume, high value” chemical batch production has traditionally paid little attention to energy costs, focusing mainly on the more cost-relevant material flows. Moreover, SOPs are relatively static documents, in the sense that they do not provide any information about batch-to-batch variability. As a result of these two factors, performing an energy balance over the whole production boundary using SOPs as the main data source would be highly time consuming without a systematic methodology for extracting the energy relevant information from SOPs, filling the inevitable data gaps, and providing realistic estimates of the accuracy of the calculations. In the following, we propose such a methodology focusing on steam consumption. An extension to other energy utilities, such as cooling water and brine, should, however, be straightforward. In the rest of the text, the terms steam and energy consumption are used interchangeably.
2.3 Model Development
The first step of the process documentation based approach is to define a set of keywords for the processes that consume steam. These keywords are “charge”, “heat”, “evaporation”, “reflux”, “hold” and “reaction”. The underlying phenomena behind these keywords and the basic equations for the single unit operations are presented in Table 2.3.1. In most cases, the equations are based on simple first principles from thermodynamics and heat transfer. The exception is reflux conditions, where an empirical constant is used to describe the time-dependent steam consumption, because SOP documentation typically provides no information on the reflux ratio. The energy losses are also characterized by an empirical loss coefficient, which represents the heat transfer due to radiation and free convection from the equipment surface (Bieler et al., 2004). The input data include reaction mass, process temperatures and physicochemical properties of the material present in the unit operation, as well as some equipment characteristics.
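A keyword-based scan of SOP text can be sketched with regular expressions, as below. Matching on word stems (so that, for example, "charged" or "heating" are also found) and the SOP lines themselves are assumptions made for illustration only.

```python
import re

# Keyword -> regular expression; stem matching is an assumption of this sketch
KEYWORD_PATTERNS = {
    "charge": r"\bcharg",
    "heat": r"\bheat",
    "evaporation": r"\bevaporat",
    "reflux": r"\breflux",
    "hold": r"\bhold",
    "reaction": r"\breaction",
}


def identify_unit_operations(sop_lines):
    """Return (line number, keywords) for every SOP line that mentions a keyword."""
    found = []
    for number, line in enumerate(sop_lines, start=1):
        hits = [kw for kw, pattern in KEYWORD_PATTERNS.items()
                if re.search(pattern, line, flags=re.IGNORECASE)]
        if hits:
            found.append((number, hits))
    return found


# Hypothetical SOP excerpt
sop = [
    "Charge 4000 kg water into vessel 1.",
    "Heat the mixture to 45 °C.",
    "Filter the solution into vessel 2.",
    "Distil acetone under reflux at 98 °C.",
    "Hold at 98 °C for 30 min.",
]
for number, keywords in identify_unit_operations(sop):
    print(number, keywords)
```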
As can be seen in Table 2.3.1, the total energy consumption of one unit operation consists of two terms: the theoretical energy consumption and a time-dependent energy loss term. For all parameters involved in these equations, data sources and assumptions for default values are provided. For instance, a substance or mixture charged into a vessel is assumed to be at room temperature (20°C), unless the SOP states otherwise. In the case of missing heat capacity data, a substance is classified into one of three categories (acid/base, organic or water), and the corresponding value is assigned. Similarly, for enthalpy of vaporization data gaps, a substance is classified according to its boiling point as a low, middle or high boiling substance, and the corresponding value is assigned. For the mass and total surface of the equipment, standard values from DIN norms are assigned, based on the nominal volume and material of the vessel. Enthalpies of reaction are normally found in risk analysis documentation rather than in the SOP, and are based on the final mass of the process step. The proposed default values should be used as a means of filling data gaps when the required information is not found in the SOP. For properties like heat capacity and enthalpy of vaporization, more accurate values can be found in the literature, at least for common substances and solvents, or can be calculated by property estimation methods. In this way, the accuracy of the model predictions can be improved with respect to estimates based on default values. Nevertheless, the proposed default values should serve as a good basis for fast screening calculations. Finally, 5-bar steam is used for temperatures below 145°C and 15-bar steam for temperatures up to 190°C, and the steam consumption is calculated on the basis of its condensation enthalpy.
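The default-value rules above can be collected into a small lookup, sketched below in Python. The numbers are the defaults given in this section and in Table 2.3.1; the boiling-point limits separating low, middle and high boiling substances are not stated here, so the class is passed in directly, and the function names are hypothetical.

```python
ROOM_TEMPERATURE = 20.0  # °C, assumed temperature of charged material
CP_DEFAULTS = {"acid/base": 1.5, "organic": 2.2, "water": 4.0}  # kJ/(kg K)
HV_DEFAULTS = {"low": 350.0, "middle": 900.0, "high": 2250.0}   # kJ/kg, by boiling class


def heat_capacity(category, literature_value=None):
    """Prefer a literature or estimated value; otherwise fall back to the default."""
    return literature_value if literature_value is not None else CP_DEFAULTS[category]


def enthalpy_of_vaporization(boiling_class, literature_value=None):
    """Prefer a literature or estimated value; otherwise fall back to the default."""
    return literature_value if literature_value is not None else HV_DEFAULTS[boiling_class]


def steam_level(temperature):
    """Return the steam level and its saturation temperature [°C] for a process temperature."""
    if temperature < 145:
        return "5-bar", 159.0
    if temperature <= 190:
        return "15-bar", 201.0
    raise ValueError("no steam level defined above 190 °C")


print(heat_capacity("organic"), enthalpy_of_vaporization("middle"), steam_level(98))
```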
Table 2.3.1. Modeling of individual unit operations according to the process documentation based approach. For each unit operation (UO keyword), the description and formulas are followed by the parameters in the form: parameter | assumption / substance / equipment specification | source / default value | unit.

Charge. Heating of the new mass filled into the vessel to the same temperature (above 20°C) as the rest of the reaction mixture inside the vessel.
$$E_{theo} = m_i \cdot cp_i \cdot (T_2 - T_i) \qquad (2.3.1)$$
$$E_{loss} = K \cdot A \cdot (T_2 - T_{am}) \cdot t \qquad (2.3.2)$$
$$t = -\frac{m \cdot cp}{U \cdot A} \cdot \ln\left(\frac{T_s - T_2}{T_s - T_i}\right) \qquad (2.3.3)$$
$m_i$ | | SOP | kg
$cp_i$ | acid/base | 1.5 | kJ/(kg K)
$cp_i$ | organic | 2.2 | kJ/(kg K)
$cp_i$ | water | 4 | kJ/(kg K)
$T_2$ | | SOP | °C
$T_i$ | | 20 | °C
$K$ | | empirical (Bieler et al., 2004): 1.98 | kJ/(min m² K)
$A$ | | DIN norm | m²
$T_{am}$ | | 20 | °C
$T_s$ | $T_2$ < 145°C → 5-bar steam | 159 | °C
$T_s$ | $T_2$ > 145°C → 15-bar steam | 201 | °C
$U$ | STNR and $cp$ > 3.5 kJ/(kg K) | 600 | W/(m² K)
$U$ | otherwise | 250 | W/(m² K)

Heat. Heating of the total mass inside the vessel to a final temperature above 20°C.
$$E_{theo} = m \cdot cp \cdot (T_2 - T_1) + m_{eq} \cdot cp_{eq} \cdot (T_2 - T_1) \qquad (2.3.4)$$
$$m = \sum_{i=1}^{n} m_i \qquad (2.3.5)$$
$$cp = \frac{\sum_{i=1}^{n} m_i \cdot cp_i}{m} \qquad (2.3.6)$$
$$E_{loss} = K \cdot A \cdot \left(\frac{T_1 + T_2}{2} - T_{am}\right) \cdot t \qquad (2.3.7)$$
$$t = -\frac{m \cdot cp}{U \cdot A} \cdot \ln\left(\frac{T_s - T_2}{T_s - T_1}\right) \qquad (2.3.8)$$
$m_i$, $cp_i$, $T_2$ | as in Charge | |
$T_1$ | | 20 | °C
$m_{eq}$ | STNR | DIN norm | kg
$m_{eq}$ | STEM | DIN norm | kg
$cp_{eq}$ | STNR | 0.5 | kJ/(kg K)
$cp_{eq}$ | STEM | 0.7 | kJ/(kg K)
$K$, $A$, $T_{am}$, $T_s$, $U$ | as in Charge | |

Evaporation. Simple evaporation; always assumed if no reflux conditions are mentioned.
$$E_{theo} = m_i \cdot \Delta Hv_i \qquad (2.3.9)$$
$$E_{loss} = K \cdot A \cdot (T_d - T_{am}) \cdot t_d \qquad (2.3.10)$$
$m_i$ | as in Charge | |
$\Delta Hv_i$ | low $T_{boil}$ | 350 | kJ/kg
$\Delta Hv_i$ | middle $T_{boil}$ | 900 | kJ/kg
$\Delta Hv_i$ | high $T_{boil}$ | 2250 | kJ/kg
$K$, $A$, $T_{am}$ | as in Charge | |
$T_d$ | | SOP | °C
$t_d$ | | SOP / expert knowledge | min

Reflux. Distillation under reflux conditions, with $C$ being a constant fitted to measurement data of steam consumption for the recovery of butanol under strong reflux conditions.
$$E_{theo} = m_i \cdot \Delta Hv_i + C \cdot t_d \qquad (2.3.11)$$
$$E_{loss} = K \cdot A \cdot (T_d - T_{am}) \cdot t_d \qquad (2.3.12)$$
$m_i$ | as in Charge | |
$\Delta Hv_i$ | as in Evaporation | |
$C$ | | empirical: 5508 | kJ/min
$t_d$ | | SOP / expert knowledge | min
$K$, $A$, $T_{am}$ | as in Charge | |
$T_d$ | as in Evaporation | SOP | °C

Hold. Keep the process temperature constant.
$$E_{loss} = K \cdot A \cdot (T_h - T_{am}) \cdot t_h \qquad (2.3.13)$$
$K$, $A$, $T_{am}$ | as in Charge | |
$T_h$ | | SOP | °C
$t_h$ | | SOP | min

Reaction. Energy produced or consumed due to exothermic or endothermic chemical reactions.
$$E_{theo} = m \cdot \Delta H_r \qquad (2.3.14)$$
$m$ | | RAD | kg
$\Delta H_r$ | | RAD | kJ/kg
Table 2.3.2. Definition of the symbols used in Table 2.3.1.
Symbol | Description | Unit
$A$ | Surface area | m²
$C$ | Reflux constant | kJ/min
$cp$ | Heat capacity of the mixture | kJ/(kg K)
$cp_i$ | Heat capacity of substance i | kJ/(kg K)
$cp_{eq}$ | Heat capacity of the equipment | kJ/(kg K)
$E_{theo}$ | Theoretical energy consumption | kJ
$E_{loss}$ | Energy losses | kJ
$K$ | Loss coefficient | kJ/(min m² K)
$m$ | Mass of total reaction mixture | kg
$m_{eq}$ | Mass of the equipment | kg
$m_i$ | Mass of substance i | kg
$T_1$ | Initial temperature of reaction mixture | °C
$T_2$ | Final temperature of reaction mixture | °C
$T_{am}$ | Ambient temperature | °C
$T_{boil}$ | Boiling point | °C
$T_d$ | Distillation temperature | °C
$T_h$ | Process temperature kept constant | °C
$T_i$ | Temperature of substance i | °C
$T_s$ | Saturation temperature of steam | °C
$t$ | Heating time | min
$t_d$ | Distillation time | min
$t_h$ | Holding time | min
$U$ | Heat transfer coefficient | W/(m² K)
$\Delta H_r$ | Enthalpy of reaction | kJ/kg
$\Delta Hv_i$ | Enthalpy of vaporization | kJ/kg
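The equations of Table 2.3.1 translate directly into small functions. The Python sketch below is illustrative only: the function and argument names are hypothetical, U must be converted from W/(m² K) to kJ/(min m² K) before use (as done in Table 2.3.3), the sign convention for the reaction enthalpy is simply passed through, and the usage line reproduces the reflux distillation step of the application example in Section 2.3.1.

```python
import math

K_LOSS = 1.98      # kJ/(min m2 K), empirical loss coefficient (Bieler et al., 2004)
T_AMBIENT = 20.0   # °C
C_REFLUX = 5508.0  # kJ/min, empirical reflux constant


def w_to_kj_per_min(u_watt):
    """Convert U from W/(m2 K) to kJ/(min m2 K)."""
    return u_watt * 60 / 1000


def heating_time(m, cp, u, area, temp_steam, temp_start, temp_end):
    """Eqs. 2.3.3 / 2.3.8: heating time [min], with u in kJ/(min m2 K)."""
    return -(m * cp) / (u * area) * math.log((temp_steam - temp_end) / (temp_steam - temp_start))


def charge(m_i, cp_i, temp_2, area, t, temp_i=20.0):
    """Eqs. 2.3.1-2.3.2: returns (E_theo, E_loss) in kJ."""
    return m_i * cp_i * (temp_2 - temp_i), K_LOSS * area * (temp_2 - T_AMBIENT) * t


def evaporation(m_i, dhv_i, area, temp_d, t_d, reflux=False):
    """Eqs. 2.3.9-2.3.10, or Eqs. 2.3.11-2.3.12 when reflux=True."""
    e_theo = m_i * dhv_i + (C_REFLUX * t_d if reflux else 0.0)
    return e_theo, K_LOSS * area * (temp_d - T_AMBIENT) * t_d


def hold(area, temp_h, t_h):
    """Eq. 2.3.13: holding a temperature only adds losses."""
    return 0.0, K_LOSS * area * (temp_h - T_AMBIENT) * t_h


def reaction(m, dh_r):
    """Eq. 2.3.14; the sign of dh_r is passed through unchanged (assumed convention)."""
    return m * dh_r, 0.0


# Reflux distillation of 4500 kg acetone-water for 300 min at 98 °C in a vessel with A = 66 m2
e_theo, e_loss = evaporation(4500, 900, 66, 98, 300, reflux=True)
print(round(e_theo), round(e_loss))
```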
2.3.1 Application Example
To illustrate this data extraction and modeling procedure, the steam consumption for the batch production of 3200 kg of raw and wet product C is calculated, that is, not including purification and drying steps. Figure 2.3.1 shows a simplified example of the SOP for the production of C. It is important to note that while some values can be extracted directly from the recipe section of the SOP, others have to be inferred from the mass flow diagram of the SOP. The first unit operation is the preparation of a reactant solution and its heating for subsequent filtration. After the first filtration, the reaction step takes place in vessel 2, and subsequently crystallization and washing steps are performed in vessel 3. The suspension is filtered and the mother liquor is transferred to vessel 4 for recovery of acetone. The reaction, crystallization and washing steps are not described in this example, since they are not relevant from a steam consumption point of view. However, these processes are depicted in the mass flow diagram.
In Table 2.3.3, the procedure for modeling the steam consumption according to the methodology introduced above is shown in detail. The calculation steps, the production data and their sources are presented, with references to the equations of Table 2.3.1. Table 2.3.3 is divided into three subsections: the first corresponds to the modeling of the heating of the reactant solution in vessel 1, the second to the modeling of the solvent recovery in vessel 4, and the last to the bottom-up modeling of the total steam consumption for the production of C. The bottom-up model in this case is the sum of the total steam consumption in vessel 1 and vessel 4, resulting in 9000 kg of 5-bar steam for the production of 3200 kg of product C.
Figure 2.3.1. Example of a standard operating procedure (SOP).
Table 2.3.3. Calculation of steam consumption for the production of 3200 kg of raw and wet product C (equal batch size) according to the SOP presented in Figure 2.3.1. Each step lists the procedure, the data with their sources, and the calculation with the corresponding equation number from Table 2.3.1.

I. Modeling of energy consumption of the heating unit operation in vessel 1

Calculate the total mass in vessel 1.
Data: $m_{water}$ = 4000 kg (SOP); $m_A$ = 1600 kg (SOP); $m_{acetone}$ = 4500 kg (SOP).
$$m = 4000 + 1600 + 4500 = 10100\ \mathrm{kg} \qquad (2.3.5)$$

Calculate the heat capacity of the mixture.
Data: $cp_{water}$ = 4 kJ/(kg K) (default); $cp_{organic}$ = 2.2 kJ/(kg K) (default; A and acetone are organic).
$$cp = \frac{4000 \cdot 4 + 1600 \cdot 2.2 + 4500 \cdot 2.2}{10100} \approx 2.9\ \mathrm{kJ/(kg\,K)} \qquad (2.3.6)$$

Calculate the theoretical energy consumption for heating of the reaction mixture and equipment.
Data: $T_1$ = 20°C (initial temperature; default); $T_2$ = 45°C (final temperature; SOP); $m_{equipment}$ = 9000 kg (STNR and NV = 25 m³; default); $cp_{equipment}$ = 0.5 kJ/(kg K) (STNR; default).
$$E_{theo} = 10100 \cdot 2.9 \cdot (45 - 20) + 9000 \cdot 0.5 \cdot (45 - 20) = 844750\ \mathrm{kJ} \qquad (2.3.4)$$

Calculate the heating time for calculation of energy losses.
Data: $U$ = 250 W/(m² K) = 15 kJ/(min m² K) (STNR and cp < 3.5; default); $T_s$ = 159°C for $T_2$ < 145°C (default); $A$ = 42 m² (STNR and NV = 25 m³; DIN norm).
$$t = -\frac{10100 \cdot 2.9}{15 \cdot 42} \cdot \ln\left(\frac{159 - 45}{159 - 20}\right) \approx 9.2\ \mathrm{min} \qquad (2.3.8)$$

Calculate the energy losses during heating of the reaction mixture and equipment mass.
Data: $K$ = 1.98 kJ/(min m² K) (empirical; Bieler et al., 2004).
$$E_{loss} = 1.98 \cdot 42 \cdot \left(\frac{20 + 45}{2} - 20\right) \cdot 9.2 \approx 9563\ \mathrm{kJ} \qquad (2.3.7)$$

Calculate the total energy consumption during the heating unit operation.
$$E_{total} = E_{theo} + E_{loss} = 844750 + 9563 = 854313\ \mathrm{kJ}$$

II. Modeling of energy consumption of solvent recovery in vessel 4

II.1. Modeling of the heating unit operation

Calculate the heat capacity of the mixture.
Data: $m_{ML}$ = 36000 kg (mother liquor; SOP*); $m_{acetone-water}$ = 4500 kg (distillate; SOP*); $m_{water}$ = 31500 kg (residue; SOP*).
$$cp = \frac{4500 \cdot 2.2 + 31500 \cdot 4}{36000} \approx 3.8\ \mathrm{kJ/(kg\,K)} \qquad (2.3.6)$$

Calculate the theoretical energy consumption for heating of the reaction mixture and equipment mass.
Data: $T_1$ = 20°C (initial temperature; default); $T_2$ = 98°C (final temperature; SOP); $m_{equipment}$ = 16000 kg (STNR and NV = 40 m³; default); $cp_{equipment}$ = 0.5 kJ/(kg K) (STNR; default).
$$E_{theo} = 36000 \cdot 3.8 \cdot (98 - 20) + 16000 \cdot 0.5 \cdot (98 - 20) = 11294400\ \mathrm{kJ} \qquad (2.3.4)$$

Calculate the heating time for calculation of energy losses.
Data: $U$ = 600 W/(m² K) = 36 kJ/(min m² K) (STNR and cp > 3.5; default); $T_s$ = 159°C for $T_2$ < 145°C (default); $A$ = 66 m² (STNR and NV = 40 m³; DIN norm).
$$t = -\frac{36000 \cdot 3.8}{36 \cdot 66} \cdot \ln\left(\frac{159 - 98}{159 - 20}\right) \approx 47\ \mathrm{min} \qquad (2.3.8)$$

Calculate the energy losses during heating of the reaction mixture and equipment mass.
Data: $K$ = 1.98 kJ/(min m² K) (empirical; Bieler et al., 2004).
$$E_{loss} = 1.98 \cdot 66 \cdot \left(\frac{20 + 98}{2} - 20\right) \cdot 47 \approx 239536\ \mathrm{kJ} \qquad (2.3.7)$$

Calculate the total energy consumption during the heating unit operation.
$$E_{total} = E_{theo} + E_{loss} = 11294400 + 239536 = 11533936\ \mathrm{kJ}$$

II.2. Modeling of the distillation unit operation

Calculate the theoretical energy consumption for the distillation of the acetone-water mixture.
Data: $m_{acetone-water}$ = 4500 kg (distillate; SOP*); $\Delta Hv_{acetone}$ = 900 kJ/kg (solvent with middle boiling temperature; default); reflux conditions are mentioned in the SOP, therefore reflux is considered (SOP); $C$ = 5508 kJ/min (empirical); $t_d$ = 300 min (average distillation time of the acetone-water mixture in the corresponding production plant; expert knowledge).
$$E_{theo} = 4500 \cdot 900 + 5508 \cdot 300 = 5702400\ \mathrm{kJ} \qquad (2.3.11)$$

Calculate the energy losses during distillation.
Data: $K$ = 1.98 kJ/(min m² K) (empirical; Bieler et al., 2004); $A$ = 66 m² (STNR and NV = 40 m³; DIN norm); $T_d$ = 98°C (SOP).
$$E_{loss} = 1.98 \cdot 66 \cdot (98 - 20) \cdot 300 = 3057912\ \mathrm{kJ} \qquad (2.3.12)$$

Calculate the total energy consumption during the distillation unit operation.
$$E_{total} = E_{theo} + E_{loss} = 5702400 + 3057912 = 8760312\ \mathrm{kJ}$$

II.3. Sum of the heating and the distillation unit operations

Sum the total energy values for the heating and the distillation unit operations (bottom-up).
$$E_{total} = 11533936 + 8760312 = 20294248\ \mathrm{kJ}$$

III. Modeling of the total steam consumption for the production of product C

Sum the total energy consumption in vessels 1 and 4 (bottom-up).
$$E_{total} = 854313 + 20294248 = 21148561\ \mathrm{kJ}$$

Convert the total energy consumption in kJ to kilograms of 5-bar steam.
Data: $H_s$ = 2350 kJ/kg (condensation enthalpy of steam (Bieler, 2004); default).
$$\mathrm{Steam} = \frac{21148561}{2350} \approx 9000\ \mathrm{kg}$$
2.4 Model Uncertainty
The uncertainty in the proposed documentation based bottom-up modeling approach has two different sources: the use of standard process documentation, and the use of default values, model simplifications and assumptions. Although the standard process documentation approximates production averages, it does not always reflect reality, since it does not provide any information about batch-to-batch variability. This batch-to-batch variability can have various causes, including scheduling patterns that prolong holding times, controller malfunctions, intended variation of process parameters to meet flexible, dynamic production needs, or even variation of ambient conditions. Examples of modeling uncertainties, on the other hand, are all the default values used for production parameters in order to deal with SOP data gaps, and the simplified form of some models, for instance the distillation models and the energy loss terms.
Besides simple intervals (Jean-Luc Chevalier, 1996) and traditional techniques based on probability theory, such as analytical uncertainty propagation methods (Hong et al., 2010, MacLeod et al., 2002) or Monte Carlo analysis (Morgan and Henrion, 1990), other methodologies based on fuzzy set and possibility theory have been successfully applied to treat uncertainty (Ferrero and Salicone, 2003, Mauris et al., 2001). Compared to simple intervals, fuzzy intervals provide more detailed information about the uncertainty distribution, namely an indication of central tendency and skewness or asymmetry, and not only upper and lower bounds. On the other hand, they require more data and are computationally more expensive than simple intervals. Comparing the approaches based on probability theory with those based on fuzzy/possibility theory, the latter are mathematically less robust, but they can be generated more readily from small datasets, perform better when the data and model imprecision are due to ambiguity rather than randomness, and are more compatible with heuristic information (Tan et al., 2002). In this work, the different uncertainty sources described above are assessed together in one term by means of simple and fuzzy intervals (see Section A.1 in the appendix for a formal definition of fuzzy intervals). This means that the individual uncertainty terms are not propagated; rather, the total uncertainty is modeled top-down. Both interval approaches were chosen considering the data availability, the nature of the uncertainty types, and the information obtained from the uncertainty distribution.
2.4.1 Fuzzy Intervals
The fuzzy intervals proposed in this work represent absolute values of relative errors, i.e., the absolute deviations of the documentation based energy models from reference data. As shown in Figure 2.4.1, the relative errors between observed (reference) data and model results are calculated in the validation procedure, and subsequently the 2.5th, 25th, 75th and 97.5th percentiles of this set of relative errors are determined. These percentiles define the support and core of the trapezoidal membership function (i.e., a, b, c and d take the 2.5th, 25th, 75th and 97.5th percentile values, respectively), yielding our first fuzzy interval of relative errors. In parallel, a sensitivity analysis was performed in order to identify the parameters most influential for reducing the error between observed and modeled values. After identification and correction of the influential parameter data, the relative errors and the corresponding percentiles are recalculated. Therefore, two different fuzzy intervals are proposed: a broader one corresponding to the case of less precise process parameter information, and a narrower one corresponding to the availability of more precise data.
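The construction of the trapezoidal fuzzy interval from relative errors, and the evaluation of its membership function, can be sketched with Python's statistics module. The percentile interpolation rule is not stated in the text, so the "inclusive" method of statistics.quantiles is an assumption; the function names are hypothetical.

```python
from statistics import quantiles


def relative_errors(observed, modeled):
    """Absolute relative errors of the model with respect to reference data, in %."""
    return [abs(m - o) / o * 100 for o, m in zip(observed, modeled)]


def fuzzy_interval(errors):
    """Trapezoid (a, b, c, d) = 2.5th, 25th, 75th and 97.5th percentiles of the errors."""
    # n=40 yields 39 cut points spaced every 2.5 %; indices 0, 9, 29, 38 pick the four percentiles
    cuts = quantiles(errors, n=40, method="inclusive")
    return cuts[0], cuts[9], cuts[29], cuts[38]


def membership(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on the core [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


print(fuzzy_interval(list(range(101))), membership(10, 5, 15, 40, 60))
```

Rerunning the same two functions on the relative errors obtained after correcting the influential parameters gives the narrower interval.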
2.4.2 Application Example
Continuing with the example presented in Section 2.3.1 and its result of total steam consumption (9000 kg of steam), we can report the uncertainty range based on the fuzzy intervals. To illustrate this, Table 2.4.1 presents the resulting intervals corresponding to the fuzzy interval (5, 15, 40, 60). The interval between 5% and 60% includes all possible values which the relative error can take in this example; all values outside this range are considered not plausible. Additionally, 15% and 40% are more plausible than 5% and 60%, since they belong to the core of the fuzzy interval (see Figure 2.4.2). According to the values of Table 2.4.1 and considering only the core, one would conclude that a lower bound between 5400 and 7650 kg of steam and an upper bound between 10350 and 12600 kg of steam are more plausible for the process steam consumption.
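Translating a fuzzy interval of relative errors into bounds on the steam estimate is a simple multiplication. The short Python sketch below, with a hypothetical function name, reproduces the numbers of Table 2.4.1 for the 9000 kg estimate.

```python
def steam_bounds(estimate_kg, interval_pct):
    """Map relative errors (a, b, c, d) in % to (error, lower bound, upper bound) in kg."""
    rows = {}
    for label, pct in zip("abcd", interval_pct):
        error = estimate_kg * pct / 100
        rows[label] = (error, estimate_kg - error, estimate_kg + error)
    return rows


less_precise = steam_bounds(9000, (5, 15, 40, 60))
more_precise = steam_bounds(9000, (1, 5, 20, 50))
for label in "abcd":
    print(label, less_precise[label], more_precise[label])
```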
Figure 2.4.1. General procedure to generate uncertainty fuzzy intervals for
estimating relative errors within modeling of steam consumption in chemical
batch plants.
Figure 2.4.2. Uncertainty estimation of the documentation based approach in
the form of a fuzzy interval for relative errors. The values a=5 and d=60
correspond to the support, and the b=15 and c=40 values to the core of the
fuzzy interval.
Table 2.4.1. Application of trapezoidal fuzzy intervals (a, d: support and b, c: core of the fuzzy interval) to express uncertainty for the total steam consumption calculated in the example of Section 2.3.1. Rows b and c, which belong to the core of the fuzzy intervals, give the more plausible values. In the case of more precise process data, the interval estimation for steam consumption is narrower.

Fuzzy interval (a, b, c, d) | Relative error [%] | Relative error* [kg steam/batch] | Lower bound [kg steam/batch] | Upper bound [kg steam/batch]
Fuzzy interval for less precise data | | | |
a | 5 | 450 | 8550 | 9450
b | 15 | 1350 | 7650 | 10350
c | 40 | 3600 | 5400 | 12600
d | 60 | 5400 | 3600 | 14400
Fuzzy interval for more precise data | | | |
a | 1 | 90 | 8910 | 9090
b | 5 | 450 | 8550 | 9450
c | 20 | 1800 | 7200 | 10800
d | 50 | 4500 | 4500 | 13500

* The calculated steam consumption of this batch is 9000 kg.
3 Statistical Models
Assuming that energy utility consumption depends mostly on the synthesis reaction type and on operation parameters of the production processes rather than on specific reactants and products, it is possible to build generic models for steam consumption. The models of steam consumption proposed here are based on classic statistical modeling, where probability density functions (PDFs) are fitted to data, and on classification trees, represented by a set of logical rules that facilitate human interpretability. In both cases, the models can take the form of generic intervals.
3.1 System Boundaries
A reaction synthesis route includes the chemical synthesis and the work-up unit operations for the separation and recovery of the product. In the reaction step, the substrates are partially converted to products and by-products. This reaction step can be followed by work-up processes such as distillation, crystallization or extraction, or by a next reaction step if the product is an intermediate and recovery is not needed. Thus, as depicted inside the area bounded by the black dashed line in Figure 3.2.1, a synthesis route can include one to n reaction steps, each of them followed by zero to m work-up recovery processes. The models developed in this work predict the steam consumption within the boundaries defined by the grey boxes. Each steam model corresponds to a single reaction plus the work-up processes that immediately follow that reaction step, if any. From now on, we will refer to this system simply as reaction. The empirical yield ranges corresponding to the reaction classes analyzed in this work are given in Table B.1.1.2 in the appendix.
Special purification steps and drying of the product are not included within the system boundary, because this would require modeling the dependence of energy consumption on the position of the reaction step within the synthesis path. Thus, the steam models are independent of whether the corresponding reaction is performed as the last step of a synthesis route or not. Production of raw materials and auxiliaries, solvent recovery and waste treatment are also not considered in the reaction models. Models and tools for this scope are generally available (e.g., ecoinvent (Frischknecht et al., 2005), finechem (Wernet et al., 2009), ecosolvent (Capello et al., 2007)). Consequently, for a comprehensive cradle-to-gate life cycle analysis, a synthesis path can be modeled as a sequence of distinct reaction steps by combining the individual reaction models and filling in data for the processes not included in the system boundaries of this work.
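How the per-kilogram reaction models are chained is not detailed in this section. The Python sketch below therefore assumes that each reaction model's result (kg steam per kg of that step's product) is scaled by the mass of that product required per kilogram of final product, and that steam for processes outside the reaction boundaries is added from other sources; all names and numbers are hypothetical.

```python
def route_steam(steps, outside_boundary_kg=0.0):
    """Steam for a synthesis route [kg steam per kg final product].

    steps: pairs of (steam of the reaction model [kg steam / kg of that step's product],
           kg of that step's product needed per kg of final product)
    outside_boundary_kg: steam of processes outside the reaction boundaries
           (e.g. drying or solvent recovery), taken from other data sources
    """
    return sum(steam * demand for steam, demand in steps) + outside_boundary_kg


# Hypothetical two-step route; the intermediate demand of 1.6 kg would follow from yields and stoichiometry
print(route_steam([(4.0, 1.6), (2.5, 1.0)], outside_boundary_kg=1.2))
```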
3.2 Training and Validation Datasets
The data acquisition for the model development was performed in collaboration with nine industry partners in Switzerland, Germany, France and the United States, covering different sectors from basic chemicals to pesticides and pharmaceutical products.
Since the data provided by most companies were in the form of standard operating procedures (SOPs) rather than measured values, the training data (250 points) for building the statistical models were estimated by means of the documentation based approach introduced in Section 2. The values are given in kilograms of 5-bar steam per kilogram of product, according to the system boundary defined above.
An additional industrial dataset was available from some of the industrial partners for testing and comparing the performance of the classification trees and the PDF models on new samples that were not used for model development. This case study dataset consists of 17 modeled steam consumption values, not all of which were obtained with the documentation based modeling approach.
Figure 3.2.1. System boundaries for the steam models. The outer black dashed
line represents a full synthesis route where raw materials and auxiliaries (Aux)
are entering the system and a product and waste (WST) are leaving the system.
The synthesis route can comprise one to n reaction steps. The system boundary
corresponding to one reaction is given by the grey boxes, including the reaction
synthesis and work-up processes, if they take place. The steam consumption
models are defined for each reaction system.
3.3 Stages of Process Design
At the earliest stage of process design, which we call S1, only information regarding the reaction type is available. At the second stage, S2, the types of work-up processes are also known. Later on, operational parameters such as times and temperatures are included at the third stage (S3), and mass flows at the fourth stage (S4). The last stage of design considered in this work, called S5, assumes that the steam consumption of the energy intensive distillation processes is known. For the classification trees, which include several variables of nominal, binary and continuous type, the number and type of predictor variables differ according to the stage of process design.
3.4 Selection and Classification of Chemical Reactions
The reaction types considered in this work cover very common
and frequent performed reactions in the chemical industry.
However, the selection of the reaction types is restricted to the
collected data, thus it does not represent a comprehensive study
of all existing reactions in production sites.
The selected reaction classes shown in Figure B.1.1.1 in the appendix were derived heuristically. They follow the standard form of reaction classification in textbooks, which considers the formal structural change, namely the bonds changed in a reaction. Whenever possible, similar reaction classes were aggregated into one group to increase the number of data points within a class. Similarity refers in this case to the type of molecules involved in the reaction. For instance, Alkylation and Arylation reactions introduce alkyl and aryl groups into a molecule, and both types of group are hydrocarbons. Alkylations and Arylations are therefore included in the same reaction group. The same is true for the different sub-classes of the Alkylation/Arylation group, namely the C-, N-, O- and S-Alkylations/Arylations.

In Acylation reactions, an acyl group is introduced into a molecule. N- and O-Acylations were aggregated in most cases. However, N-Acylation reactions with cyanuric chloride as reactant were treated as a separate reaction class because of their special process characteristics. In addition to expert knowledge, a univariate analysis of variance (ANOVA) (Field, 2009) was performed. It tested whether the mean steam consumption of these similar reactions differed significantly among them (see Appendix, Section B.1.2). ANOVA is a statistical technique that uses one dependent measure (here, steam consumption) to determine whether samples come from populations (here, reaction groups) with equal means. For alkylation reactions, no significant difference was revealed, so they were grouped together. For acylation reactions, a significant difference was observed between acylations using cyanuric chloride as reactant and acylations using other reactants. Two different acylation categories are therefore considered.
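A one-way ANOVA of the kind cited above compares the variability between group means with the variability within groups through the F statistic. The sketch below computes F and its degrees of freedom in plain Python. The group values are made-up illustrations, not the thesis data. Converting F into a p-value (for example with `scipy.stats.f_oneway`) is deliberately left out so that the sketch needs no third-party dependencies.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of steam consumption values
    (kg steam/kg product) per reaction group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical values for two acylation groups (illustration only)
cyanuric_chloride = [1, 2, 3]
other_reactants = [4, 5, 6]
print(one_way_anova_f([cyanuric_chloride, other_reactants]))  # (13.5, 1, 4)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the group means differ.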
3.5 Classification Models
Classification trees represent the rules underlying the data in a hierarchical, sequential structure. At every node of the tree, these rules partition the data based on the value of a particular predictor variable (Figure 3.5.1). The split chosen at each node optimizes the classification for the respective tree depth. The tree is typically grown to its full size, which achieves maximum classification performance on the training data (e.g., using the CART algorithm (L. Breiman et al., 1984)). It is then pruned back in a cross-validation procedure to avoid overfitting (Section 3.5.2). To classify an unknown instance, the instance is routed down the tree according to the values of the attributes tested in successive nodes. When a leaf is reached, the instance is assigned the class of that leaf.
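The routing of an unknown instance down a tree can be sketched with a minimal node structure. The tree, its predictors ("distillation", "Tmax") and its 100 °C threshold are hypothetical and do not come from the trees fitted in this work. Growing and pruning the tree, for example with CART, is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    # Internal node: `test` returns True to follow `left`, False for `right`.
    test: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: class label assigned to every instance that reaches it.
    label: Optional[str] = None


def classify(node: Node, instance: dict) -> str:
    """Route an instance down the tree until a leaf is reached."""
    while node.label is None:
        node = node.left if node.test(instance) else node.right
    return node.label


# Hypothetical tree; predictors and thresholds are illustrative only.
tree = Node(
    test=lambda x: x["distillation"] == "yes",
    left=Node(
        test=lambda x: x["Tmax"] >= 100.0,
        left=Node(label="High"),
        right=Node(label="Middle"),
    ),
    right=Node(label="Low"),
)

print(classify(tree, {"distillation": "yes", "Tmax": 120.0}))  # High
print(classify(tree, {"distillation": "no", "Tmax": 120.0}))   # Low
```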
As mentioned before, classification trees can include several predictor variables of nominal, binary and continuous type. In this work, the number and type of predictor variables differ according to the stage of process design.
Figure 3.5.1. Example of a general classification tree. C(t): subset of classes
accessible from node t, F(t): predictor variables subset used at node t, decision
rule used at node t.
3.5.1 Selection of Predictor Variables and Discretization of Target Attribute
In theory, the tree algorithm does not select irrelevant predictor variables. However, distracting variables have been observed to deteriorate both the classification performance and the interpretability of the tree (I. Witten, 2005). To test this effect, classification trees were compared using two training datasets (dataset-1 and dataset-2). Both datasets contain the same instances but a different number of predictor variables. Dataset-1 comprises only a subset of the predictors of the more inclusive dataset-2, selected based on empirical knowledge about their influence on steam consumption (see Table 3.5.1).
Table 3.5.1. Definition of the predictor variables in the training datasets 1 and 2.
Stage | Predictor | Type | Dataset-1 | Dataset-2 | Description
S1 | reaction type | categorical | x | x | Reaction type defined according to Figure B.1.1.1
S1 | mechanism | categorical | x | x | Reaction mechanism defined according to Table B.1.1.1
S1 | Total | | 2 | 2 |
S2 | mechanical | binary | | x | Presence or absence of mechanical processes (yes/no). Mechanical processes include filtration, centrifugation and washing work-up processes
S2 | miscellaneous | binary | | x | Presence or absence of miscellaneous processes (yes/no). Miscellaneous processes are work-up processes that cannot be classified in any of the other categories (e.g. dilution of the reaction mixture and stirring at high temperature)
S2 | crystallization | binary | | x | Presence or absence of crystallization processes (yes/no)
S2 | distillation | binary | x | x | Presence or absence of distillation processes during the reaction work-up; refers to simple evaporation or distillation under reflux conditions (yes/no)
S2 | acid base reaction | binary | | x | Presence or absence of acid-base reactions (yes/no). Acid-base processes correspond to neutralization and precipitation work-up processes (e.g. prior to mechanical separation processes)
S2 | evaporation | binary | | x | Presence or absence of evaporation processes (yes/no). Evaporation refers to simple evaporation occurring during the reaction synthesis or any work-up step
S2 | reflux | binary | x | x | Presence or absence of reflux conditions during the reaction synthesis or during the reaction work-up (yes/no)
S2 | last reaction | binary | | x | Indicates whether the considered reaction is the last one of the synthesis route (yes/no)
S2 | Total* | | 4 | 10 |
S3 | Tmean | continuous | | x | Average operation temperature in °C
S3 | Tmax | continuous | x | x | Maximal operation temperature in °C
S3 | time | continuous | x | x | Total time in hours, within the defined system boundary, required during the reaction synthesis and work-up processes for heating the reaction mixture, evaporating solvent, and keeping the temperature constant above ambient temperature (with or without reflux conditions)
S3 | Total* | | 6 | 13 |
S4 | PMI | continuous | x | x | Process Mass Intensity (1)
S4 | PMIs | continuous | | x | Solvent Mass Intensity (2)
S4 | PMIw | continuous | | x | Water Mass Intensity (3)
S4 | RME | continuous | | x | Reaction Mass Efficiency (4)
S4 | Total* | | 7 | 17 |
S5 | Steamdist | continuous | x | x | Steam consumption during distillation processes
S5 | Total* | | 8 | 18 |

* The total number of predictor variables per design stage is cumulative: at a given stage i, the variables introduced at the previous stages are also present.
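$$\mathrm{PMI} = \frac{m_{\mathrm{total}}}{m_{\mathrm{product}}} \qquad (1)$$
$$\mathrm{PMI}_s = \frac{m_{\mathrm{solvent}}}{m_{\mathrm{product}}} \qquad (2)$$
$$\mathrm{PMI}_w = \frac{m_{\mathrm{water}}}{m_{\mathrm{product}}} \qquad (3)$$
$$\mathrm{RME} = \frac{m_{\mathrm{product}}}{m_{\mathrm{reagents}}} \qquad (4)$$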
where m_total is the total input mass of raw materials, m_product the mass of product, m_solvent the total input mass of solvent, m_water the total input mass of water, and m_reagents the total input mass of reagents.
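Equations (1) to (4) are simple mass ratios. The sketch below computes them for one batch. The function name and the example masses are hypothetical, and all masses must be given in the same unit.

```python
def mass_metrics(m_total, m_product, m_solvent, m_water, m_reagents):
    """Return PMI, PMIs, PMIw and RME as defined in equations (1)-(4).

    All masses must share one unit (e.g. kg per batch).
    """
    if m_product <= 0:
        raise ValueError("m_product must be positive")
    return {
        "PMI": m_total / m_product,      # total raw material input per product mass
        "PMIs": m_solvent / m_product,   # solvent input per product mass
        "PMIw": m_water / m_product,     # water input per product mass
        "RME": m_product / m_reagents,   # product mass per reagent input mass
    }


# Hypothetical batch: 1000 kg raw materials in total (600 kg solvent,
# 250 kg water, 150 kg reagents) yielding 100 kg of product.
print(mass_metrics(1000, 100, 600, 250, 150))
```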
For the discretization of steam consumption, the number and width of the intervals (classes) had to be specified. A histogram of the steam data (Figure 3.5.2) was used to find a compromise between homogeneous intervals and a sufficient sample size in every interval. Two scenarios were again tested, with three classes (Table 3.5.2) and five classes (Table 3.5.3). This evaluated the influence of the number of output classes on the classification performance of the trees.
[Figure: histogram; x-axis: steam consumption [kg/kg product], 0-16; y-axis: frequency]
Figure 3.5.2. Histogram of the steam consumption values included in the training dataset.
Table 3.5.2. Discretized intervals of steam consumption (target attribute)
considering three output classes. The values are given in kilograms of steam
consumption per kilogram of product.
Class label | Interval | Number of data points
High | 3-16 | 61
Middle | 1-3 | 51
Low | 0-1 | 122
Table 3.5.3. Discretized intervals of steam consumption (target attribute)
considering five output classes. The values are given in kilograms of steam
consumption per kilogram of product.
Class label | Interval | Number of data points
High | 5-16 | 28
Middle-high | 3-5 | 33
Middle | 1.5-3 | 35
Middle-low | 0.5-1.5 | 48
Low | 0-0.5 | 90
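The discretization in Tables 3.5.2 and 3.5.3 maps each continuous steam value to a class label. A minimal sketch follows, using the interval boundaries from the two tables. The tables do not state which end of each interval is closed. The sketch assumes half-open intervals [lower, upper), so that a value on a boundary falls into the higher class; values above 16 kg/kg would also be labeled High.

```python
import bisect

# Inner class boundaries (kg steam/kg product) from Tables 3.5.2 and 3.5.3.
THREE_CLASSES = ([1.0, 3.0], ["Low", "Middle", "High"])
FIVE_CLASSES = ([0.5, 1.5, 3.0, 5.0],
                ["Low", "Middle-low", "Middle", "Middle-high", "High"])


def discretize(steam, scheme):
    """Map a steam consumption value to its class label.

    Assumption: intervals are treated as [lower, upper), i.e. a value equal
    to a boundary is assigned to the upper class.
    """
    boundaries, labels = scheme
    return labels[bisect.bisect_right(boundaries, steam)]


print([discretize(v, THREE_CLASSES) for v in (0.4, 2.0, 7.5)])
print([discretize(v, FIVE_CLASSES) for v in (0.4, 2.0, 7.5)])
```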
3.5.2 Model Selection and Evaluation
In stepwise regression, the estimated R² increases with each additional variable. Similarly, more splits in a classification tree result in a lower misclassification error on the training dataset. In stepwise regression, however, adding variables beyond a certain point deteriorates the generalization performance of the model (i.e., the model starts to capture the noise in the data). In the same way, overly large classification trees can perform worse than trees of the right size. This phenomenon is known as overfitting.
The tree is typically grown to its full size, which achieves maximum classification performance on the training data (e.g., using the CART algorithm (L. Breiman et al., 1984)). It is then pruned back in a cross-validation procedure to avoid overfitting. Tree pruning means trimming the fully grown tree from the deeper nodes towards the shallower ones. The optimal level of model complexity is reached when the generalization performance is maximal.

To detect overfitting and assess the generalization capability of the models, stratified tenfold cross-validations were carried out. The data were randomly divided into ten partitions of approximately equal size. The partitions were stratified so that each has approximately the same class distribution as the full dataset. Each tenth is held out in turn for testing, and the remaining nine-tenths are used for training. The training procedure is thus performed ten times on different datasets, and each time the cross-validation defines the degree to which the tree must be pruned. The pruning degree of the tree, as well as the modeling accuracy and generalization metrics, are inferred from the average performance over the tenfold cross-validation. To propose a final classification tree, this pruning degree is then imposed on a tree trained with all the available data. This is a standard procedure when the data available for training and testing are limited (I. Witten, 2005).
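The fold construction of a stratified tenfold cross-validation can be sketched in plain Python. The sketch shuffles each class separately and deals its instances round-robin to the ten folds, so every fold receives roughly one tenth of every class. It then yields the ten train/test splits. This is one common way to stratify, assumed here for illustration. The tree growing and pruning applied to each split are not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each instance index to one of k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)  # continue dealing where the previous class stopped
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is held out exactly once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Class sizes of the three-class scheme in Table 3.5.2
labels = ["Low"] * 122 + ["Middle"] * 51 + ["High"] * 61
for train, test in cross_validation_splits(labels, k=10):
    print(len(train), len(test))
```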
The classification performance for every output class of the tree can be evaluated from the counts of instances that the model predicts correctly and incorrectly. These counts are summarized in a confusion matrix, a contingency table showing the distribution of the data in the predicted classes (columns) with respect to the actual classes (rows). Three performance metrics can be calculated from the confusion matrix: sensitivity, specificity and accuracy. They are expressed as follows:
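$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3.5.1)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (3.5.2)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \qquad (3.5.3)$$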
where TP (true positives) is the number of instances that belong to a class and are predicted within that class. FN (false negatives) is the number of instances that belong to a class but are not predicted within that class. FP (false positives) is the number of instances predicted to be in a class without belonging to it. TN (true negatives) is the number of instances that neither belong to a class nor are predicted into it.

High sensitivity for a certain output class implies a very general model for this class, namely many data points fitting into few rules. Low sensitivity denotes a very specific model with many rules, each containing a small portion of the data. To visualize this trade-off, plots of sensitivity against (1 - specificity) are used; these are also called receiver operating characteristic (ROC) plots (Perkins and Schisterman, 2006, Youden, 1950). Sensitivity is the true positive rate, and (1 - specificity) is the false positive rate. Perfect classification is depicted by the point in the top left corner, namely the (0,1) point. Highly sensitive and specific models appear to the left of the diagonal line, close to the (0,1) point. A model that predicts no better than a random guess lies on or to the right of that line. Two quantitative metrics exist to identify the optimal model performance from ROC plots. The first is the distance between the model point and the (0,1) point. The second is the vertical distance to the random-guess line, also called the Youden index, i.e., sensitivity + specificity - 1 (Perkins and Schisterman, 2006, Youden, 1950).
The same metrics can be defined for the overall performance of the classification tree. The difference is that they are aggregated over all classes, e.g., as the overall true positive rate, which is the fraction of all instances assigned to their actual class.
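The per-class counts TP, FN, FP and TN in a multi-class tree follow from a one-vs-rest reading of the confusion matrix. The sketch below derives them and evaluates equations 3.5.1 to 3.5.3, the Youden index and the distance to the (0,1) point. It also computes the overall fraction of correctly classified instances. The confusion matrix is a made-up example, not a result of this work.

```python
import math


def class_metrics(cm, c):
    """One-vs-rest metrics for class index c.

    cm[i][j] = number of instances of actual class i predicted as class j
    (rows: actual classes, columns: predicted classes).
    """
    n = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                  # actual c, predicted as another class
    fp = sum(row[c] for row in cm) - tp   # predicted c, actually another class
    tn = n - tp - fn - fp
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / n,        # eq. 3.5.1
        "sensitivity": sens,              # eq. 3.5.2
        "specificity": spec,              # eq. 3.5.3
        # ROC summaries of the point (1 - specificity, sensitivity)
        "youden": sens + spec - 1,
        "dist_to_01": math.hypot(1 - spec, 1 - sens),
    }


def overall_accuracy(cm):
    """Fraction of all instances assigned to their actual class."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(r) for r in cm)


# Hypothetical 3-class confusion matrix (Low, Middle, High), illustration only
cm = [[8, 2, 0],
      [1, 5, 2],
      [0, 1, 6]]
print(class_metrics(cm, 0))
print(overall_accuracy(cm))
```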
3.5.3 Selection of Important Rules
Besides their performance for every output class and overall, classification trees produce logical rules. Each node of the tree corresponds to a question about a predictor variable, and each branch represents an answer. Extracting the important logical rules is therefore a key step towards model interpretability and transparency. The importance of a rule can be judged by the importance of the predictor variables it contains: an important rule should also contain important predictors. Predictor importance indicates how strongly an attribute is correlated with the class, i.e., how much the predictor contributes to predicting the output class. It can be quantified as the risk reduction from parent to child nodes due to splitting on each predictor variable (see formulas in Appendix, Section B.2).
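The risk-reduction idea can be illustrated with a short sketch. It assumes that a node's risk is its Gini impurity weighted by the fraction of training instances in the node. This is only one possible choice; the exact formulas used in this work are given in Appendix B.2 and are not reproduced here. The two splits are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its per-class instance counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def node_risk(counts, n_total):
    """Assumed node risk: impurity weighted by the node's share of the data."""
    return sum(counts) / n_total * gini(counts)


def predictor_importance(splits, n_total):
    """Sum the risk reduction of every split, grouped by split variable.

    splits: list of (variable, parent_counts, left_counts, right_counts).
    """
    importance = {}
    for var, parent, left, right in splits:
        gain = (node_risk(parent, n_total)
                - node_risk(left, n_total)
                - node_risk(right, n_total))
        importance[var] = importance.get(var, 0.0) + gain
    return importance


# Hypothetical two-split tree on 20 instances with classes (Low, High)
splits = [
    ("distillation", [10, 10], [2, 8], [8, 2]),
    ("Tmax", [2, 8], [0, 6], [2, 2]),
]
print(predictor_importance(splits, 20))  # distillation ~0.18, Tmax ~0.06
```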
The selection of important predictors also serves to parameterize the PDF models, which are the second type of model developed in this work.
3.6 Probability Density Function Models
The PDF models describe the variability of the gate-to-gate steam
consumption for the production of one kilogram of product for a
particular reaction type. These models are probability density
functions fitted to different datasets of steam consumption, each
of them defined by one reaction type.
The fitting was accomplished by maximum likelihood estimation (MLE) (Myung, 2003). For a given distribution type, MLE finds the parameter values that give the highest probability of generating the sample data (see Appendix, Section B.3). The goodness of fit was evaluated with standard statistical tests (Chi-Square, Kolmogorov-Smirnov and Anderson-Darling) and with the Akaike Information Criterion (Akaike, 1974) (see Appendix, Section B.3).
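For two common candidate distributions, MLE has closed-form solutions. The sketch below fits an exponential and a lognormal distribution and compares them with the AIC (lower is better). The choice of these two candidates and the sample values are assumptions for illustration. The lognormal fit requires strictly positive data, and the goodness-of-fit tests are not shown.

```python
import math
from statistics import fmean


def fit_exponential(data):
    """MLE of an exponential distribution; returns (params, log-likelihood)."""
    rate = 1.0 / fmean(data)  # the MLE of the rate is 1 / sample mean
    loglik = len(data) * math.log(rate) - rate * sum(data)
    return {"rate": rate}, loglik


def fit_lognormal(data):
    """MLE of a lognormal distribution; returns (params, log-likelihood)."""
    logs = [math.log(x) for x in data]  # requires x > 0
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(y - mu) ** 2 for y in logs]))  # MLE divides by n
    loglik = sum(
        -y - math.log(sigma) - 0.5 * math.log(2 * math.pi)
        - (y - mu) ** 2 / (2 * sigma ** 2)
        for y in logs
    )
    return {"mu": mu, "sigma": sigma}, loglik


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


# Hypothetical steam consumption values [kg steam/kg product], illustration only
sample = [0.2, 0.5, 0.9, 1.4, 2.1, 3.3]
for name, fit, k in (("exponential", fit_exponential, 1),
                     ("lognormal", fit_lognormal, 2)):
    params, loglik = fit(sample)
    print(name, params, round(aic(loglik, k), 2))
```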
Generic interval-based predictive models can also be derived from the interquartile ranges of the fitted distributions. The interquartile range is the difference between the upper and lower quartiles, i.e., the 75th and 25th percentiles of a PDF. It covers the middle 50% of the distribution (Figure 3.6.1).
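For a fitted exponential distribution such as the one in Figure 3.6.1, the quartiles follow directly from the inverse CDF, Q(p) = -ln(1 - p) / rate. The interquartile range is therefore ln(3) / rate. The sketch below assumes a hypothetical fitted rate.

```python
import math


def exponential_quantile(p, rate):
    """Inverse CDF of the exponential distribution."""
    return -math.log(1.0 - p) / rate


def interquartile_interval(rate):
    """Return (25th percentile, 75th percentile, IQR) of a fitted exponential."""
    q25 = exponential_quantile(0.25, rate)
    q75 = exponential_quantile(0.75, rate)
    return q25, q75, q75 - q25  # equals ln(3) / rate


# Hypothetical fitted rate of 0.5, i.e. a mean of 2 kg steam/kg product
print(interquartile_interval(0.5))
```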
The reaction type is the main predictor variable for the PDF models. Parameterizing these models therefore means that, for a specific reaction type, the dataset is split into subsets according to process parameters such as temperature or time. For example, the dataset for the condensation reaction could be split into two datasets: one for condensations in which distillation processes take place, and one for condensations without distillation. The rules extracted from the classification trees are used to define the relevant parameters for splitting the dataset of each reaction type. Further parameterization is needed when the model intervals of the initial PDF models are very broad or when the goodness-of-fit statistics indicate a poor fit to the data.
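The condensation example above can be sketched as a split by one tree-derived predictor followed by an interval per subset. The record layout (keys "reaction_type", "distillation", "steam") and all values are hypothetical. For brevity, the sketch uses empirical quartiles rather than quartiles of fitted distributions.

```python
from collections import defaultdict
from statistics import quantiles


def partition(records, reaction_type, predictor):
    """Split the steam values of one reaction type by one predictor's value."""
    subsets = defaultdict(list)
    for rec in records:
        if rec["reaction_type"] == reaction_type:
            subsets[rec[predictor]].append(rec["steam"])
    return dict(subsets)


def quartile_summary(subsets):
    """Empirical (25th, 75th) percentiles per subset; needs >= 2 values each."""
    summary = {}
    for key, values in subsets.items():
        q25, _, q75 = quantiles(values, n=4, method="inclusive")
        summary[key] = (q25, q75)
    return summary


# Hypothetical records (illustration only, not thesis data)
records = [
    {"reaction_type": "condensation", "distillation": "yes", "steam": s}
    for s in (2.0, 3.0, 4.0, 5.0, 6.0)
] + [
    {"reaction_type": "condensation", "distillation": "no", "steam": s}
    for s in (0.2, 0.4, 0.6, 0.8, 1.0)
] + [{"reaction_type": "acylation", "distillation": "yes", "steam": 9.0}]

print(quartile_summary(partition(records, "condensation", "distillation")))
```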
[Figure: histogram; x-axis: steam consumption [kg/kg product], 0-10; y-axis: frequency; fitted PDF superimposed; 25th percentile, 75th percentile and interquartile range marked]
Figure 3.6.1. Histogram of steam consumption data with superimposed fitted exponential probability density function.
4 Results Documentation based Approach
The documentation based methodology for energy modeling described above was validated in three case studies against reference data from three chemical companies in Switzerland. In the first case study, the system boundary comprises all unit operations performed in one vessel during the production of a specific product. The second case study considers a vessel-product boundary as well as two wider ones: the full synthesis route for the production of a specific product (outer black dashed box in Figure 3.2.1), and the entire production building (i.e., including the whole range of chemical products). In the third case study, the system boundary is a single reaction step as defined in Section 3.1 (grey boxes in Figure 3.2.1).
Besides validating the model, the results from the first case study were used to determine the fuzzy uncertainty intervals introduced in Section 2.5. The cores of these fuzzy intervals were then used for the model predictions of the second case study. For simplicity, we focus only on the core and treat it like a simple interval. The full fuzzy interval is not applied in our case studies. We nevertheless present the comprehensive approach to enable further uncertainty calculations, which are beyond the scope of this thesis but could be of interest for calculations applying fuzzy logic algebra. One example is an estimate of the energy cost at a production site, where both the energy consumption model results and the cost of steam are represented by fuzzy intervals.
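The energy-cost example above can be illustrated with trapezoidal fuzzy intervals (a, b, c, d), where [a, d] is the support and [b, c] the core. For two positive trapezoids, the sketch multiplies corresponding vertices. This vertex product is exact at the support and at the core (alpha = 0 and alpha = 1), but only approximate at intermediate alpha levels, because the exact product is not trapezoidal. The steam amounts and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy interval (a, b, c, d): support [a, d], core [b, c]."""
    a: float
    b: float
    c: float
    d: float

    def membership(self, x: float) -> float:
        """Possibility degree of x: 1 on the core, 0 outside the support."""
        if x < self.a or x > self.d:
            return 0.0
        if self.b <= x <= self.c:
            return 1.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)

    def alpha_cut(self, alpha: float) -> tuple:
        """Interval of values with possibility >= alpha (0 < alpha <= 1)."""
        return (self.a + alpha * (self.b - self.a),
                self.d - alpha * (self.d - self.c))


def multiply_positive(x: Trapezoid, y: Trapezoid) -> Trapezoid:
    """Vertex approximation of the product of two positive fuzzy intervals."""
    return Trapezoid(x.a * y.a, x.b * y.b, x.c * y.c, x.d * y.d)


# Hypothetical: steam demand [t/year] and steam price [currency units/t]
steam = Trapezoid(900, 1000, 1200, 1400)
price = Trapezoid(20, 25, 30, 35)
cost = multiply_positive(steam, price)
print(cost)                 # support [18000, 49000], core [25000, 36000]
print(cost.alpha_cut(1.0))  # the core, used like a simple interval
```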
4.1 Case Study I
4.1.1 Dataset
Dataset-1 comes from a multipurpose batch production building in Switzerland, where around 50 products, including specialty chemicals and intermediates, are produced per year. The production building has 38 pieces of equipment operated in batch mode, with typical nominal volumes between 6.3 and 40 m³. The building also contains unit operations in continuous mode, which were not considered in this work. Unlike batch processes, continuous operations usually have measured steam consumption. The documentation based energy modeling is therefore particularly interesting for batch processes.
Dataset-1 comprises values of 5-bar steam consumption for 18 steam-consuming equipment-product pairs, each produced in several batches in the production building. This allows the batch-to-batch variability of the steam consumption to be estimated for each equipment-product pair. For this purpose, the median and the 2.5th and 97.5th percentiles of steam consumption were calculated for each equipment-product pair.

These steam data come from an Energy Monitoring Tool (EMT) installed in the plant. The EMT is a model-based tool that mainly uses the existing data acquisition systems for production monitoring. From these data, it calculates both the theoretical energy consumption (5-bar steam consumption) and the associated losses. The calculation is performed for every production step at high resolution (i.e., on a minute basis), and the results can be aggregated at different levels. In this work, the results were aggregated per batch and per individual equipment-product pair, i.e., over the set of batch process steps for a specific product produced in that equipment. At the equipment-product level, the mean absolute relative error of the EMT is 10%. For simplicity, the EMT values are assumed to be error-free in this work. The intrinsic uncertainty of the EMT is therefore not considered in the validation procedure of the documentation based approach.
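The reference statistics per equipment-product pair (median and 2.5th/97.5th percentiles over batches) can be computed with the Python standard library. `statistics.quantiles(..., n=40)` returns the 39 cut points at 2.5%, 5%, ..., 97.5%, so the first and last cut points are the required percentiles. The "inclusive" method, chosen here as an assumption, interpolates linearly between observed values and needs at least two batches. The per-batch values are hypothetical.

```python
from statistics import median, quantiles


def batch_variability(values):
    """Median, 2.5th and 97.5th percentiles of per-batch steam consumption."""
    cuts = quantiles(values, n=40, method="inclusive")
    return median(values), cuts[0], cuts[-1]


# Hypothetical per-batch steam consumption [kg steam/batch] for one pair
batches = [5200, 5350, 5100, 6900, 5400, 5600, 5250, 5300]
print(batch_variability(batches))
```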
4.1.2 Theoretical Energy Consumption
As the scatter plot in Figure 4.1.1 shows, most values fall near the diagonal, indicating good agreement between predicted and observed values. In addition, most model predictions fall within the batch-to-batch variability ranges, as shown by the reference intervals crossing the diagonal line. Some data points, however, deviate considerably from the reference data ranges. For points 10 and 16, the inaccurate predictions stem from the modeling of reflux conditions. This model depends only on operation time and an empirical constant (constant C in Equation 2.3.11 of Table 2.3.1), not on accurate data for the reflux ratios of the distillation processes carried out in this equipment. In such cases, either a more detailed model should be used, or constant C would have to be parameterized for different distillation operation modes to obtain more accurate results with the documentation based approach. For point 4, the overestimation results from inaccurate standard physicochemical properties used in the model. Replacing these standard property values with accurate substance-specific data, if available, would provide the necessary additional accuracy. A detailed analysis for all equipment-product pairs of Figure 4.1.1 is presented in Table C.3.1 in the appendix.
[Figure: scatter plot; x-axis: model predictions [kg steam/batch]; y-axis: observed median and batch-to-batch variability [kg steam/batch]; both axes 2000-14000; labeled points: 4, 10, 16]
Figure 4.1.1. Model predictions (documentation based approach) against reference values (model based plant monitoring system) for the theoretical energy consumption per equipment-product pair in the first case study (dataset-1). The batch-to-batch variability corresponds to the 2.5th and 97.5th percentiles of the observed energy consumption over several batches.
4.1.3 Energy Losses
Compared with the theoretical energy consumption, the predicted energy losses deviate more from the reference values, as can be seen in Figure 4.1.2. Notable sources of deviation include inaccuracies regarding the heat exchange area of the equipment, the temperature control of the reaction mixture, and the standard values used for the loss coefficient (K in Equation 2.3.10 of Table 2.3.1) for unit operations at low temperature. A detailed analysis for all equipment-product pairs of Figure 4.1.2 is presented in Table C.3.1 in the appendix. Despite these deviations, the energy loss model performs satisfactorily in most cases. This holds especially for the unit operations with higher energy losses, which can be considered the most relevant for identifying energy saving potential.
4.1.4 Sensitivity and Uncertainty Analysis
The relative errors from the validation of the theoretical energy consumption and the energy losses in the first case study were used to derive generic model uncertainty intervals based on fuzzy set theory. For this purpose, a sensitivity analysis was performed to identify the parameters with the greatest influence on the model results. For the theoretical energy consumption, the most influential factor was the existence of reflux conditions. For the energy losses, the duration of a unit operation was the most influential parameter. A detailed discussion of the sensitivity analysis procedure is provided in the appendix (Section C.1).

To capture the influence of the reflux ratio and the duration of a unit operation on the model uncertainty, different fuzzy intervals were proposed depending on the available process information. The more detailed the information, the narrower the fuzzy interval. Figure 4.1.3 depicts the resulting uncertainty intervals, expressed as relative errors, for the models of theoretical energy consumption, energy losses and total energy consumption. The intervals for the total energy consumption models are derived by propagating the respective intervals for the theoretical energy consumption and the energy losses. To support the selection of the proper uncertainty interval for the total energy consumption models, Table 4.1.1 lists the possible combinations of process characteristics and available information. For instance, suppose a process is known to run under reflux conditions and accurate data are available for the duration of the heating and distillation operations. The uncertainty range for the predicted total energy consumption can then be taken from Case-3 of Table 4.1.1. Considering the core of the fuzzy interval, this corresponds to a relative error between 13% and 45% with equal possibility.
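The selection described above is a lookup in Table 4.1.1. The sketch below encodes the six cases from that table and returns the core (b, c) of the relative-error interval for the total energy consumption. The dictionary layout and the function name are illustrative choices. The interval values are copied from the table.

```python
# Fuzzy intervals (a, b, c, d) of absolute relative errors in %, from Table 4.1.1.
# Key: (distillation, operation time known) -> (Etheo, Eloss, Etot)
TABLE_4_1_1 = {
    ("none", True): ((1, 10, 32, 82), (5, 17, 66, 97), (3, 13, 45, 88)),
    ("none", False): ((1, 10, 32, 82), (6, 29, 84, 100), (3, 17, 53, 89)),
    ("conditions known", True): ((1, 10, 32, 82), (5, 17, 66, 97), (3, 13, 45, 88)),
    ("conditions known", False): ((1, 10, 32, 82), (6, 29, 84, 100), (3, 17, 53, 89)),
    ("conditions unknown", True): ((1, 14, 45, 82), (5, 17, 66, 97), (3, 16, 53, 88)),
    ("conditions unknown", False): ((1, 14, 45, 82), (6, 29, 84, 100), (3, 20, 61, 89)),
}


def total_energy_core(distillation, time_known):
    """Return the core (b, c) of the Etot relative-error interval in %."""
    etot = TABLE_4_1_1[(distillation, time_known)][2]
    return etot[1], etot[2]


# Case-3: reflux conditions known, heating and distillation times known
print(total_energy_core("conditions known", True))  # (13, 45)
```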
[Figure: scatter plot; x-axis: model predictions [kg steam/batch]; y-axis: observed median and batch-to-batch variability [kg steam/batch]; both axes 1000-6000; labeled points include 14, 15 and 18]
Figure 4.1.2. Model predictions (documentation based approach) against reference values (model based plant monitoring system) for the energy losses per equipment-product pair in the first case study (dataset-1). The batch-to-batch variability corresponds to the 2.5th and 97.5th percentiles of the observed energy losses over several batches.
[Figure: three panels (i)-(iii); x-axis: relative error [%], 0-140; y-axis: possibility, 0-1; trapezoids abcd = [1 10 32 82] and [1 14 45 82] in (i), [5 17 66 97] and [6 29 84 100] in (ii), [3 13 45 88] and [3 20 61 89] in (iii)]
Figure 4.1.3. Uncertainty estimation in the form of fuzzy intervals for relative errors of the documentation based approach for (i) theoretical energy consumption, (ii) energy losses and (iii) total energy consumption. Solid and dashed lines correspond to different levels of available process information. Solid lines correspond to higher relative errors: for the theoretical energy consumption, when there is limited or no information about reflux during distillation; for energy losses, when there is limited or no information about the duration of unit operations. The a, d values correspond to the support and the b, c values to the core of the fuzzy intervals.
Table 4.1.1. Derived fuzzy intervals from the first case study for uncertainty representation in the models for theoretical energy consumption (Etheo), energy losses (Eloss), and total energy consumption (Etot). Values are fuzzy intervals (a b c d) for absolute values of relative errors (%).
Case | Distillation | Operation time* | Etheo | Eloss | Etot
Case-1 | none | known | (1 10 32 82) | (5 17 66 97) | (3 13 45 88)
Case-2 | none | unknown | (1 10 32 82) | (6 29 84 100) | (3 17 53 89)
Case-3 | conditions known | known | (1 10 32 82) | (5 17 66 97) | (3 13 45 88)
Case-4 | conditions known | unknown | (1 10 32 82) | (6 29 84 100) | (3 17 53 89)
Case-5 | conditions unknown | known | (1 14 45 82) | (5 17 66 97) | (3 16 53 88)
Case-6 | conditions unknown | unknown | (1 14 45 82) | (6 29 84 100) | (3 20 61 89)
* Excluding distillation time.
4.1.5 Total Energy Consumption
Applying these uncertainty intervals for the total steam consumption to dataset-1 allows the model uncertainty to be visualized together with the batch-to-batch variability, as shown in Figure 4.1.4. The model performance in Figure 4.1.4 can be analyzed according to three levels of model success. The first level is a full success of the model prediction, where both intervals cross the diagonal line. At the second level, only the batch-to-batch variability interval crosses the diagonal line; this interval is asymmetrical with respect to the median point estimate. At the third level, only the model uncertainty range crosses the diagonal line; this range is symmetrical with respect to the model point estimate.

As shown in Figure 4.1.4 and reported in detail in Table 4.2.1, the model performance is satisfactory both in terms of these success levels and in terms of typical statistical indices. In 12 out of 18 cases, the predictions lie within the batch-to-batch variability ranges and the model uncertainty intervals also cross the diagonal line. Two further cases are a second-level and a third-level success, respectively. There are, however, three cases of model inaccuracy. The deviation of point 10 is due to the theoretical energy consumption, as discussed for Figure 4.1.1. For points 15 and 18, the deviation arises from the inaccuracy of the energy loss model, as discussed for Figure 4.1.2. For point estimates (i.e., ignoring the batch-to-batch variability intervals and the model uncertainty intervals), the mean absolute relative error (MARE) of the total energy consumption model is 27%, which is acceptable for a screening methodology. The different forms of the coefficient of determination (Willmott et al., 2012) lie between 0.86 and 0.89. These statistical parameters are therefore a further indication of the good prediction capabilities of the documentation based energy modeling approach. A full list of statistical indices for the model performance is available in the appendix (Section C.1).
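The three success levels can be checked geometrically. In Figure 4.1.4, each pair is a point (prediction, observed median). The batch-to-batch interval is vertical at x = prediction, so it crosses the diagonal y = x when the prediction lies between the 2.5th and 97.5th percentiles. The model interval is horizontal at y = median, so it crosses the diagonal when the median lies within prediction × (1 ± relative error). The sketch assumes that the symmetric model interval is built this way from a single relative error, e.g. the upper core value c of the chosen fuzzy interval. It also computes the MARE of point estimates. All numbers are hypothetical.

```python
def success_level(pred, obs_median, obs_low, obs_high, rel_err):
    """Return 1 (full success), 2 (batch interval only), 3 (model interval
    only) or None (neither interval crosses the diagonal).

    rel_err: relative error as a fraction, e.g. 0.45 for 45 %.
    """
    # Vertical batch-to-batch interval at x = pred crosses y = x
    batch_ok = obs_low <= pred <= obs_high
    # Horizontal model interval at y = obs_median crosses y = x
    model_ok = pred * (1 - rel_err) <= obs_median <= pred * (1 + rel_err)
    if batch_ok and model_ok:
        return 1
    if batch_ok:
        return 2
    if model_ok:
        return 3
    return None


def mare(preds, observed):
    """Mean absolute relative error of point predictions vs. observed medians."""
    return sum(abs(p - o) / o for p, o in zip(preds, observed)) / len(preds)


# Hypothetical pair: prediction 1000, median 1200, percentiles 900-1300 kg/batch
print(success_level(1000, 1200, 900, 1300, 0.45))  # 1
print(mare([110, 90], [100, 100]))
```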
[Figure: scatter plot; x-axis: model predictions with fuzzy intervals (core) [kg steam/batch]; y-axis: observed median and batch-to-batch variability [kg steam/batch]; both axes 2000-18000; labeled points: 10, 15, 18]
Figure 4.1.4. Model predictions (documentation based approach) against reference values (model based plant monitoring system) for the total energy consumption per equipment-product pair in the first case study (dataset-1). The batch-to-batch variability corresponds to the 2.5th and 97.5th percentiles of the observed total energy consumption over several batches. The bold segments of the fuzzy intervals correspond to the minimal core value (b). The plain segments of the fuzzy intervals correspond to the maximal core value (c).
4.2 Case Study II
4.2.1 Dataset
In this case, the reference data (dataset-2) come from two smaller production buildings. The first is a mono-product batch plant. The second is a multiproduct batch plant producing four different products, three in batch mode and one in continuous mode. The nominal volumes of the equipment in both plants range between 6.5 and 12 m³. In both plants, all equipment operating in batch mode and consuming steam for heating was considered for the validation. This gives a total of 20 equipment-product pairs for which steam consumption data from several batches have been collected. For each pair, the median and the 2.5th and 97.5th percentiles were calculated as reference data. The data for these two buildings come from energy monitoring tools that follow the same principle as in the first case study. Consequently, the quality of the reference data is the same in both case studies.
For the multiproduct plant, it was convenient to additionally apply a top-down approach. This approach is a multiple linear regression of the overall steam consumption on the production masses of the three chemicals produced in batch mode in the multiproduct building, according to Equation 4.2.1:
$$E = \beta_0 + \beta_1 m_1 + \beta_2 m_2 + \beta_3 m_3 \qquad (4.2.1)$$
where E is the total steam consumption of the building minus the steam consumed by solvent regeneration processes operated in continuous mode, m1, m2 and m3 are the production masses of the three chemicals in a given time horizon, β0 represents the base consumption of the multiproduct building, and β1, β2 and β3 represent the specific steam consumption per product mass. Besides the infrastructure consumption and losses, the base consumption includes the constant consumption of the product produced in continuous mode. The data for this analysis cover a period of three years and were collected on a monthly basis. The aim is to compare the proposed process documentation based approach with a top-down approach frequently used in batch production to allocate energy consumption to specific products and to estimate the total consumption of the building.
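The coefficients β0 to β3 of Equation 4.2.1 can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python with Gaussian elimination and partial pivoting. In practice, a library routine such as `numpy.linalg.lstsq` would be the more robust choice. The monthly data are synthetic, generated from known coefficients, so the fit should recover them; they are not the plant data.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(masses, steam):
    """Least-squares [b0, b1, ...] for E = b0 + b1*m1 + ... (normal equations)."""
    rows = [[1.0] + list(ms) for ms in masses]  # leading 1 gives intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * e for r, e in zip(rows, steam)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic monthly data generated from known coefficients (illustration only)
true_beta = [100.0, 2.0, 3.0, 0.5]
masses = [(10, 5, 2), (20, 3, 7), (15, 8, 1), (30, 2, 4), (25, 6, 9), (12, 9, 3)]
steam = [true_beta[0] + sum(b * m for b, m in zip(true_beta[1:], ms)) for ms in masses]
print([round(b, 6) for b in fit_linear(masses, steam)])  # approximately true_beta
```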
4.2.2 Total Energy Consumption
The results of the documentation based approach for predicting the total energy consumption in the second case study (dataset-2) are presented in Figure 4.2.1. Again, the agreement between reference and predicted values is satisfactory in most cases. As in the first case study, the largest deviations (points 6 and 12 in Figure 4.2.1) involve distillation processes with unknown reflux conditions. All individual cases are discussed further in the appendix (Section C.3). Comparing the model performance in the two case studies (Table 4.2.1), the predictions are slightly better for the first case study. This is because unit operation times were available to a greater extent in the first case study, and, as the sensitivity analysis has shown, the unit operation time is one of the parameters with the strongest influence on the model results. Moreover, the batch-to-batch variability ranges are much narrower in the second case study, since the reference values cover a period of two months, compared with three years in the first case study. This affects the success counts reported in Table 4.2.1: the number of full successes is lower here than in the first case study. On the other hand, the number of successes within the model uncertainty intervals only is considerably higher (6 compared with 1), so that the total number of successes (14) matches that of the first case study. Furthermore, a MARE of 37% and coefficients of determination between 0.79 and 0.83 confirm the good prediction capabilities of the documentation based energy modeling approach for fast screening purposes in multipurpose batch plants.
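To make the error measures in Table 4.2.1 concrete, the sketch below computes a mean absolute relative error (MARE) and a squared Pearson correlation for paired predictions and reference values. It assumes MARE = mean(|predicted − observed| / observed) and uses the squared Pearson correlation as one common convention for r2; the exact definitions of dr, q2 and r2 used in this work are not reproduced here, and the function names and values are hypothetical.

```python
def mare(observed, predicted):
    """Mean absolute relative error, assuming all observed values are non-zero."""
    pairs = list(zip(observed, predicted))
    return sum(abs(p - o) / abs(o) for o, p in pairs) / len(pairs)

def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

# Hypothetical steam consumption values [kg steam/batch]
obs = [1000.0, 2000.0, 4000.0]
pred = [1100.0, 1800.0, 5000.0]
print(round(mare(obs, pred), 3))  # relative errors 0.10, 0.10, 0.25 -> 0.15
```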
Table 4.2.1. Statistical results for the validation of the total energy consumption model.
Statistical parameter | Case study 1 (dataset-1) | Case study 2 (dataset-2)
successes | 12* / 1** / 1*** | 7* / 1** / 6***
N | 18 | 20
MARE | 0.27 | 0.37
dr | 0.86 | 0.83
q2 | 0.86 | 0.79
r2 | 0.89 | 0.80
* successes within both the batch-to-batch variability range and the uncertainty range (fuzzy intervals).
** successes within the batch-to-batch variability range only.
*** successes within the uncertainty range (fuzzy intervals) only.
[Figure 4.2.1: x-axis: model predictions with fuzzy intervals (core) [kg steam/batch]; y-axis: observed median and batch-to-batch variability [kg steam/batch]; both axes 0 to 8000; labeled points: 6, 12 and 2, 4, 8, 16, 20.]
Figure 4.2.1. Model predictions (documentation based approach) against reference values (model based plant monitoring system) for the total energy consumption per equipment-product pair in the second case study (dataset-2). The batch-to-batch variability corresponds to the 2.5 and 97.5 percentiles of the observed total energy consumption over several batches. The bold segments of the fuzzy intervals correspond to the minimal core value (b). The simple segments of the fuzzy intervals correspond to the maximal core value (c).
4.2.3 Top-down Energy Modeling
In the second case study, the performance of the documentation based models was also tested against a top-down modeling approach and against reference values at the level of individual chemical products and of the whole production building. The steam consumption per product shown in Table 4.2.2 was obtained in three ways: by bottom-up calculations with a model-based EMT installed in the plant (bottom-up reference), by the documentation based models (bottom-up model predictions), and from the multi-linear regression coefficients (top-down predictions). In all cases, the base consumption of the production building was taken as the constant parameter β0 delivered by the regression analysis. As can be seen in Table 4.2.2, the documentation based predictions are much closer to the reference values than the top-down predictions for all three products. However, when these approaches are compared for the monthly steam consumption of the production building over a one-year period (Figure 4.2.2), the top-down predictions perform similarly to the bottom-up reference values, and both agree better with the steam measurements than the documentation based bottom-up models, which systematically underestimate the total steam consumption of the building. This is because many operations requiring steam are not production dependent (e.g., heating of solvents for equipment cleaning) and are therefore not part of the standard process description. Hence, they are not captured by the documentation based approach.
On the other hand, the “black box” top-down approach allocates these operations to the production processes in an attempt to match the overall steam consumption of the building. Although this may make the top-down model successful at the building level, it is not suitable for allocating energy consumption to particular products, especially if the product mass does not vary much over time (i.e., the multi-linear regression analysis may infer parameters that do not reflect the real specific energy consumption per product, as is the case in Table 4.2.2). The documentation based bottom-up approach captures these product-level trends much better, although it may deviate at the production building level because information about non-standardized, non-production dependent unit operations is unavailable.
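The allocation problem of the top-down approach can be illustrated with an extreme, hypothetical case: if one product is made in exactly the same amount every month, its mass column is proportional to the intercept column, so least squares cannot separate its specific consumption from the base consumption. The sketch below uses hypothetical, noise-free numbers; numpy's `lstsq` then returns the minimum-norm solution, which reproduces the building total exactly while assigning the wrong specific consumption to that product.

```python
import numpy as np

months = 12
m1 = np.linspace(1_000, 3_000, months)            # varies from month to month
m2 = np.linspace(2_500, 500, months) + 100 * np.sin(np.arange(months))
m3 = np.full(months, 10.0)                         # constant production mass

true_beta = np.array([500.0, 0.7, 0.6, 9.0])      # b0, b1, b2, b3 (hypothetical)
X = np.column_stack([np.ones(months), m1, m2, m3])
E = X @ true_beta                                  # building steam, no noise

beta, _, rank, _ = np.linalg.lstsq(X, E, rcond=None)
print("rank of design matrix:", rank)              # 3, not 4
print("fitted b0, b3:", beta[0].round(1), beta[3].round(1))  # about 5.8 and 58.4
print("max total error:", np.abs(X @ beta - E).max())
```

In real data the product masses are rarely exactly constant, but weak variability or strong correlation between products produces the same kind of unreliable coefficients.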
Table 4.2.2. Comparison of the bottom-up and the top-down model predictions against reference values of steam consumption for the production of three different products in the multiproduct building of case study 2.
Steam consumption [kg/kg product]*
Product | Bottom-up reference values | Bottom-up documentation based approach | Top-down approach
A | 0.68 | 0.71 | 1.31
B | 0.61 | 0.48 | 1.69
C | 9.36 | 4.40 | 0.00
* These values do not include continuous equipment and base consumption.
[Figure 4.2.2: x-axis: time [month], Feb to Dec; y-axis: steam consumption of the building [To], 400 to 2200; series: building, bottom-up reference, bottom-up new model, top-down.]
Figure 4.2.2. Monthly total steam consumption of the multiproduct building in case study 2. Three different model predictions of steam consumption (i.e., bottom-up reference values from the model-based plant monitoring tool, estimates from the bottom-up documentation based approach and a top-down approach) are compared against measurements of the total steam consumption of the building.
4.3 Case Study III
The dataset used in this case study comes from two different chemical companies in Switzerland, one of which is the same as in case study 2. Thus, part of the reference data come from energy monitoring tools, which follow the same principle as in the first and second case studies. However, in this case we consider only the median steam consumption over the different batches. The remaining data are estimates obtained by rigorous modeling with Aspen Plus® (www.aspentech.com).
Unlike the first two case studies, here we do not focus on single equipment units but on reaction steps corresponding to the system boundary defined in Section 3.1, namely the steam consumption per reaction for the production of one kilogram of product. Each of these reaction steps can involve several equipment units (see Table 4.3.1). The reactions in the dataset include acylations, alkylations, condensations, halogenations and hydrolyses.
Table 4.3.1. Number of equipment units used in the reaction synthesis path of the products in dataset-3 (see reaction boundary defined in Section 3.1).
Product in dataset-3 | Number of equipment units
A | 4
B | 4
C | 1
D | 3
E | 2
F | 4
G | 8
H | 1
I | 1
J | 2
The results presented in Figure 4.3.1 show very good agreement between predicted and observed values, with most of the model predictions falling within the batch-to-batch variability ranges and close to the diagonal. This validation therefore shows that the documentation based models also predict well at a more aggregated level, namely the reaction step. These results indicate that values estimated by the documentation based approach can serve as input data for building the statistical models, which were introduced in Section 3 and whose results are presented in the next section.
[Figure 4.3.1: x-axis: model predictions with fuzzy intervals (core) [kg steam/kg product]; y-axis: observed values [kg steam/kg product]; both axes 0 to 10.]
Figure 4.3.1. Model predictions (documentation based approach) against reference values (measurements and rigorous model estimations) for the total energy consumption per product in the third case study (dataset-3). The bold segments of the fuzzy intervals correspond to the minimal core value (b). The simple segments of the fuzzy intervals correspond to the maximal core value (c).
5 Results Classification Trees
5.1 Model Selection and Evaluation
To analyze the performance of the classification trees, we first consider dataset-1 (at most 8 predictor variables, Table 3.5.1) as the training set, priors based on class frequencies, and three output classes. The choice of priors did not have any considerable influence on the model performance at any of the design stages considered in this work (Appendix, Section D.1). The cross-validation results for the five classification trees (S1 to S5) are depicted in Figure 5.1.1 as an ROC plot. The ROC plots presented in this work use different axis scales, with specificity shown at a higher resolution than sensitivity. This should be kept in mind, since a small change in specificity produces a large shift along the x-axis. As expected, performance improves from S1 to S5 for both the training and the test sets, since more process information becomes available to the models. Moreover, the model performance for the training and test sets is similar, with differences of less than 11% in sensitivity and 6% in specificity. This suggests that the models are not overfitted.
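ROC coordinates for a three-class tree are usually obtained one class at a time (one-vs-rest). The sketch below assumes that convention and simple unweighted averaging across classes; it does not claim that the averages in Figure 5.1.1 were computed exactly this way, and the function names and labels are hypothetical.

```python
def one_vs_rest_rates(true_labels, predicted_labels, classes):
    """Return {class: (sensitivity, specificity)} using one-vs-rest counting.

    Assumes every class occurs at least once in true_labels and that at least
    one sample belongs to another class, so no denominator is zero.
    """
    pairs = list(zip(true_labels, predicted_labels))
    rates = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        rates[c] = (tp / (tp + fn), tn / (tn + fp))
    return rates

def average_rates(rates):
    """Unweighted mean sensitivity and specificity over all classes."""
    sens = sum(s for s, _ in rates.values()) / len(rates)
    spec = sum(sp for _, sp in rates.values()) / len(rates)
    return sens, spec

# Hypothetical cross-validation fold with the three steam classes
y_true = ["low", "low", "low", "low", "middle", "middle", "high", "high"]
y_pred = ["low", "low", "low", "middle", "middle", "low", "high", "high"]
per_class = one_vs_rest_rates(y_true, y_pred, ["low", "middle", "high"])
sens, spec = average_rates(per_class)
print(per_class["middle"], round(sens, 3), round(spec, 3))
```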
Additionally, we assess the influence of having more predictor variables and output classes on the model performance. For this purpose, we consider the following four scenarios:
i. 8 candidate predictor variables (dataset-1) and 3 output classes (initial scenario)
ii. 8 candidate predictor variables (dataset-1) and 5 output classes
iii. 18 candidate predictor variables (dataset-2) and 3 output classes
iv. 18 candidate predictor variables (dataset-2) and 5 output classes
[Figure 5.1.1: ROC plot; x-axis: 1 − specificity, 0.08 to 0.18; y-axis: sensitivity, 0 to 1; points S1 to S5 for cv-training and cv-test; random line.]
Figure 5.1.1. Average model performance for cross-validation training and test sets at five stages of process design (S1 to S5), considering dataset-1 (at most 8 predictor variables) and 3 output classes. The line denotes the performance of a random classifier. Models that fall to the right of the random line perform worse than a random classifier, and models that fall to the left of it perform better. The point in the top left corner represents perfect classification.
For the second scenario (Figure 5.1.2), we observe a trend similar to that of the initial scenario when comparing model performance for the training and test sets, but with generally lower sensitivity values and an indication of overfitting at some design stages.
[Figure 5.1.2: ROC plot; x-axis: 1 − specificity, 0.06 to 0.16; y-axis: sensitivity, 0 to 1; points S1 to S5 for cv-training and cv-test; random line.]
Figure 5.1.2. Average model performance for cross-validation training and test sets at five stages of process design (S1 to S5), considering dataset-1 (at most 8 predictor variables) and 5 output classes. The line denotes the performance of a random classifier. Models that fall to the right of the random line perform worse than a random classifier, and models that fall to the left of it perform better. The point in the top left corner represents perfect classification.
Considering now the cross-validation test results for the four scenarios, Figure 5.1.3 shows that the number of output classes is a more influential factor than the addition of predictor variables. The models with three output classes perform better in most cases, especially in terms of sensitivity, than the models with five classes, for both dataset-1 and dataset-2. Since sensitivity is the true positive rate, this general trend was expected: the probability of obtaining true positives as prediction outcomes decreases as the number of classes increases.
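A short calculation makes this expectation explicit and also explains the position of the random line in the ROC plots. Assume, for illustration, a classifier that assigns each of k classes with equal probability, independently of the true class y. For any class c:

$$\text{sensitivity}_c = P(\hat{y}=c \mid y=c) = \frac{1}{k}, \qquad 1-\text{specificity}_c = P(\hat{y}=c \mid y\neq c) = \frac{1}{k}$$

Every class of such a classifier therefore lies on the line sensitivity = 1 − specificity, at 1/3 ≈ 0.33 for three classes and at 1/5 = 0.20 for five classes.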
Overall, the results of this preliminary model selection analysis indicate that the case of three output classes and eight predictor variables (dataset-1) offers the best performance and, owing to its lower model complexity, the highest interpretability. Moreover, the additional resolution offered by five output classes is not well supported by the analysis of the individual classes either.
[Figure 5.1.3: ROC plot; x-axis: 1 − specificity, 0.08 to 0.22; y-axis: sensitivity, 0 to 1; points S1 to S5 for the series 18 predictors/3 classes, 18 predictors/5 classes, 8 predictors/5 classes and 8 predictors/3 classes; random line.]
Figure 5.1.3. Average model performance for the cross-validation test set at five stages of process design (S1 to S5), considering dataset-1 (8 candidate predictor variables) and dataset-2 (18 candidate predictor variables) with 3 and 5 output classes. The line denotes the performance of a random classifier. Models that fall to the right of the random line perform worse than a random classifier, and models that fall to the left of it perform better. The point in the top left corner represents perfect classification.
Up to this point we have considered the overall performance of the classification trees. Figure 5.1.4 shows the performance of the selected model per output class. Notably, while the low and high classes tend to form clusters, the middle class is rather scattered across the plot. The low class has a higher sensitivity and a slightly lower specificity than the high and middle classes. This can be explained by the class sizes: there are approximately twice as many data points in the low class as in the other classes. In general, when the class sizes are unequal, the model favours the larger class in terms of sensitivity and overall success rate (accuracy), but performs less well in terms of specificity. After the low class, the high class performs better than the middle class. Moreover, the middle class improves from S1 to S5 in terms of sensitivity, while the other two classes improve in terms of specificity. Overall, these results show that, except for the middle class in S1, all classes at all stages of process design lie on the left side of the random line in the ROC plot, indicating satisfactory model performance. In contrast, the classification trees with five output classes (Figure 5.1.5) perform poorly for the middle-low, middle, and middle-high classes. These results also support the decision to use three output classes instead of five. The sensitivity and specificity values, as well as the distances to the (0,1) point and to the random line, for the three-output-class trees built from dataset-1 are given in Table D.2.1 in the appendix.
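Both distance measures follow from elementary geometry on the ROC plane. The sketch below assumes the Euclidean distance to the corner (0,1) and the perpendicular, signed distance to the random line sensitivity = 1 − specificity; the appendix may define them slightly differently (for example without the sign), and the function name is hypothetical.

```python
from math import hypot, sqrt

def roc_distances(sensitivity, specificity):
    """Distances of an ROC point to the ideal corner (0, 1) and to the random line.

    The ROC point is (x, y) = (1 - specificity, sensitivity).
    The random line is y = x; a positive signed distance means the point lies
    above it (better than random), a negative one below it (worse than random).
    """
    x, y = 1.0 - specificity, sensitivity
    to_ideal = hypot(x, y - 1.0)
    to_random = (y - x) / sqrt(2.0)
    return to_ideal, to_random

# Hypothetical class with sensitivity 0.8 and specificity 0.9
d_ideal, d_random = roc_distances(0.8, 0.9)
print(round(d_ideal, 3), round(d_random, 3))
```

A small distance to (0,1) and a large positive distance to the random line both indicate good classification; the two measures rank points differently when sensitivity and specificity are unbalanced.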
[Figure 5.1.4: ROC plot; x-axis: 1 − specificity, 0 to 0.5; y-axis: sensitivity, 0 to 1; points s1 to s5 for the classes low, middle and high; random line.]
Figure 5.1.4. Model performance per class for the cross-validation test set at five stages of process design (S1 to S5), considering dataset-1 (8 candidate predictor variables) and 3 output classes. The line denotes the performance of a random classifier. Models that fall to the right of the random line perform worse than a random classifier, and models that fall to the left of it perform better. The point in the top left corner represents perfect classification.
[Figure 5.1.5: ROC plot; x-axis: 1 − specificity, 0 to 0.3; y-axis: sensitivity, 0 to 1; points S1 to S5 for the classes low, mid-low, mid, mid-high and high; random line.]
Figure 5.1.5. Model performance per class for the cross-validation test set at five stages of process design (S1 to S5), considering dataset-1 (at most 8 predictor variables) and 5 output classes. The line denotes the performance of a random classifier. Models that fall to the right of the random line perform worse than a random classifier, and models that fall to the left of it perform better. The point in the top left corner represents perfect classification.
5.2 Selection of Important Rules
The most important rules at each stage of process design are presented in Table 5.2.1; each rule corresponds to one path in the decision tree. For every output class, the most sensitive rule was selected as the most important one, since sensitivity was found to be the most critical metric, especially for the middle class. To illustrate this procedure, Figure 5.2.1 shows the classification tree developed for S4, which has seven paths corresponding to seven rules. Three end nodes are highlighted, corresponding to the three S4 rules presented in Table 5.2.1. For instance, the path leading to the highlighted high class output corresponds to rule 3 for S4 and can be stated as follows:
IF the reaction type is acylation OR alkylation OR complexation OR
condensation OR hydrolysis OR polymerization OR reduction, AND the
operation time is higher than 18 hours THEN the steam consumption is
high.
Figure 5.2.2 shows the resubstitution performance of this model, with the training data on the x-axis and the predicted classes on the y-axis. As previously shown in Figure 5.1.4, the S4 model performs very well for the low and high classes and satisfactorily for the middle class. The percentage of underestimations is also lower than the percentage of overestimations. Given these results, we conclude that the S4 classification tree can be used not only for descriptive purposes but also for predicting steam consumption.
Table 5.2.1. Most important rules of the classification trees at the five stages of process design (S1 to S5).
Stage | Rule | Reaction | Mechanism | Reflux | Distillation | Time | Tmax | PMI | Steam dist | Class
S1 | 1 | acylation (cyanur chloride), azo-coupling, diazotization, elimination, halogenation, sulfonation | low
S1 | 2 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | HC, SN1, SN2, SNAr | high
S2 | 1 | acylation (cyanur chloride), azo-coupling, diazotization, elimination, halogenation, sulfonation | low
S2 | 2 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | HC, SNAr | no | middle
S2 | 3 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | yes | high
S3 | 1 | acylation (cyanur chloride), azo-coupling, diazotization, elimination, halogenation, sulfonation | low
S3 | 2 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | HC, SN2, SNAr | no | <18 | middle
S3 | 3 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | >18 | high
S4 | 1 | acylation (cyanur chloride), azo-coupling, diazotization, elimination, halogenation, sulfonation | low
S4 | 2 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | HC, SN2, SNAr | no | <18 | middle
S4 | 3 | acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction | >18 | high
S5 | 1 | <80 | <1.5 | low
S5 | 2 | >80 | 0.5-1.5 | middle
S5 | 3 | >1.5 | high
In this table, each row corresponds to one rule, each column from the third onward corresponds to a predictor variable, and the last column gives the output class. The grey areas indicate that a predictor variable is not present at the corresponding design stage.
For the categorical predictor variables (reaction and mechanism), the values listed in one cell are combined with the logical operator “OR”, while different predictors are combined with the logical operator “AND”. For example, the second rule of S1 can be formulated as follows:
IF the reaction type is equal to acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerization OR reduction, AND the mechanism is equal to HC OR SN1 OR SN2 OR SNAr, THEN the steam consumption is high.
Reaction mechanisms are included only in cases where at least one of the reaction types can proceed through more than one mechanism. In this way, the reaction mechanism is treated as information additional to the reaction type. This choice is consistent with the predictor importance depicted in Figure 5.2.3, which shows a higher relevance of the reaction type than of the mechanism at all five stages of process design.
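The OR/AND semantics described above can be sketched in Python for the three S4 rules. This is a hypothetical encoding: it assumes that the “no” in S4 rule 2 refers to the distillation predictor (as suggested by the S4 tree in Figure 5.2.1), treats the 18 h threshold as strict on both sides, and covers only the three highlighted rules, so inputs that fall on the tree's other four paths return None rather than a class.

```python
from typing import Optional

# Reaction groups as listed in Table 5.2.1 and the caption of Figure 5.2.1
REACTIONS_1 = {"acylation", "alkylation", "complexation", "condensation",
               "hydrolysis", "polymerization", "reduction"}
REACTIONS_2 = {"acylation (cyanur chloride)", "azo-coupling", "diazotization",
               "elimination", "halogenation", "sulfonation"}
MECHANISMS_RULE_2 = {"HC", "SN2", "SNAr"}

def s4_important_rules(reaction: str, mechanism: Optional[str] = None,
                       distillation: Optional[bool] = None,
                       time_h: Optional[float] = None) -> Optional[str]:
    """Apply the three most important S4 rules; None if none of them fires."""
    # Rule 1: the reaction type alone decides (values in one cell are OR-ed)
    if reaction in REACTIONS_2:
        return "low"
    if reaction in REACTIONS_1:
        # Rule 3: different predictors are AND-ed
        if time_h is not None and time_h > 18:
            return "high"
        # Rule 2: mechanism AND no distillation AND time below 18 h
        if (mechanism in MECHANISMS_RULE_2 and distillation is False
                and time_h is not None and time_h < 18):
            return "middle"
    return None

print(s4_important_rules("hydrolysis", time_h=24))  # high
```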
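[Figure 5.2.1: classification tree for S4 with splits on reaction group (reactions1/reactions2), mechanism (mechanism1/mechanism2), operation time (time <18 h / time >18 h), distillation (no distillation/distillation), Tmax (<93 °C / >93 °C) and PMI (<4 kg / >4 kg); seven end nodes: three LOW, two MIDDLE and two HIGH.]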
Figure 5.2.1. Classification tree for the S4 design stage. The highlighted end nodes correspond to the most important rules of S4 presented in Table 5.2.1. Rule 1 predicts low steam consumption, rule 2 predicts middle steam consumption, and rule 3 predicts high steam consumption. *Reactions1: acylation, alkylation, complexation, condensation, hydrolysis, polymerization, reduction. Reactions2: acylation (cyanur chloride), azo-coupling, diazotization, elimination, halogenation, sulfonation. **Mechanism1: AN, AEN, SEAr. Mechanism2: HC, SN2, SNAr (see Table B.1.1.1 in the appendix).
[Figure 5.2.2: x-axis: steam training dataset [kg/kg product], ticks at 0, 1, 3 and 16; y-axis: steam model classes (low, middle, high) [kg/kg product]; each box is annotated with the percentage of data points it contains.]
Figure 5.2.2. Model performance of the S4 tree shown as a scatter plot of target values (x-axis) versus predicted values (y-axis) (resubstitution performance), considering dataset-1 (8 candidate predictor variables) and 3 output classes. Data points inside the bold boxes on the diagonal belong to a class and were predicted within that class. Points inside the off-diagonal boxes in the bottom right area represent underestimated values, and points inside the off-diagonal boxes in the top left area represent overestimated values.
From Table 5.2.1, we can see that the reaction type appears in all rules from S1 to S4, indicating a high importance of this predictor variable. This is confirmed by the predictor importance plot in Figure 5.2.3, where the reaction type has the highest importance among all attributes from S1 to S4. The operation time appears in two rules in both S3 and S4, which suggests a greater influence of this variable compared with the other parameters; this high influence of time in S3 and S4 can also be observed in Figure 5.2.3. The rules derived for S5, on the other hand, follow a different pattern. The reaction type is no longer part of the rules, and the temperature, which did not appear in S3 and S4, is now included in two of the rules. The rules seem to be dominated by the distillation steam consumption, which is not surprising, since this quantity is part of the target attribute and is thus highly correlated with it. These trends in the S5 rules can also be observed in the predictor importance plot in Figure 5.2.3. In summary, the most sensitive rules, which were selected as the most important ones for the classification trees, include the most important predictor variables.
[Figure 5.2.3: bar chart; x-axis: predictors reaction, mechanism, reflux, distillation, time, Tmax, PMI, steamDist; y-axis: predictor importance; series S1 to S5.]
Figure 5.2.3. Predictor importance in classification trees at five stages of
process design (S1 to S5).
6 Results Probability Density Function Models
6.1 Model Development
Since the reaction type has been shown to be the most important predictor variable for S1 to S4, this supports the decision to build PDF models on this minimal process information. In addition, the extracted rules served to further parameterize the PDF models where necessary. Table 6.1.1 shows the empirical median, minimum and maximum steam consumption values of the different datasets corresponding to reaction types (first column) and of the further parameterized subsets (second column). The fitted PDFs with the corresponding parameters, their medians, first and third quartiles, and 2.5th and 97.5th percentiles are also presented. The goodness of fit is also assessed and discussed further in Table E.1 in the appendix.
Table 6.1.1. Empirical statistics and probability density function (PDF) model results per reaction type. The values are given in kilograms of steam consumption per kilogram of product.
Reaction type | Parameterization | n | Empirical median | Empirical min | Empirical max | PDF model | p1 | p2 | Fitted median | Fitted 25th | Fitted 75th | Fitted 2.5th | Fitted 97.5th
Acylation | | 33 | 1.6 | 0.0 | 9.3 | gamma (1) | 0.6 | 4.1 | 1.2 | 0.3 | 3.2 | 0.0 | 11.1
 | Time <18h | 21 | 0.9 | 0.0 | 6.0 | gamma | 0.5 | 3.8 | 0.7 | 0.14 | 2.3 | 0.0 | 9.1
 | Time >18h | 12 | 2.4 | 0.79 | 9.3 | lognormal (2) | 0.9 | 0.8 | 2.5 | 1.5 | 4.4 | 0.5 | 12.3
Acylation (cyanur chloride) | | 22 | 0.0 | 0.0 | 0.8 | lognormal | -4.1 | 2.9 | 0.0 | 0.0 | 0.1 | 0.0 | 4.9
Alkylation | | 33 | 2.5 | 0.0 | 11.8 | gamma | 0.6 | 5.2 | 1.7 | 0.5 | 4.2 | 0.0 | 14.4
 | no distillation | 12 | 1.0 | 0.0 | 3.4 | gamma | 0.4 | 2.3 | 0.3 | 0.1 | 1.2 | 0.0 | 5.1
 | distillation | 21 | 3.0 | 0.3 | 11.8 | weibull (3) | 4.9 | 1.5 | 3.8 | 2.1 | 6.1 | 0.4 | 11.7
Azo-coupling | | 25 | 0.1 | 0.0 | 5.3 | gamma | 0.1 | 6.6 | 0.0 | 0.0 | 0.2 | 0.0 | 6.4
Complexation | | 9 | 2.8 | 0.2 | 14.9 | exponential (4) | 0.23 | – | 3.1 | 1.3 | 6.1 | 0.1 | 16.3
 | Time <18h | 6 | 2.0 | 0.2 | 3.5 | rayleigh (5) | 1.6 | – | 1.9 | 1.2 | 2.6 | 0.4 | 4.3
Condensation | | 25 | 2.1 | 0.3 | 10.5 | exponential | 0.4 | – | 1.9 | 0.8 | 3.8 | 0.1 | 10.1
 | Tmax <93°C | 8 | 0.7 | 0.3 | 3.8 | exponential | 0.8 | – | 0.8 | 0.4 | 1.7 | 0.0 | 4.5
 | Tmax >93°C | 17 | 2.6 | 0.6 | 10.5 | lognormal | 1.0 | 0.8 | 2.7 | 1.6 | 4.5 | 0.6 | 11.7
Diazotization | | 26 | 0.0 | 0.0 | 0.2 | exponential | 18.4 | – | 0.0 | 0.0 | 0.1 | 0.0 | 0.2
Elimination | | 9 | 0.3 | 0.0 | 1.0 | exponential | 0.1 | 3.0 | 0.0 | 0.0 | 0.2 | 0.0 | 3.4
Halogenation | | 14 | 0.4 | 0.0 | 4.9 | lognormal | -0.9 | 1.5 | 0.4 | 0.2 | 1.1 | 0.0 | 6.8
Hydrolysis | | 9 | 0.8 | 0.0 | 10.4 | gamma | 0.3 | 8.3 | 0.4 | 0.0 | 2.2 | 0.0 | 14.2
Polymerization | | 18 | 3.0 | 1.2 | 15.9 | lognormal | 1.32 | 0.90 | 3.7 | 2.0 | 6.9 | 0.6 | 21.9
 | Time <18h | 8 | 1.7 | 1.2 | 3.0 | lognormal | 0.5 | 0.3 | 1.7 | 1.4 | 2.0 | 0.9 | 3.0
 | Time >18h | 10 | 7.7 | 2.8 | 15.9 | uniform (6) | 2.8 | 15.9 | 9.4 | 6.1 | 12.6 | 2.8 | 15.9
Reduction | | 14 | 3.6 | 0.1 | 15.8 | exponential | 0.2 | – | 3.1 | 1.3 | 6.2 | 0.1 | 16.6
Sulfonation | | 9 | 0.1 | 0.0 | 0.3 | gamma | 0.3 | 0.4 | 0.0 | 0.0 | 0.1 | 0.0 | 0.7
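The numbered PDF models in Table 6.1.1 are (1) gamma, (2) lognormal, (3) Weibull, (4) exponential, (5) Rayleigh and (6) uniform:
$$y = \frac{1}{p_2^{\,p_1}\,\Gamma(p_1)}\, x^{p_1-1}\, e^{-x/p_2} \qquad (1)$$
$$y = \frac{1}{x\, p_2 \sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - p_1)^2}{2 p_2^2}\right) \qquad (2)$$
$$y = p_2\, p_1^{-p_2}\, x^{p_2-1}\, e^{-(x/p_1)^{p_2}} \qquad (3)$$
$$y = p_1\, e^{-p_1 x} \qquad (4)$$
$$y = \frac{x}{p_1^{2}}\, e^{-x^2/(2 p_1^2)} \qquad (5)$$
$$y = \frac{1}{p_2 - p_1}, \quad p_1 \le x \le p_2 \qquad (6)$$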
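For all fitted models except the gamma distribution, the percentiles in Table 6.1.1 follow from closed-form quantile functions of equations (2) to (6). The sketch below assumes the parameter conventions implied by those equations (lognormal p1, p2 = mean and standard deviation of ln x; Weibull p1 = scale, p2 = shape; exponential p1 = rate; Rayleigh p1 = scale; uniform bounds p1, p2). The function names are hypothetical; for the gamma model of equation (1) a numerical inverse is needed, for example `scipy.stats.gamma(a=p1, scale=p2).ppf(q)`.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def quantile(model, p1, p2, q):
    """q-quantile of a fitted PDF, using the parameterizations of eqs. (2)-(6)."""
    if model == "lognormal":      # p1 = mean of ln x, p2 = std. dev. of ln x
        return exp(p1 + p2 * NormalDist().inv_cdf(q))
    if model == "weibull":        # p1 = scale, p2 = shape
        return p1 * (-log(1.0 - q)) ** (1.0 / p2)
    if model == "exponential":    # p1 = rate
        return -log(1.0 - q) / p1
    if model == "rayleigh":       # p1 = scale
        return p1 * sqrt(-2.0 * log(1.0 - q))
    if model == "uniform":        # support [p1, p2]
        return p1 + q * (p2 - p1)
    raise ValueError(f"no closed-form quantile implemented for {model!r}")

def summary(model, p1, p2=None):
    """Median, quartiles and 2.5/97.5 percentiles, as reported in Table 6.1.1."""
    qs = {"median": 0.5, "25th": 0.25, "75th": 0.75, "2.5th": 0.025, "97.5th": 0.975}
    return {name: round(quantile(model, p1, p2, q), 2) for name, q in qs.items()}

# Alkylation with distillation: Weibull with p1 = 4.9, p2 = 1.5
print(summary("weibull", 4.9, 1.5))
```

Rounded to one decimal, the Weibull example reproduces the fitted values of the alkylation (distillation) row; other rows agree up to the rounding of the tabulated parameters.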
6.2 Model Evaluation per Reaction Type
Figure 6.2.1 depicts the predictive capability of the PDF models when their interquartile ranges are used as interval predictions. The resubstitution performance is similar for most reaction types (about 60% true positives on average), except for the polymerization and elimination reactions, which perform worse (39% and 33% true positives, respectively), and the reduction reaction, which performs slightly better than the rest (71% true positives). The poorer prediction of the polymerization and elimination reactions may indicate that, for these two reaction classes, the hypothesis that steam consumption can be predicted from the reaction type alone does not hold. In general, the predictive capability of the PDF models is inferior to that of the classification trees with additional process-related predictor variables, but the PDF models can still provide useful information for interval estimates based on the interquartile ranges. More importantly, the PDF models can provide a benchmark for labeling chemical reactions performed in industrial operations according to their position in the distribution of the same reaction type family. Furthermore, the PDF models allow a more rigorous uncertainty analysis than the interval estimates, by sampling from the respective distributions.
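Such a sampling-based uncertainty analysis can be sketched with the standard library. The example below is hypothetical: it combines two fitted models from Table 6.1.1 (condensation, exponential with p1 = 0.4; halogenation, lognormal with p1 = −0.9, p2 = 1.5) as if both steps belonged to one synthesis route, and it assumes that the steps are independent and that both specific consumptions refer to one kilogram of final product.

```python
import random
from statistics import quantiles

def sample_total_steam(n_samples=100_000, seed=42):
    """Monte Carlo sum of specific steam consumption over two reaction steps.

    Step 1: condensation, exponential with rate p1 = 0.4 (Table 6.1.1)
    Step 2: halogenation, lognormal with p1 = -0.9, p2 = 1.5 (Table 6.1.1)
    Both steps are assumed independent and referred to 1 kg of final product.
    """
    rng = random.Random(seed)
    return [rng.expovariate(0.4) + rng.lognormvariate(-0.9, 1.5)
            for _ in range(n_samples)]

totals = sample_total_steam()
cuts = quantiles(totals, n=40)        # 39 cut points in steps of 2.5 %
p2_5, median, p97_5 = cuts[0], cuts[19], cuts[38]
print(f"median {median:.2f}, 95% interval [{p2_5:.2f}, {p97_5:.2f}] kg steam/kg product")
```

In contrast to adding interquartile intervals, sampling propagates the full shape of each distribution, including the heavy right tail of the lognormal model.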
[Figure 6.2.1: x-axis: steam training dataset [kg/kg product], 0 to 16; y-axis: steam model median [kg/kg product] with levels *acyl, azo, dia, eli, sulfo 0; halogenation 0.4; hydrolysis 0.8; acylation 1.2; alkylation 1.7; condensation 1.9; complexation 3; reduction 3.1; polymerization 3.7; legend: over, within, under.]
Figure 6.2.1. PDF model performance with target values (x-axis) versus predicted values (y-axis), considering the interquartile ranges (resubstitution performance). The data points colored in black fall within the model intervals. The grey and white points represent overestimated and underestimated values, respectively (see also Table 6.2.1). The diagonal line, on which predicted and observed values are equal, represents perfect prediction.
*Acylation (cyanur chloride), azo-coupling, diazotization, elimination, sulfonation.
Table 6.2.1. Performance of the probability density function (PDF) models considering the interquartile ranges (resubstitution validation)
Reaction | within interval (%) | underestimated (%) | overestimated (%)
Acylation | 58 | 27 | 15
Acylation (cyanur chloride) | 59 | 41 | 0
Alkylation | 57 | 26 | 17
Azo-coupling | 64 | 36 | 0
Complexation | 56 | 22 | 22
Condensation | 56 | 16 | 28
Diazotization | 92 | 8 | 0
Elimination | 33 | 67 | 0
Halogenation | 54 | 15 | 31
Hydrolysis | 56 | 22 | 22
Polymerization | 39 | 28 | 33
Reduction | 71 | 14 | 14
Sulfonation | 56 | 44 | 0
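The percentages in Table 6.2.1 result from comparing each training observation with the interquartile range of its reaction type. A minimal sketch, assuming closed interval bounds and hypothetical observations; the function name is illustrative.

```python
def interval_performance(observed, lower, upper):
    """Percentages within, underestimated and overestimated by an interval model.

    An observation above the upper bound is underestimated by the model;
    one below the lower bound is overestimated.
    """
    n = len(observed)
    under = sum(o > upper for o in observed)
    over = sum(o < lower for o in observed)
    within = n - under - over
    return tuple(round(100 * k / n) for k in (within, under, over))

# Condensation interquartile range from Table 6.1.1: [0.8, 3.8] kg steam/kg product
print(interval_performance([0.3, 1.0, 2.0, 5.0], 0.8, 3.8))  # -> (50, 25, 25)
```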
6.3 Further Parameterization of the Models
Where further parameterization was performed for the PDF models, the original dataset for a reaction type was partitioned into two subsets. Of the thirteen reaction models, five were further parameterized. In all cases, the partition results in one lower and one higher interquartile interval. In most cases, the lower interquartile interval overlaps with the original interquartile interval while being narrower, whereas the higher interquartile interval only partially overlaps with the original one while keeping the same width. Only for the polymerization reaction was no overlap observed, owing to a gap in the values of the empirical distribution of this reaction group. After this parameterization step, the PDF models for alkylation, condensation and polymerization maintain approximately the same percentage of true positives with narrower interval predictions (Tables 6.2.1 and 6.3.1). Only for the acylation reactions is no improvement observed after further parameterization, since the percentage of true positives decreases from 58% to 36% (Table 6.3.1). However, this does not necessarily mean that the predictive capability of the model deteriorates, since the corresponding interval predictions are narrower and cannot be compared directly with those of the parent PDF model of the same reaction type. Overall, these results support the increased resolution of the models obtained through further parameterization.
Theoretically, it should be possible to reach the same level of model performance as for the higher-order classification trees (e.g., at the S4 design stage) by continuing the parameterization with further predictor variables for every chemical reaction type. However, the available amount of data was not sufficient to obtain statistically significant results in this way. Therefore, for prediction purposes, we suggest using higher-order classification trees, provided that the values of the respective predictor variables are available. Conversely, when the input information is comparable to that of the PDF models, as for the S1 classification trees, the prediction performance is expected to be lower than that of the S4 classification tree and much closer to the level of the PDF models (Figure D.3.1 in the appendix).
Table 6.3.1. Performance of the probability density function (PDF) models considering the interquartile ranges (resubstitution validation) after further parameterization
Reaction         Within interval   Underestimated   Overestimated
Acylation        36%               33%              30%
Alkylation       54%               23%              23%
Complexation*    -                 -                -
Condensation     56%               28%              16%
Polymerization   39%               28%              33%
* For the complexation reaction, resubstitution validation was not carried out because only the lower interval was determined; the fit for the upper interval was unsatisfactory and was therefore not considered further.
7 Application of the Statistical Models
7.1 Case Study I
The classification trees and PDF models were applied to the additional case study dataset, and the results are presented in Figure 7.1.1 as stacked bar charts divided into three sections. The bottom section represents the percentage of data predicted within the model intervals; for the classification trees, this equals the sensitivity. The middle section represents the percentage of values overestimated by the model, and the top section the percentage of values underestimated by the model. The black lines indicate the percentage of predictions that are not underestimated by more than 30%, an error considered acceptable for shortcut models in early design stages (Bumann et al., 2010; Turton et al., 1998).
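The quantities plotted in Figure 7.1.1 can be computed from pairs of reference values and interval predictions. The sketch below is a minimal illustration under one stated assumption: a case counts as "underestimated by more than 30%" when the gap between the reference value and the upper interval bound exceeds 30% of the reference value. The thesis does not spell out the exact error definition in this section, so the function name, the tolerance handling and the data are hypothetical.

```python
def performance(observed, intervals, tolerance=0.30):
    """Shares (%) shown in one stacked bar: within the interval, overestimated,
    underestimated, plus the share NOT underestimated by more than `tolerance`.

    Assumption: relative underestimation = (observed - upper bound) / observed.
    """
    n = len(observed)
    within = over = under = severe = 0
    for y, (low, high) in zip(observed, intervals):
        if y < low:
            over += 1                      # whole interval above the reference value
        elif y > high:
            under += 1                     # whole interval below the reference value
            if (y - high) / y > tolerance:
                severe += 1                # underestimated by more than the tolerance
        else:
            within += 1
    return tuple(100.0 * k / n for k in (within, over, under, n - severe))


# Hypothetical reference values and interval predictions (kg steam / kg product).
print(performance([1.0, 2.0, 5.0, 0.5], [(0.8, 1.5)] * 4))  # (25.0, 25.0, 50.0, 75.0)
```

In this toy example, two cases are underestimated, but only one of them by more than 30%, so 75% of the predictions satisfy the criterion represented by the black lines.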
The top panel (a) shows the resubstitution performance of the models (training set), and the bottom panel (b) shows the performance on the external case study dataset, which was not used for training. In both cases, the PDF models and the S1 classification tree perform similarly, within a difference range of 13%. The other trends are also similar between (a) and (b), although the performance of the models is generally about 10% lower on the external dataset. This may partly be because not all steam consumption target values in the external dataset were derived with the same modeling approach as those of the training set. However, in both cases more than 80% of the predictions were not underestimated by more than 30%. This further supports the robustness of the models in safeguarding the predictions against severe underestimation of steam consumption.
[Figure 7.1.1: panels a) and b); stacked bars for PDF, S1, S2, S3, S4 and S5; y-axis: %]
Figure 7.1.1. Average performance of the PDF models (interquartile prediction)
and classification trees (S1 to S5) for (a) resubstitution performance
(approximately 250 data points) and (b) additional industrial dataset (17 points).
The black area of the bars represents the percentage of cases which fall within
the model intervals, and the dark grey and the light grey areas the percentage of
overestimated and underestimated cases, respectively. The black lines show
the percentage of cases which are not underestimated by more than 30%.
In summary, we suggest using the PDF models especially for benchmarking and uncertainty analysis. For prediction purposes, the PDF models or the S1 trees should be used when the reaction type is the only available information and a first approximate estimate of steam consumption is targeted. Higher order classification trees (S4 and S5) provide satisfactory steam consumption estimates and serve as descriptive models that explain which features define the level of energy consumption (high, middle, low).
7.2 Case Study II
Here we apply the PDF models to a case study on the production of the intermediate 4-(2-methoxyethyl)-phenol, which can be produced via seven different synthesis routes (Figure 7.2.1). In previous work, a decision-making framework that considers environmental and economic proxy indicators for screening chemical batch process alternatives during early phases of process design (Albrecht et al., 2010) was applied to the same case study. Both methodologies allow a ranking of the different synthesis routes.
In this example, we show how the steam consumption of the different synthesis routes is estimated with the PDF models, and we compare the resulting ranking of the alternatives with the ranking given by the proxy indicators of Albrecht et al. (2010). Table 7.2.1 lists the reaction types at every step of the seven synthesis routes. Of the thirty reaction steps, thirteen correspond exactly to reactions in the training dataset used for model development (Figure B.1.1.1 in the appendix). Seven reaction steps cannot be matched exactly to any of the reactions in Figure B.1.1.1 but belong to one of the reaction categories in the training dataset (e.g., the O-alkylation with dimethyl sulfate as reactant in step A-3). In such cases, the data gap is filled with the PDF model of the corresponding category, e.g., the Alkylation model for step A-3. This is comparable to an extrapolation, since this reaction with this reactant was not included in the training dataset used to build the models. The remaining ten reactions could not be assigned to any of the reactions in the training dataset (e.g., the nitration in step A-1). To fill these data gaps, a default value of 1.2 kg per kg of product, calculated as the average of the median values of all PDF models, was used.
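The three cases described above (exact match, category-only match treated as an extrapolation, and no match filled with a default value) amount to a simple lookup per reaction step. The sketch below illustrates that logic; the function names, the reaction labels and the median values are hypothetical placeholders, not the values of Table F.1.1.

```python
from statistics import fmean


def default_value(pdf_medians):
    """Default steam consumption: average of the medians of all PDF models."""
    return fmean(pdf_medians.values())


def estimate_step(category, reaction, pdf_medians, training_reactions):
    """Median estimate (kg steam / kg product) and its basis for one step.

    - category has a PDF model and the specific reaction is in the training
      data -> "training"
    - category has a PDF model but the specific reaction is not -> "extrapolation"
    - no PDF model for the category -> default value, "default"
    """
    if category in pdf_medians:
        basis = "training" if reaction in training_reactions else "extrapolation"
        return pdf_medians[category], basis
    return default_value(pdf_medians), "default"


# Hypothetical medians and training reactions, for illustration only.
medians = {"alkylation": 0.8, "reduction": 1.4}
training = {"C-alkylation of phenols"}
print(estimate_step("alkylation", "O-alkylation with dimethyl sulfate", medians, training))
print(estimate_step("nitration", "nitration", medians, training))
```

With all thirteen PDF models in `pdf_medians`, `default_value` would correspond to the 1.2 kg per kg of product used in the text; the two-entry dictionary here only shows the mechanics.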
The steam consumption of a complete synthesis route can be estimated by adding the median values derived from the PDF models for every reaction step (see Table F.1.1 in the appendix). The model values, given in kilograms of steam per kilogram of product, were multiplied by the corresponding reaction yield values given in Table B.1.1.2 in the appendix, assuming stoichiometric ratios (see Table F.1.2 in the appendix). As can be seen in Figures 7.2.2 and 7.2.3, route E is the best alternative according to the PDF models as well as to the Mass Loss Index (MLI) and Energy Loss Index (ELI) proxy indicators defined by Albrecht et al. (2010). Here, the MLI is defined as the sum of the mass ratios of all coupled products and by-products to the intermediate or end product. Other input materials, such as solvents and auxiliaries, are not considered in this definition of the MLI, since this information is not available at the earliest design stages. The ELI proxy indicator is calculated from four parameters: the water concentration at the reactor outlet, the difference between the boiling points of the product and of the substance whose boiling point is closest to that of the product, the MLI of each reaction step, and the reaction energy. These parameters are first scaled according to empirical criteria and then weighted and aggregated into the ELI value.
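The route-level point estimate, and the interval obtained from the first and third quartiles, amount to a weighted sum over the reaction steps. The sketch below assumes that each step is described by the Q1, median and Q3 of its PDF model and by a scaling factor that converts kg per kg of step product into kg per kg of final product. How that factor follows from the yields and stoichiometric ratios is documented in Tables B.1.1.2 and F.1.2 and is not reproduced here, so `factor` is a hypothetical input, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One reaction step; values in kg steam per kg of the step's product."""
    q1: float
    median: float
    q3: float
    factor: float = 1.0  # hypothetical scaling to kg of final product


def route_estimate(steps):
    """(lower, point, upper) estimate for a route: sums of scaled Q1, median, Q3."""
    lower = sum(s.q1 * s.factor for s in steps)
    point = sum(s.median * s.factor for s in steps)
    upper = sum(s.q3 * s.factor for s in steps)
    return lower, point, upper


def rank_routes(routes):
    """Route names ordered from lowest to highest point estimate."""
    return sorted(routes, key=lambda name: route_estimate(routes[name])[1])


# Hypothetical two-route example.
routes = {
    "X": [Step(0.5, 1.0, 2.0), Step(0.5, 1.0, 2.0, factor=2.0)],
    "Y": [Step(0.2, 0.4, 0.8)],
}
print(route_estimate(routes["X"]))  # (1.5, 3.0, 6.0)
print(rank_routes(routes))          # ['Y', 'X']
```

Adding per-step quartiles is not, in general, the same as taking the quartiles of the distribution of the route total; a sampling-based analysis is one way to look at the latter.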
Comparing the rankings given by the PDF models and the ELI indicator, we see that the trend is the same except for routes C and D. The MLI ranking also follows a trend similar to that of the PDF models, except for routes A and D. Overall, the ELI ranking is more similar to the PDF model predictions than the MLI ranking. This is consistent with the fact that an index based only on the reaction mass yield does not necessarily correlate with energy consumption. Jimenez-Gonzalez et al. (2011) showed a fairly good correlation between a mass index, namely the Process Mass Intensity (PMI), and an energy-related indicator such as the Global Warming Potential (GWP). However, the PMI includes the total mass of materials per mass of product (e.g., reactants, reagents, solvents used for reaction and separation, and catalysts), which makes it a more robust but also a more data-intensive indicator than the MLI as defined by Albrecht et al. (2010). As shown in Section 3.5, the PMI is used as a predictor variable in the classification trees at the previously defined fourth and fifth process design stages (S4 and S5), but not at the earliest stages (S1 to S3), where not enough data is available. For the S4 classification tree, the PMI was found to be an important predictor variable (see Section 5.2), which is consistent with the work of Jimenez-Gonzalez et al. (2011).
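The agreement between rankings is discussed qualitatively here. One way to summarize it in a single number, not used in this work, is Spearman's rank correlation coefficient. The sketch assumes tie-free rankings. The PDF-model order E, G, D, A, F, B, C is read from the axis of Figure 7.2.2, whereas `hypothetical_order` is invented for illustration and is not the ELI, MLI or PLI ranking of Albrecht et al. (2010).

```python
def spearman_rho(order_a, order_b):
    """Spearman's rank correlation between two tie-free rankings of the same items,
    each given as a sequence ordered from lowest to highest consumption."""
    if sorted(order_a) != sorted(order_b) or len(set(order_a)) != len(order_a):
        raise ValueError("rankings must contain the same, distinct items")
    n = len(order_a)
    rank_b = {item: i for i, item in enumerate(order_b)}
    d_squared = sum((i - rank_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


pdf_order = list("EGDAFBC")           # order along the axis of Figure 7.2.2
hypothetical_order = list("GEDAFBC")  # invented ranking, for illustration only
print(spearman_rho(pdf_order, pdf_order))  # 1.0
print(spearman_rho(pdf_order, hypothetical_order))
```

A value of 1 means identical rankings and -1 fully reversed ones; with only seven routes, a single swap of neighbours already lowers the coefficient noticeably.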
On the other hand, the rankings given by the Productivity Loss Index (PLI) and by a composite indicator formed as the weighted sum of the ELI and the PLI are very similar to each other but differ from the ranking given by the PDF models and the ELI indicator. The PLI considers the number of reaction steps, the cycle time, the average product concentration at the reactor outlet and the average filling volume (percentage of total vessel volume). Except for the number of reaction steps, which is implicitly considered when the PDF models are used to estimate the steam consumption of a whole synthesis route, the PLI categories are more process and unit operation specific and contain less physicochemical information than the ELI indicator. This explains why the rankings given by the PDF models and the ELI indicator follow similar trends and differ from the PLI ranking. In addition, the PLI has a higher weight than the ELI in the composite proxy indicator, which explains the similar ranking trends of the PLI and the composite proxy indicator.
Analogously to the point estimates, intervals of steam consumption for every synthesis route were estimated from the first and third quartiles given by the PDF models. As shown in Figure 7.2.5, most routes follow the same trend for the median values and for the intervals, except route F, whose upper interval bound is slightly higher than that of route B.
To evaluate the significance of the ranking results, a Monte Carlo simulation was carried out to generate samples from the PDF models corresponding to the reaction steps of the synthesis routes, followed by an ANOVA test to analyze the differences between the means of these samples. The data sample for each synthesis route was obtained by adding the sample values of each reaction step within the route. For the reaction steps without a corresponding PDF model, a uniform distribution was used to generate the sample, with minimum and maximum values of 0 and 2.4 kg per kg of product, respectively. These values were derived by setting the mean of the distribution equal to the default value of 1.2 kg per kg of product.
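The sampling scheme described above can be sketched with the Python standard library. Steps without a PDF model are drawn from Uniform(0, 2.4), as in the text; modeled steps are drawn here from lognormal distributions with made-up parameters, which stand in for the fitted PDF models and are an assumption of this sketch. The function returns only the one-way ANOVA F statistic; a p-value would additionally require the F distribution with (k - 1, n - k) degrees of freedom, for example via SciPy's `scipy.stats.f_oneway`, which is outside this sketch.

```python
import random
from statistics import fmean


def sample_route(steps, n, rng):
    """Monte Carlo sample of route totals (kg steam / kg product).

    Each step is ("lognormal", mu, sigma), a hypothetical stand-in for a fitted
    PDF model, or ("default",), drawn from Uniform(0, 2.4) with mean 1.2.
    """
    totals = []
    for _ in range(n):
        total = 0.0
        for step in steps:
            if step[0] == "lognormal":
                total += rng.lognormvariate(step[1], step[2])
            else:
                total += rng.uniform(0.0, 2.4)
        totals.append(total)
    return totals


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA comparing the group means."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = random.Random(42)
# Hypothetical routes; the lognormal parameters are illustrative only.
route_1 = [("lognormal", 0.0, 0.5), ("default",)]
route_2 = [("lognormal", 0.5, 0.5), ("lognormal", 0.3, 0.4), ("default",)]
samples = [sample_route(r, 1000, rng) for r in (route_1, route_2)]
print(one_way_anova_f(samples))
```

A large F statistic only indicates that at least one route mean differs; pairwise statements such as those reported in Section F.2 require a separate comparison of route pairs.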
In addition, a boxplot of the samples of the different synthesis routes is shown in Figure 7.2.6. Visual inspection of the boxplot reveals two groups of synthesis routes: E, G, D and A, with lower median values and interval ranges, and F, B and C, with higher median values and interval ranges. The results of the ANOVA test presented in the appendix (Section F.2) show that some routes within these two groups can be further discriminated. While route C is significantly different from B and F, routes B and F are not significantly different from each other. Route D is not significantly different from routes E and G, whereas routes E and G are significantly different from each other. Similarly, route G is not significantly different from routes A and D, whereas routes A and D are significantly different from each other.
The results of this case study show that the PDF models, using either median values or intervals, can be applied for a fast ranking of groups of alternative routes with respect to energy consumption when only the reaction synthesis routes are known at early stages of process design. The PDF models provide a ranking similar to that obtained with a more complex indicator such as the ELI, which requires more detailed chemical and process information.
Figure 7.2.1. Overview of different reaction routes to produce 4-(2-methoxyethyl)-phenol, including the reaction step
numbers.
Table 7.2.1. Reactions in the different synthesis routes of 4-(2-methoxyethyl)-phenol (presented in Figure 7.2.1)*.
Route Reaction 1 Reaction 2 Reaction 3 Reaction 4 Reaction 5 Reaction 6 Reaction 7 Reaction 8
A na na alkylation reduction diazotization na
B na reduction na alkylation reduction diazotization na
C halogenation alkylation reduction na acylation reduction
D halogenation na na reduction diazotization na
E acylation alkylation reduction
F na acylation na halogenation alkylation reduction diazotization na
G acylation halogenation acylation reduction
*The colored cells indicate reactions included in the model training dataset (Figure B.1.1.1 in the appendix). Non-colored cells indicate reactions that belong to one of the categories included in the training dataset but do not match any of the specific reaction types in Figure B.1.1.1. "na" indicates reactions that do not belong to any category in the training dataset.
[Figure 7.2.2: x-axis: Synthesis route (E, G, D, A, F, B, C); y-axis: Steam consumption [kg/kg product]]
Figure 7.2.2. Ranking of the different synthesis routes of 4-(2-methoxyethyl)-
phenol considering the steam consumption predictions by the PDF models.
[Figure 7.2.3: x-axis: Synthesis route (E, G, D, A, F, B, C); y-axes: MLI (left), ELI (right)]
Figure 7.2.3. Ranking of the different synthesis routes of 4-(2-methoxyethyl)-
phenol considering the MLI (y-axis on the left) and the ELI (y-axis on the right)
proxy indicators according to (Albrecht et al., 2010).
[Figure 7.2.4: x-axis: Synthesis route (E, G, D, A, F, B, C); y-axes: PLI (left), ELI + PLI weighted sum (right)]
Figure 7.2.4. Ranking of the different synthesis routes of 4-(2-methoxyethyl)-
phenol considering the PLI (y-axis on the left) proxy indicator and the weighted
sum of the ELI and PLI (y-axis on the right) indicators according to (Albrecht et
al., 2010).
[Figure 7.2.5: x-axis: Synthesis route (E, G, D, A, F, B, C); y-axis: Steam consumption [kg/kg product]]
Figure 7.2.5. Ranking of the different synthesis routes of 4-(2-methoxyethyl)-phenol considering the median values and intervals of steam consumption predicted by the PDF models.
[Figure 7.2.6: x-axis: Synthesis route (E, G, D, A, F, B, C); y-axis: Steam consumption [kg/kg product]]
Figure 7.2.6. Boxplot of the samples corresponding to the different synthesis
routes of 4-(2-methoxyethyl)-phenol.
8 Conclusions and Outlook
8.1 Practical Relevance and Applications
The focus of this thesis was the modeling of steam consumption in chemical batch plants for screening purposes. Since energy minimization is a key target for the chemical sector, and steam is the energy utility with the highest consumption and saving potential while being poorly documented in production records, we propose two main types of steam consumption models for different levels of data availability.
The first modeling approach is based on standard process documentation, thermodynamic principles, rules of thumb, default model parameters and uncertainty values to deal with severe information gaps. To validate this new methodology, three case studies were carried out in multipurpose batch plants in Switzerland. In the first two case studies, the steam consumption was modeled at the unit operation level according to the new approach, and the results were aggregated to the equipment level following a bottom-up procedure. The results were validated against reference data from model-based energy monitoring tools installed in the plants, which acquire real-time process data at high resolution (1-minute intervals). The validation results generally showed good agreement between reference and predicted values and a good capability of the uncertainty intervals to capture the batch-to-batch variability of steam consumption. In the third case study, the steam consumption modeled at the unit operation level was aggregated to the level of the reaction step plus work-up processes. Even in this case, where the level of aggregation is higher, the validation results showed good agreement between reference and predicted values, including the uncertainty intervals.
Therefore, this approach can be used for fast screening, allocation and monitoring of steam consumption in multipurpose batch plants with limited modeling effort, based on a static yet frequently updated information source such as the SOP documentation. It can also be used to identify the energy saving potential by setting energy consumption targets that include an estimate of the batch-to-batch variability. In this respect, the approach lies between "black-box" top-down approaches, which correlate energy consumption with production portfolio and product amounts, and detailed bottom-up approaches based on process parameters retrieved at high time resolution, which result in more accurate but more time-consuming models to develop.
The second type of shortcut model of steam consumption proposed in this work consists of generic intervals based on a statistical analysis of the values estimated with the documentation approach from real production data. The statistical analysis included fitting probability density functions (PDFs) and classification trees to the available data. These models can be used at different levels of process design, the minimal required information being the reaction type. The cross-validation results of the classification trees show that overfitting is avoided and that the prediction capability improves significantly from the earliest design stage, where only the reaction type and mechanism are known, to the latest design stage, where the steam consumption during distillation processes is known. The resubstitution performance of the PDF models indicates that for most reaction types, except polymerization and elimination, the interquartile ranges can provide satisfactory interval estimates when the reaction type is the only available process information. Further parameterization of the probability density functions with additional process information increases the model resolution. Additionally, the generalization capability of the PDF models and the classification trees was validated in a case study. On average, more than 80% of the predictions were not underestimated by more than 30%, which is a satisfactory performance for shortcut models in early design stages. These models, especially the higher order classification trees, are a potentially useful tool for estimating the steam consumption of production processes when limited process information is available or when a large number of processes has to be screened in a short time. Even though the PDF models also allow reasonable predictions of steam consumption, their most promising applications lie in providing a benchmarking framework for labelling chemical reaction types and in rigorous uncertainty analysis.
The two modeling approaches presented in this work are shortcut models of steam consumption for different levels of process detail. While the documentation-based approach involves a more rigorous modeling procedure delivering a deterministic estimate with an uncertainty range, the statistical models require less input information and time, but result in a generic interval instead of a deterministic value. On the other hand, the statistical models also serve as descriptive and explanatory tools, besides their predictive capabilities. As shown in Figure 8.1.1, depending on the application target, one model may be more convenient than the other, which implies a trade-off between modeling time and accuracy. In summary, both types of models developed in this thesis are especially suitable for applications in process design, Life Cycle Assessment (LCA) and benchmarking.
Figure 8.1.1. Decision scheme for the selection of the most suitable modeling
approach to steam consumption.
8.2 Outlook
8.2.1 Extension of the Modeling Approaches to other Process Parameters
Although the documentation-based models of this study were developed and tested for steam consumption, extending this approach to other energy utilities, such as cooling water, brine and electricity, should be conceptually straightforward. It would, of course, require additional unit-operation-related data (e.g., pumping and stirring costs, condensation duties) and the respective equations and standard parameters for the consumption of these additional energy utilities. In this way, a more complete energy-related life cycle inventory could be estimated for the different products of multipurpose batch plants, facilitating a fast and efficient cradle-to-gate life cycle analysis, provided that the production-related material flows are well documented, which is typically the case in the chemical batch industry.
The same applies to the extension of the statistical models to other energy utilities and/or production parameters. Provided that data on the consumption of other energy utilities or on emissions are available, either as measurements or as estimated values (e.g., from the documentation-based approach), classification trees and probability density functions can be fitted to these data and then selected and evaluated using the same frameworks as proposed in this work.
8.2.2 Optimization Problem for Selection of Classification Trees
A more exhaustive model selection procedure for the classification trees would include a systematic search for the best set of alternatives, considering both structural settings (pre-treatment of the training dataset) and algorithmic settings.
The first level of model selection, the pre-treatment of the training dataset, includes the choice of the most suitable predictor variables and the discretization of the target attribute (number of intervals and interval width). In this work, the data pre-treatment was done by generating a limited number of scenarios and assessing the model performance (see Section 3). The second level of model selection, concerning algorithmic and calculation settings, includes the attribute test condition used to split the dataset into smaller subsets together with a measure of the goodness of split, the choice of priors, and the stopping condition that terminates the tree-growing process. In addition to these settings, the effect of the number of folds, i.e., partitions of the data, used in the cross-validation-based pruning procedure can be investigated. A rigorous approach to handle all these structural and algorithmic parameters systematically would require the formulation of an optimization problem.
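Before formulating a full optimization problem, the two levels of model selection could be explored as a simple grid of scenarios. The sketch below enumerates combinations of the number of target classes (a pre-treatment setting) and the number of cross-validation folds (a validation setting). It assumes equal-frequency discretization and contiguous folds for simplicity, which are not necessarily the choices made in this work. The tree training and scoring step is left to `evaluate`, a hypothetical callback, and all names are illustrative.

```python
from itertools import product
from statistics import quantiles


def discretize(values, n_classes):
    """Equal-frequency class labels 0..n_classes-1 (assumed discretization scheme)."""
    cuts = quantiles(values, n=n_classes)
    return [sum(v > c for c in cuts) for v in values]


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, nearly equal folds."""
    size, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds


def grid_search(values, evaluate, class_options=(2, 3, 4), fold_options=(5, 10)):
    """Return the (n_classes, k) scenario with the best score from `evaluate`.

    `evaluate(labels, folds)` is a hypothetical callback that would train and
    cross-validate a classification tree and return a score (higher is better).
    """
    best = None
    for n_classes, k in product(class_options, fold_options):
        labels = discretize(values, n_classes)
        score = evaluate(labels, k_fold_indices(len(values), k))
        if best is None or score > best[0]:
            best = (score, n_classes, k)
    return best[1], best[2]
```

In practice the folds would usually be shuffled or stratified by class, and the search space would also include predictor subsets, priors and stopping conditions, which is what makes a formal optimization problem attractive.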
Besides optimization for model selection, more powerful data mining techniques, such as random forests and support vector machines (I. Witten, 2005), could be used to build classification models, provided that enough data are available.
Nomenclature
Symbols
A        Surface area                                   m²
C        Reflux constant                                kJ/min
cp       Heat capacity of the mixture                   kJ/(kg K)
cpi      Heat capacity of substance i                   kJ/(kg K)
cpeq     Heat capacity of the equipment                 kJ/(kg K)
Etheo    Theoretical energy consumption                 kJ
Eloss    Energy losses                                  kJ
Hs       Enthalpy of steam                              kJ/kg
K        Loss coefficient                               kJ/(min K)
m        Mass of total reaction mixture                 kg
meq      Mass of the equipment                          kg
mi       Mass of substance i                            kg
T1       Initial temperature of reaction mixture        °C
T2       Final temperature of reaction mixture          °C
Tam      Ambient temperature                            °C
Tboil    Boiling point                                  °C
Td       Distillation temperature                       °C
Th       Process temperature kept constant              °C
Ti       Temperature of substance i                     °C
Ts       Saturation temperature of steam                °C
t        Heating time                                   min
td       Distillation time                              min
th       Holding time                                   min
U        Heat transfer coefficient                      W/(m² K)
ΔrH      Enthalpy of reaction                           kJ/kg
ΔHv,i    Enthalpy of vaporization                       kJ/kg
Indexes
am       ambient
boil     boiling
d        distillation
dist     distillation
eq       equipment
h        hold temperature constant
loss     overall energy loss
max      maximum
mean     mean
r        reaction
s        steam (Hs, Ts) or solvent (PMIs)
theo     theoretical
tot      total
w        water
1        initial (temperature)
2        final (temperature)
Abbreviations
AE       Electrophilic addition
AN       Nucleophilic addition
AEN      Nucleophilic addition elimination
CART     Classification and Regression Trees algorithm
CED      Cumulative Energy Demand
DIN      Deutsches Institut für Normung (German Institute for Standardization)
dr       Refined index of agreement
E        Elimination
ELI      Energy Loss Index
EMT      Energy Monitoring Tool
FN       False negatives
FP       False positives
HC       Heterogeneous catalysis
LCA      Life Cycle Assessment
LCIA     Life Cycle Impact Assessment
MARE     Mean Absolute Relative Error
MLE      Maximum Likelihood Estimation
MLI      Mass Loss Index
N        Number of data points
Na       Not applicable
NV       Nominal Volume
PLI      Productivity Loss Index
PDF      Probability Density Function
PMI      Process Mass Intensity
PMIs     Solvent Mass Intensity
PMIw     Water Mass Intensity
q²       Coefficient of determination
r²       Square of the correlation coefficient
Rad      Radical mechanism
RAD      Risk Analysis Documents
ROC      Receiver Operating Characteristic
RME      Reaction Mass Efficiency
SEAr     Electrophilic aromatic substitution
SNAr     Nucleophilic aromatic substitution
SN1      Nucleophilic substitution 1
SN2      Nucleophilic substitution 2
SOP      Standard Operating Procedures
STEM     Enamel-coated Steel
STNR     Stainless Steel
S1       First level of process design
S2       Second level of process design
S3       Third level of process design
S4       Fourth level of process design
S5       Fifth level of process design
T        Temperature
Tmax     Maximal operation temperature
Tmean    Mean operation temperature
TN       True negatives
TP       True positives
UO       Unit Operation
Glossary
Acid-base reaction: Indicates presence or absence of acid-base reactions (yes/no). Acid-base processes correspond to neutralization and precipitation work-up processes (e.g., prior to mechanical separation processes).
Charge: Heating of the new mass filled into the vessel to the same temperature (above 20 °C) as the rest of the reaction mixture inside the vessel.
Core: Core of a fuzzy set.
Crystallization: Indicates presence or absence of crystallization processes (yes/no).
Distillation: Indicates presence or absence of distillation processes during the reaction work-up. It refers to simple evaporation or distillation under reflux conditions (yes/no).
Evaporation: Simple evaporation. It is always assumed if no reflux conditions are mentioned.
Equipment: Refers to reactors, storage tanks, decanters, etc.
Evaporation (classification trees): Indicates presence or absence of evaporation processes (yes/no). Evaporation refers to simple evaporation occurring during the reaction synthesis or any work-up step.
Heat: Heating of the total mass inside the vessel to a final temperature above 20 °C.
Hold: Keeping the process temperature constant.
Last reaction: Indicates whether the considered reaction is the last one of the synthesis route (yes/no).
Mechanical: Indicates presence or absence of mechanical processes (yes/no). Mechanical processes include filtration, centrifugation and washing work-up processes.
Mechanism: Reaction mechanism defined according to Table B.1.1.1.
Miscellaneous: Indicates presence or absence of miscellaneous processes (yes/no). Miscellaneous processes are work-up processes that cannot be classified in any of the above categories (e.g., dilution of the reaction mixture and stirring at high temperature).
Reaction: Energy produced or consumed due to exothermic or endothermic chemical reactions.
Reaction type: Reaction type defined according to Figure B.1.1.1.
Reflux (classification trees): Indicates presence or absence of reflux conditions during the reaction synthesis or during the reaction work-up (yes/no).
Reflux: Distillation under reflux conditions, with C being a constant fitted to measured steam consumption data for the recovery of butanol under strong reflux conditions.
Steamdist: Steam consumption during distillation processes.
Support: Support of a fuzzy set.
Time: Total time in hours required for heating the reaction mixture, evaporating solvent and keeping the temperature constant above ambient temperature (with or without reflux) during the reaction synthesis and work-up processes within the defined system boundary.
Vessel: See equipment.
Appendix
A Supporting Information to Chapter 2
A.1 Uncertainty as Simple and Fuzzy Intervals
In contrast to classical sets, where for each object there are only two possibilities, namely belonging to the set or not, a fuzzy set is a set of objects without clear boundaries, meaning that it can contain elements with a partial membership. If X is the universe of all elements under consideration and A is a subset of X, each element x ∈ X is associated with a membership value to the subset A. This degree of membership of elements in a fuzzy set is expressed by real numbers in the unit interval [0,1], that is, by a membership function µA(x):

$$A = \{\, (x, \mu_A(x)) \mid x \in X,\ \mu_A(x) \in [0, 1] \,\} \tag{a.1.1}$$
The closer µA(x) is to 1, the more x is considered to belong to A.
Therefore, fuzzy sets are suitable for expressing gradual
transitions from membership to non-membership. Fuzzy intervals
are fuzzy sets where the membership function usually consists of
an increasing and decreasing part, and possibly flat parts. Among
the different distributions which can be used to assess
membership functions, triangular and trapezoidal functions are
often selected due to their simplicity. In this work, we propose the
use of trapezoidal fuzzy intervals (Figure A.1.1) with a
membership function given by equation (a.1.2):
Figure A.1.1. Trapezoidal fuzzy interval.
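$$\mu_A(x; a, b, c, d) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{d - x}{d - c}, & c \le x \le d \\ 0, & d \le x \end{cases} \tag{a.1.2}$$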
This membership function can be interpreted as a possibility
distribution function, which is the degree of plausibility between
zero and one of a particular interval (Zadeh, 1999).
Consequently, a fuzzy variable is associated with a possibility
distribution in the same manner as a random variable is
associated with a probability distribution. The interval between a
and d, which is called the support, covers all values that are
plausible or possible, whereas the range from b to c is called the
core and covers the most plausible values of the fuzzy interval.
The fuzzy approach for the expression of model uncertainty can
be used in different ways. The trapezoidal membership function
can provide information about the uncertainty distribution, namely
central tendency and skewness, but it can also lead to a simple
interval approach defined by the core and the support.
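The interval view can be sketched as follows. This is a hypothetical illustration: the helper names and the example interval (80, 90, 110, 130) kg of steam per batch are assumptions, and the intermediate alpha-cut (the set of values with membership of at least alpha) is a standard fuzzy-set construction that the text does not itself use.

```python
from typing import Tuple

Interval = Tuple[float, float]


def core(a: float, b: float, c: float, d: float) -> Interval:
    """Most plausible values: membership equal to 1."""
    return (b, c)


def support(a: float, b: float, c: float, d: float) -> Interval:
    """All plausible or possible values, taken as the closed range [a, d]."""
    return (a, d)


def alpha_cut(a: float, b: float, c: float, d: float, alpha: float) -> Interval:
    """Values whose trapezoidal membership is at least alpha (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Move up the increasing edge from a towards b and down the decreasing edge from d towards c
    return (a + alpha * (b - a), d - alpha * (d - c))


prediction = (80, 90, 110, 130)  # hypothetical fuzzy prediction, kg steam/batch
print(core(*prediction), support(*prediction), alpha_cut(*prediction, 0.5))
```

An observed value can then be judged against either the narrow core or the wider support, which is the choice discussed for the model uncertainty ranges in Appendix C.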
B Supporting Information to Chapter 3
B.1 Classification of Chemical Reactions
B.1.1 Reaction Types Included in the Training Dataset
Notation used for the list of reactions in Figure B.1.1.1:
• Unless specified in the observations, R symbolizes hydrogen,
alkyl chains, aromatic ring(s), or any functional group as
substituent.
• R’, R’’, R’’’, etc. are used to indicate the presence of
different substituents.
• Unless specified in the observations, X symbolizes a halogen
atom.
• Catalysts, solvents, auxiliaries and reaction conditions are
not specified.
• Definitions of the reaction mechanisms are given in Table
B.1.1.1.
Figure B.1.1.1. List of reaction types included in the training dataset.
Table B.1.1.1. Reaction mechanisms corresponding to the reaction types
presented in Figure B.1.1.1.
Abbreviation  Mechanism  Observation
AE  electrophilic addition
AN  nucleophilic addition
AEN  nucleophilic addition-elimination
E  elimination
HC  heterogeneous catalysis  This is not a chemical mechanism, but a way of grouping together reactions that involve heterogeneous catalysis and complex reaction mechanisms, e.g. reductions with hydrogen.
SEAr  electrophilic aromatic substitution
SNAr  nucleophilic aromatic substitution
SN1  unimolecular nucleophilic substitution
SN2  bimolecular nucleophilic substitution
Rad  radical mechanism
Table B.1.1.2. Reaction yield descriptive statistics corresponding to the training
dataset for the statistical models.
Reaction type  Median  Minimum  Maximum
Acylation  88%  42%  96%
Acylation (cyanuric chloride)  90%  71%  95%
Alkylation  85%  60%  100%
Azo-coupling  90%  73%  97%
Complexation  90%  85%  90%
Condensation  79%  55%  95%
Diazotization  90%  73%  97%
Elimination  90%  71%  90%
Halogenation  91%  78%  97%
Hydrolysis  89%  66%  97%
Reduction  81%  40%  95%
Sulfonation  90%  60%  97%
Polymerization  na  na  na
Table B.1.1.3. Process parameter descriptive statistics corresponding to the training dataset for the statistical models.
Reaction type  Time [h] (mean min max)  Tmax [°C] (mean min max)  Tmean [°C] (mean min max)  PMI (mean min max)  PMIs (mean min max)  PMIw (mean min max)
Acylation 14 0 42 87 20 185 51 16 94 10.4 1.2 54.7 4.6 0.0 34.9 2.9 0.0 44.2
Acylation* 1 0 9 40 17 70 25 8 56 9.6 4.2 28.4 0.0 0.0 0.0 5.2 0.0 21.8
Alkylation 17 0 42 95 0 185 52 0 107 8.4 1.1 42.3 3.7 0.0 34.6 1.8 0.0 9.1
Azo-coupling 13 1 40 44 0 130 25 0 47 13.9 0.5 46.6 0.0 0.0 0.0 10.2 0.0 36.2
Complexation 13 1 35 100 75 150 69 58 88 16.3 3.3 64.9 0.0 0.0 0.0 11.3 1.1 52.6
Condensation 3 0 16 101 35 170 62 25 107 9.6 1.2 32.0 3.6 0.0 9.2 2.5 0.0 19.5
Diazotization 3 0 15 37 20 55 22 14 28 10.0 3.3 44.8 0.0 0.0 0.0 6.3 0.0 28.9
Elimination 0 0 2 34 0 70 22 0 50 11.8 1.1 22.0 0.0 0.0 0.0 4.7 0.0 19.7
Halogenation 15 2 46 68 20 103 48 12 89 6.3 1.3 13.4 2.7 0.0 10.8 0.8 0.0 9.1
Hydrolysis 9 2 24 75 25 102 47 14 80 16.9 3.0 58.8 2.8 0.0 10.6 9.5 0.0 42.7
Polymerization 32 7 84 89 72 102 64 47 85 4.3 1.9 6.5 1.5 0.1 3.4 1.6 0.0 3.9
Reduction 14 0 29 74 35 105 36 20 49 15.4 1.1 43.8 8.2 0.0 39.6 3.6 0.0 23.1
Sulfonation 11 3 18 59 23 120 40 20 82 6.0 2.5 18.9 0.0 0.0 0.2 1.3 0.0 11.5
* Acylation with cyanuric chloride.
B.1.2 One-way ANOVA
Acylation Reactions
Table B.1.2.1. Descriptive statistics for the dependent variable steam
consumption and the grouping variable acylation reaction type. The values are
given in kg of steam per kg of product.
Reaction  Mean  Standard deviation  Number of points
C-Acylation*  4.94  0.91  4
N-Acylation  2.79  2.48  23
N-Acylation (cyanuric chloride)  0.18  0.26  22
O-Acylation  1.36  1.34  10
Total  1.72  2.19  59
* C-Acylation reactions were not included in the training dataset since they were
significantly different from the rest of the acylation reactions and the total number
of points (4) was too low to perform a statistical analysis on this group (see the
multiple comparisons in Table B.1.2.5).
Table B.1.2.2. Test of homogeneity of variances.
Levene statistic  Degrees of freedom 1  Degrees of freedom 2  Sig.*
8.76  3  55  .00
* Since the significance value is less than 0.05, the variances of the different
acylation reaction sub-groups differ significantly. Therefore, the assumption of
homogeneity of variances is violated.
Table B.1.2.3. One-way ANOVA test results for the acylation reactions.
Source of variation  Sum of squares  Degrees of freedom  Mean square  F  Sig.*
Between groups  121.01  3  40.34  14.23  .00
Within groups  155.96  55  2.84
Total  276.97  58
* Since the assumption of homogeneity of variances has been violated, we also
look at the Brown-Forsythe and Welch alternative F-ratios, which have been
derived to be robust in such cases (Field, 2009).
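Because a one-way ANOVA only needs each group's mean, standard deviation and size, Table B.1.2.3 can be approximately reproduced from the rounded values in Table B.1.2.1. The sketch below is our own illustration (the `GroupSummary` type and function name are hypothetical), assuming the tabulated standard deviations are sample standard deviations.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class GroupSummary:
    mean: float
    sd: float  # sample standard deviation
    n: int


def one_way_anova(groups: Sequence[GroupSummary]) -> Dict[str, float]:
    """Classical one-way ANOVA reconstructed from per-group mean, SD and size."""
    k = len(groups)
    n_total = sum(g.n for g in groups)
    grand_mean = sum(g.n * g.mean for g in groups) / n_total
    ss_between = sum(g.n * (g.mean - grand_mean) ** 2 for g in groups)
    ss_within = sum((g.n - 1) * g.sd ** 2 for g in groups)
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Rounded values from Table B.1.2.1: C-, N-, N- (cyanuric chloride) and O-acylation
ACYLATION = [
    GroupSummary(4.94, 0.91, 4),
    GroupSummary(2.79, 2.48, 23),
    GroupSummary(0.18, 0.26, 22),
    GroupSummary(1.36, 1.34, 10),
]

result = one_way_anova(ACYLATION)
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.2f}")
```

This gives F ≈ 14.3 with 3 and 55 degrees of freedom, close to the reported 14.23; the small gap comes from the rounding of the tabulated means and standard deviations.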
Table B.1.2.4. Robust tests of equality of means.
Test  Statistic*  Degrees of freedom 1  Degrees of freedom 2**  Sig.***
Welch  40.74  3  10.51  .00
Brown-Forsythe  19.92  3  33.82  .00
* Asymptotically F-distributed.
** Adjusted residual degrees of freedom.
*** Both test statistics are highly significant, so there is a significant
difference among the acylation reaction types. We therefore proceed with
multiple comparisons of all pairwise combinations of groups.
Table B.1.2.5. Multiple comparisons using the Games-Howell procedure
(Jaccard et al., 1984)*.
Reaction (I)  Reaction (J)  Mean difference (I-J)  Standard error  Sig.**  95% CI lower bound  95% CI upper bound
C-Acylation  N-Acylation  2.15*  0.69  0.04  0.13  4.18
C-Acylation  N-Acylation (cyanuric chloride)  4.76*  0.46  0.01  2.59  6.93
C-Acylation  O-Acylation  3.57*  0.62  0.00  1.60  5.55
N-Acylation  C-Acylation  -2.15*  0.69  0.04  -4.18  -0.13
N-Acylation  N-Acylation (cyanuric chloride)  2.61*  0.52  0.00  1.16  4.05
N-Acylation  O-Acylation  1.42  0.67  0.17  -0.40  3.24
N-Acylation (cyanuric chloride)  C-Acylation  -4.76*  0.46  0.01  -6.93  -2.59
N-Acylation (cyanuric chloride)  N-Acylation  -2.61*  0.52  0.00  -4.05  -1.16
N-Acylation (cyanuric chloride)  O-Acylation  -1.18  0.43  0.08  -2.51  0.14
O-Acylation  C-Acylation  -3.57*  0.62  0.00  -5.55  -1.60
O-Acylation  N-Acylation  -1.42  0.67  0.17  -3.24  0.40
O-Acylation  N-Acylation (cyanuric chloride)  1.18  0.43  0.08  -0.14  2.51
* This procedure is recommended in cases when there is a doubt that the
population variances are equal (see Table B.1.2.2) (Field, 2009).
** Significance levels below 0.05 indicate which reaction pairs differ
significantly from each other.
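The standard errors in Table B.1.2.5 follow from the group summaries alone. In the Games-Howell procedure, each pair uses its own standard error and Welch-Satterthwaite degrees of freedom, and the statistic q = |mean difference| / (SE/√2) is compared with the studentized range distribution for k groups. The sketch below (function name hypothetical) stops at q; the p-values and confidence limits additionally need studentized-range quantiles, available for example as `scipy.stats.studentized_range` in recent SciPy versions, and are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSummary:
    mean: float
    sd: float  # sample standard deviation
    n: int


def games_howell_pair(g_i: GroupSummary, g_j: GroupSummary):
    """Mean difference, standard error, Welch df and studentized-range statistic q."""
    v_i = g_i.sd ** 2 / g_i.n
    v_j = g_j.sd ** 2 / g_j.n
    se = math.sqrt(v_i + v_j)
    df = (v_i + v_j) ** 2 / (v_i ** 2 / (g_i.n - 1) + v_j ** 2 / (g_j.n - 1))
    diff = g_i.mean - g_j.mean
    q = abs(diff) / (se / math.sqrt(2))  # to be compared with the studentized range (k groups, df)
    return diff, se, df, q


# Rounded values from Table B.1.2.1
groups = {
    "C-Acylation": GroupSummary(4.94, 0.91, 4),
    "N-Acylation": GroupSummary(2.79, 2.48, 23),
    "N-Acylation (cyanuric chloride)": GroupSummary(0.18, 0.26, 22),
    "O-Acylation": GroupSummary(1.36, 1.34, 10),
}
names = list(groups)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        diff, se, df, q = games_howell_pair(groups[first], groups[second])
        print(f"{first} vs {second}: diff={diff:.2f} SE={se:.2f} df={df:.1f} q={q:.2f}")
```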
Alkylation Reactions
Table B.1.2.6. Descriptive statistics for the dependent variable steam
consumption and the grouping variable alkylation reaction type. The values are
given in kg of steam per kg of product.
Reaction  Mean  Standard deviation  N
C-Alkylation  0.83  1.22  7
N-Alkylation  3.48  2.73  8
O-Alkylation  4.01  3.44  17
S-Alkylation  2.51  1.13  3
Total  3.12  2.99  35
Table B.1.2.7. Test of homogeneity of variances.
Levene statistic  Degrees of freedom 1  Degrees of freedom 2  Sig.*
2.32  3  31  .095
* Since the significance value is more than 0.05, the variances of the different
alkylation reaction sub-groups do not differ significantly. Therefore, the
assumption of homogeneity of variances is not violated.
Table B.1.2.8. One-way ANOVA test results for the alkylation reactions.
Source of variation  Sum of squares  Degrees of freedom  Mean square  F  Sig.*
Between groups  52.24  3  17.41  2.14  0.12
Within groups  252.70  31  8.15
Total  304.94  34
* Since the observed significance value is more than 0.05, there is no
significant difference among the different alkylation reaction types.
Symbols and abbreviations
CI  Confidence interval
Degrees of freedom 1  Number of different groups to which the sampled cases belong, minus one
Degrees of freedom 2  Total number of cases in all groups minus the number of different groups to which the sampled cases belong
F  F-ratio, the test statistic compared against the F distribution
N  Number of data points
Sig.  Observed significance level (p-value)
B.2 Classification Tree
B.2.1 Predictor Importance
The following formulas give a formal definition of predictor importance:
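$$I_i = \frac{1}{N} \sum_{t=1}^{N} \Delta R(t) \qquad \text{(b.2.1.1)}$$
$$\Delta R(t) = R(t) - R(t_L) - R(t_R) \qquad \text{(b.2.1.2)}$$
$$R(t) = P(t) \cdot \mathrm{gdi}(t) \qquad \text{(b.2.1.3)}$$
$$\mathrm{gdi}(t) = 1 - \sum_{k} p(k \mid t)^2 \qquad \text{(b.2.1.4)}$$
$$P(t) = \frac{n(t)}{n_{total}} \qquad \text{(b.2.1.5)}$$
In (b.2.1.4), the sum runs over the output classes k, and p(k | t) is the fraction of the observations at node t that belong to class k.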
where Ii is the importance of predictor i, t is a node where predictor
i is tested, N is the total number of nodes where predictor i is
tested, R(t) is the risk of the parent node, R(tL) is the risk of the
child node on the left, R(tR) is the risk of the child node on the
right, gdi(t) is the Gini index at node t, P(t) is the node probability,
n(t) is the number of observations from the original data that
satisfy the conditions for the node and ntotal is the total number of
observations from the original data.
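Equations (b.2.1.1) to (b.2.1.5) can be traced on a toy tree. The sketch below is a hypothetical illustration: each split is described only by the class counts of the parent and its two children, and the two-class tree on 10 observations is invented; following the definition above, the risk reductions are averaged over the nodes where each predictor is tested.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def gini_index(counts: Sequence[int]) -> float:
    """gdi(t) = 1 - sum_k p(k|t)^2 from the class counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def node_risk(counts: Sequence[int], n_total: int) -> float:
    """R(t) = P(t) * gdi(t), with P(t) = n(t) / n_total."""
    return sum(counts) / n_total * gini_index(counts)


# (predictor, parent counts, left child counts, right child counts)
Split = Tuple[str, Sequence[int], Sequence[int], Sequence[int]]


def predictor_importance(splits: List[Split], n_total: int) -> Dict[str, float]:
    """Average risk reduction over the nodes where each predictor is tested."""
    total_delta: Dict[str, float] = defaultdict(float)
    n_nodes: Dict[str, int] = defaultdict(int)
    for predictor, parent, left, right in splits:
        delta = (node_risk(parent, n_total)
                 - node_risk(left, n_total)
                 - node_risk(right, n_total))
        total_delta[predictor] += delta
        n_nodes[predictor] += 1
    return {p: total_delta[p] / n_nodes[p] for p in total_delta}


# Hypothetical two-class tree grown on 10 observations
splits = [
    ("x1", [6, 4], [5, 1], [1, 3]),  # root node split on x1
    ("x2", [5, 1], [5, 0], [0, 1]),  # left child split on x2
]
print(predictor_importance(splits, n_total=10))
```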
B.3 Probability Density Function Models
B.3.1 Maximum Likelihood Estimation
There are different methods of estimating population parameters,
such as the method of moments, maximum likelihood, least-squares
and Bayes estimators. Maximum likelihood estimation (MLE) is the
most common statistical method of parameter estimation; under
standard regularity conditions it yields asymptotically efficient
estimates, i.e. parameter estimates with minimum variance in large
samples.
It finds the model parameters that maximize the likelihood function,
namely the parameters of the probability density function (PDF)
that make the observed data the most likely to have happened.
Whereas the PDF is a function of the data given a particular set
of parameter values (Figure B.3.1.1, top), the likelihood function
is a function of the parameters given a particular set of observed
data, defined on the parameter scale. The likelihood of multiple
independent observations is defined as the product of the
likelihoods of the individual observations (equation b.3.1.1), and
accordingly the log-likelihood (presented as negative
log-likelihood in Figure B.3.1.1, bottom) is the sum of the
log-likelihoods of the individual observations (equation b.3.1.2).
In general, numerical algorithms optimize the log-likelihood
function instead of the likelihood function in order to avoid very
small numbers, which could fall below the available floating-point
precision. In addition, for convenience the MLE algorithm
implemented in MATLAB¹ minimizes the negative log-likelihood
function, which is equivalent to finding the maximum likelihood
estimates. In the example of Figure B.3.1.1, the minimum negative
log-likelihood is found at p1 = 0.2 = 1/µ, corresponding to a mean
value of µ = 5. This value of parameter p1 then defines the PDF
shown at the top of Figure B.3.1.1.
[Figure B.3.1.1. Top panel: probability density versus data x, annotated "mean or expected value = 1/p1". Bottom panel: negative log-likelihood versus parameter 1/p1, annotated "negative log-likelihood of 1/p1 = 5".]
Figure B.3.1.1. Exponential probability density function $f_X(x) = p_1 e^{-p_1 x}$ with parameter p1 (top), and joint negative log-likelihood of the parameter over an independent random sample X (bottom).
$$L_n(p_1) = \prod_{i=1}^{n} f_X(x_i; p_1) \qquad \text{(b.3.1.1)}$$
$$\log L_n(p_1) = \sum_{i=1}^{n} \log f_X(x_i; p_1) \qquad \text{(b.3.1.2)}$$
¹ Statistics Toolbox™.
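For the exponential density in the rate form of Figure B.3.1.1, minimizing the negative of (b.3.1.2) has a closed-form solution, p1 = n / Σxi, i.e. the reciprocal of the sample mean. The sketch below checks this against a grid search over 1/p1, mirroring the x-axis of the bottom panel; the four-point sample is hypothetical and merely chosen to have a mean of 5.

```python
import math
from typing import Sequence


def exp_negative_log_likelihood(p1: float, data: Sequence[float]) -> float:
    """-log L_n(p1) for f_X(x) = p1 * exp(-p1 * x), i.e. minus the sum in (b.3.1.2)."""
    return -sum(math.log(p1) - p1 * x for x in data)


def exp_mle(data: Sequence[float]) -> float:
    """Closed-form MLE of the rate p1: the reciprocal of the sample mean."""
    return len(data) / sum(data)


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample with mean 5

# Grid search over the scale 1/p1 from 1 to 9, as on the x-axis of the bottom panel
scales = [1.0 + 0.01 * k for k in range(801)]
best_scale = min(scales, key=lambda s: exp_negative_log_likelihood(1.0 / s, data))

print(f"grid minimum at 1/p1 = {best_scale:.2f}")
print(f"closed form p1 = {exp_mle(data):.2f}")
```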
B.3.2 Goodness of Fit
B.3.2.1 Conventional Test Statistics
Tests of goodness of fit are used to assess whether or not a
sample of measurements from a random variable can be
represented by a selected theoretical probability density function.
The most commonly used tests are the Chi-Square, the
Kolmogorov-Smirnov and the Anderson-Darling tests. They
provide a probability that random data generated from the fitted
distribution would have produced a goodness-of-fit statistic value
as low as that calculated for the observed data (Ayyub and
McCuen, 1997). Among different candidate distributions, the best
fit is indicated by the lowest value of a given test statistic.
Chi-square Statistic
The Chi-Square statistic χ2 compares the histogram of the
observed data with the expected histogram obtained from the
fitted distribution, and is calculated as follows,
$$\chi^2 = \sum_{i=1}^{N} \frac{\{O(i) - E(i)\}^2}{E(i)} \qquad \text{(b.3.2.1)}$$
where
O(i) = number of observations in bin i
E(i) = expected number of observations in bin i from the fitted distribution
N = number of classes (bins) in the histogram
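Equation (b.3.2.1) can be sketched as follows, with the expected bin counts obtained from the fitted cumulative distribution function. The exponential fit with p1 = 0.2, the bin edges and the observed counts are all hypothetical, and with only 20 observations the example also illustrates how few counts per bin such a test may have to work with.

```python
import math
from typing import Callable, List, Sequence


def expected_counts(edges: Sequence[float], cdf: Callable[[float], float], n: int) -> List[float]:
    """E(i) = n * [F(upper edge) - F(lower edge)] for consecutive histogram bins."""
    return [n * (cdf(hi) - cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Equation (b.3.2.1): sum over bins of (O(i) - E(i))^2 / E(i)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def exp_cdf(x: float, p1: float = 0.2) -> float:
    """Hypothetical fitted exponential CDF with rate p1."""
    return 1.0 - math.exp(-p1 * x)


edges = [0.0, 2.5, 7.5, math.inf]  # three hypothetical bins
observed = [6, 8, 6]
expected = expected_counts(edges, exp_cdf, n=sum(observed))
print([round(e, 2) for e in expected], round(chi_square_statistic(observed, expected), 3))
```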
Kolmogorov-Smirnov Statistic
The Kolmogorov-Smirnov statistic Dn is based on the maximum
vertical distance between the fitted cumulative distribution
function and the empirical cumulative distribution function.
$$D_n = \max_x \lvert F_n(x) - F(x) \rvert \qquad \text{(b.3.2.2)}$$
where
Dn = Kolmogorov-Smirnov distance
n = total number of data points
F(x) = fitted distribution function
Fn(x) = i/n, with i = cumulative rank of the data point
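A minimal sketch of equation (b.3.2.2) for a sorted sample follows. Because the empirical distribution function jumps from (i−1)/n to i/n at the i-th ordered observation, the usual computation checks both sides of each step; the uniform distribution on [0, 2] used as the "fitted" model is a hypothetical example.

```python
from typing import Callable, Sequence


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov distance D_n between the empirical and the fitted CDF."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        # Empirical CDF equals i/n at x_(i) and (i-1)/n just below it
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x: float, low: float = 0.0, high: float = 2.0) -> float:
    """Hypothetical fitted distribution: uniform on [low, high]."""
    return min(max((x - low) / (high - low), 0.0), 1.0)


print(ks_statistic([0.5, 1.5], uniform_cdf))
```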
Anderson-Darling Statistic
The Anderson-Darling statistic is a refinement of the
Kolmogorov-Smirnov statistic that weights the squared deviations
so as to improve its performance in the tails of a distribution.
$$A_n^2 = \int_{-\infty}^{+\infty} \{F_n(x) - F(x)\}^2 \, \Psi(x) \, f(x) \, dx \qquad \text{(b.3.2.3)}$$
$$\Psi(x) = \frac{n}{F(x)\{1 - F(x)\}} \qquad \text{(b.3.2.4)}$$
where
n = number of data points
F(x) = fitted distribution function
f(x) = density function of the fitted distribution
Fn(x) = i/n, with i = cumulative rank of the data point
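In practice the integral in (b.3.2.3) with the weight (b.3.2.4) is not evaluated numerically; it reduces to the standard computing formula A² = −n − (1/n) Σ (2i − 1)[ln F(x(i)) + ln(1 − F(x(n+1−i)))] over the ordered sample. The sketch below applies it to a hypothetical uniform fit on [0, 2].

```python
import math
from typing import Callable, Sequence


def anderson_darling_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """A_n^2 via the standard computing formula equivalent to (b.3.2.3)-(b.3.2.4)."""
    u = sorted(cdf(x) for x in data)  # fitted CDF at the ordered observations
    n = len(u)
    if any(ui <= 0.0 or ui >= 1.0 for ui in u):
        raise ValueError("fitted CDF values must lie strictly between 0 and 1")
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


def uniform_cdf(x: float, low: float = 0.0, high: float = 2.0) -> float:
    """Hypothetical fitted distribution: uniform on [low, high]."""
    return min(max((x - low) / (high - low), 0.0), 1.0)


print(round(anderson_darling_statistic([0.5, 1.5], uniform_cdf), 4))
```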
Although these test statistics are ubiquitous in the modeling
literature, they suffer from some important limitations. The
Chi-Square test requires a very large number of observations to
provide accurate results, and it depends on the number of
histogram classes used, with the associated uncertainty in making
the correct choice. The Kolmogorov-Smirnov and the Anderson-Darling
tests assume that the hypothesized distribution, including its
parameters, is known a priori; this condition is seldom met in
practice, where the distribution parameters are often estimated
from the observed data. In addition, none of these statistics
accounts for model complexity by penalizing the number of
parameters used. Thus these test statistics do not prevent
overfitting.
B.3.2.2 Akaike Information Criterion
The Akaike Information Criterion (AIC) is an approach used for
selecting a model from a set of candidate models. It is based on
information theory, which derives from Boltzmann’s concept of
entropy. The selected model minimizes the information lost when a
model is used to approximate reality (the distance between reality
and a model). Akaike proposed a formal relationship between
information theory and likelihood theory (equation b.3.2.5), in which
the maximized log-likelihood accounts for how well the model fits
the data, and K (the number of free parameters estimated within
the model) accounts for model complexity, acting as a bias
correction (penalty component) that helps to avoid overfitting. The
model with the minimum AIC value is selected as the best model to
fit the data.
$$\mathrm{AIC} = -2 \log L + 2K \qquad \text{(b.3.2.5)}$$
where
log L = maximized log-likelihood (see Appendix B.3.1)
K = number of free parameters in the model
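Equation (b.3.2.5) can be illustrated by comparing two candidate distributions fitted by maximum likelihood to the same data. The sketch below is hypothetical: the four-point sample and the choice of an exponential (K = 1) and a normal (K = 2) candidate are ours, not from the thesis.

```python
import math
from typing import Sequence


def aic(log_likelihood: float, k: int) -> float:
    """Equation (b.3.2.5): AIC = -2 log L + 2K."""
    return -2.0 * log_likelihood + 2 * k


def exponential_loglik(data: Sequence[float]) -> float:
    """Maximized log-likelihood of an exponential model (MLE rate = 1 / mean)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)


def normal_loglik(data: Sequence[float]) -> float:
    """Maximized log-likelihood of a normal model (MLE variance uses divisor n)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


data = [2.0, 4.0, 6.0, 8.0]  # hypothetical sample
scores = {
    "exponential (K=1)": aic(exponential_loglik(data), 1),
    "normal (K=2)": aic(normal_loglik(data), 2),
}
print({name: round(value, 3) for name, value in scores.items()})
```

Here the normal model obtains the lower AIC despite its extra parameter; with four invented points this says nothing about the thesis data, it only shows how the penalty term enters.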
The AIC allows relative model comparison by means of the
Akaike weights wi, which normalize the relative model likelihoods so
that they sum to 1 and can be treated as probabilities. The Akaike
weight wi can be interpreted as the probability that model i is the
best model for the data among the candidates (equation b.3.2.6,
where the numerator is the relative likelihood of model i).
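$$w_i = \frac{\exp\left(-\dfrac{\Delta_i}{2}\right)}{\displaystyle\sum_{r=1}^{R} \exp\left(-\dfrac{\Delta_r}{2}\right)} \qquad \text{(b.3.2.6)}$$
$$\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{min} \qquad \text{(b.3.2.7)}$$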
where
AICi = AIC of model i
R = number of competing models
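Equations (b.3.2.6) and (b.3.2.7) translate directly into code. The three AIC values below are hypothetical; subtracting the minimum AIC before exponentiating also keeps the terms from underflowing when the AIC values are large.

```python
import math
from typing import List, Sequence


def akaike_weights(aic_values: Sequence[float]) -> List[float]:
    """Equations (b.3.2.6)-(b.3.2.7): normalized relative likelihoods exp(-Delta_i / 2)."""
    aic_min = min(aic_values)
    rel_lik = [math.exp(-(a - aic_min) / 2.0) for a in aic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


weights = akaike_weights([100.0, 102.0, 110.0])  # hypothetical AIC values of three models
print([round(w, 3) for w in weights])
```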
C Supporting Information to Chapter 4
C.1 Sensitivity Analysis
In order to quantify the effects of input parameter variations on
the model results, a sensitivity analysis was performed for
dataset-1. The input parameters considered for the theoretical
energy consumption include the reflux conditions during
distillation, the reaction mass and the temperature increase inside
the reactor. Additionally, variations in the physicochemical
properties of the substances, such as heat capacity and enthalpy
of vaporization, and in equipment characteristics, such as the mass
and heat capacity of the reactor, were also considered. For the
energy losses, the operation time, the area of the reactor and the
difference between the temperature inside the reactor and the
ambient temperature were considered. The original input
parameters, as extracted from the SOP, were corrected according
to the actual process parameters extracted from the EMT. The
energy consumption was then recalculated with the documentation
based approach, applying the corrections one at a time.
Additionally, a model run was performed with all corrections
applied simultaneously, thus approaching the real process
conditions.
For the evaluation of the model performance after these input
corrections, descriptive statistics and various diagnostic
measures are reported in Table C.1.1. As model error measures,
the mean absolute error (MAE), the mean absolute relative error
(MARE) and the root mean squared error (RMSE) were used, the
latter decomposed into a systematic (RMSEs) and an unsystematic
(RMSEu) part. The RMSEs describes the linear bias produced by
the model, whereas the RMSEu may be considered a measure of
precision. Besides these error measures, various indices of
agreement between model predictions and reference values were
calculated, namely the square of the correlation coefficient (r2),
the coefficient of determination (q2) and refined indices of
agreement (d1, d2, dr) (Willmott et al., 2012). As can be seen in
Table C.1.1, compared with the start input, the MARE, MAE and
RMSE decrease only when the reflux conditions are considered for
the theoretical energy and when the operation time is considered
for the energy losses. Regarding the indices of agreement, the
highest values, and thus the best agreement between observed
and predicted values, also correspond to the reflux and operation
time corrections. Since these evaluation criteria indicate a
reduction of the error and an increase of the prediction capability
only for reflux conditions and operation time, both parameters are
considered influential on the model results.
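The error measures of Table C.1.1 can be sketched as follows. The formulas are our assumptions based on common usage, since the appendix does not spell them out: MARE divides by the (non-zero) observed value, and the systematic and unsystematic parts follow the Willmott-style decomposition around the least-squares line P̂ = a + b·O of predicted on observed values, so that RMSE² = RMSEs² + RMSEu². The four data points are hypothetical.

```python
import math
from typing import Dict, Sequence


def error_measures(observed: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, MARE, RMSE and its systematic/unsystematic parts (assumes non-zero observations)."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    # Least-squares line P_hat = a + b * O (predicted regressed on observed)
    s_oo = sum((o - o_mean) ** 2 for o in observed)
    s_op = sum((o - o_mean) * (p - p_mean) for o, p in pairs)
    b = s_op / s_oo
    a = p_mean - b * o_mean
    p_hat = [a + b * o for o in observed]
    return {
        "a": a,
        "b": b,
        "MAE": sum(abs(p - o) for o, p in pairs) / n,
        "MARE": sum(abs(p - o) / abs(o) for o, p in pairs) / n,
        "RMSE": math.sqrt(sum((p - o) ** 2 for o, p in pairs) / n),
        "RMSEs": math.sqrt(sum((ph - o) ** 2 for o, ph in zip(observed, p_hat)) / n),
        "RMSEu": math.sqrt(sum((p - ph) ** 2 for p, ph in zip(predicted, p_hat)) / n),
    }


measures = error_measures([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 4.0])  # hypothetical data
print({name: round(value, 3) for name, value in measures.items()})
```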
Table C.1.1. Statistical evaluation of the documentation based approach model for the theoretical energy, energy losses, and total energy consumption in the case of dataset-1, during sensitivity analysis scenarios for the model input parameters (start: original parameters from the process documentation, reflux: reflux conditions during distillation, properties: physico-chemical properties of the substances and equipment characteristics, mass: reaction mass, dT: temperature increase during process operation, all: all parameters simultaneously). The units are given in kg of steam/ batch.
Parameter  Theoretical energy: start reflux prop mass dT all  Energy losses: start time prop dT all  Total energy: reflux+time
N 18 18 18 18 18 18 18 18 18 18 18 18
mean_r 2801 2801 2801 2801 2801 2801 1439 1439 1439 1439 1439 4252
mean_m 2047 2390 1944 2163 2040 2380 770 1199 1363 751 1897 3589
sd_r 3109 3109 3109 3109 3109 3109 1441 1441 1441 1441 1441 4530
sd_m 1977 2457 2028 2020 1961 2460 1100 1443 2211 1097 2526 3762
a 538 263 421 656 557 256 -71.84 -5.51 -140 -62.25 84.06 250
b 0.54 0.76 0.54 0.54 0.53 0.76 0.58 0.84 1.04 0.57 1.26 0.79
MARE 0.29 0.25 0.29 0.34 0.34 0.27 0.56 0.42 0.61 0.56 0.66 0.27
MAE 946 604 990 951 990 615 744 536 978 765 1010 1030
RMSE 1884 1067 1954 1888 1918 1088 1121 838 1577 1163 1805 1658
RMSEs 1583 835 1623 1534 1613 843 886 331 99 919 585 1155
RMSEu 1020 664 1088 1101 1038 687 687 770 1574 714 1707 1190
d2 0.85 0.96 0.84 0.85 0.84 0.96 0.81 0.91 0.78 0.79 0.76 0.96
d1 0.79 0.87 0.78 0.79 0.78 0.87 0.68 0.78 0.64 0.67 0.65 0.85
dr 0.82 0.88 0.81 0.82 0.81 0.88 0.68 0.77 0.58 0.67 0.57 0.86
q2 0.61 0.88 0.58 0.61 0.60 0.87 0.36 0.64 -0.27 0.31 -0.66 0.86
r2 0.72 0.92 0.70 0.69 0.70 0.92 0.59 0.70 0.46 0.55 0.52 0.89
Symbols and abbreviations
a  Intercept of a least-squares regression line between predicted and observed variables
b  Slope of a least-squares regression line between predicted and observed variables
dr  Refined index of agreement
d1  Refined index of agreement
d2  Refined index of agreement
MAE  Mean absolute error
MARE  Mean absolute relative error
mean_r  Mean value of the reference dataset
mean_m  Mean value of the model dataset
N  Number of data points
q2  Coefficient of determination
r2  Square of the correlation coefficient
RMSE  Root mean squared error
RMSEs  Systematic root mean squared error
RMSEu  Unsystematic root mean squared error
sd_m  Standard deviation of the model dataset
sd_r  Standard deviation of the reference dataset
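The indices of agreement can be sketched in the same way. The definitions used here are our assumptions, since only the symbols are listed: d1 and d2 are taken as the absolute-error and squared-error forms of Willmott's index of agreement, dr as the refined index of Willmott et al. (2012) with scaling constant c = 2, q2 as one minus the ratio of squared errors to the squared deviations of the observations, and r2 as the squared Pearson correlation. The data points are hypothetical.

```python
from typing import Dict, Sequence


def agreement_indices(observed: Sequence[float], predicted: Sequence[float],
                      c: float = 2.0) -> Dict[str, float]:
    """Assumed definitions of d1, d2, dr, q2 and r2 (see lead-in)."""
    n = len(observed)
    pairs = list(zip(observed, predicted))
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    abs_err = sum(abs(p - o) for o, p in pairs)
    sq_err = sum((p - o) ** 2 for o, p in pairs)
    # "Potential" error terms |P_i - O_mean| + |O_i - O_mean|
    potential = [abs(p - o_mean) + abs(o - o_mean) for o, p in pairs]
    abs_dev_obs = sum(abs(o - o_mean) for o in observed)
    sq_dev_obs = sum((o - o_mean) ** 2 for o in observed)
    # Refined index: piecewise in the ratio of absolute errors to c times observed deviations
    if abs_err <= c * abs_dev_obs:
        d_r = 1.0 - abs_err / (c * abs_dev_obs)
    else:
        d_r = c * abs_dev_obs / abs_err - 1.0
    s_op = sum((o - o_mean) * (p - p_mean) for o, p in pairs)
    s_pp = sum((p - p_mean) ** 2 for p in predicted)
    return {
        "d1": 1.0 - abs_err / sum(potential),
        "d2": 1.0 - sq_err / sum(x ** 2 for x in potential),
        "dr": d_r,
        "q2": 1.0 - sq_err / sq_dev_obs,
        "r2": s_op ** 2 / (sq_dev_obs * s_pp),
    }


indices = agreement_indices([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 4.0, 4.0])  # hypothetical data
print({name: round(value, 3) for name, value in indices.items()})
```

Unlike r2, the index q2 defined this way can become negative when the errors exceed the spread of the observations, which is consistent with the negative q2 values reported for some scenarios in Table C.1.1.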
C.2 Statistical Evaluation of the Documentation based
Approach Model Performance
Comparing the results of the theoretical energy predictions shown
in Tables C.1.1 and C.2.1, it is interesting to note that for all the
cases of dataset-1 the systematic portion of the error (RMSEs) is
larger than the unsystematic part (RMSEu), whereas for dataset-2
the opposite is observed. This can be explained by the fact that
some corrections were made to the input parameters for the
theoretical energy calculation of dataset-1, whereas this was not
the case for dataset-2. Therefore, even though the high RMSEs
for the first case study showed a potential for improving the
theoretical energy model performance by means of more detailed
process modeling with fewer assumptions and default values, the
low RMSEs for the second case study indicated a small portion of
systematic error and, hence, a good accuracy of the model based
only on process documentation, without any further corrections of
the default input parameters and assumptions.
On the other hand, the RMSEs and RMSEu for the energy losses
show the opposite trend, with the ratio of unsystematic error to
total error being higher for the first case study and lower for the
second. A high RMSEu to RMSE ratio reveals the difficulty of
modeling the energy losses accurately, even when the operation
time is known, and suggests that it would be difficult to further
improve the results by means of more detailed modeling.
Conversely, a low RMSEu to RMSE ratio, as for case study 2,
implies that the model produces a bias. Since the loss constant
(Kloss) was chosen between two proposed values (Bieler et al.,
2004) according to its performance on dataset-1, it may tend to
produce a systematic error and increase the bias in the results
when it is applied to a different production plant. Finally, it should
be noted that for both datasets the MARE and the three indices of
agreement confirm the expected trend of highest prediction
capability for the theoretical energy, followed by the total energy
and the energy losses.
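The ratios discussed above can be checked directly from the reported energy-loss values, assuming the decomposition RMSE² = RMSEs² + RMSEu² holds (which it does for a least-squares decomposition). The snippet below only uses numbers copied from Tables C.1.1 (start and time scenarios) and C.2.1; the differences between reported and recombined RMSE are due to rounding.

```python
import math

# (RMSE, RMSEs, RMSEu) for the energy losses, kg steam/batch, from Tables C.1.1 and C.2.1
reported = {
    "dataset-1, start": (1121, 886, 687),
    "dataset-1, time": (838, 331, 770),
    "dataset-2": (595, 542, 245),
}

for case, (rmse, rmse_s, rmse_u) in reported.items():
    recombined = math.hypot(rmse_s, rmse_u)  # sqrt(RMSEs^2 + RMSEu^2)
    print(f"{case}: RMSE {rmse} vs recombined {recombined:.0f}, "
          f"RMSEu/RMSE = {rmse_u / rmse:.2f}")
```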
Table C.2.1. Statistical evaluation of the documentation based approach model
for the theoretical energy, energy losses, and total energy consumption in the
case of dataset-2. The units are given in kg of steam/ batch.
Statistical parameter  Theoretical energy  Energy losses  Total energy
N 20 20 20
mean_r 1183 622 1805
mean_m 1247 396 1642
sd_r 1201 819 1897
sd_m 1378 403 1738
a -2.24 156.89 163.05
b 1.06 0.38 0.82
MARE 0.31 2.46 0.37
MAE 300 337 546
RMSE 533 595 843
RMSEs 91 542 372
RMSEu 526 245 757
d2 0.95 0.74 0.94
d1 0.86 0.67 0.82
dr 0.86 0.72 0.83
q2 0.79 0.45 0.79
r2 0.85 0.61 0.80
Symbols and Abbreviations
a  Intercept of a least-squares regression line between predicted and observed variables
b  Slope of a least-squares regression line between predicted and observed variables
dr  Refined index of agreement
d1  Refined index of agreement
d2  Refined index of agreement
MAE  Mean absolute error
MARE  Mean absolute relative error
mean_r  Mean value of the reference dataset
mean_m  Mean value of the model dataset
N  Number of data points
q2  Coefficient of determination
r2  Square of the correlation coefficient
RMSE  Root mean squared error
RMSEs  Systematic root mean squared error
RMSEu  Unsystematic root mean squared error
sd_m  Standard deviation of the model dataset
sd_r  Standard deviation of the reference dataset
C.3 Observed and Predicted Steam Consumption
Tables C.3.1 and C.3.2 show the observed and the predicted
theoretical energy, energy losses and total energy values for the
first and second case study, respectively. Additionally, the relative
errors and the success of the predicted results considering also
the uncertainty ranges and batch-to-batch variability are shown
for each individual case. The problematic cases which were
pointed out in the main text (Figure 4.1.1, Figure 4.1.2, Figure
4.1.4, Figure 4.2.1) are highlighted, since they present
considerable deviation from the observed values and the model
uncertainty ranges fail to capture the observed batch-to-batch
variability. These cases are discussed in detail in the following
sub-sections.
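The relative errors in Tables C.3.1 and C.3.2 appear to be computed as (prediction − observation) / observation; this is our reading, and it reproduces the rows checked below. The ok/ko assessment amounts to an interval membership test, but the batch-to-batch variability ranges and fuzzy uncertainty ranges themselves are not tabulated here, so `assess` is shown only with hypothetical intervals.

```python
from typing import Tuple


def relative_error_percent(observed: float, predicted: float) -> float:
    """Relative error of the model prediction with respect to the observed value, in %."""
    if observed == 0:
        if predicted == 0:
            return 0.0
        raise ZeroDivisionError("relative error undefined for a zero observation")
    return 100.0 * (predicted - observed) / observed


def assess(value: float, interval: Tuple[float, float]) -> str:
    """'ok' if the value lies inside the closed interval, otherwise 'ko'."""
    low, high = interval
    return "ok" if low <= value <= high else "ko"


# Observed medians and model predictions from Table C.3.1
print(round(relative_error_percent(390, 445)))    # equipment 1, theoretical energy -> 14
print(round(relative_error_percent(259, 7)))      # equipment 2, energy losses -> -97
print(round(relative_error_percent(3552, 5499)))  # equipment 4, total energy -> 55
print(assess(445, (400, 500)))  # hypothetical range
```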
C.3.1 Theoretical Energy Consumption
Errors concerning distillation processes under reflux, as is the
case for equipment 10 and 16 in Table C.3.1 and equipment 6 and
17 in Table C.3.2, are a consequence of assuming a standard value
of energy consumption during reflux conditions. This standard
value was derived by averaging measurements of steam
consumption under strong reflux of butanol for 27 batches (Bieler
et al., 2004). This shortcut model does not account for different,
possibly extremely high or low, reflux ratios, nor does it consider
substance-specific enthalpies of vaporization. Therefore, the model
predictions for processes under reflux could be improved by
explicitly including the reflux ratio in the energy calculations
whenever it is available. On the other hand, errors which arise
from the use of default values for the heat capacity, for instance
in the case of equipment 4 in Table C.3.1, can be reduced by
considering the substance-specific heat capacities. After these
corrections, the relative error for the theoretical energy
consumption reduces to 3%. Another source of deviation between
modeled and observed energy consumption is simultaneous
heating and cooling due to a suboptimal temperature control
system, as is the case for equipment 2 in Table C.3.2. Finally, the
inaccuracy of the temperature during distillation is one of the
causes of the high relative error for equipment 14 in Table C.3.2;
in this case the relative error reduces from 86% to 56% after the
temperature correction.
C.3.2 Energy Losses
Energy losses at low process temperatures are not captured well
by the documentation based approach, as is the case for
equipment 4 and 15 in Table C.3.1 and equipment 15 and 16 in
Table C.3.2. In the first two cases, the initial temperature of the
reaction mass is around 15°C, while the final temperature reaches
approximately 30°C, and in the last two cases the average
temperature remains between 40 and 48°C. Since the energy
losses according to the documentation based approach are
proportional to the difference between the ambient temperature
and the average temperature inside the reactor, a small
temperature difference implies very low energy losses. For
equipment 4 and 15 (Table C.3.1), the heat exchange area of the
equipment was identified as the source of error in the model
predictions. After replacing the standard default area values with
specific area information, the relative errors were reduced to 16%
and 4%, respectively. In addition to the area, inaccuracies in the
temperature inside the reactor also led to errors in the energy
loss estimates for equipment 14 and 17 (Table C.3.1). After
correction of both parameters, the relative errors were reduced to
-40% and -3%, respectively. For equipment 5 in Table C.3.2, the
high relative error is caused mainly by temperature inaccuracies
in the process documentation, and after correction the relative
error was reduced to 36%. Therefore, for most problematic cases
the failures of the model predictions could be explained by
inaccuracies in the model input values, and the respective
relative errors were reduced after appropriate correction. However,
two cases remain unclear. For equipment 12 in Table C.3.2, the
trend of the steam consumption sensor data gives reason to
believe that there might be a problem with the observed value
(measurement). Finally, no specific input inaccuracy could be
detected for equipment 6 in Table C.3.2. In this case, a model
uncertainty according to the support instead of the core of the
fuzzy interval would be more appropriate.
Table C.3.1. Detailed assessment of the documentation based approach model predictions of the theoretical energy, energy
losses, and total energy consumption for all individual points in the case of dataset-1 (*ok: within batch-to-batch variability
range (success), ko: outside batch-to-batch variability range (no success), **ok: within fuzzy uncertainty range (success),
ko: outside uncertainty range (no success)). The units are given in kg of steam/ batch.
Equipment  Theoretical energy: observed median, model prediction, assessment*, relative error %  Energy losses: observed median, model prediction, assessment*, relative error %  Total energy: observed median, model prediction, assessment*, assessment**, relative error %
1 390 445 ok 14 437 473 ok 8 815 918 ok ok 13
2 257 435 ok 69 259 7 ko -97 529 443 ok ok -16
3 832 842 ok 1 296 364 ok 23 1129 1205 ok ok 7
4 2239 3321 ko 48 1310 2178 ko 66 3552 5499 ko ok 55
5 603 720 ok 19 408 475 ok 16 1016 1195 ok ok 18
6 183 178 ok -2 206 126 ok -39 384 305 ok ok -21
7 109 146 ok 34 210 267 ok 27 314 412 ok ok 32
8 7094 6485 ok -9 3520 3860 ok 10 10652 10345 ok ok -3
9 74 13 ok -82 144 34 ok -76 211.5 47 ok ko -78
10 7703 4088 ko -47 4556.5 3679 ok -19 12468 7767 ko ko -38
11 7131 6550 ok -8 2648 2094 ok -21 9779 8644 ok ok -12
12 1687 1941 ok 15 1362 1960 ok 44 3006 3901 ok ok 30
13 743 627 ok -15 283 300 ok 6 1023 927 ok ok -9
14 890 1017 ok 14 607 109 ko -82 1497 1126 ok ok -25
15 3722 2798 ok -25 1495 541 ko -64 5233 3339 ko ko -36
16 9543 7662 ko -20 3685 4296 ok 17 13209 11958 ok ok -9
17 1486 1376 ok -7 833 399 ko -52 2312 1775 ok ok -23
18 5068 4103 ok -19 3262 441 ko -86 8351 4544 ko ko -46
Table C.3.2. Detailed assessment of the documentation based approach model predictions of the theoretical energy, energy
losses, and total energy consumption for all individual points in the case of dataset-2 (*ok: within batch-to-batch variability
range (success), ko: outside batch-to-batch variability range (no success), **ok: within fuzzy uncertainty range (success),
ko: outside uncertainty range (no success)). The units are given in kg of steam/ batch.
Equipment  Theoretical energy: observed median, model prediction, assessment*, assessment**, relative error %  Energy losses: observed median, model prediction, assessment*, assessment**, relative error %  Total energy: observed median, model prediction, assessment*, assessment**, relative error %
1 2544 2775 ok ok 9 781 959 ok ok 23 3325 3734 ok ok 12
2 373 106 ko ko -72 28 51 ok ok 85 401 157 ko ko -61
3 408 609 ok ko 49 457 692 ok ok 52 864 1301 ko ok 50
4 117 59 ko ko -50 74 58 ok ok -21 191 117 ko ko -39
5 3554 3547 ok ok 0 1646 780 ko ko -53 5200 4327 ko ok -17
6 3229 2071 ko ko -36 1408 488 ko ko -65 4637 2559 ko ko -45
7 216 188 ok ok -13 -7 13 ok ok -282 209 201 ok ok -4
8 77 95 ok ok 24 133 10 ko ko -93 210 105 ko ko -50
9 0 0 ok ok 0 0 0 ok ok 0 0 0 ok ok 0
10 1618 1331 ok ok -18 769 389 ok ok -49 2387 1721 ok ok -28
11 291 381 ko ok 31 24 143 ko ko 496 295 524 ko ok 78
12 2427 2580 ok ok 6 3262 1142 ko ko -65 5689 3722 ko ko -35
13 458 514 ok ok 12 191 225 ok ok 18 649 739 ko ok 14
14 225 418 ko ko 86 106 95 ok ok -11 330 513 ko ok 55
15 1619 1183 ok ko -27 375 108 ko ko -71 1994 1291 ok ko -35
16 118 90 ok ok -24 307 97 ko ko -69 425 187 ko ko -56
17 2904 4641 ko ko 60 883 1027 ok ok 16 3787 5668 ko ok 50
18 2068 2965 ko ok 43 1583 902 ok ok -43 3652 3867 ok ok 6
19 1106 1161 ok ok 5 400 730 ok ok 82 1506 1891 ko ok 26
20 314 276 ok ok -12 39 1 ko ko -97 353 277 ko ok -22
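The relative errors in Table C.3.2 and in the preceding table are consistent with the definition 100 × (model prediction − observed median) / observed median; for equation 1 of Table C.3.2, for instance, 100 × (2775 − 2544) / 2544 ≈ 9%. The minimal sketch below recomputes this column from the tabulated values. It assumes that a zero observation with a zero prediction is reported as 0%, as for equation 9; the function name is illustrative.

```python
import math


def relative_error(observed, predicted):
    """Relative error of a model prediction in percent, signed so that
    overestimation is positive: 100 * (predicted - observed) / observed."""
    if observed == 0:
        # Assumption: 0 observed / 0 predicted is reported as 0 % (equation 9);
        # any other prediction against a zero observation is undefined.
        return 0.0 if predicted == 0 else math.nan
    return 100.0 * (predicted - observed) / observed


# (observed median, model prediction) pairs taken from the tables above.
pairs = [(390, 445), (2544, 2775), (373, 106), (191, 117)]
for obs, pred in pairs:
    print(obs, pred, round(relative_error(obs, pred)))
```

Rounded to integers, these pairs give 14, 9, −72 and −39%, the values listed in the tables.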
D Supporting Information to Chapter 5
D.1 Prior Selection
One of the settings of the CART algorithm (L. Breiman et al., 1984) is the choice of the prior probabilities (priors). The priors are the probabilities of each class before any empirical evidence is taken into account. By default, MATLAB¹ calculates the priors from the class frequencies in the training data. Besides growing trees with these default (frequency) priors, we set all priors equal (uniform priors) to evaluate the influence of the prior selection on model performance.
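In CART, class priors enter the tree-growing criterion as class weights: each training case of class c counts with weight π_c / N_c, where π_c is the prior and N_c the number of training cases in class c. Frequency priors (π_c = N_c / N) therefore weight every case equally, whereas uniform priors (π_c = 1/K for K classes) give every class the same total weight and so up-weight rare classes. The sketch below is a hypothetical helper, not MATLAB code, that computes these per-case weights.

```python
from collections import Counter


def cart_case_weights(labels, priors="frequency"):
    """Per-case weights pi_c / N_c implied by CART class priors.

    priors: "frequency" (pi_c = N_c / N), "uniform" (pi_c = 1 / K),
    or an explicit mapping {class: prior}.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    if priors == "frequency":
        pi = {c: counts[c] / n for c in counts}
    elif priors == "uniform":
        pi = {c: 1.0 / k for c in counts}
    else:
        pi = dict(priors)
    # Each case carries its class prior spread evenly over the class members.
    return [pi[c] / counts[c] for c in labels]


# Hypothetical class labels for ten training cases.
labels = ["low"] * 6 + ["middle"] * 2 + ["high"] * 2
freq = cart_case_weights(labels, "frequency")  # every case weighted 1/10
unif = cart_case_weights(labels, "uniform")    # each class sums to 1/3
```

In scikit-learn, passing `class_weight="balanced"` to `DecisionTreeClassifier` produces class weights proportional to these uniform-prior weights, which is one way to reproduce the comparison outside MATLAB.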
[Figure D.1.1: sensitivity versus 1 − specificity for design stages S1 to S5; frequency priors vs. uniform priors]
Figure D.1.1. Average model performance on the cross-validation test set for five stages of process design (S1 to S5) for dataset-1 (8 candidate predictor variables) and 3 output classes, considering frequency-based and uniform priors. The diagonal line denotes random classifier performance. Models that fall to the right of (below) the random line perform worse than a random classifier, and models that fall to the left of (above) it perform better. The point in the top left corner represents perfect classification.
¹ Statistics Toolbox™.
D.2 Performance of Classification Trees per Output Class
Table D.2.1. Test cross validation performance of the classification models at
five stages of process design (S1 to S5) built from dataset-1, depicted per
output class.
Stage Class Sensitivity Specificity Accuracy D1* D2*
S1 low 0.90 0.59 0.75 0.42 0.49
S1 middle 0.00 1.00 0.78 1.00 0.00
S1 high 0.66 0.78 0.75 0.40 0.44
S2 low 0.89 0.79 0.84 0.24 0.68
S2 middle 0.31 0.94 0.80 0.69 0.25
S2 high 0.72 0.81 0.79 0.34 0.53
S3 low 0.85 0.80 0.82 0.25 0.65
S3 middle 0.40 0.90 0.79 0.61 0.30
S3 high 0.67 0.84 0.79 0.37 0.51
S4 low 0.90 0.75 0.83 0.27 0.65
S4 middle 0.40 0.91 0.80 0.61 0.31
S4 high 0.70 0.90 0.85 0.32 0.60
S5 low 0.80 0.81 0.80 0.28 0.61
S5 middle 0.65 0.79 0.76 0.41 0.44
S5 high 0.69 0.98 0.91 0.31 0.67
* D1 is the Euclidean distance to the point (0,1), i.e. to perfect classification; D2 is the Youden index (sensitivity + specificity − 1), i.e. the vertical distance to the random line.
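The D1 and D2 columns of Table D.2.1 can be recomputed from sensitivity and specificity alone, which places each classifier at (1 − specificity, sensitivity) in ROC space. For the S1 "low" row, the point (0.41, 0.90) lies 0.42 from (0, 1), and 0.90 + 0.59 − 1 = 0.49, matching the tabulated D1 and D2. The sketch below (a hypothetical helper) reproduces both columns: D1 as the distance to (0, 1) and D2 as the Youden index (Youden, 1950), the vertical distance above the random line; the perpendicular distance would be D2/√2.

```python
import math


def roc_distances(sensitivity, specificity):
    """Return (D1, D2) for a classifier at (1 - specificity, sensitivity).

    D1: Euclidean distance to the perfect-classification point (0, 1).
    D2: Youden index J = sensitivity + specificity - 1, the vertical
        distance above the random-classifier line (perpendicular: J / sqrt(2)).
    """
    fpr = 1.0 - specificity
    d1 = math.hypot(fpr, 1.0 - sensitivity)
    d2 = sensitivity + specificity - 1.0
    return d1, d2


# S1 rows of Table D.2.1: (class, sensitivity, specificity).
for cls, sens, spec in [("low", 0.90, 0.59), ("middle", 0.00, 1.00), ("high", 0.66, 0.78)]:
    d1, d2 = roc_distances(sens, spec)
    print(f"S1 {cls}: D1 = {d1:.2f}, D2 = {d2:.2f}")
```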
D.3 Average Model Performance of the S1 Tree
[Figure D.3.1: steam training dataset (kg/kg product) on the x axis against steam model classes (kg/kg product) on the y axis; classes low (0 to 1), middle (1 to 3) and high (3 to 16)]
Figure D.3.1. Average model performance of the S1 tree (resubstitution validation), considering dataset-1 (maximum of 8 predictor variables) and 3 output classes. The training data are shown on the x axis and the predicted classes on the y axis. Data points inside the bold boxes on the diagonal belong to a class and were predicted as that class. Points inside the off-diagonal boxes in the bottom right area represent underestimated values, and points inside the off-diagonal boxes in the top left area represent overestimated values.
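The reading rule of Figure D.3.1 can be expressed as a small tally: a case is correct when its predicted class equals the class of its training value, underestimated when the predicted class is lower, and overestimated when it is higher. The class boundaries of 1 and 3 kg steam/kg product used below are an assumption read off the figure axes, and the helper names are hypothetical.

```python
CLASSES = ("low", "middle", "high")
# Assumed class boundaries in kg steam / kg product (read off the figure axes).
BOUNDARIES = (1.0, 3.0)


def class_index(steam):
    """Index of the class a steam value falls into (0 = low, 1 = middle, 2 = high)."""
    return sum(steam >= b for b in BOUNDARIES)


def tally(observed_steam, predicted_classes):
    """Count correct, underestimated and overestimated class predictions."""
    counts = {"correct": 0, "underestimated": 0, "overestimated": 0}
    for steam, predicted in zip(observed_steam, predicted_classes):
        true_idx = class_index(steam)
        pred_idx = CLASSES.index(predicted)
        if pred_idx == true_idx:
            counts["correct"] += 1          # diagonal box
        elif pred_idx < true_idx:
            counts["underestimated"] += 1   # off-diagonal, bottom right
        else:
            counts["overestimated"] += 1    # off-diagonal, top left
    return counts
```

For the hypothetical inputs `tally([0.5, 2.0, 10.0, 0.8], ["low", "low", "high", "middle"])`, two cases are correct, one is underestimated and one is overestimated.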
D.4 Classification Trees
Table D.4.1. Classification tree for the S1 design stage.
Node Parent node Rule
1 0 IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerization OR reduction THEN go to internal node 2 ELSEIF reaction type is acylation (cyanur chloride) OR azo-coupling OR diazotization OR elimination OR halogenation OR sulfonation THEN go to terminal node 3
2 1 IF the reaction mechanism is AN OR SEAr OR AEN THEN go to terminal node 4 ELSEIF the reaction mechanism is HC OR RAD OR SN1 OR SN2 OR SNAr THEN go to terminal node 5
3 1 low steam consumption
4 2 low steam consumption
5 2 high steam consumption
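Because the tree in Table D.4.1 is a short chain of IF/ELSEIF rules, it can be transcribed directly into code. The sketch below is a hand transcription of the table, not the fitted MATLAB object; the function and set names are hypothetical, and inputs outside the categories listed in the table raise an error rather than being guessed.

```python
# Node 1: reaction types routed to internal node 2.
NODE2_TYPES = {"acylation", "alkylation", "complexation", "condensation",
               "hydrolysis", "polymerization", "reduction"}
# Node 1: reaction types routed to terminal node 3 (low steam consumption).
NODE3_TYPES = {"acylation (cyanur chloride)", "azo-coupling", "diazotization",
               "elimination", "halogenation", "sulfonation"}
# Node 2: reaction mechanisms routed to terminal nodes 4 (low) and 5 (high).
LOW_MECHANISMS = {"AN", "SEAr", "AEN"}
HIGH_MECHANISMS = {"HC", "RAD", "SN1", "SN2", "SNAr"}


def classify_s1(reaction_type, mechanism):
    """Steam consumption class from the S1 classification tree (Table D.4.1)."""
    rt = reaction_type.lower()
    if rt in NODE3_TYPES:
        return "low"                      # terminal node 3
    if rt not in NODE2_TYPES:
        raise ValueError(f"reaction type not covered by the S1 tree: {reaction_type}")
    if mechanism in LOW_MECHANISMS:
        return "low"                      # terminal node 4
    if mechanism in HIGH_MECHANISMS:
        return "high"                     # terminal node 5
    raise ValueError(f"mechanism not covered by the S1 tree: {mechanism}")
```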
Table D.4.2. Classification tree for the S2 design stage.
Node Parent node Rule
1 0 IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerization OR reduction THEN go to internal node 2 ELSEIF reaction type is acylation (cyanur chloride) OR azo-coupling OR diazotization OR elimination OR halogenation OR sulfonation THEN go to terminal node 3
2 1 IF distillation does not take place THEN go to internal node 4 ELSEIF distillation takes place THEN go to terminal node 5
3 1 low steam consumption
4 2 IF the reaction mechanism is AN OR SEAr OR AEN OR SN2 THEN go to terminal node 6 ELSEIF the reaction mechanism is HC OR RAD OR SNAr THEN go to terminal node 7
5 2 high steam consumption
6 4 low steam consumption
7 4 middle steam consumption
Table D.4.3. Classification tree for the S3 design stage.
Node Parent node Rule
1 0 IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerization OR reduction THEN go to internal node 2 ELSEIF reaction type is acylation (cyanur chloride) OR azo-coupling OR diazotization OR elimination OR halogenation OR sulfonation THEN go to terminal node 3
2 1 IF time is lower than 18 hours THEN go to internal node 4 ELSEIF time is higher than 18 hours THEN go to terminal node 5
3 1 low steam consumption
4 2 IF the reaction mechanism is AN OR SEAr OR AEN THEN go to internal node 6 ELSEIF the reaction mechanism is HC OR RAD OR SN2 OR SNAr THEN go to internal node 7
5 2 high steam consumption
6 4 IF Tmax is lower than 93°C THEN go to terminal node 8 ELSEIF Tmax is higher than 93°C THEN go to terminal node 9
7 4 IF distillation does not take place THEN go to terminal node 10 ELSEIF distillation takes place THEN go to terminal node 11
8 6 low steam consumption
9 6 middle steam consumption
10 7 middle steam consumption
11 7 high steam consumption
Table D.4.4. Classification tree for the S4 design stage.
Node Parent node Rule
1 0 IF the reaction type is acylation OR alkylation OR complexation OR condensation OR hydrolysis OR polymerization OR reduction THEN go to internal node 2 ELSEIF reaction type is acylation (cyanur chloride) OR azo-coupling OR diazotization OR elimination OR halogenation OR sulfonation THEN go to terminal node 3
2 1 IF time is lower than 18 hours THEN go to internal node 4 ELSEIF time is higher than 18 hours THEN go to terminal node 5
3 1 low steam consumption
4 2 IF the reaction mechanism is AN OR SEAr OR AEN THEN go to internal node 6 ELSEIF the reaction mechanism is HC OR RAD OR SN2 OR SNAr THEN go to internal node 7
5 2 high steam consumption
6 4 IF Tmax is lower than 93°C THEN go to terminal node 8 ELSEIF Tmax is higher than 93°C THEN go to internal node 9
7 4 IF distillation does not take place THEN go to terminal node 10 ELSEIF distillation takes place THEN go to terminal node 11
8 6 low steam consumption
9 6 IF PMI is lower than 4 THEN go to terminal node 12 ELSEIF PMI is higher than 4 THEN go to terminal node 13
10 7 middle steam consumption
11 7 high steam consumption
12 9 low steam consumption
13 9 middle steam consumption
Table D.4.5. Classification tree for the S5 design stage.
Node Parent node Rule
1 0 IF Steamdist is lower than 1.5 kg THEN go to internal node 2 ELSEIF Steamdist is higher than 1.5 kg THEN go to terminal node 3
2 1 IF Tmax is lower than 79°C THEN go to terminal node 4 ELSEIF Tmax is higher than 79°C THEN go to internal node 5
3 1 high steam consumption
4 2 low steam consumption
5 2 IF Steamdist is lower than 0.5 kg THEN go to internal node 6 ELSEIF Steamdist is higher than 0.5 kg THEN go to terminal node 7
6 5 IF the reaction type is acylation OR alkylation OR condensation OR halogenation OR hydrolysis OR sulfonation THEN go to terminal node 8 ELSEIF the reaction type is complexation OR azo-coupling OR polymerization OR reduction THEN go to terminal node 9
7 5 middle steam consumption
8 6 low steam consumption
9 6 middle steam consumption
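The S5 tree mixes numerical and categorical splits. A minimal transcription of Table D.4.5 is sketched below; it assumes that values exactly at a threshold (1.5 kg, 79°C, 0.5 kg) follow the "higher than" branch, which the table does not specify, and the function name is hypothetical.

```python
# Node 6: reaction types routed to terminal node 8 (low) and node 9 (middle).
NODE8_TYPES = {"acylation", "alkylation", "condensation",
               "halogenation", "hydrolysis", "sulfonation"}
NODE9_TYPES = {"complexation", "azo-coupling", "polymerization", "reduction"}


def classify_s5(steam_dist_kg, t_max_c, reaction_type):
    """Steam consumption class from the S5 classification tree (Table D.4.5).

    Assumption: a value equal to a threshold follows the "higher than" branch.
    """
    if steam_dist_kg >= 1.5:      # node 1 -> terminal node 3
        return "high"
    if t_max_c < 79:              # node 2 -> terminal node 4
        return "low"
    if steam_dist_kg >= 0.5:      # node 5 -> terminal node 7
        return "middle"
    rt = reaction_type.lower()    # node 6: split on reaction type
    if rt in NODE8_TYPES:
        return "low"              # terminal node 8
    if rt in NODE9_TYPES:
        return "middle"           # terminal node 9
    raise ValueError(f"reaction type not covered at node 6: {reaction_type}")
```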
E Supporting Information to Chapter 6
E.1 Goodness of Fit of PDF Models
[Figure E.1.1: one histogram panel per reaction type, labeled with the fitted distribution: Acylation (gamma), Acylation (cyanur chloride) (exponential), Alkylation (exponential), Azo-coupling (exponential), Complexation (gamma), Condensation (exponential), Diazotization (exponential), Elimination (lognormal), Halogenation (lognormal), Hydrolysis (exponential), Polymerization (lognormal), Reduction (exponential), Sulfonation (uniform)]
Figure E.1.1. Histograms of the steam training data and the corresponding superimposed fitted probability distributions.
[Figure E.1.2: one panel per reaction type comparing the empirical CDF (Data) with the fitted CDF (Model): Acylation (gamma), Acylation (cyanur chloride) (exponential), Alkylation (exponential), Azo-coupling (exponential), Complexation (gamma), Condensation (exponential), Diazotization (exponential), Elimination (lognormal), Halogenation (lognormal), Hydrolysis (exponential), Polymerization (lognormal), Reduction (exponential), Sulfonation (uniform)]
Figure E.1.2. Cumulative distribution functions of the steam training data and the corresponding statistical populations.
[Figure E.1.3: one Q-Q panel per reaction type: Acylation (gamma), Acylation (cyanur chloride) (exponential), Alkylation (exponential), Azo-coupling (exponential), Complexation (gamma), Condensation (exponential), Diazotization (exponential), Elimination (lognormal), Halogenation (lognormal), Hydrolysis (exponential), Polymerization (lognormal), Reduction (exponential), Sulfonation (uniform)]
Figure E.1.3. Q-Q plots of the steam training data (y-axis) against the corresponding statistical populations (x-axis).
Table E.1.1. Goodness-of-fit assessment of the PDF models.
Reaction type (parameterization) | fitted PDF | n | χ2 | p(χ2) | D | p(D) | AD | p(AD) | weight | delta
Acylation gamma 33 4.44 0.49 0.11 0.74 na na 0.81 0.00
  Time<18h gamma 21 8.75 0.07 0.20 0.32 na na 0.85 0.00
  Time>18h lognormal 12 2.00 0.37 0.22 0.53 0.44 0.24 0.24 0.00
Acylation (cyanur chloride) lognormal 22 7.54 0.11 - - - - 0.43 0.00
Alkylation gamma 33 6.58 0.25 0.15 0.38 na na 0.78 0.00
  no distillation gamma 12 - - 0.28 0.22 na na 0.76 0.00
  distillation weibull 21 2.5 0.65 - - 0.17 0.81 0.33 0.00
Azo-coupling gamma 25 6.1 0.19 - - na na 0.99 0.00
Complexation exponential 9 0.9 0.82 0.14 0.99 0.19 0.98 0.66 0.00
  time<18h rayleigh 6 0.42 0.81 0.22 0.88 na na 0.38 0.00
Condensation exponential 25 4.45 0.49 0.14 0.64 0.52 0.42 0.34 0.00
  Tmax<93°C exponential 8 3.59 0.31 0.26 0.58 0.62 0.30 0.49 0.00
  Tmax>93°C lognormal 17 1.36 0.72 0.15 0.77 0.30 0.54 0.33 0.00
Diazotization exponential 10 1.32 0.72 0.32 0.22 1.11 0.08 0.35 0.05
Elimination gamma 9 - - - - na na 0.93 0.00
Halogenation lognormal 14 3.6 0.31 0.10 0.99 0.19 0.87 0.53 0.00
Hydrolysis gamma 9 1.04 0.59 0.40 0.09 na na 0.88 0.00
Polymerization lognormal 18 3.36 0.34 0.15 0.78 0.56 0.13 0.40 0.00
  time<18h lognormal 8 1.6 0.45 0.24 0.67 0.41 0.26 0.20 1.56
  time>18h uniform 10 na na 0.23 0.60 na na 0.70 0.00
Reduction exponential 14 7.52 0.11 0.20 0.57 0.52 0.41 0.60 0.00
Sulfonation gamma 9 - - 0.34 0.21 na na 0.89 0.00
Symbols
n: number of data points (degrees of freedom)
na: not applicable
-: not reported (p-value below 0.05)
χ2: Chi-square statistic
p(χ2): p-value associated with the Chi-square statistic
D: Kolmogorov-Smirnov statistic
p(D): p-value associated with the Kolmogorov-Smirnov statistic
AD: Anderson-Darling statistic
p(AD): p-value associated with the Anderson-Darling statistic
weight: weight of evidence in favor of the model being the actual best model (Akaike weight)
delta: difference between the Akaike Information Criterion of the model and that of the best model
The test statistics presented in this table are obtained from the Chi-square, Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests (Vose, 2008). For each test, the p-value is the probability that a dataset randomly drawn from the fitted distribution would produce a test statistic at least as large as the one observed. The null hypothesis that the data originate from the hypothesized distribution is rejected if the p-value is lower than the pre-specified significance level α, and the best fit among different candidate distributions is reflected by the lowest value of a given test statistic. All test statistics reported in this table have a p-value greater than a significance level of 5% (p-value > 0.05); test statistics with a p-value below 5% are not reported. A higher p-value means that a statistic at least as large as the observed one is more likely for a sample of the same size drawn from the fitted distribution, i.e. the data provide less evidence against the fit. Besides these conventional test statistics, the Akaike Information Criterion (AIC) has been used to select the optimal distribution. The AIC is a measure of the relative quality of a statistical model for a dataset. Compared to conventional test statistics, the AIC is superior for selecting the optimal distribution, since it accounts for model complexity by penalizing distributions with more parameters, and it is better suited for relative model comparison by means of the Akaike weights and delta. The best model has a delta equal to zero and the largest weight among all compared models.
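For a one-parameter distribution, the quantities discussed above are easy to compute directly. The sketch below fits an exponential distribution by maximum likelihood (rate = 1 / sample mean), computes the Kolmogorov-Smirnov statistic D and the AIC = 2k − 2 ln L, and converts the AIC values of several candidate models into deltas and Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2). It is an illustration only, with hypothetical function names: the p-value of D is not computed, and because the rate is estimated from the same data, the standard Kolmogorov-Smirnov p-value would be optimistic (a Lilliefors-type correction would be needed).

```python
import math


def fit_exponential_rate(data):
    """Maximum-likelihood rate of an exponential distribution (1 / sample mean)."""
    return len(data) / sum(data)


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_aic(data, rate):
    """AIC = 2k - 2 ln L for the exponential model (k = 1 parameter)."""
    log_lik = len(data) * math.log(rate) - rate * sum(data)
    return 2 * 1 - 2 * log_lik


def akaike_weights(aics):
    """Deltas (AIC_i - AIC_min) and Akaike weights for competing models."""
    best = min(aics)
    deltas = [a - best for a in aics]
    rel = [math.exp(-d / 2) for d in deltas]
    total = sum(rel)
    return deltas, [r / total for r in rel]
```

For the toy sample [1.0, 2.0, 3.0], for instance, the fitted rate is 0.5, D ≈ 0.39 and AIC ≈ 12.16; two models with AIC values of 100 and 102 receive deltas of 0 and 2 and weights of about 0.73 and 0.27.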
F Supporting Information to Chapter 7
F.1 Modeling results – Case Study II
Table F.1.1. Steam consumption estimations (kg/kg product) for the reactions presented in Figure 7.2.1, considering the median values of the PDF models.
Route Reaction 1 Reaction 2 Reaction 3 Reaction 4 Reaction 5 Reaction 6 Reaction 7 Reaction 8 Total
A 1.2 1.2 1.7 3.1 0 1.2 8.4
B 1.2 3.1 1.2 1.7 3.1 0 1.2 10.3
C 0.4 1.7 3.1 1.2 1.2 3.1 10.7
D 0.4 1.2 1.2 3.1 0 1.2 5.9
E 1.2 1.7 3.1 6.0
F 1.2 1.2 1.2 0.4 1.7 3.1 0 1.2 10.0
G 1.2 0.4 1.2 3.1 5.9
Table F.1.2. Steam consumption estimations (kg/kg product) for the reactions presented in Figure 7.2.1, considering the median values of the PDF models and the corresponding reaction yields (Table B.1.1.2) and stoichiometric ratios.
Route Reaction 1 Reaction 2 Reaction 3 Reaction 4 Reaction 5 Reaction 6 Reaction 7 Reaction 8 Total
A 0.8 1.2 2.2 4.6 0.0 1.4 10.2
B 0.9 3.0 1.1 2.2 4.6 0.0 1.4 13.2
C 0.3 1.7 5.1 1.7 1.4 4.2 14.3
D 0.2 1.4 1.8 4.5 0.0 1.4 9.3
E 0.8 2.2 4.2 7.3
F 0.9 1.1 1.5 0.6 3.0 4.6 0.0 1.4 13.0
G 0.8 0.4 1.9 4.2 7.3
Table F.1.3. Environmental and economic proxy indicators for multi-objective
screening of chemical batch process alternatives during early design stages*.
Route ELI PLI ELI+PLI (weighted sum)
A 1.70 0.56 0.06
B 2.11 0.59 0.06
C 1.49 0.69 0.06
D 2.23 0.74 0.07
E 0.61 0.68 0.06
F 2.07 0.68 0.07
G 1.04 0.82 0.07
* (Albrecht et al., 2010)
F.2 One-way ANOVA – Case Study II
Table F.2.1. Descriptive statistics for the dependent variable steam
consumption and the grouping variable synthesis route. The values are given in
kg of steam per kg of product.
Route | Mean | Standard deviation | Number of points
A 14.72 8.83 1000
B 19.59 13.04 1000
C 23.47 16.20 1000
D 12.83 8.15 1000
E 12.40 8.51 1000
F 20.86 12.62 1000
G 13.89 11.43 1000
Total 16.82 12.28 7000
Table F.2.2. Test of homogeneity of variances
Levene statistic | degrees of freedom 1 | degrees of freedom 2 | Sig.*
72.38 6 6993 0.00
* Since the significance value is below 0.05, the variances of steam consumption differ significantly between the synthesis routes. Therefore, the assumption of homogeneity of variances has been violated.
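Levene's test, as reported in Table F.2.2, is a one-way ANOVA on the absolute deviations of each observation from its group mean; with 7 routes of 1000 points each, its degrees of freedom are (6, 6993). The sketch below computes the statistic from raw groups. It uses the mean-centred variant (the Brown-Forsythe variant of Levene's test centres on the median instead), and the toy data are hypothetical, since the 7000 simulated values are not reproduced here.

```python
def levene_statistic(groups):
    """Mean-centred Levene W statistic and its degrees of freedom (k - 1, N - k)."""
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    group_means = [sum(g) / len(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    # One-way ANOVA on the absolute deviations.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(z, group_means) for x in g)
    w = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return w, k - 1, n_total - k
```

For the hypothetical groups [1, 2, 3] and [2, 4, 6], the absolute deviations are [1, 0, 1] and [2, 0, 2], and W = 0.8 with (1, 4) degrees of freedom.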
Table F.2.3. One-way ANOVA test results.
Source of variation | Sum of squares | degrees of freedom | Mean square | F | Sig.*
Between Groups 116578.52 6 19429.75 144.77 0.00
Within Groups 938533.03 6993 134.21
Total 1055111.55 6999
* Since the assumption of homogeneity of variances has been violated, we also consider the Brown-Forsythe and Welch alternative F-ratios, which have been derived to be robust in such cases (Field, 2009).
Table F.2.4. Robust test of equality of means.
Test | Statistic* | degrees of freedom 1 | degrees of freedom 2** | Sig.***
Welch 130.40 6 3091.15 0.00
Brown-Forsythe 144.77 6 5649.75 0.00
* Asymptotically F distributed.
** Adjusted residual degrees of freedom.
*** Both test statistics are highly significant, so there is a significant difference among the synthesis routes. We therefore proceed with multiple comparisons of all pairs of groups.
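Because all groups in Table F.2.1 have 1000 points, the F statistics of Tables F.2.3 and F.2.4 can be approximately reproduced from the tabulated means and standard deviations alone. The sketch below implements the classical one-way ANOVA F, Welch's F and the Brown-Forsythe F* from summary statistics using the standard textbook formulas; the function names are illustrative. With the rounded summary values it gives F ≈ 144.9, Welch ≈ 130.5 with about 3091 residual degrees of freedom, and Brown-Forsythe ≈ 144.9 with about 5650, close to the tabulated values; the small differences stem from rounding of the means and standard deviations.

```python
def _n_total_and_ss_between(means, ns):
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    return n_total, ss_between


def anova_f(means, sds, ns):
    """Classical one-way ANOVA F with df (k - 1, N - k)."""
    k = len(means)
    n_total, ss_between = _n_total_and_ss_between(means, ns)
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, k - 1, n_total - k


def welch_f(means, sds, ns):
    """Welch's F for unequal variances, with adjusted residual df."""
    k = len(means)
    w = [n / s ** 2 for n, s in zip(ns, sds)]          # precision weights
    w_sum = sum(w)
    mean_w = sum(wi * m for wi, m in zip(w, means)) / w_sum
    a = sum(wi * (m - mean_w) ** 2 for wi, m in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, ns))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return a / b, k - 1, (k ** 2 - 1) / (3 * lam)


def brown_forsythe_f(means, sds, ns):
    """Brown-Forsythe F* for equality of means, with Satterthwaite df."""
    k = len(means)
    n_total, ss_between = _n_total_and_ss_between(means, ns)
    terms = [(1 - n / n_total) * s ** 2 for n, s in zip(ns, sds)]
    denom = sum(terms)
    df2 = 1 / sum((t / denom) ** 2 / (n - 1) for t, n in zip(terms, ns))
    return ss_between / denom, k - 1, df2


# Table F.2.1, routes A to G.
means = [14.72, 19.59, 23.47, 12.83, 12.40, 20.86, 13.89]
sds = [8.83, 13.04, 16.20, 8.15, 8.51, 12.62, 11.43]
ns = [1000] * 7
```

With equal group sizes the Brown-Forsythe F* coincides algebraically with the classical F, which is why Tables F.2.3 and F.2.4 both report 144.77.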
Table F.2.5. Multiple comparisons using the Games-Howell procedure (Jaccard
et al., 1984)*.
Route (I) | Route (J) | Mean difference (I-J) | Standard error | Sig.** | CI 95% lower bound | CI 95% upper bound
A B -4.86 0.50 0.00 -6.33 -3.39
A C -8.74 0.58 0.00 -10.46 -7.02
A D 1.90 0.38 0.00 0.77 3.02
A E 2.32 0.39 0.00 1.17 3.46
A F -6.14 0.49 0.00 -7.58 -4.70
A G 0.83 0.46 0.53 -0.52 2.18
B A 4.86 0.50 0.00 3.39 6.33
B C -3.88 0.66 0.00 -5.82 -1.94
B D 6.76 0.49 0.00 5.32 8.20
B E 7.18 0.49 0.00 5.73 8.64
B F -1.27 0.57 0.28 -2.97 0.42
B G 5.70 0.55 0.00 4.08 7.32
C A 8.74 0.58 0.00 7.02 10.46
C B 3.88 0.66 0.00 1.94 5.82
C D 10.63 0.57 0.00 8.94 12.33
C E 11.06 0.58 0.00 9.35 12.77
C F 2.60 0.65 0.00 0.69 4.52
C G 9.58 0.63 0.00 7.72 11.43
D A -1.90 0.38 0.00 -3.02 -0.77
D B -6.76 0.49 0.00 -8.20 -5.32
D C -10.63 0.57 0.00 -12.33 -8.94
D E 0.42 0.37 0.92 -0.68 1.52
D F -8.03 0.48 0.00 -9.44 -6.63
D G -1.06 0.44 0.20 -2.37 0.25
E A -2.32 0.39 0.00 -3.46 -1.17
E B -7.18 0.49 0.00 -8.64 -5.73
E C -11.06 0.58 0.00 -12.77 -9.35
E D -0.42 0.37 0.92 -1.52 0.68
E F -8.46 0.48 0.00 -9.88 -7.04
E G -1.49 0.45 0.02 -2.82 -0.16
F A 6.14 0.49 0.00 4.70 7.58
F B 1.27 0.57 0.28 -0.42 2.97
F C -2.60 0.65 0.00 -4.52 -0.69
F D 8.03 0.48 0.00 6.63 9.44
F E 8.46 0.48 0.00 7.04 9.88
F G 6.97 0.54 0.00 5.38 8.56
G A -0.83 0.46 0.53 -2.18 0.52
G B -5.70 0.55 0.00 -7.32 -4.08
G C -9.58 0.63 0.00 -11.43 -7.72
G D 1.06 0.44 0.20 -0.25 2.37
G E 1.49 0.45 0.02 0.16 2.82
G F -6.97 0.54 0.00 -8.56 -5.38
* This procedure is recommended when there is doubt that the population variances are equal (see Table F.2.2) (Field, 2009).
** Significance values below 0.05 indicate which route pairs differ significantly from each other.
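Each row of Table F.2.5 compares two routes using Welch's standard error, √(s_I²/n_I + s_J²/n_J), with Welch-Satterthwaite degrees of freedom; the confidence interval is the mean difference ± q/√2 × SE, where q is the upper 5% point of the studentized range distribution for 7 groups. The sketch below computes these quantities from the summary statistics of Table F.2.1. It takes q as an input (about 4.17 for 7 groups and large degrees of freedom, from standard tables) instead of evaluating the studentized range distribution, and therefore omits the significance values; `scipy.stats.studentized_range` (SciPy 1.7 or later) could supply both. The function name is illustrative.

```python
import math


def games_howell_pair(mean_i, sd_i, n_i, mean_j, sd_j, n_j, q_crit):
    """Mean difference, standard error, Welch df and CI for one Games-Howell pair.

    q_crit: upper critical value of the studentized range distribution,
    supplied by the caller (not computed here).
    """
    var_i = sd_i ** 2 / n_i
    var_j = sd_j ** 2 / n_j
    diff = mean_i - mean_j
    se = math.sqrt(var_i + var_j)
    # Welch-Satterthwaite degrees of freedom for this pair.
    df = (var_i + var_j) ** 2 / (var_i ** 2 / (n_i - 1) + var_j ** 2 / (n_j - 1))
    half_width = q_crit / math.sqrt(2) * se
    return diff, se, df, (diff - half_width, diff + half_width)


# Routes A and B from Table F.2.1, with q of about 4.17 (7 groups, large df).
diff, se, df, (lo, hi) = games_howell_pair(14.72, 8.83, 1000, 19.59, 13.04, 1000, 4.17)
print(f"A - B: {diff:.2f} (SE {se:.2f}, df {df:.0f}), 95% CI [{lo:.2f}, {hi:.2f}]")
```

The result, a difference of about −4.87 with SE 0.50 and an interval of roughly −6.34 to −3.40, is close to the tabulated row for A versus B; the differences come from rounding in the summary statistics.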
Bibliography
AKAIKE, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 716-723.
ALBRECHT, T., PAPADOKONSTANTAKIS, S., SUGIYAMA, H. & HUNGERBÜHLER, K. 2010. Demonstrating multi-objective screening of chemical batch process alternatives during early design phases. Chemical Engineering Research and Design, 22.
ANDREJ SZÏJJARTO, STAVROS PAPADOKONSTANTAKIS, ULRICH FISCHER & HUNGERBÜHLER, K. 2008. Bottom-up Modeling of the Steam Consumption in Multipurpose Chemical Batch Plants Focusing on Identification of the Optimization Potential. Industrial & Engineering Chemistry Research, 47, 7323-7334.
AYYUB, B. M. & MCCUEN, R. H. 1997. Probability, Statistics, & Reliability for Engineers.
B. MAURICE, R. FRISCHKNECHT, V. COELHO-SCHWIRTZ & HUNGERBÜHLER, K. 2000. Uncertainty analysis in life cycle inventory. Application to the production of electricity with French coal power plants. Journal of Cleaner Production, 8, 95-108.
BAUER, P. E. & MACIEL, R. 2004. Incorporation of environmental impact criteria in the design and operation of chemical processes. Brazilian Journal of Chemical Engineering, 21, 405-414.
BIELER, P. 2004. Analysis and Modelling of the Energy Consumption of Chemical Batch Plants. Dissertation submitted to the Swiss Federal Institute of Technology Zurich.
BIELER, P. S., FISCHER, U. & HUNGERBUHLER, K. 2003. Modeling the energy consumption of chemical batch plants - Top-down approach. Industrial & Engineering Chemistry Research, 42, 6135-6144.
BIELER, P. S., FISCHER, U. & HUNGERBUHLER, K. 2004. Modeling the energy consumption of chemical batch plants: Bottom-up approach. Industrial & Engineering Chemistry Research, 43, 7785-7795.
BUMANN, A. A., PAPADOKONSTANTAKIS, S., SUGIYAMA, H., FISCHER, U. & HUNGERBUEHLER, K. 2010. Evaluation and analysis of a proxy indicator for the estimation of gate-to-gate energy consumption in the early process design phases: The case of organic solvent production. Energy, 35, 2407-2418.
BURGESS, A. A. & BRENNAN, D. J. 2001. Application of life cycle assessment to chemical processes. Chemical Engineering Science, 56, 2589-2604.
CANO-RUIZ, J. A. & MCRAE, G. J. 1998. Environmentally conscious chemical process design. Annual Review of Energy and the Environment, 23, 499-536.
CANTER, K. G., KENNEDY, D. J., MONTGOMERY, D. C., KEATS, J. B. & CARLYLE, W. M. 2002. Screening stochastic life cycle assessment inventory models. International Journal of Life Cycle Assessment, 7, 18-26.
CAPELLO, C., HELLWEG, S., BADERTSCHER, B., BETSCHART, H. & HUNGERBUHLER, K. 2007. Part 1: The ecosolvent tool - Environmental assessment of waste-solvent treatment options. Journal of Industrial Ecology, 11, 26-38.
CAPELLO, C., HELLWEG, S. & HUNGERBUHLER, K. 2008. Environmental assessment of waste-solvent treatment options - Part II: General rules of thumb and specific recommendations. Journal of Industrial Ecology, 12, 111-127.
CONCEPCIÓN JIMÉNEZ-GONZÁLEZ, ALAN D. CURZONS, DAVID J. C. CONSTABLE & OVERCASH, M. R. 2001. How do you select the "greenest" technology? Development of guidance for the pharmaceutical industry. Clean Products and Processes, 3, 35-41.
CONCEPCIÓN JIMÉNEZ-GONZÁLEZ, CONSTABLE, D. J. C., ALAN D. CURZONS & CUNNINGHAM, V. L. 2002. Developing GSK's green technology guidance: methodology for case-scenario comparison of technologies. Clean Technologies and Environmental Policy, 4, 44-53.
CONCEPCIÓN JIMÉNEZ-GONZALEZ, SEUNGDO KIM & OVERCASH, M. R. 2000. Methodology for Developing Gate-to-Gate Life Cycle Inventory Information. International Journal of Life Cycle Assessment, 5, 153-159.
COOPER, J., GODWIN, C. & HALL, E. S. 2008. Modeling process and material alternatives in life cycle assessments. International Journal of Life Cycle Assessment, 13, 115-123.
COSMI, C., LOPERTE, S., MACCHIATO, M., PIETRAPERTOSA, F., RAGOSTA, M. & SALVIA, M. 2004. Life cycle assessment and multivariate data analysis for an integrated characterisation of the technologies for electric energy production. Air Pollution XII, 14, 67-75.
CURZONS, A. D., JIMENEZ-GONZALEZ, C., DUNCAN, A. L., CONSTABLE, D. J. C. & CUNNINGHAM, V. L. 2007. Fast life cycle assessment of synthetic chemistry (FLASC™) tool. International Journal of Life Cycle Assessment, 12, 272-280.
FERRERO, A. & SALICONE, S. 2003. The random-fuzzy variables: A new approach for the expression of uncertainty in measurement. Proceedings of the 20th IEEE Instrumentation Technology Conference (Cat. No. 03CH37412). Piscataway, NJ, USA.
FIELD, A. 2009. Discovering statistics using SPSS.
FRISCHKNECHT, R., JUNGBLUTH, N., ALTHAUS, H. J., DOKA, G., DONES, R., HECK, T., HELLWEG, S., HISCHIER, R., NEMECEK, T., REBITZER, G. & SPIELMANN, M. 2005. The ecoinvent database: Overview and methodological framework. International Journal of Life Cycle Assessment, 10, 3-9.
G. E. KNIEL, K. DELMARCO & J. G. PETRIE 1996. Life Cycle Assessment Applied to Process Design: Environmental and Economic Analysis and Optimization of a Nitric Acid Plant. Environmental Progress, 15, 221-228.
HAU, J. L., YI, H. S. & BAKSHI, B. R. 2007. Enhancing life-cycle inventories via reconciliation with the laws of thermodynamics. Journal of Industrial Ecology, 11, 5-25.
HELLWEG, S., FISCHER, U., SCHERINGER, M. & HUNGERBUHLER, K. 2004. Environmental assessment of chemicals: methods and application to a case study of organic solvents. Green Chemistry, 6, 418-427.
HONG, J. L., SHAKED, S., ROSENBAUM, R. K. & JOLLIET, O. 2010. Analytical uncertainty propagation in life cycle inventory and impact assessment: application to an automobile front panel. The International Journal of Life Cycle Assessment, 15, 499-510.
HUIJBREGTS, M. A. J., ROMBOUTS, L. J. A., HELLWEG, S., FRISCHKNECHT, R., HENDRIKS, A. J., VAN DE MEENT, D., RAGAS, A. M. J., REIJNDERS, L. & STRUIJS, J. 2006. Is cumulative fossil energy demand a useful indicator for the environmental performance of products? Environmental Science & Technology, 40, 641-648.
I. WITTEN, E. F. 2005. Data Mining.
JACCARD, J., BECKER, M. A. & WOOD, G. 1984. Pairwise multiple comparison procedures: A review. Psychological Bulletin, 96, 589-596.
JEAN-LUC CHEVALIER, J.-F. L. T. 1996. Life Cycle Analysis with Ill-Defined Data and its Application to Building Products. The International Journal of Life Cycle Assessment, 1, 90-96.
JENCK, J. F., AGTERBERG, F. & DROESCHER, M. J. 2004. Products and processes for a sustainable chemical industry: a review of achievements and prospects. Green Chemistry, 6, 544-556.
JIMENEZ-GONZALEZ, C., PONDER, C. S., BROXTERMAN, Q. B. & MANLEY, J. B. 2011. Using the Right Green Yardstick: Why Process Mass Intensity Is Used in the Pharmaceutical Industry To Drive More Sustainable Processes. Organic Process Research & Development, 15, 912-917.
L. BREIMAN, J. H. FRIEDMAN, R. A. OLSHEN & STONE, C. J. 1984. Classification and regression trees, New York, Chapman and Hall.
LE LANN, M. V., CABASSUD, M. & CASAMATTA, G. 1999. Modeling, optimization and control of batch chemical reactors in fine chemical production. Annual Reviews in Control, 23, 25-34.
LINNHOFF, B. 1993. Pinch analysis: A state-of-the-art overview. Chemical Engineering Research and Design, 71, 503-522.
MACLEOD, M., FRASER, A. J. & MACKAY, D. 2002. Evaluating and expressing the propagation of uncertainty in chemical fate and bioaccumulation models. Environmental Toxicology and Chemistry, 21, 700-709.
MAURIS, G., LASSERRE, V. & FOULLOY, L. 2001. A fuzzy approach for the expression of uncertainty in measurement. Measurement, 29, 165-177.
MORGAN, M. G. & HENRION, M. 1990. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge, UK.
MUELLER, K. G. & BESANT, C. B. 1999. Streamlining life cycle analysis: A method. First International Symposium on Environmentally Conscious Design and Inverse Manufacturing, Proceedings, 114-119.
MUELLER, K. G., LAMPERTH, M. U. & KIMURA, F. 2004. Parameterised inventories for life cycle assessment - Systematically relating design parameters to the life cycle inventory. International Journal of Life Cycle Assessment, 9, 227-235.
MYUNG, I. J. 2003. Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47, 90-100.
OPPENHEIMER, O. & SORENSEN, E. 1997. Comparative energy consumption in batch and continuous distillation. Computers & Chemical Engineering, 21, S529-S534.
PATTERSON, M. G. 1996. What is energy efficiency? Concepts, indicators and methodological issues. Energy Policy, 24, 377-390.
PAUL T. ANASTAS & WARNER, J. C. 1998. Green Chemistry: Theory and Practice, Oxford.
PERKINS, N. J. & SCHISTERMAN, E. F. 2006. Re: "The inconsistency of 'optimal' cutpoints obtained using two criteria based on the receiver operating characteristic curve" - The authors reply. American Journal of Epidemiology, 164, 708-708.
PETER SALING, ANDREAS KICHERER, BRIGITTE DITTRICH-KRÄMER, ROLF WITTLINGER, WINFRIED ZOMBIK, ISABELL SCHMIDT, SCHROTT, W. & SCHMIDT, S. 2002. Eco-efficiency Analysis by BASF: The Method. International Journal of Life Cycle Assessment, 1-16.
PHILLIPS, C. H., LAUSCHKE, G. & PEERHOSSAINI, H. 1997. Intensification of batch chemical processes by using integrated chemical reactor-heat exchangers. Applied Thermal Engineering, 17, 809-824.
RERAT, C., PAPADOKONSTANTAKIS, S. & HUNGERBUEHLER, K. 2013. Integrated waste management in batch chemical industry based on multi-objective optimization. Journal of the Air & Waste Management Association, 63, 349-366.
ROLF BRETZ & FRANKHAUSER, P. 1996. Screening LCA for Large Numbers of Products. International Journal of Life Cycle Assessment, 1, 139-146.
SHENOY, U. V. 1995. Heat exchanger network synthesis: process optimization by energy and resource analysis, Houston, Gulf Publishing Co.
SMITH, R. 1995. Chemical process design, New York, McGraw Hill.
STEINMEYER, D. 2000. Energy Management. Kirk-Othmer Encyclopedia of Chemical Technology.
SUGIYAMA, H., FISCHER, U., HUNGERBUHLER, K. & HIRAO, M. 2008a. Decision framework for chemical process design including different stages of environmental, health, and safety assessment. AIChE Journal, 54, 1037-1053.
SUGIYAMA, H., FUKUSHIMA, Y., HIRAO, M., HELLWEG, S. & HUNGERBUHLER, K. 2005. Using standard statistics to consider uncertainty in industry-based life cycle inventory databases. International Journal of Life Cycle Assessment, 10, 399-405.
SUGIYAMA, H., HIRAO, M., FISCHER, U. & HUNGERBUHLER, K. 2008b. Activity Modeling for Integrating Environmental, Health and Safety (EHS) Consideration as a New Element in Industrial Chemical Process Design. Journal of Chemical Engineering of Japan, 41, 884-897.
SZÏJJARTO, A., PAPADOKONSTANTAKIS, S., FISCHER, U. & HUNGERBÜHLER, K. 2008. Bottom-up modeling of the steam consumption in multipurpose chemical batch plants focusing on identification of the optimization potential. Industrial & Engineering Chemistry Research, 47, 7323-7334.
TAN, R. R., CULABA, A. B. & PURVIS, M. R. I. 2002. Application of possibility theory in the life-cycle inventory assessment of biofuels. International Journal of Energy Research, 26, 737-745.
TURTON, R., BAILIE, R. C., WHITING, W. B. & SHAEIWITZ, J. A. 1998. Analysis, synthesis, and design of chemical processes, New Jersey.
VAKLIEVA-BANCHEVA, N., IVANOV, B. B., SHAH, N. & PANTELIDES, C. C. 1996. Heat exchanger network design for multipurpose batch plants. Computers & Chemical Engineering, 20, 989-1001.
VANDECASTEELE, C., VAN CANEGHEM, J. & BLOCK, C. 2007. Cleaner production in the Flemish chemical industry. Clean Technologies and Environmental Policy, 9, 37-42.
WERBOS, P. J. 1990. Econometric techniques: Theory versus practice. Energy, 15, 213-236.
WERNET, G., MUTEL, C., HELLWEG, S. & HUNGERBUEHLER, K. 2011. The Environmental Importance of Energy Use in Chemical Production. Journal of Industrial Ecology, 15, 96-107.
WERNET, G., PAPADOKONSTANTAKIS, S., HELLWEG, S. & HUNGERBUHLER, K. 2009. Bridging data gaps in environmental assessments: Modeling impacts of fine and basic chemical production. Green Chemistry, 11, 1826-1831.
WILLMOTT, C. J., ROBESON, S. M. & MATSUURA, K. 2012. A refined index of model performance. International Journal of Climatology, 32, 2088-2094.
YOUDEN, W. J. 1950. Index for rating diagnostic tests. Biometrics, 6, 172-173.
ZADEH, L. A. 1999. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 100, 9-34.