The Conjunction of Process and Spectral Data for Enhanced Fault Detection

Page 1: The Conjunction of Process and Spectral Data for Enhanced Fault Detection

Elaine Martin

Centre for Process Analytics and Control Technology

University of Newcastle, England

www.ncl.ac.uk/cpact/

The Conjunction of Process and Spectral Data for Enhanced Fault Detection

Centre for Process Analytics and Control Technology (CPACT), University of Newcastle, UK

Motivation

It is conjectured that some factors relating specifically to a process cannot be identified from the spectroscopic measurements but can be described by the process data, or vice versa.

Consequently, one way to enhance prediction accuracy, process performance and fault detection is to integrate process and spectral data.

The aim of the subsequent studies was to investigate the combined power of spectral and process data.

Overview

Process Modelling

Fermentation Process
• Spectral Data
• Spectral and Process Data

Process Monitoring and Fault Detection

Polymer-resin Manufacturing
• Process Data
• Process and Spectral Data

Challenges in the Monitoring of Fermentation Processes

Fermentation is a process in which micro-organisms convert chemical species to products of higher value.

On-line information relating to the progression of the process is not easily attained.

Near Infrared and Mid Infrared spectroscopy have been applied for the monitoring of fermentation processes.

The successful implementation of these spectroscopic approaches necessitates the application of appropriate multivariate data analysis techniques, such as partial least squares (PLS).

Experimental Data Set

The industrial pilot-plant scale Streptomyces fermentation process involves two stages:

• Seed stage
• Final stage

The seed stage results in the generation of biomass. The starting ingredients include carbohydrate, soya protein, vegetable oil and trace elements in water.

The biomass is transferred to the final stage for the production of the desired product.

The final stage is a fed-batch process lasting approximately 140 hours.

NIR measurements were collected for the final stage of the process.

Spectral Data Acquisition

The NIR spectral data were recorded using a Zeiss Corona 45

Description of the Data Set

Final stage data from 7 standard batches and 7 Design of Experiment batches form the basis of the subsequent analysis.

Data collected included on-line process data, off-line data, biochemical and NIR measurements.

Methodological Summary

Pre-processing of the spectral data set
• First derivatives
• Splining

Segmented wavelength region selection

Global modelling – Linear PLS, Neural Network PLS, Quadratic PLS

Local modelling – Linear PLS, Neural Network PLS, Quadratic PLS

Bagging of the models
• Linear partial least squares
• Averaging

Data Pre-processing

The NIR data (Zeiss Corona NIR) were recorded every 15 minutes and the first derivatives were taken.

Since only ten values of titre were recorded, a spline was fitted to the data.

The splined titre values were aligned to the 550 spectral values for each batch.

The range utilised for both the spectral and quality data was 43.75 to 125 log hours.
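The splining and alignment step can be sketched as follows. This is a minimal illustration, not the authors' code: the slides do not say which spline was used, so a natural cubic spline is assumed, the ten titre values are hypothetical, and sample k is assumed to fall at 0.25 × k log hours (one spectrum every 15 minutes). The name `natural_cubic_spline` is illustrative.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two points.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    k = n - 2  # number of interior knots

    # Tridiagonal system for the second derivatives at the interior knots.
    diag = [2.0 * (h[i] + h[i + 1]) for i in range(k)]
    rhs = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
           for i in range(k)]

    # Forward elimination (Thomas algorithm).
    for i in range(1, k):
        w = h[i] / diag[i - 1]
        diag[i] -= w * h[i]
        rhs[i] -= w * rhs[i - 1]

    # Back substitution; the end-point second derivatives stay at zero.
    m = [0.0] * n
    for i in range(k - 1, -1, -1):
        m[i + 1] = (rhs[i] - h[i + 1] * m[i + 2]) / diag[i]

    def evaluate(x):
        # Locate the interval containing x, clamped to the first/last interval.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


# Hypothetical off-line titre measurements (log hours, scaled titre).
titre_hours = [43.75, 52.0, 61.0, 70.0, 79.0, 88.0, 97.0, 106.0, 115.0, 125.0]
titre_values = [0.30, 0.36, 0.45, 0.55, 0.66, 0.78, 0.90, 1.00, 1.08, 1.15]

spline = natural_cubic_spline(titre_hours, titre_values)

# Spectral sampling times inside the 43.75 to 125 log hour window (samples 175 to 500).
spectral_hours = [0.25 * k for k in range(175, 501)]
titre_on_spectral_grid = [spline(t) for t in spectral_hours]
```

Each spectrum in the window then has a matching titre value, which is what a calibration model needs as its response.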

Data Pre-processing

[Figure: two plots for batch 88. "Real and splined values for batch 88": titre values against time points (0 to 550). "Batch 88": titre values against log hours (LH, 30 to 150).]

NIR Data and First Derivatives

[Figure: "Stack Batches 2088 to 2100", two panels. NIR Data: log(1/R) against wavelength/nm (1000 to 1600). First Derivative: log(1/R) against wavelength/nm (axis ticks 10 to 60).]
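As a rough illustration of why first derivatives are taken, the sketch below uses a simple first difference between neighbouring channels. The slides do not state how the derivative was computed, so the differencing scheme and the spectrum values are assumptions. A constant baseline offset disappears after differencing.

```python
from math import isclose


def first_derivative(spectrum, spacing=1.0):
    """First difference between neighbouring channels, scaled by the channel spacing."""
    return [(b - a) / spacing for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical log(1/R) values on equally spaced channels.
spectrum = [0.80, 0.82, 0.87, 0.95, 1.00, 1.02]
shifted = [value + 0.5 for value in spectrum]  # same spectrum with a baseline offset

d_original = first_derivative(spectrum)
d_shifted = first_derivative(shifted)

# The two derivative spectra agree (up to rounding) despite the offset.
same_shape = all(isclose(a, b, abs_tol=1e-12) for a, b in zip(d_original, d_shifted))
```

The derivative spectrum has one point fewer than the input.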

Spectral Window Selection Algorithm

Flowchart steps (the two decision boxes carry Y/N branches on the slide):

1. Select training and validation batches.
2. Mean centre and take derivatives of the spectral data.
3. Generate random centres and widths.
4. Build the model ‘input’ matrix, eliminating common data. Generate a PLS model and calculate the RMS errors.
5. Generate random changes to the centres and widths.
6. Apply the random changes to the current centres and widths.
7. Build a new input matrix, generate the model and calculate the RMS errors.
8. Decision: has the RMS on the training data decreased?
9. Decision: has the number of iterations been exceeded, and are there more models to build?
10. Present the final bagged model.

Spectral Window Selection Algorithm

[Figure: curve plotted against wavelength (0 to 450) with a window marked by its Centre and Width, before and after an increment.]

Generate a random increment in centre and width.

Update the centre and width.

Has the prediction error decreased? Yes: a step in the right direction.

Take another step with the same centre and width increment.

Step too far: the prediction error has increased. Go back to where we were.

Generate a new increment in centre and width and continue the search.
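The search described on these two slides can be sketched as a random hill-climb over window centres and widths. This is only an outline under stated assumptions: `window_search`, `selected_channels` and `toy_error` are hypothetical names, the step sizes are arbitrary, and `toy_error` is a stand-in for "fit a PLS model on the selected channels and return the training RMS", which is not reproduced here.

```python
import random


def selected_channels(windows, n_channels):
    """Union of channel indices covered by (centre, width) windows.

    Taking the union drops channels shared by overlapping windows.
    """
    chosen = set()
    for centre, width in windows:
        lo = max(0, round(centre - width / 2))
        hi = min(n_channels - 1, round(centre + width / 2))
        chosen.update(range(lo, hi + 1))
    return sorted(chosen)


def window_search(error_fn, n_channels, n_windows=2, max_iter=300, seed=0):
    """Random search: keep a step while the error falls, otherwise draw a new one."""
    rng = random.Random(seed)
    windows = [(rng.uniform(0, n_channels), rng.uniform(2, n_channels / 4))
               for _ in range(n_windows)]
    best = error_fn(windows)
    step = None
    for _ in range(max_iter):
        if step is None:  # draw a new random increment in centre and width
            step = [(rng.gauss(0, 3.0), rng.gauss(0, 2.0)) for _ in windows]
        trial = [(c + dc, max(1.0, w + dw)) for (c, w), (dc, dw) in zip(windows, step)]
        err = error_fn(trial)
        if err < best:
            windows, best = trial, err  # right direction: reuse the same increment
        else:
            step = None  # step too far: stay where we were, draw a new increment
    return windows, best


# Toy error (assumption): channels 30 to 40 are informative, everything else is noise.
INFORMATIVE = set(range(30, 41))


def toy_error(windows, n_channels=60):
    chosen = set(selected_channels(windows, n_channels))
    return len(INFORMATIVE - chosen) + 0.1 * len(chosen - INFORMATIVE)


start_windows, start_error = window_search(toy_error, 60, max_iter=0)
final_windows, final_error = window_search(toy_error, 60, max_iter=300)
```

Because only improving steps are accepted, the final error can never exceed the starting error. Re-running with a different `seed` gives a different set of windows, which is why the result is not unique.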

Benefits of the SWS Algorithm

SWS can consider not only the two extremes, a single wavelength or the full set of wavelengths, but also multiple sub-sets of the full set.

Finds the ‘best’ possible models for the product concentration and the biochemical components.

Finds the ‘best’ wavelength range from which these models can be built.

Bagging

SWS does not provide a unique model.

To obtain a more robust model, bagging is implemented.

The ‘resample and combine’ method, or ‘bagging’, is an algorithm that helps improve the robustness of models by combining predictions from different models.

Bagging of Models

30 models were generated by changing the initial random seed of the wavelength selection algorithm.

Bagging was applied to the 30 models:

• The average value was calculated from the output of the 30 models.

• A PLS model was fitted between the real and fitted values to give a weighted average.
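The averaging variant can be sketched in a few lines. The titre values and model outputs below are hypothetical, and only three models are used for brevity where the slides use 30. The PLS-weighted variant is not shown.

```python
from math import sqrt


def rms_error(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def bag_by_averaging(model_outputs):
    """Sample-wise mean of the predictions from several models."""
    n_models = len(model_outputs)
    return [sum(column) / n_models for column in zip(*model_outputs)]


# Hypothetical titre values and the outputs of three models.
actual = [0.40, 0.50, 0.60, 0.70]
model_outputs = [
    [0.45, 0.52, 0.58, 0.74],
    [0.36, 0.47, 0.63, 0.69],
    [0.41, 0.55, 0.57, 0.66],
]

bagged = bag_by_averaging(model_outputs)
individual_rms = [rms_error(actual, output) for output in model_outputs]
bagged_rms = rms_error(actual, bagged)
```

The RMS error of the averaged prediction cannot exceed the mean of the individual RMS errors, so a simple average is never worse than a typical member of the ensemble.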

Global and Local Modelling

[Figure: three plots for batch 133 (full data set, first half of the data set, second half of the data set), annotated "Apply Global Modelling" and "Apply Local Modelling".]

Local Modelling

[Figure: total sugar, free glucose and soluble phosphate for batch 2088, each plotted against log hours (LH, -1 to 139).]

Two critical points, at 70 and 100 hours, were identified from plots of the biochemical data.

Local Modelling

[Figure: cTitre against log hours (LH, 42 to 142) for batches MS2088, MS2090, MS2092, MS2094, MS2096, MS2098 and MS2100, divided into the first, second and third time intervals.]

Local Modelling Approach

Three time regions for both the spectra and the quality variable values (titre) were selected.

• Up to 70 log hours, i.e. sample points 175 to 280.

• From 70 log hours to 100 log hours, i.e. sample points 280 to 400.

• From 100 log hours up to the end of the chosen window, i.e. sample points 400 to 500.
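The sample-point ranges follow from the 15-minute sampling interval: four spectra per log hour, so 70 log hours is sample 280 and 100 log hours is sample 400. A small sketch of the conversion and of picking the local region for a sample is given below. The function names are illustrative, and assigning the shared boundary samples (280 and 400) to the later region is an assumption.

```python
SAMPLES_PER_HOUR = 4  # one spectrum every 15 minutes

# (first sample, last sample) of each time region, as listed on the slide.
REGIONS = {1: (175, 280), 2: (280, 400), 3: (400, 500)}


def sample_index(log_hours):
    """Convert log hours to a spectral sample index."""
    return round(log_hours * SAMPLES_PER_HOUR)


def region_of(index):
    """Region number for a sample index, or None outside the 175 to 500 window.

    Later regions are checked first, so boundary samples fall in the later region.
    """
    for number in (3, 2, 1):
        first, last = REGIONS[number]
        if first <= index <= last:
            return number
    return None


window_start = sample_index(43.75)   # start of the modelling window
first_break = sample_index(70)       # first critical point
second_break = sample_index(100)     # second critical point
window_end = sample_index(125)       # end of the modelling window
```

A local-modelling predictor would call `region_of` on each incoming sample and pass the spectrum to the model built for that region.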

[Figure: Region 1. Titre values for batches 2088 to 2100 over time points 175 to 280, and stacked spectra (log(1/R) against wavelength/nm) for samples 175 to 280.]

Local Modelling Approach

[Figure: Region 2. Titre values for batches 2088 to 2100 over time points 280 to 400, and stacked spectra (log(1/R) against wavelength/nm) for samples 280 to 400.]

[Figure: Region 3. Titre values for batches 2088 to 2100 over time points 400 to 500, and stacked spectra (log(1/R) against wavelength/nm) for samples 400 to 500.]

Results : Time Interval 1

[Figure: titre values against samples, one plot for the training data and one for the validation data.]

Results : Time Interval 1

[Figure: two error plots titled "ERRORS EXPERIMENTAL" and "ERRORS VALIDATION" (x-axis 0 to 35), with the RMS error after PLS bagging marked.]

The RMS of the training set for models 1, 7 and 29 is large.

The RMS of the validation data set for models 1, 7 and 29 is small.

The RMS error for PLS bagging is smaller than the error of each individual model.

Linear PLS – Region 1 (Wavelength Selection)

[Figure: real titre values with the predictions from PLS bagging and from averaging, plotted against samples for the training data set and for the validation data set.]

Results : Time Interval 1

[Figure: "Frequency of Appearances": number of appearances against wavelengths (10 to 60), shown with the stacked spectra of batches 2088 to 2100 for samples 175 to 280.]

The wavelengths between 30 and 40 are selected most frequently.
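A frequency-of-appearance count of this kind can be produced by tallying, for each channel, how many of the bagged models selected it. The sketch below uses three hypothetical selections rather than the 30 models from the study, and `selection_frequency` is an illustrative name.

```python
from collections import Counter


def selection_frequency(models):
    """Count, per channel, how many models include it in their selected windows."""
    counts = Counter()
    for channels in models:
        counts.update(set(channels))  # count a channel at most once per model
    return counts


# Hypothetical channel selections from three runs of the window search.
models = [
    list(range(28, 39)),
    list(range(31, 42)) + list(range(50, 53)),
    list(range(10, 14)) + list(range(33, 40)),
]

counts = selection_frequency(models)

# Channels chosen by every model.
most_selected = sorted(ch for ch, c in counts.items() if c == len(models))
```

Channels that appear in most models are the ones the search keeps returning to regardless of its random starting point.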

Neural Network PLS – Region 2 (Wavelength Selection)

[Figure: real titre values with the predictions from PLS bagging and from averaging, plotted against samples for the training data set and for the validation data set.]

Polynomial PLS – Region 3 (Wavelength Selection)

[Figure: real titre values with the predictions from PLS bagging and from averaging, plotted against samples for the training data set and for the validation data set.]

Local Modelling : Training Data Set

[Figure: real and predicted titre values against samples for the training data set. Global modelling predictions: all the wavelengths, first batch. Local modelling predictions for time intervals 1, 2 and 3: taking the average.]

Local Modelling : Validation Data Set

[Figure: real and predicted titre values against samples, with the first, second and third time intervals marked. Panel titles: "Validation Data Set - All the Wavelengths - First Batch" and "Training Data Set - Taking the Average - Local Model".]

Genetic Algorithm Results

[Figure: "Frequency of Appearances": number of appearances against wavelengths (10 to 60), one plot for SWS and one for genetic algorithms.]

Genetic algorithms make it possible to select individual wavelengths, but potentially do not predict future samples well.

GA Results – Region 2

[Figure: real and predicted titre values against samples for the validation data set, taking the average. One plot for SWS averaging and one for GA averaging.]

RMS of validation: SWS 0.048, GAs 0.069.

Genetic Algorithm Results

TRAINING

                       Time Interval 1     Time Interval 2     Time Interval 3
                       PLS      Average    PLS      Average    PLS      Average
                       Bagging  Bagging    Bagging  Bagging    Bagging  Bagging
SWS with Linear PLS    0.018    0.034      0.025    0.034      0.039    0.060
GAs with Linear PLS    0.018    0.018      0.023    0.024      0.037    0.038

VALIDATION RESULTS

                       Time Interval 1     Time Interval 2     Time Interval 3
                       PLS      Average    PLS      Average    PLS      Average
                       Bagging  Bagging    Bagging  Bagging    Bagging  Bagging
SWS with Linear PLS    0.045    0.049      0.059    0.048      0.095    0.058
GAs with Linear PLS    0.045    0.043      0.067    0.069      0.177    0.139

Summary of Results

GAs produced slightly better predictions for the training data set, but this reflected overfitting.

On the validation data, SWS in combination with bagging for local modelling gave better results than GAs in combination with bagging.

Local modelling gives better results than global modelling.

SWS with bagging gives better results compared with the purported ‘one-shot wonder’ models.
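One way to read the overfitting remark is to compare validation and training errors for each method. The sketch below uses the values reported on the Genetic Algorithm Results slide, with the column order (PLS bagging, then average bagging, for time intervals 1 to 3) inferred from the flattened table. The ratio itself is an illustrative diagnostic, not a measure used in the slides.

```python
COLUMNS = ["TI1 PLS", "TI1 Avg", "TI2 PLS", "TI2 Avg", "TI3 PLS", "TI3 Avg"]

# Errors copied from the training and validation tables.
TRAINING = {
    "SWS": [0.018, 0.034, 0.025, 0.034, 0.039, 0.060],
    "GA": [0.018, 0.018, 0.023, 0.024, 0.037, 0.038],
}
VALIDATION = {
    "SWS": [0.045, 0.049, 0.059, 0.048, 0.095, 0.058],
    "GA": [0.045, 0.043, 0.067, 0.069, 0.177, 0.139],
}


def validation_to_training_ratio(method):
    """Ratio of validation error to training error, column by column."""
    return [v / t for v, t in zip(VALIDATION[method], TRAINING[method])]


for method in ("SWS", "GA"):
    ratios = validation_to_training_ratio(method)
    print(method, " ".join(f"{label}={ratio:.2f}" for label, ratio in zip(COLUMNS, ratios)))
```

A ratio well above 1 means the model fits the training batches much better than unseen batches. In the third time interval the GA ratios are clearly larger than the SWS ratios, in line with the overfitting remark.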

Design of Experiment Data

Integration of Process and Spectral Data

Conjunction of Process and Spectral Data

In the later stages of the fermentation, the error in the calibration models was observed to be greater, with offsets present.

During this time, significant changes in the fermentation broth concentrations occur.

The offset can potentially be modelled by utilising other process information such as off-gas measurements.

Data Set and Aim

The aim is to infer product concentration and the biochemical components from the spectral data.

The work uses the off-line, biochemical and NIR data for the Design of Experiment batches.

Changing conditions in the experimental design:

• Temperature (°C)
• pH
• Sugar feed (g h⁻¹)
• Oil feed (%)

Conjunction of Process and Spectral Data

First step: calculation of the calibration spectral residuals. The spectral data are passed through a model, and the model output is subtracted from the biochemical concentration to give the calibration spectral residuals.

Second step: modelling of the calibration spectral residuals from the process data and generation of the innovations. The process data are passed through a model, and the model output is subtracted from the calibration spectral residuals to give the innovations.

Final step: prediction of the product concentration. The biochemical concentration predictions from the spectra and the residual predictions from the process data are summed to give the final product concentrations.
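The three steps can be sketched with a deliberately simple stand-in for the second model. Here a one-variable least-squares line replaces the multivariate model of the residuals, and the measured values, spectral predictions and process variable are all hypothetical. The spectral calibration model itself is assumed to exist already and is represented only by its predictions.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rms(values):
    """Root-mean-square of a sequence."""
    return sqrt(sum(v * v for v in values) / len(values))


# Hypothetical data: measured concentration, spectral-model prediction, one process variable.
measured = [0.20, 0.40, 0.60, 0.80, 1.00, 1.20]
spectral_prediction = [0.18, 0.36, 0.52, 0.66, 0.81, 0.95]
process_variable = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]  # e.g. an off-gas measurement

# First step: calibration spectral residuals (measured minus spectral prediction).
residuals = [m - p for m, p in zip(measured, spectral_prediction)]

# Second step: model the residuals from the process data; the remainder is the innovations.
slope, intercept = fit_line(process_variable, residuals)
residual_prediction = [slope * x + intercept for x in process_variable]
innovations = [r - rp for r, rp in zip(residuals, residual_prediction)]

# Final step: spectral prediction plus predicted residual.
final_prediction = [p + rp for p, rp in zip(spectral_prediction, residual_prediction)]
```

In this toy data the spectral model drifts low as the batch progresses, and the process variable tracks that drift, so the innovations are much smaller than the original residuals. On real data the second model would be validated on held-out batches.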

Conjunction of Process and Spectral Data

Five variables were considered to be the most important for the prediction of product concentration:

• CER
• CO2 Total
• pH
• OUR
• Temperature

[Figure: time series plots of pH, CER, CO2 Total and OUR, plus one unlabelled plot with values between 26.8 and 27.2.]

Conjunction of Process and Spectral Data

[Figure: predictions and residuals. Predicted values for the training and validation data sets, with the residuals for the training data set and for the validation data set.]

Conjunction of Process and Spectral Data

Final predictions of the product

[Figure: real values, predicted values and final predicted values for the validation data set, and the residuals for the validation data set after adding process data.]

New residuals:

• The offset is reduced
• The residuals exhibit less structure and reflect noise

Conclusions

A Spectral Window Selection (SWS) algorithm has been proposed to select a window of wave numbers.

Multiple models are ‘bagged’ to produce a more robust model.

SWS produces better results than using the complete wavelength region.

Process data were combined with spectral data to eliminate offsets.

The wavelength selection-bagging approach in combination with the process data is now under investigation.

The results to date are promising.