#PIWorld ©2018 OSIsoft, LLC

Introduction to Time-Series Analysis with PI System Data

Prof. Brian D. Davison

Prof. Martin Takáč

Presentation Outline

• Preliminaries: Sensors, Data set, and Sampling

• Task #1: Prediction of ADF

• Task #2: Modeling Cooling

• Task #3: PCA for Visualization

• Lessons Learned

Preliminaries: Sensors, Data set, and Sampling

Sensor data

Common problems:

• missing values (wrong reading from sensor/communication issue)

• a sensor fails and gives wrong readings

• frequent readings of sensor values (e.g., each second/minute)

Brewery Dataset

[Figure: fermenter (source: Wikipedia) with temperature sensors at the top, middle, and bottom, and a pressure sensor … and many more sensors … and even more fermenters.]

Brewery Dataset

• From a company that produces more than two dozen craft beers

• Different beers can use different yeasts during fermentation and different fermenting temperatures

Stages of beer fermentation

From OSIsoft Users Conference 2017, OSIsoft: Introduction to Process Optimization and Big-Data Analytics with the PI System, https://www.youtube.com/watch?v=9AdvKkyTeBE

Formally, a time series

• is an ordered sequence of values of a variable at equally spaced time intervals

• can be built on top of data obtained from sensors

– choose the size of time interval (e.g., 1 hour)

– represent all measurements in each hour by one value (e.g., average value / first value / last value …)

– we obtain a collection of values (e.g., one number per hour)

– the PI System can extract such values from collected data

• has many applications

– in forecasting, monitoring, or feedback and feed-forward control
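The construction described above (pick an interval, then represent all measurements in it by one value) can be sketched in base R. This is only an illustration on made-up data: the data frame `raw`, its columns, and the choice of the mean as the representative value are assumptions, not part of the brewery data set or of the PI System.

```r
# Hypothetical raw readings: one temperature value per minute for 3 hours.
set.seed(42)
raw <- data.frame(
  time = as.POSIXct("2018-01-01 00:00:00", tz = "UTC") + 60 * (0:179),
  temp = 68 + rnorm(180, sd = 0.2)
)
raw$temp[c(5, 70)] <- NA  # simulate two missing readings

# Assign each reading to the start of its hour (the chosen interval size).
raw$hour <- as.POSIXct(trunc(raw$time, units = "hours"))

# Represent each hour by one value; here the mean of the available readings.
# The formula interface of aggregate() drops rows with missing values.
hourly <- aggregate(temp ~ hour, data = raw, FUN = mean)
print(hourly)
```

Swapping `mean` for a function that returns the first or last element gives the other representations mentioned on the slide.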

Multiple sources and “sampling” frequencies

Real life problem = multiple sensors

Manual = infrequent

An illustrative Example

[Figure: readings from the bottom, middle, and top temperature sensors]

Before doing analytics, we need to align the data streams (in this example, we could choose a multiple of one hour)!
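One way to align streams that arrive at different rates is to reduce each one to the same hourly grid and then join on the hour. The sketch below uses three synthetic streams; the sampling periods (1, 7, and 20 minutes), the helper names `make_stream` and `to_hourly`, and the use of the hourly mean are all illustrative assumptions.

```r
t0 <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")

# Hypothetical stream: one reading every `step_min` minutes over six hours.
make_stream <- function(step_min, offset) {
  n <- floor(6 * 60 / step_min)
  data.frame(time  = t0 + 60 * step_min * (0:(n - 1)),
             value = offset + 0.01 * (0:(n - 1)))
}
streams <- list(top    = make_stream(1, 68),
                middle = make_stream(7, 67),
                bottom = make_stream(20, 66))

# Reduce one stream to a single value per hour and name the column after the sensor.
to_hourly <- function(df, name) {
  df$hour <- as.POSIXct(trunc(df$time, units = "hours"))
  out <- aggregate(value ~ hour, data = df, FUN = mean)
  names(out)[2] <- name
  out
}
hourly <- Map(to_hourly, streams, names(streams))

# Join the per-sensor tables on the common hourly time stamp.
aligned <- Reduce(function(a, b) merge(a, b, by = "hour", all = TRUE), hourly)
print(aligned)
```

With `all = TRUE`, an hour in which a sensor reported nothing would show up as `NA` rather than being silently dropped.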

Task #1: Prediction of ADF

The Problem Setting

• Automated measuring of gravity of wort is expensive

• The brewery measures it manually (and infrequently)

• It is used to compute Apparent Degree of Fermentation (ADF), which is an important indicator of the status of fermentation (Note: ADF indicates how much sugar has been converted to alcohol)

Our task: Predict ADF as well as possible, in order to

• eliminate a lot of manual work

• improve quality of beer

• decrease cost of fermentation
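The slides do not state how the brewery computes ADF from gravity. A common brewing definition of apparent attenuation, assumed here purely for illustration, divides the drop in specific gravity by the original gravity's excess over water. The function name `adf` and the example gravities are hypothetical.

```r
# Apparent degree of fermentation from specific gravities.
# Assumed definition: (original - current) / (original - 1).
adf <- function(original_gravity, current_gravity) {
  (original_gravity - current_gravity) / (original_gravity - 1)
}

# Example: wort starting at 1.050, measured at three points in time.
readings <- c(1.050, 1.030, 1.012)
adf_values <- adf(1.050, readings)
print(round(adf_values, 2))
```

Under this definition ADF is 0 at the start and approaches its maximum as the gravity stops falling, which fits the note that ADF tracks how much sugar has been converted.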

Data and Challenges

• We can use the data from all sensors to predict ADF BUT

• We have only a few ADF values per fermentation cycle (moreover, they are not measured at regular intervals)

The First Approach

• Let’s fix one beer brand and plot our ADF points, with time from the start of fermentation on the x-axis and ADF on the y-axis

# Load the ADF measurements for one beer brand
MyData <- read.csv(file = "regression.txt")
# Convert time from seconds to hours
MyData$time <- MyData$time / 60 / 60
plot(MyData$time, MyData$adf,
     xlab = "Time Since Fermentation [hours]",
     ylab = "ADF", main = "Data without outliers")

Simple Linear Regression Model

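$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

• $y_i$: response (dependent) variable

• $x_i$: explanatory (independent) variable

• $\beta_0$: intercept, $\beta_1$: slope

• $\varepsilon_i$: errors of the response variable, which should be uncorrelated with each other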

Linear Fit

            Estimate
(Intercept) -0.0598497
Time         0.0159091

Each hour, ADF increases by ~0.016.

# Scatter plot of the filtered data
plot(FilteredTimes, FilteredADF,
     xlab = "Time Since Fermentation [hours]",
     ylab = "ADF", main = "ADF ~ Time")
# Fit ADF as a linear function of time and overlay the fitted line
linMod <- lm(FilteredADF ~ FilteredTimes)
abline(linMod$coefficients)
summary(linMod)

Checking Assumptions with a Residual Plot

Errors should be independent of predicted values. Our model is NOT ok!

# Residuals of the linear fit against its predictions
predicted <- fitted(linMod)
errors <- resid(linMod)
plot(predicted, errors, main = "Residuals vs. predicted ADF")
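To see what a "not ok" residual plot looks like without the brewery data, the sketch below fits a straight line to a synthetic S-shaped curve. The logistic shape, the noise level, and all variable names are assumptions chosen for illustration. Because least-squares residuals are always linearly uncorrelated with the fitted values, the numerical check here looks for structure instead: the lag-1 autocorrelation of the residuals after sorting them by predicted value.

```r
set.seed(1)
time <- seq(0, 100, by = 1)  # hours (synthetic)
adf  <- 0.8 / (1 + exp(-(time - 50) / 10)) + rnorm(length(time), sd = 0.02)

# Straight-line fit to data that is really S-shaped.
fit <- lm(adf ~ time)
predicted <- fitted(fit)
errors <- resid(fit)

plot(predicted, errors, main = "Residuals vs. predicted ADF (synthetic)")
abline(h = 0, lty = 2)

# Neighbouring residuals (ordered by prediction) that resemble each other
# indicate a systematic pattern that the line fails to capture.
e <- errors[order(predicted)]
lag1 <- cor(e[-1], e[-length(e)])
print(lag1)
print(coef(fit))
```

For an adequate model the lag-1 value should be close to zero. Here it is strongly positive, and the fitted intercept is negative even though the synthetic ADF never is.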

What was wrong? How to improve it?

• We tried our luck with a simple linear model

• The model works OK for the data in the middle of our range, but fails for small and large values of ADF

• As data scientists, we need to understand our problem domain.

• We need to learn about beer fermentation!

Understanding the Fermentation Process (a bit better)

• What influences the speed of fermentation?

– temperature (we are cooling the wort, so this is relatively constant)

– amount of yeast

– amount of sugar and ethanol level

Gilliland, R.B., 1962. Yeast reproduction during fermentation. Journal of the Institute of Brewing, 68(3), pp.271-275.

Speed of Fermentation

[Figure: sugar content vs. time. Source: http://wineserver.ucdavis.edu/industry/enology/fermentation_management/wine/problem_fermentations.html]

One degree Brix is 1 gram of sucrose in 100 grams of solution.

• First, cells are adapting to the wort conditions and engaging in cell division

• Then comes the fastest rate of fermentation, while the ethanol level is relatively low

• Finally, the rate slows because ethanol in the medium forces an adaptation of the plasma membrane. The yeast can easily form a membrane that depends upon ethanol replacing water for its structure and functionality…

New (improved) model

• We now understand that there are different rates of fermentation.

• How about fitting 3 linear models? We would also need to figure out the break-points.

[Figure: three-segment linear model, showing the parameters to estimate]
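The slide's own formula is not in the transcript. One common way to write a continuous three-segment linear model, given here as an assumption about notation rather than as the authors' exact formulation, is:

```latex
\mathrm{ADF}(t) = \beta_0 + \beta_1 t
  + \delta_1 (t - \psi_1)_+ + \delta_2 (t - \psi_2)_+ + \varepsilon,
\qquad (u)_+ = \max(u, 0)
```

Here $\psi_1 < \psi_2$ are the two break-points, $\beta_1$ is the slope of the first segment, and $\delta_1$, $\delta_2$ are the changes in slope at each break-point, so the three slopes are $\beta_1$, $\beta_1 + \delta_1$, and $\beta_1 + \delta_1 + \delta_2$. That leaves six parameters to estimate: $\beta_0$, $\beta_1$, $\delta_1$, $\delta_2$, $\psi_1$, $\psi_2$.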

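Fitted Model

[Figure: fitted piecewise linear model, annotated with: intercept above zero; initial slope; increase of slope: new slope is 0.018011; decrease of slope: new slope is 0.003764; break-points]

library(segmented)  # provides segmented(), seg.control(), broken.line()

xx <- FilteredTimes
yy <- FilteredADF
dati <- data.frame(x = xx, y = yy)
# Start from an ordinary linear fit
out.lm <- lm(y ~ x, data = dati)
# Refit with two break-points, using 10 and 40 hours as initial guesses
o <- segmented(out.lm, seg.Z = ~x,
               psi = list(x = c(10, 40)),
               control = seg.control(display = FALSE))
# Fitted values of the piecewise model, for plotting
dat2 <- data.frame(x = xx, y = broken.line(o)$fit)
o

Meaningful coefficients of the linear terms:
(Intercept)            x         U1.x         U2.x
   0.005924     0.009560     0.008451    -0.014247

Estimated Break-Point(s):
psi1.x  psi2.x
 14.57   45.61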
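The `U1.x` and `U2.x` coefficients are changes in slope, so the segment slopes quoted on the slide are cumulative sums. The sketch below reproduces that arithmetic from the printed values, then shows the same parameterization in base R using hinge terms with the break-points held fixed. The synthetic data and noise level are assumptions, and fixing the break-points is a simplification: `segmented()` estimates them, whereas this sketch does not.

```r
# Coefficients and break-points as printed on the slide.
b   <- c(intercept = 0.005924, x = 0.009560, U1.x = 0.008451, U2.x = -0.014247)
psi <- c(14.57, 45.61)

# Slopes of the three segments: initial slope, then add each change in slope.
slopes <- cumsum(b[c("x", "U1.x", "U2.x")])
print(slopes)

# Synthetic data generated from that piecewise line plus noise (illustrative only).
set.seed(7)
x <- seq(0, 80, by = 0.5)
truth <- b[["intercept"]] + b[["x"]] * x +
  b[["U1.x"]] * pmax(x - psi[1], 0) +
  b[["U2.x"]] * pmax(x - psi[2], 0)
y <- truth + rnorm(length(x), sd = 0.005)

# With known break-points, the model is an ordinary linear regression on hinge terms.
fit <- lm(y ~ x + pmax(x - psi[1], 0) + pmax(x - psi[2], 0))
recovered <- cumsum(coef(fit)[-1])
print(round(unname(recovered), 4))
```

The recovered slopes should land close to the three values in `slopes`, which is a quick sanity check that the hinge formulation matches how the printed coefficients are meant to be read.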

Fitted Model

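library(ggplot2)  # provides ggplot(), aes(), geom_point(), geom_line()

# Measured points with the fitted piecewise line on top
ggplot(dati, aes(x = x, y = y)) +
  xlab("Time Since Fermentation [hours]") +
  ylab("ADF") +
  geom_point() +
  geom_line(data = dat2, color = "blue")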

Checking Assumptions with a Residual Plot

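# Recompute the predictions from the segmented model before plotting its residuals
predicted <- fitted(o)
errors <- resid(o)
plot(predicted, errors, main = "Residuals vs. predicted ADF")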

Task #2: Cooling prediction

The Problem Setting

• After the fermentation (and possibly Free Rise or Diacetyl Rest) is finished, the beer has to be cooled to almost the freezing point

• The data contains temperature measurements at 3 different locations

[Figure: sensor locations: Top, Middle, Bottom]

Typical Cooling Profile

In the ideal case, the beer is cooling nicely and all sensors read similar values

Atypical Cooling Profile

[Figure: temperature readings from the Top, Middle, and Bottom sensors]

All together

Issues with data

• For each time point, we know in which state of fermentation the beer is (colors on graph)

• We also know about other sensors, such as the cooling valves

• During fermentation, we try to keep temperature constant

• The cooling is achieved by three cooling valves (usually open 100%)

• However, there are many inconsistencies

[Figure legend: Fermentation, Free rise, Diacetyl Rest, Cooling]

Issues with data

[Figure annotation: Cooling labeled as Diacetyl Rest]

Issues with data

Preprocessing

• We will ignore the “label” of the fermentation “stage”

• We define a cooling period simply as an uninterrupted interval with 100% cooling capacity

• Moreover, we will be interested in the data until the temperature reaches ~32 °F

[Figure legend: Free rise, Cooling]
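The preprocessing rule above can be sketched with run-length encoding. Everything in this example is hypothetical: the data frame `readings`, its columns, the half-hourly spacing, and the helper `trim` stand in for the real valve and temperature tags, which the slides do not name.

```r
# Hypothetical half-hourly log: valve opening (%) and temperature (F).
readings <- data.frame(
  t     = seq(0, by = 0.5, length.out = 12),
  valve = c(0, 0, 100, 100, 100, 100, 100, 100, 0, 100, 100, 0),
  temp  = c(68, 68, 66, 60, 50, 40, 31.8, 31.5, 33, 45, 44, 44)
)

# Label each uninterrupted run of "valve at 100%" / "valve not at 100%".
runs <- rle(readings$valve == 100)
readings$run_id <- rep(seq_along(runs$lengths), runs$lengths)

# A cooling period is one uninterrupted run at 100% cooling capacity.
cooling <- readings[readings$valve == 100, ]
periods <- split(cooling, cooling$run_id)

# Keep each period only until the temperature first reaches the target (~32 F).
trim <- function(p, target = 32) {
  hit <- which(p$temp <= target)
  if (length(hit) > 0) p[seq_len(hit[1]), ] else p
}
periods <- lapply(periods, trim)
print(periods)
```

Note that the stage labels are never consulted, in line with the decision to ignore them.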

Filtered cooling profiles

● different initial temperature for various coolings

Volumes

● various volumes of beer in fermenter

Thermodynamic heat transfer

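• How will the temperature change in a small time interval?

• The real answer is hard and involves solving PDEs, …

• We can try to approximate it:

$$T_{\text{new}} = T_{\text{old}} + \frac{\alpha}{V}\left(T_{\text{old}} - \beta\right)$$

where $T_{\text{new}}$ is the new temperature, $T_{\text{old}}$ the old temperature, $\beta$ the temperature of the “cooling” substance, $\alpha < 0$ the cooling rate, and $V$ the volume.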

Some simplifications

Time difference = 30 min

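Parameters:
      Estimate Std. Error t value Pr(>|t|)
alpha -17.1918     0.3134  -54.86   <2e-16 ***
beta   30.4543     0.2427  125.49   <2e-16 ***

The cooling substance is ~30.4 °F.

d <- read.csv(file = "UC2_TS_30.dat")
yOld <- d$o1   # old temperature
y <- d$n1      # new temperature
vol <- d$v     # volume
# Parameters are named alpha and beta to match the summary shown above.
# No start values are given, so nls() initializes both to 1 and warns.
m <- nls(y ~ (1 + alpha / vol) * yOld - alpha * beta / vol)
summary(m)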
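Since `UC2_TS_30.dat` is not reproduced in the transcript, the fit can be tried on simulated data instead. In the sketch below the volumes, temperatures, noise level, "true" parameter values, and starting values are all made-up assumptions; only the model formula comes from the slide.

```r
set.seed(1)
n <- 200
vol  <- runif(n, 300, 600)   # hypothetical volumes
yOld <- runif(n, 35, 70)     # hypothetical temperatures before each step [F]

# Assumed "true" parameters used to generate the synthetic data.
alpha_true <- -17
beta_true  <- 30
y <- (1 + alpha_true / vol) * yOld - alpha_true * beta_true / vol +
  rnorm(n, sd = 0.05)

# Same model formula as on the slide, with explicit (assumed) starting values.
m_sim <- nls(y ~ (1 + alpha / vol) * yOld - alpha * beta / vol,
             start = list(alpha = -10, beta = 25))
est <- coef(m_sim)
print(est)
```

The estimates should come back near the values used to generate the data. The formula is also the update $T_{\text{new}} = T_{\text{old}} + (\alpha / V)(T_{\text{old}} - \beta)$ written out, so `beta` is the temperature at which the predicted change is zero.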

Residuals

We have more data when it is cooler.

predicted <- fitted(m)
errors <- resid(m)
plot(predicted, errors)

[Figure: Residuals vs. Predicted Temperature]

Real cooling profile vs. Predicted cooling profile

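[Figure legend: red = real, measured data; blue = computed from initial temperature and volume]

nd <- read.csv("UC2_Prediction.dat")
# Time grid in hours: 0 to 58 in steps of 0.25
x <- seq(0, 58 * 4, 1) * 0.25
y <- x * 0
# Start from the first measured temperature and the volume of this batch
y[1] <- nd$m1[1]
vol <- nd$v[1]
# Estimates of the two fitted parameters
alpha <- summary(m)$coefficients[1]
beta <- summary(m)$coefficients[2]
# Apply the fitted one-step cooling model repeatedly
for (i in 2:length(x)) {
  y[i] <- (1 + alpha / vol) * y[i - 1] - alpha * beta / vol
}
plot(nd$t / 60, nd$m1, xlab = "Time Since Cooling [hours]",
     ylab = "Temperature [F]", type = "l", col = "red")
lines(x, y, type = "s", col = "blue")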

Task #3: Principal component analysis for visualization

Principal component analysis (PCA)

• PCA performs dimensionality reduction and de-correlation of dimensions

• The “real” data can lie within a lower-dimensional subspace

• PCA is useful for visualization and pre-processing

• Detect patterns and outliers

• Produce fewer and uncorrelated dimensions

• Good for data with relatively few labels

Data preprocessing

• We have a lot of data from sensors; it is a bit tricky to use PCA directly (too many points, missing data)

• More importantly, we want to see when cycles are similar and when they differ

• We represented each fermentation cycle by 31 features, like

– highest ADF

– mean/min/max temperature during fermentation

– volume

– mean/min/max temperature during cooling

– duration of fermentation

– duration of cooling

– …...
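Turning a long sensor log into one row of features per fermentation cycle can be sketched with `aggregate()`. The data frame `readings`, its columns, and the three cycles below are synthetic placeholders, and only a handful of features are computed. The actual 31 features are not listed in full on the slide.

```r
# Hypothetical long-format log: one row per reading, tagged with cycle and stage.
set.seed(3)
cycle <- rep(c("A", "B", "C"), each = 50)
stage <- rep(rep(c("Fermentation", "Cooling"), times = c(30, 20)), times = 3)
temp  <- ifelse(stage == "Fermentation", 68, 40) + rnorm(150)
readings <- data.frame(cycle, stage, temp)

# Summarize the fermentation stage of each cycle by several statistics at once.
ferm <- readings[readings$stage == "Fermentation", ]
features <- aggregate(temp ~ cycle, data = ferm,
                      FUN = function(v) c(mean = mean(v), min = min(v),
                                          max = max(v), n = length(v)))

# aggregate() returns the statistics as one matrix column; flatten it.
features <- do.call(data.frame, features)
print(features)
```

Repeating the same step for the cooling stage and merging on `cycle` would extend the table toward the kind of per-cycle representation described above.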

Visualization in 3D

• We scale the data and compute the principal components

• We project the data onto two or three principal components that represent the most variance and plot them

[Figure: how much variance is captured by the i-th principal component]

# Apply PCA. scale. = TRUE is highly advisable, but the default is FALSE.
beers.pca <- prcomp(beers.data,
                    center = TRUE,
                    scale. = TRUE)
# Plot the variance captured by each principal component
plot(beers.pca, type = "l")
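Since `beers.data` is not included in the transcript, the sketch below runs the same `prcomp()` call on a synthetic feature matrix. The matrix `X`, its six features, and the two hidden factors that generate it are assumptions made up for illustration.

```r
set.seed(10)
n <- 60
latent   <- matrix(rnorm(n * 2), ncol = 2)          # two hidden factors
loadings <- matrix(runif(2 * 6, -1, 1), nrow = 2)   # how factors drive 6 features
X <- latent %*% loadings + matrix(rnorm(n * 6, sd = 0.1), ncol = 6)
X[, 1] <- X[, 1] * 1000   # one feature on a much larger scale than the rest
colnames(X) <- paste0("feature", 1:6)

# Center and scale so that the large-scale feature does not dominate.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Coordinates of every row on the first three components, ready for plotting.
scores <- pca$x[, 1:3]

# predict() on the training data reproduces the stored scores.
projected <- predict(pca, newdata = X)
```

The cumulative shares printed above are what a scree plot such as `plot(pca, type = "l")` visualizes, and they help decide whether two or three components are enough for a picture.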

Visualization in 3D

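• "Realtime Hops"

• "Red Wonder"

• "Alistair"

• "Kennedy"

• "Kerberos"

library(scatterplot3d)  # provides scatterplot3d()

# Project the data onto the principal components
beers.projected <- predict(beers.pca,
                           newdata = beers.data)
# Plot the first three components, colored by beer
scatterplot3d(beers.projected[, 1],
              beers.projected[, 2],
              beers.projected[, 3],
              color = beers.labels,
              main = "PCA")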

Similar members

Outliers

Lessons Learned

Lesson Learned

COMPANY and GOAL: Predicting Apparent Degree of Fermentation (ADF)

CHALLENGE: Linear model of ADF as a function of time is insufficient

• ADF starts negative

• Errors are correlated with predicted values (BAD!)

• Normality test values are not near model (not presented)

SOLUTION: Use domain knowledge to develop a better model

• Existence of three phases of fermentation suggests three segments in the linear model

RESULTS: Piecewise linear model is much improved over basic linear model

• Errors are not correlated with predicted values

• Normality test values are better (not presented)

• Could consider other attributes

Lesson Learned

COMPANY and GOAL: Predict cooling of fermented beer

CHALLENGE: Want to predict cooling trend, completion times

• Many attributes available, including starting temperature, volume, and time

SOLUTION: Basic thermodynamics function for cooling

• Use data to find values of constants

RESULTS: Basic cooling model works well

• Errors are not correlated with predicted values

• Could build complex model

Lesson Learned

COMPANY and GOAL: Visualizing high-dimensional data

CHALLENGE: Want to see similarities and differences between runs

• Data in form of time series with many attributes

• Humans need at most 3-D data

SOLUTION: Use PCA on run-focused data

• Represent situation of interest

• Identify reduced dimensionality

RESULTS: Can see outliers and patterns

• Some beers are consistent

• Others include outliers

References for further study

• Gebhard Kirchgässner and Jürgen Wolters, Introduction to Modern Time Series Analysis, Springer Berlin Heidelberg New York, 2007.

• Peter J. Brockwell and Richard A. Davis, Introduction to Time Series and Forecasting, 2nd ed., Springer Texts in Statistics, Springer Science+Business Media, LLC, New York, NY, USA, 2002.

• James D. Hamilton, Time Series Analysis, Princeton University Press, Princeton, NJ, USA, 1994.

• NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, 2018.

• http://www.r-tutor.com

• http://wineserver.ucdavis.edu/industry/enology/fermentation_management/wine/problem_fermentations.html

Contact Information

Brian D. Davison [email protected]

Associate Prof., Lehigh University

Slides and R source code available from:

http://www.cse.lehigh.edu/~brian/PIW18/

[Photo: Lehigh University's Linderman Library]

Questions?

Please wait for the microphone

State your name & company

Please rate this session in the mobile app!

Search “OSIsoft” in your app store
