
M. A. Sibley Consulting – All Rights Reserved MTB 07 – Run Status Charts and Advanced Concepts rev: 2020-02-27 1

MTB – 07 Run Status Charts and Advanced Concepts

This module will cover:

• Step by Step creation of a Run Status Graph

• Multiple Capability Analysis – Screening many processes to see which need follow up

• Creating “Stoplight” charts

• “Growth with Seasonality” – A worked example of data analysis of a challenging time oriented dataset

• “Pi by Cannonball” - A Worked example to show:

– The use of “sub” macros to make code more readable

– Obtaining input from users

– Looping

– Use of constants especially w.r.t. graphing

– Adding calculated lines to graphs

– Creating a row number variable

– The use of subscripts to access a specific row’s value

• Performance vs Model – how to detect improvement in the face of confounding variables – eg. Energy performance


Multi-Capability:

A Macro with explanation for viewing the Process Capability of dozens of processes simultaneously


Multi-Capability

Minitab® has good capability to summarize the Capability of a Process. For example, consider this process:

Figure: Process Capability of Prop (histogram with LSL, Target and USL marked; X axis from 114 to 126)

Process Data: LSL 114; Target 120; USL 126; Sample Mean 119.932; Sample N 187; StDev(Overall) 1.98672

Overall Capability: Pp 1.01; PPL 1.00; PPU 1.02; Ppk 1.00; Cpm 1.01

Observed Performance: % < LSL 0.00; % > USL 0.00; % Total 0.00

Exp. Overall Performance: % < LSL 0.14; % > USL 0.11; % Total 0.25

This process is almost exactly on aim.

This process is on the borderline of being capable since the Ppk is 1.

Tolerance is the distance between the Upper and Lower Specification Limits: 12 in this example. A Standard Deviation of 2 would make the Pp exactly 1, since the Tolerance would then be exactly 6 times the Standard Deviation (6 × 2 = 12).

I’ll term the Standard Deviation which is exactly 1/6th of the Tolerance the Critical Standard Deviation, or Sc.


Multi-Capability

How do we compare the Capability of dozens of Processes in one view? We could be dealing with different aims, specification limits, etc. What we need to do is to compare different processes on a common scale.

Using the definition of Critical Standard Deviation or Sc = 1/6th of the Tolerance, then we can “Normalize” the Mean and Standard Deviation of a Process. (We are assuming symmetrical limits i.e. the Target is midway between the LSL and USL).

– XbarN = Normalized Average = (Xbar – Target) / Sc, where Sc = (USL – LSL) / 6

= distance from Target in units of Critical Standard Deviations

– SN = Normalized Standard Deviation = S / Sc


Multi-Capability

Take this example process:

Figure: Process Capability of Prop (histogram with LSL, Target and USL marked; X axis from 114 to 126)

Process Data: LSL 114; Target 120; USL 126; Sample Mean 118.31; Sample N 187; StDev(Overall) 1.62473

Overall Capability: Pp 1.23; PPL 0.88; PPU 1.58; Ppk 0.88; Cpm 0.85

Observed Performance: % < LSL 0.00; % > USL 0.00; % Total 0.00

Exp. Overall Performance: % < LSL 0.40; % > USL 0.00; % Total 0.40

For this example,

Sc = 1/6 * (126 – 114) = 2

– XbarN = (Xbar – Target) / Sc = (118.31 – 120) / 2 = –0.845

– SN = S / Sc = 1.625 / 2 = 0.812
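The normalization above can be sketched in Python (an illustration only, not the Minitab macro). The function names are hypothetical. The expression for Ppk in normalized units is not stated on the slides; it follows from the standard definition Ppk = min(USL – Xbar, Xbar – LSL) / (3 S) once the limits are assumed symmetric about the Target, so that each limit sits 3 Sc from the Target.

```python
def normalize(mean, stdev, lsl, usl, target=None):
    """Return (xbar_n, s_n) in units of the critical standard deviation Sc."""
    if target is None:
        target = (lsl + usl) / 2  # assumption: symmetric specification limits
    sc = (usl - lsl) / 6          # critical standard deviation
    return (mean - target) / sc, stdev / sc


def ppk_from_normalized(xbar_n, s_n):
    """Ppk in normalized units: (3 - |XbarN|) / (3 * SN)."""
    return (3 - abs(xbar_n)) / (3 * s_n)


# Values taken from the example process on this slide.
xbar_n, s_n = normalize(mean=118.31, stdev=1.62473, lsl=114, usl=126)
ppk = ppk_from_normalized(xbar_n, s_n)
print(round(xbar_n, 3), round(s_n, 3), round(ppk, 2))
```

With the slide's figures this agrees with the Ppk of 0.88 reported by the capability analysis, which is a useful check that the normalized pair carries the same information.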


Multi-Capability

The next question is how to display the capability of many processes on one graph. Consider a graph like:

Figure: Contour Plot of Ppk vs Sn, XBarN (X axis: XBarN from -3 to 3; Y axis: Sn from 0.0 to 2.0; contour bands for Ppk: < 0.0, 0.0 – 1.0, 1.0 – 1.3, 1.3 – 2.0, > 2.0)

This is not a novel approach, but it does provide a means to compare many processes at once.

You can see where the process shown on the previous slide (XbarN = - 0.845, SN = 0.812) is situated.

Its Ppk of 0.88 is in the “incapable” region.

Along this line Ppk = 1

If you take pairs of XbarN and SN and calculate Ppk, you can create the contour plot shown.
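A rough Python sketch of how such pairs could be generated follows. It is not the macro's code: the grid steps and function names are assumptions, and the Ppk expression assumes symmetric limits (each limit 3 Sc from the Target). SN = 0 is skipped because Ppk is undefined there.

```python
def ppk_from_normalized(xbar_n, s_n):
    """Ppk in normalized units, assuming symmetric specification limits."""
    return (3 - abs(xbar_n)) / (3 * s_n)


def ppk_grid(x_step=0.5, s_step=0.2):
    """Return (xbar_n, s_n, ppk) triples covering the plotted ranges."""
    rows = []
    n_x = int(round(6 / x_step))   # XBarN runs from -3 to 3
    n_s = int(round(2 / s_step))   # Sn runs from 0 to 2
    for i in range(n_x + 1):
        xbar_n = -3 + i * x_step
        for j in range(1, n_s + 1):  # start at 1 to avoid dividing by SN = 0
            s_n = j * s_step
            rows.append((xbar_n, s_n, ppk_from_normalized(xbar_n, s_n)))
    return rows


grid = ppk_grid()
# A process exactly on aim with S equal to Sc sits on the Ppk = 1 line.
on_aim_critical = [p for x, s, p in grid if x == 0 and abs(s - 1.0) < 1e-9]
```

Feeding the three columns of `grid` to any contour routine gives the banded picture: the Ppk = 1 line runs from (XBarN = 0, Sn = 1) down to Sn = 0 at XBarN = ±3.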


Multi-Capability

Here is a graph which shows the capability for 12 processes:

… but what if you had many more to compare at one time?


Multi-Capability

Here 27 processes are compared on one graph using the “Normalized” Xbar and S view:

Figure: Plot of Normalized Std Dev., Mean vs Ppk (X axis: XBarN from -3 to 3; Y axis: Sn from 0.0 to 2.0; background bands by PpkRegion: < 0.0, 0.0 – 1.0, 1.0 – 1.3, 1.3 – 2.0, > 2.0; one labelled symbol per VarName, e.g. HLI_A123, KRK_B423, NRM_X37). Last Updated: 2012-02-20 17:09

This process is nearly on aim and has very good process capability

This process is off aim (low) but has a low enough standard deviation that it could be marginally capable by just getting the process on aim!


Multi-Capability

So how were the graphs generated? … by a complex Minitab® macro.

This presentation will work through the macro as a learning exercise…

You will see screen shots from my favourite text editor – EditPadPro. The next slide shows some of the features.


Multi-Capability

Multiple files can be open at once. You can search all open files, easily copy text from one file to another…

Syntax colouring (eg. Green italics for comments) makes the code more readable and lets you know you’ve spelled keywords correctly.

The file navigation window shows you the structure of your file and lets you jump to specific points in your file.

The clip collection lets you store text that you can quickly insert into your text by dragging or double clicking.

Extensive Search & Replace including use of Regular Expressions.

Split window editing (not shown) is supported.

“Folding” allows sections of code to be hidden.


Multi-Capability

Etc.

INSTRUCTIONS in the header comments:


Multi-Capability

Overall structure of macro:

The main macro calls 5 sub-macros.

The call to this sub-macro would be commented out (or deleted) when analyzing “real” data.


Multi-Capability

Overall structure of the macro can also be seen from the File Navigation Panel in my “fancy” text editor - EditPadPro:

The main macro calls 5 sub-macros.

DO Loops

Layout command used to create a 2 panel graph

Layout command used to create a 2 layer graph. (Contour Graph does not show up in File Navigation Panel)


Multi-Capability

Example data creation…

This creates 27 columns of data.

A “real” analysis would not use this sub-macro.

Code continues creating 27 columns of data…


Multi-Capability

The last section of this sub-macro creates the table of column numbers, aims and tolerances…

The set command allows you to enter data into a column…


Multi-Capability

The capability command code in this section has been “folded” so you can see the overall structure… The code is on the next page…

These columns receive the output of the Capability Command.

These columns store the result of capability analysis on one column in a table of results.

The DO loop will need to know how many columns of data there are to analyze

K81 is the loop counter so we are defining parameters which can be used when calling the Capability Analysis

The output from the analysis is stored, for the capability analysis just completed.


Multi-Capability

Here is the code executed inside the loop shown on the previous page…


Multi-Capability

Here is the worksheet after the fourth trip through the loop:

These columns receive the output of the Capability Command. These columns are where the table of results is created.


Multi-Capability

The upper graph in the two panel graph has a max. value of 2 so create a variable that will plot as 2 if the value is larger.

Create classifications of the Ppk for the plot

We may be creating several pages of the two panel graph, and we want the variable names to appear in alphabetical order, so we sort all the associated columns (in place).
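The three steps on this slide (capping a value at 2, classifying Ppk, sorting by name) can be sketched in Python. This is only an illustration of the logic, not the macro's code. The thresholds 1.0 and 1.3 are taken from the contour legend, the red / yellow / green labels from the stoplight description later in the deck, and the rows of data are hypothetical.

```python
def cap(value, maximum=2.0):
    """Plot values larger than the axis maximum at the maximum itself."""
    return min(value, maximum)


def ppk_status(ppk, marginal=1.0, capable=1.3):
    """Classify a Ppk value into one of three plotting groups."""
    if ppk < marginal:
        return "red"      # incapable
    if ppk < capable:
        return "yellow"   # marginal
    return "green"        # capable


# Hypothetical results table: one row per analyzed column.
results = [
    {"name": "NRM_X37", "ppk": 0.88, "s_n": 0.81},
    {"name": "HLI_A123", "ppk": 1.45, "s_n": 2.60},
    {"name": "KRK_B423", "ppk": 1.10, "s_n": 0.70},
]

for row in results:
    row["s_n_plot"] = cap(row["s_n"])
    row["status"] = ppk_status(row["ppk"])

# Sort in place so variables appear alphabetically across pages.
results.sort(key=lambda row: row["name"])
names = [row["name"] for row in results]
```

Sorting whole rows keeps every statistic attached to its variable name, which is the point of sorting all the associated columns together.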


Multi-Capability

Minitab® macro code has limitations so sometimes work-arounds are required. We want to stack all the columns of raw data but how to deal with a variable number of columns?

The work-around is to store all the column numbers in constants. Once there are no more columns to assign, the remaining constants are defined as 205 so that they point to a column holding just a single missing data point.

Stack the (up to) 60 columns worth of data.

A work-around needed from the work-around!


Multi-Capability

After all the code has executed, you are left with a table of statistics for each column in columns 210-219 and the stacked data in columns 220+. For the example data the stacked data columns are 5044 rows long.

These two columns are created by the stack command.


Multi-Capability

As described earlier we can calculate “Normalized” Mean and Standard Deviation now that we have our table of statistics.

We also calculate a page number since our two panel graph will be limited to 12 variables per page.
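The page number calculation can be sketched as integer division on the position of each variable in the sorted table. This is a Python illustration with a hypothetical function name; the limit of 12 per page and the count of 27 variables come from the slides.

```python
def page_number(position, per_page=12):
    """Return the 1-based page for the variable at a 1-based position."""
    return (position - 1) // per_page + 1


# 27 variables, as in the example data.
pages = [page_number(position) for position in range(1, 28)]
last_page = max(pages)  # end value for a loop over pages
```

For 27 variables this yields two full pages of 12 and a third page of 3.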


Multi-Capability

The Convert command is what is called up interactively as Data > Code > Use Conversion Table.

We use it to add additional columns based on the “table” of information created earlier.


Multi-Capability

This code creates a constant with text like “Last Updated: 2012-02-25” which can be used as the sub-title for the graph. This is helpful for auto-updating graphs to make sure you know if the update has happened.
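As an illustration of the idea (in Python rather than Minitab macro code), a sub-title string of this form can be built from the current date. The function name is hypothetical.

```python
from datetime import datetime


def last_updated_subtitle(now=None):
    """Build a sub-title such as 'Last Updated: 2012-02-25'."""
    if now is None:
        now = datetime.now()
    return "Last Updated: " + now.strftime("%Y-%m-%d")


# A fixed date is passed here so the result is reproducible.
subtitle = last_updated_subtitle(datetime(2012, 2, 25))
```

Calling `last_updated_subtitle()` with no argument stamps the graph with today's date, so a stale graph is easy to spot.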

We create a graph with up to 12 variables per page. Max(Page) is the highest number page and this is used for the DO loop end value.

What would happen if our data only had one or two of the capability ranges of red / yellow / green? Then our graph would only use 2 plot symbols & they might not match red = incapable, etc.

The workaround?? Create 3 rows of data (for each page) that have each of the 3 status values but the Ppk value set to missing so they do not actually plot. We have to define values for some other columns so all the columns to be plotted are the same length. This workaround makes the code robust – i.e. able to handle variables with any combination of the three status values.
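The placeholder-row workaround can be sketched in Python as follows. This shows the logic only: the field names and example rows are hypothetical, and `None` stands in for Minitab's missing value.

```python
STATUSES = ("red", "yellow", "green")


def add_placeholder_rows(rows):
    """Append one non-plotting row per status for every page.

    The Ppk of each placeholder is missing (None), so nothing is drawn,
    but every page still contains all three status groups.
    """
    padded = list(rows)
    for page in sorted({row["page"] for row in rows}):
        for status in STATUSES:
            padded.append(
                {"page": page, "name": "", "status": status, "ppk": None}
            )
    return padded


# Hypothetical page where every process happens to be capable.
rows = [
    {"page": 1, "name": "HLI_A123", "status": "green", "ppk": 1.6},
    {"page": 1, "name": "KRK_B423", "status": "green", "ppk": 1.4},
]
padded = add_placeholder_rows(rows)
statuses_on_page_1 = {row["status"] for row in padded if row["page"] == 1}
```

Because the groups are always the same three, the first symbol always belongs to the same status regardless of which statuses the real data contain.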


Multi-Capability

The rest of the DoPlot sub-macro:

The text editor’s ability to fold (hide) text has been used so that none of the subcommands for the 2 boxplots are shown. They can be seen on the next 2 slides.

We create a constant with the desired title including the page number (which is stored in constant K82).

We create a constant with the desired full file name / path including page number.

We store the graph as a JPEG file (which can be viewed from a web page if it has a suitable link). The replace sub-command is important if we want to run this macro automatically. Without it, if the graph already existed you would be prompted to ask if you want to replace it.


Multi-Capability

The top panel of the graph layout of the type shown on slide 6.

The graph is created interactively and then the code is copied into the clipboard via Copy > Command Language. From there it is tweaked with use of constants, comments, etc.

The figure sub command says here that this panel will occupy the top 30% of the overall graph area.


Multi-Capability

The bottom panel of the graph layout of the type shown on slide 6.

The Grouping vs StackMean StackOSStDev and StackPpk is a handy way to label the axis with useful information. We have used the “NoEmpty” subcommand so that we do not get all combinations of these variables!!

The figure sub command says here that this panel will occupy the bottom 70% of the overall graph area.


Multi-Capability

The Summary Plot Sub-macro:

The text editor’s ability to fold (hide) text has been used so that none of the subcommands for the 2 graphs are shown. They can be seen on the next 2 slides.

This code creates a two dimensional array of XBar and S values which can be used to calculate Ppk values. Those Ppk values are then plotted via a contour plot. (See slide 7).

This time a layout is used not for 2 panels but to plot two graphs on top of each other. The contour plot shows the three Ppk regions and the (Scatter) Plot will place the symbols for each variable.


Multi-Capability

The Contour Plot:

This syntax says create tick values from -3 to 3 in steps of 1.


Multi-Capability

The Scatter Plot:


Multi-Capability

Summary:

– The capability of many processes can be graphically summarized on a common basis by “normalizing” the mean and standard deviation vs the Specification Limits.

– The macro code needed to do this looks daunting but can be developed one step at a time.

– The macro code was developed in such a way that it could easily be used in an auto-job providing a frequently updated view of process capability, greatly streamlining preparations for a process capability review.


Growth With Seasonality:

A worked example


Growth With Seasonality – A Practical Analysis rev. 2012-02-15

This example is derived from a posting in the LinkedIn® Minitab®* Discussion Forum by “Subodh” on 2012-02-09. I named his variable “Revenue” to make the example seem more concrete. Here is his quarterly data:

Figure: Time Series Plot of Revenue (X axis: Index from 1 to 60; Y axis: Revenue from 0 to 3500)

His posting was:

How I can use S-curve and Seasonality together?

In trend analysis, S-curve fits better. The data show seasonal pattern also. How to combine the two for decomposition and is it appropriate?

* “MINITAB® and all other trademarks and logos for the Company’s products and services are the exclusive property of Minitab Inc. All other marks referenced remain the property of their respective owners. See minitab.com for more information.”

“Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.”



A Practical Analysis

• First I created a couple of variables:

– Period:

The “Partial Sum” Function (Pars), adds the values up in this and all preceding rows. Thus if a column had 1 in each row, the Partial Sum in Row 1 would have a value of 1, in Row 2 a value of 2, etc.

‘Revenue’<>-1 is a condition that is always true i.e. has a value of 1 so the partial sum of this will essentially give us a row index value of 1 for Row 1, 2 for Row 2, etc.

This is easier than Calc > Make Patterned Data because we don’t need to know the number of rows involved.




– Quarter:

The “Modulus” Function (Mod) returns the remainder after integer division, e.g. Mod(11,4) returns 3 since 4 divides into 11 twice with a remainder of 3.

mod('Period'-1,4)+1 returns the quarter for each data point (assuming the first data point is for a 1st quarter). Mod(X,4) would return values of 0, 1, 2 and 3 so we add 1. The –1 makes the first row return a value of 1.

Our Worksheet so far: (It is assumed the first row represented a 1st quarter!)
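The two calculated columns can be mirrored in Python to check the logic. This is a sketch, not Minitab code: `accumulate` plays the role of the Partial Sum function, `%` the role of Mod, and the revenue values are hypothetical.

```python
from itertools import accumulate

# Hypothetical quarterly values; only the number of rows matters here.
revenue = [310, 295, 330, 352, 341, 329]

# 'Revenue' <> -1 is true (1) on every row, so its running total is 1, 2, 3, ...
always_true = [1 if value != -1 else 0 for value in revenue]
period = list(accumulate(always_true))

# Subtract 1 before taking the remainder, then add 1 back: 1, 2, 3, 4, 1, 2, ...
quarter = [(p - 1) % 4 + 1 for p in period]
```

As on the slide, no row count is needed in advance: both columns are derived from whatever data is present.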




• Next I created a fitted line plot (Stat > Fitted Line Plot), requesting a quadratic fit and storing the Residuals, Fits and equation of best fit coefficients.

• I also requested the “Four in one” residuals plot.


Growth With Seasonality – A Practical Analysis

• Here is the output:

Figure: Fitted Line Plot of Revenue vs Period. Revenue = 271.7 - 8.227 Period + 0.8311 Period**2 (S 136.362; R-Sq 97.6%; R-Sq(adj) 97.5%)

Figure: Residual Plots for Revenue (Normal Probability Plot; Versus Fits; Histogram; Versus Order)

“Heteroscedasticity” is seen which means the variance of the residuals is not constant across all values of the Y variable. Here the variance of the residuals increases as Y increases. This violates one of the assumptions for regression analysis so we need to interpret the results with caution.

The fit to the quadratic line is quite good with an Adjusted R-Squared of 97.5%.




• To make the residuals more consistent across values of Y, I calculated a new variable “ResidPerCentOfFit” calculated as ('RESI1'/'FITS1')*100 or in other words, the residual as a percentage of the fit value. These residuals now appear to be consistent across time.
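The calculation can be sketched in Python using the quadratic equation reported on the fitted line plot. The function names and the observed value of 800 are hypothetical; in Minitab the stored RESI1 and FITS1 columns supply these numbers directly.

```python
def fitted_revenue(period):
    """Quadratic fit reported on the fitted line plot."""
    return 271.7 - 8.227 * period + 0.8311 * period ** 2


def resid_percent_of_fit(actual, period):
    """Residual expressed as a percentage of the fitted value."""
    fit = fitted_revenue(period)
    return (actual - fit) / fit * 100


# Hypothetical observation: revenue of 800 in period 30.
fit_30 = fitted_revenue(30)
example = resid_percent_of_fit(actual=800.0, period=30)
```

Dividing by the fit rescales each residual by the size of the quantity being predicted, which is why the spread no longer grows with Revenue.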

Figure: Scatterplot of ResidPerCentOfFit vs Period, grouped by Quarter (X axis: Period from 0 to 70; Y axis: ResidPerCentOfFit from -20 to 20; reference line at 0)

To generate this graph I used Graph > Scatterplot and chose “With Connect and Groups” from the gallery. Once the graph was generated I double clicked on one of the connect lines and erased the “grouping” variable so there was only 1 connect line (instead of 4).

I also added a reference line at 0.




• Another way to look at the new Residuals as a Percentage of Fit is by splitting out the 4 quarters:

Figure: Scatterplot of ResidPerCentOfFit vs Period, panelled by Quarter (Quarter = 1 to Quarter = 4; X axis: Period from 0 to 60; Y axis from -20 to 20; reference line at 0)

It looks like something non random is happening with the 4th quarter results. They have gone from below “average” to above.

In fact if we do a linear regression of these 4th Quarter residuals, the P-Value for the model is 0.000 and 74% of the variability is explained by the linear relationship!




• Another way to look at the data is a boxplot by quarter for these residuals:

Figure: Boxplot of ResidPerCentOfFit by Quarter (Y axis from -20 to 20; reference line at 0; mean labels: Quarter 1 -4.46178, Quarter 2 1.24764, Quarter 3 5.06087, Quarter 4 -1.59224)

It looks like 1st quarter results may be low on average by about 4% from the fit and 3rd quarter results might be high by about 5%. Is this statistically significant?




• To see if the quarters may be statistically different* we can use a One Way ANOVA:

One-way ANOVA: ResidPerCentOfFit versus Quarter

Source   DF      SS     MS     F      P
Quarter   3   790.7  263.6  2.64  0.058
Error    59  5889.8   99.8
Total    62  6680.6

S = 9.991   R-Sq = 11.84%   R-Sq(adj) = 7.35%

Individual 95% CIs For Mean Based on Pooled StDev

Level   N    Mean   StDev
1      16  -4.462   7.289
2      16   1.248   9.666
3      16   5.061  10.227
4      15  -1.592  12.313

Figure: character plot of the individual 95% CIs for each level (axis from -5.0 to 10.0)

* We must use caution in interpreting these results as we have created an indirect variable based on a model which had heteroscedasticity.

The p-value is on the borderline of significance at 0.06, but we know the 4th quarters exhibited non random behaviour. If we exclude 4th quarters, the p-value for the one way ANOVA of the 3 remaining quarters is 0.018.




• Conclusions:

– A quadratic relationship provides a good fit to the data.

– The fourth quarter was initially lower than “expected” and then became higher than “expected”.

– The first quarter yields lower than “expected” results by about 4% and third quarters higher than “expected” by about 5%.




Follow Up Questions on LinkedIn® Minitab® Discussion Forum asked essentially:

1. If you choose the cubic instead of quadratic then the R-sq(adj) is the same but the R-Sq is 0.1% higher. Does it make much of a difference or should you just apply the parsimony concept for practicality?

2. Also in slide 5 from your website showed that it does not fulfill the ANOVA assumption as it does not have the equal variance (heteroscedasticity). So, how much can ANOVA help in the analysis? Hope this will help to clear my confusion.




• Question 1: Higher Order Model

– Adding terms or factors to a model will increase R-Sq but the desired model is the one that has the best predictive power. With a higher order model (cubic with 4 terms vs quadratic with 3 terms) you may well be fitting noise. The R-Sq(adj) takes into account the “degrees of freedom” for the model and gives a better measure of whether or not the model has improved.

– Generally the KISS (Keep it Simple Stupid) approach, or as you phrase it, parsimony, is recommended. If the more complex model doesn’t appreciably improve the fit, stick with the simpler model.
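How R-Sq(adj) penalizes extra terms can be sketched with the usual textbook formula, R-Sq(adj) = 1 – (1 – R-Sq)(n – 1)/(n – p – 1), where p is the number of predictor terms excluding the constant. The formula is not quoted on the slides; n = 63 is the number of data points in this example and 0.976 is the rounded R-Sq from the fitted line plot.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with a constant plus n_predictors terms."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Quadratic fit: Period and Period**2 are the two predictor terms.
quadratic = adjusted_r_squared(0.976, n_obs=63, n_predictors=2)

# Same R-Sq with one more term: the adjusted value goes down, not up.
extra_term_no_gain = adjusted_r_squared(0.976, n_obs=63, n_predictors=3)
```

The first value rounds to the 97.5% shown on the plot. A term that adds nothing to R-Sq lowers R-Sq(adj), which is why an unchanged R-Sq(adj) for the cubic argues for keeping the quadratic.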




• Question 2: Is ANOVA valid given heteroscedasticity?

– If you look at slide 9 you will notice that the ANOVA is not on the original variable but instead the “Residual As Per Cent Of Fit” variable which, as the Graph on Slide 7 shows, has reasonably uniform variance (but does have the linear trend in 4th quarter results).

– Stat > ANOVA > Test for Equal Variances… on ResidPerCentOfFit vs Quarter does not show an issue with unequal variances – i.e. the p-value is not low. This is even more evident if the non random 4th quarter results are excluded. (See Graphs next slide).




Figure: Test for Equal Variances for ResidPerCentOfFit (95% Bonferroni Confidence Intervals for StDevs, Quarters 1 to 4). Bartlett's Test: Test Statistic 3.80, P-Value 0.284. Levene's Test: Test Statistic 1.33, P-Value 0.274.

Figure: Test for Equal Variances for NonFourthQuarterPCResiduals (95% Bonferroni Confidence Intervals for StDevs, Quarters 1 to 3). Bartlett's Test: Test Statistic 1.78, P-Value 0.410. Levene's Test: Test Statistic 0.46, P-Value 0.632.

The P-Values are NOT low so the Null Hypothesis of Equal Variance does NOT have to go, i.e. we do not have evidence of unequal variance.



A Practical Analysis

Another follow up to the analysis from Anton:

• Guys: keep it graphical, to start any analysis. Looking at the plots categorised by period:

1. This looks like an exponential growth situation, so dealing with the logs of the data is appropriate- not quadratics or cubics. Taking logs or using a log Y axis linearises most of the dataset nicely, so that supposition is correct.

2. The first few periods don't fit the pattern- they show lower growth than subsequent periods, and is this is the oldest data, so it's probably best to discard it. You can exercise your judgement about how much to discard, but possibly the first 7 or 8 periods looks about right.

3. The next strongest feature is the steady, strong gain of Q4 relative to the other 3 quarters.

4. Finally, there are a few sudden steps in the data: depending on the source of the data you may find explanations such as public holidays etc.




• Anton raises some excellent points. A statistical model may help you fit past behaviour, but it doesn’t give you insight as to the why for the behaviour. It is best to use experimentation and analysis to gain insight into the causes of the response. For example, in a heat transfer situation your statistical model might confirm a first principles model based on the physics surrounding heat transfer. In the example presented, if the response is something like revenue then exponential growth is a more likely underlying principle than quadratic growth. The motion of a projectile might better fit a parabola and thus a quadratic equation.

• Taking the log of our Y variable also makes it easier to graphically interpret the data:


Figure: Log(Revenue) vs Period, grouped by Quarter (X axis: Period from 0 to 70; Y axis: Log(Revenue) from 5.0 to 8.0)

Using the Calculator function we generate a new column based on Log(Revenue).

As Anton suggested, in this view the slower growth in the first number of periods is highlighted.

The strange behaviour of the 4th quarter from “underperforming” to “overperforming” (assuming more is better) is apparent.



A Practical Analysis

• This refined analysis improves the fit of the model slightly – RSq-Adj for a linear regression fit to the LogRevenue variable is 98.1% vs 97.5% and the residuals no longer show heteroscedasticity. Moreover, extrapolating into the future (a dangerous but often helpful thing to do), this new model is likely to be more accurate based on the underlying principle of exponential growth.

• A minor point relative to Anton’s point No.2. Discarding the first number of points does little to change the analysis. The table to the right shows the R-Squared Adjusted for the Log(Revenue) linear regression fit after skipping the first N values.

• Lots of issues to ponder from 63 data points!
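Why a log transform suits exponential growth can be shown with a small Python sketch. The series below is synthetic (a hypothetical start value of 200 growing 5% per period), not the forum data. The natural log is used, which appears consistent with the 5.0 to 8.0 axis range of the Log(Revenue) plot for revenues up to about 3500.

```python
import math


def exponential_series(start, growth_rate, n_periods):
    """Synthetic series growing by a fixed percentage each period."""
    return [start * (1 + growth_rate) ** t for t in range(n_periods)]


revenue = exponential_series(start=200.0, growth_rate=0.05, n_periods=8)
log_revenue = [math.log(value) for value in revenue]

# Period-to-period change in the raw series keeps growing...
raw_steps = [b - a for a, b in zip(revenue, revenue[1:])]
# ...while the change in the logged series is constant (a straight line).
log_steps = [b - a for a, b in zip(log_revenue, log_revenue[1:])]
```

Every entry of `log_steps` equals log(1.05), so a straight line fits the logged series exactly, and percentage-sized deviations in the raw data become constant-sized deviations after the transform.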


Estimating π by the Cannonball Method

Pi By Cannonball

I’m too late for the celebration of Pi day (3/14) but years ago I read a Scientific American article (I think it was by Martin Gardner) that had a whimsical way of estimating Pi. It was a Monte-Carlo method using the premise of firing cannonballs into a circular lake…


Pi By Cannonball


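Figure: a circular lake of radius 1 centred at (0,0), inside a square with corners at (1,1), (-1,1), (-1,-1) and (1,-1)

The area of the square is 2 * 2 = 4 units. The area of the lake is Pi * 1 * 1 = Pi units.

If a cannon shoots with equal probability of landing anywhere inside the square then the ratio of “splashes” to total shots (“splashes” + “thuds”) estimates Pi / 4, so 4 times that ratio can be used to estimate Pi.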

Pi By Cannonball


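Figure: a shot landing at (x,y), at distance r from the centre of the Lake; the Land lies between the Lake and the edge of the square

If r is < 1 then the cannonball lands in the lake and we have a splash.

If r is > 1 then we have a thud.

Using Minitab’s Calc > Random Data function we can bring this theory to life.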
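The method can be sketched in a few lines of Python (an illustration only; the deck's own implementation is a Minitab macro shown in screen shots). The function name and the seed are hypothetical, and r is taken to be the distance of the shot from the centre, so the test r < 1 is done on x² + y² to avoid a square root.

```python
import random


def estimate_pi(n_shots, seed=None):
    """Estimate Pi as 4 * splashes / shots for uniform shots on the square."""
    rng = random.Random(seed)
    splashes = 0
    for _ in range(n_shots):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:  # r < 1: the shot lands in the lake
            splashes += 1
    return 4.0 * splashes / n_shots


estimate = estimate_pi(3000, seed=1)
first_shot = estimate_pi(1, seed=1)
```

With a single shot the estimate can only be 0 or 4; it settles towards Pi slowly as shots accumulate.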

Pi By Cannonball

• Here are some outputs from using this technique:


Figure: Y vs X for each shot, grouped by Splash (0 or 1). Number of shots = 75 (X and Y axes from -1.0 to 1.0)

Figure: Estimate of Pi, Splashes. (No. shots = 75) (two panels, EstimateOfPi and Splash, vs Row from 0 to 80; reference line at 3.14159265)

Last Estimate: 3.30667

After the first shot, the estimate is either 0 or 4.

Pi By Cannonball


Figure: Y vs X for each shot, grouped by Splash (0 or 1) (X and Y axes from -1.0 to 1.0)

Figure: Estimate of Pi and Splashes vs Row from 0 to 200 (two panels, EstimateOfPi and Splash; reference line at 3.14159265)

Last Estimate: 3.1

Pi By Cannonball


Figure: Y vs X for each shot, grouped by Splash (0 or 1) (X and Y axes from -1.0 to 1.0)

Figure: Estimate of Pi and Splashes vs Row from 0 to 3000 (two panels, EstimateOfPi and Splash; reference line at 3.14159265)

Last Estimate: 3.13733

Even when 150,000 shots were used, the estimate was in error in the 3rd decimal place so this is not an efficient method for estimating Pi!
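The slow convergence can be quantified with the standard error of a binomial proportion. This is a textbook result rather than something stated on the slides: each shot is a splash with probability p = Pi/4, so the estimate 4 × (splashes / shots) has standard error 4 × sqrt(p(1 – p)/n). The function name is hypothetical.

```python
import math


def pi_estimate_standard_error(n_shots):
    """Standard error of 4 * splashes / n_shots when P(splash) = Pi / 4."""
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / n_shots)


se_75 = pi_estimate_standard_error(75)
se_3000 = pi_estimate_standard_error(3000)
se_150k = pi_estimate_standard_error(150_000)
```

At 150,000 shots the standard error is still about 0.004, which matches the observation that the third decimal place is unreliable. Because the error shrinks with the square root of the number of shots, each extra decimal place costs about 100 times as many shots.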

Pi By Cannonball

• The code used to generate these graphs is shown in the next slides as a learning example since it illustrates several Minitab® macro programming techniques:

– The use of “sub” macros to make code more readable.

– Obtaining input from users

– Looping

– Use of constants especially w.r.t. graphing

– Adding calculated lines to graphs

– Creating a row number variable

– The use of subscripts to access a specific row’s value

• The screen shots show the code as seen in my favourite text editor – EditPad Pro.


Pi By Cannonball


Pi By Cannonball


Plan view of shots cont’d

Pi By Cannonball


Pi By Cannonball


EstimateAndSplashGraph cont’d

Performance vs Model

• Overview of our approach:

– Analyze past data to develop a model to explain consumption

– Use this model to predict consumption for new data

– Look at the difference between predicted and actual consumption to see if improvement (or backsliding) is present.

– The improvement can be monetized by converting the delta between predicted and actual consumption into $. (eg. Has my investment in new windows in 2006-09 paid off?)



• We will develop our consumption model from my home Natural Gas consumption data for the period 2001-01 through 2006-07:

– NATURALGAS Model Basis Data.MTW

• Let’s start with some simple graphical regression outputs.


NATURALGAS Model Basis Data.MTW



Figure: Fitted Line Plot of MetersCubed vs DegreeDaysC. MetersCubed = 3.911 + 0.4419 DegreeDaysC (S 28.3173; R-Sq 95.3%; R-Sq(adj) 95.2%; X axis: DegreeDaysC from 0 to 900; Y axis: MetersCubed from 0 to 500)

This model seems to miss the curvature in the consumption vs degree days.


• Let’s investigate this apparent curvature:



• Here is the output:

Regression Analysis: MetersCubed versus DegreeDaysC

The regression equation is
MetersCubed = 3.91 + 0.442 DegreeDaysC

Predictor       Coef  SE Coef      T      P
Constant       3.911    5.619   0.70  0.489
DegreeDaysC  0.44191  0.01218  36.27  0.000

S = 28.3173   R-Sq = 95.3%   R-Sq(adj) = 95.2%

Analysis of Variance

Source          DF       SS       MS        F      P
Regression       1  1054827  1054827  1315.46  0.000
Residual Error  65    52122      802
Total           66  1106948


This model shows statistical evidence that MetersCubed and DegreeDaysC are correlated.



Residuals graphs:

Figure: Residual Plots for MetersCubed (Normal Probability Plot; Versus Fits; Histogram; Versus Order)

The residuals show a pattern suggesting curvature.



We try Stat > Regression > Fitted Line Plot & this time choose Quadratic for the Type of Regression Model.

Figure: Fitted Line Plot of MetersCubed vs DegreeDaysC. MetersCubed = 22.78 + 0.2553 DegreeDaysC + 0.000230 DegreeDaysC**2 (S 23.5331; R-Sq 96.8%; R-Sq(adj) 96.7%; X axis: DegreeDaysC from 0 to 900; Y axis: MetersCubed from 0 to 500)

This is a better model as judged visually and by looking at the R-Sq(adj).
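How the quadratic model turns into a performance measure can be sketched in Python. The coefficients are those on the fitted line plot; the function names and the example month are hypothetical, and the sign convention (actual minus predicted, so that negative means lower consumption than the model expects) is an assumption chosen to agree with the negative “After” mean reported for DevnFromPrediction.

```python
def predicted_consumption(degree_days_c):
    """Predicted MetersCubed from the quadratic fit on DegreeDaysC."""
    return 22.78 + 0.2553 * degree_days_c + 0.000230 * degree_days_c ** 2


def deviation_from_prediction(actual_m3, degree_days_c):
    """Actual minus predicted: negative means less gas used than expected."""
    return actual_m3 - predicted_consumption(degree_days_c)


# Hypothetical month: 500 degree days and 200 cubic metres actually used.
predicted_500 = predicted_consumption(500)
example = deviation_from_prediction(actual_m3=200.0, degree_days_c=500)
```

Comparing against the prediction rather than against last year's bill removes the effect of a colder or milder month, which is the confounding variable this section is about.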


• Here is the Minitab code that makes use of this model:

– Home_Data.mac

• From the macro here is the section that creates the two key variables:


Home_Data.mac



Bar / Line graph relating Consumption and Degree Days vs Time:



The non consumption related monthly fee inflates the cost / Cubic Meter in the summer when consumption is low:



A good way to compare year over year consumption and cost:



A Subset of this data was used to develop the model.



Recent years show consumption lower than predicted by the model – fundamental consumption has improved!

The Money Shot!


• To statistically test if consumption is lower we create a new variable that differentiates the period before and after the replacement of windows. We then perform a 2 Sample t-test (Stat > Basic Stats > 2-Sample-t…)




Two-Sample T-Test and CI: DevnFromPrediction, Period

Two-sample T for DevnFromPrediction

Period   N  Mean  StDev  SE Mean
Before  67   0.0   23.2      2.8
After   50  -7.2   19.7      2.8

Difference = mu (Before) - mu (After)
Estimate for difference: 7.28
95% CI for difference: (-0.58, 15.14)
T-Test of difference = 0 (vs not =): T-Value = 1.83  P-Value = 0.069  DF = 113

The test shows borderline significance (P-Value = 0.069) against the hypothesis that before & after are equal.
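The T-Value and DF in this output can be reproduced from the summary statistics with the unequal-variance (Welch) form of the two-sample t-test. The Python sketch below is an independent check, not Minitab's code; it uses -7.28 for the After mean because the output reports the difference as 7.28 while the table rounds the mean to -7.2.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample t-test without pooling variances."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_value, df = welch_t(0.0, 23.2, 67, -7.28, 19.7, 50)
df_truncated = int(df)  # reported as a whole number of degrees of freedom
```

The result agrees with the T-Value of 1.83 and DF of 113 shown above, which suggests the test was run without assuming equal variances.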



If the difference of 7.3 Cubic Meters per month is accurate, then at current costs of $0.371/Cubic Meter the yearly savings is about $32.50 (7.3 × 12 × $0.371), so the replacement was not justifiable financially. Fortunately our motivation was ease of cleaning, appearance and comfort!
