M. A. Sibley Consulting – All Rights Reserved MTB 07 – Run Status Charts and Advanced Concepts rev: 2020-02-27 1
MTB – 07 Run Status Charts
and Advanced Concepts
This module will cover:
• Step by Step creation of a Run Status Graph
• Multiple Capability Analysis – Screening many processes to see which need follow up
• Creating “Stoplight” charts
• “Growth with Seasonality” – A worked example of data analysis of a challenging time oriented dataset
• “Pi by Cannonball” - A Worked example to show:
– The use of “sub” macros to make code more readable
– Obtaining input from users
– Looping
– Use of constants especially w.r.t. graphing
– Adding calculated lines to graphs
– Creating a row number variable
– The use of subscripts to access a specific row’s value
• Performance vs Model – how to detect improvement in the face of confounding variables – eg. Energy performance
Multi-Capability:
A Macro with explanation for viewing the Process Capability of dozens of
processes simultaneously
Multi-Capability
Minitab® has good capability to summarize the Capability of a
Process. For example consider this process:
[Figure: Process Capability of Prop (histogram with LSL, Target and USL marked; x axis from 114 to 126)]
Process Data: LSL 114, Target 120, USL 126, Sample Mean 119.932, Sample N 187, StDev(Overall) 1.98672
Overall Capability: Pp 1.01, PPL 1.00, PPU 1.02, Ppk 1.00, Cpm 1.01
Observed Performance: % < LSL 0.00, % > USL 0.00, % Total 0.00
Exp. Overall Performance: % < LSL 0.14, % > USL 0.11, % Total 0.25
This process is almost exactly on aim.
This process is on the borderline of being capable since the Ppk is 1
In this example, a Standard Deviation of 2 would mean that the Pp would be exactly 1 since the distance between specification limits would be exactly 6 times the Standard Deviation of 2 i.e. 12. Tolerance is the
distance between Upper and Lower Specification Limits - 12 in this example.
I’ll term the Standard Deviation which is exactly 1/6th of
the Tolerance to be the Critical Standard Deviation or Sc
Multi-Capability
How do we compare the Capability of dozens of Processes in one view? We could be dealing with different aims, specification limits, etc. What we need to do is to compare different processes on a common scale.
Using the definition of Critical Standard Deviation or Sc = 1/6th of the Tolerance, then we can “Normalize” the Mean and Standard Deviation of a Process. (We are assuming symmetrical limits i.e. the Target is midway between the LSL and USL).
– XbarN = Normalized Average = (Xbar – Target) / Sc, where Sc = (USL – LSL) / 6
= distance from Target in units of Critical Standard Deviations
– SN = Normalized Standard Deviation = S / Sc
Multi-Capability
Take this example process:
[Figure: Process Capability of Prop (histogram with LSL, Target and USL marked; x axis from 114 to 126)]
Process Data: LSL 114, Target 120, USL 126, Sample Mean 118.31, Sample N 187, StDev(Overall) 1.62473
Overall Capability: Pp 1.23, PPL 0.88, PPU 1.58, Ppk 0.88, Cpm 0.85
Observed Performance: % < LSL 0.00, % > USL 0.00, % Total 0.00
Exp. Overall Performance: % < LSL 0.40, % > USL 0.00, % Total 0.40
For this example,
Sc = 1/6 * (126 – 114) = 2
– XbarN = (Xbar – Target) / Sc = (118.31 – 120) / 2 = -0.845
– SN = S / Sc = 1.625 / 2 = 0.812
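The normalization can be sketched in a few lines of Python (not Minitab macro code; the function name `normalize` is made up for this illustration). It assumes symmetrical limits, as stated earlier, and uses the Process Data values from this example.

```python
def normalize(xbar, s, lsl, usl, target):
    """Return (XbarN, SN): the mean and standard deviation in units of Sc."""
    sc = (usl - lsl) / 6.0          # critical standard deviation: 1/6 of the tolerance
    xbar_n = (xbar - target) / sc   # distance from Target in units of Sc
    s_n = s / sc                    # standard deviation in units of Sc
    return xbar_n, s_n


# Values from the example process above.
xbar_n, s_n = normalize(xbar=118.31, s=1.62473, lsl=114, usl=126, target=120)
print(round(xbar_n, 3), round(s_n, 3))
```

Because both results are expressed in units of Sc, processes with different targets and tolerances land on the same scale.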
[Figure: Contour Plot of Ppk vs Sn, XBarN. X axis: XBarN from -3 to 3. Y axis: Sn from 0.0 to 2.0. Ppk contour bands: < 0.0, 0.0 – 1.0, 1.0 – 1.3, 1.3 – 2.0, > 2.0.]
Multi-Capability
The next question is how to display the capability of
many processes on one graph. Consider a graph like:
This is not a novel approach, but it does provide a
means to compare many processes at once.
You can see where the process shown on the previous slide (XbarN = - 0.845, SN = 0.812) is situated.
Its Ppk of 0.88 is in the “incapable” region.
Along this line Ppk = 1
If you take pairs of XbarN and SN and calculate Ppk, you can create the contour plot shown.
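With symmetrical limits the nearer specification limit is (3 – |XbarN|) critical standard deviations from the mean, so Ppk = (3 – |XbarN|) / (3 * SN). The Python sketch below is an illustration of that relationship, not the macro's own code; the function name and grid spacing are assumptions. It evaluates the formula at the example process and over a small grid of the kind a contour plot needs.

```python
def ppk_from_normalized(xbar_n, s_n):
    """Ppk from the normalized mean and standard deviation (symmetrical limits assumed)."""
    return (3.0 - abs(xbar_n)) / (3.0 * s_n)


# The example process from the previous slide.
example_ppk = ppk_from_normalized(-0.845, 0.812365)

# A coarse grid of (XbarN, SN) pairs covering the plotted ranges.
xbar_values = [x / 2 for x in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
s_values = [s / 5 for s in range(1, 11)]      # 0.2, 0.4, ..., 2.0
ppk_grid = [[ppk_from_normalized(x, s) for x in xbar_values] for s in s_values]

print(round(example_ppk, 2))
```

Setting Ppk = 1 in the formula gives SN = 1 – |XbarN| / 3, which is the boundary line marked on the contour plot.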
Multi-Capability
Here is a graph which shows the capability for 12 processes:
… but what if you had many more to compare at one time?
Multi-Capability
Here 27 processes are compared on one graph using the “Normalized” Xbar and S view:
[Figure: Plot of Normalized Std Dev., Mean vs Ppk. X axis: XBarN from -3 to 3. Y axis: Sn from 0.0 to 2.0. PpkRegion bands: < 0.0, 0.0 – 1.0, 1.0 – 1.3, 1.3 – 2.0, > 2.0. The VarName legend identifies each process (for example HLI_B423, NRM_X37). Last Updated: 2012-02-20 17:09]
This process is nearly on aim and has very good process capability
This process is off aim (low) but has a low enough standard deviation that it could be marginally capable by just getting the process on aim!
Multi-Capability
So how were the graphs generated? … by a
complex Minitab® macro.
This presentation will work through the macro
as a learning exercise…
You will see screen shots from my favourite
text editor – EditPadPro. The next slide shows
some of the features.
Multi-Capability
Multiple files can be open at once. You can search all open files, easily copy text from one file to another…
Syntax colouring (e.g. green italics for comments) makes the code more readable and lets you know you’ve spelled keywords correctly.
The file navigation window shows you the structure of your file and lets you jump to specific points in your file.
The clip collection lets you store text that you can quickly insert into your text by dragging or double clicking.
Extensive Search & Replace, including use of Regular Expressions.
Split window editing (not shown) is supported.
“Folding” allows sections of code to be hidden.
Multi-Capability
Etc.
INSTRUCTIONS in the header comments:
Multi-Capability
Overall structure of macro:
The main macro calls 5 sub-macros.
The call to this sub-macro would be commented out (or deleted) when analyzing “real” data.
Multi-Capability
Overall structure of the
macro can also be seen
from the File Navigation
Panel in my “fancy” text
editor - EditPadPro:
The main macro calls 5
sub-macros.
DO Loops
Layout command used to create a 2 panel graph
Layout command used to create a 2 layer graph. (Contour Graph does not show up in File Navigation Panel)
Multi-Capability
Example data
creation…
This creates
27 columns of
data.
A “real”
analysis would
not use this
sub-macro.
Code continues creating 27 columns of data…
Multi-Capability
The last section of this sub-macro creates the table of
column numbers, aims and tolerances…
The set command allows you to enter data into a column…
Multi-Capability
The capability command code in this section has been “folded” so you can see the overall structure… The code is on the next page…
These columns receive the output of the Capability Command.
These columns store the result of capability analysis on one column in a table of results.
The DO loop will need to know how many columns of data there are to analyze
K81 is the loop counter so we are defining parameters which can be used when calling the Capability Analysis
The output from the analysis is stored, for the capability analysis just completed.
Multi-Capability
Here is the code executed inside the loop
shown on the previous page…
Multi-Capability
Here is the worksheet after the fourth trip
through the loop:
These columns receive the output of the Capability Command. These columns are
where the table of results is created.
Multi-Capability
The upper graph in the two panel graph has a max. value of 2 so create a variable that will plot as 2 if the value is larger.
Create classifications of the Ppk for the plot
We may be creating several pages of the two panel graph so we want the variable names to appear in alphabetical order so we need to sort all the associated columns (in place).
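The three steps on this slide (cap the plotted value, classify the Ppk, sort by variable name) can be sketched in Python as follows. This is an illustration only, not the macro's code. The cut-offs of 1.0 and 1.3 are taken from the contour band edges shown earlier, but mapping them to red / yellow / green is an assumption, and the Ppk values in `rows` are invented.

```python
def clip_for_plot(value, ceiling=2.0):
    """Value to plot: anything above the panel maximum is drawn at the maximum."""
    return min(value, ceiling)


def classify_ppk(ppk, capable=1.0, good=1.3):
    """Assumed mapping of Ppk to a status colour."""
    if ppk < capable:
        return "red"
    if ppk < good:
        return "yellow"
    return "green"


# (variable name, Ppk): invented values for illustration.
rows = [("NRM_X37", 2.41), ("HLI_B423", 0.88), ("BRK_X37", 1.12)]

# Build the result table and sort it by variable name so pages come out alphabetical.
table = sorted((name, clip_for_plot(ppk), classify_ppk(ppk)) for name, ppk in rows)
for row in table:
    print(row)
```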
Multi-Capability
Minitab® macro code has limitations so sometimes work-arounds are required. We want to stack all the columns of raw data but how to deal with a variable number of columns?
The work-around is to store all the column numbers in constants. Once there are no more columns, the remaining constants are set to 205 so that they point to a column holding just a single missing data point.
Stack the (up to) 60 columns worth of data.
A work-around needed from the work-around!
Multi-Capability
After all the code has executed, you are left with a table of statistics for each column in columns 210-219 and the stacked data in columns 220+. For the example data the stacked data columns are 5044 rows long.
These two columns are created by the stack command.
Multi-Capability
As described earlier we can calculate “Normalized” Mean and Standard Deviation now that we have our table of statistics.
We also calculate a page number since our two panel graph will be limited to 12 variables per page.
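The page number calculation can be sketched in Python (an illustration, not the macro's code). It assumes each variable has a 1-based position after the alphabetical sort; the function name is hypothetical.

```python
PER_PAGE = 12  # the two panel graph holds up to 12 variables per page


def page_number(position, per_page=PER_PAGE):
    """Page for the variable at a 1-based position in the sorted table."""
    return (position - 1) // per_page + 1


# The 27 example variables fall on pages 1, 2 and 3.
pages = [page_number(position) for position in range(1, 28)]
print(max(pages), pages.count(1), pages.count(2), pages.count(3))
```

The highest page number is what a loop over pages would use as its end value.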
Multi-Capability
The Convert command is what is called up interactively as Data > Code > Use Conversion Table.
We use it to add additional columns based on the “table” of information created earlier.
Multi-Capability
This code creates a constant with text like “Last Updated: 2012-02-25” which can be used as the sub-title for the graph. This is helpful for auto-updating graphs to make sure you know if the update has happened.
We create a graph with up to 12 variables per page. Max(Page) is the highest number page and this is used for the DO loop end value.
What would happen if our data only had one or two of the capability ranges of red / yellow / green? Then our graph would only use 2 plot symbols & they might not match red = incapable, etc.
The workaround?? Create 3 rows of data (for each page) that have each of the 3 status values but the Ppk value set to missing so they do not actually plot. We have to define values for some other columns so all the columns to be plotted are the same length. This workaround makes the code robust – i.e. able to handle variables with any combination of the three status values.
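The placeholder-row workaround can be sketched in Python (an illustration of the idea, not the macro's code). The row layout, field names and Ppk values are assumptions made for the example.

```python
STATUSES = ("red", "yellow", "green")


def add_placeholder_rows(rows, page):
    """Add one row per status with a missing Ppk so every status is always present."""
    placeholders = [
        {"page": page, "name": "", "status": status, "ppk": None}
        for status in STATUSES
    ]
    return placeholders + list(rows)


# A page where every process happens to be "green" (invented values).
page_rows = [
    {"page": 1, "name": "HLI_B423", "status": "green", "ppk": 1.6},
    {"page": 1, "name": "NRM_X37", "status": "green", "ppk": 1.4},
]
padded = add_placeholder_rows(page_rows, page=1)

# Rows with a missing Ppk carry the status category but draw no point.
plotted = [row for row in padded if row["ppk"] is not None]
print(len(padded), len(plotted))
```

All three status values now exist on the page, so the grouping always sees the same three categories, while only the real rows produce points.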
Multi-Capability
The rest of the DoPlot sub-macro:
The text editor’s ability to fold (hide) text has been used so that none of the subcommands for the 2 boxplots are shown. They can be seen on the next 2 slides.
We create a constant with the desired title including the page number (which is stored in constant K82).
We create a constant with the desired full file name / path including page number.
We store the graph as a JPEG file (which can be viewed from a web page if it has a suitable link). The replace sub-command is important if we want to run this macro automatically. Without it, if the graph already existed you would be prompted to ask if you want to replace it.
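Building the sub-title and the page-specific file name can be sketched in Python (not the macro's code). The folder name, file stem and naming pattern are hypothetical; only the “Last Updated:” text follows the graphs shown earlier.

```python
from datetime import datetime


def subtitle(now):
    """Text such as 'Last Updated: 2012-02-20 17:09' for the graph sub-title."""
    return "Last Updated: " + now.strftime("%Y-%m-%d %H:%M")


def graph_path(folder, stem, page):
    """Full file name for one page of the graph (naming pattern is an assumption)."""
    return f"{folder}/{stem}_page{page:02d}.jpg"


print(subtitle(datetime(2012, 2, 20, 17, 9)))
print(graph_path("graphs", "capability", 2))
```

Writing to the same path on every run mirrors the effect of the replace sub-command: the previous file is overwritten without a prompt.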
Multi-Capability
The top panel of the
graph layout of the type
shown on slide 6.
The graph is created
interactively and then
the code is copied into
the clipboard via Copy > Command Language. From there it is
tweaked with use of
constants, comments,
etc.
The figure sub command says here that this panel will occupy the top 30% of the overall graph area.
Multi-Capability
The bottom panel of
the graph layout of the
type shown on slide 6.
The Grouping vs StackMean StackOSStDev and StackPpk is a handy way to label the axis with useful information. We have used the “NoEmpty” subcommand so that we do not get all combinations of these variables!!
The figure sub command says here that this panel will occupy the bottom 70% of the overall graph area.
Multi-Capability
The Summary Plot
Sub-macro:
The text editor’s ability to fold (hide) text has been used so that none of the subcommands for the 2 graphs are shown. They can be seen on the next 2 slides.
This code creates a two dimensional array of XBar and S values which can be used to calculate Ppk values. Those Ppk values are then plotted via a contour plot. (See slide 7).
This time a layout is used not for 2 panels but to plot two graphs on top of each other. The contour plot shows the three Ppk regions and the (Scatter) Plot will place the symbols for each variable.
Multi-Capability
The Contour Plot:
This syntax says create tick values from -3 to 3 in steps of 1.
Multi-Capability
The Scatter Plot:
Multi-Capability
Summary:
– The capability of many processes can be graphically
summarized on a common basis by “normalizing” the
mean and standard deviation vs the Specification Limits.
– The macro code needed to do this looks daunting but can
be developed one step at a time.
– The macro code was developed in such a way that it
could easily be used in an auto-job providing a frequently
updated view of process capability, greatly streamlining
preparations for a process capability review.
Growth With Seasonality:
A worked example
Growth With Seasonality –
A Practical Analysis rev. 2012-02-15
This example is derived from a posting in the LinkedIn® Minitab®*
Discussion Forum by “Subodh” on 2012-02-09. I named his variable
“Revenue” to make the example seem more concrete. Here is his
quarterly data:
[Figure: Time Series Plot of Revenue. X axis: Index from 1 to 60. Y axis: Revenue from 0 to 3500.]
His posting was:
How I can use S-curve and Seasonality together?
In trend analysis, S-curve fits better. The data show seasonal pattern also. How to combine the two for decomposition and is it appropriate?
* “MINITAB® and all other trademarks and logos for the Company’s products and services are the exclusive property of Minitab Inc. All other marks referenced remain the property of their respective owners. See minitab.com for more information.”
“Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.”
Growth With Seasonality –
A Practical Analysis
• First I created a couple of variables:
– Period:
The “Partial Sum” Function (Pars), adds the values up in this and all preceding rows. Thus if a column had 1 in each row, the Partial Sum in Row 1 would have a value of 1, in Row 2 a value of 2, etc.
‘Revenue’<>-1 is a condition that is always true i.e. has a value of 1 so the partial sum of this will essentially give us a row index value of 1 for Row 1, 2 for Row 2, etc.
This is easier than Calc > Make Patterned Data because we don’t need to know the number of rows involved.
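The same trick can be sketched in Python (not Minitab code): a running total of an always-true condition yields the row number. The revenue values below are invented for illustration.

```python
from itertools import accumulate

revenue = [120.0, 135.5, 128.0, 150.2]  # invented values

# The condition "value <> -1" is true (1) for every row...
always_true = [int(value != -1) for value in revenue]

# ...so its partial (running) sum is 1, 2, 3, ... for however many rows there are.
period = list(accumulate(always_true))
print(period)
```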
Growth With Seasonality –
A Practical Analysis
– Quarter:
The “Modulus” Function (Mod) returns the remainder after integer division, e.g. Mod(11,4) returns 3 since 4 divides into 11 2 times with a remainder of 3.
mod('Period'-1,4)+1 returns the quarter for each data point (assuming the first data point is for a 1st quarter). Mod(X,4) returns values of 0, 1, 2 and 3, so we add 1. The –1 makes the first row return a value of 1.
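The quarter formula can be checked with a short Python sketch (not Minitab code; the function name is made up). It assumes, as the text does, that the first row is a 1st quarter, and uses the 63 data points of this dataset.

```python
def quarter(period):
    """Quarter (1 to 4) for a 1-based period, assuming period 1 is a 1st quarter."""
    return (period - 1) % 4 + 1


quarters = [quarter(period) for period in range(1, 64)]  # 63 data points
counts = {q: quarters.count(q) for q in (1, 2, 3, 4)}
print(quarters[:9], counts)
```

With 63 points, quarters 1 to 3 each occur 16 times and quarter 4 occurs 15 times.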
Our Worksheet so far: (It is assumed the first row represented a 1st
quarter!)
Growth With Seasonality –
A Practical Analysis
• Next I created a
fitted line plot
(Stat > Fitted Line Plot), requesting
a quadratic fit
and storing the
Residuals, Fits
and equation of
best fit
coefficients.
• I also requested
the “Four in one”
residuals plot.
Growth With Seasonality –A Practical Analysis
• Here is the output:
[Figure: Fitted Line Plot of Revenue vs Period. Revenue = 271.7 - 8.227 Period + 0.8311 Period**2. S 136.362, R-Sq 97.6%, R-Sq(adj) 97.5%.]
[Figure: Residual Plots for Revenue. Four panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order.]
“Heteroscedasticity” is seen which means the variance of the residuals is not constant across all values of the Y variable. Here the variance of the residuals increases as Y increases. This violates one of the assumptions for regression analysis so we need to interpret the results with caution.
The fit to the quadratic line is
quite good with an Adjusted
R-Squared of 97.5%
Growth With Seasonality –
A Practical Analysis
• To make the residuals more consistent across values of Y, I
calculated a new variable “ResidPerCentOfFit” calculated as
('RESI1'/'FITS1')*100 or in other words, the residual as a
percentage of the fit value. These residuals now appear to be
consistent across time.
[Figure: Scatterplot of ResidPerCentOfFit vs Period. X axis: Period from 0 to 70. Y axis: ResidPerCentOfFit from -20 to 20, with a reference line at 0. Symbols grouped by Quarter (1 to 4).]
To generate this graph I used Graph > Scatterplot and chose “With Connect and Groups” from the gallery. Once the graph was generated I double clicked on one of the connect lines and erased the “grouping” variable so there was only 1 connect line (instead of 4).
I also added a reference line at 0.
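The ResidPerCentOfFit calculation, ('RESI1'/'FITS1')*100, can be sketched in Python using the quadratic equation from the fitted line plot. This is an illustration only: the function names are made up and the actual revenue of 1400 at period 40 is an invented value, not a point from the dataset.

```python
def fitted_revenue(period):
    """Fit from the quadratic equation reported on the fitted line plot."""
    return 271.7 - 8.227 * period + 0.8311 * period ** 2


def resid_percent_of_fit(actual, period):
    """Residual expressed as a percentage of the fitted value."""
    fit = fitted_revenue(period)
    return (actual - fit) / fit * 100.0


fit_40 = fitted_revenue(40)
pct_40 = resid_percent_of_fit(1400.0, 40)  # 1400.0 is an invented actual value
print(round(fit_40, 2), round(pct_40, 1))
```

Dividing by the fit rescales each residual by the size of the value being predicted, which is why the spread no longer grows with revenue.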
Growth With Seasonality –
A Practical Analysis
• Another way to look at the new Residuals as a
Percentage of Fit is by splitting out the 4 quarters:
[Figure: Scatterplot of ResidPerCentOfFit vs Period, in four panels (Quarter = 1 to Quarter = 4). X axis: Period from 0 to 60. Y axis: ResidPerCentOfFit from -20 to 20, with a reference line at 0.]
It looks like something non random is happening with the 4th quarter results. They have gone from below “average” to above.
In fact if we do a linear regression of these 4th Quarter residuals, the P-Value for the model is 0.000 and 74% of the variability is explained by the linear relationship!
Growth With Seasonality –
A Practical Analysis
• Another way to look at the data is a
boxplot by quarter for these residuals:
[Figure: Boxplot of ResidPerCentOfFit by Quarter. Y axis from -20 to 20, with a reference line at 0. Mean labels: -4.46178 (Quarter 1), 1.24764 (Quarter 2), 5.06087 (Quarter 3), -1.59224 (Quarter 4).]
It looks like 1st
quarter results may
be low on average
by about 4% from
the fit and 3rd
quarter results
might be high by
about 5%. Is this
statistically
significant?
Growth With Seasonality –
A Practical Analysis
• To see if the quarters may be statistically
different* we can use a One Way ANOVA:
One-way ANOVA: ResidPerCentOfFit versus Quarter
Source DF SS MS F P
Quarter 3 790.7 263.6 2.64 0.058
Error 59 5889.8 99.8
Total 62 6680.6
S = 9.991 R-Sq = 11.84% R-Sq(adj) = 7.35%
Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev ---------+---------+---------+---------+
1 16 -4.462 7.289 (---------*---------)
2 16 1.248 9.666 (---------*---------)
3 16 5.061 10.227 (---------*---------)
4 15 -1.592 12.313 (----------*---------)
---------+---------+---------+---------+
-5.0 0.0 5.0 10.0
* We must use caution in interpreting these results as we have created an indirect variable based on a model which had heteroscedasticity.
The p-value is on the borderline of significance at 0.06, but we know the 4th quarters exhibited non random behaviour. If we exclude 4th quarters, the p-value for the one way ANOVA of the 3 remaining quarters is 0.018.
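The arithmetic behind the ANOVA table can be reproduced in Python from the sums of squares and degrees of freedom shown in the output. This sketch only re-derives MS, F, S and the R-Sq values. It does not compute the p-value, which needs the F distribution.

```python
import math

# Sums of squares and degrees of freedom from the One-way ANOVA output above.
ss_quarter, df_quarter = 790.7, 3
ss_error, df_error = 5889.8, 59
ss_total, df_total = 6680.6, 62

ms_quarter = ss_quarter / df_quarter           # mean square between quarters
ms_error = ss_error / df_error                 # mean square within quarters
f_stat = ms_quarter / ms_error                 # F statistic
s_pooled = math.sqrt(ms_error)                 # pooled standard deviation (S)
r_sq = ss_quarter / ss_total                   # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)

print(round(f_stat, 2), round(s_pooled, 3),
      round(100 * r_sq, 2), round(100 * r_sq_adj, 2))
```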
Growth With Seasonality –
A Practical Analysis
• Conclusions:
– A quadratic relationship provides a good
fit to the data
– The fourth quarter was initially lower
than “expected” and then became higher
than “expected”.
– The first quarter yields lower than
“expected” results by about 4% and third
quarters higher than “expected” by
about 5%.
Growth With Seasonality –
A Practical Analysis
Follow Up Questions on LinkedIn® Minitab®
Discussion Forum asked essentially:
1. If you choose the cubic instead of quadratic then
the R-sq(adj) is the same but the R-Sq is 0.1%
higher. Does it make much of a difference or
should you just apply the parsimony concept for
practicality?
2. Also in slide 5 from your website showed that it
does not fulfill the ANOVA assumption as it does
not have the equal variance (heteroscedasticity).
So, how much can ANOVA help in the analysis?
Hope this will help to clear my confusion.
Growth With Seasonality –
A Practical Analysis
• Question 1: Higher Order Model
– Adding terms or factors to a model will increase R-Sq but the desired model is the one that has the best predictive power. With a higher order (cubic with 4 terms vs quadratic with 3 terms) model you may well be fitting noise. The R-Sq(adj) takes into account the “degrees of freedom” for the model and gives a better measure of whether or not the model has improved.
– Generally the KISS (Keep it Simple Stupid) approach, or as you phrase it, parsimony is recommended. If the more complex model doesn’t appreciably improve the model stick with the simpler model.
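The degrees-of-freedom adjustment can be sketched in Python using the standard formula R-Sq(adj) = 1 – (1 – R-Sq)(n – 1) / (n – p – 1), where p is the number of predictor terms. The inputs are the rounded values quoted in this example (97.6% for the quadratic, 0.1% higher for the cubic, 63 data points), so the results are approximate.

```python
def adjusted_r_sq(r_sq, n, predictors):
    """Adjusted R-squared for a model with the given number of predictor terms."""
    return 1 - (1 - r_sq) * (n - 1) / (n - predictors - 1)


n_points = 63
quadratic = adjusted_r_sq(0.976, n_points, predictors=2)  # Period, Period**2
cubic = adjusted_r_sq(0.977, n_points, predictors=3)      # adds Period**3

raw_gain = 0.977 - 0.976
adjusted_gain = cubic - quadratic
print(round(100 * quadratic, 1), adjusted_gain < raw_gain)
```

The extra term is charged a degree of freedom, so the adjusted figure improves by less than the raw R-Sq does.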
Growth With Seasonality –
A Practical Analysis
• Question 2: Is ANOVA valid given heteroscedasticity?
– If you look at slide 9 you will notice that the ANOVA is not on the original variable but instead the “Residual As Per Cent Of Fit” variable which, as the Graph on Slide 7 shows, has reasonably uniform variance (but does have the linear trend in 4th quarter results).
– Stat > ANOVA > Test for Equal Variances… on ResidPerCentOfFit vs Quarter does not show an issue with unequal variances – i.e. the p-value is not low. This is even more evident if the non random 4th quarter results are excluded. (See Graphs next slide).
Growth With Seasonality –
A Practical Analysis
[Figure: Test for Equal Variances for ResidPerCentOfFit (Quarters 1 to 4). 95% Bonferroni Confidence Intervals for StDevs. Bartlett's Test: Test Statistic 3.80, P-Value 0.284. Levene's Test: Test Statistic 1.33, P-Value 0.274.]
[Figure: Test for Equal Variances for NonFourthQuarterPCResiduals (Quarters 1 to 3). 95% Bonferroni Confidence Intervals for StDevs. Bartlett's Test: Test Statistic 1.78, P-Value 0.410. Levene's Test: Test Statistic 0.46, P-Value 0.632.]
The P-Values are NOT low so the Null Hypothesis of Equal Variance is NOT rejected – i.e. we do not have evidence of unequal variance.
Growth With Seasonality –
A Practical Analysis
Another follow up to the analysis from Anton:
• Guys: keep it graphical, to start any analysis. Looking at the plots categorised by period:
1. This looks like an exponential growth situation, so dealing with the logs of the data is appropriate- not quadratics or cubics. Taking logs or using a log Y axis linearises most of the dataset nicely, so that supposition is correct.
2. The first few periods don't fit the pattern- they show lower growth than subsequent periods, and as this is the oldest data, it's probably best to discard it. You can exercise your judgement about how much to discard, but possibly the first 7 or 8 periods looks about right.
3. The next strongest feature is the steady, strong gain of Q4 relative to the other 3 quarters.
4. Finally, there are a few sudden steps in the data: depending on the source of the data you may find explanations such as public holidays etc.
Growth With Seasonality –
A Practical Analysis
• Anton raises some excellent points. A statistical model
may help you fit past behaviour, but it doesn’t give you
insight as to the why for the behaviour. It is best to use
experimentation and analysis to gain insight into the
causes of the response. For example, in a heat transfer
situation your statistical model might confirm a first
principles model based on the physics surrounding heat
transfer. In the example presented if the response is
something like revenue then exponential growth is a more
likely underlying principle than quadratic growth. The
motion of a projectile might better fit a parabola and thus
a quadratic equation.
• Taking the log of our Y variable also makes it easier to
graphically interpret the data:
[Figure: Log(Revenue) vs Period. X axis: Period from 0 to 70. Y axis: Log(Revenue) from 5.0 to 8.0. Symbols grouped by Quarter (1 to 4).]
Growth With Seasonality –
A Practical Analysis
Using the
Calculator
function we
generate a new
column based on
Log(Revenue)
As Anton suggested, in this view the slower growth
in the first number of periods is highlighted.
The strange behaviour of the 4th
quarter from “underperforming” to
“overperforming” (assuming more
is better) is apparent.
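Why taking logs straightens out exponential growth can be shown with a small Python sketch. The series is synthetic: the starting value of 100 and the 5% growth per period are invented for illustration and are not estimates from the Revenue data. The natural log is used, which appears consistent with the 5.0 to 8.0 axis range on the graph.

```python
import math

growth = 0.05  # invented growth rate per period
series = [100.0 * (1 + growth) ** t for t in range(8)]  # exponential growth

logs = [math.log(value) for value in series]            # natural log of each value
steps = [later - earlier for earlier, later in zip(logs, logs[1:])]

# Every step in the logged series is the same size, i.e. the series is a straight line.
print([round(step, 4) for step in steps])
```

A constant step in the logs corresponds to a constant percentage growth, so a straight line fit to Log(Revenue) is a fit of exponential growth.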
Growth With Seasonality –
A Practical Analysis
• This refined analysis improves the fit of the model slightly – R-Sq(adj) for a linear regression fit to the LogRevenue variable is 98.1% vs 97.5% and the residuals no longer show heteroscedasticity. Moreover, extrapolating into the future (a dangerous but often helpful thing to do), this new model is likely to be more accurate based on the underlying principle of exponential growth.
• A minor point relative to Anton’s point No.2. Discarding the first number of points does little to change the analysis. The table to the right shows the R-Squared Adjusted for the Log(Revenue) linear regression fit after skipping the first N values.
• Lots of issues to ponder from 63 data points!
Estimating π by the
Cannonball Method
Pi By Cannonball
I’m too late for the celebration of Pi day
(3/14) but years ago I read a Scientific
American article (I think it was by Martin
Gardner) that had a whimsical way of
estimating Pi. It was a Monte-Carlo
method using the premise of firing
cannonballs into a circular lake…
Pi By Cannonball
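[Diagram: a circular lake of radius 1 centred at (0,0), inside a square with corners (1,1), (-1,1), (-1,-1) and (1,-1).]
The area of the square is 2 * 2 = 4 square units. The lake has an area of π * 1² = π square units.
If a cannon shoots with equal probability of landing anywhere inside the square then the ratio of “splashes” to total shots (“splashes” + “thuds”) can be used to estimate Pi: the ratio approaches π / 4, so Pi is estimated as 4 * splashes / total shots.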
Pi By Cannonball
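[Diagram: a shot landing at (x,y); Lake inside the circle, Land outside it.]
Let r be the distance of the shot from the centre of the lake, r = sqrt(x² + y²).
If r is < 1 then the cannonball lands in the lake and we have a splash.
If r is > 1 then we have a thud.
Using Minitab’s Calc > Random Data function we can bring this theory to life.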
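The splash / thud rule can be sketched in Python with the standard `random` module (an illustration of the method, not the Minitab macro). The function name and the use of a seed are choices made for this example.

```python
import random


def estimate_pi(shots, seed=None):
    """Estimate Pi by firing 'shots' uniformly into the square from (-1,-1) to (1,1)."""
    rng = random.Random(seed)
    splashes = 0
    for _ in range(shots):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:   # r < 1: the shot lands in the lake
            splashes += 1
    # splashes / shots approaches (area of lake) / (area of square) = pi / 4
    return 4.0 * splashes / shots


print(estimate_pi(20000, seed=1))
```

Comparing x² + y² with 1 avoids taking a square root and gives the same splash / thud decision as comparing r with 1.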
Pi By Cannonball
• Here are some outputs from using this technique:
[Figure: Y vs X for each shot. Number of shots = 75. X and Y axes from -1.0 to 1.0. Symbols grouped by Splash (0 or 1).]
[Figure: Estimate of Pi, Splashes. (No. shots = 75). X axis: Row from 0 to 80. Y axes: EstimateOfPi and Splash, on a scale of 0 to 4, with a reference line at 3.14159265. Last Estimate: 3.30667]
After the first shot, the estimate is either 0 or 4.
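The running estimate plotted against Row can be sketched in Python: a row number, a cumulative count of splashes, and 4 * splashes / row at each row. This is an illustration, not the macro's code, and the function name and seed are assumptions.

```python
import random
from itertools import accumulate


def running_estimates(shots, seed=None):
    """Estimate of Pi after each shot: 4 * (splashes so far) / (row number)."""
    rng = random.Random(seed)
    splash = []
    for _ in range(shots):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        splash.append(1 if x * x + y * y < 1.0 else 0)
    cumulative = accumulate(splash)  # partial sum of the splash indicator
    return [4.0 * total / row for row, total in enumerate(cumulative, start=1)]


estimates = running_estimates(75, seed=3)
print(estimates[0], estimates[-1])
```

The first entry is 0 or 4 because after one shot the splash count is either 0 or 1.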
Pi By Cannonball
[Figure: Y vs X for each shot. Number of shots = 200. X and Y axes from -1.0 to 1.0. Symbols grouped by Splash (0 or 1).]
[Figure: Estimate of Pi, Splashes. (No. shots = 200). X axis: Row from 0 to 200. Y axes: EstimateOfPi and Splash, on a scale of 0 to 4, with a reference line at 3.14159265. Last Estimate: 3.1]
Pi By Cannonball
[Figure: Y vs X for each shot. Number of shots = 3000. X and Y axes from -1.0 to 1.0. Symbols grouped by Splash (0 or 1).]
[Figure: Estimate of Pi, Splashes. (No. shots = 3000). X axis: Row from 0 to 3000. Y axes: EstimateOfPi and Splash, on a scale of 0 to 4, with a reference line at 3.14159265. Last Estimate: 3.13733]
Even when 150,000 shots were used, the estimate was in error in the 3rd decimal place, so this is not an efficient method for estimating Pi!
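The slow convergence follows from the binomial standard error of the estimate. Each shot is a splash with probability p = π / 4, so the standard error of 4 * splashes / shots is 4 * sqrt(p * (1 – p) / shots). The Python sketch below evaluates this standard result; the function name is made up.

```python
import math


def pi_estimate_std_error(shots):
    """Standard error of the cannonball estimate of Pi after 'shots' shots."""
    p = math.pi / 4                      # probability that a shot is a splash
    return 4.0 * math.sqrt(p * (1.0 - p) / shots)


for shots in (75, 200, 3000, 150000):
    print(shots, round(pi_estimate_std_error(shots), 4))
```

At 150,000 shots the standard error is still about 0.004, which fits with an error in the 3rd decimal place. Halving the error takes four times as many shots.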
Pi By Cannonball
• The code used to generate these graphs is shown in
the next slides as a learning example since it
illustrates several Minitab® macro programming
techniques:
– The use of “sub” macros to make code more readable.
– Obtaining input from users
– Looping
– Use of constants especially w.r.t. graphing
– Adding calculated lines to graphs
– Creating a row number variable
– The use of subscripts to access a specific row’s value
• The screen shots show the code as seen in my
favourite text editor – EditPad Pro.
Pi By Cannonball
[Slides showing screen shots of the macro code; the code itself is not captured in this transcript. Surviving captions: “Plan view of shots cont’d” and “EstimateAndSplashGraph cont’d”.]
Performance vs Model
• Overview of our approach:
– Analyze past data to develop a model to explain
consumption
– Use this model to predict consumption for new
data
– Look at the difference between predicted and
actual consumption to see if improvement (or
backsliding) is present.
– The improvement can be monetized by
converting the delta between predicted and
actual consumption into $. (e.g. Has my investment
in new windows in 2006-09 paid off?)
Performance vs Model
• We will develop our consumption model
from my home Natural Gas consumption
data for the period 2001-01 through
2006-07:
– NATURALGAS Model Basis Data.MTW
• Let’s start with some simple graphical
regression outputs.
Performance vs Model
[Figure: Fitted Line Plot of MetersCubed vs DegreeDaysC. MetersCubed = 3.911 + 0.4419 DegreeDaysC. S = 28.3173, R-Sq = 95.3%, R-Sq(adj) = 95.2%]
This model seems to miss the curvature in the consumption vs
degree days.
Performance vs Model
• Let’s investigate this apparent
curvature:
Performance vs Model
• Here is the output:
Regression Analysis: MetersCubed versus DegreeDaysC
The regression equation is
MetersCubed = 3.91 + 0.442 DegreeDaysC
Predictor       Coef  SE Coef      T      P
Constant       3.911    5.619   0.70  0.489
DegreeDaysC  0.44191  0.01218  36.27  0.000
S = 28.3173   R-Sq = 95.3%   R-Sq(adj) = 95.2%
Analysis of Variance
Source          DF       SS       MS        F      P
Regression       1  1054827  1054827  1315.46  0.000
Residual Error  65    52122      802
Total           66  1106948
This model shows statistical evidence that MetersCubed
and DegreeDaysC are correlated.
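The summary statistics in this output all follow from the coefficient table and the sums of squares. The Python sketch below recomputes them as a cross-check, using only numbers copied from the output. Small differences from the printed values are expected because the printed inputs are rounded.

```python
import math

# Values copied from the regression output.
coef, se_coef = 0.44191, 0.01218
ss_regression, ss_error, ss_total = 1054827.0, 52122.0, 1106948.0
df_regression, df_error, df_total = 1, 65, 66

t_value = coef / se_coef                           # T for DegreeDaysC
ms_error = ss_error / df_error                     # MS for Residual Error
s = math.sqrt(ms_error)                            # S
f_value = (ss_regression / df_regression) / ms_error
r_sq = ss_regression / ss_total                    # R-Sq
r_sq_adj = 1.0 - ms_error / (ss_total / df_total)  # R-Sq(adj)

print(round(t_value, 2), round(s, 4), round(f_value, 2))
print(round(100 * r_sq, 1), round(100 * r_sq_adj, 1))
```

R-Sq(adj) penalizes extra model terms, which makes it a fairer yardstick than R-Sq when comparing models with different numbers of terms.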
Performance vs Model
Residuals graphs:
[Figure: Residual Plots for MetersCubed, four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order).]
The residuals show a pattern suggesting
curvature.
Performance vs Model
[Figure: Fitted Line Plot of MetersCubed vs DegreeDaysC (quadratic). MetersCubed = 22.78 + 0.2553 DegreeDaysC + 0.000230 DegreeDaysC**2. S = 23.5331, R-Sq = 96.8%, R-Sq(adj) = 96.7%]
We try
Stat > Regression > Fitted Line Plot
& this time choose Quadratic for
the Type of Regression Model
This is a better model as judged visually and by
looking at the R-Sq(adj), which rises from 95.2% to 96.7%.
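To see where the two fitted equations disagree, they can be evaluated side by side. The Python sketch below uses the coefficients printed on the two fitted line plots; the function names and the chosen degree-day values are assumptions for this example.

```python
def linear_model(degree_days_c):
    """Linear fit: MetersCubed = 3.911 + 0.4419 DegreeDaysC."""
    return 3.911 + 0.4419 * degree_days_c

def quadratic_model(degree_days_c):
    """Quadratic fit: 22.78 + 0.2553 DegreeDaysC + 0.000230 DegreeDaysC**2."""
    return 22.78 + 0.2553 * degree_days_c + 0.000230 * degree_days_c ** 2

for dd in (0, 250, 500, 750, 900):
    print(dd, round(linear_model(dd), 1), round(quadratic_model(dd), 1))
```

The quadratic model predicts more consumption than the straight line at both ends of the range and less in the middle, which is the curvature pattern the residuals pointed to.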
Performance vs Model
• Here is the Minitab code that makes
use of this model:
– Home_Data.mac
• From the macro here is the section
that creates the two key variables:
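The macro section itself is a screen shot and is not captured in this transcript. As a rough Python sketch only, assume the two key variables are the consumption predicted by the quadratic model and the deviation of actual consumption from that prediction. DevnFromPrediction is the name used in the t-test output later in this module. The function names and sample months here are hypothetical.

```python
def predicted_consumption(degree_days_c):
    """Quadratic model from the fitted line plot (cubic meters per month)."""
    return 22.78 + 0.2553 * degree_days_c + 0.000230 * degree_days_c ** 2

def deviation_from_prediction(actual_m3, degree_days_c):
    """Actual minus predicted; negative means less gas was used
    than the model expects for that month's degree days."""
    return actual_m3 - predicted_consumption(degree_days_c)

# Hypothetical months (not from the author's worksheet): (DegreeDaysC, MetersCubed)
months = [(100.0, 45.0), (400.0, 150.0), (700.0, 320.0)]
for dd, actual in months:
    print(dd, round(predicted_consumption(dd), 1),
          round(deviation_from_prediction(actual, dd), 1))
```

Working with the deviation rather than raw consumption removes most of the effect of weather, so a warm or cold year is not mistaken for a change in the house.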
Performance vs Model
Bar / Line
graph relating
Consumption
and Degree
Days vs Time:
Performance vs Model
The non-consumption-
related monthly fee
inflates the cost / Cubic
Meter in the summer
when consumption is low:
Performance vs Model
A good way to
compare year
over year
consumption
and cost:
Performance vs Model
A Subset of
this data
was used to
develop the
model.
Performance vs Model
Recent years
show
consumption
lower than
predicted by
the model –
fundamental
consumption
has
improved!
The
Money
Shot!
Performance vs Model
• To statistically test if consumption is lower we
create a new variable that differentiates the period
before and after the replacement of windows. We
then perform a 2-Sample t-test (Stat > Basic Stats >
2-Sample-t…)
Performance vs Model
Two-Sample T-Test and CI: DevnFromPrediction, Period
Two-sample T for DevnFromPrediction
Period   N  Mean  StDev  SE Mean
Before  67   0.0   23.2      2.8
After   50  -7.2   19.7      2.8
Difference = mu (Before) - mu (After)
Estimate for difference: 7.28
95% CI for difference: (-0.58, 15.14)
T-Test of difference = 0 (vs not =): T-Value = 1.83  P-Value = 0.069  DF = 113
The test shows only borderline
evidence (P-Value = 0.069) against the hypothesis
that before & after are equal.
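The T-Value and DF can be reproduced from the summary statistics with the unpooled (Welch) two-sample formulas. The match with the printed DF suggests this is the form used, but that is an inference. In the Python sketch below the function name is an assumption, and the P-Value is left out because it needs a t-distribution table or a statistics library.

```python
import math

def welch_t(diff, s1, n1, s2, n2):
    """Two-sample t statistic and Welch degrees of freedom
    from the difference in means and each group's StDev and N."""
    v1 = s1 ** 2 / n1  # squared standard error of mean, group 1
    v2 = s2 ** 2 / n2  # squared standard error of mean, group 2
    se = math.sqrt(v1 + v2)
    t = diff / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se

# Estimate for difference, then StDev and N for Before and After.
t, df, se = welch_t(7.28, 23.2, 67, 19.7, 50)
print(round(t, 2), math.floor(df), round(se, 2))
```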
Performance vs Model
If the difference of 7.3
Cubic Meters per month
is accurate, at current
costs of $0.371/Cubic
Meter then the yearly
savings is about $32.50
(7.3 × 12 × $0.371) so the
replacement was not
justifiable financially.
Fortunately our
motivation was ease of
cleaning, appearance
and comfort!