
A Case Study of Bayesian Modeling on a Real World Problem

RAM Energy Energester/Enziro

Bob Mattheys, Malcolm Farrow, Giles Oatley, Garen Arevian, Souvik Banerjee


ISS – Intelligent Systems Solutions

Group of researchers/academics

Working with CAS (Centre for Adaptive Systems)

Remit:

Provide technology transfer and expertise to industry

Assist NE SMEs and stimulate business growth

Obtain funding, e.g. SMART Awards, GONE, etc.


ISS Projects

RAM Energy – Intelligent Data Analysis

Neptune Engineering – Intelligent Diagnostics

HASS – Back-office system/DBase

Hart Biological – Back-office system/DBase, process manufacturing

Etc.


RAM Energy

Founded 2000

Clients in Oil/Gas, Energy, Process, Manufacturing, Haulage Industry

Products: Energester + Enziro

Ester based synthetic lubricants and greases, enzymatic cleaning solutions, absorbents and blasting media

Better lubrication, heat dissipation and vibration reduction than oil or grease in isolation and conventional additives


RAM Energy

Problem

Demonstrate effectiveness and cost efficiency

Data collected by RAM Energy

Very large

Major differences across the various sectors

Assist RAM Energy in structuring their data collection and storage in general

Heavy haulage industry


RAM Energy

Trials

RAM Energy carried out select trials with clients. These included:

Monitored consumption prior to Energester use

Monitored consumption post Energester use

Use of control vehicles (no Energester use)

Temperature data collected


RAM Energy Haulage

Data collected via diesel receipts

Information consisted of:

Card number (allocated to registration number)

Vehicle registration

Date

Fuel

Mileage


Registration Number   Date       Reg Entered   Fuel Added   Mileage
J577PWL               20020901   DX51MYT       276.19       128504
J577PWL               20020902   DX51MTY       296.51       129130
J577PWL               20020904   DX51MYT       288.88       999
J577PWL               20020905   J577PWL       235.95       666
J577PWL               20020907   J577PWL       346          1
J577PWL               20020907   J577PWL       234.86       1
J577PWL               20020908   DX51NYT       211          99999
J577PWL               20020909   DX51MYT       447.73       11
J577PWL               20020910   51            286.24       4717
J577PWL               20020910   DX51MYT       253.07       135300
J577PWL               20020911   DX51MYT       281          1
J577PWL               20020912   51            220.66       1000
J577PWL               20020912   DX51MYT       260          1
J577PWL               20020913   DU02PBY       325          1
J577PWL               20020914   DU02PBY       255.59       109705
J577PWL               20020915   DU02RBY       267.17       110296
J577PWL               20020915   2             267.62       120889
J577PWL               20020916   DU02PBY       182.16       111563
J577PWL               20020916   DU52PBY       260.02       112043
J577PWL               20020917   2             263.91       2646
J577PWL               20020917   DU02PBY       224.81       113223
J577PWL               20020918   2             251.09       3773
J577PWL               20020918   DU02PBY       224.67       114513


RAM Energy

Analysis

Performed using Excel spreadsheets

Discrete mpg (mileage since last fill/diesel input)

Some cumulative mpg (using total mileage/total diesel input to date)

Attempt to normalise using mean temperature records

Some regression analysis
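The difference between discrete and cumulative mpg can be sketched as follows. This is a minimal illustration, not the spreadsheet RAM Energy used. It assumes that fuel is recorded in litres, that each fill returns the tank to the same level (so fuel added equals fuel used since the previous fill), and that rows arrive as (odometer, litres) pairs in date order; the function name and row layout are hypothetical.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609


def discrete_and_cumulative_mpg(fills):
    """Return (discrete, cumulative) mpg lists for consecutive fills.

    fills: list of (odometer_miles, litres_added), ordered by date.
    Assumption: litres added at a fill equal litres used since the last fill.
    """
    discrete, cumulative = [], []
    total_miles = 0.0
    total_gallons = 0.0
    for (prev_odometer, _), (odometer, litres) in zip(fills, fills[1:]):
        miles = odometer - prev_odometer
        gallons = litres / LITRES_PER_IMPERIAL_GALLON
        total_miles += miles
        total_gallons += gallons
        discrete.append(miles / gallons)                # this interval only
        cumulative.append(total_miles / total_gallons)  # everything to date
    return discrete, cumulative
```

The cumulative figure smooths out fill-to-fill noise, which is one reason it can look more stable than the discrete figure on the same data.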


Fuel Consumption Rover 75 W608 UOH

(Chart: MPG, axis 32 to 52, against Fill No., 1 to 71. Series: Discrete MPG, Cumulative MPG, Adjusted MPG)


RAM Energy Results

                    No seasonal adjustment   With seasonal adjustment
After Energester    42.94                    43.46
Before Energester   42.66                    42.64
Percentage gain     0.64%                    1.92%


RAM Energy Problems

Missing data consisted of:

Driver information (who?)

Loading information (full/empty)

Length of journey

Type of journey (long haul vs short haul)

Urban or motorway conditions

Etc.


RAM Energy Conclusion

Results very poor and inconclusive


Database

Excel sheets were converted to an Access database with deletion of unnecessary rows and columns.

The Access database was then imported into SQL Server for data query and subsequent analysis


Data Cleansing

Brief outline of most obvious problems with the data:

1. Card Number

2. Registration Number

3. Date

4. Fuel Added

5. Mileage


Card Number

There were duplicate Card Numbers for (presumably) the same Card, e.g. 85944 and 0085944

In a few cases, for a given Registration Number, there appear additional Card Numbers, e.g. for ‘N151EUB’ there are the Card Numbers: 38195, 0038195, 56408
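One way to handle the leading-zero duplicates is to normalise card numbers before grouping them by registration. The sketch below is an assumption about how this could be done, not the project's actual cleansing code; the function names and the (registration, card) row layout are hypothetical.

```python
from collections import defaultdict


def normalise_card(card):
    """Strip whitespace and leading zeros so '0085944' matches '85944'."""
    return card.strip().lstrip("0") or "0"


def cards_by_registration(rows):
    """Group distinct normalised card numbers under each registration.

    rows: iterable of (registration, card_number) string pairs.
    """
    cards = defaultdict(set)
    for registration, card in rows:
        cards[registration].add(normalise_card(card))
    return {reg: sorted(found) for reg, found in cards.items()}
```

After normalisation, any registration that still maps to more than one card number (such as 38195 and 56408 in the example above) is a genuine extra card rather than a formatting duplicate, and can be reviewed by hand.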


Registration Number

Registration numbers seemed to be always entered correctly

However, the field Reg Entered did not always tally with this

RAM's recommendation: ignore the Reg Entered field


Date

Dates entered were very consistent and preserved:

the ordering

the distance between dates

the actual date

An important question was: CAN WE PRESUME THE DATE IS ALWAYS ENTERED CORRECTLY?

If this was so, then this provided us with a convenient check on the Mileage, as Date and Mileage should both increase together.


Fuel

Outlier identification

Very small and very large values easily detected over large dataset

Take mean of the sample and flag as outliers data more than 3 or 4 SDs away from the mean

Very small values, e.g. 0 or 1, assumed to be bogus values

9999, 999, etc. taken to be bogus values

Some small and large values mistyped, with either the decimal place occurring too soon (e.g. 38.6 instead of 386) or extra digits added (e.g. 3860 instead of 386)
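The fuel checks described above can be sketched in a few lines. This is an illustration under stated assumptions, not the validator that was built: the sentinel list is only the handful of values named on the slide, the threshold k defaults to 3, and the function name is hypothetical.

```python
from statistics import mean, stdev

# Assumption: illustrative sentinel list taken from the examples in the text.
SENTINELS = {0, 1, 999, 9999}


def flag_fuel_outliers(values, k=3.0):
    """Split fuel values into (bogus, outliers).

    bogus: values matching a known placeholder such as 0 or 999.
    outliers: remaining values more than k sample SDs from the mean.
    """
    bogus = [v for v in values if v in SENTINELS]
    plausible = [v for v in values if v not in SENTINELS]
    centre, spread = mean(plausible), stdev(plausible)
    outliers = [v for v in plausible if abs(v - centre) > k * spread]
    return bogus, outliers
```

Removing the placeholder values first matters because a single extreme value inflates the standard deviation, which can hide other suspect readings; with very small samples a 3 SD rule may not flag anything at all.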


Fuel

Difficult errors

e.g. 693392: could be 69392? What if 693399?

Data must be flagged as erroneous


Mileage

Some values were entered as {0,1,999,9999,2,3,5,10,111,1111,123,789, etc}

If we can presume that the Date is a sensible value, then in a dataset where there are only a few missing or obviously incorrect values for the Mileage, these values can be amended as follows:


Mileage

Day   Mileage   Spurious?
11    300
12    400
13    500       ?
14    450       ?

We do not know if the day 13 entry is wrong, or day 14. So we can look ahead:


Mileage

Day   Mileage   Spurious?
11    300
12    400
13    500
14    450       ?
15    510

Or

Day   Mileage   Spurious?
11    300
12    400
13    500       ?
14    450
15    470
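The look-ahead rule shown in these two tables can be sketched as follows. This is a simplified illustration that assumes dates are correct and that errors are isolated (no two bad readings in a row); the function name is hypothetical.

```python
def find_spurious(readings):
    """Return indices of odometer readings judged spurious.

    readings: odometer values ordered by date. When a reading drops
    below its predecessor, the next reading decides which one is wrong.
    """
    spurious = set()
    for i in range(len(readings) - 2):
        before, current, ahead = readings[i], readings[i + 1], readings[i + 2]
        if current < before:
            if ahead >= before:
                # Sequence recovers above the earlier value: the dip is wrong.
                spurious.add(i + 1)
            elif ahead >= current:
                # Sequence continues from the lower value: the peak is wrong.
                spurious.add(i)
    return sorted(spurious)
```

For the first table the function flags the day 14 entry (index 3), and for the second it flags the day 13 entry (index 2).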


Mileage

Trans Quantity (Fuel Added)   Odometer (Mileage)
182.04                        55525
236                           0
290                           1
268.33                        57589

Collapsed to:

Trans Quantity (Fuel Added)   Odometer (Mileage)
182.04                        55525
236 + 290 + 268.33 = 794.33   57589
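The collapsing step illustrated here can be sketched as follows: fuel from rows with an unusable odometer reading is carried forward and added to the next row with a trusted reading. The set of bogus odometer values and the function name are assumptions for illustration, not the project's actual rules.

```python
# Assumption: placeholder odometer values, based on examples in the text.
BOGUS_ODOMETER = {0, 1, 2, 3, 5, 10, 111, 123, 789, 999, 1111, 9999}


def collapse_fills(rows, bogus=BOGUS_ODOMETER):
    """Merge fills with bogus odometer readings into the next valid fill.

    rows: list of (fuel_added, odometer), ordered by date.
    Fuel left pending after the last valid reading is dropped.
    """
    collapsed = []
    pending_fuel = 0.0
    for fuel, odometer in rows:
        pending_fuel += fuel
        if odometer in bogus:
            continue  # keep accumulating until a trusted reading appears
        collapsed.append((round(pending_fuel, 2), odometer))
        pending_fuel = 0.0
    return collapsed
```

The total fuel and the total distance are both preserved, so overall consumption is unchanged; only the resolution of individual fills is lost.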


Mileage

Small and very large values could be ignored

Problem was determining whether any of the remaining data was valid (data validation)

Evaluating the degree of correlation between the increasing Date, and the supposed increasing Mileage

Useful approaches for estimating rank-orderedness and correlation between lists:

Spearman’s coefficient of rank correlation

Kendall’s Tau
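Both coefficients can be computed directly for a vehicle's date and mileage lists. The sketch below uses the textbook formulas for data without ties (Spearman's rho from squared rank differences, Kendall's tau from concordant and discordant pairs); it is an illustration, not the code used in the project, and the function names are hypothetical.

```python
def ranks(values):
    """Rank positions (0-based) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation for equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value of 1 means mileage rises with date throughout; lower values indicate readings that are out of order. For days 11 to 15 with mileages 300, 400, 500, 450, 510, rho is 0.9 and tau is 0.8.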


Data Cleansing


RAM Energy Data Validator


Bayesian - Approach

In the Bayesian approach to statistical inference, we express uncertain beliefs about things in terms of probability

E.g. that there is a 50% chance that the average fuel consumption of a vehicle will be less than 30mpg

Can use probabilities in this way to describe uncertainty about things we do not know

E.g. amount of fuel in a vehicle’s tank at 10.00am yesterday


Bayesian - Approach

Once we accept this view of probability, the principle for learning from data is simple

Before we see the data, we have a probability distribution based on our knowledge up to that point: the prior distribution

When we see the data, our probability distribution changes in the light of the new information in the data: the posterior distribution


Bayesian - Approach

Calculation used to get from the prior distribution to the posterior distribution

Uses Bayes’ theorem

Hence Bayesian statistics

Very straightforward interpretation of the results when using this method

Posterior distribution tells us how likely it is that various things are true, after we have used the evidence in the data
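As a small worked illustration of moving from prior to posterior, the sketch below applies the standard conjugate update for a normal mean with known observation spread. It is a textbook example chosen for simplicity, not the model used in this study, and all numbers and names are hypothetical.

```python
def update_normal_mean(prior_mean, prior_sd, observations, obs_sd):
    """Posterior (mean, sd) for a normal mean with known observation SD.

    Precisions (1 / variance) add, and the posterior mean is the
    precision-weighted average of the prior mean and the sample mean.
    """
    n = len(observations)
    sample_mean = sum(observations) / n
    prior_precision = 1 / prior_sd ** 2
    data_precision = n / obs_sd ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_precision * prior_mean + data_precision * sample_mean
    ) / posterior_precision
    return posterior_mean, posterior_precision ** -0.5
```

With a vague prior (mean 30 mpg, SD 10) and four observations averaging 33 mpg with SD 2, the posterior mean lands close to 33 and the posterior SD is just under 1, showing how the data dominate a weak prior.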


Bayesian - Approach

Different observers can have different prior beliefs, and this means that their posterior distributions will also be different

Make the prior distribution represent very little information

In practice, the prior tends to have little effect on the posterior

One advantage of this approach is that it is straightforward to calculate what we expect various things to be after seeing the data

For example, can calculate a posterior probability distribution for the cost savings of applying the fuel additive to a whole vehicle fleet
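A posterior distribution for fleet-wide cost savings can be approximated by sampling. The sketch below assumes, purely for illustration, that the posterior for the additive effect (in litres per mile, negative meaning less fuel used) is roughly normal; the effect size, fleet mileage, fuel price and function name are all hypothetical, not results from the study.

```python
import random
from statistics import mean


def fleet_saving_samples(effect_mean, effect_sd, fleet_miles,
                         price_per_litre, n=10_000, seed=0):
    """Draw samples of the annual cost saving for a whole fleet.

    effect_mean, effect_sd: assumed normal posterior for the additive
    effect in litres per mile (negative values mean fuel is saved).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        effect = rng.gauss(effect_mean, effect_sd)
        samples.append(-effect * fleet_miles * price_per_litre)
    return samples


# Hypothetical inputs: 0.005 l/mile saving, 1,000,000 fleet miles, 1.00 per litre.
samples = fleet_saving_samples(-0.005, 0.004, 1_000_000, 1.0)
expected_saving = mean(samples)
prob_positive = sum(s > 0 for s in samples) / len(samples)
```

The samples give both an expected saving and the probability that the saving is positive at all, which is often the more useful figure for a purchasing decision.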


Bayesian - Model

The basic model used is a regression, with fuel used as the dependent variable and distance travelled as one of the explanatory variables

Each observation corresponds to the time between two successive additions of fuel to the fuel tank

Expect zero fuel to be used if zero distance were travelled, but the amount of fuel used is not necessarily proportional to the distance travelled

For example, fuel efficiency may be greater on longer journeys


Bayesian - Model

In the simplest form of the model, assume that fuel used is proportional to distance travelled

The constant of proportionality is the slope of the line on a graph

Various other forms of relationship were also investigated.

While distance travelled is the most obvious explanatory variable, there are several other variables and factors which must be taken into account
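For the simplest proportional form, the slope can be estimated by least squares through the origin. The sketch below is the classical point estimate rather than the Bayesian analysis itself; under a flat prior and normal errors the posterior mean of the slope coincides with it. The function name and data are hypothetical.

```python
def slope_through_origin(distances, fuel_used):
    """Least-squares slope for fuel = slope * distance (no intercept).

    Returns fuel used per unit distance, e.g. litres per mile.
    """
    sum_xy = sum(d * f for d, f in zip(distances, fuel_used))
    sum_xx = sum(d * d for d in distances)
    return sum_xy / sum_xx
```

For example, intervals of 100, 200 and 300 miles using 50, 100 and 150 litres give a slope of 0.5 litres per mile.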


Bayesian - Factors

Vehicle Types

Type of vehicle has effect

Individual vehicles of same type may also have different characteristics

Effect of individual vehicles (within a type) was regarded as a random effect

Vehicles seen as a sample from all vehicles of that type


Bayesian - Factors

Drivers

Driver identified by card number

Drivers closely associated with vehicles

In this case, difficult to separate effects of vehicles from the effects of drivers

However, if this were not the case, then it would be possible to make inferences about individual drivers as well as individual vehicles


Bayesian - Factors

Time of year

Fuel efficiency may be affected by ambient temperature/meteorological variables

Ideally use meteorological data

Obtained data for this purpose

But, as a first step, a simple substitute is to use the time of year, e.g. month


Bayesian - Factors

Presence of fuel additive

The main question of interest is, “How does the use of the fuel additive affect fuel consumption?”


Bayesian - Complications

Fuel

How full the fuel tank was before or after fuel was added

Precisely how much fuel was used between fills

True tank content regarded as a latent or “hidden” variable

Such variables can be built into a Bayesian analysis


Bayesian - Complications

Data entry errors

Graph of odometer readings against date for a single vehicle shows the general pattern, with spurious values

This is built into the model by allowing certain prior probabilities for errors of different types

The analysis can thus “recognise” errors by calculating posterior probabilities that a reading is an error of the various types

Those values which have large posterior probabilities of being erroneous are, in effect, ignored by the rest of the analysis.
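The idea of "recognising" errors through posterior probabilities can be illustrated with a two-component mixture and Bayes' theorem. The sketch below is a deliberately simplified stand-in for the study's model: it assumes a single error type, a normal distribution around the expected odometer value for valid readings, and a uniform distribution over all possible values for errors. All parameter values and names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_error_probability(reading, expected, sd,
                                prior_error=0.05, max_reading=1_000_000):
    """Posterior probability that an odometer reading is a data entry error.

    Valid readings: Normal(expected, sd). Errors: Uniform(0, max_reading).
    prior_error is the assumed prior probability of an error.
    """
    likelihood_valid = normal_pdf(reading, expected, sd)
    likelihood_error = 1.0 / max_reading
    weighted_error = prior_error * likelihood_error
    weighted_valid = (1 - prior_error) * likelihood_valid
    return weighted_error / (weighted_error + weighted_valid)
```

A reading close to the expected value gets a posterior error probability near zero, while a reading such as 1 on a vehicle expected to show about 57,600 miles gets a probability near one and so carries almost no weight.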


Bayesian - Conclusions

Prototype Bayesian models were successfully run

Demonstrated feasibility of approach for this problem

However:

Need to overcome problems of missing data

Uncertainty over when additive would be expected to have an effect

Pattern of this effect

Confounding of additive effect with the effects of other factors such as the changing seasons


Bayesian Results

Posterior probability density for the effect of the additive, in litres per mile


Conclusions

Recommendations:

Design of better trials and data acquisition

Collection of ambient temperatures, etc.

Future Directions

Fraud detection

Efficiency of individual drivers/vehicles

Patterns of work, optimisation