
A Case Study of Bayesian Modeling on a Real World Problem

RAM Energy Energester/Enziro

Bob Mattheys, Malcolm Farrow, Giles Oatley, Garen Arevian, Souvik Banerjee

ISS – Intelligent Systems Solutions

A group of researchers and academics working with CAS (Centre for Adaptive Systems)

Remit:

Provide technology transfer and expertise to industry

Assist NE SMEs and stimulate business growth

Obtain funding, e.g. SMART Awards, GONE

ISS Projects

RAM Energy – Intelligent Data Analysis

Neptune Engineering – Intelligent Diagnostics

HASS – Back-office system/DBase

Hart Biological – Back-office system/DBase, process manufacturing

RAM Energy

Founded 2000

Clients in the oil/gas, energy, process, manufacturing and haulage industries

Products: Energester and Enziro

Ester-based synthetic lubricants and greases, enzymatic cleaning solutions, absorbents and blasting media

Better lubrication, heat dissipation and vibration reduction than oil or grease in isolation and conventional additives

RAM Energy

Problem

Demonstrate effectiveness and cost efficiency

Data collected by RAM Energy: very large, with major differences across the various sectors

Assist RAM Energy in structuring their data collection and storage in general

Heavy haulage industry

RAM Energy

Trials

RAM Energy carried out select trials with clients. These included:

Monitored consumption prior to Energester use

Monitored consumption post Energester use

Use of control vehicles (no Energester use)

Temperature data collected

RAM Energy Haulage

Data collected via diesel receipts

Information consisted of:

Card number (allocated to registration number)

Vehicle registration

Date

Fuel

Mileage

Registration Number   Date       Reg Entered   Fuel Added   Mileage
J577PWL               20020901   DX51MYT       276.19       128504
J577PWL               20020902   DX51MTY       296.51       129130
J577PWL               20020904   DX51MYT       288.88       999
J577PWL               20020905   J577PWL       235.95       666
J577PWL               20020907   J577PWL       346          1
J577PWL               20020907   J577PWL       234.86       1
J577PWL               20020908   DX51NYT       211          99999
J577PWL               20020909   DX51MYT       447.73       11
J577PWL               20020910   51            286.24       4717
J577PWL               20020910   DX51MYT       253.07       135300
J577PWL               20020911   DX51MYT       281          1
J577PWL               20020912   51            220.66       1000
J577PWL               20020912   DX51MYT       260          1
J577PWL               20020913   DU02PBY       325          1
J577PWL               20020914   DU02PBY       255.59       109705
J577PWL               20020915   DU02RBY       267.17       110296
J577PWL               20020915   2             267.62       120889
J577PWL               20020916   DU02PBY       182.16       111563
J577PWL               20020916   DU52PBY       260.02       112043
J577PWL               20020917   2             263.91       2646
J577PWL               20020917   DU02PBY       224.81       113223
J577PWL               20020918   2             251.09       3773
J577PWL               20020918   DU02PBY       224.67       114513

RAM Energy

Analysis

Performed using Excel spreadsheets

Discrete mpg (mileage since last fill / diesel input)

Some cumulative mpg (using total mileage / total diesel input to date)

Attempt to normalise using mean temperature records

Some regression analysis
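The two mpg figures described above can be sketched as follows. This is a minimal illustration, not the spreadsheet formulae actually used. It assumes that fuel is recorded in litres, that mpg means miles per imperial gallon, and that each fill tops the tank up, so the fuel added approximates the fuel used since the previous fill. The function name and the input layout are hypothetical.

```python
LITRES_PER_IMPERIAL_GALLON = 4.54609  # assumption: fuel in litres, imperial mpg


def discrete_and_cumulative_mpg(fills):
    """fills: list of (odometer_miles, litres_added) in date order.

    Returns two lists, one entry per fill after the first:
    discrete mpg   = miles since last fill / fuel added at this fill
    cumulative mpg = total miles so far / total fuel added so far
    """
    discrete, cumulative = [], []
    start_odometer = fills[0][0]
    total_litres = 0.0
    for (prev_odo, _), (odo, litres) in zip(fills, fills[1:]):
        gallons = litres / LITRES_PER_IMPERIAL_GALLON
        discrete.append((odo - prev_odo) / gallons)
        total_litres += litres
        total_gallons = total_litres / LITRES_PER_IMPERIAL_GALLON
        cumulative.append((odo - start_odometer) / total_gallons)
    return discrete, cumulative
```

The cumulative figure smooths out the fill-to-fill noise of the discrete figure, at the cost of reacting slowly to any change in consumption.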

[Figure: Fuel Consumption, Rover 75 W608 UOH. Discrete MPG, Cumulative MPG and Adjusted MPG plotted against Fill No.]

RAM Energy Results

                    No seasonal adjustment   With seasonal adjustment
After Energester    42.94                    43.46
Before Energester   42.66                    42.64
Percentage gain     0.64%                    1.92%

RAM Energy Problems

Missing data consisted of:

Driver information (who?)

Loading information (full/empty)

Length of journey

Type of journey (long haul vs short haul)

Urban or motorway conditions

Etc.

RAM Energy Conclusion

Results very poor and inconclusive

Database

Excel sheets were converted to an Access database with deletion of unnecessary rows and columns.

The Access database was then imported into SQL Server for data query and subsequent analysis

Data Cleansing

Brief outline of the most obvious problems with the data:

1. Card Number

2. Registration Number

3. Date

4. Fuel Added

5. Mileage

Card Number

There were duplicate Card Numbers for (presumably) the same card, e.g. 85944 and 0085944

In a few cases, for a given Registration Number, there appear additional Card Numbers, e.g. for ‘N151EUB’ there are the Card Numbers 38195, 0038195 and 56408
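One way to handle the zero-padded duplicates is sketched below. It assumes that card numbers differing only by leading zeros refer to the same card, which the slide itself only presumes. Both function names are hypothetical.

```python
from collections import defaultdict


def normalise_card(card):
    """Strip whitespace and leading zeros so '0085944' and '85944' match."""
    stripped = str(card).strip().lstrip("0")
    return stripped or "0"  # keep a lone zero rather than an empty string


def cards_by_registration(rows):
    """rows: iterable of (registration, card_number) pairs.

    Returns the distinct normalised cards seen for each registration, so
    vehicles with more than one genuine card stand out for manual review.
    """
    seen = defaultdict(set)
    for registration, card in rows:
        seen[registration].add(normalise_card(card))
    return {reg: sorted(cards) for reg, cards in seen.items()}
```

Applied to the ‘N151EUB’ example, this leaves two distinct cards (38195 and 56408), which still need a human decision.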

Registration Number

Registration numbers seemed to be always entered correctly

However, the field Reg Entered did not always tally with this

RAM's recommendation was to ignore the Reg Entered field

Dates

Dates were entered very consistently, which preserved:

the ordering

the distance between dates

the actual date

An important question was: CAN WE PRESUME THE DATE IS ALWAYS ENTERED CORRECTLY?

If this was so, then this provided us with a convenient check on the Mileage, as Date and Mileage should both increase together.

Fuel

Outlier identification

Very small and very large values are easily detected over a large dataset

Take the mean of the sample and flag as outliers any data more than 3 or 4 SDs away from the mean

Very small values, e.g. 0 or 1, assumed to be bogus values

9999, 999, etc. taken to be bogus values

Some small and large values mistyped, with either the decimal place occurring too soon (e.g. 38.6 instead of 386) or extra digits added (e.g. 3860 instead of 386)

Difficult errors, e.g. 693392: could it be 69392? What if 693399?

Data must be flagged as erroneous
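The flagging rules above can be sketched as a small function. This is an illustration only. The set of bogus placeholder values and the default threshold of 3 standard deviations are assumptions taken from the examples on the slide, and the function name is hypothetical. Values are flagged rather than corrected, in line with the last point above.

```python
from statistics import mean, stdev

# Assumption: placeholder values treated as bogus, taken from the slide's examples.
BOGUS_VALUES = {0, 1, 999, 9999}


def flag_fuel(values, n_sd=3.0):
    """Label each fuel value as 'bogus', 'outlier' or 'ok'.

    Bogus placeholders are excluded before the mean and standard deviation
    are computed, so they do not distort the outlier threshold.
    """
    plausible = [v for v in values if v not in BOGUS_VALUES]
    mu, sd = mean(plausible), stdev(plausible)
    flags = []
    for v in values:
        if v in BOGUS_VALUES:
            flags.append("bogus")
        elif abs(v - mu) > n_sd * sd:
            flags.append("outlier")
        else:
            flags.append("ok")
    return flags
```

Note that a single extreme value inflates the standard deviation, so on short samples a genuine outlier can escape a 3 SD rule. This is one reason the rule works best over a large dataset.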

Mileage

Some values were entered as {0,1,999,9999,2,3,5,10,111,1111,123,789, etc}

If we can presume that the Date is a sensible value, then in a dataset where there are only a few missing or obviously incorrect values for the Mileage, these values can be amended as follows

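Mileage

Day   Mileage   Spurious?
11    300
12    400
13    500       ?
14    450       ?

We do not know if the day 13 entry is wrong, or day 14. So we can look ahead:

If the day 15 reading continues on from day 13, the day 14 entry is the spurious one:

Day   Mileage   Spurious?
11    300
12    400
13    500
14    450       ?
15    510

If the day 15 reading continues on from day 14, the day 13 entry is the spurious one:

Day   Mileage   Spurious?
11    300
12    400
13    500       ?
14    450
15    470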
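The look-ahead check in the tables above can be written as a short function. This is a sketch under the stated assumption that dates are correct and readings are already sorted by date. The function name is hypothetical, and ambiguous cases are deliberately left undecided.

```python
def spurious_index(readings, i):
    """Given a drop between readings[i] and readings[i + 1], use the next
    reading to decide which of the two is spurious.

    Returns the index judged spurious, or None if it cannot be decided.
    """
    earlier, later = readings[i], readings[i + 1]
    if i + 2 >= len(readings):
        return None  # nothing to look ahead to
    ahead = readings[i + 2]
    if ahead >= earlier:
        return i + 1  # sequence continues from the earlier value
    if later <= ahead < earlier:
        return i  # sequence continues from the later value
    return None


print(spurious_index([300, 400, 500, 450, 510], 2))  # index 3, i.e. day 14
print(spurious_index([300, 400, 500, 450, 470], 2))  # index 2, i.e. day 13
```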

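Mileage

Fills without a usable odometer reading (here 236 and 290) are merged into the next fill that has one.

Trans Quantity (Fuel Added)     Odometer (Mileage)
182.04                          55525
268.33                          57589

Collapsed to:

Trans Quantity (Fuel Added)     Odometer (Mileage)
182.04                          55525
236 + 290 + 268.33 = 794.33     57589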
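The collapsing step can be sketched as below. The function name and the validity rule are hypothetical. In the usage example, the odometer values 1 and 999 attached to the 236 and 290 fills are invented for illustration, since the slide does not show them.

```python
def collapse_fills(fills, is_valid):
    """fills: list of (fuel_added, odometer) in date order.

    Fuel from fills whose odometer reading fails is_valid is carried
    forward and added to the next fill with a valid reading. Fuel from
    trailing invalid fills is dropped, as no valid reading follows.
    """
    collapsed = []
    pending = 0.0
    for fuel, odometer in fills:
        pending += fuel
        if is_valid(odometer):
            collapsed.append((round(pending, 2), odometer))
            pending = 0.0
    return collapsed


fills = [(182.04, 55525), (236, 1), (290, 999), (268.33, 57589)]
print(collapse_fills(fills, is_valid=lambda odo: odo > 10000))
```

The result keeps the total fuel between two trusted odometer readings, which is what a fuel-against-distance analysis needs.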

Mileage

Small and very large values could be ignored

The problem was determining whether any of the remaining data was valid (data validation)

Evaluating the degree of correlation between the increasing Date and the supposedly increasing Mileage

Useful approaches for estimating rank-orderedness and correlation between lists:

Spearman’s coefficient of rank correlation

Kendall’s Tau
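Both measures are easy to compute directly. The sketch below uses the textbook formulae for lists without tied values (Kendall's tau-a and the difference-of-ranks form of Spearman's coefficient). It is an illustration, not the validator's actual code. Real data with repeated dates or mileages would need a tie-aware variant.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    score = 0
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        result[k] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

For perfectly ordered mileages both coefficients equal 1. Each out-of-order reading pulls them down, which gives a single number summarising how trustworthy a vehicle's mileage column is.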

Data Cleansing

RAM Energy Data Validator

Bayesian - Approach

In the Bayesian approach to statistical inference, we express uncertain beliefs about things in terms of probability

E.g. that there is a 50% chance that the average fuel consumption of a vehicle will be less than 30mpg

We can use probabilities in this way to describe uncertainty about things we do not know

E.g. the amount of fuel in a vehicle’s tank at 10.00am yesterday

Bayesian - Approach

Once we accept this view of probability, the principle for learning from data is simple

Before we see the data, we have a probability distribution based on our knowledge up to that point: the prior distribution

When we see the data, our probability distribution changes in the light of the new information in the data: the posterior distribution

Bayesian - Approach

The calculation used to get from the prior distribution to the posterior distribution uses Bayes’ theorem, hence Bayesian statistics

Very straightforward interpretation of the results when using this method

Posterior distribution tells us how likely it is that various things are true, after we have used the evidence in the data
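The step from prior to posterior can be illustrated with a small grid calculation. Everything in this sketch is an assumption made for illustration: the candidate values for a vehicle's average mpg, the flat prior, the normal measurement noise with known standard deviation, and the four observed mpg values are all invented.

```python
import math


def grid_posterior(grid, prior, data, sigma):
    """Bayes' theorem on a grid: posterior is proportional to prior times
    likelihood, assuming normal noise with known standard deviation sigma.
    """
    log_post = []
    for mu, p in zip(grid, prior):
        log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
        log_post.append(math.log(p) + log_lik)
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


grid = [20 + 0.5 * k for k in range(41)]  # candidate averages, 20 to 40 mpg
prior = [1 / len(grid)] * len(grid)       # flat prior: very little information
data = [28.0, 27.5, 29.0, 28.5]           # invented mpg observations
posterior = grid_posterior(grid, prior, data, sigma=2.0)

prior_below_30 = sum(p for mu, p in zip(grid, prior) if mu < 30)
post_below_30 = sum(p for mu, p in zip(grid, posterior) if mu < 30)
print(round(prior_below_30, 3), round(post_below_30, 3))
```

The probability that the average is below 30mpg starts near one half under the flat prior and moves well above it once the data are seen, which is the kind of direct statement the posterior distribution supports.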

Bayesian - Approach

Different observers can have different prior beliefs, and this means that their posterior distributions will also be different

We can make the prior distribution represent very little information

In practice, the prior tends to have little effect on the posterior

One advantage of this approach is that it is straightforward to calculate what we expect various things to be after seeing the data

For example, we can calculate a posterior probability distribution for the cost savings of applying the fuel additive to a whole vehicle fleet
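The fleet-level calculation can be sketched by pushing posterior draws of the additive effect through a cost formula. The draws below are a stand-in generated from a normal distribution. In a real analysis they would come from the fitted model. The fleet mileage, the fuel price and the function name are likewise hypothetical.

```python
import random


def fleet_saving_samples(effect_samples, fleet_miles, price_per_litre):
    """Convert draws of the additive effect (litres per mile, negative
    meaning less fuel used) into draws of the fleet-wide cost saving.
    """
    return [-effect * fleet_miles * price_per_litre for effect in effect_samples]


rng = random.Random(0)
# Stand-in posterior draws of the additive effect, in litres per mile.
effect_samples = [rng.gauss(-0.002, 0.003) for _ in range(10_000)]
savings = fleet_saving_samples(
    effect_samples, fleet_miles=5_000_000, price_per_litre=0.80
)
prob_positive = sum(s > 0 for s in savings) / len(savings)
print(f"Posterior probability of a net saving: {prob_positive:.2f}")
```

Because every draw is converted, the result is a full distribution of savings, so statements such as the probability of any saving at all follow directly.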

Bayesian - Model

The basic model used is a regression, with fuel used as the dependent variable and distance travelled as one of the explanatory variables

Each observation corresponds to the time between two successive additions of fuel to the fuel tank

We expect zero fuel to be used if zero distance is travelled, but the amount of fuel used is not necessarily proportional to the distance travelled

For example, fuel efficiency may be greater on longer journeys

Bayesian - Model

In the simplest form of the model, we assume that fuel used is proportional to distance travelled

The constant of proportionality is the slope of the line on a graph of fuel used against distance travelled

Various other forms of relationship were also investigated.

While distance travelled is the most obvious explanatory variable, there are several other variables and factors which must be taken into account
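For the simplest form of the model, where fuel used is proportional to distance travelled, the slope has a closed-form least-squares estimate. The sketch below shows only that point estimate. It is not the Bayesian fit itself, and the function name is hypothetical.

```python
def slope_through_origin(distances, fuel_used):
    """Least-squares slope b for the no-intercept model fuel = b * distance.

    Minimising sum((fuel - b * distance)^2) gives
    b = sum(distance * fuel) / sum(distance^2).
    """
    sxy = sum(d * f for d, f in zip(distances, fuel_used))
    sxx = sum(d * d for d in distances)
    return sxy / sxx


# Invented example: miles driven and litres used between successive fills.
print(slope_through_origin([100, 200, 300], [50, 100, 150]))
```

The slope is in fuel units per unit of distance, e.g. litres per mile, so a lower slope means better fuel efficiency.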

Bayesian - Factors

Vehicle types

Type of vehicle has an effect

Individual vehicles of the same type may also have different characteristics

Effect of individual vehicles (within a type) was regarded as a random effect

Vehicles seen as a sample from all vehicles of that type

Bayesian - Factors

Drivers

Driver identified by card number

Drivers closely associated with vehicles

In this case, it is difficult to separate the effects of vehicles from the effects of drivers

However, if this were not the case, then it would be possible to make inferences about individual drivers as well as individual vehicles

Bayesian - Factors

Time of year

Fuel efficiency may be affected by ambient temperature/meteorological variables

Ideally use meteorological data

Obtained data for this purpose

But, as a first step, a simple substitute is to use the time of year, e.g. month

Bayesian - Factors

Presence of fuel additive

The main question of interest is, “How does the use of the fuel additive affect fuel consumption?”
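One possible way to write down a model combining the factors listed on these slides is sketched here. The slides do not give the authors' equations, so the exact form and all of the notation are assumptions. For observation $i$, $y_i$ is fuel used, $d_i$ distance travelled, $t(i)$ the vehicle type, $v(i)$ the individual vehicle, $m(i)$ the month, and $z_i$ an indicator that is 1 when the additive is in use.

```latex
\begin{align*}
y_i &= \bigl(\alpha_{t(i)} + u_{v(i)} + \gamma_{m(i)} + \delta z_i\bigr)\, d_i + \varepsilon_i, \\
u_v &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

In this sketch $\alpha_t$ is the baseline fuel use per mile for vehicle type $t$, $u_v$ is the random effect of vehicle $v$ within its type, $\gamma_m$ is a seasonal adjustment, and $\delta$ is the additive effect in fuel per mile. A negative $\delta$ would correspond to a fuel saving.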

Bayesian - Complications

Fuel

We do not know how full the fuel tank was before or after fuel was added

We do not know precisely how much fuel was used between fills

True tank content regarded as a latent or “hidden” variable

Such variables can be built into a Bayesian analysis

Bayesian - Complications

Data entry errors

A graph of odometer readings against date for a single vehicle shows the general pattern, along with spurious values

This is built into the model by allowing certain prior probabilities for errors of different types

The analysis can thus “recognise” errors by calculating posterior probabilities that a reading is an error of the various types

Those values which have large posterior probabilities of being erroneous are, in effect, ignored by the rest of the analysis.
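The idea of a posterior error probability can be shown with a two-component sketch. It assumes a single error type, where an erroneous reading is equally likely to be any value up to a maximum, and it assumes valid readings are normally distributed around an expected odometer value. The slides describe several error types, so this is a simplification, and all names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def posterior_error_prob(reading, expected, sigma, prior_error, max_reading):
    """P(error | reading) by Bayes' theorem.

    Valid readings: Normal(expected, sigma).
    Erroneous readings: Uniform(0, max_reading).
    """
    like_valid = normal_pdf(reading, expected, sigma)
    like_error = 1.0 / max_reading
    numerator = prior_error * like_error
    return numerator / (numerator + (1 - prior_error) * like_valid)


# A reading of 999 when about 129700 miles is expected is almost surely an error.
print(posterior_error_prob(999, 129700, 300, 0.05, 1_000_000))
# A reading at the expected value is almost surely valid.
print(posterior_error_prob(129700, 129700, 300, 0.05, 1_000_000))
```

Weighting each reading by one minus this probability is one way in which values with a large posterior error probability end up effectively ignored.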

Bayesian - Conclusions

Prototype Bayesian models were successfully run

Demonstrated feasibility of approach for this problem

However:

Need to overcome problems of missing data

Uncertainty over when the additive would be expected to have an effect

Pattern of this effect

Confounding of the additive effect with the effects of other factors, such as the changing seasons

Bayesian Results

Posterior probability density for the effect of the additive, in litres per mile

Conclusions

Recommendations:

Design of better trials and data acquisition

Collection of ambient temperatures, etc.

Future directions:

Fraud detection

Efficiency of individual drivers/vehicles

Patterns of work, optimisation
