forecasting for business - blog

8/3/2019 Forecasting for Business - Blog



ecasting for Business - Blog - http://blog.lokad.com/journal/category/forecastin

r 12 22/12/2011 14:29



Entries in forecasting (45)

Seasonality illustrated

Monday, September 19, 2011 at 12:00PM

Seasonality is one of the strongest statistical patterns that can

be leveraged to refine forecasts. Below, 4 time-series

aggregated at the weekly level (159 weeks). Historical data are

in red and forecasts are in purple. Vertical gray markers

indicate January 1st.

When illustrating seasonality, everyone (Lokad included) tends

to use long time-series, much like the first three series here

above. Indeed, it's more visual and more appealing.

However, long time-series do not represent

your usual situation. On average consumer goods have a

lifespan of no more than 3 or 4 years. Thus, long time-series are

typically a small minority in your dataset. Worse, those long

time-series might be outliers that do not reflect the behavior of

other shorter-lived products.

Here above, the short 4th time-series is a much more

representative case with less than 1 year of data. In such a

situation, however, it's much less clear how seasonality can be

leveraged. The Lokad trick to do that consists of using multiple

time-series analysis.

Learn more on our seasonality definition article.

Joannes Vermorel

tagged forecasting, insights in forecasting, insights

Video: How the Forecasting Engine works?

Tuesday, September 13, 2011 at 09:00AM

Questions about the under-the-hood details of Lokad are frequent.

We have recently added a big FAQ to our Forecasting

Technology section. Today, we are releasing a new video that

gives the big picture of how our forecasting engine works.



Again, special thanks to Ray Grover for the voice over.


tagged video in forecasting, insights, video

Weekly/Monthly aggregation is a lossy process

Thursday, April 14, 2011 at 12:19PM

When practitioners have a first look at a forecast report

produced by Lokad, they tend to stumble upon various

oddities. For example, some forecasts may look way too low.

Without any observable trend or seasonality,

Lokad anticipates something rather unexpected. Sometimes

it's a by-product of rather advanced correlation analytics, but

sometimes it's something both simpler and deeper.

The graph on the left represents a typical situation: steady

sales for a couple of months, and then, a somewhat

inexplicable drop in the forecasts.

Common sense is yelling this can't be right, let's fix this broken

forecast; and yet forecasting and common sense do not mix well.

The way we observe sales is deeply misleading. Indeed, we

are observing here monthly aggregated sales, not the sales

themselves. Many businesses favor monthly forecasts because

they feel their sales are too low or too erratic at the daily or

weekly level to be of any practical use. Hence, they aggregate

sales data over long(er) periods of time. By doing so, sales

appear smoother and, consequently, more predictable.

This visualization of sales, i.e. thinking in totals rather than an

endless stream of transactions, is so ubiquitous that many

businesses fail to realize that aggregating sales primarily

means losing information that is potentially valuable to

perform the forecasts.

Let's illustrate the point with a fresh look at the same sales

history, although through weekly aggregation.

The picture is extremely different. We realize that the



seemingly steady monthly averages were just resulting from

two super-heavy weeks: one in between January and February

and a second in March.

Such spikes routinely appear in businesses because of

promotions and various other kinds of exceptional events.

With the second illustration, low forecasts are making a lot

more sense: sales include infrequent spikes that should not be

accounted for, and, when we mentally discard those spikes,

we obtain forecasts that just follow the usual averaging

pattern.

A traditional forecasting system would typically be fooled by

such a situation, and would anticipate a much higher monthly

forecast, which would turn out to be much less accurate.

But Lokad is definitely not your traditional forecasting

system. When monthly or weekly forecasts are requested, we

keep looking at the most fine-grained data available. This lets

us identify patterns that would otherwise be lost through the

sales aggregation process.
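
To make the point concrete, here is a minimal Python sketch. The weekly figures are made up for illustration, a "month" is assumed to be exactly 4 weeks, and the median-based forecast is only a stand-in for a spike-aware method; it is not a description of Lokad's actual models.

```python
from statistics import mean, median

# Hypothetical weekly unit sales over 12 weeks: a baseline of 10 units
# plus three promotion-like spikes. These numbers are invented.
weekly_sales = [10, 10, 10, 50, 10, 50, 10, 10, 10, 10, 50, 10]

WEEKS_PER_MONTH = 4  # simplifying assumption: a "month" is 4 weeks


def aggregate(series, size):
    """Sum consecutive blocks of `size` points (e.g. weeks into months)."""
    return [sum(series[i:i + size]) for i in range(0, len(series), size)]


monthly_sales = aggregate(weekly_sales, WEEKS_PER_MONTH)

# Naive forecast built on the aggregated view: average of monthly totals.
naive_monthly_forecast = mean(monthly_sales)

# Spike-aware stand-in: take the typical (median) week, scale to a month.
baseline_week = median(weekly_sales)
spike_aware_monthly_forecast = baseline_week * WEEKS_PER_MONTH

print("monthly totals:", monthly_sales)
print("naive monthly forecast:", naive_monthly_forecast)
print("spike-aware monthly forecast:", spike_aware_monthly_forecast)
```

The monthly totals look perfectly steady, yet the weekly data shows that a month without any exceptional event should sell far less than the monthly average suggests.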


tagged forecasting, insights in forecasting, insights, time series

Business is UP but forecasts are DOWN

Friday, April 1, 2011 at 11:11AM

Statistical demand forecasting is a counter-intuitive science.

This point was pressed a couple of times before, but let's have

a look at another misleading situation.

If every single product segment of my business is

growing fast, then at least some products should

have an upward sales trend as well. Right?

Otherwise, we would not be growing at all.

This statement looks like just plain common sense; and yet it's

wrong, very wrong. We live in a fast-paced economy. Having an

identical product being sold for more than 3 years is the exception

rather than the norm in most consumer goods businesses. As a

result, product life-cycles tend to dwarf the organic growth of retailers.

This situation is illustrated by the schema below.

This is a set of product sales plotted on the same graphic. Each

curve is associated with a particular product; and products are

launched over time. Each product comes with its own lifecycle

pattern. The lifecycle patterns here illustrate a typical novelty



effect: sales quickly ramp-up after product launch, and then

the product enters its downward phase, which ends when the

product is finally phased out of the market.

Yet, how does an upward trend - from the retailer itself -

impact this picture? Let's have another look at the illustration

below.

Sales are higher with a positively trended retailer, yet this

growth is nowhere near strong enough to compensate for the product

lifecycle effect. The sales of the product are still decreasing -

albeit at a slower rate.

This situation outlines how we can have a fast-growing retail

business with only negatively trended product sales. The main

trick lies in the fact that new products keep being launched.
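
The situation can be reproduced with a small simulation. The sketch below is purely illustrative: the lifespan, decay rate and growth rate are assumed values, and the lifecycle curve is simplified to a pure decline after launch (the initial ramp-up is omitted for brevity).

```python
LIFESPAN = 12   # months a product stays on the market (assumed)
DECAY = 0.8     # month-over-month sales ratio after launch (assumed)
GROWTH = 1.05   # each launch starts 5% higher than the previous one (assumed)
HORIZON = 24    # months simulated; one product is launched every month


def product_sales(launch_month, month):
    """Sales of the product launched at `launch_month` during `month`."""
    age = month - launch_month
    if age < 0 or age >= LIFESPAN:
        return 0.0  # not launched yet, or already phased out
    return 100.0 * GROWTH ** launch_month * DECAY ** age


# Retailer-level sales: sum over all products for each month.
total_sales = [
    sum(product_sales(k, t) for k in range(HORIZON))
    for t in range(HORIZON)
]


def is_declining(launch_month):
    """True if the product's sales strictly decrease over its observed life."""
    end = min(launch_month + LIFESPAN, HORIZON)
    life = [product_sales(launch_month, t) for t in range(launch_month, end)]
    return all(later < earlier for earlier, later in zip(life, life[1:]))


every_product_declines = all(is_declining(k) for k in range(HORIZON))
business_grows = all(b > a for a, b in zip(total_sales, total_sales[1:]))

print("every product declines:", every_product_declines)
print("total sales grow every month:", business_grows)
```

Under these assumptions, both flags come out true: the retailer grows every month while no single product ever shows an upward trend.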

Alas, this situation generates a lot of confusion. Indeed, when

sales forecasts severely mismatch overall expectations, it

becomes very tempting to "fix" the forecasts.

Since most forecasting tools are poorly suited to deal with too

varying or too intermittent demand anyway, it is tempting to

aggregate sales per family, per category to produce an

aggregated forecast; and then to de-aggregate forecasts at the

SKU level using ratios. This approach is named top-down

forecasting, and it is heavily used in many industries (textile among

others).

Top-down forecasts produce results that look much closer to

intuitive expectations: a growth is observed in the sales

forecasts, and it matches growth observed on the various

business segments.

Yet, by producing the forecast at the TOP level, the forecasting

model is capturing a fictitious upward trend that only results

from the contribution of regular product launches. If this

fictitious trend ends up applied to a lower level - aka SKUs or

products - then we significantly over-forecast the sales for

each individual product.

Near worst case: massive overstock is generated for products

precisely at the time they are phased out of the market.
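
A small numerical sketch of the over-forecast follows. The three products, their sales history and both forecasting rules (a linear extrapolation of the family total, and a "continue the last observed ratio" rule per product) are hypothetical choices made for illustration only.

```python
# Hypothetical monthly sales for one product family; a new product is
# launched each month while older ones decay.
history = {
    "A": [100.0, 80.0, 64.0],
    "B": [0.0, 110.0, 88.0],
    "C": [0.0, 0.0, 121.0],
}
months = 3

# Family-level totals: these grow, thanks to the launches.
totals = [sum(series[m] for series in history.values()) for m in range(months)]

# Top-down: extrapolate the family total, then split it between the
# existing SKUs using their share of last month's sales.
avg_increase = (totals[-1] - totals[0]) / (months - 1)
total_forecast = totals[-1] + avg_increase
top_down = {
    sku: total_forecast * series[-1] / totals[-1]
    for sku, series in history.items()
}


def own_trend_forecast(series):
    """Bottom-up stand-in: continue the product's own last observed ratio."""
    if series[-2] > 0:
        return series[-1] * series[-1] / series[-2]
    return series[-1]  # just launched: no trend observable, assume flat


bottom_up = {sku: own_trend_forecast(series) for sku, series in history.items()}

for sku in history:
    print(sku, "top-down: %.1f" % top_down[sku], "bottom-up: %.1f" % bottom_up[sku])
```

In this toy example, the top-down split assigns to every existing product a share of growth that will actually come from the next launch, so each SKU ends up over-forecast.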

From a forecasting perspective, a good forecasting system

should be able to capture lifecycle effects. It means that sales

forecasts may significantly differ from the overall business

forecast. Business can go UP while every single product is

going DOWN. In such a situation, trying to "fix" forecasts is

most likely going to make them worse.

Addendum: Despite the date of this post (April 1st, 2011), this

post is not a joke.


tagged forecasting, insights, lifecycle, retail, trend in

forecasting, insights

New Forecasting Technology FAQ

Wednesday, March 9, 2011 at 11:28AM

Lately, we realized that the page detailing our forecasting



technology was somewhat vague concerning under-the-hood

aspects such as seasonality, trend, product life-cycle,

promotions, ... Hence we have just posted a new extensive

Forecasting Technology FAQ.

Questions and Answers

Nuts and bolts

How accurate are your forecasts?

Forecasting competitions, do you have any academic

validation of your technology?

Do you evaluate the accuracy of your forecasts?

General patterns

Macro trends (ex: financial crisis), how are they

handled?

Seasonality, trend, how is it handled?

Promotions, how are they handled?

Product Life Cycles and product launches, how are

they handled?

Intermittent / low volume products, how are they

handled?

Cannibalization, how is it handled?

Weather, how is it handled?

Demand artifacts

Lost sales caused by stock-outs, how are they handled?

Exceptional sales, how are they handled?

Aggregation, top-down or bottom-up?

Obviously, we are barely scratching the surface here. Don't

hesitate to post your own questions, we will do our best to

address them as well.


tagged documentation, forecasting, insights in docs,


Fallacies in data cleaning for (short-term) sales forecasts

Friday, November 19, 2010 at 11:43AM

When it comes to data analysis, experts frequently emphasize

(and rightly so) the importance of having a clean dataset before

starting any analysis. Otherwise, you end up with Garbage In,

Garbage Out.

As a result, most forecasting toolkits provide extensive

features to support data cleaning / data preparation; and yet,

Lokad does not provide any explicit feature supporting data

cleaning.

Have we missed something BIG here?

We don't believe so. There are some misunderstandings when

it comes to data cleaning for the purpose of (short-term) sales

forecasting. Indeed, nowadays, sales of most retailers,

wholesalers, manufacturers are stored into either an ERP or

some accounting system. In our experience, as of 2010,

transactional data associated with sales are remarkably clean.

If there is a transaction recorded November 1st, 2010 indicating



that the product X has been sold in Y quantity, then, the

probability for this information to be true is very high, with a

confidence above 99.9% for most sales processes.

Indeed, companies cannot afford not to know what they are

selling. As a result, massive efforts have been invested in the

last two decades to make really sure that sales data are

reliable to some extent. We are not saying that no erroneous

sales entry ever enters the system; we are only saying that the

proportion is typically insignificant.

If sales data are clean, why are we still pushing efforts on

data cleaning?

We have been observing a lot of data cleaning practices in the

industry, and it turns out that the operations referred to as

cleaning tend to be much more than actually looking for the

0.1% erroneous transactions. The illustration here above gives

some insights about the actual operations involved in a typical

data cleaning phase: it's all about smoothing the extremes. For

example, partial sales during shortages are manually increased,

and promotional/exceptional sales are capped.

Needless to say, we are not believers in this approach. Real

sales data should not be replaced by fictitious sales data.

Indeed, nothing can tell with 100% confidence how many

products would have been sold if there had not been any

shortage. The partial sales are the only tangible data that we

have that do not already rely on statistical extrapolation.

Yet, there is one interesting side-effect of the smooth-

the-extreme practice: smoothing improves the accuracy of

the naive forecasting methods that behave much like the

moving average.

It is tempting, if the only tool you have is a

hammer, to treat everything as if it were a nail.

Abraham Maslow, 1966

Trying to adjust the sales data to better fit the only

forecasting model on hand is just a bad case of the Law of the

instrument. Our approach consists of tackling directly the

complex patterns instead of trying to circumvent them.


tagged cleaning, data, forecasting, insight, sales in accuracy,


Width vs. Depth, Rotate your sales forecasts by 90 degrees

Tuesday, August 31, 2010 at 06:05PM

We have already discussed why Lokad did not care much about

forecasting Chinese food rather than Sport Bar beverages.

Another way of thinking about our technology consists of rotating

your sales forecasts by 90 degrees.

We observe that a consumer product has, on average, a 3-year

lifecycle. This means that, on average, the amount of data

available for every single product is about 18 months. When we

look at the sales history with a monthly aggregation, 18 months

of data means 18 points.

With 18 data points, no matter how smart or advanced your

forecasting theory is, you can't do much, simply because we face

an utter lack of data to perform any robust statistical analysis.

With 18 points, even a pattern as obvious as seasonality

becomes a challenge to observe because we don't even have 2

complete seasonal observations.

Your mileage may vary from one industry to the next, but

unless your products stay in the market for decades, you are

most likely to face this issue.



As a direct consequence, classical forecasting toolkits require

statisticians to tweak forecasting models for every single

product because no non-trivial statistical model can be robustly

fit with only 18 points as input data.

Yet, Lokad does not require any statistician, and the magic

lies in the 90-degree rotation: our models do not iterate over

the data a single time-series at a time, but over all time-series at

once. Thus, we have a lot more input data available, and

consequently we can succeed with rather advanced models.

This approach is just common sense: if you want to forecast

the seasonality of your new chocolate bar, the seasonality of

the other chocolate bars seems like a good candidate. Why

should you treat each chocolate bar in strict isolation from the

others?
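
As a rough illustration of the idea (and not of Lokad's actual models), the Python sketch below borrows a seasonal profile from two established products to forecast a new one that has only 4 months of history. All product names and sales figures are hypothetical.

```python
from statistics import mean

# Hypothetical monthly sales (January to December) for two established
# chocolate bars with a full year of history each.
established = {
    "bar_1": [80, 90, 100, 120, 100, 80, 60, 60, 80, 100, 130, 200],
    "bar_2": [40, 45, 50, 60, 50, 40, 30, 30, 40, 50, 65, 100],
}


def seasonal_indices(series):
    """Monthly sales divided by the yearly average level."""
    level = mean(series)
    return [value / level for value in series]


# Pool the seasonal profiles across products, month by month.
pooled_indices = [
    mean(per_month)
    for per_month in zip(*(seasonal_indices(s) for s in established.values()))
]

# A new bar launched in June (month index 5) with only 4 months of data.
first_month = 5
new_bar_sales = [16, 12, 12, 16]  # June, July, August, September

# Deseasonalize the few observed points to estimate the new bar's level.
level = mean(
    sale / pooled_indices[first_month + i]
    for i, sale in enumerate(new_bar_sales)
)

december_forecast = level * pooled_indices[11]
naive_forecast = mean(new_bar_sales)  # what the short series alone suggests

print("December forecast with pooled seasonality: %.1f" % december_forecast)
print("December forecast from the short series alone: %.1f" % naive_forecast)
```

With 4 data points the new bar cannot reveal its own December peak; the pooled profile supplies it.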

Yet, from a computational perspective, the problem has just

become a lot harder: if you have 10,000 SKUs, the number of

associations between two SKUs is roughly 100 million (and

10,000 SKUs is nowhere near a large number). That's precisely where

the cloud kicks in: even if your algorithms are well-designed

not to suffer a strict quadratic complexity, you're still going to

need a lot of processing power. The cloud just happens to make

this processing power available on demand at a very low price.

Without the cloud, it is simply not possible to deliver this kind

of technology.
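
For reference, the arithmetic behind the "roughly 100 million" figure can be checked in a few lines of Python. Whether pairs are counted as ordered or unordered is an assumption on our side; the order of magnitude is the same either way.

```python
from math import comb

n_skus = 10_000

# Ordered pairs: (a, b) and (b, a) are counted separately.
ordered_pairs = n_skus * (n_skus - 1)

# Unordered pairs: each couple of SKUs is counted once.
unordered_pairs = comb(n_skus, 2)

print(f"ordered pairs:   {ordered_pairs:,}")
print(f"unordered pairs: {unordered_pairs:,}")
```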


tagged cloud computing, depth, forecasting, insights,

statistics, technology, width in forecasting, insights

Forecast's species: classification vs. regression

Tuesday, April 6, 2010 at 12:21PM

The word forecasting covers a very large spectrum of

processes, technologies and even markets. In the past, we

introduced the worlds of forecasting software, distinguishing

between:

Deterministic simulation software

Expert aggregation software

Statistical forecasting software

Lokad falls in the last category as our technology is purely

statistical. Yet, Lokad is far from covering the entire statistical

spectrum on its own. Two broad categories of forecasts exist in



statistical forecasting (*):

Classification forecasts

Regression forecasts

(*) We are oversimplifying here for the sake of clarity, as

statistical learning subtleties are well beyond the scope of

this modest blog post.

Classification attempts to separate (or classify) objects

according to their properties. The illustration below from

Tomasz Malisiewicz illustrates a classification task trying to

separate images picturing a chair from images picturing a

table.

Illustration from tombone's blog

The output of a classification is binary (or rather discrete):

objects get assigned to classes with more or less confidence,

i.e. higher or lower probabilities.

On the other hand, regressions typically output curves. The

illustration below is considering a time-series representing

historical sales, and displays the corresponding forecast.

The regression forecast is a curve rather than a binary (or

combination of binary) setting. Inputs get prolonged into the

future.

How does this distinction impact the business?

Well, it turns out that Lokad - as it stands in early 2010 - only

delivers regression forecasts. Thus, there are many interesting

problems that cannot be tackled by Lokad because these are

classification problems:

Customer segmentation: for each customer, we would like

to evaluate the probability of achieving successful up-sale

through a direct marketing action. Following the same

idea, we could try to predict the churn as well.



Fraud detection: for each transaction, we would like to

evaluate - based on the transaction pattern - the

probability for the operation to be a fraud attempt.

Deal prioritization: based on the properties of the

prospect (availability of budget, industry, contact rank in

the company, expressed level of interest, ...), we would

like to evaluate the likelihood to get a profitable deal out

of each prospect to prioritize the sales team efforts.

Frequently, we are asked whether Lokad could deliver

classification forecasts as well. Unfortunately, the answer will

be negative for the time being. Although rooted in the same

mathematical theory, classification and regression entail very

different technologies; and Lokad is pushing all its efforts

toward regression problems.

That said, we are not dismissive about classification

problems; they truly deserve attention and effort. For 2010,

we are sticking to our roadmap, but further ahead,

classification could be a natural extension of our forecasting

services.


tagged classification, forecasting, insights, regression,

software in business, forecasting, insights, market

Measuring forecast accuracy

Tuesday, February 23, 2010 at 09:32AM

Most engineers will tell you that:

You can't optimize what you

don't measure

Turns out that forecasting is no

exception. Measuring forecast

accuracy is one of the few

cornerstones of any forecasting

technology.

A frequent misconception about accuracy measurement is that

Lokad has to wait for the forecasts to become past, to finally

compare the forecasts with what really happened.

Although this approach works to some extent, it comes with

severe drawbacks:

It's painfully slow: a 6 months ahead forecast takes 6

months to be validated.

It's very sensitive to overfitting. Overfitting should not

be taken lightly, and it's one of the few things that are very

likely to wreak havoc in your accuracy measurements.

Measuring the accuracy of delivered forecasts is a tough piece

of work for us. Accuracy measurement accounts for roughly

half of the complexity of our forecasting technology: the more

advanced the forecasting technology, the greater the need for

robust accuracy measurements.

In particular, Lokad returns the forecast accuracy associated with

every single forecast that we deliver (for example, our

Excel-addin reports forecast accuracy). The metric used for

accuracy measurement is the MAPE (Mean Absolute Percentage

Error).
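
For readers unfamiliar with the metric, here is a minimal Python sketch of the textbook MAPE definition. Skipping periods with zero actual sales (where the percentage error is undefined) is an assumption made for this sketch, not a statement about how Lokad handles that case.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, expressed in percent.

    Periods where the actual value is zero are skipped, because the
    percentage error is undefined there (a choice made for this sketch).
    """
    pairs = [(a, f) for a, f in zip(actuals, forecasts) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all actual values are zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical example: errors of 10%, 10% and 0% on three periods.
example = mape([100, 200, 50], [110, 180, 50])
print("MAPE: %.2f%%" % example)
```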

In order to compute an estimated accuracy, Lokad proceeds

(roughly) through cross-validation tuned for time-series

forecasts. Cross-validation is simpler than it sounds. If we

consider a weekly forecast 10 weeks ahead with 3 years (roughly

150 weeks) of history, then the cross-validation looks like:

1. Take the 1st week, forecast 10 weeks ahead, and

compare results to original.

2. Take the first 2 weeks, forecast 10 weeks ahead, and

compare.

3. Take the first 3 weeks, forecast 10 weeks ahead, and

compare.

4. ...

The process is rather tedious, as we end up recomputing

forecasts about 150 times for only 3 years of history. Obviously,

cross-validation screams for automation, and there is little

hope to go through such a process without computer support.

Yet, computers typically cost less than business forecast errors,

and Lokad relies on cloud computing to deliver such

intensive computations.

Attempts to "simplify" the process outlined are very likely to

end up with overfitting problems. We suggest staying very

careful, as overfitting isn't a problem to be taken lightly. When in

doubt, stick to a complete cross-validation.
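
The cross-validation loop described in this post can be sketched in Python as follows. The forecasting model used here (the mean of the last 4 weeks) is only a placeholder, the weekly sales are synthetic, and the function names are ours. Only cut-off points that leave a full 10 weeks of actual sales are scored, so 150 weeks of history yield 140 evaluations.

```python
def mean_last_weeks(history, horizon, window=4):
    """Placeholder model: repeat the mean of the last `window` points."""
    recent = history[-window:]
    level = sum(recent) / len(recent)
    return [level] * horizon


def rolling_origin_mape(series, horizon, model):
    """Re-forecast from every cut-off and score each forecast with MAPE."""
    scores = []
    for cutoff in range(1, len(series) - horizon + 1):
        history = series[:cutoff]                  # data known at the cut-off
        actual = series[cutoff:cutoff + horizon]   # what really happened next
        predicted = model(history, horizon)
        ape = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
        if ape:
            scores.append(100.0 * sum(ape) / len(ape))
    return scores


# Synthetic weekly sales: 150 weeks with a small repeating pattern.
weekly_sales = [100 + 10 * (week % 4) for week in range(150)]

scores = rolling_origin_mape(weekly_sales, horizon=10, model=mean_last_weeks)
estimated_mape = sum(scores) / len(scores)

print(len(scores), "forecasts recomputed")
print("estimated MAPE: %.1f%%" % estimated_mape)
```

The model never sees the weeks it is scored against, which is what protects the accuracy estimate from overfitting.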

Joannes Vermorel

tagged accuracy, forecasting, measure in accuracy, forecasting,

insights

Internet is needed for your forecasts

Saturday, November 14, 2009 at 07:28PM

Do I really need an Internet

connection to get your

forecasts? is a question

frequently asked by prospects

having a look at our

forecasting technology.

Well, the answer is YES. With

Lokad, there is no

work-around. Our forecasting

engine does not come as an

on-premises solution.

But why should we need an internet connection for an

algorithmic processing such as forecasting?

The answer to this question is one of the core reasons that have

led to the very existence of Lokad in the first place.

When we started working on the Lokad project - back in 2006 -

we quickly realized that forecasting, despite appearances, was

a total misfit for local processing.

1. You can't get your forecasts right without having the data

at hand. Researchers have been looking for decades for a

universal forecasting model, but the consensus among the

community is that there is no free lunch; universal models do

not exist, or rather, they tend to perform poorly. This is the

primary reason why forecasting toolkits feature so many

models (don't click this link, it's a 3000-page manual for a

popular toolkit). With Lokad, the process is much simpler

because the data is made available to Lokad. Hence, it does

not matter any more if thousands of parameters are needed, as

parameters are handled by Lokad directly.

2. Advanced forecasting is quite resource intensive but the

need to forecast is only intermittent. Even a small retailer with

10 points of sale and 10k product references already represents

100k time-series to be forecasted. If we consider a typical

performance of 10k series per hour for a single CPU (which is

already quite optimistic for complex models), then computing

sales forecasts for the 10 points of sale takes a total of 10h of CPU

time. Obviously, retailers prefer not to wait for 10h to get their

forecasts. Buying an amazingly powerful workstation is

possible, but then does it make sense to have so much

processing power staying idle 99% of the time when forecasts

are made only once a week? Outsourcing the processing power



is the obvious cost-effective approach here.
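
The back-of-the-envelope computation above can be written down in a few lines of Python. The figures come from the text; the `wall_clock_hours` helper and the perfectly parallel split across 20 machines are hypothetical, added only to illustrate the benefit of on-demand processing power.

```python
points_of_sale = 10
product_references = 10_000
series_per_cpu_hour = 10_000   # throughput assumed in the text

# One time-series per product and per point of sale.
time_series = points_of_sale * product_references

# Total CPU time needed for one forecasting run.
cpu_hours = time_series / series_per_cpu_hour


def wall_clock_hours(total_cpu_hours, machines):
    """Elapsed time if the work splits evenly across machines (idealized)."""
    return total_cpu_hours / machines


print("time-series to forecast:", time_series)
print("CPU time for one run: %.0f hours" % cpu_hours)
print("elapsed on 20 rented machines: %.1f hours" % wall_clock_hours(cpu_hours, 20))
```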

3. Forecasting is still evolving at a fast pace. Since our

launch about 3 years ago, Lokad has been upgraded every

month or so. Our forecasting technology is not some

indisputable achievement carved in stone, but on the contrary,

is still undergoing a rapid evolution. Every month, the

statistical learning research community moves forward with

loads of fresh ideas. In such a context, on-premises solutions

undergo a rapid decay until the day the discrepancy between

the performance of current version and the performance of the

deployed version is so great that the company has no choice but

to rush an upgrade. Aggressively developed SaaS ensures that

customers benefit from the latest improvements without having

to even worry about it.

In our opinion, going for an on-premises solution for your

forecasts is like entering a golf competition with a large

handicap. It might make the game more interesting, but it does

not maximize your chances. Don't expect your competitors to be

fair enough to start with the same handicap just because you

do.


tagged business, forecasting, insight, technology in business,


