academia sinica jan-2015

Rob J Hyndman

Visualizing and forecasting

big time series data


Outline

1 Examples of big time series

2 Time series visualisation

3 BLUF: Best Linear Unbiased Forecasts

4 Application: Australian tourism

5 Application: Australian labour market

6 Fast computation tricks

7 hts package for R

8 References

Visualising and forecasting big time series data Examples of big time series 2

1. Australian tourism demand

Quarterly data on visitor nights from 1998:Q1 to 2013:Q4.
From: National Visitor Survey, based on annual interviews of 120,000 Australians aged 15+, collected by Tourism Research Australia.
Split by 7 states, 27 zones and 76 regions (a geographical hierarchy).
Also split by purpose of travel:
  Holiday
  Visiting friends and relatives (VFR)
  Business
  Other

304 bottom-level series

2. Labour market participation

Australia and New Zealand Standard Classification of Occupations (ANZSCO)
  8 major groups
  43 sub-major groups
  97 minor groups
  359 unit groups
  1023 occupations

Example: statistician
  2 Professionals
  22 Business, Human Resource and Marketing Professionals
  224 Information and Organisation Professionals
  2241 Actuaries, Mathematicians and Statisticians
  224113 Statistician


3. PBS sales

ATC drug classification
  A Alimentary tract and metabolism
  B Blood and blood forming organs
  C Cardiovascular system
  D Dermatologicals
  G Genito-urinary system and sex hormones
  H Systemic hormonal preparations, excluding sex hormones and insulins
  J Anti-infectives for systemic use
  L Antineoplastic and immunomodulating agents
  M Musculo-skeletal system
  N Nervous system
  P Antiparasitic products, insecticides and repellents
  R Respiratory system
  S Sensory organs
  V Various

Example: metformin
  A Alimentary tract and metabolism (14 classes)
  A10 Drugs used in diabetes (84 classes)
  A10B Blood glucose lowering drugs
  A10BA Biguanides
  A10BA02 Metformin

4. Spectacle sales

Monthly sales data from 2000 to 2014.
Provided by a large spectacle manufacturer.
Split by brand (26), gender (3), price range (6), materials (4), and stores (600).
About a million bottom-level series.

Hierarchical time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

[Figure: hierarchy tree with bottom-level series AA, AB, AC, BA, BB, BC, CA, CB, CC]

Examples
  Net labour turnover
  Pharmaceutical sales
  Tourism by state and region


Grouped time series

A grouped time series is a collection of time series that can be grouped together in a number of non-hierarchical ways.

Examples
  Tourism by state and purpose of travel
  Glasses by brand and store


2 Time series visualisation

Victorian tourism data

Kite diagrams

Line graph profile

Duplicate and flip around the horizontal axis

Fill the colour
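The three steps can be sketched in base R. This is only an illustration: `kite_polygon` is a hypothetical helper, the values are made up, and the plotting calls are one possible way to fill the shape.

```r
# Outline of a kite diagram for one series:
# the line profile, then its mirror image below the horizontal axis.
kite_polygon <- function(time, value) {
  data.frame(
    x = c(time, rev(time)),        # left to right, then back again
    y = c(value, -rev(value))      # profile, then the flipped duplicate
  )
}

time  <- 1:8
value <- c(3, 5, 4, 6, 2, 5, 4, 7)  # made-up quarterly values
kite  <- kite_polygon(time, value)

# Fill the colour: empty frame first, then the closed polygon.
if (interactive()) {
  plot(kite$x, kite$y, type = "n", xlab = "Time", ylab = "", yaxt = "n")
  polygon(kite$x, kite$y, col = "steelblue", border = NA)
}
```

Stacking one such polygon per region, each on its own baseline, gives a compact view of many series at once.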

Kite diagrams: Victorian tourism

[Figures: kite diagrams for Victoria; Victoria: scaled]

An STL decomposition

[Figure: STL decomposition of tourism demand for holidays in Peninsula; x-axis: time, 2000 to 2010]
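A minimal sketch of an STL decomposition using `stats::stl`. The series is simulated, not the Peninsula data, and `s.window = "periodic"` is an assumed setting.

```r
set.seed(1)
n <- 64                                         # 16 years of quarterly data
trend    <- seq(10, 14, length.out = n)
seasonal <- rep(c(2, -1, -2, 1), length.out = n)
x <- ts(trend + seasonal + rnorm(n, sd = 0.3),
        start = c(1998, 1), frequency = 4)

fit <- stl(x, s.window = "periodic")            # loess-based decomposition
components <- fit$time.series                   # seasonal, trend, remainder

# The three components add back up to the original series.
recomposed <- rowSums(components)
```

The remainder column of `components` is what the corrgram and feature slides work with.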

Seasonal stacked bar chart

Place positive values above the origin and negative values below it.
Map the bar length to the magnitude.
Encode quarters by colours.

Seasonal stacked bar chart: VIC

[Figure: seasonal stacked bar chart for Victorian regions BAA to BEG, panelled by purpose of travel; y-axis from −1.0 to 1.0; colour encodes quarter (Q1 to Q4)]

Corrgram of remainder

Compute the correlations among the remainder components.

Render both the sign and magnitude using a colour mapping of two hues.

Order variables according to the first principal component of the correlations.
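A rough sketch of these steps in base R. The remainder series are simulated and the hue names are placeholders; only the ordering rule follows the slide.

```r
set.seed(42)
# Hypothetical remainder components: 60 time points for 6 series.
common <- rnorm(60)
remainders <- cbind(
  s1 = common + rnorm(60, sd = 0.5),
  s2 = common + rnorm(60, sd = 0.5),
  s3 = -common + rnorm(60, sd = 0.5),
  s4 = rnorm(60),
  s5 = common + rnorm(60, sd = 0.5),
  s6 = -common + rnorm(60, sd = 0.5)
)

R <- cor(remainders)                        # correlations among remainders

# Order variables by their loading on the first principal component of R.
pc1 <- eigen(R, symmetric = TRUE)$vectors[, 1]
ord <- order(pc1)
R_ordered <- R[ord, ord]

# Two-hue mapping: the sign picks the hue, the magnitude the intensity.
hue       <- ifelse(R_ordered >= 0, "blue", "red")
intensity <- abs(R_ordered)
```

Passing `R_ordered` to any heat-map routine with this colour rule gives a corrgram-style display.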

Corrgram of remainder: VIC

[Figures: corrgrams of the remainder components for all Victorian region-by-purpose series, and for the Victorian holiday series only; colour scale from −1 to 1]

Corrgram of remainder: TAS

[Figure: corrgram of the remainder components for Tasmanian region-by-purpose series; colour scale from −1 to 1]

Principal components decomposition

[Figures: first three PCs over time (2000 to 2010); season plots of PC1, PC2 and PC3 (Jan to Dec); Loading 1 by state (NSW, VIC, QLD, SA, TAS, NT, WA) and by purpose (Hol, Vis, Bus, Oth)]

Feature analysis

Summarize each time series with a feature vector:
  strength of trend
  summer seasonality
  winter seasonality
  Box-Pierce statistic on remainder of STL
  lumpiness (variance of annual variances of remainder)

Do PCA on the feature matrix.
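A sketch of the idea with three of the features, on simulated quarterly series. `series_features` is a hypothetical helper, and the trend-strength formula is an assumed definition that is not given on the slide.

```r
set.seed(7)

# Feature vector for one quarterly series, based on its STL decomposition.
series_features <- function(x) {
  parts <- stl(x, s.window = "periodic")$time.series
  rem   <- parts[, "remainder"]
  trend <- parts[, "trend"]
  year  <- as.numeric(floor(time(rem)))
  annual_var <- tapply(as.numeric(rem), year, var)
  c(
    # Assumed definition of trend strength (not from the slides).
    trend      = max(0, 1 - var(rem) / var(trend + rem)),
    box_pierce = unname(Box.test(rem, lag = 4, type = "Box-Pierce")$statistic),
    lumpiness  = var(as.numeric(annual_var))
  )
}

# Twenty simulated quarterly series with increasing trend slopes.
sim_series <- lapply(1:20, function(i) {
  n <- 64
  ts(i / 10 * (1:n) / n + rep(c(1, 0, -1, 0), n / 4) + rnorm(n),
     start = c(1998, 1), frequency = 4)
})

features  <- t(sapply(sim_series, series_features))  # 20 x 3 feature matrix
pca       <- prcomp(features, scale. = TRUE)         # PCA on standardised features
explained <- pca$sdev^2 / sum(pca$sdev^2)            # variance share per PC
```

Plotting the first two columns of `pca$x`, coloured by state or purpose, gives the kind of display shown on the next slides.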

Feature analysis

[Figures: biplots of the feature PCA; x-axis: PC1 (39.1% explained var.); feature arrows include summer, winter, corr and lumpy; points coloured by purpose (Bus, Hol, Oth, Vis) and by state (NSW, NT, QLD, SA, TAS, VIC, WA)]

3 BLUF: Best Linear Unbiased Forecasts

Hierarchical/grouped time series

Forecasts should be “aggregate consistent”, unbiased, minimum variance.

Existing methods:
  Bottom-up
  Top-down
  Middle-out

How to compute forecast intervals?

Most research is concerned with the relative performance of existing methods.

Top-down method

Advantages
  Works well in presence of low counts.
  Single forecasting model is easy to build.
  Provides reliable forecasts for aggregate levels.

Disadvantages
  Loss of information, especially individual series dynamics.
  Distribution of forecasts to lower levels can be difficult.
  No prediction intervals.


Bottom-up method

Advantages
  No loss of information.
  Better captures dynamics of individual series.

Disadvantages
  Large number of series to be forecast.
  Constructing forecasting models is harder because of noisy data at bottom level.


The BLUF approach

Hyndman et al (CSDA 2011) proposed a new statistical framework for forecasting hierarchical time series which:

1 provides point forecasts that are consistent across the hierarchy;

2 allows for correlations and interaction between series at each level;

3 provides estimates of forecast uncertainty which are consistent across the hierarchy;

4 allows for ad hoc adjustments and inclusion of covariates at any level.


Hierarchical data

$Y_t$: observed aggregate of all series at time $t$.

$Y_{X,t}$: observation on series $X$ at time $t$.

$\mathbf{B}_t$: vector of all series at bottom level at time $t$.

$$
\mathbf{y}_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}]' =
\underbrace{\begin{bmatrix}
1 & 1 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}}_{\mathbf{S}}
\underbrace{\begin{bmatrix}
Y_{A,t} \\ Y_{B,t} \\ Y_{C,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$
\mathbf{y}_t = \mathbf{S}\mathbf{B}_t
$$

Hierarchical data

[Figure: hierarchy tree, Total at the top, then A, B, C, with bottom-level series AX AY AZ, BX BY BZ, CX CY CZ]

$$
\mathbf{y}_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{C,t}, Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t}]'
$$

$$
\mathbf{y}_t =
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{\mathbf{S}}
\underbrace{\begin{bmatrix}
Y_{AX,t} \\ Y_{AY,t} \\ Y_{AZ,t} \\ Y_{BX,t} \\ Y_{BY,t} \\ Y_{BZ,t} \\ Y_{CX,t} \\ Y_{CY,t} \\ Y_{CZ,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$
\mathbf{y}_t = \mathbf{S}\mathbf{B}_t
$$
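The summing matrix for this hierarchy can be built in a few lines of base R. This is a sketch for this particular two-level tree; the bottom-level values in `b` are made up.

```r
# Summing matrix for Total -> {A, B, C} -> three bottom-level series each.
n_groups  <- 3
per_group <- 3
n_bottom  <- n_groups * per_group

S <- rbind(
  rep(1, n_bottom),                                  # Total row
  kronecker(diag(n_groups), t(rep(1, per_group))),   # rows for A, B, C
  diag(n_bottom)                                     # bottom-level rows
)
rownames(S) <- c("Total", "A", "B", "C",
                 "AX", "AY", "AZ", "BX", "BY", "BZ", "CX", "CY", "CZ")
colnames(S) <- rownames(S)[5:13]

# Made-up bottom-level observations at one time point.
b <- c(AX = 1, AY = 2, AZ = 3, BX = 4, BY = 5, BZ = 6, CX = 7, CY = 8, CZ = 9)
y <- drop(S %*% b)    # all 13 series, aggregates included
```

Here `y["A"]` is the sum of AX, AY and AZ, and `y["Total"]` is the sum of all nine bottom-level values.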

Grouped data

  AX   AY   A
  BX   BY   B
  X    Y    Total

$$
\mathbf{y}_t = [Y_t, Y_{A,t}, Y_{B,t}, Y_{X,t}, Y_{Y,t}, Y_{AX,t}, Y_{AY,t}, Y_{BX,t}, Y_{BY,t}]'
$$

$$
\mathbf{y}_t =
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}}_{\mathbf{S}}
\underbrace{\begin{bmatrix}
Y_{AX,t} \\ Y_{AY,t} \\ Y_{BX,t} \\ Y_{BY,t}
\end{bmatrix}}_{\mathbf{B}_t}
$$

$$
\mathbf{y}_t = \mathbf{S}\mathbf{B}_t
$$
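For grouped data the same matrix can be generated from the grouping attributes of each bottom-level series. A sketch, where `indicator` is a hypothetical helper and the attribute names `row` and `col` are chosen for illustration:

```r
# Bottom-level series and their two crossed grouping attributes.
bottom <- data.frame(
  name = c("AX", "AY", "BX", "BY"),
  row  = c("A", "A", "B", "B"),
  col  = c("X", "Y", "X", "Y"),
  stringsAsFactors = FALSE
)

# One indicator row per group: 1 where the bottom series belongs to it.
indicator <- function(g) {
  groups <- unique(g)
  m <- t(sapply(groups, function(l) as.numeric(g == l)))
  rownames(m) <- groups
  m
}

S <- rbind(
  Total = rep(1, nrow(bottom)),
  indicator(bottom$row),     # A, B
  indicator(bottom$col),     # X, Y
  diag(nrow(bottom))         # bottom-level series themselves
)
colnames(S) <- bottom$name
rownames(S)[6:9] <- bottom$name
```

Adding a further grouping variable only adds another `indicator()` block; the bottom-level rows stay the same.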

Forecasting notation

Let $\hat{\mathbf{y}}_n(h)$ be the vector of initial $h$-step forecasts, made at time $n$, stacked in the same order as $\mathbf{y}_t$. (They may not add up.)

Hierarchical forecasting methods are of the form:

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h)
$$

for some matrix $\mathbf{P}$.

$\mathbf{P}$ extracts and combines base forecasts $\hat{\mathbf{y}}_n(h)$ to get bottom-level forecasts.
$\mathbf{S}$ adds them up.
Revised reconciled forecasts: $\tilde{\mathbf{y}}_n(h)$.

Bottom-up forecasts

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h)
$$

Bottom-up forecasts are obtained using

$$
\mathbf{P} = [\mathbf{0} \mid \mathbf{I}],
$$

where $\mathbf{0}$ is a null matrix and $\mathbf{I}$ is an identity matrix.

The $\mathbf{P}$ matrix extracts only the bottom-level forecasts from $\hat{\mathbf{y}}_n(h)$.

$\mathbf{S}$ adds them up to give the bottom-up forecasts.

Top-down forecasts

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h)
$$

Top-down forecasts are obtained using

$$
\mathbf{P} = [\mathbf{p} \mid \mathbf{0}]
$$

where $\mathbf{p} = [p_1, p_2, \ldots, p_{m_K}]'$ is a vector of proportions that sum to one.

$\mathbf{P}$ distributes forecasts of the aggregate to the lowest level series.

Different methods of top-down forecasting lead to different proportionality vectors $\mathbf{p}$.

General properties: bias

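$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h)
$$

Assume: base forecasts $\hat{\mathbf{y}}_n(h)$ are unbiased:

$$
\mathrm{E}[\hat{\mathbf{y}}_n(h) \mid \mathbf{y}_1, \ldots, \mathbf{y}_n] = \mathrm{E}[\mathbf{y}_{n+h} \mid \mathbf{y}_1, \ldots, \mathbf{y}_n]
$$

Let $\hat{\mathbf{B}}_n(h)$ be the bottom level base forecasts with $\boldsymbol{\beta}_n(h) = \mathrm{E}[\hat{\mathbf{B}}_n(h) \mid \mathbf{y}_1, \ldots, \mathbf{y}_n]$.
Then $\mathrm{E}[\hat{\mathbf{y}}_n(h)] = \mathbf{S}\boldsymbol{\beta}_n(h)$.
We want the revised forecasts to be unbiased: $\mathrm{E}[\tilde{\mathbf{y}}_n(h)] = \mathbf{S}\mathbf{P}\mathbf{S}\boldsymbol{\beta}_n(h) = \mathbf{S}\boldsymbol{\beta}_n(h)$.
The result will hold provided $\mathbf{S}\mathbf{P}\mathbf{S} = \mathbf{S}$.
True for bottom-up, but not for any top-down method or middle-out method.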
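The condition SPS = S is easy to check numerically. A sketch for the smallest hierarchy (Total, A, B, C); the proportions and base forecasts are illustrative values, and `preserves_unbiasedness` is a hypothetical helper name.

```r
# Small hierarchy: Total -> A, B, C.
S <- rbind(rep(1, 3), diag(3))

# Bottom-up: P = [0 | I] picks out the bottom-level base forecasts.
P_bu <- cbind(rep(0, 3), diag(3))

# Top-down: P = [p | 0] spreads the Total forecast using proportions p.
p    <- c(0.5, 0.3, 0.2)              # illustrative proportions summing to one
P_td <- cbind(p, matrix(0, 3, 3))

preserves_unbiasedness <- function(P) {
  isTRUE(all.equal(S %*% P %*% S, S, check.attributes = FALSE))
}

bu_ok <- preserves_unbiasedness(P_bu)  # bottom-up satisfies SPS = S
td_ok <- preserves_unbiasedness(P_td)  # top-down does not

# Both methods still give forecasts that add up, since they are S times something.
y_hat <- c(100, 40, 35, 30)            # base forecasts; bottom level sums to 105
y_bu  <- drop(S %*% P_bu %*% y_hat)
y_td  <- drop(S %*% P_td %*% y_hat)
```

Bottom-up ignores the Total forecast of 100, while top-down ignores the three bottom-level forecasts; the combination approach uses all four.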

General properties: variance

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h)
$$

Let the variance of the base forecasts $\hat{\mathbf{y}}_n(h)$ be given by

$$
\boldsymbol{\Sigma}_h = \mathrm{Var}[\hat{\mathbf{y}}_n(h) \mid \mathbf{y}_1, \ldots, \mathbf{y}_n]
$$

Then the variance of the revised forecasts is given by

$$
\mathrm{Var}[\tilde{\mathbf{y}}_n(h) \mid \mathbf{y}_1, \ldots, \mathbf{y}_n] = \mathbf{S}\mathbf{P}\boldsymbol{\Sigma}_h\mathbf{P}'\mathbf{S}'.
$$

This is a general result for all existing methods.

BLUF via trace minimization

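Theorem

For any $\mathbf{P}$ satisfying $\mathbf{S}\mathbf{P}\mathbf{S} = \mathbf{S}$,

$$
\min_{\mathbf{P}} \operatorname{trace}[\mathbf{S}\mathbf{P}\boldsymbol{\Sigma}_h\mathbf{P}'\mathbf{S}']
$$

has solution

$$
\mathbf{P} = (\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}.
$$

$\boldsymbol{\Sigma}_h^{\dagger}$ is a generalized inverse of $\boldsymbol{\Sigma}_h$.

Equivalent to the GLS estimate of the regression $\hat{\mathbf{y}}_n(h) = \mathbf{S}\boldsymbol{\beta}_n(h) + \boldsymbol{\varepsilon}_h$ where $\boldsymbol{\varepsilon}_h \sim N(\mathbf{0}, \boldsymbol{\Sigma}_h)$.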
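As a quick check, assuming $\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S}$ is invertible, the solution does satisfy the unbiasedness condition:

```latex
\mathbf{S}\mathbf{P}\mathbf{S}
  = \mathbf{S}\,(\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}
    \mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S}
  = \mathbf{S}\,(\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}
    (\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})
  = \mathbf{S}.
```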

Optimal combination forecasts

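$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\mathbf{P}\hat{\mathbf{y}}_n(h) = \mathbf{S}(\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\hat{\mathbf{y}}_n(h)
$$

Here $\tilde{\mathbf{y}}_n(h)$ are the revised forecasts and $\hat{\mathbf{y}}_n(h)$ are the initial (base) forecasts.

$$
\mathrm{Var}[\tilde{\mathbf{y}}_n(h) \mid \mathbf{y}_1, \ldots, \mathbf{y}_n] = \mathbf{S}(\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}\mathbf{S}'
$$

Problem: $\boldsymbol{\Sigma}_h$ is hard to estimate.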

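Solution 1: OLS

Assume $\boldsymbol{\varepsilon}_h \approx \mathbf{S}\boldsymbol{\varepsilon}_{B,h}$ where $\boldsymbol{\varepsilon}_{B,h}$ is the forecast error at bottom level.

Then $\boldsymbol{\Sigma}_h \approx \mathbf{S}\boldsymbol{\Omega}_h\mathbf{S}'$ where $\boldsymbol{\Omega}_h = \mathrm{Var}(\boldsymbol{\varepsilon}_{B,h})$.

If the Moore-Penrose generalized inverse is used, then

$$
(\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Sigma}_h^{\dagger} = (\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'.
$$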

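$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'\hat{\mathbf{y}}_n(h)
$$

Weights for the hierarchy Total, A, B, C:

$$
\mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}' =
\begin{bmatrix}
0.75 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.75 & -0.25 & -0.25 \\
0.25 & -0.25 & 0.75 & -0.25 \\
0.25 & -0.25 & -0.25 & 0.75
\end{bmatrix}
$$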
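These weights can be reproduced directly. A sketch in base R; `ols_weights` is a hypothetical helper and the base forecasts are made up.

```r
ols_weights <- function(S) S %*% solve(crossprod(S)) %*% t(S)   # S (S'S)^{-1} S'

# One-level hierarchy: Total -> A, B, C.
S_small <- rbind(rep(1, 3), diag(3))
W_small <- ols_weights(S_small)

# Two-level hierarchy: Total -> A, B, C -> three bottom-level series each.
S_big <- rbind(rep(1, 9), kronecker(diag(3), t(rep(1, 3))), diag(9))
W_big <- ols_weights(S_big)

# Applying the weights to base forecasts that do not add up makes them add up.
y_hat   <- c(100, 40, 35, 30)
y_tilde <- drop(W_small %*% y_hat)
```

The weights depend only on S, so they can be computed once and reused for every forecast horizon.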

[Figure: hierarchy tree with bottom-level series AA, AB, AC, BA, BB, BC, CA, CB, CC]

Weights:

$$
\mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}' =
\begin{bmatrix}
0.69 & 0.23 & 0.23 & 0.23 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 \\
0.23 & 0.58 & -0.17 & -0.17 & 0.19 & 0.19 & 0.19 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 \\
0.23 & -0.17 & 0.58 & -0.17 & -0.06 & -0.06 & -0.06 & 0.19 & 0.19 & 0.19 & -0.06 & -0.06 & -0.06 \\
0.23 & -0.17 & -0.17 & 0.58 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & -0.06 & 0.19 & 0.19 & 0.19 \\
0.08 & 0.19 & -0.06 & -0.06 & 0.73 & -0.27 & -0.27 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & 0.19 & -0.06 & -0.06 & -0.27 & 0.73 & -0.27 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & 0.19 & -0.06 & -0.06 & -0.27 & -0.27 & 0.73 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & 0.73 & -0.27 & -0.27 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & -0.27 & 0.73 & -0.27 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & 0.19 & -0.06 & -0.02 & -0.02 & -0.02 & -0.27 & -0.27 & 0.73 & -0.02 & -0.02 & -0.02 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & 0.73 & -0.27 & -0.27 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.27 & 0.73 & -0.27 \\
0.08 & -0.06 & -0.06 & 0.19 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.02 & -0.27 & -0.27 & 0.73
\end{bmatrix}
$$

Features

Covariates can be included in initial forecasts.

Adjustments can be made to initial forecasts at any level.

Very simple and flexible method. Can work with any hierarchical or grouped time series.

SPS = S, so reconciled forecasts are unbiased.

Conceptually easy to implement: OLS on base forecasts.

Weights are independent of the data and of the covariance structure of the hierarchy.


Challenges

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}(\mathbf{S}'\mathbf{S})^{-1}\mathbf{S}'\hat{\mathbf{y}}_n(h)
$$

Computational difficulties in big hierarchies due to size of the $\mathbf{S}$ matrix and singular behavior of $(\mathbf{S}'\mathbf{S})$.

Need to estimate covariance matrix to produce prediction intervals.

Ignores covariance matrix in computing point forecasts.

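$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}(\mathbf{S}'\boldsymbol{\Sigma}_1^{\dagger}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Sigma}_1^{\dagger}\hat{\mathbf{y}}_n(h)
$$

Solution 1: OLS
Approximate $\boldsymbol{\Sigma}_1^{\dagger}$ by $c\mathbf{I}$.

Solution 2: Rescaling
Suppose we approximate $\boldsymbol{\Sigma}_1$ by its diagonal.

Let $\boldsymbol{\Lambda} = [\operatorname{diagonal}(\hat{\boldsymbol{\Sigma}}_1)]^{-1}$ contain the inverse one-step forecast variances.

$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}(\mathbf{S}'\boldsymbol{\Lambda}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Lambda}\hat{\mathbf{y}}_n(h)
$$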

Optimal reconciled forecasts

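$$
\tilde{\mathbf{y}}_n(h) = \mathbf{S}\hat{\boldsymbol{\beta}}_n(h) = \mathbf{S}(\mathbf{S}'\boldsymbol{\Lambda}\mathbf{S})^{-1}\mathbf{S}'\boldsymbol{\Lambda}\hat{\mathbf{y}}_n(h)
$$

where $\hat{\mathbf{y}}_n(h)$ are the initial forecasts.

Easy to estimate, and places weight where we have best forecasts.
Ignores covariances.
For large numbers of time series, we need to do the calculation without explicitly forming $\mathbf{S}$ or $(\mathbf{S}'\boldsymbol{\Lambda}\mathbf{S})^{-1}$ or $\mathbf{S}'\boldsymbol{\Lambda}$.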
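A small worked example of the rescaled (WLS) reconciliation, forming the matrices explicitly, which is only sensible for a tiny hierarchy. The forecasts and variances are made-up values.

```r
S <- rbind(rep(1, 3), diag(3))           # Total -> A, B, C

y_hat  <- c(100, 40, 35, 30)             # base forecasts (bottom level sums to 105)
fc_var <- c(4, 1, 1, 25)                 # assumed one-step forecast variances
Lambda <- diag(1 / fc_var)               # inverse variances on the diagonal

# P = (S' Lambda S)^{-1} S' Lambda, computed with solve() rather than an explicit inverse.
P_wls   <- solve(t(S) %*% Lambda %*% S, t(S) %*% Lambda)
y_tilde <- drop(S %*% P_wls %*% y_hat)

# With equal variances the same formula reduces to the OLS solution.
P_equal <- solve(t(S) %*% diag(4) %*% S, t(S) %*% diag(4))
P_ols   <- solve(crossprod(S), t(S))
```

Series C has the largest forecast variance, so it absorbs most of the adjustment needed to make the forecasts add up.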

4 Application: Australian tourism

Australian tourism

Hierarchy:
  States (7)
  Zones (27)
  Regions (82)

Base forecasts: ETS (exponential smoothing) models

Base forecasts

[Figures: domestic tourism forecasts, x-axis 1998 to 2008, for Total, NSW, VIC, Nth.Coast.NSW, Metro.QLD, Sth.WA, X201.Melbourne, X402.Murraylands and X809.Daly]

Reconciled forecasts

[Figures: reconciled forecasts for selected series, x-axis 2000 to 2010]

Forecast evaluation

Select models using all observations;

Re-estimate models using the first 12 observations and generate 1- to 8-step-ahead forecasts;

Increase sample size one observation at a time, re-estimate models, generate forecasts until the end of the sample;

In total 24 1-step-ahead, 23 2-steps-ahead, down to 17 8-steps-ahead forecasts for evaluation.
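The forecast counts follow from simple bookkeeping over forecast origins. A sketch, where the series length of 36 is an inferred assumption (it is the length that reproduces the counts quoted above):

```r
n_obs     <- 36    # assumed series length, not stated on the slide
first_fit <- 12    # size of the first training sample
max_h     <- 8     # longest forecast horizon

# Every (origin, horizon) pair whose target observation lies inside the sample.
grid <- expand.grid(origin = first_fit:(n_obs - 1), h = 1:max_h)
grid <- grid[grid$origin + grid$h <= n_obs, ]

counts <- table(grid$h)   # number of forecasts available at each horizon
```

Each row of `grid` corresponds to one re-estimated model and one forecast to be compared with the held-out observation.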

Forecast evaluation

Hierarchy: states, zones, regions

MAPE                   h = 1   h = 2   h = 4   h = 6   h = 8   Average

Top Level: Australia
  Bottom-up             3.79    3.58    4.01    4.55    4.24     4.06
  OLS                   3.83    3.66    3.88    4.19    4.25     3.94
  Scaling (st. dev.)    3.68    3.56    3.97    4.57    4.25     4.04

Level: States
  Bottom-up            10.70   10.52   10.85   11.46   11.27    11.03
  OLS                  11.07   10.58   11.13   11.62   12.21    11.35
  Scaling (st. dev.)   10.44   10.17   10.47   10.97   10.98    10.67

Level: Zones
  Bottom-up            14.99   14.97   14.98   15.69   15.65    15.32
  OLS                  15.16   15.06   15.27   15.74   16.15    15.48
  Scaling (st. dev.)   14.63   14.62   14.68   15.17   15.25    14.94

Bottom Level: Regions
  Bottom-up            33.12   32.54   32.26   33.74   33.96    33.18
  OLS                  35.89   33.86   34.26   36.06   37.49    35.43
  Scaling (st. dev.)   31.68   31.22   31.08   32.41   32.77    31.89

5 Application: Australian labour market

ANZSCO

[Figure: ANZSCO classification, down to 1023 occupations]

Australian Labour Market data

1. Managers
2. Professionals
3. Technicians and trade workers
4. Community and personal services workers
5. Clerical and administrative workers
6. Sales workers
7. Machinery operators and drivers
8. Labourers

[Figure: labour market series, 1990 to 2010. Lower three panels show largest sub-groups at each level.]

[Figures: base forecasts and reconciled forecasts, 2010 to 2015. Base forecasts from auto.arima(). Largest changes shown for each level.]

Forecast evaluation (rolling origin)

RMSE          h = 1   h = 2   h = 3   h = 4   h = 5   h = 6   h = 7   h = 8   Average

Top level
  Bottom-up   74.71  102.02  121.70  131.17  147.08  157.12  169.60  178.93   135.29
  OLS         52.20   77.77  101.50  119.03  138.27  150.75  160.04  166.38   120.74
  WLS         61.77   86.32  107.26  119.33  137.01  146.88  156.71  162.38   122.21

Level 1
  Bottom-up   21.59   27.33   30.81   32.94   35.45   37.10   39.00   40.51    33.09
  OLS         21.89   28.55   32.74   35.58   38.82   41.24   43.34   45.49    35.96
  WLS         20.58   26.19   29.71   31.84   34.36   35.89   37.53   38.86    31.87

Level 2
  Bottom-up    8.78   10.72   11.79   12.42   13.13   13.61   14.14   14.65    12.40
  OLS          9.02   11.19   12.34   13.04   13.92   14.56   15.17   15.77    13.13
  WLS          8.58   10.48   11.54   12.15   12.88   13.36   13.87   14.36    12.15

Level 3
  Bottom-up    5.44    6.57    7.17    7.53    7.94    8.27    8.60    8.89     7.55
  OLS          5.55    6.78    7.42    7.81    8.29    8.68    9.04    9.37     7.87
  WLS          5.35    6.46    7.06    7.42    7.84    8.17    8.48    8.76     7.44

Bottom Level
  Bottom-up    2.35    2.79    3.02    3.15    3.29    3.42    3.54    3.65     3.15
  OLS          2.40    2.86    3.10    3.24    3.41    3.55    3.68    3.80     3.25
  WLS          2.34    2.77    2.99    3.12    3.27    3.40    3.52    3.63     3.13

6 Fast computation tricks

Fast computation: hierarchical data

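[Figure: hierarchy tree, Total at the top, then A, B, C, with bottom-level series AX AY AZ, BX BY BZ, CX CY CZ]

With the rows of $\mathbf{y}_t$ reordered so that each subtree (A, B, C) is contiguous:

$$
\mathbf{y}_t = [Y_t, Y_{A,t}, Y_{AX,t}, Y_{AY,t}, Y_{AZ,t}, Y_{B,t}, Y_{BX,t}, Y_{BY,t}, Y_{BZ,t}, Y_{C,t}, Y_{CX,t}, Y_{CY,t}, Y_{CZ,t}]'
$$

$$
\mathbf{y}_t =
\underbrace{\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}}_{\mathbf{S}}
\mathbf{B}_t
$$

$$
\mathbf{y}_t = \mathbf{S}\mathbf{B}_t
$$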

Fast computation: hierarchies

Think of the hierarchy as a tree of trees:

[Figure: root node with subtrees $T_1, T_2, \ldots, T_K$]

Then the summing matrix contains $K$ smaller summing matrices:

$$
\mathbf{S} =
\begin{bmatrix}
\mathbf{1}'_{n_1} & \mathbf{1}'_{n_2} & \cdots & \mathbf{1}'_{n_K} \\
\mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_K
\end{bmatrix}
$$

where $\mathbf{1}_n$ is an $n$-vector of ones and tree $T_i$ has $n_i$ terminal nodes.

$$
\mathbf{S}'\boldsymbol{\Lambda}\mathbf{S} =
\begin{bmatrix}
\mathbf{S}_1'\boldsymbol{\Lambda}_1\mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{S}_2'\boldsymbol{\Lambda}_2\mathbf{S}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_K'\boldsymbol{\Lambda}_K\mathbf{S}_K
\end{bmatrix}
+ \lambda_0 \mathbf{J}_n
$$

$\lambda_0$ is the top left element of $\boldsymbol{\Lambda}$;
$\boldsymbol{\Lambda}_k$ is a block of $\boldsymbol{\Lambda}$, corresponding to tree $T_k$;
$\mathbf{J}_n$ is a matrix of ones;
$n = \sum_k n_k$.

Now apply the Sherman-Morrison formula . . .

$$
(\mathbf{S}'\boldsymbol{\Lambda}\mathbf{S})^{-1} =
\begin{bmatrix}
(\mathbf{S}_1'\boldsymbol{\Lambda}_1\mathbf{S}_1)^{-1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & (\mathbf{S}_2'\boldsymbol{\Lambda}_2\mathbf{S}_2)^{-1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & (\mathbf{S}_K'\boldsymbol{\Lambda}_K\mathbf{S}_K)^{-1}
\end{bmatrix}
- c\,\mathbf{S}_0
$$

$\mathbf{S}_0$ can be partitioned into $K^2$ blocks, with the $(k, \ell)$ block (of dimension $n_k \times n_\ell$) being

$$
(\mathbf{S}_k'\boldsymbol{\Lambda}_k\mathbf{S}_k)^{-1}\,\mathbf{J}_{n_k,n_\ell}\,(\mathbf{S}_\ell'\boldsymbol{\Lambda}_\ell\mathbf{S}_\ell)^{-1}
$$

$\mathbf{J}_{n_k,n_\ell}$ is an $n_k \times n_\ell$ matrix of ones.

$$
c^{-1} = \lambda_0^{-1} + \sum_k \mathbf{1}'_{n_k}(\mathbf{S}_k'\boldsymbol{\Lambda}_k\mathbf{S}_k)^{-1}\mathbf{1}_{n_k}.
$$

Each $\mathbf{S}_k'\boldsymbol{\Lambda}_k\mathbf{S}_k$ can be inverted similarly.
$\mathbf{S}'\boldsymbol{\Lambda}\mathbf{y}$ can also be computed recursively.

The recursive calculations can be done in such a way that we never store any of the large matrices involved.
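The block formula can be verified numerically on a small example. This sketch forms the full matrices only to check the result (the point of the recursion is to avoid doing so); `block_diag` is a hypothetical helper and the weights are random.

```r
set.seed(3)

# Block-diagonal matrix from a list of blocks (base R only).
block_diag <- function(blocks) {
  out <- matrix(0, sum(sapply(blocks, nrow)), sum(sapply(blocks, ncol)))
  r <- 0; cc <- 0
  for (b in blocks) {
    out[r + seq_len(nrow(b)), cc + seq_len(ncol(b))] <- b
    r <- r + nrow(b); cc <- cc + ncol(b)
  }
  out
}

# K = 3 subtrees, each a one-level hierarchy with n_k terminal nodes.
n_k <- c(2, 3, 4)
S_k <- lapply(n_k, function(n) rbind(rep(1, n), diag(n)))
n   <- sum(n_k)
S   <- rbind(rep(1, n), block_diag(S_k))

# Diagonal weights: lambda_0 for the top row, then one block per subtree.
lambda0  <- runif(1, 0.5, 2)
Lambda_k <- lapply(S_k, function(s) diag(runif(nrow(s), 0.5, 2)))
Lambda   <- block_diag(c(list(matrix(lambda0)), Lambda_k))

# Invert each small block, then correct for the rank-one term lambda_0 * J_n.
A_inv <- block_diag(Map(function(s, l) solve(t(s) %*% l %*% s), S_k, Lambda_k))
ones  <- rep(1, n)
c_val <- 1 / (1 / lambda0 + drop(t(ones) %*% A_inv %*% ones))
fast  <- A_inv - c_val * (A_inv %*% tcrossprod(ones) %*% A_inv)

direct <- solve(t(S) %*% Lambda %*% S)
max_abs_diff <- max(abs(fast - direct))
```

Only the small per-subtree inverses are needed for `fast`; applying the same step inside each subtree gives the recursion.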

Fast computation

When the time series are not strictly hierarchical and have more than two grouping variables:

Use sparse matrix storage and arithmetic.

Use iterative approximation for inverting large sparse matrices.

Paige & Saunders (1982), ACM Trans. Math. Software

7 hts package for R

hts package for R

hts: Hierarchical and grouped time series
Methods for analysing and forecasting hierarchical and grouped time series

Version:     4.3
Depends:     forecast (≥ 5.0)
Imports:     SparseM, parallel, utils
Published:   2014-06-10
Author:      Rob J Hyndman, Earo Wang and Alan Lee
Maintainer:  Rob J Hyndman <Rob.Hyndman at monash.edu>
BugReports:  https://github.com/robjhyndman/hts/issues
License:     GPL (≥ 2)

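Example using R

library(hts)

# bts is a matrix containing the bottom level time series
# (5 columns here: 3 + 2 bottom-level series)
# nodes describes the hierarchical structure
y <- hts(bts, nodes=list(2, c(3,2)))

# Forecast 10-step-ahead using WLS combination method
# ETS used for each series by default
fc <- forecast(y, h=10)

[Figure: hierarchy with two nodes at the first level, having 3 and 2 bottom-level series respectively (e.g. AX, AY, AZ)]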

forecast.gts function

Usage

forecast(object, h,
  method = c("comb", "bu", "mo", "tdgsf", "tdgsa", "tdfp"),
  fmethod = c("ets", "rw", "arima"),
  weights = c("sd", "none", "nseries"),
  positive = FALSE,
  parallel = FALSE, num.cores = 2, ...)

Arguments

object      Hierarchical time series object of class gts.
h           Forecast horizon.
method      Method for distributing forecasts within the hierarchy.
fmethod     Forecasting method to use.
positive    If TRUE, forecasts are forced to be strictly positive.
weights     Weights used for "optimal combination" method. When weights = "sd", it takes account of the standard deviation of forecasts.
parallel    If TRUE, allow parallel processing.
num.cores   If parallel = TRUE, specify how many cores are going to be used.

8 References

References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). “Optimal combination forecasts for hierarchical time series”. Computational Statistics & Data Analysis 55(9), 2579–2589.

RJ Hyndman, AJ Lee, and E Wang (2014). Fast computation of reconciled forecasts for hierarchical and grouped time series. Working paper 17/14. Department of Econometrics & Business Statistics, Monash University.

RJ Hyndman, AJ Lee, and E Wang (2014). hts: Hierarchical and grouped time series. cran.r-project.org/package=hts.

RJ Hyndman and G Athanasopoulos (2014). Forecasting: principles and practice. OTexts. OTexts.org/fpp/.

Papers and R code: robjhyndman.com

Email: [email protected]
