Methodologies for an Industrial Production Index: a study by simulation
Daniel Mota, Instituto Nacional de Estatística, Portugal
www.ine.pt
Objectives
• The motivation is to assess methodologies for building an IPI, now that technologies and data collection have been greatly facilitated and improved. The main goal is to identify the best method (if there is one)
• Secondary goals are to investigate which methods reduce the amount of data collected and the response burden
• Finally, to analyse potential discrepancies arising from the use of “lagged” samples
Stylised facts
• In a dynamic sector, with more births than deaths, the sample index will underestimate the true index
• In a shrinking sector, the sample index will usually overestimate the true index when deaths are removed from the sample
• In a dynamic sector with both births and deaths but a stable production trend, the sample index will normally be consistent with the true index
• These assertions hold regardless of the method used to calculate the IPI
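The first three stylised facts can be reproduced with a deliberately tiny example. It is a simplification, assuming a single sector whose firms each produce 100 units with flat output and equal prices, so the index reduces to total output relative to the base period; all names and figures are hypothetical.

```python
def volume_index(current, base):
    """Aggregate volume index: total current output over total base output, x100."""
    return 100.0 * sum(current) / sum(base)


def universe_index(now, base):
    """'True' index: every unit alive in each period counts (births in, deaths out)."""
    return volume_index(now.values(), base.values())


def sample_index(now, base, sample):
    """Index over sampled base-period units still alive now (deaths dropped)."""
    kept = [unit for unit in sample if unit in now]
    return volume_index([now[u] for u in kept], [base[u] for u in kept])


# Hypothetical sector: 10 incumbent firms, 100 units each, flat output.
base = {f"firm{i}": 100.0 for i in range(10)}
sample = list(base)  # fixed sample drawn from the base-period universe
births = {f"birth{i}": 100.0 for i in range(3)}  # new firms, never in the sample
deaths = {"firm0", "firm1", "firm2"}

dynamic = {**base, **births}                                    # more births than deaths
shrinking = {u: q for u, q in base.items() if u not in deaths}  # deaths only
balanced = {**shrinking, **births}                              # births offset deaths

for name, now in [("dynamic", dynamic), ("shrinking", shrinking), ("balanced", balanced)]:
    print(name, universe_index(now, base), sample_index(now, base, sample))
# dynamic 130.0 100.0
# shrinking 70.0 100.0
# balanced 100.0 100.0
```

Because births never enter a fixed sample and deaths are simply dropped, the sample index stays at 100 in all three cases, so it only tracks the universe when births and deaths offset each other.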
Methodological issues
• The index is of Laspeyres type (all formulae in the paper)
• Fixed vs variable weights in sample indices: in this context, weights should always refer to the base year; hence weights should be fixed within a fixed sample and vary across re-samples
• However, this is not the end of the story for weights: under certain stringent circumstances, weights must be fixed. Also, working simultaneously with universe-based and sample-based indices leads to different kinds of weights coexisting within one type of index (more technical details in the paper)
• Fixed sample vs “rotating” samples (both approaches were tested)
• When using “rotating” samples, chaining becomes an issue: two methods to chain indices were tried out – chaining in December and chaining by yearly averages
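The Laspeyres aggregation and the two chaining options can be sketched as follows. This is a minimal illustration, not the paper's formulae: it assumes a volume index of the form 100·Σp0·qt / Σp0·q0, and it assumes that "chaining in December" means the usual one-month overlap while "chaining by yearly averages" means an annual overlap, where each new segment is linked to the chained series through the average of the previous year. Function names and data are hypothetical.

```python
from statistics import fmean


def laspeyres_volume(p0, q0, qt):
    """Laspeyres volume index: 100 * sum(p0 * qt) / sum(p0 * q0) over the units in p0."""
    numerator = sum(p0[u] * qt[u] for u in p0)
    denominator = sum(p0[u] * q0[u] for u in p0)
    return 100.0 * numerator / denominator


def chain(segments, overlap="december"):
    """Chain yearly index segments into one monthly series.

    segments[0] holds the 12 monthly values of the first year.
    Each later segment holds 24 monthly values computed with a new sample:
    the previous year (overlap) followed by the current year.
    overlap="december" links on the last month of the previous year;
    overlap="annual" links on the average of the previous year.
    """
    chained = list(segments[0])
    for segment in segments[1:]:
        if len(segment) != 24:
            raise ValueError("each linked segment needs 24 monthly values")
        previous_chained = chained[-12:]
        overlap_year, current_year = segment[:12], segment[12:]
        if overlap == "december":
            factor = previous_chained[-1] / overlap_year[-1]
        elif overlap == "annual":
            factor = fmean(previous_chained) / fmean(overlap_year)
        else:
            raise ValueError(f"unknown overlap method: {overlap!r}")
        chained.extend(factor * value for value in current_year)
    return chained


print(laspeyres_volume({"a": 2.0, "b": 1.0}, {"a": 10.0, "b": 20.0}, {"a": 12.0, "b": 20.0}))  # 110.0

# Hypothetical data: the new sample gives a different level for the overlap year.
first_year = [100.0 + m for m in range(12)]  # 100 ... 111
second_segment = [50.0 + m for m in range(12)] + [62.0 + m for m in range(12)]

by_december = chain([first_year, second_segment], overlap="december")
by_average = chain([first_year, second_segment], overlap="annual")
print(round(by_december[12], 2), round(by_average[12], 2))  # 112.82 117.86
```

If the new sample reproduced the previous year exactly, both link factors would be 1 and the two methods would agree; they diverge when the new sample's profile over the overlap year differs from the chained series.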
Simulation methodology
• Universe data are generated: 8 sectors with distinct behaviours, comprising circa 50000 firms/products in total, over 7 years
• Quantities were generated by a mild stochastic growth rule linking y_t to y_{t-1} (formula in the paper)
• Prices follow an almost-AR process linking p_t to p_{t-1}, with ρ drawn from a normal distribution with mean 1,2 and variance 0,4 (formula in the paper)
• Weights are relatively stable for most sectors, with one clear shift in importance between two sectors: sector 1 falls from 31% to 26% of the weight while sector 2 rises from 19% to 27%
• Two approaches: immediate access to perfect information vs a real-life environment
Methods
• Fixed samples and “rotating” samples
• Samples are chosen either by decreasing turnover (covering 85%, 50% and 35% of total turnover) or at random (20% of the universe)
• With yearly samples, chaining is done either by yearly growth averages or, in the usual way, in December
• This gives 14 distinct methods to test
• Symbols:
• 85, 50, 35 and ale (short for random)
• For fixed samples – f and v stand for fixed and variable weights
• For “rotating” samples – d and a stand for chaining in December or by yearly averages
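The two sampling schemes can be sketched as follows. This is an assumption-laden illustration: the cut-off sample takes units in decreasing order of turnover until their cumulative share reaches the target coverage (85%, 50% or 35%), and the random sample is a simple random sample without replacement of 20% of the units; the paper may treat ties, strata or rounding differently. Function names and data are hypothetical.

```python
import random


def cutoff_sample(turnover, coverage):
    """Units in decreasing turnover order until their cumulative share reaches `coverage`."""
    total = sum(turnover.values())
    chosen, covered = [], 0.0
    for unit, value in sorted(turnover.items(), key=lambda item: item[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(unit)
        covered += value
    return chosen


def random_sample(units, fraction=0.20, seed=None):
    """Simple random sample without replacement of `fraction` of the units."""
    rng = random.Random(seed)
    units = list(units)
    return rng.sample(units, round(fraction * len(units)))


# Hypothetical universe: turnover by firm (total 100).
turnover = {"A": 40.0, "B": 20.0, "C": 12.0, "D": 8.0, "E": 6.0,
            "F": 5.0, "G": 4.0, "H": 3.0, "I": 1.0, "J": 1.0}

print(cutoff_sample(turnover, 0.85))  # ['A', 'B', 'C', 'D', 'E']
print(cutoff_sample(turnover, 0.50))  # ['A', 'B']
print(cutoff_sample(turnover, 0.35))  # ['A']
print(len(random_sample(turnover, seed=1)))  # 2
```

Because the loop stops at the first unit that crosses the target, the realised coverage can be slightly above it (86% instead of 85% here).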
Methods (table)
Sample        | Fixed sample (constant for 5 years)          | Rotating sample
85            | Fixed weights (85f); variable weights (85v)  | Chaining in December (85d); yearly chaining (85a)
50            | Fixed weights (50f); variable weights (50v)  | Chaining in December (50d); yearly chaining (50a)
35            | Fixed weights (35f); variable weights (35v)  | Chaining in December (35d); yearly chaining (35a)
ale (random)  | (none)                                       | Chaining in December (ale20d); yearly chaining (ale20a)
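The count of 14 methods follows from the table: each of the three turnover cut-offs is combined with four variants, and the random sample only with the two chaining options. A small check, using the method codes of the results tables (the constant names are illustrative):

```python
from itertools import product

CUTOFFS = ("85", "50", "35")  # turnover coverage of the cut-off samples (%)
FIXED_SAMPLE = ("f", "v")     # fixed sample: fixed or variable weights
ROTATING = ("d", "a")         # rotating sample: chaining in December or by yearly averages

methods = [cut + variant for cut, variant in product(CUTOFFS, FIXED_SAMPLE + ROTATING)]
methods += ["ale20" + variant for variant in ROTATING]  # random 20% sample, rotating only

print(len(methods))  # 14
print(methods)
```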
Results (contemporaneous samples)
Chart: sector 2 indices, Jan-00 to Jan-04 (series ind2v, 2_85, 2_85v, 2_85rvd, 2_85rva, 2_ale20d, 2_ale20a)
Results (contemporaneous samples)
Chart: sector 3 indices, Jan-00 to Jan-04 (series ind3v, 3_85, 3_85v, 3_85rvd, 3_85rva, 3_ale20d, 3_ale20a)
Results (contemporaneous samples)
Chart: indices, Jan-00 to Jan-04 (series indgv, 85f, 85v, 85rvd, 85rva, ale20d, ale20a)
Results (lagging samples)
Chart: sector 2 indices, Jan-00 to Jan-04 (series ind2v, 2_85, 2_85v, 2_85rvd, 2_85rva, 2_ale20d, 2_ale20a)
Results (lagging samples)
Chart: sector 3 indices, Jan-00 to Jan-04 (series ind3v, 3_85, 3_85v, 3_85rvd, 3_85rva, 3_ale20d, 3_ale20a)
Results (lagging samples)
Chart: indices, Jan-00 to Jan-04 (series indgv, 85f, 85v, 85rvd, 85rva, ale20d, ale20a)
Results (synthesis)
Root mean squared error of indices (contemporaneous sample)
Method        | 85f    | 50f    | 35f    | 85v    | 50v    | 35v    | 85d    | 50d    | 35d    | ale20d | 85a    | 50a    | 35a    | ale20a
Indices       | 3,59   | 2,13   | 1,56   | 2,89   | 1,35   | 0,91   | 1,40   | 2,57   | 3,42   | 0,27   | 3,30   | 4,79   | 5,70   | 0,97
YoY rates     | 1,65   | 1,32   | 1,24   | 1,28   | 0,92   | 0,88   | 0,48   | 0,89   | 1,18   | 0,17   | 1,14   | 1,65   | 1,94   | 0,40
Ratio to best | 9,54   | 7,63   | 7,18   | 7,41   | 5,33   | 5,07   | 2,77   | 5,14   | 6,81   | 1,00   | 6,57   | 9,53   | 11,22  | 2,32
Root mean squared error of indices (lagging sample)
Method        | 85f    | 50f    | 35f    | 85v    | 50v    | 35v    | 85d    | 50d    | 35d    | ale20d | 85a    | 50a    | 35a    | ale20a
Indices       | 5,07   | 4,64   | 4,28   | 5,05   | 4,64   | 4,31   | 4,11   | 2,80   | 2,21   | 4,81   | 3,32   | 1,78   | 1,09   | 4,18
YoY rates     | 1,97   | 1,83   | 1,72   | 1,94   | 1,81   | 1,71   | 1,56   | 1,11   | 0,91   | 1,79   | 1,28   | 0,82   | 0,66   | 1,55
Ratio to best | 3,01   | 2,80   | 2,63   | 2,97   | 2,76   | 2,60   | 2,38   | 1,69   | 1,40   | 2,74   | 1,96   | 1,26   | 1,00   | 2,37
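The synthesis measures can be sketched as follows, assuming the RMSE is taken over the monthly gap between a sample-based series and the universe series, and that year-on-year rates are 100·(I_t / I_{t-12} − 1) on monthly data. Judging from the figures, the "ratio to best" row appears to divide each method's yoy-rate RMSE by the smallest one; recomputing it from the rounded two-decimal values lands within a few hundredths of the published ratios (for instance 2,98 rather than 3,01 for 85f in the lagging case). Function names are hypothetical.

```python
from math import sqrt


def rmse(estimate, truth):
    """Root mean squared error between a sample-based series and the universe series."""
    if len(estimate) != len(truth) or not truth:
        raise ValueError("series must be non-empty and of equal length")
    return sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


def yoy_rates(index):
    """Year-on-year rates (%) of a monthly index: 100 * (I[t] / I[t-12] - 1)."""
    return [100.0 * (index[t] / index[t - 12] - 1.0) for t in range(12, len(index))]


def ratio_to_best(errors):
    """Each method's error divided by the smallest error across methods."""
    best = min(errors.values())
    return {method: error / best for method, error in errors.items()}


# YoY-rate RMSEs for the lagging sample, copied from the synthesis table.
lagging_yoy = {
    "85f": 1.97, "50f": 1.83, "35f": 1.72, "85v": 1.94, "50v": 1.81, "35v": 1.71,
    "85d": 1.56, "50d": 1.11, "35d": 0.91, "ale20d": 1.79,
    "85a": 1.28, "50a": 0.82, "35a": 0.66, "ale20a": 1.55,
}
for method, ratio in ratio_to_best(lagging_yoy).items():
    print(f"{method}: {ratio:.2f}")
```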
Solutions
• The universe is usually known at y + ½, and the sample for y + 2 is then based on the universe of y (y being the base year)
• The reference census must be made available sooner; powerful technological tools now exist to help with this
• Ratios of births to deaths may be used to correct the representativeness of the current sample
• A possible solution (not yet fully studied in a real-life context) would be to publish “provisional” indices, updated with the previous year’s growth ratio, combined with the re-sampling described above
• The problems set out at the beginning become more severe the more dynamic the industry is. In relatively stable environments the problems are mild, and choosing the best method found with contemporaneous samples seems safe
Solutions (2)
• A more “straightforward” solution is to use a bundle of methods, e.g.:
• For dynamic sectors (many births, regardless of deaths), choose 35a
• For stable sectors (few births or deaths), choose 85f
• For shrinking sectors (many deaths), opt for 35f
• This solution depends on classifying the behaviour of each sector, which is usually known only with a lag, although sector behaviour tends to be somewhat predictable
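The bundle rule above can be written as a small decision function. This is only a sketch: the slides give no numeric thresholds, so the 5% cut-off for "many" births or deaths (as a yearly share of units) is a hypothetical parameter, and in practice the rates would only be known with a lag.

```python
def choose_method(birth_rate, death_rate, threshold=0.05):
    """Pick a method code from a sector's yearly birth and death rates.

    `threshold` is a hypothetical cut-off for "many" births or deaths.
    """
    if birth_rate >= threshold:   # dynamic: many births, whatever the deaths
        return "35a"
    if death_rate >= threshold:   # shrinking: many deaths, few births
        return "35f"
    return "85f"                  # stable: few births and few deaths


print(choose_method(0.10, 0.02))  # 35a
print(choose_method(0.01, 0.08))  # 35f
print(choose_method(0.01, 0.01))  # 85f
```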
Conclusion
• The simulation is based on a large universe with real-life characteristics and was repeated a number of times; the results and conclusions presented are therefore robust (to these factors)
• In a dynamic world, lagging samples are the biggest problem for statisticians
• In a dynamic environment there is no clear best method, but there are ways to mitigate the problems inherent to that condition
• In a relatively stable environment, the best method is clearly the one based on yearly samples chained in December; the second best is the same method with yearly chaining
• In a rigid environment, a fixed sample is the best method, whatever the sub-method (either turnover or random)