ESSNET
USE OF ADMINISTRATIVE AND ACCOUNTS DATA IN
BUSINESS STATISTICS
WP4
TIMELINESS OF ADMINISTRATIVE SOURCES FOR MONTHLY AND
QUARTERLY ESTIMATES
DELIVERABLE 4.2 – SGA-2010
PRACTICES FOR USING VAT TURNOVER DATA WITHIN THE UK TO
PRODUCE ESTIMATES OF GROWTH AND MONTHLY TURNOVER
Craig Orchard a, Kevin Moore, Ann Langford
a Corresponding author: Methodologist, Methodology Directorate
Cardiff Road, Newport, South Wales
NP10 8XG Office for National Statistics
Telephone (044) 1633 455755
Summary
This report forms the UK's contribution for SGA 2010 to establishing best practice in
the use of administratively sourced data across Europe, specifically ESSnet Work
Package (WP) 4 'The Timeliness of Administrative Sources for Monthly and
Quarterly Estimates'. It gives details of the current national practices for the use of
administrative data in UK short-term business statistics; the work that the UK has
been carrying out in SGA 2010 to improve best practice in the use of Value Added
Tax (VAT) turnover data (to reduce or replace data collected through
its current short-term business surveys); and how the UK intends to build upon its
current knowledge, and that shared within the work package, to develop European
best practice and apply it to the UK.
At the moment, UK short-term business estimates are produced using data from
monthly surveys with a combined sample size of circa 40,000. These surveys
incorporate annualised VAT data in the estimation process for calibration purposes,
which helps to reduce the overall sample size compared with expansion estimators. Since this
methodology was established, the Office for National Statistics (ONS) has gained
improved access to returned VAT turnover, allowing the possibility of replacing some
or all survey data with this source. As in the majority of other European countries,
two main problems exist with the monthly VAT datastream. The first is that not all
data are available in time to produce an initial estimate within 30 days of the end of
the period (as directed by EU regulations). The second is that VAT data are reported
monthly, quarterly, or yearly, so that some data are unavailable because of the
periodicity of reporting. Both of these are aspects of timeliness.
Under SGA 2010, ONS agreed to address the issue of timeliness by investigating the
use of interpolation and extrapolation to align the VAT data stream with the survey
period. This has resulted in a time series based method to address timeliness where
little or no VAT data are available to produce first (or subsequent) estimates. In
addition, ONS has also compared survey-based and VAT-based turnover estimates
within the UK. This work concluded that the best practice of using VAT turnover data
within the UK would be to retain a survey of the largest enterprises and use VAT
turnover, possibly in conjunction with a smaller survey, for medium and smaller
enterprises.
The UK will now seek to implement this model in SGA 2011/12/13, ensuring that the
methodology is shared across the work package, and that the approaches and best
practices of other European countries, such as macro-imputation (DESTATIS) and
modelling (Statistics Netherlands), are effectively tested and incorporated where
possible.
Keywords: expansion estimator, extrapolation, interpolation, Survey-based turnover,
VAT-based turnover, UK best practice
1. Introduction
The UK currently produces short term turnover estimates by stratified simple random
sampling where employment is used as the stratifying variable, together with NACE
classification. Annual turnover derived mainly from VAT is used as an auxiliary
variable in the survey estimation procedure (i.e. ratio estimation). At the moment, these
turnover data are annualised values held on the UK's Inter-Departmental Business
Register (IDBR). No use is made specifically of monthly administrative data returns.
As part of the UK's contribution to the ESSnet Work Package on the timeliness of
administrative data for short-term business statistics (WP4), the UK has been
investigating the use of VAT turnover information in the context of timeliness to
supplement or replace survey estimates. Timeliness, as defined in WP4, has been
divided into the ability of a National Statistics Institute (NSI) to produce t+30 day
estimates from administrative data in time for its Eurostat obligations; and the effect
of periodicity in administrative data due to quarterly reporting in producing timely
monthly estimates when monthly data are unavailable.
To investigate timeliness of the VAT turnover for use in short term business statistics,
the UK has focused on two main areas, corresponding to the two internal reports that
have been submitted as part of the ESSnet project. Specifically, the two projects
examined:
1. The use of interpolation and extrapolation to counter the effect of incomplete data
(both due to data not yet being received and the reporting of data in a quarterly
datastream).
2. A comparison of survey-based and VAT-based turnover estimates within the UK.
These investigations can be related to box 2 (nowcasting) and box 3b (modelling
VAT) of the hierarchical tree presented in WP4's current SGA agreement; the scenario
where administrative data do not cover the enterprise population when the STS
estimates have to be made.
The remainder of this report gives an overview of UK practices on the use of
administrative data in Short-term Business Statistics (Section 2) and outlines the two
projects undertaken and their results (Section 3). The report concludes with a
comparison with other countries and a description of suggested future work.
2. Description of the National Practice
2.1. Current Methods of Estimating Turnover in Short-Term Statistics
At present in the UK two surveys are used to estimate turnover on a monthly basis,
each covering a different part of the economy:
Monthly Business Survey (MBS) – This covers manufacturing and other
production industries, as well as covering the service sector industries (not
retail). As such, it contributes to the monthly Index of Production (IoP),
the monthly Index of Services (IoS), and quarterly Gross Domestic
Product (GDP; Output approach). The sample size of MBS is
approximately 33000. MBS was introduced in 2010, and is a combination
of two earlier surveys, MPI (Monthly Production Inquiry) and MIDSS
(Monthly Inquiries into Distribution and Services Sector). These two
surveys have been analysed separately in this report.
Retail Sales Inquiry (RSI) - covers only the retail sector and forms the
monthly Retail Sales Index. It contributes to the monthly Index of Services
and quarterly GDP(O). The sample size of RSI is approximately 5000.
All the surveys collect turnover information, though the definition of turnover varies a
little from industry-to-industry. For example, for travel agents, 'sales on own account'
and 'commission' are collected; in RSI, 'retail sales' is collected instead of 'total
turnover'.
MBS collects data for a calendar month ('January 1st-31st', etc.) and the main survey
outputs are estimates of total monthly turnover. For RSI, the data collection period is
based on a set number of weeks (following a 4-4-5 week pattern), and the main output
is total retail sales for an average week in the collection period.
Stratification and sampling – The sampling frame for all the surveys is based on the
IDBR, the Inter-Departmental Business Register. This Register is held by ONS and
updated by administrative data from HMRC, PAYE and annual surveys. Both MBS
and RSI are stratified by a cross-classification of SIC and employment size band.
SICs are grouped together in various ways (not necessarily following the
hierarchical structure of the SIC), and not all SICs within, for example,
manufacturing, fall in-scope of the surveys (sometimes external sources of
data are used in the IoP and IoS). Broadly, the level of industrial
stratification in manufacturing and retail is more detailed than in services.
Frozen SIC (see footnote 1) is used for stratification.
The frozen employment of a business (reporting unit) on the IDBR is used
to assign it to a size-band. In general, four size bands are used within each
SIC group, although different sets of size bands can be used in different
SIC groups. There are about eight different sets of size bands in use across
the surveys. Businesses in the largest size band are completely
enumerated; those in all other sizebands are sampled.
A fifth band is also created in MIDSS and RSI, comprising businesses
with employment between 10 and the cut-off of band 4 (i.e. medium-sized
businesses) that have IDBR frozen turnover in excess of £40 m - these
businesses are also completely enumerated.
Rotational sampling (using Permanent Random Numbers) is employed in the sampled
strata. For businesses in these strata, the expected time in the sample is either 15 or 27
months.
Imputation – A complete data set is constructed to cover all the sampled RUs
(excluding any deaths, out-of-scope, etc.) by imputing for non-response. For non-
responding businesses that were in the sample in the previous month, an imputed
value for the current month is calculated by applying the average growth (of
responders in the same imputation class) to the RU's value for last month; for new-to-
sample businesses, a value is constructed in a similar way, but using the relationship
with IDBR data instead.
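The growth-based imputation for businesses that were in the sample last month can be illustrated with a minimal Python sketch. It assumes that the "average growth" is the ratio of summed current to summed previous turnover over units that responded in both months; the report does not state whether a ratio of sums or a mean of unit-level ratios is used, and the function name and data are hypothetical.

```python
def impute_from_growth(previous, current, nonresponders):
    """Impute current-month turnover for non-responders in one imputation class.

    previous, current: dicts mapping reporting-unit id -> turnover.
    nonresponders: ids present in `previous` but missing from `current`.
    Assumption: average growth = sum(current) / sum(previous) over units
    that responded in both months (a ratio of sums).
    """
    matched = [ru for ru in current if ru in previous]
    prev_total = sum(previous[ru] for ru in matched)
    if prev_total == 0:
        raise ValueError("no matched previous turnover to compute growth")
    growth = sum(current[ru] for ru in matched) / prev_total
    # Apply the class-level growth to each non-responder's previous value.
    return {ru: previous[ru] * growth for ru in nonresponders}


# Hypothetical imputation class: unit C did not respond this month.
previous = {"A": 100.0, "B": 200.0, "C": 50.0}
current = {"A": 110.0, "B": 220.0}
imputed = impute_from_growth(previous, current, ["C"])
```

Here the matched responders grew by 10%, so unit C's previous value of 50 is carried forward with the same growth.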
1 Two versions of SIC, employment and turnover are available on the IDBR: frozen and current. The current fields
may be updated during the year, whereas the frozen fields take on the value of the current field at the end of the
year (i.e. late December) and then remain fixed throughout the year. An exception is frozen SIC which will be
updated during the year if an RU's SIC code changes at the 2-digit level. The frozen fields are used to stratify the
short-term surveys to give greater stability.
Outlier Detection – outliers are detected and treated using one-sided Winsorisation in
MBS. One-sided Winsorisation reduces outliers with high values to below a given
threshold. This reduces the effect of outliers that would otherwise have a significant
impact on survey estimates. For RSI an alternative method is applied.
Estimation – Estimation is carried out using calibration estimation (here, ratio
estimation). In some SICs, strata form the calibration groups (separate ratio
estimation), whereas in other SICs the sampled size bands taken together (i.e. bands
1-3 combined) form the calibration groups (combined ratio estimation). In all cases,
the largest size band is kept separate and, since it is completely enumerated and
imputation is used, no weighting is applied.
The formula for deriving total turnover at the stratum level in sampled size bands can
be represented by:

$$\hat{Y}_h = \sum_{i \in s_h} a_h \, g_h \, y_i$$

where $a_h$ is the design weight from the expansion (Horvitz-Thompson) estimator
component within the stratified population, $g_h$ is the ratio estimator component using
an auxiliary variable, and $y_i$ is returned turnover for business $i$. The formulae used for
deriving a- and g-weights for a separate ratio estimator are given below:

$$a_h = \frac{N_h}{n_h}$$

$$g_h = \frac{\sum_{i \in h} x_i}{\dfrac{N_h}{n_h} \sum_{i \in s_h} x_i}$$

where $N_h$ and $n_h$ are the population and sample sizes in stratum $h$, $s_h$ is the
sample in stratum $h$, and $x$ is the auxiliary variable chosen in order to calibrate the estimate.
For UK STS turnover estimates the auxiliary variable is the annualised turnover held
on the IDBR (Pring P, 2008).
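As a worked illustration of the a- and g-weights for a separate ratio estimator, the following Python sketch computes the stratum total for a single stratum. The function name and the figures are hypothetical; it assumes the auxiliary total for the stratum population (annualised IDBR turnover) is known.

```python
def separate_ratio_estimate(N_h, sample_y, sample_x, population_x_total):
    """Return (a_h, g_h, estimated stratum total) for one stratum.

    N_h: number of businesses in the stratum population.
    sample_y: returned turnover for the sampled businesses.
    sample_x: auxiliary variable (annualised turnover) for the same businesses.
    population_x_total: auxiliary total over the whole stratum population.
    """
    n_h = len(sample_y)
    a_h = N_h / n_h                                    # expansion (design) weight
    g_h = population_x_total / (a_h * sum(sample_x))   # ratio (calibration) weight
    total = sum(a_h * g_h * y for y in sample_y)
    return a_h, g_h, total


# Hypothetical stratum: 10 businesses, 2 sampled.
a_h, g_h, y_hat = separate_ratio_estimate(
    N_h=10,
    sample_y=[12.0, 18.0],
    sample_x=[100.0, 150.0],
    population_x_total=1000.0,
)
```

With these figures the expansion estimate of the auxiliary total (5 × 250 = 1250) overshoots the known total of 1000, so the g-weight of 0.8 scales the turnover estimate down accordingly.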
Construction of short-term output indices – The estimates of turnover from MPI,
MIDSS (now MBS) and RSI are then used in the construction of the Index of
Production, Index of Services and Retail Sales Index (and, ultimately, the estimate of
GDP using the Output approach). The processes involved in constructing these series
vary somewhat, but the following are usually applied in construction of the final,
published series:
deflation (division of the turnover series by price relatives or other deflators)
seasonal adjustment
aggregation using 'gross value added' (GVA) weights
annual chain-linking
2.2. Challenges and Opportunities of VAT-Sourced Turnover
This section outlines the two main problems with the timeliness of VAT data in the
UK – namely the time after the end of a period when useable quality data is received,
and the effect of monthly and quarterly “staggered” returns.
The ONS currently receives several datasets from HM Revenue & Customs (HMRC)
under Section 91 of the Value Added Tax Act 1994. The primary use of these datasets
is to keep the IDBR up-to-date. Those datasets that are received daily, twice-monthly
and biennially contain no actual turnover data but are instead concerned with
maintaining the IDBR. They provide information on business births/deaths and on
changes in details such as postal addresses and contact names.
There are two types of datasets supplied by HMRC containing VAT turnover
information, one received monthly, and one received quarterly. The monthly dataset
contains raw turnover data, as it appears on the VAT forms that businesses return to
HMRC. The quarterly dataset contains estimated annual turnover figures for a rolling
twelve-month period (e.g. January 2008 to December 2008 inclusive) based on returns
for the previous four quarters. In addition to the VAT turnover variable, the datasets
are supplied with VAT reference numbers (unique business identifiers), the VAT
period (the month in which the VAT returns were due) and stagger. The stagger
denotes the month in which businesses are expected to submit their VAT returns.
Different businesses are required to submit returns with different frequencies; some
monthly, some quarterly, and some yearly. Approximately 90% of businesses submit
returns on a quarterly basis, 10% on a monthly basis, and a very small amount on a
yearly basis. Businesses are required to register for VAT when the value of their
taxable supplies is expected to exceed £67,000 over a period of twelve months or less.
Very small businesses (with total turnover excluding VAT below £600,000) may
apply to submit returns annually.
The main limitation on the use of VAT turnover data is its timeliness. The European
Statistical Service (ESS) refers to timeliness as the lapse of time between publication
and the period to which the data refer. In the case of VAT turnover, timeliness may be
better described as the lapse in time between receiving data of useable quality and the
period it refers to. At present, HMRC receives 90% of returns within 35 days of the end
of the VAT period, with 100% of returns received by 118 days after the end of the period.
In terms of the proportion of turnover received, 94% of turnover had been returned
by 40 days after the end of the VAT period, but only 40% of turnover had been returned
by the 30th day after the end of the VAT period (Figure 1).
Figure 1. Timeliness of VAT declarations to HMRC each month by total VAT
turnover, expressed as a cumulative percentage. Data shown are representative of
returns made in 2009. Taken from Hargreaves, 2009.
Figure 2. Timeliness of VAT declarations to ONS by total VAT turnover for
monthly (black line) and quarterly staggers (blue line), expressed as a cumulative
percentage. The figure indicates that, in the quarterly datastreams, larger businesses
leave reporting their turnover until more than 30 days after the period, and that only
20% is reported in the first period.
MPI, MIDSS (now MBS) and RSI all supply survey data to their users within 15-21
working days following the month to which the survey data relates (for MPI and
MIDSS, these users are IoP and IoS respectively). The HMRC VAT turnover data
cannot currently be supplied as quickly as the survey data. If HMRC VAT turnover
data were to be supplied as quickly as survey data, it would contain less than 10% of
turnover by proportion (see Fig 1). This would result in the VAT turnover data being
more timely but may affect accuracy. It is also likely that the HMRC VAT turnover
data supplied will not include those businesses that would otherwise have been fully
enumerated by survey sampling; and therefore those businesses that contribute the
most to turnover by proportion.
In addition, the timeliness of the estimates based on HMRC VAT turnover data may
be affected due to the staggered returns. Staggering results in different businesses
returning their VAT turnover data in different periods. For instance, in the UK,
quarterly reporters are allowed to report in three 'staggers'. Reporters may report for
quarters ending March, June, September, and December (stagger 1); or for quarters
ending April, July, October and January (stagger 2); or for quarters ending May,
August, November and February (stagger 3). Monthly reporters are referred to as
stagger 0. Figure 3 demonstrates the availability of data for the production of
estimates for June. Most European countries that operate only on 'financial' quarters
will only have stagger 0 and stagger 1.
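The stagger definitions can be expressed as a small lookup. The Python sketch below, with hypothetical names, returns the month in which the quarter containing a given calendar month ends for each quarterly stagger; it shows that, for a June estimate, only stagger 1 reporters have a quarter ending in June.

```python
# Quarter-end months (1 = January) for each quarterly stagger, as listed in the text.
QUARTER_END_MONTHS = {
    1: (3, 6, 9, 12),
    2: (4, 7, 10, 1),
    3: (5, 8, 11, 2),
}


def quarter_end_covering(stagger, month):
    """Return the month (1-12) in which the quarter containing `month` ends."""
    for end in QUARTER_END_MONTHS[stagger]:
        # A quarter ending in `end` covers `end` and the two months before it.
        months_in_quarter = {(end - k - 1) % 12 + 1 for k in range(3)}
        if month in months_in_quarter:
            return end
    raise ValueError("month must be between 1 and 12")


# Quarter-end month for the quarter containing June, by stagger.
june_quarter_ends = {s: quarter_end_covering(s, 6) for s in (1, 2, 3)}
```

For staggers 2 and 3 the quarter containing June ends in July and August respectively, so the June figure for those reporters is not yet covered by a completed return when the June estimate is due.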
Figure 3. Distribution and completeness of monthly (stagger 0) and quarterly
(staggers 1-3) reporting periods for producing June estimates. Yearly staggers have
been excluded due to the minimal turnover reported.

Stagger  Oct  Nov  Dec  Jan  Feb  Mar  Apr  May  Jun
0        M10  M11  M12  M1   M2   M3   M4   M5   A
1                                                B
2                                                C
3                                                D

Key: cells are shaded as either 'almost full data available' or 'partial data
available'. Rows for staggers 1-3 are labelled with quarterly reporting periods (Q4, Q1, Q2).
3. Results for SGA 2010
Under SGA 2010 the UK proposed to investigate two aspects of the use of
administrative data for short term business statistics. The first of these was to address
the issue of timeliness (both due to data relating to the period not yet being received
and the reporting of data in a quarterly datastream) by developing a method of
interpolation and extrapolation. The second was to compare survey-based and VAT-
based turnover estimates within the UK. The results of both of these are discussed,
with full details of the work undertaken found in the appropriate internal reports
(Parkin, 2010; Orchard et al., 2010).
3.1. Interpolation and Extrapolation from Value Added Tax Returns
The aim of the work was to compare the suitability of the different interpolation and
extrapolation methods to combine monthly and quarterly series to obtain timely
estimates of aggregate monthly turnover.
The different methods of producing monthly estimates were compared by measuring:
the size of revisions, both of the levels and growth; the difference between initial
estimates and the estimate at 18 months after the reference period.
To ensure that industries representative of all sectors were included in the analysis,
two NACE Rev 1.1 2-digit divisions were analysed for the services sector, two for the
manufacturing sector, and two NACE 3-digit classifications for the retail sector.
These were NACE 29, 45, 51, 52.1, and 74; chosen due to previous work carried out
on this NACE classification by the UK and Statistics Netherlands. In total, 96 datasets
were created, one for each month from December 2001 through to December 2009.
In general, for each industry, the data were identified as being seasonal. In addition, it
was identified that the initial monthly data received by ONS from HMRC are not
good approximations of the cumulative data received in subsequent months, either
in terms of levels or monthly growth. The data also contain outliers that should be
treated appropriately before being included in the estimation.
3.1.1. Methodology of Calculating Monthly Estimates of Turnover
The processing of monthly series is identical to the processing of quarterly series,
except that interpolation is not done for the monthly series. The series were processed
in three stages:
Stage A – Calculation of outliers and their adjustment factors
Outliers were identified and prior adjustments calculated using the program X-12-
ARIMA (United States Census Bureau) for each series. Details of how this was done
can be found in appendix 1 of the UK WP4 Internal Report (Parkin, 2010).
Stage B – Calculation of Monthly Estimates for Monthly and Quarterly Series
Stage B1 – adjusting for outliers – If required, adjustments were made by
dividing turnover by the appropriate adjustment factor, as calculated in stage
A.
Stages B2 and B3: Interpolation and extrapolation –
The following methods of interpolation were tested:
I1. Simple – allocating a third of the total for the quarter to each month
in the quarter
I2. Spline – allocation using a cubic spline.
The following methods of extrapolation were tested:
E1. Simple – applying the growth between the last two periods to the
last level.
E2. Winters – fits a model with seasonal factors and either a linear or
quadratic trend.
E3. Univariate ARIMA – fits an ARIMA model.
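The two "simple" methods (I1 and E1) are straightforward to illustrate. The Python sketch below is not the SAS implementation used in the study; the function names are hypothetical, and repeating the same growth rate for forecasts beyond one step is an assumption.

```python
def interpolate_simple(quarter_total):
    """I1: allocate a third of the quarterly total to each month of the quarter."""
    return [quarter_total / 3.0] * 3


def extrapolate_simple(series, steps=1):
    """E1: apply the growth between the last two periods to the last level.

    Assumption: for more than one step ahead, the same growth is re-applied.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    extended = list(series)
    for _ in range(steps):
        growth = extended[-1] / extended[-2]
        extended.append(extended[-1] * growth)
    return extended[len(series):]


monthly_from_quarter = interpolate_simple(300.0)
forecasts = extrapolate_simple([100.0, 110.0], steps=2)
```

In this example the quarterly total of 300 becomes three monthly values of 100, and a 10% growth between the last two observations is carried forward to give forecasts of about 121 and 133.1.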
The SAS procedure PROC EXPAND was used to interpolate using cubic splines.
This procedure calculates the spline so that its integral over a quarter is equal to the
total turnover in the quarter. Two different end point constraints were used: the first
(the default in SAS) causes the first two splines, at both ends of the series, to be part
of the same cubic; the second causes the second derivative of the cubic curves, at
both ends of the series, to be zero. There was little difference in the results from these
two end point conditions, and only the second is reported in the results section below.
The SAS procedure PROC FORECAST was used to extrapolate using the Winters
and the univariate ARIMA methods. Every series was extrapolated four periods
ahead, though not all of the forecasts were used. The Winters method has a choice of
trend, both the linear and quadratic trend were tried. The quadratic trend was found to
give poor results, so only the method with linear trend is reported in the results
section below.
Stage C – Combining Series to Produce Monthly Estimate
At stage C the monthly and quarterly datastreams for each series were combined to
give an estimate for each reference month and for each lag in the range 0 to 18
months. The result of this was over 18,000 series of monthly turnover estimates; each
series defined by industry, reporting period, whether prior adjustment had occurred,
the interpolation method, the extrapolation method, and order of
extrapolation/interpolation.
An example to illustrate the production of an estimate in month 7 is given
below (Figure 4). “D” represents actual data available (note that in this example data
are deemed available two months after the end of the period), “o” represents
interpolated data, and “x” represents extrapolated data.
Figure 4: Illustrative Example

Month                  1  2  3  4  5  6  7
Monthly                D  D  D  D  D  x  x
Quarterly (Stagger 1)  o  D  o  o  D  x  x
Quarterly (Stagger 2)  D  o  o  D  x  x  x
Quarterly (Stagger 3)  o  o  D  x  x  x  x
Thus in month 7 the estimate for month 6 is constructed by adding all the
components in column 6, all of which are extrapolations. The estimate for
month 5, on the other hand, is based partly on actual data and partly on
extrapolation. This example demonstrates the UK position, but can easily be
adapted to the more typical European situation where there may be only one quarterly
stagger.
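The illustrative example can be encoded directly to show which kind of value each datastream contributes to a given month's estimate. This Python sketch uses hypothetical names; the availability codes are those of the example ("D" actual, "o" interpolated, "x" extrapolated).

```python
# Availability codes per datastream for months 1 to 7, as seen from month 7.
AVAILABILITY = {
    "monthly": "DDDDDxx",
    "stagger 1": "oDooDxx",
    "stagger 2": "DooDxxx",
    "stagger 3": "ooDxxxx",
}


def sources_for_month(month):
    """Return the code (D, o or x) each datastream contributes for `month`."""
    return {stream: codes[month - 1] for stream, codes in AVAILABILITY.items()}


def combine(components):
    """Stage C: the monthly estimate is the sum of the datastream components."""
    return sum(components.values())


month6 = sources_for_month(6)
month5 = sources_for_month(5)
# Hypothetical component values for month 6, all of them extrapolated.
estimate_month6 = combine(
    {"monthly": 40.0, "stagger 1": 25.0, "stagger 2": 20.0, "stagger 3": 15.0}
)
```

Every component for month 6 is extrapolated, whereas month 5 mixes actual data (monthly and stagger 1) with extrapolations (staggers 2 and 3).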
3.1.2. Measurement of Performance
Two measures of performance were used to assess the interpolation and extrapolation
methodology used to produce monthly estimates. These were the extent of revisions
in monthly estimates and the difference between early estimates and the estimates at
lag 18.
These measures were constructed as follows. Let $T_{h,s,t}$ be the estimate of monthly
turnover at reference month $s$ constructed in month $t$ (with $t \ge s$, so the estimate is
at lag $t - s$) for series $h$, where the series is identified with a specific combination of
industry, prior adjustment, method (extrapolation or interpolation), and method order.
The following revision measures were calculated for each series. Each measure is an
average value over the length, $N$, of the series (the length is the same for all series).

1a. Mean revision in level at lag $L$,

$$\frac{1}{N} \sum_t \left( T_{h,t,t+L} - T_{h,t,t+L-1} \right),$$

which is the average over all reference months, $t$, of the difference between the level in
the reference month measured at month $t+L$ and the level measured one month earlier.
This is equal to

$$\frac{1}{N} \sum_t T_{h,t,t+L} - \frac{1}{N} \sum_t T_{h,t,t+L-1},$$

the difference between the average level measured at month $t+L$ and the average level
measured one month earlier. This is measured in pounds sterling.

1b. Mean absolute revision in level,

$$\frac{1}{N} \sum_t \left| T_{h,t,t+L} - T_{h,t,t+L-1} \right|.$$

The measure in 1a may be affected by cancellation; the absolute measure will not be.
This is measured in pounds sterling.

1c. Mean revision in monthly growth,

$$\frac{1}{N} \sum_t \left( G_{h,t,t+L} - G_{h,t,t+L-1} \right),$$

where

$$G_{h,t,t+L} = 100 \, \frac{T_{h,t,t+L} - T_{h,t-1,t+L-1}}{T_{h,t-1,t+L-1}}.$$

This is measured in percentage points (p.p.).

1d. Mean absolute revision in monthly growth,

$$\frac{1}{N} \sum_t \left| G_{h,t,t+L} - G_{h,t,t+L-1} \right|.$$

This is measured in p.p.

2a. Ratio of root mean square difference to average turnover level, at lag $L$
($L = 0, \ldots, 17$),

$$\frac{\sqrt{N \sum_t \left( T_{h,t,t+L} - T_{h,t,t+18} \right)^2}}{\sum_t T_{h,t,t+18}}.$$

This is a measure of the average difference between the estimates at lag $L$ and the
estimates at lag 18. This measure has no units.

2b. Ratio of root mean square difference in growth to average growth in turnover,

$$\frac{\sqrt{N \sum_t \left( G_{h,t,t+L} - G_{h,t,t+18} \right)^2}}{\sum_t G_{h,t,t+18}}.$$

This measure has no units.
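A minimal Python sketch of the level-based measures (mean revision, mean absolute revision, and the ratio of root mean square difference to average level) is given below for a single series. The data structure, a dictionary keyed by (reference month, month of construction), and the toy figures are assumptions for illustration; the final lag is set to 2 rather than 18 to keep the example small.

```python
from math import sqrt


def mean_revision(T, months, L):
    """Mean revision in level at lag L: average of T[t, t+L] - T[t, t+L-1]."""
    return sum(T[(t, t + L)] - T[(t, t + L - 1)] for t in months) / len(months)


def mean_abs_revision(T, months, L):
    """Mean absolute revision in level at lag L."""
    return sum(abs(T[(t, t + L)] - T[(t, t + L - 1)]) for t in months) / len(months)


def rms_ratio_to_final(T, months, L, final_lag=18):
    """Root mean square difference from the final-lag estimate over the average final level."""
    n = len(months)
    squared = sum((T[(t, t + L)] - T[(t, t + final_lag)]) ** 2 for t in months)
    return sqrt(n * squared) / sum(T[(t, t + final_lag)] for t in months)


# Toy estimates for one series: key = (reference month, month of construction).
T = {
    (1, 1): 90.0, (1, 2): 100.0, (1, 3): 100.0,
    (2, 2): 110.0, (2, 3): 100.0, (2, 4): 100.0,
}
months = [1, 2]
rev = mean_revision(T, months, L=1)
abs_rev = mean_abs_revision(T, months, L=1)
ratio = rms_ratio_to_final(T, months, L=0, final_lag=2)
```

The toy data show the cancellation effect: the two revisions of +10 and -10 give a mean revision of zero but a mean absolute revision of 10.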
3.1.3. Results
Month on month revisions to monthly estimates – The Winters/simple method, applied
in the order interpolation then extrapolation, appeared to be slightly better than other
choices.
Mean absolute revisions in growth for the better methods – Of those
combinations of interpolation and extrapolation methods tested, the four
combinations below performed the best with regards to mean absolute revisions to
growth, in percentage points (p.p.), at lags 1 to 4 months:
A. Extrapolate using ARIMA then interpolate using simple
B. Extrapolate using Winters then interpolate using simple
C. Interpolate using simple then extrapolate using ARIMA
D. Interpolate using simple then extrapolate using Winters
It is clear that there is no overall best method, in terms of month on month revisions,
for all industries and all lags. However, simple interpolation followed by Winters
extrapolation is a consistently good performer for lags 2 to 4 (with notable exceptions
for industries SIC 521 and 527, where the ARIMA methods outperform it). Figure 5
demonstrates the results for SIC 29.
Figure 5. Mean absolute revisions to growth in SIC 29 by a) extrapolating using
ARIMA then interpolating using simple, b) extrapolating using Winters then
interpolating using simple, c) interpolating using simple then extrapolating using
ARIMA, and d) interpolating using simple then extrapolating using Winters.
[Chart: industry SIC 29; revisions in p.p. (vertical axis, 0 to 4) by method
(A = ei_stepar_simple, B = ei_winters_simple, C = ie_stepar_simple,
D = ie_winters_simple) at lags 1 to 4.]
Note that for revisions at lag 1 the situation is different, with Winters extrapolation
followed by simple interpolation being the best performer, and by a large margin in
most cases.
Difference between estimates at each lag and estimate at lag 18 – No method
consistently outperformed the others in respect of the difference in estimates at each
lag and the estimate at lag 18. Simple extrapolation produced far larger
differences than the other methods and so is not recommended. However, the
difference between ARIMA and Winters extrapolation is small, with either simple
or spline interpolation.
3.1.4. Conclusions
For growths, none of the methods examined was better than any other in terms of
these measures of revisions. Each of the methods exhibited extremely poor performance
in some cases.
For levels, one method consistently outperformed the others: simple interpolation
(division of the quarterly total by three) followed by Winters extrapolation. This
method, while not superior in every case, was consistently good and did not exhibit
any extremes of poor performance.
3.2. Producing Short-term Turnover Estimates using VAT-Sourced Turnover
This work focuses on comparing survey-based and VAT-based turnover estimates for
the UK's short-term surveys and establishing a relationship between the two. It
examines whether VAT-sourced turnover is able to produce estimates sufficiently
comparable with survey-based estimates and addresses potential effects of
periodicity (an aspect of timeliness) associated with non-monthly VAT reporters.
These are important issues to address in understanding whether VAT-sourced
turnover can be used as a replacement for survey-sourced data, and they help to build
strong foundations for assessing whether t+30 day VAT-sourced data can be used on its
own or in conjunction with a survey component.
3.2.1. Methods Tested
Survey-based and VAT-based estimates were produced for ONS's three main STS
surveys (MPI, MIDSS, and RSI). For each of these surveys, totals were calculated at
the overall survey level on a month by month basis for 2007/08. In addition, month by
month totals were also calculated at the NACE Rev 1.1 2-digit level for MPI/MIDSS
and the NACE 3-digit level for RSI. Due to the differences between the VAT
turnover data and the survey turnover data (definitional differences, periodicity, and
universe coverage), five methods of constructing VAT estimates, $\hat{V}_{hj}$, were tested.
These were:
Method 1: Census – assumes that the VAT universe covers the target population and that
VAT turnover represents true turnover. This is not strictly true (see Orchard, 2010)
due to under-coverage of small businesses, but was thought to be a useful basis for
later comparisons.

$$\hat{V}_{hj} = \sum_{i \in h} v_{ihj}$$

where $v_{ihj}$ represents each individual reporting unit $i$ within the survey-specific VAT
universe in NACE $h$ and period $j$.
Method 2: Expansion – assumes the survey universe covers the target population but
that VAT turnover still represents true turnover.

$$\hat{V}_{hj} = \sum_{i \in h} a_{hj} \, v_{ihj}$$

where $a_{hj}$ is the survey design weight (a-weight) derived by comparing the VAT
universe (where every unit has a VAT turnover value) against the survey universe.
The a-weight can be represented as:

$$a_{hj} = \frac{N^{SU}_{hj}}{N^{VU}_{hj}}$$

where $N^{SU}_{hj}$ is the total survey universe population within a survey stratum and
$N^{VU}_{hj}$ is the total VAT universe population within a survey stratum for period $j$.
Method 3: Ratio – as expansion but with an additional calibration constraint using
population turnover as an auxiliary variable. Ratio estimates generally improve upon
the estimates produced by an expansion estimator, providing the auxiliary variable
correlates well with the variable of interest.
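Methods 1 and 2 can be illustrated for a single stratum and period with a short Python sketch. The function names and figures are hypothetical; Method 3 is not shown because its calibration formula is not given here.

```python
def census_estimate(vat_values):
    """Method 1: sum VAT turnover over all units in the VAT universe."""
    return sum(vat_values)


def expansion_estimate(vat_values, n_survey_universe):
    """Method 2: weight VAT turnover up to the survey universe.

    The a-weight is the survey universe count divided by the VAT universe
    count (the number of units with a VAT turnover value) in the stratum.
    """
    a_weight = n_survey_universe / len(vat_values)
    return a_weight * sum(vat_values)


# Hypothetical stratum: 3 units with VAT turnover, 4 units in the survey universe.
vat_values = [10.0, 20.0, 30.0]
census_total = census_estimate(vat_values)
expansion_total = expansion_estimate(vat_values, n_survey_universe=4)
```

Here the census total is 60, while the expansion total scales it by 4/3 to about 80 to allow for the unit missing from the VAT universe.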
Method 4: Univariate modelling – assumes a simple ratio between total survey
turnover at a stratum or NACE level; and total VAT turnover for the same domain.
Whereas the ratio estimator (Method 3) applies a weight derived from an auxiliary
variable to either the VAT or survey returned turnover, a simple ratio approach looks
at the relationship between survey returned monthly turnover and VAT-derived
monthly turnover. As such, it is the simplest form of a linear regression model. This
can be explained as:

$$\hat{Y}_{hj} = \beta_{hj} \, \hat{V}_{hj}$$

where $\hat{Y}_{hj}$ is the monthly turnover survey estimate within stratum $h$ for period $j$,
$\hat{V}_{hj}$ is the monthly VAT turnover estimate for stratum $h$ and period $j$, and
$\beta_{hj}$ is the ratio (constant) between the survey and VAT estimates for stratum $h$
and period $j$.
Two options exist for applying a ratio model to the VAT data with the aim of
producing survey-like VAT turnover estimates. These are fitting a micro-data level
model based on those observations where both survey-sourced turnover and VAT-
sourced turnover is available for the survey period; or fitting a straightforward ratio
model at the macro (aggregate) level. Both approaches were considered but it was felt
that the macro approach would be most suitable since it was the simplest to
implement. This is in contrast to the more complex micro level multivariate approach
taken in the next section.
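The macro-level ratio approach can be sketched in Python as follows. This is an illustration only: estimating the ratio from aggregate totals pooled over earlier periods and applying it to a later VAT total is an assumption, and the function names and figures are hypothetical.

```python
def fit_macro_ratio(survey_totals, vat_totals):
    """Estimate the ratio between aggregate survey and aggregate VAT turnover.

    Assumption: the ratio is pooled over the periods supplied.
    """
    vat_sum = sum(vat_totals)
    if vat_sum == 0:
        raise ValueError("VAT totals sum to zero")
    return sum(survey_totals) / vat_sum


def survey_like_estimate(ratio, vat_total):
    """Scale a VAT total by the fitted ratio to give a survey-like estimate."""
    return ratio * vat_total


# Hypothetical aggregate totals for one stratum over two earlier periods.
ratio = fit_macro_ratio(survey_totals=[105.0, 210.0], vat_totals=[100.0, 200.0])
predicted = survey_like_estimate(ratio, vat_total=300.0)
```

With these figures survey turnover runs 5% above VAT turnover, so a VAT total of 300 maps to a survey-like estimate of about 315.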
Method 5: Multivariate modelling – assumes that survey turnover can be estimated
for each business in the VAT universe using VAT and a vector of other known
continuous and discrete auxiliary variables held on the IDBR, X.
$Y_{ij} = \alpha_j + \beta_{1j} V_{ij} + \beta_{2j} X_{ij}$
where $Y_{ij}$ is the survey returned turnover for unit i within period j, $\alpha_j$ is the
intercept within period j, $\beta_{1j}$ is the regression coefficient for the VAT turnover
$V_{ij}$ for unit i in period j, and $\beta_{2j}$ is a vector of regression coefficients for the
auxiliary variables $X_{ij}$.
There is a possibility that the multivariate regression model may be subject to a
positive non-response bias. Positive bias might occur because the model will assign a
value to every observation (reporting unit), whether or not it would have returned a
non-zero value in reality. To adjust for the possibility of a business having a zero
return, a logistic model was developed to correct for this bias.
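The regression part of this method can be sketched with ordinary least squares on matched units. This is a sketch under stated assumptions: the data are invented, a single auxiliary variable (employment) stands in for the IDBR vector, the helper names are hypothetical, and the logistic adjustment for zero returns is not included.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical matched units for one period (illustrative values only).
vat = [100.0, 200.0, 300.0, 400.0, 500.0]
employment = [5.0, 12.0, 9.0, 30.0, 22.0]
# Survey turnover constructed to be exactly linear so the fit is easy to check.
survey = [10.0 + 0.9 * v + 2.0 * e for v, e in zip(vat, employment)]

design = [[1.0, v, e] for v, e in zip(vat, employment)]
intercept, coef_vat, coef_aux = fit_ols(design, survey)

# Predict survey-like turnover for a unit that has VAT data only.
predicted = intercept + coef_vat * 250.0 + coef_aux * 10.0
print(intercept, coef_vat, coef_aux, predicted)
```

Because the prediction is produced for every unit in the VAT universe, a unit that would in reality have returned zero still receives a positive value, which is the source of the bias described above.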
3.2.2. Measures of Performance
These five methods were compared against the survey based estimates, in terms of
growth and levels, using estimated relative error (RE) and absolute relative error
(ARE).
Estimated relative error for NACE h is
$RE_h = \frac{\hat{y}_{hj} - \hat{v}_{hj}}{\hat{y}_{hj}}$
where $\hat{y}_{hj}$ is the survey turnover estimate, $\hat{v}_{hj}$ is the predicted
VAT-derived turnover and j is the monthly indicator, j = 1, ..., 24. To produce the mean
estimated relative error (MRE) for each sector, the average of $RE_h$ over all NACE
classifications within the sector was taken.
Estimated absolute relative error is
$ARE_h = \frac{\sum_j |\hat{y}_{hj} - \hat{v}_{hj}|}{\sum_j \hat{y}_{hj}}$
Again, to produce the mean estimated absolute relative error (MARE) for each sector,
the average of $ARE_h$ over all NACE classifications within the sector was taken.
These indicators have previously been used to assess the effectiveness of VAT data to
predict RSI turnover for editing and imputation purposes within ONS (Lewis, 2009).
Although there are no definitive rules for an acceptable level of error between the
VAT-sourced and survey-sourced estimates of turnover, a value within ±0.1 (±10%) for
the MRE and MARE will be assumed acceptable for this report. This effectively means that for
survey-like VAT turnover estimates to be acceptable, not only must any methodology
produce estimates with little or no bias (as represented by MRE), but it must also be
able to produce estimates that are reasonably accurate (as represented by MARE).
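The two measures can be sketched in code as follows. The data are invented, the function names are hypothetical, and averaging the monthly relative errors to obtain one value per NACE classification is an assumption, since the source formula for the relative error does not show how months are combined. The absolute relative error follows the summed form given above.

```python
def relative_error(survey_series, vat_series):
    """Mean over months of (survey - VAT) / survey (the averaging is an assumption)."""
    values = [(y - v) / y for y, v in zip(survey_series, vat_series)]
    return sum(values) / len(values)


def absolute_relative_error(survey_series, vat_series):
    """Sum over months of |survey - VAT| divided by the sum of survey estimates."""
    numerator = sum(abs(y - v) for y, v in zip(survey_series, vat_series))
    return numerator / sum(survey_series)


def sector_mean(values):
    """Average a measure over the NACE classifications in a sector."""
    return sum(values) / len(values)


# Hypothetical sector with two NACE classifications and three months each.
survey = {"A": [100.0, 100.0, 100.0], "B": [200.0, 200.0, 200.0]}
vat = {"A": [90.0, 110.0, 100.0], "B": [180.0, 200.0, 220.0]}

mre = sector_mean([relative_error(survey[h], vat[h]) for h in survey])
mare = sector_mean([absolute_relative_error(survey[h], vat[h]) for h in survey])

# Acceptance rule assumed in this report: both measures within 0.1.
acceptable = abs(mre) < 0.1 and mare < 0.1
print(mre, mare, acceptable)
```

In this invented example the positive and negative monthly errors cancel, so the MRE is zero while the MARE is not, which shows why the two measures are read together as indicators of bias and accuracy respectively.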
3.2.3. Results
Applying census, expansion and ratio estimator approaches (Methods 1, 2, and 3)
resulted in turnover estimates where the levels and growth were not comparable to
those seen in the survey-derived turnover estimates.
Applying a simple ratio (Method 4) between the VAT derived turnover and survey-
derived turnover at an aggregate (stratum or NACE level) showed improvement, with
levels being adequately accounted for in a number of NACE classifications,
especially production and services sectors (Table 1).
Table 1. Results of the comparison at sector level between monthly turnover
estimates derived from MPI and MIDSS surveys and monthly VAT-derived turnover
estimates derived by simple ratio at stratum level and simple ratio at NACE 2-digit
level. Green indicates where the relative error is less than 10% between the survey
and VAT-based estimates.
Sector        MRE Ratio   MRE Ratio   MARE Ratio   MARE Ratio
              @Stratum    @NACE       @Stratum     @NACE
Production    0.05        0.03        0.21         0.18
Services      0.01        0.00        0.14         0.10
Unfortunately, growth comparable to that seen in survey estimates could not be
reproduced. This was thought to be due to the failure of this model to replicate the
seasonal pattern of the monthly survey data. Figure 6 illustrates the performance of
this model, applied at stratum and at NACE 2-digit level, for NACE division 19.
Taking a multivariate approach using additional information available (Method 5) did
not improve on this either. More detailed results can be found in the UK's ESSnet
internal report (Orchard, 2010).
Figure 6. Monthly estimates of turnover for 2-digit NACE classifications covering
the UK production sector, specifically monthly turnover for NACE division 19 using
survey-sourced turnover, VAT-sourced turnover assuming a simple ratio at the
stratum level, and VAT-sourced turnover assuming a simple ratio at the NACE
2-digit level.
3.2.4. Conclusions
From the analysis carried out, all methods had difficulty in producing survey-like
VAT turnover comparable to survey estimates. The simple ratio approach seemed to
perform best, being able to account for levels, but did not give good estimates of
growth in the majority of cases.
Since we do not really know whether the survey estimate or VAT estimates reflect the
true population, but make the assumption that the survey estimate does, the
conclusion of this report is that VAT alone cannot be currently used to replace
turnover estimates for the majority of NACE Rev 1.1 classifications. A survey
component must therefore be retained if changes in growth are to be adequately
accounted for.
4. Comparison with other countries
The current methodology applied within the UK, and the methodology being
developed/implemented under ESSnet WP4, compare favourably with those of other
European countries in the WP. In addition, the UK's current model for producing
composite VAT-survey estimates is identical in composition (although not detail)
to that in use or under development by DESTATIS and Statistics Netherlands.
Under ESSnet WP4 Statistics Lithuania are planning to develop and implement a
generalised regression (GREG)-type estimator to reduce their overall sample size. A
form of this, the ratio estimator, is already employed within the UK with more
sophisticated GREG-type estimators examined for implementation in a number of UK
surveys (Hedlin et al., 2001).
Under ESSnet WP4, ONS have developed an interpolation and extrapolation method
for accounting for missing (not yet received) VAT turnover. This method is a viable
alternative to imputation and modelling approaches used by other countries within the
WP (SN, DESTATIS, ISTAT), especially where little or no data are available.
Interpolation and extrapolation methodologies are also established common practice
in non-European countries, being employed by both Statistics Canada (Yung & Lys,
2008) and Statistics New Zealand.
Both DESTATIS and ISTAT currently use imputation-based methodology to account
for timeliness issues associated with their data (although the methodology does vary).
Imputation methodologies similar to these are established in ONS, giving the UK the
capability to assess the DESTATIS/ISTAT imputation methodology and to compare its
effectiveness against that of interpolation and extrapolation.
Finally, there seems to be some agreement in the revision patterns of VAT data
across countries, irrespective of the methodologies used to address timeliness. Both the
UK and Germany see that estimates produced between the first vintage and the final
estimate show a characteristic arc in their performance.
5. Conclusions
As part of this ESSnet work on the timeliness of administrative sources, the UK has
investigated the use of interpolation and extrapolation to counter the effect of
incomplete data, and compared survey-based turnover estimates from the UK STS
with their VAT-based counterparts.
The investigation found that simple interpolation by division followed by
extrapolation by Winters was a promising method of addressing the effect of
incomplete data. Modelling turnover data for NACE categories from VAT data,
however, was much more difficult. None of the methods examined was able to model
levels and growth satisfactorily. The 'best' method involved modelling turnover as a
simple multiplicative factor of the VAT turnover for that NACE category.
The interpolation/extrapolation method examined is an established methodology in
non-EU countries (Canada, New Zealand) and could form a useful basis for
comparison with the other main approach, imputation, that DESTATIS/ISTAT
currently use.
Methodology has converged on a general consensus: although there are a number of
ways of solving the problem, they are limited to a few specific techniques. Testing of
these techniques under different countries' conditions can now go ahead, enabling an
adaptive best practice to be developed that is suitable to each country's situation.
6. Further work
For the next SGA, the UK proposes to build on the conclusions found via SGA 2010
and to extend the work in order to better account for growth patterns in the current
UK STS estimates. This is likely to involve examining the effect of seasonality in
VAT returns. We propose to investigate a survey-VAT hybrid model where a survey
component is retained for the larger businesses, and VAT data are used for small to
medium sized businesses (Figure 7). This approach for incorporating VAT turnover is
already in use by other European countries including the Netherlands, Germany and
Italy.
The UK proposes to explore further the use of the interpolation and extrapolation
methodology, focusing on the issue of timeliness. It is hoped that the work would
provide general guidance on best practice when using this methodology. In particular,
the work will examine, for various time lags following the reporting period, methods
of determining the optimum survey proportion (i.e. the positioning of the A/B
boundary in Figure 7). This has the added benefit of testing the SGA 2010 conclusions
without the influence of the larger businesses.
Once estimates for the survey and VAT strata are obtained, some thought will be
needed on how to combine the two estimates for comparison with the current
estimate from STS.
Figure 7: Construction of a survey-VAT hybrid model for short-term turnover
estimation
[Diagram: the survey universe ordered by business size by employment (LARGE,
MEDIUM, SMALL), split into regions A and B according to the data available
(SURVEY or VAT).]
A - Larger businesses estimated by survey techniques
B - Medium/Small businesses estimated from VAT data using the
interpolation/extrapolation method.
In addition to detailed work on the interpolation/extrapolation methodology, the UK
would also welcome involvement in:
- testing the micro-imputation (DESTATIS) and modelling (Statistics Netherlands)
  methodologies on UK data to compare the performance of the imputation and
  extrapolation methodologies;
- assisting in the overall comparison of the main methodologies in producing a
  final estimate ready for publication.
Acknowledgement
The UK would like to acknowledge the advice of all countries participating in ESSnet
WP4 and Statistics Netherlands in particular for their coordination of meetings and
discussions.
References
European Statistical Service guidelines on seasonal adjustment (2009).
http://epp.eurostat.ec.europa.eu/portal/page/portal/product_details/publication?p_product_code=KS-RA-09-006
Hargreaves L (2009). An investigation of the potential benefits to ONS of receiving
additional HMRC VAT data. ONS. Internal Report.
Hedlin D, Falvey H, Chambers R, Kokic P (2001). Does the model matter for
GREG estimation? A business survey example. Journal of Official Statistics. 17(4):
527-544.
Lewis D (2009). Using tax data to assist with editing and imputation. ONS. Internal
Report.
Lorenz R (2010). Estimations in the VAT data for STS in Germany. Federal
Statistical Office of Germany. ESSnet WP4 Internal Report.
Orchard CB & James G (2009). The use of VAT turnover data in short term surveys
within the UK. ONS. ESSnet WP4 Internal Report.
Orchard CB (2010). File preparation, data matching, and distribution analysis of
VAT turnover. ONS. ESSnet WP4 Internal Report.
Parkin N (2010). Extrapolation and interpolation of value added tax returns. ONS.
ESSnet WP4 Internal Report.
Pring P (2008). Summary quality report for the monthly production inquiry survey.
ONS. ONS External Report.
http://www.ons.gov.uk/about-statistics/methodology-and-quality/quality/qual-info-economic-social-and-bus-stats/quality-reports-for-business-statistics/index.html
Yung W & Lys P (2008). Use of administrative data in Statistics Canada's business
surveys – the way forward. Statistics Canada. Internal Report.
Appendix 1
Comparison between UK and NL: Use of Admin data for STS purposes.
Topic: Variable
  Netherlands: Turnover
  UK: Turnover

Topic: Reference period
  Netherlands: Month
  UK: Month

Topic: Release date
  Netherlands: m+30 days
  UK: m+30 days

Topic: Data available on reference period in time for release date
  Netherlands: Survey on largest enterprises (top 1900) (LEs) + non random sample
    of admin data of SME
  UK: Survey on largest enterprises (>100-250 employees depending on industry) +
    sample of medium and small businesses.

Topic: Auxiliary data
  Netherlands: Survey on largest enterprises (top 1900) + population of admin data
    of SMEs for the last closed quarter (and previous ones)
  UK: Annualised VAT turnover data for the previous year, together with Annual
    turnover from the Annual Business Survey for the largest businesses.

Topic: Main Solution
  Netherlands: Macro approach. Modelling the relationship between estimates based
    on LEs (survey) and SMEs (admin)
  UK: Ratio estimation using the annualised VAT turnover is used to reduce
    variance and thus overall sampling cf. an expansion estimator approach.

Topic: Fundamental assumption
  Netherlands: Difference between growth rates of month m can be proxied by the
    one in the last closed quarter
  UK: VAT data is similar enough to survey data that it can be used as an
    auxiliary. Relies on good correlation between turnover sources.

Topic: Main issues
  Netherlands: In those activities where:
    Seasonal differences in growth rates
    LEs are few or scarcely representative
    Differences in growth rates are erratic
  UK: Minimal use of timely VAT, annualised data used only as auxiliary.

Topic: Necessary adjustments
  Netherlands: Refinement of the model
    Add survey on SMEs
    Drop publication
  UK: Use of VAT data for small and medium businesses
    Forecasting or imputation for missing data. Modelling of VAT data into
    survey-like data.

Topic: Revisions
  Netherlands: It is possible to revise the estimates of three months of quarter q
    at q+45 days (planned?)
  UK: Final estimate at m+90 days

Topic: Possible improvements/Suggestions
  Netherlands: Use of temporal disaggregation technique (need longer time series)
  UK: Optimise survey bound dependent upon quality of VAT available at time t.

Topic: Common issues (Netherlands and UK)
  Revision error as a quality indicator
  Conversion of quarterly data to monthly data