ESSNET

USE OF ADMINISTRATIVE AND ACCOUNTS DATA IN

BUSINESS STATISTICS

WP4

TIMELINESS OF ADMINISTRATIVE SOURCES FOR MONTHLY AND

QUARTERLY ESTIMATES

DELIVERABLE 4.2 – SGA-2010

PRACTICES FOR USING VAT TURNOVER DATA WITHIN THE UK TO

PRODUCE ESTIMATES OF GROWTH AND MONTHLY TURNOVER

Craig Orchard a, Kevin Moore, Ann Langford

a Corresponding author: Methodologist, Methodology Directorate

Cardiff Road, Newport, South Wales

NP10 8XG Office for National Statistics

[email protected]

Telephone (044) 1633 455755

Summary

This report forms the UK's contribution for SGA 2010 to establishing best practice in the use of administratively sourced data across Europe, specifically ESSnet Work Package (WP) 4, 'The Timeliness of Administrative Sources for Monthly and Quarterly Estimates'. It gives details of the current national practices for the use of administrative data in UK short-term business statistics; the work that the UK has carried out in SGA 2010 to improve best practice in the use of Value Added Tax (VAT) turnover data (to reduce or replace data collected through its current short-term business surveys); and how the UK intends to build upon its current knowledge, and that shared within the work package, to develop European best practice and apply it to the UK.

At the moment, UK short-term business estimates are produced using data from monthly surveys with a combined sample size of about 40,000. These surveys incorporate annualised VAT data in the estimation process for calibration purposes, which helps to reduce the overall sample size compared with expansion estimators. Since this methodology was established, the Office for National Statistics (ONS) has gained improved access to returned VAT turnover, allowing the possibility of replacing some or all survey data with this source. As in the majority of other European countries, two main problems exist with the monthly VAT datastream. The first is that not all data for a period are available in time to produce an initial estimate within 30 days of the end of that period (as directed by EU regulations). The second is that VAT data are reported monthly, quarterly, or yearly, so some data are unavailable because of the periodicity of reporting. Both of these are aspects of timeliness.

Under SGA 2010, ONS agreed to address the issue of timeliness by investigating the

use of interpolation and extrapolation to align the VAT data stream with the survey

period. This has resulted in a time series based method to address timeliness where

little or no VAT data are available to produce first (or subsequent) estimates. In

addition, ONS has also compared survey-based and VAT-based turnover estimates

within the UK. This work concluded that the best practice of using VAT turnover data

within the UK would be to retain a survey of the largest enterprises and use VAT

turnover, possibly in conjunction with a smaller survey, for medium and smaller

enterprises.

The UK will now seek to implement this model in SGA 2011/12/13, ensuring that the methodology is shared across the work package and that the approaches and best practices of other European countries, such as macro-imputation (DESTATIS) and modelling (Statistics Netherlands), are effectively tested and incorporated where possible.

Keywords: expansion estimator, extrapolation, interpolation, Survey-based turnover,

VAT-based turnover, UK best practice

1. Introduction

The UK currently produces short term turnover estimates by stratified simple random

sampling where employment is used as the stratifying variable, together with NACE

classification. Annual turnover derived mainly from VAT is used as an auxiliary

variable in the survey estimation procedure (i.e. ratio estimation). At the moment, these

turnover data are annualised values held on the UK's Inter-Departmental Business

Register (IDBR). No use is made specifically of monthly administrative data returns.

As part of the UK's contribution to the ESSnet Work Package on the timeliness of

administrative data for short-term business statistics (WP4), the UK has been

investigating the use of VAT turnover information in the context of timeliness to

supplement or replace survey estimates. Timeliness, as defined in WP4, has been

divided into the ability of a National Statistical Institute (NSI) to produce t+30 day

estimates from administrative data in time for its Eurostat obligations; and the effect

of periodicity in administrative data due to quarterly reporting in producing timely

monthly estimates when monthly data are unavailable.

To investigate timeliness of the VAT turnover for use in short term business statistics,

the UK has focused on two main areas, corresponding to the two internal reports that

have been submitted as part of the ESSnet project. Specifically, the two projects

examined:

1. The use of interpolation and extrapolation to counter the effect of incomplete data

(both due to data not yet being received and the reporting of data in a quarterly

datastream).

2. A comparison of survey-based and VAT-based turnover estimates within the UK.

These investigations can be related to box 2 (nowcasting) and box 3b (modelling

VAT) of the hierarchical tree presented in WP4's current SGA agreement; the scenario

where administrative data do not cover the enterprise population when the STS

estimates have to be made.

The remainder of this report gives an overview of UK practices on the use of

administrative data in Short-term Business Statistics (Section 2) and outlines the two

projects undertaken and their results (Section 3). The report concludes with a

comparison with other countries and a description of suggested future work.

2. Description of the National Practice

2.1. Current Methods of Estimating Turnover in Short-Term Statistics

At present in the UK two surveys are used to estimate turnover on a monthly basis,

each covering a different part of the economy:

Monthly Business Survey (MBS) – This covers manufacturing and other

production industries, as well as covering the service sector industries (not

retail). As such, it contributes to the monthly Index of Production (IoP),

the monthly Index of Services (IoS), and quarterly Gross Domestic

Product (GDP; Output approach). The sample size of MBS is

approximately 33000. MBS was introduced in 2010, and is a combination

of two earlier surveys, MPI (Monthly Production Inquiry) and MIDSS

(Monthly Inquiries into Distribution and Services Sector). These two

surveys have been analysed separately in this report.

Retail Sales Inquiry (RSI) - covers only the retail sector and forms the

monthly Retail Sales Index. It contributes to the monthly Index of Services

and quarterly GDP(O). The sample size of RSI is approximately 5000.

All the surveys collect turnover information, though the definition of turnover varies a

little from industry-to-industry. For example, for travel agents, 'sales on own account'

and 'commission' are collected; in RSI, 'retail sales' is collected instead of 'total

turnover'.

MBS collects data for a calendar month ('January 1st-31st', etc.) and the main survey

outputs are estimates of total monthly turnover. For RSI, the data collection period is

based on a set number of weeks (following a 4-4-5 week pattern), and the main output

is total retail sales for an average week in the collection period.

Stratification and sampling – The sampling frame for all the surveys is based on the

IDBR, the Inter-departmental Business Register. This Register is held by ONS and

updated by administrative data from HMRC, PAYE and annual surveys. Both MBS

and RSI are stratified by a cross-classification of SIC and employment size band.

SICs are grouped together in various ways (not necessarily following the

hierarchical structure of the SIC), and not all SICs within, for example,

manufacturing, fall in-scope of the surveys (sometimes external sources of

data are used in the IoP and IoS). Broadly, the level of industrial

stratification in manufacturing and retail is more detailed than in services.

Frozen1 SIC is used for stratification.

The frozen employment of a business (reporting unit) on the IDBR is used

to assign it to a size-band. In general, four size bands are used within each

SIC group, although different sets of size bands can be used in different

SIC groups. There are about eight different sets of size bands in use across

the surveys. Businesses in the largest size band are completely

enumerated; those in all other size bands are sampled.

A fifth band is also created in MIDSS and RSI, comprising businesses

with employment between 10 and the cut-off of band 4 (i.e. medium-sized

businesses) that have IDBR frozen turnover in excess of £40 m - these

businesses are also completely enumerated.

Rotational sampling (using Permanent Random Numbers) is employed in the sampled

strata. For businesses in these strata, the expected time in the sample is either 15 or 27

months.

Imputation – A complete data set is constructed to cover all the sampled RUs

(excluding any deaths, out-of-scope, etc.) by imputing for non-response. For non-

responding businesses that were in the sample in the previous month, an imputed

value for the current month is calculated by applying the average growth (of

responders in the same imputation class) to the RU's value for last month; for new-to-

sample businesses, a value is constructed in a similar way, but using the relationship

with IDBR data instead.
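The growth-based imputation for businesses that were in the sample in the previous month can be sketched as follows. This is a minimal illustration rather than the production method: the function name is hypothetical, and the "average growth" of responders is assumed here to be the ratio of responder totals within the imputation class.

```python
def impute_from_growth(previous_value, responders_previous, responders_current):
    """Impute a non-responder's current value from responders' growth.

    Growth is taken as the ratio of responder totals in the imputation
    class (an assumption; the report only says 'average growth').
    """
    growth = sum(responders_current) / sum(responders_previous)
    return previous_value * growth


# Hypothetical imputation class: two responders present in both months,
# and one non-responder whose value last month was 50.
imputed = impute_from_growth(50.0, [100.0, 200.0], [105.0, 210.0])
```

In this made-up class the responders grew by 5%, so the non-responder's value of 50 is carried forward to about 52.5.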

1 Two versions of SIC, employment and turnover are available on the IDBR: frozen and current. The current fields

may be updated during the year, whereas the frozen fields take on the value of the current field at the end of the

year (i.e. late December) and then remain fixed throughout the year. An exception is frozen SIC which will be

updated during the year if an RU's SIC code changes at the 2-digit level. The frozen fields are used to stratify the

short-term surveys to give greater stability.

Outlier Detection – outliers are detected and treated using one-sided Winsorisation in

MBS. One-sided Winsorisation reduces outliers with high values to below a given

threshold. This reduces the effect of outliers that would otherwise have a significant

impact on survey estimates. For RSI an alternative method is applied.

Estimation – Estimation is carried out by employing calibration estimation (known as

ratio estimation). In some SICs, strata form the calibration groups (separate ratio

estimation), whereas in other SICs, the sampled size bands as one (i.e. bands 1-3

together) form the calibration groups (combined ratio estimation). In all cases, the

largest size band is kept separate, and since it is completely enumerated and

imputation is used, no weighting is applied.

The formula for deriving total turnover at the stratum level in sampled size bands can

be represented by:

$$\hat{Y}_h = \sum_{i \in h} a_h \, g_h \, y_i$$

where $a_h$ is the design weight from the expansion (Horvitz-Thompson) estimator component within the stratified population, $g_h$ is the ratio estimator component using an auxiliary variable, and $y_i$ is returned turnover for a business. The formulae used for deriving a- and g-weights for a separate ratio estimator are given below:

$$a_h = \frac{N_h}{n_h}$$

$$g_h = \frac{\sum_{i \in h} x_i}{\dfrac{N_h}{n_h} \sum_{i \in s_h} x_i}$$

Where x is the auxiliary variable chosen in order to calibrate the estimate.

For UK STS turnover estimates the auxiliary variable is the annualised turnover held

on the IDBR (Pring P, 2008).
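A small numerical sketch may help to show how the a- and g-weights combine in a separate ratio estimator. It reads the sum in the stratum total as running over the sampled units of the stratum; the function name and all figures are hypothetical and are not taken from the ONS systems.

```python
def ratio_estimate(y_sample, x_sample, x_population):
    """Separate ratio estimate of a stratum turnover total.

    y_sample:     returned turnover for the sampled units
    x_sample:     auxiliary (register) turnover for the same sampled units
    x_population: auxiliary turnover for every unit in the stratum
    """
    n_h = len(y_sample)
    N_h = len(x_population)
    a_h = N_h / n_h                                   # design (expansion) weight
    g_h = sum(x_population) / (a_h * sum(x_sample))   # calibration weight
    total = sum(a_h * g_h * y for y in y_sample)
    return a_h, g_h, total


# Hypothetical stratum: 6 businesses on the register, 3 of them sampled.
x_pop = [100.0, 120.0, 80.0, 150.0, 90.0, 60.0]
x_smp = [100.0, 80.0, 60.0]
y_smp = [11.0, 9.0, 7.0]
a_h, g_h, y_hat = ratio_estimate(y_smp, x_smp, x_pop)
```

Here the expansion weight alone would give 2 x 27 = 54; the g-weight of 1.25 raises the estimate to 67.5 because the sampled units hold less register turnover than the design weight implies for the stratum.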

Construction of short-term output indices – The estimates of turnover from MPI,

MIDSS (now MBS) and RSI are then used in the construction of the Index of

Production, Index of Services and Retail Sales Index (and, ultimately, the estimate of

GDP using the Output approach). The processes involved in constructing these series

vary somewhat, but the following are usually applied in construction of the final,

published series:

deflation (division of the turnover series by price relatives or other deflators)

seasonal adjustment

aggregation using 'gross value added' (GVA) weights

annual chain-linking

2.2. Challenges and Opportunities of VAT-Sourced Turnover

This section outlines the two main problems with the timeliness of VAT data in the

UK – namely the time after the end of a period when useable quality data is received,

and the effect of monthly and quarterly “staggered” returns.

The ONS currently receives several datasets from HM Revenue & Customs (HMRC)

under Section 91 of the Value Added Tax Act 1994. The primary use of these datasets

is to keep the IDBR up-to-date. Those datasets that are received daily, twice-monthly

and biennially contain no actual turnover data but are instead concerned with

maintaining the IDBR. They provide information on business births/deaths and on

changes in details such as postal addresses and contact names.

There are two types of datasets supplied by HMRC containing VAT turnover

information, one received monthly, and one received quarterly. The monthly dataset

contains raw turnover data, as it appears on the VAT forms that businesses return to

HMRC. The quarterly dataset contains estimated annual turnover figures for a twelve

month rolling period (e.g. January 2008 to December 2008 inclusive) based on returns

for the previous four quarters. In addition to the VAT turnover variable, the datasets

are supplied with VAT reference numbers (unique business identifiers), the VAT

period (the month in which the VAT returns were due) and stagger. The stagger

denotes the month in which businesses are expected to submit their VAT returns.

Different businesses are required to submit returns with different frequencies; some

monthly, some quarterly, and some yearly. Approximately 90% of businesses submit

returns on a quarterly basis, 10% on a monthly basis, and a very small proportion on a

yearly basis. Businesses are required to register for VAT when the value of their

taxable supplies is expected to exceed £67,000 over a period of twelve months or less.

Very small businesses (with total turnover excluding VAT below £600,000) may

apply to submit returns annually.

The main limitation on the use of VAT turnover data is its timeliness. The European

Statistical Service (ESS) refers to timeliness as the lapse of time between publication

and the period to which the data refer. In the case of VAT turnover, timeliness may be

better described as the lapse in time between receiving data of useable quality and the

period it refers to. At present, HMRC receives 90% of returns 35 days after the end of

the VAT period with 100% of returns received 118 days after the end of the period.

In terms of the proportion of turnover received, 94% of turnover had been returned by 40 days after the end of the VAT period, with only 40% of turnover returned by the 30th day after the end of the VAT period (Figure 1).

Figure 1. Timeliness of VAT declarations to HMRC each month by total VAT

Turnover expressed as an accumulative percentage. Data shown are representative of

returns made in 2009. Taken from Hargreaves, 2009.

Figure 2. Timeliness of VAT declarations to ONS by total VAT Turnover for

monthly (black line) and quarterly staggers (blue line) expressed as an accumulative

percentage. The figure indicates that in the quarterly datastreams, larger businesses

leave reporting their turnover until after 30 days from the period AND that only 20%

is reported in the first period.

MPI, MIDSS (now MBS) and RSI all supply survey data to their users within 15-21

working days following the month to which the survey data relates (for MPI and

MIDSS, these users are IoP and IoS respectively). The HMRC VAT turnover data

cannot currently be supplied as quickly as the survey data. If HMRC VAT turnover

data were to be supplied as quickly as survey data, it would contain less than 10% of

turnover by proportion (see Fig 1). This would result in the VAT turnover data being

more timely but may affect accuracy. It is also likely that the HMRC VAT turnover

data supplied will not include those businesses that would otherwise have been fully

enumerated by survey sampling; and therefore those businesses that contribute the

most to turnover by proportion.

In addition, the timeliness of the estimates based on HMRC VAT turnover data may

be affected due to the staggered returns. Staggering results in different businesses

returning their VAT turnover data in different periods. For instance, in the UK,

quarterly reporters are allowed to report in three 'staggers'. Reporters may report for

quarters ending March, June, September, and December (stagger 1); or for quarters

ending April, July, October and January (stagger 2); or for quarters ending May,

August, November and February (stagger 3). Monthly reporters are referred to as

stagger 0. Figure 3 demonstrates the availability of data for the production of

estimates for June. Most European countries that operate only on 'financial' quarters

will only have stagger 0 and stagger 1.
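The stagger scheme described above can be written as a small lookup that returns the month in which the VAT period containing a given month ends. This is an illustrative sketch only: the names `QUARTER_ENDS` and `quarter_end_for` and the month numbering (1 = January) are assumptions made for the example.

```python
# Quarter-end months (1 = January) for each quarterly stagger, as listed above.
QUARTER_ENDS = {
    1: (3, 6, 9, 12),
    2: (4, 7, 10, 1),
    3: (5, 8, 11, 2),
}


def quarter_end_for(month, stagger):
    """Return the month in which the VAT period containing `month` ends."""
    if stagger == 0:                      # monthly reporters
        return month
    if stagger not in QUARTER_ENDS:
        raise ValueError(f"unknown stagger: {stagger}")
    # Any three consecutive months contain exactly one quarter end.
    candidates = ((month - 1 + k) % 12 + 1 for k in range(3))
    return next(m for m in candidates if m in QUARTER_ENDS[stagger])


# Period end that covers June for each stagger.
june = {s: quarter_end_for(6, s) for s in range(4)}
```

For June, staggers 0 and 1 have a period ending in June itself, whereas the periods covering June for staggers 2 and 3 do not end until July and August, which is why June turnover from those reporters cannot yet have been returned when the June estimate is first made.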

Figure 3. Distribution and completeness of monthly (stagger 0) and quarterly

(staggers 1-3) reporting periods for producing June estimates. Yearly staggers have

been excluded due to the minimal turnover reported

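[Figure 3 grid: columns are the months October to June; rows are staggers 0 to 3. Stagger 0 shows monthly returns M10 to M5, with the June position marked A; the June positions for staggers 1, 2 and 3 are marked B, C and D. Quarterly returns are labelled Q4, Q1 and Q2. Legend: almost full data available; partial data available.]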

3. Results for SGA 2010

Under SGA 2010 the UK proposed to investigate two aspects of the use of

administrative data for short term business statistics. The first of these was to address

the issue of timeliness (both due to data relating to the period not yet being received

and the reporting of data in a quarterly datastream) by developing a method of

interpolation and extrapolation. The second was to compare survey-based and VAT-

based turnover estimates within the UK. The results of both of these are discussed,

with full details of the work undertaken found in the appropriate internal reports

(Parkin, 2010; Orchard et al., 2010).

3.1. Interpolation and Extrapolation from Value Added Tax Returns

The aim of the work was to compare the suitability of the different interpolation and

extrapolation methods to combine monthly and quarterly series to obtain timely

estimates of aggregate monthly turnover.

The different methods of producing monthly estimates were compared by measuring:

the size of revisions, both of the levels and growth; the difference between initial

estimates and the estimate at 18 months after the reference period.

To ensure that industries representative of all sectors were included in the analysis,

two NACE Rev 1.1 2-digit divisions were analysed for the services sector, two for the

manufacturing sector, and two NACE 3-digit classifications for the retail sector.

These were NACE 29, 45, 51, 52.1, and 74; chosen due to previous work carried out

on this NACE classification by the UK and Statistics Netherlands. In total, 96 datasets

were created, one for each month from December 2001 through to December 2009.

In general, for each industry, data were identified as being seasonal. In addition, it

was identified that the initial monthly data received into ONS from HMRC are not

good approximations of the accumulative data received in subsequent months, either

in terms of levels or monthly growth. The data also contain outliers that should be

treated appropriately before being included in the estimation.

3.1.1. Methodology of Calculating Monthly Estimates of Turnover

The processing of monthly series is identical to the processing of quarterly series,

except that interpolation is not done for the monthly series. The series were processed

in three stages:

Stage A – Calculation of outliers and their adjustment factors

Outliers were identified and prior adjustments calculated using the program X-12-

ARIMA (United States Census Bureau) for each series. Details of how this was done

can be found in appendix 1 of the UK WP4 Internal Report (Parkin, 2010).

Stage B – Calculation of Monthly Estimates for Monthly and Quarterly Series

Stage B1 – adjusting for outliers – If required, adjustments were made by

dividing turnover by the appropriate adjustment factor, as calculated in stage

A.

Stages B2 and B3: Interpolation and extrapolation –

The following methods of interpolation were tested:

I1. Simple – allocating a third of the total for the quarter to each month

in the quarter

I2. Spline – allocation using a cubic spline.

The following methods of extrapolation were tested:

E1. Simple – applying the growth between the last two periods to the

last level.

E2. Winters – fits a model with seasonal factors and either a linear or

quadratic trend.

E3. Univariate ARIMA – fits an ARIMA model.

The SAS procedure PROC EXPAND was used to interpolate using cubic splines.

This procedure calculates the spline so that its integral over a quarter is equal to the

total turnover in the quarter. Two different end point constraints were used: the first

(the default in SAS) causes the first two splines, at both ends of the series, to be part

of the same cubic; the second causes the second derivative of the cubic curves, at

both ends of the series, to be zero. There was little difference in the results from these

two end point conditions, and only the second is reported in the results section below.

The SAS procedure PROC FORECAST was used to extrapolate using the Winters

and the univariate ARIMA methods. Every series was extrapolated four periods

ahead, though not all of the forecasts were used. The Winters method has a choice of

trend; both the linear and quadratic trends were tried. The quadratic trend was found to

give poor results, so only the method with linear trend is reported in the results

section below.
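The two simplest methods tested, I1 and E1, can be sketched in a few lines, which also shows why the order of interpolation and extrapolation matters. The function names are hypothetical, and holding the last observed growth rate constant when extrapolating more than one period ahead is an assumption of this sketch. The spline, Winters and ARIMA methods were run in SAS and are not reproduced here.

```python
def interpolate_simple(quarter_totals):
    """I1: allocate a third of each quarterly total to each month in the quarter."""
    monthly = []
    for total in quarter_totals:
        monthly.extend([total / 3.0] * 3)
    return monthly


def extrapolate_simple(series, periods):
    """E1: apply the growth between the last two periods to the last level.

    For more than one period ahead the same growth rate is reapplied
    (an assumption made for this sketch).
    """
    out = list(series)
    growth = out[-1] / out[-2]
    for _ in range(periods):
        out.append(out[-1] * growth)
    return out


quarters = [300.0, 330.0]   # hypothetical quarterly turnover

# Extrapolate one quarter ahead, then interpolate to months.
ei = interpolate_simple(extrapolate_simple(quarters, 1))

# Interpolate to months, then extrapolate three months ahead.
ie = extrapolate_simple(interpolate_simple(quarters), 3)
```

With these made-up figures, extrapolating first carries the 10% quarterly growth forward (months of about 121 each in the third quarter), whereas interpolating first yields a flat monthly series whose last observed growth is zero, so the extrapolated months stay at 110.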

Stage C – Combining Series to Produce Monthly Estimate

At stage C the monthly and quarterly datastreams for each series were combined to

give an estimate for each reference month and for each lag in the range 0 to 18

months. The result of this was over 18,000 series of monthly turnover estimates; each

series defined by industry, reporting period, whether prior adjustment had occurred,

the interpolation method, the extrapolation method, and order of

extrapolation/interpolation.

An example to illustrate the production of an estimate in month 7 is given below (Figure 4). “D” represents actual data available (note that in this example data are deemed available two months after the end of the period), “o” represents interpolated data, and “x” represents extrapolated data.

Figure 4: Illustrative Example

Month                  1  2  3  4  5  6  7

Monthly                D  D  D  D  D  x  x

Quarterly (Stagger 1)  o  D  o  o  D  x  x

Quarterly (Stagger 2)  D  o  o  D  x  x  x

Quarterly (Stagger 3)  o  o  D  x  x  x  x

Thus in month 7 the estimate for month 6 is constructed by the addition of all the

components in column 6 and will all be based on extrapolations. The estimate for

month 5 on the other hand will be based partly on actual data and partly on

extrapolation. This example demonstrates the UK position, but can easily be

adapted to the more typical European situation where there may only be one quarterly

stagger.
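The illustrative grid can be encoded directly to check which kinds of component feed the combined estimate for each reference month. This is a sketch under stated assumptions: the names `STATUS`, `basis_of_estimate` and `combine` are hypothetical, and the turnover values are made up.

```python
# Status grid from the illustrative example:
# D = actual data, o = interpolated, x = extrapolated (months 1 to 7).
STATUS = {
    "monthly":   "DDDDDxx",
    "stagger 1": "oDooDxx",
    "stagger 2": "DooDxxx",
    "stagger 3": "ooDxxxx",
}


def basis_of_estimate(month):
    """Summarise which kinds of component feed the estimate for `month` (1-7)."""
    kinds = {row[month - 1] for row in STATUS.values()}
    if kinds == {"x"}:
        return "extrapolation only"
    if "x" in kinds:
        return "partly extrapolated"
    return "actual and interpolated data"


def combine(values_by_stream, month):
    """Stage C: add the monthly values of every stream for one reference month."""
    return sum(values[month - 1] for values in values_by_stream.values())


# Hypothetical monthly values: every stream contributes 25 in every month.
example_values = {name: [25.0] * 7 for name in STATUS}
month_6_total = combine(example_values, 6)
```

Calling `basis_of_estimate(6)` returns "extrapolation only" and `basis_of_estimate(5)` returns "partly extrapolated", matching the description of months 6 and 5 above.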

3.1.2. Measurement of Performance

Two measures of performance were used to assess the interpolation and extrapolation

methodology used to produce monthly estimates. These were the extent of revisions

in monthly estimates and the difference between early estimates and the estimates at

lag 18.

These measures were constructed as follows. Let $T_{h,s,t}$ be the estimate of monthly turnover at reference month $s$ constructed in month $t$ (with $t \ge s$, so the estimate is at lag $t - s$) for series $h$, where the series is identified with a specific combination of industry, prior adjustment, method (extrapolation or interpolation), and method order. The following revision measures were calculated for each series; each measure is an average value over the length, $N$, of the series (the length is the same for all series).

1a. Mean revision in level at lag $L$,

$$\frac{1}{N}\sum_{t}\left(T_{h,t,t+L} - T_{h,t,t+L-1}\right),$$

which is the average over all reference months, $t$, of the difference between the level in the reference month measured at month $t+L$ and the level measured one month earlier. This is equal to

$$\frac{1}{N}\sum_{t} T_{h,t,t+L} - \frac{1}{N}\sum_{t} T_{h,t,t+L-1},$$

the difference between the average level measured at month $t+L$, and the average level measured one month earlier. This is measured in pounds sterling.

1b. Mean absolute revision in level,

$$\frac{1}{N}\sum_{t}\left|T_{h,t,t+L} - T_{h,t,t+L-1}\right|.$$

The measure in 1a. may be affected by cancellation; the absolute measure will not be. This is measured in pounds sterling.

1c. Mean revision in monthly growth,

$$\frac{1}{N}\sum_{t}\left(G_{h,t,t+L} - G_{h,t,t+L-1}\right),$$

where

$$G_{h,t,t+L} = 100 \, \frac{T_{h,t,t+L} - T_{h,t-1,t+L-1}}{T_{h,t-1,t+L-1}}.$$

This is measured in percentage points (p.p.)

1d. Mean absolute revision in monthly growth,

$$\frac{1}{N}\sum_{t}\left|G_{h,t,t+L} - G_{h,t,t+L-1}\right|.$$

This is measured in p.p.

2a. Ratio of root mean square difference to average turnover level, at lag $L$ ($L = 0, \ldots, 17$),

$$\frac{\sqrt{N \sum_{t}\left(T_{h,t,t+L} - T_{h,t,t+18}\right)^2}}{\sum_{t} T_{h,t,t+18}}.$$

This is a measure of the average difference between the estimates at lag $L$ and the estimates at lag 18. This measure has no units.

2b. Ratio of root mean square difference in growth to average growth in turnover,

$$\frac{\sqrt{N \sum_{t}\left(G_{h,t,t+L} - G_{h,t,t+18}\right)^2}}{\sum_{t} G_{h,t,t+18}}.$$

This measure has no units.
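The level-based measures (1a, 1b and 2a) can be sketched for a single series as below. The layout of the input is an assumption made for the example: `T[t][L]` holds the estimate for reference month t made L months later, and the final lag is passed as a parameter (18 in this report, 2 in the made-up data). The function names and figures are hypothetical.

```python
import math


def mean_revision(T, lag, absolute=False):
    """Measures 1a/1b: average over reference months of T[t][lag] - T[t][lag - 1]."""
    diffs = [row[lag] - row[lag - 1] for row in T]
    if absolute:
        diffs = [abs(d) for d in diffs]
    return sum(diffs) / len(diffs)


def rms_ratio(T, lag, final_lag):
    """Measure 2a: RMS difference between lag and final estimates over the mean final level."""
    n = len(T)
    rms = math.sqrt(sum((row[lag] - row[final_lag]) ** 2 for row in T) / n)
    return rms / (sum(row[final_lag] for row in T) / n)


# Hypothetical estimates for three reference months at lags 0, 1 and 2.
T = [
    [100.0, 104.0, 105.0],
    [110.0, 108.0, 110.0],
    [120.0, 123.0, 120.0],
]
rev = mean_revision(T, 1)                    # revisions 4, -2, 3 partly cancel
abs_rev = mean_revision(T, 1, absolute=True)
ratio = rms_ratio(T, 0, 2)
```

The signed mean revision at lag 1 is 5/3, while the mean absolute revision is 3, which illustrates the cancellation that measure 1b is designed to avoid.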

3.1.3. Results

Month on month revisions to monthly estimates – The method Winters/simple with

the order interpolation then extrapolation appeared to be slightly better than other

choices.

Mean absolute revisions in growth for the better methods – Of those

combinations of interpolation and extrapolation methods tested, the four

combinations below performed the best with regards to mean absolute revisions to

growth, in percentage points (p.p.), at lags 1 to 4 months:

A. Extrapolate using ARIMA then interpolate using simple

B. Extrapolate using Winters then interpolate using simple

C. Interpolate using simple then extrapolate using ARIMA

D. Interpolate using simple then extrapolate using Winters

It is clear that there is no overall best method, in terms of month on month revisions,

for all industries and all lags. However, simple interpolation followed by Winters

extrapolation is a consistently good performer for lags 2 to 4 (with notable exceptions

for industries SIC 521 and 527, where the ARIMA methods outperform it). Figure 5

demonstrates the results for SIC 29.

Figure 5. Mean absolute revisions to growth in SIC29 by a) extrapolating using

ARIMA then interpolating using simple b) extrapolating using Winters then

interpolating using simple c) interpolating using simple then extrapolating using

ARIMA and d) interpolating using simple then extrapolating using Winters

[Figure 5 chart: industry SIC 29. Vertical axis: revisions (p.p.), 0 to 4. Horizontal axis: methods A to D at each of lags 1 to 4. Method codes: A = ei_stepar_simple, B = ei_winters_simple, C = ie_stepar_simple, D = ie_winters_simple.]

Note that for revisions at lag 1 the situation is different, with Winters extrapolation

followed by simple interpolation being the best performer, and by a large margin in

most cases.

Difference between estimates at each lag and estimate at lag 18 – No method

consistently outperformed the others in respect of the difference in estimates at each

lag and the estimate at lag 18. Simple extrapolation produces much larger differences than the other methods and so is not recommended. However, the difference between ARIMA and Winters extrapolation is small, with both simple and spline interpolation methods.

3.1.4. Conclusions

For growths, none of the methods examined was better than any other in terms of

these measures of revisions. Each of the methods exhibited extremely poor performance

in some cases.

For levels, one method consistently outperformed the others: simple interpolation by

division followed by extrapolation by Winters. This method, while not superior in

every case, was consistently good and did not exhibit any extremes of poor

performance.

3.2. Producing Short-term Turnover Estimates using VAT-Sourced Turnover

This work focuses on comparing survey-based and VAT-based turnover estimates for

the UK's short term surveys and establishing a relationship between the two. It

examines whether VAT-sourced turnover is able to produce estimates of sufficient

comparability with survey-based estimates and addresses potential effects of

periodicity (an aspect of timeliness) associated with non-monthly VAT reporters.

These are important issues to address in understanding whether the VAT-sourced

turnover can be used as a replacement for survey-sourced data; and help to build

strong foundations to assess whether t+30 day VAT-sourced data can be used on its

own; or in conjunction with a survey component.

3.2.1. Methods Tested

Survey-based and VAT-based estimates were produced for ONS's three main STS

surveys (MPI, MIDSS, and RSI). For each of these surveys, totals were calculated at

the overall survey level on a month by month basis for 2007/08. In addition, month by

month totals were also calculated at the NACE Rev 1.1 2-digit level for MPI/MIDSS

and the NACE 3-digit level for RSI. Due to the differences between the VAT

turnover data and the survey turnover data (definitional differences, periodicity, and

universe coverage), five methods of constructing VAT estimates, $\hat{V}_{hj}$, were tested.

These were:

Method 1: Census – assumes VAT universe covers the target population AND that

VAT turnover represents true turnover. This is not strictly true (see Orchard, 2010)

due to under-coverage of small businesses, but was thought to be a useful basis for

later comparisons.

$$\hat{V}_{hj} = \sum_{i \in h} v_{ihj}$$

where $v_{ihj}$ represents the VAT turnover of each individual reporting unit $i$ within the survey-specific VAT universe in NACE $h$ and period $j$.

Method 2: Expansion – assumes the survey universe covers the target population but

that VAT turnover still represents true turnover.

$$\hat{V}_{hj} = \sum_{i \in h} a_{hj} \, v_{ihj}$$

where $a_{hj}$ is the survey design weight (a-weight) derived by comparing the VAT universe (where every unit has a VAT turnover value) against the survey universe. The a-weight can be represented as:

$$a_{hj} = \frac{N^{SU}_{hj}}{N^{VU}_{hj}}$$

where $N^{SU}_{hj}$ is the total survey universe population within a survey stratum and $N^{VU}_{hj}$ is the total VAT universe population within a survey stratum for period $j$.
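Methods 1 and 2 differ only in the a-weight, as the following sketch shows for a single stratum and period. It assumes that the a-weight is the survey universe count divided by the VAT universe count; the function names and figures are hypothetical.

```python
def census_estimate(vat_turnover):
    """Method 1: sum VAT turnover over every unit in the VAT universe."""
    return sum(vat_turnover)


def expansion_estimate(vat_turnover, n_survey_universe):
    """Method 2: weight each VAT unit by a = N_SU / N_VU before summing.

    The direction of the ratio (survey universe over VAT universe) is an
    assumption of this sketch.
    """
    a = n_survey_universe / len(vat_turnover)
    return sum(a * v for v in vat_turnover)


# Hypothetical stratum: 4 units with VAT turnover, 5 units in the survey universe.
vat = [50.0, 30.0, 20.0, 100.0]
census_total = census_estimate(vat)
expansion_total = expansion_estimate(vat, 5)
```

With one unit in five missing from the VAT universe, the a-weight of 1.25 lifts the census total of 200 to an expansion estimate of 250.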

Method 3: Ratio – as expansion but with an additional calibration constraint using

population turnover as an auxiliary variable. Ratio estimates generally improve upon

the estimates produced by an expansion estimator, providing the auxiliary variable

correlates well with the variable of interest.

Method 4: Univariate modelling – assumes a simple ratio between total survey

turnover at a stratum or NACE level; and total VAT turnover for the same domain.

Whereas the ratio estimator (Method 3) applies a weight derived from an auxiliary

variable to either the VAT or survey returned turnover, a simple ratio approach looks

at the relationship between survey returned monthly turnover and VAT-derived

monthly turnover. As such, it is the simplest form of a linear regression model. This

can be explained as:

$$\hat{Y}_{hj} = \beta_{hj} \, \hat{V}_{hj}$$

where $\hat{Y}_{hj}$ is the monthly turnover survey estimate within stratum $h$ for period $j$, $\hat{V}_{hj}$ is the monthly VAT turnover estimate for the stratum $h$ and period $j$, and $\beta_{hj}$ is the ratio (constant) between the survey and VAT estimates for stratum $h$ and period $j$.

Two options exist for applying a ratio model to the VAT data with the aim of producing survey-like VAT turnover estimates: fitting a micro-data level model based on those observations where both survey-sourced turnover and VAT-sourced turnover are available for the survey period; or fitting a straightforward ratio model at the macro (aggregate) level. Both approaches were considered, but the macro approach was judged most suitable since it was the simplest to implement. This is in contrast to the more complex micro-level multivariate approach taken in the next section.
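The macro approach can be sketched as follows. This is a minimal illustration rather than the ONS implementation: it assumes the ratio is pooled over the periods in which both a survey estimate and a VAT estimate exist for the domain, and the function names and totals are hypothetical.

```python
def fit_macro_ratio(survey_totals, vat_totals):
    """Ratio of summed survey estimates to summed VAT estimates for one domain."""
    total_vat = sum(vat_totals)
    if total_vat == 0:
        raise ValueError("VAT total is zero; ratio undefined")
    return sum(survey_totals) / total_vat


def survey_like_estimate(ratio, vat_total):
    """Scale a VAT total by the fitted ratio to give a survey-like estimate."""
    return ratio * vat_total


# Hypothetical stratum totals for three periods where both sources are available.
survey_totals = [105.0, 126.0, 84.0]
vat_totals = [100.0, 120.0, 80.0]

ratio = fit_macro_ratio(survey_totals, vat_totals)
predicted = survey_like_estimate(ratio, vat_total=90.0)
print(ratio, predicted)
```

With a single constant ratio, the period-on-period growth of the scaled series is identical to the growth of the VAT series itself, so a model of this form can adjust levels but cannot change the growth pattern.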

Method 5: Multivariate modelling – assumes that survey turnover can be estimated for each business in the VAT universe using VAT and a vector of other known continuous and discrete auxiliary variables held on the IDBR, $X$.

$$Y_{ij} = \alpha_j + \beta_{1j} V_{ij} + \beta_{2j} X_{ij}$$

where $Y_{ij}$ is the survey returned turnover for unit i within period j, $\alpha_j$ is the intercept within period j, $\beta_{1j}$ is the regression coefficient for the VAT turnover $V_{ij}$ for unit i in period j, and $\beta_{2j}$ is a vector of regression coefficients for the auxiliary variables $X_{ij}$.

There is a possibility that the multivariate regression model may be subject to a positive non-response bias. Positive bias might occur because the model will assign a value to an observation (reporting unit), whether or not it would have returned a non-zero value in reality. To adjust for the possibility of a business having a zero return, a logistic model was developed to correct for this bias.
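To make the regression step concrete, the sketch below fits a model of this form by ordinary least squares for a single period and then predicts survey-like turnover for a unit that has only VAT and register data. It is an illustration under stated assumptions: a single hypothetical auxiliary variable (employment) stands in for the vector of IDBR variables, the data are invented, and the logistic correction for zero returns is not covered.

```python
def fit_ols(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))]
         for p in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [x - factor * p for x, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical matched units for one period: VAT turnover and employment.
vat = [10.0, 20.0, 30.0, 40.0, 50.0]
employment = [1.0, 3.0, 2.0, 5.0, 4.0]
survey = [33.0, 65.0, 94.0, 127.0, 156.0]  # survey returned turnover

design = [[1.0, v, x] for v, x in zip(vat, employment)]
alpha, beta_vat, beta_emp = fit_ols(design, survey)

# Predict survey-like turnover for a unit with VAT and register data only.
predicted = alpha + beta_vat * 25.0 + beta_emp * 2.0
print(alpha, beta_vat, beta_emp, predicted)
```

Because the fitted line returns a value for every unit, a unit that would in reality have reported zero turnover still receives a non-zero prediction, which is the source of the positive bias described above.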

3.2.2. Measures of Performance

These five methods were compared against the survey based estimates, in terms of

growth and levels, using estimated relative error (RE) and absolute relative error

(ARE).

Estimated relative error for NACE h is

$$RE_h = \frac{\hat{y}_{hj} - \hat{v}_{hj}}{\hat{y}_{hj}}$$

where $\hat{y}_{hj}$ is the survey turnover estimate, $\hat{v}_{hj}$ is the predicted VAT-derived turnover and j is the monthly indicator, j = 1, …, 24. To produce the mean estimated relative error (MRE) for each sector, the average of $RE_h$ over all NACE classifications within the sector was taken.

Estimated absolute relative error is

$$ARE_h = \frac{\sum_j |\hat{y}_{hj} - \hat{v}_{hj}|}{\sum_j \hat{y}_{hj}}$$

Again, to produce the mean estimated absolute relative error (MARE) for each sector, the average of $ARE_h$ over all NACE classifications within the sector was taken.
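The two measures can be sketched in a few lines. This is an illustrative reading of the formulas rather than the code used for the report: the relative error is evaluated for a single period, the absolute relative error is pooled over periods, and the NACE labels and figures are hypothetical.

```python
def relative_error(survey, vat):
    """Signed relative error for one NACE class and period."""
    return (survey - vat) / survey


def absolute_relative_error(survey_series, vat_series):
    """Sum of absolute differences over periods, divided by total survey turnover."""
    numerator = sum(abs(y - v) for y, v in zip(survey_series, vat_series))
    return numerator / sum(survey_series)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Hypothetical sector with two NACE classes and two periods each.
survey = {"A": [100.0, 200.0], "B": [50.0, 150.0]}
vat = {"A": [90.0, 220.0], "B": [55.0, 135.0]}

mre_first_period = mean(relative_error(survey[h][0], vat[h][0]) for h in survey)
mare = mean(absolute_relative_error(survey[h], vat[h]) for h in survey)
print(mre_first_period, mare)
```

In this example the signed errors of the two classes cancel, so the mean relative error suggests no bias, while the mean absolute relative error still records a 10% discrepancy. This is why both measures are needed.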

These indicators have previously been used to assess the effectiveness of VAT data to predict RSI turnover for editing and imputation purposes within ONS (Lewis, 2009).

Although there are no definitive rules for an acceptable level of error between the VAT-sourced and survey-sourced estimates of turnover, a threshold of less than 0.1 in absolute value (±10%) for the MRE and MARE is assumed for this report. This effectively means that for survey-like VAT turnover estimates to be acceptable, not only must any methodology produce estimates with little or no bias (as represented by MRE), but it must also be able to produce estimates that are reasonably accurate (as represented by MARE).

3.2.3. Results

Applying census, expansion and ratio estimator approaches (Methods 1, 2, and 3)

resulted in turnover estimates where the levels and growth were not comparable to

those seen in the survey-derived turnover estimates.

Applying a simple ratio (Method 4) between the VAT-derived turnover and survey-derived turnover at an aggregate level (stratum or NACE) showed improvement, with levels being adequately accounted for in a number of NACE classifications, especially in the production and services sectors (Table 1).

Table 1. Results of the comparison at sector level between monthly turnover

estimates derived from MPI and MIDSS surveys and monthly VAT-derived turnover

estimates derived by simple ratio at stratum level and simple ratio at NACE 2-digit

level. Green indicates where the relative error is less than 10% between the survey

and VAT-based estimates.

| Sector | MRE, ratio at stratum level | MRE, ratio at NACE level | MARE, ratio at stratum level | MARE, ratio at NACE level |
|---|---|---|---|---|
| Production | 0.05 | 0.03 | 0.21 | 0.18 |
| Services | 0.01 | 0.00 | 0.14 | 0.10 |

Unfortunately, growth comparable to that seen in survey estimates could not be reproduced. This was thought to be due to the failure of this model to replicate the seasonal pattern of the monthly survey data. Figure 6 illustrates the performance of this model, fitted at stratum and at NACE 2-digit level, for NACE division 19.

Taking a multivariate approach using additional information available (Method 5) did not improve on this either. More detailed results can be found in the UK's ESSnet internal report (Orchard, 2010).


Figure 6. Monthly estimates of turnover for 2-digit NACE classifications covering the UK production sector, specifically monthly turnover for NACE division 19 using survey-sourced turnover, VAT-sourced turnover assuming a simple ratio at the stratum level, and VAT-sourced turnover assuming a simple ratio at the NACE 2-digit level.

3.2.4. Conclusions

From the analysis carried out, all methods had difficulty in producing survey-like VAT turnover comparable to survey estimates. The simple ratio approach seemed to perform best, being able to account for levels, but did not give good estimates of growth in the majority of cases.

Since we do not really know whether the survey estimates or the VAT estimates reflect the true population, but make the assumption that the survey estimates do, the conclusion of this report is that VAT alone cannot currently be used to replace turnover estimates for the majority of NACE Rev 1.1 classifications. A survey component must therefore be retained if changes in growth are to be adequately accounted for.

4. Comparison with other countries

The current methodology applied within the UK, and the methodology being developed/implemented under ESSnet WP4, compare favourably with those of other European countries in the WP. In addition, the UK's current model for producing composite VAT-survey estimates is identical in composition (although not in detail) to that in use or under development by DESTATIS and Statistics Netherlands.

Under ESSnet WP4 Statistics Lithuania are planning to develop and implement a

generalised regression (GREG)-type estimator to reduce their overall sample size. A

form of this, the ratio estimator, is already employed within the UK with more

sophisticated GREG-type estimators examined for implementation in a number of UK

surveys (Hedlin et al., 2001).


Under ESSnet WP4, ONS have developed an interpolation and extrapolation method

for accounting for missing (not yet received) VAT turnover. This method is a viable

alternative to imputation and modelling approaches used by other countries within the

WP (SN, DESTATIS, ISTAT), especially where little or no data are available.

Interpolation and extrapolation methodologies are also established common practice

in non-European countries, being employed by both Statistics Canada (Yung & Lys,

2008) and Statistics New Zealand.

Both DESTATIS and ISTAT currently use imputation-based methodology to account for timeliness issues associated with their data (although the methodology does vary). Imputation methodologies similar to these are established in ONS, giving the UK the capability to assess the DESTATIS/ISTAT imputation methodology and to compare it against the effectiveness of interpolation and extrapolation.

Finally, there seems to be some agreement in the revision patterns of VAT data across countries, irrespective of the methodologies used to address timeliness. Both the UK and Germany find that estimates produced between the first vintage and the final estimate show a characteristic arc in their performance.

5. Conclusions

As part of this ESSnet work on the timeliness of administrative sources, the UK has investigated the use of interpolation and extrapolation to counter the effect of incomplete data, and compared survey-based turnover estimates from the UK STS with their VAT-based counterparts.

The investigation found that simple interpolation by division followed by extrapolation by Winters was a promising method of addressing the effect of incomplete data. Modelling turnover data for NACE categories from VAT data, however, was much more difficult. None of the methods examined were able to model levels and growth satisfactorily. The 'best' method involved modelling turnover as a simple multiplicative factor of the VAT turnover for that NACE category.

The interpolation/extrapolation method examined is an established methodology in

non-EU countries (Canada, New Zealand) and could form a useful basis for

comparison with the other main approach, imputation, that DESTATIS/ISTAT

currently use.

Methodology has converged, with a general consensus that, although there are a number of ways of solving the problem, those ways are limited to specific techniques. The testing of these techniques under different countries' conditions can now go ahead, enabling an adaptive best practice to be developed that is suitable to each country's situation.

6. Further work

For the next SGA, the UK proposes to build on the conclusions found via SGA 2010

and to extend the work in order to better account for growth patterns in the current

UK STS estimates. This is likely to involve examining the effect of seasonality in

VAT returns. We propose to investigate a survey-VAT hybrid model where a survey

component is retained for the larger businesses, and VAT data are used for small to

medium sized businesses (Figure 7). This approach for incorporating VAT turnover is

already in use by other European countries including the Netherlands, Germany and

Italy.


The UK proposes to explore further the use of the interpolation and extrapolation methodology, focusing on the issue of timeliness. It is hoped that the work would provide general guidance on best practice when using this methodology. In particular, the work will examine, for various time lags following the reporting period, methods of determining the optimum survey proportion (i.e. the positioning of the A/B boundary in Figure 7). This has the added benefit of testing the SGA 2010 conclusions without the influence of the larger businesses.

Once estimates for the survey and VAT strata are obtained, some thought will need to be given to how to combine the two estimates for comparison with the current estimate from STS.

Figure 7: Construction of a survey-VAT hybrid model for short-term turnover estimation

[Diagram: the survey universe ordered by business size by employment (large, medium, small), showing the data available (survey, VAT) for each part.]

A - Larger businesses estimated by survey techniques

B - Medium/Small businesses estimated from VAT data using the interpolation/extrapolation method.

In addition to detailed work on the interpolation/extrapolation methodology, the UK would also welcome involvement in:

- testing the micro-imputation (DESTATIS) and modelling (Statistics Netherlands) methodologies on UK data, to compare their performance with that of the interpolation and extrapolation methodology;

- assisting in the overall comparison of the main methodologies in producing a final estimate ready for publication.

Acknowledgement

The UK would like to acknowledge the advice of all countries participating in ESSnet

WP4 and Statistics Netherlands in particular for their coordination of meetings and

discussions.


References

European Statistical Service guidelines on seasonal adjustment (2009). http://epp.eurostat.ec.europa.eu/portal/page/portal/product_details/publication?p_product_code=KS-RA-09-006

Hargreaves L (2009). An investigation of the potential benefits to ONS of receiving

additional HMRC VAT data. ONS. Internal Report.

Hedlin D, Falvey H, Chambers R, Kokic P (2001). Does the model matter for

GREG estimation? A business survey example. Journal of Official Statistics. 17(4):

527-544.

Lewis D (2009). Using tax data to assist with editing and imputation. ONS. Internal

Report.

Lorenz R (2010). Estimations in the VAT data for STS in Germany. Federal

Statistical Office of Germany. ESSnet WP4 Internal Report.

Orchard CB & James G (2009). The use of VAT turnover data in short term surveys

within the UK. ONS. ESSnet WP4 Internal Report.

Orchard CB (2010). File preparation, data matching, and distribution analysis of

VAT turnover. ONS. ESSnet WP4 Internal Report.

Parkin N (2010). Extrapolation and interpolation of value added tax returns. ONS.

ESSnet WP4 Internal Report.

Pring P (2008). Summary quality report for the monthly production inquiry survey. ONS. ONS External Report. http://www.ons.gov.uk/about-statistics/methodology-and-quality/quality/qual-info-economic-social-and-bus-stats/quality-reports-for-business-statistics/index.html

Yung W & Lys P (2008). Use of administrative data in Statistics Canada's business surveys – the way forward. Statistics Canada. Internal Report.


Appendix 1

Comparison between UK and NL: Use of Admin data for STS purposes.

| Topic | Netherlands | UK |
|---|---|---|
| Variable | Turnover | Turnover |
| Reference period | Month | Month |
| Release date | m+30 days | m+30 days |
| Data available on reference period in time for release date | Survey on largest enterprises (top 1900) (LEs) + non-random sample of admin data of SMEs | Survey on largest enterprises (>100-250 employees depending on industry) + sample of medium and small businesses. |
| Auxiliary data | Survey on largest enterprises (top 1900) + population of admin data of SMEs for the last closed quarter (and previous ones) | Annualised VAT turnover data for the previous year, together with annual turnover from the Annual Business Survey for the largest businesses. |
| Main solution | Macro approach. Modelling the relationship between estimates based on LEs (survey) and SMEs (admin) | Ratio estimation using the annualised VAT turnover is used to reduce variance and thus overall sampling cf. an expansion estimator approach. |
| Fundamental assumption | Difference between growth rates of month m can be proxied by the one in the last closed quarter | VAT data is similar enough to survey data that it can be used as an auxiliary. Relies on good correlation between turnover sources. |
| Main issues | In those activities where: there are seasonal differences in growth rates; LEs are few or scarcely representative; differences in growth rates are erratic | Minimal use of timely VAT; annualised data used only as auxiliary. |
| Necessary adjustments | Refinement of the model; add survey on SMEs; drop publication | Use of VAT data for small and medium businesses; forecasting or imputation for missing data; modelling of VAT data into survey-like data. |
| Revisions | It is possible to revise the estimates of three months of quarter q at q+45 days (planned?) | Final estimate at m+90 days |
| Possible improvements/suggestions | Use of temporal disaggregation technique (need longer time series) | Optimise survey bound dependent upon quality of VAT available at time t. |
| Common issues | Revision error as a quality indicator; conversion of quarterly data to monthly data | Revision error as a quality indicator; conversion of quarterly data to monthly data |
