Post on 22-Dec-2015

Methods for detecting errors in VAT Turnover data

Phil Lewis

Processing, Editing and Imputation branch

Business Statistics Methods - Survey Methodology

E-mail: philip.a.lewis@ons.gov.uk

Outline of talk

• Detecting suspicious patterns
• Methods for detecting unit errors
• Consider 5 methods
• Comparing methods
• Results
• Conclusion and recommendations

Detecting suspicious patterns

• One of the problems with VAT Turnover data is that it is often not possible to re-contact businesses to get an idea of their true Turnover figure.

• It is often possible to identify errors in VAT Turnover data by considering the pattern of reported Turnover over a period.

Hoogland (2010)

i. Zero Turnover in three quarters, positive Turnover in the other quarter

ii. Zero Turnover in one quarter, positive Turnover in the other three quarters

iii. Same Turnover in all four quarters

iv. Same Turnover for three quarters, a different (positive) Turnover value in the other quarter

v. Negative Turnover in any of the quarters
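
The five patterns above could be checked with a few counts over a business's four quarterly values. The following is a minimal sketch rather than Hoogland's implementation: the function name and the exact-equality comparison are assumptions, and the patterns are read literally, so a series such as 0, 0, 0, 500 matches both pattern i and pattern iv.

```python
def suspicious_patterns(quarters):
    """Return the labels (i to v) of the patterns matched by four quarterly values."""
    if len(quarters) != 4:
        raise ValueError("expected exactly four quarterly Turnover values")

    zeros = sum(1 for q in quarters if q == 0)
    positives = sum(1 for q in quarters if q > 0)
    # How often the most frequent value occurs, and which values occur only once.
    most_common = max(quarters.count(q) for q in quarters)
    singletons = [q for q in quarters if quarters.count(q) == 1]

    found = []
    if zeros == 3 and positives == 1:
        found.append("i")
    if zeros == 1 and positives == 3:
        found.append("ii")
    if most_common == 4:
        found.append("iii")
    # When three values agree there is exactly one singleton: the odd quarter.
    if most_common == 3 and singletons[0] > 0:
        found.append("iv")
    if any(q < 0 for q in quarters):
        found.append("v")
    return found


print(suspicious_patterns([100, 100, 100, 250]))  # ['iv']
print(suspicious_patterns([0, 120, 130, 110]))    # ['ii']
```

A business matching none of the patterns returns an empty list, so the result can be used directly as a filter.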

Methods for detecting unit errors in reported VAT Turnover

If

A < (current VAT Turnover / previous VAT Turnover) < B

then assume the current VAT Turnover has been reported in thousands of pounds and multiply by 1000 to get a figure in pounds.
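
As an illustration, the unit-error rule can be written as a single ratio test. This is a sketch only: the slides do not give values for A and B, so the defaults below (a band around 1/1000) are assumptions chosen for the example, as is the decision to skip businesses with a non-positive current or previous value.

```python
def correct_unit_error(current, previous, lower=0.0005, upper=0.002):
    """Return (turnover_in_pounds, was_corrected).

    `lower` and `upper` play the roles of A and B; the default values
    are illustrative assumptions, not figures from the talk.
    """
    if current <= 0 or previous <= 0:
        # The ratio is undefined or meaningless, so leave the value alone.
        return current, False
    ratio = current / previous
    if lower < ratio < upper:
        # Current value looks like it was reported in thousands of pounds.
        return current * 1000, True
    return current, False


print(correct_unit_error(250, 240_000))      # (250000, True)
print(correct_unit_error(250_000, 240_000))  # (250000, False)
```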

1 – Quartile distances in industry Turnover

• Based on a method described in Hoogland and Van Haren (2007) to identify unusually large or small Turnover by locating extreme values in the distribution of VAT Turnover within a particular industry and size class.

• Suspicious Turnover is identified as follows.

If

Turnover > Q3 + [C × (Q3 – Median)]

or

Turnover < Q1 – [C × (Median – Q1)]

• C may be given different values for different industry and size classes.
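
A minimal sketch of the quartile distance rule for a single industry and size class follows. The function names are assumptions, and the quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method; other quartile definitions would give slightly different bounds.

```python
import statistics


def quartile_bounds(turnovers, c):
    """Return the (lower, upper) acceptance bounds for one class."""
    q1, median, q3 = statistics.quantiles(turnovers, n=4, method="inclusive")
    lower = q1 - c * (median - q1)
    upper = q3 + c * (q3 - median)
    return lower, upper


def flag_quartile_distance(turnovers, c):
    """Return one boolean per business: True if its Turnover is suspicious."""
    lower, upper = quartile_bounds(turnovers, c)
    return [t < lower or t > upper for t in turnovers]


class_turnover = [10, 12, 14, 16, 18, 20, 22, 24, 5000]
print(quartile_bounds(class_turnover, c=10))         # (-26.0, 62.0)
print(flag_quartile_distance(class_turnover, c=10))  # only the last value is flagged
```

Because the bounds are built from distances to the median, a skewed class gets a wider upper bound than lower bound, which is the point of using quartile distances rather than a symmetric rule.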

2 – Period on period ratios

• Method 2 comes from De Jong (2003) and involves calculating period on period ratios for each business based on the contribution that business’s Turnover makes to its class.

• For each business calculate:

Score = VAT Turnover / Median VAT Turnover in class

• Then calculate

TestRatio = Score_t / Score_{t-1}   if Score_t > Score_{t-1}
TestRatio = Score_{t-1} / Score_t   otherwise

where Score_t is the value of Score in period t.
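
The two steps of method 2 can be sketched as below. The function names are assumptions, businesses are assumed to appear in the same order in both periods, and returning infinity when a score is zero or negative is a choice made for this example rather than part of De Jong's method.

```python
import math
import statistics


def scores(turnovers):
    """Score = VAT Turnover / median VAT Turnover in the class."""
    median = statistics.median(turnovers)
    return [t / median for t in turnovers]


def period_ratio(score_t, score_prev):
    """TestRatio: the larger score divided by the smaller, so it is always >= 1."""
    high, low = max(score_t, score_prev), min(score_t, score_prev)
    if low <= 0:
        # Assumption for this sketch: treat an undefined ratio as maximally suspicious.
        return math.inf
    return high / low


previous = [100, 200, 300]
current = [100, 200, 30000]
ratios = [period_ratio(s, p) for s, p in zip(scores(current), scores(previous))]
threshold = 25  # the value reported under "Extra information"
print([r > threshold for r in ratios])  # [False, False, True]
```

Taking the larger score over the smaller makes the test symmetric, so a sharp fall is treated in the same way as a sharp rise.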

3 – Comparison with reporting history for the business

• The method is described in slightly different forms in Hoogland and Van Haren (2007), Lorenz (2010) and Röstel (2010).

• Note that this method only identifies suspiciously large Turnover.

• If Turnover > £100 million

and Turnover > 10 × mean Turnover for the business in the past 24 months,

then treat as suspicious.
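
The rule can be sketched as a two-part test. The function and parameter names are assumptions; the defaults are the £100 million floor and the multiple of 10 stated above, and `history` is assumed to hold the business's Turnover values from the past 24 months.

```python
def flag_against_history(current, history, floor=100_000_000, multiple=10):
    """Return True if current Turnover is suspiciously large for this business."""
    if not history:
        # No reporting history to compare with, so the rule cannot be applied.
        return False
    mean_history = sum(history) / len(history)
    return current > floor and current > multiple * mean_history


print(flag_against_history(150_000_000, [10_000_000] * 24))  # True
print(flag_against_history(150_000_000, [20_000_000] * 24))  # False
```

Both conditions must hold, so a small business that grows tenfold is not flagged unless it also crosses the absolute floor.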

4 – Quartile differences combined with measure of influence

• Refinement to method 1, inspired by Hoogland et al. (2009).

• Calculate the influence as the proportion of VAT Turnover the business contributes to the total VAT Turnover in the industry and size class.

• Combine detection of suspicious values using quartile differences with the influence.

• Identify unusual Turnover values using the quartile distances measure described in method 1.

• Reminder: method 1 treats Turnover as suspicious if

Turnover > Q3 + [C × (Q3 – Median)]

or

Turnover < Q1 – [C × (Median – Q1)]

• Then for each business calculate

Influence = VAT Turnover / Total VAT Turnover in class

• This method effectively subsets businesses failing the quartile distance method, so that only the most influential are viewed as being suspicious.
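
Method 4 can be sketched by combining the quartile distance test with the influence measure. The cut-off `min_influence` is a hypothetical parameter introduced for this example; the slides do not give a cut-off and describe prioritising the failing businesses by VAT Turnover instead.

```python
import statistics


def flag_quartile_with_influence(turnovers, c, min_influence):
    """Flag businesses that fail the quartile test AND are influential enough."""
    q1, median, q3 = statistics.quantiles(turnovers, n=4, method="inclusive")
    lower = q1 - c * (median - q1)
    upper = q3 + c * (q3 - median)
    total = sum(turnovers)

    flags = []
    for t in turnovers:
        fails_quartile = t < lower or t > upper
        # Influence = share of the class total contributed by this business.
        influence = t / total if total else 0.0
        flags.append(fails_quartile and influence >= min_influence)
    return flags


class_turnover = [1, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 50000]
# Both 1 and 50000 fail the quartile test; only 50000 is influential.
print(flag_quartile_with_influence(class_turnover, c=8, min_influence=0.05))
```

Setting `min_influence` to 0 reduces the function to method 1, which makes it easy to see how many businesses the influence filter removes.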

5 – Hidiroglou-Berthelot method

• Compare to the previous period's value:

Form the ratio

r = current VAT Turnover / previous VAT Turnover

Transform the ratio, where m is the median of the ratios r:

• if r < m then t = (r - m) / r
• otherwise t = (r - m) / m

Define

E = t × [max(current VAT Turnover, previous VAT Turnover)]^V

Then calculate

d_Q1 = max(Q2 - Q1, A × Q2)

d_Q3 = max(Q3 - Q2, A × Q2)

where Q1, Q2 and Q3 are the quartiles of E.

Suspicious businesses are then identified as follows:

If

E < Q2 - C × d_Q1

or

E > Q2 + C × d_Q3
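
The Hidiroglou-Berthelot steps can be sketched as two functions, one for the effects E and one for the flags. Several details are assumptions made for this example: the function names, the value of A (not stated in the slides), the use of the absolute value of A × Q2 as in Hidiroglou and Berthelot (1986), the "inclusive" quartile definition, and the requirement that current and previous Turnover are both positive.

```python
import statistics


def hb_effects(current, previous, v=1.0):
    """Return the effect E for each business (all values must be positive)."""
    ratios = [c / p for c, p in zip(current, previous)]
    m = statistics.median(ratios)
    effects = []
    for r, c, p in zip(ratios, current, previous):
        # Symmetric transformation of the ratio around the median ratio m.
        t = (r - m) / r if r < m else (r - m) / m
        # Weight by size: larger businesses get larger effects when v > 0.
        effects.append(t * max(c, p) ** v)
    return effects


def hb_flags(effects, a, c):
    """Flag effects lying outside [Q2 - C*d_Q1, Q2 + C*d_Q3]."""
    q1, q2, q3 = statistics.quantiles(effects, n=4, method="inclusive")
    d_q1 = max(q2 - q1, abs(a * q2))
    d_q3 = max(q3 - q2, abs(a * q2))
    return [e < q2 - c * d_q1 or e > q2 + c * d_q3 for e in effects]


previous = [100, 200, 300, 400, 500, 600, 700]
current = [105, 210, 290, 410, 5000, 610, 690]
effects = hb_effects(current, previous, v=1.0)
# a=0.05 is an illustrative assumption; c=250 is the value quoted for method 5.
print(hb_flags(effects, a=0.05, c=250))
```

The A × Q2 term keeps the acceptance interval from collapsing when the effects are tightly clustered around the median.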

• A key difference between survey and administrative data is that with administrative data it is often not possible to re-contact the business and ask them to confirm any suspicious values.

• Evaluation of detection methods is not straightforward and cannot usually be definitive.

Comparing methods

• Diagnostics include the proportion of businesses identified as suspicious within each industry and size class and the average size (employment) and VAT Turnover of suspicious businesses compared with the rest of the class.
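
These diagnostics could be computed per class along the following lines. The record layout (dictionaries with `cls`, `employment`, `turnover` and `suspicious` keys) and the function names are assumptions made for this sketch.

```python
from collections import defaultdict


def _mean(values):
    """Mean of a list, or None when the list is empty."""
    return sum(values) / len(values) if values else None


def class_diagnostics(records):
    """Summarise flagged versus unflagged businesses for each class."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["cls"]].append(rec)

    report = {}
    for cls, recs in groups.items():
        flagged = [r for r in recs if r["suspicious"]]
        rest = [r for r in recs if not r["suspicious"]]
        report[cls] = {
            "proportion_suspicious": len(flagged) / len(recs),
            "mean_employment_suspicious": _mean([r["employment"] for r in flagged]),
            "mean_employment_rest": _mean([r["employment"] for r in rest]),
            "mean_turnover_suspicious": _mean([r["turnover"] for r in flagged]),
            "mean_turnover_rest": _mean([r["turnover"] for r in rest]),
        }
    return report
```

Running the same report for each detection method gives a like-for-like comparison of how many businesses each method flags and how large those businesses are.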

Results of testing detection methods with VAT data

• If businesses with larger Turnover values are of more importance:

method 4 (Quartile differences & influence) and

method 5 (Hidiroglou-Berthelot)

offer the flexibility to give higher weight to those businesses.

• If good quality historic data are available, then

method 2 (Period on period ratios)

and

method 3 (Comparison with history)

are likely to give good results.

• Method 1 (Quartile differences)

and the related

method 4 (Quartile differences & influence)

should be effective in identifying extreme values when only the current period data are available.

Results of testing detection methods with VAT data

Estimated false hits

Conclusion and recommendations

• Each of these methods uses parameters which can be fine-tuned to identify an appropriate number of suspicious businesses.

• The effective values of these parameters are likely to differ between data sources. Therefore, rather than prescribe specific values, it is recommended that the parameters are set through analysis of the effect of the method on the VAT data under consideration.
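
One simple way to carry out this kind of analysis is to sweep a parameter over candidate values and look at the proportion of businesses flagged. The sketch below does this for C in the quartile distance method; the function names, the candidate values and the "inclusive" quartile definition are assumptions, and the same idea would apply to the parameters of the other methods.

```python
import statistics


def proportion_flagged(turnovers, c):
    """Proportion of businesses failing the quartile distance rule for a given C."""
    q1, median, q3 = statistics.quantiles(turnovers, n=4, method="inclusive")
    lower = q1 - c * (median - q1)
    upper = q3 + c * (q3 - median)
    flagged = sum(1 for t in turnovers if t < lower or t > upper)
    return flagged / len(turnovers)


def sweep_c(turnovers, candidates):
    """Map each candidate C to the proportion of businesses it would flag."""
    return {c: proportion_flagged(turnovers, c) for c in candidates}


class_turnover = [10, 12, 14, 16, 18, 20, 22, 24, 60, 5000]
print(sweep_c(class_turnover, [2, 10, 2000]))  # {2: 0.2, 10: 0.1, 2000: 0.0}
```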

Before applying any detection methods

• Suspicious patterns. It is recommended that VAT data are checked for these patterns before implementing any other error detection method.

• Unit errors: relatively easy to identify and correct. It is recommended that an automatic method is developed to detect and correct any unit errors in VAT Turnover data, before applying any other rules.

• The final recommendation is that in developing methods for detecting errors in VAT Turnover data, it is always useful to understand the data source and the possible errors that may be found in it.

• In many cases, it will be necessary to liaise with the data providers to get this information.

References:

• De Jong, A. "Impect: Recent developments in harmonized processing and selective editing", Proceedings of UNECE Work Session on Statistical Data Editing, Madrid, October 2003: Web.

• Hidiroglou, M. A. and Berthelot, J.-M. "Statistical Editing and Imputation for Periodic Business Surveys", Survey Methodology, June 1986, Vol. 12, No. 1, pp 73-83: Journal.

• Hoogland, J. "Editing strategies for VAT data", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.

• Hoogland, J. and Van Haren, G. "Editing and integrating VAT and SBS data", Proceedings of the third International Conference on Establishment Surveys (ICES-III), Montreal, June 2007: CD ROM.

• Hoogland, J., Van Bemmel, K. and De Wolf, P-P. "Detection of potential influential errors in VAT turnover data used for short term statistics", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.

• Lorenz, R. "The integrated system of editing administrative data for STS in Germany", Seminar on 'Using administrative data in the Production of Business Statistics - Member States experiences', Rome, March 2010: Web.

• Seyb, A., Stewart, J., Chiang, G., Tinkler, I., Kupferman, L., Cox, V. and Allan, D. "Automated editing and imputation system for administrative financial data in New Zealand", Proceedings of UNECE Work Session on Statistical Data Editing, Neuchatel, October 2009: Web.

Extra information

• For method 2, we used a threshold of 25 as a compromise between the monthly and quarterly data.

• For method 3, we used the thresholds described in Hoogland and Van Haren (2007).

• For method 5, the Hidiroglou-Berthelot rule, we used a value of V = 1 to give extra weight to businesses with larger Turnover, as this has been shown to work well with business data in the past. The value of C for this method was 250.

• Method 1 used a value of C = 10 in the quartile method to give the same proportion of failures.

• For method 4 we chose a value of C = 8 in the quartile method and then prioritised the businesses failing that method by VAT Turnover to give a similar proportion of failures as methods 1 and 5.
