Methodology of Allocating Generic Field to its Details Jessica Andrews Nathalie Hamel François Brisebois ICESIII - June 19, 2007


Page 1: Methodology of Allocating Generic Field to its Details

Methodology of Allocating Generic Field to its Details

Jessica Andrews

Nathalie Hamel

François Brisebois

ICESIII - June 19, 2007

Page 2: Methodology of Allocating Generic Field to its Details

Outline

Background Information on Tax Data

Objective

Current Methodology

Other Methodologies Considered

Comparison of the Methodologies

Future Work and Conclusions

Page 3: Methodology of Allocating Generic Field to its Details

Tax Data

Statistics Canada receives annual data from Canada Revenue Agency (CRA) on incorporated (T2) businesses

Tax data:

Balance Sheet

Income Statement

88 different Schedules

Page 4: Methodology of Allocating Generic Field to its Details

Tax Data

About 700 different fields to report

Most companies provide only 30-40 fields

Only 8 fields are actually required by CRA (section totals)

Non-farm revenue

Non-farm expenses

Farm revenue

Farm expenses

Assets

Liabilities

Shareholder Equity

Net Income/Loss

Page 5: Methodology of Allocating Generic Field to its Details

Objective

To impute the missing detail variables

Why?

Tax data users need detailed data (tax replacement project (TRP))

Different concepts and definitions between tax and survey data

A subset of details linked to the same generic can be mapped to different survey variables (Chart of Account)

Page 6: Methodology of Allocating Generic Field to its Details

Challenges to meet

Methodology must

Work well for a large number of details

Be capable of dealing with details which are rarely reported and those which are frequently reported

Give good micro results for tax replacement, but also give good macro results when examined at the NAICS or full database level

Page 7: Methodology of Allocating Generic Field to its Details

First attempt to complete Tax Data

Edit rules

Outlier detection within a record

Deterministic edits (to ensure the record balances within section)

Review and manual corrections

Overlap between fiscal periods

Negative values

Consistency edits between tax variables

Outlier detection between records (Hidiroglou-Berthelot)

CORTAX balancing edits

Deterministic imputation of key variables

Inventories

Depreciation

Salaries and wages

Page 8: Methodology of Allocating Generic Field to its Details

GDA Concepts

Corporations can use either generic or detail fields to report their results

Type     Field  Description                                  Case 1  Case 2  Case 3
Generic  8810   Office expenses amount                          100              30
Detail   8811   Office stationery and supply expense amount              20
Detail   8812   Office utilities expense amount                          30      10
Detail   8813   Data processing expense amount                           50      60
Total                                                           100     100     100

Page 9: Methodology of Allocating Generic Field to its Details

GDA Concepts

Block is defined by a generic and its details

Generic field is not a total

Goal is to impute the most significant detail variables when a generic amount has been reported

GDA: Generic to detail allocation

Page 10: Methodology of Allocating Generic Field to its Details

Current method

Uses imputation classes based on industry codes and size of company

First 2 digits of NAICS (about 25 industries)

Three sizes of revenue (boundaries of 5 and 25 million)

Calculates ratios within imputation classes for each block

Uses all non-zero and non-missing details

Uses only details reported at least 10% of the time (5% for block General Farm Expense)

Assigns ratios to businesses with a generic
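The steps of the current method can be sketched as follows. This is a minimal illustration, not the production implementation: the record layout (`naics`, `revenue` in millions, `details`), the function names, the handling of revenue exactly at a boundary, pooling by summed amounts, and measuring the 10% reporting rate against all donors in the class are all assumptions that the slides do not specify.

```python
from collections import defaultdict


def size_class(revenue_millions):
    """Three revenue classes with boundaries at 5 and 25 million (assumed half-open)."""
    if revenue_millions < 5:
        return "small"
    if revenue_millions < 25:
        return "medium"
    return "large"


def class_ratios(donors, min_report_rate=0.10):
    """Detail ratios per imputation class (first 2 NAICS digits, revenue size).

    donors: list of dicts with hypothetical keys 'naics' (str),
    'revenue' (millions) and 'details' (detail code -> reported amount).
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    n_donors = defaultdict(int)
    for rec in donors:
        key = (rec["naics"][:2], size_class(rec["revenue"]))
        n_donors[key] += 1
        for code, amount in rec["details"].items():
            if amount:  # use only non-zero, non-missing details
                totals[key][code] += amount
                counts[key][code] += 1
    ratios = {}
    for key, sums in totals.items():
        # keep details reported by at least min_report_rate of the class
        kept = {c: s for c, s in sums.items()
                if counts[key][c] / n_donors[key] >= min_report_rate}
        denom = sum(kept.values())
        ratios[key] = {c: s / denom for c, s in kept.items()} if denom else {}
    return ratios


def allocate(generic_amount, ratios):
    """Spread a reported generic amount over the details using class ratios."""
    return {code: generic_amount * r for code, r in ratios.items()}
```

With two donors in the same class that each report 20, 30 and 50 in details 8811 to 8813, the class ratios come out as 0.2, 0.3 and 0.5, so a generic amount of 100 would be split 20/30/50.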

Page 11: Methodology of Allocating Generic Field to its Details

Current method

Originally proposed as a solution with good macro (aggregate) results

Now need good micro (business) level results for TRP

Problems

Imputation classes are frequently not homogeneous in terms of distribution

A large number of small imputation classes

Page 12: Methodology of Allocating Generic Field to its Details

Other methods considered

Historic imputation method

Scores method

Cluster method

Page 13: Methodology of Allocating Generic Field to its Details

Historic imputation method

Assumes distributions of details are the same from one year to the next

Problems

A change in business strategies/properties will not be considered this way

Most businesses which report details in the previous year will report them also in the current year, leaving few businesses which could be imputed with this method (~5% on all blocks tested)

Requires use of another method for remaining businesses

Page 14: Methodology of Allocating Generic Field to its Details

Scores method

Uses response/non response models for each detail

Groups businesses into imputation classes on the basis of percentiles of response probability

Calculates ratios within imputation classes

Assigns ratios to businesses with a generic

Page 15: Methodology of Allocating Generic Field to its Details

Scores method

Problems

Need to create a model for each detail

Difficult to resolve what to do in the case of blocks with many details (5 or more) which are frequently reported

This method was excluded due to its difficulty in coping with blocks with a moderate to large number of details

Page 16: Methodology of Allocating Generic Field to its Details

Cluster method

Divides businesses into imputation classes on the basis of response patterns to details

Uses clustering or dominant detail method

Uses discriminatory models (parametric or not) to assign businesses with generic to imputation classes

Calculates ratios within imputation classes

Assigns ratios to businesses with a generic

Page 17: Methodology of Allocating Generic Field to its Details

Cluster method

Problems

For certain blocks it can be difficult to find good variables on which to discriminate

Issue of how often clustering method and models should be reviewed

Page 18: Methodology of Allocating Generic Field to its Details

Comparing the methods

Estimate distributions of known data for year n from ratios calculated for year n-1

Create a benchmark file

Reported details in years n-1 and n

Put all details into generic fields in year n

Calculate ratios from businesses in year n-1 for all methods

Assign ratios to businesses in year n

Compare the results to the reported fields

Page 19: Methodology of Allocating Generic Field to its Details

Comparing the methods

Compare the results at the micro (businesses) and the macro (aggregate) levels

Compare true and estimated distributions

Page 20: Methodology of Allocating Generic Field to its Details

Comparing the methods

Macro statistics

for the jth detail in the block

$$SSE = \sum_j (\hat{t}_j - t_j)^2$$

$$SSEP = \sum_j \left(\frac{\hat{t}_j}{t_j} - 1\right)^2$$
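The two macro statistics can be computed as in the sketch below. It reads SSE as the sum over details of the squared difference between estimated and true values, and SSEP as the sum of squared relative differences. Treating t_j as the reported block-level total for detail j and the hatted value as its estimate is an assumption, as are the function names and the choice to skip details whose true total is zero in SSEP.

```python
def sse(true_totals, est_totals):
    """Sum of squared errors over the details of a block."""
    return sum((est_totals[j] - true_totals[j]) ** 2 for j in true_totals)


def ssep(true_totals, est_totals):
    """Sum of squared proportional errors over the details of a block.

    Details with a true total of zero are skipped (an assumption made
    here to avoid division by zero).
    """
    return sum((est_totals[j] / true_totals[j] - 1) ** 2
               for j in true_totals if true_totals[j] != 0)


true_totals = {"8231": 100.0, "8232": 10.0}
est_totals = {"8231": 110.0, "8232": 5.0}
print(sse(true_totals, est_totals))   # dominated by the large detail
print(ssep(true_totals, est_totals))  # dominated by the small detail
```

In this example the larger detail contributes most of the SSE while the smaller one contributes most of the SSEP, which is one reason to look at both.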

Page 21: Methodology of Allocating Generic Field to its Details

Comparing the methods

Micro Statistics

Median Pseudo CV

for the jth detail and ith business in the block

$$\text{Pseudo CV}_i = \frac{\sum_j (\hat{x}_{ij} - x_{ij})^2}{\sum_j x_{ij}}$$

Page 22: Methodology of Allocating Generic Field to its Details

Comparing the methods

Micro Statistics

Median Pearson Contingency Coefficient

for the jth detail and ith business in the block

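$$P = \left(\frac{d^2}{n + d^2}\right)^{1/2}$$

$$d^2 = \sum_i \sum_j \frac{\left(n_{ij} - \frac{n_{i.}\, n_{.j}}{n}\right)^2}{\frac{n_{i.}\, n_{.j}}{n}} = n \sum_i \sum_j \frac{(f_{ij} - f_{i.} f_{.j})^2}{f_{i.} f_{.j}}$$

f values represent the marginal distributions

$d^2$ represents the degree of dependency (depends on n, r and c)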
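As an illustration, the Pearson contingency coefficient of an r x c table of counts can be computed as below, with d² taken to be the usual chi-square statistic and P the square root of d² / (n + d²). The slides do not say how the table is built for each business, so the function only takes a generic table of counts; its name and input layout are assumptions.

```python
import math


def pearson_contingency(table):
    """Pearson contingency coefficient for a list of rows of counts n_ij."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table has no observations")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    d2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # n_i. * n_.j / n
            if expected > 0:
                d2 += (n_ij - expected) ** 2 / expected
    return math.sqrt(d2 / (n + d2))


print(pearson_contingency([[10, 20], [20, 40]]))  # independent rows and columns
print(pearson_contingency([[10, 0], [0, 10]]))    # fully dependent 2 x 2 table
```

The coefficient is 0 when rows and columns are independent and rises toward an upper bound below 1 that depends on the table dimensions, which matches the remark that d² depends on n, r and c.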

Page 23: Methodology of Allocating Generic Field to its Details

Comparing the methods

We show results for Block 8230: Other Revenue

This block has 20 details covering revenue distribution

Important for clients as used in many surveys

The scores method is not shown as it is difficult to implement with this many details

Page 24: Methodology of Allocating Generic Field to its Details

Comparing the methods

OTHER REVENUE FLDS 8230 TO 8250

8230 Other revenue

8231 Foreign exchange gains/losses

8232 Income/loss of subsidiaries/affiliates

8233 Income/loss of other divisions

8234 Income/loss of joint ventures

8248 Insurance recoveries

8249 Expense recoveries

8250 Bad debt recoveries

Page 25: Methodology of Allocating Generic Field to its Details

Results

Block 8230           Micro statistics                                           Macro statistics
Method               Median Pseudo CV  IQR   Median Pearson Cont. Coeff.  IQR   SSE     SSEP
Current Method       1.08              0.43  0.66                         0.14  2.2e20  120
Cluster Method       0.34              1.39  0.36                         0.63  2.8e20  12
Historic + Cluster   0.51              0.99  0.10                         0.7   9.9e19  4.5

Page 26: Methodology of Allocating Generic Field to its Details

Cluster methodology

Most blocks use dominant detail (attractor) x clusters to define the imputation classes

A business i belongs to cluster j of attractor x, where x > 50, if

$$\frac{Y_{ij}}{\sum_j Y_{ij}} \geq \frac{x}{100}$$

where $Y_{ij}$ is the total value reported by business i in detail j. If this statement is not true for any detail then the business is assigned to cluster j+1.
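The dominant-detail rule can be sketched as follows. The comparison operator (greater than or equal), the default threshold of 60, the zero-based cluster numbering, and the restriction to non-negative amounts are assumptions made for illustration; the slides only state that x must exceed 50 and that businesses with no dominant detail go to one extra cluster.

```python
def assign_cluster(details, detail_codes, x=60.0):
    """Return the index of the dominant detail, or len(detail_codes) if none.

    details: detail code -> amount reported by one business (assumed >= 0).
    detail_codes: ordered list of the detail codes in the block.
    x: attractor threshold in percent; must exceed 50 so that at most
       one detail can qualify.
    """
    if x <= 50:
        raise ValueError("x must be greater than 50")
    total = sum(details.get(code, 0.0) for code in detail_codes)
    if total > 0:
        for position, code in enumerate(detail_codes):
            if details.get(code, 0.0) / total >= x / 100.0:
                return position
    return len(detail_codes)  # residual cluster: no dominant detail


codes = ["8811", "8812", "8813"]
print(assign_cluster({"8811": 80.0, "8812": 20.0}, codes))                # dominant 8811
print(assign_cluster({"8811": 40.0, "8812": 30.0, "8813": 30.0}, codes))  # residual cluster
```

Because x is above 50, two details can never both pass the threshold, so the order in which details are checked does not affect the result.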

Page 27: Methodology of Allocating Generic Field to its Details

Cluster methodology

Distribution ratios to details are calculated for each cluster

Discriminatory models are then created

(nonparametric for most blocks) to assign businesses with a generic

Use variables on industry (NAICS), location (province), size (revenue, log revenue), details and totals of details in other blocks

Page 28: Methodology of Allocating Generic Field to its Details

Cluster methodology

Generic amounts are assigned to details in the following 3 ways

If generic amount and no details reported then ratios are assigned as calculated

If generic amount and all details with ratio greater than 0% are reported then ratios are assigned as calculated

If generic amount and some details but not all are reported, then ratios are pro-rated and generic is assigned only to details which were not reported
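A minimal sketch of the three assignment cases is given below. The function name and inputs are hypothetical, the cluster ratios are assumed to sum to 1, and pro-rating is read as rescaling the ratios of the unreported details so that they sum to 1. The function returns only the amounts allocated from the generic field; whether they are then added to already reported details is not stated in the slides.

```python
def allocate_generic(generic, ratios, reported):
    """Allocate a generic amount to details under the three cases.

    generic:  amount reported in the generic field.
    ratios:   detail code -> ratio calculated for the business's cluster.
    reported: detail code -> amount the business reported (may be empty).
    """
    positive = {c: r for c, r in ratios.items() if r > 0}
    reported_codes = {c for c, v in reported.items() if v}
    missing = [c for c in positive if c not in reported_codes]
    if not missing or len(missing) == len(positive):
        # Cases 1 and 2: no details reported, or every detail with a
        # positive ratio reported; ratios are used as calculated.
        return {c: generic * r for c, r in positive.items()}
    # Case 3: some details reported; pro-rate over the unreported ones.
    scale = sum(positive[c] for c in missing)
    return {c: generic * positive[c] / scale for c in missing}


ratios = {"8811": 0.2, "8812": 0.3, "8813": 0.5}
print(allocate_generic(30.0, ratios, {}))                            # case 1
print(allocate_generic(30.0, ratios, {"8812": 10.0, "8813": 60.0}))  # case 3
```

In the second call only detail 8811 is unreported, so the whole generic amount of 30 goes to it.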

Page 29: Methodology of Allocating Generic Field to its Details

Cluster methodology

Gives better micro results

Improved data for tax replacement

Macro results remain similar to current methodology

Micro results are consistent year to year

Page 30: Methodology of Allocating Generic Field to its Details

Future work and conclusions

The cluster methodology will be implemented for reference year 2006 for the Income Statement

Model fitting and implementation for Balance Sheet will follow

Review of models and clustering methods as deemed appropriate

Page 31: Methodology of Allocating Generic Field to its Details

For more information please contact

Pour plus d’information, veuillez contacter

Visit our web site at www.statcan.ca

Contact Information / Coordonnées

[email protected]

[email protected]

[email protected]