using business taxation data as auxiliary variables and as substitution variables in the australian...
Post on 01-Apr-2015
Using Business Taxation Data as Auxiliary Variables and as Substitution Variables in the Australian Bureau of Statistics
Frank Yu, Robert Clark and Gabriele B. Durant
Outline of talk
Use of tax data in ABS
Using tax data as auxiliary variables
  example: subannual surveys
Using tax data as variables of interest
  missing taxation data
  example: annual surveys
Dealing with missing tax data:
  Missing at Random
  Common Measurement Error model
Conclusion
Use of tax data
construct and maintain population frame
as auxiliary variables for estimation
substitute survey data to reduce provider burden
as source for imputing missing/invalid survey data
provide independent estimates for validation of outputs
Data supplied by Australian Taxation Office
Australian Business Register information
  businesses identified by name, address
  industry, payees
Business Activity Statement data - GST and PAYG data
  available (90%) 6 months after reference quarter
  turnover, wage and salaries, capital and non-capital expenses
Income Tax data
  available (70 to 80%) 18 months after reference quarter
  detailed expenses and revenue and balance sheet
Use of tax data for frame creation
[Diagram: units from the Australian Business Register are split into the ABS Maintained Population (ABS MP: complex units) and the ATO Maintained Population (ATO MP: simple units, where ABN = statistical unit).]
Use of tax data for frame construction
construction: units from ABR
  industry, sector
  number of payees
  multistate indicators
maintenance:
  births and cancellation
  tax roles: e.g. employing vs non-employing units
  long term non-remitters excluded
stratification: single/multiple states, industry
Frame auxiliary variables (x_i's)
derived size benchmarks:
  from BAS, based on wage and salaries data
  used as stratification variables
BAS turnover
BAS wages
  need imputation (derived from average of quarterly data)
  lag reference quarter by 2 quarters
Survey data vs tax data
                          Sample Survey   BAS data   BIT data
concept                   **              *          *
accuracy                  *               **         ***
timeliness                ***             **         *
detailed domain           *               **         ***
richness of data items    ***             *          **
Use of tax data as auxiliary variables
Survey                         Variables of interest   Auxiliary variables for estimation
Retail Trade                   Sales                   BAS turnover
Economic Activity Survey       financial variables     BIT variables
Annual Integrated Collection   same as EAS             BAS variables
tax data as auxiliary variables
[Diagram: for units in the sample s both y_i and x_i are observed; for units in U\s only x_i is available.]
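Generalised Regression Estimation
$$\hat{Y}_{GREG} = \hat{Y}_{HT} + (X - \hat{X}_{HT})'\,\hat{B}$$
where
$$\hat{Y}_{HT} = \sum_{i \in s} Y_i / \pi_i$$
$$\hat{X}_{HT} = \sum_{i \in s} X_i / \pi_i$$
$$\hat{B} = \Big( \sum_{i \in s} X_i X_i' / \pi_i \Big)^{-1} \sum_{i \in s} X_i Y_i / \pi_i$$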
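The GREG formula can be illustrated with a minimal sketch. It assumes the usual reading of the notation (pi_i is the inclusion probability of unit i, X is the known population total of the auxiliary variables); the function name `greg_total` and its arguments are hypothetical and are not taken from any ABS system.

```python
import numpy as np

def greg_total(y_s, x_s, pi_s, x_pop_total):
    """GREG estimate of the population total of y (illustrative sketch).

    y_s: survey values for sampled units
    x_s: auxiliary (e.g. tax) values for sampled units, shape (n,) or (n, p)
    pi_s: inclusion probabilities of sampled units
    x_pop_total: known population total(s) of the auxiliary variable(s)
    """
    y_s = np.asarray(y_s, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    if x_s.ndim == 1:
        x_s = x_s[:, None]                      # treat a single auxiliary as one column
    w = 1.0 / np.asarray(pi_s, dtype=float)     # Horvitz-Thompson weights
    x_total = np.atleast_1d(np.asarray(x_pop_total, dtype=float))

    y_ht = np.sum(w * y_s)                      # HT estimate of the y total
    x_ht = x_s.T @ w                            # HT estimate of the x total(s)

    # Weighted least squares coefficient B = (sum x x'/pi)^-1 (sum x y/pi)
    xtwx = (x_s * w[:, None]).T @ x_s
    xtwy = (x_s * w[:, None]).T @ y_s
    b_hat = np.linalg.solve(xtwx, xtwy)

    # HT estimate plus regression adjustment for the gap in x totals
    return float(y_ht + (x_total - x_ht) @ b_hat)
```

When y is exactly proportional to x in the sample, the estimate reproduces that proportion applied to the known x total, whatever sample was drawn; this is the sense in which a well-correlated tax variable provides efficiency.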
Advantages and disadvantages
Advantages
  provide efficiency
  approximately unbiased
  does not require X's to be measuring the right concepts
  does not require X's to be current
Disadvantages
  does not model Y directly e.g. zero units
  influential points
  efficiency in estimating levels not equal to efficiency for estimating change
Issue: inactive/out of scope units
Solution: apply GREG to positive units only
efficiency for estimating level does not necessarily translate to efficiency for estimating change
$$\operatorname{Var}(\hat{Y}_{GREG,2} - \hat{Y}_{GREG,1}) < \operatorname{Var}(\hat{Y}_{HT,2} - \hat{Y}_{HT,1}) \quad \text{iff} \quad (1 - r_{XY}^2)\,\frac{1 - \rho_{res}}{1 - \rho_{Y}} < 1$$
where $\rho_{res}$ is the lag 1 autocorrelation of residuals,
$\rho_{Y}$ is the lag 1 autocorrelation of Y's, and
$r_{XY}$ is the correlation between Y and X's
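A small numerical sketch may help show why a gain for levels need not carry over to change. It assumes, as simplifications not stated on the slide, that the variance of each estimator is the same in both periods, that the GREG level variance is about (1 - r_XY^2) times the HT level variance, and that the correlation between successive estimates equals the lag 1 autocorrelation. The function name is hypothetical.

```python
def change_variance_ratio(r_xy, rho_res, rho_y):
    """Approximate Var(GREG change) / Var(HT change) under the stated assumptions.

    Var(HT change)   is proportional to 2 * (1 - rho_y)
    Var(GREG change) is proportional to 2 * (1 - r_xy**2) * (1 - rho_res)
    A ratio below 1 means GREG is more efficient for estimating change.
    """
    if rho_y >= 1.0:
        raise ValueError("rho_y must be below 1")
    return (1.0 - r_xy ** 2) * (1.0 - rho_res) / (1.0 - rho_y)


# Strong X-Y correlation, but Y is far more persistent than the residuals.
ratio = change_variance_ratio(r_xy=0.8, rho_res=0.2, rho_y=0.9)
```

In this illustration the level variance under GREG is only 36% of the HT level variance, yet the ratio for change is about 2.88, so GREG would be less efficient than HT for the movement estimate.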
Data Substitution Approach: Use tax as the variable of interest
Assumes tax data are better
  respondents more serious about getting it right
  more time to provide information
  audited accounts (for BIT) for tax purposes
Detailed breakdown
Missing tax data
  require matching to frame
  missingness is non-ignorable
    inactive units
    late units have more expenses
Examples: Economic Activity Survey (annual) 1990s to 05/06
estimation of totals for broad items for microbusinesses
  tax data as substitution variables
augmenting sample for simple businesses
  tax data to replace broad level income and expenses items
estimation of detailed items
  detailed items imputed by pro-rating broad tax data based on splits observed in surveys
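The pro-rating of broad tax items into detailed items can be sketched as follows. This is only an illustration of the idea: the function name, the item names and the use of a single set of survey splits are assumptions, not the actual EAS procedure.

```python
def prorate(broad_total, survey_splits):
    """Split a broad tax item into detailed items using shares observed in a survey.

    broad_total: the broad-level value taken from tax data
    survey_splits: detailed item -> value observed in the survey (same broad item)
    """
    total = sum(survey_splits.values())
    if total <= 0:
        raise ValueError("survey splits must sum to a positive value")
    # Each detailed item receives its survey share of the broad tax total.
    return {item: broad_total * value / total for item, value in survey_splits.items()}


# Hypothetical example: a broad expense of 100 from tax data, split 3:1 in the survey.
detailed = prorate(100.0, {"wages": 30.0, "rent": 10.0})
```

The detailed items always add back to the broad tax total, which keeps the detailed and broad estimates consistent.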
Examples: Annual Integrated Collection (06/7 onwards)
AIC - core survey estimates
  estimation of totals for survey variables for small and large businesses
  tax data as auxiliary variables for generalised regression estimation
AIC - complementary estimates
  estimation of totals for broad items for microbusinesses
  tax data as substitution variables
AIC - complementary estimates
  estimation of detailed state/industry classes
  tax data as substitution variables
AIC - complementary estimates
  estimation of detailed economic variables
  tax data as substitution variables, disaggregated by model estimation of pro-rating factors
Notation
[Diagram: population U, divided into units with Y available (r_i = 1) and units with Y not available (r_i = 0).]
Use MAR model conditional on frame variables only
[Diagram: population U with frame variables X_i known for all units; the tax data of interest Y are available where r_i = 1 and not available where r_i = 0.]
model: Y = f(x) for r_i = 1
impute Ŷ = f(x) for r_i = 0 (MAR)
But for non-ignorable missingness
[Diagram: same population U, with Y available where r_i = 1, Y not available where r_i = 0, and X_i known for all units.]
model: Y = f(x) for r_i = 1
impute Ŷ = f(x) for r_i = 0
Use a sample to inform about the nonreporters based on their survey response.
Notation: Use Y to represent tax variables and Y* for survey variables (a surrogate of Y)
[Diagram: population U with X_i known for all units; Y available where r_i = 1 and not available where r_i = 0; a sample s, in which Y* is available, cuts across both groups.]
Imputing tax data from survey data
[Diagram: population U with X_i known for all units; Y available where r_i = 1 and not available where r_i = 0; a sample s, in which Y* is available, cuts across both groups.]
model: Y = f(Y*, x)
impute Ŷ = f(Y*, x)
Models for Y
Missing at Random: Y independent of r given x and Y*
$$r \perp Y \mid x, Y^*$$
Common measurement error: Given Y, distribution of Y* is independent of r
$$r \perp Y^* \mid x, Y$$
Use MAR model: missing at random given X and Y*
[Diagram: population U with X_i known for all units; Y available where r_i = 1 and not available where r_i = 0; a sample s, in which Y* is available, cuts across both groups.]
model: Y = f(Y*, x) for r_i = 1
impute Ŷ for r_i = 0 (MAR)
$$r \perp Y \mid x, Y^*$$
Imputation using MAR model
1. Using data on Y and Y* observed from the units in the sample where both survey and tax data are reported, model Y as a function of Y*.
2. Use this model to impute Y_i for tax non-reporters in the sample (assuming Y* is known for them).
3. For units not in the sample, if their tax data is missing, impute using the distribution
$$f(Y_i \mid r_i = 0, x_i) = \int f(Y_i \mid r_i = 0, x_i, Y_i^*)\, f(Y_i^* \mid r_i = 0, x_i)\, dY_i^*$$
$$\phantom{f(Y_i \mid r_i = 0, x_i)} = \int f(Y_i \mid r_i = 1, x_i, Y_i^*)\, f(Y_i^* \mid r_i = 0, x_i)\, dY_i^*$$
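The three MAR steps can be sketched in code under strong simplifying assumptions: a linear working model for Y given Y* and a single frame variable x, a linear model for Y* given x among sampled non-reporters, and imputation of the conditional mean rather than a draw from the distribution. All names are hypothetical and the sketch ignores survey weights.

```python
import numpy as np

def _ols(design, target):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def mar_impute(y, y_star, x, r, in_sample):
    """Impute missing tax values Y under MAR given (x, Y*), assuming linear models.

    y: tax variable (NaN allowed where r is False)
    y_star: survey variable (NaN allowed outside the sample)
    x: frame variable, known for every unit
    r: True where tax data are reported
    in_sample: True where the unit is in the survey sample
    """
    y, y_star, x = (np.asarray(a, dtype=float) for a in (y, y_star, x))
    r = np.asarray(r, dtype=bool)
    in_sample = np.asarray(in_sample, dtype=bool)
    ones = np.ones_like(x)

    # Step 1: model Y on (Y*, x) where both tax and survey data are reported.
    fit = in_sample & r
    beta = _ols(np.column_stack([ones, y_star, x])[fit], y[fit])

    # Model Y* on x among sampled tax non-reporters (needed for step 3).
    nonrep = in_sample & ~r
    gamma = _ols(np.column_stack([ones, x])[nonrep], y_star[nonrep])

    # Step 2 uses the observed Y*; step 3 replaces Y* by its expected value
    # given x for non-reporters outside the sample.
    y_star_used = np.where(in_sample, y_star, gamma[0] + gamma[1] * x)
    predicted = beta[0] + beta[1] * y_star_used + beta[2] * x
    return np.where(r, y, predicted)
```

With linear models, plugging the expected Y* into the fitted model gives the mean of the integral shown above; a fuller treatment would draw from the distribution or carry its variance through to the estimates.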
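Use CME model
[Diagram: population U with X_i known for all units; Y available where r_i = 1 and not available where r_i = 0; a sample s, in which Y* is available, cuts across both groups.]
model: Y* = f(Y, x) for r_i = 1 (CME)
invert to get Ŷ = g(Y*)
impute Ŷ = h(X) for r_i = 0, for i in U\s
$$r \perp Y^* \mid x, Y$$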
Imputation using CME model
$$r \perp Y^* \mid x, Y$$
$$f(Y_i^* \mid Y_i, x_i, r_i = 0) = f(Y_i^* \mid Y_i, x_i, r_i = 1).$$
A typical model can be:
$$Y_i^* = \beta Y_i + \varepsilon_i \quad \text{where } E(\varepsilon_i \mid Y_i, r_i) = 0,$$
This model motivates an unbiased impute:
$$\hat{Y}_i = \beta^{-1} Y_i^*$$
We also want to model Y* in terms of X when Y and Y* are both not observed (i.e. for $i \notin s$ and $r_i = 0$):
$$E(Y_i^* \mid x_i, r_i = 0) = \gamma_0' x_i \quad \text{giving an impute} \quad \hat{Y}_i = \beta^{-1} \gamma_0' x_i$$
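A minimal sketch of CME-style imputation follows. It assumes a proportional measurement model Y* = beta * Y + error, fitted on sampled units that report tax data, and a proportional model for Y* given a single frame variable x, fitted on sampled tax non-reporters. The model forms, the coefficient names `beta` and `gamma`, and the function name are illustrative assumptions; this is a simple inversion, not the EBLUP described in the talk.

```python
import numpy as np

def cme_impute(y, y_star, x, r, in_sample):
    """Impute missing tax values Y by inverting a measurement model for Y*.

    Assumes Y* = beta * Y + error, with the same relationship for tax
    reporters (r True) and non-reporters (r False), as under CME.
    """
    y, y_star, x = (np.asarray(a, dtype=float) for a in (y, y_star, x))
    r = np.asarray(r, dtype=bool)
    in_sample = np.asarray(in_sample, dtype=bool)

    # Fit Y* on Y (through the origin) where both are observed.
    fit = in_sample & r
    beta = np.sum(y[fit] * y_star[fit]) / np.sum(y[fit] ** 2)

    # Fit Y* on x (through the origin) for sampled tax non-reporters.
    nonrep = in_sample & ~r
    gamma = np.sum(x[nonrep] * y_star[nonrep]) / np.sum(x[nonrep] ** 2)

    # In the sample: invert using the observed Y*.
    # Outside the sample: invert using the predicted Y* = gamma * x.
    y_star_used = np.where(in_sample, y_star, gamma * x)
    return np.where(r, y, y_star_used / beta)
```

The key difference from the MAR sketch is the direction of the regression: the survey variable is modelled given the tax variable, and the fitted relationship is then inverted to recover the tax value.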
Modelling survey data (Y*) and tax data (Y) - invert this to predict Y from Y*
Model: survey data Y* (EAS 05/06) as a function of frame variable X (tax_turn_0405) for tax nonrespondents (i.e. r = 0)
BLUP impute:
Empirical Best Linear Unbiased Predictor (EBLUP) of Y_i
EBLUP impute
CME imputation process
use units in the sample where tax and survey variables are observed and model the survey variable (Y*) as a function of tax and frame data (Y, X). Under CME this model applies to r = 0 too.
use units in the sample where survey data are observed (i in s) but tax data are not (r_i = 0) to model the survey variable (Y*) as a function of frame data (x).
combine to give an impute for (Y) for tax nonrespondents (r = 0):
Combine to get EBLUP
Further work
domain estimation for CME/MAR
variance estimation
discriminating between CME and MAR based on data
Conclusion
GREG is useful for estimation of survey data but efficiency gain is limited.
There is increasing interest in using tax data directly on its own to produce economic statistics.
Non-ignorable missingness becomes a key issue with tax data.
Survey data could be useful to help impute the tax data.