VAT Tax Gap prediction: a 2-steps
Gradient Boosting approach
Giovanna Tagliaferri
13 March 2019
Outline
1 Introduction
2 2-Steps Gradient Boosting
3 Application on results from fiscal audits
Dataset description
Selection bias correction
Potential tax base estimate
4 VAT base gap propensity analysis
5 Conclusions
VAT Tax Gap prediction: a 2-stepsGradient Boosting approach 2 of 16
Preamble
• Internship: work carried out at Sogei in collaboration with the Italian Revenue
Agency.
• What: produce an estimate of the Italian VAT Tax Gap for the year 2011.
- Major difficulty: selection bias, since audited taxpayers are not randomly
selected.
• How: a fully non-parametric two-step approach based on Gradient
Boosting, able to provide estimates of the potential tax base (BIT) and
of its undeclared part (BIND). With BID denoting the declared tax base:
BIT = BID + BIND
Application Context
• The statistical unit is the individual firm: an individual who carries out
business activities or self-employment.
• The available information has been gathered from two sources:
- the register of Irpef, VAT and Irap declarations (available for the entire
population);
- the compliance control papers (available only for assessed taxpayers).
• Only 2% of taxpayers are generally subject to tax assessment.
2-Steps Gradient Boosting
First step: Selection bias correction
Gradient Boosting classification model, aimed at the estimation of:
$\hat{\pi}_i = P(i \in S \mid X)$.
Target variable: compliance control presence.
Second step: Potential tax base estimate
Gradient Boosting regression model, only on the assessed units, with weights:
$\nu_i \propto 1/\hat{\pi}_i$.
Target variable: potential tax base.
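The role of the second-step weights can be illustrated with a minimal sketch. It is not the model used in the slides: instead of a Gradient Boosting regression it uses a plain weighted mean, and the selection probabilities are known by construction rather than estimated by a classifier. The population, the variable names and the selection rule are all hypothetical; the point is only that weighting each selected unit by 1/π_i counteracts a selection mechanism that depends on X.

```python
import random

rng = random.Random(0)

# Hypothetical population: x drives both the outcome y and the audit probability.
population = []
for _ in range(20000):
    x = rng.random()
    y = 10.0 + 20.0 * x + rng.gauss(0.0, 1.0)
    pi = 0.05 + 0.5 * x              # stand-in for the step-1 estimate of P(i in S | X)
    selected = rng.random() < pi     # non-random selection: high x is audited more often
    population.append((y, pi, selected))

pop_mean = sum(y for y, _, _ in population) / len(population)

# Only the selected (assessed) units are observed in the second step.
sample = [(y, pi) for y, pi, s in population if s]
naive_mean = sum(y for y, _ in sample) / len(sample)

# Weights proportional to 1 / pi_i, normalised so that they sum to 1.
raw = [1.0 / pi for _, pi in sample]
total = sum(raw)
weights = [w / total for w in raw]
ipw_mean = sum(w * y for w, (y, _) in zip(weights, sample))

print(f"population mean:      {pop_mean:.2f}")
print(f"naive mean on sample: {naive_mean:.2f}")
print(f"weighted mean:        {ipw_mean:.2f}")
```

In this toy setting the unweighted mean overshoots because units with large x are over-represented among the selected ones, while the weighted mean lands close to the population value.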
Data
Matrix of approximately 2.3 million taxpayers by 160 variables.
Problem: hardware limits
Solution: subsampling

| Control type | Total population: frequency | Total population: percentage | Sample: frequency | Sample: percentage |
|---|---|---|---|---|
| Not assessed | 2,275,219 | 99.18% | 45,489 | 70.85% |
| Assessed | 18,718 | 0.82% | 18,718 | 29.15% |
| Total | 2,293,937 | 100% | 64,207 | 100% |
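The percentages in the table follow directly from the counts, and the counts show that every assessed taxpayer is kept while only a fraction of the others enters the sample. The sketch below recomputes those shares and adds a hypothetical `subsample` helper; the slides do not say how the non-assessed units were drawn, so simple random sampling is an assumption made only for illustration.

```python
import random

# Counts copied from the table above.
N_NOT_ASSESSED = 2_275_219
N_ASSESSED = 18_718
N_NOT_ASSESSED_SAMPLED = 45_489


def share(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


population_total = N_NOT_ASSESSED + N_ASSESSED
sample_total = N_NOT_ASSESSED_SAMPLED + N_ASSESSED
sampling_rate = N_NOT_ASSESSED_SAMPLED / N_NOT_ASSESSED

print("assessed share, population:", share(N_ASSESSED, population_total))
print("assessed share, sample:    ", share(N_ASSESSED, sample_total))
print(f"sampling rate of non-assessed units: {100 * sampling_rate:.2f}%")


def subsample(unit_ids, assessed_flags, rate, seed=0):
    """Keep every assessed unit plus a simple random sample of the others.

    Hypothetical helper: the sampling scheme is an assumption, not the
    procedure documented in the slides.
    """
    rng = random.Random(seed)
    assessed = [u for u, a in zip(unit_ids, assessed_flags) if a]
    others = [u for u, a in zip(unit_ids, assessed_flags) if not a]
    k = round(rate * len(others))
    return assessed + rng.sample(others, k)


# Toy usage: 1000 units, 10 of them assessed, 2% of the rest sampled.
ids = list(range(1000))
flags = [i % 100 == 0 for i in ids]
toy_sample = subsample(ids, flags, rate=0.02)
print(len(toy_sample))
```

Because the subsample changes the share of assessed units from under 1% to about 29%, probabilities estimated on it refer to the subsample rather than to the full population.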
Selection bias correction
Hyperparameters were tuned via cross-validation, with the sample split
into training (70%) and test (30%) sets. The optimal choice was:
$\{\lambda_{opt} = 0.1,\ n.iter_{opt} = 998\} \rightarrow AUC = 0.79$
The most discriminating variables:
- region in which the firm operates
- branch
- number of employees
- revenues
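The AUC reported above can be read as the probability that a randomly chosen assessed taxpayer receives a higher score than a randomly chosen non-assessed one. The following sketch computes it by direct pair counting; the labels and scores are made up, and the quadratic loop is meant for illustration rather than for a sample of this size.

```python
def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one unit from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = assessed, 0 = not assessed.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 3 of the 4 pairs are ordered correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.79 indicates that the available covariates carry substantial information about who gets audited.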
Potential tax base estimate
The regression model was estimated only on the 18,718 taxpayers subject to
tax assessment. Cross-validation was also performed here.
$\{\lambda_{opt} = 0.1,\ n.iter_{opt} = 38\} \rightarrow R^2_{BIT} = 0.83$
Most important variables:
- region the taxpayer belongs to
- taxable amount for other purchases and imports
- set of operations that generate VAT
- operating costs and fiscal value added
Results
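The Heckman model was estimated on the same sample for comparative
purposes. Totals are in billions (mld in the original).

| | Gradient Boosting: Train | Gradient Boosting: Test | Heckman: Test |
|---|---|---|---|
| $BIND_{TOT}$ | 0.727 | 0.314 | 0.314 |
| $\widehat{BIND}_{TOT}$ | 0.693 | 0.292 | 0.290 |
| $BIT_{TOT}$ | 3.194 | 1.316 | 1.316 |
| $\widehat{BIT}_{TOT}$ | 3.159 | 1.292 | 1.231 |
| $R^2_{BIT}$ | 0.836 | 0.828 | 0.657 |
| $R^2_{adj,BIT}$ | 0.834 | 0.826 | 0.652 |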
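One way to read the totals in the table is as relative errors of the estimated totals with respect to the observed ones. The sketch below does that arithmetic on the figures copied from the table; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Totals in billions, copied from the results table: (observed, estimated).
results = {
    "Gradient Boosting, train": {"BIND": (0.727, 0.693), "BIT": (3.194, 3.159)},
    "Gradient Boosting, test": {"BIND": (0.314, 0.292), "BIT": (1.316, 1.292)},
    "Heckman, test": {"BIND": (0.314, 0.290), "BIT": (1.316, 1.231)},
}


def relative_error(observed, estimated):
    """Signed relative error of the estimated total, in percent."""
    return 100.0 * (estimated - observed) / observed


errors = {
    model: {name: relative_error(obs, est) for name, (obs, est) in rows.items()}
    for model, rows in results.items()
}

for model, row in errors.items():
    print(f"{model}: BIND {row['BIND']:+.2f}%, BIT {row['BIT']:+.2f}%")
```

All estimated totals fall below the observed ones. On the test set the two models are close on BIND, while on BIT the Gradient Boosting total is off by roughly 2% and the Heckman total by roughly 6%.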
Tax evasion propensity analysis
The estimated model was used to obtain predictions for the taxpayers who
were not assessed.
The tax evasion propensity was then studied for the whole sample:
$$Prop = \frac{\sum_{i=1}^{N} \widehat{BIND}_i}{\sum_{i=1}^{N} \widehat{BIT}_i}$$
The lower the ratio, the better the compliance.

| | Gradient Boosting | Heckman |
|---|---|---|
| Propensity | 30.40% | 29.77% |
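The propensity is a ratio of sums, so taxpayers with a large potential tax base weigh more than small ones. A short sketch with made-up per-taxpayer predictions shows how this differs from averaging the individual ratios; the numbers and variable names are hypothetical.

```python
def propensity(bind_hat, bit_hat):
    """Ratio between total predicted undeclared base and total predicted potential base."""
    total_bit = sum(bit_hat)
    if total_bit == 0:
        raise ValueError("total potential tax base is zero")
    return sum(bind_hat) / total_bit


# Hypothetical predictions for three taxpayers.
bit_hat = [100.0, 400.0, 1500.0]
bind_hat = [50.0, 100.0, 150.0]

prop = propensity(bind_hat, bit_hat)
mean_of_ratios = sum(b / t for b, t in zip(bind_hat, bit_hat)) / len(bit_hat)

print(f"ratio of sums:  {prop:.2%}")
print(f"mean of ratios: {mean_of_ratios:.2%}")
```

Here the largest taxpayer has the lowest individual ratio, which pulls the ratio of sums well below the simple average of the three ratios.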
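Propensity by sex and age
BIT and BIND are in billions (mld in the original).

| Sex | n | Gradient Boosting: BIT | Gradient Boosting: BIND | Gradient Boosting: Prop | Heckman: BIT | Heckman: BIND | Heckman: Prop |
|---|---|---|---|---|---|---|---|
| Female | 16053 | 2.34 | 0.81 | 34.74% | 2.22 | 0.70 | 31.39% |
| Male | 48154 | 8.72 | 2.55 | 29.25% | 8.73 | 2.56 | 29.35% |
| Total | 64207 | 11.05 | 3.36 | 30.40% | 10.95 | 3.26 | 29.77% |

| Age | n | Gradient Boosting: BIT | Gradient Boosting: BIND | Gradient Boosting: Prop | Heckman: BIT | Heckman: BIND | Heckman: Prop |
|---|---|---|---|---|---|---|---|
| [18, 25) | 976 | 0.13 | 0.05 | 39.07% | 0.13 | 0.04 | 36.71% |
| [25, 45) | 28250 | 4.26 | 1.45 | 34.10% | 4.11 | 1.30 | 31.61% |
| [45, 65) | 30496 | 5.64 | 1.60 | 28.45% | 5.64 | 1.60 | 28.37% |
| over 65 | 4485 | 1.02 | 0.25 | 24.65% | 1.08 | 0.32 | 29.28% |
| Total | 64207 | 11.05 | 3.36 | 30.40% | 10.95 | 3.26 | 29.77% |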
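As a consistency check, the propensities by sex can be recomputed from the rounded BIT and BIND totals in the table. The sketch below does so; small gaps with respect to the reported percentages are to be expected, since the table shows totals rounded to two decimals while the reported values were presumably computed on unrounded figures.

```python
# Rows copied from the table by sex: (group, model, BIT, BIND, reported Prop in %).
rows = [
    ("Female", "Gradient Boosting", 2.34, 0.81, 34.74),
    ("Male", "Gradient Boosting", 8.72, 2.55, 29.25),
    ("Total", "Gradient Boosting", 11.05, 3.36, 30.40),
    ("Female", "Heckman", 2.22, 0.70, 31.39),
    ("Male", "Heckman", 8.73, 2.56, 29.35),
    ("Total", "Heckman", 10.95, 3.26, 29.77),
]

gaps = []
for group, model, bit, bind, reported in rows:
    recomputed = 100.0 * bind / bit
    gaps.append(abs(recomputed - reported))
    print(f"{model:17s} {group:6s} recomputed {recomputed:5.2f}%  reported {reported:5.2f}%")

print(f"largest gap: {max(gaps):.2f} percentage points")
```

For the smallest age classes the same check is less informative, because totals such as 0.04 or 0.05 carry only one significant digit.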
Propensity by geographic area
[Figure: propensity by geographic area; panels: a) Heckman, b) Gradient Boosting]
Conclusions
• Advantages:
- all data are processed
- distribution-free approach
- no variable transformation is required
- no problems with multicollinearity
• Further developments:
- extension of the analysis to the entire population
- greater robustness via an ensemble with other models (XGBoost and neural
networks)
Thanks for your attention!