using optimization with predictive modeling to increase ... · coupling analytics and optimization...

Naoki Abe Prem Melville Cezar Pendus Chandan Reddy David Jensen Brenda Dietrich IBM Research Vince Thomas James Bennett Gary Anderson Brent Cooley Shaun Barry IBM Global Business Services Gerard Miller Melissa Weatherwax Timothy Gardinier Thomas Mattox New York State DTF Using Optimization with Predictive Modeling to Increase Tax Collections


Post on 28-May-2020


Page 1: Using Optimization with Predictive Modeling to Increase ... · Coupling analytics and optimization via C-RL J S V S 3 R(s t, a t) r t max R(s t 1, (s t 1)) • A generic CRL procedure

Naoki Abe Prem Melville Cezar Pendus Chandan Reddy David Jensen Brenda Dietrich IBM Research

Vince Thomas James Bennett Gary Anderson Brent Cooley Shaun Barry IBM Global Business Services

Gerard Miller Melissa Weatherwax Timothy Gardinier Thomas Mattox New York State DTF

Using Optimization with Predictive Modeling to Increase

Tax Collections

References

• KDD 2010 Lecture - Naoki Abe

• Optimizing debt collections using constrained reinforcement learning. Proceedings of the 16th Conference on Knowledge Discovery and Data Mining (KDD-10), Washington, D.C., July 2010.

• Tax Collections Optimization for New York State. Interfaces, Vol. 42, No. 1, January–February 2012, pp. 74–84.

Slide 2

Why Operations Research and Business Analytics?

NYS Collections and Civil Enforcement Division (CCED)

Staff of 1000+
2/3 field staff, 1/3 central
All enforcement actions manual
Single-skill call center
Most cases start in central office, follow linear collection cycle
Collections: $500 million

Staff of about 700
1/3 field, 2/3 central
Major enforcement actions automated
State-of-the-art contact center
Most cases start in central office, follow linear collection cycle
Collections: $1 billion

Slide 3

Already a success story, but something was missing.

[Flowchart of the collection cycle. Steps shown: Assigned to Call Center; Issue Letter; Move to field, if large enough; Issue warrant, if allowed; Levy, if possible; Serve IE, if possible; Complete Uncollectable. Successive steps are linked by "If unpaid".]

Collections process was too linear: “one size fits all”


One Size Does Not Fit All

• Resource allocation: one size fits all rules

• Speed matters

• Use the right tool

• Correct action is not defined by what is allowable

• Taxpayers' past behavior is predictive of future behavior, so why weren’t we considering it?

Slide 4

Slide 5

Our actions were based on what could be done.

We needed to find a way to base them on what should be done.

The Technical Challenge

• The tax collections process is complex, involving various legal and business constraints

• Most existing approaches rely on rigid, manual rules, including the NYS legacy system

• Goal: take this rigid procedure apart, leaving fragments of it intact wherever necessary, and automatically configure the rest, based on analytics and optimization

Slide 8

The Framework: Constrained MDP

The Markov Decision Process (MDP) formulation provides an advanced framework for modeling the tax collection process

“State”, s, summarizes information on a taxpayer’s stage in the collection process

“Action”, a, is a collection action (e.g. phone call, warrant, levy)

“Reward”, r, is the tax collected for the taxpayer in question

The goal in an MDP is to output a policy that maps a taxpayer’s (TP’s) states to collection actions so as to maximize the long-term cumulative reward

A constrained MDP additionally requires that the output policy belong to a constrained class of policies satisfying given constraints

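An Example MDP

[Diagram of an example MDP. Node and edge labels: Assessment Initiated; Assigned to Call Center; Contact Taxpayer by phone; No Response From Taxpayer; Find FIN Sources; Available To Warrant; Issue warrant; Available To Levy; Install Payment; Payment. A state is annotated with the feature vector <B, x1, ..., xN>.]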
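The state, action, reward and policy vocabulary above can be illustrated with a toy MDP. This is a minimal sketch only: the state names, transition probabilities and the $500 reward are hypothetical values chosen for illustration, not figures from the NYS DTF system.

```python
# Hypothetical toy MDP: (state, action) -> list of (probability, next_state, reward).
TRANSITIONS = {
    ("call_center", "phone"):   [(0.3, "paid", 500.0), (0.7, "no_response", 0.0)],
    ("call_center", "warrant"): [(1.0, "warranted", 0.0)],
    ("no_response", "phone"):   [(0.1, "paid", 500.0), (0.9, "no_response", 0.0)],
    ("no_response", "warrant"): [(1.0, "warranted", 0.0)],
    ("warranted", "phone"):     [(0.2, "paid", 500.0), (0.8, "warranted", 0.0)],
    ("warranted", "levy"):      [(0.6, "paid", 500.0), (0.4, "warranted", 0.0)],
}
TERMINAL = {"paid"}


def policy_value(policy, state, horizon, gamma=0.95):
    """Expected discounted reward of following `policy` for `horizon` steps."""
    if state in TERMINAL or horizon == 0:
        return 0.0
    action = policy[state]  # a policy maps each state to one action
    return sum(
        prob * (reward + gamma * policy_value(policy, nxt, horizon - 1, gamma))
        for prob, nxt, reward in TRANSITIONS[(state, action)]
    )


# Two candidate policies: escalate when the taxpayer does not respond, or keep calling.
ESCALATE = {"call_center": "phone", "no_response": "warrant", "warranted": "levy"}
ALWAYS_PHONE = {"call_center": "phone", "no_response": "phone", "warranted": "phone"}

if __name__ == "__main__":
    for name, pol in [("escalate", ESCALATE), ("always phone", ALWAYS_PHONE)]:
        print(name, round(policy_value(pol, "call_center", horizon=5), 2))
```

In this toy setting the escalating policy collects more over five steps than repeated phone calls, which is the kind of comparison an MDP formulation makes explicit.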

Methodology: Constrained Reinforcement Learning (C-RL)

Business requirement: customize collection actions depending on detailed taxpayer characteristics

Use of a high-dimensional state space is necessary

With a high-dimensional state space, estimating the structure of the MDP is extremely challenging

Reinforcement Learning (RL) solves an MDP with access only to data (not the MDP itself)

We develop constrained-RL (C-RL) methods for high-dimensional state spaces

[Figure labels (state features): Amt_Paid, Assmnt_Value, Time_snc_mtrd, Time_assngd_DO, Total_Amt_Paid, Time_snc_wrrntd, Num_pymnt]

Slide 10

Constrained Markov Decision Process

• The goal of a reinforcement learner in a constrained MDP is to learn a policy π: S → A, mapping states to actions, so as to maximize the cumulative discounted reward

$$R = \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)$$

such that π belongs to a prescribed constrained class of policies Π

• In particular, we consider MDPs with “stationary linear constraints”, i.e. where Π is a set of stochastic policies π satisfying n linear constraints of the form

$$E_{s \sim \mu,\, a \sim \pi}\left[C_i(s, a)\right] \le B_i, \qquad i = 1, \ldots, n$$

where μ is a “nearly stationary” state distribution (for the training data)
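Both quantities on this slide can be computed directly for small examples. The sketch below assumes a finite sample of states standing in for μ and a stochastic policy given as a table of action probabilities; the function names and the staff-hours cost are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward: sum over t of gamma**t * R(s_t, a_t)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def expected_cost(states, policy, cost):
    """Estimate E_{s~mu, a~pi}[C(s, a)] from a sample of states drawn from mu.

    `policy[s]` is a dict {action: probability} (a stochastic policy).
    """
    total = 0.0
    for s in states:
        total += sum(prob * cost(s, a) for a, prob in policy[s].items())
    return total / len(states)


def satisfies_constraints(states, policy, costs, bounds):
    """True if every linear constraint E[C_i(s, a)] <= B_i holds."""
    return all(expected_cost(states, policy, c) <= b for c, b in zip(costs, bounds))


# Hypothetical example: one constraint on expected staff hours per case.
HOURS = {"field_visit": 1.0, "mail": 0.0}
STATES = ["s1", "s2"]
POLICY = {"s1": {"field_visit": 0.5, "mail": 0.5}, "s2": {"mail": 1.0}}


def hours_cost(state, action):
    return HOURS[action]
```

With these inputs the expected cost is 0.25 staff hours per case, so the policy is inside Π for a bound of 0.3 and outside it for a bound of 0.2.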

Slide 11

Constrained RL Methods

• Constrained Value Iteration

– Provides basis for constrained reinforcement learning methods

$$Q_0(s,a) = E_r[r(s,a)]$$

$$Q_{k+1}(s,a) = E_{\pi_k^*,\tau,r}\left[r(s,a) + \gamma\, Q_k\big(\tau(s,a), \pi_k^*(\tau(s,a))\big)\right]$$

$$\pi_k^* = \arg\max_{\pi \in \Pi} E_{\pi,\tau,r}\left[r(s,a) + \gamma\, Q_k\big(\tau(s,a), \pi(\tau(s,a))\big)\right]$$

• Constrained Q-Learning

– Gradually solves constrained value iteration

$$Q_0(s,a) \leftarrow r$$

$$Q_{k+1}(s,a) \leftarrow (1-\alpha)\, Q_k(s,a) + \alpha\big(r + \gamma\, Q_k(s', \pi_k^*(s'))\big)$$

$$\pi_k^* = \arg\max_{\pi \in \Pi} E_{\pi}\left[r + \gamma\, Q_k(s', \pi(s'))\right]$$

Here τ(s,a) denotes the state reached from s under action a, s' the observed next state, γ the discount factor and α the learning rate.
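The constrained Q-learning update can be sketched in tabular form. This is a simplified illustration, not the deployed method: it assumes the constrained class Π is just "choose only from a per-state list of allowed actions", whereas the slides define Π through linear constraints solved by an LP. Q is initialized to zero rather than to r, and all names and numbers are hypothetical.

```python
from collections import defaultdict


def constrained_greedy(Q, state, allowed):
    """pi*_k(state): best action among those the constraints allow in this state."""
    return max(allowed[state], key=lambda a: Q[(state, a)])


def constrained_q_learning(transitions, allowed, alpha=0.5, gamma=0.9, sweeps=200):
    """Tabular Q-learning over logged (s, a, r, s_next) tuples.

    The bootstrap term uses the constrained greedy action, not the global max.
    """
    Q = defaultdict(float)
    for _ in range(sweeps):
        for s, a, r, s_next in transitions:
            if s_next in allowed:  # non-terminal next state
                a_star = constrained_greedy(Q, s_next, allowed)
                target = r + gamma * Q[(s_next, a_star)]
            else:                  # terminal: nothing left to collect
                target = r
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q


# Hypothetical logged data: a levy is only allowed once the case is "late".
LOG = [
    ("new", "mail", 0.0, "late"),
    ("new", "levy", 100.0, "done"),
    ("late", "mail", 50.0, "done"),
    ("late", "levy", 300.0, "done"),
]
ALLOWED = {"new": ["mail"], "late": ["mail", "levy"]}
```

Tightening `ALLOWED["late"]` to `["mail"]` lowers the learned value of mailing a new case, which shows how the constraint class feeds back into the value estimates.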

Slide 12

A Concrete Algorithm for Constrained Reinforcement Learning

• Technical Issues

– Variable time intervals between states

– Eliminating dependence of reward estimates on state variables

• Definition of Advantage

– A(s,a) := (1/Δt)(Q(s,a) – max_a’ Q(s,a’))

• Extended the Advantage Updating procedure [Baird ’94] to the constrained RL setting

– Aopt calculated based on action allocations output by constrained LP

Advantage Updating:

Repeat

1. Learn for input set of (s,a)’s

1.1. A(s,a) := (1-α)A(s,a) + α(Amax(s) + (R(s,a) + γ^Δt V(s’) - V(s))/Δt)

1.2. Use regression to estimate A(s,a)

1.3. V(s) := (1-β)V(s) + β(V(s) + (Amax(s) - Amax-old(s))/α)

2. Normalize for the same (s,a)’s

A(s,a) := (1-ω)A(s,a) + ω(A(s,a) - Amax(s))

Constrained Advantage Updating (Amax replaced by Aopt):

Repeat

1. Learn for input set of (s,a)’s

1.1. A(s,a) := (1-α)A(s,a) + α(Aopt(s) + (R(s,a) + γ^Δt V(s’) - V(s))/Δt)

1.2. Use regression to estimate A(s,a)

1.3. V(s) := (1-β)V(s) + β(V(s) + (Aopt(s) - Aopt-old(s))/α)

2. Normalize for the same (s,a)’s

A(s,a) := (1-ω)A(s,a) + ω(A(s,a) - Aopt(s))
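Steps 1.1, 1.3 and 2 of the unconstrained procedure can be written out for a tabular case. This is a sketch under simplifying assumptions: A and V are plain lookup tables, so the regression step 1.2 is skipped, and Amax(s) is the maximum over stored advantages. In the constrained variant, Aopt(s) from the LP allocation would take the place of `max(A[s].values())`. Function names are hypothetical.

```python
def advantage_update(A, V, s, a, r, s_next, dt, alpha=0.5, beta=0.5, gamma=0.9):
    """One learning step (1.1 and 1.3) of advantage updating on tables A and V."""
    a_max_old = max(A[s].values())
    # Step 1.1: move A(s,a) toward Amax(s) plus the time-scaled temporal difference.
    td_rate = (r + gamma ** dt * V[s_next] - V[s]) / dt
    A[s][a] = (1 - alpha) * A[s][a] + alpha * (a_max_old + td_rate)
    # Step 1.3: shift V(s) by the change in the maximum advantage.
    a_max_new = max(A[s].values())
    V[s] = (1 - beta) * V[s] + beta * (V[s] + (a_max_new - a_max_old) / alpha)


def normalize(A, s, a, omega=0.5):
    """Step 2: pull A(s,a) toward A(s,a) - Amax(s), so the best action tends to 0."""
    a_max = max(A[s].values())
    A[s][a] = (1 - omega) * A[s][a] + omega * (A[s][a] - a_max)


# Hypothetical two-state example with a 2-period gap between observations.
A = {"s0": {"mail": 0.0, "levy": 0.0}, "s1": {"mail": 0.0, "levy": 0.0}}
V = {"s0": 0.0, "s1": 0.0}
advantage_update(A, V, "s0", "levy", r=2.0, s_next="s1", dt=2.0)
normalize(A, "s0", "levy")
```

Dividing the temporal difference by Δt is what lets the update cope with variable time intervals between states.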

Methodology Details: Coupling analytics and optimization via C-RL

$$R(s_t, a_t) \leftarrow r_t + \gamma \max_{\pi \in \Pi} R\big(s_{t+1}, \pi(s_{t+1})\big)$$

• A generic CRL procedure for estimating expected long-term cumulative rewards (R), calling data analytics and optimization iteratively

• The deployed algorithm is a variant of it, focusing on estimating the relative advantage of competing actions

Optimization embedded within Iterative Modeling

Estimation with Segmented Linear Regression (ProbE)

Segment regression models R, estimated over a large population, for example:

R = 1.3 Field_Visit + 0.4 Mail + 0.0 Warrant + 2.3 Levy

R = 3.5 Field_Visit + 1.2 Mail + 1.6 Warrant + 45.8 Levy

Optimization with Linear Programming (COIN)

[Diagram: the estimation step and the optimization step are linked by Action Effectiveness and Action Allocations; two bar charts over actions 1 to 12 omitted.]

Resource Constraints (excerpt)

NAME            STAFF  HOURS
Binghamton      4      30
Buffalo         14     105
Capital Region  23     172.5
Rochester       9      67.5
Syracuse        7      52.5
Utica           5      37.5
Metro           53     397.5

Action Constraints (excerpt)

Rule Number  Contents
502.12       A collection letter should not be sent to a TP with invalid mailing address
2000.1       A contact action should only occur for a TP with at least 1 open assessment
2005.9       A contact by mail must not be made for a TP with an active promise-to-pay
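A segmented linear regression model is a set of per-segment linear formulas like the two shown on this slide. The sketch below hard-codes those two coefficient sets to show how an estimate of R is read off for a given mix of actions. The segment names are hypothetical, and fitting the coefficients (the job of ProbE in the slides) is not shown.

```python
# Coefficients copied from the two example segment models on the slide.
# The segment keys are hypothetical labels.
SEGMENT_MODELS = {
    "segment_a": {"Field_Visit": 1.3, "Mail": 0.4, "Warrant": 0.0, "Levy": 2.3},
    "segment_b": {"Field_Visit": 3.5, "Mail": 1.2, "Warrant": 1.6, "Levy": 45.8},
}


def estimate_reward(segment, action_counts):
    """Evaluate the segment's linear model R = sum(coefficient * action count)."""
    coeffs = SEGMENT_MODELS[segment]
    return sum(coeffs[name] * count for name, count in action_counts.items())


def most_effective_action(segment):
    """Action with the largest coefficient, i.e. highest estimated effectiveness."""
    coeffs = SEGMENT_MODELS[segment]
    return max(coeffs, key=coeffs.get)


if __name__ == "__main__":
    print(estimate_reward("segment_b", {"Mail": 1, "Levy": 1}))
    print(most_effective_action("segment_a"))
```

The coefficients act as the action effectiveness estimates that the optimization step consumes.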

Empirical evaluation

[Two evaluation plots: evaluation using actual data from NYS DTF, and evaluation using public data in marketing. Legend labels as extracted: Our method (CRL); Observed policy; A more standard approach; RL + Optimization; Segment model 1 through Segment model 6.]

Slide 15

Modeling and optimization respect resource, business and legal constraints

• The goal of optimization* is to assign actions

• Given estimates of action effectiveness

• Subject to constraints of type:

Resource constraints (per group)

Direct action bounds (per action)

Action constraints (via rules)

• To maximize total expected reward

Action effectiveness by segment (the slide repeats this table; group labels shown: District Office, CVS, Call Center)

Seg.  Field visit  Phone  Mail
S1    $1000        $800   $700
S2    $1000        $50    $25
S3    $2000        $1000  $1000
S4    $100         $20    $10

Assign Actions

[Animation frames in which allocated counts overwrite the effectiveness values, reproduced as extracted:]

Seg.  Field visit  Phone  Mail
S1    0            800    0
S2    $1000        $50    0
S3    $2000        $1000  0
S4    $100         $20    0

Seg.  Field visit  Phone  Mail
S1    0            0      800
S2    $1000        $50    50
S3    $2000        $1000  250
S4    $100         $20    0

Seg.  Field visit  Phone  Mail
S1    200          0      0
S2    1000         0      0
S3    500          0      0
S4    500          0      0

*We use IBM’s Linear Programming Engine (COIN) as sub-routine
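The allocation step can be illustrated with a deliberately small special case. With a single shared hour budget and an upper bound on each (segment, action) cell, the linear program reduces to a fractional knapsack, whose optimum is found by filling cells in order of reward per hour. The sketch below does only that; it is not the COIN-based optimizer from the slides, and it ignores per-group budgets, rule-based constraints and the requirement that each case receive one action. The hours per field visit and per mailing, the cell bound of 100 and the 120-hour budget are hypothetical. The effectiveness values and 0.14 hours per phone contact come from the slides.

```python
EFFECTIVENESS = {  # expected dollars per action, from the table above
    "S1": {"field_visit": 1000, "phone": 800, "mail": 700},
    "S2": {"field_visit": 1000, "phone": 50, "mail": 25},
    "S3": {"field_visit": 2000, "phone": 1000, "mail": 1000},
    "S4": {"field_visit": 100, "phone": 20, "mail": 10},
}
HOURS = {"field_visit": 1.0, "phone": 0.14, "mail": 0.01}  # field/mail hypothetical


def allocate(effectiveness, hours, max_count, hours_available):
    """Maximize sum(value * x) s.t. sum(hours * x) <= budget and 0 <= x <= max_count."""
    cells = [
        (value, hours[action], seg, action)
        for seg, row in effectiveness.items()
        for action, value in row.items()
        if value > 0
    ]
    # Best reward per hour first; zero-hour actions cost nothing, so they go first.
    cells.sort(key=lambda c: c[0] / c[1] if c[1] > 0 else float("inf"), reverse=True)
    allocation, remaining = {}, float(hours_available)
    for value, h, seg, action in cells:
        count = max_count if h == 0 else min(max_count, remaining / h)
        if count <= 0:
            continue
        allocation[(seg, action)] = count
        remaining -= count * h
    return allocation


ALLOCATION = allocate(EFFECTIVENESS, HOURS, max_count=100, hours_available=120)
```

Under these assumed numbers the budget runs out partway through field visits for S3, so lower-value cells such as field visits for S4 receive nothing.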

Slide 16

Types of Constraints (details)

Resource constraints

∑i Number of action_i assigned to group_j * Time to perform action_i ≤ Total available man hours for group_j

NAME                               STAFF  HOURS
Binghamton                         4      30
Buffalo                            14     105
Capital Region                     23     172.5
Rochester                          9      67.5
Syracuse                           7      52.5
Utica                              5      37.5
Metro                              53     397.5
Nassau                             13     97.5
Queens                             20     150
Suffolk                            15     112.5
Westchester                        21     157.5
High Value                         13     97.5
Call Center                        40     300
CAT                                6      45
Bankruptcy                         6      45
Offer in Compromise (OIC)          9      67.5
Collection Vendor Support (CVS)    8      60
Individual Case Enforcement (ICE)  11     82.5

Direct action bounds (mostly on “move to” actions)

∑j Number of action_i assigned to group_j ≤ Action upper bound for action_i

Action                  Hours per action  Bounds
Contact_Taxpayer_Phone  0.14              N.A.
Contact_Taxpayer_Phone  0.14              N.A.
Create_Warrant          0.01              8000<14000
Create_Levy             0.01              ?.
Create_1st_Service_IE   0.086             N.A.
Move_to_DO              0                 2000<5910
Move_To_HiValue         0                 165<330
Move_to_CVS             0                 300<340
Move_to_ICE             0                 70<100
No_Action               0                 N.A.

Action constraints

∑j Number of action_i assigned to micro-segment_j = 0 if action_i is invalid

Rule Number  Contents
502.12       A collection letter should not be sent to a TP with invalid mailing address
2000.1       A contact action should only occur for a TP with at least 1 open assessment
2005.9       A contact by mail must not be made for a TP with an active promise-to-pay
2601         A levy is not allowed for a TP unless the TP has at least 1 perfected warrant

Action allocation depends on modeling segment, organization (group), and valid action pattern!
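The three constraint types can be checked mechanically for any candidate allocation. The sketch below is a plain feasibility check rather than an optimizer. It uses a few hours-per-action and group-hour values from the tables above, and it assumes the second number in a bound such as 8000<14000 is the upper bound. The allocation layout and micro-segment names are hypothetical.

```python
HOURS_PER_ACTION = {
    "Contact_Taxpayer_Phone": 0.14,
    "Create_Warrant": 0.01,
    "Create_1st_Service_IE": 0.086,
    "Move_to_DO": 0.0,
    "No_Action": 0.0,
}
GROUP_HOURS = {"Call Center": 300.0, "Buffalo": 105.0}
ACTION_UPPER_BOUND = {"Create_Warrant": 14000, "Move_to_DO": 5910}  # assumed reading


def violations(allocation, valid_actions):
    """Return a list of violated constraints.

    allocation:    {(group, micro_segment, action): count}
    valid_actions: {micro_segment: set of actions the business rules allow}
    """
    problems = []
    hours_used, action_totals = {}, {}
    for (group, micro, action), count in allocation.items():
        hours_used[group] = hours_used.get(group, 0.0) + count * HOURS_PER_ACTION[action]
        action_totals[action] = action_totals.get(action, 0) + count
        # Action constraint: invalid actions must receive zero cases.
        if count > 0 and action not in valid_actions[micro]:
            problems.append(f"action {action} invalid for {micro}")
    # Resource constraint: hours used per group within available hours.
    for group, used in hours_used.items():
        if used > GROUP_HOURS[group]:
            problems.append(f"group {group} over hours")
    # Direct action bound: total count per action within its upper bound.
    for action, total in action_totals.items():
        if total > ACTION_UPPER_BOUND.get(action, float("inf")):
            problems.append(f"action {action} over bound")
    return problems


VALID = {"m1": {"Contact_Taxpayer_Phone", "No_Action"}, "m2": {"No_Action"}}
```

For example, 2,000 phone contacts by the Call Center use about 280 of its 300 hours and pass, while 2,500 would exceed the resource constraint.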

Slide 17

Modeling and optimization details: Segments and Micro-segments

An example segment

Segment 155 Definition:
$X <= tax_pd_prv_yr
and $Y <= sum_wrrnt_cllct_asmts
and num_wrrntbl_tx_typs < 1
and 1 <= num_tms_cs_cmpltd
and $Z <= sum_asmts_avail_to_wrrnt < $W

Action Distributions for Segment 155

No.  Action       Actual  Allocated
1    Mail         604     1647
2    Phone        447     0
3    Warrant      922     0
4    FS IE        3       1
5    Levy         678     651
6    DO           411     0
7    HiValue      1       55
8    CVS          117     0
9    ICE          11      0
10   UC           2083    993
11   Field Visit  326     313
12   No Action    2235    1335

Some corresponding micro-segments

First 4 (of 54) micro-segments in Segment 155, each written as (modeling segment, organizations (groups), valid action pattern):

Microsegment 1 = (155,DEM,DEM,000010111101)

Microsegment 2 = (155,COM,DEM,000000000001)

Microsegment 3 = (155,COM,DCT,000000000001)

Microsegment 4 = (155,DBD,DCI,000010101101)

[Bar charts omitted: actual versus allocated actions for Segment 155, and allocations per action (1 to 12) for the micro-segments.]
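A micro-segment tuple can be unpacked in a few lines. The sketch below assumes that the 12-character valid action pattern is a bit string whose positions follow the same order as the 12 actions in the Segment 155 chart (Mail first, No Action last). That alignment is an assumption, not something the slides state, and the function names are hypothetical.

```python
# Assumed position order of the 12 actions (taken from the chart's axis labels).
ACTIONS = [
    "Mail", "Phone", "Warrant", "FS IE", "Levy", "DO",
    "HiValue", "CVS", "ICE", "UC", "Field Visit", "No Action",
]


def valid_actions(pattern):
    """Names of the actions whose bit is '1' in a valid action pattern."""
    if len(pattern) != len(ACTIONS) or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be 12 characters of 0/1")
    return [name for name, bit in zip(ACTIONS, pattern) if bit == "1"]


def parse_microsegment(text):
    """Split '(155,DEM,DEM,000010111101)' into its four components."""
    segment, org_1, org_2, pattern = text.strip("()").split(",")
    return {
        "segment": int(segment),
        "organizations": (org_1, org_2),
        "valid_actions": valid_actions(pattern),
    }


if __name__ == "__main__":
    print(parse_microsegment("(155,COM,DEM,000000000001)"))
```

Under this reading, micro-segments 2 and 3 allow only No Action, which would explain why they carry no allocations for the other actions.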

Slide 18

Data Generation: TP features and action constraints

Inputs: modeling feature definitions (xml), TP Profile (xml), business & legal constraint rules (xml)

Transform (xslt)

Output: TP state features + action constraints, one column per TP event over time

Feature          Asmt mat  Phone  Warrant  Levy  Payment  Letter
TCD_Action       1         0      0        0     3        0
Num_Asmt_Mat     0         1      1        1     1        2
Num_Asmt_Prot    0         0      0        1     1        0
Cur_Tax          500       500    1200     1200  0        0
Reward           0         0      0        0     1200     0
Crt_Lvy_Allwd    0         0      1        0     1        1
Cntct_phn_Allwd  1         1      1        0     1        0

Example TP Profile (xml):

<taxpayer case_id="0123" vs="1900-01-01" ve="9999-12-31" tp_id="B123">

<episodes vs="1900-01-01" ve="9999-12-31">

<episode vs="2010-01-18" ve="9999-12-31"/>

</episodes>

<asmts vs="1900-01-01" ve="9999-12-31">

<asmt id="0123" tx_type="CT" rtn_earliest_due_dt="2009-03-16" vs="2010-02-24" ve="9999-12-31">

<reason vs="2010-02-24" ve="9999-12-31">LFE</reason>

<state vs="2010-02-24" ve="9999-12-31">OPEN</state>

<liability vs="2010-02-24" amount="33.87" tax="0" pen="31.65" int="2.22" ve="9999-12-31"/>

</asmt>

<asmt id="456" tax_type="WT" rtn_earliest_due_dt="2010-01-08" vs="2010-01-18" ve="2010-03-05">

<matured vs="2010-02-18" ve="2010-03-05"/>

<reason vs="2010-01-18" ve="2010-03-05">LFF</reason>

<state vs="2010-03-05" ve="2010-03-05">FULL PAID</state>

<state vs="2010-01-18" ve="2010-03-04">OPEN</state>

<liability vs="2010-03-05" ve="2010-03-05" amount="0" tax="0" pen="0" int="0"/>

<liability vs="2010-01-18" amount="100.42" tax="0" pen="94.56" int="5.86" ve="2010-03-04"/>

</asmt>

</asmts>

</taxpayer>

[Callouts on the feature table: State feature vector; Action constraints; Reward; Action]
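The temporal profile above carries validity intervals (`vs`, `ve`) on every element, so state features can be derived for any date. The sketch below uses Python's standard XML parser rather than the XSLT transform the slide mentions. The embedded profile is trimmed from the example, and the output feature names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Trimmed copy of the example profile (assessment, state and liability elements).
PROFILE = """
<taxpayer case_id="0123" vs="1900-01-01" ve="9999-12-31" tp_id="B123">
  <asmts vs="1900-01-01" ve="9999-12-31">
    <asmt id="0123" vs="2010-02-24" ve="9999-12-31">
      <state vs="2010-02-24" ve="9999-12-31">OPEN</state>
      <liability vs="2010-02-24" ve="9999-12-31" amount="33.87"/>
    </asmt>
    <asmt id="456" vs="2010-01-18" ve="2010-03-05">
      <state vs="2010-03-05" ve="2010-03-05">FULL PAID</state>
      <state vs="2010-01-18" ve="2010-03-04">OPEN</state>
      <liability vs="2010-03-05" ve="2010-03-05" amount="0"/>
      <liability vs="2010-01-18" ve="2010-03-04" amount="100.42"/>
    </asmt>
  </asmts>
</taxpayer>
"""


def valid_on(elem, day):
    """True if `day` falls inside the element's [vs, ve] validity interval."""
    return date.fromisoformat(elem.get("vs")) <= day <= date.fromisoformat(elem.get("ve"))


def state_features(xml_text, day):
    """Count open assessments and sum liabilities as of `day`."""
    root = ET.fromstring(xml_text)
    num_open, liability = 0, 0.0
    for asmt in root.iter("asmt"):
        if not valid_on(asmt, day):
            continue
        if any(st.text == "OPEN" and valid_on(st, day) for st in asmt.findall("state")):
            num_open += 1
        liability += sum(
            float(lb.get("amount")) for lb in asmt.findall("liability") if valid_on(lb, day)
        )
    return {"num_open_asmts": num_open, "cur_liability": round(liability, 2)}


if __name__ == "__main__":
    print(state_features(PROFILE, date(2010, 3, 1)))
```

Evaluating the same profile on successive event dates yields one feature column per event, as in the table on this slide.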

Slide 19

An example segment

Segment size 1.0%

Segment Definition

state = Call_Center_Not_Warranted
and tax_pd_lst_yr < $ X
and 1 <= num_st_ff_cllct_asmts
and 1 <= num_non_rstrctd_fin_srcs
and 1 <= st_inactv_ind
and num_pymnts_snc_lst_actn < 1

Interpretation

– Case is in Call Center and has not been warranted

– There is at least one non restricted fin source identified

– Sales tax inactive indicator is on

– There was no payment in the last period

– Tax paid last year is less than X dollars

Create warrant recommended

Action Distributions Segment 212

Action       Actual  Allocated  Coefficients
Mail         1522    0          -0.208
Phone        612     0          0.821
Warrant      1702    5103       1.934
FS IE        0       0          0
Levy         3       0          130.22
DO           126     0          0.662
HiValue      2       0          -0.109
CVS          0       0          0
ICE          0       0          0
Field Visit  3       0          -0.139
No Action    1293    152        0

Slide 20

A segment for which warrant is allocated

Segment size 1.0%

Segment Definition

state = CCN
and tax_pd_lst_yr < $ X
and 1 <= num_st_ff_cllct_asmts
and 1 <= num_non_rstrctd_fin_srcs
and 1 <= st_inactv_ind
and num_pymnts_snc_lst_actn < 1

Interpretation

– Case is in Call Center and has not been warranted

– There is at least one non restricted fin source identified

– Sales tax inactive indicator is on

– There was no payment in the last period

– Tax paid last year is less than X dollars

Create warrant recommended

Action Distributions Segment 212

Action       Actual  Allocated  Coefficients
Mail         1522    0          -0.208
Phone        612     0          0.821
Warrant      1702    5103       1.934
FS IE        0       0          0
Levy         3       0          130.22
DO           126     0          0.662
HiValue      2       0          -0.109
CVS          0       0          0
ICE          0       0          0
Field Visit  3       0          -0.139
No Action    1293    152        0
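A segment definition is a conjunction of threshold tests on taxpayer state features, so membership can be checked with a simple predicate. The sketch below encodes the Segment 212 definition. The dollar threshold X is withheld in the slides, so it stays a parameter here, and the sample taxpayer record and the value 1000 used in the example are hypothetical.

```python
def in_segment_212(tp, x_dollars):
    """True if the taxpayer state `tp` (a dict of features) falls in Segment 212.

    `x_dollars` stands for the undisclosed threshold $X in the slide.
    """
    return (
        tp["state"] == "CCN"                      # in Call Center, not warranted
        and tp["tax_pd_lst_yr"] < x_dollars       # tax paid last year below X
        and tp["num_st_ff_cllct_asmts"] >= 1
        and tp["num_non_rstrctd_fin_srcs"] >= 1   # at least one usable fin source
        and tp["st_inactv_ind"] >= 1              # sales tax inactive indicator on
        and tp["num_pymnts_snc_lst_actn"] < 1     # no payment since last action
    )


# Hypothetical taxpayer state used only to exercise the predicate.
SAMPLE_TP = {
    "state": "CCN",
    "tax_pd_lst_yr": 250.0,
    "num_st_ff_cllct_asmts": 2,
    "num_non_rstrctd_fin_srcs": 1,
    "st_inactv_ind": 1,
    "num_pymnts_snc_lst_actn": 0,
}

if __name__ == "__main__":
    print(in_segment_212(SAMPLE_TP, x_dollars=1000.0))
```

Once a case is matched to a segment, the allocation table for that segment determines how often each action is recommended.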

Slide 21

A segment for which levy/mail are allocated

Segment size 0.5%

Segment Definition

state = DOW
and $ X <= tax_pd_lst_yr
and ttl_liability_blnc < $ Y
and st_inactv_ind < 1
and sum_pymnts < $ Z
and 1 <= num_non_rstrctd_fin_srcs

Interpretation

– Case is in DO and has been warranted

– A fin source has been identified

– Total liability balance is less than Y dollars

– Sales tax inactive indicator is off

– Total sum of payments is less than Z dollars

– Tax paid last year is at least X dollars

Create levy recommended

Action Distributions Segment 497

Action       Actual  Allocated  Coefficients
Mail         51      1656       43.458
Phone        94      0          72.428
Warrant      517     0          0.73
FS IE        0       0          0
Levy         620     554        29.08
DO           45      0          -7.082
HiValue      0       0          0
CVS          0       0          0
ICE          0       0          0
Field Visit  610     0          29.595
No Action    554     249        0.863

Slide 22

A segment for which mail is allocated

Segment size 0.23%

Segment Definition

state = DON
and $ X <= tax_pd_lst_yr
and sum_pymnts < $ Y
and $ Z <= sum_pymnts_lst_yr
and ttl_liability_blnc < $ W
and 15 <= sum_cllct_ac_asmts

Interpretation

– Case is in DO and has not been warranted

– Tax paid last year is at least X dollars

– Total sum of payments is less than Y dollars

– Sum of payments last year is at least Z dollars

Mail recommended

Action Distributions Segment 437

Action       Actual  Allocated  Coefficients
Mail         32      1004       35.636
Phone        68      0          28.179
Warrant      430     0          -29.97
FS IE        0       0          0
Levy         59      0          -33.48
DO           7       0          -23.95
HiValue      0       0          0
CVS          0       77         0
ICE          0       0          0
Field Visit  332     0          -13.51
No Action    306     141        -2.351

Slide 23

A segment for which move to DO is allocated

Segment size 0.3%

Segment Definition

state = CCN
and $ X <= tax_pd_lst_yr
and num_pymnts_snc_lst_actn < 1
and $ Y <= sum_cllct_asmts
and 1 <= num_non_rstrctd_fin_srcs
and sum_pymnts_lst_yr < $ Z
and st_inactv_ind < 1
and $ W <= sum_asmts_avail_to_wrrnt

Interpretation

– Case is in Call Center and has not been warranted

– Sum available to warrant is at least W dollars

– A fin source has been identified

– Sales tax inactive indicator is off

– Tax paid last year is at least X dollars

– Sum of collectible assessments is at least Y dollars

– Sum of payments last year is less than Z dollars

Move to DO recommended

Action Distributions Segment 341

Action       Actual  Allocated  Coefficients
Mail         197     0          -10.11
Phone        68      0          -11.56
Warrant      203     0          -3.425
FS IE        0       0          0
Levy         1       0          0
DO           406     201        57.197
HiValue      6       0          43.789
CVS          0       0          0
ICE          2       0          0
Field Visit  0       0          0
No Action    744     1424       0

Integrating analytics, optimization and rules

[Architecture diagram. Components: Other Systems 1 to 3; Event Listener; Event; TP Profile; State Generator; Feature Definitions; Taxpayer State (Current); Taxpayer State (Training Data); Modeler; Optimizer; Case Inventory; Resource Constraints; Business Rules; Allocation Rules; Rules Processor; Recommended Actions; BPM (New Case, Case Extract, Scheduler, State Time Expired); Actions.]

Sample tables shown in the diagram:

Allocation Rules

Rule  Segment Selector          Action 1 Cnt  Action 2 Cnt  Action n Cnt
1     (conditions C1, C2, C3)   200           50            0
2     (conditions C4, C1, C7)   0             50            250

Taxpayer State (Current)

TP ID         Feat 1  Feat 2  Feat n
123456789 00  5       A       1500
122334456 01  0       G       1600
122118811 03  9       G       1700

Recommended Actions

TP ID         Rec. Date  Rec. Action  Start Date
123456789 00  6/21/2006  A1           6/21/2006
122334456 01  6/20/2006  A2           6/20/2006
122118811 03  5/31/2006  A2

Taxpayer State (Training Data)

TP ID         State Date  Feat 1  Feat 2  Feat n
123456789 00  6/1/2006    5       A       1500
122334456 01  5/31/2006   0       G       1600
122118811 03  4/16/2006   4       R       922
122118811 03  4/20/2006   9       G       1700

• Receives as input: “business rules” (action constraints), “resource constraints” and “taxpayer state features” (training data)

• Performs data analytics and optimization (CMDP)

• Produces as output “segmentation and allocation rules” for allocating actions

Slide 25

Results: The Numbers

Year to Year Increase in Revenue 2007-2010

[Chart values as extracted: +2.18%, +3.14%, +5.58%; Levy $ +1.47%]

Was it CISS? What about staffing?

Tax Reps (Contact Center): down 7% (20) in 2010 vs. 2009

Tax Agents (Field): down 3% (6) in 2010 vs. 2009

Was it CISS? Were the expected results achieved? What about in the field?

Average age of cases when assigned to field decreased by 9.3%

Dollars per staff day increased by 15% for field agents

Overall collections from field staff increased by 12%

CISS assignment of cases was the only major change for field

Was it CISS? Were expected results achieved? What about enforcement actions?

• Dollars per warrant increased by 22% in 2010 vs. 2009. This generated an overall increase in revenue of 13%.

• Dollars per levy increased by 11% in 2010 vs. 2009. This generated an overall increase in revenue of 7%.

[Bar chart, Dollars Per Warrant: $796 (2009), $975 (2010)]

[Bar chart, Dollars Per Levy: $446 (2009), $497 (2010)]

Beyond Revenue

Number of warrants filed decreased by 9%

[Bar chart: 228,159 (2009), 208,217 (2010)]

Number of levies served decreased by 3%

[Bar chart: 275,064 (2009), 268,326 (2010)]

35,000 fewer taxpayers had these serious enforcement actions taken against them

Slide 30

Concluding Remarks

Summary

– We presented a novel approach to optimizing debt collections and described an actual deployment at NYS DTF for tax collections optimization

Contributions

– Technical novelty: tight coupling of data analytics and optimization in a constrained-MDP framework

– Degree of automation: achieved to a degree that is unprecedented in debt collections optimization

– A large-scale, real-world deployment of a cutting-edge DM-based solution

Benefits

– Monetary benefits: 60 to 100 million dollars increase in revenue expected over the next three years

– Non-monetary benefits: flexibility, robustness and labor-free adaptation due to the data-centric approach

– “I believe this project keeps us on the forefront of technology and gives us (NYS) the edge we need to collect taxes in these tough times. More important, it will provide another mechanism for us to administer a fair and equitable taxing system for all taxpayers of New York State” (Tim Gardinier, NYS DTF)
