Models to predict client’s phone
calls to Iberdrola Call Center
VI Modelling Week UCM
PARTICIPANTS (UCM): Cazallas Piqueras, Rosa Gil Franco, Dolores M Gouveia de Miranda, Vinicius Herrera de la Cruz, Jorge Inoñan Valdera, Danny Javier Advisor: Ivorra, Benjamin
Outlook
1. Problem description
2. Data analysis
3. Modelling
4. Validation and prediction
5. Conclusions and perspectives
1. Problem description
Iberdrola is a company that provides electricity and/or gas to the customers.
Customers can contact Iberdrola, for instance, by letters, internet and Phone calls (Call Center).
In a Call Center: to avoid over-staff and under-staff
Iberdrola needs a good prediction about the number of operators.
Objective: Find a model to predict the volume of customer calls.
Problem description
Forecasting methodology: for operational reasons,
we need to predict the volume of calls for next month, six weeks in advance.
Problem description
2. Data analysis
Data analysis
Weekly data
Portfolio 1: regulated market Portfolio 2: free market Portfolio 3: gas market (since 01/2007) Bill type 1: estimated consumption Bill type 2: real consumption Bill type 3: fixed quota without consumption Bill type 4: other (corrections, material, etc.)
Daily data
Calls: number of calls target variable
Transformed to daily data
Data analysis
0
2000000
4000000
6000000
8000000
10000000
12000000
14000000
16000000
18000000
20000000
2_2
00
4
23
_20
04
44
_20
04
12
_20
05
33
_20
05
2_2
00
6
23
_20
06
44
_20
06
13
_20
07
34
_20
07
3_2
00
8
24
_20
08
45
_20
08
14
_20
09
35
_20
09
3_2
01
0
24
_20
10
45
_20
10
14
_20
11
35
_20
11
4_2
01
2
25
_20
12
46
_20
12
Portfolio1
Portfolio2
Portfolio3
suma
2009 2012
EVOLUTION
Data Analysis
0
100000
200000
300000
400000
500000
600000
700000
800000
2_2
00
4
21
_20
04
40
_20
04
6_2
00
5
25
_20
05
44
_20
05
11
_20
06
30
_20
06
49
_20
06
16
_20
07
35
_20
07
2_2
00
8
21
_20
08
40
_20
08
7_2
00
9
26
_20
09
45
_20
09
11
_20
10
30
_20
10
49
_20
10
16
_20
11
35
_20
11
2_2
01
2
21
_20
12
40
_20
12
BillType1
BillType2
BillType3
BillType4
suma
2009: Change due to an European regulation (monthly bills instead of twice-monthly bills).
Data Analysis
1 2 3 4 5 6 7 8 0
50.000
100.000
150.000
200.000
250.000
300.000
350.000
400.000
1 6 11 16 21 26 31 36 41 46 51
Calls seasonality pattern Calls distribution during weeks and holidays
0 20000 40000 60000 80000
100000 120000 140000 160000 180000 200000
1-1
-20
08
1-2
-20
08
1-3
-20
08
1-4
-20
08
1-5
-20
08
1-6
-20
08
1-7
-20
08
1-8
-20
08
1-9
-20
08
1-1
0-2
00
8
1-1
1-2
00
8
1-1
2-2
00
8
1-1
-20
09
1-2
-20
09
1-3
-20
09
1-4
-20
09
1-5
-20
09
1-6
-20
09
1-7
-20
09
1-8
-20
09
1-9
-20
09
1-1
0-2
00
9
1-1
1-2
00
9
1-1
2-2
00
9
M T W Th F S Su H
Holidays
Special Events
3. Modelling
SARIMA Model
SARIMA MODEL (1,1,1)(0,1,1)7
White noise
calls
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
110000
120000
fecha
01/02/2012 16/02/2012 01/03/2012
mes=2
PLOT calls Predicción para logcalls
Límite de confianza superior 95% Límite de confianza inferior 95%
SARIMA Model
Example of a forecast:
Forecast
Real data
C.I. 95%
SARIMA Model
Example of forecast error:
SARIMA Model
33,0%
Low precision for long run predictions
24,9% ARIMA models are generally suited for short
run forecasts, but fail in long term predictions!!!
A Structural model
The purpose is to estimate the relationship among some relevant inputs and the output:
Using a dynamical model for inputs. Adding some seasonal events. Estimating an additional model for the error term.
In order to simplify the inputs, we have created two
relevant variables:
NetBill2: the difference between Billtype2 and Billtype1 Neterror: the sum of Billtype3 and Billtype4
Could be due to measure errors in our daily generation of weekly inputs.
Structural Model We have found strong contemporary relation among
target and inputs
0
20000
40000
60000
80000
100000
-20000 20000 40000 60000 80000100000
NETABILL2
CA
LL
S
CALLS vs. NETABILL2 (iters = 4)
0
20000
40000
60000
80000
100000
-20000 20000 40000 60000 80000100000
NETABILL2_1
CA
LL
S
CALLS vs. NETABILL2_1
0
20000
40000
60000
80000
100000
0 10000 30000 50000 70000
NETABILLERROR
CA
LL
S
CALLS vs. NETABILLERROR (iters = 4)
0
20000
40000
60000
80000
100000
0 10000 30000 50000 70000
NETABILLERROR_1
CA
LL
S
CALLS vs. NETABILLERROR_1 (iters = 4)
Structural Model
Variable Coefficient (lag) Explanation
Target variable
0.20 (0) An increase of 1% in billtype 3 or 4 increases 0.20% the calls
-0.285(0) 0.141 (1)
An increase of 1000 billtype 2 over billtype1, decreases 1.4% the calls
Long term seasonality
Is a parsimonious way to take long run cyclical movements
Daily pattern Dummy variables to catch deterministic daily fluctuations
Error term Capturates autocorrelation pattern
Structural Model
0
20
40
60
80
100
120
140
160
-1.0 -0.5 0.0 0.5
Series: Residuals
Sample 9/05/2009 5/13/2011
Observations 616
Mean -0.007341
Median -0.000487
Maximum 0.559416
Minimum -1.066512
Std. Dev. 0.154450
Skewness -1.341027
Kurtosis 12.80172
Jarque-Bera 2650.525
Probability 0.000000
We obtain a white noise residual correlogram…
…however…
…residuals are not normal. Squared residual correlogram are
not white noise
In a next step, we should study the posibility of any GARCH-ARCH structure for error term to model the error volatility.
Some Naïves Models
Consider the number of calls of previous Year (Y-1)
Adjust the Y-1 model with a coefficient proportional to the difference with the real data (A-1)
Apply the same variation (%) of the number of calls from Y-1 to Y-0 than from Y-1 to Y-2 (Diff) 35000
40000
45000
50000
55000
60000
65000
70000
75000
80000
10/17/2011 10/18/2011 10/19/2011 10/20/2011 10/21/2011
Real
A-1
Y-1
Y-2
Diff
4. Validation and prediction
Validation and prediction
Structural SARIMA Diff A-1 Y-1
9,31 14,97 12,40 10,32 11,68
Mean relative error
30000
35000
40000
45000
50000
55000
60000
65000
70000
75000
80000
26/12/2011 27/12/2011 28/12/2011 29/12/2011 30/12/2011
Structural
SARIMA
Differential
PY-1
Y-1
Real
A-1
Structural
SARIMA
Y-1
Diff
Weightened Models
Following Clements and Hendry we have considered a model based on the weigths of the solutions generated by all the models (Structural, SARIMA, Differential, A-1).
We define Jerr(α1, α2 ,α3 ,α4 ) the relative error function which is computed considering the weights (α1, α2 ,α3 ,α4 ) ϵ ϴ=[0,1]4.
We minimize Jerr in ϴ by considering a Newton method. Then, we obtain:
Mean Error : 8,45 (vs. 9,31 for Structural)
Structural SARIMA Diff A-1
α1 =0.6 α2 =0 α3 =0 α4 =0.4
Validation and prediction
0
10000
20000
30000
40000
50000
60000
70000
Pond
Min
Max
We have predicted from 28/05/2012 to 30/12/2012. For instance, a week obtained from inputs generated by Iberdrola.
Weight
5. Conclusions and perspectives
Conclusions and perspectives
Conclusions
We have generated models with error values between 8%-15%.
In addition, we have generated a prediction for future number of calls from an unknown time interval.
Perspectives
Compare our solution with the real data when available.
We would like to have additional variables such as
campaigns, discretional events, etc.
Thank You!