Reacfin Breakfast - Machine learning: Challenges and opportunities for non-life pricing and underwriting
IMPORTANT: This presentation is only the supporting document of an oral presentation. It is not intended to be exhaustive. Quoting or using this document on its own might be misleading. Furthermore, although the authors have been careful in the selection of their sources and assumptions, the authors cannot guarantee that all information in the document is exact or correct. As a result, these materials may not be used by anybody except their authors, nor should they be relied upon in any way for any purpose other than as contemplated by a written agreement with Reacfin.
Brussels - 26th June 2019
Machine learning: Challenges and opportunities for non-life pricing and underwriting
Please read the important disclaimer at the end of this presentation Strictly Confidential
By Samuel Mahy [email protected] Maréchal [email protected]
Topics to be covered today
1. Introduction and context
2. How to create competitive advantages with Machine Learning for insurance companies?
3. How to solve the Machine Learning challenges for companies?
4. Using machine learning in pricing and underwriting: how to start?
5. Conclusions
Topics to be covered today
1. Introduction and context
a. What are the challenges in non-life insurance?
b. What is machine learning?
c. Where is machine learning used in insurance?
2. How to create competitive advantages with Machine Learning for insurance companies?
3. How to solve the Machine Learning challenges for companies?
4. Using machine learning in pricing and underwriting: how to start?
5. Conclusions
Non-life insurers are facing many challenges putting pressure on their business model and profitability
Main challenges faced by insurance companies
• Increasing competition
– Commoditisation of insurance products
– Sophistication in pricing
– Pricing comparison systems
• Availability of new data sources
– External data (IoT, open data, …)
– Use of unstructured data
• New customer behavior
– Digitalisation of the underwriting process
– Direct vs. brokers
– Focus on price (made possible by pricing comparison systems)
To address these challenges, insurers have to
• Innovate in product development and surrounding services
• Capture and identify relevant features for pricing models
• Adapt faster to market changes (identification, building of new models, faster deployment)
• Optimise retention and renewal pricing
In order to face these challenges, insurers must develop key points of differentiation
Competitive advantages in the future
• Creative sourcing of data (new sources of external data, behavior-influencing data monitoring)
• Creative usage of data
• Distinctiveness of analytic methods
Advanced analytics: far beyond traditional actuarial sciences
• Natural language processing
• Image processing, …
• Large data storage and management
Technology changes much faster than people
• Insurers should not only invest in analytics technologies
• Key for insurers is to train and motivate their highly skilled experts to adopt the newest tools
• Make sure people use Advanced Analytics with creativity, confidence and consistency
What is machine learning?
Objectives of Machine Learning (“ML”)
ML algorithms aim at finding by themselves the method that best predicts the outcome of the studied phenomenon.
Supervised vs. Unsupervised learning
Supervised learning:
• Inputs and examples of their desired outputs are provided
• The goal is to learn a general rule that maps inputs to outputs. Given a set of training examples (x1, x2, …, xn, y), where y is the variable to be predicted, what is the most efficient algorithm to best approximate the realizations of y?
• 2 main techniques
– Classification: inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one of these classes, or to several of them (multi-label classification).
– Regression: the outputs are continuous rather than discrete.
Unsupervised learning:
• No labels are given to the learning algorithm
• The goal is to find structure in its input (discovering hidden patterns in data).
• Main technique
– Clustering: a set of inputs is to be divided into groups. Unlike in classification, the groups may not be known beforehand.
Comparing traditional statistical inference and ML approaches
Conceptual difference
Statistical inference techniques
• Starting point & objective: start by assuming the explanatory model is known and the key explanatory variables are identified. Objective: confirm the model assumption and calibrate the model parameters as accurately as possible so that errors are minimized.
• Implementation approach: infer the process by which the data you have was generated. Estimate the model parameters which describe the relationship between the explanatory variables and the dependent variable.
Machine Learning techniques
• Starting point & objective: start from fewer assumptions. Objective: the algorithm itself identifies the key explanatory variables and their impact on the response variable.
• Implementation approach: you want to know how to predict what future data will look like with respect to some variable. The approach is to find a function f(x), an algorithm that operates on x to predict the responses y.
Methods used in non-life pricing are evolving at a fast pace
Machine Learning and AI are the continuation of the evolution of the tools and technologies used by actuaries and statisticians to analyze historical claims data: trying to improve the predictive power of models and solving the same problems with the new methods, data and computing power available.
Machine Learning techniques can be applied all along the value chain in insurance and not only for pricing
Value chain: Marketing, Customer Management, UW, Pricing Risk, Pricing Optimisation, Claims
Applications shown along the chain:
• Web-scraping and campaign steering
• Customer segmentation (cross-sell, up-sell, customer value)
• Brokers’ performance evaluation
• Feature engineering, feature selection, geodemographic segmentation
• Segmented & targeted price increase, churn & new business
• Retention segmentation and management
• Modeling risk, tariffs, control leakages, simulate impact of tariff changes
• Competitor prices, reverse engineering, portfolio monitoring
• Fraud detection
• New product targets
• Simplification of the quoting process
Topics to be covered today
1. Introduction and context
2. How to create competitive advantages with Machine Learning for insurance companies?
a. How can we complete the Data Analytics Toolbox with Machine Learning techniques?
i. Defining model error and managing overfitting
ii. Regression trees
iii. Random forest and boosting
iv. Artificial neural networks
b. How do Machine Learning techniques make it possible to boost the size and type of data sources?
c. How do Machine Learning techniques make it possible to boost creativity in using data?
3. How to solve the Machine Learning challenges for companies?
4. Using machine learning in pricing and underwriting: how to start?
5. Conclusions
Generalized Linear Models are still widely used by insurance companies for non-life pricing and other applications
Linear Model (“LM”)
• $Y = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n + \varepsilon$
• Y is a direct linear combination of explanatory variables
• The errors are assumed to be Normally distributed: $\varepsilon \sim N(0, \sigma^2)$
• And so, $Y \sim N(\mu, \sigma^2)$
Generalized Linear Model (“GLM”)
• $Y = g^{-1}(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n) + \varepsilon$
• Y is now a function ($g^{-1}$) of a linear combination of the explanatory variables
• The distribution of the response variable no longer needs to be Gaussian; it has to be a member of the exponential family
• So we will have, for instance, $Y \sim \mathrm{Poi}(\mu)$ where $\mu = \exp(\beta^T X)$
Distributions: $\mathrm{Bin}(1, \mu)$, $\mathrm{Poi}(\mu)$, $\mathrm{Nor}(\mu, \sigma^2)$, $\mathrm{Gam}(\mu, \alpha)$, $\mathrm{IGau}(\mu, \sigma^2)$
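To make the Poisson GLM with log link concrete, here is a minimal sketch of the relation μ = exp(βᵀX). The coefficient values and the two risk profiles (an intercept, a young-driver flag and an urban flag) are hypothetical and chosen for illustration only; they do not come from the presentation.

```python
import math

def poisson_glm_mean(beta, x):
    """Expected claim count mu = exp(beta^T x) under a log link."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Hypothetical coefficients: intercept, young-driver flag, urban flag.
beta = [-2.0, 0.4, 0.2]
base_profile = [1.0, 0.0, 0.0]   # intercept only
young_urban = [1.0, 1.0, 1.0]    # both flags switched on

mu_base = poisson_glm_mean(beta, base_profile)
mu_young_urban = poisson_glm_mean(beta, young_urban)

# With a log link the effects are multiplicative:
# the relativity equals exp(0.4) * exp(0.2).
relativity = mu_young_urban / mu_base
```

The multiplicative structure is one reason the log link is convenient for tariffs: each coefficient translates into a relativity applied to a base rate.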
Machine learning techniques focus mainly on prediction and therefore aim at minimizing an error function
Which loss function to choose?
ML methods minimize an error (or loss) function, whereas statistical methods maximize a likelihood.
In ML, the error function which is minimized depends on the context.
In most cases, we can simply choose the sum of squared errors:
$$\mathrm{Error}(\hat{y}) = \sum_{i=1}^{n} (\hat{y}_i - y_i)^2,$$
where $y_i$ is the $i$th observation and $\hat{y}_i$ is the corresponding prediction.
However, for insurance applications, we must choose our error function carefully. E.g. when we want to predict Poisson frequencies, it is better to consider the Poisson deviance statistic instead:
$$\mathrm{Error}(\hat{\lambda}) = \sum_{i=1}^{n} \left[ N_i \log\left( \frac{N_i}{\hat{\lambda}_i v_i} \right) - \left( N_i - \hat{\lambda}_i v_i \right) \right],$$
where $N_i$ is the $i$th observation and $v_i$ and $\hat{\lambda}_i$ are the corresponding exposure and predicted frequency.
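The two error functions can be sketched in a few lines of Python. The function names and the small data set are illustrative assumptions. The deviance follows the slide's formula as written, without the factor 2 that the usual definition of the Poisson deviance carries; that constant does not change which model minimizes the error. Terms with a zero claim count use the convention 0·log 0 = 0.

```python
import math

def sum_squared_errors(y, y_hat):
    """Sum of squared differences between predictions and observations."""
    return sum((p - o) ** 2 for o, p in zip(y, y_hat))

def poisson_deviance(n, lam_hat, v):
    """Poisson deviance as written on the slide (no factor 2).

    n: observed claim counts, lam_hat: predicted frequencies, v: exposures.
    """
    total = 0.0
    for n_i, lam_i, v_i in zip(n, lam_hat, v):
        mu_i = lam_i * v_i  # expected number of claims
        # Convention: 0 * log(0) = 0 when no claim was observed.
        log_term = n_i * math.log(n_i / mu_i) if n_i > 0 else 0.0
        total += log_term - (n_i - mu_i)
    return total

# Hypothetical portfolio of three policies.
counts = [0, 1, 2]
exposures = [1.0, 0.5, 1.0]
perfect_fit = [0.0, 2.0, 2.0]   # frequency * exposure equals the counts
rough_fit = [0.1, 1.0, 1.5]

deviance_perfect = poisson_deviance(counts, perfect_fit, exposures)
deviance_rough = poisson_deviance(counts, rough_fit, exposures)
```

A perfect fit gives a deviance of zero, and any other prediction gives a strictly positive value, which is what makes the statistic usable as a loss.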
Overfitting deteriorates the predictive power of the models…
The overfitting problem
• When modelling, we should be aware of overfitting (lack of parsimony).
• It occurs when a statistical model describes random error or noise instead of the underlying relationship.
• The goodness-of-fit indicators show a good result on the dataset used for the model calibration, but the predictive power is poor.
• Example: when trying to explain data variability using a set of explanatory variables, the more variables you use, the better the $R^2$, the residual sum of squares, etc.
Goodness-of-fit indicators
• One way to deal with this issue is to define goodness-of-fit indicators which take into account the number of parameters of the model and apply a penalty, such as the Akaike and Bayesian information criteria.
• But these solutions are not fully satisfactory.
• The choice of a penalization function is arbitrary! Why should it take these forms?
…which can be improved by separating the data into a training set and a test set
A better solution
Use two different datasets:
• A training set to calibrate the model,
• A test set (or validation set) to assess the model’s predictive ability.
Two different kinds of errors are defined:
• The training error is calculated by applying the model to the observations used in its calibration.
• The test error is the average error that results from using the model to predict the response on a new observation, one that was not used in calibrating the model.
The training error decreases with model complexity, whereas the test error tends to increase once the level of model complexity creates overfitting.
NB: an even better solution consists in using cross-validation.
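Cross-validation, mentioned in the note above, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the folds are built by a simple shuffled round-robin split, and the "model" is deliberately the simplest possible one (predict the training mean), so that only the splitting logic matters.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Yield (train, test) index lists; each observation is tested exactly once."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, test

def cross_validated_mse(y, k=5):
    """Cross-validated mean squared error of a 'predict the training mean' model."""
    squared_errors = []
    for train, test in k_fold_indices(len(y), k):
        mean_train = sum(y[j] for j in train) / len(train)
        squared_errors.extend((y[j] - mean_train) ** 2 for j in test)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical claim frequencies for ten policies.
observed = [0.10, 0.12, 0.08, 0.30, 0.11, 0.09, 0.28, 0.10, 0.13, 0.31]
cv_error = cross_validated_mse(observed, k=5)
splits = list(k_fold_indices(10, 5))
```

Compared with a single training/test split, every observation serves once as test data, so the error estimate depends less on one particular split.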
A first simple ML model: Classification and regression trees
Purpose
• A tree segments the predictor space into a number of simple regions defined according to the covariates.
• For each region, the prediction is set as the region average.
Splitting rules
• Splitting rules can be summarized in a tree view.
• The set of splitting rules aims at segmenting the predictor space into a number of simple regions.
Definitions
• The root node (in orange) is at the top of the tree and contains the whole population.
• The leaf nodes (in green) are at the bottom of the tree: a leaf is a node that is not split further.
Regression Tree Algorithm
Main idea
• Define a loss/error (or objective) function and try to find regions $R_1, R_2, \dots, R_J$ that minimize (or maximize) the chosen function.
• All possible region definitions cannot, of course, be considered. The tree algorithm therefore:
– starts with the global population and finds the optimal split of the predictor at that level using the entire population;
– then applies the same process to each sub-population.
Optimal splitting
If we use the residual sum of squares as loss function, for a node $T$ split into $R_1$ and $R_2$:
$$E(T) = SS_T = \sum_{i \in \text{Node } T} (y_i - \bar{y}_T)^2$$
$$E(R_1) = SS_{R_1} = \sum_{i \in \text{Node } R_1} (y_i - \bar{y}_1)^2, \qquad E(R_2) = SS_{R_2} = \sum_{i \in \text{Node } R_2} (y_i - \bar{y}_2)^2$$
The optimal splitting variable and split point are then obtained by maximising:
$$\Delta I = SS_T - (SS_{R_1} + SS_{R_2})$$
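The split search can be illustrated for a single numeric predictor. The sketch below evaluates every midpoint between consecutive distinct values and keeps the one maximising ΔI = SS_T − (SS_R1 + SS_R2). The function names and the age/frequency data are hypothetical; a full tree would repeat this search over all predictors and then recursively on each sub-population.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, gain) maximising SS_T - (SS_R1 + SS_R2)."""
    ss_total = sum_of_squares(y)
    best_threshold, best_gain = None, 0.0
    candidates = sorted(set(x))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        gain = ss_total - (sum_of_squares(left) + sum_of_squares(right))
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain

# Hypothetical data: young drivers show a higher claim frequency.
ages = [20, 22, 25, 40, 45, 60]
claim_freq = [0.30, 0.28, 0.32, 0.10, 0.12, 0.11]
threshold, gain = best_split(ages, claim_freq)
```

On this toy data the search separates the three youngest drivers from the rest, because that split removes almost all of the within-node variance.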
Bootstrap aggregation (Bagging) allows for variance reduction by averaging over several regression trees
Main idea
• Bootstrap aggregation, or bagging, is a general-purpose procedure for reducing the variance of a statistical learning method. It is frequently used in the context of decision trees.
• Recall that given a set of $n$ independent observations $Z_1, Z_2, \dots, Z_n$, each with variance $\sigma^2$, the variance of the mean $\bar{Z}$ of the observations is given by $\sigma^2 / n$.
• Averaging a set of observations therefore reduces variance. Usually, however, multiple training sets are not available.
Algorithm
1. Bootstrap, by taking repeated samples from the (single) training data set.
2. Generate $B$ different training data sets.
3. Train our method on the $b$th bootstrapped training set in order to get $\hat{f}_b(x)$, the prediction at point $x$.
4. Average all the predictions to obtain:
$$\hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$
[Figure: from the training set, bootstrap sets (Bootstrap 1, Bootstrap 2, …, Bootstrap B) of n observations each are drawn with replacement; each yields a prediction $\hat{f}_1, \hat{f}_2, \dots, \hat{f}_B$, and averaging all the predictions gives $\hat{f}_{bag}$.]
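The bagging steps can be sketched as below. As a simplifying assumption, the base learner is a regression tree with a single split (a "stump") on one numeric predictor rather than a full tree; the function names, the data and the number of bootstrap samples are hypothetical.

```python
import random

def fit_stump(x, y):
    """Fit a regression tree with a single split; return a prediction function."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    overall = sum(y) / len(y)
    best_err, best_rule = sse(y), None
    cut_points = sorted(set(x))
    for lo, hi in zip(cut_points, cut_points[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err = err
            best_rule = (t, sum(left) / len(left), sum(right) / len(right))
    if best_rule is None:  # no split improves on the overall mean
        return lambda x_new: overall
    t, left_mean, right_mean = best_rule
    return lambda x_new: left_mean if x_new <= t else right_mean

def bagged_predictor(x, y, n_bootstrap=50, seed=0):
    """Average the predictions of stumps fitted on bootstrap samples."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_bootstrap):
        draw = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([x[i] for i in draw], [y[i] for i in draw]))
    return lambda x_new: sum(m(x_new) for m in models) / len(models)

# Hypothetical data: claim frequency by driver age.
ages = [20, 22, 25, 40, 45, 60]
claim_freq = [0.30, 0.28, 0.32, 0.10, 0.12, 0.11]
f_bag = bagged_predictor(ages, claim_freq)
```

Each stump sees a slightly different sample and may place its split differently; averaging smooths out that instability, which is the variance reduction the slide refers to.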
Boosting learns slowly by fitting rather small decision trees to the residuals from the model
Algorithm
1. Set $\hat{f}(x) = 0$ and $r_i = y_i$ for all $i$ in the training set.
2. For $b = 1, 2, 3, \dots, B$, repeat:
• Fit a tree $\hat{f}_b$ with $d$ splits ($d + 1$ terminal nodes) to the training data $(X, r)$.
• Update $\hat{f}$ by adding a shrunken version of the new tree: $\hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}_b(x)$
• Update the residuals: $r_i \leftarrow r_i - \lambda \hat{f}_b(x_i)$
3. The final model is given by
$$\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}_b(x)$$
[Figure: trees $\hat{f}_1, \hat{f}_2, \dots, \hat{f}_B$ are fitted in sequence to the residuals $r_1, r_2, \dots, r_B$ of the training set; the residuals are updated after each tree, and summing a part of each prediction gives $\hat{f}_{Boost}$.]
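The boosting loop above can be sketched with d = 1, i.e. each tree is a single-split stump on one numeric predictor. The function names, the data, the number of trees and the shrinkage value λ = 0.1 are assumptions made for this illustration.

```python
def fit_stump(x, r):
    """Fit a regression tree with a single split to (x, r); return a predictor."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    overall = sum(r) / len(r)
    best_err, best_rule = sse(r), None
    cut_points = sorted(set(x))
    for lo, hi in zip(cut_points, cut_points[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err = err
            best_rule = (t, sum(left) / len(left), sum(right) / len(right))
    if best_rule is None:  # no split improves on the overall mean
        return lambda x_new: overall
    t, left_mean, right_mean = best_rule
    return lambda x_new: left_mean if x_new <= t else right_mean

def boost(x, y, n_trees=100, shrinkage=0.1):
    """Boosting with stumps: returns the final predictor and the last residuals."""
    residuals = list(y)  # step 1: f_hat = 0 and r_i = y_i
    trees = []
    for _ in range(n_trees):  # step 2
        tree = fit_stump(x, residuals)
        trees.append(tree)
        residuals = [r - shrinkage * tree(xi) for r, xi in zip(residuals, x)]

    def predictor(x_new):  # step 3: sum of the shrunken trees
        return sum(shrinkage * t(x_new) for t in trees)

    return predictor, residuals

# Hypothetical data: claim frequency by driver age.
ages = [20, 22, 25, 40, 45, 60]
claim_freq = [0.30, 0.28, 0.32, 0.10, 0.12, 0.11]
boosted, final_residuals = boost(ages, claim_freq)
```

A small shrinkage means each tree corrects only a fraction of the remaining error, so many trees are needed; this slow learning is what the slide title refers to.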
Artificial neural networks expand the perspective of ML and can be used for supervised and unsupervised learning
What are Neural Networks (NN)?
• They are a method of programming computers, like Random Forests, Support Vector Machines, …
• They are often used to perform pattern recognition (which can be supervised or unsupervised)
• NN can learn on their own and adapt to changing conditions (based on the data)
• NN are inspired by biological nervous systems, such as the human brain’s information processing mechanism: they are composed of a large number of interconnected processing elements (neurons) working together to solve problems.
Description of a neuron
[Figure: a neuron with inputs X1, X2, …, Xn, weights W1, W2, …, Wn and one output.]
• An artificial neuron is an element with several inputs and one output
• The neuron has two modes of operation:
– Training mode (calibration): the neuron can be trained to fire (or not) depending on the inputs, e.g. pattern recognition: associate outputs with input patterns
– Using mode (prediction): e.g. pattern recognition: identify the input pattern and try to output the associated output pattern
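A single neuron in "using mode" can be sketched as a weighted sum of its inputs passed through an activation function. The slides do not specify the activation; the sigmoid, the bias term and the 0.5 firing threshold below are assumptions made for this illustration, as are the function names.

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs passed through a sigmoid (assumed activation)."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def neuron_fires(inputs, weights, bias=0.0, threshold=0.5):
    """The neuron 'fires' when its output reaches the (assumed) threshold."""
    return neuron_output(inputs, weights, bias) >= threshold

# Hypothetical weights W1, W2 and bias, as if obtained in training mode.
weights = [2.0, 2.0]
bias = -1.0
fires_on_pattern = neuron_fires([1.0, 1.0], weights, bias)
fires_on_blank = neuron_fires([0.0, 0.0], weights, bias)
```

Training mode would consist of adjusting the weights and the bias so that the neuron fires on the desired input patterns; that calibration step is not shown here.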
Architecture of Neural Networks
Network representation: connected neurons arranged in different layers
• Input layer (left) containing the input neurons
• Hidden layer(s) (middle)
– Neurons in this layer are neither inputs nor outputs; this is the origin of the term “hidden”
– Number of layers/neurons: in practice, determined by trial and error
– One hidden layer is sufficient for most problems
– Additional layers can be added if they improve performance (networks with 2 or more hidden layers are called deep neural networks)
• Output layer (right) containing the output neurons
[Figure: a network with an input layer, 2 hidden layers and an output layer.]
Data is present all along the insurance value chain
[Figure: the insurance company’s data at the centre, linked to Product, Clients, Financial markets, Claims, Legal, Control, and Performance & continuity.]
• E.g. use of data to understand the market: use web information to study competition and market share and to compare offers.
• E.g. use of data to target clients: segment needs according to client characteristics (location, behaviour, etc.) to propose a relevant insurance offer.
• E.g. use of financial data to predict asset values: historical asset data used to calibrate tools and predict asset market values in order to optimize allocation.
• E.g. use of external legal data: automatic follow-up of regulation updates and regulatory trends.
• E.g. use of data to control internal data quality: this type of control also needs to be put in place on a regular basis.
• E.g. use of data to improve actuarial practices: measure claim frequency and claim amounts to perform actuarial studies and improve pricing and reserving calculations.
Different categories of data and different sources and types of information
• Numerous sources of internal or external data
• The data type differs from one content to another
[Figure: data sources arranged by origin (internal vs. external data) and by structure (structured vs. unstructured data): PDF files, commercial data, websites, purchased databases, mobile data, emails, open data, CRM, model calculations, social media, data warehouse, Word files, sensor data. Data types: number, text, audio, image/video, other. The figure also carries the labels 80% and 20%.]
Structured data: organized and well-characterized data that are easy to use because they are well identified.
• E.g. insurer’s policies and claims data
Unstructured data: non-organized data that are not easy to manipulate and require much preparation (everything else).
It is key for insurance companies to broaden their data collection perspectives to boost the number of data sources available
Machine learning techniques make it possible to handle very large amounts of data and therefore create opportunities for insurance companies to increase the number of features analyzed/used in the pricing and underwriting process
Additional data can be obtained through many different sources:
1. Scraping/parsing techniques: extract information automatically from websites
2. Open data files: structured datasets available to everyone
3. IoT sensor and API technologies: connected objects and application programming interfaces
4. External data providers: ready-to-use datasets for sale
5. Look twice into your own unstructured data: reveal hidden information from core (unused) data
Opportunities and threats of creative data sourcing for insurance pricing
Retail Business
• Significant segmentation usually already in place
• Limited potential for further segmentation
Corporate Business
• Only a few segmentation variables available
• Greater pricing refinement potential
To be put in perspective with the legal environment and regulation
• GDPR has been in place since 2018
• Segmentation criteria must be disclosed
• Anti-discrimination
Insurers are limited in their use of data but opportunities exist to expand it
How to enhance data
Once additional data has been collected, new methods and algorithms make it possible to get the most out of it. Among others:
1. Statistics, ML and feature engineering: create structured datasets from the initial datasets, or charts to understand the data
2. Text mining and NLP: process of examining large collections of written resources, and methods to perform linguistic analysis
3. Image processing: techniques to perform operations on images to enhance their content or extract information
HOW to introduce external and new types of data in the pricing process?
Usual Non-Life Pricing Process
1. Starts with data extractions
2. Followed by some data checks and formatting steps
3. Generalised Linear Model fitting (possibly with a forward or backward selection procedure, but not always…)
[Figure: DB 1, DB 2, DB 3, DB 4 → Data Extraction → Data Manipulation → GLM Modeling → Deployment, with external sources and new types of data as an additional input.]
Should we adjust this process because of new and external data?
Data Scientist and Actuaries view
Enrich the existing database with new attributes/variables
[Figure: DB 1, DB 2, DB 3, DB 4 → Data Extraction & MERGE. Example: DB1 (key A with values 1, 3, 5, 6; variables B, C) is merged with DB2 (key A with values 5, 1, 6, 2, 4, 8; variables D, E, F) into one table with columns A, B, C, D, E, F that keeps the N rows of DB1 (A = 1, 3, 5, 6) and has P variables.]
When external and new data are used to enrich the existing database with new attributes/variables:
• External databases should be merged with the internal ones
• This requires an adequate merging key
As the number of attributes/variables increases:
• Overfitting should be managed (cross-validation, regularization)
• Feature selection and engineering become even more important
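The merge shown on this slide keeps every row of the internal database and adds the external variables where the key matches. A minimal sketch of such a left join is given below, using the key values of the slide's DB1 and DB2; the function name and the placeholder cell values are hypothetical, and in practice a data-frame library would typically do this.

```python
def left_join(internal_rows, external_rows, key):
    """Keep every internal row; add external columns when the key matches."""
    external_by_key = {row[key]: row for row in external_rows}
    external_cols = sorted({c for row in external_rows for c in row if c != key})
    merged = []
    for row in internal_rows:
        match = external_by_key.get(row[key], {})
        enriched = dict(row)
        for col in external_cols:
            enriched[col] = match.get(col)  # None when there is no external match
        merged.append(enriched)
    return merged

# DB1: internal data with key A and variables B, C (placeholder values).
db1 = [{"A": a, "B": f"b{a}", "C": f"c{a}"} for a in (1, 3, 5, 6)]
# DB2: external data with key A and variables D, E, F (placeholder values).
db2 = [{"A": a, "D": f"d{a}", "E": f"e{a}", "F": f"f{a}"} for a in (5, 1, 6, 2, 4, 8)]

merged = left_join(db1, db2, key="A")
row_without_match = next(r for r in merged if r["A"] == 3)
```

The row with A = 3 has no counterpart in DB2, so its new variables are missing. Such gaps are one reason why the quality of the merging key matters.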
Data Scientist and Actuaries view
Additional steps should appear in the pricing process (as in a machine learning approach)
[Figure: DB 1, DB 2, DB 3, DB 4 → Data Extraction & MERGE → Data Manipulation → Feature Engineering and Feature Selection → Modeling with CV & Regularization (statistical or ML methods) → Model Evaluation & Comparison → Deployment.]
Feature engineering/selection: set of methodologies to extract meaningful attributes or features from the raw data
Regularization: algorithms designed to handle databases with a large number of attributes/variables while controlling overfitting
Underwriter and marketing teams
Simplifying the quoting process?
• Replace existing features with new features obtained from an external DB
• Compare models to measure the adequacy/performance of the new features
• As new features come from an external data provider, the UW form can be simplified (e.g. quick quote system using only vehicle plate numbers)
[Figure: two parallel pipelines, one fed by DB 1 and one by DB 2, each going through Data Manipulation, Feature Engineering, Feature Selection and Modeling with CV & Regularization, then meeting in Model Evaluation & Comparison before Deployment.]
Concept of Feature Engineering
Features Engineering & Selection
Feature engineering is widely recognised as a key to success in applied machine learning.
Feature engineering is a representation problem
• Machine learning algorithms learn a solution to a problem from sample data.
• In this context, feature engineering asks: what is the best representation of the sample data to learn a solution to your problem?
[Figure: a table of Age and Frequency values next to a plot of Frequency against Age.]
Feature Engineering vs Model Complexity
Features Engineering & Selection
The results you achieve depend on:
• the model you choose,
• the data you have available,
• and the features you prepared.
The better the features that you prepare and choose, the better the results you will achieve
[Figure: three plots of claim frequency. (1) Linear model (e.g. GLM), Frequency vs. Age: poor model choice and no feature engineering. (2) Non-linear model (e.g. GAM), Frequency vs. Age: more complex model and no feature engineering. (3) New feature (e.g. Age^2), Frequency vs. Age^2: simpler model choice BUT feature engineering.]
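The effect of an engineered feature on a simple model can be illustrated on synthetic data. In the sketch below, the U-shaped frequency curve, the function names and the use of ordinary least squares (instead of a GLM) are all assumptions made to keep the example self-contained; the point is only that adding Age^2 lets a linear model capture a non-linear pattern.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares_sse(features, y):
    """Fit y on the features by ordinary least squares; return the residual SSE."""
    p = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(p)] for i in range(p)]
    xty = [sum(f[i] * yi for f, yi in zip(features, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, f)) for f in features]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Synthetic U-shaped claim frequency: high for young and old drivers.
ages = list(range(20, 85, 5))
frequency = [0.05 + 0.0001 * (a - 50) ** 2 for a in ages]

linear_only = [[1.0, a] for a in ages]               # intercept + Age
with_age_squared = [[1.0, a, a ** 2] for a in ages]  # adds the new feature Age^2

sse_linear = least_squares_sse(linear_only, frequency)
sse_engineered = least_squares_sse(with_age_squared, frequency)
```

The model class is the same in both fits; only the representation of the data changes, which matches the "simpler model BUT feature engineering" case on the slide.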
Example (1/2): Property Theft Insurance
Feature Engineering combined with external data set
Intuition: there is a correlation between the claims frequency and the distance from the highway
• Data available in the company: addresses
• Feature engineering: convert house addresses into distance from the highway
Highway only?
• No, all the roads where the speed limit is above 90 km/h
Determine the point on the highway that is closest to the house.
• We need to know the location of the house on a map
• We need to know the location of the highway on a map
OpenStreetMap
• Gives the roads’ longitude and latitude at different points.
Example (2/2): Property Theft Insurance
Feature Engineering combined with external data set
If we want to find the distance from the house, we need the coordinates of the house.
Google Maps Geocoding API
• Geocoding is the process of converting addresses into geographic coordinates
The Google API is free up to a quota, but throughput is limited:
• 2,500 free requests per day
• 50 requests per second (speed limit)
• Enable pay-as-you-go billing to unlock higher quotas: $0.50 USD / 1000 additional requests, up to 100,000 daily.
Find the distance between the house and the nearest road (above 90 km/h).
• We build a loop that checks whether there is a road within a growing area (a radius growing from 0 to 4000 m in steps of 200 m)
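The growing-radius loop can be sketched as follows, assuming the house has already been geocoded and the fast-road points have already been extracted. The function names and the coordinates are hypothetical, and the great-circle (haversine) distance is an assumption about how distances are measured; no call to a geocoding or map API is made here.

```python
import math

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def distance_band_to_fast_road(house, road_points, max_radius_m=4000, step_m=200):
    """Smallest search radius (multiple of step_m) containing a road point, else None."""
    distances = [haversine_m(house[0], house[1], lat, lon) for lat, lon in road_points]
    for radius in range(0, max_radius_m + step_m, step_m):
        if any(d <= radius for d in distances):
            return radius
    return None

# Hypothetical coordinates: one road point about 1.1 km north of the house,
# and one road point roughly 100 km away.
house = (50.8503, 4.3517)
fast_road_points = [(50.8603, 4.3517), (51.8503, 4.3517)]

band = distance_band_to_fast_road(house, fast_road_points)
no_road_nearby = distance_band_to_fast_road(house, [(51.8503, 4.3517)])
```

The result is a distance band rather than an exact distance, which is often enough for a pricing variable; houses with no fast road within 4000 m fall into a separate "None" category.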
Topics to be covered today
1. Introduction and context
2. How to create competitive advantages with Machine Learning for insurance companies?
3. How to solve the Machine Learning challenges for companies?
a. Machine learning results can be difficult to interpret
b. Data Analytics must be fit for purpose
4. Using machine learning in pricing and underwriting: how to start?
5. Conclusions
Some machine learning techniques are black boxes and interpretation of the results can be quite difficult
Understanding the results of ML techniques is not easy
• In the case of regression trees, understanding how the model predicts claim cost or frequency values for new data points is not a problem, as it is very intuitive.
• In the case of more complex methods such as bagging and random forests, even understanding how the model predicts values for new data points is rather difficult.
• Things may be even worse for GBM and NN.
[Figure: interpretability plotted against complexity.]
Understanding the results of ML models is nevertheless key for sound business decision-making as many stakeholders use the results of the models
Machine learning techniques usually improve predictive power but at the expense of a certain loss of interpretability. Find a trade-off between:
• Predictive power
• Capacity to understand the results
• Ability to take sound decisions based on the results
Quants (actuaries, data scientists, …)
• Able to understand the technical details
• Trust the model’s outputs based on cross-validation, error measures and assessment plots
Other stakeholders
• Not necessarily “quantitative people”
• Should nevertheless understand and trust the results to take decisions
High-end questions
• Who will use the results? For what purpose? With which impact?
How to integrate Machine learning in pricing process?
Feature selection and features engineering
First solution: stick to the existing (e.g. in production) model and use machine learning techniques as guiding tools for
• Feature selection
• Feature engineering
Some Machine Learning methods can produce graphs which help to understand how important a variable is in the prediction, e.g.:
• Random forests
• LASSO (penalized regression)
These graphs can therefore be used as a pre-modeling approach to explore the data and decide which features to select in the model
[Figure: Random Forest importance score]
How to integrate Machine learning in pricing process?
Develop “interpretation tools”
Machine learning techniques usually improve predictive power but at the expense of a certain loss of interpretability.
Some tools can be used to help interpret and understand the results.
For example, with random forest, bagging or boosted tree methods:
• Identification of variable importance (see above)
• Partial dependence plots (1 or 2 variables)
• Residual plots as a function of a variable
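A one-variable partial dependence curve can be computed for any fitted model by fixing the variable of interest at each grid value for every policy and averaging the predictions. The sketch below uses a hypothetical stand-in for a black-box model and a hypothetical four-policy portfolio; the function names are illustrative.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

def black_box_frequency(row):
    """Hypothetical stand-in for a fitted model (e.g. a random forest)."""
    return 0.05 + 0.002 * max(0, 30 - row["age"]) + 0.01 * row["urban"]

# Hypothetical portfolio.
portfolio = [
    {"age": 25, "urban": 1},
    {"age": 40, "urban": 0},
    {"age": 60, "urban": 1},
    {"age": 35, "urban": 0},
]
age_grid = [20, 30, 50]
pd_age = partial_dependence(black_box_frequency, portfolio, "age", age_grid)
```

Plotting the curve against the grid shows how the predicted frequency varies with age once the other variables are averaged over the portfolio, without having to open the black box.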
Reacfin Breakfast - Machine learning: Challenges and opportunities for non-life pricing and underwriting
1. Introduction and context
2. How to create competitive advantages with Machine Learning for insurance companies?
3. How to solve the Machine Learning challenges for companies?
4. Using machine learning in pricing and underwritting: how to start?
a. Profitability analysis
b. Competition Analysis
c. Policyholder’s Behaviour
5. Conclusions
Topics to be covered today
Commercial Pricing is a permanent multidimensional optimization process under complex constraints where segmentation plays a crucial role
A. Technical pricing: portfolio profitability by segment, based on:
• Cost of risk (e.g. measured through GLMs or ML techniques)
• Portfolio composition (representativeness of each segment: total portfolio and recent new business)
B. Competition: competitor prices by segment and own current rates
• Position the insurer to be more or less competitive on certain segments
C. Client behavior: customer behavior by segment
• Elasticity models help estimate the pace at which rates can be increased by segment
• Focus Sales & Marketing on increasing retention of better risks
• Build conversion rate models to better target clients
D. Segmentation: segmentation and pricing variables
• Greater segmentation for greater risk selectivity and higher profitability
• Monitor concentrations of certain risk types
E. Scenario testing and optimisation: impact of different scenarios on strategic indicators and optimization
[Diagram linking blocks A to E, with arrow labels "Constrains rates of existing portfolio", "Constrains rates of new production" and "Aligned segments"]
A. Profitability analysis tool
Regression tree techniques can be used to compare the risk premium and the commercial premium
Regression trees make it possible to identify the variables that best explain the differences between the risk premium and the current commercial premium
• This helps define the most relevant variables, which can then, for example, be included in a profitability heatmap
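The first step of such a regression tree can be sketched as a search for the single split that best explains the margin, defined here as the commercial premium minus the risk premium. This is only the root split of a tree, not a full tree algorithm, and the policies, column names and figures are invented for the illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (feature, threshold, sse_reduction) of the best single binary split."""
    base = sse([row[target] for row in rows])
    best = (None, None, 0.0)
    for feature in features:
        levels = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for low, high in zip(levels, levels[1:]):
            threshold = (low + high) / 2
            left = [row[target] for row in rows if row[feature] <= threshold]
            right = [row[target] for row in rows if row[feature] > threshold]
            gain = base - sse(left) - sse(right)
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best


# Assumed toy portfolio: (driver_age, vehicle_power, risk_premium, commercial_premium).
raw = [
    (20, 60, 550, 500), (22, 100, 548, 500), (25, 80, 552, 500), (28, 120, 550, 500),
    (35, 60, 470, 500), (45, 100, 468, 500), (55, 80, 472, 500), (65, 120, 470, 500),
]
policies = [
    {"driver_age": age, "vehicle_power": power, "margin": float(commercial - risk)}
    for age, power, risk, commercial in raw
]

feature, threshold, gain = best_split(policies, "margin", ["driver_age", "vehicle_power"])
print(f"Margin is best explained by {feature} split at {threshold} (SSE reduction {gain})")
```

In this toy data the split falls on driver age, which would make it a natural axis for a profitability heatmap. A real tree would repeat the search recursively within each branch.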
B. Competition analysis tool
Regression trees can be used to identify positioning on market segments and capture price differences
Identifying the segments in which the insurance company is well positioned with respect to its competitors is an important driver of a dynamic pricing process, e.g. clustering of segments according to the ranking among competitors with regression trees
Analyze the price dispersion of the company with respect to its competitors or to the average market price
Reverse engineering of the pricing (structure) of competitors can be enhanced with ML techniques
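Before any tree is fitted, the positioning indicators themselves have to be computed. The sketch below derives, per segment, the insurer's price rank and the ratio of its price to the competitor average. The segment names, competitor names and quotes are hypothetical, and defining the market price as the average of competitor quotes (excluding the insurer's own) is an assumption of this example.

```python
def positioning(quotes, own="own"):
    """Return {segment: (rank, ratio_to_competitor_average)}; rank 1 is the cheapest."""
    result = {}
    for segment, prices in quotes.items():
        competitor_prices = [price for name, price in prices.items() if name != own]
        market_average = sum(competitor_prices) / len(competitor_prices)
        # Rank = 1 + number of competitors quoting strictly below our own price.
        rank = 1 + sum(price < prices[own] for price in competitor_prices)
        result[segment] = (rank, prices[own] / market_average)
    return result


# Assumed annual premium quotes per segment.
quotes = {
    "young_urban": {"own": 900.0, "comp_a": 800.0, "comp_b": 850.0},
    "senior_rural": {"own": 300.0, "comp_a": 350.0, "comp_b": 400.0},
}

position = positioning(quotes)
for segment, (rank, ratio) in position.items():
    print(f"{segment}: rank {rank}, price at {ratio:.0%} of competitor average")
```

The rank or the ratio can then serve as the target of a regression tree on the rating factors, so that segments with similar competitive positioning are grouped together.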
C. Client behavior
ML techniques can help improve the logistic regression
The goal is to explain the conversion / lapse probabilities with some explanatory variables
A dummy variable identifies the policies that were converted / renewed during the year
Traditionally, Generalized Linear Models are used
– E.g. a logistic regression can be performed on this dummy variable and potential explanatory variables
$$\ln\frac{\pi(x_1,\dots,x_n)}{1-\pi(x_1,\dots,x_n)} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$
Machine learning techniques (e.g. GBM) are increasingly used, as they usually improve predictions and can capture more complex patterns
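As a minimal sketch of the logistic regression baseline, the code below fits the coefficients by gradient ascent on the mean log-likelihood, using the standard library only. The eight policies, the single explanatory variable (a flag for a premium increase at renewal), the learning rate and the number of iterations are all assumptions made for this illustration.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, learning_rate=1.0, n_iter=2000):
    """Fit beta by gradient ascent on the mean log-likelihood.

    Each row of X must start with 1.0 so that beta[0] is the intercept.
    """
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(n_iter):
        gradient = [0.0] * len(beta)
        for row, target in zip(X, y):
            error = target - sigmoid(sum(b * x for b, x in zip(beta, row)))
            for j, x in enumerate(row):
                gradient[j] += error * x / n
        beta = [b + learning_rate * g for b, g in zip(beta, gradient)]
    return beta


# Assumed data: x = 1 if the renewal premium was increased, y = 1 if the policy lapsed.
price_increase = [0, 0, 0, 0, 1, 1, 1, 1]
lapsed = [1, 0, 0, 0, 1, 1, 1, 0]
X = [[1.0, float(flag)] for flag in price_increase]

beta = fit_logistic(X, lapsed)
p_no_increase = sigmoid(beta[0])
p_increase = sigmoid(beta[0] + beta[1])
print(f"beta0 = {beta[0]:.4f}, beta1 = {beta[1]:.4f}")
print(f"Lapse probability: {p_no_increase:.3f} without increase, {p_increase:.3f} with increase")
```

With a single binary variable the fitted probabilities reproduce the observed lapse rates of each group (1 in 4 and 3 in 4). A technique such as GBM would be fitted on the same dummy variable and compared with this baseline on held-out data.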
Conclusions
Pricing environment
• Competition will increase
• More data will be available
• Frequency of tariff review will increase
Future of technical price
• Pricing will no longer rely only on GLMs but on a set of algorithms/methods
• Insurers should be able to:
– Understand and use these methods adequately
– Apply, compare and deploy these models rapidly
Automation required
• Insurers can no longer afford a process that takes 6 months to deploy a new tariff
• Automation of the pricing process is one of the key success factors
Reacfin’s support
Efficient solutions aligned with your best interest
What we do…
• Feasibility assessments and defining the solution
• Training on statistical models and machine learning techniques with hands-on exercises in R or Python
• Technical pricing (model development and improvement)
• Support for commercial pricing (competition analysis, conversion models, lapse models, …)
• Implementation in open-source (R, Python) and proprietary (SAS, Emblem, …) software
– Existing tools for pricing, dispersion analysis and competition analysis
… and why you should consider it
• Experienced staff with hands-on knowledge and proven track record, incl.
– Large industry experience
– Training capabilities in Belgium and abroad
– Sound knowledge of the products
• Extensive training material (methodologies, exercises and case studies) through various channels (e-learning, slides, notebooks, open source code)
• Largely networked within the industry
• Working in your best interest as independent consultant
Helping you develop best market practices at an affordable price
Contact details
Place de l’Université 25, B-1348 Louvain-la-Neuve (Belgium)
T +32 (0) 10 84 07 50
www.reacfin.com
Samuel Mahy
Director – Head of Non-Life
M +32 498 04 23 [email protected]
Xavier Maréchal
CEO
M +32 497 48 98 [email protected]
Disclaimer:
The recipient of this document should treat all information as strictly confidential and only in the context stated below. Information may not be disclosed to any third party without the prior joint consent of Reacfin.
Estimates given in this presentation are based on our current knowledge. They may draw on our previous experience within the Undertaking as well as on similar projects in the same context as the Undertaking, either locally, within the majority of EU countries, or overseas.
This presentation is only the supporting document of a verbal presentation. Hence, it is not intended to be exhaustive. Quoting or using this document on its own might be misleading. As a result, these materials may not be used by anybody except their authors, nor should they be relied upon in any way for any purpose other than as contemplated by joint written agreement with Reacfin.