Big Data & Data Analytics in Reinsurance: Analysis of Structured and Unstructured Data

DGVFM continuing education day (Weiterbildungstag der DGVFM), Hannover, 16 June 2016

Wolfgang Hauner, Head of Data Analytics

Image: Bayerische Zugspitzbahn Bergbahn AG / Lechner



Agenda

1 Data Analytics Framework
2 Technology Stack
3 The People behind Big Data & Analytics
4 Current / Planned Activities
5 Advanced Analytics: Methods and Examples
6 Conclusion

1 Data Analytics Framework

Trend Radar: Correlation Map

Trend: A trend describes any kind of future development with an impact on one specific business. Trends tend to influence and drive each other. Example of a trend: Big Data

Sub-trend: A sub-trend specifies one potential form in which a trend appears. Example of a sub-trend: Predictive Analytics

Computing Everywhere

Virtual Assistant Systems

Risk-based Security

Haptic Technologies

Integrated Systems

Autonomous Systems and Devices

Collaborative Consumption

Automated Decision Taking

Predictive Analytics

Industrialization 4.0

Cloud/Client Architecture

Software-defined Anything

Mobile Health Services

3D Printing

Augmented and virtual worlds

Citizen Development

User Centered Design

New Payment Models

Digital Identity

On-Demand-Everything

Wearable Devices

Robotics/Drones

Smart Home

Location-based services

Big Data

Internet of Things

Cybersecurity

Digitalization

Context-aware Computing

Open Data

Web 4.0

Web-Scale IT

Telematics

Crowdsourcing


Leverage data knowledge

Center of Competence Knowledge / Technology

Link to Innovation Topics

Build Analytics Community

Generate business value through data analytics

Structured vs. unstructured data

Structured

Unstructured (majority of data, will increase massively)

Advanced Analytics is our core competence

Predictive Modeling: regression and machine learning models
• Predict outcomes based on certain explaining factors
• Predict claimed amount per individual based on age, gender, disease etc.

Multivariate Effect Analysis: regression and machine learning models
• Figure out how certain factors influence a target variable
• Influence of age, gender, disease etc. on claimed amount

Explorative Analysis: advanced visualization techniques
• Dynamically visualize data of any size
• Get a quick and easy overview of data by using heatmaps, correlation matrices etc.

Textmining: singular value decomposition, natural language processing
• Convert large amounts of unstructured text into a structured form
• Extract key topics from claims reports

Provide expert support in all analytical areas

How Data Analytics supports new business opportunities

Data sources
• Internal data: underwriting data, claims data, …
• External data: customer data, sensor data, …
• Combination of internal and external data

Leading edge technology
• Platforms: SAP HANA, SAS HPA, …
• Open source tools: Hadoop, Hive, Spark, …
• Infrastructure management: Ambari, git, …

Advanced analytics skills
• Explorative analysis
• Textmining
• Multivariate effect analysis
• Predictive modeling

Client management remains the central entry point for all services and solutions

Which topics drive our clients?

Up-/Cross-Selling

Data Sources

Textmining

Churn Analysis

Wearables

Social Media Analysis

Fraud Detection

Big Data Technology

Predictive UW

Telematics

Sensor Data

Geospatial

2 Technology Stack

Strategic landscape

Munich Re uses leading edge software technology

Main cooperation with leading vendors like SAP, Microsoft, SAS and Hortonworks


Example - SAS High Performance Analytics Platform

Provides cutting-edge technologies
Superior technical data analytics will be required to stand up to the demands of the future (Digitalization, Industry 4.0, Big Data, etc.) and to sustain our RI knowledge leadership.

Usable for the whole reinsurance group including PIRI units, Munich Health and MEAG
This enables innovative work in decentralized locations where access to these technologies might otherwise be missing, and eases collaboration between the analytical units (sharing analytical methods and data sources).

Runs analytical computations in a distributed, in-memory environment
This enables our experts to quickly prepare, explore and model multiple scenarios using data volumes never before possible. Accurate and rapid insights are delivered in near-real time (typically in minutes rather than hours).

Fully compatible with our existing SAP HANA environment
The SAS HPA platform leverages the power of the SAP HANA in-memory database and of Hadoop, which can store huge volumes of data.

Generates business value for our reinsurance and PIRI clients
By combining clients' data with our own insights and expertise, we can provide additional tangible value to our clients.

Key Features of the High Performance Platform for Big Data Analysis @ MR

Data Integration
• Fast (and easy-to-implement) access to Hadoop & HANA (parallelized "In-Memory Lift")
• Easily combine with other data sources (Oracle, MySQL, … or even XLS!)
• Pass-through of queries & analysis to these external databases (only the result is returned)

Descriptive and explorative Analysis
• Point & click frontend
• In-memory processing of analysis

State-of-the-art Data Mining
• Statistical models & machine learning algorithms for prediction, time series forecasting, clustering, optimization, text mining, etc.
• High-performance, i.e. in-memory & in-parallel

Data Lake

3 The People behind Big Data & Analytics

Typical setup


Data Scientists and Data Engineers

Data Scientist
• Statistics / Machine Learning
• Programming (e.g. R, Python, SAS, SQL, Hadoop tools [Pig, Hive, Spark, …])
• Visualization
• Business/domain knowledge
• Soft skills (creative, curious, communicative, collaborative, critical)

Data Engineer
• Strong software engineering (SAS HPA, Hadoop, Java, HTML, JavaScript, etc.) and database skills
• Provide data infrastructure / architecture, e.g. Data Lake
• Seamless integration of different systems
• Provision of user interfaces and application programming interfaces for data scientists etc.
• Collection, processing and maintenance of data

4 Current / Planned Activities

Pilot Fact Sheet: Risk Management Platform for NatCat Pools

Results / Benefits
• Easier, faster and cheaper information about exposure and losses
• Transparency on all exposures and losses independent of participation and risk assessment
• Database on loss events allows in-depth trend detection
• Faster & more effective claims management

Pilot Fact Sheet: Early Loss Detection

Current Scope: Property loss events (fire) in USA & UK

Results / Benefits
• Easier, faster and cheaper information about insurance relevant loss events
• Matching with risk data portfolio possible at the same time
• Transparency on all losses independent of participation and risk assessment
• Database on loss events allows in-depth trend detection
• Faster & more effective claims management

Outlook
• Further extension: Casualty (especially D&O) and additional new markets for property losses (Spain and Latin America) in new pilots

Pilot Fact Sheet: Internet Research & Intelligence System (IRIS)

Results / Benefits
• Multi-dimensional searches based on standardized search technology to accelerate web research (example Tianjin)
• Extended analytics to gather further data insights, e.g., based on topic analysis and organizational grouping
• Parallel processing and delta mechanism for multi-processed search requests
• Results shown in different visualizations (word cloud, table, topic analysis, etc.) and exportable to Excel

Outlook
• Additional analytics modules for better insights and broader application
• Collaboration functionalities for more efficient case analysis

5 Advanced Analytics: Methods and Examples

Methods overview

Explorative Analysis: Get overview on data characteristics
• Cross tables
• Visualizations
• …

Predictive Analytics: Predict outcome based on explaining factors
• Linear regression
• Logistic regression
• Decision tree
• Random forest
• Neural network
• … and many more

TextMining: Analyze unstructured text
• Topic detection
• Sentiment analysis
• …
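As a rough illustration of the text mining idea (turning unstructured text into a structured form), the following Python sketch counts terms per document and ranks the most frequent ones. The claim notes, the stop-word list and all function names are invented for illustration; real topic detection would go well beyond simple counting.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "in", "of", "and", "to", "was", "by"}

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_counts(documents):
    """Turn free text into a structured form: one term-count table per document."""
    return [Counter(tokenize(doc)) for doc in documents]

def top_terms(documents, n=3):
    """Rank terms by total frequency across all documents."""
    total = Counter()
    for counts in term_counts(documents):
        total.update(counts)
    return [term for term, _ in total.most_common(n)]

# Invented claim notes, for illustration only.
claims = [
    "Fire in the warehouse, water damage to stock",
    "Water damage caused by burst pipe",
    "Fire damage in the kitchen",
]
print(top_terms(claims, n=3))
```

The per-document counters form a simple document-term table, which is the kind of structured input that later analysis steps can work on.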


Methods - Decision Tree

Classification by recursive partition of observations

Each branch answers a classification question

"Leaves" are partitions of the dataset with their classification

Pros: Comprehensible, helps identify discriminatory variables

Cons: Tends to "overfit" training data (remedy: pruning). Small changes in the data can lead to completely different trees.

Example: Identify potential customers for new "Exotic Car Insurance"

Tree diagram: first split "Sex = Male", second split "Age > 25"; leaves "No Action", "No Action" and "Send Mailing" (branches labelled yes/no)
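The recursive partitioning described on this slide can be sketched in a few lines of Python. The example searches for the single yes/no question that best separates a toy data set; a full tree would repeat this step on each resulting partition. The customer records, their labels and the use of Gini impurity as the split criterion are assumptions for illustration and do not reproduce the tree on the slide.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(rows, question):
    """Weighted Gini impurity after splitting rows by a yes/no question."""
    yes = [r["label"] for r in rows if question(r)]
    no = [r["label"] for r in rows if not question(r)]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

def best_question(rows, questions):
    """Return the name of the question whose split is purest."""
    return min(questions, key=lambda name: split_impurity(rows, questions[name]))

# Fictitious customers; "label" is the action the tree should learn.
customers = [
    {"sex": "male", "age": 30, "label": "mail"},
    {"sex": "male", "age": 45, "label": "mail"},
    {"sex": "male", "age": 22, "label": "none"},
    {"sex": "female", "age": 35, "label": "none"},
    {"sex": "female", "age": 52, "label": "none"},
    {"sex": "female", "age": 23, "label": "none"},
]
questions = {
    "sex = male": lambda r: r["sex"] == "male",
    "age > 25": lambda r: r["age"] > 25,
}
print(best_question(customers, questions))
```

Because the best question depends on the data, replacing a few records can change which split is chosen first, which is the instability mentioned under the cons.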


Methods - Random Forest

Ensemble learning method

Combines results from multiple decision trees (majority vote)

Reduces "overfitting" compared with a single tree

Learns and predicts "fast"

Gives variable importance (variable selection)

Classical random forest favors categorical variables with a large number of values
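The majority vote over several trees can be illustrated with bootstrap aggregation, the core idea behind a random forest. In this minimal sketch each "tree" is a one-split stump trained on a bootstrap sample, and the forest predicts by majority vote. The data, the function names and the default parameters are assumptions; with only one input variable this reduces to plain bagging, whereas a full random forest also draws a random subset of variables at each split.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(sample):
    """Fit a one-split tree on (x, label) pairs: choose the threshold with fewest errors."""
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        left_pred = majority(left)
        right_pred = majority(right) if right else left_pred
        errors = sum(y != left_pred for y in left) + sum(y != right_pred for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_pred, right_pred)
    _, t, left_pred, right_pred = best
    return lambda x: left_pred if x <= t else right_pred

def fit_forest(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict(forest, x):
    """Combine the trees by majority vote."""
    return majority([tree(x) for tree in forest])

# Fictitious one-dimensional data: label 1 for large x, label 0 for small x.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
forest = fit_forest(data)
print(predict(forest, 1), predict(forest, 9))
```

Individual stumps differ because each sees a different bootstrap sample; the vote averages out those differences.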


Methods - Neural Network / Artificial Intelligence

Common underlying "elementary hardware": the neuron

Complexity in the "learned" connectivity structure

Hierarchical multi-layer system

Layer 3 (high level structures)

Layer 2 (medium level structures)

Layer 1 (low level structures)

Methods - Neural Network

System of interconnected nodes, exchanging information

Weights of connections can be adjusted by supervised/unsupervised "learning"

Pros: Accuracy usually high, prediction fast

Cons: "Black box" - acquired knowledge not easily comprehensible, training effort high, appropriate data needed

Application areas e.g. speech recognition, computer vision, medical diagnosis, automated trading, game-playing (AlphaGo)

Diagram: Input layer, Hidden layer, Output layer
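The input / hidden / output structure can be made concrete with a tiny forward pass in Python. The network below has two inputs, two hidden neurons and one output neuron. Its weights are hand-picked so that it computes XOR; this is an assumption for illustration, since in practice the weights would be adjusted by learning from data.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its input is positive."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# Hand-picked weights (an assumption for illustration; normally learned from data).
HIDDEN_W = [[1, 1], [1, 1]]   # neuron 1 acts like OR, neuron 2 like AND
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1, -1]]          # fires for "OR but not AND", i.e. XOR
OUTPUT_B = [-0.5]

def forward(x1, x2):
    """Pass the input through the hidden layer and then the output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward(a, b))
```

No single neuron of this kind can compute XOR on its own; the behaviour comes from how the neurons are connected, which matches the point about complexity residing in the connectivity structure.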


Methods - Neural Network: Insurance specific Visual Intelligence

AI Community, e.g. Google, Facebook …: General Object Vision Intelligence

Insurance Companies, e.g. Munich Re …: Insurance specific Vision Intelligence

Example 1 – Potential use cases of Neural Networks: Infrastructure Insurance

Detect road damage

Categorize damage

Estimate claim

Trigger repair action

Example 1 – Potential use cases of Neural Networks: Car Insurance

First Damage Estimation, Repair step suggestion

Accident Situation Assistant

Example 2 – Explorative Analysis: SAS Visual Analytics

Ad hoc exploration of data

Intuitive and easy to use visualization techniques

Example: Nat Cat events in Italy

Applied to loss estimates from 1980 to 2015 provided by Munich Re's Nat Cat Service

In Italy, the most frequent Nat Cat events are hydrological, the most fatal are climatological (driven by the 2003 heat wave) and the most expensive are geophysical
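A cross table of this kind can be sketched without any special tooling. The Python example below aggregates event records by type and reports which type leads in frequency, fatalities and losses. All records are invented and merely chosen to mimic the pattern described for Italy; they are not Munich Re data, and the function names are hypothetical.

```python
from collections import defaultdict

# Fictitious event records: (event type, fatalities, loss in million euro).
events = [
    ("hydrological", 2, 100),
    ("hydrological", 0, 50),
    ("hydrological", 1, 80),
    ("climatological", 40, 30),
    ("geophysical", 5, 900),
    ("geophysical", 3, 400),
]

def summarize(records):
    """Aggregate count, fatalities and losses per event type."""
    table = defaultdict(lambda: {"count": 0, "fatalities": 0, "loss": 0})
    for kind, fatalities, loss in records:
        row = table[kind]
        row["count"] += 1
        row["fatalities"] += fatalities
        row["loss"] += loss
    return dict(table)

def leader(table, column):
    """Event type with the largest value in the given column."""
    return max(table, key=lambda kind: table[kind][column])

table = summarize(events)
for column in ("count", "fatalities", "loss"):
    print(column, leader(table, column))
```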


Example 3 – Multivariate Effect Analysis: Interpretation of effects – Model for survival analysis

TreeBoosting: Most popular algorithm in prediction competitions ("kaggle.com")

Applied to laboratory data comprising 76 input variables

Automatic selection of effects

Clear visualization of main and interaction effects

Mortality increases with liver score and decreases with albumin

Multiplier effect in case of high liver and low albumin score

Machine Learning helps in understanding and selecting the most relevant influential factors

Example 4 – Multivariate Effect Analysis: Statistical model – Insurance product hot spot analysis

Results / Benefits
• Provides an accurate overview of an insurance product's sales structure
• Blue regions represent a low probability of purchasing the insurance product, while purple regions represent the hotspots
• Various factors like age, gender, occupation, place of residence, etc. can be taken into account for the model creation
• Supports resource allocation of sales forces

Outlook
• Extend the model's application area to any lines of business
• Analysis of customer needs

Example 5 – Analysis of MR’s bodily injury database (BID) using Decision Trees / Random Forests

The BID comprises bodily injury claims known by Munich Re which exceeded a certain total loss amount.

The BID contains the following information:

Technical: date of accident, amount per head of damage, mortality table, …

Medical: degree of invalidity (DoI), diagnosis,…

Socio-ecological: marital status, occupation, gender, date of birth,…

The analysis was done in R, an open-source software for statistical analysis


Example 5 - Decision trees are a vivid tool for estimating a target variable depending on various factors

Target variable: Incurred amount

Explaining variables: Age, DoI, Gender, …

Decision Tree (fictitious example, mean incurred amount per node)

All claims: 5 mn€
    Age < 15: 7 mn€
        DoI < 73: 6.5 mn€
        DoI >= 73: 8 mn€
    Age >= 15: 3.5 mn€
        …

Result: Decision trees offer a data-driven approach to automatically modeling complex structures and identifying the most important factors

Example 5 - Random forests allow the variable importance to be calculated

The age at accident is by far the most important factor according to the random forest

DoI, season of event and number of injured are also quite important factors

Chart: Variable importance according to the random forest (additional squared error if the model is calculated without the corresponding factor)
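The importance measure described here (additional squared error when a factor is left out) can be sketched as follows. For brevity the sketch uses a simple stand-in model that predicts the mean amount within each combination of factors rather than a random forest, and it refits the model without each factor in turn. The claims data and function names are invented; random forest implementations typically estimate importance by permuting a factor instead of refitting.

```python
from collections import defaultdict

def sse_of_group_means(rows, factors, target):
    """Squared error of a model that predicts the mean target within each
    combination of the given factors."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in factors)].append(row[target])
    total = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total

def importance(rows, factors, target):
    """Additional squared error when the model is refitted without each factor."""
    full = sse_of_group_means(rows, factors, target)
    return {
        f: sse_of_group_means(rows, [g for g in factors if g != f], target) - full
        for f in factors
    }

# Fictitious claims: amount in million euro, driven mostly by the age group.
claims = [
    {"age": "young", "season": "summer", "amount": 8.0},
    {"age": "young", "season": "winter", "amount": 7.0},
    {"age": "old", "season": "summer", "amount": 4.0},
    {"age": "old", "season": "winter", "amount": 3.0},
]
print(importance(claims, ["age", "season"], "amount"))
```

In this toy data, dropping the age group costs far more accuracy than dropping the season, so age ranks as the more important factor.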

6 Conclusion

Big Data heavily challenges the current and future insurance markets

Data Analytics and Artificial Intelligence enable easier, faster, cheaper and deeper insights

Munich Re already uses leading edge technologies and analytical skills for clients and itself

Existing data sources often contain highly valuable information

Using these new insights enhances market position and opens up new business potential

And last but not least: Very attractive job opportunities for actuaries / quants, with current demand exceeding supply by far

Data Analytics improves Underwriting, Client Management and Claims Management

Thank you very much for your attention


© 2016 Münchener Rückversicherungs-Gesellschaft © 2016 Munich Reinsurance Company