data science for cyber risk

55
Data Science for Cyber Risk : Measurement, Methods, and Models drs. Scott Allen Mongeau Data Scientist Cyber Security February 2017

Upload: scott-allen-mongeau

Post on 21-Jan-2018

174 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Data Science for Cyber Risk

Data Science for Cyber Risk:

Measurement, Methods, and Models

drs. Scott Allen Mongeau

Data Scientist

Cyber Security

February 2017

Page 2: Data Science for Cyber Risk

2

The views expressed in the following material are the

author’s and do not necessarily represent the views of

the Global Association of Risk Professionals (GARP),

its Membership or its Management.

Page 3: Data Science for Cyber Risk

3 | © 2014 Global Association of Risk Professionals. All rights reserved.

Scott Allen Mongeau

scott.mongeau@

sas.com

06 837 030 97

Scott

Mongeau

Data Scientist

Cyber Security

Experience

• SAS InstituteData Scientist

• DeloitteManager Analytics

• Nyenrode UniversityLecturer Analytics

• SARK7 Analytics Owner / Principal Consultant

• Genentech Inc. / Roche Principal Analyst / Sr Manager

• AtradiusSr. R&D Engineer

• CFSICIO

Education

• PhD (ABD)

• MBA (OneMBA)

• Masters Financial Management

• Certificate Finance

• GD IT Management

• Masters Computer &Communications Technology

LinkedIn

Twitter

Blog

YouTube

• Introduction to Advanced Analytics

• Introduction to Cognitive Analytics

• TedX RSM: Data Analytics

Page 4: Data Science for Cyber Risk

4 | © 2014 Global Association of Risk Professionals. All rights reserved.

Cyber Risk: A Measurement Challenge

• Context setting

• Cyber risk measurement

• What is data analytics / data science?

• What methods and technologies are involved?

• Experience and learnings from the field

• Actionable insights

• Emerging methods

• Trends and opportunities

Page 5: Data Science for Cyber Risk

CYBER RISKS

Page 6: Data Science for Cyber Risk

6 | © 2014 Global Association of Risk Professionals. All rights reserved.

Moore’s Law: Exponential growth of computing power

6

25,000 x

Home

computers

High-capacity

servers

Smartphone

explosion

Cloud, AI /

Watson, IoT

2015

Page 7: Data Science for Cyber Risk

7 | © 2014 Global Association of Risk Professionals. All rights reserved.7

• Complexity of systems

• Increasing access

• Volume of data

• BYOD

• VMs / containers

• IoT / smart devices

• ICS SCADA

Page 8: Data Science for Cyber Risk

Copyr i g ht © 2015, SAS Ins t i tu t e Inc . A l l r ights reser ve d .

Anatomy of a sophisticated Cyber Attack

Customer

Data

Weakness in supply chain is used to gain

access to your network

Credentials of supplier compromised due to

poor security implementationMimic known “service accounts” to avoid

host-based detection

Compromised machine begins to perform

active network reconnaissanceA command and control point is established

on the network, with end nodes being the POS

Install BlackPOS malware targeted POS

systemsExfiltration of customer data via multiple

servers and monetization on black market

POS POSPOS

Page 9: Data Science for Cyber Risk

9 | © 2014 Global Association of Risk Professionals. All rights reserved.9

Darknet

Deep Web

Internet

Page 10: Data Science for Cyber Risk

10 | © 2014 Global Association of Risk Professionals. All rights reserved.10

“There’s a lot of talk about nations trying to

attack us, but we are in a situation where we

are vulnerable to an army of 14-year-olds who

have two weeks’ training”

- Roel Schouwenberg

Senior Researcher, Kaspersky Labhttp://spectrum.ieee.org/telecom/security/the-real-story-of-stuxnet

Page 11: Data Science for Cyber Risk

11 | © 2014 Global Association of Risk Professionals. All rights reserved.11

Page 12: Data Science for Cyber Risk

12 | © 2014 Global Association of Risk Professionals. All rights reserved.

CYBER RISK MANAGEMENT

Page 13: Data Science for Cyber Risk

13 | © 2014 Global Association of Risk Professionals. All rights reserved.

Source: Gartner. 2015. Agenda Overview for Banking and Investment Services.

Competitive pressures… e.g. ‘digital banking’

Page 14: Data Science for Cyber Risk

14 | © 2014 Global Association of Risk Professionals. All rights reserved.

Optimizing Accessibility Versus Exposure

Pa

rtn

erin

g fo

r C

yb

er

Re

sili

en

ce: To

wa

rds th

e Q

ua

ntification o

f C

yb

er

Th

rea

ts

WE

F r

ep

ort

in c

olla

bora

tion w

ith

Delo

itte

:

htt

p:/

/ww

w3

.we

foru

m.o

rg/d

ocs/W

EF

US

A_Q

uantification

ofC

yb

erT

hre

ats

_R

ep

ort

20

15.p

df

Page 15: Data Science for Cyber Risk

15 | © 2014 Global Association of Risk Professionals. All rights reserved.

Data Analytics => Measurement

Partnering for Cyber Resilience: Towards the Quantification of Cyber Threats

WEF report in collaboration with Deloitte:

http://www3.weforum.org/docs/WEFUSA_QuantificationofCyberThreats_Report2015.pdf

Page 16: Data Science for Cyber Risk

16 | © 2014 Global Association of Risk Professionals. All rights reserved.

Data Science => Measurement

Advancing Cyber Resilience Principles and Tools for Boards

http://www3.weforum.org/docs/IP/2017/Adv_Cyber_Resilience_Principles-Tools.pdf

Reducing uncertainty:

‘scientific’ measurement and analysis

Page 17: Data Science for Cyber Risk

17 | © 2014 Global Association of Risk Professionals. All rights reserved.

CYBER RISK MEASUREMENT

Page 18: Data Science for Cyber Risk

18 | © 2014 Global Association of Risk Professionals. All rights reserved.

Many data sources… increasing data volume

Source: Cyber Security Solutions, 2014.

Page 19: Data Science for Cyber Risk

19 | © 2014 Global Association of Risk Professionals. All rights reserved.

demographics historical

accounts

transactions

bytes sent

protocols

devices

locationsuserid

IP addresses

threat tracking

Linking and Managing ‘Big’ Cyber Data

authentication

SIEM stream

Page 20: Data Science for Cyber Risk

20 | © 2014 Global Association of Risk Professionals. All rights reserved.

Contextually Enriched, Priority Ranked Security Alerts

Stream Processing

and

Analytics

Firewalls, IPS, IDS, Malware,

Web Proxy Logs, DLP, SIEM

Firewalls, IPS, IDS, Malware,

Web Proxy Logs, DLP, SIEM

Cyber Data Types and Monthly Volumes

PCAP

Trillions

FLOW

Billions

POINT

ALERTS

Millions

Thousands

?

Page 21: Data Science for Cyber Risk

Copyr i g ht © 2015, SAS Ins t i tu t e Inc . A l l r ights reser ve d .

Firewalls & Intrusion

Protection

Identity

Management

Data Loss

Prevention

X

Malware

Sandboxing

Vulnerability

Scanning

!

Encryption

Endpoint Protection,

Detection & Response

Web & Email

Gateways

THREATS

10,000 Alerts Daily

250 Reviewed by Analysts

30 Fully Investigated

SIEM / MSSP

In Search of: Targeted, Relevant, Actionable Alerts…

Page 22: Data Science for Cyber Risk

22 | © 2014 Global Association of Risk Professionals. All rights reserved.

CHALLENGES CYBER DATA SCIENCE

Disconnected &

low quality data

High false positive alerts

Unknown unknowns –

no baseline

Managing and

rationalizing data

Focused insights from

Big Data

Diagnostics for

understanding ‘normal’

Data overload

Targeted alerts based on

anomalies

Slow and manual

investigation processes

Machine learning identifies

hidden patterns

Addressing Challenges: Cyber Risk Analytics

Page 23: Data Science for Cyber Risk

23 | © 2014 Global Association of Risk Professionals. All rights reserved.

Data Science?

Page 24: Data Science for Cyber Risk

24 | © 2014 Global Association of Risk Professionals. All rights reserved.

Data Science / Data Analytics:

An Interdisciplinary Practitioner Field

Wik

iboo

ks

Cre

ative

Co

mm

ons

htt

ps://e

n.w

ikib

oo

ks.o

rg/w

iki/D

ata

_S

cie

nce

:_A

n_

Intr

odu

ctio

n/A

_M

ash

-up_

of_

Dis

cip

lines

htt

p://c

reative

com

mo

ns.o

rg/lic

ense

s/b

y-n

c-s

a/3

.0/

Page 25: Data Science for Cyber Risk

25 | © 2014 Global Association of Risk Professionals. All rights reserved.

VALUE

SO

PH

IST

ICA

TIO

N

DESCRIPTIVE

PREDICTIVE

PRESCRIPTIVE

What has

happened?

What will

happen?

How to optimize?

Page 26: Data Science for Cyber Risk

26 | © 2014 Global Association of Risk Professionals. All rights reserved.

VALUE

SO

PH

IST

ICA

TIO

N

DESCRIPTIVE

PREDICTIVE

PRESCRIPTIVE

Business

Intelligence (BI)

Econometrics

Financial Analysis

Machine Learning

Operations

Management

Page 27: Data Science for Cyber Risk

27 | © 2014 Global Association of Risk Professionals. All rights reserved.

business valueTransactional

an

aly

tic

s m

atu

rity

Strategic

DESCRIPTIVE

DIAGNOSTICS

PREDICTIVE

PRESCRIPTIVE

Identifying

Factors & Causes

Asp

irati

on

al

Tra

nsfo

rme

dOptimizing

Systems

Understanding

Social Context

& Meaning

SEMANTICData

visualization

DATA QUALITY

Business

Intelligence

Understanding

Patterns

Forecasting &

Probabilities

Traditional BI

Page 28: Data Science for Cyber Risk

Data Science for Cyber Risk

Page 29: Data Science for Cyber Risk

29 | © 2014 Global Association of Risk Professionals. All rights reserved.

Fraud Analytics: A Mature, Adjacent Domain

DECISIONS

Diagnostics

Anomaly

Detection

Predictive

ModelsText

Analytics

Pattern

Analysis

Network

Analysis

Page 30: Data Science for Cyber Risk

Prediction:

Supervised learning– You have a baseline: a dataset with examples of

what you are attempting to predict or classify

(random forests, boosted trees)

– Example: known examples of cyber attacks based

on Net Flow data

Discovering Patterns:

Unsupervised learning– You have a dataset, but little idea concerning the

patterns and categories

–Example: your have a large set of Net Flow data,

but do not know patterns

Cluster Analysis

Decision Trees

Two Major Approaches

Page 31: Data Science for Cyber Risk

31 | © 2014 Global Association of Risk Professionals. All rights reserved.

CAR Engine

TRAINING SET VALIDATION SET

USER PROFILE ATTACK PROFILE

NORMAL POSSIBLE THREAT

Device

Time of day

Source

location

IP

Threat

intelligence

Amount

Peer group

Destination

location

Secure

profile

Known

devices

Average

amount

Known

location

Known

destination

Applications

Supervised: Machine Learning

KNOWN!

Page 32: Data Science for Cyber Risk

32 | © 2014 Global Association of Risk Professionals. All rights reserved.

Unsupervised: Pattern & Anomaly Detection

• Understand normal: ‘normal is crazy enough!’

• Identify outliers on the basis of deviations

Page 33: Data Science for Cyber Risk

33 | © 2014 Global Association of Risk Professionals. All rights reserved.

Data Analytics Technologies

Data management

Relational databases: Oracle, IBM DB2, MS SQL Server, data warehouses

NOSQL: graph databases, document stores, key-value, column family

Mass storage: Hadoop clusters, cloud approaches, SAP Hana

Data management: SAS, many 3rd party products

Statistical analysis software

Math-focused: Matlab, Mathematica

Programmatic / scripting: Python, PERL, Java, Haskell

Packaged tools: SAS, SAS JMP, SPSS, R, SAP Infinite Insight

Machine learning

SAS Enterprise Miner

SAP Predictive Analysis

R, MatLab, Python, etc.

Visualization / dashboards

Tableau, QlikView, SAS Data Visualization

Semantic

Text mining, sentiment analysis, (i.e. SAS Contextual Analysis)

Big Data

Hadoop, Map Reduce, Hive, Spark, etc. etc.

Page 34: Data Science for Cyber Risk

34 | © 2014 Global Association of Risk Professionals. All rights reserved.

Enterprise Miner

Workflow

Configuration

Models / utilities

Data

IDE

Page 35: Data Science for Cyber Risk

35 | © 2014 Global Association of Risk Professionals. All rights reserved.

Learnings from the Field

Page 36: Data Science for Cyber Risk

36 | © 2014 Global Association of Risk Professionals. All rights reserved.

Learnings for the Field: Network Discovery

Example

Measures

• Centrality

• Eigenvector

• Density

• Reach

• Strength

• Recopricity

Page 37: Data Science for Cyber Risk

37 | © 2014 Global Association of Risk Professionals. All rights reserved.

• Average total # hours online per period (userid or device)

• Average # IPs active per hour per userid

• Propensity to be active on network after work hours

• Propensity to be active on network on weekends and

holidays

Learnings from the Field: Derive and Amplify

Page 38: Data Science for Cyber Risk

38 | © 2014 Global Association of Risk Professionals. All rights reserved.

Pareto Principle • 80/20% pattern in network-usage (user hours online)

• Outliers: multiple devices 24 hours online

• High correlation (80-90%) between hours online and propensity to

align with multiple usage patterns…

• Pattern has been observed across multiple samples

0%

20%

40%

60%

80%

100%

1-50 51-upwards

% Users to % Hours Active

Users Hours Active

Learnings from the Field: User Patterns

Page 39: Data Science for Cyber Risk

39 | © 2014 Global Association of Risk Professionals. All rights reserved.

Similar to the efficacy of financial ratios, ratios of key security measures

may be more indicative of threats than single point measures.

For instance:

• Ratio of total flows per hour TO unique destination IPs

• Measures nearing high of 1:1 would be threat indicator of scanning activities

• Ratio of unique internal destination IPs TO unique external IPs

• Low might be threat indicator, perhaps bot net data exfiltration

• Ratio of unique destination ports TO unique source ports

• Low would generally be considered a threat, as might indicate a

compromised system engaging in vulnerability surveillance across a range of

outgoing ports to compromise a new system at a particular port

Learnings from the Field: Power of ratios…

Page 40: Data Science for Cyber Risk

40 | © 2014 Global Association of Risk Professionals. All rights reserved.

Unusual

groupings

(cluster outliers)

Learnings for the Field: Not All Users are Alike…

Page 41: Data Science for Cyber Risk

41 | © 2014 Global Association of Risk Professionals. All rights reserved.

Clustering suggests

20 significant groups

Learnings for the Field: Not All Users are Alike…

Page 42: Data Science for Cyber Risk

42 | © 2014 Global Association of Risk Professionals. All rights reserved.

Each cluster

has a signature

pattern of 22

measures (high

and low)

Patterns in Complexity: Cluster Analysis

Page 43: Data Science for Cyber Risk

43 | © 2014 Global Association of Risk Professionals. All rights reserved.

Each cluster

has a signature

pattern of 22

measures (high

and low)

Patterns in Complexity: Cluster Analysis

Page 44: Data Science for Cyber Risk

44 | © 2014 Global Association of Risk Professionals. All rights reserved.

SIGNATURE PATTERN FOR IDENTIFIED INFECTED IP

Web Proxy Host Scanning Analysis Devices on the network that are anomalously scanning for

external devices via the Web Proxy serverWeb Proxy Destination Port Scanning Analysis Devices on the network that are anomalously scanning for

external devices via the Web Proxy serverApplication Server Host Scanning Analysis Identify devices on the network that are anomalously

scanning for devices hosting an http or application server

Attack Pattern Identification Example

Page 45: Data Science for Cyber Risk

45 | © 2014 Global Association of Risk Professionals. All rights reserved.

Graph data storage / network analytics for

cyber attack vector pattern capture

45

• NOSQL graph database ‘network pattern’ storage & retrieval

• Building a cyber attack pattern ‘library’

• Identifying suspicious patterns in large & complex datasets

Page 46: Data Science for Cyber Risk

Conclusion:Managing Models

Page 47: Data Science for Cyber Risk

47 | © 2014 Global Association of Risk Professionals. All rights reserved.

Patterns

DATA

MODELS

INSIGHTS

DIAGNOSTICS

Page 48: Data Science for Cyber Risk

48 | © 2014 Global Association of Risk Professionals. All rights reserved.

Model Diagnostics: Lift

Page 49: Data Science for Cyber Risk

49 | © 2014 Global Association of Risk Professionals. All rights reserved.

Model Diagnostics: Misclassification Rate

Page 50: Data Science for Cyber Risk

50 | © 2014 Global Association of Risk Professionals. All rights reserved.

Cyber Data Analytics Model Management

SPECIFY

RISKS

DATA

PREPARATION

DATA

EXPLORATION

TRANSFORM

& SELECT

MODEL

BUILDING

MODEL TEST &

VALIDATION

MODEL

DEPLOYMENT

EVALUATE &

MONITOR

RESULTS

DETECTION

OPTIMIZATION

PATTERN

IDENTIFICATION

Objective ETL Develop Validate Deploy Monitor Retire

Page 51: Data Science for Cyber Risk

51 | © 2014 Global Association of Risk Professionals. All rights reserved.

Production Analytics

DATA

MODEL

TEST

DIAGNOSTICS

ASSESS

PRODUCTION

SYSTEM

VARIABLES

DATA

SCIENCE

DATA

ENGINEERING

Page 52: Data Science for Cyber Risk

52 | © 2014 Global Association of Risk Professionals. All rights reserved.

Cyber Risk Model Management

“Model risk management begins with robust model development, implementation, and use.

Another essential element is a sound model validation process. A third element is governance, which sets an effective framework with

defined roles and responsibilities for clear communication of model

limitations and assumptions, as well as the authority to restrict model usage”.

Source: Supervisory Guidance on Model Risk Management, April 2011, The Federal Reserve System

Models should be treated as enterprise assets

Model management requires collaboration between personas

Models need to be efficiently deployed into production

Models need to be effectively deployed into production

Page 53: Data Science for Cyber Risk

53 | © 2014 Global Association of Risk Professionals. All rights reserved.

Challenges:

Data Science in Commercial Organizations?

EXPERIMENTATION

Most organizations have limited appetite for conducting

experimentation / trial-and-error…

But it is rare that a data scientist will get a model / framework right on

the first try

This is a new realm – it is essential to perform diagnostic tests and to

adopt a mindset that allows for exploration of emerging phenomenon

Page 54: Data Science for Cyber Risk

54 | © 2014 Global Association of Risk Professionals. All rights reserved.

“If I had six hours to chop down a tree, I’d

spend the first four hours sharpening my axe.”

- Abraham Lincoln

Page 55: Data Science for Cyber Risk

C r e a t i n g a c u l t u r e o f

r i s k a w a r e n e s s ®

Global Association of

Risk Professionals

111 Town Square Place

14th Floor

Jersey City, New Jersey 07310

U.S.A.

+ 1 201.719.7210

2nd Floor

Bengal Wing

9A Devonshire Square

London, EC2M 4YN

U.K.

+ 44 (0) 20 7397 9630

www.garp.org

About GARP | The Global Association of Risk Professionals (GARP) is a not-for-profit global membership organization dedicated to preparing professionals and organizations to make

better informed risk decisions. Membership represents over 150,000 risk management practitioners and researchers from banks, investment management firms, government agencies,

academic institutions, and corporations from more than 195 countries and territories. GARP administers the Financial Risk Manager (FRM®) and the Energy Risk Professional (ERP®)

Exams; certifications recognized by risk professionals worldwide. GARP also helps advance the role of risk management via comprehensive professional education and training for

professionals of all levels. www.garp.org.

55 | © 2014 Global Association of Risk Professionals. All rights reserved.