Statistical anomaly detection with ESM


Page 1: Statistical Anomaly Detection With ESM

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Anurag Singla

Sr. Manager, R & D

Statistical anomaly detection with ESM Correlation

Page 2: Statistical Anomaly Detection With ESM

Agenda

• Anomaly detection primer

• Use case 1: data monitors

• Review of recent ESM correlation features

• Use case 2: financial fraud detection

• Use case 3: time-sequence anomalies

• Caveats and conclusions

Page 3: Statistical Anomaly Detection With ESM

Anomaly detection primer

Baselining

• Record statistics of a certain behavior/event flow

• Typically involves trending over long time periods

• Detect deviations from this baseline in order to discover anomalies

Threat (anomaly) score

• Amount of statistical deviation from baseline for a new event / pattern

− User logged in 5x longer than usual

− Unusually high volume data transfers from application host

− Large bank account transfers compared to monthly average
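A minimal sketch of a threat score as "amount of statistical deviation from baseline". It assumes the baseline is a list of past observations and that deviation is measured in standard deviations (a z-score); the slides do not say which statistic is used, and the function name and sample values are hypothetical.

```python
from statistics import mean, stdev

def anomaly_score(history, new_value):
    """Return how many sample standard deviations new_value lies from the baseline mean."""
    baseline_mean = mean(history)
    baseline_std = stdev(history)
    if baseline_std == 0:
        # A flat baseline: any different value is maximally unusual.
        return 0.0 if new_value == baseline_mean else float("inf")
    return abs(new_value - baseline_mean) / baseline_std

# Hypothetical session lengths (minutes) recorded for one user.
session_minutes = [28, 31, 30, 29, 32]

# A session 5x longer than usual scores far above a typical one.
score = anomaly_score(session_minutes, 150)
typical = anomaly_score(session_minutes, 30)
```

A higher score means a larger deviation; where to draw the alerting threshold is a tuning decision.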

Page 4: Statistical Anomaly Detection With ESM

Anomaly detection primer

Some types of anomalies are well understood and can be captured using static rules

• Login from multiple geo regions in short time period

• Large number of sources connecting to the same target

• Known virus/worm definitions

Others are more dynamic and vary too much across individual actors to be captured by static rules.

Page 5: Statistical Anomaly Detection With ESM

Use case 1: data monitors

Data Monitors

• Track real-time event streams.

• Perform automatic short term baselining.

• Generate audit events if some statistic (e.g. moving average) deviates from recent history by a certain threshold. A rule can fire and take action when the threshold is exceeded.

• Not suited to analysis over long periods.
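The short-term baselining idea can be sketched as follows. This is not ESM's data monitor implementation; the class name, window size and relative threshold are hypothetical, and the statistic is a plain moving average over the most recent values.

```python
from collections import deque

class MovingAverageMonitor:
    """Flags a value that deviates from the recent moving average by more than a ratio."""

    def __init__(self, window, threshold_ratio):
        self.values = deque(maxlen=window)   # recent history only
        self.threshold_ratio = threshold_ratio

    def observe(self, value):
        deviates = False
        if len(self.values) == self.values.maxlen:
            avg = sum(self.values) / len(self.values)
            if avg > 0 and abs(value - avg) / avg > self.threshold_ratio:
                deviates = True
        # The new value joins the baseline whether or not it was flagged.
        self.values.append(value)
        return deviates

# Hypothetical per-minute event counts with one spike at the end.
monitor = MovingAverageMonitor(window=5, threshold_ratio=0.5)
counts = [100, 104, 98, 101, 97, 102, 260]
alerts = [monitor.observe(c) for c in counts]
```

Because the window only holds recent values, a sustained change quickly becomes the new normal, which is one reason this approach does not suit analysis over long periods.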

Page 6: Statistical Anomaly Detection With ESM

Use case 2: financial fraud detection

Page 7: Statistical Anomaly Detection With ESM

Behavior-based fraud detection

Methodology: compare an account holder’s recent transactions with the account holder’s historical behavior patterns, looking for anomalies.

Fire an alert if the total amount of money a customer has transferred out of the account today is greater than the account’s historical cumulative monthly average.

• If the customer transfers out $1200 per month on average, then as soon as a transaction is received that puts the daily transfer total (since midnight) above $1200, we should fire an alert and add the account to a watch list for investigation.

• Conditions for detecting fraud can be fine tuned to avoid false positives.

Page 8: Statistical Anomaly Detection With ESM

ESM enhancements for anomaly detection

Baselining:

• Interval queries on active lists

• Lightweight rules

• Timestamp granularity variables

Real-time detection:

• Cumulative active list fields

• Lightweight rules

• Time-partitioned active lists

Page 9: Statistical Anomaly Detection With ESM

Cumulative active list fields

Problem: atomic column increment

Solution: cumulative numeric columns

• For numeric column types (Integer, Long, Double), subtypes SUM, MIN, MAX.

• The value from the AddToAL action is combined atomically with the existing value.

• To implement a counter, use an Integer(SUM) field and add the value 1 each time

− Can obtain the mean value using the counter in conjunction with a value sum field

Page 10: Statistical Anomaly Detection With ESM

Cumulative active list fields

Values inserted by AddToList action (CustomerId and TimeKey same for all):

Num transactions   Total amount   Max amount   Min amount
1                  200.00         200.00       200.00

Resulting values in AL entry:

Num transactions   Total amount   Max amount   Min amount
1                  200.00         200.00       200.00

Page 11: Statistical Anomaly Detection With ESM

Cumulative active list fields

Values inserted by AddToList action (CustomerId and TimeKey same for all):

Num transactions   Total amount   Max amount   Min amount
1                  50.00          50.00        50.00

Resulting values in AL entry (one row per successive state):

Num transactions   Total amount   Max amount   Min amount
1                  200.00         200.00       200.00
2                  250.00         200.00       50.00

Page 12: Statistical Anomaly Detection With ESM

Cumulative active list fields

Values inserted by AddToList action (CustomerId and TimeKey same for all):

Num transactions   Total amount   Max amount   Min amount
1                  875.00         875.00       875.00

Resulting values in AL entry (one row per successive state):

Num transactions   Total amount   Max amount   Min amount
1                  200.00         200.00       200.00
2                  250.00         200.00       50.00
3                  1125.00        875.00       50.00
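The three inserts above can be replayed with a small sketch of SUM, MAX and MIN cumulative fields. The class and field names are hypothetical, and the atomicity that ESM provides is not modeled here (the sketch is single-threaded).

```python
from dataclasses import dataclass

@dataclass
class CumulativeEntry:
    """One active list entry with cumulative fields (same CustomerId and TimeKey)."""
    num_transactions: int = 0            # Integer(SUM), incremented by 1 per insert
    total_amount: float = 0.0            # Double(SUM)
    max_amount: float = float("-inf")    # Double(MAX)
    min_amount: float = float("inf")     # Double(MIN)

    def add(self, amount):
        # Combine the inserted values with the existing ones, field by field.
        self.num_transactions += 1
        self.total_amount += amount
        self.max_amount = max(self.max_amount, amount)
        self.min_amount = min(self.min_amount, amount)

    def mean(self):
        # Mean from the counter together with the value sum field.
        return self.total_amount / self.num_transactions

entry = CumulativeEntry()
states = []
for amount in (200.00, 50.00, 875.00):
    entry.add(amount)
    states.append((entry.num_transactions, entry.total_amount,
                   entry.max_amount, entry.min_amount))
```

After the third insert the entry holds 3 transactions totalling 1125.00, so the mean transaction value is 375.00.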

Page 13: Statistical Anomaly Detection With ESM

Lightweight rules

Designed for data list maintenance

• No correlation or audit event when rule fires

• No aggregation (stateless)

• Can match a large number of events without being disabled

Allow separation of data maintenance and risk analysis logic.

• Processed earlier than regular rules.

• Lightweight rule actions are executed before regular rule conditions are evaluated.

Page 14: Statistical Anomaly Detection With ESM

Timestamp granularity variables

• Convert timestamp to beginning of time period (hour/day/month, etc.)

− Use result in AL key field

• Input arguments are a timestamp field (e.g. EndTime, DeviceCustomDate) plus a granularity selection.

• Output is the timestamp value shifted to the desired boundary.

• Example: transaction time = 2012-09-11-14:32

− Start of hour: 2012-09-11-14:00

− Start of day: 2012-09-11-00:00

− Start of month: 2012-09-01-00:00
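The truncation in the example can be reproduced with a short sketch. The function name and the granularity strings are hypothetical, not ESM variable names.

```python
from datetime import datetime

def truncate(ts, granularity):
    """Shift a timestamp back to the start of its hour, day or month."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")

transaction_time = datetime(2012, 9, 11, 14, 32)
start_of_hour = truncate(transaction_time, "hour")
start_of_day = truncate(transaction_time, "day")
start_of_month = truncate(transaction_time, "month")
```

Every transaction in the same period maps to the same truncated value, which is what makes the result usable as an active list key field.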

Page 15: Statistical Anomaly Detection With ESM

Time-partitioned partially cached active lists

• Keep most recent entries (latest timestamp key value) in memory for fast correlation

− Evict entries with older timestamp value

• Partition the cache into buckets based on time values.

• PLEASE do not insert random time values (like EndTime) into the time field!
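A rough sketch of the bucketing idea, assuming one bucket per distinct time key value and eviction of the oldest bucket once a limit is reached. The class, its limit parameter and the eviction policy are hypothetical and are not ESM internals.

```python
from datetime import date

class TimePartitionedCache:
    """Keeps only the buckets with the most recent time key values in memory."""

    def __init__(self, max_buckets):
        self.max_buckets = max_buckets
        self.buckets = {}  # time key -> {entry key: value}

    def put(self, time_key, entry_key, value):
        self.buckets.setdefault(time_key, {})[entry_key] = value
        # Evict the buckets with the oldest time key once over the limit.
        while len(self.buckets) > self.max_buckets:
            del self.buckets[min(self.buckets)]

    def get(self, time_key, entry_key):
        return self.buckets.get(time_key, {}).get(entry_key)

cache = TimePartitionedCache(max_buckets=2)
cache.put(date(2012, 9, 9), "46201", 100.0)
cache.put(date(2012, 9, 10), "46201", 250.0)
cache.put(date(2012, 9, 11), "46201", 75.0)   # evicts the 2012-09-09 bucket
```

Under this assumption, truncated keys such as start of day yield a few large buckets, whereas a raw timestamp like EndTime would create a new bucket for nearly every event, which is one plausible reading of the warning above.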

Page 16: Statistical Anomaly Detection With ESM

Interval query on active list

Problem: query the entries of an active list based on a time interval.

• Query on active list was snapshot based: all the entries in the AL were considered.

Solution: interval queries on AL

• Set the query type to interval.

• Enter the start time and end time of the query.

• Select the field on which the time interval will be evaluated.
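The difference between a snapshot query and an interval query can be sketched as a filter on the chosen time field. The entry layout, the sample values and the half-open interval [start, end) are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical active list entries keyed by customer and day.
daily_entries = [
    {"cust_id": "46201", "time_key": datetime(2012, 8, 5), "total": 250},
    {"cust_id": "46201", "time_key": datetime(2012, 8, 14), "total": 700},
    {"cust_id": "50532", "time_key": datetime(2012, 8, 7), "total": 100},
    {"cust_id": "50532", "time_key": datetime(2012, 8, 28), "total": 3290},
]

def snapshot_query(entries):
    """Snapshot behavior: every entry in the list is considered."""
    return list(entries)

def interval_query(entries, time_field, start, end):
    """Interval behavior: only entries whose time_field lies in [start, end)."""
    return [e for e in entries if start <= e[time_field] < end]

first_half = interval_query(daily_entries, "time_key",
                            datetime(2012, 8, 1), datetime(2012, 8, 15))
first_half_total = sum(e["total"] for e in first_half)
```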

Page 17: Statistical Anomaly Detection With ESM

Workflow: baselining and detection

[Figure: workflow diagram. Recoverable flow:]

• Transaction events → Lightweight rules: update daily transaction stats in the Daily active list

• Daily active list → Monthly trend (runs every 30 days) → Monthly active list

• Monthly active list → Stats trend (runs every 30 days over 180 days) → Historical stats active list

• Rules read daily values and historical values: Anomaly?

Page 18: Statistical Anomaly Detection With ESM

Daily transactions active list

Cumulative Fields

TimePartitioned

Page 19: Statistical Anomaly Detection With ESM

Historical stats active list

Page 20: Statistical Anomaly Detection With ESM

Active list roll up using trends

Daily transactions:

CustId   TimeKey    Total   Max
46201    Aug 5th    $250    $150
46201    Aug 14th   $700    $700
50532    Aug 7th    $100    $100
50532    Aug 28th   $3290   $3000

Trend query (interval: 1 mo, frequency: 1 mo)

Monthly transactions:

Cust     TimeKey   Total   Max daily
46201    Aug       $950    $700
50532    Aug       $3390   $3290
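The daily-to-monthly roll-up can be checked with a short sketch. It uses the values from the tables above and assumes, as the figures suggest, that "Max daily" is the largest daily total in the month; the function name and tuple layout are hypothetical, and a real trend would be defined as an ESM query rather than in Python.

```python
# (cust_id, day, daily total, largest single transaction that day)
daily_rows = [
    ("46201", "Aug 5th", 250, 150),
    ("46201", "Aug 14th", 700, 700),
    ("50532", "Aug 7th", 100, 100),
    ("50532", "Aug 28th", 3290, 3000),
]

def roll_up_month(rows):
    """Aggregate one month of daily rows into (monthly total, max daily total) per customer."""
    monthly = {}
    for cust_id, _day, total, _max_txn in rows:
        month_total, max_daily = monthly.get(cust_id, (0, 0))
        monthly[cust_id] = (month_total + total, max(max_daily, total))
    return monthly

august = roll_up_month(daily_rows)
```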

Page 21: Statistical Anomaly Detection With ESM

Active list roll up using trends

Monthly transactions:

CustId   TimeKey   Total   Max daily
46201    May       $2500   $1800
46201    Jun       $800    $300
46201    Aug       $950    $700
50532    Apr       $1600   $1000
50532    Jul       $810    $600
50532    Aug       $3390   $3290

Trend query (interval: 6 mos, frequency: 1 mo)

Historical stats:

Cust     Max daily   Max monthly   Mean monthly
46201    $1800       $2500         $1417
50532    $3290       $3390         $1933
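A sketch of the second roll-up, from monthly rows to historical stats, using the values above. The function and field names are hypothetical. The slide figures are reproduced only when the mean is taken over the months that have an entry (three per customer here), not over all six months of the interval.

```python
# (cust_id, month, monthly total, max daily total)
monthly_rows = [
    ("46201", "May", 2500, 1800),
    ("46201", "Jun", 800, 300),
    ("46201", "Aug", 950, 700),
    ("50532", "Apr", 1600, 1000),
    ("50532", "Jul", 810, 600),
    ("50532", "Aug", 3390, 3290),
]

def historical_stats(rows):
    """Compute max daily, max monthly and mean monthly totals per customer."""
    grouped = {}
    for cust_id, _month, total, max_daily in rows:
        grouped.setdefault(cust_id, []).append((total, max_daily))
    stats = {}
    for cust_id, months in grouped.items():
        totals = [total for total, _ in months]
        stats[cust_id] = {
            "max_daily": max(max_daily for _, max_daily in months),
            "max_monthly": max(totals),
            # Mean over months with activity, rounded to whole dollars.
            "mean_monthly": round(sum(totals) / len(totals)),
        }
    return stats

stats = historical_stats(monthly_rows)
```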

Page 22: Statistical Anomaly Detection With ESM

Data maintenance rule

Page 23: Statistical Anomaly Detection With ESM

Fraud detection rule

Page 24: Statistical Anomaly Detection With ESM

Fraud detection rule in action

Snapshot in time for CustomerId 46201:

• Historical Stats AL: mean monthly total = $1200

• Daily Transactions: cumulative total = $1150

Transaction event received: CustomerId = 46201, transaction value = $100

1. Data Maintenance rule (lightweight): updates daily total to $1150 + $100 = $1250

2. Fraud Detection rule (standard): looks up Daily AL, finds updated cumulative value ($1250)

3. Condition is matched, rule fires and adds customer account to Suspicious Accounts AL
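The three steps can be simulated with plain dictionaries standing in for the active lists. All names are hypothetical; the point of the sketch is the ordering, with the lightweight rule's update applied before the standard rule's condition is evaluated.

```python
# Stand-ins for the active lists, preloaded with the snapshot values.
historical_mean_monthly = {"46201": 1200.0}   # Historical Stats AL
daily_total = {"46201": 1150.0}               # Daily Transactions AL
suspicious_accounts = set()                   # Suspicious Accounts AL

def data_maintenance_rule(event):
    """Lightweight rule: add the transaction value to the daily cumulative total."""
    cust = event["customer_id"]
    daily_total[cust] = daily_total.get(cust, 0.0) + event["amount"]

def fraud_detection_rule(event):
    """Standard rule: fire if today's total exceeds the mean monthly total."""
    cust = event["customer_id"]
    baseline = historical_mean_monthly.get(cust)
    if baseline is not None and daily_total.get(cust, 0.0) > baseline:
        suspicious_accounts.add(cust)
        return True
    return False

def process(event):
    # Lightweight rules run first, so the standard rule sees the updated total.
    data_maintenance_rule(event)
    return fraud_detection_rule(event)

fired = process({"customer_id": "46201", "amount": 100.0})
```

If the order were reversed, the standard rule would still see $1150 and the $100 transaction that crosses the threshold would go undetected until the next event.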

Page 25: Statistical Anomaly Detection With ESM

Use case 3: sequence anomaly detection

Page 27: Statistical Anomaly Detection With ESM

What do users normally do?

Baselining process

[Figure: baselining pipeline. Labels: Historical User Data; Vertica / R; Hadoop / Mahout; Profile 1, Profile 2, Profile 3; Transition Probability Matrix; User-Profile Mapping]

Page 28: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: state transition diagram. States: Login, Main Page, Add Account, Check Balance, Transfer Page, Transfer Init, Transfer Commit, Transfer Failed, Transfer Success, Remove Account, Help. Edge labels as extracted: 1.0, 0.4, 0.4, 0.2, 0.12, 0.72, 0.06, 0.08, 1.0, 0.02, 0.80, 0.05, 0.15, 0.80, 0.05, 1.0, 0.15, 0.55, 0.18, 0.27, 0.52, 0.46, 0.02, 0.1, 0.7, 0.2, 0.36, 0.64]

State transition probabilities for specific user profile (101)
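One common way to obtain such a table is to count observed transitions in historical sessions and normalize per source state. The deck attributes baselining to Vertica/R and Hadoop/Mahout and does not show the estimator, so the pure-Python sketch below is a stand-in under that assumption, with hypothetical sessions and function name.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate P(new_state | old_state) from observed state sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (old_state, new_state) pair.
        for old_state, new_state in zip(session, session[1:]):
            counts[old_state][new_state] += 1
    return {
        old: {new: n / sum(outgoing.values()) for new, n in outgoing.items()}
        for old, outgoing in counts.items()
    }

# Hypothetical historical sessions for the users of one profile.
sessions = [
    ["Login", "Main Page", "Check Balance", "Main Page"],
    ["Login", "Main Page", "Transfer Page"],
]
matrix = transition_probabilities(sessions)
```

Each row of the result sums to 1, as the outgoing probabilities of a state in the diagram do.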

Page 29: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with three edge probabilities highlighted: 0.4, 0.4, 0.2]

Page 30: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with five edge probabilities highlighted: 0.12, 0.72, 0.06, 0.08, 0.02]

Page 31: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with three edge probabilities highlighted: 0.80, 0.05, 0.15]

Page 32: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with three edge probabilities highlighted: 0.80, 0.05, 0.15]

Page 33: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with three edge probabilities highlighted: 0.52, 0.46, 0.02]

Page 34: Statistical Anomaly Detection With ESM

Transition diagram – online banking

[Figure: the online banking transition diagram again, with three edge probabilities highlighted: 0.55, 0.18, 0.27]

Page 35: Statistical Anomaly Detection With ESM

Transition probability (baseline) active list

Page 36: Statistical Anomaly Detection With ESM

Real-time anomaly detection process

Online session in progress

Receive incoming event containing: {UserId, SessionId, OldState, NewState}

Lightweight (maintenance) rule:

1. Look up the profile for the user

2. Look up the transition probability between old and new state

3. Replace a missing transition with a low probability value (e.g. 0.000001)

4. Compute the negative log (JME variable) and add it to the session anomaly score (SUM field)

Standard rule checks anomaly score against defined threshold
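The process above can be sketched with dictionaries in place of the active lists. The user name, the probability values, the threshold and all function names are hypothetical. The logarithm base is assumed to be 10, which matches the per-edge values shown on the session slides (for example 0.4 → 0.40).

```python
import math

MISSING_PROBABILITY = 0.000001   # low value substituted for unseen transitions
THRESHOLD = 5.0                  # hypothetical alerting threshold

user_profile = {"alice": 101}    # User-Profile Mapping (hypothetical user)
transition_probability = {       # baseline AL: (profile, old, new) -> probability
    (101, "Login", "Main Page"): 1.0,
    (101, "Main Page", "Transfer Page"): 0.4,
}
session_score = {}               # SUM field per session

def maintenance_rule(event):
    """Lightweight rule: add the negative log probability to the session score."""
    profile = user_profile[event["user_id"]]
    key = (profile, event["old_state"], event["new_state"])
    p = transition_probability.get(key, MISSING_PROBABILITY)
    sid = event["session_id"]
    session_score[sid] = session_score.get(sid, 0.0) - math.log10(p)

def standard_rule(event):
    """Standard rule: compare the accumulated score with the threshold."""
    return session_score.get(event["session_id"], 0.0) > THRESHOLD

path = ["Login", "Main Page", "Transfer Page", "Remove Account"]
flags = []
for old_state, new_state in zip(path, path[1:]):
    event = {"user_id": "alice", "session_id": "s1",
             "old_state": old_state, "new_state": new_state}
    maintenance_rule(event)
    flags.append(standard_rule(event))
```

Summing negative logs is equivalent to taking the negative log of the product of the transition probabilities, so a single unseen transition (score 6 here) outweighs many ordinary ones.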

Page 37: Statistical Anomaly Detection With ESM

Normal user session

[Figure: the online banking transition diagram with the session path highlighted. Each traversed edge is labeled with its probability and, in parentheses, its negative log:]

• 0.4 (0.40)

• 1.0 (0)

• 0.72 (0.14)

• 0.80 (0.097)

• 0.80 (0.097)

• 0.52 (0.28)

Anomaly Score (Sum of reds) = 1.02
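The score can be checked from the six edge probabilities on this slide, assuming the parenthesized values are base-10 negative logs (the variable names are hypothetical).

```python
import math

# Edge probabilities along the normal session path, as labeled on the slide.
normal_session_probabilities = [0.4, 1.0, 0.72, 0.80, 0.80, 0.52]

per_edge = [-math.log10(p) for p in normal_session_probabilities]
normal_score = sum(per_edge)   # about 1.02 when rounded to two decimals
```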

Page 38: Statistical Anomaly Detection With ESM

Fraudulent user session, part 1

[Figure: the online banking transition diagram with the first part of the session path highlighted. Edge labels as extracted, probability with negative log in parentheses:]

• 0.4 (0.40)

• 1.0 (0)

• 0.80 (0.097)

• 1.0 (0)

Anomaly Score (part 1) = 5.20

Page 39: Statistical Anomaly Detection With ESM

Fraudulent user session, part 2

[Figure: the online banking transition diagram with the second part of the session path highlighted. Edge labels as extracted, probability with negative log in parentheses:]

• 0.80 (0.097)

• 0.18 (0.74)

• 1.0

• 0.80 (0.097)

Anomaly Score (part 2) = 3.27, total = 5.20 + 3.27 = 8.47 (vs 1.02)

Page 40: Statistical Anomaly Detection With ESM

Anomaly detection - caveats

Statistical baseline issues (hat tip: John Petropoulos)

Prone to being skewed.

• During an attack, short-term stats are prone to being skewed quickly.

• How do we deal with this? Remove the offending entries?

False positives can be reduced, never eliminated

• Users sometimes behave strangely

• Seasoned fraudsters can appear eerily natural

Page 41: Statistical Anomaly Detection With ESM

Thank you

Page 42: Statistical Anomaly Detection With ESM

Security for the new reality