Detecting Hacks: Anomaly Detection on Networking Data


Uploaded by james-sirota on 28-Jul-2015


TRANSCRIPT

Page 1: Detecting Hacks: Anomaly Detection on Networking Data

1 © 2010 Cisco and/or its affiliates. All rights reserved.

Detecting Hacks: Anomaly Detection on Networking Data James Sirota (@JamesSirota) Lead Data Scientist – Managed Threat Defense Chester Parrott (@ParrottSquawk) Data Scientist – Managed Threat Defense

June 2015

Page 2

In the next few minutes…
•  Defense in Depth for Big Data
•  Network Anomaly Detection Overview
•  Volume Anomaly Detection
•  Feature Anomaly Detection
•  Model Architecture
•  Deployment on OpenSOC Platform
•  Questions

Page 3

Who are we?

Big Data Security Analytics

Open Source

Managed Service

Page 4

The New Defense-In-Depth

Defense Strategy, spanning Legacy Mindset, Generic Threats, Targeted Threats, and Future Threats:
•  Misuse Detection: Signature Matching, Rules-Based Matching
•  Intrusion Detection: Network Anomaly Detection, Log Anomaly Detection, Behavioral Anomaly Detection
•  Supervised Classification: Malware Family, Script Family
•  Look-Ahead Analytics: Scraping, Honeypots

Supporting techniques: Static Sandboxing, Threat Intel Feeds, Rules Engines, Volume-Based, Feature-Based, NLP-Based, Token Clustering, User Profiling, Asset Profiling, Interaction Profiling, Dynamic Sandboxing, Malware Classifiers, Script Classifiers, Perimeter Monitoring, Web Scraping, Social Media Analytics, Model Validators, Training Set Generation

Page 5

Network Anomaly Detection

•  Volume-Based (anomalous traffic patterns)
   -  Statistical Process Control: 3-sigma algorithms
   -  Time Series Forecasting: Exponential Smoothing, ARIMA
   -  Frequency Domain: Fast Fourier Transform, Wavelets
•  Feature-Based (interrelationships between features)
   -  Information Theory: Entropy
   -  Principal Component Analysis: Subspace
   -  Sketch-Based: Heavy Hitters, Set Cardinality
   -  Probability Models: Markov Models, Bayes Nets
   -  Unsupervised ML: Clustering, Density, Proximity

Page 6

Volume-Based vs. Feature-Based

Telemetry                        Volume-Based   Feature-Based
Encrypted Traffic (Raw Packet)   YES            NO
Raw Packet + Header Metadata     YES            YES
Machine Exhaust Data             YES (online)   NO
DPI Metadata                     NO             YES
Netflow                          YES            YES
Enrichment Metadata              YES            YES
Application Logs                 YES            YES
Other Alerts                     NO*            YES

Page 7

Anomaly Detection: 3-Phase Process

Unstructured Data → Identify → Anomaly → Classify → Alert
Examine + Reinforce, drawing on the Training Set and Historical Context

Page 8

Phase 1: Identify

Unstructured Data → Understanding of Normal → Anomaly A, Anomaly B, Anomaly C, …, Anomaly (N)

Page 9

Phase 2: Classify

Columns, grouped by telemetry source:
•  Full Packet Telemetry: Volume Anomaly, Entropy Anomaly, Feature (x)
•  DPI Telemetry: Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x)
•  Telemetry (N): Protocol Anomaly, Feature (x)
•  Outcome: Class Label

Rows: Anomaly (A), Anomaly (B), …, Anomaly (N). Each x marks a detector that flagged the anomaly.

x x x x x x x   Port Scan
x x x x x       False Positive
x x x x         Network Scan
x x x x         Port Scan
x x x x         False Positive
x x x x x x     DDoS
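As a rough sketch of Phase 2, suppose each anomaly is encoded as a vector of 0/1 detector flags in the column order above. A k-nearest-neighbor vote over the labeled rows can then assign a class label. The column names, the flag vectors in `TRAINING_SET`, and the function names are all hypothetical. They are not the slide's actual rows, whose column positions are not recoverable, and the slides do not say which classifier was used.

```python
from collections import Counter

# Hypothetical column order: one 0/1 flag per detector column on the slide.
FEATURES = (
    "fp_volume", "fp_entropy", "fp_feature_x",
    "dpi_heavy_hitters", "dpi_volume", "dpi_cardinality", "dpi_feature_x",
    "tn_protocol", "tn_feature_x",
)

# Hypothetical labeled anomalies: (flag vector, class label).
TRAINING_SET = [
    ((1, 1, 0, 1, 0, 1, 1, 1, 1), "Port Scan"),
    ((1, 0, 1, 0, 1, 0, 1, 0, 1), "False Positive"),
    ((0, 1, 1, 1, 0, 1, 0, 0, 0), "Network Scan"),
    ((1, 1, 1, 1, 1, 1, 0, 0, 0), "DDoS"),
]


def hamming(a, b):
    """Number of detector flags on which two anomaly vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def classify(flags, training_set, k=3):
    """Majority label among the k labeled anomalies closest in Hamming distance."""
    if len(flags) != len(FEATURES):
        raise ValueError("expected one flag per feature column")
    nearest = sorted(training_set, key=lambda row: hamming(flags, row[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((1, 1, 0, 1, 0, 1, 1, 1, 0), TRAINING_SET, k=1))  # Port Scan
```

A nearest-neighbor vote is used here only because it needs nothing beyond the labeled rows. Any supervised classifier trained on the same table would fit the phase equally well.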

Page 10

Phase 3: Examine + Reinforce

Columns, grouped by telemetry source:
•  Full Packet Telemetry: Volume Anomaly, Entropy Anomaly, Feature (x)
•  DPI Telemetry: Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x)
•  Telemetry (N): Protocol Anomaly, Feature (x)
•  Outcome: Class Label

Rows: Anomaly (A), Anomaly (B), …, Anomaly (N). Each x marks a detector that flagged the anomaly.

x x x x x x x   Port Scan
x x x x x       False Positive
x x x x         Network Scan
x x x x         False Positive
x x x x x x     DDoS
x x x x x x     False Positive
x x x x x x     False Positive
x x x x         False Positive
x x x x x x     DDoS
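One way to read the growing table in Phase 3 is as a feedback loop. Every analyst verdict is appended to the labeled set, so the next look-alike anomaly inherits that verdict. The sketch below assumes a 1-nearest-neighbor rule and hypothetical 4-flag vectors. `nearest_label` and `examine_and_reinforce` are illustrative names, not OpenSOC APIs.

```python
# Hypothetical 4-flag anomaly vectors: (volume, entropy, heavy_hitters, protocol).
training_set = [
    ((1, 1, 1, 0), "DDoS"),
    ((0, 0, 1, 1), "Port Scan"),
]


def hamming(a, b):
    """Number of detector flags on which two anomaly vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def nearest_label(flags, training_set):
    """1-NN label; scanning newest-first so the latest analyst verdict wins ties."""
    return min(reversed(training_set), key=lambda row: hamming(flags, row[0]))[1]


def examine_and_reinforce(flags, training_set, analyst_label):
    """Return the model's prediction, then store the analyst's verdict as a new row."""
    predicted = nearest_label(flags, training_set)
    training_set.append((tuple(flags), analyst_label))
    return predicted


alert = (1, 1, 0, 0)
print(examine_and_reinforce(alert, training_set, "False Positive"))  # DDoS
print(nearest_label(alert, training_set))                            # False Positive
```

Scanning newest-first is a deliberate choice. When a new row ties an older one, the most recent analyst verdict takes precedence.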

Page 11

Basic Anomalies

Anomaly               Definition
Alpha Flows           Large-volume point-to-point flows
DoS                   Denial of service (distributed or single source)
Flash Crowd           Large volume of traffic to a single destination from a large number of sources
Port Scan             Probe to many destination ports on a small number of destination addresses
Network Scan          Probe to many destination addresses on a small number of destination ports
Outage Events         Traffic shifts because of equipment failures or maintenance
Plateau Behavior      Behavior caused by traffic reaching environmental limits
Point-to-Multipoint   Traffic from a single source to many destinations, e.g., content distribution
Worms                 Scanning by worms for vulnerable hosts, a special case of network scan
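The port-scan and network-scan definitions above translate directly into two distinct-count checks per time bin. A port scan is one source hitting many ports on one address. A network scan is one source hitting many addresses on one port. The sketch below assumes flows arrive as `(src_ip, dst_ip, dst_port)` tuples. The `threshold` of 100 is a hypothetical value, not one taken from the slides.

```python
from collections import defaultdict


def scan_alerts(flows, threshold=100):
    """Flag port and network scans in one time bin of (src_ip, dst_ip, dst_port) flows."""
    ports_per_target = defaultdict(set)  # (src, dst)  -> distinct destination ports
    hosts_per_port = defaultdict(set)    # (src, port) -> distinct destination hosts
    for src, dst, port in flows:
        ports_per_target[(src, dst)].add(port)
        hosts_per_port[(src, port)].add(dst)

    alerts = []
    for (src, dst), ports in ports_per_target.items():
        if len(ports) >= threshold:      # many ports on one address
            alerts.append(("Port Scan", src, dst, len(ports)))
    for (src, port), hosts in hosts_per_port.items():
        if len(hosts) >= threshold:      # many addresses on one port
            alerts.append(("Network Scan", src, port, len(hosts)))
    return alerts


flows = [("10.0.0.1", "10.0.0.2", p) for p in range(1, 201)]       # port scan
flows += [("10.0.0.9", f"10.1.0.{h}", 445) for h in range(200)]    # network scan
for alert in scan_alerts(flows):
    print(alert)
# ('Port Scan', '10.0.0.1', '10.0.0.2', 200)
# ('Network Scan', '10.0.0.9', 445, 200)
```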

Page 12

Batch Analytics Normalcy Models

Page 13

Implementation

MAP, MAP, MAP → key: assetID-metricID-Bin → RED, RED, RED → Time Series DB, storing assetID-metricID-Bin : 5pt
Incoming telemetry is checked against the stored summary: Anomaly?

Table Name: Metric ID (Cumulative Volume)

Asset        Bin   Value
Server 1     15    5pt *
Server 2     15    5pt *
Server (N)   15    5pt *

* 5-point summary (5pt):
1.  the sample minimum (smallest observation)
2.  the lower quartile or first quartile
3.  the median (middle value)
4.  the upper quartile or third quartile
5.  the sample maximum (largest observation)
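A minimal sketch of this normalcy model follows. The map step groups historical values by the `assetID-metricID-Bin` key, and the reduce step stores a 5-point summary per key. The slide only says "Anomaly?", so the anomaly test is an assumption: here a new value is flagged when it falls outside Tukey's fences, Q1 - 1.5·IQR to Q3 + 1.5·IQR. The asset and metric names are hypothetical. The code requires Python 3.8+ for `statistics.quantiles`.

```python
import statistics
from collections import defaultdict


def five_point_summary(values):
    """(min, Q1, median, Q3, max) of one key's historical values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def build_normalcy_model(history):
    """history: iterable of (asset_id, metric_id, bin_id, value) records."""
    grouped = defaultdict(list)
    for asset_id, metric_id, bin_id, value in history:          # map: emit key -> value
        grouped[f"{asset_id}-{metric_id}-{bin_id}"].append(value)
    return {key: five_point_summary(vals) for key, vals in grouped.items()}  # reduce


def is_anomalous(model, asset_id, metric_id, bin_id, value, k=1.5):
    """Assumed rule: outside Tukey's fences around the stored quartiles."""
    _, q1, _, q3, _ = model[f"{asset_id}-{metric_id}-{bin_id}"]
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr


history = [("server1", "cumvol", 15, v) for v in (100, 110, 120, 130, 140)]
model = build_normalcy_model(history)
print(model["server1-cumvol-15"])                           # (100, 110.0, 120.0, 130.0, 140)
print(is_anomalous(model, "server1", "cumvol", 15, 500))    # True
```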

Page 14

Batch Analytics Forecasting Models

Forecast

Forecasting Algorithm (ARIMA/Holt-Winters, …)

Page 15

Implementation

MAP, MAP, MAP → key: assetID-metricID-Bin → RED, RED, RED → Time Series DB, storing assetID-metricID-Bin: [Expected | STD]
Incoming telemetry is checked against the stored values: Anomaly?

Table Name: Metric ID (Cumulative Volume)

Asset        Bin   Value
Server 1     15    EX | STD
Server 2     15    EX | STD
Server (N)   15    EX | STD
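The slides name ARIMA and Holt-Winters as the forecasting algorithms. To keep this sketch short, it substitutes plain (non-seasonal) simple exponential smoothing. For each key it computes the `[Expected | STD]` pair, where STD is the spread of one-step forecast errors, and flags values more than 3 STD from the expectation. The smoothing factor `alpha=0.3`, the key, and the sample series are hypothetical.

```python
import statistics


def expected_and_std(series, alpha=0.3):
    """Simple exponential smoothing: next-value forecast and std of one-step errors."""
    level = series[0]
    errors = []
    for x in series[1:]:
        errors.append(x - level)                  # error of the forecast made before seeing x
        level = alpha * x + (1 - alpha) * level   # update the smoothed level
    return level, statistics.pstdev(errors)


def build_forecast_model(history, alpha=0.3):
    """history: {"assetID-metricID-Bin": [values for that bin on successive days]}."""
    return {key: expected_and_std(values, alpha) for key, values in history.items()}


def is_anomalous(value, expected, std, k=3.0):
    return abs(value - expected) > k * std


history = {"server1-cumvol-15": [100, 102, 98, 101, 99, 100, 103, 97, 100, 101]}
model = build_forecast_model(history)
expected, std = model["server1-cumvol-15"]
print(is_anomalous(130, expected, std), is_anomalous(101, expected, std))  # True False
```

A seasonal method such as Holt-Winters would replace `expected_and_std`. The stored `[Expected | STD]` pair and the online check would stay the same.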

Page 16

Batch Model Deployment

Step 1: Bootstrap: Stream Data
Unstructured Data → OpenSOC → OpenSOC JSON

Step 2: Pre-Compute Expected Values (Batch)
HIVE (Timestamp) → MR/Spark, MR/Spark, MR/Spark → Time Series DB

Step 3: Generate Alerts (Online)
Unstructured Data → OpenSOC, using an Expected Values Reference Cache loaded from the Time Series DB → OpenSOC JSON → HIVE (Timestamp); Alert → ES

Page 17

Online Analytics: Data Preparation

Deseasonalizer columns: AV, CMA, RAT, UF, RF, DV
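The slide does not expand its column abbreviations. They match the classical ratio-to-moving-average decomposition, so this sketch reads them as follows (an assumption):

- AV: actual value
- CMA: centered moving average
- RAT: AV / CMA
- UF: unadjusted seasonal factor, the mean RAT per position in the season
- RF: UF rescaled so the factors average 1
- DV: AV / RF, the deseasonalized value

The series needs at least two full seasons so that every position gets a ratio.

```python
def centered_moving_average(av, period):
    """CMA over one full season; None where the window does not fit."""
    half = period // 2
    cma = [None] * len(av)
    for t in range(half, len(av) - half):
        if period % 2:                       # odd period: one centred window
            cma[t] = sum(av[t - half:t + half + 1]) / period
        else:                                # even period: average two offset windows (2 x period MA)
            first = sum(av[t - half:t + half]) / period
            second = sum(av[t - half + 1:t + half + 1]) / period
            cma[t] = (first + second) / 2
    return cma


def deseasonalize(av, period):
    """Ratio-to-moving-average decomposition; returns (DV series, RF per season position)."""
    cma = centered_moving_average(av, period)
    rat = [a / c if c else None for a, c in zip(av, cma)]          # RAT = AV / CMA
    uf = []
    for pos in range(period):                                       # UF = mean RAT per position
        ratios = [r for i, r in enumerate(rat) if i % period == pos and r is not None]
        uf.append(sum(ratios) / len(ratios))
    rf = [f * period / sum(uf) for f in uf]                         # RF: rescale so mean factor is 1
    dv = [a / rf[i % period] for i, a in enumerate(av)]             # DV = AV / RF
    return dv, rf


factors = [1.2, 0.8, 1.1, 0.9]
av = [100 * factors[i % 4] for i in range(16)]   # four "days" of four bins, no trend
dv, rf = deseasonalize(av, period=4)
print([round(v, 6) for v in dv[:4]])             # [100.0, 100.0, 100.0, 100.0]
```

Once the seasonal factors are removed, the remaining DV series can be fed to a trend-free check such as a 3-sigma test.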

Page 18

Online Analytics: Other Things to Check For

•  Trend
•  Seasonal Variability
•  Evolution of Regularities

Page 19

Online Processing

3-Sigma Algorithms

Micro Forecasting

Histogram Bins
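For the 3-sigma item, a streaming check can maintain a running mean and variance in constant memory using Welford's algorithm. This sketch makes two assumptions. First, values flagged as anomalous are not absorbed into the baseline, so an attack does not inflate its own threshold. Second, `warmup` (a hypothetical parameter) sets how many points are collected before alerting starts.

```python
import math


class ThreeSigmaDetector:
    """Streaming 3-sigma check using Welford's running mean and variance."""

    def __init__(self, k=3.0, warmup=10):
        self.k, self.warmup = k, warmup      # warmup must be at least 2
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        """Return True if x is anomalous; only normal points update the baseline."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))   # sample standard deviation
            anomalous = abs(x - self.mean) > self.k * std
        if not anomalous:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomalous


detector = ThreeSigmaDetector(k=3.0, warmup=10)
for x in [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]:
    detector.update(x)                            # warm-up: builds the baseline, never alerts
print(detector.update(11), detector.update(50))   # False True
```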

Page 20

Frequency Domain

•  High: trendless; noise; spikes represent anomalies
•  Medium: flatter; finer-grained trends
•  Low: seasonal and ‘peaky’; weekly/daily trends

Page 21

Frequency Domain – Wavelet Separation
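The slide does not show which wavelet was used. A Haar multiresolution analysis is the simplest way to split a series into the bands described above. Each level replaces blocks of 2^level samples with their mean. The detail band at that level is what the averaging removed, and the bands sum back to the original series. Treating the finest detail as "High", the middle details as "Medium", and the final smooth as "Low" is our assumption.

```python
def block_means(x, block):
    """Haar smooth at one scale: each block of `block` samples replaced by its mean."""
    out = []
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out


def haar_mra(x, levels):
    """Split x into `levels` detail bands (finest first) plus a final smooth band."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be a multiple of 2**levels")
    details, previous = [], list(x)
    for level in range(1, levels + 1):
        smooth = block_means(x, 2 ** level)
        details.append([p - s for p, s in zip(previous, smooth)])
        previous = smooth
    return details, previous


series = [10.0] * 16
series[5] = 50.0                       # a one-sample spike
details, smooth = haar_mra(series, levels=3)
high = details[0]                      # finest band: noise and spikes
print(max(range(len(high)), key=lambda i: high[i]))   # 5
```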

Page 22

Online Model Deployment

Step 1: Bootstrap: Stream Data
Unstructured Data → OpenSOC → OpenSOC JSON

Step 2: Generate Adjuster
HIVE (Timestamp) → MR/Spark → Adjuster / Decomposer, stored in the Time Series DB

Step 3: Generate Alerts (Online)
Unstructured Data → OpenSOC, applying the Adjuster and Decomposer (MR/Spark, Time Series DB) → OpenSOC JSON → HIVE (Timestamp); Alert → ES

Page 23

Feature-Based Anomaly Detection: Continuous Numeric Features*

•  Continuous Numeric Feature: can take on any value between its minimum value and its maximum value
•  Normalization: adjusting values measured on different scales to a notionally common scale

1.  Proximity-Based Techniques (example: K-Nearest Neighbors (KNN))
2.  Clustering (example: K-Means)
3.  Density-Based

Sample Anomalies Detected

MPS Anomaly   KBps Anomaly   Possible Explanation
TOO HIGH      TOO LOW        Port Scan / Network Scan
TOO HIGH      TOO HIGH       DDoS
TOO LOW       TOO HIGH       Control Traffic Anomaly
OK            OK             No Anomaly
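The following is a minimal proximity-based sketch for two continuous features such as MPS and KBps. It learns min-max normalization from a baseline of normal traffic, then scores a new point by its mean distance to the k nearest baseline points. A high score means the point sits far from anything seen before. The baseline values and `k=3` are hypothetical. The code requires Python 3.8+ for `math.dist`.

```python
import math


def fit_min_max(rows):
    """Per-feature minimum and range learned from the baseline (range 0 -> 1 to avoid /0)."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [(max(c) - min(c)) or 1.0 for c in cols]


def scale(row, lows, spans):
    """Min-max normalization onto the baseline's [0, 1] scale."""
    return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]


def knn_score(point, baseline, k=3):
    """Mean Euclidean distance from point to its k nearest baseline points."""
    dists = sorted(math.dist(point, b) for b in baseline)
    return sum(dists[:k]) / k


baseline = [(100, 500), (110, 520), (95, 480), (105, 510), (98, 495), (102, 505)]  # (MPS, KBps)
lows, spans = fit_min_max(baseline)
scaled = [scale(row, lows, spans) for row in baseline]
normal = knn_score(scale((101, 500), lows, spans), scaled)
scan = knn_score(scale((400, 50), lows, spans), scaled)    # MPS too high, KBps too low
print(scan > 10 * normal)   # True
```

Normalizing first keeps one feature from dominating the distance simply because it is measured in larger units.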

Page 24

Feature-Based Anomaly Detection: Categorical Features *

•  Categorical Features: can take on one of a limited, and usually fixed, number of possible values
•  Stream Sketch: an algorithm that produces an approximate answer based on an in-memory summary of the data stream

Examples: Protocol {UDP | FTP | HTTP | …}, GEO-MET {PHOENIX | DALLAS | LONDON | …}, …

Count-Min (CM) Sketch: estimates the number of occurrences of an element in a stream (Heavy Hitters). Why not count exactly? Protocol: 42k elements per asset; GeoMet: 246k elements per asset.

Batch: Categorical Data → CM Sketch → Heavy Hitters (MR) → Time Series DB

Table Name: Protocol

Asset        Bin   Value
Server 1     15    HH
Server 2     15    HH
Server (N)   15    HH

Online: Unstructured Data → CM Sketch → Alert
Example alert: Expected: {HTTP, UDP, FTP, DNS}; Actual: {DNS, ICMP, HTTP, FTP}
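Below is a compact Count-Min sketch with a heavy-hitter pass, in the spirit of the example alert above. The width and depth follow one common sizing rule: width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, here with ε = δ = 0.01. The row hashes come from salting `hashlib.blake2b` with the row number. The `phi` threshold and the protocol stream are hypothetical. The candidate set is kept unbounded for simplicity, whereas a production version would cap it, for example with a heap.

```python
import hashlib


class CountMinSketch:
    """Approximate per-item counts in depth x width counters (estimates never undercount)."""

    def __init__(self, width=272, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash per row: salt the item with the row number before hashing.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def heavy_hitters(stream, phi=0.1):
    """Items whose estimated share of the stream is at least phi."""
    cms, candidates, total = CountMinSketch(), set(), 0
    for item in stream:
        cms.add(item)
        total += 1
        if cms.estimate(item) >= phi * total:
            candidates.add(item)
    return {item for item in candidates if cms.estimate(item) >= phi * total}


stream = ["HTTP"] * 500 + ["DNS"] * 300 + ["ICMP"] * 150 + [f"proto-{i}" for i in range(50)]
actual = heavy_hitters(stream)
expected = {"HTTP", "UDP", "FTP", "DNS"}
print(sorted(actual - expected), sorted(expected - actual))   # ['ICMP'] ['FTP', 'UDP']
```

The final line compares the stored heavy hitters with the live ones. This is the kind of set difference that produces the Expected/Actual alert above.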

Page 25

Feature-Based Anomaly Detection: Feature Ratios

•  HyperLogLog: approximates the number of distinct elements in a multiset. Useful ratio: # distinct elements / total elements, in [0, 1]
•  t-Digest: a structure for accurate online accumulation of rank-based statistics such as quantiles and trimmed means

Pipeline: Unstructured Data → HyperLogLog distinct counts (Src_port, Dst_port, Src_ip, Dst_ip) and Storm Bolt totals ("Ack Total") for the same fields → Ratios → t-Digest * → Alert

Alerts (DT RATIO = distinct / total ratio):

FEATURE    DT RATIO   Possible Reason for Anomaly
SRC_IP     ~1 / ~0    Flash Crowd / DDoS
SRC_PORT   ~1 / ~0    Failure Probing / App Hijack
DST_IP     ~1 / ~0    Network Scan / DDoS
DST_PORT   ~1 / ~0    Port Scan / Footprinting
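Here is a compact HyperLogLog used to compute the distinct/total ratio described above. It keeps 2^12 registers, for a standard error of roughly 1.04/√4096 ≈ 1.6%. Hashes are 64 bits taken from `hashlib.sha1`, and linear counting handles small cardinalities. The register count, the hash choice, and the `distinct_total_ratio` helper are all assumptions. The slide only names the technique, and its Storm topology is not modeled here.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct count using 2**p registers of leading-zero ranks."""

    def __init__(self, p=12):
        self.p, self.m = p, 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")  # 64-bit hash
        idx = h >> (64 - self.p)                          # first p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)             # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1      # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        estimate = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:            # small range: linear counting
            estimate = self.m * math.log(self.m / zeros)
        return estimate


def distinct_total_ratio(values, p=12):
    """# distinct / total, clamped to [0, 1] since the distinct count is approximate."""
    hll, total = HyperLogLog(p), 0
    for v in values:
        hll.add(v)
        total += 1
    return min(1.0, hll.count() / total) if total else 0.0


src_ports = [1024 + (i % 1000) for i in range(10_000)]   # 1,000 distinct ports, repeated
print(distinct_total_ratio(src_ports))                    # approximately 0.1
```

A ratio near 1 means almost every value is new, as in a DST_PORT sweep. A ratio near 0 means a few values repeat heavily.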

Page 26

Feature-Based Anomaly Detection: Correlation (Information Theory)

•  Information Theory: the study of fundamental limits on signal processing, compression, and storage
•  Entropy: a measure of the unpredictability of information content

Pipeline: Unstructured Data → Anomaly-Free Training Set → Entropy Summarizer (entropy of Src_port, Dst_port, Src_ip, Dst_ip per Time Bin (n)) → MR → pairwise values below; live entropies per Time Bin (n) → Alert

            SRC_IP   SRC_PORT   DST_IP   DST_PORT
SRC_IP      -        .95        .85      .75
SRC_PORT    -        -          .97      .76
DST_IP      -        -          -        .98
DST_PORT    -        -          -        -
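This sketch makes an assumption: we read the matrix as Pearson correlations between per-bin entropy series of each pair of fields, computed on the anomaly-free training set. The slide does not state the exact statistic. The Entropy Summarizer step computes the Shannon entropy of each field per time bin. A live bin could then raise an alert when a pair's correlation drifts far from its trained value, such as SRC_IP/SRC_PORT at .95. The flow dictionaries and field names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of one field in a time bin."""
    n = len(values)
    if not n:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def entropy_series(bins, field):
    """bins: list of time bins, each a list of flow dicts; one entropy value per bin."""
    return [entropy([flow[field] for flow in flows]) for flows in bins]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant entropy series."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical training bins where source-IP and source-port diversity rise together.
bins = [[{"src_ip": f"10.0.0.{i % k}", "src_port": 1000 + i % k} for i in range(64)]
        for k in (1, 2, 4, 8, 16)]
h_ip, h_port = entropy_series(bins, "src_ip"), entropy_series(bins, "src_port")
print(h_ip)                              # [0.0, 1.0, 2.0, 3.0, 4.0]
print(round(pearson(h_ip, h_port), 3))   # 1.0
```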

Page 27

Principal Component Analysis (PCA)

•  Feature Selection Algorithm
•  Dimensionality Reduction
•  E.g. 4 features:
   -  ServerA (A)
   -  ServerB (B)
   -  ServerC (C)
   -  Cumulative = A + B + C

Page 28

PCA – Component Construction

Each component is a weighted sum of the four traffic series: (ServerA Traffic × wA) + (ServerB Traffic × wB) + (ServerC Traffic × wC) + (Cumulative × wCum). The weights (loadings) are:

Component   σ        ServerA      ServerB      ServerC      Cumulative
PC1         0.0135   -0.5052803   -0.4990556   -0.4816276   -0.5134882
PC2         0.5773    0.2801275    0.4611079   -0.8395562    0.0636666
PC3         0.5773    0.6867089   -0.6988557   -0.1441834    0.138718
PC4         0.5773   -0.4411929   -0.2234362   -0.2058916    0.8444132
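The slides do not say how the components become alerts. A common choice is the PCA subspace method, used here as an assumption. Keep the top-k components as the "normal" subspace, then score each time bin by the length of its residual outside that subspace. The sketch finds components by power iteration with deflation so that it needs no linear-algebra library. The four-feature data mirrors the ServerA/ServerB/ServerC/Cumulative example, but the values are invented.

```python
import math


def center(rows):
    """Subtract each feature's mean."""
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(cov, k, iters=500):
    """Leading eigenvectors of a symmetric matrix via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0] * d
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:                  # no variance left: stop early
                return comps
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return comps


def residual_norms(rows, k=1):
    """Distance of each centered row from the span of the top-k components."""
    centered = center(rows)
    comps = top_components(covariance(centered), k)
    scores = []
    for r in centered:
        proj = [sum(a * b for a, b in zip(r, c)) for c in comps]
        resid = [x - sum(p * c[i] for p, c in zip(proj, comps)) for i, x in enumerate(r)]
        scores.append(math.sqrt(sum(x * x for x in resid)))
    return scores


rows = [[t, t, t, 3 * t] for t in range(20)]   # ServerA, ServerB, ServerC, Cumulative
rows[10] = [18, 10, 10, 38]                     # ServerA spikes by 8 at t = 10
scores = residual_norms(rows, k=1)
print(max(range(len(scores)), key=scores.__getitem__))   # 10
```

A spike on one server breaks the usual co-movement of the servers. Even though Cumulative still equals A + B + C, that bin ends up far from the normal subspace.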

Page 29

PCA – Component Separation

Page 30

PCA – Component Separation

Page 31

Putting It All Together: OpenSOC

•  Flume, Kafka, Storm
•  OpenSOC-Streaming: RAW → Transform → Enrich → Alert (Rules-Based)
•  OpenSOC-Aggregation: Enriched → Filter → Aggregators
•  OpenSOC-ML: Router → Model 1, Model 2, …, Model n → Scorer
•  Big Data Stores: HIVE + HBase Long-Term Data Store; Elasticsearch Real-Time Index and Search; HBase, OpenTSDB, Titan Graph
•  Alerts: ES/HIVE Alerts Store, consumed by SOC Alert Consumers (UI, Web Services), External Alert Consumers via Secure Gateway Services, and the Remedy Ticketing System

Page 32

We are hiring…

•  Data Scientists (Security)
•  Aspiring Data Scientists
•  Security/Networking Experience Required
•  Software Engineering Experience Required
•  PhD not required
•  Background in stats or ML not required
•  Security Researchers

*Please contact us via LinkedIn with your profile

Page 33

Book idea…

Security Analytics on Hadoop
•  Anomaly Detection
•  Targeted Models
•  Deployment Best Practices
•  Alerts
•  Visualization Techniques
•  Etc…

If interested in contributing, please contact James Sirota on LinkedIn.

Page 34

OpenSOC Resources (@ProjectOpenSOC)

GitHub Repo
•  https://github.com/OpenSOC/opensoc

Slides
•  http://www.slideshare.net/JamesSirota
•  https://speakerdeck.com/jsirota

Corporate Blogs
•  http://blogs.cisco.com/author/jamessirota
•  http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security

Contributor Blogs
•  https://medium.com/@jamessirota
•  parrottsquawk.com

Page 35

Thank you.