learning factor graphs for preempting multi-stage attacks in...

1
1 2 3 4 5 6 7 8 9 10 Iteration 0.0 0.2 0.4 0.6 0.8 1.0 g(x , z 1 , z 2 ) EM learning for g(x , z 1 , z 2 ) z 1 =0 z 1 =1 z 2 =0 z 2 =1 Approach Learning Factor Graphs for Preempting Multi-Stage Attacks in Cloud Infrastructure Phuong Cao, Alexander Withers, Zbigniew Kalbarczyk, Ravishankar Iyer University of Illinois at Urbana-Champaign Result on learning parameters Expectation-Minimization algorithm shows a fast convergence rate, i.e., 3 iterations, of marginal probability of z i for a 3-variable clique. Goals Detect multi-stage attacks that make use of stolen credentials in large enterprise networks, e.g., cloud infrastructure Employ factor graphs, a probabilistic graphical model, to capture attacker behavior and detect malicious activities. Learning graph structure that represents dependencies among observed events and attack stages Learning graph parameters, i.e., factor functions among observed events and hidden attack stages, that represents strength of their dependencies using a factor graph correspond to ongoing attack Future Work 1. Automatically build graphs for evaluating security of pre-deployed cloud applications 2. Evaluate of learned graphs in terms of detection accuracy or model complexity 3. Build models to automatically respond to on- going attacks References [1] Koller, Daphne, and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009. [2] Cao, Phuong, et al. "Preemptive intrusion detection." Proceedings of the 2014 Symposium and Bootcamp on the Science of Security. ACM, 2014. Acknowledgement This material is based upon work supported by the Maryland Procurement Office under Contract No. H98230-14-C-0141 National Center for Supercomputing Applications staffs for insightful discussions on incident reports of multi-stage attacks Attack Paths </> — {} —— ——— Cloud Config Markup Document Construct Attack Graph Attack Graph I —— C — D—— Past incidents Events Factor functions Learn parameters Build factor graph Determine Attack Stage 1 2 3 4 5 = 0.1 0.3 0.1 0.4 0.1 Motivating Example: A Credential Stuffing Attack Steal credentials / secret keys Launch Denial of Service Send spams Mine Bitcoins Target Site Leaked Credentials Botnet Black Markets Password ********* Compromised Machine Firewall File system /dev/shm Compile RHEL exploit Escalate to #root Attacker Remote Location src seq dst ack 3 x Venom Rootkit Port 9090 Loadable Kernel Module Userland backdoor Network scanning tools X Buy, steal, or fish credentials 1 3 Drop exploit kit ssh-rsa AAAAB3… 2 Masquerade 4 Escalate privilege 5 Deploy rootkit 6 Open port upon receiving port knocking sequence 8 Perform attack payloads Invisible to network scanning tools 7 Send port knocking sequence TCP dst + seq = 1221 </> Automated guessing Account takeover $$$ Credential Stuffing: Attacks that exploit leaked credentials for automated penetration of systems. + , - . / + , - . / 1 1 2 3 4 5 = 0.1 0.3 0.1 0.4 0.1 ∈ , z 8 0,1 represent hidden attack stages ∈ , x ∈ [1, |X|] represent observed events ( 1 ) 1 0.10 0 0.90 1 ( 1 , 1 , 2 ) 1 1 2 ? 1 0 0 ? 1 0 1 ? 1 1 0 ? 1 1 1 : initial compromise : host hopping : escalate privileges : maintain presence : deliver payload The most probable attack stage: ”maintain presence” 1 : incorrect logins 2 : successful login 3 : sensitive file sys location 4 : download source file 5 : compile 6 : new kernel module , | = ^ ( , ) h( 2 ) 2 0.70 0 0.30 1 Graph structure consist of edges specify link among variables I —— C — D—— Past Incidents Dataset D Independence Test Dependent Events , + - - / / Events at run-time Expectation Maximization of Factor Functions , + ( , , ) ? 1 0 0 e f Ongoing attacks Bro logs System logs Network flows Learning graph structure (offline and runtime) The goal of learning graph structure, i.e., factor graph, is to automatically establish dependencies among observed events and hidden attack stages by using an Χ , independence test on training data D. The dependencies are used to construct a set of model candidates i , e.g., simple model using only strongest dependencies or complex model using all dependencies. opq i = t log (, , i , , ) + i i || A model candidate i is scored based on three terms in respective order: - Goodness of fit with training data ( t log (, , i , , )) - Entropy of to avoid overfitting and favor model stability ( i - Complexity of model and availability of training data ( i ||) Learning graph parameters (offline) The goal of learning graph parameters is to automatically define parameters e f of factor functions in tabular forms. Expectation Maximization algorithm is used for learning parameters of each factor function because it can handle missing or incomplete training data, which is the case for most multi-stage attacks. Input. Training dataset D of past attacks Init. Start with a random initialization of | Repeat each iteration until converge: E-step. Calculate expected likelihood of log likelihood function ~+ ~ = , ~ [log((, , ~ )] M-step. Maximize parameters of ~+ ~+ = t ~+ ~ Inference of ongoing attack stages (runtime) Given a factor graph of an ongoing attack at runtime, inference is to determine the most likely unknown attack stage and output a confidence level for each stage. = P(x, z, ) At this stage, off-the-shelf inference techniques such as Belief Propagation, Monte Carlo Markov Chain, or Variational Inference can be employed. Determine response to the identified attack stage, e.g., z i = maintain presence is the most probable attack stage in this example. 0.1 0.3 0.1 0.4 0.1 0 0.1 0.2 0.3 0.4 0.5 1 2 3 4 5 P(z) Illustration of the proposed approach in the context of a credential stuffing attack.

Upload: others

Post on 01-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Learning Factor Graphs for Preempting Multi-Stage Attacks in ...publish.illinois.edu/science-of-security-lablet/files/...1 2 3 4 5 6 7 8 9 10 Iteration 0.0 0.2 0.4 0.6 0.8 1.0 g (x,

1 2 3 4 5 6 7 8 9 10Iteration

0.0

0.2

0.4

0.6

0.8

1.0

g(x,

z 1,z

2)

EM learning for g(x, z1, z2)

z1 = 0z1 = 1z2 = 0z2 = 1

Approach

Learning Factor Graphs for Preempting Multi-Stage Attacks in Cloud InfrastructurePhuong Cao, Alexander Withers, Zbigniew Kalbarczyk, Ravishankar IyerUniversity of Illinois at Urbana-Champaign

Result on learning parametersExpectation-Minimization algorithm shows a fast convergence rate, i.e., 3 iterations, of marginal probability of zi for a 3-variable clique.

Goals• Detect multi-stage attacks that make use of stolen credentials in large

enterprise networks, e.g., cloud infrastructure• Employ factor graphs, a probabilistic graphical model, to capture

attacker behavior and detect malicious activities.– Learning graph structure that represents dependencies among observed events and attack

stages

– Learning graph parameters, i.e., factor functions among observed events and hidden attack stages, that represents strength of their dependencies using a factor graph correspond to ongoing attack

Future Work1. Automatically build graphs for evaluating

security of pre-deployed cloud applications

2. Evaluate of learned graphs in terms of detection accuracy or model complexity

3. Build models to automatically respond to on-going attacks

References[1] Koller, Daphne, and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009. [2] Cao, Phuong, et al. "Preemptive intrusion detection." Proceedings of the 2014 Symposium and Bootcamp on the Science of Security. ACM, 2014.

AcknowledgementThis material is based upon work supported by the Maryland Procurement Office under Contract No. H98230-14-C-0141National Center for Supercomputing Applications staffs for insightful discussions on incident reports of multi-stage attacks

Attack Paths

</> —{} —————

Cloud ConfigMarkup Document

ConstructAttack Graph

Attack Graph

I ——C ——D——

Pastincidents

Events

Factor functions

Learnparameters

Build factor graph

DetermineAttackStage

𝑠1𝑠2𝑠3𝑠4𝑠5

=

0.10.30.10.40.1

Motivating Example: A Credential Stuffing Attack

Steal credentials / secret keys

Launch Denial of Service

Send spams

Mine Bitcoins

Target Site

Leaked Credentials

Botnet

Black Markets Password*********

Compromised Machine

Firewall

File system/dev/shm

Compile RHEL exploit

Escalate to #root

Attacker

Remote Location

srcseq

dst

ack…

3 x

Venom Rootkit

Port 9090

Loadable Kernel Module

Userland backdoor

Network scanning tools

X

Buy, steal, or fishcredentials

1

3 Drop exploit kit

ssh-rsaAAAAB3…

2 Masquerade

4 Escalate privilege

5 Deploy rootkit

6 Open port upon receivingport knocking sequence

8 Perform attack payloads

Invisible to network scanning tools

7 Send port knocking sequenceTCP dst + seq = 1221

</>Automated

guessing

Account takeover $$$

Credential Stuffing: Attacks that exploit leaked credentials for automated penetration of systems.

𝑧+ 𝑧, 𝑧- 𝑧. 𝑧/

𝑥+ 𝑥, 𝑥- 𝑥. 𝑥/ 𝑥1

𝑧1𝑧2𝑧3𝑧4𝑧5

=

0.10.30.10.40.1

𝑧𝑖 ∈ 𝑍, z8 ∈ 0,1 representhiddenattackstages𝑥𝑖 ∈ 𝑋,x𝑖 ∈ [1,|X|]representobservedevents

𝑓(𝑠1) 𝑧10.10 0

0.90 1

𝑔(𝑥1, 𝑧1, 𝑧2) 𝑥1 𝑧1 𝑧2? 1 0 0

? 1 0 1

? 1 1 0

? 1 1 1

𝒛𝟏 :initialcompromise𝒛𝟐 :hosthopping𝒛𝟑 :escalateprivileges𝒛𝟒 :maintainpresence𝒛𝟓 :deliverpayload

The most probable attack stage:”maintain presence”

𝑥1 :incorrectlogins𝑥2 :successfullogin𝑥3 :sensitivefilesyslocation𝑥4 :downloadsourcefile𝑥5 :compile𝑥6 :newkernelmodule

𝑷 𝒙, 𝒛|𝜽 =𝟏𝒁^𝜽𝒄𝑻𝒇𝒄(𝒙𝒄, 𝒛𝒄)𝒄∈𝓒

h(𝑧2) 𝑧20.70 0

0.30 1

Graph structure consist of edgesspecify link among variables

I ——C —D——

PastIncidentsDataset D

IndependenceTest

Dependent Events

𝑧,𝑥+

𝑧-𝑥-

𝑧/𝑥/

Events at run-time

ExpectationMaximization of 𝜃

Factor Functions

𝑧,𝑥+

𝒈(𝒙𝟏, 𝒛𝟏, 𝒛𝟐) 𝒙𝟏 𝒛𝟏 𝒛𝟐

? 1 0 0

… … … …

𝜃ef

Ongoing attacks

Bro logsSystem logsNetwork flows

Learning graph structure (offline and runtime)The goal of learning graph structure, i.e., factor graph, is to automatically establish dependencies among observed events and hidden attack stages by using an Χ,independence test on training data D. The dependencies are used to construct a set of model candidates 𝑚i ∈ 𝑀 , e.g., simple model using only strongest dependencies or complex model using all dependencies.

𝑠𝑐𝑜𝑟𝑒 opq 𝑚i 𝐷 = 𝑚𝑎𝑥tlog𝑃(𝑥, 𝑧,𝑚i, 𝐷, 𝜃) + 𝑙𝑜𝑔 𝜃𝑃 𝜃 𝑚i − 𝑑𝑖𝑚 𝑚i 𝑙𝑛|𝐷|

A model candidate𝑚i is scored based on three terms in respective order: - Goodness of fit with training data (𝑚𝑎𝑥tlog𝑃(𝑥, 𝑧,𝑚i, 𝐷, 𝜃))- Entropy of 𝜃 to avoid overfitting and favor model stability (𝑙𝑜𝑔 𝜃𝑃 𝜃 𝑚i

- Complexity of model and availability of training data (𝑑𝑖𝑚 𝑚i 𝑙𝑛|𝐷|)

Learning graph parameters (offline)The goal of learning graph parameters is to automatically define parameters 𝜃efof factor functions in tabular forms.

Expectation Maximization algorithm is used for learning parameters of each factor function because it can handle missing or incomplete training data, which is the case for most multi-stage attacks.

Input. Training dataset D of past attacksInit. Start with a random initialization of 𝜃|

Repeat each iteration until converge:E-step. Calculate expected likelihood of log likelihood function

𝑄 𝜃~�+ 𝜃~ = 𝐸 𝑧 𝑥, 𝜃~ [log(𝑃(𝑥, 𝑧, 𝜃~)]

M-step. Maximize parameters of 𝜃 ~�+

𝜃 ~�+ = 𝑎𝑟𝑔𝑚𝑎𝑥 t 𝑄 𝜃~�+ 𝜃~

Inference of ongoing attack stages (runtime)Given a factor graph of an ongoing attack at runtime, inference is to determine the most likely unknown attack stage and output a confidence level for each stage.

𝑧∗ = 𝑎𝑟𝑔𝑚𝑎𝑥 � P(x, z, 𝜃)At this stage, off-the-shelf inference techniques such as Belief Propagation, Monte Carlo Markov Chain, or Variational Inference can be employed.

Determine response to the identified attack stage, e.g., zi = maintain presence is the most probable attack stage in this example.0.1

0.3

0.1

0.4

0.1

0

0.1

0.2

0.3

0.4

0.5

1 2 3 4 5

P(z)

Illustration of the proposed approach in the context of a credential stuffing attack.