
An empirical approach to modeling uncertainty in Intrusion Analysis

Xinming (Simon) Ou

Kansas State University

Joint work with S. Raj Rajagopalan at HP Labs and Sakthi Sakthivelmurugan

Departmental seminar, Computer Science at Virginia Tech, March 19, 2010

A day in the life of a real SA

• The system administrator's network monitoring tools report abnormally high traffic
• Netflow dump: TrendMicro server communicating with known BotNet controllers
• Memory dump: seemingly malicious code modules
• Memory dump: found open IRC sockets with other TrendMicro servers
• Conclusion: these TrendMicro servers are certainly compromised!

Key challenge: uncertainty in data.

An empirical approach

• In spite of the lack of theory or good tools, sysadmins are coping with attacks.

• Can we build a system that mimics what they do (for a start):
– An empirical approach to Intrusion Analysis using existing reality

• Our goal:
– Provide some degree of automation in the process

[Figure: system overview]
• Observations: IDS alerts, netflow dump, syslog, server log …
• Mapping observations to their semantics
• Internal model
• Reasoning Engine
• High-confidence conclusions with evidence
• Targeting subsequent observations

Capture Uncertainty Qualitatively

Confidence level    Uncertainty mode    Symbol
Low                 Possible            p
Moderate            Likely              l
High                Certain             c

• Arbitrarily precise quantitative measures are not meaningful in practice

• Roughly matches confidence levels practically used by practitioners
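The three modes form an ordered scale, which could be sketched in Python as an ordered enumeration. The names `Mode`, `SYMBOL`, and `weakest` are illustrative assumptions and do not come from the original system.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Qualitative uncertainty modes, ordered from weakest to strongest."""
    POSSIBLE = 1  # p: low confidence
    LIKELY = 2    # l: moderate confidence
    CERTAIN = 3   # c: high confidence


# One-letter symbols used on the slides.
SYMBOL = {Mode.POSSIBLE: "p", Mode.LIKELY: "l", Mode.CERTAIN: "c"}


def weakest(*modes: Mode) -> Mode:
    """Return the least confident of the given modes (hypothetical helper)."""
    return min(modes)


if __name__ == "__main__":
    print(SYMBOL[weakest(Mode.CERTAIN, Mode.LIKELY)])
```

Using an ordered type rather than a number in [0, 1] keeps the scale coarse, in line with the point that arbitrarily precise measures are not meaningful in practice.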


Observation Correspondence


Observations (what you can see)                  Mode   Internal conditions (what you want to know)
obs(anomalyHighTraffic)                          p      int(attackerNetActivity)
obs(netflowBlackListFilter(H, BlackListedIP))    l      int(compromised(H))
obs(memoryDumpMaliciousCode(H))                  l      int(compromised(H))
obs(memoryDumpIRCSocket(H1,H2))                  l      int(exchangeCtlMessage(H1,H2))
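As a rough illustration, the observation correspondence can be held as a lookup table from observation names to internal conditions and modes. This is a minimal Python sketch, not the system's actual encoding: `OBS_MAP`, `interpret`, and the argument-selector functions are hypothetical names.

```python
# Observation name -> (internal condition, argument selector, mode).
# The selector picks which observation arguments carry over to the condition.
OBS_MAP = {
    "anomalyHighTraffic": ("attackerNetActivity", lambda args: (), "p"),
    "netflowBlackListFilter": ("compromised", lambda args: (args[0],), "l"),
    "memoryDumpMaliciousCode": ("compromised", lambda args: (args[0],), "l"),
    "memoryDumpIRCSocket": ("exchangeCtlMessage", lambda args: tuple(args), "l"),
}


def interpret(obs_name, args):
    """Map one observation to (internal condition, arguments, mode)."""
    condition, select, mode = OBS_MAP[obs_name]
    return (condition, select(args), mode)


if __name__ == "__main__":
    print(interpret("memoryDumpIRCSocket", ("172.16.9.20", "172.16.9.1")))
    print(interpret("netflowBlackListFilter", ("172.16.9.20", "10.0.0.1")))
```

The second call uses a made-up blacklisted address purely to show that only the host argument H survives into `compromised(H)`.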






Internal Model


Logical relations among internal conditions. Each rule links Condition 1 and Condition 2 and is labeled with the direction of inference (f: forward, Condition 1 infers Condition 2; b: backward, Condition 2 infers Condition 1) and a mode.

Condition 1                Direction, mode   Condition 2
int(compromised(H1))       f, p              int(probeOtherMachine(H1,H2))
int(sendExploit(H1,H2))    f, l              int(compromised(H2))
int(sendExploit(H1,H2))    b, p              int(compromised(H2))
int(compromised(H1))       b, c              int(probeOtherMachine(H1,H2))
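The four rules can be sketched in Python as data plus a small one-step inference function. Two things here are assumptions rather than statements from the slides: the names (`INTERNAL_MODEL`, `infer`, `RANK`) and the choice to give a derived fact the weaker of the premise mode and the rule mode.

```python
RANK = {"p": 1, "l": 2, "c": 3}  # possible < likely < certain

# (condition 1, direction, mode, condition 2); H1 and H2 are variables.
INTERNAL_MODEL = [
    (("compromised", ("H1",)), "f", "p", ("probeOtherMachine", ("H1", "H2"))),
    (("sendExploit", ("H1", "H2")), "f", "l", ("compromised", ("H2",))),
    (("sendExploit", ("H1", "H2")), "b", "p", ("compromised", ("H2",))),
    (("compromised", ("H1",)), "b", "c", ("probeOtherMachine", ("H1", "H2"))),
]


def infer(fact):
    """Apply every matching rule once to a fact (name, args, mode)."""
    name, args, mode = fact
    derived = []
    for cond1, direction, rule_mode, cond2 in INTERNAL_MODEL:
        # Forward rules reason from condition 1 to 2, backward from 2 to 1.
        premise, conclusion = (cond1, cond2) if direction == "f" else (cond2, cond1)
        if premise[0] != name or len(premise[1]) != len(args):
            continue
        binding = dict(zip(premise[1], args))
        # Variables not bound by the premise stay symbolic (e.g. "H2").
        new_args = tuple(binding.get(var, var) for var in conclusion[1])
        # Assumption: the conclusion is no more certain than its weakest input.
        new_mode = min(mode, rule_mode, key=RANK.__getitem__)
        derived.append((conclusion[0], new_args, new_mode))
    return derived


if __name__ == "__main__":
    print(infer(("sendExploit", ("128.111.49.46", "192.168.10.90"), "l")))
    print(infer(("probeOtherMachine", ("192.168.10.90", "192.168.70.49"), "p")))
```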






Reasoning Methodology

• Simple reasoning
– Observation correspondence and internal model are inference rules
– Use inference rules on input observations to derive assertions with various levels of uncertainty

• Proof strengthening
– Derive high-confidence proofs from assertions derived from low-confidence observations

Example 1: Inference through Observation Correspondence

int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap
    obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))

Rule used: obs(memoryDumpIRCSocket(H1,H2)) maps to int(exchangeCtlMessage(H1,H2)) with mode l

Example 2: Inference through Internal Model

int(compromised(172.16.9.20), l)    Int rule
    int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap

Proof Strengthening

[Figure: from the observations O1, O2, and O3, two separate proofs each conclude that f is likely true; proof strengthening combines them to conclude that f is certainly true.]

Proof Strengthening Example

int(compromised(172.16.9.20), c)    strengthenedPf
    int(compromised(172.16.9.20), l)    intR
        int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap
    int(compromised(172.16.9.20), l)    obsMap
        obs(memoryDumpMaliciousCode('172.16.9.20'))

strengthen(l, l) = c
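The strengthening step in this example could be sketched in Python as follows. Only the case stated on the slide, strengthen(l, l) = c, is filled in; other mode pairs are left undefined rather than guessed. The requirement that the two proofs share no observations is an assumption of this sketch, and `STRENGTHEN` and `strengthen_proofs` are hypothetical names.

```python
# Only the combination given on the slide; further entries are not assumed.
STRENGTHEN = {("l", "l"): "c"}


def strengthen(mode_a, mode_b):
    """Return the strengthened mode, or None if the pair is not defined."""
    return STRENGTHEN.get((mode_a, mode_b))


def strengthen_proofs(proof_a, proof_b):
    """Combine two proofs of the same fact.

    A proof is (fact, mode, frozenset of supporting observations).
    Assumption: proofs must rest on disjoint observations to count as
    independent evidence.
    """
    fact_a, mode_a, obs_a = proof_a
    fact_b, mode_b, obs_b = proof_b
    if fact_a != fact_b or obs_a & obs_b:
        return None
    new_mode = strengthen(mode_a, mode_b)
    if new_mode is None:
        return None
    return (fact_a, new_mode, obs_a | obs_b)


if __name__ == "__main__":
    fact = "compromised(172.16.9.20)"
    via_irc = (fact, "l", frozenset({"memoryDumpIRCSocket(172.16.9.20, 172.16.9.1)"}))
    via_code = (fact, "l", frozenset({"memoryDumpMaliciousCode(172.16.9.20)"}))
    print(strengthen_proofs(via_irc, via_code))
```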

Evaluation

• Test whether the empirically developed model can derive similar high-confidence traces when applied to different scenarios

• Keep the model unchanged and apply the tool to different data sets

SnIPS (Snort Intrusion Analysis using Proof Strengthening) Architecture

[Figure: SnIPS architecture]
• Snort Rule Repository → Observation Correspondence (done only once)
• Snort alerts → pre-processing (convert to tuples) → Reasoning Engine
• Internal Model → Reasoning Engine
• User query, e.g. which machines are "certainly" compromised? → Reasoning Engine → high-confidence answers with evidence

Snort rule class type

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; classtype:attempted-recon; sid:1140;)

obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), ?).

Internal predicate mapped from "classtype"
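A minimal Python sketch of this mapping step might look as follows. It reads the `sid` and `classtype` options from the rule text with regular expressions and emits an obsMap fact with the mode still unknown. The function name, the `CLASSTYPE_TO_PREDICATE` table (which contains only the one pairing shown on the slide), the default generator ID of 1, and the caller-supplied rule ID are all assumptions for illustration.

```python
import re

# Only the pairing shown on the slide; other classtypes are not assumed here.
CLASSTYPE_TO_PREDICATE = {"attempted-recon": "probeOtherMachine"}


def obs_map_from_rule(rule_text, rule_id, gid=1):
    """Build an obsMap fact (as text) from a Snort rule, or None if unmapped."""
    sid = re.search(r"\bsid\s*:\s*(\d+)\s*;", rule_text)
    classtype = re.search(r"\bclasstype\s*:\s*([\w-]+)\s*;", rule_text)
    if sid is None or classtype is None:
        return None
    predicate = CLASSTYPE_TO_PREDICATE.get(classtype.group(1))
    if predicate is None:
        return None  # classtype with no known internal predicate
    return (
        f"obsMap({rule_id}, obs(snort('{gid}:{sid.group(1)}', FromHost, ToHost)), "
        f"int({predicate}(FromHost, ToHost)), ?)."
    )


if __name__ == "__main__":
    rule = (
        'alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS '
        '(msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; '
        'classtype:attempted-recon; sid:1140;)'
    )
    print(obs_map_from_rule(rule, "obsRuleId_3615"))
```

The mode is left as `?` because the classtype alone does not determine it.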

Snort rule documents

Impact: Information gathering and system integrity compromise. Possible unauthorized administrative access to the server. Possible execution of arbitrary code of the attacker's choosing in some cases.

Ease of Attack: Exploit exists

Hints from natural-language description of Snort rules:

obsMap(obsRuleId_3614, obs(snort('1:1140', FromHost, ToHost)), int(compromised(ToHost)), p).

obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), l).   (mode ? resolved to l)

Automatically deriving Observation Correspondence

• Snort has about 9000 rules.

• The automatic mapping is just a baseline and needs to be fine-tuned.

• It would make more sense for the rule writer to define the observation correspondence relation when writing a rule.

Internal Predicate           % of rules
Mapped automatically         59%
Not mapped automatically     41%

Data set description

• Treasure Hunt (UCSB 2002) – 4 hrs
– Collected during a graduate class experiment
– Large variety of system monitoring data: tcpdump, sys log, apache server log etc.

• Honeypot (Purdue, 2008) – 2 hrs/day over 2 months
– Collected for e-mail spam analysis project
– Single host running misconfigured Squid proxy

• KSU CIS department network 2009 – 3 days
– 200 machines including servers and workstations.

Some results from the Treasure Hunt data set

| ?- show_trace(int(compromised(H), c)).

int(compromised('192.168.10.90'), c)    strengthenedPf
    int(compromised('192.168.10.90'), p)    intRule_1
        int(probeOtherMachine('192.168.10.90','192.168.70.49'), p)    obsRulePre_1
            obs(snort('122:1','192.168.10.90','192.168.70.49',_h272))
    int(compromised('192.168.10.90'), l)    intRule_3
        int(sendExploit('128.111.49.46','192.168.10.90'), l)    obsRuleId_3749
            obs(snort('1:1807','128.111.49.46','192.168.10.90',_h336))

• A probe was sent from 192.168.10.90
• An exploit was sent to 192.168.10.90
• 192.168.10.90 was certainly compromised!

Data Reduction

Data set        Duration of network traffic   Snort alerts   Pre-processed alerts   High-confidence proofs
Treasure Hunt   4 hours                       4,849,937      278                    18
Honeypot        2 hrs/day for 2 months        637,564        30                     8
CIS Network     3 days                        1,138,572      6,634                  17
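The size of the reduction at each stage follows from simple division over the figures in the table. The short Python sketch below computes it; the counts are copied from the table, and `ROWS` and `reduction_factors` are illustrative names.

```python
# (data set, Snort alerts, pre-processed alerts, high-confidence proofs)
ROWS = [
    ("Treasure Hunt", 4_849_937, 278, 18),
    ("Honeypot", 637_564, 30, 8),
    ("CIS Network", 1_138_572, 6_634, 17),
]


def reduction_factors(alerts, preprocessed, proofs):
    """Return (raw alerts per pre-processed alert, pre-processed alerts per proof)."""
    return alerts / preprocessed, preprocessed / proofs


if __name__ == "__main__":
    for name, alerts, preprocessed, proofs in ROWS:
        per_pre, per_proof = reduction_factors(alerts, preprocessed, proofs)
        print(f"{name}: {per_pre:,.0f} raw alerts per pre-processed alert, "
              f"{per_proof:,.1f} pre-processed alerts per proof")
```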


Future Work

• Continue the empirical study and improve the reasoning model

• Establish a theoretical foundation for the empirically developed method
– Modal Logic
– Bayes Theory
– Dempster-Shafer Theory

Related work

• Y. Zhai et al., "Reasoning about complementary intrusion evidence," ACSAC 2004

• F. Valeur et al., "A Comprehensive Approach to Intrusion Detection Alert Correlation," 2004

• R. Goldman and S. Harp, "Model-based Intrusion Assessment in Common Lisp," 2009

• C. Thomas and N. Balakrishnan, "Modified Evidence Theory for Performance Enhancement of Intrusion Detection Systems," 2008

Summary

• Based on a true-life incident, we empirically developed a logical model for handling uncertainty in intrusion analysis

• Experimental results show
– Model simulates human thinking and was able to extract high-confidence intrusion traces
– Model empirically developed from one incident was applicable to completely different data/scenarios
– Reduction in search space for analysis

Thank you

Questions?


Summarization


• Compact the information entering reasoning engine

• Group similar “internal condition” into a single “summarized internal condition”

Comparison of the three data sets

[Figure: bar chart on a logarithmic axis (1 to 10,000,000) comparing the Department, Treasure Hunt, and Honeypot data sets by Snort classtype: attempted-admin, attempted-dos, attempted-recon, attempted-user, bad-unknown, default-login-attempt, misc-activity, misc-attack, not-suspicious, policy-violation, non-standard-protocol, protocol-command-decode, rpc-portmap-decode, shellcode-detect, successful-admin, successful-recon-limited, suspicious-filename-detect, system-call-detect, trojan-activity, unknown, web-application-activity, web-application-attack.]
