
An empirical approach to modeling uncertainty in Intrusion Analysis

Xinming (Simon) Ou (1)

S. Raj Rajagopalan (2)

Sakthi Sakthivelmurugan (1)

1 – Kansas State University, Manhattan, KS
2 – HP Labs, Princeton, NJ

A day in the life of a real SA

[Diagram of the investigation:]
• The system administrator's network monitoring tools report abnormally high traffic.
• A netflow dump shows a TrendMicro server communicating with known BotNet controllers.
• A memory dump reveals seemingly malicious code modules, plus open IRC sockets with other TrendMicro servers.
• The SA's conclusion: "These TrendMicro servers are certainly compromised!"

Key challenge: how to deal with uncertainty in intrusion analysis?

An empirical approach

• In spite of the lack of theory or good tools, sysadmins are coping with attacks.
• Can we build a system that mimics what they do (for a start)?
  – An empirical approach to intrusion analysis using existing reality
• Our goal: help a sysadmin do a better job rather than replace him.

[System overview diagram: observations (IDS alerts, netflow dump, syslog, server log, ...) are mapped to their semantics in an internal model; the reasoning engine produces high-confidence conclusions with evidence and targets subsequent observations.]

Capture Uncertainty Qualitatively

Confidence level   Uncertainty mode
Low                Possible (p)
Moderate           Likely (l)
High               Certain (c)

• Arbitrarily precise quantitative measures are not meaningful in practice.
• The three modes roughly match the confidence levels practitioners actually use.


Observation Correspondence

Observations (what you can see)                 Mode   Internal conditions (what you want to know)
obs(anomalyHighTraffic)                         p      int(attackerNetActivity)
obs(netflowBlackListFilter(H, BlackListedIP))   l      int(compromised(H))
obs(memoryDumpMaliciousCode(H))                 l      int(compromised(H))
obs(memoryDumpIRCSocket(H1,H2))                 l      int(exchangeCtlMessage(H1,H2))
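In SnIPS these mappings are encoded as Prolog facts in the obsMap format shown later for the Snort-derived rules. A minimal sketch of the four rows above; the rule identifiers are invented for illustration:

  % obsMap(RuleId, Observation, InternalCondition, Mode).
  obsMap(obsRuleId_a, obs(anomalyHighTraffic),
         int(attackerNetActivity), p).
  obsMap(obsRuleId_b, obs(netflowBlackListFilter(H, BlackListedIP)),
         int(compromised(H)), l).
  obsMap(obsRuleId_c, obs(memoryDumpMaliciousCode(H)),
         int(compromised(H)), l).
  obsMap(obsRuleId_d, obs(memoryDumpIRCSocket(H1, H2)),
         int(exchangeCtlMessage(H1, H2)), l).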


Internal Model

Logical relations among internal conditions. Direction "f" (forward) means Condition 1 infers Condition 2; "b" (backward) means Condition 2 infers Condition 1.

Condition 1                Direction, mode   Condition 2
int(compromised(H1))       f, p              int(probeOtherMachine(H1,H2))
int(sendExploit(H1,H2))    f, l              int(compromised(H2))
int(sendExploit(H1,H2))    b, p              int(compromised(H2))
int(compromised(H1))       b, c              int(probeOtherMachine(H1,H2))
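These relations can be written as facts in the same spirit; a minimal sketch, assuming a predicate intRule/5 analogous to obsMap (the predicate name and identifiers are assumptions, not SnIPS's exact syntax):

  % intRule(RuleId, Condition1, Condition2, Direction, Mode).
  % Direction f: Condition1 infers Condition2; b: Condition2 infers Condition1.
  intRule(intRuleId_1, int(compromised(H1)),    int(probeOtherMachine(H1, H2)), f, p).
  intRule(intRuleId_2, int(sendExploit(H1, H2)), int(compromised(H2)),          f, l).
  intRule(intRuleId_3, int(sendExploit(H1, H2)), int(compromised(H2)),          b, p).
  intRule(intRuleId_4, int(compromised(H1)),    int(probeOtherMachine(H1, H2)), b, c).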


Reasoning Methodology

• Simple reasoning
  – Observation correspondence and internal model entries are inference rules
  – Use the inference rules on input observations to derive assertions with various levels of uncertainty (a sketch follows this list)
• Proof strengthening
  – Derive high-confidence proofs from assertions derived from low-confidence observations
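As a sketch of the simple-reasoning step in Prolog: each obsMap entry acts as an inference rule over the input observations (the bookkeeping that records which rule fired is elided here):

  % An internal condition holds with mode Mode if some observation maps to it.
  int(Cond, Mode) :-
      obs(Obs),
      obsMap(_RuleId, obs(Obs), int(Cond), Mode).

Given obs(memoryDumpIRCSocket('172.16.9.20', '172.16.9.1')), the query ?- int(C, M). yields C = exchangeCtlMessage('172.16.9.20', '172.16.9.1'), M = l, which is exactly the derivation in Example 1 below.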

Example 1: Observation Correspondence

Applying the rule

  obs(memoryDumpIRCSocket(H1,H2))  →  int(exchangeCtlMessage(H1,H2)), mode l

to the input observation derives:

int(exchangeCtlMessage(172.16.9.20, 172.16.9.1), l)   obsMap
  obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))

Example 2: Internal Model

int(compromised(172.16.9.20), l)   int rule
  int(exchangeCtlMessage(172.16.9.20, 172.16.9.1), l)   obsMap
    obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))

Proof Strengthening

[Diagram: from observations O1 and O2, "f is likely true" is derived twice; proof strengthening combines the two independent derivations into "f is certainly true" (a further observation O3 is shown as additional input).]

Proof Strengthening

[Diagram: independent derivations of the same conclusion A with low modes (p, l) are combined by "strengthen" into a higher mode, up to c.]

Proof Strengthening

int(compromised(172.16.9.20), c)   strengthenedPf, using strengthen(l, l) = c
  int(compromised(172.16.9.20), l)   intR
    int(exchangeCtlMessage(172.16.9.20, 172.16.9.1), l)   obsMap
      obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
  int(compromised(172.16.9.20), l)   obsMap
    obs(memoryDumpMaliciousCode('172.16.9.20'))
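A minimal Prolog sketch of the strengthening step. Only strengthen(l, l) = c appears on this slide, so the other mode combinations below are assumptions, and the independence check on the two proofs is reduced to requiring distinct supporting observations; derivedFrom/3 is a hypothetical record of each derivation:

  % strengthen(Mode1, Mode2, StrongerMode).
  strengthen(l, l, c).  % from the slide: two independent 'likely' proofs give 'certain'
  strengthen(p, l, l).  % assumed
  strengthen(l, p, l).  % assumed

  % derivedFrom(Fact, Mode, SupportingObservation): hypothetical proof record.
  strengthenedPf(Fact, Mode) :-
      derivedFrom(Fact, M1, Obs1),
      derivedFrom(Fact, M2, Obs2),
      Obs1 \== Obs2,
      strengthen(M1, M2, Mode).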

Evaluation Methodology

• Test whether the empirically developed model can derive similar high-confidence traces when applied to different scenarios.
• Keep the model unchanged and apply the tool to different data sets.

SnIPS (Snort Intrusion Analysis using Proof Strengthening) Architecture

[Architecture diagram: Snort alerts are pre-processed (converted to tuples) and fed to the reasoning engine, which uses the observation correspondence and the internal model. The observation correspondence is derived from the Snort rule repository, done only once. A user query, e.g. "which machines are certainly compromised?", returns high-confidence answers with evidence.]
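The user query in the diagram is posed as a goal to the reasoning engine; the Treasure Hunt example later in the deck uses exactly this form:

  | ?- show_trace(int(compromised(H), c)).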

Snort rule classtype

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; classtype:attempted-recon; sid:1140;)

obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)),
       int(probeOtherMachine(FromHost, ToHost)), ?).

The internal predicate is mapped from the rule's "classtype"; the mode (?) is not yet determined.

Snort rule documents

Impact: Information gathering and system integrity compromise. Possible unauthorized administrative access to the server. Possible execution of arbitrary code of the attacker's choosing in some cases.

Ease of Attack: Exploits exist.

Hints from the natural-language description of Snort rules resolve the mode (? becomes l) and add further mappings:

obsMap(obsRuleId_3614, obs(snort('1:1140', FromHost, ToHost)),
       int(compromised(ToHost)), p).

obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)),
       int(probeOtherMachine(FromHost, ToHost)), l).

Automatically deriving Observation Correspondence

• Snort has about 9,000 rules.
• This automatic mapping is just a baseline and needs to be fine-tuned (see the sketch below).
• It would make more sense for the rule writer to define the observation correspondence relation when writing the rule.

Internal predicate mapping      % of Snort rules
Mapped automatically            59%
Not mapped automatically        41%
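A sketch of how the automatic derivation might look, keyed on each rule's classtype. The snortRule/2 extraction and the specific classtype-to-predicate choices are illustrative assumptions, not SnIPS's actual table:

  % snortRule(Sid, Classtype): assumed to be extracted from the rule repository.
  % classtypeMap(Classtype, From, To, InternalCondition, Mode): assumed mappings.
  classtypeMap('attempted-recon',  From,  To, int(probeOtherMachine(From, To)), l).
  classtypeMap('successful-admin', _From, To, int(compromised(To)),             c).

  % Derive an observation-correspondence entry from a Snort rule's classtype.
  obsMap(auto(Sid), obs(snort(Sid, From, To)), Int, Mode) :-
      snortRule(Sid, Classtype),
      classtypeMap(Classtype, From, To, Int, Mode).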


Data set description

• Treasure Hunt (UCSB, 2002), 4 hours
  – Collected during a graduate-class experiment
  – Large variety of system monitoring data: tcpdump, syslog, Apache server log, etc.
• Honeypot (Purdue, 2008), 2 hrs/day over 2 months
  – Collected for an e-mail spam analysis project
  – Single host running a misconfigured Squid proxy
• KSU CIS department network (2009), 3 days
  – 200 machines, including servers and workstations

Some results from the Treasure Hunt data set

| ?- show_trace(int(compromised(H), c)).
int(compromised('192.168.10.90'), c)   strengthenedPf
  int(compromised('192.168.10.90'), p)   intRule_1
    int(probeOtherMachine('192.168.10.90','192.168.70.49'), p)   obsRulePre_1
      obs(snort('122:1','192.168.10.90','192.168.70.49',_h272))
  int(compromised('192.168.10.90'), l)   intRule_3
    int(sendExploit('128.111.49.46','192.168.10.90'), l)   obsRuleId_3749
      obs(snort('1:1807','128.111.49.46','192.168.10.90',_h336))

Reading the trace: an exploit was sent to 192.168.10.90, and a probe was later sent from 192.168.10.90, so 192.168.10.90 was certainly compromised!

Data Reduction

Data set        Duration of network traffic   Snort alerts   Pre-processed alerts   High-confidence proofs
Treasure Hunt   4 hours                       4,849,937      278                    18
Honeypot        2 hrs/day for 2 months        637,564        30                     8
CIS Network     3 days                        1,138,572      6,634                  17

Related work

• Y. Zhai et al., "Reasoning about complementary intrusion evidence," ACSAC 2004.
• F. Valeur et al., "A comprehensive approach to intrusion detection alert correlation," 2004.
• Goldman and Harp, "Model-based intrusion assessment in Common Lisp," 2009.
• C. Thomas and N. Balakrishnan, "Modified evidence theory for performance enhancement of intrusion detection systems," 2008.

Summary

• Based on a true-life incident, we empirically developed a logical model for handling uncertainty in intrusion analysis.
• Experimental results show:
  – The model simulates human reasoning and was able to extract high-confidence intrusion traces.
  – The model, empirically developed from one incident, was applicable to completely different data and scenarios.
  – The search space for analysis is reduced.

Future Work

• Continue the empirical study and improve the current implementation.
• Establish a theoretical foundation for the empirically developed method:
  – Modal logic
  – Dempster-Shafer theory
  – Bayesian theory

Thank you

Questions?


Summarization

• Compact the information entering the reasoning engine (a sketch follows below).
• Group similar internal conditions into a single "summarized internal condition".
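A minimal sketch of the grouping step, collapsing all likely probes from one source into a single summarized condition with a count. The predicate shapes here are assumptions; the CIS trace on the last slide shows the resulting sumFact entries:

  % One summarized fact per (source host, mode), counting the distinct targets.
  sumFact(int(probeOtherMachine(H, summarizedTargets), Mode), Count) :-
      setof(T, int(probeOtherMachine(H, T), Mode), Targets),
      length(Targets, Count).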

Comparison of the three data sets

[Chart: number of Snort alerts per classtype (log scale, 1 to 10,000,000) for the Department, Treasure Hunt, and Honeypot data sets. Classtypes on the x-axis: attempted-admin, attempted-dos, attempted-recon, attempted-user, bad-unknown, default-login-attempt, misc-activity, misc-attack, not-suspicious, policy-violation, non-standard-protocol, protocol-command-decode, rpc-portmap-decode, shellcode-detect, successful-admin, successful-recon-limited, suspicious-filename-detect, system-call-detect, trojan-activity, unknown, web-application-activity, web-application-attack.]

Output from CIS

int(compromised('129.130.11.69'), c)   strengthenedPf
  int(compromised('129.130.11.69'), l)   intRule_1b
    int(probeOtherMachine('129.130.11.69','129.130.11.12'), l)   sumFact summarized(86)
  int(compromised('129.130.11.69'), l)   intRule_3f
    int(sendExploit('129.130.11.22','129.130.11.69'), c)   strengthenedPf
      int(sendExploit('129.130.11.22','129.130.11.69'), l)   sumFact summarized(109)
      int(skol(sendExploit('129.130.11.22','129.130.11.69')), p)   IR_3b
        int(compromised('129.130.11.69'), p)   sumFact summarized(324)