An empirical approach to modeling uncertainty in Intrusion Analysis
Xinming (Simon) Ou
Kansas State University
Joint work with S. Raj Rajagopalan at HP Labs and Sakthi Sakthivelmurugan
Departmental seminar, Computer Science at Virginia Tech
March 19, 2010
A day in the life of a real SA
• Network monitoring tools alert the system administrator: abnormally high traffic
• Netflow dump: TrendMicro server communicating with known BotNet controllers
• Memory dump: seemingly malicious code modules
• Memory dump: found open IRC sockets with other TrendMicro servers
• Conclusion: these TrendMicro servers are certainly compromised!
Key challenge: uncertainty in data.
An empirical approach
• In spite of the lack of theory or good tools, sysadmins are coping with attacks.
• Can we build a system that mimics what they do (for a start):
  – An empirical approach to Intrusion Analysis using existing reality
• Our goal:
  – Provide some degree of automation in the process
Observations: IDS alerts, netflow dump, syslog, server log …
Mapping observations to their semantics
Internal model
Reasoning Engine
High-confidence Conclusions with Evidence
Targeting subsequent observations
Capture Uncertainty Qualitatively
Confidence level    Uncertainty mode    Symbol
Low                 Possible            p
Moderate            Likely              l
High                Certain             c
• Arbitrarily precise quantitative measures are not meaningful in practice
• Roughly matches the confidence levels practitioners actually use
Observation Correspondence
Observations (what you can see)                  mode    Internal conditions (what you want to know)
obs(anomalyHighTraffic)                          p       int(attackerNetActivity)
obs(netflowBlackListFilter(H, BlackListedIP))    l       int(compromised(H))
obs(memoryDumpMaliciousCode(H))                  l       int(compromised(H))
obs(memoryDumpIRCSocket(H1,H2))                  l       int(exchangeCtlMessage(H1,H2))
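The correspondence can be read as a small rule table applied to concrete observations. A minimal Python sketch follows; the tuple encoding and the names `OBS_MAP` and `apply_obs_map` are assumptions made for illustration and are not taken from the original tool.

```python
# Illustrative sketch only: the tuple encoding and function names are
# assumptions, not the representation used by the original system.

# Each rule: (observation predicate, builder for the internal condition, mode).
# Modes: 'p' = possible, 'l' = likely, 'c' = certain.
OBS_MAP = [
    ("anomalyHighTraffic", lambda: ("attackerNetActivity",), "p"),
    ("netflowBlackListFilter", lambda h, _ip: ("compromised", h), "l"),
    ("memoryDumpMaliciousCode", lambda h: ("compromised", h), "l"),
    ("memoryDumpIRCSocket", lambda h1, h2: ("exchangeCtlMessage", h1, h2), "l"),
]


def apply_obs_map(observation):
    """Return every (internal condition, mode) pair the observation supports."""
    name, *args = observation
    derived = []
    for obs_name, build, mode in OBS_MAP:
        if obs_name == name:
            derived.append((build(*args), mode))
    return derived


obs = ("memoryDumpIRCSocket", "172.16.9.20", "172.16.9.1")
print(apply_obs_map(obs))
```

Keeping the mode next to each rule means every derived condition carries its own confidence label from the start.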
Internal Model
Logical relation among internal conditions.
Condition 1                direction, mode    Condition 2
int(compromised(H1))       f, p               int(probeOtherMachine(H1,H2))
int(sendExploit(H1,H2))    f, l               int(compromised(H2))
int(sendExploit(H1,H2))    b, p               int(compromised(H2))
int(compromised(H1))       b, c               int(probeOtherMachine(H1,H2))
Direction f (forward): Condition 1 infers Condition 2. Direction b (backward): Condition 2 infers Condition 1.
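The slide does not spell out how a rule's mode combines with the mode of its premise. The sketch below assumes the derived condition takes the weaker of the two modes, which agrees with the traces shown elsewhere in the talk. The string encoding and all names are hypothetical, and patterns are matched literally rather than unified.

```python
# Sketch under assumptions: the min-combination of modes and all names here
# are illustrative; the slide only lists the four rules with direction and mode.

RANK = {"p": 0, "l": 1, "c": 2}  # possible < likely < certain

# (condition 1, condition 2, direction, mode); 'f' reads 1 -> 2, 'b' reads 2 -> 1.
INTERNAL_MODEL = [
    ("compromised(H1)", "probeOtherMachine(H1,H2)", "f", "p"),
    ("sendExploit(H1,H2)", "compromised(H2)", "f", "l"),
    ("sendExploit(H1,H2)", "compromised(H2)", "b", "p"),
    ("compromised(H1)", "probeOtherMachine(H1,H2)", "b", "c"),
]


def weaker(mode_a, mode_b):
    """Return the less confident of two modes (assumed combination rule)."""
    return mode_a if RANK[mode_a] <= RANK[mode_b] else mode_b


def infer(known, known_mode):
    """List (conclusion, mode) pairs reachable from one known condition in one step."""
    results = []
    for cond1, cond2, direction, rule_mode in INTERNAL_MODEL:
        premise, conclusion = (cond1, cond2) if direction == "f" else (cond2, cond1)
        if premise == known:
            results.append((conclusion, weaker(known_mode, rule_mode)))
    return results


print(infer("probeOtherMachine(H1,H2)", "p"))
print(infer("sendExploit(H1,H2)", "l"))
```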
Reasoning Methodology
• Simple reasoning
  – Observation correspondence and internal model are inference rules
  – Use inference rules on input observations to derive assertions with various levels of uncertainty
• Proof strengthening
  – Derive high-confidence proofs from assertions derived from low-confidence observations
Example 1: Inference through Observation Correspondence
Rule used: obs(memoryDumpIRCSocket(H1,H2)) maps to int(exchangeCtlMessage(H1,H2)) with mode l
int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap
    obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
Example 2: Inference through Internal Model
int(compromised(172.16.9.20), l)    Int rule
    int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap
        obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
Proof Strengthening
Observations: O1, O2, O3
• One proof from the observations: f is likely true
• Another proof from the observations: f is likely true
• Proof strengthening: f is certainly true
Proof Strengthening Example
int(compromised(172.16.9.20), c)    strengthenedPf
    int(compromised(172.16.9.20), l)    intR
        int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)    obsMap
            obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))
    int(compromised(172.16.9.20), l)    obsMap
        obs(memoryDumpMaliciousCode('172.16.9.20'))
strengthen(l, l) = c
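The strengthening step can be sketched as a small function over modes. The slides give strengthen(l, l) = c, and the Treasure Hunt trace combines a p proof with an l proof into c. Everything else below is an assumption: that two p proofs do not strengthen, that the two proofs must rest on disjoint observations, and the names and tuple layout.

```python
# Sketch only. Grounded in the slides: strengthen(l, l) = c, and a 'p' proof
# plus an 'l' proof yielding 'c'. Returning None for two 'p' proofs and
# requiring disjoint observation sets are assumptions made here.

def strengthen(mode_a, mode_b):
    """Combine the modes of two proofs of the same fact; None means no strengthening."""
    if "l" in (mode_a, mode_b) and {mode_a, mode_b} <= {"p", "l"}:
        return "c"
    return None


def strengthened_proof(proof_a, proof_b):
    """Each proof is (fact, mode, frozenset of observations it rests on)."""
    fact_a, mode_a, obs_a = proof_a
    fact_b, mode_b, obs_b = proof_b
    if fact_a != fact_b or obs_a & obs_b:
        return None  # different facts, or shared evidence (assumed disqualifying)
    new_mode = strengthen(mode_a, mode_b)
    if new_mode is None:
        return None
    return (fact_a, new_mode, obs_a | obs_b)


via_irc = ("compromised(172.16.9.20)", "l",
           frozenset({"memoryDumpIRCSocket(172.16.9.20, 172.16.9.1)"}))
via_code = ("compromised(172.16.9.20)", "l",
            frozenset({"memoryDumpMaliciousCode(172.16.9.20)"}))
print(strengthened_proof(via_irc, via_code))
```

Carrying the supporting observations along with each proof keeps the evidence attached to the final high-confidence conclusion.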
Evaluation
• Test whether the empirically developed model can derive similar high-confidence traces when applied to different scenarios
• Keep the model unchanged and apply the tool to different data sets
SnIPS (Snort Intrusion Analysis using Proof Strengthening) Architecture
• Snort Rule Repository → Observation Correspondence (done only once)
• Snort alerts (convert to tuples) → pre-processing → Reasoning Engine
• Reasoning Engine uses the Observation Correspondence and the Internal Model
• User query, e.g. which machines are "certainly" compromised? → High-confidence answers with evidence
Snort rule class type
alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS (msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; classtype:attempted-recon; sid:1140;)
obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), ?).
Internal predicate mapped from "classtype"
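The mapping from a rule's class type to an internal predicate can be sketched as a lookup keyed on the `classtype` option. Only the `attempted-recon` entry comes from the slide. The function name, the regular expressions, the default generator id of 1, and returning `None` for unmapped class types are assumptions for illustration.

```python
import re

# Sketch only: the single table entry comes from the slide; the function name,
# the regexes, the default gid, and the None result for unmapped class types
# are assumptions.
CLASSTYPE_TO_PREDICATE = {
    "attempted-recon": "probeOtherMachine(FromHost, ToHost)",
}


def obs_map_from_rule(rule_text, rule_id, gid=1):
    """Build an obsMap fact (mode left as '?') from the text of a Snort rule."""
    classtype = re.search(r"classtype:\s*([\w-]+)\s*;", rule_text)
    sid = re.search(r"sid:\s*(\d+)\s*;", rule_text)
    if not classtype or not sid:
        return None
    predicate = CLASSTYPE_TO_PREDICATE.get(classtype.group(1))
    if predicate is None:
        return None  # class type not mapped automatically
    return (
        f"obsMap({rule_id}, obs(snort('{gid}:{sid.group(1)}', FromHost, ToHost)), "
        f"int({predicate}), ?)."
    )


rule = (
    'alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS '
    '(msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; '
    'classtype:attempted-recon; sid:1140;)'
)
print(obs_map_from_rule(rule, "obsRuleId_3615"))
```

The mode stays `?` at this stage because the class type alone says what kind of activity the rule detects, not how confident the inference is.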
Snort rule documents
Impact: Information gathering and system integrity compromise. Possible unauthorized administrative access to the server. Possible execution of arbitrary code of the attacker's choosing in some cases.
Ease of Attack: Exploit exists
Hints from natural-language description of Snort rules:
obsMap(obsRuleId_3614, obs(snort('1:1140', FromHost, ToHost)), int(compromised(ToHost)), p).
obsMap(obsRuleId_3615, obs(snort('1:1140', FromHost, ToHost)), int(probeOtherMachine(FromHost, ToHost)), l).    (mode ? replaced by l)
Automatically deriving Observation Correspondence
• Snort has about 9000 rules.
• The automatically derived mapping is just a baseline and needs to be fine-tuned.
• It would make more sense for the rule writer to define the observation correspondence relation when writing a rule.
Internal predicate          % of rules
Mapped automatically        59%
Not mapped automatically    41%
Data set description
• Treasure Hunt (UCSB 2002) – 4 hrs
  – Collected during a graduate class experiment
  – Large variety of system monitoring data: tcpdump, syslog, apache server log etc.
• Honeypot (Purdue, 2008) – 2 hrs/day over 2 months
  – Collected for e-mail spam analysis project
  – Single host running misconfigured Squid proxy
• KSU CIS department network 2009 – 3 days
  – 200 machines including servers and workstations.
Some results from Treasure Hunt data set
| ?- show_trace(int(compromised(H), c)).
int(compromised('192.168.10.90'), c)    strengthenedPf
    int(compromised('192.168.10.90'), p)    intRule_1
        int(probeOtherMachine('192.168.10.90','192.168.70.49'), p)    obsRulePre_1
            obs(snort('122:1','192.168.10.90','192.168.70.49',_h272))
    int(compromised('192.168.10.90'), l)    intRule_3
        int(sendExploit('128.111.49.46','192.168.10.90'), l)    obsRuleId_3749
            obs(snort('1:1807','128.111.49.46','192.168.10.90',_h336))
• A probe was sent from 192.168.10.90
• An exploit was sent to 192.168.10.90
• 192.168.10.90 was certainly compromised!
Data Reduction
Data set         Duration of network traffic    Snort alerts    Pre-processed alerts    High-confidence proofs
Treasure Hunt    4 hours                        4,849,937       278                     18
Honeypot         2 hrs/day for 2 months         637,564         30                      8
CIS Network      3 days                         1,138,572       6634                    17
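To make the scale of the reduction concrete, the ratio of raw Snort alerts to pre-processed alerts can be computed directly from the table. The figures below are copied from the slide; the dictionary and function names are illustrative.

```python
# Figures copied from the data reduction table; the ratio is simply raw alerts
# divided by pre-processed alerts, computed here for illustration.
DATA_SETS = {
    "Treasure Hunt": (4_849_937, 278, 18),
    "Honeypot": (637_564, 30, 8),
    "CIS Network": (1_138_572, 6_634, 17),
}


def reduction_factor(raw_alerts, preprocessed_alerts):
    """How many raw alerts collapse into one pre-processed alert, on average."""
    return raw_alerts / preprocessed_alerts


for name, (raw, pre, proofs) in DATA_SETS.items():
    print(f"{name}: {reduction_factor(raw, pre):,.0f}x fewer alerts, {proofs} proofs")
```

On these numbers, pre-processing shrinks the first two data sets by roughly four orders of magnitude and the CIS network data by about two.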
Future Work
• Continue the empirical study and improve the reasoning model
• Establish a theoretical foundation for the empirically-developed method
  – Modal Logic
  – Bayes Theory
  – Dempster-Shafer Theory
Related work
• Y. Zhai et al., "Reasoning about complementary intrusion evidence," ACSAC 2004
• F. Valeur et al., "A Comprehensive Approach to Intrusion Detection Alert Correlation," 2004
• R. Goldman and S. Harp, "Model-based Intrusion Assessment in Common Lisp," 2009
• C. Thomas and N. Balakrishnan, "Modified Evidence Theory for Performance Enhancement of Intrusion Detection Systems," 2008
Summary
• Based on a true-life incident, we empirically developed a logical model for handling uncertainty in intrusion analysis
• Experimental results show
  – Model simulates human thinking and was able to extract high-confidence intrusion traces
  – Model empirically developed from one incident was applicable to completely different data/scenarios
  – Reduction in search space for analysis
Thank you
Questions?
Summarization
• Compact the information entering the reasoning engine
• Group similar "internal conditions" into a single "summarized internal condition"
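One way to picture the grouping step is sketched below. The slide does not define what makes two internal conditions "similar". Treating them as similar when they share predicate, mode, and source host is an assumption, as are the tuple layout, the function name, and the example addresses.

```python
from collections import defaultdict

# Sketch only: treating conditions as "similar" when they share predicate,
# mode, and source host is an assumption; the slide does not define similarity.


def summarize(conditions):
    """Group (predicate, src, dst, mode) tuples into summarized conditions."""
    groups = defaultdict(set)
    for predicate, src, dst, mode in conditions:
        groups[(predicate, src, mode)].add(dst)
    return [
        (predicate, src, sorted(dsts), mode)
        for (predicate, src, mode), dsts in groups.items()
    ]


# Hypothetical input: two probes from the same host and one exploit.
conditions = [
    ("probeOtherMachine", "10.0.0.1", "10.0.0.2", "p"),
    ("probeOtherMachine", "10.0.0.1", "10.0.0.3", "p"),
    ("sendExploit", "10.0.0.9", "10.0.0.1", "l"),
]
print(summarize(conditions))
```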
Comparison of the three data sets
[Figure: chart on a logarithmic axis from 1 to 10,000,000, broken down by Snort class type (attempted-admin, attempted-dos, attempted-recon, attempted-user, bad-unknown, default-login-attempt, misc-activity, misc-attack, not-suspicious, policy-violation, non-standard-protocol, protocol-command-decode, rpc-portmap-decode, shellcode-detect, successful-admin, successful-recon-limited, suspicious-filename-detect, system-call-detect, trojan-activity, unknown, web-application-activity, web-application-attack), with one series each for the Department, Treasure Hunt, and Honeypot data sets.]