
Firewalls and Intrusion Detection Systems

David Brumley (dbrumley@cmu.edu), Carnegie Mellon University

IDS and Firewall Goals

Expressiveness: What kinds of policies can we write?

Effectiveness: How well does it detect attacks while avoiding false positives?

Efficiency: How many resources does it take, and how quickly does it decide?

Ease of use: How much training is necessary? Can a non-security expert use it?

Security: Can the system itself be attacked?

Transparency: How intrusive is it to use?

2

Firewalls

Dimensions:

1. Host vs. Network

2. Stateless vs. Stateful

3. Network layer (at which the firewall operates)

3

Firewall Goals

Provide defense in depth by:

1. Blocking attacks against hosts and services

2. Controlling traffic between zones of trust

4

Logical Viewpoint

5

[Diagram: the firewall sits between Inside and Outside, deciding what to do with each message m]

For each message m, either:

• Allow it, with or without modification

• Block it, by dropping it or sending a rejection notice

• Queue it

Placement

Host-based Firewall

Network-Based Firewall

6

[Diagram: a host firewall running on a single host vs. a network firewall protecting Hosts A, B, and C from the Outside]

Host-based firewall features:
• Faithful to the local configuration
• Travels with you

Network-based firewall features:
• Protects the whole network
• Can make decisions on all of the traffic (traffic-based anomaly detection)

Parameters

Types of Firewalls

1. Packet Filtering

2. Stateful Inspection

3. Application proxy

Policies

1. Default allow

2. Default deny

7

Recall: Protocol Stack

8

[Diagram: the protocol stack and message encapsulation]
• Application (e.g., SSL)
• Transport (e.g., TCP, UDP)
• Network (e.g., IP)
• Link layer (e.g., Ethernet)
• Physical

An application message is carried as TCP data behind a TCP header, wrapped in turn in an IP header, and then framed by an Ethernet header and trailer as it moves down the stack.

Stateless Firewall

Filter by packet header fields:

1. IP fields (e.g., src, dst)

2. Protocol (e.g., TCP, UDP, ...)

3. Flags (e.g., SYN, ACK)

[Diagram: the firewall sits between Outside and Inside and inspects only the network/transport headers]

Example: only allow incoming DNS packets to nameserver A.A.A.A:
• Allow UDP port 53 to A.A.A.A
• Deny UDP port 53 to all (deny-by-default is fail-safe good practice)

e.g., ipchains in Linux 2.2
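A minimal Python sketch of such a stateless, first-match filter deciding purely on header fields. The Packet shape, rule structure, and the stand-in nameserver address are illustrative assumptions, not taken from the slides:

from collections import namedtuple

Packet = namedtuple("Packet", "src dst proto dport")

NAMESERVER = "10.0.0.53"   # illustrative stand-in for A.A.A.A

RULES = [
    # (predicate, action): first match wins, with a fail-safe default deny below
    (lambda p: p.proto == "UDP" and p.dport == 53 and p.dst == NAMESERVER, "ALLOW"),
    (lambda p: p.proto == "UDP" and p.dport == 53, "DENY"),
]

def filter_packet(pkt: Packet) -> str:
    """Return ALLOW or DENY using only header fields; no connection state is kept."""
    for predicate, action in RULES:
        if predicate(pkt):
            return action
    return "DENY"   # fail-safe default

print(filter_packet(Packet("1.2.3.4", NAMESERVER, "UDP", 53)))  # ALLOW
print(filter_packet(Packet("1.2.3.4", "10.0.0.7", "UDP", 53)))  # DENY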

9

Need to keep state

10

[Diagram: the TCP handshake between an Inside client and an Outside server crosses the firewall; the endpoints move from Listening/Wait to Established and store SNc and SNs]

• SYN: SNc ← randc, ANc ← 0
• SYN/ACK: SNs ← rands, ANs ← SNc
• ACK: SN ← SNc + 1, AN ← SNs

Example: TCP handshake

Desired policy: every SYN/ACK must have been preceded by a SYN.

Stateful Inspection Firewall

Added state (plus obligation to manage)

– Timeouts

– Size of table

[Diagram: the stateful firewall keeps a state table and inspects traffic at the network/transport layers between Outside and Inside]

e.g., iptables in Linux 2.4

11

Stateful More Expressive

12

[Diagram: the same TCP handshake, with the firewall now keeping per-connection state]

• On the SYN, record SNc in the state table

• On the SYN/ACK, verify ANs against the table
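A toy Python sketch of this stateful check. The packet representation and table layout are invented for illustration; real stateful firewalls (e.g., iptables/conntrack) track far more protocol state:

conn_table = {}   # (client, server) -> client sequence number seen on the SYN

def inspect(pkt):
    """pkt is a dict with src, dst, flags, seq, ack fields (a hypothetical format)."""
    key = (pkt["src"], pkt["dst"])
    if pkt["flags"] == "SYN":
        conn_table[key] = pkt["seq"]           # record SNc
        return "ALLOW"
    if pkt["flags"] == "SYN/ACK":
        reverse = (pkt["dst"], pkt["src"])     # the reply flows server -> client
        snc = conn_table.get(reverse)
        # Policy: every SYN/ACK must have been preceded by a SYN and must
        # acknowledge the recorded client sequence number (the slides abstract
        # this as ANs = SNc; real TCP uses SNc + 1).
        if snc is not None and pkt["ack"] == snc + 1:
            return "ALLOW"
        return "DENY"
    return "ALLOW"

print(inspect({"src": "C", "dst": "S", "flags": "SYN", "seq": 1000, "ack": 0}))        # ALLOW
print(inspect({"src": "S", "dst": "C", "flags": "SYN/ACK", "seq": 7000, "ack": 1001})) # ALLOW
print(inspect({"src": "S", "dst": "X", "flags": "SYN/ACK", "seq": 7000, "ack": 1}))    # DENY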

State Holding Attack

13

[Diagram: an outside attacker floods the firewall protecting the Inside with SYN packets]

1. SYN flood

2. Exhaust resources

3. Sneak a packet through

Assume a stateful TCP policy.

Fragmentation

14

[Diagram: IPv4 header fields (Ver, IHL, TOS, Total Length, ID, DF, MF, Frag ID, ...)]

The data is split into fragments of (say) n bytes:
• IP Hdr DF=0 MF=1 ID=0 | Frag 1
• IP Hdr DF=0 MF=1 ID=n | Frag 2
• IP Hdr DF=1 MF=0 ID=2n | Frag 3

DF: Don't Fragment (0 = OK to fragment, 1 = don't)
MF: More Fragments (0 = last, 1 = more)
Frag ID = octet number

Reassembly

15

[Diagram: the receiver reassembles the three fragments into the original data, placing Frag 1 at octet 0, Frag 2 at octet n, and Frag 3 at octet 2n]

16

Overlapping Fragment Attack

[Diagram: TCP header fields (Source Port, Destination Port, Sequence Number, ...)]

Assume the firewall policy: allow incoming port 80 (HTTP), block incoming port 22 (SSH).

• Packet 1: a fragment with DF=1, MF=1, ID=0 carrying the start of a TCP header with src port 1234 and dst port 80; it passes the filter.

• Packet 2: a fragment with DF=1, MF=1, ID=2 whose data overlaps the destination-port field and rewrites it to 22.

After reassembly the TCP header reads src port 1234, dst port 22, so the overlapping fragments bypass the policy.
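A small Python simulation of the trick, under the slides' simplification that fragment IDs are byte offsets and assuming a naive "later fragment wins" reassembly policy (real stacks resolve overlaps in different ways):

import struct

def reassemble(fragments):
    """fragments: list of (offset_in_bytes, payload); later fragments overwrite earlier bytes."""
    size = max(off + len(data) for off, data in fragments)
    buf = bytearray(size)
    for off, data in fragments:
        buf[off:off + len(data)] = data
    return bytes(buf)

# Fragment 1: start of a TCP header with src port 1234, dst port 80 (passes the filter).
frag1 = (0, struct.pack("!HHI", 1234, 80, 0))   # src port, dst port, seq
# Fragment 2: overlaps bytes 2-3 (the destination port) and rewrites them to 22.
frag2 = (2, struct.pack("!H", 22))

tcp_header = reassemble([frag1, frag2])
src, dst = struct.unpack("!HH", tcp_header[:4])
print(src, dst)   # 1234 22: the reassembled segment now targets SSH, not HTTP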

Stateful Firewalls

Pros

• More expressive

Cons

• State-holding attack

• Mismatch between the firewall's understanding of the protocol and the protected hosts' understanding

17

Application Firewall

Check protocol messages directly

Examples:

– SMTP virus scanner

– Proxies

– Application-level callbacks

18

[Diagram: the application firewall keeps state and inspects traffic all the way up to the application layer, between Outside and Inside]

Firewall Placement

19

Demilitarized Zone (DMZ)

20

[Diagram: a firewall with three zones: Inside, Outside, and a DMZ containing the WWW, NNTP, DNS, and SMTP servers]

Dual Firewall

21

[Diagram: an exterior firewall faces the Outside, an interior firewall protects the Inside, and the DMZ hub sits between the two firewalls]

Design Utilities

Solsoft

Securify

22


Intrusion Detection and Prevention Systems

24

Logical Viewpoint

25

[Diagram: the IDS/IPS sits between Inside and Outside, deciding what to do with each message m]

For each message m, either:

• Report m (IPS: drop or log)

• Allow m

• Queue m

Overview

• Approach: Policy vs Anomaly

• Location: Network vs. Host

• Action: Detect vs. Prevent

26

Policy-Based IDS

Use pre-determined rules to detect attacks

Examples: Regular expressions (snort), Cryptographic hash (tripwire, snort)

27

Example Snort rules:

Detect any fragments less than 256 bytes:
alert tcp any any -> any any (minfrag: 256; msg: "Tiny fragments detected, possible hostile activity";)

Detect IMAP buffer overflow:
alert tcp any any -> 192.168.1.0/24 143 (content: "|90C8 C0FF FFFF|/bin/sh"; msg: "IMAP buffer overflow!";)
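A toy Python sketch in the spirit of the content rule above: a signature is just a byte pattern checked against payloads destined to a port. Real Snort parses a much richer rule language; the names and structure here are invented for illustration:

import re

SIGNATURES = [
    # (alert message, destination port or None for any, compiled payload pattern)
    ("IMAP buffer overflow!", 143, re.compile(rb"\x90\xc8\xc0\xff\xff\xff/bin/sh")),
]

def match(dst_port: int, payload: bytes):
    """Return the messages of all signatures that fire on this packet."""
    alerts = []
    for msg, port, pattern in SIGNATURES:
        if (port is None or port == dst_port) and pattern.search(payload):
            alerts.append(msg)
    return alerts

print(match(143, b"junk" + b"\x90\xc8\xc0\xff\xff\xff/bin/sh" + b"more"))
# ['IMAP buffer overflow!']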

Modeling System Calls [wagner&dean 2001]

28

[Automaton: states Entry(f)/Exit(f) and Entry(g)/Exit(g), with transitions labeled by the system calls open(), close(), exit(), getuid(), geteuid()]

#include <fcntl.h>    /* open */
#include <unistd.h>   /* close, getuid, geteuid */
#include <stdlib.h>   /* exit */

void f(int x) {
  if (x) { getuid(); } else { geteuid(); }
  x++;
}

void g(void) {
  int fd = open("foo", O_RDONLY);
  f(0); close(fd); f(1);
  exit(0);
}

Execution inconsistent with automata indicates attack
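A minimal Python sketch of the enforcement idea: encode an automaton of allowed system-call transitions and flag any observed trace that leaves it. The transitions below are hand-written for the g()/f() example above (the static model allows either branch of f at each call site), not derived automatically as in Wagner & Dean:

ALLOWED = {
    ("Entry(g)", "open"):        "g:after_open",
    ("g:after_open", "getuid"):  "g:after_f0",
    ("g:after_open", "geteuid"): "g:after_f0",
    ("g:after_f0", "close"):     "g:after_close",
    ("g:after_close", "getuid"):  "g:after_f1",
    ("g:after_close", "geteuid"): "g:after_f1",
    ("g:after_f1", "exit"):      "Exit(g)",
}

def consistent(trace, start="Entry(g)"):
    """Return True iff the observed syscall sequence is accepted by the automaton."""
    state = start
    for syscall in trace:
        nxt = ALLOWED.get((state, syscall))
        if nxt is None:
            return False      # inconsistent with the automaton: possible attack
        state = nxt
    return True

print(consistent(["open", "geteuid", "close", "getuid", "exit"]))   # True
print(consistent(["open", "execve"]))                               # False: never allowed here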

Anomaly Detection

29

[Diagram: the IDS learns a distribution of "normal" events; a new event that falls within the distribution is labeled Safe, and one that falls outside it is labeled Attack]

Example: Working Sets

30

[Diagram: over days 1 to 300, Alice's working set of hosts is {reddit, xkcd, slashdot, fark}; on day 300 she contacts a host outside the working set (18487)]
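A sketch of working-set anomaly detection for Alice's outbound connections, written in Python. The training/alerting split and the threshold for entering the working set are illustrative assumptions:

from collections import Counter

class WorkingSetDetector:
    def __init__(self, min_days_seen=3):
        self.seen_days = Counter()     # host -> number of training days on which it appeared
        self.min_days_seen = min_days_seen
        self.working_set = set()

    def train_day(self, hosts):
        """Fold one day of observed hosts into the working set."""
        for h in set(hosts):
            self.seen_days[h] += 1
        self.working_set = {h for h, d in self.seen_days.items() if d >= self.min_days_seen}

    def is_anomalous(self, host):
        return host not in self.working_set

det = WorkingSetDetector()
for _ in range(300):                                   # days 1..300
    det.train_day(["reddit", "xkcd", "slashdot", "fark"])

print(det.is_anomalous("xkcd"))    # False: inside the working set
print(det.is_anomalous("18487"))   # True: outside the working set, so raise an alert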

Anomaly Detection

Pros

• Does not require pre-determining a policy (can flag "unknown" threats)

Cons

• Requires that attacks not closely resemble known (normal) traffic

• Learning distributions is hard

31

Automatically Inferring the Evolution of Malicious Activity on the Internet

David Brumley, Carnegie Mellon University
Shobha Venkataraman, AT&T Research
Oliver Spatscheck, AT&T Research
Subhabrata Sen, AT&T Research

<ip1,+> <ip2,+> <ip3,+> <ip4,->

33

[Diagram: labeled IPs arrive from a network of a spam haven, Tier 1 ISPs, and hosts A, E, K]

Labeled IPs from SpamAssassin, IDS logs, etc.

Evil is constantly on the move.

Goal: characterize regions changing from bad to good (Δ-good) or from good to bad (Δ-bad).

Research Questions

Given a sequence of labeled IPs

1. Can we identify the specific regions on the Internet that have changed in malice?

2. Are there regions on the Internet that change their malicious activity more frequently than others?

34

35

[Diagram: the Internet as a hierarchy: a spam haven and Tier 1 / Tier 2 ISPs above leaf networks (DSL, CORP) containing individual hosts]

Per-IP granularity is often not interesting.

Challenges:
1. Infer the right granularity

Previous work: fixed granularity, e.g., per-IP granularity (Spamcop).

36

[Diagram: the same network, partitioned at BGP-prefix granularity]

Challenges:
1. Infer the right granularity

Previous work: fixed granularity, e.g., BGP granularity (network-aware clusters [KW'00]).

37

[Diagram: the same network, now partitioned at inferred granularities: coarse for some regions, medium for others, and fine for a well-managed network]

Challenges:
1. Infer the right granularity

Idea: infer the granularity.

38

[Diagram: the same network, with labeled IPs streaming to a fixed-memory device on a high-speed link]

Challenges:
1. Infer the right granularity
2. We need online algorithms (fixed-memory device, high-speed link)

Research Questions

Given a sequence of labeled IPs

1. Can we identify the specific regions on the Internet that have changed in malice?

2. Are there regions on the Internet that change their malicious activity more frequently than others?

39

We present: Δ-Change and Δ-Motion

Background

1. IP Prefix trees

2. TrackIPTree Algorithm

40

41

[Diagram: the same network, with two prefixes highlighted: 1.2.3.4/32 (a single host) and 8.1.0.0/16 (a whole subnet)]

IP prefixes: i/d denotes all IP addresses i covered by the first d bits.
• Ex: 8.1.0.0/16 covers 8.1.0.0-8.1.255.255
• Ex: 1.2.3.4/32 covers 1 host (all bits fixed)

42

An IP prefix tree is formed by masking each bit of an IP address.

[Diagram: a binary tree from the whole net (0.0.0.0/0) at the root, branching through 0.0.0.0/1 and 128.0.0.0/1, then /2, /3, /4, ... prefixes, down to single hosts such as 0.0.0.0/32 and 0.0.0.1/32 at the leaves]

43

A k-IPTree classifier [VBSSS'09] is an IP prefix tree with at most k leaves, each leaf labeled good ("+") or bad ("-").

[Diagram: a 6-IPTree over the prefix tree above, with leaves labeled + and -]

• Ex: 64.1.1.1 is bad
• Ex: 1.1.1.1 is good
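A minimal Python sketch of the classification step: label an IP with the longest matching labeled prefix. The prefixes and labels below are invented to mirror the slide's two examples; learning and updating the tree is the job of TrackIPTree, described next:

import ipaddress

LABELED_PREFIXES = {
    ipaddress.ip_network("0.0.0.0/0"):  "+",   # default leaf: good
    ipaddress.ip_network("64.0.0.0/2"): "-",   # e.g., 64.1.1.1 falls here: bad
    ipaddress.ip_network("1.0.0.0/8"):  "+",   # e.g., 1.1.1.1 falls here: good
}

def classify(ip_str):
    """Return the label of the longest labeled prefix covering ip_str."""
    ip = ipaddress.ip_address(ip_str)
    best = None
    for net, label in LABELED_PREFIXES.items():
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, label)
    return best[1]

print(classify("64.1.1.1"))   # '-'
print(classify("1.1.1.1"))    # '+'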

44

TrackIPTree Algorithm [VBSSS'09]

In: a stream of labeled IPs ... <ip4,+> <ip3,+> <ip2,+> <ip1,->
Out: a k-IPTree

[Diagram: TrackIPTree consumes the labeled stream and maintains a labeled prefix tree with nodes at /1, /16, /17, /18]

Δ-Change Algorithm

1. Approach

2. What doesn’t work

3. Intuition

4. Our algorithm

45

46

Goal: identify online the specific regions on the Internet that have changed in malice.

[Diagram: the labeled IP stream is divided into epochs (epoch 1 stream s1, epoch 2 stream s2, ...); T1 is the tree learned for epoch 1 and T2 for epoch 2, and some leaf labels flip between them]

Δ-Bad: a change from good to bad
Δ-Good: a change from bad to good

47

Goal: identify online the specific regions on the Internet that have changed in malice.

[Diagram: the same trees T1 and T2]

False positive: misreporting that a change occurred.

False negative: missing a real change.

48

Goal: identify online the specific regions on the Internet that have changed in malice.

Idea: divide time into epochs and diff:
• Use TrackIPTree on labeled IP stream s1 to learn T1
• Use TrackIPTree on labeled IP stream s2 to learn T2
• Diff T1 and T2 to find Δ-Good and Δ-Bad

[Diagram: T1 for epoch 1 and T2 for epoch 2 end up with different granularities, so they cannot simply be diffed node by node]

Different granularities!

49

Goal: identify online the specific regions on the Internet that have changed in malice.

Δ-Change Algorithm main idea: use classification errors between Ti-1 and Ti to infer Δ-Good and Δ-Bad.

50

[Diagram: TrackIPTree learns Ti-1 from stream Si-1 and Ti from Si. Ti-1 is then held fixed and annotated with its classification error on Si-1 (giving Told,i-1) and on Si (giving Told,i); because both annotations are based on the same tree, their (weighted) classification errors can be compared to localize Δ-Good and Δ-Bad.]

Δ-Change Algorithm

Comparing (Weighted) Classification Error

51

[Diagram: a /16 subtree annotated with IP counts and accuracies in two consecutive epochs.
Told,i-1: /16 (IPs: 200, Acc: 40%) with descendants (IPs: 150, Acc: 90%), (IPs: 110, Acc: 95%), (IPs: 40, Acc: 80%), (IPs: 50, Acc: 30%).
Told,i: /16 (IPs: 170, Acc: 13%) with descendants (IPs: 100, Acc: 10%), (IPs: 80, Acc: 5%), (IPs: 20, Acc: 20%), (IPs: 70, Acc: 20%).]

Δ-Change somewhere under this /16.

Comparing (Weighted) Classification Error

52

[Diagram: the same annotated trees as above, highlighting a different region]

Insufficient change.

Comparing (Weighted) Classification Error

53

[Diagram: the same annotated trees as above, highlighting a region that sees few IPs]

Insufficient traffic.

Comparing (Weighted) Classification Error

54

[Diagram: the same annotated trees as above, highlighting the subtree where the error shift concentrates]

Δ-Change localized.

Evaluation

1. What are the performance characteristics?

2. Are we better than previous work?

3. Do we find cool things?

55

56

Performance

In our experiments, we:

– let k = 100,000 (the k-IPTree size)

– processed 30-35 million IPs (one day's traffic)

– using a 2.4 GHz processor

Identified Δ-Good and Δ-Bad in <22 min using <3MB memory

57

Our experiments were run on a 2.4GHz Sparc64-VI core. Our current (unoptimized) implementation takes 20-22 minutes to process a day's trace (around 30-35 million IP addresses) and requires less than 2-3 MB of memory storage.

We note that the ground truth in our data provides labels for the individual IP addresses, but does not tell us the prefixes that have changed. Thus, our ground truth allows us to confirm that the learned IPTree has high accuracy, but we cannot directly measure the false positive rate and false negative rate of the change-detection algorithms. Thus, our experimental results instead demonstrate that our algorithm can find small changes in prefix behaviour very early on real data, and can do so substantially better than competing approaches. Our operators were previously unaware of most of these ∆-change prefixes, and as a consequence, our summarization makes it easy for operators to both note changes in behaviour of specific entities, as well as observe trends in malicious activity.⁷

4.1 Comparisons with Alternate Approaches

We first compare ∆-Change with previous approaches and direct extensions to previous work. We compare two different possible alternate approaches with ∆-Change: (1) using a fixed set of network-based prefixes (i.e., network-aware clusters, see Sec. 2.2) instead of a customized IPTree, (2) directly differencing the IPTrees instead of using ∆-Change. We focus here on only spam data for space reasons.

Network-aware Clusters. As we described in Section 3.2, our change-detection approach has no false positives: every change we find will indeed be a change in the input data stream. Thus, we only need to demonstrate that ∆-Change finds substantially more ∆-changes than network-aware clusters (i.e., has a lower false negative rate), and therefore is superior at summarizing changes in malicious activity to the appropriate prefixes for operator attention.

We follow the methodology of [29] for labeling the prefixes of the network-aware clusters optimally (i.e., we choose the labeling that minimizes errors), so that we can test the best possible performance of network-aware clusters against ∆-Change. We do this allowing the network-aware clusters multiple passes over the IP addresses (even though ∆-Change is allowed only a single pass), as detailed in [29]. We then use these clusters in place of the learned IPTree in our change-detection algorithms.

We first compare ∆-change prefixes identified by the network-aware clustering and ∆-Change. This comparison cannot be directly on the prefixes output by the two approaches, as slightly different prefixes may reflect the same underlying change in the data stream, e.g., network-aware clusters might identify a /24 while ∆-Change identifies a /25. In order to account for such differences, we group together prefixes into distinct subtrees, and match a group from the network-aware clustering to the appropriate group from ∆-Change if at least 50% of the volume of changed IPs in network-aware clustering was accounted for in ∆-Change. In our results, network-aware clustering identified no ∆-change prefixes that were not identified by ∆-Change; otherwise, we would have to do the reverse matching as well. Furthermore, this is what allows us to compare the number of ∆-changes that were identified by both algorithms; otherwise we would not be able to make this comparison.

[Figure 11. Comparing the ∆-Change algorithm with network-aware clusters on the spam data: ∆-Change always finds more prefixes and covers more IPs. (a) Number of ∆-change prefixes vs. interval in days; (b) IPs in ∆-change prefixes vs. interval in days.]

Fig. 11(a) shows the results of our comparison for 37 days. Network-aware clustering typically finds only a small fraction of the ∆-change prefixes discovered by ∆-Change, ranging from 10%-50%. On average, ∆-Change finds over 2.5 times as many ∆-change prefixes as network-aware clusters. We compare also the number of IPs in ∆-change prefixes identified by the network-aware clustering and ∆-Change in Fig. 11(b). The ∆-change prefixes discovered by ∆-Change typically account for a factor of 3-5x as many IP addresses as those discovered by the network-aware clustering. This indicates that network-aware clustering does not discover many changes that involve a substantial volume of the input data. On many days, especially on days with changes, the fraction of IP addresses not identified by network-aware clusters, however, is still smaller than the fraction of prefixes that it does not identify. This indicates that network-aware clustering identifies the larger, coarser changes, but misses the fine-grained changes.

Network-aware clusters perform so poorly because the prefix granularity required to identify ∆-changes typically does not appear at all in routing tables. Indeed, as our analysis in Section 4.2 shows, a large number of ∆-change prefixes come from hosting providers, many of which do not even appear in BGP prefix tables.

Possible Differencing of IPTrees. We now show that the possible differencing approach described in Section 2.2

⁷ As discussed in Section 1, our evaluation focuses exclusively on changes in prefix behaviour, since prior work [28, 29] already finds persistent malicious behaviour.

2.5x as many changes on average!

How do we compare to network-aware clusters? (By prefix)

Spam

58

Grum botnet takedown

59

Botnets:
• 22.1 and 28.6 thousand new DNSChanger bots appeared
• 38.6 thousand new Conficker and Sality bots

Caveats and Future Work

“For any distribution on which an ML algorithm works well, there is another on which it works poorly.”

– The “No Free Lunch” Theorem

60

Our algorithm is efficient and works well in practice.

...but a very powerful adversary could fool it into having many false negatives. A formal characterization is future work.

Detection Theory

Base Rate, fallacies, and detection systems

61

62

Let Ω be the set of all possible events. For example:

• Audit records produced on a host
• Network packets seen

Ω

63

[Venn diagram: the set of intrusion events I within Ω]

Set of intrusion events I

Intrusion rate: Pr[I], the fraction of all events that are intrusions.

Example: an IDS received 1,000,000 packets, and 20 of them corresponded to an intrusion. The intrusion rate Pr[I] is:
Pr[I] = 20/1,000,000 = 0.00002

64

[Venn diagram: the set of alerts A overlapping the set of intrusions I within Ω]

Set of alerts A

Alert rate: Pr[A], the fraction of all events that raise an alert.

Defn: Sound. An IDS is sound if every alert corresponds to an actual intrusion (no false positives).

65

[Venn diagram: the set of intrusions I overlapping the set of alerts A within Ω]

Defn: Complete. An IDS is complete if every intrusion raises an alert (no false negatives).

66

[Venn diagram: intrusions I and alerts A within Ω]

Defn: False positive. An alert raised when there is no intrusion (A ∩ ¬I).
Defn: False negative. An intrusion that raises no alert (I ∩ ¬A).
Defn: True positive. An intrusion that raises an alert (A ∩ I).
Defn: True negative. No intrusion and no alert.

67


Defn: Detection rate

Think of the detection rate as the set ofintrusions raising an alert normalized by the set of all intrusions.

70

[Venn diagram: intrusions I and alerts A, with 18 events in I ∩ A, 4 alerts outside I, and 2 intrusions outside A]

72


Think of the Bayesian detection rate as the set of intrusions raising an alert normalized by the set of all alerts. (vs. detection ratewhich normalizes on intrusions.)

Defn: Bayesian detection rate. The crux of IDS usefulness!
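In symbols (a standard reconstruction of the two rates, consistent with the explanations above):

\[ \text{Detection rate:} \quad \Pr[A \mid I] = \frac{|I \cap A|}{|I|} \]
\[ \text{Bayesian detection rate:} \quad \Pr[I \mid A] = \frac{|I \cap A|}{|A|} \]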

74

[Venn diagram: the same example, with 18 events in I ∩ A, 4 alerts outside I, and 2 intrusions outside A]

About 18% of all alerts are false positives! (4 of the 22 alerts)

Challenge

We’re often given the detection rate and know the intrusion rate, and want to calculate the Bayesian detection rate

– 99% accurate medical test

– 99% accurate IDS

– 99% accurate test for deception

– ...

75

Fact:

76

Proof:

Calculating Bayesian Detection Rate

Fact:

So to calculate the Bayesian detection rate:

One way is to compute:
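In symbols, using Bayes' rule and the law of total probability (a standard reconstruction of the computation described above):

\[ \Pr[I \mid A] = \frac{\Pr[A \mid I]\,\Pr[I]}{\Pr[A]}, \qquad \Pr[A] = \Pr[A \mid I]\,\Pr[I] + \Pr[A \mid \lnot I]\,\Pr[\lnot I] \]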

77

Example

• 1,000 people in the city

• 1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000

• Suppose we have a new terrorist facial recognition system that is 99% accurate:
  – 99/100 times when someone is a terrorist, there is an alarm
  – For every 100 good guys, the alarm only goes off once

• An alarm went off. Is the suspect really a terrorist?

78


Example

Tempting answer: the facial recognition system is 99% accurate, so there is only a 1% chance the suspect is not a terrorist. (This is the base rate fallacy; the next slides show why it is wrong.)

79


Formalization

• 1 is a terrorist, and we have their picture. Thus the base rate of terrorists is 1/1000: P[T] = 0.001

• 99/100 times when someone is a terrorist there is an alarm: P[A|T] = .99

• For every 100 good guys, the alarm only goes off once: P[A | not T] = .01

• Want to know P[T|A]

80



81


Intuition: given 999 good guys, we have 999 × .01 ≈ 9-10 false alarms.

Guesses?

82


Recall to get Pr[A]

Fact:

83

Proof:

...and to get Pr[A ∩ I]

Fact:

85

Proof:
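Putting these facts together for the running example (a sketch of the arithmetic, using only the numbers given above):

\[ \Pr[T \mid A] = \frac{\Pr[A \mid T]\,\Pr[T]}{\Pr[A \mid T]\,\Pr[T] + \Pr[A \mid \lnot T]\,\Pr[\lnot T]} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999} \approx 0.09 \]

So an alarm means only about a 9% chance the suspect is a terrorist, matching the intuition of 9-10 expected false alarms.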

86

87

For IDS

Let

– I be an intrusion, A an alert from the IDS

– 1,000,000 msgs per day processed

– 2 attacks per day

– 10 msgs per attack

88
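Under this reading the base rate matches the earlier packet example; a sketch of the arithmetic (the closing figure assumes a perfect detection rate and is an illustration, not a number from the slides):

\[ \Pr[I] = \frac{2 \times 10}{1{,}000{,}000} = 2 \times 10^{-5}, \qquad \Pr[I \mid A] = \frac{\Pr[A \mid I] \cdot 2 \times 10^{-5}}{\Pr[A \mid I] \cdot 2 \times 10^{-5} + \Pr[A \mid \lnot I]\,(1 - 2 \times 10^{-5})} \]

Even with \( \Pr[A \mid I] = 1 \), a false positive rate of \( 10^{-5} \) already caps the Bayesian detection rate at roughly 2/3.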

[Plot, from Axelsson, RAID '99: Bayesian detection rate as a function of true positives and false positives]

• 70% detection requires a false positive rate below 1/100,000
• 80% detection generates 40% false positives

Conclusion

• Firewalls

– 3 types: Packet filtering, Stateful, and Application

– Placement and DMZ

• IDS

– Anomaly vs. policy-based detection

• Detection theory

– Base rate fallacy

89

90

Questions?

END

92

Thought

Overview

• Approach: Policy vs Anomaly

• Location: Network vs. Host

• Action: Detect vs. Prevent

93

Type                  Example
Host, Rule, IDS       Tripwire
Host, Rule, IPS       Personal firewall
Net, Rule, IDS        Snort
Net, Rule, IPS        Network firewall
Host, Anomaly, IDS    System call monitoring
Host, Anomaly, IPS    NX bit?
Net, Anomaly, IDS     Working set of connections
Net, Anomaly, IPS
