Stream Mode Algorithms and Architecture for Line Speed Traffic Analysis
Steve Liu
Computer Science Department, Texas A&M University
[email protected]
March 7, 2008


Background
• Network security solutions have a broad presence at every network point
– Antivirus scanners, network intrusion detection systems, spam filters
– Most solutions designed to operate at desktops or servers serve the intended purposes very well, but they are not perfect; nothing is perfect
• A DoD doctrine of defense-in-depth makes sense
– Use layers of (different) protection tools to make intrusion very inconvenient and very expensive
• Our interest: enhance network security via a stream mode traffic analysis approach at the Network Access Point (NAP) of an enterprise network

Stream Mode Traffic Analysis
• The network access point (NAP), where traffic flow is highly concentrated, is an ideal location for enterprise traffic analysis
– Single location to observe ingress and egress flows
– When the conditions are right, it could even slow or stop the intrusion packets before they spread too deep or too broad into the network
• Commercial systems
– Deep Packet Inspection (DPI) engines, DAG cards
– Some virus and spam filters at the gateway
– The firewall is one of the oldest products for this purpose

Stream Mode Packet Flow Analysis
[Figure: packet flow analysis pipeline. Labeled components:]
• Packet sensor: promiscuous mode NIC, router feed, libpcap, tcpdump, …
• Rules: n-gram rules, regular expressions
• Features: remote image src, src-dest IP pairs, URL
• Feature extractors: HW: Bivio, Cloudshield; SW: Flex
• Feature instances
How to identify malicious traffic from the time series of feature instances?
Feature: any string that fits a regular expression rule, e.g., "URL link"
Feature instance: an instance of a feature, e.g., "www.cs.tamu.edu"
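
The feature / feature instance distinction can be sketched in a few lines of Python. This is only an illustration: the regular expression `URL_RULE` and the function name `extract_feature_instances` are assumptions made for this example, not the rules used by the extractors named on the slide.

```python
import re

# Hypothetical "URL link" feature: a deliberately simple rule that matches
# http(s) URLs and bare www.* host names. Real rule sets would be stricter.
URL_RULE = re.compile(r"https?://[^\s\"'<>]+|www\.[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_feature_instances(text):
    """Return every substring of text that fits the URL feature rule."""
    return URL_RULE.findall(text)


sample = 'Visit www.cs.tamu.edu or <a href="http://example.com/a?b=1">x</a>'
instances = extract_feature_instances(sample)
print(instances)
```

Each string returned is one feature instance; the stream of such instances over time is what the detector analyzes.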

Two Key Issues: Algorithms and Resource Management
• Fast algorithms
• Efficient data structures
– Memory efficiency is critical for stateful detection
  • e.g., a y/n hash table over 32-bit hash values takes about 500 MB
• Real time vs. virtual clocks
• Progressive Email Classifier (PEC) system architecture
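
The 500 MB figure can be checked with a short calculation, assuming that "y/n" means one bit per possible 32-bit hash value (the slide does not spell this out). The variable names are illustrative.

```python
# Size of a flat yes/no table indexed by a 32-bit hash,
# assuming one bit per possible hash value.
HASH_BITS = 32
entries = 2 ** HASH_BITS        # number of distinct hash values
table_bytes = entries // 8      # one bit each, eight bits per byte
table_mib = table_bytes / 2 ** 20

print(table_bytes, "bytes =", table_mib, "MiB")
```

Under that assumption the table needs 512 MiB, in line with the roughly 500 MB quoted on the slide.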

Email spamming is no longer just a nuisance
• Some facts:
– Botnet farms can hit any target (over millions of them)
– Bandwidth waste (3:1 or higher)
– Network resource exploits and information stealing (malware planting)
– Highly effective hit-and-run strategy at different protocol levels (BGP, DNS, domain name, credit card fraud)
• Existing anti-spamming ware
– Large number of software copies and signatures to maintain
– Comprehensive detection rules, but slow to respond
• Signature management is a major bottleneck
– Acquisition and deployment of signatures to numerous machines
– A small variation in the known signatures can easily defeat a signature-based filter
– Spammers can test their designs with anti-spamming ware before starting the (hit-and-run) campaign

Spamming Behavior at a Glance
• Spammers do not have full freedom in launching spamming.
– Follow the transport protocols to deliver messages
– Messages must be perceivable and appealing to human users
– Expensive to compose and personalize spamming messages:
  • Interactive (click my URL links) or passive
  • Low yield combined with greed leads to high spamming volumes
• Cheap to launch spamming: millions of zombie machines each send a few copies
– Any "hit back, interactive" method could cause severe harm to the innocents
• Summary
– Very difficult for spammers to achieve financial goals without leaving noticeable signatures, i.e., feature instances
– A challenge is how to keep up with their speed, volume, and diversity

Our Approach
• Lossy detection:
– Focus mainly on the major offenders
– Avoid false positives
• Timely acquisition of the spamming signatures:
– Features and their instances
– Position the detector at the Network Access Points (NAP)
• Regular emails are expected to have white-noise-like distributions of strings that happen to fall into the spamming feature space
– Mediated delivery of bulk, legitimate email
• The content of a spamming campaign is divided into invariants and variants
– An invariant that also appears in regular emails cannot be used for filtering
– For the first-cut effort: URLs (over 95% of spam messages have them)

Competitive Aging-Scoring Scheme (CASS)
• A spamming invariant (string) is called its feature instance (FI). The essence of our technique:
– Extract FIs of emails and keep track of their occurrences. If exceeding a threshold: an UNBE stream
• In a naïve approach, it takes O(1) to update the score of an FI, but O(N) to update the ages of all other FIs
– A major computing cost
• CASS:
– The time-to-live of an FI is reset each time its score is increased by one (when a new copy arrives)
– The time-to-live of all other FIs is reduced by one
– New complexity: O(1) for both scoring and aging
– Exceeding a threshold: black; move it to the blacklist
– No further copies in a time period: white; discard the feature instance
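
One way to get constant-time scoring and aging is sketched below. It is an assumption-laden illustration, not the PEC implementation: instead of decrementing every entry, it keeps a virtual clock that ticks once per feature instance and stores each entry's last-seen tick, so only the oldest entry ever needs checking. The class name `Scoreboard`, the parameters `score_threshold` and `ttl`, and the use of Python's `OrderedDict` in place of a fixed-size queue are all choices made for this example.

```python
from collections import OrderedDict


class Scoreboard:
    """Hypothetical CASS-style scoreboard driven by a virtual clock."""

    def __init__(self, score_threshold, ttl):
        self.score_threshold = score_threshold
        self.ttl = ttl                 # arrivals an FI survives without a new copy
        self.clock = 0                 # virtual clock: one tick per FI observed
        self.entries = OrderedDict()   # FI -> (score, last_seen), oldest first
        self.blacklist = set()

    def observe(self, fi):
        """Score one feature instance, then age out the oldest stale entry."""
        self.clock += 1
        if fi in self.blacklist:
            verdict = "black"
        else:
            score, _ = self.entries.pop(fi, (0, None))
            score += 1
            if score > self.score_threshold:
                self.blacklist.add(fi)   # exceeded the threshold: black
                verdict = "black"
            else:
                # Re-inserting at the newest end resets the time-to-live.
                self.entries[fi] = (score, self.clock)
                verdict = "tracked"
        self._expire_oldest()
        return verdict

    def _expire_oldest(self):
        # Entries are ordered by last_seen, so only the head can be stale.
        while self.entries:
            fi, (_, last_seen) = next(iter(self.entries.items()))
            if self.clock - last_seen < self.ttl:
                break
            del self.entries[fi]         # no further copies in time: white


board = Scoreboard(score_threshold=2, ttl=3)
verdicts = [board.observe("spam-url") for _ in range(3)]
print(verdicts)
```

Because every entry has a distinct last-seen tick, at most one entry reaches the end of its time-to-live per arrival, so both the score update and the aging step do a constant amount of work.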

PEC Architecture
[Figure: PEC architecture. Labeled components: email flow; Sendmail; feature instance extraction; 32-bit hash vs. string; hash table of known strings (Berkeley DB); new string; aging and scoring of unknown strings (birth and death of strings); string identified.]

Data Structure of Scoreboard
[Figure: scoreboard data structure. Scoreboard Hit (SH) Table: entries for feature instances, with the test "Exceeds anomaly threshold (ATF)?". Scoreboard Miss (SM) Table: entries for feature instances, with the test "Exceeds miss threshold (MTF)".]

An Execution Snapshot of the Scoreboard
[Figure: scoreboard execution snapshot. Labels: current feature being processed; next feature instance; active features arranged in their ages (mod N); MOD queue; placement history; the current time location; time axis from oldest to newest; entry moved to blacklist.]
• HashURL: (414738 (20-bit) + 3724 (12-bit))
• HashURL: (124489 (20-bit) + 176 (12-bit))
• ATF = 10, MTF = 20, queue size = 20
• The entry [862 1822] is purged

Testbed Environment
Three modules included:
1. Email generation
2. PEC (blacklist and scoreboard)
3. Control and visualization console

Experimental Configuration
• Email generator: Intel P4-3.0, Windows XP
• Email server: Xeon 3.0 GHz, two single-core CPUs, Linux, Sendmail 8.14.1
• Within a bin, the sender sends 2000 copies of emails (a mix of bulks and regulars).
– The distribution of bulks and regulars is uniform.
– Default score threshold: 50
– Miss table length: 2048
– The average mail size: 1.5K bytes
– The email generator sends one mail per 0.088 seconds on average.

Workflow of Email Generation
[Figure: workflow of email generation. Labeled components: Windows control console; simulation parameters; density generation (uniform dist.) producing a bulk/regular sequence (U R U U … R); feature dictionary (bulk: URL, image src; regular); random text message composer; spamming keyword selection; subject generation; "From" generation; MIME structures; emails (bulk/regular); SMTP protocol; Linux email server (Sendmail).]

Email Generation
• Generate bulk/regular mixed email copies by injecting different features, such as URL links or image sources
– Can adjust density or interval time between bulk copies, and the placement of variants and invariants.
• According to the parsed parameters, the message composer picks the materials to generate MIME messages (bulks or regulars).
– Extracted from the 2005 TREC Public Spam Corpus, http://plg.uwaterloo.ca/~gvcormac/treccorpus/about.html
– Random text: from the Internet
– Keywords: user defined.
• The message composer calls an SMTP module to send the generated emails.
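
A minimal sketch of such a generator, using Python's standard `email` package, is shown below. Everything specific in it is hypothetical: the word list `WORDS`, the invariant `BULK_URL`, the addresses, and the function names are placeholders, and the body text is random filler rather than material from the TREC corpus.

```python
import random
from email.message import EmailMessage

# Hypothetical materials; the testbed draws its text from a corpus instead.
WORDS = ["meeting", "report", "lunch", "schedule", "project", "update", "notes"]
BULK_URL = "http://bulk.example.com/offer"   # invariant injected into bulk mail


def compose(is_bulk, rng):
    """Build one MIME message; bulk copies carry the invariant URL."""
    body = " ".join(rng.choice(WORDS) for _ in range(8))
    if is_bulk:
        body += "\n" + BULK_URL
    msg = EmailMessage()
    msg["From"] = f"user{rng.randrange(1000)}@example.com"
    msg["To"] = "rcpt@example.org"
    msg["Subject"] = " ".join(rng.sample(WORDS, 3))
    msg.set_content(body)
    return msg


def generate_bin(bin_size, bulk_count, seed=0):
    """Place bulk_count bulk copies uniformly at random within one bin."""
    rng = random.Random(seed)
    bulk_slots = set(rng.sample(range(bin_size), bulk_count))
    return [compose(i in bulk_slots, rng) for i in range(bin_size)]


messages = generate_bin(bin_size=2000, bulk_count=100)
bulk_seen = sum(BULK_URL in m.get_content() for m in messages)
print(bulk_seen, "bulk copies out of", len(messages))
```

Each generated message could then be handed to an SMTP client, for example `smtplib.SMTP(host).send_message(msg)` from the standard library, with the server host supplied by the test operator.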

Detection Latency of Single UNBE Source
• Fix threshold and age table length under different densities.
• Test six different UNBE densities (50, 100, 150, 200, …, 300 UNBE messages/bin)
[Figure: detection latency (0 to 2500) vs. number of messages in a bin (50 to 300). Series: experimental value, expected value.]
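
A back-of-envelope estimate gives a feel for the expected curve. The formula below is an assumption, not taken from the slides: if bulk copies are spread uniformly through a bin, the copy that reaches the score threshold arrives after roughly threshold × bin size / density messages. The bin size (2000), threshold (50), and send interval (0.088 s) come from the experimental configuration; the function name is illustrative.

```python
def expected_latency(threshold, bin_size, density):
    """Approximate messages seen before the threshold-th bulk copy arrives,
    assuming bulk copies are uniformly spread through the bin."""
    return threshold * bin_size / density


SEND_INTERVAL = 0.088   # seconds per mail, from the experimental configuration

for density in (50, 100, 150, 200, 250, 300):
    msgs = expected_latency(threshold=50, bin_size=2000, density=density)
    print(density, round(msgs), "messages,", round(msgs * SEND_INTERVAL, 1), "s")
```

Under this assumption the estimate falls from about 2000 messages at a density of 50 to about 333 at a density of 300, which is consistent with the 0 to 2500 range of the plot's vertical axis.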

Interactive Effects Under Multiple UNBE Sources
• Observe the change of the detection latency of UNBE A in the tests.
• Given an UNBE source A, six tests were made where one additional UNBE source is added to the experiment at a time.
• The density of A is fixed at 100 instances per bin, and the density of every remaining UNBE source is increased from 50 to 300 instances/bin.
• Line Test2: detection latency of UNBE A when adding 2 additional UNBE sources.
• Conclusion: the more UNBE sources there are, the lower the detection latency of an UNBE.
[Figure: detection latency (0 to 2500) vs. number of messages in a bin for each non-A UNBE (50 to 300). Series: test 1 to test 6, other sources.]

Throughput of Feature Parser
[Figure: throughput (1000 bodies/sec, 0 to 30) vs. size of mail body (1.5K to 7.5K bytes).]
The average email size ranges from 1.5 KB to 7.5 KB, and each email has 2 URLs.

Throughput of Scoreboard and Blacklist
• Scoreboard: 1.2M transactions
• Blacklist: 0.9M (avg. 30 B) URLs, without including database access
[Figure: throughput (K URLs/sec, 0 to 1000) vs. URL length (30 to 150 bytes).]

Pointer Table
• During the detection time window, only a limited number of hashed values need to be tracked
• A full table for a 32-bit hash system takes too much space
• The higher-order bits are used as the index, and the remaining bits are maintained in a linked list (one per entry)
• If the pointer table uses 20 bits for indexing, it has 1M entries; with an age table length of 20K~70K, the maximum depth of a linked list pointed to by the pointer table is 2.
• Very effective in reducing the actual space requirements, at a minor cost of more search cycles
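
The index/remainder split can be sketched as follows. The 20-bit/12-bit split matches the slide; the rest is assumed for illustration: Python lists stand in for the linked lists, `zlib.crc32` stands in for whatever 32-bit hash PEC uses, and the names `PointerTable` and `split_hash` are hypothetical.

```python
import zlib

INDEX_BITS = 20                       # high-order bits select a table slot
TAG_BITS = 32 - INDEX_BITS            # remaining 12 bits are kept in the chain
TAG_MASK = (1 << TAG_BITS) - 1


def hash32(text):
    """Stand-in 32-bit hash (CRC-32); the real system's hash is not specified."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def split_hash(h32):
    """Split a 32-bit hash into (20-bit index, 12-bit tag)."""
    return h32 >> TAG_BITS, h32 & TAG_MASK


class PointerTable:
    def __init__(self):
        self.slots = [None] * (1 << INDEX_BITS)   # 1M entries, mostly empty

    def add(self, h32):
        index, tag = split_hash(h32)
        if self.slots[index] is None:
            self.slots[index] = [tag]
        elif tag not in self.slots[index]:
            self.slots[index].append(tag)          # short chain per slot

    def contains(self, h32):
        index, tag = split_hash(h32)
        chain = self.slots[index]
        return chain is not None and tag in chain

    def remove(self, h32):
        index, tag = split_hash(h32)
        chain = self.slots[index]
        if chain is not None and tag in chain:
            chain.remove(tag)
            if not chain:
                self.slots[index] = None


table = PointerTable()
h = hash32("http://www.cs.tamu.edu/")
table.add(h)
print(split_hash(h), table.contains(h))
```

A lookup costs one slot access plus a scan of a very short chain, which is the "minor cost of more search cycles" traded for not allocating a full 32-bit table.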

Current Work
• The first-generation PEC demonstrates the feasibility of high-speed UNBE filtering
– Not meant to replace existing solutions, but to defeat major offenders (80-20 rule)
• Next step
– Packet-level filtering
– Handle multiple features (bad words, dirty subnets, blacklists, etc.)
– Integration with existing tools


Screen Shot (4): Aging out an Orphaned Packet

Screenshot (7): Parsing
An email message has 3 packets. Parser 1 uses DFA 0 to extract a URL link, and uses DFA 1 to extract a domain name in this email message.

System Performance Parameters

Thank You!