


Automated Worm Fingerprinting

Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage

Department of Computer Science and Engineering, University of California, San Diego

Presented at: Operating Systems Design and Implementation (OSDI) 2004

Ramanarayanan Ramani (Ram)


Overview

Why Automated Systems

Detecting Worms

Characterize Worms

Worm Containment

Worm Behavior

Identify Worm Signatures

Earlybird System Design

Statistics

Conclusion


Why Automated Systems

Today a worm is identified manually and its signature is characterized manually; antivirus software and network filters are then updated

The Code Red worm took 14 hours to spread

Slammer took 10 minutes: no time to identify a signature manually

Need automatic worm signature identification to secure networks


Detecting Worms

Network telescopes: monitor requests to a large unused, yet routable, address space

Can identify randomly scanning worms

Cannot identify hit-list or e-mail worms

Cannot characterize the signature


Detecting Worms Using Honeypots

Do not allow any malicious incoming traffic

Unwanted outgoing traffic may be caused by a worm: identify the malicious code responsible for it

Use the malicious code to identify a signature

Takes a long time and requires manual signature identification


Detecting Worms

Host-based behavioral detection

Analyze patterns of system calls, e.g., a received packet being routed back out to be sent

Identify suspicious activity

Expensive to manage: needs to be deployed on every system separately


Characterize Worm

Characterization is the process of analyzing and identifying a new worm or exploit

One approach: create vulnerability signatures a priori

These can only be applied to vulnerabilities that are already well known and have been characterized manually


Characterize Worm

First automated system: used by IBM for viruses

Lets the virus infect “decoy” programs

Identifies invariant strings in the infected objects to characterize the virus

Assumes a known instance of the virus and a controlled environment in which to monitor it


Characterize Worm

Honeycomb system of Kreibich and Crowcroft

Host-based intrusion detection system

Automatically generates signatures by looking for longest common subsequences among sets of strings found in message exchanges

Very slow


Characterize Worm

Kim and Karp's Autograph system

Autograph also uses network-level data to infer worm signatures

Employs Rabin fingerprints to index counters of content substrings

Uses white-lists to set aside well-known false positives

Has extensive support for distributed deployments

Relies on a prefiltering step that identifies flows with suspicious scanning activity


Worm Containment

Mechanism used to slow or stop the spread of an active worm

Examples: host quarantine, string matching, connection throttling


Worm Behavior

Worms behave quite differently from popular client-server and peer-to-peer applications

Some behavior patterns are common across worms; these are useful for identifying and characterizing them


Worm Behavior

Content invariance: some or all of the worm program is invariant across every copy

Some worms use limited polymorphism, encrypting each worm instance independently and/or randomizing filler text

But some portion still remains invariant


Worm Behavior

Content prevalence: worms are designed foremost to spread, so the invariant portion of a worm's content will appear frequently on the network as it spreads or attempts to spread


Worm Behavior

Address dispersion: packets containing a live worm tend to reflect a variety of different source and destination addresses

The number of distinct addresses grows during a major outbreak


Identify Worm Signatures

ProcessTraffic(payload, srcIP, dstIP)
  prevalence[payload]++
  Insert(srcIP, dispersion[payload].sources)
  Insert(dstIP, dispersion[payload].dests)
  if (prevalence[payload] > PrevalenceTh
      and size(dispersion[payload].sources) > SrcDispTh
      and size(dispersion[payload].dests) > DstDispTh)
    if (payload in knownSignatures)
      return
    endif
    Insert(payload, knownSignatures)
    NewSignatureAlarm(payload)
  endif
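The ProcessTraffic pseudocode keeps exact per-payload state. A minimal Python sketch of the same three tests is shown below; it assumes whole payloads as keys and small hypothetical thresholds (not Earlybird's values). Its dictionaries and sets are exactly the per-payload state that becomes too expensive at line rate.

```python
from collections import defaultdict


class ContentSifter:
    """Exact, memory-hungry version of ProcessTraffic (illustrative only)."""

    def __init__(self, prevalence_th=3, src_disp_th=2, dst_disp_th=2):
        # Hypothetical example thresholds.
        self.prevalence_th = prevalence_th
        self.src_disp_th = src_disp_th
        self.dst_disp_th = dst_disp_th
        self.prevalence = defaultdict(int)   # payload -> number of occurrences
        self.sources = defaultdict(set)      # payload -> distinct source IPs
        self.dests = defaultdict(set)        # payload -> distinct destination IPs
        self.known_signatures = set()
        self.alarms = []                     # payloads reported as new signatures

    def process_traffic(self, payload, src_ip, dst_ip):
        self.prevalence[payload] += 1
        self.sources[payload].add(src_ip)
        self.dests[payload].add(dst_ip)
        if (self.prevalence[payload] > self.prevalence_th
                and len(self.sources[payload]) > self.src_disp_th
                and len(self.dests[payload]) > self.dst_disp_th):
            if payload in self.known_signatures:
                return
            self.known_signatures.add(payload)
            self.new_signature_alarm(payload)

    def new_signature_alarm(self, payload):
        # Stand-in for NewSignatureAlarm: just record the payload.
        self.alarms.append(payload)


sifter = ContentSifter()
for i in range(5):
    # Worm-like: same content between many different hosts.
    sifter.process_traffic(b"WORM-PAYLOAD", f"10.0.0.{i}", f"10.0.1.{i}")
    # Popular but benign: same content between the same two hosts.
    sifter.process_traffic(b"popular-file", "10.0.9.1", "10.0.9.2")
print(sifter.alarms)  # [b'WORM-PAYLOAD']
```

The benign payload is just as prevalent, but it fails the dispersion tests, which is why the algorithm checks all three conditions together.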


Identify Worm Signature

This method is called content sifting

A naive implementation does not scale to high-speed networks:

Too much data to handle

Too many substrings to store

Too much time taken to process each packet


Earlybird System Design

Scan the network and process packets

Identify repeating substrings, along with the lists of their sources and destinations

If the repetition exceeds the threshold, take the substring as a signature and ask the network security system to block packets carrying it


Estimating content prevalence

Goal: find the packet payloads that appear at least x times among the N packets sent during a given interval

Uses multistage filters with conservative update to dramatically reduce the memory footprint of the problem

Append the destination port and protocol to the content before hashing
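A multistage filter can be sketched as several independent tables of counters. A key's count is estimated as the minimum of its counters, and conservative update raises only the counters that fall below that new minimum, which keeps overestimates small. The stage count, table size, SHA-256-derived per-stage hashes, and the `content_key` helper below are assumptions for illustration, not Earlybird's parameters.

```python
import hashlib


class MultistageFilter:
    """Multistage filter with conservative update (illustrative sketch)."""

    def __init__(self, stages: int = 4, buckets: int = 1024):
        self.buckets = buckets
        self.tables = [[0] * buckets for _ in range(stages)]

    def _indexes(self, key: bytes) -> list[int]:
        # One bucket per stage; prefixing the stage number gives each stage
        # a different hash function (an assumption of this sketch).
        return [
            int.from_bytes(hashlib.sha256(bytes([s]) + key).digest()[:4], "big") % self.buckets
            for s in range(len(self.tables))
        ]

    def increment(self, key: bytes) -> int:
        idx = self._indexes(key)
        # The estimate is the smallest counter: it may overestimate, never underestimate.
        new_value = min(t[i] for t, i in zip(self.tables, idx)) + 1
        # Conservative update: only raise counters that are below the new estimate.
        for t, i in zip(self.tables, idx):
            if t[i] < new_value:
                t[i] = new_value
        return new_value


def content_key(payload: bytes, dst_port: int, protocol: int) -> bytes:
    # Append the destination port and protocol number to the content before hashing.
    return payload + dst_port.to_bytes(2, "big") + bytes([protocol])


msf = MultistageFilter()
for _ in range(3):
    count = msf.increment(content_key(b"example payload", 80, 6))  # 6 = TCP
print(count)  # 3
```

Appending the port and protocol means the same bytes sent to different services are counted separately.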

Detecting repeating strings of a small fixed length B: compute a variant of Rabin fingerprints for all possible substrings of length B

Each packet with a payload of s bytes has s - B + 1 substrings of length B, so the number of memory references per packet is very high
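The s - B + 1 substring hashes need not each be computed from scratch: a rolling hash updates the previous value in constant time as the window slides by one byte. This sketch uses a Rabin-Karp polynomial hash modulo the prime 2^61 - 1 as a simpler stand-in for the Rabin fingerprint variant mentioned above (true Rabin fingerprints use polynomials over GF(2)); the base 257 and the modulus are assumptions.

```python
def rolling_hashes(payload: bytes, B: int, base: int = 257, mod: int = (1 << 61) - 1):
    """Yield the hash of every length-B substring of payload, left to right."""
    if len(payload) < B:
        return
    high = pow(base, B - 1, mod)  # weight of the byte that leaves the window
    h = 0
    for byte in payload[:B]:
        h = (h * base + byte) % mod
    yield h
    for i in range(B, len(payload)):
        h = (h - payload[i - B] * high) % mod  # remove the oldest byte
        h = (h * base + payload[i]) % mod      # shift in the newest byte
        yield h


payload = bytes(range(256)) * 4  # a 1,024-byte example payload
hashes = list(rolling_hashes(payload, B=40))
print(len(hashes))  # 985 = 1024 - 40 + 1
```

Even with O(1) work per hash, every hash still needs a counter lookup, so the per-packet memory traffic stays proportional to s - B + 1.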


Estimating address dispersion

Address dispersion is critical for avoiding false positives

Count the distinct source IP addresses and destination IP addresses associated with each piece of content suspected of being generated by a worm

Use approximate counting of distinct addresses with bitmaps

Direct bitmaps: 32 bits; hash each address to one bit and set that bit

For a threshold of 30 distinct addresses, about 20 bits are set

Limited ability to estimate the actual value of each counter: the bitmap saturates as the count grows
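A minimal sketch of a direct bitmap shows why 30 distinct addresses leave only about 20 of 32 bits set, and how a count is read back. It assumes SHA-256 as the hash and uses the standard linear-counting estimate b·ln(b/z), where z is the number of zero bits (the same b ln(b/...) form that appears in ComputeEstimate). The names `bit_index`, `DirectBitmap`, and `expected_bits_set` are illustrative.

```python
import hashlib
import math


def bit_index(addr: str, b: int) -> int:
    # Map an address to one of b bit positions (SHA-256 is an assumption).
    return int.from_bytes(hashlib.sha256(addr.encode()).digest()[:4], "big") % b


class DirectBitmap:
    def __init__(self, b: int = 32):
        self.b = b
        self.bits = 0  # the b-bit bitmap, stored in an int

    def insert(self, addr: str) -> None:
        self.bits |= 1 << bit_index(addr, self.b)

    def bits_set(self) -> int:
        return bin(self.bits).count("1")

    def estimate(self) -> float:
        # Linear counting: n is approximately b * ln(b / z), z = bits still zero.
        zeros = self.b - self.bits_set()
        if zeros == 0:
            return math.inf  # saturated: no estimate is possible any more
        return self.b * math.log(self.b / zeros)


def expected_bits_set(n: int, b: int = 32) -> float:
    # Expected number of set bits after n distinct addresses.
    return b * (1 - (1 - 1 / b) ** n)


print(round(expected_bits_set(30), 1))  # 19.7, i.e. about 20 bits for 30 addresses
bm = DirectBitmap()
for i in range(30):
    bm.insert(f"192.0.2.{i}")
print(bm.bits_set(), round(bm.estimate(), 1))
```

Because the estimate grows without bound as z approaches 0, a single direct bitmap stops giving useful counts once most of its bits are set.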


Estimating address dispersion

Earlybird technique: scaled bitmaps, which accurately estimate address dispersion using one fifth of the memory

Sub-sample the range of the hash space. For example, to count up to 64 sources using 32 bits, one might hash sources into a space from 0 to 63 yet only set bits for values that hash between 0 and 31, ignoring half of the sources

We track a continuously increasing count by simply increasing this scaling factor whenever the bitmap is filled
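The sub-sampling example above can be sketched directly: a 32-bit bitmap records only hash values 0 to 31 out of 0 to 63, and the linear-counting estimate is doubled to compensate for the ignored half. The `scale` parameter and the SHA-256-based hashing are assumptions for illustration. This single bitmap does not yet handle rescaling when it fills.

```python
import hashlib
import math


class SubsampledBitmap:
    """A b-bit bitmap that records only 1 / 2**scale of the hash space."""

    def __init__(self, b: int = 32, scale: int = 1):
        self.b = b
        self.scale = scale
        self.bits = 0

    def insert(self, addr: str) -> None:
        h = int.from_bytes(hashlib.sha256(addr.encode()).digest()[:8], "big")
        value = h % (self.b << self.scale)  # 0..63 when b = 32 and scale = 1
        if value < self.b:                  # only values 0..31 set a bit
            self.bits |= 1 << value

    def estimate(self) -> float:
        zeros = self.b - bin(self.bits).count("1")
        if zeros == 0:
            return math.inf  # full: time to increase the scaling factor
        # Linear-counting estimate of the sampled addresses, scaled back up.
        return self.b * math.log(self.b / zeros) * 2 ** self.scale


sb = SubsampledBitmap()
for i in range(64):
    sb.insert(f"203.0.113.{i}")
print(round(sb.estimate()))
```

Halving the sampled fraction doubles the range that 32 bits can cover, at the cost of a noisier estimate.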


Estimating address dispersion

Once the bitmap is scaled to a new configuration, the addresses that were active throughout the previous configuration are lost, and adjusting for this bias directly can lead to double counting

So we use multiple bitmaps to store history; here we use 3


Estimating address dispersion


Estimating address dispersion

UpdateBitmap(IP)
  code = Hash(IP)
  level = CountLeadingZeroes(code)
  bitcode = FirstBits(code << (level + 1))
  if (level >= base and level < base + numbmps)
    SetBit(bitcode, bitmaps[level - base])
    if (level == base and CountBitsSet(bitmaps[0]) == max)
      NextConfiguration()
    endif
  endif

ComputeEstimate(bitmaps, base)
  numIPs = 0
  for i = 0 to numbmps - 1
    numIPs = numIPs + b * ln(b / CountBitsNotSet(bitmaps[i]))
  endfor
  correction = 2(2^base - 1) / (2^numbmps - 1) * b * ln(b / (b - max))
  return numIPs * 2^base / (1 - 2^(-numbmps)) + correction
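UpdateBitmap and ComputeEstimate translate fairly directly into Python. In this sketch the 64-bit hash (taken from SHA-256), b = 32, numbmps = 3, and max = b/2 are assumptions; the fill test uses >= instead of == as a defensive choice; and the correction term is transcribed from the pseudocode as given rather than derived independently.

```python
import hashlib
import math

MASK64 = (1 << 64) - 1


class ScaledBitmaps:
    """Sketch of UpdateBitmap / ComputeEstimate with numbmps bitmaps of b bits."""

    def __init__(self, b=32, numbmps=3, max_set=None):
        assert (b & (b - 1)) == 0, "b must be a power of two"
        self.b = b
        self.index_bits = b.bit_length() - 1   # log2(b) bits pick a bit position
        self.numbmps = numbmps
        self.max = b // 2 if max_set is None else max_set  # assumed fill threshold
        self.base = 0
        self.bitmaps = [0] * numbmps           # bitmaps[i] covers level base + i

    @staticmethod
    def _hash(ip):
        return int.from_bytes(hashlib.sha256(ip.encode()).digest()[:8], "big")

    @staticmethod
    def _bits_set(bitmap):
        return bin(bitmap).count("1")

    def update(self, ip):
        code = self._hash(ip)
        level = 64 - code.bit_length()          # CountLeadingZeroes(code)
        # FirstBits(code << (level + 1)): the bits that follow the leading 1
        bitcode = ((code << (level + 1)) & MASK64) >> (64 - self.index_bits)
        if self.base <= level < self.base + self.numbmps:
            self.bitmaps[level - self.base] |= 1 << bitcode
            # The pseudocode tests == max; >= is a defensive variant here.
            if level == self.base and self._bits_set(self.bitmaps[0]) >= self.max:
                self._next_configuration()

    def _next_configuration(self):
        # Drop the most heavily sampled bitmap, shift the rest, add an empty one.
        self.bitmaps = self.bitmaps[1:] + [0]
        self.base += 1

    def estimate(self):
        b, m = self.b, self.numbmps
        num_ips = 0.0
        for bitmap in self.bitmaps:
            zeros = max(b - self._bits_set(bitmap), 1)  # guard against a full bitmap
            num_ips += b * math.log(b / zeros)
        correction = 2 * (2 ** self.base - 1) / (2 ** m - 1) * b * math.log(b / (b - self.max))
        return num_ips * 2 ** self.base / (1 - 2 ** -m) + correction


counter = ScaledBitmaps()
for i in range(1000):
    counter.update(f"10.0.{i // 256}.{i % 256}")
print(counter.base, round(counter.estimate()))
```

Each bitmap holds only a few dozen samples, so individual estimates are noisy; the point of the structure is that memory stays fixed at numbmps * b bits however large the count grows.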


CPU scaling

Processing each packet payload as a single string is easy, but when applying Rabin fingerprints, processing every substring of length B can overload the CPU under high traffic load

A packet with 1,000 bytes of payload and B = 40 requires processing 961 (= 1000 - 40 + 1) Rabin fingerprints

To reduce processing time, process only a sample of the substrings

However, randomly sampling substrings could cause us to miss a large fraction of the occurrences of each substring


CPU Scaling

Instead, use value sampling: select only those substrings whose fingerprint matches a certain pattern, such as the last six bits being 0 (about 1 in 64 substrings)
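Value sampling can be sketched on top of a rolling hash. Because the keep-or-drop decision depends only on a substring's own hash, the same invariant bytes are sampled identically in every packet, at any offset, which is what random sampling fails to guarantee. The Rabin-Karp hash (base 257, modulus 2^61 - 1) stands in for Rabin fingerprints, and `value_sampled` is a hypothetical helper.

```python
def value_sampled(payload: bytes, B: int = 40, sample_bits: int = 6,
                  base: int = 257, mod: int = (1 << 61) - 1):
    """Return (offset, hash) for each length-B substring whose hash has its
    low `sample_bits` bits all zero, i.e. roughly 1 in 2**sample_bits of them."""
    if len(payload) < B:
        return []
    mask = (1 << sample_bits) - 1
    high = pow(base, B - 1, mod)
    h = 0
    for byte in payload[:B]:
        h = (h * base + byte) % mod
    selected = []
    for start in range(len(payload) - B + 1):
        if start > 0:  # slide the window one byte to the right
            h = ((h - payload[start - 1] * high) * base + payload[start + B - 1]) % mod
        if h & mask == 0:  # decision depends only on the substring's content
            selected.append((start, h))
    return selected


worm = bytes(range(100, 220))      # 120 bytes of invariant content
pkt1 = b"A" * 13 + worm + b"B" * 7
pkt2 = b"C" * 50 + worm
in_worm_1 = {h for s, h in value_sampled(pkt1) if 13 <= s <= 13 + len(worm) - 40}
in_worm_2 = {h for s, h in value_sampled(pkt2) if 50 <= s <= 50 + len(worm) - 40}
print(in_worm_1 == in_worm_2)  # True: both packets select the same worm substrings
```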

The probability of tracking a worm depends on the length of its signature: it is 55% for a 100-byte signature, 92% for a 200-byte signature, and 99.64% for a 400-byte signature
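The formula behind these numbers is not reproduced in this transcript. Under a simple model, assumed here, each of the x - B + 1 substrings of an x-byte invariant signature is selected independently with probability f = 1/64 (last six bits zero), with B = 40. The chance of selecting at least one is then about 1 - e^(-f(x - B + 1)). This model gives about 92% for 200 bytes and about 99.64% for 400 bytes, matching the figures above. For 100 bytes it gives about 61%, so the 55% figure presumably comes from a somewhat different calculation.

```python
import math


def tracking_probability(x: int, B: int = 40, f: float = 1 / 64) -> float:
    """Probability that at least one length-B substring of an x-byte invariant
    signature is sampled, if each is kept independently with probability f
    (Poisson approximation; an assumed model, not the paper's formula)."""
    n = x - B + 1
    if n <= 0:
        return 0.0
    return 1 - math.exp(-f * n)


for x in (100, 200, 400):
    print(x, f"{tracking_probability(x):.2%}")
```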


Complete System


Program Loop

ProcessPacket()
  InitializeIncrementalHash(payload, payloadLength, dstPort)
  while (currentHash = GetNextHash())
    if (currentADEntry = ADEntryMap.Find(currentHash))
      UpdateADEntry(currentADEntry, srcIP, dstIP, packetTime)
      if ((currentADEntry.srcCount > SrcDispTh)
          and (currentADEntry.dstCount > DstDispTh))
        ReportAnomalousADEntry(currentADEntry, packet)
      endif
    else
      if (MsfIncrement(currentHash) > PrevalenceTh)
        newADEntry = InitializeADEntry(srcIP, dstIP, packetTime)
        ADEntryMap.Insert(currentHash, newADEntry)
      endif
    endif
  endwhile
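ProcessPacket can be sketched end to end in Python. Content is counted in the multistage filter only until it becomes prevalent; after that it gets an address-dispersion entry, and it is reported once its dispersion crosses both thresholds. Several simplifications are assumptions of this sketch: the thresholds and B are hypothetical, exact Python sets stand in for scaled bitmaps, each substring is hashed from scratch instead of with an incremental hash, and `Sifter`, `ADEntry`, and `substring_keys` are illustrative names.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical example values, not Earlybird's deployed settings.
B = 8
PREVALENCE_TH, SRC_DISP_TH, DST_DISP_TH = 3, 2, 2


def substring_keys(payload: bytes, dst_port: int):
    """Stand-in for InitializeIncrementalHash/GetNextHash: one key per
    length-B substring, with the destination port appended."""
    port = dst_port.to_bytes(2, "big")
    for i in range(len(payload) - B + 1):
        yield hashlib.sha256(payload[i:i + B] + port).digest()[:8]


class MultistageFilter:
    def __init__(self, stages: int = 3, buckets: int = 4096):
        self.buckets = buckets
        self.tables = [[0] * buckets for _ in range(stages)]

    def increment(self, key: bytes) -> int:
        idx = [int.from_bytes(hashlib.sha256(bytes([s]) + key).digest()[:4], "big") % self.buckets
               for s in range(len(self.tables))]
        new_value = min(t[i] for t, i in zip(self.tables, idx)) + 1
        for t, i in zip(self.tables, idx):  # conservative update
            t[i] = max(t[i], new_value)
        return new_value


@dataclass
class ADEntry:
    """Address-dispersion entry; exact sets stand in for scaled bitmaps."""
    sources: set = field(default_factory=set)
    dests: set = field(default_factory=set)
    last_seen: float = 0.0

    def update(self, src_ip: str, dst_ip: str, packet_time: float) -> None:
        self.sources.add(src_ip)
        self.dests.add(dst_ip)
        self.last_seen = packet_time


class Sifter:
    def __init__(self):
        self.msf = MultistageFilter()
        self.ad_entries = {}  # ADEntryMap: key -> ADEntry
        self.reports = set()  # keys passed to ReportAnomalousADEntry

    def process_packet(self, payload, src_ip, dst_ip, dst_port, packet_time):
        for key in substring_keys(payload, dst_port):
            entry = self.ad_entries.get(key)
            if entry is not None:
                entry.update(src_ip, dst_ip, packet_time)
                if len(entry.sources) > SRC_DISP_TH and len(entry.dests) > DST_DISP_TH:
                    self.reports.add(key)
            elif self.msf.increment(key) > PREVALENCE_TH:
                entry = ADEntry()
                entry.update(src_ip, dst_ip, packet_time)
                self.ad_entries[key] = entry


sifter = Sifter()
worm = b"EXAMPLE-WORM-BODY-0123456789"
for i in range(10):
    # Worm-like traffic: same content, many sources and destinations.
    sifter.process_packet(worm, f"10.0.0.{i}", f"10.0.1.{i}", 80, float(i))
    # Popular but benign content: repeated between the same two hosts.
    sifter.process_packet(b"popular-download-content", "10.9.9.1", "10.9.9.2", 80, float(i))
print(len(sifter.reports))
```

Keeping dispersion state only for prevalent content is what keeps memory bounded: most substrings never get past the multistage filter.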


Statistics

The implementation is written in C

The aggregator also uses a MySQL database to log all events

Uses the popular RRDtool library for graphical reporting

PHP scripts for administrative control


Content prevalence threshold

• Using a 60-second measurement interval and a whole-packet CRC, over 97 percent of all signatures repeat two or fewer times, and 94.5 percent are observed only once

• Using a finer-grained content hash or a longer measurement interval increases these numbers even further


Address dispersion threshold

With a low dispersion threshold of 2, there are over 1000 signatures after 10 minutes

Using a threshold of 30, there are only 5 or 6 prevalent strings meeting the dispersion criteria


Garbage Collection

When the timeout is set to 100 seconds, almost 60 percent of all signatures are garbage collected before a subsequent update

Using a timeout of 1000 seconds, this number is reduced to roughly 20 percent of signatures


Positives

Automatic Detection, Characterization & Containment

Low processor time consumption

Low memory consumption

Identifies new worms and produces signatures, even for e-mail worms


Problems

Cannot identify worms with little or no invariant content

Attackers can use compression (e.g., zip) to confuse Earlybird

Worms spreading through encrypted channels (IPSec, SSL, VPN) cannot be caught, since their content is hidden

Attackers may attempt to evade the monitor through traditional IDS evasion techniques, such as IP spoofing

Stealthy, slowly spreading worms are difficult to identify

An attacker could purposely trigger the worm defense to deny some service, by spreading similar packets


Suggestions

Decompress packets and identify their original contents

For secure protocols, the system needs to act as a firewall

Use triggering data across time scales (as suggested in the paper), or maintain a history of slowly repeating data

Check how a suspected worm behaves on infected systems to confirm that it really is a worm


Questions