finding the needle in the ip stack

FINDING THE NEEDLE IN THE IP STACK Dr. Sven Krasser McAfee, Inc. Session ID: RR-403 Session Classification: Intermediate

Upload: sven-krasser

Post on 14-Jul-2015

TRANSCRIPT

Page 1: Finding the Needle in the IP Stack

FINDING THE NEEDLE IN THE IP STACK

Dr. Sven Krasser

McAfee, Inc.

Session ID: RR-403

Session Classification: Intermediate

Page 2: Finding the Needle in the IP Stack

AGENDA

Data Mining – A Human Approach

English Words

Bad Behavior

What’s in a File

Conclusions

2

Page 3: Finding the Needle in the IP Stack

Data Mining: A Human Approach

3

Page 4: Finding the Needle in the IP Stack

ANTHROPOMETRIC DATA

4

Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro

Page 5: Finding the Needle in the IP Stack

MEASUREMENTS

5

Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro

Page 6: Finding the Needle in the IP Stack

MEASUREMENTS (CONTINUED)

6

Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro

Page 7: Finding the Needle in the IP Stack

HEIGHT VERSUS WEIGHT

[Figure: scatter plot of weight (in pounds, axis from 100 to 250) against height (in inches, axis from 60 to 80)]

7

Page 8: Finding the Needle in the IP Stack

HEIGHT VERSUS WEIGHT (CONTINUED)

[Figure: scatter plot of weight (in pounds, axis from 100 to 250) against height (in inches, axis from 60 to 80), with separate markers for Women and Men]

8

Page 9: Finding the Needle in the IP Stack

PUTTING WEIGHT AND HEIGHT INTO PERSPECTIVE

9

Page 10: Finding the Needle in the IP Stack

BEST GUESS FOR GENDER

[Figure: map over height (in inches) and weight (in pounds) showing the best guess for gender, on a scale from 100% male / 0% female through 50% male / 50% female to 0% male / 100% female]

10

Page 11: Finding the Needle in the IP Stack

ONE DIMENSION ONLY

[Figure: distribution over height (in inches, axis from 55 to 75); vertical axis from 0.00 to 0.15]

11

Page 12: Finding the Needle in the IP Stack

BETTER FEATURES

Buttock Circumference: “The circumference of the body measured at the level of the maximum posterior protuberance of the buttocks.”

[Figure: scatter plot of weight (in pounds, axis from 100 to 200) against buttock circumference (axis from 800 to 1200, unit not given)]

12

Page 13: Finding the Needle in the IP Stack

BEST GUESS FOR REVISED FEATURES

13

[Figure: best-guess map over buttock circumference and weight (in pounds)]

Page 14: Finding the Needle in the IP Stack

FURTHER IMPROVING THE SEPARATION

Signal to Noise: Features with very different distributions per class

Correlation: Features with low correlation

Dimensionality: Consider more features at the same time
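The first two criteria on this slide can be put into numbers. The sketch below is one possible way to do that, not the method used in the talk: it scores a single feature with a Fisher-style ratio (squared gap between class means over the summed class variances) and measures redundancy between two features with the Pearson correlation. The function names and the sample values are made up for illustration.

```python
from statistics import mean, pvariance


def separation_score(class_a, class_b):
    """Fisher-style score: larger means the two classes overlap less."""
    spread = pvariance(class_a) + pvariance(class_b)
    return (mean(class_a) - mean(class_b)) ** 2 / spread


def pearson(xs, ys):
    """Pearson correlation between two equally long feature columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Illustrative values only, not taken from the anthropometric data set.
feature_class_a = [70, 72, 68, 71, 69]
feature_class_b = [64, 66, 63, 65, 67]
print(separation_score(feature_class_a, feature_class_b))
print(pearson(feature_class_a, [150, 165, 140, 160, 148]))
```

A feature with a high separation score that is only weakly correlated with the features already chosen adds the most new information.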

14

Page 15: Finding the Needle in the IP Stack

EMAIL DATA IN THREE DIMENSIONS

15

Page 16: Finding the Needle in the IP Stack

16

SPARSE DATA

[Figure: a large matrix of feature counts in which almost every entry is 0; only a few scattered entries are nonzero, for example 25, 10, 40, 14, and 16]
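Data that is almost entirely zeros is usually stored by recording only the nonzero entries. The following is a minimal sketch of that idea using a plain dictionary keyed by (row, column); the helper names and the shortened toy rows are assumptions for illustration, not the representation used in the talk.

```python
def to_sparse(rows):
    """Keep only nonzero cells, keyed by (row index, column index)."""
    return {
        (r, c): value
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if value != 0
    }


def density(rows):
    """Fraction of cells that are nonzero."""
    total_cells = sum(len(row) for row in rows)
    return len(to_sparse(rows)) / total_cells


# Toy rows, much shorter than the ones on the slide.
dense = [
    [25, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [10, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
sparse = to_sparse(dense)
print(sparse)
print(density(dense))
```

With millions of possible columns, as with the sparse distribution features later in the deck, storing only the nonzero cells is what keeps the data manageable.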

Page 17: Finding the Needle in the IP Stack

CLASSIFICATION ALGORITHMS

• Decision Trees

• Decision Forests

• Neural Networks

• Support Vector Machines

[Diagram: the classifier outputs are combined (+) into a Final Verdict]
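The slide does not say how the individual classifier outputs are combined into the final verdict. One common and simple option, shown here purely as an assumption, is to average the per-classifier scores and compare the result against a threshold. The score values and the threshold are hypothetical.

```python
def final_verdict(scores, threshold=0.5):
    """Average the classifier scores and flag the sample if the mean reaches the threshold."""
    average = sum(scores.values()) / len(scores)
    return average, average >= threshold


# Hypothetical scores in [0, 1], where 1 means "certainly bad".
scores = {
    "decision_tree": 0.9,
    "decision_forest": 0.8,
    "neural_network": 0.6,
    "support_vector_machine": 0.7,
}
average, flagged = final_verdict(scores)
print(average, flagged)
```

Weighted sums or a second-stage classifier trained on the individual outputs are equally plausible readings of the "+" on the slide.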

17

Page 18: Finding the Needle in the IP Stack

English Words: And why do they look English?

18

Page 19: Finding the Needle in the IP Stack

SOME ENGLISH WORDS

• militate

• caterwaul

• deracinate

• arrant

• concinnity

• imprecation

• vertiginous

• profuse

19

Page 20: Finding the Needle in the IP Stack

SOME ENGLISH EXPLANATIONS

• militate: to have force or influence

• caterwaul: to make a harsh cry or screech

• deracinate: to uproot

• arrant: outright; thoroughgoing

• concinnity: elegance – used chiefly of literary style

• imprecation: a curse

• vertiginous: causing dizziness; also, giddy; dizzy

• profuse: plentiful; copious

20

Source: http://dictionary.reference.com/

Page 21: Finding the Needle in the IP Stack

TRANSITION PROBABILITIES

a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9

a .00 .07 .15 .02 .00 .10 .00 .00 .00 .00 .02 .05 .00 .17 .00 .02 .00 .05 .02 .27 .00 .05 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

b .14 .00 .00 .00 .29 .00 .00 .00 .00 .00 .00 .43 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

c .05 .03 .05 .00 .11 .00 .00 .08 .03 .00 .03 .00 .00 .00 .24 .00 .00 .03 .00 .22 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

d .17 .04 .04 .00 .17 .00 .00 .00 .17 .00 .00 .00 .04 .00 .04 .00 .00 .00 .17 .13 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

e .03 .02 .15 .11 .00 .01 .00 .03 .04 .01 .00 .01 .04 .11 .02 .02 .00 .12 .12 .09 .00 .02 .02 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

f .06 .00 .00 .00 .06 .24 .00 .00 .29 .00 .00 .00 .00 .00 .12 .00 .00 .12 .06 .06 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

g .00 .00 .00 .14 .14 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .29 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

h .16 .00 .00 .00 .53 .05 .00 .00 .11 .00 .00 .00 .00 .05 .11 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

i .02 .00 .08 .00 .04 .00 .00 .00 .00 .00 .00 .06 .00 .29 .15 .00 .02 .02 .15 .04 .00 .08 .00 .00 .00 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

j .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

k .00 .00 .00 .00 .50 .00 .00 .00 .00 .00 .00 .50 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

l .06 .00 .00 .06 .35 .00 .00 .00 .06 .00 .00 .06 .00 .00 .12 .00 .00 .00 .06 .06 .06 .00 .00 .00 .12 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

m .08 .00 .00 .00 .08 .00 .00 .00 .08 .00 .00 .00 .00 .00 .17 .33 .00 .00 .00 .17 .08 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

n .04 .00 .02 .18 .16 .00 .12 .00 .02 .00 .00 .00 .00 .00 .02 .00 .00 .02 .12 .24 .02 .00 .02 .00 .00 .00 .00 .00 .00 .00 .00 .02 .00 .00 .00 .00

o .00 .00 .02 .02 .04 .12 .02 .00 .00 .00 .00 .04 .12 .16 .00 .00 .02 .20 .08 .00 .12 .02 .02 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

p .00 .00 .00 .00 .08 .00 .00 .00 .00 .00 .00 .00 .00 .00 .08 .00 .00 .46 .00 .00 .38 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

q .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

r .15 .00 .07 .02 .24 .00 .00 .00 .07 .00 .00 .02 .00 .09 .11 .04 .00 .00 .07 .02 .07 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .02 .00 .00

s .08 .00 .04 .00 .17 .02 .00 .00 .10 .00 .00 .02 .00 .00 .12 .02 .00 .00 .04 .31 .06 .00 .00 .00 .04 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

t .07 .00 .01 .00 .27 .00 .00 .16 .16 .00 .00 .00 .00 .00 .09 .01 .00 .13 .01 .04 .03 .00 .00 .00 .01 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

u .00 .00 .03 .00 .03 .00 .00 .00 .03 .00 .00 .06 .03 .16 .00 .00 .00 .29 .13 .23 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

v .11 .00 .00 .00 .33 .00 .00 .00 .56 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

w .50 .00 .00 .00 .00 .00 .00 .00 .25 .00 .00 .00 .00 .00 .25 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

x .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

y .29 .00 .00 .00 .00 .00 .00 .00 .14 .00 .00 .00 .00 .00 .14 .00 .00 .14 .29 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00

z .00 .00 .00 .00 1.0 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00 .00
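A table like this one can be estimated by counting how often each character is followed by each other character and then normalizing every row so that it sums to 1. The sketch below does that for the eight words from the earlier slide; the table on the slide was presumably estimated from a much larger word list, so the numbers will not match. The function name is an assumption.

```python
from collections import Counter, defaultdict


def transition_probabilities(words):
    """Estimate P(next character | current character) from a word list."""
    counts = defaultdict(Counter)
    for word in words:
        for current, following in zip(word, word[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


words = [
    "militate", "caterwaul", "deracinate", "arrant",
    "concinnity", "imprecation", "vertiginous", "profuse",
]
probabilities = transition_probabilities(words)
print(probabilities["t"])
```

Each row of the result is a probability distribution over the characters that were observed after the row's character.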

21

Page 22: Finding the Needle in the IP Stack

ACTIVE .COM DOMAINS

22

82 million active .com domains

Page 23: Finding the Needle in the IP Stack

MARKOV CHAINS

• Analysis of recent domain registrations

• Using second-order Markov chains to detect potentially malicious domain names

– bnkofpunjab.com is not legitimate

– ferrylines.com is legitimate

– ebay.com is not determinable

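[Diagram: each name as a chain of overlapping letter pairs; two values are printed on each transition]

bnkofpunjab
bn -> nk   .0073   .1733
nk -> ko   .0641   .0872
ko -> of   .0213   .2738
of -> fp   .0912   .0431
fp -> pu   .0912   .1534
pu -> un   .0732   .0932
un -> nj   .0014   .0714
nj -> ja   .2175   .2936
ja -> ab   .0143   .0437

ferrylines
fe -> er   .2626   .1860
er -> rr   .0301   .0196
rr -> ry   .0939   .0371
ry -> yl   .0322   .0291
yl -> li   .2419   .1932
li -> in   .3598   .1120
in -> ne   .1457   .1269
ne -> es   .0633   .0411

ebay
eb -> ba   .1064   .4759
ba -> ay   .0588   .2979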
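In a second-order Markov chain the next character depends on the two characters before it, which is why the states on this slide are letter pairs. The sketch below trains such a model on a list of known names and scores a candidate by its average log-probability per transition. The training list, the add-alpha smoothing, and the alphabet size of 36 (a to z plus 0 to 9, as in the transition table) are assumptions made for illustration; the talk does not describe these details.

```python
import math
from collections import Counter, defaultdict


def train(names):
    """Count which character follows each pair of characters."""
    counts = defaultdict(Counter)
    for name in names:
        for i in range(len(name) - 2):
            counts[name[i:i + 2]][name[i + 2]] += 1
    return counts


def average_log_probability(name, counts, alphabet_size=36, alpha=1.0):
    """Mean log-probability per transition, with add-alpha smoothing for unseen cases."""
    steps = []
    for i in range(len(name) - 2):
        followers = counts.get(name[i:i + 2], Counter())
        seen = followers[name[i + 2]]
        total = sum(followers.values())
        steps.append(math.log((seen + alpha) / (total + alpha * alphabet_size)))
    return sum(steps) / len(steps) if steps else float("nan")


# Hypothetical training names; a real model would use a large set of known domains.
model = train(["ferrylines", "airlines", "ferryboat", "berrylane", "online"])
for candidate in ["ferrylines", "xqzkvjwp", "ebay"]:
    print(candidate, average_log_probability(candidate, model))
```

Names with familiar letter sequences score closer to zero than random strings. A four-letter name such as ebay contributes only two transitions, so its score rests on very little evidence, which may be one reason the slide lists it as not determinable.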

23

Page 24: Finding the Needle in the IP Stack

LIMITATIONS OF THE MARKOV MODEL

• Useful to detect malicious domain names

• Very effective for randomly generated names

• Detects some legitimate domain names as malicious domains

– Malicious names similar to legitimate ones (e.g. ebay.com phishing sites)

– Internationalized domain names and Punycode

• Solution: add DNS-related features to the classification process

24

Page 25: Finding the Needle in the IP Stack

DNS FEATURES

Domain            Number of Name Servers
bnkofpunjab.com   15
ferrylines.com    2
ebay.com          4

1. The number of nameservers that have hosted or are hosting this domain

2. The average time one nameserver hosted this domain

3. The maximum time one nameserver hosted this domain

4. The minimum time one nameserver hosted this domain

5. The number of inactive nameservers that previously hosted this domain

6. Whether the domain is an internationalized domain name
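The six features listed above could be derived from a per-domain hosting history along the following lines. This is a sketch under assumptions: the `HostingRecord` structure and its field names are hypothetical, the fifth feature is read as counting nameservers that no longer serve the domain, and an internationalized name is detected by the `xn--` prefix that Punycode-encoded labels carry.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HostingRecord:
    nameserver: str      # hypothetical field: nameserver host name
    days_hosted: float   # hypothetical field: how long it served the domain
    active: bool         # hypothetical field: still serving the domain today


def dns_features(domain, records):
    """Compute the six DNS features for one domain from its hosting history."""
    durations = [r.days_hosted for r in records]
    return {
        "num_nameservers": len({r.nameserver for r in records}),
        "avg_days": mean(durations),
        "max_days": max(durations),
        "min_days": min(durations),
        "num_inactive": len({r.nameserver for r in records if not r.active}),
        "is_idn": any(label.startswith("xn--") for label in domain.lower().split(".")),
    }


history = [
    HostingRecord("ns1.example.net", 400, True),
    HostingRecord("ns2.example.net", 400, True),
    HostingRecord("ns1.example.org", 12, False),
]
print(dns_features("example.com", history))
```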

25

Page 26: Finding the Needle in the IP Stack

EXAMPLE FEATURE

[Figure: density (axis from 0.00 to 0.15) of the time a domain spends on a name server (in days, axis from 0 to 600)]

26

Page 27: Finding the Needle in the IP Stack

RESULTS ANALYSIS

[Figure: ROC curve plotting true positive rate against false positive rate]

27
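A plot of true positive rate against false positive rate is obtained by sweeping the decision threshold over the classifier's scores. The sketch below shows the computation on a handful of made-up scores and labels; the function name and the data are assumptions, and none of the values come from the results on the slide.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up classifier scores; label 1 marks a malicious domain, 0 a legitimate one.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(roc_points(scores, labels))
```

A curve that hugs the upper left corner means many malicious names are caught before many legitimate ones are flagged.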

Page 28: Finding the Needle in the IP Stack

Bad Behavior: Email and Spam

28

Page 29: Finding the Needle in the IP Stack

IP BLACKLIST LOOKUP

• Mail server looks up sender IP over DNS

• Simple classifier modeled on IP blacklist query logs

• Narrow data set – queried IP, source IP, timestamp

• Deep data set – billions of query records monthly

• More complex data can be included

29

Page 30: Finding the Needle in the IP Stack

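IP LOOKUPS

[Diagram: the Sender (IP=Q) connects to the Receiver (IP=S). The Receiver sends the query "Q?" over DNS, the query is passed on to the reputation server, and the answer "Q=x" travels back the same way. The reputation server records the tuple <Q, S, T>.]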
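The lookup can be sketched as follows. DNS-based blacklists are conventionally queried by reversing the octets of the IPv4 address and appending the list's zone; the zone name `dnsbl.example.org` is a placeholder, and the exact query format of the reputation server in the talk is not given. The logged tuple follows the narrow data set described earlier: queried IP, source IP, timestamp.

```python
import ipaddress
from datetime import datetime, timezone


def blacklist_query_name(sender_ip, zone="dnsbl.example.org"):
    """Build the DNS name a mail server would look up for a sender's IPv4 address."""
    octets = str(ipaddress.IPv4Address(sender_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def log_record(queried_ip, source_ip, when=None):
    """The <Q, S, T> tuple kept on the reputation server side."""
    when = when or datetime.now(timezone.utc)
    return (queried_ip, source_ip, when.isoformat())


print(blacklist_query_name("192.0.2.1"))
print(log_record("192.0.2.1", "198.51.100.7"))
```

Because every receiving mail server asks about every sender it sees, the query log alone reveals who is sending to whom and when, without any message content.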

30

Page 31: Finding the Needle in the IP Stack

FEATURE EXTRACTION

Breadth features

– Number of messages

– Number of recipients

– Burstiness (data transmitted in short, uneven spurts)

– Sending sessions to individual recipients

– Global sending sessions to any recipient

Spectral features

– Periodicity over 24-hour window

– Average and standard deviation of low-frequency discrete Fourier transform (DFT) coefficients

– Average and standard deviation of high-frequency DFT coefficients

Distribution features

– Source IPs (thousands)
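The spectral features can be illustrated with a small discrete Fourier transform over hourly query counts for one IP. This is a sketch under assumptions: the 24 hourly buckets, the cut between "low" and "high" frequencies after the fourth coefficient, and the function names are all choices made here, not details from the talk.

```python
import cmath
import math
from statistics import mean, pstdev


def dft_magnitudes(samples):
    """Magnitudes of DFT coefficients 1..n/2 (the constant term is skipped)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(1, n // 2 + 1)
    ]


def spectral_features(hourly_counts, split=4):
    """Mean and standard deviation of low- and high-frequency magnitudes."""
    magnitudes = dft_magnitudes(hourly_counts)
    low, high = magnitudes[:split], magnitudes[split:]
    return {
        "low_mean": mean(low), "low_std": pstdev(low),
        "high_mean": mean(high), "high_std": pstdev(high),
    }


# Synthetic sender with one smooth daily cycle.
daily_cycle = [10 + 5 * math.cos(2 * math.pi * hour / 24) for hour in range(24)]
print(spectral_features(daily_cycle))
```

A sender that follows a human day-and-night rhythm concentrates its energy in the lowest coefficients, while a machine sending in regular short bursts shows up at higher frequencies.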

31

Page 32: Finding the Needle in the IP Stack

SELECTION OF ADVANCED FEATURES

Geographic features

• Location of sender and receiver

• Distance

• Local time at sender and receiver

32

Static features

• Host name features

• Dial-up IPs

• Reputation of neighboring IPs

Content features

• Ratio of good and bad messages

• Number of “from” domains handled

• Persistent sender/receiver address pairs

• Message size distribution

Sparse distribution features

• Source devices (thousands)

• Extended HELO (EHLO) strings (millions)

• “From” domains (billions)

• “To” addresses (billions)
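The distance feature listed under the geographic features can be computed from the coordinates of sender and receiver once both IPs have been geolocated. The sketch below uses the haversine formula for great-circle distance; how the talk's system obtains coordinates or measures distance is not stated, so the function and its inputs are assumptions.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# Hypothetical sender and receiver locations (latitude, longitude).
print(great_circle_km(37.77, -122.42, 52.52, 13.40))
```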

Page 33: Finding the Needle in the IP Stack

BREADTH FEATURES

[Figure: scatter plot of normalized number of messages per receiver (axis from 0.0 to 1.0) against normalized number of receivers (axis from 0.0 to 1.0), with points marked as Spam and Ham]

33

Page 34: Finding the Needle in the IP Stack

What’s in a File: A Look at Image Spam and Malware

34

Page 35: Finding the Needle in the IP Stack

IMAGE SPAM—OCR EVASION

35

Page 36: Finding the Needle in the IP Stack

IMAGE SPAM—COMPOSITION

36

Page 37: Finding the Needle in the IP Stack

CLOSE-UP OF GRADIENT

37

Page 38: Finding the Needle in the IP Stack

CLOSE-UP OF GRADIENT (CONTINUED)

38

Page 39: Finding the Needle in the IP Stack

GRADIENT FIELD OF PHOTO

39

Page 40: Finding the Needle in the IP Stack

GRADIENT DIRECTIONS

40
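The gradient slides show pictures only. As a rough illustration of what a gradient-direction feature could look like, the sketch below computes central-difference gradients on a grayscale image stored as a list of rows and counts the directions in a fixed number of bins. The choice of central differences, the eight bins, and the function name are assumptions, not the feature definition used in the talk.

```python
import math


def gradient_direction_histogram(image, bins=8):
    """Count gradient directions of interior pixels in equally wide angle bins."""
    histogram = [0] * bins
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if gx == 0 and gy == 0:
                continue  # flat region, no direction
            angle = math.atan2(gy, gx) % (2 * math.pi)
            histogram[int(angle / (2 * math.pi) * bins) % bins] += 1
    return histogram


# Toy image: brightness increases from left to right.
ramp = [[0, 1, 2, 3] for _ in range(4)]
print(gradient_direction_histogram(ramp))
```

Computer-generated images with flat fills and sharp text edges tend to produce a very different direction histogram than photographs, which is what makes such counts usable as classifier features.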

Page 41: Finding the Needle in the IP Stack

IMAGE FEATURE ANALYSIS

1:0 2:266 3:285 4:0.933333 5:9678 6:7.83323 7:1 8:0 9:0.038768 10:0.0286506 11:0.0242844 12:12.9656 13:0.688315 14:0.688289

15:0.688927 16:0.688345 17:1.47216 18:1.48728 19:1.45537 20:1.4721 21:0.998652 22:0.998907 23:0.998662 24:1 25:1 26:1 27:1

28:1 29:1 30:1 31:1 32:1 33:1 34:1 35:1 36:1 37:1 38:1 39:1 40:1 41:1 42:1 43:1 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:1 52:1

53:1 54:1 55:1 56:1 57:1 58:1 59:1 60:62895.6 61:62894.4 62:62923.5 63:62897 64:11.9708 65:0.439338 66:0.0768368

67:0.0533835 68:0.694764 69:285 70:97 71:106 72:99 73:97 74:69979 75:69484 76:68665 77:69365 78:1 79:0 80:0 81:0.0342435

82:0.0281361 83:0.025709 84:1327.37 85:35.0028 86:28.6605 87:0.818808 88:1 89:2.98484e+07 90:4.16282e+06 91:8.01424e+06

92:1.49028e+07 93:3.56203e+09 94:7.21651e+06 95:4.73602e+06 96:3.10232e+07 97:0.0083796 98:0.576846 99:1.69219 100:0.480375

101:3.61226e+09 102:3.74413e+07 103:1.22301e+07 104:1.17737e+07 105:3.6044e+07 106:3.47745e+09

1:0 2:403 3:328 4:1.22866 5:14076 6:9.39074 7:1 8:0 9:0.0107123 10:0.00245869 11:0.00118774 12:8.11821 13:0.437548

14:0.43765 15:0.437561 16:0.437535 17:1.50918 18:1.49392 19:1.50991 20:1.50827 21:0.487349 22:3.32315e-05 23:9.95995e-05

24:2 25:1 26:4 27:2 28:1 29:4 30:2 31:1 32:4 33:2 34:1 35:4 36:2 37:1 38:4 39:2 40:1 41:4 42:2 43:1 44:4 45:2 46:1 47:4 48:2

49:1 50:4 51:2 52:1 53:4 54:2 55:1 56:4 57:2 58:1 59:4 60:87436.3 61:87446.5 62:87437.6 63:87435 64:21.4308 65:0.770517

66:0.0444456 67:0.0244281 68:0.549617 69:328 70:98 71:98 72:103 73:90 74:105800 75:99639 76:109102 77:104674 78:1 79:0 80:0

81:0.00520487 82:0.00256461 83:0.00166435 84:771.479 85:20.5683 86:47.573 87:2.31293 88:1 89:1.2547e+07 90:1.11096e+06

91:3.35713e+06 92:4.41541e+06 93:2.70918e+09 94:2.06067e+06 95:2.66906e+06 96:1.28006e+07 97:0.0046313 98:0.539126 99:1.2578

100:0.344938 101:2.72749e+09 102:1.02016e+07 103:1.04445e+07 104:1.03338e+07 105:1.00934e+07 106:2.69858e+09

1:0 2:418 3:320 4:1.30625 5:18652 6:7.17135 7:1 8:0 9:0.0106459 10:0.00264653 11:0.000994318 12:14.1862 13:0.243456

14:0.243497 15:0.243457 16:0.243446 17:2.41721 18:2.4152 19:2.41193 20:2.41671 21:7.91675e-05 22:8.63708e-05 23:0.339384

24:4 25:1 26:8 27:3 28:1 29:8 30:2 31:1 32:8 33:4 34:1 35:8 36:3 37:1 38:8 39:2 40:1 41:8 42:4 43:1 44:8 45:3 46:1 47:8 48:2

49:1 50:8 51:4 52:1 53:8 54:3 55:1 56:8 57:2 58:1 59:8 60:65998.9 61:66004.4 62:65999 63:65997.5 64:10.224 65:0.127104

66:0.0635766 67:0.056407 68:0.88723 69:320 70:53 71:48 72:57 73:57 74:111983 75:115960 76:114435 77:113875 78:1 79:0 80:0

81:0.006407 82:0.00189145 83:0.000485945 84:964.421 85:33.207 86:64.7237 87:1.9491 88:1 89:1.76351e+07 90:2.50429e+06

91:6.24028e+06 92:1.09962e+07 93:3.00335e+09 94:3.5386e+06 95:5.21808e+06 96:1.85759e+07 97:0.00587181 98:0.707707 99:1.1959

100:0.591959 101:3.02005e+09 102:2.17951e+07 103:2.6213e+07 104:2.59369e+07 105:2.15655e+07 106:2.98824e+09

1:0 2:425 3:213 4:1.99531 5:0 6:inf 7:1 8:0 9:0.0204143 10:0.0121072 11:0.00813035 12:14.5448 13:0.574197 14:0.562077

15:0.0938837 16:0.106849 17:2.52864 18:2.29707 19:5.7086 20:5.11698 21:0.0739991 22:0.95797 23:0.951505 24:1 25:1 26:1 27:1

28:1 29:1 30:1 31:1 32:1 33:2 34:1 35:2 36:1 37:1 38:2 39:1 40:1 41:2 42:2 43:5 44:1 45:1 46:1 47:1 48:1 49:1 50:1 51:3

52:3.66667 53:5 54:1 55:1 56:5 57:1 58:1 59:3 60:68596 61:67868.2 62:27737.3 63:29590.7 64:11.4527 65:1.08368 66:0.077625

67:0.0372273 68:0.479579 69:213 70:256 71:256 72:256 73:255 74:83329 75:78194 76:72107 77:77795 78:0 79:1 80:0 81:0.0200608

82:0.0118089 83:0.00857222 84:1814.96 85:43.1429 86:37.0588 87:0.858977 88:1 89:3.50206e+07 90:3.97185e+06 91:7.57905e+06

92:1.92885e+07 93:2.92089e+09 94:5.71381e+06 95:5.4605e+06 96:3.81577e+07 97:0.0119897 98:0.695132 99:1.38798 100:0.505495

101:2.99697e+09 102:2.84841e+07 103:1.06295e+07 104:1.04169e+07 105:2.79142e+07 106:2.93701e+09

1:0 2:345 3:328 4:1.05183 5:12654 6:8.94263 7:1 8:0 9:0.197119 10:0.144919 11:0.130974 12:16.5426 13:0.213558 14:0.213561

15:0.213558 16:0.213541 17:2.58033 18:2.58009 19:2.58045 20:2.57963 21:0.00235566 22:8.63563e-05 23:8.24988e-05 24:5 25:1

26:10 27:4 28:1 29:10 30:2 31:1 32:10 33:5 34:1 35:10 36:4 37:1 38:10 39:2 40:1 41:10 42:4 43:1.25 44:9 45:3 46:1.33333 47:8

48:2 49:1 50:8 51:5 52:1 53:10 54:4 55:1 56:10 57:2 58:1 59:10 60:52293.8 61:52294.2 62:52293.9 63:52291.8 64:10.1244

65:0.115826 66:0.0834305 67:0.0747702 68:0.896197 69:328 70:171 71:154 72:169 73:129 74:16728 75:14297 76:14292 77:15012

78:1 79:0 80:0 81:0.167754 82:0.150486 83:0.14035 84:1517.35 85:38.3333 86:65.9516 87:1.72048 88:1 89:2.79228e+07

90:3.07939e+06 91:6.79947e+06 92:1.53908e+07 93:1.22e+08 94:5.30236e+06 95:5.54061e+06 96:2.88332e+07 97:0.228875

98:0.580758 99:1.22721 100:0.533785 101:1.49517e+08 102:3.08441e+07 103:3.45075e+07 104:2.88255e+07 105:2.57652e+07

106:1.24897e+08
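Each vector above is a run of index:value pairs, a layout that resembles the sparse text format used by tools such as LIBSVM (that resemblance is an observation, not something the slide states). A minimal parser, with an assumed function name, could look like this; the sample string is the start of the fourth vector, including its `inf` entry.

```python
def parse_feature_vector(text):
    """Turn 'index:value' tokens into a dict mapping feature index to float."""
    features = {}
    for token in text.split():
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return features


sample = "1:0 2:425 3:213 4:1.99531 5:0 6:inf 7:1"
vector = parse_feature_vector(sample)
print(vector)
print(vector[2] / vector[3])
```

In all five vectors shown, feature 4 equals feature 2 divided by feature 3 (for example 266 / 285 ≈ 0.933333), which suggests a ratio such as width to height, although the slide does not label the features.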

41

Page 42: Finding the Needle in the IP Stack

IMAGE FEATURE ANALYSIS

[Figure: view of a two-dimensional subspace of the image features, with points marked as Ham and Spam]

42

Page 43: Finding the Needle in the IP Stack

MALWARE FEATURE ANALYSIS

43

Page 44: Finding the Needle in the IP Stack

Conclusions

44

Page 45: Finding the Needle in the IP Stack

CONCLUSIONS

• Heuristics are limited

• Mathematical descriptions

• Dimensionality

• Intuition

45

Page 46: Finding the Needle in the IP Stack

July 12, 2011

TrustedSource Data Mining Technologies

RESEARCH PUBLICATIONS

http://www.trustedsource.org/en/resources/publications

46