
Page 1: Internet Traffic Classification KISS

Internet Traffic Classification: KISS

Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi


Page 2: Internet Traffic Classification KISS

Traffic Classification & Measurement: Why?

• Identify normal and anomalous behavior
• Characterize the network and its users
• Quality of service
• Filtering
• …

How? By means of passive measurement, using Tstat


Page 3: Internet Traffic Classification KISS

Tstat

• Traffic classifier: deep packet inspection, statistical methods
• Persistent and scalable monitoring platform: Round Robin Database (RRD), histograms

[Figure: Tstat monitors the traffic at the edge router, between internal clients and external servers]

http://tstat.tlc.polito.it

Page 4: Internet Traffic Classification KISS

Tstat at a Glance

Page 5: Internet Traffic Classification KISS

Worms and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!

Page 6: Internet Traffic Classification KISS

Anomalies (Good!): Spammers Disappear

McColo SpamNet shut off on Tuesday, November 11th, 2008

Page 7: Internet Traffic Classification KISS

New Applications: P2P-TV

Fiorentina 4 - Udinese 2

Inter 1 - Juventus 0

Page 8: Internet Traffic Classification KISS

Traffic classification

Look at the packets…

Tell me what protocol and/or application generated them

Page 9: Internet Traffic Classification KISS

Port and payload signatures:

Application   Port        Payload
Skype         -           -
Bittorrent    -           "bittorrent"
eMule         4662/4672   E4/E5
Gtalk         -           RTP protocol

Typical approach: Deep Packet Inspection (DPI)

It fails more and more:
• P2P
• Encryption
• Proprietary solutions
• Many different flavours
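The port and payload rules above can be sketched as a toy DPI classifier. This is a minimal illustration, not Tstat's classifier: the function name `dpi_classify` and its rule set are hypothetical, built only from the signatures on the slide (ports 4662/4672 with first byte E4/E5 for eMule, the "bittorrent" string, an RTP header check for Gtalk). Encrypted or proprietary traffic such as Skype matches no rule, which is where DPI starts to fail.

```python
from typing import Optional

# Hypothetical rules taken from the slide; a real DPI engine has many more.
EMULE_PORTS = {4662, 4672}
EMULE_FIRST_BYTES = {0xE4, 0xE5}


def dpi_classify(dst_port: int, payload: bytes) -> Optional[str]:
    """Return an application label, or None when no signature matches."""
    # Payload signature: the BitTorrent handshake carries the protocol name.
    if b"bittorrent" in payload.lower():
        return "Bittorrent"
    # Port plus first-byte signature for eMule.
    if dst_port in EMULE_PORTS and payload and payload[0] in EMULE_FIRST_BYTES:
        return "eMule"
    # RTP version 2: the two most significant bits of the first byte are 0b10
    # and the fixed header is 12 bytes long (a weak, illustrative check).
    if len(payload) >= 12 and payload[0] >> 6 == 2:
        return "RTP"
    # Encrypted or proprietary payloads (e.g. Skype) match nothing.
    return None


print(dpi_classify(6881, b"\x13BitTorrent protocol"))  # Bittorrent
print(dpi_classify(4672, bytes([0xE4, 0x20])))         # eMule
print(dpi_classify(40000, bytes(range(1, 20))))        # None
```

Every new protocol version or obfuscation option means a new rule, which is the maintenance problem the next slide illustrates.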

Page 10: Internet Traffic Classification KISS

The Failure of DPI

11.05.2008 12:29 eMule 0.49a released

1.08.2008 20:25 eMule 0.49b released

Page 11: Internet Traffic Classification KISS

Possible Solution: Behavioral Classifier

Phase 1: Feature (training, on known traffic)
Phase 2: Decision (operation)
Phase 3: Verify (operation)

1. Statistical characterization of traffic (given source)
2. Look at the behaviour of unknown traffic and assign the class that fits it best
3. Check for possible classification mistakes

Page 12: Internet Traffic Classification KISS

Our Approach

Phase 1: Feature (training, on known traffic)
Phase 2: Decision
Phase 3: Verify

Statistical characterization of bits in a flow: χ² test

Do NOT look at the SEMANTICS and TIMING… but rather look at the protocol FORMAT

Page 13: Internet Traffic Classification KISS

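Chunking and χ²

• Take the first N payload bytes of each packet
• Split them into C chunks, each of b bits
• For each chunk c, compare the observed distribution of its values with the expected (uniform) distribution:

$$\chi^2_c = \sum_{i=0}^{2^b-1} \frac{(O_{c,i} - E)^2}{E}, \qquad E = \frac{n}{2^b}$$

where $O_{c,i}$ counts how many of the $n$ observed packets carry value $i$ in chunk $c$.

Vector of statistics: $[\chi^2_1, \ldots, \chi^2_C]$

The χ² provides an implicit measure of entropy or randomness.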
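The chunking and the χ² vector above can be sketched in a few lines. This is a minimal sketch, not the KISS implementation: the names `chunk_values`, `chi_square` and `chi_square_vector` are hypothetical, the defaults N = 12 bytes and b = 4 bits are only illustrative, b is assumed to divide 8, and packets shorter than N bytes are simply skipped.

```python
from typing import List, Sequence


def chunk_values(payload: bytes, n_bytes: int = 12, b: int = 4) -> List[int]:
    """Split the first n_bytes of the payload into C = 8 * n_bytes / b chunks
    of b bits each, most significant bits first."""
    if 8 % b != 0:
        raise ValueError("this sketch assumes that b divides 8")
    mask = (1 << b) - 1
    values = []
    for byte in payload[:n_bytes]:
        for shift in range(8 - b, -1, -b):
            values.append((byte >> shift) & mask)
    return values


def chi_square(counts: Sequence[int]) -> float:
    """Pearson chi-square of the observed counts against a uniform distribution."""
    n = sum(counts)
    if n == 0:
        raise ValueError("no samples")
    expected = n / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def chi_square_vector(payloads: Sequence[bytes], n_bytes: int = 12, b: int = 4) -> List[float]:
    """Return [chi2_1, ..., chi2_C]; payloads shorter than n_bytes are skipped."""
    c = 8 * n_bytes // b
    counts = [[0] * (1 << b) for _ in range(c)]
    for payload in payloads:
        if len(payload) < n_bytes:
            continue
        for idx, value in enumerate(chunk_values(payload, n_bytes, b)):
            counts[idx][value] += 1
    return [chi_square(row) for row in counts]


print(chunk_values(b"\x12\x34", n_bytes=2, b=4))  # [1, 2, 3, 4]
```

Each packet contributes one sample to every chunk, so a flow is summarized by a fixed-length vector of C numbers regardless of how many packets it has.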

Page 14: Internet Traffic Classification KISS

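Consider a chunk of 2 bits, taking values 0, 1, 2, 3:

[Figure: histograms of the observed counts $O_i$ over the values 0-3 for random values, a deterministic value, and a counter]

Random values, a deterministic value, and a counter produce different $O_i$, and therefore a different χ² behaviour.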
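The three 2-bit cases can be reproduced numerically. A small sketch, assuming n = 1000 observations and a fixed value of 2 for the deterministic chunk (both arbitrary choices); `chi_square` here is a hypothetical helper that builds the histogram of chunk values and compares it with the uniform expectation.

```python
import random
from typing import List


def chi_square(values: List[int], b: int) -> float:
    """Pearson chi-square of b-bit chunk values against a uniform distribution."""
    counts = [0] * (1 << b)
    for v in values:
        counts[v] += 1
    expected = len(values) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


b, n = 2, 1000
rng = random.Random(1)                              # fixed seed: repeatable run
random_vals = [rng.randrange(1 << b) for _ in range(n)]
deterministic_vals = [2] * n                        # always the same value
counter_vals = [i % (1 << b) for i in range(n)]     # counter wrapping at 2**b

print(chi_square(deterministic_vals, b))  # 3000.0, i.e. (2**b - 1) * n
print(chi_square(counter_vals, b))        # 0.0, perfectly uniform
print(chi_square(random_vals, b))         # small, fluctuates around 2**b - 1
```

The deterministic chunk grows linearly with n, the counter is "too uniform" and stays at zero, and random bits stay small but never exactly zero.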

Page 15: Internet Traffic Classification KISS

4-bit-long chunks: evolution of χ²

[Figure: χ² vs. number of samples for a random chunk (x x x x)]

Page 16: Internet Traffic Classification KISS

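4-bit-long chunks: evolution of χ²

[Figure: χ² vs. number of samples for a random chunk and a deterministic chunk (0 0 0 1)]

Deterministic chunk: $\chi^2 = (2^b - 1)\,n$ after $n$ samples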
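The linear growth of the deterministic chunk follows directly from the χ² definition. A short derivation, assuming the chunk carries the same value $v$ in all $n$ samples, so that the expected count per value is $E = n/2^b$:

```latex
\begin{aligned}
E &= \frac{n}{2^b}, \qquad O_v = n, \qquad O_i = 0 \quad (i \neq v) \\
\chi^2 &= \frac{(n - E)^2}{E} + (2^b - 1)\,\frac{(0 - E)^2}{E} \\
       &= \frac{n\,(2^b - 1)^2}{2^b} + \frac{n\,(2^b - 1)}{2^b}
        = (2^b - 1)\,n
\end{aligned}
```

For b = 4 this gives a slope of 15 per sample, which is why a deterministic group separates from a random one after only a few packets.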

Page 17: Internet Traffic Classification KISS

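4-bit-long chunks: evolution of χ²

[Figure: χ² vs. number of samples for random, deterministic, and mixed chunks, with bit patterns x 0 0 0, x 0 x 0, and 0 x x x (x = random bit, 0 = fixed bit)]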
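The mixed patterns can be checked the same way. In an idealized chunk whose k random bits take every combination equally often, only 2^k values appear, each n/2^k times, and the same algebra as the deterministic case gives χ² = n(2^(b-k) - 1). The sketch below assumes that idealized balanced sequence (real random bits fluctuate, so a fully random chunk gives about 2^b - 1 rather than 0); `balanced_chunk` is a hypothetical helper.

```python
from itertools import product
from typing import List


def chi_square(values: List[int], b: int) -> float:
    """Pearson chi-square of b-bit chunk values against a uniform distribution."""
    counts = [0] * (1 << b)
    for v in values:
        counts[v] += 1
    expected = len(values) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def balanced_chunk(pattern: str, n: int) -> List[int]:
    """Idealized chunk sequence: 'x' bits cycle through every combination,
    '0'/'1' bits are fixed. The pattern is written most significant bit first."""
    free = [i for i, ch in enumerate(pattern) if ch == "x"]
    combos = list(product("01", repeat=len(free)))
    values = []
    for t in range(n):
        bits = list(pattern)
        for pos, bit in zip(free, combos[t % len(combos)]):
            bits[pos] = bit
        values.append(int("".join(bits), 2))
    return values


n = 1600  # a multiple of 16, so every pattern is perfectly balanced
for pattern in ["xxxx", "0001", "x000", "x0x0", "0xxx"]:
    k = pattern.count("x")
    # measured value, then the closed form n * (2**(4 - k) - 1)
    print(pattern, chi_square(balanced_chunk(pattern, n), 4), n * (2 ** (4 - k) - 1))
```

The fewer random bits a group contains, the steeper its χ² grows, which is what separates the three curves.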

Page 18: Internet Traffic Classification KISS

Chi Square Classifier

• Split the payload into groups
• Apply the test to the groups at the end of the flow: each message is a sample
• Some groups will contain random bits, some mixed bits, some deterministic bits

Example header (bit offsets 0, 8, 16, 24): | ID | FUNC |
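Because each message is one sample and only histograms are kept, the test can be run incrementally per flow. A minimal sketch, assuming b divides 8 and that messages shorter than N bytes are ignored; the class `FlowChiSquare` and the 2-byte ID/FUNC header in the usage are hypothetical. Since the counts do not depend on which packets are seen, accumulation could start at any point of the flow.

```python
import random
from typing import List


class FlowChiSquare:
    """Per-flow chi-square accumulator (hypothetical helper, not the KISS code).

    Every message (packet payload) is one sample for each of the C groups of
    b bits taken from the first n_bytes; the vector is read at flow end."""

    def __init__(self, n_bytes: int = 12, b: int = 4) -> None:
        if 8 % b != 0:
            raise ValueError("this sketch assumes that b divides 8")
        self.n_bytes, self.b = n_bytes, b
        self.counts = [[0] * (1 << b) for _ in range(8 * n_bytes // b)]
        self.samples = 0

    def add(self, payload: bytes) -> None:
        """Update the per-group histograms with one message."""
        if len(payload) < self.n_bytes:
            return  # this sketch ignores messages shorter than n_bytes
        mask = (1 << self.b) - 1
        group = 0
        for byte in payload[: self.n_bytes]:
            for shift in range(8 - self.b, -1, -self.b):
                self.counts[group][(byte >> shift) & mask] += 1
                group += 1
        self.samples += 1

    def vector(self) -> List[float]:
        """Chi-square of each group against the uniform distribution."""
        if self.samples == 0:
            raise ValueError("no samples in this flow")
        expected = self.samples / (1 << self.b)
        return [sum((o - expected) ** 2 / expected for o in row) for row in self.counts]


# Hypothetical 2-byte header: a random ID byte followed by a constant FUNC byte.
rng = random.Random(0)
flow = FlowChiSquare(n_bytes=2, b=4)
for _ in range(320):
    flow.add(bytes([rng.randrange(256), 0x07]))
print(flow.vector())  # groups 0-1 (ID) stay small, groups 2-3 (FUNC) are 4800.0
```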

Page 19: Internet Traffic Classification KISS

CSC

[Figure: χ² (log scale, 1 to 1e+06) vs. n [pkt] (100 to 1e+06) for a deterministic group, a random group, and a mixed group]

Page 20: Internet Traffic Classification KISS

And the counter example?

A 2-byte-long counter, split into four groups: MSG, L2, L1, LSG

MSG: Most Significant Group; LSG: Least Significant Group
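A quick experiment shows why the counter splits into groups with different behaviour. The sketch assumes a 16-bit counter incremented by one per packet, starting from 0 (an arbitrary choice), and a window of n = 4096 packets; `counter_groups` is a hypothetical helper. The LSG changes at every packet and looks perfectly uniform, while the MSG changes so slowly that over this window it never moves and looks deterministic.

```python
from typing import Dict, List


def chi_square(counts: List[int]) -> float:
    """Pearson chi-square against a uniform distribution."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


def counter_groups(n: int, start: int = 0) -> Dict[str, float]:
    """Chi-square of the four 4-bit groups of a 2-byte counter seen n times."""
    names = ["MSG", "L2", "L1", "LSG"]  # most to least significant nibble
    counts = {name: [0] * 16 for name in names}
    for step in range(n):
        value = (start + step) % 65536  # the 16-bit counter wraps around
        for i, name in enumerate(names):
            counts[name][(value >> (4 * (3 - i))) & 0xF] += 1
    return {name: chi_square(row) for name, row in counts.items()}


print(counter_groups(4096))
# {'MSG': 61440.0, 'L2': 0.0, 'L1': 0.0, 'LSG': 0.0}
```

With a shorter window (for example n = 256) L2 also stays constant and gets the deterministic value (2^4 - 1)n, so which groups look "deterministic" depends on how many packets are observed.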

Page 21: Internet Traffic Classification KISS

Protocol format as seen from the χ²

Page 22: Internet Traffic Classification KISS

Our Approach

Phase 1: Feature (training, on known traffic): statistical characterization of bits in a flow, χ² test
Phase 2: Decision process: minimum distance / maximum likelihood
Phase 3: Verify

Page 23: Internet Traffic Classification KISS

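C-dimensional space

Each flow is a point $[\chi^2_1, \ldots, \chi^2_C]$ in a C-dimensional hyperspace. The classification regions can be defined by:
• Euclidean distance
• Support Vector Machine

[Figure: projection on the $(\chi^2_i, \chi^2_j)$ plane showing two class regions and the point to classify ("My Point")]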

Page 24: Internet Traffic Classification KISS

Example considering the χ²

Page 25: Internet Traffic Classification KISS

Euclidean Distance Classifier

[Figure: training points of one class in the $(\chi^2_i, \chi^2_j)$ plane and their centroid]

Centroid: center of mass
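The centroid-based decision can be sketched as a nearest-centroid classifier over χ² vectors. This is a minimal sketch under the assumption that each class is summarized by the plain mean of its training vectors; the function names and the toy classes "A" and "B" are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence, Tuple


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Center of mass of the training points of one class."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def train(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """One centroid per class, computed from labelled chi-square vectors."""
    return {label: centroid(points) for label, points in training.items()}


def nearest(model: Dict[str, List[float]], x: Sequence[float]) -> Tuple[str, float]:
    """Label of the closest centroid and the Euclidean distance to it."""
    return min(((label, math.dist(c, x)) for label, c in model.items()),
               key=lambda item: item[1])


model = train({"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[10.0, 10.0], [10.0, 12.0]]})
print(model)                       # {'A': [1.0, 0.0], 'B': [10.0, 11.0]}
print(nearest(model, [1.0, 1.0]))  # ('A', 1.0)
```

Training only stores one point per class, which keeps the decision cheap; the price is that every class is implicitly assumed to be a round cloud around its center.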

Page 26: Internet Traffic Classification KISS

Euclidean Distance Classifier

[Figure: points in the $(\chi^2_i, \chi^2_j)$ plane around the centroid (center of mass)]

• True positives are "nearby"
• True negatives are "far"

Page 27: Internet Traffic Classification KISS

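Euclidean Distance Classifier

[Figure: hyper-sphere around the centroid (center of mass) in the $(\chi^2_i, \chi^2_j)$ plane]

False positives: points of other classes that fall inside the hyper-sphere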

Page 28: Internet Traffic Classification KISS

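Euclidean Distance Classifier

[Figure: hyper-sphere of a given radius around the centroid (center of mass) in the $(\chi^2_i, \chi^2_j)$ plane]

False negatives: points of the class that fall outside the hyper-sphere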

Page 29: Internet Traffic Classification KISS

Euclidean Distance Classifier

[Figure: hyper-sphere around the centroid (center of mass) in the $(\chi^2_i, \chi^2_j)$ plane, chosen to get min { False Pos. } and min { False Neg. }]

Confidence: the distance is a measure of the confidence of the decision
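The hyper-sphere turns the nearest-centroid rule into a classifier that can also answer "other", and the distance doubles as a confidence indicator. A minimal sketch, assuming one radius per class; `classify` and the one-class DNS model in the usage are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def classify(model: Dict[str, List[float]], radius: Dict[str, float],
             x: Sequence[float]) -> Tuple[str, float]:
    """Nearest-centroid decision with a hyper-sphere per class.

    Returns (label, distance); points outside the sphere of their closest
    centroid are labelled "other". The distance is kept as a confidence
    indicator: the smaller it is, the more confident the decision."""
    label, d = min(((lab, math.dist(c, x)) for lab, c in model.items()),
                   key=lambda item: item[1])
    if d > radius[label]:
        return "other", d
    return label, d


model = {"DNS": [0.0, 0.0]}   # hypothetical one-class model
radius = {"DNS": 5.0}
print(classify(model, radius, [3.0, 4.0]))  # ('DNS', 5.0)
print(classify(model, radius, [6.0, 8.0]))  # ('other', 10.0)
```

A larger radius lowers false negatives but lets more "other" traffic in, so the radius is the knob that trades one error type against the other.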

Page 30: Internet Traffic Classification KISS

How to define the sphere radius?

[Figure: true positives - false positives vs. radius]
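One way to read the plot is to pick the radius that maximizes the gap between true positives and false positives. A sketch under that assumption, trying only the observed distances as candidate radii; `best_radius` is a hypothetical helper, and the distances in the usage are made up for illustration.

```python
from typing import Sequence


def best_radius(known_dist: Sequence[float], other_dist: Sequence[float]) -> float:
    """Pick the radius that maximises TP rate minus FP rate.

    known_dist: distances from the centroid of training samples of the class.
    other_dist: distances from the same centroid of samples of other traffic.
    Only observed distances are tried as candidate radii."""
    best_r, best_score = 0.0, float("-inf")
    for r in sorted(set(known_dist) | set(other_dist)):
        tp = sum(d <= r for d in known_dist) / len(known_dist)
        fp = sum(d <= r for d in other_dist) / len(other_dist)
        if tp - fp > best_score:
            best_r, best_score = r, tp - fp
    return best_r


print(best_radius([1.0, 2.0, 3.0], [2.5, 2.8, 10.0]))  # 2.0
```

This needs labelled "other" samples at training time, which is consistent with the deck's remark that "other" must be learned properly.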

Page 31: Internet Traffic Classification KISS

Support Vector Machine

[Figure: a kernel function maps the space of samples (dim. C) into the space of features (dim. ∞)]

Kernel functions move points so that borders are simple
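The "dim. ∞" feature space is what a Gaussian (RBF) kernel provides implicitly: distances are computed in the C-dimensional sample space, yet the border is a plane in the infinite-dimensional feature space. The slides do not say which kernel was used; the RBF kernel exp(-gamma * |u - v|^2) is assumed here because it is one of LibSVM's built-in kernel types. The functions below are hypothetical illustrations, with support vectors and coefficients supplied by hand rather than learned.

```python
import math
from typing import List, Sequence


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x: Sequence[float], support_vectors: List[Sequence[float]],
                   dual_coefs: List[float], bias: float, gamma: float) -> float:
    """SVM decision function sum_i coef_i * K(sv_i, x) + bias.

    The sign tells on which side of the border x falls; the magnitude grows
    with the distance from the border in the feature space."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5))  # 1.0
print(decision_value([0.0, 0.0], [[0.0, 0.0]], [1.0], bias=-0.5, gamma=1.0))  # 0.5
```

Only the support vectors enter the decision, so the trained model stays small even when the training set is large.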

Page 32: Internet Traffic Classification KISS

Support Vector Machine

[Figure: class borders and the support vectors that define them]

• Kernel functions move points so that borders are simple
• Borders are planes: simple surfaces, nice math
• Support vectors
• LibSVM

Page 33: Internet Traffic Classification KISS

Support Vector Machine

• Decision: distance from the border
• Confidence is a probability: p(class)
• Kernel functions
• Borders are planes: simple surfaces, nice math
• Support vectors
• LibSVM

Page 34: Internet Traffic Classification KISS

Performance evaluation: How accurate is all this?

Our Approach

Phase 1: Feature (training, on known traffic): statistical characterization of bits in a flow, χ² test
Phase 2: Decision process: minimum distance / maximum likelihood
Phase 3: Verify

Page 35: Internet Traffic Classification KISS

Per flow and per endpoint

What are we going to classify? KISS can be applied both to single flows and to endpoints.

It is robust to sampling: it requires neither monitoring all packets nor the first packets of a flow.

Page 36: Internet Traffic Classification KISS

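Real traffic traces

[Figure: trace collected on the link between the Fastweb network and the Internet; an oracle (DPI + manual inspection) labels flows as RTP, eMule, DNS, or other]

• Training: known + other traffic
• Known traffic: measure false negatives
• Other (unknown) traffic: measure false positives
• 1-day-long trace
• 20 GByte of UDP traffic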

Page 37: Internet Traffic Classification KISS

Definition of false positive/negative

[Figure: traffic is labelled by the oracle (DPI) as eMule, RTP, DNS, or other, then classified by KISS]

• Classifying "known" traffic: true positives / false negatives
• Classifying "other" traffic: true negatives / false positives
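The two error definitions can be computed directly from (oracle label, KISS label) pairs. A sketch under the assumption that a known flow assigned to any other label, including another known class, counts as a false negative; `error_rates` and the toy pairs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

KNOWN = ("RTP", "eMule", "DNS")  # classes labelled by the oracle


def error_rates(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """pairs: (oracle_label, kiss_label) per flow.

    Known class: % of its flows that KISS does not assign to it (false negatives).
    "other": % of other flows that KISS assigns to a known class (false positives)."""
    total: Counter = Counter()
    wrong: Counter = Counter()
    for oracle, kiss in pairs:
        if oracle in KNOWN:
            total[oracle] += 1
            wrong[oracle] += kiss != oracle
        else:
            total["other"] += 1
            wrong["other"] += kiss in KNOWN
    return {label: 100.0 * wrong[label] / total[label] for label in total}


pairs = [("DNS", "DNS"), ("DNS", "other"), ("RTP", "RTP"),
         ("other", "eMule"), ("other", "other"), ("other", "other"), ("other", "other")]
print(error_rates(pairs))  # {'DNS': 50.0, 'RTP': 0.0, 'other': 25.0}
```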

Page 38: Internet Traffic Classification KISS

Results

Known traffic: false negatives [%]

         Euclidean Distance    SVM
         Case A    Case B      Case A    Case B
RTP      0.08      0.23        0.00      0.05
EDK      13.03     7.97        0.98      0.54
DNS      6.57      19.19       0.12      2.14

Other: false positives [%]

         Euclidean Distance    SVM
         Case A    Case B      Case A    Case B
other    13.6      17.01       0.00      0.18

Page 39: Internet Traffic Classification KISS

Real traffic trace

• RTP errors are oracle mistakes (the oracle does not identify RTP v1)
• DNS errors are due to an impure training set (for the oracle, all port-53 traffic is DNS)
• EDK errors are (maybe) Xbox Live traffic ("other" needs proper training)
• With SVM, FN are always below 3%!

Page 40: Internet Traffic Classification KISS

Tuning the training set size

[Figure: true positives and false positives (%) vs. samples per class (confidence 5%)]

Small training set: 70-80 MByte for "known", 300 MByte for "other"

Page 41: Internet Traffic Classification KISS

Tuning the number of packets for the χ²

[Figure: true positives and false positives (%) vs. number of packets (confidence 5%)]

Protocols need volumes of at least 70-80 packets per flow

Page 42: Internet Traffic Classification KISS

P2P-TV applications

• P2P-TV applications are becoming popular
• They rely heavily on UDP as the transport protocol
• They are based on proprietary protocols
• They evolve very quickly over time
• How to identify them? After 6 hours, KISS gives you results

Page 43: Internet Traffic Classification KISS

The Failure of DPI

Page 44: Internet Traffic Classification KISS

And for TCP?


Page 45: Internet Traffic Classification KISS

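Chunking and χ²

• Take the first N payload bytes of each packet
• Split them into C chunks, each of b bits
• For each chunk c, compare the observed distribution with the expected (uniform) one, over $n$ observed packets:

$$\chi^2_c = \sum_{i=0}^{2^b-1} \frac{(O_{c,i} - E)^2}{E}, \qquad E = \frac{n}{2^b}$$

Vector of statistics: $[\chi^2_1, \ldots, \chi^2_C]$

The χ² provides an implicit measure of entropy or randomness.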

Page 46: Internet Traffic Classification KISS

Results


Page 47: Internet Traffic Classification KISS

Results


Page 48: Internet Traffic Classification KISS

Pros and Cons

KISS is good because…
• Blind approach
• Completely automated
• Works with many protocols
• Works even with small training sets
• Statistics can start at any point
• Robust w.r.t. packet drops
• Bypasses some DPI problems

but…
• Must learn "other" properly
• Needs volumes of traffic
• May require memory (for now)
• Only UDP (for now)
• Only offline (for now)

Page 49: Internet Traffic Classification KISS

Papers

D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, "Revealing Skype Traffic: When Randomness Plays with You", ACM SIGCOMM, Kyoto, Japan, August 2007.

D. Rossi, M. Mellia, M. Meo, "A Detailed Measurement of Skype Network Traffic", 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, FL, February 2008.

D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, "Tracking Down Skype Traffic", IEEE Infocom, Phoenix, AZ, 15-17 April 2008.

D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, "Detailed Analysis of Skype Traffic", IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009.

A. Finamore, M. Mellia, M. Meo, D. Rossi, "KISS: Stochastic Packet Inspection", 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009.

Page 50: Internet Traffic Classification KISS

And for TCP
