

Revealing Skype Traffic: When Randomness Plays with You

D. Bonfiglio¹, M. Mellia¹, M. Meo¹, D. Rossi², P. Tofanelli³

¹ Dipartimento di Elettronica, Politecnico di Torino

² ENST Télécom Paris

³ Motorola Inc.

ACM SIGCOMM 2007

Presented by Te-Yuan Huang

Outline

Goal, Contribution, Know More about Skype, Classifiers, Experiments, Conclusions


Goal
Identify Skype traffic within aggregated traffic: direct sessions carried over either UDP or TCP.

The algorithm should:
work in real time;
be reliable;
be able to detect short flows (lasting only a few seconds).


Importance of Skype Traffic Identification

Of interest to network operators for network design and provisioning, traffic and performance monitoring, tariff policies, and traffic differentiation.

Difference from Related Work
K.T. Chen et al., “Quantifying Skype USI”:
identifies only UDP traffic;
requires the Skype login phase to be monitored, so it fails on backbone links and fails if the Skype login procedure is modified.

K. Suh et al., “Characterizing and Detecting Relayed Traffic: A Case Study Using Skype”:
identifies only relayed Skype traffic.


Let’s get our hands dirty: know more about Skype traffic sources

A Skype Message

Skype Parameters
Rate: the codec rate.
ΔT: the Skype message framing time, i.e., the time between two subsequent Skype messages.
RF (Redundancy Factor): the number of past blocks that Skype retransmits.
These parameters change with network conditions.
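As a rough illustration of how Rate and ΔT interact, the sketch below computes the codec bytes produced per framing interval and the resulting message rate. The payload model, in which each message carries the current block plus RF retransmitted past blocks and no header, is an assumption for illustration only, as are the example values; the function names are hypothetical.

```python
def block_bytes(rate_bps: float, delta_t_ms: float) -> float:
    """Codec bytes produced during one framing interval of delta_t_ms."""
    return rate_bps * delta_t_ms / 8000.0  # (bits/s) * ms / 1000 -> bits, / 8 -> bytes


def messages_per_second(delta_t_ms: float) -> float:
    """One Skype message is framed every delta_t_ms milliseconds."""
    return 1000.0 / delta_t_ms


def payload_bytes(rate_bps: float, delta_t_ms: float, rf: int) -> float:
    """Assumed model: current block plus RF retransmitted past blocks (header ignored)."""
    return (1 + rf) * block_bytes(rate_bps, delta_t_ms)


if __name__ == "__main__":
    # Illustrative values only: an 8 kbit/s codec framed every 20 ms.
    print(block_bytes(8000, 20))         # 20.0
    print(messages_per_second(20))       # 50.0
    print(payload_bytes(8000, 20, 1))    # 40.0
```

Working in milliseconds keeps the arithmetic exact for these round values.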

Skype Communication Modes
End-to-End (E2E): a Skype user calls another Skype user.
End-to-Out (E2O): Skype-in/Skype-out calls, which involve the PSTN. Voice data only: no video, file transfer or IM.

Skype Codecs
Codecs are selected automatically.
iSAC: the preferred codec for E2E.
G.729: the preferred codec for E2O.

More on Skype Messages
Skype encrypts its messages.

TCP: reliable transport, so from the application-layer point of view packets are received in the correct sequence. The whole content of the message is encrypted.

UDP: unreliable, so packets may arrive out of order and an application-layer header is needed to restore the correct order. This header can only be obfuscated, so only part of the message is encrypted.

TCP E2E Message
All ciphered: the whole message is encrypted.
[Figure: TCP E2E message layout (bytes 1, 2, 3, …); the entire Frame is ciphered.]

UDP E2E Message
[Figure: UDP E2E message layout (bytes 1, 2, 3, 4, …): ID, Fun, Frame.]
Identified fields:
ID: 16-bit identifier, randomly selected.
Fun: 5-bit field masked by 0x8f, stating the payload type:
0x02, 0x03, 0x07, 0x0f: signaling messages;
0x0d: data message (covering all four data types: data, video, chat, voice).
Not random, but obfuscated (mixed).
Frame: ciphered information.

E2O Message
[Figure: E2O message layout (bytes 1, 2, 3, 4, …): CID, Frame.]
Identified field:
CID: 4 bytes, the Connection Identifier of the PSTN gateway; deterministic after the initial signaling.
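A sketch of how the unciphered fields could be read from a raw UDP payload. Only the field positions, the 0x8f mask and the Fun codes come from the slides; big-endian byte order is an assumption (it does not matter for a random ID), and the function and constant names are hypothetical.

```python
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}  # signaling message types
DATA_FUN = 0x0D                            # data message


def parse_udp_e2e(payload: bytes) -> tuple[int, int]:
    """Return (ID, Fun) from the start of a UDP E2E message."""
    if len(payload) < 3:
        raise ValueError("UDP E2E message needs at least 3 bytes")
    msg_id = int.from_bytes(payload[0:2], "big")  # bytes 1-2: random 16-bit ID
    fun = payload[2] & 0x8F                        # byte 3: Fun, masked by 0x8f
    return msg_id, fun


def fun_kind(fun: int) -> str:
    """Map a masked Fun value to the payload type listed on the slide."""
    if fun in SIGNALING_FUN:
        return "signaling"
    if fun == DATA_FUN:
        return "data"
    return "unknown"


def parse_e2o_cid(payload: bytes) -> int:
    """Return the 4-byte CID at the start of an E2O message."""
    if len(payload) < 4:
        raise ValueError("E2O message needs at least 4 bytes")
    return int.from_bytes(payload[0:4], "big")


if __name__ == "__main__":
    msg_id, fun = parse_udp_e2e(bytes([0x12, 0x34, 0x7D, 0xAA]))
    print(hex(msg_id), hex(fun), fun_kind(fun))  # 0x1234 0xd data
```

The mask discards the three obfuscated bits (0x70) of the third byte, which is why 0x7D still decodes as a data message.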


How to Identify Skype Traffic?
Chi-Square Classifier (CSC): utilizes knowledge of the ciphering mechanism.
Naïve Bayes Classifier (NBC): utilizes the general characteristics of VoIP traffic.
Payload-Based Classifier (PBC): looks into the non-ciphered SoM; used only for UDP traffic.

Chi-Square Classifier (CSC)
Purpose: to determine whether each portion of a message is encrypted.

Rationale, given a message:
If only the third byte is not random: probably an E2E Skype flow over UDP.
If the first four bytes are deterministic and the others are ciphered: probably an E2O Skype flow over UDP.
If the whole message is ciphered: probably a Skype flow transported over TCP.

Chi-Square Classifier (CSC) – Cont.: Chi-Square Distribution
Observe an object's output nTOT times. There are n possible outputs; the i-th output is expected to occur E_i times among the nTOT observations and is observed to occur O_i times. Then

$$\chi^2 = \sum_{i=0}^{n-1} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows a chi-square distribution with n-1 degrees of freedom.
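The statistic can be computed directly from observed and expected counts. This is a plain Pearson chi-square sketch with a hypothetical function name; the die example is illustrative and not from the slides.

```python
from collections.abc import Sequence


def chi_square(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson statistic: sum over outputs of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # A six-sided die rolled 60 times: E_i = 10 for each of n = 6 outputs.
    observed = [8, 9, 12, 11, 10, 10]
    print(chi_square(observed, [10] * 6))  # 1.0, to be compared with n - 1 = 5 dof
```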

Chi-Square Classifier (CSC) – Cont.
For each flow, take the first G groups of b bits of each message.
For each group g, there are 2^b possible outputs.
If the content of the flow is random, then E_i = nTOT / 2^b for every output i of each group.
[Figure: a message split into G consecutive groups of b bits, numbered 1, 2, 3, …, G.]
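A sketch of the grouping step: split the first G·b bits of each message into G groups, count how often each of the 2^b values occurs across the nTOT messages of a flow, and evaluate each group against the uniform expectation E_i = nTOT/2^b. Reading bits from the most significant bit of the first byte is an assumption, and the helper names are hypothetical.

```python
def group_value(message: bytes, g: int, b: int) -> int:
    """Value of the g-th (1-based) b-bit group, MSB-first from the message start."""
    start = (g - 1) * b
    nbytes = (start + b + 7) // 8          # bytes needed to cover this group
    if len(message) < nbytes:
        raise ValueError("message too short for the requested group")
    bits = int.from_bytes(message[:nbytes], "big")
    shift = nbytes * 8 - (start + b)       # drop the bits after the group
    return (bits >> shift) & ((1 << b) - 1)


def group_chi_squares(messages: list[bytes], G: int = 16, b: int = 4) -> list[float]:
    """Chi-square statistic of each of the first G groups over all messages of a flow."""
    n_tot = len(messages)
    n_out = 1 << b                         # 2^b possible values per group
    expected = n_tot / n_out               # E_i if the content is random
    stats = []
    for g in range(1, G + 1):
        counts = [0] * n_out
        for m in messages:
            counts[group_value(m, g, b)] += 1
        stats.append(sum((o - expected) ** 2 / expected for o in counts))
    return stats


if __name__ == "__main__":
    # Group 1 cycles evenly through all 16 values; every other bit is zero.
    flow = [bytes([v << 4]) + bytes(7) for v in range(16)] * 5   # nTOT = 80
    stats = group_chi_squares(flow)
    print(stats[0], stats[1])  # 0.0 1200.0
```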

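Chi-Square Classifier (CSC) – Cont.
Evaluate the test statistic of each group g as:

$$\chi^2_g = \sum_{i=0}^{2^b-1} \frac{(O_{g,i} - E_i)^2}{E_i}$$

where O_{g,i} is the number of times group g takes the value i.

Define the thresholds χ²(Rnd), χ²(Mixed) and χ²(Det).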

Chi-Square Classifier (CSC) – Cont.
G = 16 groups of b = 4 bits are used.
Classification criteria:
E2E over UDP: block g = 5 or 6 is mixed; the others are random.

Chi-Square Classifier (CSC) – Cont.
E2O over UDP

E2E or E2O over TCP

Not Skype: otherwise
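The slides give the E2E-over-UDP criterion in text; the E2O and TCP criteria appear only in figures. The sketch below fills them in from the rationale slide (first four bytes, i.e. groups 1 to 8, deterministic for E2O; everything random over TCP), so those two rules are assumptions. It uses the value 150 that the slides set for χ²(Rnd), χ²(Mixed) and χ²(Det), and reads "mixed" and "deterministic" as exceeding their threshold and "random" as staying below χ²(Rnd). The function name is hypothetical.

```python
THR = 150.0  # slide value for chi2(Rnd) = chi2(Mixed) = chi2(Det)


def csc_classify(proto: str, chi2: list[float],
                 thr_rnd: float = THR, thr_mixed: float = THR,
                 thr_det: float = THR) -> str:
    """Classify a flow from its 16 per-group statistics (G = 16, b = 4)."""
    if len(chi2) != 16:
        raise ValueError("expected 16 group statistics")

    def all_random(groups) -> bool:  # groups are 1-based
        return all(chi2[g - 1] < thr_rnd for g in groups)

    if proto == "udp":
        # E2E: group 5 or 6 (third byte, Fun) mixed, every other group random.
        if (chi2[4] > thr_mixed or chi2[5] > thr_mixed) and all_random([1, 2, 3, 4, *range(7, 17)]):
            return "E2E over UDP"
        # Assumed E2O rule: groups 1-8 (4-byte CID) deterministic, the rest random.
        if all(chi2[g - 1] > thr_det for g in range(1, 9)) and all_random(range(9, 17)):
            return "E2O over UDP"
    elif proto == "tcp":
        # Assumed TCP rule: the whole message is ciphered, so every group is random.
        if all_random(range(1, 17)):
            return "E2E or E2O over TCP"
    return "not Skype"


if __name__ == "__main__":
    print(csc_classify("udp", [15.0] * 5 + [1200.0] + [15.0] * 10))  # E2E over UDP
```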

Chi-Square Classifier (CSC) – Cont.
Deterministic block: the test statistic grows linearly with nTOT.

Chi-Square Classifier (CSC) – Cont.
Mixed block: if one bit is fixed and the others are random, the test statistic also grows linearly with nTOT.
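These growth laws can be checked numerically. Assuming k of the b bits in a group are fixed and the remaining bits are spread perfectly evenly over nTOT observations, the statistic works out to nTOT·(2^k − 1): nTOT·(2^b − 1) for a fully deterministic group and nTOT for one fixed bit, both linear in nTOT. A truly random group instead has expected value 2^b − 1 (its degrees of freedom), independent of nTOT. The idealized even spread is an assumption of this sketch, and the function name is hypothetical.

```python
def chi2_fixed_bits(n_tot: int, b: int, k: int) -> float:
    """Chi-square of a b-bit group whose k leading bits are fixed (to zero) and
    whose remaining b - k bits are spread perfectly evenly over n_tot messages."""
    n_out = 1 << b                 # 2^b possible outputs
    n_seen = 1 << (b - k)          # outputs that actually occur
    if n_tot % n_seen:
        raise ValueError("n_tot must be a multiple of 2^(b-k) for an even spread")
    expected = n_tot / n_out       # E_i = nTOT / 2^b
    per_value = n_tot // n_seen    # O_i for each output that occurs
    counts = [per_value] * n_seen + [0] * (n_out - n_seen)
    return sum((o - expected) ** 2 / expected for o in counts)


if __name__ == "__main__":
    for n_tot in (80, 160, 320):
        det = chi2_fixed_bits(n_tot, b=4, k=4)    # deterministic group
        mixed = chi2_fixed_bits(n_tot, b=4, k=1)  # one bit fixed
        print(n_tot, det, mixed)  # det = 15 * n_tot, mixed = n_tot
```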

Chi-Square Classifier (CSC) – Cont.

Chi-Square Classifier (CSC) – Cont.
The chi-square test works only if the number of observations is large enough, that is, E_i = nTOT/2^b ≥ 5.
With b = 4, this means nTOT ≥ 80; nTOT = 100 is chosen.
The thresholds are also set: χ²(Rnd) = χ²(Mixed) = χ²(Det) = 150.
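The sample-size condition gives the bound on nTOT as follows for b = 4; with the chosen nTOT = 100, each expected count is E_i = 100/16 = 6.25.

$$E_i = \frac{n_{TOT}}{2^b} \ge 5 \;\Longrightarrow\; n_{TOT} \ge 5 \cdot 2^4 = 80$$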

Naïve Bayes Classifier
Feature vector x = [x_i].
P{C|x}: the probability that the object belongs to class C, given that the feature x is observed.
P{x|C}: the probability that the feature x is observed, given that the object belongs to class C.
Bayes' rule: P{C|x} = P{x|C} P{C} / P{x}

Naïve Bayes Classifier – Cont.
Naïve: the features are assumed to be independent, so P{x|C}, called the belief, factorizes as

$$P\{x \mid C\} = \prod_i P\{x_i \mid C\}$$
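A minimal sketch of the naïve Bayes computation: the belief is the product of per-feature likelihoods (summed in log space to limit underflow), and the posterior normalizes belief × prior over the candidate classes. Normalizing this way assumes the listed classes are exhaustive, so that the normalizer equals P{x}. The toy two-class likelihoods are hypothetical.

```python
import math
from typing import Callable, Sequence

Likelihood = Callable[[float], float]


def belief(x: Sequence[float], likelihood: Likelihood) -> float:
    """P{x|C} = prod_i P{x_i|C}, under the naive independence assumption."""
    return math.exp(sum(math.log(likelihood(xi)) for xi in x))


def posterior(x: Sequence[float],
              classes: dict[str, tuple[Likelihood, float]]) -> dict[str, float]:
    """P{C|x} = P{x|C} P{C} / P{x}, with P{x} summed over the given classes."""
    joint = {c: belief(x, lik) * prior for c, (lik, prior) in classes.items()}
    p_x = sum(joint.values())
    return {c: j / p_x for c, j in joint.items()}


def voip_lik(xi: float) -> float:
    # Hypothetical binary feature: 1 = "small message", 0 = "large message".
    return 0.8 if xi == 1 else 0.2


def data_lik(xi: float) -> float:
    return 0.3 if xi == 1 else 0.7


if __name__ == "__main__":
    post = posterior([1, 1], {"VoIP": (voip_lik, 0.5), "data": (data_lik, 0.5)})
    print(post)  # VoIP about 0.877, data about 0.123
```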

NBC – Feature Selection
VoIP traffic has small message sizes and is less bursty than data traffic.

Features:
Message size: observe a window of w messages at a time, x = [s_1, s_2, …, s_w].
Average inter-packet gap (average IPG).
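A sketch of the feature extraction for one flow, assuming the flow is given as time-ordered (timestamp, size) pairs and cut into non-overlapping windows of w messages; the windowing scheme and the function name are hypothetical. The average IPG of a window is the mean of its w − 1 gaps, which reduces to (t_w − t_1)/(w − 1).

```python
from collections.abc import Iterator, Sequence


def feature_windows(packets: Sequence[tuple[float, int]],
                    w: int) -> Iterator[tuple[list[int], float]]:
    """Yield (x, average_ipg) for each full, non-overlapping window of w messages.

    packets: (timestamp, size_in_bytes) pairs sorted by timestamp.
    x = [s_1, ..., s_w] is the message-size feature vector of the window.
    """
    if w < 2:
        raise ValueError("w must be at least 2 to define an inter-packet gap")
    for start in range(0, len(packets) - w + 1, w):
        window = packets[start:start + w]
        sizes = [size for _, size in window]
        avg_ipg = (window[-1][0] - window[0][0]) / (w - 1)  # mean of the w-1 gaps
        yield sizes, avg_ipg


if __name__ == "__main__":
    # Six messages, one every 20 ms (timestamps in ms).
    trace = [(0, 40), (20, 42), (40, 38), (60, 41), (80, 39), (100, 40)]
    for x, ipg in feature_windows(trace, w=3):
        print(x, ipg)  # [40, 42, 38] 20.0, then [41, 39, 40] 20.0
```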

NBC – Feature Selection: Belief
How to determine P{s_i|C} and the belief for the average IPG?

NBC – Feature Characterization
For each codec, the message size is determined by the rate, the header length, the redundancy factor (RF) and the message framing time (ΔT).
The message size can be represented by a Gaussian distribution.

NBC – Feature Characterization
Map each codec to a Gaussian distribution of message size.
Model the average IPG as a Gaussian distribution, parameterized separately for constant-bit-rate codecs and for variable-bit-rate codecs.
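A sketch of the per-codec size model: each codec is mapped to a Gaussian (mean, standard deviation) of message size, and a window x = [s_1, …, s_w] is scored by its naïve log-belief, the sum of the log-densities. The codec names and parameter values are hypothetical placeholders, not numbers from the slides.

```python
import math
from collections.abc import Sequence

# Hypothetical (mean, std) of message size in bytes for two placeholder codecs.
CODEC_SIZE_MODEL = {"codec_a": (70.0, 8.0), "codec_b": (140.0, 15.0)}


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log of the Gaussian density, computed directly to avoid underflow."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def size_log_belief(sizes: Sequence[float], mean: float, std: float) -> float:
    """log P{x|C} = sum_i log P{s_i|C} under the naive independence assumption."""
    return sum(log_gaussian(s, mean, std) for s in sizes)


def most_likely_codec(sizes: Sequence[float]) -> str:
    """Codec whose Gaussian size model gives the window the highest log-belief."""
    return max(CODEC_SIZE_MODEL,
               key=lambda c: size_log_belief(sizes, *CODEC_SIZE_MODEL[c]))


if __name__ == "__main__":
    print(most_likely_codec([68, 72, 70]))     # codec_a
    print(most_likely_codec([135, 150, 142]))  # codec_b
```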

NBC – Derive Beliefs

NBC – Make Decision
Let B be the belief computed for the flow.
Define a threshold Bmin.
If B > Bmin: valid Skype flow.
Otherwise: not a Skype flow.
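The decision step could be sketched as below. The slides' exact definition of B is not available in this transcript, so this sketch assumes, hypothetically, that B is the sum of the log-beliefs of the message sizes and of the average IPG under Gaussian models; Bmin is then on a log scale, and all parameter values are placeholders.

```python
import math
from collections.abc import Sequence


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log of the Gaussian density."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def nbc_decide(sizes: Sequence[float], avg_ipg: float,
               size_model: tuple[float, float], ipg_model: tuple[float, float],
               b_min: float) -> tuple[bool, float]:
    """Return (is_skype, B) for one window, with a hypothetical log-scale B."""
    b = sum(log_gaussian(s, *size_model) for s in sizes)  # message-size belief
    b += log_gaussian(avg_ipg, *ipg_model)                # average-IPG belief
    return b > b_min, b


if __name__ == "__main__":
    size_model = (70.0, 8.0)   # placeholder (mean, std) in bytes
    ipg_model = (20.0, 2.0)    # placeholder (mean, std) in ms
    print(nbc_decide([70, 72, 69], 20.5, size_model, ipg_model, b_min=-20.0)[0])     # True
    print(nbc_decide([300, 310, 290], 95.0, size_model, ipg_model, b_min=-20.0)[0])  # False
```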

Payload-Based Classifier (PBC)
Used as a cross-check for the previous two classifiers.
Only useful for UDP traffic.
Two parts: per-flow identification and per-host identification.

PBC - Per-flow Identification
Utilizes knowledge of the UDP E2E message:
Fun: 5-bit field masked by 0x8f, stating the payload type:
0x02, 0x03, 0x07, 0x0f: signaling messages;
0x0d: data message (covering all four data types: data, video, chat, voice).
[Figure: UDP E2E message layout (bytes 1, 2, 3, 4, …): ID, Fun, Frame.]

PBC - Per-flow Identification: Terminology
nTOT: the total number of packets in the flow.
nsig: the number of Skype signaling messages.
nE2E: the number of Skype E2E data/video/chat/voice messages.
nE2O: the number of Skype E2O voice messages.
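The four counters could be gathered per flow as in the sketch below. How E2O messages are recognized is not stated on the slides; this sketch assumes, hypothetically, that the flow's 4-byte CID is already known and counts a message as E2O when it starts with that CID, otherwise classifying it by its Fun field. The decision criteria built on these counters are not modeled here, and the function name is hypothetical.

```python
from typing import Optional

SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D


def pbc_counters(payloads: list[bytes], cid: Optional[bytes] = None) -> dict[str, int]:
    """Count nTOT, nsig, nE2E and nE2O for one UDP flow."""
    counts = {"n_tot": len(payloads), "n_sig": 0, "n_e2e": 0, "n_e2o": 0}
    for p in payloads:
        if cid is not None and p[:4] == cid:
            counts["n_e2o"] += 1          # assumed: E2O voice message with known CID
        elif len(p) >= 3:
            fun = p[2] & 0x8F             # Fun field of a UDP E2E message
            if fun in SIGNALING_FUN:
                counts["n_sig"] += 1
            elif fun == DATA_FUN:
                counts["n_e2e"] += 1
    return counts


if __name__ == "__main__":
    flow = [bytes([1, 2, 0x02, 9]), bytes([3, 4, 0x7D, 1]),
            b"\x00\x00\x01\x00\xff", bytes([5, 6, 0x50, 0])]
    print(pbc_counters(flow, cid=b"\x00\x00\x01\x00"))
```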

PBC - Per-flow Identification Criteria

PBC - Per-host Identification
Known: a Skype client always uses the same UDP port to send and receive traffic.
Before a conversation starts, signaling messages are exchanged between the two clients.
It is therefore possible to identify a Skype client running at a specific IP address and port.

PBC - Per-host Identification
Criteria to identify a Skype client at a given IP/port
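Because a client keeps the same UDP port, signaling seen from an (IP, port) endpoint can mark that endpoint as a Skype client. The sketch below tracks this with a hypothetical class and a hypothetical minimum number of signaling messages; the slides' actual per-host criteria appear only in a figure.

```python
from collections import Counter

SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}


class SkypeHostTracker:
    """Remember (IP, port) endpoints that have sent Skype signaling messages."""

    def __init__(self, min_signaling: int = 1) -> None:
        self.min_signaling = min_signaling    # hypothetical threshold
        self.signaling: Counter = Counter()   # (ip, port) -> signaling messages seen

    def observe(self, src_ip: str, src_port: int, payload: bytes) -> None:
        """Record one UDP payload sent from src_ip:src_port."""
        if len(payload) >= 3 and (payload[2] & 0x8F) in SIGNALING_FUN:
            self.signaling[(src_ip, src_port)] += 1

    def is_skype_client(self, ip: str, port: int) -> bool:
        # Counter returns 0 for unseen endpoints without inserting them.
        return self.signaling[(ip, port)] >= self.min_signaling


if __name__ == "__main__":
    tracker = SkypeHostTracker(min_signaling=2)
    tracker.observe("192.0.2.1", 33033, bytes([0, 1, 0x02]))
    tracker.observe("192.0.2.1", 33033, bytes([0, 1, 0x73]))  # 0x73 & 0x8f = 0x03
    print(tracker.is_skype_client("192.0.2.1", 33033))        # True
```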

Experiment
Two data sets:

Campus: 95 hours, taken on 2006/5/29. No P2P traffic is allowed; most of the traffic consists of TCP data flows.

ISP: one day, taken on 2006/5/15. All traffic is allowed, so it is more heterogeneous; little Skype traffic is expected.

Measurement Result

Measurement Result – UDP, Campus

Measurement Result – UDP, ISP

Measurement Result - TCP

Parameter Tuning - Bmin

Parameter Tuning – χ²(Thr)

Parameter Tuning – Bmin & χ²(Thr)

Parameter Tuning – Bmin & χ²(Thr)

Conclusion

Reveals Skype traffic from aggregate streams of packets.

Two approaches: statistical properties of randomness, and stochastic characteristics of voice traffic.

Negligible false positives; only a few false negatives.