Revealing Skype Traffic: When Randomness Plays with You
D. Bonfiglio (1), M. Mellia (1), M. Meo (1), D. Rossi (2), P. Tofanelli (3)
(1) Dipartimento di Elettronica, Politecnico di Torino
(2) ENST Télécom Paris
(3) Motorola Inc.
ACM SIGCOMM 2007
Presented by Te-Yuan Huang
Goal
  Identify Skype traffic within aggregated traffic
    Direct sessions
    Over either UDP or TCP
  The algorithm should
    Work in real time
    Be reliable
    Be able to detect short flows (lasting only a few seconds)
Importance of Skype Traffic Identification
  Of interest to network operators for
    Network design and provisioning
    Traffic and performance monitoring
    Tariff policies
    Traffic differentiation
Difference from Related Work
  K.T. Chen et al., “Quantifying Skype USI”
    Identifies only UDP traffic
    Requires the Skype login phase to be monitored
      Fails on backbone links
      Fails if the Skype login procedure is modified in any way
  K. Suh et al., “Characterizing and Detecting Relayed Traffic: A Case Study Using Skype”
    Identifies only relayed Skype traffic
Skype Parameters
  Rate: codec rate
  Delta T: Skype message framing time
    The time between two subsequent Skype messages
  RF (Redundancy Factor)
    The number of past blocks that Skype retransmits
Skype Communication Modes
  End-to-End (E2E)
    A Skype user calls another Skype user
  End-to-Out (E2O)
    Skype-in / Skype-out
    PSTN involved
    Voice data only: no video, file transfer, or IM
Skype Codecs
  Codecs are selected automatically
  ISAC: the preferred codec for E2E
  G.729: the preferred codec for E2O
More on Skype Messages
  Skype encrypts its messages
  TCP: reliable transport
    Packets are received in the correct sequence (from the application layer's point of view)
    The whole content of the message is encrypted
  UDP: unreliable transport
    Messages may arrive out of order
    An application-layer header is needed to restore the correct order
      This header can only be obfuscated
    Only part of the message is encrypted
Identified Fields
  ID: 16-bit identifier
    Randomly selected
  Fun: 5-bit field in the third byte, extracted with the mask 0x8f
    States the payload type
      0x02, 0x03, 0x07, 0x0f: signaling message
      0x0d: data message (all 4 types of data)
    Not random, but obfuscated (mixed)
  Frame: ciphered information
UDP E2E Message
  [Figure: message layout. Bytes 1-2: ID; byte 3: Fun; followed by Frame]
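The field layout above can be turned into a small parser. The following is a minimal Python sketch, assuming that Fun is obtained by AND-ing the third byte with 0x8f as the slide suggests. The function names `parse_e2e_header` and `fun_kind` and the big-endian reading of ID are illustrative assumptions, not part of the original slides.

```python
# Payload-type values listed on the slide.
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D
FUN_MASK = 0x8F  # mask given on the slide (5 bits set)


def parse_e2e_header(payload: bytes):
    """Return (id, fun) from the first three bytes of a UDP payload.

    Assumes bytes 1-2 hold the 16-bit ID and byte 3 holds Fun.
    Returns None if the payload is too short.
    """
    if len(payload) < 3:
        return None
    msg_id = int.from_bytes(payload[0:2], "big")  # byte order is an assumption
    fun = payload[2] & FUN_MASK
    return msg_id, fun


def fun_kind(fun: int) -> str:
    """Map a masked Fun value to 'signaling', 'data', or 'unknown'."""
    if fun in SIGNALING_FUN:
        return "signaling"
    if fun == DATA_FUN:
        return "data"
    return "unknown"
```

Because the bits outside the mask are obfuscated, two messages with different third bytes (for example 0x0d and 0x7d) map to the same Fun value.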
Identified Field
  CID: 4 bytes
    Connection Identifier (CID) of the PSTN gateway
    Deterministic after the initial signaling
E2O Message
  [Figure: message layout. Bytes 1-4: CID; followed by Frame]
How to Identify Skype Traffic?
  Chi-Square Classifier (CSC)
    Uses knowledge of the ciphering mechanism
  Naïve Bayes Classifier (NBC)
    Uses the general characteristics of VoIP traffic
  Payload-Based Classifier (PBC)
    Looks into the non-ciphered start of message (SoM)
    Used only for UDP traffic
Chi-Square Classifier (CSC)
  Purpose
    To determine whether each portion of the message is encrypted
  Rationale: given a message,
    Only the third byte is not random
      Probably an E2E Skype flow over UDP
    The first four bytes are deterministic, the others are ciphered
      Probably an E2O Skype flow over UDP
    The whole message is ciphered
      Probably a Skype flow transported over TCP
Chi-Square Classifier (CSC) – Cont.
  Chi-square distribution
    Observe the object's output nTOT times
    There are n possible outputs
    The i-th output is expected to occur Ei times among nTOT, and is observed to occur Oi times
    Then
      $$\chi^2 = \sum_{i=0}^{n-1} \frac{(O_i - E_i)^2}{E_i}$$
    follows a chi-square distribution with n-1 degrees of freedom
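As a quick numerical illustration (the numbers are invented for this example and do not come from the paper), take a coin with n = 2 possible outputs tossed nTOT = 100 times. A fair coin gives E_0 = E_1 = 50; suppose the observed counts are O_0 = 60 and O_1 = 40. Then

$$\chi^2 = \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = 2 + 2 = 4$$

with n - 1 = 1 degree of freedom. The further the observed counts drift from the expected ones, the larger the statistic becomes.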
Chi-Square Classifier (CSC) – Cont.
  For each flow, take the first G groups of b bits of each message
  For each group g, there are 2^b possible outputs
  If the content of the flow is random, then Ei for each group is nTOT / 2^b
  [Figure: a message divided into consecutive groups of b bits, numbered 1, 2, 3, ..., G]
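Chi-Square Classifier (CSC) – Cont.
  Evaluate the test statistic for each group g as:
    $$\chi^2_g = \sum_{i=0}^{2^b-1} \frac{(O_i^g - E_i)^2}{E_i}$$
  Define the thresholds
    $$\chi^2_{\mathrm{Rnd}}, \quad \chi^2_{\mathrm{Mixed}}, \quad \chi^2_{\mathrm{Det}}$$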
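The per-group statistic can be computed directly from a list of message payloads. The following is a minimal Python sketch, assuming that groups are read from the start of each message with the most significant bits first; the function names `split_groups` and `chi_square_per_group` are hypothetical and not from the original slides.

```python
from typing import List, Sequence


def split_groups(message: bytes, G: int = 16, b: int = 4) -> List[int]:
    """Return the first G groups of b bits of a message as integers.

    List index 0 corresponds to group 1 on the slides.
    """
    needed_bits = G * b
    if len(message) * 8 < needed_bits:
        raise ValueError("message shorter than G * b bits")
    nbytes = (needed_bits + 7) // 8
    value = int.from_bytes(message[:nbytes], "big")
    # Drop trailing padding bits when G * b is not a multiple of 8.
    value >>= nbytes * 8 - needed_bits
    mask = (1 << b) - 1
    return [(value >> (b * (G - 1 - g))) & mask for g in range(G)]


def chi_square_per_group(messages: Sequence[bytes], G: int = 16, b: int = 4) -> List[float]:
    """Compute one chi-square value per group over all messages of a flow."""
    if not messages:
        raise ValueError("at least one message is required")
    n_tot = len(messages)
    expected = n_tot / 2 ** b  # E_i under the randomness hypothesis
    counts = [[0] * (2 ** b) for _ in range(G)]
    for msg in messages:
        for g, value in enumerate(split_groups(msg, G, b)):
            counts[g][value] += 1  # O_i^g
    return [
        sum((o - expected) ** 2 / expected for o in counts[g])
        for g in range(G)
    ]
```

A group whose values are spread evenly yields a value near zero, while a group that always carries the same value yields a very large one.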
Chi-Square Classifier (CSC) – Cont.
  G = 16 groups of b = 4 bits are used
  E2E over UDP
    Block g = 5 or 6 is mixed (these two groups cover the third byte, which carries Fun)
    The others are random
  Classification criteria
Chi-Square Classifier (CSC) – Cont.
  Mixed block:
    If one bit is fixed and the others are random, the chi-square statistic increases linearly with nTOT
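A short calculation, not shown on the slide, indicates where the linear growth comes from. Assume b = 4 and exactly one bit of the group is fixed, so only 8 of the 16 values can occur. Idealizing the counts (that is, ignoring sampling fluctuations), each possible value is observed n_TOT/8 times and each impossible value 0 times, while E_i = n_TOT/16 for every value:

$$\chi^2 \approx 8\,\frac{(n_{TOT}/8 - n_{TOT}/16)^2}{n_{TOT}/16} + 8\,\frac{(0 - n_{TOT}/16)^2}{n_{TOT}/16} = \frac{n_{TOT}}{2} + \frac{n_{TOT}}{2} = n_{TOT}$$

By contrast, for a fully random group the statistic stays close to its 2^b - 1 = 15 degrees of freedom on average, regardless of n_TOT.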
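Chi-Square Classifier (CSC) – Cont.
  The chi-square test works only if the number of observations is large enough, that is, Ei = nTOT / 2^b >= 5
  Namely, with b = 4, nTOT >= 80
  Choose nTOT = 100
  Also, set
    $$\chi^2_{\mathrm{Rnd}} = \chi^2_{\mathrm{Mixed}} = \chi^2_{\mathrm{Det}} = 150$$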
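To make the rationale concrete, the sketch below turns 16 per-group chi-square values into a guess about the flow type. The exact classification criteria did not survive in this transcript, so the rules here are an assumption pieced together from the rationale slide (third byte non-random for E2E, first four bytes deterministic for E2O, everything random for TCP) and the single threshold of 150. The function name `classify_flow` and the returned labels are hypothetical.

```python
from typing import Sequence

CHI2_THRESHOLD = 150.0  # threshold value given on the slide


def classify_flow(chi2: Sequence[float], threshold: float = CHI2_THRESHOLD) -> str:
    """Guess the flow type from 16 per-group chi-square values (b = 4).

    Index 0 corresponds to group 1 on the slides. The rules are an
    illustrative assumption, not the paper's exact criteria.
    """
    if len(chi2) != 16:
        raise ValueError("expected 16 per-group values")
    high = [value > threshold for value in chi2]
    # Whole message ciphered: no group deviates from randomness.
    if not any(high):
        return "possible Skype over TCP"
    # Only the third byte (groups 5 and 6, indexes 4 and 5) deviates.
    others_random = not any(h for g, h in enumerate(high) if g not in (4, 5))
    if others_random and (high[4] or high[5]):
        return "possible Skype E2E over UDP"
    # First four bytes (groups 1-8) deterministic, the rest ciphered.
    if all(high[:8]) and not any(high[8:]):
        return "possible Skype E2O over UDP"
    return "not Skype"
```

The labels say "possible" on purpose: random-looking content alone does not prove that a flow is Skype, which is why the slides pair this test with the other classifiers.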
Naïve Bayes Classifier
  Feature vector x = [xi]
  P{C|x}: the probability that the object belongs to class C, given that the feature x is observed
  P{x|C}: the probability that the feature x is observed, given that the object belongs to class C
  Bayes' rule
    P{C|x} = P{x|C} P{C} / P{x}
Naïve Bayes Classifier – Cont.
  Naïve: the features are assumed to be independent
    $$P\{x|C\} = \prod_i P\{x_i|C\}$$
  P{x|C} is called the belief
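Bayes' rule and the independence assumption combine into a few lines of code. The following is a minimal Python sketch, assuming each feature is modeled by a Gaussian density per class; the function names and the idea of passing per-class (mean, standard deviation) pairs are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def naive_likelihood(x: Sequence[float], params: Sequence[Tuple[float, float]]) -> float:
    """P{x|C} as the product of independent per-feature densities."""
    p = 1.0
    for xi, (mean, std) in zip(x, params):
        p *= gaussian_pdf(xi, mean, std)
    return p


def posterior(
    x: Sequence[float],
    class_params: Dict[str, Sequence[Tuple[float, float]]],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """P{C|x} for every class C, using Bayes' rule."""
    joint = {c: naive_likelihood(x, class_params[c]) * priors[c] for c in class_params}
    evidence = sum(joint.values())  # P{x}
    return {c: j / evidence for c, j in joint.items()}
```

With many features the product of densities can underflow to zero, so practical implementations usually sum logarithms instead.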
NBC – Feature Selection
  VoIP traffic
    Small message size
    Less bursty than data traffic
  Features
    Message size
      Observe a window of w messages at a time: x = [s1, s2, …, sw]
    Average Inter-Packet Gap (average-IPG)
NBC – Feature Characterization
  For each codec, the message size is determined by
    Rate
    Header length
    Redundancy factor (RF)
    Message framing time (delta T)
  The message size can be modeled by a Gaussian distribution
NBC – Feature Characterization
  Map each codec to a Gaussian distribution
  Model the average-IPG as a Gaussian distribution with
    For constant bit rate codecs
    For variable bit rate codecs
NBC – Make Decision
  Let B denote the belief computed for the flow
  Define a threshold Bmin
  If B > Bmin
    Valid Skype flow
  Otherwise
    Not a Skype flow
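The decision rule can be sketched as follows. The slide's definition of B did not survive in this transcript, so this example substitutes a stand-in: the best per-codec average log-likelihood of a window of message sizes under a Gaussian model. That definition, the function names, and every numeric parameter passed in are assumptions made for illustration only.

```python
import math
from typing import Dict, Sequence, Tuple


def log_gaussian(x: float, mean: float, std: float) -> float:
    """Log-density of a normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def belief(sizes: Sequence[float], codecs: Dict[str, Tuple[float, float]]) -> float:
    """Stand-in for B: best per-codec average log-likelihood of the window."""
    if not sizes:
        raise ValueError("empty window")
    return max(
        sum(log_gaussian(s, mean, std) for s in sizes) / len(sizes)
        for mean, std in codecs.values()
    )


def is_skype(sizes: Sequence[float], codecs: Dict[str, Tuple[float, float]], b_min: float) -> bool:
    """Apply the threshold rule: valid Skype flow if B > Bmin."""
    return belief(sizes, codecs) > b_min
```

A window of sizes close to one codec's mean produces a belief well above a window of near-MTU packets, which is what lets a single threshold separate the two.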
Payload-Based Classifier (PBC)
  Used as a cross-check for the previous two classifiers
  Only useful for UDP traffic
  Two parts
    Per-flow identification
    Per-host identification
PBC - Per-flow Identification
  Uses the knowledge about the UDP E2E message
  Fun: 5-bit field in the third byte, extracted with the mask 0x8f
    States the payload type
      0x02, 0x03, 0x07, 0x0f: signaling message
      0x0d: data message (all 4 types of data)
  [Figure: message layout. Bytes 1-2: ID; byte 3: Fun; followed by Frame]
PBC - Per-flow Identification
  Terminology
    nTOT: the total number of packets in the flow
    nsig: the number of Skype signaling messages
    nE2E: the number of Skype E2E data/video/chat/voice messages
    nE2O: the number of Skype E2O voice messages
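The four counters can be gathered in one pass over a flow's UDP payloads. The following is a minimal Python sketch, assuming Fun is the third byte masked with 0x8f and assuming an E2O message is recognized by its first four bytes matching an already known CID. The function name `count_messages` and the `cid` parameter are hypothetical, and how the counters are turned into a final decision is not specified here.

```python
from typing import Dict, Iterable, Optional

SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D
FUN_MASK = 0x8F


def count_messages(payloads: Iterable[bytes], cid: Optional[bytes] = None) -> Dict[str, int]:
    """Count total, signaling, E2E, and E2O messages in one flow."""
    counts = {"n_tot": 0, "n_sig": 0, "n_e2e": 0, "n_e2o": 0}
    for payload in payloads:
        counts["n_tot"] += 1
        # Assumption: E2O messages start with the gateway's 4-byte CID.
        if cid is not None and len(payload) >= 4 and payload[:4] == cid:
            counts["n_e2o"] += 1
            continue
        if len(payload) < 3:
            continue  # too short to carry a Fun field
        fun = payload[2] & FUN_MASK
        if fun in SIGNALING_FUN:
            counts["n_sig"] += 1
        elif fun == DATA_FUN:
            counts["n_e2e"] += 1
    return counts
```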
PBC - Per-host Identification
  Known: a Skype client always uses the same UDP port to send and receive traffic
  Before a conversation starts, signaling messages are exchanged between the two clients
  This makes it possible to identify a Skype client running at a specific IP address and port
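Per-host identification amounts to remembering which (IP, port) endpoints have exchanged signaling messages. The following is a minimal Python sketch, assuming packets arrive as (source IP, source port, destination IP, destination port, UDP payload) tuples. The function name `skype_endpoints` and the `min_signaling` knob are hypothetical additions for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
FUN_MASK = 0x8F

Endpoint = Tuple[str, int]  # (IP address, UDP port)
Packet = Tuple[str, int, str, int, bytes]  # src ip, src port, dst ip, dst port, payload


def skype_endpoints(packets: Iterable[Packet], min_signaling: int = 1) -> Set[Endpoint]:
    """Return endpoints seen in at least min_signaling signaling messages."""
    seen: Dict[Endpoint, int] = {}
    for src_ip, src_port, dst_ip, dst_port, payload in packets:
        if len(payload) >= 3 and (payload[2] & FUN_MASK) in SIGNALING_FUN:
            # Both ends of a signaling exchange are candidate Skype clients.
            for endpoint in ((src_ip, src_port), (dst_ip, dst_port)):
                seen[endpoint] = seen.get(endpoint, 0) + 1
    return {endpoint for endpoint, n in seen.items() if n >= min_signaling}
```

Since a random third byte matches one of the four signaling values fairly often (4 of the 32 masked values), raising `min_signaling` above 1 is one way to reduce accidental matches.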
Experiment
  Two data sets
  Campus: 95 hours, taken on 2006/5/29
    No P2P traffic is allowed
    Most traffic consists of TCP data flows
  ISP: one day, taken on 2006/5/15
    All traffic is allowed
    More heterogeneous
    Little Skype traffic is expected