Presenter: Kuei-Yu Hsu
Advisor: Dr. Kai-Wei Ke
2013/4/29
Detecting Skype Flows Hidden in Web Traffic

Outline

Introduction
Proposed Methodology
Experimental Datasets
Experimental Results
Conclusions

Introduction

What is VoIP?
Delude restrictive firewalls
Skype Proprietary Protocol
About Detection

What is VoIP?

VoIP (Voice over Internet Protocol) refers to a way to carry phone calls over an IP data network, whether on the Internet or your own internal network.

VoIP calls are usually much cheaper than traditional long distance telephone calls to PSTN users, or even free if a call is placed directly from a VoIP end user to another one.

Delude restrictive firewalls

Restrictive firewalls are commonly adopted by network managers in an effort to provide better security for the internal network and to optimize the use of network resources.

Such firewalls are unlikely to block Web traffic because it is usually perceived as a fundamental service considered essential for Internet access.

Applications can deliver non-HTTP traffic over TCP ports 80 (HTTP) or 443 (HTTPS), thus fooling restrictive firewalls to gain network access.

Skype Proprietary Protocol

Skype can delude a network firewall by using Web ports to establish communication with other Skype peers.

Skype adopts this strategy as a fallback mechanism in case other strategies fail to get through a restrictive firewall.

Such a strategy makes Skype traffic disguised as Web traffic quite difficult for network operators to detect.

About Detection

Detection of Skype flows in Web traffic
  HTTP Workload Model
  Goodness-of-fit tests
    Chi-square test
    Kolmogorov-Smirnov test
  P2P VoIP characteristics

Detection Process
  Training Datasets
  Evaluation Datasets

Proposed Methodology

HTTP Workload Model
Goodness-of-fit tests
  Chi-square test
  Kolmogorov-Smirnov test
Skype characteristics

Define an HTTP workload model and capture real Web data to build empirical distributions of some relevant parameters.

Capture Web traffic with VoIP calls hidden in it, calculate the same parameters for each flow, and use metrics taken from two goodness-of-fit tests to decide whether the computed parameters are compatible with the empirical distributions derived in the previous step, classifying each flow as legitimate Web traffic or not.

HTTP Workload Model

Define a model to evaluate normal Web behavior.

This model has the following parameters:
Web request size;
Web response size;
Interarrival time between requests;
Number of requests per page;
Page retrieval time.

Goodness-of-fit tests

Chi-square test

It was first investigated by Karl Pearson in 1900.

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} \qquad (1)$$

$O_i$: an observed frequency;
$E_i$: an expected (theoretical) frequency, asserted by the null hypothesis;
$K$: the number of classes.
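As a minimal sketch of how Pearson's statistic, the sum of (O_i − E_i)²/E_i over the K classes, could be computed for one list of measured values, consider the following. The function name `chi_square_score`, the use of K − 1 interior bin edges, and the requirement that every class has a positive expected probability are assumptions made for illustration; the slides do not say how the classes were chosen.

```python
from bisect import bisect_right


def chi_square_score(observed_values, bin_edges, expected_probs):
    """Pearson chi-square statistic of a sample against a binned reference.

    bin_edges holds the K - 1 interior edges (sorted), so class i covers
    [bin_edges[i - 1], bin_edges[i]); expected_probs holds the K reference
    probabilities (hypothetical input, e.g. taken from training data).
    """
    k = len(expected_probs)
    if len(bin_edges) != k - 1:
        raise ValueError("need exactly K - 1 interior bin edges")
    if any(p <= 0 for p in expected_probs):
        raise ValueError("every class needs a positive expected probability")

    # O_i: how many observed values fall into each class.
    observed_counts = [0] * k
    for value in observed_values:
        observed_counts[bisect_right(bin_edges, value)] += 1

    # E_i: expected count under the reference distribution.
    n = len(observed_values)
    score = 0.0
    for o_i, p_i in zip(observed_counts, expected_probs):
        e_i = n * p_i
        score += (o_i - e_i) ** 2 / e_i
    return score
```

A sample that matches the reference proportions exactly scores 0, and the score grows as the sample drifts away from the reference.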

Kolmogorov-Smirnov test

It quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution.

$$D = \max_x \left| F_0(x) - S_N(x) \right| \qquad (2)$$

$F_0(x)$: the empirical distribution function derived from the training part.
$S_N(x)$: the cumulative step function of a sample of $N$ observations.
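The Kolmogorov-Smirnov distance, the largest absolute gap between F0(x) and the sample's step function, can be sketched as below. It assumes that F0 is itself an empirical step function built from raw training values, so both functions only change at observed points and the maximum can be found by checking those points. The names `ecdf` and `ks_distance` are illustrative, not taken from the original work.

```python
from bisect import bisect_right


def ecdf(sorted_values, x):
    """Fraction of values in sorted_values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_distance(training_values, sample_values):
    """Largest absolute gap between two empirical distribution functions.

    training_values plays the role of F0 (reference) and sample_values the
    role of S_N (flow under test). Both lists must be non-empty.
    """
    reference = sorted(training_values)
    sample = sorted(sample_values)
    # Both step functions are constant between observed points, so it is
    # enough to evaluate the gap at every observed point.
    return max(abs(ecdf(reference, x) - ecdf(sample, x))
               for x in reference + sample)
```

A distance of 0 means the two step functions coincide; a distance of 1 means the two sets of values do not overlap at all.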

Skype characteristics

Skype does not use SIP or any other known signaling protocol for VoIP calls, and all its traffic is end-to-end encrypted.

It automatically detects network characteristics and chooses the best option available to communicate with other Skype peers.

It only uses Web ports as a fallback mechanism, when UDP is not available.

Experimental Datasets

Training Datasets - model part
Evaluation Datasets - detection part

Training Datasets - model part

A training dataset is used to characterize normal Web traffic behavior.

tcpdump: captures full HTTP packet traces, generating dump files.
tcpflow: reads these dump files and calculates the parameters present in the Web workload model.

We read HTTP headers to clearly identify a Web request or a Web response, and we also compute the inactivity time between Web messages.

ISP: Internet service provider
ACD: academic institution
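One way the training step could turn measured values (for example, Web request sizes) into an empirical distribution is to count how many training values fall into each class. The sketch below is an illustration only: the function name, the choice of bin edges, and the additive `smoothing` term (used so that no class ends up with zero probability) are assumptions and are not described in the slides.

```python
from bisect import bisect_right


def empirical_class_probs(training_values, bin_edges, smoothing=1.0):
    """Estimate class probabilities from training data.

    bin_edges are K - 1 sorted interior edges (hypothetical choice).
    smoothing is added to every class count; set it to 0 for raw
    relative frequencies.
    """
    k = len(bin_edges) + 1
    counts = [float(smoothing)] * k
    for value in training_values:
        counts[bisect_right(bin_edges, value)] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

The resulting probabilities can serve as the reference against which the values of an individual flow are later compared.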

Evaluation Datasets - detection part

tcpdump: captured Web packet traces, but this time only TCP/IP headers were captured.
Another software tool: the calculations and the division of flows into Web pages are done without examining TCP payload (HTTP headers) information.
Web message size: every MTU-sized packet is considered part of the same Web message, provided there is not too much inactive time between packets.

We used the number of requests per page as a filter to remove smaller flows.
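The Web message size rule above (MTU-sized packets belong to the same message unless too much idle time separates them) can be sketched as follows. The full-packet size of 1460 bytes, the 1-second idle limit, and the `(timestamp, payload_length)` input format are assumptions chosen for illustration; the slides do not give the actual values or the tool used.

```python
def message_sizes(packets, full_size=1460, max_gap=1.0):
    """Group one direction of a flow into Web messages.

    packets: list of (timestamp_seconds, payload_length) tuples sorted by
    time, with lengths derived from TCP/IP headers only.
    A packet extends the current message only if the previous packet was
    full-sized and arrived at most max_gap seconds earlier.
    Returns the list of message sizes in bytes.
    """
    sizes = []
    current = 0
    last_ts = None
    prev_full = False
    for ts, length in packets:
        continues = (prev_full and last_ts is not None
                     and ts - last_ts <= max_gap)
        if current and not continues:
            sizes.append(current)  # close the previous message
            current = 0
        current += length
        prev_full = length >= full_size
        last_ts = ts
    if current:
        sizes.append(current)
    return sizes
```

A short packet closes the message it belongs to, and a long idle period closes it even when the last packet was full-sized.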

The other three parameters (Web request size, Web response size, and interarrival time between requests) are each represented by a list of values, which are used in Equations (1) and (2) to generate a χ² or a Kolmogorov-Smirnov D score.

We then have three values that can be compared with thresholds to decide whether this set of related request-response messages is likely to be Skype or not.
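The threshold comparison could look like the sketch below. The slides do not state how the three per-parameter scores are combined, so the voting rule (`min_votes`), the parameter names, and the threshold values used in the usage example are all hypothetical.

```python
def is_probably_skype(scores, thresholds, min_votes=2):
    """Flag a set of request-response messages as likely Skype.

    scores and thresholds are dicts keyed by parameter name. A score above
    its threshold means the flow fits the Web model poorly for that
    parameter. min_votes is an assumed combination rule: the flow is
    flagged when at least that many parameters exceed their threshold.
    """
    votes = sum(1 for name, limit in thresholds.items()
                if scores[name] > limit)
    return votes >= min_votes


# Usage with made-up numbers:
example_thresholds = {"request_size": 50.0,
                      "response_size": 50.0,
                      "interarrival": 30.0}
example_scores = {"request_size": 120.0,
                  "response_size": 80.0,
                  "interarrival": 5.0}
flagged = is_probably_skype(example_scores, example_thresholds)
```

Raising the thresholds or `min_votes` trades missed Skype flows for fewer false alarms.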

VoIP calls of different durations were produced in a controlled way by a small network of computers behind port-restrictive firewalls running the Skype program.

Experimental Results

Sensitivity and specificity
ROC curves
Detecting Skype flows
Evaluating real-time detection

Sensitivity and specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function.

The test outcome can be positive or negative:
True positive = correctly identified
False positive = incorrectly identified
True negative = correctly rejected
False negative = incorrectly rejected

ROC curves

ROC curves: Receiver Operating Characteristic curves

An ROC curve is a graphical plot of the sensitivity against (1 − specificity) of a binary classifier. Sensitivity is the same as the true positive rate, and (1 − specificity) is equal to the false positive rate.
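To make these definitions concrete, the sketch below computes (false positive rate, true positive rate) pairs by sweeping a decision threshold over a set of scores. It assumes that a higher score means "more likely Skype" and that the input contains at least one positive and one negative example; the function name `roc_points` is illustrative.

```python
def roc_points(scores, labels):
    """Return ROC points as (false_positive_rate, true_positive_rate).

    scores: one number per flow, higher meaning more likely positive.
    labels: True for actual positives (e.g. Skype), False otherwise.
    A flow is predicted positive when its score is >= the threshold.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels)
                 if s >= threshold and not y)
        points.append((fp / negatives, tp / positives))
    return points
```

Each point corresponds to one threshold setting; a curve that stays close to the top-left corner indicates high sensitivity at a low false positive rate.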

The classifier has a discrimination threshold that is varied to produce different points in the curve.

Detecting Skype flows

Fig. 5. χ² detection.
90% of 80 Skype flows are correctly identified (i.e. true positive rate) with less than 2% of 17,294 non-Skype flows incorrectly identified (i.e. false positive rate).
A 100% detection rate is reached with around 5% of false positives.

Fig. 6. Kolmogorov-Smirnov D detection.
A true positive rate of 70% is obtained with a false positive rate around 2%.
An 80% detection rate is reached with 5% of false positives.

The χ² ROC curve is always closer to the top-left corner than the K-S curve.

Evaluating real-time detection

A network administrator may want to identify the Skype calls that are currently using the network, not the calls made some minutes or hours ago.

Here, the data is captured and analyzed over short, limited time intervals.
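A minimal sketch of cutting a capture into short consecutive intervals is shown below, so that each interval can be scored on its own. The tuple layout (timestamp first), the function name, and the default window length of 10 seconds are assumptions made for illustration.

```python
from collections import defaultdict


def split_into_windows(packets, window_seconds=10.0):
    """Split time-sorted packets into consecutive fixed-length windows.

    packets: list of tuples whose first element is a timestamp in seconds.
    Windows are measured from the first packet's timestamp; windows that
    contain no packets are skipped.
    """
    if not packets:
        return []
    start = packets[0][0]
    windows = defaultdict(list)
    for packet in packets:
        index = int((packet[0] - start) // window_seconds)
        windows[index].append(packet)
    return [windows[i] for i in sorted(windows)]
```

Shorter windows give faster answers but leave fewer messages per flow for the statistical tests to work with.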

The χ² detection using the newly generated trace (the set of all 10 s capture files) had a true positive rate of up to 85% with a smaller number of false positives compared to the χ² detection using the ISP-3 trace.

Conclusions

It is rather common to find non-HTTP traffic using Web ports to delude firewalls and other network elements.

We evaluated a Skype detection system based on statistical tests to efficiently detect Skype flows hidden among Web traffic, without searching for particular Skype patterns or signatures and without inspecting payload information.

We manually produced Skype traffic to build our Web evaluation dataset and verify that the proposed parameters are able to identify Skype flows hidden among HTTP traffic.

Using simple metrics taken from two goodness-of-fit tests, the χ² value and the Kolmogorov-Smirnov distance, we show that Skype flows can be clearly detected, but our results suggest that the χ² metric is a much better choice.

Considering the experimental results for the chi-square detection, our methodology provides enough flexibility for network management to adopt different approaches regarding the possible detection of Skype flows in Web traffic.

As future work, we intend to further analyze the real-time detection by investigating the minimum time interval needed.
We also intend to build and evaluate an optimized version of our tool to perform real-time monitoring in network links.

References

E. P. Freire, A. Ziviani, and R. M. Salles, "Detecting Skype Flows in Web Traffic," Proc. of the IEEE/IFIP Network Operations and Management Symposium (NOMS 2008), April 2008, pp. 89-96.

E. P. Freire, A. Ziviani, and R. M. Salles, "Detecting VoIP Calls Hidden in Web Traffic," IEEE Transactions on Network and Service Management, vol. 5, pp. 210-214, December 2008.

Thanks for listening