
Applications of Machine Learning in Cisco Web Security

Richard Wheeldon PhD BSc

rwheeldo@cisco.com

© 2011 Cisco Systems, Inc. All rights reserved.


Cisco Web Security

• Cisco, Ironport and ScanSafe
• Request time filtering
  • Categorization and classification
  • Reputation
• Response time filtering
  • Malware types and attack vectors
  • Malware detection
  • Dynamic classification
• Other challenges


The Ubiquitous Speaker Slide

• Richard Wheeldon
  • UCL Graduate in 1999
  • PhD from Birkbeck in 2003
  • Joined Cisco December 2009
  • http://www.rswheeldon.com/
• Acknowledgements
  • Steve Poulson - [email protected]
  • Bryan Feeney - [email protected]


Cisco, Ironport and ScanSafe

• Cisco
  • World’s leading network company
• Ironport
  • Leader in Anti-spam
  • Provide Web Security Appliances
• ScanSafe
  • World leader in “Security as a Service”
  • Scans 1.8 billion web requests a day
  • Blocks 32 million of them


We’re local


Previous MSc projects

• Tree Kernels for CFG similarity
  • Guangyan Song, 2010
• Fast computation of the Kernel of a Tree and applications to Semi-Supervised Learning
  • Malcolm Reynolds, 2009
• Comparing N-gram features for web page classification
  • Noureen Tejani, 2007


We’re hiring

• Positions
  • Software Developers
  • QA, Operations, Research
• Locations
  • ScanSafe
  • UK - Bedfont Lakes, Reading, Staines, Edinburgh
  • Galway, EMEA, US, Worldwide
• Graduate recruitment
  • http://www.cisco.com/go/universityjobs
  • http://www.cisco.com/careers/
  • [email protected]


ScanSafe’s SaaS

1. Availability
   Time our service is available to scan traffic. 99.999% guaranteed availability.

2. Latency
   Additional load time attributable to services. Evaluated by 3rd party analysis.

3. False Positives
   Pages that were blocked but should not have been.

4. False Negatives
   Pages that were not blocked but should have been.
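As a rough illustration of the four measures above, the sketch below turns an availability guarantee into a yearly downtime budget and computes false positive and false negative rates from raw counts. The function names and the sample counts are hypothetical; only the 99.999% figure comes from the slide.

```python
def downtime_minutes_per_year(availability, days=365):
    """Minutes per year a service may be unavailable at the given availability."""
    return (1.0 - availability) * days * 24 * 60


def error_rates(true_pos, false_pos, true_neg, false_neg):
    """Return (false positive rate, false negative rate) from raw counts."""
    fpr = false_pos / (false_pos + true_neg)  # share of good pages wrongly blocked
    fnr = false_neg / (false_neg + true_pos)  # share of bad pages wrongly allowed
    return fpr, fnr


# Hypothetical counts, chosen only to exercise the functions.
budget = downtime_minutes_per_year(0.99999)
fpr, fnr = error_rates(true_pos=950, false_pos=5, true_neg=99_000, false_neg=50)
print(f"downtime budget: {budget:.2f} minutes/year")
print(f"false positive rate: {fpr:.5f}, false negative rate: {fnr:.3f}")
```

At five nines the budget works out to a little over five minutes of downtime per year, which is why availability is listed first.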


Risks of Unfiltered Content

• Software threats
  • Malware
  • Phishing
  • Botnets
• Business threats
  • Productivity Loss
  • Bandwidth congestion
  • Legal liability
  • Data Leaks


The Web vs. Email

Web                                     Email
Most web traffic is good                Most e-mail is bad
Easy to find safe sites                 Easy to get Spam
Harder to get dangerous URLs            Harder to get examples of good mail
Blocking web sites is visible           Blocking email is invisible
Performance gain from white-listing     Performance gain from blocking
Very Real-Time (<2s)                    Not Real-Time (<Nhrs)


Request time filtering

• Motivation
  • Quicker blocks save bandwidth and processing time
  • If the request is made, the damage may be done
• Techniques
  • Databases
  • Reputation
  • Rules
  • Trained systems


Category-based filtering

• Responsible for most blocks

• High-risk and high-traffic

• Manual categorizers

• 10 million URLs

• 97% of traffic

• 2 million porn sites


Web Reputation

[Diagram labels: 3rd Party Feeds, Spam Hosts, Databases -> Score between -10 and +10 (Bad, Neutral or Good)]

• Feeds
  • Phishing sites
  • Malware sites
• Heuristics
  • In spam but not in ham
  • Age of domain registration
  • High traffic – e.g. Alexa 1000
  • Scanned but never blocked
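One way to picture how feeds and heuristics like these could be folded into a single score between -10 and +10 is sketched below. Every weight, threshold and function name here is an assumption made for illustration; the slides do not say how the real score is computed.

```python
def reputation_score(in_spam_not_ham, domain_age_days, alexa_rank,
                     scanned_never_blocked, on_phishing_feed, on_malware_feed):
    """Combine heuristic signals into a score clamped to [-10, +10].

    All weights and thresholds are illustrative assumptions.
    """
    score = 0.0
    if on_phishing_feed or on_malware_feed:
        score -= 10.0  # listed on a third-party feed: strongly bad
    if in_spam_not_ham:
        score -= 4.0   # seen in spam but never in legitimate mail
    if domain_age_days < 30:
        score -= 3.0   # recently registered domain
    if alexa_rank is not None and alexa_rank <= 1000:
        score += 5.0   # high-traffic site
    if scanned_never_blocked:
        score += 3.0   # clean scanning history
    return max(-10.0, min(10.0, score))


def label(score):
    """Map a score to Bad, Neutral or Good (cut-offs are assumed)."""
    if score <= -3.0:
        return "Bad"
    if score >= 3.0:
        return "Good"
    return "Neutral"


print(label(reputation_score(False, 3650, 500, True, False, False)))
```

A hand-weighted sum like this is the simplest option; the weights could equally be learned from labelled traffic.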


Web Reputation in the WSA


Keyword-based URL filtering

• Keyword rules
  • Fitness -> Health
  • Basketball -> Sport
  • Pizzeria -> Food
  • Restaurant -> Food
  • Whore -> Porn
• Strange URLs
  • whorepresents.com
  • therapistfinder.com
  • speedofart.com
  • expertsexchange.com
  • penisland.com
  • powergenitalia.it
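The failure mode of the strange URLs can be reproduced with a naive substring matcher. The sketch below uses the keyword rules listed above; the function name and the matching strategy are assumptions for illustration, not the product's rule engine.

```python
# Keyword -> category rules taken from the slide.
KEYWORD_RULES = {
    "fitness": "Health",
    "basketball": "Sport",
    "pizzeria": "Food",
    "restaurant": "Food",
    "whore": "Porn",
}


def categorize_by_keyword(hostname):
    """Return the sorted categories whose keyword occurs anywhere in the hostname."""
    host = hostname.lower()
    return sorted({cat for kw, cat in KEYWORD_RULES.items() if kw in host})


# "whorepresents.com" is meant to read "who represents", yet the rule still fires.
print(categorize_by_keyword("whorepresents.com"))
```

Because the matcher ignores word boundaries, any hostname whose concatenated words happen to contain a keyword is miscategorized, which motivates the segmentation approach on the next slide.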


Recognizing Porn URLs

• http://www.penisland.com

• Example of segmentation problem
  P('peni') × P('sland')
  P('penis') × P('land')
  P('pen') × P('island')

• Extends to classification
  P('penis') × P('land') × P(porn|'penis') × P(porn|'land')
  P('pen') × P('island') × P(not_porn|'pen') × P(not_porn|'island')
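The products above can be evaluated with a small dynamic-programming segmenter. This is a minimal sketch: the probability tables, the unknown-word penalty and the function names are all made-up assumptions, and a real system would estimate the probabilities from a large corpus.

```python
import math
from functools import lru_cache

# Toy unigram probabilities P(word); assumed values for illustration only.
WORD_PROB = {"pen": 2e-3, "island": 1e-3, "penis": 5e-4, "land": 3e-3}
# Toy P(porn | word); also assumed.
PORN_GIVEN_WORD = {"pen": 0.05, "island": 0.05, "penis": 0.9, "land": 0.1}
UNKNOWN_CHAR_LOGP = math.log(1e-6)  # assumed per-character penalty for unknown strings


def word_logp(word):
    """log P(word), with a length-scaled penalty for out-of-vocabulary strings."""
    if word in WORD_PROB:
        return math.log(WORD_PROB[word])
    return UNKNOWN_CHAR_LOGP * len(word)


@lru_cache(maxsize=None)
def segment(text):
    """Return (log probability, words) for the most probable segmentation."""
    if not text:
        return 0.0, ()
    best = None
    for i in range(1, len(text) + 1):
        tail_logp, tail_words = segment(text[i:])
        candidate = (word_logp(text[:i]) + tail_logp, (text[:i],) + tail_words)
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


def joint_logp(words, porn):
    """log of the product of P(w) and P(label | w) over one segmentation."""
    total = 0.0
    for w in words:
        p_porn = PORN_GIVEN_WORD.get(w, 0.5)
        total += word_logp(w) + math.log(p_porn if porn else 1.0 - p_porn)
    return total


logp, words = segment("penisland")
print(words)
print(joint_logp(("penis", "land"), porn=True))
print(joint_logp(("pen", "island"), porn=False))
```

Working in log space avoids underflow when many small probabilities are multiplied. Which reading wins depends entirely on the probability tables, so the toy values here say nothing about the real site.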


Phishing and Malware Examples

• Phishing examples
  • http://pavpals-com-usaprewiwerluithaniirse.345.pl
  • http://82.195.143.18/onlinepaypal.com/
  • http://www.jetboatflush.com/~nfioemro/www.paypal.fr/webscrcmd=...
• Malicious examples
  • www1.scan-projectrf.cz.cc
  • www1.scan-projectsi.cz.cc
  • www1.scan-projectst.cz.cc
  • www1.scan-projectte.cz.cc
  • www1.scan-projectti.cz.cc


Searchahead

• If we can identify bad URLs we can warn before the user clicks.

• Over 90% of new sites are visited as the result of an Internet search

[Figure labels: Acceptable, Uncategorized, Prohibited, Malicious]


Response Time Scanning

• Trusted sites are targets

• Strength-in-depth combination of commercial scanners and in-house technology.

[Figure labels: attack vectors (Graphics, Webmail, New Web Pages, Blogs, Ad Links, Links, Comments, Banner Ads); malware types (Backdoors, Rootkits, Trojan Horses, Keyloggers, Worms)]


Exploited sites in recent years

• Facebook

• Times India

• Miami Dolphins

• Samsung


Nothing is safe – not even Twitter!

http://www.youtube.com/fslabs


Signature Databases

[Chart: signatures (millions), axis from 0 to 1.5, for 2006, 2007 and 2008]

• From 2006 to 2008, the F-Secure signature database grew from 250,000 entries to 1.5 million

• The rate at which variants of viruses come out is growing rapidly

• No vendor can rely exclusively on signatures


Zero-hour protection

• Vendors take time to release signature updates
  • Win32.IstBar.jl trojan

• Outbreak Intelligence (OI) provides proactive threat detection

• A huge data set of traffic to be leveraged


How does OI use Machine Learning?

• Approaches
  • Malware detection
  • Anomaly detection
  • Dynamic categorization
• Techniques Employed
  • Supervised Learning
  • Unsupervised Learning
  • Sandboxing


Dynamic Classification

• Document classification across 80 categories
  • Increases coverage
  • Language identification
• Identifies inappropriate content
  • Porn is relatively easy
  • Phishing is harder – but not impossible?
  • Hate speech is harder still


DC for identifying malicious sites

• Automated tools generate malicious sites
  • Fake escrow
  • Fake pharmacy
  • Mule recruitment
• Examples from Richard Clayton’s 2010 FOSDEM talk
  • http://www.google.com/search?q=%22before+that+was+a+commercial+manager+of+a+large+corporation+engaged+in+electronics+production%22
  • http://www.google.com/search?q=%22as+the+most+trusted+escrow+service+on+the+internet%22
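Because generated sites reuse boilerplate text, even exact phrase matching finds many of them. The sketch below checks page text against the two phrases from the search queries above; the function names and the normalization rule are assumptions, and a trained classifier would generalize beyond a fixed phrase list.

```python
import re

# Phrases decoded from the two search queries on the slide.
TEMPLATE_PHRASES = [
    "before that was a commercial manager of a large corporation "
    "engaged in electronics production",
    "as the most trusted escrow service on the internet",
]


def normalize(text):
    """Lower-case and collapse every run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def matched_templates(page_text):
    """Return the template phrases that occur as whole-word runs in the page."""
    haystack = f" {normalize(page_text)} "
    return [p for p in TEMPLATE_PHRASES if f" {normalize(p)} " in haystack]


page = "We are known AS the most   trusted Escrow service, on the Internet!"
print(matched_templates(page))
```

Normalizing case, punctuation and whitespace first keeps trivial edits to the template from defeating the match.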


Malicious Executable Files

• The final stage of an attack is frequently downloading an executable

• Traditionally blocked using signatures

• We use a combination of signature-based scanners and machine-learning


Drive-by attacks

• Almost no one opens executables from odd sources any more, so attackers use drive-by attacks instead.

• A normal file (e.g. Flash, PDF, JavaScript, image file) is crafted to exploit a vulnerability in a viewer or library and execute code embedded within the file.


Flash

“Symantec recently highlighted Flash for having one of the worst security records in 2009. We also know first hand that Flash is the number one reason Macs crash. We have been working with Adobe to fix these problems, but they have persisted for several years now. We don’t want to reduce the reliability and security of our iPhones, iPods and iPads by adding Flash”

Steve Jobs, April 2010

http://www.apple.com/hotnews/thoughts-on-flash/


The growing threat of Java

• Almost as common as Flash
  • 90% of PCs have Java
  • 700 000 JDK downloads per month
  • 3.48 million JRE downloads per month
• Growth in known vulnerabilities
  • 29 patched in a single update (Oct 2010)
  • Growth in exploits reported by Sophos, Symantec, Microsoft and Cisco
• Signatures + Trained Scanlet


Detecting Malicious JavaScript

• Sandboxing
  • Behavioural checking
  • Good way to beat obfuscation techniques
  • Difficult to constrain
• Trained classification
  • Analyse features


Javascript Features

v46f658f5e2260(v46f658f5e3226){ function v46f658f5e4207 () {return 16;} return(parseInt(v46f658f5e3226,v46f658f5e4207()));}function v46f658f5e61f4(v46f658f5e7174){ function v46f658f5ea0cd () {return 2;} var v46f658f5e813e=\'\';for(v46f658f5e9105=0; v46f658f5e9105<v46f658f5e7174.length; v46f658f5e9105+=v46f658f5ea0cd()){ v46f658f5e813e+=(String.fromCharCode(v46f658f5e2260(v46f658f5e7174.substr(v46f658f5e9105, v46f658f5ea0cd()))));}return v46f658f5e813e;} document.write(v46f658f5e61f4(\'3C5343524950543E77696E646F772E7374617475733D2\'));

The above is JavaScript, but where are the features? An exercise for the reader!
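One possible starting point for the exercise is a set of simple lexical features that survive identifier renaming. The sketch below is an assumption about what such features might look like, not the feature set used in the product; the function names and the list of calls are hypothetical choices.

```python
import math
import re
from collections import Counter

# Calls often seen in obfuscated loaders; this list is an assumption.
SUSPICIOUS_CALLS = ("eval", "document.write", "String.fromCharCode",
                    "unescape", "parseInt")


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def extract_features(source):
    """Return a dict of simple lexical features for a JavaScript source string."""
    identifiers = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", source)
    hex_runs = re.findall(r"[0-9A-Fa-f]{8,}", source)
    features = {f"count_{name}": source.count(name + "(") for name in SUSPICIOUS_CALLS}
    features["length"] = len(source)
    features["char_entropy"] = shannon_entropy(source)
    features["mean_identifier_length"] = (
        sum(map(len, identifiers)) / len(identifiers) if identifiers else 0.0
    )
    features["longest_hex_run"] = max(map(len, hex_runs), default=0)
    return features


sample = "document.write(String.fromCharCode(parseInt('3C5343524950543E', 16)));"
for name, value in extract_features(sample).items():
    print(name, value)
```

In the snippet on the slide, the long hex literal passed through `parseInt` and `String.fromCharCode` into `document.write` is the kind of pattern these counts would pick up. Legitimate minified code shares some of these traits, so the features only become useful once a classifier is trained on labelled examples.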


Obfuscation

• Attackers use obfuscation
  • But so do legitimate vendors (e.g. Google)
  • And large Web 2.0 libraries
• Techniques include
  • Name changes
  • String concatenation (eval)
  • Dynamically loaded/generated/decrypted code (eval)
  • Splitting functionality across files


Malicious Non-Executable Files

• There are a lot of file formats out there – documents, pictures, videos.

• For zero-day attacks, we have no data to compare against.

• Basically this is anomaly detection.
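A very small anomaly detector can be built from known-good files alone: measure a feature, learn its normal range, and flag values far outside it. The sketch below uses byte entropy and a z-score test; both the feature and the threshold are assumptions chosen for illustration, not the method described in the talk.

```python
import math
import statistics
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


class ZScoreDetector:
    """Flags values that lie far from the mean of known-good training values."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def fit(self, good_values):
        self.mean = statistics.fmean(good_values)
        self.stdev = statistics.stdev(good_values)
        return self

    def is_anomalous(self, value):
        if self.stdev == 0:
            return value != self.mean
        return abs(value - self.mean) / self.stdev > self.threshold


# Hypothetical entropies of known-good documents of one file type.
detector = ZScoreDetector().fit([4.0, 4.2, 3.9, 4.1, 4.0])
print(detector.is_anomalous(7.9), detector.is_anomalous(4.05))
```

No malicious samples are needed for training, which is the point for zero-day attacks; the cost is that unusual but harmless files are flagged too, which conflicts with the low false positive requirement.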


Development Constraints

• Low False Positive Rate

• Robust
  • Tolerant against malformed data
  • Language-agnostic

• Scalable
  • 1.8 Billion requests per day on 1000 servers

• Low latency
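The scalability figure is easier to reason about as a per-second rate. The arithmetic below uses only numbers quoted in the slides (1.8 billion requests a day, 1000 servers, 32 million blocks a day); the variable names are arbitrary, and the results are daily averages, so peak load will be higher.

```python
REQUESTS_PER_DAY = 1.8e9   # from the slides
BLOCKED_PER_DAY = 32e6     # from the slides
SERVERS = 1000             # from the slides
SECONDS_PER_DAY = 24 * 60 * 60

total_rps = REQUESTS_PER_DAY / SECONDS_PER_DAY
per_server_rps = total_rps / SERVERS
blocked_share = BLOCKED_PER_DAY / REQUESTS_PER_DAY

print(f"{total_rps:,.0f} requests/s overall")
print(f"{per_server_rps:.1f} requests/s per server on average")
print(f"{blocked_share:.2%} of requests blocked")
```

Any classifier added to the real-time path therefore has to keep up with tens of requests per second per server while staying inside the latency budget.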


Back-end processing

[Diagram labels: AM scanners, URL blacklists, AV scanners -> bad; file whitelists, no AV hits, URL whitelists -> good; behavioural features, content features -> ML -> bad / good]

• If a technique is too slow for real-time scanning, that doesn’t make it useless.

• Back end processing can generate lists of good and bad files and help evaluate new techniques.
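The list-building idea can be sketched as a labelling function over slow, offline signals. The signal names, the precedence of "bad" over "good" and the helper functions are assumptions made for this example; the slide's diagram only names the sources.

```python
def backend_label(av_hits, am_hits, url_blacklisted, url_whitelisted, file_whitelisted):
    """Return 'bad', 'good' or None (unknown) from offline signals.

    Assumed precedence: any scanner hit or blacklist entry wins over whitelists.
    """
    if av_hits > 0 or am_hits > 0 or url_blacklisted:
        return "bad"
    if file_whitelisted or url_whitelisted:
        return "good"
    return None


def build_training_lists(samples):
    """Split (sample_id, signals) pairs into lists of good and bad sample IDs."""
    good, bad = [], []
    for sample_id, signals in samples:
        label = backend_label(**signals)
        if label == "good":
            good.append(sample_id)
        elif label == "bad":
            bad.append(sample_id)
    return good, bad


# Hypothetical samples; unlabelled ones are simply left out of both lists.
samples = [
    ("a", dict(av_hits=2, am_hits=0, url_blacklisted=False,
               url_whitelisted=False, file_whitelisted=False)),
    ("b", dict(av_hits=0, am_hits=0, url_blacklisted=False,
               url_whitelisted=True, file_whitelisted=False)),
    ("c", dict(av_hits=0, am_hits=0, url_blacklisted=False,
               url_whitelisted=False, file_whitelisted=False)),
]
print(build_training_lists(samples))
```

Lists produced this way can serve as training labels for the ML stage and as a held-out reference when evaluating a new technique.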


Want to know more?

• Cisco 2Q10 Global Threat Report http://www.cisco.com/web/about/security/intelligence/cisco_threat_072610_959.pdf

• Richard Clayton : Evil on the Internet http://www.securitytube.net/Phishing-(Evil-on-the-Internet)-FOSDEM-Talk-video.aspx

• Kaspersky Lab Security News Service http://threatpost.com/

• A plan for Spam http://www.paulgraham.com/spam.html


Still want to know more?

• Identifying Suspicious URLs : An Application of Large-Scale Online Learning http://videolectures.net/icml09_ma_isu/

• Peter Norvig Google : Statistical Learning as the Ultimate Agile Development Tool http://videolectures.net/cikm08_norvig_slatuad/

• Writing ClamAV Signatures Alain Zidouemba http://www.clamav.net/doc/webinars/Webinar-Alain-2009-03-04.ppt


Take Home Messages

• Web Security
  • Challenging and interesting domain
  • Many applications for Machine Learning
• ScanSafe and Cisco
  • Many opportunities for collaboration
  • Several opportunities for student projects


Any Questions?

rwheeldo@cisco.com