PhishDef: URL Names Say It All
Anh Le, Athina Markopoulou
University of California, Irvine, USA
Michalis Faloutsos
University of California, Riverside, USA
What is Phishing?
Anh Le - UC Irvine - PhishDef 2
• Uses social engineering and technical means to steal consumers’ personal identity, data, etc.
• Causes billions of dollars in losses annually
Most Targeted Industry Sectors, 2nd Quarter 2010 (source: Antiphishing.org)
• Payment Services: 37.9%
• Financial: 33.1%
• Classifieds: 6.6%
• Auction: 5.5%
• Gaming: 4.6%
• Retail/Service: 3.6%
• Social Networking: 2.8%
• Government: 1.3%
• ISP: 1.2%
• Other: 3.4%
Example of a Phishing Site
Current Protection
• Google Safe Browsing
• Microsoft SmartScreen
• Third-Party
Current Protection Model
Motivation: Blacklist-based protection is reactive, so it cannot protect against zero-day phishing
Google Safe Browsing
Outline
o Phishing Background
o Motivation
o Our proposal
  o New Protection Model
  o Learning Algorithms
  o Dataset
  o Feature Selection
  o Evaluation Results
o Concluding Remarks
Our Proposed Protection Model
• Main challenges: Accuracy and Classification Latency
• Which classification algorithm works best?
• Which set of features works best?
Prior Work
o Whittaker et al. [NDSS ’10]
  o Google Safe Browsing
o Ma et al. [SIGKDD ’09]
  o Batch-based Classification
o Ma et al. [ICML ’09]
  o Batch-based vs. Online Learning
Server-Side Classification
Main Contributions
o New Protection Model:
  o Client-side classification
o Propose using Adaptive Regularization of Weights (AROW)
  o High accuracy
  o Resilient to noise
o Set of Lexical Features
  o Fast to extract at client side
  o Obfuscation resistant
Machine Learning Algorithms
• Batch-based Support Vector Machine
• Online Perceptron
• Confidence-Weighted (CW) [Dredze et al., ICML 2008]
• Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]
Online Classification
• Maintain a weight vector and use it for classification
• Online Perceptron
Trained Beforehand Extract In Real Time
Client Side:
Server Side:
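The weight-vector idea above can be sketched with an online Perceptron. This is a minimal illustration, not the speakers' implementation: it assumes labels of +1 (phishing) and -1 (benign), plain Python lists as feature vectors, and a hypothetical class name and toy data stream.

```python
class OnlinePerceptron:
    """Linear classifier updated one example at a time (labels are +1 or -1)."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features  # weight vector kept between examples

    def score(self, x):
        # Dot product of the weight vector and the feature vector.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Classify x, then correct the weights only if the prediction was wrong."""
        y_hat = self.predict(x)
        if y_hat != y:
            self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
        return y_hat


# Toy stream (assumed data): feature 0 marks phishing (+1), feature 1 marks benign (-1).
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
model = OnlinePerceptron(n_features=2)
for x, y in stream:
    model.update(x, y)
```

Classification at the client is a single dot product, which is why latency stays low once the weights are available.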
Online Classification
• Confidence-Weighted (CW): minimum change, enough to correct last mistake
• Adaptive Regularization of Weights (AROW): minimum change, penalty for mistake, increasing confidence
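The formulas for these updates did not survive in the transcript, so the following is a sketch of the AROW update as published by Crammer et al. (NIPS 2009), not a reproduction of the slide. It assumes labels of +1 and -1, a full covariance matrix stored as nested lists, and a regularization constant `r` whose default of 1.0 is an arbitrary choice.

```python
class AROW:
    """Adaptive Regularization of Weights with a full covariance matrix.

    Labels are +1 or -1. The parameter r (> 0) trades off the size of each
    update against the penalty for mistakes.
    """

    def __init__(self, n_features, r=1.0):
        self.r = r
        self.mu = [0.0] * n_features  # mean weight vector
        # Covariance starts as the identity: equal uncertainty on every weight.
        self.sigma = [[1.0 if i == j else 0.0 for j in range(n_features)]
                      for i in range(n_features)]

    def margin(self, x):
        return sum(m * xi for m, xi in zip(self.mu, x))

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1

    def update(self, x, y):
        m = self.margin(x)
        if m * y >= 1.0:
            return  # margin already large enough: no change
        n = len(x)
        sigma_x = [sum(row[j] * x[j] for j in range(n)) for row in self.sigma]
        v = sum(xi * sxi for xi, sxi in zip(x, sigma_x))  # x^T Sigma x
        beta = 1.0 / (v + self.r)
        alpha = max(0.0, 1.0 - y * m) * beta
        # Move the mean toward fixing the error, scaled by current uncertainty.
        self.mu = [mi + alpha * y * sxi for mi, sxi in zip(self.mu, sigma_x)]
        # Shrink the covariance along x: confidence in these features increases.
        self.sigma = [[self.sigma[i][j] - beta * sigma_x[i] * sigma_x[j]
                       for j in range(n)] for i in range(n)]


# Toy usage with assumed data.
arow = AROW(n_features=2, r=1.0)
arow.update([1.0, 0.0], 1)
arow.update([0.0, 1.0], -1)
```

Because the step size `alpha` is damped by `r` and by the accumulated confidence, a single mislabeled example moves the weights less than a CW update, which must fully correct the last mistake.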
Dataset
o Phishing URLs
  o PhishTank (4,082)
  o MalwarePatrol (2,001)
o Benign URLs
  o Open directory (4,012)
  o Yahoo directory (4,143)
o Time period: June 2010
Feature Selection
o Lexical Features
o External Features
  o Country, AS number, registration date, registrant, registrar, etc.
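The slides do not list the individual lexical features, so the following is only a sketch of what extracting features from the URL string alone might look like. The feature names, the delimiter set, and the function `lexical_features` are assumptions for illustration; the two URLs are the ones shown in the deck's phishing-site example.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return illustrative lexical features computed from the URL string only."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path
    features = {
        "url_length": len(url),
        "host_length": len(host),
        "host_dots": host.count("."),
        "path_depth": path.count("/"),
        "longest_path_token": 0,
    }
    # Bag-of-words tokens, kept separate for hostname and path.
    for token in filter(None, re.split(r"[.\-]", host)):
        features["host_token=" + token] = 1
    path_tokens = [t for t in re.split(r"[/.\-_=?&]", path) if t]
    for token in path_tokens:
        features["path_token=" + token] = 1
    if path_tokens:
        features["longest_path_token"] = max(len(t) for t in path_tokens)
    return features


benign = lexical_features("http://www.hmrc.gov.uk/intro-income-tax.htm")
phish = lexical_features(
    "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
)
```

No network lookup is involved, unlike the external features (country, AS number, registration data), which require querying a remote server.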
Outline
o Phishing Background
o Motivation
o Our proposal
  o New Protection Model
  o Learning Algorithms
  o Dataset
  o Feature Selection
  o Evaluation Results
o Concluding Remarks
Evaluation Results: Lexical vs. Full Features
Lexical features alone are better-suited than full features for client-side phishing classification
Full features compared with lexical features alone:
(+) ~1% higher accuracy
(-) Dependency on remote server
(-) Avg. latency: 1.64 s
Evaluation Results: CW vs. AROW
AROW is more resilient to noise than CW
Conclusion: PhishDef
o Client-side phishing classification system
  o Proactive, on-the-fly classification of zero-day phishing URLs
o Low delay at the client side (ms), high accuracy (97%)
o Resilient to noisy data
o Future Work:
  o Develop an add-on for Firefox
Questions
Example of a Phishing Site
http://www.hmrc.gov.uk/intro-income-tax.htm
http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm
Evaluation Results: Batch-Based vs. Online Learning
Online learning outperforms batch-based learning for phishing classification
Chrome 11 > Firefox 4