meerkat seclab image-based object recognition · 2019-12-18 · seclab kevin borgolte meerkat:...

17
Kevin Borgolte Christopher Kruegel Giovanni Vigna seclab THE COMPUTER SECURITY GROUP AT UC SANTA BARBARA August 13th, 2015 USENIX Security 2015 [email protected] [email protected] [email protected] MEERKAT Detecting Website Defacements through Image-based Object Recognition

Upload: others

Post on 22-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

Kevin Borgolte Christopher Kruegel Giovanni Vigna

seclabTHE COMPUTER SECURITY GROUP AT UC SANTA BARBARA

August 13th, 2015 USENIX Security 2015

[email protected] [email protected] [email protected]

MEERKATDetecting Website Defacements through Image-based Object Recognition

Page 2: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2

Defacements

Source: The Register, August 3rd 2015, http://www.theregister.co.uk/2015/08/03/trump_website_hacked/

Page 3: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition

2000

-01

2001

-01

2002

-01

2003

-01

2004

-01

2005

-01

2006

-01

2007

-01

2008

-01

2009

-01

2010

-01

2011

-01

2012

-01

2013

-01

2014

-01

Month

100

1,000

10,000

100,000

1,000,000

Rep

orte

dW

ebsi

tes

Reported Websites per Month

Defacements

Phishing Pages

3

Defacements: Scale• Prolific defacers:

Team System Dz, 2,800 websites in 10 months (~10/day)• Over 4,700 manually-verified defacements each day (Zone-H)• Defacements to Phishing Pages

Average: ~7 to 1Maximum: ~33 to 1

• Over 53,000 websites from top 1 million lists weredefaced in 2014

Page 4: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 4

Approach• Prior work looks at websites’ code, host-based IDS etc.

• Compares to prior version / known good state

• MEERKAT: Visually, like a human analyst • Render website in browser • Take screenshot • Does the screenshot looks like a defacement?

• No previous version of website needed• No manual feature engineering

Page 5: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 5

Approach: Deep Neural Network

Feed-forward with Dropout

1600x900x3

160x160x318x18x3

...

...Local

Receptive Fields

...

L2Pooling

...

...

LocalContrast

Normalization

...

Defaced

Legitimate

ClassificationScreenshot Collection

Window Extraction Feature Learning

Page 6: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 6

Approach: A Window “Into” The Defacement

• Full-size screenshots impractical; window “into” defacement instead• Size of window?

• Too large ⇒ overfit (high variance) • Too small ⇒ underfit (high bias)

• Extract window from which part of the screenshot?

Feed-forward with Dropout

1600x900x3

160x160x318x18x3

...

...Local

Receptive Fields

...

L2Pooling

...

...

LocalContrast

Normalization

...

Defaced

Legitimate

Feed-forward with Dropout

1600x900x3

160x160x318x18x3

...

...Local

Receptive Fields

...

L2Pooling

...

...

LocalContrast

Normalization

...

Defaced

Legitimate

Page 7: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 7

Approach: Representative Window Extraction (1)

Page 8: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition

• Sample windows from 2-dimensional Gaussian distribution • Bias heavily toward center of page • If outside of screenshot, resample

• Why? • Center of page is descriptive! • Non-trivial to poison training set

8

Approach: Representative Window Extraction (2)

Page 9: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 9

Approach: Deep Neural Network• Feature Learning:

Stacked Auto-Encoders • Classification:

Feed-Forward Neural Network with Dropout

• Implemented on-top of Caffe • Trained on GPU, training time in weeks

Page 10: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 10

Approach: Features Learned• Color combinations

• Green on black? Black on white/bright gray?• Letter combinations

• Broken and mixed encodings• Leetspeak

• “pwned” or “h4x0red” • Typographical and grammatical errors

• “greats to” or “visit us in our website”• Defacement group logos

Page 11: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 11

Approach: Detection

Page 12: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 12

Evaluation• Dataset

• 10,053,772 defaced websites = positives • Defaced websites manually verified by Zone-H

• 2,554,905 legitimate websites = negatives • Legitimate websites not verified, might be defaced

• Dataset skewed toward defacements • Report Bayesian detection rate (BDR): P(true positive|positive)

• Unverified legitimate websites ⇒ TPR & BDR are lower-bounds!

Page 13: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 13

Evaluation: Traditional• 10-fold cross-validation

• Results: • TPR: avg. 97.878% [97.422%, 98.375%] • FPR: avg. 1.012% [0.547%, 1.419%] • BDR: avg. 99.716% [99.603%, 99.845%]

• Traditional evaluation has problems: • Same defacement possibly in two bins • Defacements from 1998 vs. 2014

Page 14: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 14

Limitations• Fingerprinting and delayed defacements• Tiny defacements• Huge advertisements• Concept drift (natural and adversarial)

• Major: learn new features from new data (no feature engineering) • Minor: adjust weights of deeper classification layer

Page 15: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 15

Limitations: Minor Concept Drift & Fine-Tuning• Train on Dec 2012 to Dec 2013

• 1.78 million defacements • 1.76 million legitimate pages

• Test on Jan to May 2014

• 1.54 million samples, 50/50 split

• Fine-tune Jan, Feb, Mar, Apr

• BDR in Jan: 98.583% • w/o FT drops to 97.177% • w/ FT increases to 98.717%

• Team System Dz started Jan 2014!

0.9700.9750.9800.9850.9900.9951.000

Tru

ePos

itiv

eR

ate

Time-wise Split, with and without Fine-Tuning

with fine-tuning

without fine-tuning

0.0100.0150.0200.0250.0300.0350.040

Fals

ePos

itiv

eR

ate

January February March April MayMonth of 2014

-0.015-0.010-0.0050.0000.0050.0100.015

Di↵

eren

cew

/F

T-

w/o

FT True Positive Rate

False Positive Rate

Page 16: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

seclab

Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 16

Conclusion• Introduced MEERKAT • Learns features automatically, match domain knowledge • Does not require prior version of website • Outperforms state of the art • Gracefully tackles minor and major concept drift

Page 17: MEERKAT seclab Image-based Object Recognition · 2019-12-18 · seclab Kevin Borgolte Meerkat: Detecting Website Defacements through Image-based Object Recognition 2000-01 2001-01

[email protected] http://kevin.borgolte.me

twitter: @caovc

Thank you for your attention!

Questions?

seclabTHE COMPUTER SECURITY GROUP AT UC SANTA BARBARA