On Security and Sparsity of Linear Classifiers for Adversarial Settings
Posted on 22-Jan-2018
TRANSCRIPT
Pattern Recognition and Applications Lab
University of Cagliari, Italy
Department of Electrical and Electronic Engineering

On Security and Sparsity of Linear Classifiers for Adversarial Settings
Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, Fabio Roli
battista.biggio@diee.unica.it
Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy
S+SSPR, Merida, Mexico, Dec. 1, 2016
http://pralab.diee.unica.it
Recent Applications of Machine Learning
• Consumer technologies for personal applications
New Challenges for Machine Learning
• The use of machine learning opens up big new possibilities but also new security risks
• Proliferation and sophistication of attacks and cyberthreats
– Skilled / economically-motivated attackers (e.g., ransomware)
• Several security systems use machine learning to detect attacks
– but… is machine learning secure enough?
Is Machine Learning Secure Enough?
• Problem: how to evade a linear (trained) classifier?

f(x) = sign(wᵀx)

features: start bang portfolio winner year … university campus
weights w: +2 +1 +1 +1 +1 … −3 −4

Original email x: "Start 2007 with a bang! Make WBFS YOUR PORTFOLIO's first winner of the year..."
x = [1 1 1 1 1 … 0 0]
wᵀx = +6 > 0 → SPAM (correctly classified)

Modified email x′: "St4rt 2007 with a b4ng! Make WBFS YOUR PORTFOLIO's first winner of the year... campus"
x′ = [0 0 1 1 1 … 0 1]
wᵀx′ = +3 − 4 < 0 → HAM (misclassified email)
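The arithmetic on this slide can be reproduced directly. A minimal sketch, keeping only the seven words shown (the slide's "…" columns are dropped):

```python
import numpy as np

# Bag-of-words features: start, bang, portfolio, winner, year, university, campus
w = np.array([+2, +1, +1, +1, +1, -3, -4])  # trained weights, as on the slide

x = np.array([1, 1, 1, 1, 1, 0, 0])      # original spam email
x_mod = np.array([0, 0, 1, 1, 1, 0, 1])  # "St4rt ... b4ng! ... campus"

print(w @ x)      # +6 -> sign(w.x) = +1: classified as SPAM
print(w @ x_mod)  # -1 -> sign(w.x') = -1: misclassified as HAM
```

Obfuscating two high-weight words ("start", "bang") and adding one negatively weighted word ("campus") is enough to flip the sign.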
Evasion of Linear Classifiers
• Formalized as an optimization problem
– Goal: to minimize the discriminant function
• i.e., to be classified as legitimate with maximum confidence
– Constraints on input data manipulation
• e.g., number of words to be modified in each spam email

min_{x′} wᵀx′   s.t.   d(x, x′) ≤ d_max
Dense and Sparse Evasion Attacks
• L2-norm noise corresponds to dense evasion attacks
– All features are modified by a small amount
• L1-norm noise corresponds to sparse evasion attacks
– Few features are significantly modified
min_{x′} wᵀx′   s.t.   ‖x − x′‖₂² ≤ d_max   (dense, l2-norm constrained)
min_{x′} wᵀx′   s.t.   ‖x − x′‖₁ ≤ d_max   (sparse, l1-norm constrained)
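For a linear discriminant f(x) = wᵀx both attacks admit closed forms. A minimal sketch, assuming a budget eps on ‖x − x′‖₂ (dense) and on ‖x − x′‖₁ (sparse), with no box constraints on the features (toy weights, not from the slides):

```python
import numpy as np

def dense_evasion(x, w, eps):
    """l2-constrained: move against the gradient; every feature changes a little."""
    return x - eps * w / np.linalg.norm(w)

def sparse_evasion(x, w, eps):
    """l1-constrained: spend the whole budget on the highest-|weight| feature."""
    x_adv = x.astype(float).copy()
    j = np.argmax(np.abs(w))
    x_adv[j] -= eps * np.sign(w[j])
    return x_adv

w = np.array([0.5, -0.2, 2.0, 0.1])
x = np.array([1.0, 1.0, 1.0, 1.0])
eps = 1.0
print(w @ dense_evasion(x, w, eps))   # discriminant lowered, all features touched
print(w @ sparse_evasion(x, w, eps))  # discriminant lowered, one feature touched
```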
Examples on Handwritten Digits (9 vs 8)
[Figure: original and manipulated digit samples.
Sparse evasion attack (l1-norm constrained): SVM, g(x) = −0.216.
Dense evasion attack (l2-norm constrained): cSVM, g(x) = 0.242.]
Robustness and Regularization [Xu et al., JMLR 2009]

• SVM learning is equivalent to a robust optimization problem

min_{w,b} ½ wᵀw + C Σᵢ max(0, 1 − yᵢ f(xᵢ))   ⟺   min_{w,b} max_{uᵢ∈U} Σᵢ max(0, 1 − yᵢ f(xᵢ + uᵢ))

(½ wᵀw: 1/margin; hinge loss: classification error on training data; uᵢ: bounded perturbation!)
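The equivalence hinges on the inner maximization having a closed form: for ‖uᵢ‖₂ ≤ ε the worst-case perturbation is uᵢ* = −yᵢ ε w/‖w‖₂, which inflates the hinge term by exactly ε‖w‖₂. A quick numerical check (toy numbers, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b, eps = rng.normal(size=5), 0.3, 0.5
x, y = rng.normal(size=5), 1

# Closed-form worst case under ||u||_2 <= eps: u* = -y * eps * w / ||w||_2
u_star = -y * eps * w / np.linalg.norm(w)
worst_hinge = max(0.0, 1 - y * (w @ (x + u_star) + b))

# It equals the nominal hinge loss inflated by eps * ||w||_2
analytic = max(0.0, 1 - y * (w @ x + b) + eps * np.linalg.norm(w))
assert abs(worst_hinge - analytic) < 1e-9
```

This is why the regularizer and the perturbation budget play the same role in the two problems.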
Generalizing to Other Norms
• The optimal regularizer is the dual norm of the noise uncertainty set
– l2-norm regularization is optimal against l2-norm noise!
– Infinity-norm regularization is optimal against l1-norm noise!

min_{w,b} ½ wᵀw + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
min_{w,b} ‖w‖∞ + C Σᵢ max(0, 1 − yᵢ f(xᵢ)),   where ‖w‖∞ = max_{i=1,…,d} |wᵢ|
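Mainstream libraries do not expose an ∞-norm-regularized SVM directly; a minimal subgradient-descent sketch of the formulation above, on made-up linearly separable data (hyperparameters illustrative, not from the paper):

```python
import numpy as np

def train_inf_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """min_{w,b} ||w||_inf + C * sum_i max(0, 1 - y_i (w.x_i + b)), by subgradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                               # samples inside the margin
        g_w = -C * (y[viol, None] * X[viol]).sum(axis=0) # hinge-loss subgradient
        g_b = -C * y[viol].sum()
        if np.any(w):
            j = np.argmax(np.abs(w))                     # subgradient of ||w||_inf:
            g_w[j] += np.sign(w[j])                      # sign on a max-|w_j| coordinate
        w -= lr * g_w
        b -= lr * g_b
    return w, b

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_inf_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

The ∞-norm term only ever shrinks the currently largest |wⱼ|, which is what pushes the solution toward evenly sized weights.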
Interesting Fact
• Infinity-norm SVM is more secure against L1 attacks as it bounds the maximum absolute value of the feature weights
• This explains the heuristic intuition of using more uniform feature weights in previous work [Kolcz and Teo, 2009; Biggio et al., 2010]
[Figure: feature weight plots]
Security vs Sparsity
• Problem: SVM and Infinity-norm SVM provide dense solutions!
• Trade-off between security (to l2 or l1 attacks) and sparsity
– Sparsity reduces computational complexity at test time!

[Figure: feature weight plots]
Elastic-Net Regularization [H. Zou & T. Hastie, 2005]

• Originally proposed for feature selection
– to group correlated features together
• Trade-off between sparsity and security against l2-norm attacks

‖w‖_elnet = (1 − λ)‖w‖₁ + (λ/2)‖w‖₂²
Octagonal Regularization
• Trade-off between sparsity and security against l1-norm attacks

‖w‖_oct = (1 − ρ)‖w‖₁ + ρ‖w‖∞
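Both regularizers are one-liners; a sketch (made-up weight vectors) showing that the octagonal norm, through its ∞-norm term, scores a dense-but-even solution below a sparse-but-uneven one at ρ = 0.5:

```python
import numpy as np

def elastic_net(w, lam):
    """(1 - lam) * ||w||_1 + (lam / 2) * ||w||_2^2"""
    return (1 - lam) * np.abs(w).sum() + (lam / 2) * (w ** 2).sum()

def octagonal(w, rho):
    """(1 - rho) * ||w||_1 + rho * ||w||_inf"""
    return (1 - rho) * np.abs(w).sum() + rho * np.abs(w).max()

w_sparse = np.array([3.0, 0.0, 0.0, 0.0])   # sparse, uneven weights
w_even = np.array([1.0, -1.0, 1.0, -1.0])   # dense, even weights

print(octagonal(w_sparse, 0.5))  # 0.5*3 + 0.5*3 = 3.0
print(octagonal(w_even, 0.5))    # 0.5*4 + 0.5*1 = 2.5
```

ρ (and λ for the elastic net) tunes how strongly even weights are preferred over sparse ones.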
Linear Classifiers
f(x) = wᵀx + b

• SVM (quadratic prog.):
min_{w,b} ½‖w‖₂² + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
• Infinity-norm SVM (linear prog.):
min_{w,b} ‖w‖∞ + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
• 1-norm SVM (linear prog.):
min_{w,b} ‖w‖₁ + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
• Elastic-net SVM (quadratic prog.):
min_{w,b} (1 − λ)‖w‖₁ + (λ/2)‖w‖₂² + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
• Octagonal SVM (linear prog.):
min_{w,b} (1 − ρ)‖w‖₁ + ρ‖w‖∞ + C Σᵢ max(0, 1 − yᵢ f(xᵢ))
Security and Sparsity Measures
• Sparsity
– S: fraction of weights equal to zero

S = (1/d) · |{wₖ : wₖ = 0, k = 1, …, d}|

• Security (Weight Evenness)
– E = 1/d if only one weight is different from zero
– E = 1 if all weights are equal in absolute value

E = (1/d) · ‖w‖₁/‖w‖∞ ∈ [1/d, 1]

• Parameter selection with 5-fold cross-validation, optimizing: AUC + 0.1 S + 0.1 E
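Both measures are direct to compute; a minimal sketch (d = 4, weights made up):

```python
import numpy as np

def sparsity(w):
    """S: fraction of exactly-zero weights."""
    return np.mean(w == 0)

def evenness(w):
    """E = ||w||_1 / (d * ||w||_inf), which lies in [1/d, 1]."""
    d = len(w)
    return np.abs(w).sum() / (d * np.abs(w).max())

w_one = np.array([0.0, 0.0, 0.0, 5.0])     # only one weight is nonzero
w_flat = np.array([1.0, -1.0, 1.0, -1.0])  # all weights equal in absolute value

print(sparsity(w_one), evenness(w_one))    # 0.75, 0.25  (E = 1/d)
print(sparsity(w_flat), evenness(w_flat))  # 0.0, 1.0
```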
Results on Spam Filtering: Sparse Evasion Attack

• 5000 samples from TREC 07 (spam/ham emails)
• 200 features (words) selected to maximize information gain
• Results averaged over 5 repetitions, using 500 TR/TS samples
• (S, E) measures reported in the legend (in %)
[Figure: AUC₁₀% vs. d_max (maximum number of words modified in each spam).
Legend (S, E in %): SVM (0, 37), ∞-norm (4, 96), 1-norm (86, 4), el-net (67, 6), 8gon (12, 88).]
Results on PDF Malware Detection: Sparse Evasion Attack
• PDF: hierarchy of interconnected objects (keyword/value pairs)
• Features: keyword count (e.g., /Type: 2, /Page: 1, /Encoding: 1, …)
• 11,500 samples; 5 repetitions, 500 TR/TS samples
• 114 features (keywords) selected with information gain

Example objects: 13 0 obj << /Kids [1 0 R 11 0 R] /Type /Page ... >> endobj   17 0 obj << /Type /Encoding ... >> endobj

[Figure: AUC₁₀% vs. d_max (maximum number of keywords added to each malicious PDF file).
Legend (S, E in %): SVM (0, 47), ∞-norm (0, 100), 1-norm (91, 2), el-net (55, 13), 8gon (69, 29).]
Conclusions and Future Work
• We have shed light on the theoretical and practical implications of sparsity and security in linear classifiers
• We have defined a novel regularizer to tune the trade-off between sparsity and security against sparse evasion attacks
• Future work
– to investigate a similar trade-off for:
• poisoning (training-time) attacks
• nonlinear classifiers