FeatureSmith - USENIX


FeatureSmith: Learning to Detect Malware by Mining the Security Literature

Benign

Malicious

Security and Machine Learning

T. Dumitraș :: FeatureSmith

Used for detecting spam, phishing, malware, network attacks, malicious domains, vulnerability exploits in the wild, compromised Web sites, …

What does it mean for two samples to be similar?

Features in Machine Learning Models

•  How should we compare samples?
–  Spam: keywords, features from email header, …
–  AI bots: vocabulary, sentence structure, …

It takes one to know one? Feature engineering

Running Example: Android Malware Detection

•  How should we compare samples?
–  Permissions
    •  Protect sensitive data and functionality
    •  Does not work for privilege escalation
–  API method calls
    •  Reveal malware behaviors
•  Feature engineering
–  Use domain knowledge to identify useful features
–  Must consider threat semantics

The Security Body of Knowledge

•  Growing volume of papers, industry reports, blogs, …

Difficult to assimilate all relevant knowledge

Dilemma

Growing body of knowledge VS. Need for good features

Can we engineer features automatically, by mining security papers?

Can we create an artificial intelligence that helps us build other intelligent systems?

Security Threats in Natural Language

•  “The Zsone malware is designed to send SMS messages to certain premium numbers”*

SMS fraud

*Zhou et al. ‘Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets,’ NDSS 2012.

Security Threats in Natural Language

•  “GingerMaster […] is often bundled with benign applications and tries to gain root access”*

Evasion, privilege escalation

*Arp et al. ‘Drebin: Effective and Explainable Detection of Android Malware in Your Pocket,’ NDSS 2014.

Plato’s Allegory of the Cave

Illustration by John D’Alembert

Domain Knowledge

Challenge #1

Understanding the semantic meaning

–  Based on common sense, knowledge of security domain

Challenge #2

Attacker behaviors keep evolving

–  Security arms race
–  Must discover open-ended behaviors

IEEE Security and Privacy Symposium

Intuition for Automatic Feature Engineering

Malware behaviors* and the features (suspicious API calls) that indicate them:

•  Access sensitive data: getDeviceId, getSubscriberId
•  Communicate over network: execHttpRequest, setWifiEnabled
•  Execute external commands: Runtime.exec

*Arp et al. NDSS ’14
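To make the intuition concrete, here is a minimal sketch of how suspicious API calls could become binary features for a classifier. The grouping of calls under behaviors follows the Drebin examples cited on this slide; the names BEHAVIOR_TO_APIS, feature_vector, and behaviors_of are hypothetical and are not part of FeatureSmith or Drebin.

```python
# Hypothetical mapping from malware behaviors to suspicious API calls,
# using only the examples named on the slide (after Arp et al., NDSS '14).
BEHAVIOR_TO_APIS = {
    "access sensitive data": ["getDeviceId", "getSubscriberId"],
    "communicate over network": ["execHttpRequest", "setWifiEnabled"],
    "execute external commands": ["Runtime.exec"],
}

# Fixed feature order, so every app maps to a vector of the same shape.
FEATURES = [api for apis in BEHAVIOR_TO_APIS.values() for api in apis]


def feature_vector(api_calls):
    """Return a 0/1 vector marking which suspicious API calls an app uses."""
    used = set(api_calls)
    return [1 if api in used else 0 for api in FEATURES]


def behaviors_of(api_calls):
    """Return the behaviors suggested by the API calls an app uses."""
    used = set(api_calls)
    return [b for b, apis in BEHAVIOR_TO_APIS.items() if used & set(apis)]


# Example: an app that reads the device ID and spawns a shell command.
app_calls = ["onCreate", "getDeviceId", "Runtime.exec"]
print(feature_vector(app_calls))
print(behaviors_of(app_calls))
```

Calls that are not in the feature list (here, onCreate) are simply ignored, which is why choosing the right list matters so much.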

Behavior Extraction

•  Behavior
–  Description of malware activity
–  Short phrase
    •  <subject?, verb, object?>
    •  Parse grammatical structure of sentences

“The Zsone malware is designed to send SMS messages to certain premium numbers”*

•  Zsone malware send SMS messages
•  designed Zsone malware
•  Zsone malware send to certain premium numbers

*Zhou et al. NDSS ’12
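The <subject?, verb, object?> extraction can be sketched as follows. This is an illustration only: the grammatical relations are written by hand here, whereas a real system would obtain them from a dependency parser, and the relation labels "subj" and "obj", the Behavior type, and both function names are assumptions rather than FeatureSmith's actual interface.

```python
from typing import NamedTuple, Optional


class Behavior(NamedTuple):
    subject: Optional[str]  # may be missing
    verb: str
    obj: Optional[str]      # may be missing


def extract_behaviors(relations):
    """Turn (verb, relation, phrase) triples into <subject?, verb, object?> tuples.

    `relations` stands in for parser output; relation is "subj" or "obj".
    """
    subjects, objects, verbs = {}, {}, []
    for verb, relation, phrase in relations:
        if verb not in verbs:
            verbs.append(verb)
        if relation == "subj":
            subjects[verb] = phrase
        elif relation == "obj":
            objects.setdefault(verb, []).append(phrase)
    behaviors = []
    for verb in verbs:
        # One behavior per object; a verb with no object still yields one tuple.
        for obj in objects.get(verb) or [None]:
            behaviors.append(Behavior(subjects.get(verb), verb, obj))
    return behaviors


def as_phrase(behavior):
    """Render a behavior as a short phrase, skipping missing parts."""
    return " ".join(part for part in behavior if part)


# Hand-written relations for the Zsone sentence quoted on the slide.
zsone_relations = [
    ("designed", "obj", "Zsone malware"),
    ("send", "subj", "Zsone malware"),
    ("send", "obj", "SMS messages"),
    ("send", "obj", "to certain premium numbers"),
]
for behavior in extract_behaviors(zsone_relations):
    print(as_phrase(behavior))
```

With these hand-written relations the sketch produces the same three phrases listed on the slide, although in a different order.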

Behavior Understanding

•  Link behaviors to concrete features

“API calls for accessing sensitive data, such as getDeviceId()”*

accessing sensitive data

getDeviceId()

*Arp et al. NDSS ’14

Behavior Understanding

•  Link behaviors to malware

Zsone malware is designed to send SMS messages to premium numbers

Zsone    send SMS messages

*Zhou et al. NDSS ’12

Semantic Network

•  Nodes: security concepts
–  Malware families: named entities
–  Concrete features: named entities
–  Behaviors: open ended
•  Edges: semantically related concepts
–  Weights based on distance and co-occurrence
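The slide says only that edge weights depend on distance and co-occurrence. A minimal sketch, assuming each co-occurrence of two concepts in a sentence contributes the inverse of their token distance; this weighting formula, the function name build_network, and the input format are assumptions, not FeatureSmith's actual scheme.

```python
from collections import defaultdict
from itertools import combinations


def build_network(sentences):
    """Build an undirected weighted graph of co-occurring concepts.

    `sentences` is a list of sentences, each a list of (concept, token_position)
    pairs. Returns {frozenset({a, b}): weight}. Closer mentions and repeated
    co-occurrences both increase the weight (an assumed scheme).
    """
    weights = defaultdict(float)
    for mentions in sentences:
        for (a, pos_a), (b, pos_b) in combinations(mentions, 2):
            if a == b:
                continue  # a concept is not linked to itself
            distance = max(abs(pos_a - pos_b), 1)
            weights[frozenset((a, b))] += 1.0 / distance
    return dict(weights)


# Toy corpus: concept mentions with made-up token positions.
corpus = [
    [("Zsone", 1), ("send SMS messages", 5)],
    [("Zsone", 0), ("send SMS messages", 4)],
    [("send SMS messages", 0), ("sendTextMessage", 2)],
]
for edge, weight in build_network(corpus).items():
    print(sorted(edge), weight)
```

Using frozenset keys keeps the edges undirected, so the order in which two concepts appear in a sentence does not matter.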

Semantic Network Example

[Figure: semantic network with three columns of nodes joined by weighted edges; edge weights shown: 1, 1, 0.25, 0.75]

Malware: Zsone, Zitmo

Behavior: Send SMS message, Identify execution path, Extract sender phone number, Open manifest file

Feature: SEND_SMS, sendTextMessage, Thread.start, createFromPdu, openXmlResourceParser
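One way to read such a network is to rank features by how strongly they connect to a malware family through shared behaviors. The sketch below scores each feature by summing the products of edge weights along malware, behavior, feature paths. This scoring rule is an assumption chosen for illustration and may differ from the ranking used in the paper. The edge weights are also illustrative: the transcript does not preserve which edge each weight in the figure belongs to.

```python
def rank_features(malware, malware_to_behavior, behavior_to_feature):
    """Rank features for one malware family by summed path weight.

    Both arguments are nested dicts: {source: {target: weight}}.
    Returns (feature, score) pairs, highest score first.
    """
    scores = {}
    for behavior, w_mb in malware_to_behavior.get(malware, {}).items():
        for feature, w_bf in behavior_to_feature.get(behavior, {}).items():
            scores[feature] = scores.get(feature, 0.0) + w_mb * w_bf
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative edges only; the assignment of weights to edges is assumed.
malware_to_behavior = {"Zsone": {"send SMS message": 1.0}}
behavior_to_feature = {
    "send SMS message": {"sendTextMessage": 0.75, "SEND_SMS": 0.25},
}
print(rank_features("Zsone", malware_to_behavior, behavior_to_feature))
```

A feature reachable through several behaviors accumulates score from each path, so features tied to many of a family's behaviors rise to the top.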

How Well Does This Work?

Automatic feature engineering
•  FeatureSmith
–  Analyzed 1,068 security papers
–  Automatically engineered 195 features relevant to Android malware
    •  Out of 383 found in the papers

Manual feature engineering
•  Drebin*
–  State-of-the-art Android malware detector
–  Uses 545,334 features
    •  Including 315 suspicious API calls, manually curated

*Arp et al. NDSS ’14

Auto vs. Manual: Experiment

Automatic
•  Features engineered by FeatureSmith

Manual
•  Features used in Drebin

•  Same classification algorithm
•  Same corpus of benign and malicious apps
•  Same feature types
•  Experiment: Compare the two feature sets

Auto vs. Manual: Features

•  FeatureSmith discovered new features
–  getSimOperatorName
–  getNetworkOperatorName
–  getCountry
•  Often used by malware
–  Help detect Gappusin family (not detected by Drebin)

Missing from manually engineered set

Human data scientists cannot assimilate all relevant knowledge

Auto vs. Manual: Detection Performance

[Figure: ROC curves for FeatureSmith and Drebin. x-axis: False Positive Rate (0.00 to 0.10); y-axis: True Positive Rate (0.80 to 1.00)]

Parity with manual features at 1% false positives
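Comparing detectors "at 1% false positives" means reading the true positive rate off each ROC curve at the point where the false positive rate is 0.01. A minimal sketch of that computation, assuming each app has a classifier score (higher means more likely malicious) and a ground-truth label; the function name tpr_at_fpr and the toy scores are hypothetical and are not data from the talk.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Best true positive rate achievable with false positive rate <= max_fpr.

    scores: classifier outputs, higher = more likely malicious.
    labels: 1 for malicious, 0 for benign.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one malicious and one benign sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best, tp, fp, i = 0.0, 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold by one distinct score; ties move together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR only grows from here on
        best = tp / positives
    return best


# Toy scores for two hypothetical feature sets on the same six apps.
labels = [1, 1, 0, 1, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
scores_b = [0.9, 0.8, 0.2, 0.7, 0.4, 0.3]
print(tpr_at_fpr(scores_a, labels, max_fpr=0.0))
print(tpr_at_fpr(scores_b, labels, max_fpr=0.0))
```

Fixing the false positive budget first and then comparing true positive rates mirrors the experiment design: same classifier, same apps, only the feature set changes.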

Knowledge Evolution

[Figure: ROC curves for the feature sets discovered by each year. x-axis: False Positive Rate (0.00 to 0.10); y-axis: True Positive Rate (0.80 to 1.00)]

Feature sets: 2012 (24 features), 2013 (32 features), 2014 (40 features), 2015 (46 features)

Effectiveness of features discovered in different years

Alternatives

•  Feature selection
–  Must enumerate all possible features in advance (e.g. all Android permissions)
•  Representation learning
–  Discovers useful features (representations) from raw data (e.g. using a neural network)
•  Disadvantages
–  Data-driven: may reflect biases in the ground truth
–  No automatic discovery of threat semantics

In A Nutshell

•  Automatic feature engineering
–  Discover semantically meaningful features
    •  Some missing from manually curated set
–  Performance on par with state-of-the-art malware detector
–  Many potential applications
    •  Security: AI bots, threat intelligence, intrusion detection, …
    •  Other fields: biomedical research, IBM’s Watson Q&A system
•  Complements human-driven feature engineering
–  Human data scientists have intuition
–  FeatureSmith can reason over entire body of knowledge
•  Paper and data: http://featuresmith.org

Automated systems can understand the semantics of security concepts

This is a powerful tool for creating attacks and defenses

Thank you!

Tudor Dumitraș @tudor_dumitras

http://featuresmith.org

Acknowledgments:
•  Work with Ziyun Zhu
•  Robot cartoons by Katy Tresedder
