splunklive dc april 2016 - operationalizing machine learning

Copyright © 2016 Splunk Inc.

Operationalizing Machine Learning

Adrish Sannyasi
Staff Solutions Architect, Healthcare
Splunk, Inc.

Dr. Tom LaGatta
Staff Data Scientist
Splunk, Inc.


Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.

In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.


Why do we need ML?


[Figure: a timeline from T minus a few days to T plus a few days: historical data (DB, Hadoop/S3/NoSQL), real-time data (Splunk), statistical models (machine learning).]

Why is this so challenging using traditional methods?

• DATA IS STILL IN MOTION, still in a BUSINESS PROCESS.
• Enrich real-time MACHINE DATA with structured HISTORICAL DATA.
• Make decisions IN REAL TIME using ALL THE DATA.
• Combine LEADING and LAGGING INDICATORS (KPIs).

[Figure: Splunk, with the Security Operations Center, Network Operations Center, and Business Operations Center.]


What is ML?


ML 101: What is it?
• Machine Learning (ML) is a process for generalizing from examples:
  – Examples = example or "training" data
  – Generalizing = building "statistical models" to capture correlations
  – Process = ML is never done; you must keep validating & refitting models

• Simple ML workflow:
  – Explore data
  – FIT models based on data
  – APPLY models in production
  – Keep validating models

"All models are wrong, but some are useful." - George Box
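The explore, FIT, APPLY, validate loop above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a single numeric feature and an ordinary least-squares line. The data and function names are hypothetical, and this is not how Splunk implements the workflow.

```python
from statistics import mean

def explore(xs, ys):
    """Explore: quick summary statistics before modeling."""
    return {"n": len(xs), "mean_x": mean(xs), "mean_y": mean(ys)}

def fit(xs, ys):
    """FIT: ordinary least-squares line y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def apply(model, xs):
    """APPLY: score new data with the fitted model."""
    a, b = model
    return [a + b * x for x in xs]

def validate(model, xs, ys):
    """Validate: mean squared error on data the model has not seen."""
    return mean((p - y) ** 2 for p, y in zip(apply(model, xs), ys))

# Hypothetical data: y is roughly 2x + 1.
train_x, train_y = [1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 9.0, 10.8]
test_x, test_y = [6, 7], [13.1, 14.9]

print(explore(train_x, train_y))
model = fit(train_x, train_y)
print(model, validate(model, test_x, test_y))
# ML is never done: when validation error drifts upward on new data, refit.
```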


3 Types of Machine Learning
1. Supervised Learning: generalizing from labeled data
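Supervised learning needs examples that already carry the answer (the label). A minimal sketch using a nearest-centroid classifier follows. The feature names, labels, and data are hypothetical, and this is only one of many supervised techniques.

```python
from math import dist

def fit_centroids(points, labels):
    """Average the feature vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, point):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical labeled data: (CPU %, error rate) -> "healthy" / "failing"
X = [(20, 0.1), (25, 0.2), (30, 0.1), (85, 2.0), (90, 2.5), (95, 3.0)]
y = ["healthy", "healthy", "healthy", "failing", "failing", "failing"]
model = fit_centroids(X, y)
print(predict(model, (88, 1.8)))  # failing
```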


2. Unsupervised Learning: generalizing from unlabeled data
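Unsupervised learning finds structure without labels. A minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data is shown here. The latency values are hypothetical.

```python
import random

def kmeans_1d(values, k, iterations=20, seed=0):
    """Plain k-means on one-dimensional data; returns the sorted cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical unlabeled response times (ms): no labels, just structure.
latencies = [10, 12, 11, 13, 95, 100, 102, 98]
print(kmeans_1d(latencies, k=2))  # [11.5, 98.75]
```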


3. Reinforcement Learning: generalizing from rewards in time

[Figure: Leitner system; recommender systems]
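The slide cites the Leitner system as an intuition for learning from rewards over time. In this spaced-repetition scheme, a correct answer (reward) moves a card to a box that is reviewed less often, and a miss sends it back to box 1. The sketch below shows only that scheme and is not a full reinforcement-learning algorithm. The review interval of 2**(box-1) days is an assumed, common variant.

```python
def leitner_update(boxes, card, correct, num_boxes=5):
    """Promote a card one box on a correct answer; demote it to box 1 on a miss."""
    current = boxes.get(card, 1)
    boxes[card] = min(current + 1, num_boxes) if correct else 1
    return boxes[card]

def due_today(boxes, day):
    """Assumed schedule: cards in box n are reviewed every 2**(n-1) days."""
    return sorted(card for card, box in boxes.items() if day % (2 ** (box - 1)) == 0)

boxes = {"kmeans": 1, "DBSCAN": 1, "PCA": 1}
leitner_update(boxes, "kmeans", correct=True)
leitner_update(boxes, "kmeans", correct=True)
leitner_update(boxes, "DBSCAN", correct=False)
print(boxes)                # {'kmeans': 3, 'DBSCAN': 1, 'PCA': 1}
print(due_today(boxes, 2))  # ['DBSCAN', 'PCA']  (kmeans is now due every 4 days)
```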


ML Use Cases


IT Ops: Predictive Maintenance
Problem: Network outages and truck rolls cost significant time and money.
Solution: Build a predictive model to forecast outage scenarios, act pre-emptively & learn.

1. Get resource usage data (CPU, latency, outage reports)
2. Explore data and fit predictive models on past and real-time data
3. Apply & validate models until predictions are accurate
4. Forecast resource saturation, demand & usage
5. Surface incidents to IT Ops, who INVESTIGATE & ACT
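Step 4 (forecast resource saturation) can be illustrated by fitting a trend and extrapolating it to a capacity threshold. A minimal sketch follows. It assumes usage grows linearly, which real capacity data often does not (seasonality, step changes). The data and the 90% threshold are hypothetical.

```python
from statistics import mean

def linear_trend(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def saturation_day(days, usage, threshold):
    """Day on which the fitted trend crosses threshold, or None if usage is not rising."""
    slope, intercept = linear_trend(days, usage)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope

days = [0, 1, 2, 3, 4]
peak_cpu = [50, 55, 60, 65, 70]  # hypothetical daily peak CPU %, rising 5 points/day
print(saturation_day(days, peak_cpu, threshold=90))  # 8.0
```

With these numbers the forecast crossing is day 8, four days after the last observation. That lead time is what lets IT Ops act pre-emptively.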


Security: Find Insider Threats
Problem: Security breaches cost significant time and money.
Solution: Build a predictive model to forecast threat scenarios, act pre-emptively & learn.

1. Get security data (data transfers, authentication, incidents)
2. Explore data & fit models
3. Apply & validate models
4. Forecast abnormal behavior, risk scores & notable events
5. Surface incidents to Security Ops, who INVESTIGATE & ACT
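One simple way to turn step 4's "risk scores" into something concrete is to compare each user's activity today against that user's own baseline. The sketch below uses a z-score. The data, user names, and the 3-standard-deviation cutoff are hypothetical assumptions, and production UBA models are far richer.

```python
from statistics import mean, stdev

def risk_score(history, value):
    """z-score of value against the user's own historical baseline."""
    sd = stdev(history)
    return 0.0 if sd == 0 else (value - mean(history)) / sd

# Hypothetical MB transferred per day over the past week.
baseline = {
    "alice": [120, 130, 110, 125, 115, 135, 120],
    "bob": [40, 35, 45, 50, 38, 42, 44],
}
today = {"alice": 128, "bob": 900}

scores = {user: risk_score(baseline[user], today[user]) for user in today}
notable = [user for user, z in scores.items() if z > 3]  # assumed cutoff: 3 standard deviations
print(notable)  # ['bob']
```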


Business Analytics: Predict Customer Churn
Problem: Customer churn costs significant time and money.
Solution: Build a predictive model to forecast possible churn, act pre-emptively & learn.

1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data & fit models
3. Apply & validate models
4. Forecast churn rate & identify customers likely to churn
5. Surface incidents to Business Ops, who INVESTIGATE & ACT
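Step 4 (identify customers likely to churn) is a classic supervised classification task. A minimal sketch using logistic regression trained by batch gradient descent follows. The two features, the training data, and the learning-rate and epoch settings are hypothetical. The ML Toolkit's LogisticRegression is a different (scikit-learn based) implementation.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def churn_probability(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression via batch gradient descent on the average log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = churn_probability((w, b), xi) - yi  # d(log-loss)/d(linear score)
            grad_w = [g + err * xj / n for g, xj in zip(grad_w, xi)]
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical features: (support tickets last month, months since last purchase); 1 = churned.
X = [(0, 1), (1, 0), (0, 2), (5, 6), (4, 8), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
model = fit_logistic(X, y)

customers = {"c1": (0, 1), "c2": (5, 7), "c3": (2, 3)}
ranked = sorted(customers, key=lambda c: churn_probability(model, customers[c]), reverse=True)
print(ranked)  # ['c2', 'c3', 'c1']
```

Ranking customers by probability, rather than applying a hard yes/no cutoff, lets Business Ops work the riskiest accounts first.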


Summary: The ML Process
Problem: <Stuff in the world> costs significant time and money.
Solution: Build a predictive model to forecast <possible incidents>, act pre-emptively & learn.

1. Get all data relevant to the problem
2. Explore data & fit models
3. Apply & validate models
4. Forecast KPIs & notable events associated with the use case
5. Surface incidents to X Ops, who INVESTIGATE & ACT

Operationalize


How To: Operationalize Machine Learning
• Analyst outcomes are DATA to learn from!
• Reinforcement learning is key
• Reduce false positives:
  – Recommend events/incidents to analysts
  – Record analyst outcomes, use them as model input
  – Optimize for true positives & against false positives
• Reward analysts for good analysis:
  – Give bonuses for successful investigations
  – Carrots, not sticks
• Reward machines for good learning:
  – Calibrate data, risk/rewards, analyst outcomes
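Recorded analyst outcomes become labels, and those labels can be used to tune how aggressively incidents are surfaced. A minimal sketch follows. It picks the alert threshold with the best precision (fewest false positives per alert) that still catches a minimum share of confirmed incidents. The scores, verdicts, and the 0.8 recall floor are hypothetical choices.

```python
def precision_recall_at(threshold, outcomes):
    """outcomes: (model_score, analyst_verdict) pairs; verdict True = confirmed incident."""
    flagged = [verdict for score, verdict in outcomes if score >= threshold]
    tp = sum(flagged)
    fp = len(flagged) - tp
    total_true = sum(verdict for _, verdict in outcomes)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / total_true if total_true else 0.0
    return precision, recall, fp

def choose_threshold(outcomes, min_recall=0.8):
    """Highest-precision threshold that still catches min_recall of confirmed incidents."""
    best = None
    for t in sorted({score for score, _ in outcomes}):
        p, r, _ = precision_recall_at(t, outcomes)
        if r >= min_recall and (best is None or p > best[1]):
            best = (t, p)
    return best

# Hypothetical analyst verdicts on previously surfaced events.
outcomes = [(0.95, True), (0.9, True), (0.85, False), (0.8, True),
            (0.6, False), (0.5, True), (0.4, False), (0.3, False)]
print(choose_threshold(outcomes))  # threshold 0.5, precision 2/3
```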


ML with Splunk


Splunk User Behavior Analytics (UBA)
• ~100% of breaches involve valid credentials (Mandiant Report)
• Need to understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
  – Behavior Baselining & Modeling
  – Anomaly Detection (30+ models)
  – Advanced Threat Detection
• E.g., Data Exfiltration Threat:
  – "Saw this strange login & data transfer for user mpittman at 3am in China…"
  – Surface threat to SOC Analysts
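The "strange login at 3am in China" example boils down to behavior baselining: learn what is normal for each user, then score departures from it. A deliberately tiny sketch follows, based only on login hour and country. The history and country codes are hypothetical, and UBA's actual models are not shown here.

```python
from collections import defaultdict

def build_baseline(logins):
    """Record the login hours and countries each user has shown so far."""
    baseline = defaultdict(lambda: {"hours": set(), "countries": set()})
    for user, hour, country in logins:
        baseline[user]["hours"].add(hour)
        baseline[user]["countries"].add(country)
    return baseline

def anomaly_score(baseline, user, hour, country):
    """0 = familiar login; +1 for an unseen hour, +1 for an unseen country (unknown users score 2)."""
    profile = baseline[user]
    return int(hour not in profile["hours"]) + int(country not in profile["countries"])

history = [("mpittman", 9, "US"), ("mpittman", 10, "US"), ("mpittman", 14, "US")]
baseline = build_baseline(history)
print(anomaly_score(baseline, "mpittman", 10, "US"))  # 0
print(anomaly_score(baseline, "mpittman", 3, "CN"))   # 2
```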


Machine Learning in Splunk ITSI
Adaptive Thresholding:
• Learn baselines & dynamic thresholds
• Alert & act on deviations
• Manage 1000s of KPIs & entities
• Stdev/Avg, Quartile/Median, Range
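The three adaptive-thresholding families named above can each be read as "learn bounds from recent KPI history". The sketch below gives simplified interpretations that are assumptions, not ITSI's exact formulas: average ± k standard deviations, median ± k interquartile ranges, and the observed min/max range. The history and k = 2 are hypothetical.

```python
from statistics import mean, median, quantiles, stdev

def adaptive_bounds(values, method, k=2.0):
    """Learn (lower, upper) bounds for a KPI from its recent history."""
    if method == "stdev":       # average +/- k standard deviations
        m, s = mean(values), stdev(values)
        return m - k * s, m + k * s
    if method == "quartile":    # median +/- k interquartile ranges
        q1, _, q3 = quantiles(values, n=4)
        med, iqr = median(values), q3 - q1
        return med - k * iqr, med + k * iqr
    if method == "range":       # observed min and max of the window (k unused)
        return min(values), max(values)
    raise ValueError(f"unknown method: {method}")

def breaches(value, bounds):
    lower, upper = bounds
    return not (lower <= value <= upper)

kpi_history = [10, 12, 11, 13, 12, 11, 10, 12]  # hypothetical KPI samples
for method in ("stdev", "quartile", "range"):
    bounds = adaptive_bounds(kpi_history, method)
    print(method, bounds, breaches(30, bounds), breaches(11, bounds))
```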

Anomaly Detection:
• Find "hiccups" in expected patterns
• Catches deviations beyond thresholds
• Uses the Holt-Winters algorithm
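Holt-Winters forecasts the next value from a smoothed level, trend, and seasonal component. A "hiccup" is a point that misses its one-step forecast by too much. A minimal additive Holt-Winters sketch follows. The smoothing parameters, the tolerance, and the data are hypothetical, and ITSI's own implementation and settings may differ. Because this simple version updates its state with every observed value, one spike can also disturb the forecasts that follow it.

```python
def holt_winters_forecasts(series, m, alpha=0.5, beta=0.1, gamma=0.3):
    """Additive Holt-Winters; returns one-step-ahead forecasts for series[m:].
    Needs at least 2 * m points: the first two seasons initialize level and trend."""
    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)
    season = [y - level for y in first]
    forecasts = []
    for t in range(m, len(series)):
        s = season[t % m]                    # seasonal term from one period ago
        forecasts.append(level + trend + s)  # forecast made before seeing series[t]
        y, prev_level = series[t], level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y - level) + (1 - gamma) * s
    return forecasts

def detect_hiccups(series, m, tolerance):
    """Indices where the observed value misses its one-step forecast by more than tolerance."""
    forecasts = holt_winters_forecasts(series, m)
    return [t for t, f in zip(range(m, len(series)), forecasts) if abs(series[t] - f) > tolerance]

pattern = [10, 20, 30, 20]  # hypothetical KPI with a 4-step season
series = pattern * 5
series[14] = 80             # inject a spike where 30 was expected
print(detect_hiccups(series, m=4, tolerance=15))
```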


ML Toolkit & Showcase
• Splunk-supported framework for building ML Apps
  – Get it for free: http://tiny.cc/splunkmlapp
• Leverages the Python for Scientific Computing (PSC) add-on:
  – Open-source Python data science ecosystem
  – NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power Consumption, Application Usage, Customer Churn & more
• Standard algorithms out of the box:
  – Supervised: Logistic Regression, SVM, Linear Regression, Random Forest, etc.
  – Unsupervised: KMeans, DBSCAN, Spectral Clustering, PCA, Kernel PCA, etc.
• Implement one of 300+ algorithms by editing Python scripts


Splunk ML Algorithms
[Chart axes: Unsupervised / Supervised and Continuous / Categorical; legend: SPL command / ML Toolkit App v1.0]
• Clustering: kmeans, cluster, K-means, DBSCAN, Birch, Spectral Clustering
• Classification: Logistic Regression, Support Vector Machine, Naïve Bayes (Gaussian, Bernoulli), Random Forest Classifier, KNN, Trees
• Regression: Linear Regression, Polynomial Regression, ElasticNet, Ridge, Lasso, Random Forest Regressor
• Decision Trees
• Dimensionality reduction: PCA, Kernel PCA
• Association Analysis: Apriori, FP-Growth
• Hidden Markov Model
• Vectorization: TFIDF
• predict, outliers
• anomalies, anomalydetection


New Goodness in ML Toolkit v1.0
• New Algorithms (Random Forest, Lasso, Kernel PCA, and more…)
• More use cases to explore
• Support added for Search Head Clustering
• Removed 50k limit on model fitting
• Sampling for training/test data
• Guided ML via an ML Assistant, aka Model/Query Builder
• Install on 6.4 Search Head
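"Sampling for training/test data" means holding back part of the data so the model is validated on events it never saw during fitting. Below is a generic sketch of a seeded random split. It is not the toolkit's implementation, and the 25% test fraction and seed are arbitrary assumptions.

```python
import random

def train_test_split(events, test_fraction=0.25, seed=42):
    """Shuffle a copy of the events with a fixed seed, then hold out test_fraction as test data."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
print(len(train), len(test))  # 75 25
```

Fixing the seed makes the split reproducible, so a refit model can be compared fairly against the previous one.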


Building ML Apps


1. Get Data & Find Decision-Makers

[Figure: users (Analysts, Business Users, IT Users); interfaces (ODBC, SDK, API, DB Connect look-ups, ad hoc search, monitor and alert, reports/analyze, custom dashboards); data sources (GPS/cellular, devices, networks, Hadoop, servers, applications, online shopping carts, clickstreams); structured data sources (CRM, ERP, HR, Billing, Product, Finance, Data Warehouse).]


2. Explore Data, Build Searches & Dashboards
• Start with the Exploratory Data Analysis phase
  – "80% of data science is sourcing, cleaning, and preparing the data"
  – Tip: leverage ITSI KPIs, which carry lots of domain knowledge
• For each data source, build a "data diagnostic" dashboard
  – What's interesting? Throw up some basic charts.
  – What's relevant for this use case?
  – Any anomalies? Are thresholds useful?
• Mix data streams & compute aggregates
  – Compute KPIs & statistics with stats, eventstats, etc.
  – Enrich data streams with useful structured data
  – stats count by X Y, where X and Y come from different sources
  – Build new KPIs from what you find
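The "stats count by X Y, where X and Y come from different sources" idea is: enrich one stream with a field from another, then aggregate on both. The Python sketch below shows the concept only and is not SPL. The host-to-data-center lookup and the events are hypothetical.

```python
from collections import Counter

# Stream 1: hypothetical web events (host, HTTP status).
web_events = [("web01", 500), ("web01", 200), ("web02", 500), ("web01", 500)]
# Stream 2: hypothetical structured lookup mapping host -> data center.
host_to_dc = {"web01": "dc-east", "web02": "dc-west"}

def count_by(events, lookup):
    """Enrich each event with the lookup field, then count by (data center, status)."""
    return Counter((lookup.get(host, "unknown"), status) for host, status in events)

counts = count_by(web_events, host_to_dc)
print(counts[("dc-east", 500)])  # 2
```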


3. Fit, Apply & Validate Models
• ML SPL: new grammar for doing ML in Splunk
• fit: fit models based on training data
    [training data] | fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model
• apply: apply models on testing and production data
    [testing/production data] | apply my_model
• Validate Your Model (The Hard Part)
  – Why hard? Because statistics is hard! Also: model error ≠ real-world risk.
  – Analyze residuals, mean squared error, goodness of fit, cross-validate, etc.
  – Take Splunk's Analytics & Data Science Education course
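To make the fit/apply grammar concrete, here is a rough Python analogue of what "fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model" followed by "apply my_model" does conceptually. It fits, stores the model under a name, scores new rows with that name, and checks one validation metric. The MODELS dictionary, the helper names, and the pure-Python normal-equations solver are hypothetical stand-ins. Per the slides, the toolkit itself relies on scikit-learn.

```python
MODELS = {}  # hypothetical stand-in for the model store that "into my_model" writes to

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_linear(rows, target, features, name):
    """Least squares via the normal equations (X^T X) w = X^T y, with an intercept term."""
    X = [[1.0] + [row[f] for f in features] for row in rows]
    y = [row[target] for row in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    MODELS[name] = {"features": features, "coef": solve(XtX, Xty)}

def apply_model(rows, name):
    """Score rows with a previously fitted model, looked up by name."""
    model = MODELS[name]
    intercept, *weights = model["coef"]
    return [intercept + sum(w * row[f] for w, f in zip(weights, model["features"])) for row in rows]

def mse(rows, target, name):
    """Mean squared error of the named model's predictions (one validation check)."""
    preds = apply_model(rows, name)
    return sum((row[target] - p) ** 2 for row, p in zip(rows, preds)) / len(rows)

features = ["feature1", "feature2", "feature3"]
training = [dict(zip(features, v)) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0), (0, 2, 3)]]
for row in training:  # hypothetical KPI with a known linear relationship
    row["costly_KPI"] = 1 + 2 * row["feature1"] + 3 * row["feature2"] - row["feature3"]

fit_linear(training, "costly_KPI", features, "my_model")
print(apply_model([{"feature1": 3, "feature2": 1, "feature3": 2}], "my_model"))  # approximately [8.0]
print(mse(training, "costly_KPI", "my_model"))  # approximately 0.0
```

A near-zero training error only says the model fits the data it saw. As the slide stresses, real validation also needs held-out data, residual analysis, and a check that model error matches real-world risk.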


4. Predict & Act
• Forecast KPIs & predict notable events
  – When will my system have a critical error?
  – In which service or process?
  – What's the probable root cause?
• How will people act on predictions?
  – Is this a Sev 1/2/3 event? Who responds?
  – Deliver via Notable Events or dashboard?
  – Human response or automated response?
• How do you improve the models?
  – Iterate, add more data, extract more features
  – Keep track of true/false positives
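Deciding "Is this a Sev 1/2/3 event? Who responds?" often reduces to an explicit mapping from model output to severity and delivery channel. A minimal sketch follows. The probability cutoffs and channel names are hypothetical and would need to be agreed with the team that responds.

```python
# Hypothetical cutoffs: (minimum predicted probability, severity, delivery channel).
SEVERITY_CUTOFFS = [
    (0.9, "Sev 1", "page on-call"),
    (0.7, "Sev 2", "notable event"),
    (0.5, "Sev 3", "dashboard"),
]

def triage(probability):
    """Map a predicted failure probability to (severity, channel), or None if not surfaced."""
    for cutoff, severity, channel in SEVERITY_CUTOFFS:
        if probability >= cutoff:
            return severity, channel
    return None

print(triage(0.95))  # ('Sev 1', 'page on-call')
print(triage(0.2))   # None
```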


5. Operationalize Your Models
• Operationalizing closes the loop of the ML Process:
  1. Get data
  2. Explore data & fit models
  3. Apply & validate models
  4. Forecast KPIs & events
  5. Surface incidents to Ops team
• When you deliver the outcome, keep track of the response
  – Human-generated response (detailed journal logs, etc.)
  – Machine-generated response (workflow actions, etc.)
  – External knowledge (closed tickets data, DB records, etc.)
• Then operationalize: feed Ops analysis back into the data inputs, and repeat
• Lots of hard work & stats, but lots of value will come out.



Show me the ML!


Next Steps with Splunk ML
• Reach out to your Tech Team! We can help architect ML workflows.
• Lots of ML commands in Core Splunk (predict, anomalydetection, stats)
• ML Toolkit & Showcase: available and free, ready to use
  – Get it for free: http://tiny.cc/splunkmlapp
• Splunk UBA: Applied ML for Security
  – Unsupervised learning of Users & Entities
  – Surfaces Anomalies & Threats
• Splunk ITSI: Applied ML for ITOA use cases
  – Manage 1000s of KPIs & alerts
  – Adaptive Thresholding & Anomaly Detection
• ML New Product Initiative (NPI) Program:
  – Connect with Product & Engineering teams: [email protected]


The 7th Annual Splunk Worldwide Users' Conference
SEPT 26-29, 2016, WALT DISNEY WORLD, ORLANDO, SWAN AND DOLPHIN RESORTS

• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control Room & Clinic, and MORE!

PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!
