SplunkLive DC April 2016 - Operationalizing Machine Learning
TRANSCRIPT
Copyright © 2016 Splunk Inc.
Operationalizing Machine Learning
Adrish Sannyasi, Staff Solutions Architect, Healthcare, Splunk, Inc.
Dr. Tom LaGatta, Staff Data Scientist, Splunk, Inc.
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make.
In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Historical Data, Real-time Data, Statistical Models
DB, Hadoop/S3/NoSQL, Splunk, Machine Learning
T – a few days, T + a few days
Why is this so challenging using traditional methods?
• DATA IS STILL IN MOTION, still in a BUSINESS PROCESS.
• Enrich real-time MACHINE DATA with structured HISTORICAL DATA
• Make decisions IN REAL TIME using ALL THE DATA
• Combine LEADING and LAGGING INDICATORS (KPIs)
Splunk
Security Operations Center
Network Operations Center
Business Operations Center
ML 101: What is it?
• Machine Learning (ML) is a process for generalizing from examples
– Examples = example or “training” data
– Generalizing = building “statistical models” to capture correlations
– Process = ML is never done, you must keep validating & refitting models
• Simple ML workflow:
– Explore data
– FIT models based on data
– APPLY models in production
– Keep validating models
“All models are wrong, but some are useful.” - George Box
3 Types of Machine Learning
3. Reinforcement Learning: generalizing from rewards in time
Leitner System, Recommender systems
IT Ops: Predictive Maintenance
Problem: Network outages and truck rolls cause big time & money expense
Solution: Build predictive model to forecast outage scenarios, act pre-emptively & learn
1. Get resource usage data (CPU, latency, outage reports)
2. Explore data, and fit predictive models on past/real-time data
3. Apply & validate models until predictions are accurate
4. Forecast resource saturation, demand & usage
5. Surface incidents to IT Ops, who INVESTIGATES & ACTS
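To make the "forecast resource saturation" step concrete, here is a minimal Python sketch. It is not a Splunk feature or the presenters' method: the function name `days_until_saturation`, the 100% limit, and the sample disk-usage numbers are all assumptions for illustration, and the model is just a naive extrapolation of the average daily growth.

```python
def days_until_saturation(usage_pct, limit=100.0):
    """Extrapolate average daily growth to estimate days until `limit`.

    `usage_pct` is a list of daily usage readings in percent (hypothetical
    input). Returns None when usage is flat or shrinking.
    """
    if len(usage_pct) < 2:
        raise ValueError("need at least two samples")
    # Average change per day between the first and last reading.
    daily_growth = (usage_pct[-1] - usage_pct[0]) / (len(usage_pct) - 1)
    if daily_growth <= 0:
        return None
    return (limit - usage_pct[-1]) / daily_growth


# Made-up daily disk usage readings, in percent.
disk_usage = [50.0, 52.0, 54.0, 56.0, 58.0]
remaining = days_until_saturation(disk_usage)
print(f"Estimated days until saturation: {remaining}")
```

A real forecast would normally account for seasonality and noise; the point here is only that a fitted trend turns raw usage data into a date someone can act on.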
Security: Find Insider Threats
Problem: Security breaches cause big time & money expense
Solution: Build predictive model to forecast threat scenarios, act pre-emptively & learn
1. Get security data (data transfers, authentication, incidents)
2. Explore data, and fit predictive models on past/real-time data
3. Apply & validate models until predictions are accurate
4. Forecast abnormal behavior, risk scores & notable events
5. Surface incidents to Security Ops, who INVESTIGATES & ACTS
Business Analytics: Predict Customer Churn
Problem: Customer churn causes big time & money expense
Solution: Build predictive model to forecast possible churn, act pre-emptively & learn
1. Get customer data (set-top boxes, web logs, transaction history)
2. Explore data, and fit predictive models on past/real-time data
3. Apply & validate models until predictions are accurate
4. Forecast churn rate & identify customers likely to churn
5. Surface incidents to Business Ops, who INVESTIGATES & ACTS
Summary: The ML Process
Problem: <Stuff in the world> causes big time & money expense
Solution: Build predictive model to forecast <possible incidents>, act pre-emptively & learn
1. Get all relevant data to problem
2. Explore data, and fit predictive models on past/real-time data
3. Apply & validate models until predictions are accurate
4. Forecast KPIs & notable events associated to use case
5. Surface incidents to X Ops, who INVESTIGATES & ACTS
Operationalize
How To: Operationalize Machine Learning
• Analyst outcomes are DATA to learn from!
• Reinforcement learning is key
• Reduce false positives:
– Recommend events/incidents to analyst
– Record analyst outcomes, use as model input
– Optimize for true positives & against false positives
• Reward analysts for good analysis:
– Give bonuses for successful investigations
– Carrots not sticks
• Reward machines for good learning:
– Calibrate data, risk/rewards, analyst outcome
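One way to picture "record analyst outcomes, use as model input" is a tiny calibration loop in Python. This is a sketch under stated assumptions, not how any Splunk product works: the `(score, confirmed)` record format, the function names, the sample data, and the 0.8 precision target are all hypothetical. It picks the lowest alerting threshold whose past alerts were mostly confirmed by analysts, which keeps as many true positives as possible while cutting false positives.

```python
def precision_at(threshold, outcomes):
    """Fraction of alerts scoring at or above `threshold` that analysts confirmed."""
    flagged = [confirmed for score, confirmed in outcomes if score >= threshold]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


def pick_threshold(outcomes, min_precision=0.8):
    """Lowest observed score that still meets `min_precision`, or None."""
    for threshold in sorted({score for score, _ in outcomes}):
        p = precision_at(threshold, outcomes)
        if p is not None and p >= min_precision:
            return threshold
    return None


# Hypothetical history: (model risk score, did the analyst confirm the incident?)
analyst_outcomes = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.75, True), (0.60, False), (0.40, False),
]
threshold = pick_threshold(analyst_outcomes)
print(f"Alert on scores >= {threshold}")
```

Re-running the calibration as new outcomes arrive is what closes the loop between the analysts and the model.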
Splunk User Behavior Analytics (UBA)
• ~100% of breaches involve valid credentials (Mandiant Report)
• Need to understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
– Behavior Baselining & Modeling
– Anomaly Detection (30+ models)
– Advanced Threat Detection
• E.g., Data Exfil Threat:
– “Saw this strange login & data transfer for user mpittman at 3am in China…”
– Surface threat to SOC Analysts
Machine Learning in Splunk ITSI
Adaptive Thresholding:
• Learn baselines & dynamic thresholds
• Alert & act on deviations
• Manage for 1000s of KPIs & entities
• Stdev/Avg, Quartile/Median, Range
Anomaly Detection:
• Find “hiccups” in expected patterns
• Catches deviations beyond thresholds
• Uses Holt-Winters algorithm
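The "Stdev/Avg" and "Quartile/Median" bullets name two common ways to learn a threshold band from a KPI's own history. The Python sketch below illustrates both with the standard library; it is not ITSI's implementation, and the function names, the multipliers `k`, and the sample latency values are assumptions.

```python
from statistics import mean, stdev, quantiles


def stdev_band(history, k=3.0):
    """Band of mean +/- k standard deviations learned from past KPI values."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def quartile_band(history, k=1.5):
    """Band of [Q1 - k*IQR, Q3 + k*IQR]; less sensitive to outliers in history."""
    q1, _, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_deviation(value, band):
    """True when `value` falls outside the learned band."""
    low, high = band
    return value < low or value > high


# Made-up latency readings (ms) used as the learning window.
latency_history = [10, 12, 11, 13, 12, 11, 10, 12]
for new_value in (12, 30):
    print(new_value,
          is_deviation(new_value, stdev_band(latency_history)),
          is_deviation(new_value, quartile_band(latency_history)))
```

Making the threshold "adaptive" then amounts to relearning the band on a rolling window, or per hour of day, instead of fixing it once.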
ML Toolkit & Showcase
• Splunk Supported framework for building ML Apps
– Get it for free: http://tiny.cc/splunkmlapp
• Leverages Python for Scientific Computing (PSC) add-on:
– Open-source Python data science ecosystem
– NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power Consumption, Application Usage, Customer Churn & more
• Standard algorithms out of the box:
– Supervised: Logistic Regression, SVM, Linear Regression, Random Forest, etc.
– Unsupervised: KMeans, DBSCAN, Spectral Clustering, PCA, Kernel PCA, etc.
• Implement one of 300+ algorithms by editing Python scripts
Splunk ML Algorithms
Chart axes: Unsupervised / Supervised, Continuous / Categorical
Chart legend: SPL command, ML Toolkit App v1.0
Clustering:
• kmeans, cluster
• K-means
• DBSCAN
• Birch
• Spectral Clustering
Classification:
• Logistic Regression
• Support Vector Machine
• Naïve-Bayes (Gaussian, Bernoulli)
• Random Forest Classifier
• KNN, Trees
Regression:
• Linear Regression
• Polynomial Regression
• Elastic Net
• Ridge
• Lasso
• Random Forest Regr.
Dimensionality reduction:
• PCA
• Kernel PCA
Association Analysis:
• Apriori
• FP-Growth
Hidden Markov Model
predict, outliers
anomalies, anomalydetection
Vectorization:
• TFIDF
• Decision Trees
New Goodness in ML Toolkit v1.0
• New Algorithms (Random Forest, Lasso, Kernel PCA, and more…)
• More use cases to explore
• Support added for Search Head Clustering
• Removed 50k limit on model fitting
• Sampling for training/test data
• Guided ML via a ML Assistant aka Model/Query Builder
• Install on 6.4 Search Head
1. Get Data & Find Decision-Makers
Analysts, Business Users, IT Users
ODBC, SDK, API, DB Connect, Look-Ups
Ad Hoc Search, Monitor and Alert, Reports/Analyze, Custom Dashboards
GPS/Cellular, Devices, Networks, Hadoop, Servers, Applications, Online Shopping Carts, Clickstreams
Structured Data Sources: CRM, ERP, HR, Billing, Product, Finance, Data Warehouse
2. Explore Data, Build Searches & Dashboards
• Start with the Exploratory Data Analysis phase
– “80% of data science is sourcing, cleaning, and preparing the data”
– Tip: leverage ITSI KPIs – lots of domain knowledge
• For each data source, build “data diagnostic” dashboard
– What’s interesting? Throw up some basic charts.
– What’s relevant for this use case?
– Any anomalies? Are thresholds useful?
• Mix data streams & compute aggregates
– Compute KPIs & statistics w/ stats, eventstats, etc.
– Enrich data streams with useful structured data
– stats count by X Y – where X, Y from different sources
– Build new KPIs from what you find
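The "enrich, then stats count by X Y" idea can be mimicked outside Splunk in a few lines of Python. This is only an analogy to the SPL commands named above, not their implementation; the event fields, the `user_dept` lookup table, and the helper names `enrich` and `count_by` are made up for illustration.

```python
from collections import Counter


def enrich(events, lookup, key, new_field, default="unknown"):
    """Add `new_field` to each event by looking up its `key` value in a table."""
    return [{**event, new_field: lookup.get(event[key], default)} for event in events]


def count_by(events, *fields):
    """Count events grouped by the given fields, like `stats count by X Y`."""
    return Counter(tuple(event[f] for f in fields) for event in events)


# Hypothetical machine data (X = action comes from the event stream) ...
events = [
    {"user": "alice", "action": "login"},
    {"user": "alice", "action": "download"},
    {"user": "bob", "action": "login"},
    {"user": "carol", "action": "login"},
    {"user": "alice", "action": "login"},
]
# ... and hypothetical structured data (Y = dept comes from an HR table).
user_dept = {"alice": "finance", "bob": "it"}

counts = count_by(enrich(events, user_dept, "user", "dept"), "dept", "action")
for (dept, action), n in sorted(counts.items()):
    print(dept, action, n)
```

The grouped counts are a candidate KPI in their own right, for example downloads per department per hour.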
3. Fit, Apply & Validate Models
• ML SPL – New grammar for doing ML in Splunk
• fit – fit models based on training data
– [training data] | fit LinearRegression costly_KPI from feature1 feature2 feature3 into my_model
• apply – apply models on testing and production data
– [testing/production data] | apply my_model
• Validate Your Model (The Hard Part)
– Why hard? Because statistics is hard! Also: model error ≠ real world risk.
– Analyze residuals, mean-square error, goodness of fit, cross-validate, etc.
– Take Splunk’s Analytics & Data Science Education course
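The fit / apply / validate cycle can be sketched in plain Python to show what each step produces. This is an illustration only, not the ML Toolkit's code: it uses a single feature instead of the three in the SPL example, the functions `fit`, `apply`, and `mean_squared_error` are hypothetical stand-ins, and the data is invented.

```python
def fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept on training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def apply(model, xs):
    """Apply a fitted model to new data and return its predictions."""
    return [model["slope"] * x + model["intercept"] for x in xs]


def mean_squared_error(actual, predicted):
    """Average squared residual; lower means a closer fit."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up data: feature1 values and the KPI to predict.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
test_x, test_y = [7, 8], [15.0, 16.9]

my_model = fit(train_x, train_y)                    # fit on training data
predictions = apply(my_model, test_x)               # apply to held-out data
residuals = [a - p for a, p in zip(test_y, predictions)]
test_mse = mean_squared_error(test_y, predictions)  # validate
print(my_model, residuals, test_mse)
```

Validating on data the model never saw during fitting is the important part: a low training error alone says little about how the model will behave in production.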
4. Predict & Act
• Forecast KPIs & predict notable events
– When will my system have a critical error?
– In which service or process?
– What’s the probable root cause?
• How will people act on predictions?
– Is this a Sev 1/2/3 event? Who responds?
– Deliver via Notable Events or dashboard?
– Human response or automated response?
• How do you improve the models?
– Iterate, add more data, extract more features
– Keep track of true/false positives
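Keeping track of true/false positives usually comes down to a running tally of predictions against what actually happened. The Python sketch below shows one minimal way to do that; the function names and the sample labels are assumptions, not part of any Splunk product.

```python
from collections import Counter


def tally(predicted, actual):
    """Count true/false positives and negatives from paired boolean labels."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["tp"] += 1   # predicted an incident, and there was one
        elif p:
            counts["fp"] += 1   # predicted an incident, but nothing happened
        elif a:
            counts["fn"] += 1   # missed a real incident
        else:
            counts["tn"] += 1   # correctly stayed quiet
    return counts


def precision(counts):
    """Share of raised alerts that were real incidents."""
    flagged = counts["tp"] + counts["fp"]
    return counts["tp"] / flagged if flagged else None


def recall(counts):
    """Share of real incidents that were alerted on."""
    real = counts["tp"] + counts["fn"]
    return counts["tp"] / real if real else None


# Hypothetical week of predictions vs. what the Ops team later confirmed.
predicted = [True, True, False, True, False, False]
actual = [True, False, False, True, True, False]
counts = tally(predicted, actual)
print(dict(counts), precision(counts), recall(counts))
```

Watching these two numbers over successive model versions shows whether adding data or features is actually helping.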
5. Operationalize Your Models
• Operationalizing closes the loop of the ML Process:
1. Get data
2. Explore data & fit models
3. Apply & validate models
4. Forecast KPIs & events
5. Surface incidents to Ops team
• When you deliver the outcome, keep track of the response
– Human-generated response (detailed journal logs, etc)
– Machine-generated response (workflow actions, etc)
– External knowledge (closed tickets data, DB records, etc)
• Then operationalize: feed back Ops analysis to data inputs, repeat
• Lots of hard work & stats, but lots of value will come out.
Operationalize
Next Steps with Splunk ML
• Reach out to your Tech Team! We can help architect ML workflows.
• Lots of ML commands in Core Splunk (predict, anomalydetection, stats)
• ML Toolkit & Showcase – available and free, ready to use
– Get it for free: http://tiny.cc/splunkmlapp
• Splunk UBA: Applied ML for Security
– Unsupervised learning of Users & Entities
– Surfaces Anomalies & Threats
• Splunk ITSI: Applied ML for ITOA use cases
– Manage 1000s of KPIs & alerts
– Adaptive Thresholding & Anomaly Detection
• ML New Product Initiative (NPI) Program:
– Connect with Product & Engineering teams - [email protected]
SEPT 26-29, 2016
WALT DISNEY WORLD, ORLANDO
SWAN AND DOLPHIN RESORTS
• 5000+ IT & Business Professionals
• 3 days of technical content
• 165+ sessions
• 80+ Customer Speakers
• 35+ Apps in Splunk Apps Showcase
• 75+ Technology Partners
• 1:1 networking: Ask The Experts and Security Experts, Birds of a Feather and Chalk Talks
• NEW hands-on labs!
• Expanded show floor, Dashboards Control Room & Clinic, and MORE!
The 7th Annual Splunk Worldwide Users’ Conference
PLUS Splunk University
• Three days: Sept 24-26, 2016
• Get Splunk Certified for FREE!
• Get CPE credits for CISSP, CAP, SSCP
• Save thousands on Splunk education!