Copyright © 2016 Splunk Inc.
David Veuve, Staff Security Strategist, Splunk
How to Scale: From _raw to tstats (and beyond!)
Disclaimer
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
How to Use This Presentation
This PDF is intended to be a reference guide, to complement the actual presentation. If you've already dabbled in tstats, feel free to read through. If you're new to tstats, I highly recommend watching the video presentation first. I don't do quite a good enough job with this slide version for it to stand alone. Please find the video recording on the .conf website, maybe mid-to-late October (ask your Splunk team for updates if you don't see it by then).
Agenda
1. Intro
2. David's Story
3. Overview of Techniques (SI, RA, AP, tstats)
4. Data Models – What you need to know
5. How to transition from _raw to tstats
6. When Data Model Acceleration doesn't work
7. Real World Examples
8. Advanced Topics
Personal Introduction
• David Veuve – Staff Security Strategist, Security Product Adoption
• SME for Architecture, Security, Analytics
• [email protected]
• Former Splunk Customer (for 3 years, 3.x through 4.3)
• Primary author of Search Activity app
• Former Talks:
– Security Ninjutsu Part Three: .conf 2016 (This year!)
– Security Ninjutsu Part Two: .conf 2015
– Security Ninjutsu Part One: .conf 2014
– Passwords are for Chumps: .conf 2014
Why all this?
• Getting results fast is great, but only half the puzzle.
• If you / your team are writing searches that will run for 100 CPU hours per day, suppose that's 50% of your cluster's time.
• What if we could shrink that to 10 CPU hours? Your cluster just went from 75% utilized to 30% utilized.
• Search acceleration lowers your TCO
• Search acceleration saves you time waiting
• Search acceleration lets you ask all of the questions
Why this talk? Why now?
• tstats isn't that hard, but we don't have very much to help people make the transition.
• Everything that Splunk Inc. does is powered by tstats.
• I've taught a lot of people in smaller groups about Search Acceleration technologies.
• To the masses!
Who are you?
• You are either a *super* hardcore dev, or you're not brand new to Splunk.
• You've played with SPL. You understand how it works.
• You're probably comfortable with stats.
• People probably come ask you for help building queries or solving problems.
What will you get?
• You'll understand how to make queries that wow people.
• You'll cement yourself as *the* office or user-group search ninja.
• You'll happily learn how easy it is.
David's Story
Just a boy, standing in front of a search command, asking it to show the syntax error.
Where I Started
• Customer at an advertising company
• Was a casual user, when I was handed a Business Analytics project
• Going from tens or hundreds of data points to millions
• Built tiered summary indexes
• Auto-switched between high granularity and low based on selected time windows
• Tons of help from Nick Mealy @ Sideview
Then I Took a Break
• I took two years off of Splunk, missing 5.x and the initial 6.0 release.
• Splunk released Report Acceleration
• Splunk released Data Model Acceleration
I Came to Splunk
• I rebuilt my dashboard. From Splunk 4 to Splunk 6, load time went from 1.5 min to 27 seconds.
• I used Report Acceleration – load time went down to 6 seconds
• But then I had a bunch of different searches running…
I Helped a Finance Company
• They wanted multiple dashboards, drilldown, searches, on 18 key fields in 2000-line XML documents.
• Built an accelerated data model with 18 calculated spath fields
• Used the pivot interface to build dashboards
• 30 day unaccelerated load time would have been 2 days, if I could wait
• 30 day accelerated load time was 15 seconds
I Helped a Health Care Company
• They wanted distinct count of dest_ip per src_ip per day, averaged and stdev'd.
• Running over raw wasn't even considered.
• Depending on the analysis, we can search and process over 1 billion results/minute.
Techniques
It's all about the technique…
Summary Indexing
• Take the search you're running right now, and store the results in a new index. No license required.
• How:
– Just add | collect in your search, specifying destination index (maybe "summary")
– Probably don't want to use sistats, sitop, si-anything. They're not really valuable.
– http://www.davidveuve.com/tech/how-i-do-summary-indexing-in-splunk/
• Examples:
– Store # of logins, # of distinct hosts, # of … per user/device/etc.
– Email logs are horrible and slow to process – store the output
– ITSI Metric searches
Summary Indexing (2)
• Why: You're not accelerating raw events, you're accelerating the result of a search. We can't accelerate a search-based data model. So: summary indexing
• Why not?
– No multiple levels of time granularity
– Manual coordination of summary indexing
– Missed searches
Report Acceleration
• Takes a single saved search, with stats/timechart/top/chart, and pre-computes the aggregates at multiple time buckets (per 10m, per hour, per day, etc., based on your acceleration range).
• Automatically switches between acceleration and raw data access when needed.
• You cannot query the data in ways that you didn't plan for originally
Report Acceleration (2)
• How:
– Go into the saved search configuration and check the Accelerate box
– Decide over what time range you'd like to accelerate
– Keep in mind that longer time ranges => less granularity (so if you choose 1 year, you'll lose 10 min or 1 hr buckets)
• Example
– My exec dashboard needs to load, like, immediately.
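If you prefer configuration files to the UI, the Accelerate checkbox corresponds roughly to settings like these in savedsearches.conf (a sketch; the stanza name and search are hypothetical, and the acceleration range here assumes 30 days):

```ini
[Exec Dashboard Summary]
search = index=web | timechart span=1h count by host
# Turn on Report Acceleration for this saved search
auto_summarize = 1
# Accelerate over the last 30 days
auto_summarize.dispatch.earliest_time = -30d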
Normal Search Example
• You ask for a statistical search
• Indexers return minimum necessary statistics (e.g., an avg needs sum/count)
• SH computes final result (sum(sum)/sum(count))
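To make that concrete: an average is never shipped around as an average. Each indexer contributes a partial sum and count, and the search head divides at the end. You can mimic the decomposition by hand (a sketch; the index and field names are assumptions):

```spl
index=web
| stats sum(bytes) as total_bytes, count as event_count
| eval avg_bytes = total_bytes / event_count
```

This produces the same number as | stats avg(bytes), and it is exactly the shape of the partial statistics the indexers return.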
Report Acceleration Example
• SH regularly requests minimum necessary statistics (e.g., avg needs sum/count) split into time buckets
• Later, when the user requests values, the SH already knows the answer.
Report Acceleration (3)
• Why?
– You've got a small, modest dataset with low split-by cardinality, where you are willing to be crafty to run multiple queries
– Auto fallback to raw logs, auto backfill and recovery, auto time granularity
– SUPER FAST
– Easy
• Why Not?
– Mostly limited to a single search per job
– Only support for basic analytics
– Kinda a black art, not that widely used
Accelerated Pivot
• Drag-and-drop basic stats interface, with the overwhelming power of accelerated data models on the backend
• How:
– Build a data model (more on that later)
– Accelerate it
– Use the pivot interface
– Save to dashboard and get promoted
• Examples
– Your first foray into accelerated reporting
– Anything that involves stats
Accelerated Pivot (2)
• Why?
– Super easy
– Automatically switch between raw logs and accelerated data
– Data Model Acceleration = 💯
• Why Not?
– Not entirely accelerated by default
– Can't go summaries-only in the UI
– Pivot search language is weirder than tstats
tstats
• Operates on accelerated data models or tscollect files (and index-time field extractions, such as source, host, index, sourcetype, and those ITSI or occasional others)
• Can only do stats – no raw logs (today!)
• Is faster than you've ever imagined life to be.
• How:
– Different search syntax, which takes adjustment, but actually really similar to normal stats.
– | tstats count where index=* groupby index sourcetype
– Bring a four-point seat harness 'cause we're going FAST
tstats (2)
• Why?
– Distributed indexed field searching with the flexibility of search language to define syntax
– summariesonly=t
– Faster than you've ever been.
Data Models – What You Need to Know
Something clever here…
Data Model Basics
• Essentially anything you can define in props and transforms can go into an accelerated data model
• Only raw events – can't accelerate a data model based on searches, or with transaction, or etc. Go check out summary indexing
• Favorite example: | eval myfield=spath(_raw, "path.to.my.field") is slow. Put that in your data model, and pivot/tstats queries will be super fast
• Next five slides from David Marquardt's .conf 2013 preso: http://conf.splunk.com/session/2013/WN69801_WhatsNew_Splunk_DavidMarquardt_UnderstandingSplunkAccelerationTechnologies.pdf
Splunk Enterprise Index Structure
[Diagram: an index comprises a Home Path (hot/warm buckets such as hot_v1_100, hot_v1_101, db_lt_et_70, db_lt_et_80, db_lt_et_101), a Cold Path, and a Thawed Path. Each bucket holds the compressed rawdata journal plus *.tsidx and *.data files, along with source/sourcetype/host metadata (e.g., 1 source::/my/log, 2 source::/blah). A TSIDX file pairs a LEXICON of terms (apple, beer, coke, ice, java, cream, …) with POSTING lists pointing into events such as "apple pie and ice cream is delicious" and "an apple a day keeps doctor away".]
Raw Data Stored at Offsets

Posting value | Seek address | _time
0             | 42           | 1331667091
1             | 78           | 1331667091
2             | 120          | 1331667091
3             | 146          | 1331667091
4             | 170          | 1331667091
5             | 212          | 1331667091
6             | 240          | 1331667091

Raw events:
Deep likes Bud light
Amrit likes Makers
Ledion likes cognac
Dave likes Jack Daniels
Zhang likes vodka
Deep likes Makers
Dave likes Makers
Raw Data Gets Indexed

Term    | Postings List
Amrit   | 1
Bud     | 0
Daniels | 3
Dave    | 3, 6
Deep    | 0, 5
Jack    | 3
Ledion  | 2
Makers  | 1, 5, 6
Zhang   | 4
cognac  | 2
likes   | 0, 1, 2, 3, 4, 5, 6
light   | 0
vodka   | 4

(Raw events as on the previous slide.)
• Each word in the raw event is indexed
• The TSIDX will store the offset #, and location in the gzip'd journal
• Querying dave makers returns #6
Reading Compressed Rawdata
journal.gz chunk boundaries: 0, 78, 148, 236, 380, 434, 506
Example: Reading offsets (120, 170)
1. Group offsets into residing chunks
– 120 falls into range (78, 148)
– 170 falls into range (148, 236)
2. Read data off disk and decompress
3. Run through field extractions
4. Recheck filters
5. Run calculations
This is disk + CPU EXPENSIVE
Storing Indexed Fields in TSIDX

Term          | Postings List
bar::AB       | 1, 3, 7, 39, 98
bar::cez      | 0, 6, 9, 12
bar::xyz      | 3, 4, 5, 6
baz::1        | 3, 6, 85
baz::2567     | 0, 5
baz::462      | 3, 24, 45
baz::98       | 2, 3, 5, 8, 9
baz::99023    | 1, 5, 6, 76, 99
foo::afdjsi   | 4, 567, 2345
foo::aghdafo  | 2, 234, 6667
foo::bazcxuid | 0, 1, 623, 7777
foo::cef      | 0, 1, 2, 3, 4, 43
foo::zaz      | 4

Big Idea: Use the lexicon as a field-value store!
By simply separating fields and values with "::" we can store sufficient information to run more interesting queries.
Data Model queries don't ever visit raw logs. They live entirely within TSIDX!
How to Transition from _raw to tstats
A whole new world (don't you dare close your eyes)
Process Overview
• Build your data model with whatever fields you could care about
• Start with your raw search
• Identify the aggregation that you want to do
– stats avg(bytes), dc(host), whatever else
• Make the minor syntax adjustments for tstats
Example Without Data Models
Raw: index=* | stats count by index, sourcetype
tstats: | tstats count where index=* groupby index, sourcetype
Example With Data Models
Raw: tag=network tag=traffic | stats dc(dest_ip) by src_ip
tstats: | tstats dc(All_Traffic.dest_ip) from datamodel=Network_Traffic groupby All_Traffic.src_ip
Challenge: Identifying Fields
• What fields are actually in a data model?
• How did I know to use "All_Traffic.dest_ip" instead of "dest_ip" or instead of "Network_Traffic.dest_ip"?
• To figure it out, we can look at the data model definition via pivot, or at the resulting tsidx files via walklex
• Pivot doesn't require SSH access, but still leaves you guessing for parts
• walklex is much more accurate and preferable
Identifying Fields via Pivot
The data model is Network_Traffic, and the root event node is "All_Traffic", so fields should mostly be All_Traffic.fieldname
Identifying Fields via Walklex
• Find the TSIDX file on your indexer (let's assume a data model)
– Path set in your index config, but by default in the index folder
– Usually $SPLUNK_HOME/var/lib/splunk/<INDEX>/datamodel_summary/<BUCKET_ID>/<SEARCH_HEAD_GUID>/<DATAMODEL_NAME>/<TIMERANGE>.tsidx
– Good news: that's by far the hard part
– Example: /opt/splunk/var/lib/splunk/defaultdb/datamodel_summary/1772_813B72E7-6743-4F46-9DE6-536F78929EDD/813B72E7-6743-4F46-9DE6-536F78929EDD/DM_Splunk_SA_CIM_Network_Traffic/1466344886-1466326949-3864670955536478127.tsidx
• Run walklex, either with an empty string "" or a wildcard "*dest_ip*"
– $SPLUNK_HOME/bin/splunk cmd walklex <TSIDX FILE> ""
Example Walklex
Example Walklex for a Particular Field
Example: Distinct Count of Walklex Fields
• /opt/splunk/bin/splunk cmd walklex 1457540473-1457196480-3287925045170504614.tsidx "" | tr -s " " | cut -d " " -f3 | grep "::" | awk -F"::" '{print $1;}' | sort | uniq -c
tstats where clause
• Works surprisingly like the initial search criteria of a raw search
• where index=* sourcetype=pan_traffic OR sourcetype=pan:traffic
– Just like normal search
• | tstats count where index=pan 10.1.1.1
– With non-datamodel data, 10.1.1.1 will be in the tsidx.
• where earliest=-24h
– Note that there is a bug in 6.3, 6.4 where a more restrictive time picker range doesn't override the earliest=… (unlike in raw search – this is a bug)
tstats grouping by
• When grouping by values (e.g., src_ip, sourcetype, etc.) it's like a normal stats … by …
– | tstats count where index=* groupby source, index
• You can also group by time, without using the bucket command
– | tstats count where earliest=-24h index=* groupby index _time span=1h
Bugs and Surprises
• There's a bug in 6.3/6.4 with earliest and latest where tstats doesn't override the time picker, so it's easiest to leave your time picker at all time.
• Sometimes tstats handles where clauses in surprising ways. For example: no underscores in values, no splunk_server_group, no CIDR matches (All_Traffic.dest_ip != 172.16.1.0/24 – fail. All_Traffic.dest_ip != 172.16.1.* – success)
When Data Model Acceleration or tstats Don't Work
A sad, sad day…
On the Output of a stats Command
• Sadly, you can't accelerate a search-based data model, so no luck.
• This is where Summary Indexing comes in
• You can also do index-time field extractions on summary indexes if you're fancy, and then tstats on those!
Workaround: Stats -> SI + Index Time -> tstats
• Creating index-time fields is a hassle, involving fields.conf, props.conf, transforms.conf, but it works on summary-indexed data.
• For example, from ITSI, we index the field indexed_itsi_kpi_id from summary indexed searches (sourcetype: stash_new)

props.conf:
[stash_new]
TRANSFORMS-set_kpisummary_index_fields = set_kpisummary_kpiid

transforms.conf:
[set_kpisummary_kpiid]
REGEX = itsi_kpi_id\s*=\s*([^\s,]+)
WRITE_META = true
FORMAT = indexed_itsi_kpi_id::$1

fields.conf:
[indexed_itsi_kpi_id]
INDEXED = true
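Once the field is indexed, tstats can filter and group by it directly in the summary index (a sketch — the index name itsi_summary is an assumption, not from the deck):

```spl
| tstats count where index=itsi_summary sourcetype=stash_new
    groupby indexed_itsi_kpi_id _time span=1h
```

Because indexed_itsi_kpi_id lives in the tsidx lexicon as field::value terms, this never touches the raw stash events.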
When Your Cardinality is Crazy High
• tstats can process huge numbers of events (billions, trillions, no problem).
• But if we have to store millions of rows in memory based on your split-by, that can be rough
• Example: a 300,000-person company tracks # of logins per user per day over 100 days. 300,000 * 100 = 30M rows, which means writing partial results to disk, and sadness.
• Better approach is to summary index each day, and then use tstats to process those results, either via index-time summarization or DMA
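A sketch of that approach (index and field names are assumptions): a scheduled search writes one small row per user per day, so the 100-day analysis only ever touches the summary rows.

```spl
index=auth action=success earliest=-1d@d latest=@d
| stats count as logins by user
| collect index=summary source="daily_logins"
```

The follow-up analysis then becomes something like index=summary source="daily_logins" earliest=-100d | stats avg(logins) stdev(logins) by user – roughly 30M tiny rows instead of billions of raw events.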
When Any Cardinality is Crazy Crazy High
• tstats efficiency is fundamentally based on the assumption that a particular value will be used a few times.
• If you have millions of events, each with 10 data points, with 10 points of precision such that repeat values are unlikely, your tsidx file will be absolutely massive.
Real World Examples
When things stop being slow, and start getting real.
Splunk(x) – Index Searches
• For running our Splunk internal UBA project, we needed to know what sourcetypes were in the system.
• _raw: index=* earliest=-24h | bucket _time span=1h | stats count by sourcetype, _time
– Time to complete: 68,476 seconds (19 hours)
• tstats: | tstats count where index=* groupby sourcetype _time span=1h
– Time to complete: 6.19 seconds
• Speed Difference: 11,062x (not percent, eleven thousand times faster)
• Query Length difference: 18 characters shorter
Financial Customer XML Use Case
• What Technology?
– Accelerated Data Models with Pivot
• Why?
– Heavy XML parsing meant search queries were terribly slow
– Pivot was very easy to use
• Result
– Very high scale, very happy customer
Financial Customer XML Use Case (2)
• No XML extraction
• Raw: 8.811 seconds
– index=xx-xxxx sourcetype=xxx-xxx splunk_server=myserver01.myserver.local ParticularLogIdentifier host=*ServerType* | timechart count by host
• Accelerated Pivot: 1.25 seconds
– | pivot XXXXXXX YYYYYY count(YYYYYY) AS "Number of Events" SPLITROW _time AS _time PERIOD auto SPLITCOL host FILTER host is "*ServerType*" SORT 100 _time ROWSUMMARY 0 COLSUMMARY 0 NUMCOLS 100 SHOWOTHER 1
• tstats Summaries Only: 0.896 seconds
– | tstats summariesonly=t count from datamodel=XXXXXXX where (nodename=YYYYYY) (YYYYYY.host="*ServerType*") groupby _time
• Speed Difference: 9.9x faster
• Query Length: 18 characters shorter
Financial Customer XML Use Case (3)
• Single XML extraction via spath
• _raw: 299.763 seconds
– index=xx-xxxx sourcetype=xxx-xxx splunk_server=myserver01.myserver.local ParticularLogIdentifier | eval RuleId=spath(_raw, "___path____.___to__._____very_______._____long___._:_xml___._:____._:_____.______.__________.__________.______________.______") | timechart count by RuleId
• Accelerated Pivot: 2.4 seconds
– | pivot XXXXXXX YYYYYY count(YYYYYY) AS "Number of Events" SPLITROW _time AS _time PERIOD auto SPLITCOL RuleId SORT 100 _time ROWSUMMARY 0 COLSUMMARY 0 NUMCOLS 100 SHOWOTHER 1
• tstats summariesonly: 2.04 seconds
– | tstats summariesonly=t count from datamodel=XXXXXXX where (nodename=YYYYYY) groupby RuleId _time
• Speed Difference: about 146.9x faster
• Query Length: 50 characters shorter
Financial Customer XML Use Case (4)
• Heavy XML extraction (mentioned earlier). Searches anonymized…
• An entire dashboard of unaccelerated pivots with lots of XML spath
– Time to complete: 172,800 seconds (2 days)
• An entire dashboard of accelerated pivots
– Time to complete: 16 seconds
• Speed Difference: about 10,000x
• Time Taken to Build 14-Panel Dashboard via Pivot: 15 minutes
ES Endpoint + Proxy + AV
• What Technology?
– ES Data Models + tstats
• Why?
– ES Data Models were already built, and multiple data sources, so tstats append=t
• Result
– Super fast search, highly scalable.
– Data Models make things easier
• Downside
– In this case, a 19-second savings every 15 minutes = a $211 ROI/year on a $300k Splunk infrastructure… maybe not enough?
ES Endpoint + Proxy + AV
• From last year's Security Ninjutsu Part Two, correlating sysmon with proxy and AV data.
• _raw:
[search tag=malware earliest=-20m@m latest=-15m@m | table dest | rename dest as src]
earliest=-20m@m (sourcetype=sysmon OR sourcetype=carbon_black eventtype=process_launch) OR (sourcetype=proxy category=uncategorized)
| stats count(eval(sourcetype="proxy")) as proxy_events count(eval(sourcetype="carbon_black" OR sourcetype="sysmon")) as endpoint_events by src
| where proxy_events > 0 AND endpoint_events > 0
– 21 seconds
ES Endpoint + Proxy + AV (2)
• tstats:
| tstats prestats=t summariesonly=t count(Malware_Attacks.src) as malwarehits from datamodel=Malware where Malware_Attacks.action=allowed groupby Malware_Attacks.src
| tstats prestats=t append=t summariesonly=t count(web.src) as webhits from datamodel=Web where web.http_user_agent="shockwave flash" groupby web.src
| tstats prestats=t append=t summariesonly=t count(All_Changes.dest) from datamodel=Change_Analysis where sourcetype=carbon_black OR sourcetype=sysmon groupby All_Changes.dest
| rename web.src as src Malware_Attacks.src as src All_Changes.dest as src
| stats count(Malware_Attacks.src) as malwarehits count(web.src) as webhits count(All_Changes.dest) as process_launches by src
– 2 seconds
ES Endpoint + Proxy + AV (3)
• Speed Difference: 10.5x
– It doesn't always have to be 10,000x. 10x or even 3x is still a huge reduction in resources.
• Query Length difference: 282 characters longer
– Multiple namespaces can make things longer, and also maybe more complicated sometimes. Worth it though.
Advanced Topics
Because it's been straightforward so far, right?
allow_old_summaries and summariesonly
• These two settings are perhaps the most important to tstats.
• summariesonly means that we won't automatically fall back to raw data – this means fast results, and much more of a difference than you would probably expect. If searching 100 days of data, and 15 minutes aren't accelerated, we probably don't care.
• allow_old_summaries is key for two scenarios:
– You leverage the Common Information Model, which is periodically updated, and you want to be able to search data from an earlier version (very likely)
– You have multiple apps with different global config sharing settings, and you want to search from an app that didn't *generate* the data model originally.
allow_old_summaries and summariesonly (2)
• While these settings are automatically set to true in ES (and probably other Splunk-owned apps), because they are so key you may want to set them to true automatically across the system via limits.conf
• Big impact: pivot will use whatever the default is
– Note: the pivot user interface actually runs tstats. The pivot search command is not impacted – I know, I know.
prestats=t
• tstats can be fed into upstream stats. For example, tstats _time span=… put directly into a graph looks terrible.
• | tstats prestats=t count where index=* groupby _time span=1d index | timechart span=1d count by index
chunk_size
• How much data will be retrieved by tstats from a tsidx file at once
• Tradeoff between memory, sorting, and other factors
• Default value (10000000 – 10MB) is usually the right fit.
– Lowering that could significantly hurt performance.
– For very high cardinality, raising it to 50MB or 100MB may be beneficial
– Worth testing out only for a long-running search you will use regularly
Searching Across Multiple Namespaces
• With normal search, you can use as many different indexes, sourcetypes, etc. as you want, with reckless abandon.
• With tstats, you can use append=t, but it requires prestats=t. Frequently requires munging with eval along the way.
• | tstats prestats=t dc(All_Traffic.dest) from datamodel=Network_Traffic groupby All_Traffic.src
| tstats prestats=t append=t count from datamodel=Malware groupby Malware_Attacks.dest
| eval system=coalesce('All_Traffic.src', 'Malware_Attacks.dest')
| stats dc(All_Traffic.dest), count by system
Searching Across Multiple Namespaces (2)
• If you are querying the same parameters in the first and second query, such as comparing time spans or looking at two counts, use eval with coalesce to define a field
| tstats prestats=t append=t count from datamodel=Malware where earliest=-24h groupby Malware_Attacks.dest
| eval range="current"
| tstats prestats=t append=t count from datamodel=Malware where earliest=-7d latest=-24h groupby Malware_Attacks.dest
| eval range=coalesce(range, "past")
| chart count over Malware_Attacks.dest by range
Searching Across Multiple Namespaces (3)
You can also use different fields, such as count(Malware_Attacks.src), count(web.src), and etc.
• | tstats prestats=t summariesonly=t count(Malware_Attacks.src) as malwarehits from datamodel=Malware where Malware_Attacks.action=allowed groupby Malware_Attacks.src   <- Pull Malware Data
• | tstats prestats=t append=t summariesonly=t count(web.src) as webhits from datamodel=Web where web.http_user_agent="shockwave flash" groupby web.src   <- Pull Web (Proxy) Data
• | tstats prestats=t append=t summariesonly=t count(All_Changes.dest) from datamodel=Change_Analysis where sourcetype=carbon_black OR sourcetype=sysmon groupby All_Changes.dest   <- Pull Endpoint Data
• | rename web.src as src Malware_Attacks.src as src All_Changes.dest as src   <- Normalize Field Names
• | stats count(Malware_Attacks.src) as malwarehits count(web.src) as webhits count(All_Changes.dest) as process_launches by src   <- Do Count
Drilldown
• Drilldowns from tstats queries don't often work correctly
• Best to put that in a dashboard where you can manually define the drilldown
_indextime
• While the Splunk UI doesn't show _indextime normally, you can use it because it is an indexed field. Just | eval _time=_indextime
• You can't do aggregations on it, but you can filter!
• Both the time range picker *AND* _indextime apply:
| tstats count min(_time) as min_time max(_time) as max_time where
  [| stats count as search | eval search="_indextime>" . relative_time(now(), "-7d") | table search]
  index=* groupby _indextime
| eval lag=_indextime - (min_time + max_time)/2
| eval _time=_indextime
| timechart avg(lag)
A Special Note About Time
_time is special with tstats, for a couple of reasons:
• You can't do avg(_time) or range(_time)
• You can do min(_time) and max(_time), and of course groupby _time span=10m (or whatever time)
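For example, a quick data-freshness check per sourcetype can lean entirely on the allowed min/max functions (a sketch; the age_seconds field name is an assumption):

```spl
| tstats min(_time) as first_seen max(_time) as last_seen count
    where index=* groupby sourcetype
| eval age_seconds = now() - last_seen
```

This stays inside the tsidx files, so it runs in seconds even across every index.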
Cardinality
• Data models are phenomenal with split-by cardinality, e.g.:
– | tstats avg(bytes) from datamodel=Network_Traffic groupby All_Traffic.dest_ip
• Data models are less great with overwhelming field cardinality, when tracking metric data
• Round off irrelevant data points. If you have temperature to 7 decimal places, but 1 decimal place is all that actually matters, just accelerate that.
– Don't include the unrounded field in your data model, because then the acceleration will store it and you'll use more disk space.
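In practice that means defining the data model field as the rounded value rather than passing the raw field through (a sketch; the temperature field name is an assumption):

```spl
| eval temperature = round(temperature, 1)
```

At one decimal place the lexicon holds at most a few thousand distinct temperature::value terms over a realistic range; at seven decimal places nearly every event mints a new term, and the tsidx balloons.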
Schema on What?
• Data models are a great combination of schema-on-read and schema-on-write.
• As with everything in Splunk, you can flexibly define and change your schema, rebuild tsidx, etc.
• But for accelerated data models, you get all the performance of schema-on-write… without losing the flexibility to redefine and rebuild as needed.
– Obviously, for VERY large data models, you might not want to wait for a rebuild, but you can apply changes moving forward
Quirks of Data Model Acceleration
• Second-level time compression. You can't look at milliseconds or microseconds for _time without hijinks (separate field and separate filtering)
• Requires stats. It's called tstats for a reason – there's no tstats raw in 6.4.
• The | datamodel search command was the devil < 6.4 – much better in the newest release
• Interrogating fields is a hassle
• TSIDX trades disk space for performance
Summary
Let's pull it all together, team
• Getting started: use accelerated pivot on data models
• Getting started w/ tstats: use tstats on normal indexed data
– counting events
– looking for index-time lag
• tstats is actually really easy
• That said, there are some weird quirks.
– Check out the PDF
THANK YOU