Post on 21-Jun-2020

Copyright © 2014 Splunk Inc.

Mark Runals, Sr Security Engineer, The Ohio State University

Taming Your Data

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Agenda

• OSU Splunk deployment: environmental background
• Props/field extraction score methodology
• Look at the Data Curator app

FYI: Splunk admin focused presentation

Some Background & Program Drivers

OSU environment: 135 distributed IT units around OSU
• Each group is autonomous
• No standardization
• Huge variety of technologies
• Splunk use not mandatory

Desired lightweight onboarding process
• For units & for Splunk team

OSU environment + lightweight onboarding = incredible roll-on/adoption rate

Fast Forward a Year or 2 +/-

• 2TB of data
• 1,800+ Splunk agents
• 10k devices
• 12 types of firewalls
• Multiple OS
• 90+ teams with data in Splunk
• 700+ sourcetypes, many 'learned'
• 350+ people

Questions this raises:
• Is data being ingested correctly?
• What fields have been defined? Where?
• What types of data are in Splunk?
• What's not configured correctly?

Issue Overview

Out of the box and without specific data definition, Splunk will generally ingest data correctly:
• Host names
• Sourcetypes
• Timestamp
• Line breaking
• Auto key-value fields

At best, though, this isn't efficient. At worst, it can strain your deployment and may drop/lose events.

Factors in play
• Hardware
• Ratio of indexers to total log volume
• Sourcetype velocity
• Data distribution (forwarders pre 5.0.4 will favor the first indexer listed in autoLB outputs.conf)
• Weird date/time information in your logs
• Etc.

Data Import/Definition Pipeline (Mark's View)

Get Data to Splunk → Data Management → Knowledge Management

DM = Index Time Processing
• Sourcetyping
• Line breaking
• Timestamp
• Host field
• etc.

KM = Search Time Processing
• Base level field extraction
• Normalized field names
• Field name alignment within Common Information Model (CIM)
• Knowledge objects

The Plan

• Data Management: score based on the 'Getting Data in Correctly' .conf 2012 preso
• Knowledge Management: score based on length of fields relative to _raw length (conversation with Kevin Meeks)
• Data Taxonomy: create a way to classify sourcetypes
• Identify Common Issues: munge through internal logs

Data Curator App

Data Management - Props Score

[mah_data_stanza]
TIME_PREFIX =
MAX_TIMESTAMP_LOOKAHEAD =
TIME_FORMAT =
SHOULD_LINEMERGE =
LINE_BREAKER =
TRUNCATE =
TZ =

Props Score - timestamp settings

• TIME_PREFIX set: +1
• MAX_TIMESTAMP_LOOKAHEAD set: +1
• TIME_FORMAT set: +1
• OR DATETIME_CONFIG set: +3

Props Score - SHOULD_LINEMERGE

• SHOULD_LINEMERGE = False: +1

....but what if my data should be merged?

• SHOULD_LINEMERGE = True: +1, AND one of these is populated:
  - BREAK_ONLY_BEFORE
  - MUST_BREAK_AFTER
  - MUST_NOT_BREAK_BEFORE
  - MUST_NOT_BREAK_AFTER

Props Score - LINE_BREAKER

• LINE_BREAKER set: +1
• Default is ([\r\n]+)
• Don't want to line break? ((?!)) or ((*FAIL)) are a couple of options*

* http://answers.splunk.com/answers/106075/each-file-as-one-single-splunk-event

Props Score - TRUNCATE

• TRUNCATE set to anything other than the default: +1
• Left at the default of 10000: +0

Game your score! Set this to anything other than the default, e.g. 10001 or 999999.

Props Score - TZ

• TZ set: +1

If setting this across your environment isn't possible/practical, reduce the max score macro in the app. It's used as a variable.

Macro: props_score_upper_bounds = 7 (reduce to 6)

Props Score - scaling

Max score = 7

Scaled score = (st_score * `props_score_scale`) / `props_score_upper_bounds`

The slide annotates this formula with the value 10.
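The scoring rules on the preceding slides can be sketched in Python. This is only an illustration of the arithmetic, not the app's actual search: the function names are hypothetical, a stanza is assumed to arrive as a plain dict of props.conf keys, LINE_BREAKER is assumed to score whenever it is set explicitly, and the default scale of 10 is an assumption taken from the slide annotation.

```python
# Sketch of the slide's props scoring rules; all names here are hypothetical.
BREAK_SETTINGS = (
    "BREAK_ONLY_BEFORE",
    "MUST_BREAK_AFTER",
    "MUST_NOT_BREAK_BEFORE",
    "MUST_NOT_BREAK_AFTER",
)
DEFAULT_TRUNCATE = "10000"


def is_set(stanza, key):
    """True when the key is present with a non-empty value."""
    return bool(str(stanza.get(key, "")).strip())


def props_score(stanza):
    """Raw props score (0-7) for one sourcetype stanza given as a dict."""
    score = 0
    # Timestamp: DATETIME_CONFIG alone is worth 3, otherwise 1 per setting.
    if is_set(stanza, "DATETIME_CONFIG"):
        score += 3
    else:
        score += sum(
            is_set(stanza, key)
            for key in ("TIME_PREFIX", "MAX_TIMESTAMP_LOOKAHEAD", "TIME_FORMAT")
        )
    # Line merging: False scores outright; True needs a break rule as well.
    merge = str(stanza.get("SHOULD_LINEMERGE", "")).strip().lower()
    if merge in ("false", "0"):
        score += 1
    elif merge in ("true", "1") and any(is_set(stanza, k) for k in BREAK_SETTINGS):
        score += 1
    # Assumption: any explicitly set LINE_BREAKER earns the point.
    if is_set(stanza, "LINE_BREAKER"):
        score += 1
    # TRUNCATE only scores when it differs from the default.
    if is_set(stanza, "TRUNCATE") and str(stanza["TRUNCATE"]).strip() != DEFAULT_TRUNCATE:
        score += 1
    if is_set(stanza, "TZ"):
        score += 1
    return score


def scaled_score(raw, scale=10, upper_bounds=7):
    """(st_score * props_score_scale) / props_score_upper_bounds."""
    return (raw * scale) / upper_bounds
```

Lowering `upper_bounds` to 6 mirrors the macro change suggested for environments where TZ cannot be set everywhere.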

Props Score Caveats

• There are a lot of additional props settings that could be applicable for your data/environment.
• This method/app doesn't address host fields that are incorrect.

syslog → Splunk UF: default host field?

Field Extraction Score Methodology

Example event:

10.10.10.10 - - [20/Aug/2014:13:44:03.151 -0400] "POST /services/broker/phonehome/connection_10.10.10.10_8089_10.10.10.10_TEST-TS_68D82260-CC1D-4203-83CA-6E24F9FE6538 HTTP/1.0" 200 24 - - - 1ms

Length of fields
1. Account for any autokv field names
2. Do convoluted search to get length of fields
3. Account for timestamp in log
4. Get total length

_raw length
1. Remove spaces
2. Remove newline characters
3. Get _raw length

Length of fields / _raw length = % of event that has fields defined

(Slide annotations: lengths of individual fields in the example event: 11, 2, 3, 11, 11, 7, 36, 8, 3, 4)

* Not a great example: Splunk forwarder phonehome logs actually have +100% field length compared to _raw.

Field Extraction Score Methodology - Caveats/Considerations

• Doesn't account for field aliases (they will artificially inflate the score)
• If the field extraction % is over 100, the score is set to 100
• Directionally correct is about the best this will get
• Fields extracted != field value
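As a rough illustration of the ratio described above, the sketch below divides the total length of extracted field values by the length of _raw with spaces and newlines removed, then caps the result at 100. The function name and its inputs are hypothetical; the app itself does this with a Splunk search, and the handling of autokv field names and timestamps is left out here.

```python
import re


def field_coverage(raw, field_values):
    """Percent of _raw covered by extracted field values, capped at 100."""
    # Strip spaces and newlines from _raw before measuring it.
    raw_len = len(re.sub(r"[ \r\n]", "", raw))
    if raw_len == 0:
        return 0.0
    fields_len = sum(len(str(value)) for value in field_values)
    # Aliased or overlapping fields can push the ratio past 100, so cap it.
    return min(100.0, 100.0 * fields_len / raw_len)
```

Passing the same value twice (as a field alias would) shows why the cap is needed: the ratio exceeds 100 even though no extra part of the event is covered.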

Data Taxonomy

Version 1: deprecated out of the box

Designed to answer "What type of data is in Splunk?"

Created a 2nd field classification CSV for several hundred sourcetypes
• Data family
• Data subtype

Very useful, but too many one-to-many relationships based on data use
• netstat: Configuration? Networking?
• Too many "Server *" families: Server Monitoring, Server Information, Server Configuration, Server Performance

Data Taxonomy - Interactive Host Dashboard

(Screenshots: Host A, Host B)

Data Curator App

Goals
• Flexible scoring scale
• Generate aggregate, system maturity scores
• Generate ~accurate individual maturity score
• Show what app/package contained props settings
• Show current props settings
• Highlight issues related to/solvable by props settings
  - Line breaking
  - Timestamp
  - Transforms issues

Take Note!
• Will NOT tell you what the settings should be
• Requires Splunk 6 search head
• Only able to work through issues I saw in my environment; you may have others
• I can troubleshoot my app, not your deployment =)

Deployment At A Glance

Props Score Breakdown

Holy crap!! Lots of work ....but before you slit your wrists

Learned Sourcetypes (-too_small OR -#)

Beware of diminishing returns on working the 'long tail'

Sourcetype Deep Dive Dashboard

Avamar logs
• Not all items factor into the score
• Loaded score based on volume of events per punct; score created on the fly
• Volume of events per punct is a quick way to see how unique the logs in a particular sourcetype are
• Had 75 unique punct
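To make the "events per punct" idea concrete, here is a small Python sketch that builds a punct-like pattern for each event and counts how many events share each pattern. It only approximates Splunk's punct field (letters and digits dropped, spaces shown as underscores, first line only); the function names are hypothetical and the real field may differ in details.

```python
import re
from collections import Counter


def punct_signature(event):
    """Rough punct-like pattern: drop letters/digits, show spaces as '_'."""
    first_line = event.splitlines()[0] if event else ""
    stripped = re.sub(r"[A-Za-z0-9]", "", first_line)
    return stripped.replace(" ", "_")


def punct_counts(events):
    """Count events per punct-like pattern."""
    return Counter(punct_signature(event) for event in events)
```

A sourcetype whose events collapse into a handful of patterns is fairly uniform; one with many patterns, like the 75 mentioned above, mixes several log formats.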

Sourcetype Deep Dive Dashboard - other examples shown
• ABDCB (learned)
• Argus

Identifying Date/Time Issues

These events don't have timestamps!

What if Splunk thinks the last known good timestamp was 6 years ago?

Date/Time Workspace Dashboard

Pre-populated with sourcetypes having issues (DATETIME_CONFIG added to view after screenshot)

(No time information set)

Additional dashboard elements
• Clustered internal logs giving you a level of visibility
• 100 most recent events

Line Breaking/Truncate Workspace Dashboard

Line Breaking Sanity Check Dashboard

Sourcetypes have line breaking set but have multiple line counts in recent events

Set in multiple apps; potential problem down the road?

Query Troubleshooting

Two main scheduled searches are somewhat computationally expensive. The dashboard allows an admin to compare run length & frequency to coverage.

Sourcetype field length percentage query

Extract/Report/Transforms Issues

Example Internal Warning Logs

08-21-2014 08:55:46.348 -0400 WARN  SearchOperator:kv - IndexOutOfBounds invalid The FORMAT capturing group id: id=7, transform_name='Message'

08-21-2014 08:59:02.854 -0400 WARN  SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='extract_cmd_change'

08-21-2014 08:59:03.345 -0400 WARN  SearchOperator:kv - Invalid key-value parser, ignoring it, transform_name='(?i)^(?:[^\|]*\|){3}(?P<dest_domain>[^\|]+)'

...wut? Which app? In props or transforms?

Solution: grep -r through 520+ packages in the deployment-apps directory for 'Message'?
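The grep -r approach can also be sketched in Python for cases where you want the hits as structured data. This is a hypothetical helper, not part of the app: it walks a directory tree, looks only at props.conf and transforms.conf, and returns each line that mentions the name from the warning.

```python
from pathlib import Path


def find_setting(apps_dir, needle, conf_names=("props.conf", "transforms.conf")):
    """Return (path, line number, line) for conf lines that mention needle."""
    hits = []
    for path in sorted(Path(apps_dir).rglob("*.conf")):
        if path.name not in conf_names:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, `find_setting("/opt/splunk/etc/deployment-apps", "Message")` would list every props or transforms line containing Message; the path shown is an assumed install location, so substitute your own.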

Dashboard callouts:
• Only 5 tokens
• Anyone know what the issue is?
• Should be an EXTRACT

KM - Sourcetype Fields Comparison

Bottom of explanatory text. There is a freeform text search box at the top of the dashboard.

App Roadmap

Now
• Props maturity scores
• Field extraction scores
• Issues workspaces
• Data taxonomy

Relatively non-scaling

Next
• Dashboard optimization (i.e. searchTemplate)
• Tag based data taxonomy
• Any initial app bug fixes

After Next
• Tie in data model fields
• Field value?
• Expand issue troubleshooting

Based on community feedback

   

56  

?  

Check  out  the  Forwarder  Health  app  in  Splunkbase  

Blog:  runals.blogspot.com  

.conf  14  updated  Ge8ng  Data  in  Correctly  presentaFon–  Andrew  Duca  

THANK  YOU  

top related