Understanding Splunk Acceleration Technologies

David Marquardt, Senior Software Engineer
#splunkconf
Copyright © 2013 Splunk Inc.



Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.


About Me

• Been coding core splunkd for over 5 years
• Worked on various components:
  – eval/where commands
  – Multi-index search
  – Authentication/authorization
  – Rawdata
  – Now high performance analytics store…


Agenda

• Overview Current Index Structure
• Review How Reporting is Currently Done
• How We Can Do Better
• Demo


Splunk Enterprise Index Structure

[Figure: index directory layout. Labels recoverable from the diagram: indexes IDX 1, IDX 2, IDX 3; Home Path, Cold Path, Thawed Path; buckets hot_v1_100, hot_v1_101, db_lt_et_70, db_lt_et_80, db_lt_et_101; bucket contents *.data, *.tsidx, rawdata; Source/Sourcetype/Host Metadata (1 source::/my/log, 2 source::/blah); TSIDX with a LEXICON (apple, beer, coke, cream, ice, java, …) and POSTING lists; Rawdata holding events such as “apple pie and ice cream is delicious” and “an apple a day keeps doctor away”.]


TSIDX? What?

Time series index
• Inverted index optimized for time
• Two basic components
  – Lexicon
  – Arrays of information about events

Why: Given a time range and query, where’s my matching data?


Lexicon

Raw Events (posting values 0 through 6, in order)

  0  Deep likes Bud light
  1  Amrit likes Makers
  2  Ledion likes cognac
  3  Dave likes Jack Daniels
  4  Zhang likes vodka
  5  Deep likes Makers
  6  Dave likes Makers

Term      Postings List
Amrit     1
Bud       0
Daniels   3
Dave      3,6
Deep      0,5
Jack      3
Ledion    2
Makers    1,5,6
Zhang     4
cognac    2
likes     0,1,2,3,4,5,6
light     0
vodka     4
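As a rough illustration of how a lexicon like the one on this slide comes about, the sketch below builds term-to-postings lists from the seven raw events. It is only a sketch: the whitespace tokenizer and the `build_lexicon` name are assumptions made for the example, not how splunkd segments events.

```python
from collections import defaultdict

# Raw events from the slide; the list position doubles as the posting value.
RAW_EVENTS = [
    "Deep likes Bud light",
    "Amrit likes Makers",
    "Ledion likes cognac",
    "Dave likes Jack Daniels",
    "Zhang likes vodka",
    "Deep likes Makers",
    "Dave likes Makers",
]


def build_lexicon(events):
    """Map each term to the ascending list of posting values that contain it."""
    postings = defaultdict(list)
    for posting_value, event in enumerate(events):
        # Assumption: split on whitespace; each term is recorded once per event.
        for term in set(event.split()):
            postings[term].append(posting_value)
    # Return the terms in sorted order so the lexicon has a stable layout.
    return dict(sorted(postings.items()))


lexicon = build_lexicon(RAW_EVENTS)
for term, plist in lexicon.items():
    print(f"{term:8} {','.join(map(str, plist))}")
```

Because events are visited in posting order, every postings list comes out already sorted, which is what makes the union and intersection steps on the later slides cheap.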


Values Arrays

Posting value   Seek address   _time        host   source   sourcetype
0               42             1331667091   1      1        1
1               78             1331667091   1      1        1
2               120            1331667091   1      1        1
3               146            1331667091   1      1        1
4               170            1331667091   1      1        1
5               212            1331667091   1      1        1
6               240            1331667091   1      1        1

Raw events

  Deep likes Bud light
  Amrit likes Makers
  Ledion likes cognac
  Dave likes Jack Daniels
  Zhang likes vodka
  Deep likes Makers
  Dave likes Makers


Okay, How Do I Search?

Query: likes (vodka OR cognac)

STEP 1: Consult the lexicon, combining postings lists
• Doing an OR? Use a union
• Doing an AND? Use an intersection

  vodka OR cognac = (4) ∪ (2) = (2, 4)
  likes (vodka OR cognac) = (0,1,2,3,4,5,6) ∩ (2, 4) = (2, 4)

We now have the right posting values!
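Step 1 can be sketched with plain Python sets. The postings lists are copied from the lexicon slide; the `postings` helper is a name invented for this example, and real postings lists would be merged in sorted order rather than loaded into sets.

```python
# Postings lists from the lexicon slide (only the terms this query needs).
LEXICON = {
    "likes": [0, 1, 2, 3, 4, 5, 6],
    "vodka": [4],
    "cognac": [2],
}


def postings(term):
    """Return the posting values for a term as a set (empty if the term is unknown)."""
    return set(LEXICON.get(term, []))


# OR maps to a union, AND maps to an intersection.
vodka_or_cognac = postings("vodka") | postings("cognac")
matches = sorted(postings("likes") & vodka_or_cognac)
print(matches)
```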


Converting Posting Values to Events

Posting value   Seek addr   _time        host   source   sourcetype   eventtype
0               42          1331667091   1      1        1            -
1               78          1331667091   1      1        1            -
2               120         1331667091   1      1        1            -
3               146         1331667091   1      1        1            -
4               170         1331667091   1      1        1            -
5               212         1331667091   1      1        1            -
6               240         1331667091   1      1        1            -

STEP 2: Use the values array to look up _time, seek address, host, source, sourcetype for (2, 4)

STEP 3: Use the seek addresses to read rawdata at offsets (120, 170)

  Ledion likes cognac
  Zhang likes vodka

STEP 4: Back to search land; field extractions, lookups, etc.
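Steps 2 and 3 amount to two lookups, sketched below. The values array is taken from the slide; the `RAWDATA` dictionary is a stand-in assumed for this example, since the real rawdata is a compressed journal read by byte offset, and `fetch_events` is a hypothetical helper name.

```python
# Values array from the slide: posting value -> (seek address, _time).
# host, source and sourcetype are all 1 in the example, so they are omitted.
VALUES = {
    0: (42, 1331667091),
    1: (78, 1331667091),
    2: (120, 1331667091),
    3: (146, 1331667091),
    4: (170, 1331667091),
    5: (212, 1331667091),
    6: (240, 1331667091),
}

# Assumption: a dict keyed by seek address stands in for the rawdata journal.
RAWDATA = {
    42: "Deep likes Bud light",
    78: "Amrit likes Makers",
    120: "Ledion likes cognac",
    146: "Dave likes Jack Daniels",
    170: "Zhang likes vodka",
    212: "Deep likes Makers",
    240: "Dave likes Makers",
}


def fetch_events(posting_values):
    """Resolve posting values to (_time, seek address, raw event) tuples."""
    events = []
    for pv in posting_values:
        seek, event_time = VALUES[pv]                    # STEP 2: values array lookup
        events.append((event_time, seek, RAWDATA[seek]))  # STEP 3: read rawdata
    return events


events = fetch_events([2, 4])
for event_time, seek, raw in events:
    print(event_time, seek, raw)
```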


Reading Compressed Rawdata

journal.gz chunk boundaries: 0  78  148  236  380  434  506

Example: Reading offsets (120, 170)

1. Group offsets into residing chunks
     120 falls into range (78, 148)
     170 falls into range (148, 236)
2. Read data off disk and decompress

EXPENSIVE!
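The grouping in step 1 can be sketched as a binary search over the chunk boundaries. This assumes the numbers on the slide are chunk start offsets and that the last one (506) marks the end of the journal; `group_by_chunk` is a name made up for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Chunk boundaries from the slide. Assumption: the final value is the end of the journal.
CHUNK_STARTS = [0, 78, 148, 236, 380, 434, 506]


def group_by_chunk(offsets, starts=CHUNK_STARTS):
    """Group seek offsets by the (start, end) chunk range that contains them."""
    groups = defaultdict(list)
    for off in sorted(offsets):
        i = bisect_right(starts, off) - 1  # index of the last boundary <= off
        if i < 0 or i + 1 >= len(starts):
            raise ValueError(f"offset {off} is outside the journal")
        groups[(starts[i], starts[i + 1])].append(off)
    return dict(groups)


chunks = group_by_chunk([120, 170])
for chunk_range, offsets in chunks.items():
    print(chunk_range, offsets)
```

Grouping first means each chunk is read and decompressed once, no matter how many matching events it holds. The decompression itself is the expensive part the slide points at.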

 


How Expensive?

Example bucket: 521,629 events

Limited to ~175,000 events per second


What Are We Doing in TSIDX Land?

In SQL terms:

  SELECT _time, seekaddr, host, source, sourcetype
  WHERE <some query>

And then we’re off to rawdata and search land

How can we do more here?

  SELECT foo, bar WHERE <some query>

OR even:

  SELECT avg(baz), stdev(baz) WHERE <some query> GROUPBY foo, bar


Indexed Fields

Term            Postings list
bar::AB         1,3,7,39,98
bar::cez        0,6,9,12
bar::xyz        3,4,5,6
baz::1          3,6,85
baz::2567       0,5
baz::462        3,24,45
baz::98         2,3,5,8,9
baz::99023      1,5,6,76,99
foo::afdjsi     4,567,2345
foo::aghdafo    2,234,6667
foo::bazcxuid   0,1,623,7777
foo::cef        0,1,2,3,4,43
foo::zaz        4

Big idea: Use the lexicon as a field value store!

By simply separating fields and values with “::” we can store sufficient information to run more interesting queries
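A minimal sketch of the idea: index each field/value pair as a single "field::value" term. The three events below are hypothetical, loosely modeled on the first postings in the table, and `index_fields` is a name invented for the example.

```python
from collections import defaultdict


def index_fields(events):
    """Build a lexicon whose terms are 'field::value' strings."""
    lexicon = defaultdict(list)
    for posting_value, fields in enumerate(events):
        for name, value in fields.items():
            lexicon[f"{name}::{value}"].append(posting_value)
    return dict(sorted(lexicon.items()))


# Hypothetical events; each dict holds the extracted fields of one event.
EVENTS = [
    {"foo": "cef", "bar": "cez", "baz": 2567},
    {"foo": "cef", "bar": "AB", "baz": 99023},
    {"foo": "cef", "baz": 98},
]

lexicon = index_fields(EVENTS)
for term, plist in lexicon.items():
    print(f"{term:12} {plist}")
```

With the field name and the value both inside the term, the lexicon alone says which events carry which values, so some queries never need to touch rawdata.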


How Does it Work?

Term            Postings list
bar::AB         1,3,7,39,98
bar::cez        0,6,9,12
bar::xyz        3,4,5,6
baz::1          3,6,85
baz::2567       0,5
baz::462        3,24,45
baz::98         2,3,5,8,9
baz::99023      1,5,6,76,99
foo::afdjsi     4,567,2345
foo::aghdafo    2,234,6667
foo::bazcxuid   0,1,623,7777
foo::cef        0,1,2,3,4,43
foo::zaz        4

  SELECT sum(baz) WHERE bar=xyz

• Evaluate query: 3,4,5,6
• Iterate over baz, updating sum for matching events
  – baz::1
      · Sum += 2 * 1
  – baz::2567
      · Sum += 1 * 2567
  – baz::462
      · Sum += 1 * 462
  – baz::98
      · Sum += 2 * 98
  – baz::99023
      · Sum += 2 * 99023
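The walk-through above can be reproduced in a few lines. The postings are copied from the slide (the foo terms are left out because the query never touches them); `sum_field` is a hypothetical helper, and a real implementation would seek to the baz:: range in the sorted lexicon instead of scanning every term.

```python
# "bar" and "baz" terms from the slide's lexicon.
LEXICON = {
    "bar::AB": [1, 3, 7, 39, 98],
    "bar::cez": [0, 6, 9, 12],
    "bar::xyz": [3, 4, 5, 6],
    "baz::1": [3, 6, 85],
    "baz::2567": [0, 5],
    "baz::462": [3, 24, 45],
    "baz::98": [2, 3, 5, 8, 9],
    "baz::99023": [1, 5, 6, 76, 99],
}


def sum_field(lexicon, field, matching):
    """Sum a numeric field over the matching posting values using only the lexicon."""
    prefix = field + "::"
    total = 0
    for term, plist in lexicon.items():
        if not term.startswith(prefix):
            continue
        # How many matching events carry this value?
        count = len(matching.intersection(plist))
        total += count * int(term[len(prefix):])
    return total


matching = set(LEXICON["bar::xyz"])           # evaluate query: {3, 4, 5, 6}
total = sum_field(LEXICON, "baz", matching)   # SELECT sum(baz) WHERE bar=xyz
print(total)
```

Adding up the slide's five increments (2 + 2567 + 462 + 196 + 198046) gives 201273, which is what the sketch prints.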


How Can You Use This in Splunk Enterprise 5.x?

tscollect
• Creates TSIDX files in the indexed fields format
• index=main | fields a, b, c | tscollect namespace=demo
• Only admins can run this
  – indexes_edit capability

tstats
• Runs stats over the TSIDX files in the created namespace
• | tstats avg(a) from demo groupby b, c
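To give a feel for what a query like `| tstats avg(a) from demo groupby b, c` has to compute, here is a sketch over a "field::value" lexicon. The namespace contents are invented for the example, each event is assumed to carry at most one value per field, and `field_values` and `avg_groupby` are hypothetical helpers, not part of Splunk.

```python
from collections import defaultdict

# Hypothetical contents of the "demo" namespace: term -> posting values.
NAMESPACE = {
    "a::10": [0, 2],
    "a::20": [1],
    "a::40": [3],
    "b::x": [0, 1, 3],
    "b::y": [2],
    "c::p": [0, 1, 2],
    "c::q": [3],
}


def field_values(lexicon, field):
    """Invert one field: posting value -> value string."""
    prefix = field + "::"
    values = {}
    for term, plist in lexicon.items():
        if term.startswith(prefix):
            for pv in plist:
                values[pv] = term[len(prefix):]
    return values


def avg_groupby(lexicon, agg_field, group_fields):
    """Average agg_field per combination of group_fields values."""
    columns = [field_values(lexicon, f) for f in group_fields]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pv, value in field_values(lexicon, agg_field).items():
        # In this sketch, events missing any group-by field are skipped.
        if not all(pv in col for col in columns):
            continue
        key = tuple(col[pv] for col in columns)
        sums[key] += float(value)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


result = avg_groupby(NAMESPACE, "a", ["b", "c"])
for key, avg in sorted(result.items()):
    print(key, avg)
```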


Drawbacks to the Splunk Enterprise 5.x Approach

• Only on the search head
• No retention policy or limits
• Manual process
  – How to schedule collect?
  – Timing problems
  – Fault tolerance?
  – Data lag

[Figure labels: $SPLUNK_DB/tsidxstats; Search head; Indexer 1, Indexer 2, Indexer N]


Splunk Enterprise 6: Making it Easy

What data do we want to accelerate?

Create a Data Model

Splunk Enterprise 6: Making it Easy

How do we accelerate that data?

Click The Checkbox!


Introducing the High Performance Analytics Store

• Automatically collected
  – Handles timing issues, backfill…
• Automatically maintained
  – Uses acceleration window
• Stored on the indexers
  – Peer to the buckets
• Fault tolerant collection

[Figure labels: Search head; Indexer 1, Indexer 2, Indexer N]


Completely Transparent!

• No administration overhead
• Missing collection data filled in by search
  – No data lag!
• Analytic queries just get faster!
  – Results come from HPAS first
• Checking acceleration status
  – Data models management page
  – Job inspector


Great, How Do I Use it?

• In pivot: Automatically used when acceleration is on!
• Manually: | tstats … from datamodel=<name> …


What About Report Acceleration?

High Performance Analytics Store
• Accelerates an entire dataset
• Stores field value information
• Nothing pre-computed
• Works well for high-cardinality
• Higher storage costs (~25%)
  – Storage shared by all searches on datamodel
  – Varies by collection: # events, fields, values…

Report Acceleration
• Accelerates a particular search
• Stores results of map step
• Pre-computed aggregate
• Doesn’t help for high-cardinality
• Typically lower storage costs
  – But requires storage per-search


Splunk Enterprise 6: Making it Easy

What if I already have indexed fields?


Bonus!

• You can query existing indexed fields directly!
  – Just omit the ‘FROM’ clause in tstats
  – You can specify indexes in ‘WHERE’ clause
  – Supports search filters
• Don’t forget the default indexed fields!
  – host, source, sourcetype
  – _indextime, linecount, punct
  – date_second, date_minute, etc.


More Information

• Data models
  – http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Managedatamodels
• Acceleration
  – http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Acceleratedatamodels
• ‘tstats’ command
  – http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/Tstats


Demo  


Key Takeaway

Build a datamodel and try it yourself!


Next Steps

1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. Go to the Search Party! Marquee Nightclub at The Cosmopolitan, today, 7:30-10:30pm


Questions?

THANK YOU