TRANSCRIPT
Copyright © 2013 Splunk Inc.
David Marquardt, Senior Software Engineer #splunkconf
Understanding Splunk Acceleration Technologies
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.
©2013 Splunk Inc. All rights reserved.
About Me
• Been coding core splunkd for over 5 years
• Worked on various components:
  – eval/where commands
  – Multi-index search
  – Authentication/authorization
  – Rawdata
  – Now high performance analytics store…
Agenda
• Overview Current Index Structure
• Review How Reporting is Currently Done
• How We Can Do Better
• Demo
Splunk Enterprise Index Structure
[Diagram: indexes IDX 1, IDX 2, and IDX 3, with a Home Path, a Cold Path, and a Thawed Path. Buckets shown: hot_v1_100, hot_v1_101, db_lt_et_80, db_lt_et_101, db_lt_et_70. Bucket contents: *.data (Source/Sourcetype/Host Metadata, for example "1 source::/my/log" and "2 source::/blah"), *.tsidx (TSIDX), and rawdata (Rawdata). The TSIDX detail shows a LEXICON (apple, beer, coke, cream, ice, java, …) and a POSTING list pointing at raw events such as “apple pie and ice cream is delicious” and “an apple a day keeps doctor away”, labeled with et, lt, and it.]
TSIDX? What?
Time series index
• Inverted index optimized for time
• Two basic components
  – Lexicon
  – Arrays of information about events
Why: Given a time range and query, where’s my matching data?
Lexicon
Raw Events
Deep likes Bud light
Amrit likes Makers
Ledion likes cognac
Dave likes Jack Daniels
Zhang likes vodka
Deep likes Makers
Dave likes Makers
Term      Postings List
Amrit     1
Bud       0
Daniels   3
Dave      3,6
Deep      0,5
Jack      3
Ledion    2
Makers    1,5,6
Zhang     4
cognac    2
likes     0,1,2,3,4,5,6
light     0
vodka     4
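As an illustration of how such a lexicon relates to the raw events, here is a minimal Python sketch. It assumes that an event's position in the list is its posting value and that terms are simply whitespace-separated; the real tokenization in splunkd is more involved, and `build_lexicon` is a hypothetical helper, not a Splunk API.

```python
from collections import defaultdict

# Raw events from the slide; an event's position in the list is its posting value.
raw_events = [
    "Deep likes Bud light",
    "Amrit likes Makers",
    "Ledion likes cognac",
    "Dave likes Jack Daniels",
    "Zhang likes vodka",
    "Deep likes Makers",
    "Dave likes Makers",
]


def build_lexicon(events):
    """Map each term to the ascending list of posting values of events containing it."""
    postings = defaultdict(list)
    for posting_value, event in enumerate(events):
        # Assumption: terms are whitespace-separated tokens.
        # set() keeps a term from being recorded twice for the same event.
        for term in set(event.split()):
            postings[term].append(posting_value)
    # Return the terms in sorted order so lookups and printing are deterministic.
    return dict(sorted(postings.items()))


lexicon = build_lexicon(raw_events)
for term, posting_list in lexicon.items():
    print(term, posting_list)
```

Each term ends up with the same postings list as in the table, for example `Makers` maps to `[1, 5, 6]`.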
Values Arrays

Posting value   Seek address   _time        host   source   sourcetype
0               42             1331667091   1      1        1
1               78             1331667091   1      1        1
2               120            1331667091   1      1        1
3               146            1331667091   1      1        1
4               170            1331667091   1      1        1
5               212            1331667091   1      1        1
6               240            1331667091   1      1        1
Raw events
Deep likes Bud light
Amrit likes Makers
Ledion likes cognac
Dave likes Jack Daniels
Zhang likes vodka
Deep likes Makers
Dave likes Makers
Okay, How Do I Search?
Query: likes (vodka OR cognac)
STEP 1: Consult the lexicon, combining postings lists
• Doing an OR? Use a union
• Doing an AND? Use an intersection
vodka OR cognac = (4) ∪ (2) = (2, 4)
likes (vodka OR cognac) = (0,1,2,3,4,5,6) ∩ (2, 4) = (2, 4)
We now have the right posting values!

Term      Postings List
Amrit     1
Bud       0
Daniels   3
Dave      3,6
Deep      0,5
Jack      3
Ledion    2
Makers    1,5,6
Zhang     4
cognac    2
likes     0,1,2,3,4,5,6
light     0
vodka     4
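STEP 1 can be sketched in a few lines of Python using sets. This is only an illustration of the union/intersection idea; the `postings` helper is hypothetical, and the actual implementation presumably merges sorted postings lists rather than building sets.

```python
# Postings lists copied from the lexicon slide (only the terms this query needs).
lexicon = {
    "likes": [0, 1, 2, 3, 4, 5, 6],
    "vodka": [4],
    "cognac": [2],
}


def postings(term):
    """Return the posting values for a term as a set (empty if the term is absent)."""
    return set(lexicon.get(term, []))


# Query: likes (vodka OR cognac)
# OR is a union; the implicit AND between "likes" and the group is an intersection.
either = postings("vodka") | postings("cognac")
matches = sorted(postings("likes") & either)
print(matches)  # [2, 4]
```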
Converting Posting Values to Events

Posting value   Seek addr   _time        host   source   sourcetype   eventtype
0               42          1331667091   1      1        1            -
1               78          1331667091   1      1        1            -
2               120         1331667091   1      1        1            -
3               146         1331667091   1      1        1            -
4               170         1331667091   1      1        1            -
5               212         1331667091   1      1        1            -
6               240         1331667091   1      1        1            -

STEP 2: Use the values array to look up _time, seek address, host, source, sourcetype for (2, 4)
STEP 3: Use the seek addresses to read rawdata at offsets (120, 170)
    Ledion likes cognac
    Zhang likes vodka
STEP 4: Back to search land; field extractions, lookups, etc.
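STEPS 2 and 3 amount to two lookups, sketched below in Python. The `ValuesRow` type, the `fetch_events` helper, and the `rawdata` dictionary are all hypothetical: the dictionary merely stands in for reading the journal at a seek address.

```python
from typing import NamedTuple


class ValuesRow(NamedTuple):
    seek_addr: int
    time: int
    host: int
    source: int
    sourcetype: int


# Values array from the slide, indexed by posting value.
values_array = [
    ValuesRow(42, 1331667091, 1, 1, 1),
    ValuesRow(78, 1331667091, 1, 1, 1),
    ValuesRow(120, 1331667091, 1, 1, 1),
    ValuesRow(146, 1331667091, 1, 1, 1),
    ValuesRow(170, 1331667091, 1, 1, 1),
    ValuesRow(212, 1331667091, 1, 1, 1),
    ValuesRow(240, 1331667091, 1, 1, 1),
]

# Hypothetical stand-in for rawdata: seek address -> event text.
rawdata = {
    42: "Deep likes Bud light",
    78: "Amrit likes Makers",
    120: "Ledion likes cognac",
    146: "Dave likes Jack Daniels",
    170: "Zhang likes vodka",
    212: "Deep likes Makers",
    240: "Dave likes Makers",
}


def fetch_events(posting_values):
    """Resolve posting values to (_time, raw event) pairs."""
    rows = [values_array[p] for p in posting_values]             # STEP 2
    return [(row.time, rawdata[row.seek_addr]) for row in rows]  # STEP 3


events = fetch_events([2, 4])
for time, text in events:
    print(time, text)
```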
Reading Compressed Rawdata
journal.gz chunk boundaries: 0 78 148 236 380 434 506
Example: Reading offsets (120, 170)
1. Group offsets into residing chunks
   120 falls into range (78, 148)
   170 falls into range (148, 236)
2. Read data off disk and decompress
EXPENSIVE!
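The grouping in step 1 can be sketched with a binary search over the chunk boundaries. This Python sketch assumes each chunk covers a half-open range (the slide does not say whether range ends are inclusive), and `group_by_chunk` is a hypothetical helper.

```python
from bisect import bisect_right
from collections import defaultdict

# Chunk boundaries from the slide.
# Assumption: chunk i covers the half-open range [boundaries[i], boundaries[i + 1]).
boundaries = [0, 78, 148, 236, 380, 434, 506]


def group_by_chunk(offsets):
    """Group seek offsets by the chunk that contains them."""
    groups = defaultdict(list)
    for offset in sorted(offsets):
        if not boundaries[0] <= offset < boundaries[-1]:
            raise ValueError(f"offset {offset} is outside the journal")
        # Index of the last boundary that is <= offset.
        i = bisect_right(boundaries, offset) - 1
        groups[(boundaries[i], boundaries[i + 1])].append(offset)
    return dict(groups)


chunks = group_by_chunk([120, 170])
print(chunks)  # {(78, 148): [120], (148, 236): [170]}
```

Grouping first means each chunk is read and decompressed once, however many requested offsets fall inside it.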
How Expensive?
Example bucket: 521,629 events
Limited to ~175,000 events per second
What Are We Doing in TSIDX Land?
In SQL terms:
    SELECT _time, seekaddr, host, source, sourcetype WHERE <some query>
And then we’re off to rawdata and search land
How can we do more here?
    SELECT foo, bar WHERE <some query>
OR even:
    SELECT avg(baz), stdev(baz) WHERE <some query> GROUPBY foo, bar
Indexed Fields

Term            Postings list
bar::AB         1,3,7,39,98
bar::cez        0,6,9,12
bar::xyz        3,4,5,6
baz::1          3,6,85
baz::2567       0,5
baz::462        3,24,45
baz::98         2,3,5,8,9
baz::99023      1,5,6,76,99
foo::afdjsi     4,567,2345
foo::aghdafo    2,234,6667
foo::bazcxuid   0,1,623,7777
foo::cef        0,1,2,3,4,43
foo::zaz        4

Big idea: Use the lexicon as a field value store!
By simply separating fields and values with “::” we can store sufficient information to run more interesting queries
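To make the "field value store" idea concrete, the following Python sketch regroups lexicon terms by field name. The `field_value_store` helper and the plain keyword term `error` are hypothetical additions for illustration; the other terms are a subset of the slide's table.

```python
from collections import defaultdict

# A subset of the terms from the slide, plus one hypothetical plain keyword.
lexicon = {
    "bar::AB": [1, 3, 7, 39, 98],
    "bar::cez": [0, 6, 9, 12],
    "bar::xyz": [3, 4, 5, 6],
    "baz::1": [3, 6, 85],
    "foo::zaz": [4],
    "error": [2, 9],  # hypothetical keyword term, not from the slide
}


def field_value_store(terms):
    """Regroup "field::value" terms as field -> value -> postings list."""
    store = defaultdict(dict)
    for term, posting_list in terms.items():
        field, separator, value = term.partition("::")
        if separator:  # skip plain keyword terms that carry no field name
            store[field][value] = posting_list
    return dict(store)


store = field_value_store(lexicon)
print(sorted(store))        # ['bar', 'baz', 'foo']
print(store["bar"]["xyz"])  # [3, 4, 5, 6]
```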
How Does it Work?

Term            Postings list
bar::AB         1,3,7,39,98
bar::cez        0,6,9,12
bar::xyz        3,4,5,6
baz::1          3,6,85
baz::2567       0,5
baz::462        3,24,45
baz::98         2,3,5,8,9
baz::99023      1,5,6,76,99
foo::afdjsi     4,567,2345
foo::aghdafo    2,234,6667
foo::bazcxuid   0,1,623,7777
foo::cef        0,1,2,3,4,43
foo::zaz        4

    SELECT sum(baz) WHERE bar=xyz

• Evaluate query: 3,4,5,6
• Iterate over baz, updating sum for matching events
  – baz::1
      Sum += 2 * 1
  – baz::2567
      Sum += 1 * 2567
  – baz::462
      Sum += 1 * 462
  – baz::98
      Sum += 2 * 98
  – baz::99023
      Sum += 2 * 99023
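The walk-through above can be reproduced with a short Python sketch. It assumes the values of `baz` are integers, and `sum_field` is a hypothetical helper rather than anything in Splunk. The key point it shows is that the aggregate is computed from the lexicon alone, without touching rawdata.

```python
# Terms from the slide that matter for: SELECT sum(baz) WHERE bar=xyz
lexicon = {
    "bar::AB": [1, 3, 7, 39, 98],
    "bar::cez": [0, 6, 9, 12],
    "bar::xyz": [3, 4, 5, 6],
    "baz::1": [3, 6, 85],
    "baz::2567": [0, 5],
    "baz::462": [3, 24, 45],
    "baz::98": [2, 3, 5, 8, 9],
    "baz::99023": [1, 5, 6, 76, 99],
}


def sum_field(field, where_term):
    """Sum an indexed field over the events whose postings match where_term."""
    matching = set(lexicon.get(where_term, []))  # evaluate query: {3, 4, 5, 6}
    prefix = field + "::"
    total = 0
    for term, posting_list in lexicon.items():
        if not term.startswith(prefix):
            continue
        value = int(term[len(prefix):])  # assumption: values are integers
        count = sum(1 for p in posting_list if p in matching)
        total += count * value           # e.g. baz::98 contributes 2 * 98
    return total


total = sum_field("baz", "bar::xyz")
print(total)  # 201273
```

The result is 2 * 1 + 1 * 2567 + 1 * 462 + 2 * 98 + 2 * 99023 = 201273, matching the per-term updates listed on the slide.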
How Can You Use This in Splunk Enterprise 5.x?
tscollect
• Creates TSIDX files in the indexed fields format
• index=main | fields a, b, c | tscollect namespace=demo
• Only admins can run this
  – indexes_edit capability
tstats
• Runs stats over the TSIDX files in the created namespace
• | tstats avg(a) from demo groupby b, c
Drawbacks to the Splunk Enterprise 5.x Approach
• Only on the search head
• No retention policy or limits
• Manual process
  – How to schedule collect?
  – Timing problems
  – Fault tolerance?
  – Data lag
[Diagram: search head with $SPLUNK_DB/tsidxstats, above Indexer 1, Indexer 2, … Indexer N]
Splunk Enterprise 6: Making it Easy
What data do we want to accelerate?
Create a Data Model
Splunk Enterprise 6: Making it Easy
How do we accelerate that data?
Click The Checkbox!
Introducing the High Performance Analytics Store
• Automatically collected
  – Handles timing issues, backfill…
• Automatically maintained
  – Uses acceleration window
• Stored on the indexers
  – Peer to the buckets
• Fault tolerant collection
[Diagram: search head above Indexer 1, Indexer 2, … Indexer N]
Completely Transparent!
• No administration overhead
• Missing collection data filled in by search
  – No data lag!
• Analytic queries just get faster!
  – Results come from HPAS first
• Checking acceleration status
  – Data models management page
  – Job inspector
Great, How Do I Use it?
• In pivot: Automatically used when acceleration is on!
• Manually: | tstats … from datamodel=<name> …
What About Report Acceleration?

High Performance Analytics Store
• Accelerates an entire dataset
• Stores field value information
• Nothing pre-computed
• Works well for high-cardinality
• Higher storage costs (~25%)
  – Storage shared by all searches on datamodel
  – Varies by collection: # events, fields, values…

Report Acceleration
• Accelerates a particular search
• Stores results of map step
• Pre-computed aggregate
• Doesn’t help for high-cardinality
• Typically lower storage costs
  – But requires storage per-search
Splunk Enterprise 6: Making it Easy
What if I already have indexed fields?
Bonus!
• You can query existing indexed fields directly!
  – Just omit the ‘FROM’ clause in tstats
  – You can specify indexes in ‘WHERE’ clause
  – Supports search filters
• Don’t forget the default indexed fields!
  – host, source, sourcetype
  – _indextime, linecount, punct
  – date_second, date_minute, etc.
[Diagram: search head]
More Information
• Data models
  – http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Managedatamodels
• Acceleration
  – http://docs.splunk.com/Documentation/Splunk/6.0/Knowledge/Acceleratedatamodels
• ‘tstats’ command
  – http://docs.splunk.com/Documentation/Splunk/6.0/SearchReference/Tstats
Demo
Key Takeaway
Build a datamodel and try it yourself!
Next Steps
1. Download the .conf2013 Mobile App
   If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. Go to the Search Party!
   Marquee Nightclub at The Cosmopolitan
   Today, 7:30-10:30pm
Questions?
THANK YOU