Copyright © 2013 Splunk Inc.

Sean Blake Professional Services Manager, Splunk #splunkconf

Best Practices and Lessons Learned from Splunk’s Professional Services Team

Platform: Physical

Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.

©2013 Splunk Inc. All rights reserved.

About Me

• 2+ years @ Splunk
• Diverse experience on a multitude of engagements
• Live outside of DC, focused on Public Sector
• Background in development

Agenda

• Growing pains
• Core components
  – Indexing
  – Searching
  – Deployment Server | Clustering | Data Inputs
• Random Tidings
  – Config Files
  – Application Breakdown
  – Upgrading
  – Precedence

Growing Pains

Plan

• Single Splunk instance… Super simple
  – Do this on your laptop/desktop, get familiar
• Complexity arises with use and when you scale up
• Planning will go a long, long way… The more you know, the more you prepare, the easier it will be
• So try to understand your environment as much as possible
  – High index capacity with low concurrent users requires a different footprint than lower capacity and a high number of searches
• An enterprise Splunk solution takes planning and dedicated resources to care for and feed
  – Enterprise Splunk = Enterprise Plan

Therefore… Plan for Growth

• Splunk is very flexible, but ensure you have enough capacity at all tiers (forwarders, indexers, search)
• A bottleneck today can be remedied… but something else will take its place
• Use more nodes to scale up, not bigger machines (when in doubt, see the reference architecture)

[Figure: growth stages: indexer → indexers → search head & indexers → search head, deployment server & indexers]

The Core

Indexing

• With multiple indexers, always use the autoLB setting on forwarders
  – Don’t funnel (exceptions may apply), send directly to the indexers
  – Ensure you have enough scale at each level, forwarders and indexers
• Don’t dedicate particular data sources to particular indexers
• Utilize UFs
  – autoLBFrequency by default sends at least 30 seconds of data to a single indexer
    - Be aware of the load coming in: a single indexer can be backed up while others are idle, so decrease the autoLBFrequency interval and increase the input queue size to contain it
  – Update the password from changeme
    - In Public Sector you will get hit with STIG findings
  – Out of the box we throttle it; change the limits.conf maxKBps setting to zero

Indexers are Made Up of Indexes

• If data is not told where to go, then it’s going to main
• Why separate?
  – Does this data have a different retention policy?
  – Does this data have access restrictions?
  – Do we want to make it easier to use, increase performance?
• You should always separate, at a minimum for ease of use and performance
• But don’t go crazy
  – Hard and fast

Indexing

• Splunk is temporal; incorrect time stamping is what we want to avoid
• Indexes are made up of buckets (hot | warm | cold)
  – “hot” buckets are the only ones being written to, hot_v1_##
  – The others are easily identifiable, db_1375215356_1374274212_##
• Give some wiggle room on high-volume indexes
  – indexes.conf -> maxHotBuckets = 10 & maxDataSize = auto_high_volume
• Data will NOT be purged by Splunk until ALL events in the bucket reach the expiration date
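The bucket directory name carries the bucket's time range, which is why the purge rule depends on the newest event in the bucket. A minimal Python sketch, assuming the name has the form db_<latest epoch>_<earliest epoch>_<id> with epochs in UTC seconds; the helper names, the bucket id 7 and the 90-day retention are hypothetical values chosen for illustration:

```python
from datetime import datetime, timedelta, timezone


def parse_bucket_name(name):
    """Split db_<latest>_<earliest>_<id> into two UTC datetimes and the id."""
    prefix, latest, earliest, bucket_id = name.split("_", 3)
    if prefix != "db":
        raise ValueError(f"not a warm/cold bucket name: {name!r}")

    def to_utc(seconds):
        return datetime.fromtimestamp(int(seconds), tz=timezone.utc)

    return to_utc(latest), to_utc(earliest), bucket_id


def eligible_for_purge(name, retention, now):
    """A bucket qualifies only once its NEWEST event is older than retention."""
    latest, _earliest, _bucket_id = parse_bucket_name(name)
    return now - latest > retention


latest, earliest, bucket_id = parse_bucket_name("db_1375215356_1374274212_7")
print(latest)    # 2013-07-30 20:15:56+00:00
print(earliest)  # 2013-07-19 22:50:12+00:00

retention = timedelta(days=90)  # hypothetical retention policy
early_check = eligible_for_purge(
    "db_1375215356_1374274212_7", retention,
    now=datetime(2013, 10, 1, tzinfo=timezone.utc))
late_check = eligible_for_purge(
    "db_1375215356_1374274212_7", retention,
    now=datetime(2013, 11, 1, tzinfo=timezone.utc))
print(early_check, late_check)
```

The earliest event plays no part in the check, so a bucket that also holds much older data still waits on its newest event.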

What Does a Bucket Look Like?

Latest Event: 7/30/2013 20:15:56

db_1375215356_1374274212_##

Earliest Event: 07/19/2013 22:50:12

• Remember, this bucket will not be removed until the event from 7/30/2013 20:15:56 has reached its expiration
• Be aware when you onboard new servers, as archival of old data will affect the range of a bucket
• Hot buckets roll on restart automatically or based on settings in indexes.conf
• Storage Math: GB/day * .5 (compression) * Retention Policy * 1.10 (padding)
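The storage math can be written as a small helper. This is a sketch of the rule of thumb only: the function name and the sample inputs (100 GB/day, 90 days of retention) are hypothetical, and the 0.5 compression and 1.10 padding factors are the ones quoted in the formula.

```python
def storage_gb(gb_per_day, retention_days, compression=0.5, padding=1.10):
    """GB/day * compression * retention (days) * padding."""
    return gb_per_day * compression * retention_days * padding


# Hypothetical sizing input: 100 GB/day kept for 90 days.
estimate = round(storage_gb(100, 90), 1)
print(f"{estimate} GB")
```

Real compression ratios vary by data source, so treat the result as a starting estimate rather than a guarantee.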

Indexer Affinity

• [monitor:///path/to/files]
• Could go awry in a multi-indexer environment in some circumstances (.zip files, UF listening to UDP|TCP directly)

[Figure: forwarder → indexers]

Know the Indexing Queues

[Figure: indexing data flow]
Inputs: Network Inputs, File System Inputs, Scripted Inputs, Modular Inputs
Parsing Queue → Parsing Pipeline: UTF Encoding, Line Breaking, Header Processing
Agg Queue → Merging Pipeline: Line Merging, Timestamp Extraction
Typing Queue → Typing Pipeline: Regex Replacement, Annotator
Index Queue → Index Pipeline: Block Signature, Forwarding (tcpout | syslog out), Indexing

Debugging the Indexing Queues

• You can attend Octavio’s session: The S.o.S App: All Splunk on Splunk Action, All The Time
• Queues are full?
  – parsingQueue/aggQueue
    - Ensure proper time stamping and line breaking for your events; otherwise you waste I/O & CPU cycles
    - props.conf: TIME_PREFIX, TIME_FORMAT, TZ, LINE_BREAKER, TRUNCATE, SHOULD_LINEMERGE (false), MAX_TIMESTAMP_LOOKAHEAD
  – typingQueue
    - props.conf: TRANSFORMS-xxx, SEDCMD
    - transforms.conf: SOURCE_KEY, DEST_KEY, REGEX, FORMAT
  – indexQueue
    - Bad I/O on the storage; this will cause everything else to back up

Search Types

Type          Reference Hardware Performance   Impact   Notes
Dense         50K matching events/second       CPU      Generally tax CPU because of decompression
Sparse        5K matching events/second        CPU      Generally returning .01 to 1%
Super-sparse  Up to 2 seconds/bucket           I/O      Could take a long time with a lot of buckets (“needle in a haystack”)
Rare          From 10-50 buckets/second        I/O      Take advantage of bloom filters

Search Tips

• Events are broken down into tokens; tokens are split by major and minor segmenters
• TERM() and CASE()
  – TERM: will disable the segmenters and make some searches more efficient
    - 10.1.2.3 = 10 AND 1 AND 2 AND 3 AND “10.1” AND “10.1.2” AND 10.1.2.3
    - TERM(10.1.2.3) = 10.1.2.3
  – CASE: exactly what you think, a case-sensitive search; by default a search for the word “Splunk” will hit on splunk, Splunk, SPLUNK, etc…
• Avoid vague wildcards and all-time searches
• Use metadata (source, sourcetype, host, index) to speed up
• Optimize your buckets
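The expansion shown for 10.1.2.3 can be mimicked with a toy function. This is only an illustration of the idea, not Splunk's segmenter: it assumes "." is the only minor breaker involved, and `expand_minor` and `term` are hypothetical names.

```python
def expand_minor(value, minor="."):
    """Return the pieces a minor breaker would yield, plus the growing prefixes."""
    parts = value.split(minor)
    prefixes = [minor.join(parts[:i]) for i in range(2, len(parts) + 1)]
    return parts + prefixes


def term(value):
    """With TERM(), the value is looked up as a single token."""
    return [value]


without_term = expand_minor("10.1.2.3")
with_term = term("10.1.2.3")
print(" AND ".join(without_term))
print(" AND ".join(with_term))
```

Seven lookups collapse into one, which is where the efficiency gain for values such as IP addresses comes from.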

More Search Tips

• Search performance: identifying slow searches, refactoring searches to take advantage of map-reduce
  – The scheduler.log is filled with a lot of the details
  – S.o.S application has a good dashboard on high-cost searches
• Bundle replication white/blacklists… Upgrade to 5.x for file-based replication if you haven’t
• Summary indexing, report acceleration, TSIDX (Splunk Enterprise 6) may help
• Spread out scheduled searches using cron syntax
• Use snaps effectively to reduce the risk of missing delayed events

Deployment Server

• You could have attended Genti’s session: Best Practices + New Feature Overview for the Latest Version of Splunk Deployment Server
• clientName is a great feature
  – Can only be used once per deployment client, but deployment clients can share the same clientName

serverclass.conf:
    [serverClass:all_indexer]
    whitelist.0 = splk-indexer

    [serverClass:all_indexer:app:org_all_indexer_base]

    [serverClass:all_search]
    whitelist.0 = splk-search

    [serverClass:all_forwarders]
    whitelist.0 = *
    blacklist.0 = splk-*

deploymentclient.conf:
    [deployment-client]
    clientName = splk-indexer
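To see how the whitelist and blacklist entries sort clients into server classes, here is a small Python model of the serverclass.conf above. It is a sketch under stated assumptions: patterns are treated as simple globs, only clientName is compared (hostname and IP matching are ignored), a blacklist hit overrides a whitelist hit, and the client name web01 is hypothetical.

```python
from fnmatch import fnmatchcase

# Mirrors the whitelist.0 / blacklist.0 entries in the example serverclass.conf.
SERVER_CLASSES = {
    "all_indexer": {"whitelist": ["splk-indexer"], "blacklist": []},
    "all_search": {"whitelist": ["splk-search"], "blacklist": []},
    "all_forwarders": {"whitelist": ["*"], "blacklist": ["splk-*"]},
}


def matching_classes(client_name):
    """Return the server classes a client name falls into."""
    matched = []
    for name, rules in SERVER_CLASSES.items():
        allowed = any(fnmatchcase(client_name, p) for p in rules["whitelist"])
        denied = any(fnmatchcase(client_name, p) for p in rules["blacklist"])
        if allowed and not denied:
            matched.append(name)
    return matched


for client in ("splk-indexer", "splk-search", "web01"):
    print(client, "->", matching_classes(client))
```

Because every Splunk server shares the splk- prefix in its clientName, a single blacklist entry keeps forwarder apps off the indexers and search heads.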

Cluster Master

• New component for HA in 5.x
• Changes the way indexer config files are distributed
  – If you have the Deployment Server configured, you need to alter the setup
  – Config files are distributed to indexers via the Cluster Master’s $SPLUNK_HOME/etc/master-apps directory
• Avoid rebalancing during an indexer restart by stopping the cluster master first, then the indexers
• You can attend Dritan’s session: Architect Splunk for High Availability and Disaster Recovery

Data Ingest

• You can attend Matty’s session: Onboard Data into Splunk, Correctly
• Time stamp and line breaking are the most important
  – Splunk is smart, it will probably get it right
  – But, you can make it more efficient

props.conf:
    [my_sourcetype]
    TIME_PREFIX = ^
    MAX_TIMESTAMP_LOOKAHEAD = 19
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    TZ = GMT
    LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
    SHOULD_LINEMERGE = false
    TRUNCATE = 10000
    KV_MODE = none|auto|multi|json|xml
    ANNOTATE_PUNCT = false
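The LINE_BREAKER and timestamp settings can be tried outside Splunk before they are deployed. The Python sketch below imitates the behaviour rather than reproducing Splunk's implementation: it assumes events are split where the first capture group matches and that the captured newlines are discarded, and the sample log text is made up for illustration.

```python
import re
from datetime import datetime

# Same pattern and formats as the props.conf example.
LINE_BREAKER = re.compile(r"([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 19  # length of "2013-08-30 20:15:56"


def break_events(raw):
    """Split raw text into events at newlines that precede a timestamp."""
    events, start = [], 0
    for match in LINE_BREAKER.finditer(raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    tail = raw[start:].rstrip("\r\n")
    if tail:
        events.append(tail)
    return events


def event_time(event):
    """Read the timestamp from the first MAX_TIMESTAMP_LOOKAHEAD characters."""
    return datetime.strptime(event[:MAX_TIMESTAMP_LOOKAHEAD], TIME_FORMAT)


# Hypothetical sample: a multi-line error followed by a single-line event.
raw = (
    "2013-08-30 20:15:56 ERROR something failed\n"
    "  at frame one\n"
    "  at frame two\n"
    "2013-08-30 20:15:57 INFO recovered\n"
)
events = break_events(raw)
for event in events:
    print(event_time(event), "|", len(event.splitlines()), "line(s)")
```

The stack-trace lines stay attached to the first event because a break only happens in front of a timestamp, so no separate line-merging pass is needed.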

Data Ingest

• Use LINE_BREAKER for multi-line events
  – SHOULD_LINEMERGE performs extra line breaking; setting it to false in props.conf will save indexing time
• Don’t use the punct field? Then override ANNOTATE_PUNCT
• Testing tricks
  – Use a test index for scrubbing data or use the data preview in Splunk Web (5.x)
  – The rex command is useful for testing your regex (or use IFX)
  – A new Splunk search will pick up new search-time extractions, props or transforms (EXTRACT-xxx or REPORT-xxx)
  – Transforms against raw indexed data will NOT be picked up; they require a restart (TRANSFORMS-xxx)
  – “One shot” is useful for adding data to a test index (CLI driven)

Syslog Collector

• Utilize rsyslog, syslog-ng, or Kiwi Syslog instead of sending directly to Splunk
  – Small retention policy on the syslog collector; a couple of days is typically sufficient because the data is forwarded to Splunk for long-term storage
  – Take advantage of Splunk UFs for load balancing; UFs also know when indexers are unresponsive

[Figure: syslog devices → forwarder → indexer]

Random Tidings

.conf “file” Overload

• We love configuration files; a significant amount is configured via Splunk Web
  – But you still need SSH or RDP access; not everything is exposed
• Don’t touch the default
  – This directory structure is for the developer; use local for your configurations (always)
• Props, transforms, inputs, outputs, web, server, limits, indexes, tags, eventtypes and just too many more to name
  – Take it in stride
  – Read the specs; they are typically filled with good information and ship with every instance in $SPLUNK_HOME/etc/system/README/ (it’s on that internet thing also)

.conf “file” breakdown

• Consist of [stanzas]
• Followed by <attribute> = value pairs
  – <attribute> is CaSe SeNsItIvE, so one will work and the other will not
  – Some attributes are required, some are not
• [stanzas] can have scope; more specifically scoped stanzas take precedence

outputs.conf:
    [tcpout]
    indexAndForward = true
    # this is a good comment and it sets the default value
    compressed = true

    [tcpout:primary_indexers]
    autoLB = true
    compressed = false # winner for stanza, but bad comment
    server = primary1:9997, primary2:9997

    [tcpout:secondary_indexers]
    autoLB = true
    server = secondary1:9997, secondary2:9997
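Why the trailing comment is "bad" becomes clear when the file is read by a parser that only recognises comments on their own line. The sketch below uses Python's standard configparser as a stand-in; it is not Splunk's parser, and the point is only that the comment text ends up inside the value.

```python
import configparser

# Trimmed copy of the outputs.conf example.
OUTPUTS_CONF = """\
[tcpout]
# this is a good comment and it sets the default value
compressed = true

[tcpout:primary_indexers]
compressed = false # winner for stanza, but bad comment
"""

# No inline comment prefixes are configured, so only full-line comments are skipped.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.optionxform = str  # keep attribute names case sensitive
parser.read_string(OUTPUTS_CONF)

default_value = parser["tcpout"]["compressed"]
scoped_value = parser["tcpout:primary_indexers"]["compressed"]
print(repr(default_value))
print(repr(scoped_value))
```

The scoped stanza still overrides the default, but its value is no longer a clean boolean, so keep comments on their own line.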

Applications

• Splunkbase has 400+ apps targeting specific technologies and use cases
• They are all written for a single Splunk server installation
• Know the config files:
  – inputs.conf -> how are you collecting the data the app is targeting? What is the sourcetype?
  – props.conf & transforms.conf -> how is the data being indexed, what are the fields?
  – indexes.conf -> what indexes does the app rely on?
  – savedsearches.conf, macros.conf, eventtypes.conf, tags.conf?

More on Applications

• What goes where? Indexer vs. Search head vs. Forwarder
  – inputs.conf: data collection definitions, typically disabled by default (forwarder)
  – props.conf & transforms.conf may contain settings for both the parsing tier (indexer, heavy forwarder) and the search tier (search head)
    - Look for TRANSFORMS-xxx in props; this means parsing tier
  – indexes.conf: typically contains summary index definitions (indexer)
• Deployment server?
  – Don’t deploy apps that have a GUI; install them directly on the search head
  – Override app-specific configs in the /<appname>/local directory
  – Migrate parsing-layer configs and indexes.conf to a DS-based app

Application… Broken?

• Installed
• But, dashboards aren’t painting! Investigate config files in the default directory
  – props.conf:
    - What sourcetype or source [stanzas] exist?
    - Do your events’ sourcetype or source match those [stanzas]?
• Ok, source|sourcetype matches up, dashboards are still not painting!
  – eventtypes.conf|savedsearches.conf:
    - Investigate the searches and run them in Splunk; do you get results?
    - Maybe your events have a slightly different format from what the app is expecting; override the config file(s) in local
• Ok, data is painting, but it’s the wrong fields
  – props.conf|transforms.conf:
    - The field extractions are off; override the config file(s) in local

Upgrading

• Have a plan and read the release notes (especially for major releases)
• If you don’t restart often, perform a sanity restart before the upgrade
• Try to keep the distributed search tier (indexers|search heads) on the same version; this is less important for forwarders
• Be mindful of overridden files, especially if you copied the whole file to local
  – Only copy the [stanza] and <attribute> = value pair
• Back up directories:
  – $SPLUNK_DB (i.e. your indexes)
  – $SPLUNK_HOME/etc/
• Upgrade Splunk and let it settle before App upgrades

*.conf Precedence

• Config files with the same name are combined at startup… This could lead to conflicts
  – General directory order: etc/system/local, etc/apps/<appname>/local, etc/apps/<appname>/default, etc/system/default
  – Clustering changes this behavior; be aware and read the docs
• Search-time field extractions behave differently
  – Fall to etc/users/<username>/ directory structure for highest priority
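The general directory order can be pictured as layers, where the first layer that sets an attribute wins. The Python sketch below is a simplified model, not Splunk's merge logic: it ignores ordering between apps, clustering and the search-time user layer, and the app name my_app and the attribute values are hypothetical.

```python
# Highest priority first, following the general directory order.
LAYERS = [
    ("etc/system/local", {"my_sourcetype": {"TRUNCATE": "20000"}}),
    ("etc/apps/my_app/local", {"my_sourcetype": {"SHOULD_LINEMERGE": "false"}}),
    ("etc/apps/my_app/default", {"my_sourcetype": {"SHOULD_LINEMERGE": "true",
                                                   "TRUNCATE": "10000"}}),
    ("etc/system/default", {"default": {"ANNOTATE_PUNCT": "true"}}),
]


def resolve(layers):
    """Merge per attribute; the first (highest priority) layer to set it wins."""
    merged, origin = {}, {}
    for path, stanzas in layers:
        for stanza, attributes in stanzas.items():
            for key, value in attributes.items():
                if (stanza, key) not in origin:
                    merged.setdefault(stanza, {})[key] = value
                    origin[(stanza, key)] = path
    return merged, origin


merged, origin = resolve(LAYERS)
for (stanza, key), path in sorted(origin.items()):
    print(f"[{stanza}] {key} = {merged[stanza][key]}    (from {path})")
```

The merge happens per attribute rather than per file, which is why copying a whole default file into local can silently pin values you never meant to override.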

Details in precedence

• props.conf
  – RENAME (sourcetype), EXTRACT-xxx, REPORT-xxx, KV_MODE, FIELDALIAS-xxx, EVAL-xxx, LOOKUP-xxx, MILLISECONDS, FILTER, EVENTTYPING & TAGGING
    - Collisions within the same [stanza] name fall to ASCII order to determine the winner
    - You can use the priority attribute to bypass ASCII order
• For index-time transforms, data only enters the parsing pipeline once
  – Remember this when using SOURCE_KEY in transforms.conf or calling multiple transform stanzas from props.conf
• Use btool to validate which [stanza] wins
  – Remember to run it as the same user running Splunk!
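ASCII order is not the same as alphabetical order, which is easy to forget when naming extractions. A short Python sketch with hypothetical class names; it only shows the ordering itself and makes no claim about which end of the ordering wins in a given collision, so confirm the outcome with btool.

```python
# Hypothetical extraction class names that collide in one stanza.
names = ["EXTRACT-alpha", "EXTRACT-Zulu", "EXTRACT-2nd"]

# Plain string sorting compares code points: digits < uppercase < lowercase.
ascii_order = sorted(names)

# A case-insensitive sort is what many people expect instead.
alphabetical = sorted(names, key=str.lower)

print(ascii_order)
print(alphabetical)
```

In ASCII order EXTRACT-Zulu sorts ahead of EXTRACT-alpha, so a consistent naming convention (for example, all lowercase with numeric prefixes) keeps the order predictable.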

Other stuff

• Regular Expressions
  – gskinner.com/RegExr
  – PCRE
• Testing your IOPS
  – bonnie++
  – iozone (app we are developing)
• What’s your ulimit?
  – http://blogs.splunk.com/2011/11/21/whats-your-ulimit/
• If you are running Splunk as root… Buyer beware

Final Thoughts

• Chances are there is more than one way to do what you are looking to do
• Your Network = Your Responsibility
  – Have a plan!
• Due Diligence = docs.splunk.com (RT_M)
  – Hardware sizing
  – Precedence
  – Parsing and routing of data
  – Much more…

More Information

• Contact: [email protected]
• Applications: apps.splunk.com
• Answers: answers.splunk.com
• Education: www.splunk.com/view/education/SP-CAAAAH9
• Professional Services: www.splunk.com/view/professional-services/SP-CAAABH9
• Videos: www.splunk.com/videos

Next Steps

1. Download the .conf2013 Mobile App. If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. View the sessions listed on the next slide. All sessions are available on the Mobile App. Videos will be available shortly

What’s Next!

• Architect Splunk for Physical, Virtual and Cloud Environments
• Architect Splunk for High Availability and Disaster Recovery
• Onboard Data into Splunk, Correctly
• The S.o.S App: All Splunk on Splunk Action, All The Time
• Planning and Execution for Successful Deployments

THANK YOU
