Copyright © 2013 Splunk Inc.
Sean Blake, Professional Services Manager, Splunk #splunkconf
Best Practices and Lessons Learned from Splunk's Professional Services Team
Platform: Physical
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
Splunk, Splunk>, Splunk Storm, Listen to Your Data, SPL and The Engine for Machine Data are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners.
©2013 Splunk Inc. All rights reserved.
About Me
• 2+ years @ Splunk
• Diverse experience on a multitude of engagements
• Live outside of DC, focused on Public Sector
• Background in development
Agenda
• Growing pains
• Core components
  – Indexing
  – Searching
  – Deployment Server | Clustering | Data Inputs
• Random tidings
  – Config files
  – Application breakdown
  – Upgrading
  – Precedence
Growing Pains
Plan
• A single Splunk instance is super simple
  – Set one up on your laptop or desktop and get familiar with it
• Complexity arises with use and when you scale up
• Planning will go a long, long way: the more you know and the more you prepare, the easier it will be
• So try to understand your environment as much as possible
  – High indexing volume with few concurrent users requires a different footprint than lower volume with a high number of searches
• An enterprise Splunk solution takes planning and dedicated resources to care for and feed
  – Enterprise Splunk = Enterprise Plan
Therefore… Plan for Growth
• Splunk is very flexible, but ensure you have enough capacity at every tier (forwarders, indexers, search heads)
• A bottleneck today can be remedied… but something else will take its place
• Scale out with more nodes, not bigger machines (when in doubt, follow the reference architecture)
[Diagram: scaling stages, from a single indexer, to multiple indexers, to a search head & indexers, to a search head, deployment server & indexers]
The Core
Indexing
• With multiple indexers, always use the autoLB setting on forwarders
  – Don't funnel through intermediate forwarders (exceptions may apply); send directly to the indexers
  – Ensure you have enough scale at each level: forwarders and indexers
• Don't dedicate particular data sources to particular indexers
• Utilize universal forwarders (UFs)
  – By default, autoLBFrequency keeps sending to a single indexer for at least 30 seconds
    » Be aware of the incoming load: one indexer can back up while the others sit idle, so decrease the autoLBFrequency interval and increase the input queue size to absorb it
  – Change the default password from changeme
    » In the Public Sector you will get hit with STIG findings
  – Out of the box the UF is throttled; set maxKBps in limits.conf to zero
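As a rough sketch of the forwarder settings above, a universal forwarder's outputs.conf and limits.conf might look like the following. The indexer host names, the group name all_indexers and the 10-second frequency are illustrative assumptions, not recommended values.

```ini
# outputs.conf on each universal forwarder
# (hypothetical indexer host names and target group name)
[tcpout]
defaultGroup = all_indexers

[tcpout:all_indexers]
server = idx1.example.com:9997, idx2.example.com:9997, idx3.example.com:9997
autoLB = true
# Switch indexers more often than the 30-second default (assumed value)
autoLBFrequency = 10

# limits.conf on each universal forwarder
[thruput]
# 0 removes the forwarder's default throughput cap
maxKBps = 0
```

Listing every indexer in one target group lets the forwarder spread data across all of them, rather than dedicating data sources to particular indexers.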
Indexers are Made Up of Indexes
• If data is not told where to go, it goes to the main index
• Why separate?
  – Does this data have a different retention policy?
  – Does this data have access restrictions?
  – Do we want to make it easier to use and increase performance?
• At a minimum, you should always separate for ease of use and performance
• But don't go crazy
  – Hard and fast
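A minimal sketch of telling data where to go and restricting who can search it, assuming a hypothetical firewall index, monitored path, sourcetype and role name. The index itself still has to be defined in indexes.conf on the indexers.

```ini
# inputs.conf: send this (hypothetical) input to its own index instead of main
[monitor:///var/log/firewall]
index = firewall
sourcetype = firewall_log

# authorize.conf: only this (hypothetical) role may search the index
[role_firewall_analyst]
importRoles = user
srchIndexesAllowed = firewall
```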
Indexing
• Splunk is temporal; incorrect time stamping is what we want to avoid
• Indexes are made up of buckets (hot|warm|cold)
  – "Hot" buckets are the only ones being written to: hot_v1_##
  – The others are easily identifiable: db_1375215356_1374274212_##
• Give some wiggle room on high-volume indexes
  – indexes.conf -> maxHotBuckets = 10 & maxDataSize = auto_high_volume
• Data will NOT be purged by Splunk until ALL events in the bucket reach the expiration date
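For a hypothetical high-volume index, the indexes.conf settings named above could be combined with a retention period roughly as follows. The index name, paths and the 90-day frozenTimePeriodInSecs value are assumptions; a bucket is only frozen (deleted by default) once its newest event is older than that period.

```ini
# indexes.conf on the indexers (hypothetical index name and paths)
[firewall]
homePath = $SPLUNK_DB/firewall/db
coldPath = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# Extra wiggle room for high-volume data
maxHotBuckets = 10
maxDataSize = auto_high_volume
# Assumed retention: 90 days * 86400 seconds/day
frozenTimePeriodInSecs = 7776000
```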
What Does a Bucket Look Like?
Latest Event: 7/30/2013 20:15:56 UTC
db_1375215356_1374274212_##
Earliest Event: 7/19/2013 22:50:12 UTC
• The bucket name encodes the latest and earliest event times as epoch seconds (db_<latest>_<earliest>_<id>)
• Remember, this bucket will not be removed until the event from 7/30/2013 20:15:56 has reached its expiration
• Be aware when you onboard new servers: ingesting their archived (old) data will widen the time range of a bucket
• Hot buckets roll on restart automatically, or based on settings in indexes.conf
• Storage math: GB/day * 0.5 (compression) * retention policy (days) * 1.10 (padding)
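As a worked instance of the storage math, assuming a hypothetical 100 GB/day of raw data and a 90-day retention policy:

```latex
\text{Storage} = \text{GB/day} \times 0.5 \times \text{retention (days)} \times 1.10
              = 100 \times 0.5 \times 90 \times 1.10 = 4950\ \text{GB} \approx 4.95\ \text{TB}
```

The 0.5 compression factor is the rule of thumb from the slide; actual ratios depend on the data, so treat the result as a starting estimate.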
Indexer Affinity
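• [monitor:///path/to/files]
• Could go awry in a multi-indexer environment in some circumstances (.zip files, a UF listening to UDP|TCP directly)
  – A UF does not parse events, so it only switches indexers at a safe point such as the end of a file; a large archive or a never-ending network stream can stay pinned to one indexer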
[Diagram: a forwarder sending data to multiple indexers]
Know the Indexing Queues
[Diagram: the indexing pipelines and the queues between them]
Inputs: Network Inputs, File System Inputs, Scripted Inputs, Modular Inputs
→ Parsing Queue → Parsing Pipeline: UTF Encoding, Line Breaking, Header Processing
→ Agg Queue → Merging Pipeline: Line Merging, Timestamp Extraction
→ Typing Queue → Typing Pipeline: Regex Replacement, Annotator
→ Index Queue → Index Pipeline: Forwarding (tcpout|syslog out), Indexing, Block Signature
Debugging the Indexing Queues
• You can attend Octavio's session: The S.o.S App: All Splunk on Splunk Action, All The Time
• Queues are full?
  – parsingQueue/aggQueue
    » Ensure proper time stamping and line breaking for your events; otherwise you waste I/O and CPU cycles
    » props.conf: TIME_PREFIX, TIME_FORMAT, TZ, LINE_BREAKER, TRUNCATE, SHOULD_LINEMERGE (false), MAX_TIMESTAMP_LOOKAHEAD
  – typingQueue
    » props.conf: TRANSFORMS-xxx, SEDCMD
    » transforms.conf: SOURCE_KEY, DEST_KEY, REGEX, FORMAT
  – indexQueue
    » Bad I/O on the storage; this will cause everything else to back up
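To illustrate the kind of typing-pipeline work listed above, here is a sketch that drops DEBUG events and masks part of a hypothetical account number. The sourcetype, class names and regexes are assumptions; each regex runs against every event of that sourcetype, which is why heavy use of TRANSFORMS-xxx and SEDCMD shows up as typingQueue pressure.

```ini
# props.conf (hypothetical sourcetype and class names)
[my_sourcetype]
TRANSFORMS-drop_debug = drop_debug_events
# sed-style replacement applied at index time: keep only the last 4 digits
SEDCMD-mask_acct = s/acct=\d{6}(\d{4})/acct=XXXXXX\1/g

# transforms.conf
[drop_debug_events]
SOURCE_KEY = _raw
REGEX = \sDEBUG\s
# Route matching events to the nullQueue, i.e. discard them
DEST_KEY = queue
FORMAT = nullQueue
```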
Search Types
Type | Reference | Hardware Performance Impact | Notes
Dense | 50K matching events/second | CPU | Generally taxes the CPU because of decompression
Sparse | 5K matching events/second | CPU | Generally returns 0.01 to 1% of events
Super-sparse | Up to 2 seconds/bucket | I/O | Could take a long time with a lot of buckets ("needle in a haystack")
Rare | From 10–50 buckets/second | I/O | Takes advantage of bloom filters
Search Tips
• Events are broken down into tokens; tokens are split by major and minor segmenters
• TERM() and CASE()
  – TERM: disables the segmenters, which can make some searches more efficient
    » 10.1.2.3 = 10 AND 1 AND 2 AND 3 AND "10.1" AND "10.1.2" AND 10.1.2.3
    » TERM(10.1.2.3) = 10.1.2.3
  – CASE: exactly what you think, a case-sensitive search; by default a search for the word "Splunk" will hit on splunk, Splunk, SPLUNK, etc.
• Avoid vague wildcards and all-time searches
• Use metadata (source, sourcetype, host, index) to speed up searches
• Optimize your buckets
More Search Tips
• Search performance: identify slow searches and re-factor them to take advantage of map-reduce
  – The scheduler.log is filled with a lot of the details
  – The S.o.S application has a good dashboard on high-cost searches
• Bundle replication whitelists/blacklists… Upgrade to 5.x for file-based replication if you haven't
• Summary indexing, report acceleration, and TSIDX (Splunk Enterprise 6) may help
• Spread out scheduled searches using cron syntax
• Use snapping effectively to reduce the risk of missing delayed events
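A sketch of a scheduled search that is offset from the top of the hour and uses snapped, slightly delayed time bounds so late-arriving events are still covered. The stanza name, search string, index and offsets are assumptions for illustration.

```ini
# savedsearches.conf (hypothetical search)
[Failed logins by host]
search = index=security sourcetype=linux_secure "Failed password" | stats count by host
enableSched = 1
# Run at :07, :22, :37 and :52 instead of on the quarter hour with everything else
cron_schedule = 7,22,37,52 * * * *
# 15-minute window snapped to the minute, lagging 5 minutes behind now
dispatch.earliest_time = -20m@m
dispatch.latest_time = -5m@m
```

Because the window length matches the 15-minute schedule, consecutive runs cover adjacent time ranges without gaps or overlap.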
Deployment Server
• You could have attended Genti's session: Best Practices + New Feature Overview for the Latest Version of Splunk Deployment Server
• clientName is a great feature
  – Each deployment client can have only one clientName, but deployment clients can share the same clientName

serverclass.conf:
[serverClass:all_indexer]
whitelist.0 = splk-indexer
[serverClass:all_indexer:app:org_all_indexer_base]

[serverClass:all_search]
whitelist.0 = splk-search

[serverClass:all_forwarders]
whitelist.0 = *
blacklist.0 = splk-*

deploymentclient.conf:
[deployment-client]
clientName = splk-indexer
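To complete the picture, a deployment client also needs to know where the deployment server is, and a server class can control how each app is applied. The deployment server host name and the app settings below are illustrative assumptions.

```ini
# deploymentclient.conf on an indexer
[deployment-client]
clientName = splk-indexer

[target-broker:deploymentServer]
# Hypothetical deployment server host and management port
targetUri = ds.example.com:8089

# serverclass.conf on the deployment server
[serverClass:all_indexer:app:org_all_indexer_base]
# Restart splunkd on the client after this app changes
restartSplunkd = true
stateOnClient = enabled
```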
Cluster Master
• New component for HA in 5.x
• Changes the way indexer config files are distributed
  – If you have the Deployment Server configured, you need to alter the setup
  – Config files are distributed to indexers via the Cluster Master's $SPLUNK_HOME/etc/master-apps directory
• Avoid rebalancing during an indexer restart by stopping the cluster master first, then the indexers
• You can attend Dritan's session: Architect Splunk for High Availability and Disaster Recovery
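A minimal sketch of the server.conf clustering stanzas for the 5.x-era cluster master described above. The host name, shared key, replication port and factor values are assumptions; check the clustering documentation for your version before relying on them.

```ini
# server.conf on the cluster master
[clustering]
mode = master
# Hypothetical factors: 3 copies of each bucket, 2 of them searchable
replication_factor = 3
search_factor = 2
pass4SymmKey = your_cluster_key

# server.conf on each indexer (cluster peer)
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://cm.example.com:8089
pass4SymmKey = your_cluster_key
```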
Data Ingest
• You can attend Matty's session: Onboard Data into Splunk, Correctly
• Time stamping and line breaking are the most important
  – Splunk is smart; it will probably get it right
  – But you can make it more efficient

props.conf:
[my_sourcetype]
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 19
TIME_FORMAT = %Y-%m-%d %H:%M:%S
TZ = GMT
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}
SHOULD_LINEMERGE = false
TRUNCATE = 10000
# KV_MODE takes one of: none, auto, multi, json, xml
KV_MODE = none
ANNOTATE_PUNCT = false
Data Ingest
• Use LINE_BREAKER for multi-line events
  – SHOULD_LINEMERGE performs extra line merging; setting it to false in props.conf will save indexing time
• Don't use the punct field? Then override ANNOTATE_PUNCT
• Testing tricks
  – Use a test index for scrubbing data, or use the data preview in Splunk Web (5.x)
  – The rex command is useful for testing your regex (or use IFX)
  – A new Splunk search will pick up new search-time extractions, props or transforms (EXTRACT-xxx or REPORT-xxx)
  – Transforms against raw indexed data will NOT be picked up; they require a restart (TRANSFORMS-xxx)
  – "One shot" is useful for adding data to a test index (CLI driven)
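To contrast the two testing cases: search-time settings like the ones below are picked up by the next search, while anything referenced from a TRANSFORMS-xxx class is applied at index time and needs a restart. The sourcetype, field and transform names are hypothetical.

```ini
# props.conf: search-time extractions (hypothetical names)
[my_sourcetype]
# Inline regex with a named capture group creates the field "user"
EXTRACT-user = user=(?<user>\S+)
# Reference to a transforms.conf stanza
REPORT-kv_pairs = my_kv_pairs

# transforms.conf
[my_kv_pairs]
# Each key=value pair in the event becomes a field named after the key
REGEX = (\w+)=([^\s,]+)
FORMAT = $1::$2
```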
Syslog Collector
• Utilize rsyslog, syslog-ng or Kiwi Syslog instead of sending directly to Splunk
  – Keep a small retention policy on the syslog collector; a couple of days is typically sufficient because the data is forwarded to Splunk for long-term storage
  – Take advantage of Splunk UFs for load balancing; UFs also know when indexers are unresponsive
[Diagram: syslog devices → syslog collector with a forwarder → indexer]
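A sketch of the universal forwarder side of a syslog collector, assuming rsyslog or syslog-ng writes each sending device to its own directory under a hypothetical /var/log/remote/<host>/ path. host_segment then takes the host name from the fourth path segment; the index name is also an assumption.

```ini
# inputs.conf on the syslog collector's universal forwarder
[monitor:///var/log/remote/*/messages]
sourcetype = syslog
index = network
# /var/log/remote/<host>/messages: segment 4 is the device name
host_segment = 4
```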
Random Tidings
.conf “file” Overload
• We love configuration files; a significant amount can be configured via Splunk Web
  – But you still need SSH or RDP access; not everything is exposed
• Don't touch the default directory
  – This directory structure is for the developer; use local for your configurations (always)
• Props, transforms, inputs, outputs, web, server, limits, indexes, tags, eventtypes and just too many more to name
  – Take it in stride
  – Read the specs; they are typically filled with good information and ship with every instance in $SPLUNK_HOME/etc/system/README/ (they're on that internet thing also)
.conf “file” breakdown
• Consist of [stanzas]
• Followed by <attribute> = value pairs
  – <attribute> is CaSe SeNsItIvE, so one spelling will work and the other will not
  – Some attributes are required, some are not
• [stanzas] can have scope; more specifically scoped stanzas take precedence

outputs.conf:
[tcpout]
indexAndForward = true
# this is a good comment and it sets the default value
compressed = true

[tcpout:primary_indexers]
autoLB = true
compressed = false # winner for stanza, but bad comment
server = primary1:9997, primary2:9997

[tcpout:secondary_indexers]
autoLB = true
server = secondary1:9997, secondary2:9997
Applications
• Splunkbase has 400+ apps targeting specific technologies and use cases
• They are all written for a single Splunk server installation
• Know the config files:
  – inputs.conf -> how are you collecting the data the app is targeting? What is the sourcetype?
  – props.conf & transforms.conf -> how is the data being indexed, and what are the fields?
  – indexes.conf -> what indexes does the app rely on?
  – savedsearches.conf, macros.conf, eventtypes.conf, tags.conf?
More on Applications
• What goes where? Indexer vs. search head vs. forwarder
  – inputs.conf: data collection definitions, typically disabled by default (forwarder)
  – props.conf & transforms.conf may contain settings for both the parsing tier (indexer, heavy forwarder) and the search tier (search head)
    » Look for TRANSFORMS-xxx in props.conf; this means the parsing tier
  – indexes.conf typically contains summary index definitions (indexer)
• Deployment server?
  – Don't deploy apps that have a GUI through it; install them directly on the search head
  – Override app-specific configs in the /<appname>/local directory
  – Migrate parsing-tier configs and indexes.conf to a DS-based app
Application… Broken?
• Installed
• But dashboards aren't painting!
  – Investigate the config files in the default directory
  – props.conf:
    » What sourcetype or source [stanzas] exist?
    » Do your events' sourcetype or source match those [stanzas]?
• OK, source|sourcetype matches up, but dashboards are still not painting!
  – eventtypes.conf|savedsearches.conf:
    » Investigate the searches and run them in Splunk; do you get results?
    » Maybe your events have a slightly different format from what the app is expecting; override the config file(s) in local
• OK, data is painting, but it's the wrong fields
  – props.conf|transforms.conf:
    » The field extractions are off; override the config file(s) in local
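As an illustration of overriding in local, suppose (hypothetically) an app's default eventtypes.conf defines an eventtype that expects sourcetype=cisco:asa, but your data arrives as sourcetype=asa_syslog. A local file carrying only the changed stanza and attribute is enough; the app, eventtype and sourcetype names are assumptions.

```ini
# $SPLUNK_HOME/etc/apps/<appname>/local/eventtypes.conf
# Hypothetical eventtype: only the attribute being changed is listed
[cisco_asa_traffic]
search = sourcetype=asa_syslog
```

Because local takes precedence over default, the app's default files stay untouched and can be replaced safely when the app is upgraded.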
Upgrading
• Have a plan and read the release notes (especially for major releases)
• If you don't restart often, perform a sanity restart before upgrading
• Try to keep the distributed search tier (indexers|search heads) on the same version; this is less important for forwarders
• Be mindful of overridden files, especially if you copied the whole file to local
  – Only copy the [stanza] and the <attribute> = value pairs you change
• Back up these directories:
  – $SPLUNK_DB (i.e., your indexes)
  – $SPLUNK_HOME/etc/
• Upgrade Splunk and let it settle before upgrading apps
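For example, to raise TRUNCATE for one sourcetype shipped in an app, the local file should carry just that stanza and attribute rather than a full copy of the default props.conf. Everything else keeps coming from default and picks up changes when the app is upgraded. The sourcetype name and value are hypothetical.

```ini
# $SPLUNK_HOME/etc/apps/<appname>/local/props.conf
# Only the stanza and attribute being changed, not a copy of default/props.conf
[my_sourcetype]
TRUNCATE = 20000
```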
*.conf Precedence
• Config files with the same name are combined at startup… this could lead to conflicts
  – General directory order (highest priority first): etc/system/local, etc/apps/<appname>/local, etc/apps/<appname>/default, etc/system/default
  – Clustering changes this behavior; be aware and read the docs
• Search-time field extractions behave differently
  – They fall to the etc/users/<username>/ directory structure for highest priority
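A small sketch of the general directory order at work, assuming the same hypothetical monitor stanza appears in three places. Attributes merge one by one, and where two files set the same attribute, the higher-priority directory wins. The app name, path and index names are assumptions.

```ini
# etc/apps/my_app/default/inputs.conf
[monitor:///var/log/secure]
index = main
sourcetype = linux_secure

# etc/apps/my_app/local/inputs.conf (beats the app's default)
[monitor:///var/log/secure]
index = security

# etc/system/local/inputs.conf (beats both)
[monitor:///var/log/secure]
disabled = false

# Effective result: index = security, sourcetype = linux_secure, disabled = false
```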
Details in precedence
• props.conf: RENAME (sourcetype), EXTRACT-xxx, REPORT-xxx, KV_MODE, FIELDALIAS-xxx, EVAL-xxx, LOOKUP-xxx, MILLISECONDS, FILTER, EVENTTYPING & TAGGING
  » Collisions within the same [stanza] name fall to ASCII order to determine the winner
  » You can use the priority attribute to bypass ASCII order
• Index-time transforms run only once, because data enters the parsing pipeline only once
  – Remember this when using SOURCE_KEY in transforms.conf or calling multiple transform stanzas from props.conf
• Use btool to validate which [stanza] wins
  – Remember to run it as the same user that runs Splunk!
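A sketch of the priority attribute, assuming two hypothetical source:: patterns that both match a file such as /var/log/app/web.log (the "..." is Splunk's recursive path wildcard). Without priority, ASCII ordering of the matching stanza names decides which setting wins; with it, the higher value takes precedence.

```ini
# props.conf (hypothetical patterns and sourcetype names)
[source::/var/log/...]
sourcetype = generic_log
priority = 10

[source::/var/log/app/*.log]
# Higher priority: wins for files under /var/log/app/
sourcetype = app_log
priority = 100
```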
Other stuff
• Regular expressions
  – gskinner.com/RegExr
  – PCRE
• Testing your IOPS
  – bonnie++
  – iozone (an app we are developing)
• What's your ulimit?
  – http://blogs.splunk.com/2011/11/21/whats-your-ulimit/
• If you are running Splunk as root… buyer beware
Final Thoughts
• Chances are there is more than one way to do what you are looking to do
• Your Network = Your Responsibility
  – Have a plan!
• Due Diligence = docs.splunk.com (RT_M)
  – Hardware sizing
  – Precedence
  – Parsing and routing of data
  – Much more…
More Information
• Contact: [email protected]
• Applications: apps.splunk.com
• Answers: answers.splunk.com
• Education: www.splunk.com/view/education/SP-CAAAAH9
• Professional Services: www.splunk.com/view/professional-services/SP-CAAABH9
• Videos: www.splunk.com/videos
Next Steps
1. Download the .conf2013 Mobile App
   If not iPhone, iPad or Android, use the Web App
2. Take the survey & WIN A PASS FOR .CONF2014… Or one of these bags!
3. View the sessions listed on the next slide
   All sessions are available on the Mobile App
   Videos will be available shortly
What's Next
• Architect Splunk for Physical, Virtual and Cloud Environments
• Architect Splunk for High Availability and Disaster Recovery
• Onboard Data into Splunk, Correctly
• The S.o.S App: All Splunk on Splunk Action, All The Time
• Planning and Execution for Successful Deployments
Q & A
THANK YOU