Splunk Application Logging Best Practices
TRANSCRIPT
Copyright © 2012 Splunk Inc.
Application Logging Best Practices
Clint Sharp, Geek Marketeer
#datajourney
Legal Notices
During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, the engine for machine data [MODIFY THIS TO LIST THOSE SPLUNK TRADEMARKS REFERENCED IN PRESENTATION] are registered trademarks or trademarks of Splunk Inc. and/or its subsidiaries and/or affiliates in the United States and/or other jurisdictions. All other brand names, product names or trademarks belong to their respective holders.
©2012 Splunk Inc. All rights reserved.
Agenda
• Setting some context
• Early vs. Late Binding Schema
• Logging best practices
• Basic Operational best practices
• Developer best practices
Why Should You Care How to Log?
• Isn't logging only for errors?
• How much code is that?
• What will it get me?
• Why wouldn't I just use a Byte-Code Instrumentation product?
I'll give you a hint, I'm going to answer all my own questions
Life Sucks for Developers
• You have to debug complex distributed applications
• You might need expensive/heavy tools in development (can't be moved to production)
• Need many different tools for different purposes
• Lots of code is NOT under your control - only pieces
Life is Great for Developers
• At least you have a job in this economy
• You get paid well (!?!?!?)
• You can dress however you like (kilts, etc.)
"Semantic Logging"
• You have no control over other systems' events
• You have full control over events that YOU write
• Most events are written by developers to help them debug
• Some events are written to form an audit trail
Semantic Events are written explicitly for the gathering of analytics
Late Binding Schema
Splunk applies structure at search time
We call this "Late Binding Schema"
Early vs. Late Binding Schema
Early Structure Binding - Traditional
    SELECT customers.* FROM customers WHERE customers.customer_id NOT IN (SELECT customer_id FROM orders WHERE year(orders.order_date) = 2004)
(Diagram labels: Structure, Data)
• Schema - created at design time
• Queries - understood at design time for maximum performance
• Homogeneous - must fit into tables or be converted to fit into tables
• Must exactly match constraints
Early vs. Late Binding Schema
Late Structure Binding - Splunk
(Diagram labels: Structure, Data)
• Schema-less
• Created at search time
• Queries/searches can be ad-hoc
• Heterogeneous - can come from any textual source
• Constantly changing
• No conversion required, no constraints
Analytics
Early Structure Binding (Days, Weeks Or Months & Destructive)
• Decide the question(s) you want to ask
• Design the Schema
• Normalize the data and write DB insertion code
• Create SQL & Feed into Analytics Tool
Late Binding Schema (Minutes & Non-Destructive)
• Write Semantic Events
• Collect with Splunk
• Create Searches, Reports & Graphs
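To make the contrast concrete, here is a minimal Python sketch of what "applying structure at search time" can mean. It is not how Splunk is implemented; the sample events and the `extract_fields` and `search` helpers are hypothetical and only illustrate the idea that raw text is stored untouched and fields are pulled out when a question is asked.

```python
import re

# Raw events are kept exactly as written; no schema is declared up front.
RAW_EVENTS = [
    "2012-09-10 10:00:01 INFO action=submitPurchaseCompleted purchaseId=17 total=9.99",
    "2012-09-10 10:00:02 ERROR action=submitPurchaseFailed purchaseId=18 reason=declined",
    "2012-09-10 10:00:03 INFO user=alice action=login",
]

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_line):
    """Apply structure at read time: pull key=value pairs out of raw text."""
    return dict(KV_PATTERN.findall(raw_line))


def search(events, **criteria):
    """Return the raw events whose extracted fields match every criterion."""
    matches = []
    for line in events:
        fields = extract_fields(line)
        if all(fields.get(key) == value for key, value in criteria.items()):
            matches.append(line)
    return matches


# Two unrelated questions, asked after the fact, against the same raw data.
failed = search(RAW_EVENTS, action="submitPurchaseFailed")
logins = search(RAW_EVENTS, action="login", user="alice")
```

Because nothing was normalized on the way in, a new question only needs a new search, not a new schema or a reload of the data.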
Logging Best Practices
• Log in Text - binary sounds good because it's compressed, but it requires decoding and will not segment
• Make it easy for humans - try not to use complex encodings that require lookups
• Categorize - use INFO, WARN, ERROR, DEBUG, etc.
• Don't use XML - unless you absolutely need multi-depth nesting
  - We're happy for you to pay us to log in XML, but JSON is much easier to read
• JSON is better - Splunk has native JSON support, even for nested structures
• Keep multi-line events to a minimum
Logging Best Practices
Clearly Timestamp Every Event
• Do not use time offsets
• Use human readable timestamps
• Favor the beginning of the line - the farther you place the timestamp from the beginning, the more difficult it is to tell it's a timestamp and not other data
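A minimal Python sketch of the timestamp advice, using the standard `logging` module. The logger name, the choice of UTC, and the sample message are assumptions made for the example; the point is only that every line starts with an absolute, human-readable timestamp followed by the category.

```python
import io
import logging
import time


def build_logger(stream):
    """Return a logger whose lines start with an absolute, readable timestamp."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)s %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    formatter.converter = time.gmtime  # assumption: timestamps are written in UTC

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger("timestamp_example")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)
log.info("action=startup status=ok")
first_line = buffer.getvalue().splitlines()[0]
```

In a real application the handler would write to a local file rather than an in-memory buffer; the buffer is used here only so the result can be inspected.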
Logging Best Practices
Log more than just Debugging Events
Log anything that can add value when aggregated, charted or further analyzed

Example Bogus Pseudo-Code:

    void submitPurchase(purchaseId)
    {
        log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)
        // these calls throw an exception on error
        submitToCreditCard(...)
        generateInvoice(...)
        generateFullfillmentOrder(...)
        log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
    }

• Graph purchase volume by hour, by day, by month.
• How long are purchases taking during different times of the day and different days of the week?
• Are purchases taking longer than they did last month?
• Are my systems getting slower and slower, or are they ok?
• How many purchases are failing? Graph the failures over time.
• Which specific purchases are failing?
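The pseudo-code above could look roughly like this in Python. This is a sketch under assumptions: the `steps` parameter is a hypothetical stand-in for the credit card, invoice, and fulfillment calls, and the `submitPurchaseFailed` event and `durationMs` field are additions not in the original, included because the questions above ask about failures and durations.

```python
import logging
import time

log = logging.getLogger("purchases")


def submit_purchase(purchase_id, steps):
    """Run the purchase steps, logging start, completion, and failure events.

    `steps` is a hypothetical list of callables standing in for the
    credit card, invoice, and fulfillment calls; each raises on error.
    """
    started = time.monotonic()
    log.info("action=submitPurchaseStart, purchaseId=%d", purchase_id)
    try:
        for step in steps:
            step(purchase_id)
    except Exception as exc:
        # Assumed extra event: makes "which purchases are failing?" searchable.
        log.error(
            "action=submitPurchaseFailed, purchaseId=%d, error=%s",
            purchase_id, type(exc).__name__,
        )
        raise
    elapsed_ms = (time.monotonic() - started) * 1000
    log.info(
        "action=submitPurchaseCompleted, purchaseId=%d, durationMs=%.0f",
        purchase_id, elapsed_ms,
    )
```

Pairing the start and completed events by `purchaseId` is what lets a search chart volume, duration, and failure counts over time.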
Logging Best Practices
Clearly mark key/value pairs
Splunk loves key value pairs that look like:
    key=value, key2=value2, key3=value3 ...
Look at the following events:
    1) Log.debug("error %d", userId)
    2) Log.debug("orderstatus=error errorcode=454 user=%d", userId)
Searching for "error" when logging using #1 will probably bring back all kinds of errors, but searching for orderstatus=error will bring back only the ones you really want.
Sure, it's verbose, but because Splunk compresses, this yields good compression due to repeated terms.
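One way to keep key/value pairs consistent is to build every message through a single helper. The `format_event` function below is a hypothetical sketch, not a Splunk API. Wrapping values that contain spaces or commas in double quotes is an assumed convention, and the sketch does not handle values that themselves contain quotes.

```python
def format_event(**fields):
    """Render fields as 'key=value, key2=value2', quoting values with spaces or commas."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if any(ch in text for ch in (" ", ",")):
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return ", ".join(parts)


line = format_event(
    orderstatus="error", errorcode=454, user=1234, reason="card declined"
)
```

A call such as `Log.debug(format_event(...))` then produces the second, searchable style of event from the comparison above.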
Logging Best Practices
Break multi-value information into separate events
Example: events represent what apps are installed on a mobile device
    <TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
    <TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
Use the "transaction" search command to group them.
If you do this instead, you'll have to edit a config file:
    <TS> phonenumber=333-444-4444, app=angrybirds,facebook
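A small Python sketch of the first style, one event per value. The function name, timestamp, and install dates are made up for illustration.

```python
def app_install_events(timestamp, phone_number, installed_apps):
    """Yield one event line per installed app instead of a single multi-value line."""
    for app, install_date in installed_apps:
        yield (
            f"{timestamp} phonenumber={phone_number}, "
            f"app={app}, installdate={install_date}"
        )


events = list(app_install_events(
    "2012-09-10 10:00:00",
    "333-444-4444",
    [("angrybirds", "2012-08-01"), ("facebook", "2012-08-15")],
))
```

Each app gets its own line carrying the shared `phonenumber`, so the events can be grouped again at search time.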
Logging Best Practices
• Log Unique Identifiers
• Carry Unique Identifiers through multiple touch points if possible
  - enables transaction search
• Use Transitive Closure if you need to:
    transid=abcdef,   transid=abcdef, otherid=qrstuv,   . . . . .   otherid=qrstuv
    (Diagram label: Transaction)
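The idea behind transitive closure can be sketched in Python: if one event carries both `transid` and `otherid`, then events that carry only one of the two still belong to the same transaction. This is an illustration using a small union-find, not a description of how Splunk's transaction search works internally; the `step` field and the sample events are hypothetical.

```python
def group_transactions(events, id_fields=("transid", "otherid")):
    """Group events that share an identifier, directly or through a linking event."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    def ids_of(event):
        return [(field, event[field]) for field in id_fields if field in event]

    # An event that carries several identifiers links them together.
    for event in events:
        ids = ids_of(event)
        for other in ids[1:]:
            union(ids[0], other)

    # Collect events under the representative of their identifier set.
    groups = {}
    for event in events:
        ids = ids_of(event)
        if ids:
            groups.setdefault(find(ids[0]), []).append(event)
    return list(groups.values())


events = [
    {"transid": "abcdef", "step": "received"},
    {"transid": "abcdef", "otherid": "qrstuv", "step": "handoff"},
    {"otherid": "qrstuv", "step": "shipped"},
    {"transid": "zzzzzz", "step": "received"},
]
transactions = group_transactions(events)
```

The "shipped" event never mentions `transid=abcdef`, yet it lands in the same group because the "handoff" event ties the two identifiers together.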
Why JSON
• Direct to/from Data Structure in Modern Languages
  - Python, Ruby, JavaScript, etc.
• Easy to serialize/de-serialize objects to/from JSON
  - Thus storing and retrieving objects via Splunk
• It's the "Lingua Franca" of lightweight Cloud Services
  - Web Hooks and push
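A short Python sketch of the serialize/de-serialize round trip. The `PurchaseEvent` class and its fields are hypothetical; only the standard `json` and `dataclasses` modules are used.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PurchaseEvent:
    action: str
    purchase_id: int
    items: list


def to_log_line(event):
    """Serialize an event object to one compact JSON line."""
    return json.dumps(asdict(event), separators=(",", ":"), sort_keys=True)


def from_log_line(line):
    """Rebuild the event object from a JSON log line."""
    return PurchaseEvent(**json.loads(line))


original = PurchaseEvent("submitPurchaseCompleted", 17, ["book", "pen"])
line = to_log_line(original)
restored = from_log_line(line)
```

Writing each object as a single line also keeps multi-line events to a minimum, as recommended earlier.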
JSON Search Examples
    {"web-app": {
      "servlet": [
        {
          "servlet-name": "cofaxCDS",
          "servlet-class": "org.cofax.cds.CDSServlet",
          "init-param": {
            "configGlossary:installationAt": "Philadelphia, PA",
            "configGlossary:adminEmail": "[email protected]",
            "maxUrlLength": 500}},
        {
          "servlet-name": "cofaxEmail",
          "servlet-class": "org.cofax.cds.EmailServlet",
          "init-param": {
            "mailHost": "mail1",
            "mailHostOverride": "mail2"}},
        {
          "servlet-name": "cofaxAdmin",
          "servlet-class": "org.cofax.cds.AdminServlet"},
        {
          "servlet-name": "fileServlet",
          "servlet-class": "org.cofax.cds.FileServlet"},
    . . . . . . . .

    source="/Users/wma/splunk/si-staging/sample.json" | spath output=foo path=web-app.servlet{2}.servlet-class | top foo
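For readers unfamiliar with path-style extraction, the Python sketch below walks a trimmed copy of the sample document and counts the values it finds, loosely mirroring the extract-then-`top` shape of the search above. It is not an implementation of `spath`: it collects every array element rather than selecting one by index, and the `walk` helper is hypothetical.

```python
import json
from collections import Counter

# Trimmed copy of the sample document (init-param blocks omitted).
SAMPLE = json.loads("""
{"web-app": {"servlet": [
  {"servlet-name": "cofaxCDS", "servlet-class": "org.cofax.cds.CDSServlet"},
  {"servlet-name": "cofaxEmail", "servlet-class": "org.cofax.cds.EmailServlet"},
  {"servlet-name": "cofaxAdmin", "servlet-class": "org.cofax.cds.AdminServlet"},
  {"servlet-name": "fileServlet", "servlet-class": "org.cofax.cds.FileServlet"}
]}}
""")


def walk(node, keys):
    """Yield every value reached by following `keys`, fanning out over lists."""
    if isinstance(node, list):
        for item in node:
            yield from walk(item, keys)
    elif not keys:
        yield node
    elif isinstance(node, dict) and keys[0] in node:
        yield from walk(node[keys[0]], keys[1:])


classes = list(walk(SAMPLE, ["web-app", "servlet", "servlet-class"]))
counts = Counter(classes)
```

Nested JSON stays searchable by path without flattening it first, which is the point of the slide.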
Operational Best Practices for Splunk
• Log locally to files
• Use rotation policies - destroy or back up (your choice)
• Run Splunk Forwarders
  - provides elastic buffering - or else production applications can block!
(Diagram: Event Producing Application -> Local Log File -> Splunk Forwarder -> Network -> Splunk Indexer or Storm)
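A minimal Python sketch of "log locally to files" with a rotation policy, using the standard library's `RotatingFileHandler`. The file name, size limit, and backup count are assumptions chosen so that rotation is easy to observe. The forwarder that would pick the file up is outside the scope of the sketch.

```python
import logging
import logging.handlers
import os
import tempfile


def build_file_logger(log_path, max_bytes=1_000_000, backup_count=5):
    """Log to a local file, rolling it over before it exceeds max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"
    ))
    logger = logging.getLogger("app_file_example")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Tiny limits so that rotation happens within a handful of events.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "app.log")
log = build_file_logger(log_path, max_bytes=200, backup_count=2)
for n in range(20):
    log.info("action=heartbeat, sequence=%d", n)
rotated_files = sorted(os.listdir(log_dir))
```

With `backup_count=2` the oldest files are destroyed; backing them up instead is the other option the slide mentions.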
Operational Best Practices for Splunk
• Syslog is great for large volumes of low value data.
  - Obviously lossy
  - But has existing services on Unix
• Syslog NG is better, but watch your configuration
• Syslog can't handle multi-line events. Packet sizes are too small.
Operational Best Practices for Enterprise Splunk
• Over provision indexers
  - More indexers = better search performance
  - I've seen too many people under-power their Splunk machines and then complain that Splunk is slow
More indexers will add more parallelization to searches
Operational Best Practices for Splunk
The more you put in Splunk, the more visibility you have:
• Application logs
• Database logs
• Network logs
• Configuration files
• Performance data (iostat, vmstat, ps, etc.)
• Anything that has a time component
Development Best Practices
• Developer teams are now required to create tags and notations in logs for easier identification
• Part of each application backlog includes creating custom Splunk reports, dashboards and alerts
• Enrich your Logs!
  - Build in specific tags and keywords
  - Standardize and optimize your log formats
Washington Post Splunk Presentation
Well Instrumented Applications Can Get You
• Per API performance metrics with almost no overhead
• Detailed tracing of where an error occurs in a flow
• Better monitoring
• Interesting "Found Data"
• Business Analytics along with Performance Analytics
  - Be a hero to your boss by accident!
Middleware Performance Example
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=Requests val=1 newval=109083 oldval=109082
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=GetCustomerInformation type=ResponseTime val=1142 newval=1142 oldval=1318
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=UpdateActivationPayment type=Successful val=3 newval=103334 oldval=103331
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=RequestsOneMinuteCount val=1 newval=1 oldval=0
2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=PostPaygoPayment type=Successful val=6 newval=178006 oldval=178000
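Events of the `type=ResponseTime` kind could be produced with very little code by wrapping each call. The decorator below is a hypothetical Python sketch, not the middleware that generated the sample above. It emits only the `adaptor`, `call`, `type`, and `val` fields, and leaves the timestamp and `server` field to the logging configuration.

```python
import functools
import logging
import time

log = logging.getLogger("middleware_metrics")


def timed_call(adaptor):
    """Decorator that logs one ResponseTime event per call, in key=value form."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Logged whether the call returned or raised.
                elapsed_ms = int((time.perf_counter() - started) * 1000)
                log.info(
                    "adaptor=%s call=%s type=ResponseTime val=%d",
                    adaptor, func.__name__, elapsed_ms,
                )
        return wrapper
    return decorator


@timed_call(adaptor="APL")
def GetCustomerInformation(customer_id):
    # Hypothetical stand-in for the real middleware call.
    return {"customer_id": customer_id}
```

The cost per call is two clock reads and one log line, which is one way to get per-API metrics cheaply.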