Splunk Application Logging Best Practices

Copyright © 2012 Splunk Inc. Application Logging Best Practices Clint Sharp, Geek Marketeer #datajourney

Upload: greg-hanchin

Post on 20-Aug-2015



Legal Notices

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk, the engine for machine data [MODIFY THIS TO LIST THOSE SPLUNK TRADEMARKS REFERENCED IN PRESENTATION] are registered trademarks or trademarks of Splunk Inc. and/or its subsidiaries and/or affiliates in the United States and/or other jurisdictions. All other brand names, product names or trademarks belong to their respective holders.

©2012 Splunk Inc. All rights reserved.

Agenda

• Setting some context
• Early vs. Late Binding Schema
• Logging best practices
• Basic Operational best practices
• Developer best practices

Why Should You Care How to Log?

• Isn’t logging only for errors?
• How much code is that?
• What will it get me?
• Why wouldn’t I just use a Byte-Code Instrumentation product?

I’ll give you a hint: I’m going to answer all my own questions.

Life Sucks for Developers

• You have to debug complex distributed applications
• You might need expensive/heavy tools in development (can’t be moved to production)
• Need many different tools for different purposes
• Lots of code is NOT under your control – only pieces

Life is Great for Developers

• At least you have a job in this economy
• You get paid well (!?!?!?)
• You can dress however you like (kilts, etc.)

“Semantic Logging”

• You have no control over other systems’ events
• You have full control over events that YOU write
• Most events are written by developers to help them debug
• Some events are written to form an audit trail

Semantic Events are written explicitly for the gathering of analytics

Late Binding Schema

Splunk knows virtually nothing about the data as it is indexed.

Splunk applies structure at search time.

We call this “Late Binding Schema”.

Early vs. Late Binding Schema

Early Structure Binding - Traditional

    SELECT customers.* FROM customers WHERE customers.customer_id NOT IN (SELECT customer_id FROM orders WHERE year(orders.order_date) = 2004)

[Diagram: Structure, Data]

• Schema – created at design time
• Queries – understood at design time for maximum performance
• Homogeneous – must fit into tables or be converted to fit into tables
• Must exactly match constraints

Late Structure Binding - Splunk

[Diagram: Structure, Data]

• Schema-less
• Created at search time
• Queries/searches can be ad hoc
• Heterogeneous – can come from any textual source
• Constantly changing
• No conversion required, no constraints
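To make the late-binding idea concrete, here is a minimal Python sketch. It is not how Splunk is implemented; the sample events, the `KV_PATTERN` rule, and the `search` helper are all assumptions made up for illustration. The point is only that the raw text is stored untouched and structure is applied when a question is asked.

```python
import re

# Raw events are kept exactly as written; nothing is parsed at ingest time.
# (Sample data invented for this sketch.)
RAW_EVENTS = [
    "2012-07-28 09:21:47 action=login user=alice status=ok",
    "2012-07-28 09:21:48 action=purchase user=bob amount=42 status=error",
]

# Hypothetical extraction rule, applied only when a search runs.
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def extract_fields(raw_event):
    """Apply structure to one raw event at read time."""
    return dict(KV_PATTERN.findall(raw_event))


def search(events, **criteria):
    """Return the raw events whose extracted fields match every criterion."""
    return [
        event for event in events
        if all(extract_fields(event).get(k) == v for k, v in criteria.items())
    ]
```

Because the extraction rule lives outside the stored data, it can be changed later and the same raw events re-read with the new rule, which is the "non-destructive" property the slides contrast with a fixed schema.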

Analytics

Early Structure Binding (Days, Weeks or Months & Destructive)

    1) Decide the question(s) you want to ask
    2) Design the Schema
    3) Normalize the data and write DB insertion code
    4) Create SQL & Feed into Analytics Tool

Late Binding Schema (Minutes & Non-Destructive)

    1) Write Semantic Events
    2) Collect with Splunk
    3) Create Searches, Reports & Graphs

Logging Best Practices

Create Human Readable Events
(For the most part)

• Log in text – Binary sounds good because it’s compressed, but it requires decoding and will not segment
• Make it easy for humans – Try not to use complex encodings that require lookups
• Categorize – Use INFO, WARN, ERROR, DEBUG, etc.
• Don’t use XML – Unless you absolutely need multi-depth nesting
  – We’re happy for you to pay us to log in XML, but JSON is much easier to read
• JSON is better – Splunk has native JSON support, even for nested structures
• Keep multi-line events to a minimum

Logging Best Practices

• Do not use time offsets
• Use human readable timestamps
• Favor the beginning of the line – the farther you place the timestamp from the beginning, the more difficult it is to tell it’s a timestamp and not other data

Clearly Timestamp Every Event
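As one possible way to follow the timestamp advice, the sketch below configures Python's standard `logging` module so that every line starts with a full, human readable timestamp followed by the severity. The logger name `timestamp_demo` and the in-memory stream are placeholders chosen for this example.

```python
import io
import logging

# Timestamp first, then severity, then the message.
formatter = logging.Formatter(
    fmt="%(asctime)s %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",  # absolute date and time, not an offset
)

stream = io.StringIO()  # stands in for a log file in this sketch
handler = logging.StreamHandler(stream)
handler.setFormatter(formatter)

log = logging.getLogger("timestamp_demo")  # hypothetical logger name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

log.info("action=startup")
line = stream.getvalue()  # e.g. "<date> <time> INFO action=startup"
```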

Logging Best Practices
Log more than just Debugging Events

Log anything that can add value when aggregated, charted or further analyzed.

Example Bogus Pseudo-Code:

    void submitPurchase(purchaseId)
    {
        log.info("action=submitPurchaseStart, purchaseId=%d", purchaseId)
        //these calls throw an exception on error
        submitToCreditCard(...)
        generateInvoice(...)
        generateFullfillmentOrder(...)
        log.info("action=submitPurchaseCompleted, purchaseId=%d", purchaseId)
    }

• Graph purchase volume by hour, by day, by month.
• How long are purchases taking during different times of the day and different days of the week?
• Are purchases taking longer than they did last month?
• Are my systems getting slower and slower, or are they ok?
• How many purchases are failing? Graph the failures over time.
• Which specific purchases are failing?
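The pseudo-code above could look roughly like this in Python. This is a sketch under assumptions: the `steps` parameter stands in for the credit card, invoice, and fulfillment calls, and the `submitPurchaseFailed` event and `durationMs` field are additions not present in the slide, included because they make the "how long" and "which ones fail" questions easier to answer.

```python
import logging
import time

log = logging.getLogger("purchases")  # hypothetical logger name


def submit_purchase(purchase_id, steps):
    """Run each step in order, logging semantic start and end events.

    `steps` is a list of callables standing in for the credit card,
    invoice, and fulfillment calls; each raises an exception on error.
    """
    log.info("action=submitPurchaseStart, purchaseId=%d", purchase_id)
    started = time.monotonic()
    try:
        for step in steps:
            step(purchase_id)
    except Exception:
        # Assumption: an explicit failure event, not in the original pseudo-code.
        log.error("action=submitPurchaseFailed, purchaseId=%d", purchase_id)
        raise
    duration_ms = int((time.monotonic() - started) * 1000)
    log.info(
        "action=submitPurchaseCompleted, purchaseId=%d, durationMs=%d",
        purchase_id, duration_ms,
    )
```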

Logging Best Practices
Clearly mark key/value pairs

Splunk loves key value pairs that look like:

    key=value, key2=value2, key3=value3...

Look at the following events:

    1) Log.debug("error %d", userId)
    2) Log.debug("orderstatus=error errorcode=454 user=%d", userId)

Searching for “error” when logging using #1 will probably bring back all kinds of errors, but searching for orderstatus=error will bring back only the ones you really want.

Sure, it’s verbose, but because Splunk compresses the data, the repeated terms compress well.
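A small helper can keep the key=value style consistent across a code base. The function below is a hypothetical sketch, not part of any Splunk library; the choice to wrap values containing spaces or commas in double quotes is an assumption about what keeps the pairs unambiguous.

```python
def format_event(**fields):
    """Render fields as 'key=value, key2=value2', in the order given.

    Values containing a space or comma are wrapped in double quotes
    (any inner double quotes are swapped for single quotes, which is lossy).
    """
    parts = []
    for key, value in fields.items():
        text = str(value)
        if " " in text or "," in text:
            text = '"' + text.replace('"', "'") + '"'
        parts.append(f"{key}={text}")
    return ", ".join(parts)
```

For example, `format_event(orderstatus="error", errorcode=454, user=1234)` returns `orderstatus=error, errorcode=454, user=1234`, which can then be passed to any logger call.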

Logging Best Practices
Break multi-value information into separate events

Example: Events represent what apps are installed on a mobile device

    <TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx
    <TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy

Use the “transaction” search command to group them.

If you do this instead, you’ll have to edit a config file:

    <TS> phonenumber=333-444-4444, app=angrybirds,facebook
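On the producing side, splitting a multi-value record into one event per value can be as simple as the following sketch. The function name, its parameters, and the sample dates in the usage note are illustrative assumptions.

```python
def app_install_events(timestamp, phone_number, installs):
    """Return one key=value event per installed app.

    `installs` maps app name -> install date, so a device with N apps
    yields N single-valued events instead of one multi-valued event.
    """
    return [
        f"{timestamp} phonenumber={phone_number}, app={app}, installdate={date}"
        for app, date in installs.items()
    ]
```

Calling it with `{"angrybirds": "07/01/12", "facebook": "07/02/12"}` (made-up dates) produces two lines that share the same `phonenumber`, which is the field a later grouping step would key on.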

Logging Best Practices

• Log Unique Identifiers
• Carry Unique Identifiers through multiple touch points if possible
  – enables transaction search
• Use Transitive Closure if you need to:

    transid=abcdef
    transid=abcdef, otherid=qrstuv
    . . . . .
    otherid=qrstuv

  Events that share transid=abcdef and events that share otherid=qrstuv are linked through the event that carries both, so together they form one Transaction.
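The transitive-closure idea can be illustrated outside Splunk with a small union-find grouping. This is only a sketch of the concept, not a description of how Splunk's transaction search works internally; the function name and the default `id_fields` are assumptions.

```python
def group_transactions(events, id_fields=("transid", "otherid")):
    """Group events linked, directly or indirectly, by shared identifiers.

    `events` is a list of dicts. Two events land in the same group if they
    share any (field, value) pair, or are connected through a chain of
    events that do.
    """
    parent = list(range(len(events)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of first event carrying it
    for index, event in enumerate(events):
        for field in id_fields:
            if field in event:
                key = (field, event[field])
                if key in first_seen:
                    parent[find(index)] = find(first_seen[key])
                else:
                    first_seen[key] = index

    groups = {}
    for index, event in enumerate(events):
        groups.setdefault(find(index), []).append(event)
    return list(groups.values())
```

Given an event with only `transid=abcdef`, one with both `transid=abcdef` and `otherid=qrstuv`, and one with only `otherid=qrstuv`, all three end up in the same group even though the first and last share no identifier directly.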

Quick Semantic Logging Demo

Why JSON

• Direct to/from data structures in modern languages
  – Python, Ruby, JavaScript, etc.
• Easy to serialize/de-serialize objects to/from JSON
  – Thus storing and retrieving objects via Splunk
• It’s the “Lingua Franca” of lightweight Cloud Services
  – Web Hooks and push
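To show how directly a data structure maps to a JSON event, here is a minimal sketch of a formatter for Python's standard `logging` module that writes one JSON object per line. The class name `JsonFormatter` and the `fields` attribute passed through `extra=` are conventions invented for this example, not an established API.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Serialize each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            # Timestamp goes first so it sits at the start of the line.
            "time": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # `fields` is a hypothetical attribute supplied via `extra=`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)
```

A call such as `log.info("purchase completed", extra={"fields": {"purchaseId": 42}})` would then emit the dictionary, nested values included, without any hand-written string formatting.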

JSON Search Examples

    {"web-app": {
      "servlet": [
        {
          "servlet-name": "cofaxCDS",
          "servlet-class": "org.cofax.cds.CDSServlet",
          "init-param": {
            "configGlossary:installationAt": "Philadelphia, PA",
            "configGlossary:adminEmail": "[email protected]",
            "maxUrlLength": 500}},
        {
          "servlet-name": "cofaxEmail",
          "servlet-class": "org.cofax.cds.EmailServlet",
          "init-param": {
            "mailHost": "mail1",
            "mailHostOverride": "mail2"}},
        {
          "servlet-name": "cofaxAdmin",
          "servlet-class": "org.cofax.cds.AdminServlet"},
        {
          "servlet-name": "fileServlet",
          "servlet-class": "org.cofax.cds.FileServlet"},
        . . . . . . . .

    source="/Users/wma/splunk/si-staging/sample.json" | spath output=foo path=web-app.servlet{2}.servlet-class | top foo

Operational Best Practices for Splunk

• Log locally to files
• Use rotation policies – destroy or back up (your choice)
• Run Splunk Forwarders
  – provides elastic buffering – or else production applications can block!

[Diagram: Event Producing Application -> Local Log File -> Splunk Forwarder -> Network -> Splunk Indexer or Storm]
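On the application side, "log locally to files" with a rotation policy might look like the sketch below, which uses `RotatingFileHandler` from Python's standard library. The function name, the default size limit, and the backup count are arbitrary choices for illustration; a forwarder would be configured separately to watch the resulting file.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_file_logger(name, path, max_bytes=10_000_000, backup_count=5):
    """Return a logger that writes to a local file and rotates it by size.

    When the file would exceed `max_bytes`, it is renamed to `path.1`
    (older files shift to `.2`, `.3`, ...) and at most `backup_count`
    old files are kept; the oldest is deleted.
    """
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False
    return logger
```

The application only ever writes to local disk here, so a slow or unreachable network does not block it; shipping the file onward is left to the forwarder.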

Operational Best Practices for Splunk

• Syslog is great for large volumes of low value data.
  – Obviously lossy
  – But has existing services on Unix
• Syslog NG is better, but watch your configuration
• Syslog can’t handle multi-line events. Packet sizes are too small.

Operational Best Practices for Enterprise Splunk

• Over provision indexers
  – More indexers = better search performance
  – I’ve seen too many people under-power their Splunk machines and then complain that Splunk is slow

More indexers will add more parallelization to searches.

Operational Best Practices for Splunk

The more you put in Splunk, the more visibility you have:

• Application logs
• Database logs
• Network logs
• Configuration files
• Performance data (iostat, vmstat, ps, etc.)
• Anything that has a time component

Treat Splunk as part of your development software stack

Use Splunk as your Analytics Engine

Collect events from every single machine

Development Best Practices

• Developer teams are now required to create tags and notations in logs for easier identification
• Part of each application backlog includes creating custom Splunk reports, dashboards, and alerts
• Enrich your Logs!
  – Build in specific tags and keywords
  – Standardize and optimize your log formats

(Source: Washington Post Splunk Presentation)

Your Code Isn’t Considered “Delivered” Until You Have Built Analytics that Support it!

What It Gets You

Well Instrumented Applications Can Get You

• Per API performance metrics with almost no overhead
• Detailed tracing of where an error occurs in a flow
• Better monitoring
• Interesting “Found Data”
• Business Analytics along with Performance Analytics
  – Be a hero to your boss by accident!

Middleware Performance Example

    2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=Requests val=1 newval=109083 oldval=109082
    2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=GetCustomerInformation type=ResponseTime val=1142 newval=1142 oldval=1318
    2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=UpdateActivationPayment type=Successful val=3 newval=103334 oldval=103331
    2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=ValidateActivationPayment type=RequestsOneMinuteCount val=1 newval=1 oldval=0
    2011-07-28 09:21:47 server=sandapcspapl1 adaptor=APL call=PostPaygoPayment type=Successful val=6 newval=178006 oldval=178000

[Chart: Per API Response Times]
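Events in this shape are easy to aggregate. The Python sketch below averages `val` per `call` for events with `type=ResponseTime`; it assumes, without confirmation from the slides, that `val` on those events is the measured response time, and the function name is made up for this example. In Splunk itself a search along the lines of `type=ResponseTime | stats avg(val) by call` would play a similar role.

```python
import re
from collections import defaultdict

KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def average_response_times(lines):
    """Return {call: mean val} over events with type=ResponseTime.

    Assumption: `val` on ResponseTime events is the response time itself;
    the unit is not stated in the source events.
    """
    totals = defaultdict(lambda: [0, 0])  # call -> [sum of val, event count]
    for line in lines:
        fields = dict(KV_PATTERN.findall(line))
        if fields.get("type") == "ResponseTime":
            totals[fields["call"]][0] += int(fields["val"])
            totals[fields["call"]][1] += 1
    return {call: total / count for call, (total, count) in totals.items()}
```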

[Chart: Business Metrics (Sales)]

Thanks! Questions?