Sustainable Logging – SplunkLive! 2014

Posted on 05-Dec-2014


DESCRIPTION

There are several factors that will make your Splunk implementation a success. This presentation covers why our organisation implemented Splunk for log management and the steps you can take to make your implementation successful.

TRANSCRIPT

Copyright © 2014 Splunk Inc.

Sustainable Logging: SUCCEEDING WITH SPLUNK

2

Paul Gilowey Foundation Technology Specialist

paul.gilowey@santam.co.za

@paulcgt


Words and thoughts expressed herein are my own, and not those of Santam.

3

www.dan-dare.org

4

My technology background

5

The evolution that led to Splunk

6

In the beginning there was ONE.

depotwallpaper.com

7

Then things got really complex.

8

9

10

In 2012, a new project

11

A big decision

It’s time to say goodbye…

12

Highly distributed and integrated

13

A brand new world

Claims, Finance, Docs, B2B, Portal, Legacy, Reverse Proxies, Load-balancers, IDM, Integration, ESM, Virtualisation, New Policy Administration, MDM

14

James Wheeler souvenirpixels.com

Too many logs to monitor

15

capetownstockphotos.com

So little time to trace problems

16

Not only in production

https://www.flickr.com/photos/wsdot/

17

On a tight timeline

18

https://www.flickr.com/photos/usnavy/

December 2013, production and non-production: 20GB

19

Now what?

So we’re collecting log events.

20

Developers like doing things the old way

21

tail -f ./catalina.out

22

We like this. It’s comforting.

23

Effecting change

24

CTO’s Office

Splunk users (dev, ops, etc.)

Choosing your champion

25

Your champion should…

• have influence across departments

• act as product owner

• be fanatical

• be hands-on

• have a development background

• be an architect

Dave Keeshan - https://www.flickr.com/photos/spudmurphy/

26

Tips to help your champion

27

Help developers troubleshoot (even in dev)

Ed Yourdon https://www.flickr.com/photos/yourdon/

28

Change how developers think about log events

29

Police lazy logging

[INFO ] Got here

[INFO ] finished loop 420

[INFO ] JDE…

[INFO ] >>>>>>>>AAAAAAAA

[INFO ] BBBBBBBBBBBBBBB

[ERROR] It failed!!!!!!
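One way to make the policing repeatable is a small review check that runs over log messages. The sketch below is hypothetical: its heuristics (repeated filler characters, stacked punctuation, no key=value context, fewer than three words) are assumptions chosen to catch the examples on this slide, not a Splunk feature or a rule stated in the talk.

```python
import re

# Hypothetical heuristics for "lazy" log messages; tune them to your own codebase.
REPEATED_CHARS = re.compile(r"(.)\1{4,}")      # five or more of the same char, e.g. "AAAAAAAA"
EXCESS_PUNCTUATION = re.compile(r"[!?]{2,}")   # e.g. "failed!!!!!!"
KEY_VALUE = re.compile(r"\b\w+=\S+")           # e.g. "TxId=328"


def lazy_reasons(message: str) -> list[str]:
    """Return the reasons a log message looks lazy (an empty list means it passes)."""
    reasons = []
    if REPEATED_CHARS.search(message):
        reasons.append("repeated filler characters")
    if EXCESS_PUNCTUATION.search(message):
        reasons.append("excess punctuation instead of detail")
    if not KEY_VALUE.search(message):
        reasons.append("no key=value context (e.g. a transaction id)")
    if len(message.split()) < 3:
        reasons.append("too short to explain what happened")
    return reasons
```

Every message in the example above gets at least one reason, while a message such as `TxId=328, Invoice LineItemsTotal=420` passes cleanly.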

30

Ops might as well be blindfolded.

https://www.flickr.com/photos/foxtongue

31

Do you really want to be called at 2am?

32

Demonstrate thoughtful logging

[DEBUG] TxId=328, Counting invoice line items…

[INFO ] TxId=328, Invoice LineItemsTotal=420

[DEBUG] TxId=328, Calling remote service JDE…

[TRACE] TxId=328, JDE Request: {"TxID":"328", "Items":[{"desc":"Motor Vehicle","prem":305.24},…

[WARN ] TxId=328, Timed out while calling remote service JDE… target system may be down. Will retry in 30s.
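A minimal sketch of the same idea, assuming Python's standard `logging` module rather than the Java logging used in the talk: a `LoggerAdapter` stamps every event with the transaction id, and a custom TRACE level is registered because Python has none built in. The function `count_line_items` and the level number 5 are illustrative assumptions.

```python
import logging

# Python's logging has no TRACE level; register one below DEBUG (10). Assumed value.
TRACE = 5
logging.addLevelName(TRACE, "TRACE")
# Match the slide's "[WARN ]" label instead of Python's default "WARNING".
logging.addLevelName(logging.WARNING, "WARN")


class TxAdapter(logging.LoggerAdapter):
    """Prefixes every message with the transaction id, e.g. 'TxId=328, ...'."""

    def process(self, msg, kwargs):
        return f"TxId={self.extra['tx_id']}, {msg}", kwargs


def count_line_items(items, log):
    """Hypothetical business step: log what it is doing, then what it found."""
    log.debug("Counting invoice line items...")
    total = len(items)
    log.info("Invoice LineItemsTotal=%d", total)
    return total


if __name__ == "__main__":
    logging.basicConfig(level=TRACE, format="[%(levelname)-5s] %(message)s")
    log = TxAdapter(logging.getLogger("invoice"), {"tx_id": 328})
    count_line_items(["Motor Vehicle"] * 420, log)
    log.log(TRACE, "JDE Request: %s", '{"TxID":"328", ...}')
    log.warning("Timed out while calling remote service JDE... "
                "target system may be down. Will retry in %ds.", 30)
```

With that format string the output takes the same `[LEVEL] TxId=328, ...` shape as the slide, so searching for `TxId=328` should pull back the whole transaction in one go.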

33

Show the benefit of structured log events

[INFO] Purchase complete - total=42 currency=ZAR language=en_ZA priority=13

"Purchase complete" priority<4
| stats sum(total) as currencyTotal by currency
| table currency, currencyTotal
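To show what the search does with structured events, here is a rough Python equivalent. It is a sketch, not Splunk's implementation: it assumes simple unquoted `key=value` pairs (the kind Splunk extracts automatically at search time), it matches the phrase case-sensitively where Splunk would not, and every sample event except the first (the one on the slide) is made up.

```python
import re
from collections import defaultdict

KV = re.compile(r"(\w+)=(\S+)")  # simple unquoted key=value pairs


def extract_fields(event: str) -> dict:
    return dict(KV.findall(event))


def currency_totals(events, max_priority=4):
    """Roughly: "Purchase complete" priority<4 | stats sum(total) as currencyTotal by currency"""
    totals = defaultdict(float)
    for event in events:
        if "Purchase complete" not in event:
            continue
        fields = extract_fields(event)
        # Like priority<4 in SPL, events without the field are excluded.
        if int(fields.get("priority", max_priority)) >= max_priority:
            continue
        # Like stats ... by currency, events missing either field are dropped.
        if "currency" not in fields or "total" not in fields:
            continue
        totals[fields["currency"]] += float(fields["total"])
    return dict(totals)


events = [
    "[INFO] Purchase complete - total=42 currency=ZAR language=en_ZA priority=13",
    "[INFO] Purchase complete - total=100 currency=ZAR language=en_ZA priority=2",
    "[INFO] Purchase complete - total=15 currency=USD language=en_US priority=1",
    "[INFO] Purchase complete - total=8 currency=ZAR language=af_ZA priority=3",
]
print(currency_totals(events))  # {'ZAR': 108.0, 'USD': 15.0}
```

Note that the slide's own example event has priority=13, so the `priority<4` filter leaves it out of the totals.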

34

11 Sep 2014 15:05:27,960 [Thread-428] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver - btid=77320d33-5f8c-4178-b13e-c594816463d8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver.processStatusMessage : Status [STATUS_PROCESSING_COMPLETED = 6], will act on [STATUS_FINISHED = 1], for now only GENERATE_DIGITAL_DOCUMENT.

11 Sep 2014 15:05:36,272 [Thread-428] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableReceiver - btid=e76665e2-e876-455a-a087-aeb5ba97d5a8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStatusReceiver.processMessages : Blocking(2000) read storage until message arrives...

11 Sep 2014 15:05:36,472 [Thread-427] [DEBUG] [stm.amx.communication.outboundcommunicationmanager] za.co.santam.communication.outboundcommunicationmanager.RunnableReceiver - btid=e76665e2-e876-455a-a087-aeb5ba97d5a8, cmpid=za.co.santam.communication.outboundcommunicationmanager.RunnableStorageReceiver, uid=System, za.co.santam.communication.outboundcommunicationmanager.RunnableStorageReceiver.processMessages : message received.

11 Sep 2014 15:05:36,475 [Thread-427] [TRACE] [com.tibco.amx.platform] com.tibco.governance.amxagent.msginterceptor.component.AMXGovMsgInterceptorComponent - Target URI : urn:amx:env2/stm.amx.communication.outboundcommunicationmanager/StatusReceiver_1.2.0.v2014-09-10-1604#reference(StatusReceiver_ContentManagerProxyAsync_v4_Int).

Change this…

35

… into this.

36

Formalise stacktrace logging policy

Function call ->
  Function call ->
    Function call ->
      Function call
      <- Log stacktrace
    <- Log stacktrace
  <- Log stacktrace
<- Log stacktrace
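One common way to formalise the policy is "raise with context, log once at the boundary". The Python sketch below assumes that convention; the function names and `RemoteServiceError` are hypothetical. The inner layers raise or chain exceptions (`raise ... from ...`) without logging. Only the outermost handler calls `logger.exception`, so the full chained stacktrace reaches the index once instead of four times.

```python
import logging

logger = logging.getLogger("policy")


class RemoteServiceError(Exception):
    """Hypothetical domain exception used for illustration."""


def call_jde(tx_id):
    # Innermost call: raise, but do not log the stacktrace here.
    raise TimeoutError(f"TxId={tx_id}, JDE did not respond")


def submit_invoice(tx_id):
    # Middle layer: add context by chaining, still without logging.
    try:
        call_jde(tx_id)
    except TimeoutError as exc:
        raise RemoteServiceError(f"TxId={tx_id}, invoice submission failed") from exc


def handle_request(tx_id):
    # Boundary: the one place that logs the full (chained) stacktrace.
    try:
        submit_invoice(tx_id)
        return "ok"
    except RemoteServiceError:
        logger.exception("TxId=%s, request failed", tx_id)
        return "failed"
```

Calling `handle_request(328)` returns `"failed"` and emits exactly one ERROR event whose traceback contains both the `TimeoutError` and the `RemoteServiceError` that wraps it.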

37

Avoid filtering events.

[DEBUG] TxId=328, Real important debug statement.

[INFO ] TxId=328, This would have been useful to see...

[DEBUG] TxId=328, Useful when we really need it.

[TRACE] TxId=328, Oh man, I need this event so bad.

[DEBUG] TxId=328, Flippin’ important debug message.

[INFO ] TxId=328, This would have been useful to see...

[WARN ] TxId=328, Why am I logging at all?

38

Avoid filtering events.

[WARN ] TxId=328, Real important debug statement.

[WARN ] TxId=328, This would have been useful to see...

[WARN ] TxId=328, Useful when we really need it.

[WARN ] TxId=328, Oh man, I need this event so bad.

[WARN ] TxId=328, Flippin’ important debug message.

[WARN ] TxId=328, Cummon, I *really* wanna see this!

[WARN ] TxId=328, Why am I logging at all?

39

tail -f ./catalina.out

40

Why developer buy-in matters

41

“A fool with a tool is still a fool.” Grady Booch

42

• Laughable deadlines

• Long days, longer nights

• Management pressure

43

If we log excessively…

44

Bob B. Brown - https://www.flickr.com/photos/beleaveme

45

tail -f ./catalina.out

46

Nope, no fires today, folks.

Robert du Bois https://www.flickr.com/photos/lordisgood

47

No value, no money.

Neubie - https://www.flickr.com/photos/neubie/

48

Shelfware.

Robert Couse-Baker https://www.flickr.com/photos/29233640@N07/

49

8 steps to successful implementation

50

1. Start small (but plan to grow big)

Pewstruck.com - https://www.flickr.com/photos/canoodlepets/

51

2. Start with a clean slate

52

3. Take a smart approach

Learn, Implement, Stabilise, Spread the word, Refine

53

4. Build alerts and set policy

Dashboards are pretty, alerts are king

• Reactive becomes proactive

• Register defects (ERROR = defect)

• Filter, don’t flood mailboxes
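The "register defects, don't flood mailboxes" policy can be sketched like this: treat each distinct ERROR signature as one defect and count every occurrence, but alert at most once per signature per time window. The `DefectAlerter` class, building the signature by masking digits, and the one-hour default window are all assumptions for illustration. In Splunk itself, alert throttling on a saved search covers the suppression part.

```python
import re
from datetime import datetime, timedelta

DIGITS = re.compile(r"\d+")


def signature(message: str) -> str:
    """Mask numbers so 'TxId=328 ...' and 'TxId=329 ...' count as the same defect."""
    return DIGITS.sub("#", message)


class DefectAlerter:
    """Hypothetical helper: one defect per ERROR signature, at most one alert per window."""

    def __init__(self, window: timedelta = timedelta(hours=1)):
        self.window = window
        self.defects: dict[str, int] = {}          # signature -> occurrence count
        self.last_alert: dict[str, datetime] = {}  # signature -> time of last alert sent

    def observe(self, when: datetime, level: str, message: str) -> bool:
        """Record an event; return True if an alert should be sent now."""
        if level != "ERROR":
            return False  # ERROR = defect; other levels are not registered
        sig = signature(message)
        self.defects[sig] = self.defects.get(sig, 0) + 1
        last = self.last_alert.get(sig)
        if last is not None and when - last < self.window:
            return False  # suppressed: already alerted for this defect recently
        self.last_alert[sig] = when
        return True
```

Fed nine occurrences of the same timeout at ten-minute intervals, this registers one defect with a count of 9 but sends only two alerts: one at minute 0 and one at minute 60.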

54

5. Receive all alerts yourself

• Get a feel for the pain

• Make sure filtering is working

• Police false positives

55

6. Help managers look good

• Mine their data yourself

• Find what’s difficult to show

• Build dashboards to showcase their solutions

• Broaden their minds: complement traditional BI by using log events

56

7. Apply the Goldilocks Principle

“Not too hot, not too cold, just right!”

“Meh – too sloooow…”

“Too expensive!”

57

8. Monitor licence usage by source or source type

index=_internal source=*metrics.log group="per_sourcetype_thruput"
| stats sum(kb) as KB by series
| where KB > 20000
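The same aggregation can be sketched outside Splunk, for example to sanity-check numbers from a copy of metrics.log. This sketch rests on assumptions: the sample lines are illustrative rather than real output, and the parser only handles the `key=value, key="value"` layout of the thruput lines. Passing `group="per_source_thruput"` would break usage down by source instead of source type. Treat the totals as approximate. By default metrics.log records only the busiest series in each sampling interval, so Splunk's license_usage.log is the more authoritative source for licence figures.

```python
import re
from collections import defaultdict

# Matches key=value or key="quoted value" pairs in a metrics.log line.
FIELD = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_fields(line: str) -> dict:
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def heavy_series(lines, group="per_sourcetype_thruput", threshold_kb=20000):
    """Roughly: group=<group> | stats sum(kb) as KB by series | where KB > threshold_kb"""
    totals = defaultdict(float)
    for line in lines:
        fields = parse_fields(line)
        if fields.get("group") != group or "kb" not in fields or "series" not in fields:
            continue
        totals[fields["series"]] += float(fields["kb"])
    return {series: kb for series, kb in totals.items() if kb > threshold_kb}


# Illustrative sample lines (not real output).
SAMPLE = [
    '09-11-2014 15:05:36.123 +0200 INFO  Metrics - group=per_sourcetype_thruput, series="log4j", kbps=500.0, eps=900.0, kb=15000.5, ev=27000, avg_age=0.5, max_age=1',
    '09-11-2014 15:06:06.123 +0200 INFO  Metrics - group=per_sourcetype_thruput, series="log4j", kbps=500.0, eps=900.0, kb=15000.5, ev=27000, avg_age=0.5, max_age=1',
    '09-11-2014 15:05:36.123 +0200 INFO  Metrics - group=per_sourcetype_thruput, series="syslog", kbps=10.0, eps=40.0, kb=300.25, ev=1200, avg_age=0.2, max_age=1',
    '09-11-2014 15:05:36.123 +0200 INFO  Metrics - group=per_host_thruput, series="appserver01", kbps=3000.0, eps=5000.0, kb=99999, ev=150000, avg_age=0.4, max_age=2',
]
print(heavy_series(SAMPLE))  # {'log4j': 30001.0}
```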

58

Wrapping up

59

To be successful:

• Encourage thoughtful logging

• Promote good logging practices

• Police bad behaviour

• Be intimately involved

• Adopt a helpful attitude

• Make sure you show value

Thanks for listening!

