Large Scale Log Analytics with Solr (from Lucene Revolution 2015)

OCTOBER 13-16, 2016 AUSTIN, TX

Upload: sematext-group-inc

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Large Scale Log Analytics with Solr (from Lucene Revolution 2015)


Large Scale Log Analytics with Solr

Rafał Kuć and Radu Gheorghe

Sematext Group

About Us

Radu, Rafał

Logsene

Agenda

Logstash + Solr

rsyslog + Solr

rsyslog + Redis + Logstash + Solr

Solr

Flow in Logstash

input: /var/log/apache.log, redis

codec: plain, {json}

filter: grok turns "Rafał @kucrafal" into {"user": "Rafał", "twitter": "@kucrafal"}; filter workers are set with -w $numberOfWorkers

output: workers => 2
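The filter step above can be illustrated with a small Python sketch. It is not grok itself: the regular expression and the function name `grok` are assumptions made for illustration, and only the sample input and the resulting field names come from the slide.

```python
import re

# Hypothetical pattern: a name, one space, then a Twitter handle starting with "@".
PATTERN = re.compile(r"(?P<user>\S+) (?P<twitter>@\w+)")

def grok(message):
    """Return the named captures as a dict, or None when the line does not match."""
    match = PATTERN.fullmatch(message)
    return match.groupdict() if match else None

print(grok("Rafał @kucrafal"))
```

Named capture groups play the role of grok's field names: unstructured text goes in, a structured event comes out.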

Simple Config

https://github.com/sematext/lucene-revolution-samples/tree/master/2015

bin/plugin install logstash-output-solr_http

input {
  file {
    path => "/opt/logs/example.log"    # apache combined logs
    start_position => "beginning"
  }
}

output {
  solr_http {
    solr_url => "http://localhost:8983/solr/gettingstarted"
    flush_size => 5000
    workers => 4
  }
}

Base Result

[Chart not preserved in the transcript]

Parse JSON

input {
  file {
    path => "/opt/logs/example.log.parsed"    # apache combined logs in JSON
    start_position => "beginning"
…
filter {
  json {
    source => "message"
  }
}

output {
  solr_http {
…

bin/logstash -f logstash.conf -w 4 # filterWorkers=4
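As a rough illustration of what a JSON filter with source => "message" does, the following Python sketch parses the message field and merges the resulting keys into the event. The function name and the sample event are assumptions for illustration; this is not the Logstash implementation.

```python
import json

def json_filter(event, source="message"):
    """Parse event[source] as JSON and merge the parsed keys into a copy of the event."""
    try:
        parsed = json.loads(event[source])
    except (KeyError, ValueError):
        parsed = None
    if not isinstance(parsed, dict):
        # Keep the event as it is and tag it so that failures can be found later.
        return dict(event, tags=event.get("tags", []) + ["_jsonparsefailure"])
    merged = dict(event)
    merged.update(parsed)
    return merged

# Hypothetical event, shaped like a line read from example.log.parsed.
event = {"message": '{"clientip": "10.0.0.1", "response": 200}'}
print(json_filter(event))
```

Because the log lines are already structured, no pattern matching is needed, which is the reason this variant is cheaper than parsing raw lines.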

JSON Result

[Chart not preserved in the transcript]

Grok

input {
  file {
    path => "/opt/logs/example.log"
    start_position => "beginning"
…
filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }
}

output {
  solr_http {
…
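To give a feel for the work the grok filter does on each Apache combined log line, here is a Python sketch built on a single regular expression. The expression is a simplified assumption, not the actual COMBINEDAPACHELOG definition, and the field names are chosen to resemble the ones grok typically produces.

```python
import re

# Simplified, assumed approximation of the Apache combined log format.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields for a combined log line, or None if it does not match."""
    match = COMBINED.fullmatch(line)
    return match.groupdict() if match else None

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
fields = parse_combined(line)
print(fields["clientip"], fields["verb"], fields["response"])
```

Every event goes through a match like this one, which is why parsing raw lines costs more CPU than reading pre-parsed JSON.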

Grok Result

[Chart not preserved in the transcript]

Flow Options

[Diagram: server and Logstash]

Flow Options (cont.)

[Diagram: something light on each server (rsyslog), sending to Redis (or Kafka or *MQ or...)]

Flow in rsyslog

input: /var/log/apache.log, syslog socket

main queue (RAM+Disk): queue.type, queue.size, ...

queue.workerThreads: filter, parse and send events

queue.dequeueBatchSize

action: rsyslog_solr.py (several instances), fed by template {JSON}
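The queue, worker threads and dequeue batch size described above can be modelled in a few lines of Python. This is only a sketch of the idea, not how rsyslog is implemented: `run_pipeline` and its parameters are hypothetical names that mirror the queue.* settings, and the "send" step just collects batches in a list.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to exit

def run_pipeline(events, worker_threads=4, dequeue_batch_size=500, queue_size=100000):
    """Push events through a bounded queue drained in batches by worker threads."""
    main_queue = queue.Queue(maxsize=queue_size)
    sent = []
    sent_lock = threading.Lock()

    def worker():
        finished = False
        while not finished:
            batch = []
            item = main_queue.get()  # block until at least one item is available
            while True:
                if item is STOP:
                    finished = True
                    break
                batch.append(item)
                if len(batch) >= dequeue_batch_size:
                    break
                try:
                    item = main_queue.get_nowait()
                except queue.Empty:
                    break
            if batch:
                with sent_lock:
                    sent.append(batch)  # stand-in for the output action

    threads = [threading.Thread(target=worker) for _ in range(worker_threads)]
    for thread in threads:
        thread.start()
    for event in events:
        main_queue.put(event)  # blocks when the queue is full
    for _ in threads:
        main_queue.put(STOP)   # one sentinel per worker
    for thread in threads:
        thread.join()
    return sent

batches = run_pipeline(range(1200))
print(sum(len(batch) for batch in batches))
```

A bounded queue gives back-pressure to the input, while batching amortizes the per-request cost of the output.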

Simple Config (1/2)

https://github.com/sematext/lucene-revolution-samples/tree/master/2015

module(load="imfile")
module(load="omprog")

# apache combined logs
input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")

main_queue(
  queue.highWatermark="100000"
  queue.lowWatermark="50000"
  queue.maxDiskSpace="5g"
  queue.fileName="solr_action"
  queue.spoolDirectory="/opt/rsyslog/queues"
  queue.saveOnShutdown="on"
  queue.workerThreads="4"
  queue.dequeueBatchSize="500"
)

Simple Config (2/2)

template(name="json_lines" type="list" option.json="on") {
  constant(value="{")
  constant(value="\"timestamp\":\"")
  property(name="timereported" dateFormat="rfc3339")
  constant(value="\",\"message\":\"")
  property(name="msg")
  ...
  constant(value="\",\"syslog-tag\":\"")
  property(name="syslogtag")
  constant(value="\"}\n")
}

# get rsyslog_solr.py from https://github.com/rsyslog/rsyslog/tree/master/plugins/external/solr
action(
  type="omprog"
  binary="/opt/rsyslog/rsyslog_solr.py"
  template="json_lines"
)
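The omprog action pipes each templated line to the standard input of the external program. The sketch below shows, under assumptions, how such a program might group JSON lines into batches before sending them to Solr. It is not the rsyslog_solr.py script referenced above; the function names, the batch size and the sample lines are hypothetical, and the HTTP request itself is left out.

```python
import json

def batches(lines, batch_size=500):
    """Group JSON lines (one document per line) into lists of parsed documents."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch

def solr_payload(docs):
    """Serialize a batch as a JSON array of documents for an update request."""
    return json.dumps(docs).encode("utf-8")

# Hypothetical lines, shaped like the output of the json_lines template.
sample = [
    '{"timestamp":"2015-10-13T10:00:00Z","message":"a","syslog-tag":"apache:"}\n',
    '{"timestamp":"2015-10-13T10:00:01Z","message":"b","syslog-tag":"apache:"}\n',
    '{"timestamp":"2015-10-13T10:00:02Z","message":"c","syslog-tag":"apache:"}\n',
]
for batch in batches(sample, batch_size=2):
    print(len(batch), len(solr_payload(batch)))
```

In a real script the loop would read from sys.stdin and POST each payload to the collection's update endpoint with a JSON content type.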

Base Result

[Chart not preserved in the transcript]

CPU: 15% rsyslog, 4x1% rsyslog_solr.py

Memory: 125MB rsyslog, 4x15MB rsyslog_solr.py. Depends on queue. Here up to 100K events in RAM

JSON Config

# same main queue settings and modules

# apache combined logs already parsed in JSON
input(type="imfile"
  File="/opt/logs/example.log.parsed"
  Tag="apache:")

module(load="mmnormalize")
action(type="mmnormalize"
  rulebase="/opt/rsyslog/json.rb"
)

template(name="json_lines" type="list") {
  property(name="$!root") constant(value="\n")
}

action(type="omprog"
...

/opt/rsyslog/json.rb:

version=2
rule=:%root:json%

JSON Result

[Chart not preserved in the transcript]

Normalizing Config

input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache")

action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache_combined.rb"
)

template(name="json_lines" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}

/opt/rsyslog/apache_combined.rb:

version=2
rule=:%[
  {"type": "word", "name": "clientip"},
  {"type": "literal", "text": " "},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%
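To make the rule syntax above more concrete, here is a toy Python interpreter for the four field types that appear in it: word, literal, char-to and rest. It is an assumption-laden sketch, not liblognorm, and the shortened rule and sample line are hypothetical.

```python
def normalize(rule, line):
    """Walk the rule left to right; return the extracted fields or None on mismatch."""
    fields, pos = {}, 0
    for step in rule:
        kind = step["type"]
        if kind == "literal":
            if not line.startswith(step["text"], pos):
                return None
            pos += len(step["text"])
            continue
        if kind == "word":
            end = line.find(" ", pos)      # a word runs up to the next space
            end = len(line) if end == -1 else end
            if end == pos:
                return None                # empty words do not match
        elif kind == "char-to":
            end = line.find(step["extradata"], pos)  # up to the terminating character
            if end == -1:
                return None
        elif kind == "rest":
            end = len(line)                # everything that is left
        else:
            raise ValueError("unsupported field type: " + kind)
        fields[step["name"]] = line[pos:end]
        pos = end
    return fields if pos == len(line) else None

# Hypothetical, shortened rule in the same shape as the rulebase entries.
rule = [
    {"type": "word", "name": "clientip"},
    {"type": "literal", "text": " \""},
    {"type": "char-to", "name": "agent", "extradata": "\""},
    {"type": "literal", "text": "\""},
    {"type": "rest", "name": "blob"},
]
print(normalize(rule, '10.0.0.1 "curl/7.43" trailing'))
```

Each step consumes part of the line without backtracking, so the cost grows with the length of the line rather than with the complexity of a regular expression.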

Normalizing Result

[Chart not preserved in the transcript]

Normalizing "Should Scale"*

[Diagram: parse tree with shared prefixes, where "sys" branches into "tem" then "d", and "log" then "-ng"]

* Performance depends mostly on log length and not on the number of rules: http://blog.gerhards.net/2013/01/performance-of-liblognormrsyslog-parse.html

Normalizing with Five Rules

input(type="imfile"
  File="/opt/logs/example*"
  Tag="apache")

action(type="mmnormalize"
  rulebase="/opt/rsyslog/multiple_rules.rb"
)

if $!root <> "" then {
  set $.final-json = $!root;
} else {
  set $.final-json = $!all-json;
}

template(name="json_lines" type="list") {
  property(name="$.final-json") constant(value="\n")
}

/opt/rsyslog/multiple_rules.rb:

rule=apache_combined:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "char-to", "name": "agent", "extradata": "\""},
  {"type": "literal", "text": "\""},
  {"type": "rest", "name": "blob"}
]%

rule=apache_common:%[
  {"type": "word", "name": "clientip"},
  ...
  {"type": "number", "name": "bytes"},
  {"type": "rest", "name": "blob", "priority": 65535}
]%

...
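The if/else on $!root picks which JSON to ship: the pre-parsed document when the JSON rule matched, otherwise all fields extracted by the other rules. A minimal Python sketch of that selection, assuming the parsed event is available as a dict (the function name and sample events are hypothetical):

```python
import json

def final_json(event):
    """Return one JSON line: event["root"] if it is set, else the whole event."""
    root = event.get("root")
    payload = root if root not in (None, "") else event
    return json.dumps(payload) + "\n"

# Event from a line that was already JSON: the parsed document sits under "root".
print(final_json({"root": {"clientip": "1.2.3.4"}}), end="")
# Event from a raw line: fields extracted by a rule sit at the top level.
print(final_json({"clientip": "1.2.3.4", "bytes": "12"}), end="")
```

This keeps a single output template while letting one rulebase handle several input formats.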

5 Rules Result

[Chart not preserved in the transcript]

OK, so this works:

[Diagram: rsyslog on each server, sending to Solr]

How about this:

[Diagram: rsyslog on each server, sending through Redis and Logstash to Solr]

rsyslog.conf

# build rsyslog with: ./configure --enable-omhiredis
module(load="imfile")
module(load="omhiredis")

input(type="imfile"
  File="/opt/logs/example.log"
  Tag="apache:")

template(name="json_lines" type="list" option.json="on") {...}

# small & light queue
main_queue(queue.workerthreads="1"
  queue.dequeueBatchSize="100"
  queue.size="10000")

action(type="omhiredis"
  mode="publish"
  key="rsyslog_logstash"
  template="json_lines")

logstash.conf

input {
  redis {    # JSON codec is implied
    data_type => "channel"
    key => "rsyslog_logstash"
    batch_count => 100
  }
}

output {
  solr_http {
    ...
  }
}

Combined Result

Component   CPU    Memory
rsyslog     1%     10MB (10K queue)
Redis       2%     1000MB (configurable)
Logstash    200%   380MB

5-Rule Normalizing Result

Component   CPU    Memory
rsyslog     100%   30MB
Redis       2%     1000MB
Logstash    200%   450MB

Shipper conclusions

Logstash + Solr: easy setup; flexible. Downside: heavy

rsyslog + Solr: light; fast. Downside: less flexible & easy

rsyslog + Redis + Logstash + Solr: offloads buffers and Logstash processing; flexible and efficient. Downside: setup and maintenance overhead

Solr Tuning Agenda

Schema and config adjustments

Time-based collections

Tiered cluster (e.g. hot vs cold nodes)

Schema: Two Kinds of Fields

Fields searched as text (message:failed): "omitNorms": true, "omitTermFreqAndPositions": true

Fields sorted or faceted on: "docValues": true

+20 to 100% capacity*; 10% faster indexing*

* http://blog.sematext.com/2014/11/17/solr-presentations-lucene-solr-revolution/
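One way to apply field properties like these is through Solr's Schema API, which accepts add-field commands as JSON. The sketch below only builds such a payload. The field name "response" and the field types "text_general" and "string" are assumptions (they depend on the configset in use), and nothing is sent to a server.

```python
import json

def field(name, field_type, **props):
    """Describe one field; extra keyword arguments become field properties."""
    definition = {"name": name, "type": field_type, "indexed": True, "stored": True}
    definition.update(props)
    return definition

payload = {
    "add-field": [
        # Full-text field: drop norms, term frequencies and positions to save space.
        field("message", "text_general", omitNorms=True, omitTermFreqAndPositions=True),
        # Hypothetical field used for sorting and faceting: enable docValues.
        field("response", "string", docValues=True),
    ]
}

# The body would be POSTed to the collection's /schema endpoint.
print(json.dumps(payload, indent=2))
```

Omitting positions means phrase queries on that field will no longer work, so the setting fits fields where plain term matches are enough.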

Commits

"updateHandler.autoSoftCommit.maxTime": 5000

5s feels near-realtime while searching

"updateHandler.autoCommit.maxTime": 60000
<ramBufferSizeMB>200</ramBufferSizeMB>

Flush to disk every minute or 200MB

+10% capacity; 10% faster indexing*
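The two commit intervals are written in the form used by Solr's Config API, so one plausible way to apply them is a set-property request. The sketch below builds that request with the standard library but does not send it. The URL reuses the gettingstarted collection from the earlier Logstash example and is an assumption, and ramBufferSizeMB is left out because the slide shows it as a solrconfig.xml element.

```python
import json
import urllib.request

# Assumed endpoint: Config API of the "gettingstarted" collection on a local Solr.
CONFIG_URL = "http://localhost:8983/solr/gettingstarted/config"

body = {
    "set-property": {
        "updateHandler.autoSoftCommit.maxTime": 5000,   # new documents searchable within 5s
        "updateHandler.autoCommit.maxTime": 60000,      # hard commit every minute
    }
}

request = urllib.request.Request(
    CONFIG_URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

Longer commit intervals mean fewer, larger segments being written, at the price of a slightly longer delay before new events become visible.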

Time-Based Collections

Newest collection: indexing, merges, most searches

Older collections: doesn't change => cache friendly => can be optimized

Oldest collection: delete without triggering merges

20-30x capacity; less indexing degradation*

* http://www.slideshare.net/sematext/side-by-side-with-elasticsearch-solr-part-2
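A small Python sketch of the bookkeeping behind time-based collections: which collection receives writes today, which ones are searched, and which one has aged out. The daily granularity, the "logs_" prefix and the retention period are assumptions chosen for illustration.

```python
from datetime import date, timedelta

def collection_name(day, prefix="logs"):
    """Name of the collection holding one day of logs, e.g. logs_2015-10-16."""
    return f"{prefix}_{day.isoformat()}"

def plan(today, retention_days=7, prefix="logs"):
    """Return (collection to index into, collections to search, collection to delete)."""
    write_to = collection_name(today, prefix)
    searchable = [
        collection_name(today - timedelta(days=age), prefix)
        for age in range(retention_days)
    ]
    expired = collection_name(today - timedelta(days=retention_days), prefix)
    return write_to, searchable, expired

write_to, searchable, expired = plan(date(2015, 10, 16), retention_days=3)
print(write_to)
print(searchable)
print(expired)
```

Dropping the expired collection as a whole removes its data without delete-by-query, so no merges are triggered in the collections that remain.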

Tiered Cluster

Hot nodes (hot1, hot2): quick recent searches and indexing

Cold nodes (cold1, cold2, cold3, cold4): rare lengthy requests

ADDREPLICA

buffer for indexing spikes

fewer shards per collection and the cluster is still balanced

CPU++

RAM++, IO++
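One possible way to drive a hot/cold layout is Solr's Collections API: create each new collection on the hot nodes, then use ADDREPLICA to place a copy on a cold node and DELETEREPLICA to drop the hot copy once it has aged. The sketch below only builds the request URLs. The host, node names, collection names and replica name are hypothetical, and the exact procedure is an assumption rather than something spelled out on the slides.

```python
from urllib.parse import urlencode

# Assumed Collections API endpoint on a local Solr node.
BASE = "http://localhost:8983/solr/admin/collections"

def collections_api(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = {"action": action}
    query.update(params)
    return BASE + "?" + urlencode(query)

# 1. Create today's collection on the hot nodes only.
create = collections_api(
    "CREATE", name="logs_2015-10-16", numShards=2,
    createNodeSet="hot1:8983_solr,hot2:8983_solr",
)
# 2. Copy a shard of yesterday's collection to a cold node.
add = collections_api(
    "ADDREPLICA", collection="logs_2015-10-15", shard="shard1",
    node="cold1:8983_solr",
)
# 3. Remove the original replica from the hot node.
drop = collections_api(
    "DELETEREPLICA", collection="logs_2015-10-15", shard="shard1",
    replica="core_node1",
)

for url in (create, add, drop):
    print(url)
```

Because each collection starts on the hot nodes and is moved as a unit, it needs only as many shards as there are hot nodes.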

Wrap-Up

DocValues

commits

rsyslog

Questions?

Rafał Kuć, @kucrafal

Radu Gheorghe

Sematext, @sematext, http://sematext.com

we're hiring, too!