solr for indexing and searching logs

Using Solr to Search and Analyze Logs Radu Gheorghe @radu0gheorghe @sematext

Upload: sematext-group-inc

Post on 26-Jan-2015


Category:

Technology



DESCRIPTION

How to index logs from Logstash, rsyslog, Flume, Fluentd, via Morphlines, etc. into Solr and make them searchable.

TRANSCRIPT

Page 1: Solr for Indexing and Searching Logs

Using Solr to Search and

Analyze Logs

Radu Gheorghe

@radu0gheorghe @sematext

Page 2: Solr for Indexing and Searching Logs

Elasticsearch API

syslog receiver

Logsene

Kibana

syslogd

Logstash

Page 4: Solr for Indexing and Searching Logs

What about Solr?

Page 5: Solr for Indexing and Searching Logs

defining and handling logs in general

4 sets of tools to send logs to Solr

Performance tuning and SolrCloud

Page 6: Solr for Indexing and Searching Logs

syslog

Defining and Handling Logs (story time!)

syslog

syslog

syslog

?

Page 7: Solr for Indexing and Searching Logs

Requirements

1) What’s wrong?

http://eddysuaib.com/wp-content/uploads/2012/12/Keyword-icon.png

( for debugging)

Page 8: Solr for Indexing and Searching Logs

Problem

looooots of messages coming in

http://www.sciencesurvivalblog.com/getting-published/unfinished-manuscripts_2346

Page 9: Solr for Indexing and Searching Logs

Solved with no indexing

BUT

Page 10: Solr for Indexing and Searching Logs

Elasticsearch

Page 11: Solr for Indexing and Searching Logs

Requirements

1) What’s wrong? ✓

2) What will go wrong?

(stats)

Page 12: Solr for Indexing and Searching Logs

Parsing Raw Logs

BUT

mickey mouse 10

user item time

still slow

format changes

Page 13: Solr for Indexing and Searching Logs

Parsing Raw Logs

BUT

mickey mouse 0 10

add error code

still slow

format changes

Page 14: Solr for Indexing and Searching Logs

Facets. Logging in JSON

2013-11-06… mickey mouse

{ "date": "2013-11-06", "message": "mickey mouse"}

Page 15: Solr for Indexing and Searching Logs

Facets. Logging in JSON

2013-11-06… @cee:{"user": "mickey"}

{ "date": "2013-11-06", "user": "mickey"}

2013-11-06… mickey mouse

{ "date": "2013-11-06", "message": "mickey mouse"}
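The two cases on this slide can be sketched in Python. This is only an illustration of the idea, not the parser rsyslog uses: the function name `to_document` and the way the date is passed in are assumptions made for the example.

```python
import json

CEE_COOKIE = "@cee:"  # marker that announces a JSON payload in the message


def to_document(date, text):
    """Turn one log message into a dict ready for indexing."""
    text = text.strip()
    if text.startswith(CEE_COOKIE):
        try:
            fields = json.loads(text[len(CEE_COOKIE):])
        except ValueError:
            fields = None  # cookie present but payload is not valid JSON
        if isinstance(fields, dict):
            doc = {"date": date}
            doc.update(fields)  # structured fields become facetable
            return doc
    # Plain text: keep the whole message in a single field.
    return {"date": date, "message": text}


if __name__ == "__main__":
    print(to_document("2013-11-06", '@cee:{"user": "mickey"}'))
    print(to_document("2013-11-06", "mickey mouse"))
```

Structured messages keep their own fields, so a field such as user can be faceted on directly; everything else falls back to a single message field.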

Page 16: Solr for Indexing and Searching Logs

Requirements

1) What’s wrong? ✓

2) What will go wrong? ✓

3) Handle logs like production data ✓

Page 17: Solr for Indexing and Searching Logs

Requirements

1) What’s wrong? ✓

2) What will go wrong? ✓

3) Handle logs like production data ✓

What is a log?

How to handle logs?

Page 18: Solr for Indexing and Searching Logs

4 Ways of Sending Logs to Solr

logger

Logstash

files

Page 19: Solr for Indexing and Searching Logs

Schemaless

% cd solr-4.5.1/example/

% mv solr solr.bak

% cp -R example-schemaless/solr/ .

Page 20: Solr for Indexing and Searching Logs

Automatic ID generation

solrconfig.xml

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  ……..
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
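As a rough picture of what the UUID processor does for a log document, here is a small Python sketch of the same idea done on the client side. It is an illustration only, not Solr's code; the helper name `ensure_id` is made up for the example.

```python
import uuid


def ensure_id(doc, field_name="id"):
    """Return a copy of doc with a random UUID in field_name if it has none."""
    out = dict(doc)
    if not out.get(field_name):
        out[field_name] = str(uuid.uuid4())
    return out


if __name__ == "__main__":
    print(ensure_id({"hello": "world"}))       # gets a generated id
    print(ensure_id({"id": "42", "a": "b"}))   # existing id is kept
```

Logs rarely carry a natural unique key, which is why generating one automatically is convenient here.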

Page 21: Solr for Indexing and Searching Logs

logger

/dev/log

mmjsonparse

omprog + script

Page 22: Solr for Indexing and Searching Logs

/dev/log -> parse -> format -> send to Solr

% logger '@cee: {"hello": "world"}'

rsyslog.conf

module(load="imuxsock") # version 7+
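An application can write the same kind of message to /dev/log without shelling out to logger. The sketch below uses Python's standard syslog module (Unix only); the helper name `cee_message` is an assumption for the example.

```python
import json
import syslog


def cee_message(fields):
    """Prefix a JSON object with the @cee: cookie, as in the logger example."""
    return "@cee: " + json.dumps(fields)


if __name__ == "__main__":
    # Goes to the local syslog socket, where imuxsock picks it up.
    syslog.syslog(syslog.LOG_INFO, cee_message({"hello": "world"}))
```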

Page 23: Solr for Indexing and Searching Logs

/dev/log -> parse -> format -> send to Solr

...

module(load="mmjsonparse")

action(type="mmjsonparse")

Page 24: Solr for Indexing and Searching Logs

/dev/log -> parse -> format -> send to Solr

...
template(name="CEE" type="list") {
  property(name="$!all-json")
  constant(value="\n")
}

Page 25: Solr for Indexing and Searching Logs

/dev/log -> parse -> format -> send to Solr

...
action(type="mmjsonparse")
template(name="CEE"…
module(load="omprog")

if $parsesuccess == "OK" then action(type="omprog"
                                     binary="/opt/json-to-solr.py"
                                     template="CEE")

Page 26: Solr for Indexing and Searching Logs

/dev/log -> parse -> format -> send to Solr

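import json, pysolr, sys

solr = pysolr.Solr('http://localhost:8983/solr/')

# omprog feeds one JSON document per line on stdin
while True:
    line = sys.stdin.readline()
    if not line:  # empty string means EOF: rsyslog closed the pipe
        break
    doc = json.loads(line)
    solr.add([doc])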

Page 27: Solr for Indexing and Searching Logs

Avro

MorphlineSolr Sink

Page 28: Solr for Indexing and Searching Logs

Avro -> buffer -> parse -> send to Solr

https://github.com/mpercy/flume-log4j-example

flume.conf

agent.sources = avroSrc

agent.sources.avroSrc.type = avro

agent.sources.avroSrc.bind = 0.0.0.0

agent.sources.avroSrc.port = 41414

Page 29: Solr for Indexing and Searching Logs

Avro -> buffer -> parse -> send to Solr

flume.conf

agent.channels = solrMemoryChannel

agent.channels.solrMemoryChannel.type = memory

agent.sources.avroSrc.channels = solrMemoryChannel

Page 30: Solr for Indexing and Searching Logs

Avro -> buffer -> parse -> send to Solr

flume.conf

agent.sinks = solrSink

agent.sinks.solrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink

agent.sinks.solrSink.morphlineFile = conf/morphline.conf

agent.sinks.solrSink.channel = solrMemoryChannel

Page 31: Solr for Indexing and Searching Logs

Avro -> buffer -> parse -> send to Solr

morphline.conf

...
commands : [
  { readLine { charset : UTF-8 } }
  { grok {
      dictionaryFiles : [conf/grok-patterns]
      expressions : {
        message : """%{INT:pid} %{DATA:message}"""
        ...

https://github.com/cloudera/search/tree/master/samples/solr-nrt/grok-dictionaries

Page 32: Solr for Indexing and Searching Logs

Avro -> buffer -> parse -> send to Solr

morphline.conf

SOLR_LOCATOR : {
  collection : collection1
  #zkHost : "127.0.0.1:2181"
  solrUrl : "http://localhost:8983/solr/"
}
...
commands : [
  ...
  { loadSolr {
      solrLocator : ${SOLR_LOCATOR}
      ...

Page 33: Solr for Indexing and Searching Logs

fluent-logger fluent-plugin-solr

Page 34: Solr for Indexing and Searching Logs

fluent-logger -> fluentd -> fluent-plugin-solr

% pip install fluent-logger

from fluent import sender,event

sender.setup('solr.test')

event.Event('forward', {'hello': 'world'})

Page 35: Solr for Indexing and Searching Logs

fluent-logger -> fluentd -> fluent-plugin-solr

<source>

type forward

</source>

<match solr.**>

type solr

host localhost

port 8983

core collection1

</match>

Page 36: Solr for Indexing and Searching Logs

fluent-logger -> fluentd -> fluent-plugin-solr

% gem install fluent-plugin-solr

doc = Solr::Document.new(:hello => record["hello"])

https://github.com/btigit/fluent-plugin-solr

out_solr.rb

Page 37: Solr for Indexing and Searching Logs

file

Logstash

file input -> grok filter -> solr_http output

Page 38: Solr for Indexing and Searching Logs

logstash.conf:

input { file { path => "/tmp/testlog" }}

file input -> grok filter -> solr_http output

% echo '2 world' >> /tmp/testlog

Page 39: Solr for Indexing and Searching Logs

logstash.conf:

filter { grok { match => ["message", "%{NUMBER:pid} %{GREEDYDATA:hello}"] }}

file input -> grok filter -> solr_http output

{"pid": "2", "hello":"world"}
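To see what the grok pattern extracts, here is a rough Python equivalent using a named-group regular expression. The regex is a simplified stand-in for grok's NUMBER and GREEDYDATA patterns, not their exact definitions, and the function name `grok_like` is made up for the example.

```python
import re

# Simplified stand-ins: NUMBER as an optionally signed decimal,
# GREEDYDATA as "everything up to the end of the line".
PATTERN = re.compile(r"(?P<pid>[+-]?(?:\d+(?:\.\d+)?|\.\d+)) (?P<hello>.*)")


def grok_like(message):
    """Return the named captures as a dict, or None if the line does not match."""
    match = PATTERN.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(grok_like("2 world"))
```

As in the slide, the captured number stays a string unless it is converted explicitly.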

Page 40: Solr for Indexing and Searching Logs

logstash.conf:

output {
  solr_http { # master or v1.2.3+
    solr_url => "http://localhost:8983/solr"
  }
}

file input -> grok filter -> solr_http output

Page 41: Solr for Indexing and Searching Logs

Fast and Cloud

Page 42: Solr for Indexing and Searching Logs

“It Depends”

http://www.bigskytech.com/wp-content/uploads/2011/02/guage.png

load test monitor: SPM

20% off: LR2013SPM20

Page 43: Solr for Indexing and Searching Logs

|>>>>|Single Core: # of docs/update

http://static.memrise.com.s3.amazonaws.com/uploads/blog-pictures/Simpsons_Updates.bmp
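Sending many documents per update request usually helps indexing throughput more than sending them one by one. The sketch below batches newline-delimited JSON before handing it to a client object. It assumes the client exposes `add(docs, commit=False)` the way `pysolr.Solr` does; `FakeSolr`, `batches`, `index_lines` and the batch size are placeholders for the example, and the right size has to come from load testing.

```python
import json
from itertools import islice


def batches(items, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_lines(lines, solr, batch_size=1000):
    """Parse JSON lines and send them in batches; returns the number sent."""
    docs = (json.loads(line) for line in lines if line.strip())
    sent = 0
    for batch in batches(docs, batch_size):
        # commit=False: leave commits to the server-side settings
        solr.add(batch, commit=False)
        sent += len(batch)
    return sent


class FakeSolr:
    """Stand-in client that only records batch sizes (for trying this out)."""

    def __init__(self):
        self.calls = []

    def add(self, docs, commit=False):
        self.calls.append(len(docs))


if __name__ == "__main__":
    fake = FakeSolr()
    lines = ['{"n": %d}' % i for i in range(5)]
    print(index_lines(lines, fake, batch_size=2), fake.calls)
```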

Page 44: Solr for Indexing and Searching Logs

|>>>>|Single Core: Commits

http://cache.desktopnexus.com/thumbnails/1306-bigthumbnail.jpg

http://www.musicfestivaljunkies.com/wp-content/uploads/2012/01/HardLogo.png

<autoSoftCommit> <maxTime>...

<autoCommit> <openSearcher>false <maxTime>???

<ramBufferSizeMB>???

Page 45: Solr for Indexing and Searching Logs

|>>>>|Single Core: Size and Merges

http://sweetclipart.com/multisite/sweetclipart/files/scissors_blue_silver.png

http://mergewords.com/gfx/logo-big.png

omitNorms="true"

omitTermFreqAndPositions="true"

<mergeFactor>??

Page 46: Solr for Indexing and Searching Logs

|>>>>|Single Core: Caches

http://vector-magz.com/wp-content/uploads/2013/06/diamond-clip-art4.png

http://www.clker.com/cliparts/1/f/6/3/11971228961330048838SaraSara_Ice_cube_2.svg.med.png

http://clipartist.info/RSS/openclipart.org/2011/May/02-Monday/migrating_penguin_penguinmigrating-555px.png

<fieldValueCache ... size="???" autowarmCount="0"

docValues="true"

facets

changing data

to sort & facet
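For context, this is the kind of request that exercises these settings: a facet query over one field. The sketch only builds the URL with standard Solr request parameters; the base URL and collection come from earlier slides, the field name user is borrowed from the JSON logging example, and the helper name `facet_url` is made up.

```python
from urllib.parse import urlencode


def facet_url(base, collection, field, query="*:*"):
    """Build a select URL that returns only facet counts for one field."""
    params = [
        ("q", query),
        ("rows", 0),            # no documents, just the counts
        ("facet", "true"),
        ("facet.field", field),
        ("wt", "json"),
    ]
    return "%s/%s/select?%s" % (base.rstrip("/"), collection, urlencode(params))


if __name__ == "__main__":
    print(facet_url("http://localhost:8983/solr/", "collection1", "user"))
```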

Page 47: Solr for Indexing and Searching Logs

SolrCloud: ZooKeeper

bin/zkServer.sh start

OR

java -DzkRun … -jar start.jar

http://www.clker.com/cliparts/c/a/8/d/1331060720387485902Roaring%20Tiger.svg.hi.png

http://fc03.deviantart.net/fs71/f/2012/196/6/a/piggy_back_rides_are_the_best_rides__by_yipped-d57b3sh.png

Page 48: Solr for Indexing and Searching Logs

SolrCloud: ZooKeeper

zkcli.sh -cmd upconfig \
  -zkhost SERVER:2181 \
  -confdir solr/collection1/conf/ \
  -confname start

-Dbootstrap_confdir=solr/collection1/conf -Dcollection.configName=start


Page 49: Solr for Indexing and Searching Logs

SolrCloud: Start Nodes

java -DzkHost=SERVER:2181 -jar start.jar

Page 50: Solr for Indexing and Searching Logs

Timed Collections

04Nov

05Nov

06Nov

07Nov

search latest

search all

index

optimize

Page 51: Solr for Indexing and Searching Logs

Collections API

05Nov

06Nov

07Nov

08Nov

action=CREATE&name=08Nov&numShards=4

action=DELETE&name=05Nov
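The daily rotation can be scripted. The sketch below builds the two Collections API calls for a given day, following the naming on this slide (day plus month abbreviation). The host, the three-day retention and the helper names are assumptions for the example; the URLs would then be fetched with any HTTP client, for instance from a nightly cron job.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # assumed host and port
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def collection_name(day):
    """Name a daily collection like the slides do, e.g. 08Nov."""
    return "%02d%s" % (day.day, MONTHS[day.month - 1])


def rotation_urls(today, keep_days=3, num_shards=4):
    """Return (create_url, delete_url) for tomorrow's and the expired collection."""
    tomorrow = today + timedelta(days=1)
    expired = tomorrow - timedelta(days=keep_days)
    create = ADMIN + "?" + urlencode([("action", "CREATE"),
                                      ("name", collection_name(tomorrow)),
                                      ("numShards", num_shards)])
    delete = ADMIN + "?" + urlencode([("action", "DELETE"),
                                      ("name", collection_name(expired))])
    return create, delete


if __name__ == "__main__":
    for url in rotation_urls(date(2013, 11, 7)):
        print(url)
```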

Page 52: Solr for Indexing and Searching Logs

Aliases. Optimize

05Nov

06Nov

07Nov

08Nov

action=CREATEALIAS&name=ALL&collections=06Nov,07Nov,08Nov

action=CREATEALIAS&name=LATEST&collections=08Nov

07Nov/update?optimize=true
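The alias switch and the optimize call can be generated from the list of live collections. This is a sketch under assumptions: the host is a placeholder, the helper name `alias_urls` is made up, and the list is expected to be ordered oldest to newest. Note that the Collections API spells the CREATEALIAS parameter `collections`.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed host and port


def alias_urls(names):
    """names: live collections, oldest first. Returns (all, latest, optimize)."""
    admin = SOLR + "/admin/collections?"
    all_alias = admin + urlencode([("action", "CREATEALIAS"),
                                   ("name", "ALL"),
                                   ("collections", ",".join(names))], safe=",")
    latest_alias = admin + urlencode([("action", "CREATEALIAS"),
                                      ("name", "LATEST"),
                                      ("collections", names[-1])])
    # Optimize the collection that has just stopped receiving writes.
    optimize = None
    if len(names) > 1:
        optimize = "%s/%s/update?optimize=true" % (SOLR, names[-2])
    return all_alias, latest_alias, optimize


if __name__ == "__main__":
    for url in alias_urls(["06Nov", "07Nov", "08Nov"]):
        print(url)
```

Searches then go to the ALL or LATEST alias while indexing goes to the newest collection, so the previous day's collection can be optimized without competing with writes.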

Page 53: Solr for Indexing and Searching Logs
Page 54: Solr for Indexing and Searching Logs

logs = production data

Page 55: Solr for Indexing and Searching Logs

logs = production data

Logstash

Page 56: Solr for Indexing and Searching Logs

logs = production data

Logstash

docs/update

commits

mergeFactor

omit*

docValues

caches

Page 57: Solr for Indexing and Searching Logs

logs = production data

Logstash

docs/update

commits

mergeFactor

omit*

docValues

caches

Page 58: Solr for Indexing and Searching Logs

logs = production data

Logstash

docs/update

commits

mergeFactor

omit*

docValues

caches

time

Collections API

aliases

optimize