Building Resilient Log Aggregation Pipeline with Elasticsearch & Kafka

Post on 16-Apr-2017

Category:

Technology

Building Resilient Log Aggregation Pipeline Using Elasticsearch and Kafka

Rafał Kuć @ Sematext Group, Inc.

Sematext & I

- Logsene: logs
- SPM: metrics

Next 30 minutes...

Log shipping
- buffers
- protocols
- parsing

Central buffering
- Kafka
- Redis

Storage & Analysis
- Elasticsearch
- Kibana
- Grafana

Log shipping architecture

[Diagram: file shippers -> centralized buffer -> Elasticsearch cluster (ES nodes holding the data)]

Focus: Elasticsearch

[Diagram repeated: file shippers -> centralized buffer -> Elasticsearch cluster]

Elasticsearch cluster architecture

[Diagram: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

Dedicated masters, please

[Diagram repeated: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

discovery.zen.minimum_master_nodes -> N/2 + 1, where N is the number of master-eligible nodes
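The N/2 + 1 rule uses integer division. A minimal Python sketch of the arithmetic follows; the function name is illustrative and not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "->", minimum_master_nodes(n))
```

With three dedicated masters the quorum is 2, so the cluster can lose one master. With only two master-eligible nodes the quorum is also 2, so losing either one leaves the cluster without a master.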

One big index is a no-go

- Not scalable enough for time-based data
- Indexing slows down with time
- Expensive merges
- Delete by query needed for data retention

Daily indices are a good start

[Diagram: one index per day: 2016.11.18, 2016.11.19, 2016.11.22, 2016.11.23, ...; labels: "indexing" and "most searches"]

- Indexing is faster for smaller indices
- Deletes are cheap: we delete whole indices
- Search can be performed on only the indices that are needed
- Static indices are cache friendly
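Retention with daily indices comes down to name arithmetic. The sketch below is only an illustration: the "logs_" prefix and the helper names are assumptions, and the list of existing indices would normally come from the cluster.

```python
from datetime import date, datetime, timedelta


def daily_index(day: date, prefix: str = "logs_") -> str:
    """Name of the index that holds one day of logs, e.g. logs_2016.11.22."""
    return prefix + day.strftime("%Y.%m.%d")


def expired_indices(existing, today: date, keep_days: int, prefix: str = "logs_"):
    """Indices older than the retention window; each can be dropped whole."""
    cutoff = today - timedelta(days=keep_days - 1)  # oldest day still kept
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


existing = ["logs_2016.11.18", "logs_2016.11.19", "logs_2016.11.22", "logs_2016.11.23"]
today = date(2016, 11, 23)
print(daily_index(today))
print(expired_indices(existing, today, keep_days=2))
```

Each returned name can then be removed with a single index delete, which is what makes retention cheap compared to delete by query.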

Daily indices are sub-optimal

[Chart labels: Black Friday, Saturday, Sunday]

Load is not even

Size-based indices are optimal

[Diagram: indexing goes to logs_01 until it reaches the size limit for indices, then to logs_02, and so on up to logs_N]

Around 5-10 GB per shard on AWS
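The rollover decision can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the helper names, the 10 GB per-shard default, and the idea that the caller already knows the current index size.

```python
import re

GB = 1024 ** 3


def next_index(current: str) -> str:
    """logs_01 -> logs_02, keeping the zero padding of the counter."""
    match = re.fullmatch(r"(.*_)(\d+)", current)
    if match is None:
        raise ValueError(f"no numeric suffix in {current!r}")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:0{len(counter)}d}"


def write_index(current: str, size_bytes: int, shards: int,
                per_shard_limit: int = 10 * GB) -> str:
    """Index that should receive new documents; rolls over at the size limit."""
    if size_bytes >= shards * per_shard_limit:
        return next_index(current)
    return current


print(write_index("logs_01", 12 * GB, shards=2))  # still below 2 x 10 GB
print(write_index("logs_01", 20 * GB, shards=2))  # limit reached, roll over
```

Because the limit is expressed per shard, a load spike simply fills indices faster instead of producing one oversized daily index.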

Slice using size

- Predictable searching and indexing performance
- Better index balancing
- Fewer shards
- Easier handling of spiky loads
- Lower costs because of better hardware utilization

Proper Elasticsearch configuration

Keep index.refresh_interval at the maximum possible value: 1 sec -> 100%, 5 sec -> 125%, 30 sec -> 175%

You can loosen up merges (possible because of heavy aggregation use):
- segments_per_tier -> higher
- max_merge_at_once -> higher
- max_merged_segment -> lower

All prefixed with index.merge.policy

Result: higher indexing throughput

- Index only needed fields
- Use doc values
- Do not index _source
- Do not store _all
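The refresh and merge settings above can be collected into one settings body. This Python sketch only builds the JSON; the concrete values are illustrative assumptions, not numbers from the talk, and should be tuned against real load.

```python
import json

# Values are illustrative assumptions, not recommendations from the talk.
index_settings = {
    "index.refresh_interval": "30s",                  # refresh less often
    "index.merge.policy.segments_per_tier": 50,       # higher than the default
    "index.merge.policy.max_merge_at_once": 50,       # higher than the default
    "index.merge.policy.max_merged_segment": "1gb",   # lower than the default
}

body = json.dumps({"settings": index_settings}, indent=2)
print(body)
```

The resulting body could be sent when an index is created, in the same way as the curl examples elsewhere in these slides.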

Optimization time

We can optimize data nodes for time-based data

[Diagram repeated: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

Hot-cold architecture

[Diagram: three nodes: ES hot, ES cold, ES cold]

Each node is started with an attribute that marks its tier:
-Dnode.attr.tag=hot (hot node)
-Dnode.attr.tag=cold (each cold node)

The daily index logs_2016.11.22 is created on the hot node, where the indexing happens:

curl -XPUT localhost:9200/logs_2016.11.22 -d '{"settings": {"index.routing.allocation.exclude.tag": "cold", "index.routing.allocation.include.tag": "hot"}}'

The next day logs_2016.11.23 is created on the hot node in the same way. Move the old index after the day ends:

curl -XPUT localhost:9200/logs_2016.11.22/_settings -d '{"index.routing.allocation.exclude.tag": "hot", "index.routing.allocation.include.tag": "cold"}'

[Diagram: logs_2016.11.23 on ES hot (indexing), logs_2016.11.22 on ES cold]

The cycle repeats every day: logs_2016.11.24 is created on the hot node, and logs_2016.11.23 moves to a cold node after its day ends.

[Diagram: logs_2016.11.24 on ES hot (indexing), logs_2016.11.22 and logs_2016.11.23 on the ES cold nodes]
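The two request bodies used for the hot-cold move differ only in which tier is included and which is excluded. A small Python sketch that generates both is shown below; the helper names are hypothetical, and "tag" matches the node attribute used on these slides.

```python
import json


def allocation_settings(include: str, exclude: str, attr: str = "tag") -> dict:
    """Settings that pin an index to nodes whose node.attr.<attr> equals `include`."""
    return {
        f"index.routing.allocation.include.{attr}": include,
        f"index.routing.allocation.exclude.{attr}": exclude,
    }


def create_body() -> str:
    """Body for PUT /<index>: a new daily index starts on the hot tier."""
    return json.dumps({"settings": allocation_settings(include="hot", exclude="cold")})


def move_body() -> str:
    """Body for PUT /<index>/_settings: after the day ends the index goes cold."""
    return json.dumps(allocation_settings(include="cold", exclude="hot"))


print(create_body())
print(move_body())
```

A daily scheduled job could send the second body for yesterday's index; Elasticsearch then relocates the shards on its own.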

Hot-cold architecture

Hot ES tier
- Good CPU
- Lots of I/O

Cold ES tier
- Memory bound
- Decent I/O

Hot-cold architecture summary

- Optimize costs: different hardware for different tiers
- Performance: use-case-optimized hardware
- Isolation: long-running searches don't affect indexing

Elasticsearch client node needs

[Diagram repeated: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

- No data = no IOPS
- Large query throughput = high CPU usage
- Lots of results = high memory usage
- Lots of concurrent queries = higher resource utilization

Elasticsearch ingest node needs

[Diagram repeated: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

- No data = no IOPS
- Large index throughput = high CPU & memory usage
- Complicated rules = high CPU usage
- Larger documents = more resource utilization

Elasticsearch master node needs

[Diagram repeated: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes]

- No data = no IOPS
- Large number of indices = high CPU & memory usage
- Complicated mappings = high memory usage
- Daily indices = spikes in resource utilization

Focus: Centralized Buffer

[Diagram repeated: file shippers -> centralized buffer -> Elasticsearch cluster]

Why Apache Kafka?

- Fast & easy to use
- Easy to scale
- Fault tolerant and highly available
- Supports streaming
- Works in publish/subscribe mode

Kafka architecture

[Diagram: 3 ZooKeeper nodes and 4 Kafka brokers]

Kafka & topics

[Diagram: topics security_logs, access_logs, app1_logs, app2_logs]

Kafka stores data in topics, written on disk

Kafka & topics & partitions & replicas

[Diagram: topic "logs" split into partitions 1-4, each with a replica partition]

Scaling Kafka

[Diagram: the "logs" topic grows from 1 partition to 4 partitions to 16 partitions]

Things to remember when using Kafka

- Scales by adding more partitions, not threads
- The more IOPS the better
- Keep the # of consumers equal to the # of partitions
- Replicas are used for HA and FT only
- Offsets are stored per consumer: multiple destinations are easily possible
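Why the consumer count should match the partition count can be shown with a simplified model. The Python sketch below is not Kafka's actual assignment algorithm; it only relies on the rule that, within one consumer group, a partition is read by a single consumer.

```python
def assign(partitions: int, consumers: int) -> dict:
    """Spread partition ids over consumers round-robin (simplified model)."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    assignment = {c: [] for c in range(consumers)}
    for p in range(partitions):
        assignment[p % consumers].append(p)
    return assignment


for consumers in (2, 4, 6):
    result = assign(partitions=4, consumers=consumers)
    idle = [c for c, parts in result.items() if not parts]
    print(consumers, "consumers:", result, "idle:", idle)
```

With fewer consumers than partitions, each consumer handles several partitions. With more, the extra consumers sit idle, so adding consumers alone does not add throughput.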

Focus: Shipper

[Diagram repeated: file shippers -> centralized buffer -> Elasticsearch cluster]

What about the shipper?

[Diagram: logs -> centralized buffer]

- Which shipper to use?
- Which protocol should be used?
- What about the buffering?
- Log to JSON or parse, and how?

Buffers

- Performance: batches & threads
- Availability: when the central buffer is gone

Buffer types

Disk || memory || combined hybrid approach
On source || centralized

On source: [Diagram: each App has its own Buffer, a file or a local log shipper]
- easy scaling, fewer moving parts
- often with the use of a lightweight shipper

Centralized: [Diagram: Apps -> Kafka / Redis / Logstash / etc. -> ES]
- one place for all changes
- extra features made easy (like TTL)
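To make the "combined hybrid approach" concrete, here is a minimal Python sketch of a memory buffer that spills to disk when it is full. The class and file names are hypothetical, and real shippers implement this with far more care (fsync, rotation, crash recovery).

```python
import json
import os
import tempfile
from collections import deque


class HybridBuffer:
    """Bounded in-memory queue that spills overflow to a file on disk."""

    def __init__(self, memory_limit: int, spill_path: str):
        self.memory = deque()
        self.memory_limit = memory_limit
        self.spill_path = spill_path

    def _has_spill(self) -> bool:
        return os.path.exists(self.spill_path) and os.path.getsize(self.spill_path) > 0

    def put(self, event: dict) -> None:
        # Once anything is on disk, keep appending there to preserve order.
        if len(self.memory) < self.memory_limit and not self._has_spill():
            self.memory.append(event)
        else:
            with open(self.spill_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")

    def drain(self) -> list:
        """Return all buffered events in arrival order and empty the buffer."""
        events = list(self.memory)
        self.memory.clear()
        if self._has_spill():
            with open(self.spill_path, encoding="utf-8") as f:
                events.extend(json.loads(line) for line in f)
            os.remove(self.spill_path)
        return events


spill = os.path.join(tempfile.mkdtemp(), "spill.jsonl")
buffer = HybridBuffer(memory_limit=2, spill_path=spill)
for i in range(5):
    buffer.put({"line": i})
in_memory = len(buffer.memory)
drained = buffer.drain()
print(in_memory, [event["line"] for event in drained])
```

Memory keeps the common case fast, while the disk part keeps events around when the destination is unavailable for longer than memory can absorb.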

Buffers summary

- Simple: [Diagram: each App with its own Buffer -> ES]
- Reliable: [Diagram: Apps -> centralized buffer -> ES]

Protocols

- UDP: fast, cool for the application, not reliable
- TCP: reliable (almost); the application gets an ACK when data is written to the buffer

Application-level ACKs may be needed:
- HTTP: Logstash, rsyslog, Fluentd
- RELP: Logstash, rsyslog
- Beats: Logstash, Filebeat
- Kafka: Logstash, rsyslog, Filebeat, Fluentd

Choosing the shipper

[Diagram: application -> socket -> rsyslog (memory & disk assisted queues) -> http -> Elasticsearch]

[Diagram: application -> file -> rsyslog / Filebeat -> consumer]

What about OS?

- Say NO to swap
- Set the right disk scheduler: CFQ for spinning disks, deadline for SSD
- Use proper mount options for ext4: noatime, nodiratime, data=writeback, nobarrier
- For bare metal, check the CPU governor
- Disable transparent huge pages

/proc/sys/vm/nr_hugepages=0
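The scheduler advice can be checked against what the kernel reports. The sketch below parses the text format of /sys/block/<dev>/queue/scheduler, where the active scheduler is shown in square brackets. The function names are illustrative, and newer kernels may list different scheduler names (for example mq-deadline).

```python
def active_scheduler(sysfs_text: str) -> str:
    """Pick the bracketed entry from /sys/block/<dev>/queue/scheduler content."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked")


def recommended_scheduler(rotational: bool) -> str:
    """Follow the slide: CFQ for spinning disks, deadline for SSDs."""
    return "cfq" if rotational else "deadline"


sample = "noop deadline [cfq]"   # example content for an SSD-backed device
current = active_scheduler(sample)
wanted = recommended_scheduler(rotational=False)
print(current, "->", wanted, "(change needed)" if current != wanted else "(ok)")
```

On a real host the `rotational` flag could be read from /sys/block/<dev>/queue/rotational, which contains 1 for spinning disks and 0 otherwise.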

We are engineers!

We develop DevOps tools!

We are DevOps people!

We do fun stuff ;) http://sematext.com/jobs

Thank you for listening! Get in touch!

Rafał
rafal.kuc@sematext.com
@kucrafal

http://sematext.com
@sematext
http://sematext.com/jobs

Come talk to us at the booth
