the missing log collector

Muga Nishizawa (@muga_nishizawa)
Chief Software Architect, Treasure Data, Inc.
Treasure Data Overview
> Founded to deliver big data analytics in days, not months, without specialist IT resources, for one-tenth the cost of alternatives
> Service-based subscription business model
> World-class open source team
  • Founded the world's largest Hadoop User Group
  • Developed Fluentd and MessagePack
  • Contributed to Memcached, Hibernate, etc.
> Treasure Data is in production
  • 60+ customers, incl. Fortune 500 companies
  • 400+ billion records stored
  • Processing 40,000 messages per second
Fluentd = syslogd + many
✓ Plugins
✓ JSON
> Open-sourced log collector written in Ruby
> Uses the RubyGems ecosystem for plugins

In short: it's like syslogd, but uses JSON for log messages.

Make log collection easy using Fluentd
Reporting & Monitoring
Collect → Store → Process → Visualize
> Store / Process: Hadoop / Hive, MongoDB, Treasure Data
> Visualize: Tableau, Excel, R
Goal: easier, and in shorter time. How do we shorten the Collect step?
Before Fluentd
Server1, Server2, Server3, ...: each Application writes logs to local files, which are shipped to the log server in batches.
High latency! Must wait for a day...
After Fluentd
Server1, Server2, Server3, ...: each Application sends logs through a local Fluentd to aggregator Fluentd nodes.
In streaming!
Many Users
Many Meetups
Growth by Community
Why did we develop Fluentd?
Treasure Data Service Architecture
> Data sources (Apache, apps, RDBMS, others) → td-agent
> td-agent → Treasure Data columnar data warehouse
> Query Processing Cluster: MapReduce jobs; HIVE, PIG (to be supported)
> Query API (JDBC, REST) → td-command and BI apps → User
Treasure Data Service Architecture: the td-agent part is open sourced.
Example Use Case – MySQL to TD (before)
> Hundreds of Rails app servers write logs to text files
> Nightly INSERT into MySQL
> Daily/Hourly batch jobs → KPI visualization, feedback rankings (Google Spreadsheet)
Problems:
- Limited scalability
- Fixed schema
- Not realtime
- Unexpected INSERT latency
Example Use Case – MySQL to TD (after)
> Hundreds of Rails app servers send event logs via td-agent
> Logs are available in Treasure Data after several minutes
> Daily/Hourly batch jobs → KPI visualization, feedback rankings (Google Spreadsheet, MySQL)
Benefits:
✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact
td-agent
> Open-sourced distribution package of Fluentd
> The ETL part of Treasure Data
> Includes useful components: ruby, jemalloc, fluentd
> 3rd-party gems: td, mongo, webhdfs, etc. (the td plugin is for Treasure Data)
> http://packages.treasure-data.com/
How does Fluentd work?
Fluentd = syslogd + many
✓ Plugins
✓ JSON
[Diagram] Inputs: access logs (Apache), app logs (frontend / backend), system logs (syslogd), databases → Fluentd (filter / buffer / routing) → Outputs: alerting (Nagios), analysis (MongoDB, MySQL, Hadoop), archiving (Amazon S3)
The same flow, in plugin terms: Input Plugins → Buffer Plugins (Filter Plugins) → Output Plugins
Architecture: Input → Buffer → Output (each part pluggable)
> Input: Forward, HTTP, File tail, dstat, ...
> Buffer: Memory, File
> Output: Forward, File, Amazon S3, MongoDB, ...
117 input and output plugins! Contributions by the community.
Event structure (log message)

2012-02-04 01:33:51 myapp.buylog {"user":"me","path":"/buyItem","price":150,"referer":"/landing"}
(time | tag | record)

✓ Time
> second resolution
> from the data source, or the parsed time
✓ Tag
> used for message routing
✓ Record
> JSON format (MessagePack internally)
> structured, not free text
in_tail: reads a file and parses lines
apache → access.log → fluentd (in_tail)
✓ reads a log file
✓ custom regexp
✓ custom parser in Ruby
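As a sketch, an in_tail source section might look like this (the path, pos_file, and tag values are illustrative, not from the slides):

```
<source>
  type tail
  path /var/log/apache2/access.log       # log file to follow
  pos_file /var/log/td-agent/access.pos  # remembers the last read position
  tag web.access                         # tag attached for routing
  format apache2                         # built-in parser; a custom regexp also works
</source>
```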
out_mongo: writes buffered chunks to MongoDB
apache → access.log → fluentd (in_tail → buffer → out_mongo)
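Assuming the fluent-plugin-mongo gem, a minimal match section could be (host, database, and collection names are examples):

```
<match web.access>
  type mongo
  host mongo.example.com   # example host
  port 27017
  database apache
  collection access
</match>
```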
failure handling & retrying
apache → access.log → fluentd (in_tail → buffer)
✓ retries automatically
✓ exponential retry wait
✓ buffer can persist to a file
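The retry and persistence behavior is configured on the buffered output itself; a sketch with illustrative values:

```
<match web.access>
  type mongo
  host mongo.example.com
  database apache
  collection access
  buffer_type file                            # persist chunks on disk, surviving restarts
  buffer_path /var/log/td-agent/buffer/mongo  # example path
  flush_interval 60s                          # try to flush every minute
  retry_wait 1s                               # first retry after 1s, doubling each time
  retry_limit 17                              # give up after 17 attempts
</match>
```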
out_s3
apache → access.log → fluentd (in_tail → buffer → out_s3) → Amazon S3
✓ retries automatically
✓ exponential retry wait
✓ buffer persistent on a file
✓ slices files based on time:
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...
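With the fluent-plugin-s3 gem, time-sliced archiving might be configured like this (the bucket name and credentials are placeholders):

```
<match web.access>
  type s3
  aws_key_id YOUR_AWS_KEY_ID      # placeholder credentials
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket my-log-bucket         # example bucket
  path archive/
  time_slice_format %Y-%m-%d/%H   # one object per hour, gzipped by default
  buffer_path /var/log/td-agent/buffer/s3
</match>
```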
out_hdfs
apache → access.log → fluentd (in_tail → buffer → out_hdfs) → HDFS
✓ retries automatically
✓ exponential retry wait
✓ buffer persistent on a file
✓ slices files based on time:
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...
✓ custom text formatter
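With the fluent-plugin-webhdfs gem, the HDFS output could be sketched as follows (the NameNode host and path are examples):

```
<match web.access>
  type webhdfs
  host namenode.example.com              # example NameNode host
  port 50070                             # default WebHDFS port
  path /log/access/%Y%m%d_%H/access.log  # time placeholders slice the files
</match>
```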
routing / copying
apache → access.log → fluentd (in_tail) → Amazon S3, Hadoop
✓ routing based on tags
✓ copy to multiple storages
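Tag-based routing plus copying could be sketched like this (hosts, bucket, and paths are illustrative): the first matching section wins, and out_copy duplicates each event to every store.

```
# events tagged web.* are copied to both S3 and HDFS
<match web.**>
  type copy
  <store>
    type s3
    s3_bucket my-log-bucket                  # example bucket
    path archive/
    buffer_path /var/log/td-agent/buffer/s3
  </store>
  <store>
    type webhdfs
    host namenode.example.com                # example host
    path /log/access/%Y%m%d/access.log
  </store>
</match>

# everything else falls through to a local file
<match **>
  type file
  path /var/log/td-agent/other
</match>
```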
Client libraries: Ruby, Java, Perl, PHP, Python, D, Scala, ...
The application posts Time : Tag : Record events to Fluentd.

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2012-12-11 07:56:01 myapp.login {"user":38}
Fluentd configuration:

# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB and S3
<match **>
  type copy
  <store>
    type mongo
    host mongo.example.com
    capped
    capped_size 200m
  </store>
  <store>
    type s3
    path archive/
  </store>
</match>
out_forward: forwards events to other fluentd nodes
apache → access.log → fluentd (in_tail → buffer → out_forward)
✓ retries automatically
✓ exponential retry wait
✓ buffer persistent on a file
forwarding
fluentd → fluentd (send / ack)
✓ automatic fail-over
✓ load balancing
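A forwarding sketch with two aggregators (the hostnames are examples): out_forward load-balances across the <server> entries and fails over when one stops acknowledging.

```
<match **>
  type forward
  flush_interval 1s
  <server>
    host aggregator1.example.com   # example aggregator hosts
    port 24224
  </server>
  <server>
    host aggregator2.example.com
    port 24224
  </server>
</match>
```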
Fluentd - plugin distribution platform
$ fluent-gem search -rd fluent-plugin
$ fluent-gem install fluent-plugin-mongo
Use cases
Cookpad
✓ Over 100 RoR servers (as of 2012/2/4)
> Hundreds of Rails app servers send event logs via td-agent
> Logs are available in Treasure Data after several minutes
> Daily/Hourly batch jobs → KPI visualization, feedback rankings (Google Spreadsheet, MySQL)
✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact
NHN Japan (by @tagomoris)
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)
[Diagram] Web Servers → Fluentd Cluster → Archive Storage (scribed); Fluentd Watchers → Graph Tools, Notifications (IRC); via webhdfs (STREAM) → Hadoop Cluster CDH4 (HDFS, YARN) → Huahin Manager (BATCH, SCHEDULED BATCH), hiveserver → Shib, ShibUI
Treasure Data
Frontend → Job Queue → Worker → Hadoop
> Applications push metrics to Fluentd (via a local Fluentd)
> Fluentd sums up the data every few minutes (partial aggregation)
> Librato Metrics for realtime analysis
> Treasure Data for historical analysis
Key to Fluentd’s growth is...
Fluentd = syslogd + many + Community
✓ Plugins
✓ JSON