Fluentd introduction at iPROS
Fluentd presentation slides at #iprostm http://atnd.org/events/44556
Fluentd Introduction
Masahiro Nakagawa
Treasure Data, Inc.
Senior Software Engineer
at iPROS
Thursday, October 31, 13
Who are you?
> Masahiro Nakagawa
  > @repeatedly / [email protected]
> Treasure Data, Inc
  > Senior Software Engineer, since 2012/11
> Open Source Projects
  > D programming Language
  > MessagePack, Fluentd, etc...
Structured logging
Reliable forwarding
Pluggable architecture
http://fluentd.org/
Agenda
> Background
> Overview
> Product Comparison
> Use cases
Background
Data Processing
Data source → Collect → Store → Process → Visualize → Reporting / Monitoring
Related Products
> Collect: ???
> Store / Process: Cloudera, Hortonworks, Treasure Data
> Visualize: Tableau, Excel, R
> easier & shorter time
Before Fluentd
Server1, Server2, Server3 (each running Application ...) → Log Server
High latency! Must wait for a day...
After Fluentd
Server1, Server2, Server3 (each running Application ...) → Fluentd, Fluentd, Fluentd → Fluentd, Fluentd
In streaming!
Overview
In short
> Open source log collector written in Ruby
> Using rubygems ecosystem for plugins
It's like syslogd, but uses JSON for log messages
Example (apache to mongo)
Web Server log, read with tail:
127.0.0.1 - - [30/Oct/2013:07:26:27] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:30] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:32] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:26:40] "GET / ...
127.0.0.1 - - [30/Oct/2013:07:27:01] "GET / ...
...
Fluentd (event buffering), then insert:
2013-10-30 01:33:51 apache.log { "host": "127.0.0.1", "method": "GET", ...}
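The step from a raw access log line to a structured event can be sketched in Ruby as follows. This is a minimal illustration, not Fluentd's own parser: the regular expression is a simplified assumption, `parse_access_line` is a hypothetical helper, and the sample line is completed with a made-up time zone and request path because the lines on the slide are truncated.

```ruby
require "time"
require "json"

# Simplified pattern for the start of an Apache access log line
# (an assumption, not the full format handled by Fluentd's apache parser).
LINE_PATTERN = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)/

# Turn one log line into [time, tag, record]; returns nil if the line does not match.
def parse_access_line(line, tag)
  m = LINE_PATTERN.match(line)
  return nil unless m

  time = Time.strptime(m[:time], "%d/%b/%Y:%H:%M:%S %z")
  record = { "host" => m[:host], "method" => m[:method], "path" => m[:path] }
  [time, tag, record]
end

# Hypothetical sample line with a time zone and request path filled in.
SAMPLE = '127.0.0.1 - - [30/Oct/2013:07:26:27 +0000] "GET /index.html HTTP/1.1" 200 777'

time, tag, record = parse_access_line(SAMPLE, "apache.access")
puts "#{time.getutc.strftime('%Y-%m-%d %H:%M:%S')} #{tag} #{JSON.generate(record)}"
```

The printed line has the same time, tag, record shape as the event shown on the slide.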
Event structure (log message)
✓ Time
  > default second unit
  > from data source or adding parsed time
✓ Tag
  > for message routing
✓ Record
  > JSON format
  > MessagePack internally
  > structured, not free-form text
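As a rough sketch of the three parts listed above, an event can be modeled as a small Ruby struct. The `Event` name and the array layout are illustrative assumptions, and JSON stands in for the MessagePack encoding so the example needs only the standard library.

```ruby
require "json"

# An event as described above: a time, a tag, and a record.
Event = Struct.new(:time, :tag, :record) do
  # Time is kept in whole seconds, matching the "default second unit" bullet.
  def unix_time
    time.to_i
  end

  # JSON stands in here for the MessagePack encoding used internally.
  def to_json_array
    JSON.generate([tag, unix_time, record])
  end
end

event = Event.new(Time.utc(2013, 10, 30, 18, 56, 1), "myapp.login", { "user" => 38 })
puts event.to_json_array
```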
Pluggable Architecture
Input (pluggable)
> Forward
> HTTP
> File tail
> dstat
> ...
Engine
Buffer (pluggable)
> File
> Memory
Output (pluggable)
> Forward
> File
> MongoDB
> ...
Output
> rewrite
> ...
Client libraries
> Ruby, Java, Perl, PHP, Python, D, Scala, ...
Application → Fluentd (Time:Tag:Record)
# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2013-10-30 18:56:01 myapp.login {"user":38}
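What a client library does with the tag prefix and label can be sketched like this. `TinyLogger` is a hypothetical stand-in written for this example: it prints the event instead of sending it to Fluentd, and its `out:` and `clock:` parameters exist only to make the sketch deterministic.

```ruby
require "json"

# Hypothetical stand-in for a client library; it prints events instead of sending them.
class TinyLogger
  def initialize(tag_prefix, out: $stdout, clock: -> { Time.now })
    @tag_prefix = tag_prefix
    @out = out
    @clock = clock
  end

  # Build the "prefix.label" tag, attach the current time, and emit one line.
  def event(label, record)
    time = @clock.call.strftime("%Y-%m-%d %H:%M:%S")
    line = "#{time} #{@tag_prefix}.#{label} #{JSON.generate(record)}"
    @out.puts(line)
    line
  end
end

logger = TinyLogger.new("myapp", clock: -> { Time.utc(2013, 10, 30, 18, 56, 1) })
logger.event("login", { "user" => 38 })
```

With the fixed clock, the emitted line has the same time, tag, record layout as the slide's example.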
Configuration and operation
> No central / master node
  > HTTP include helps conf sharing
> Operation depends on your environment
  > Use your daemon management
  > Use Chef in Treasure Data
> Apache like syntax and Ruby DSL
# receive events via HTTP
<source>
  type http
  port 8888
</source>

# read logs from a file
<source>
  type tail
  path /var/log/httpd.log
  format apache
  tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
  type mongo
  database apache
  collection log
</match>

# save alerts to a file
<match alert.**>
  type file
  path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
  type forward
  <server>
    host 192.168.0.11
    weight 20
  </server>
  <server>
    host 192.168.0.12
    weight 60
  </server>
</match>

include http://example.com/conf
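The way the match patterns in the configuration above pick an output can be sketched in Ruby. This is a simplified model, not Fluentd's implementation: it assumes `*` matches exactly one tag part, `**` matches zero or more parts and appears only as the last part (or alone), and the first matching rule wins. The `RULES` table and both helper names are made up for this example.

```ruby
# Convert a tag pattern into a Regexp.
# Assumption: "**" appears only as the last part of the pattern, or alone.
def tag_pattern_to_regexp(pattern)
  return /\A.*\z/ if pattern == "**"

  source = String.new
  pattern.split(".").each_with_index do |part, i|
    if part == "**"
      source << "(?:\\.[^.]+)*" # zero or more trailing tag parts
    else
      source << "\\." unless i.zero?
      source << (part == "*" ? "[^.]+" : Regexp.escape(part))
    end
  end
  Regexp.new("\\A#{source}\\z")
end

# Rules in the same order as the <match> sections above.
RULES = [
  ["apache.access", :mongo],
  ["alert.**", :file],
  ["**", :forward]
].freeze

# Return the output of the first rule whose pattern matches the tag.
def route(tag)
  RULES.each do |pattern, output|
    return output if tag_pattern_to_regexp(pattern).match?(tag)
  end
  nil
end

puts route("apache.access")
puts route("alert.disk.full")
puts route("myapp.login")
```

Because rules are checked in order, the catch-all `**` rule only receives tags that no earlier rule matched.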
Reliability (core + plugin)
> Buffering
  > Use file buffer for persistent data
  > Each buffer chunk has an ID for idempotent writes
> Retrying
> Error handling
  > transaction, failover, etc on forward plugin
  > secondary
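Why a chunk ID helps with retries can be shown with a small sketch. The `Chunk` layout and the `IdempotentStore` class are hypothetical and only illustrate the idea: a destination that remembers chunk IDs can ignore a chunk that is delivered twice.

```ruby
require "securerandom"

# A buffer chunk: a batch of events plus a unique ID (hypothetical layout).
Chunk = Struct.new(:id, :events)

def new_chunk(events)
  Chunk.new(SecureRandom.hex(8), events)
end

# Hypothetical destination that remembers the chunk IDs it has stored,
# so a retried chunk is not written twice.
class IdempotentStore
  attr_reader :events

  def initialize
    @seen_ids = {}
    @events = []
  end

  # Returns true when the chunk was stored, false when it was a duplicate.
  def write(chunk)
    return false if @seen_ids.key?(chunk.id)

    @seen_ids[chunk.id] = true
    @events.concat(chunk.events)
    true
  end
end

store = IdempotentStore.new
chunk = new_chunk([{ "user" => 38 }])
store.write(chunk)
store.write(chunk) # simulated retry after a lost acknowledgement
puts store.events.size
```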
Plugins - use rubygems
$ fluent-gem search -rd fluent-plugin
$ fluent-gem search -rd fluent-mixin
$ fluent-gem install fluent-plugin-mongo
http://fluentd.org/plugin/
in_tail
Apache → access.log → Fluentd
✓ read a log file
✓ custom regexp
✓ custom parser in Ruby
Supported format:
> apache, apache2, syslog, nginx
> json, csv, tsv, ltsv
out_mongo
Apache → access.log → Fluentd (buffer) → MongoDB
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
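An exponential retry wait can be sketched as a doubling delay with a cap. The `retry_waits` helper and its default values (1 second initial wait, 60 second cap) are assumptions chosen for illustration, not settings taken from the slide.

```ruby
# Wait times for successive retries: start at `initial` seconds and double
# each time, capped at `max`. The defaults are illustrative assumptions.
def retry_waits(attempts, initial: 1.0, max: 60.0)
  (0...attempts).map { |n| [initial * (2**n), max].min }
end

p retry_waits(8)
```

Doubling the wait keeps early retries fast while avoiding a flood of requests against a destination that stays down.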
out_webhdfs
Apache → access.log → Fluentd (buffer) → HDFS
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
✓ slice files based on time
  2013-01-01/01/access.log.gz
  2013-01-01/02/access.log.gz
  2013-01-01/03/access.log.gz
  ...
✓ custom text formatter
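Slicing files by time amounts to deriving the destination path from each event's timestamp. The sketch below assumes one file per hour in UTC and reuses the path layout from the slide; `sliced_path` and `group_by_slice` are hypothetical helper names.

```ruby
# Build the destination path for an event time, one file per hour (UTC).
# The strftime pattern mirrors the example paths on the slide.
def sliced_path(time, pattern = "%Y-%m-%d/%H/access.log.gz")
  time.getutc.strftime(pattern)
end

# Group event times by the file they would be written to.
def group_by_slice(times)
  times.group_by { |t| sliced_path(t) }
end

times = [
  Time.utc(2013, 1, 1, 1, 5),
  Time.utc(2013, 1, 1, 1, 50),
  Time.utc(2013, 1, 1, 2, 10)
]
group_by_slice(times).each { |path, ts| puts "#{path}: #{ts.size} event(s)" }
```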
out_copy + other plugins
Apache → access.log → Fluentd (buffer) → Amazon S3, Hadoop
✓ routing based on tags
✓ copy to multiple storages
out_forward
Apache → access.log → Fluentd (buffer) → Fluentd, Fluentd, Fluentd
✓ automatic fail-over
✓ load balancing
✓ retry automatically
✓ exponential retry wait
✓ persistent on a file
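Load balancing with fail-over can be sketched as weighted selection among the servers that are still alive. This is an illustrative model only, not the forward plugin's algorithm: the `Server` struct, the `alive` flag, and `pick_server` are assumptions, and the hosts and weights are borrowed from the configuration example earlier in the deck.

```ruby
Server = Struct.new(:host, :weight, :alive)

# Pick a server for the next chunk. `r` is a number in [0, 1); servers that
# are not alive are skipped, which gives a simple form of fail-over.
def pick_server(servers, r = rand)
  candidates = servers.select(&:alive)
  return nil if candidates.empty?

  point = r * candidates.sum(&:weight)
  candidates.each do |server|
    return server if point < server.weight
    point -= server.weight
  end
  candidates.last
end

servers = [
  Server.new("192.168.0.11", 20, true),
  Server.new("192.168.0.12", 60, true)
]

counts = Hash.new(0)
1000.times { counts[pick_server(servers).host] += 1 }
p counts
```

With weights 20 and 60, the second host should receive roughly three times as many chunks as the first while both are alive.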
Forward topology
[Diagram: seven Fluentd nodes connected by send/ack links]
Other plugins
> Filter, Aggregator, Converter
  > rewrite-tag-filter, sampling-filter, ...
  > *-counter, *-monitor, ...
  > record-modifier, flatten, map, typecast, ...
> See @tagomoris's slide
  > http://www.slideshare.net/tagomoris/fluentd-meetupfukuoka201303
Inputs
> Access logs: Apache
> App logs: Frontend, Backend
> System logs: syslogd
> Databases
Fluentd: filter / buffer / routing
Outputs
> Alerting: Nagios
> Analysis: MongoDB, MySQL, Hadoop
> Archiving: Amazon S3
Other status
> Localizing docs into Japanese
  > https://github.com/fluent/fluentd-docs/tree/master/docs/ja
> Windows support
  > Started by JBAT
  > https://github.com/fluent/fluentd/tree/windows
  > Feedback and patches are welcome!
v11
> Spec is not fixed yet
> Breaking source code compatibility
> Several improvements
  > routing label, filter, error stream, etc.
  > serverengine based: multi-process, signal, etc.
> http://magazine.rubyist.net/?0044-FluentdV11NewFeatures
td-agent
> Open sourced distribution package of Fluentd
  > ETL part of Treasure Data
  > deb, rpm, homebrew
> Including useful components
  > ruby, jemalloc, fluentd
  > 3rd party gems: td, mongo, webhdfs, etc...
> http://packages.treasure-data.com/
Product Comparison
Flume
> Flume: distributed log collector by Cloudera
[Diagram: Flume nodes and a Flume Master in front of Hadoop HDFS; physical topology and logical topology]
Network topology
[Diagram: Flume OG and Flume NG. Each shows Agents sending to Collectors; labels: send, ack, send/ack, Master, Option]
Pros and Cons
> Pros
  > Using central master to manage all nodes
> Cons
  > Java culture (Pros for Java-er?)
    > Difficult configuration and setup
  > Difficult topology
  > Mainly for Hadoop
    > fewer plugins?
Pros and Cons
> Pros
  > Bundled 140 plugins (input/filter/codec/output)
  > Built-in ElasticSearch and Kibana
  > Works on Windows but unstable...
> Cons
  > mainly for JRuby
  > Need external daemon for centralized env
    > Redis, RabbitMQ, etc.
Use cases
Treasure Data
> Components: Frontend, Job Queue, Worker, Hadoop
> Applications push metrics to Fluentd (via local Fluentd)
> Fluentd sums up data by minutes (partial aggregation)
> Librato Metrics for realtime analysis
> Treasure Data for historical analysis
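Partial aggregation can be sketched as summing metric values per time window before passing them on. The example assumes one-minute windows and a metric layout with `"time"`, `"name"`, and `"value"` keys; both the layout and the `sum_by_minute` helper are made up for illustration.

```ruby
# Sum metric values per (minute, name) before forwarding them.
# Each metric is a hash with "time" (Time), "name", and "value" keys (assumed layout).
def sum_by_minute(metrics)
  sums = Hash.new(0)
  metrics.each do |m|
    minute = m["time"].to_i / 60 * 60 # floor to the start of the minute
    sums[[minute, m["name"]]] += m["value"]
  end
  sums
end

base = Time.utc(2013, 10, 30, 18, 56, 0)
metrics = [
  { "time" => base + 1,  "name" => "jobs", "value" => 1 },
  { "time" => base + 30, "name" => "jobs", "value" => 2 },
  { "time" => base + 61, "name" => "jobs", "value" => 5 }
]
sum_by_minute(metrics).each do |(minute, name), total|
  puts "#{Time.at(minute).getutc.strftime('%H:%M')} #{name} #{total}"
end
```

Sending one summed value per minute instead of every raw event reduces the volume that the downstream service has to ingest.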
Cookpad
✓ Over 100 RoR servers (2012/2/4)
> Hundreds of app servers: each Rails app sends event logs through td-agent
> td-agent → Treasure Data: logs are available after several mins.
> Treasure Data → MySQL, Google Spreadsheet: daily/hourly batch
> KPI visualization, feedback rankings
✓ Unlimited scalability
✓ Flexible schema
✓ Realtime
✓ Less performance impact
LINE
by @tagomoris
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
✓ 16 nodes
✓ 120,000+ lines/sec
✓ 400Mbps at peak
✓ 1.5+ TB/day (raw)
[Diagram: Web Servers → Fluentd Cluster → Archive Storage (scribed), Fluentd Watchers (Graph Tools, Notifications (IRC)), and Hadoop Cluster CDH4 (HDFS, YARN) via webhdfs; Huahin Manager, hiveserver, Shib, ShibUI; paths labeled STREAM, BATCH, SCHEDULED BATCH]
Other use-cases
> Scaleout by @choplin
  > データサイエンティスト養成読本 (Data Scientist Training Reader)
  > http://gihyo.jp/book/2013/978-4-7741-5896-9
> Smartnews
  > http://developer.smartnews.be/blog/tag/fluentd/
> Nintendo 3DS StreetPass (ニンテンドー3DS すれちがい通信)
  > http://www.nintendo.co.jp/3ds/interview/streetpass_relay/vol1/index4.html
Other companies
Conclusion
> Fluentd is now a widely-used project
  > There are many use cases
  > Many contributors and plugins
> Keep it simple
  > Easy to use and integrate into your environment