Flume in 10 Minutes
DESCRIPTION
Slides for the video walkthrough at https://www.youtube.com/watch?v=112opbzgBiw
TRANSCRIPT
1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
Flume NG Basics
Oracle’s Big Data Approach
4 Steps to Greater Value
• Acquire and organize all data
• Enable greater access to wide data
• Analyze and refine important data
• Decide and publish insights
How do I get data to my Hadoop cluster?
Using Flume NG to collect distributed data
My log data is not near my Hadoop cluster
[Diagram: application servers producing customer logs, with no obvious path ("?") to the Oracle Big Data Appliance]
Moving Data with Flume NG
[Diagram: logs on each application server feed a local Flume NG agent; those agents forward events over Avro to a second tier of Flume NG agents, which perform HDFS writes into the Oracle Big Data Appliance]
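The two tiers in the diagram could be wired with Flume's avro sink and avro source. The agent names, hostname, and port below are illustrative, not from the original slides:

```
# Collector agent on each application server (source definition elided);
# forwards events over Avro to the aggregator host near the cluster.
collector.channels = mem
collector.sinks = to-aggregator
collector.channels.mem.type = memory
collector.sinks.to-aggregator.type = avro
collector.sinks.to-aggregator.hostname = aggregator.example.com
collector.sinks.to-aggregator.port = 4545
collector.sinks.to-aggregator.channel = mem

# Aggregator agent: receives Avro events, then writes to HDFS
# (HDFS sink definition as on the configuration slide).
aggregator.sources = from-collectors
aggregator.channels = mem
aggregator.channels.mem.type = memory
aggregator.sources.from-collectors.type = avro
aggregator.sources.from-collectors.bind = 0.0.0.0
aggregator.sources.from-collectors.port = 4545
aggregator.sources.from-collectors.channels = mem
```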
Building a Basic Flume Agent
• Flume is flexible
– Durable transactions
– In-flight data modification
– Compresses data
• Flume is simpler than it used to be
– No ZooKeeper requirement
– No master–slave architecture
• 3 basic pieces
– Source, channel, sink
One configuration file
Flume Configuration
hdfs-agent.sources = netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels = memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = 127.0.0.1
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/oracle/sabre_example
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 10000

hdfs-agent.sources.netcat-collect.channels = memoryChannel
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
Invoke this with: flume-ng agent -f this_file -n hdfs-agent
Sending Data to the Agent
• Connect netcat to the host
• Pipe input to it
• Records are transmitted on newline
• head example.xml | nc localhost 11111
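The newline framing described above can be sketched in Python. The TCP listener below is only a stand-in for the agent's netcat source (the real source listens on 127.0.0.1:11111 per the configuration slide), so the snippet is self-contained:

```python
import socket
import threading

received = []

def stand_in_netcat_source(server):
    # Stand-in for Flume's netcat source: each newline-terminated
    # line read off the socket becomes one event.
    conn, _ = server.accept()
    buf = b""
    while True:
        data = conn.recv(4096)
        if not data:
            break
        buf += data
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            received.append(line.decode())
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port; a real agent would use 11111
server.listen(1)
port = server.getsockname()[1]
t = threading.Thread(target=stand_in_netcat_source, args=(server,))
t.start()

# Client side: what `head example.xml | nc localhost 11111` does.
client = socket.create_connection(("127.0.0.1", port))
for event in ["first log line", "second log line"]:
    client.sendall(event.encode() + b"\n")  # one record per newline
client.close()
t.join()
server.close()
print(received)  # ['first log line', 'second log line']
```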
Alternatives to Flume
• Scribe
– Thrift-based
– Lightweight, but no support
– Not designed around Hadoop
• Kafka
– Designed to resemble a publish–subscribe system
– Explicitly distributed
– Apache Incubator project
And Their Trade-Offs