Flume in 10 Minutes

Post on 31-Oct-2014

DESCRIPTION

Slides for the video walkthrough at https://www.youtube.com/watch?v=112opbzgBiw

TRANSCRIPT

1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Flume NG Basics

Oracle’s Big Data Approach
4 Steps to Greater Value

• Acquire and organize all data

• Enable greater access to wide data

• Analyze and refine important data

• Decide and publish insights

How do I get data to my Hadoop Cluster?
Using Flume NG to collect distributed data

My log data is not near my Hadoop cluster

[Diagram: Application Servers holding Customer Logs on one side, the Oracle Big Data Appliance on the other, and a "?" between them]

Moving Data with Flume NG

[Diagram: three Flume NG Agents on the Application Servers each read Logs and send them over Avro to three Flume NG Agents, each of which performs an HDFS Write on the Oracle Big Data Appliance]

Building a Basic Flume Agent
One configuration file

• Flume is flexible
  – Durable transactions
  – In-flight data modification
  – Data compression

• Flume is simpler than it used to be
  – No ZooKeeper requirement
  – No master-slave architecture

• 3 basic pieces
  – Source, Channel, Sink

Flume Configuration

hdfs-agent.sources = netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels = memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = 127.0.0.1
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/oracle/sabre_example
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 10000

hdfs-agent.sources.netcat-collect.channels = memoryChannel
hdfs-agent.sinks.hdfs-write.channel = memoryChannel

Invoke this with: flume-ng agent -f this_file -n hdfs-agent
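
The source-to-channel-to-sink wiring in the configuration above can be sanity-checked before the agent is started. The following is a minimal sketch in Python and is not part of Flume: the helper names parse_properties and check_wiring are hypothetical, the parser handles only simple "key = value" lines (not the full Java properties syntax), and the embedded CONFIG repeats just the wiring-related keys from the slide.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of wiring problems for one agent (empty if none found)."""
    # Component lists such as '<agent>.channels' are space-separated names.
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    # A source can feed several channels: '<agent>.sources.<name>.channels'.
    for source in props.get(f"{agent}.sources", "").split():
        used = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not used:
            problems.append(f"source {source}: no channel configured")
        for channel in used:
            if channel not in channels:
                problems.append(f"source {source}: unknown channel {channel!r}")
    # A sink drains exactly one channel: '<agent>.sinks.<name>.channel'.
    for sink in props.get(f"{agent}.sinks", "").split():
        used = props.get(f"{agent}.sinks.{sink}.channel", "")
        if used not in channels:
            problems.append(f"sink {sink}: unknown channel {used!r}")
    return problems


# Wiring-related keys copied from the slide's configuration.
CONFIG = """
hdfs-agent.sources = netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels = memoryChannel
hdfs-agent.sources.netcat-collect.channels = memoryChannel
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
"""

if __name__ == "__main__":
    print(check_wiring(parse_properties(CONFIG), "hdfs-agent"))
```

Note the asymmetry the check relies on: a source uses the plural key "channels", while a sink uses the singular key "channel".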

Sending Data to the Agent

• Connect netcat to the host and port the source is bound to

• Pipe input to it

• Each newline-terminated line is transmitted as one record

• Example: head example.xml | nc localhost 11111
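
When nc is not available, the same thing can be done from a short script. The sketch below is an illustration in Python rather than anything shipped with Flume: send_lines is a hypothetical helper, the default host and port are taken from the slide's netcat source settings, and any acknowledgement the source may write back is ignored.

```python
import socket


def send_lines(lines, host="127.0.0.1", port=11111, timeout=5.0):
    """Send each item as one newline-terminated record; return bytes sent."""
    # Normalise so every record ends in exactly one trailing newline,
    # since the receiving side splits records on newlines.
    payload = "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
    return len(payload)


# Usage, roughly mirroring `head example.xml | nc localhost 11111`
# (assumes an agent is listening and example.xml exists):
#
#     from itertools import islice
#     with open("example.xml", encoding="utf-8") as f:
#         send_lines(islice(f, 10))
```

Opening one connection for the whole batch, as nc does, avoids a TCP handshake per record.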

Alternatives to Flume
And Their Trade-Offs

• Scribe
  – Thrift-based
  – Lightweight, but no support
  – Not designed around Hadoop

• Kafka
  – Designed to resemble a publish-subscribe system
  – Explicitly distributed
  – Apache Incubator Project
