Flume in 10 minutes

Flume NG Basics

Uploaded by: dwmclary

Posted on 31-Oct-2014

DESCRIPTION

Slides for the video walkthrough at https://www.youtube.com/watch?v=112opbzgBiw

TRANSCRIPT

Page 1: Flume in 10 minutes

1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Insert Information Protection Policy Classification from Slide 8

Flume NG Basics

Oracle’s Big Data Approach

• Acquire and organize all data

• Enable greater access to wide data

• Analyze and refine important data

• Decide and publish insights

4 Steps to Greater Value

How do I get data to my Hadoop cluster?

Using Flume NG to collect distributed data

My log data is not near my Hadoop cluster

[Diagram: customer logs sit on the application servers; a "?" marks the missing path from them to the Oracle Big Data Appliance.]

Moving Data with Flume NG

[Diagram: three Flume NG agents on the application servers each collect local logs and forward them over Avro to three Flume NG agents on the Oracle Big Data Appliance, which write the data to HDFS.]
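The two-tier layout on this slide (edge agents forwarding over Avro to collector agents that write to HDFS) comes down to two Flume configurations. The Python sketch below renders both as properties text. The agent names (`edge`, `collector`), the spool directory, the collector hostname, the HDFS path and port 4141 are all hypothetical, and reading the logs with the spooling-directory source is only one assumed option.

```python
"""Render a hypothetical two-tier Flume NG topology as properties text."""


def render(agent: str, props: dict[str, str]) -> str:
    # Prefix every key with the agent name, as in a Flume properties file.
    return "\n".join(f"{agent}.{key} = {value}" for key, value in props.items())


def edge_config(spool_dir: str, collector_host: str, port: int) -> str:
    # Edge tier (application server): read log files, forward them over Avro.
    return render("edge", {
        "sources": "logs",
        "channels": "mem",
        "sinks": "to-collector",
        "sources.logs.type": "spooldir",  # assumption: logs are dropped into a spool dir
        "sources.logs.spoolDir": spool_dir,
        "sources.logs.channels": "mem",
        "channels.mem.type": "memory",
        "sinks.to-collector.type": "avro",
        "sinks.to-collector.hostname": collector_host,
        "sinks.to-collector.port": str(port),
        "sinks.to-collector.channel": "mem",
    })


def collector_config(port: int, hdfs_path: str) -> str:
    # Collector tier (Hadoop side): receive Avro events, write them to HDFS.
    return render("collector", {
        "sources": "from-edge",
        "channels": "mem",
        "sinks": "hdfs-write",
        "sources.from-edge.type": "avro",
        "sources.from-edge.bind": "0.0.0.0",
        "sources.from-edge.port": str(port),
        "sources.from-edge.channels": "mem",
        "channels.mem.type": "memory",
        "sinks.hdfs-write.type": "hdfs",
        "sinks.hdfs-write.hdfs.path": hdfs_path,
        "sinks.hdfs-write.hdfs.fileType": "DataStream",
        "sinks.hdfs-write.channel": "mem",
    })


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(edge_config("/var/log/app/spool", "collector.example.com", 4141))
    print()
    print(collector_config(4141, "hdfs://namenode:8020/flume/logs"))
```

Each rendered file would be started with its own `flume-ng agent` invocation, with `-n edge` or `-n collector` matching the key prefix. The Avro sink's port on the edge tier must match the Avro source's port on the collector tier.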

Building a Basic Flume Agent

• Flume is flexible
  – Durable Transactions
  – In-Flight Data Modification
  – Compresses Data

• Flume is simpler than it used to be
  – No ZooKeeper requirement
  – No Master-Slave architecture

• 3 basic pieces
  – Source, Channel, Sink

One configuration file
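To make the source, channel and sink hand-off concrete, here is a toy in-process model in Python. It is only loosely inspired by Flume's channel transactions (Flume itself is written in Java and this is not its API): a source puts events into a bounded channel, and a sink takes them in batches, returning a batch to the channel if the write fails. All class and function names are hypothetical, and like Flume's memory channel, nothing here survives a process crash.

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink (toy model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.events: deque[str] = deque()

    def put(self, event: str) -> None:
        # A source hands events to the channel; refuse them when full.
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")
        self.events.append(event)

    def take_batch(self, n: int) -> list[str]:
        # Remove up to n events for one sink transaction.
        batch = []
        while self.events and len(batch) < n:
            batch.append(self.events.popleft())
        return batch

    def rollback(self, batch: list[str]) -> None:
        # Put undelivered events back at the front, preserving their order.
        self.events.extendleft(reversed(batch))


def drain(channel: MemoryChannel, write, batch_size: int = 100) -> int:
    """Sink loop: take a batch, write it, roll back on failure.

    Returns the number of events delivered before the channel emptied
    or a write failed.
    """
    delivered = 0
    while True:
        batch = channel.take_batch(batch_size)
        if not batch:
            return delivered
        try:
            write(batch)  # stands in for e.g. an HDFS write
        except OSError:
            channel.rollback(batch)  # events stay queued for a later retry
            return delivered
        delivered += len(batch)  # "commit": the batch leaves the channel
```

The design point this illustrates is that an event leaves the channel only after the sink's write succeeds; a failed write puts the batch back rather than losing it.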

Flume Configuration

hdfs-agent.sources = netcat-collect
hdfs-agent.sinks = hdfs-write
hdfs-agent.channels = memoryChannel

hdfs-agent.sources.netcat-collect.type = netcat
hdfs-agent.sources.netcat-collect.bind = 127.0.0.1
hdfs-agent.sources.netcat-collect.port = 11111

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/oracle/sabre_example
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 10000

hdfs-agent.sources.netcat-collect.channels = memoryChannel
hdfs-agent.sinks.hdfs-write.channel = memoryChannel

Invoke this with: flume-ng agent -f this_file -n hdfs-agent
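Because the whole agent lives in one properties file, a misspelled component name quietly breaks the wiring. The Python sketch below is a hypothetical helper, not part of Flume: it parses the file and checks that each source's `channels` and each sink's `channel` name a declared channel and that every component has a `type`. It assumes the simple `key = value` form used on this slide, not the full Java properties syntax.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse `key = value` lines, skipping blank lines and # comments.

    Assumption: ':' separators, line continuations and escapes of the
    full Java .properties format are not handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split at the first '=' only
        props[key.strip()] = value.strip()
    return props


def check_wiring(props: dict[str, str], agent: str) -> list[str]:
    """Return wiring problems for one agent; an empty list means none found."""

    def names(kind: str) -> list[str]:
        # e.g. "hdfs-agent.sources = a b" -> ["a", "b"]
        return props.get(f"{agent}.{kind}", "").split()

    channels = set(names("channels"))
    problems = []
    for src in names("sources"):
        src_channels = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not src_channels:
            problems.append(f"source {src}: no channels")
        for ch in src_channels:
            if ch not in channels:
                problems.append(f"source {src}: unknown channel {ch!r}")
        if f"{agent}.sources.{src}.type" not in props:
            problems.append(f"source {src}: no type")
    for sink in names("sinks"):
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink}: unknown channel {ch!r}")
        if f"{agent}.sinks.{sink}.type" not in props:
            problems.append(f"sink {sink}: no type")
    return problems
```

Applied to the configuration on this slide (one property per line), `check_wiring(parse_properties(text), "hdfs-agent")` should return an empty list; changing the sink's `channel` to a misspelled name produces one "unknown channel" entry.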

Sending Data to the Agent

• Connect netcat (nc) to the agent's host and port

• Pipe input to it

• Each newline-terminated line is sent as one record (event)

• head example.xml | nc localhost 11111
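Where `nc` is not available, the pipeline `head example.xml | nc localhost 11111` can be approximated in Python with a plain TCP socket. This is a sketch that assumes the netcat source configured earlier is listening on localhost:11111; the file name and the count of 10 mirror the `head` example, and the function names are hypothetical.

```python
import socket
from itertools import islice


def frame_records(lines) -> bytes:
    """Join records so each one ends in exactly one newline.

    The netcat source turns each newline-terminated line into one event.
    """
    return b"".join(line.rstrip("\r\n").encode("utf-8") + b"\n" for line in lines)


def send_file_head(path: str, host: str = "localhost", port: int = 11111,
                   count: int = 10) -> None:
    """Send the first `count` lines of a file, like `head path | nc host port`."""
    with open(path, encoding="utf-8") as f:
        payload = frame_records(islice(f, count))
    with socket.create_connection((host, port)) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    send_file_head("example.xml")
```

Normalizing every line to a single trailing newline keeps record boundaries unambiguous, so a file whose last line lacks a newline still produces a complete final event.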

Alternatives to Flume

• Scribe
  – Thrift-based
  – Lightweight, but no support
  – Not designed around Hadoop

• Kafka
  – Designed to resemble a publish-subscribe system
  – Explicitly distributed
  – Apache Incubator Project

And Their Trade-Offs
