Post on 04-Jan-2016


DISTRIBUTED SYSTEMS

Apache Flume

Muhammad Afaq

OVERVIEW

What is Flume?

Flume Agent

Flume Components

Conf File

Example Configuration

Example: User Trends Retrieval with Flume using Twitter API

WHAT IS FLUME?

A reliable service for collecting and aggregating large amounts of data, especially streaming data such as log data.

Flume is one of the projects in the Hadoop ecosystem.

For log analysis based on Hadoop, Flume can be used to get the log information, such as logs from websites or system logs.

FLUME AGENT

The Flume architecture is built around the agent. An agent has a source, which takes in data from an origin such as a web server, an application server, or a website.

From the source, data moves to a channel, where the log data is held.

From the channel, the log data moves to a sink, which writes it to storage (for example Hadoop or the local file system).

FLUME COMPONENTS

Source: An active component which receives events and places them in the channel.

Channel: A passive component which buffers events until a sink takes them.

Sink: Writes the data to the next hop or to the final destination.
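The roles of the three components can be illustrated with a toy model. This is only a sketch in plain Python, not Flume code: the class names `Source`, `Channel`, and `Sink` mirror the component names above, while the method names, the capacity value, and the list used as a destination are assumptions made for illustration.

```python
from collections import deque


class Channel:
    """Passive buffer: holds events until a sink takes them."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            raise OverflowError("channel is full")
        self._events.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._events.popleft() if self._events else None


class Source:
    """Active component: receives events and places them in the channel."""

    def __init__(self, channel):
        self.channel = channel

    def receive(self, event):
        self.channel.put(event)


class Sink:
    """Drains the channel and writes each event to its destination."""

    def __init__(self, channel, destination):
        self.channel = channel
        self.destination = destination  # a plain list stands in for HDFS or a file

    def drain(self):
        while (event := self.channel.take()) is not None:
            self.destination.append(event)


channel = Channel(capacity=1000)
source = Source(channel)
stored = []
sink = Sink(channel, stored)

for line in ["GET /index.html", "GET /about.html"]:
    source.receive(line)
sink.drain()
print(stored)  # ['GET /index.html', 'GET /about.html']
```

The channel never acts on its own here: the source pushes into it and the sink pulls from it, which is what "passive" means in the component list.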

CONF FILE

Basic Rules

Every agent must have at least one channel.

Every source must have at least one channel.

Every sink must have exactly one channel.

Every component must have a type.
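The four rules can be checked mechanically. The following is a minimal sketch, assuming the conf file uses the `agent.kind.name.property = value` layout; `parse_properties` and `check_agent` are hypothetical helper names written for this illustration and are not part of Flume.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of rule violations for one agent (empty if none)."""

    def names(kind):
        return props.get(f"{agent}.{kind}", "").split()

    errors = []
    # Rule 1: every agent must have at least one channel.
    if not names("channels"):
        errors.append("agent has no channel")
    # Rule 2: every source must have at least one channel.
    for src in names("sources"):
        if not props.get(f"{agent}.sources.{src}.channels", "").split():
            errors.append(f"source {src} has no channel")
    # Rule 3: every sink must have exactly one channel.
    for snk in names("sinks"):
        if len(props.get(f"{agent}.sinks.{snk}.channel", "").split()) != 1:
            errors.append(f"sink {snk} must have exactly one channel")
    # Rule 4: every component must have a type.
    for kind in ("sources", "channels", "sinks"):
        for name in names(kind):
            if not props.get(f"{agent}.{kind}.{name}.type"):
                errors.append(f"{kind[:-1]} {name} has no type")
    return errors


conf = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
"""
print(check_agent(parse_properties(conf), "a1"))
# ['sink k1 must have exactly one channel']
```

The sample deliberately leaves out the sink's channel binding, so the third rule is reported. Note the property names: a source lists its `channels` (plural), while a sink names a single `channel`.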

EXAMPLE CONFIGURATION

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
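Once an agent is running with this configuration, the netcat source listens on localhost:44444 and turns each line of text it receives into an event. A small client can be used to try it out. This is a sketch: `send_line` is a hypothetical helper written for this example, and the 5-second timeout is an arbitrary choice.

```python
import socket


def send_line(host, port, text, timeout=5.0):
    """Send one newline-terminated line and return the server's reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(text.encode("utf-8") + b"\n")
        reply = conn.recv(1024)
    return reply.decode("utf-8").strip()
```

With the agent running, `send_line("localhost", 44444, "hello flume")` should deliver one event. With its default settings the netcat source is expected to acknowledge each line with `OK`, and the logger sink should then print the event in the agent's log output.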

USER TRENDS RETRIEVAL WITH FLUME USING TWITTER API

In this example, we will retrieve users’ trends as logs from a personal Twitter account using an API. These trends can be further analyzed as desired.

Download Flume

Check whether the flume tar is present or not

Create flume-ng directory

Copy the flume tar to flume-ng directory

Check whether flume tar is copied or not

Change directory to flume-ng

Extract file from flume tar

Check whether flume files are extracted or not
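The staging steps above (check the tar, create the directory, copy, extract, check the result) are shell work on the slides. The same sequence can be sketched with the Python standard library. `stage_and_extract` is a hypothetical helper name, and the archive file name has to be supplied by the caller because the slides do not state it.

```python
import shutil
import tarfile
from pathlib import Path


def stage_and_extract(tar_path, work_dir):
    """Copy a tar archive into work_dir, extract it there, and list the result."""
    tar_path = Path(tar_path)
    work_dir = Path(work_dir)

    # Check whether the tar is present.
    if not tar_path.is_file():
        raise FileNotFoundError(f"{tar_path} not found")

    # Create the working directory and copy the tar into it.
    work_dir.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy(tar_path, work_dir))

    # Extract the archive inside the working directory.
    with tarfile.open(copied) as archive:
        archive.extractall(work_dir)

    # Report what is now in the directory so the extraction can be checked.
    return sorted(p.name for p in work_dir.iterdir())
```

Called with the path to the downloaded Flume tar and `"flume-ng"` as the working directory, it returns the directory listing, which should contain both the copied archive and the extracted apache-flume folder.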

Move the flume-sources-1.0-SNAPSHOT.jar file to the ‘lib’ directory of apache-flume and check that it is present there

Create the flume-env.sh file in the ‘conf’ directory of apache-flume

Open flume-env.sh

Edit flume-env.sh according to the snapshot below

Open a browser and go to the URL below:

URL: https://apps.twitter.com

Log in to Twitter

Create a new application

Twitter Apps

The highlighted part will be used in flume.conf

Edit flume.conf

Change directory to the ‘bin’ folder of apache-flume

Start fetching the data from Twitter

Data being fetched from Twitter

Browse the filesystem

Click on user

Click on flume

Click on tweets

Click on the FlumeData file

This is the data that has been downloaded from Twitter
