DISTRIBUTED SYSTEMS Apache Flume Muhammad Afaq

Post on 04-Jan-2016

OVERVIEW

What is Flume?

Flume Agent

Flume Components

Conf File

Example Configuration

Example: User Trends Retrieval with Flume using Twitter API

WHAT IS FLUME?

A reliable service for collecting and aggregating large amounts of data, especially streaming data such as log data.

Flume is one of the projects in the Hadoop ecosystem.

For Hadoop-based log analysis, Flume can be used to collect the log data, such as logs from websites or system logs.

FLUME AGENT

The Flume architecture is built around the agent. An agent has a source, which takes in data from a producer (anything such as a web server, application server, or website).

From the source, data moves to the channel, where the log data is buffered.

From the channel, the log data moves to the sink, which writes it to storage (for example Hadoop or the local file system).

FLUME COMPONENTS

Source: An active component which receives the event and places it in the channel.

Channel: A passive component which buffers the event until the sink takes it.

Sink: Writes the data to the next hop or to the final destination.
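
The idea of a "next hop" can be illustrated with two agents chained together. The fragment below is a minimal sketch and is not taken from the slides: the agent names (agent1, agent2), the host name collector.example.com, and port 4545 are assumptions chosen for illustration. It relies on Flume's built-in avro sink and avro source, so that the sink of the first agent hands events to the source of the second agent. The remaining source and channel definitions of each agent are omitted.

```properties
# agent1 (hypothetical name): its sink forwards events to the next hop
agent1.sinks = k1
agent1.sinks.k1.type = avro
# Assumed host and port of the receiving agent
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1

# agent2 (hypothetical name): its source receives the events sent by agent1
agent2.sources = r1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1
```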

CONF FILE

Basic Rules

Every agent must have at least one channel.

Every source must have at least one channel.

Every sink must have exactly one channel.

Every component must have a type.
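
The rules above can be sketched in a small configuration. This is an illustrative fragment only: the agent name a1 and the component names r1, c1, c2, k1, and k2 are arbitrary. It shows one source writing to two channels while each sink reads from exactly one channel. With Flume's default (replicating) channel selector, every event from r1 is placed in both c1 and c2.

```properties
# Hypothetical agent "a1" with one source, two channels, and two sinks
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Every component has a type
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.sinks.k1.type = logger
a1.sinks.k2.type = logger

# A source may list several channels (note the plural "channels")
a1.sources.r1.channels = c1 c2

# A sink reads from exactly one channel (note the singular "channel")
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```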

EXAMPLE CONFIGURATION

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

USER TRENDS RETRIEVAL WITH FLUME USING TWITTER API

In this example, we will retrieve users’ trends as logs from a personal Twitter account using an API. These trends can be further analyzed as desired.

Download Flume

Check whether the flume tar is present or not

Create flume-ng directory

Copy the flume tar to flume-ng directory

Check whether flume tar is copied or not

Change directory to flume-ng

Extract file from flume tar

Check whether flume files are extracted or not

Move the flume-sources-1.0-SNAPSHOT.jar file to the ‘lib’ directory of apache-flume and check its presence there

Create the flume-env.sh file in the ‘conf’ directory of apache flume

Open flume-env.sh

Edit flume-env.sh according to the snapshot below

Open a browser and go to the URL below:

https://apps.twitter.com

Log in to Twitter

Create a new application

Twitter Apps

The highlighted part will be used in flume.conf

Edit flume.conf
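
The contents of flume.conf are not reproduced in this transcript. As a rough sketch of what such a file could look like, the fragment below assumes that the flume-sources-1.0-SNAPSHOT.jar placed in the ‘lib’ directory provides the class com.cloudera.flume.source.TwitterSource, as in Cloudera's publicly available Twitter example. The agent name TwitterAgent, the component names, the keyword list, the channel and batch sizes, and the NameNode address localhost:9000 are all assumptions. The four credential values are placeholders for the keys and tokens of the Twitter application. The HDFS path is chosen to match the user/flume/tweets location browsed later in the slides.

```properties
# Hypothetical agent and component names
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Source: assumed to come from flume-sources-1.0-SNAPSHOT.jar
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
# Placeholders: fill in the values of your own Twitter application
TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>
# Assumed keywords used to filter tweets
TwitterAgent.sources.Twitter.keywords = hadoop, big data, flume

# Channel: buffers events in memory (sizes are assumptions)
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Sink: writes events to HDFS (NameNode address is an assumption)
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
```

The HDFS sink names its output files with the prefix FlumeData by default, which is consistent with the FlumeData file opened at the end of this example.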

Change the directory to the ‘bin’ folder of apache flume

Start fetching the data from Twitter

Data being fetched from Twitter

Browse the filesystem

Click on user

Click on flume

Click on tweets

Click on the FlumeData file

This is the data that has been downloaded from Twitter