Hadoop Week 7

Week 1 Introduction to HDFS

Week 2 Setting Up Hadoop Cluster

Week 3 Map-Reduce Basics, types and formats

Week 4 PIG

Week 5 HIVE

Week 6 HBASE

Week 7 ZOOKEEPER

Week 8 SQOOP

ZooKeeper

Problem?

More Examples

For Example:

Assume the system is an ad system for serving advertisements to web sites. Ad systems are complex beasts that require a fair bit of coordination. Imagine all the subsystems needing to run on those 100 nodes: database, monitoring, fraud detectors, beacon servers, web server event log processors, failover servers, customer dashboards, targeting engines, campaign planners, campaign scenario testers, upgrades, installs, media managers, and so on.

There's a lot going on

Traditional Solution and issues

Traditional Approach

1. Manual intervention.

2. Better alert system.

3. Code for synchronization.

4. Spend more money on hiring great programmers.

Issues with the Approach

1. Traditional solutions are not scalable.

2. They become very complex over a period of time.

3. Lots of time is wasted in fixing bugs.

4. Services may not have fixed IP addresses.

5. Programmers often fail to use locks correctly.

6. Message-based coordination can be hard to use in some applications.

Characteristics of the New Solution

1. Simple Interface

2. Centralized coordination service

3. Highly reliable

Simplified ZooKeeper

Apache ZooKeeper is a software project of the Apache Software Foundation, providing an open source distributed configuration service, synchronization service, and naming registry for large distributed systems.

How Is Information Stored?

ZooKeeper stores information in a tree-like structure.

The structure resembles a traditional file system. Each "file" in it is called a znode.

A znode can store data and references to other znodes (i.e. its children).

Each znode can store up to 1 MB of data.

Data is kept in memory and is backed up to a log for reliability. By keeping data in memory, ZooKeeper is very fast and can handle high loads.
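The tree structure described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the ZooKeeper client API: the class ZnodeTree, its method names, and the example paths are all hypothetical.

```python
# Toy in-memory model of a znode tree. This is NOT the ZooKeeper client API;
# the class, method names, and paths are hypothetical.
MAX_DATA_BYTES = 1024 * 1024  # the 1 MB per-znode limit mentioned above


class ZnodeTree:
    def __init__(self):
        # Map each absolute path to its data; "/" is the root znode.
        self.data = {"/": b""}

    def create(self, path, data=b""):
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")
        # The parent of "/app/workers" is "/app"; the parent of "/app" is "/".
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError("parent znode does not exist: " + parent)
        if path in self.data:
            raise KeyError("znode already exists: " + path)
        self.data[path] = data

    def get(self, path):
        return self.data[path]

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        # A child sits exactly one path segment below its parent.
        return sorted(
            p[len(prefix):]
            for p in self.data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/app", b"config-v1")
tree.create("/app/workers")
tree.create("/app/workers/w1", b"host-a")
```

Unlike a file in an ordinary file system, the same znode ("/app" here) holds data and has children at the same time.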

ZooKeeper Data Model

Hierarchical namespace (like a file system)

Each znode has data and children

Quorum?

ZNodes

Every node in a ZooKeeper tree is referred to as a znode.

Znodes maintain a stat structure that includes version numbers for data changes. The stat structure also has timestamps.

The version number, together with the timestamp, allows ZooKeeper to validate the cache and to coordinate updates. Each time a znode's data changes, the version number increases.

For instance, whenever a client retrieves data, it also receives the version of the data. When a client performs an update or a delete, it must supply the version of the data of the znode it is changing. If the version it supplies doesn't match the actual version of the data, the update will fail.
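The version check described above can be sketched as a conditional update. This is a minimal stand-alone model, not the ZooKeeper client API: the class VersionedZnode and the exception BadVersionError are hypothetical names chosen for illustration.

```python
# Toy model of version-checked updates on a single znode.
# Hypothetical names; not the ZooKeeper client API.
class BadVersionError(Exception):
    pass


class VersionedZnode:
    def __init__(self, data):
        self.data = data
        self.version = 0  # incremented on every successful data change

    def get(self):
        # A read returns the data together with its current version.
        return self.data, self.version

    def set(self, data, expected_version):
        # The update is applied only if the caller saw the latest version.
        if expected_version != self.version:
            raise BadVersionError(
                "expected %d, actual %d" % (expected_version, self.version)
            )
        self.data = data
        self.version += 1
        return self.version


node = VersionedZnode(b"limit=10")
_, seen = node.get()           # two clients both read version 0
node.set(b"limit=20", seen)    # the first writer succeeds; version becomes 1
try:
    node.set(b"limit=30", seen)  # the second writer still holds version 0
    stale_write_rejected = False
except BadVersionError:
    stale_write_rejected = True
```

The second writer has to re-read the znode and retry with the new version, so one client cannot silently overwrite another client's change.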

How HBase uses ZooKeeper

Every region server creates its own znode in ZooKeeper. This helps in tracking the available region servers.

It also helps in detecting region server failures and network partitions.

In simple words, ZooKeeper maintains a registry of region servers.

HBase also uses ZooKeeper to ensure that only one master is running.

HBase also stores the location of the root table in ZooKeeper.
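The registry and single-master ideas above can be sketched with a toy coordinator. The sketch assumes znodes that are tied to the session that created them and disappear when that session ends (ZooKeeper calls these ephemeral znodes). The class ToyCoordinator, the session ids, and the paths are hypothetical and for illustration only; this is not the ZooKeeper or HBase API.

```python
# Toy model of a region server registry and single-master guarantee.
# Hypothetical names and paths; not the ZooKeeper or HBase API.
class ToyCoordinator:
    def __init__(self):
        self.owner = {}  # znode path -> id of the session that created it

    def create_ephemeral(self, path, session):
        # Creation fails if the znode already exists, so only one
        # caller can ever hold a given path at a time.
        if path in self.owner:
            return False
        self.owner[path] = session
        return True

    def expire_session(self, session):
        # When a session ends (crash, partition), its znodes vanish.
        self.owner = {p: s for p, s in self.owner.items() if s != session}

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.owner if p.startswith(prefix))


zk = ToyCoordinator()
# Each region server registers itself under a common parent.
zk.create_ephemeral("/hbase/rs/server-1", session="s1")
zk.create_ephemeral("/hbase/rs/server-2", session="s2")
# Two masters race for the same znode; only the first one wins.
first_master_won = zk.create_ephemeral("/hbase/master", session="m1")
second_master_won = zk.create_ephemeral("/hbase/master", session="m2")
# server-2 fails, so its entry drops out of the registry.
zk.expire_session("s2")
live_servers = zk.children("/hbase/rs")
```

In this model, listing the children of the registry path gives the set of live region servers, and the master znode can be claimed by a standby as soon as the active master's session expires.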

How HBase uses ZooKeeper

Before ZooKeeper, the HBase master got information from region servers through heartbeats.

ZooKeeper now takes responsibility for notifying each side about changes on the other.

ZooKeeper helps HBase to bootstrap.
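The shift from heartbeats to notifications can be sketched as a callback registry: interested parties register a callback once and are told when the set of servers changes. This is a simplified illustration with hypothetical names (WatchedRegistry, watch, register, unregister). One known simplification: standard ZooKeeper watches fire once and must be set again, whereas the callbacks below stay registered.

```python
# Toy model of change notification. Hypothetical names; not the
# ZooKeeper client API. Real ZooKeeper watches are one-time triggers;
# these callbacks stay registered to keep the sketch short.
class WatchedRegistry:
    def __init__(self):
        self.servers = set()
        self.watchers = []

    def watch(self, callback):
        self.watchers.append(callback)

    def _notify(self, event, server):
        for callback in self.watchers:
            callback(event, server)

    def register(self, server):
        if server not in self.servers:
            self.servers.add(server)
            self._notify("added", server)

    def unregister(self, server):
        if server in self.servers:
            self.servers.remove(server)
            self._notify("removed", server)


events = []
registry = WatchedRegistry()
# The "master" subscribes once and records every change it is told about.
registry.watch(lambda event, server: events.append((event, server)))
registry.register("rs-1")
registry.unregister("rs-1")
```

The subscriber learns about changes from the coordination service, so it does not need to poll each server on its own.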

HBase master selection

Register new Region Server

Thank You
