hadoop week 7
-
Week 1 Introduction to HDFS
Week 2 Setting Up Hadoop Cluster
Week 3 Map-Reduce Basics, types and formats
Week 4 PIG
Week 5 HIVE
Week 6 HBASE
Week 7 ZOOKEEPER
Week 8 SQOOP
Zookeeper
-
Problem?
-
More Examples
For example purposes, assume the system is an ad system for
serving advertisements to web sites. Ad systems are
complex beasts that require a fair bit of
coordination. Imagine all the subsystems needing to
run on those 100 nodes: database, monitoring,
fraud detectors, beacon servers, web server event
log processors, failover servers, customer
dashboards, targeting engines, campaign planners,
campaign scenario testers, upgrades, installs, media
managers, and so on.
There's a lot going on.
-
Traditional Solution and issues
Traditional Approach
1. Manual intervention.
2. Better alert system.
3. Hand-written code for synchronization.
4. Spend more money on hiring great programmers.
-
Issues with the Approach
1. Traditional solutions are not scalable.
2. They become very complex over a period of time.
3. Lots of time is wasted in fixing bugs
4. Services may not have fixed IP address.
5. Programmers often fail to use locks correctly.
6. Message based coordination can be hard to use in some applications.
-
Characteristics of the New Solution
1. Simple Interface
2. Centralized coordination service
3. Highly reliable
-
Simplified Zookeeper
Apache ZooKeeper is a software
project of the Apache Software
Foundation, providing an open source
distributed configuration
service, synchronization service, and
naming registry for large distributed
systems.
-
How Is Information Stored?
ZooKeeper stores information in a tree-like structure.
The structure resembles a traditional file system. Each node (a "file") is called a znode.
A znode can store data and references to other znodes (i.e., its children).
Each znode can store up to 1 MB of data.
Data is kept in memory and is backed up to a log for reliability. By keeping data in memory, ZooKeeper is very fast and can handle high loads.
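The tree structure described on this slide can be sketched with a small in-memory model. This is a toy illustration in Python, not the ZooKeeper client API: the class name `ZnodeTree`, its methods, and the example paths are hypothetical. Only the file-system-like paths and the 1 MB per-znode limit come from the slide.

```python
MAX_DATA_BYTES = 1024 * 1024  # per-znode data limit stated on the slide (1 MB)


class ZnodeTree:
    """Toy in-memory model of a znode tree (hypothetical, not the real API)."""

    def __init__(self):
        # Map each absolute path to its data; the root always exists.
        self.data = {"/": b""}

    def create(self, path, data=b""):
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.data:
            raise KeyError("parent znode does not exist: " + parent)
        if path in self.data:
            raise KeyError("znode already exists: " + path)
        self.data[path] = data

    def get(self, path):
        return self.data[path]

    def children(self, path):
        # Direct children only: paths one level below `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):] for p in self.data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/app", b"config-v1")
tree.create("/app/worker-1", b"host-a")
tree.create("/app/worker-2", b"host-b")
print(tree.children("/app"))  # ['worker-1', 'worker-2']
```

Unlike a regular file system, the same znode (`/app` here) holds data and has children at the same time.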
-
ZooKeeper Data Model
Hierarchical namespace (like a file system)
Each znode has data and children
Quorum?
-
ZNodes
Every node in a ZooKeeper tree is referred to as a Znode.
Znodes maintain a stat structure that includes version numbers for data changes. The stat
structure also has timestamps.
The version number, together with the timestamp, allows ZooKeeper to validate the cache and to
coordinate updates. Each time a znode's data changes, the version number increases.
For instance, whenever a client retrieves data, it also receives the version of the data. And when
a client performs an update or a delete, it must supply the version of the data of the znode it is
changing. If the version it supplies doesn't match the actual version of the data, the update will
fail.
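The version check described on this slide is a form of optimistic concurrency control, and can be sketched as follows. This is a minimal toy model in Python, not the ZooKeeper client API: `VersionedZnode` and `BadVersionError` are hypothetical names chosen for illustration.

```python
class BadVersionError(Exception):
    """Raised when the supplied version does not match the znode's version."""


class VersionedZnode:
    """Toy znode that tracks a data version (hypothetical, not the real API)."""

    def __init__(self, data):
        self.data = data
        self.version = 0  # incremented on every data change

    def get(self):
        # A read returns the data together with its current version.
        return self.data, self.version

    def set(self, data, expected_version):
        # The write succeeds only if the caller saw the latest version.
        if expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, actual {self.version}")
        self.data = data
        self.version += 1
        return self.version


node = VersionedZnode(b"limit=10")
_, seen_by_a = node.get()
_, seen_by_b = node.get()
node.set(b"limit=20", seen_by_a)      # client A wins; version becomes 1
try:
    node.set(b"limit=30", seen_by_b)  # client B still holds stale version 0
except BadVersionError as err:
    print("update rejected:", err)
```

Client B would have to read the znode again, obtain version 1, and retry, which prevents it from silently overwriting client A's change.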
-
How HBase uses Zookeeper
Every region server creates its own znode in ZooKeeper. This helps in tracking available region servers.
It also helps in detecting region server failures and network partitions.
In simple words, ZooKeeper maintains a registry of region servers.
HBase also uses ZooKeeper to ensure that only one master is running.
HBase also stores the location of the root table in ZooKeeper.
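The registry and single-master ideas above can be sketched with a toy model. It assumes ZooKeeper's ephemeral znodes, which are deleted automatically when the session that created them ends; the slides do not name this feature. The class `ToyCoordinator`, the session ids, and the paths are illustrative assumptions, not HBase or ZooKeeper code.

```python
class ToyCoordinator:
    """Toy stand-in for ZooKeeper: znodes owned by client sessions."""

    def __init__(self):
        self.owner = {}  # znode path -> id of the session that created it

    def create_ephemeral(self, path, session_id):
        # Creation fails if the path exists, so only one creator can win.
        if path in self.owner:
            return False
        self.owner[path] = session_id
        return True

    def close_session(self, session_id):
        # When a session ends, every znode it created disappears.
        self.owner = {p: s for p, s in self.owner.items() if s != session_id}

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.owner if p.startswith(prefix))


zk = ToyCoordinator()
# Each region server registers itself under a shared parent path.
zk.create_ephemeral("/hbase/rs/server-1", session_id=1)
zk.create_ephemeral("/hbase/rs/server-2", session_id=2)
# Two candidate masters race for the same path; only the first succeeds.
first = zk.create_ephemeral("/hbase/master", session_id=10)
second = zk.create_ephemeral("/hbase/master", session_id=11)
print(first, second)             # True False
zk.close_session(2)              # server-2 crashes or is partitioned away
print(zk.children("/hbase/rs"))  # ['server-1']
```

Listing the children of the registry path gives the set of live region servers, and the losing master candidate can simply wait until the master znode disappears before trying again.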
-
How HBase uses Zookeeper
Before ZooKeeper, the HBase master got information from region servers through heartbeats.
ZooKeeper now takes responsibility for informing each side about changes on the other.
ZooKeeper helps HBase to bootstrap.
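The shift from heartbeats to change notifications can be sketched as a callback registry. ZooKeeper delivers such notifications through watches, which fire once and must then be set again; the slides do not go into this detail. `WatchedRegistry` and the server names are hypothetical, and this is a toy model rather than the ZooKeeper API.

```python
class WatchedRegistry:
    """Toy registry that notifies watchers once per registered watch."""

    def __init__(self):
        self.servers = set()
        self.watchers = []  # callbacks waiting for the next change

    def watch(self, callback):
        self.watchers.append(callback)

    def _notify(self):
        # One-shot semantics: fire pending watches, then clear them.
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback(sorted(self.servers))

    def register(self, name):
        self.servers.add(name)
        self._notify()

    def unregister(self, name):
        self.servers.discard(name)
        self._notify()


events = []                      # what the "master" has been told so far
registry = WatchedRegistry()
registry.watch(events.append)
registry.register("rs-1")        # master is notified with ['rs-1']
registry.register("rs-2")        # no watch is set, so no notification
registry.watch(events.append)    # master sets the watch again
registry.unregister("rs-1")      # master is notified with ['rs-2']
print(events)                    # [['rs-1'], ['rs-2']]
```

The master does not poll each region server; it reacts only when the registry changes, and re-registers its watch after each notification.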
-
HBase master selection
-
Register new Region Server
-
Thank You