Hadoop Week 7

Uploaded by deepesh-tripathi on 19-Dec-2015

  • Week 1 Introduction to HDFS

    Week 2 Setting Up Hadoop Cluster

    Week 3 Map-Reduce Basics, types and formats

    Week 4 PIG

    Week 5 HIVE

    Week 6 HBASE

    Week 7 ZOOKEEPER

    Week 8 SQOOP

    ZooKeeper

  • Problem?

  • More Examples

    For example, assume the system is an ad system that serves advertisements to web sites. Ad systems are complex beasts that require a fair bit of coordination. Imagine all the subsystems that need to run on those 100 nodes: databases, monitoring, fraud detectors, beacon servers, web server event log processors, failover servers, customer dashboards, targeting engines, campaign planners, campaign scenario testers, upgrades, installs, media managers, and so on.

    There's a lot going on.

  • Traditional Solution and Issues

    Traditional Approach

    1. Manual intervention

    2. Better alert systems

    3. Custom code for synchronization

    4. Spending more money on hiring great programmers

  • Issues with the Approach

    1. Traditional solutions are not scalable.

    2. They become very complex over time.

    3. A lot of time is wasted fixing bugs.

    4. Services may not have fixed IP addresses.

    5. It is hard for programmers to use locks correctly.

    6. Message-based coordination can be hard to use in some applications.

  • Characteristics of the New Solution

    1. Simple Interface

    2. Centralized coordination service

    3. Highly reliable

  • Simplified ZooKeeper

    Apache ZooKeeper is a software project of the Apache Software Foundation that provides an open source distributed configuration service, synchronization service, and naming registry for large distributed systems.

  • How Is Information Stored?

    ZooKeeper stores information in a tree-like structure, similar to a traditional file system. Each node in the tree (the equivalent of a file) is called a znode.

    A znode can store data and can also hold references to other znodes (i.e., its children).

    Each znode can store up to 1 MB of data.

    Data is kept in memory and backed up to a log for reliability. Because it serves data from memory, ZooKeeper is very fast and can handle high loads.
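    A minimal sketch of the tree described above, written as a toy in-memory Python model rather than the real ZooKeeper client API. The names `Znode`, `ZnodeTree`, and `MAX_DATA_BYTES` are hypothetical; only the ideas (slash-separated paths, data plus children on every znode, a roughly 1 MB data limit) come from the slide. As in ZooKeeper, a znode can only be created under an existing parent.

```python
from __future__ import annotations

# Hypothetical constant modelling the ~1 MB per-znode data limit from the slide.
MAX_DATA_BYTES = 1024 * 1024


class Znode:
    """One node of the toy tree: some bytes of data plus named children."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.children: dict[str, Znode] = {}


class ZnodeTree:
    """Toy, in-memory model of ZooKeeper's hierarchical namespace."""

    def __init__(self):
        self.root = Znode()

    def _walk(self, path: str) -> Znode:
        # Absolute, slash-separated paths such as "/app/config".
        node = self.root
        for part in [p for p in path.split("/") if p]:
            node = node.children[part]  # KeyError if a path component is missing
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")
        parent_path, _, name = path.rstrip("/").rpartition("/")
        parent = self._walk(parent_path or "/")  # the parent must already exist
        if name in parent.children:
            raise FileExistsError(path)
        parent.children[name] = Znode(data)

    def get_data(self, path: str) -> bytes:
        return self._walk(path).data

    def get_children(self, path: str) -> list[str]:
        return sorted(self._walk(path).children)
```

    For example, `create("/app")` followed by `create("/app/config", b"timeout=30")` builds a two-level tree, and `get_children("/app")` then returns `["config"]`.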

  • ZooKeeper Data Model

    Hierarchical namespace (like a file system)

    Each znode has data and children

    Quorum?
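    A ZooKeeper service runs as a replicated ensemble of servers, and it can make progress only while a quorum, meaning a strict majority, of those servers is available. The short sketch below computes the quorum size and the number of server failures an ensemble can survive. The function names are hypothetical; the majority rule itself is how ZooKeeper ensembles work.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given number of servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    # Prints "3 2 1", then "4 3 1", then "5 3 2".
    print(n, quorum_size(n), tolerated_failures(n))
```

    The output shows why ensembles usually have an odd number of servers: going from 3 to 4 servers raises the quorum size without letting the ensemble survive any extra failure.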

  • ZNodes

    Every node in a ZooKeeper tree is referred to as a znode.

    Znodes maintain a stat structure that includes version numbers for data changes. The stat structure also holds timestamps.

    The version number, together with the timestamps, allows ZooKeeper to validate caches and to coordinate updates. Each time a znode's data changes, its version number increases.

    For instance, whenever a client retrieves data, it also receives the version of that data. When a client performs an update or a delete, it must supply the version of the data of the znode it is changing. If the supplied version doesn't match the actual version of the data, the update fails.
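    The version check described above works like a compare-and-set. Below is a minimal sketch as a toy Python model, not the real client API; the names `VersionedZnode` and `BadVersionError` are hypothetical. It follows two real ZooKeeper conventions: a newly created znode starts at data version 0, and passing -1 as the expected version skips the check. A delete would apply the same check before removing the znode.

```python
from __future__ import annotations


class BadVersionError(Exception):
    """Raised when the supplied version does not match the znode's current version."""


class VersionedZnode:
    """Toy znode whose data updates are guarded by a version number."""

    def __init__(self, data: bytes):
        self.data = data
        self.version = 0  # a newly created znode starts at data version 0

    def get(self) -> tuple[bytes, int]:
        # A read returns the data together with the version it was read at.
        return self.data, self.version

    def set(self, data: bytes, expected_version: int) -> int:
        # -1 means "any version", mirroring ZooKeeper's convention.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, actual {self.version}"
            )
        self.data = data
        self.version += 1  # every successful change bumps the version
        return self.version
```

    If two clients both read the znode at version 0, the first `set(..., 0)` succeeds and moves the znode to version 1, so the second client's `set(..., 0)` raises `BadVersionError` instead of silently overwriting the first update.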

  • How HBase Uses ZooKeeper

    Every region server creates its own znode in ZooKeeper. This helps in tracking the available region servers.

    It also helps in detecting region server failures and network partitions.

    In simple words, ZooKeeper maintains a registry of region servers.

    HBase also uses ZooKeeper to ensure that only one master is running.

    HBase also stores the location of the root table in ZooKeeper.
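    A registry like this is normally built from ephemeral znodes: each region server creates a child znode under a shared parent (for example `/hbase/rs`; the path is illustrative here), and ZooKeeper deletes that znode automatically when the server's session ends. The sketch below is a toy Python model of that mechanism under this assumption, not the real client API; `EphemeralRegistry` and its methods are hypothetical names.

```python
from __future__ import annotations


class EphemeralRegistry:
    """Toy model of ephemeral child znodes, each owned by a client session."""

    def __init__(self):
        self.children: dict[str, int] = {}  # child name -> owning session id

    def register(self, name: str, session_id: int) -> None:
        # A region server announces itself by creating its own child znode.
        if name in self.children:
            raise FileExistsError(name)
        self.children[name] = session_id

    def expire_session(self, session_id: int) -> None:
        # When a session ends (crash or network partition), every ephemeral
        # znode it owned disappears, so the failed server drops out of the list.
        self.children = {
            name: sid for name, sid in self.children.items() if sid != session_id
        }

    def live_servers(self) -> list[str]:
        return sorted(self.children)
```

    The master only has to list the parent's children to learn which region servers are alive; it never has to ask the servers directly.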

  • How HBase Uses ZooKeeper

    Before ZooKeeper, the HBase master obtained information from region servers through heartbeats.

    ZooKeeper now takes responsibility for informing the master and the region servers about changes on either side.

    ZooKeeper helps HBase bootstrap.
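    Instead of polling with heartbeats, a ZooKeeper client sets a watch and is notified when something changes. Standard ZooKeeper watches are one-time triggers, so a client that wants further notifications must re-read the data and set the watch again. The sketch below is a toy Python model of child watches; `WatchingStore` and its methods are hypothetical names, not the real client API.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class WatchingStore:
    """Toy model of one-shot child watches on znode paths."""

    def __init__(self):
        self.children: dict[str, set[str]] = defaultdict(set)
        self.watchers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def get_children(self, path: str, watcher: Callable[[str], None] | None = None) -> list[str]:
        # Reading the children can optionally leave a watch behind.
        if watcher is not None:
            self.watchers[path].append(watcher)
        return sorted(self.children[path])

    def _fire(self, path: str) -> None:
        # One-time trigger: clear the watches before notifying anyone.
        pending, self.watchers[path] = self.watchers[path], []
        for watcher in pending:
            watcher(path)

    def add_child(self, path: str, name: str) -> None:
        self.children[path].add(name)
        self._fire(path)

    def remove_child(self, path: str, name: str) -> None:
        self.children[path].discard(name)
        self._fire(path)


store = WatchingStore()
seen: list[list[str]] = []


def on_change(path: str) -> None:
    # A master-like client re-reads the children and re-arms its watch.
    seen.append(store.get_children(path, on_change))


store.get_children("/rs", on_change)
store.add_child("/rs", "rs1")
store.add_child("/rs", "rs2")
store.remove_child("/rs", "rs1")
```

    After these calls, `seen` holds one snapshot per change: `["rs1"]`, then `["rs1", "rs2"]`, then `["rs2"]`. If `on_change` did not re-register itself, only the first change would be reported.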

  • HBase Master Selection
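    One common ZooKeeper pattern for keeping a single active master is for every candidate to try to create the same ephemeral znode (for example `/hbase/master`; the path is illustrative). Because creating a znode that already exists fails, exactly one candidate wins; the others wait as backups and try again once the winner's session ends and its znode disappears. The sketch below is a toy Python model of that pattern under these assumptions; `ElectionZnode` and its methods are hypothetical names.

```python
from __future__ import annotations


class ElectionZnode:
    """Toy model of a single ephemeral znode used to pick one active master."""

    def __init__(self):
        self.owner: int | None = None  # session id of the active master, if any

    def try_create(self, session_id: int) -> bool:
        # Creating an existing znode fails, so only the first candidate succeeds.
        if self.owner is not None:
            return False
        self.owner = session_id
        return True

    def expire(self, session_id: int) -> None:
        # The ephemeral znode vanishes when its owner's session ends.
        if self.owner == session_id:
            self.owner = None
```

    With two candidates, `try_create(1)` returns `True` and `try_create(2)` returns `False`; after `expire(1)`, the backup's retry `try_create(2)` succeeds and it becomes the active master.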

  • Registering a New Region Server

  • Thank You