![Page 1: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/1.jpg)
Introduction to HBase
![Page 2: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/2.jpg)
Agenda
What is HBase
About RDBMS
Overview of HBase
Why HBase instead of RDBMS
Architecture of HBase
HBase interface
Summary
![Page 3: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/3.jpg)
What is HBase?
HBase is an open-source, distributed, sorted map modeled after Google's Bigtable.
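To make "sorted map" concrete, here is a minimal Python sketch. It assumes row keys are byte strings compared byte by byte, which is how HBase orders them. The class `SortedRowMap` and its `put`, `get` and `scan` methods are hypothetical and are not the HBase client API. The sketch only shows that keeping rows sorted by key makes point lookups and range scans (start row inclusive, stop row exclusive) cheap.

```python
import bisect


class SortedRowMap:
    """Hypothetical toy model of a table as a map sorted by row key (bytes).

    Real HBase keeps rows sorted across regions and files on HDFS,
    not in a single in-memory Python list.
    """

    def __init__(self):
        self._keys = []  # row keys, always kept in sorted byte order
        self._rows = {}  # row key -> row contents

    def put(self, row_key: bytes, value) -> None:
        # Insert the key at its sorted position the first time we see it.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key: bytes):
        # Point lookup by row key; None if the row does not exist.
        return self._rows.get(row_key)

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


table = SortedRowMap()
for key in [b"row2", b"row10", b"row1"]:
    table.put(key, {"Info:name": key.decode()})

print([k for k, _ in table.scan(b"row1", b"row3")])
# [b'row1', b'row10', b'row2']
```

Because keys compare as bytes, `b"row10"` sorts before `b"row2"`. This is why HBase row-key design often zero-pads numeric parts of a key.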
![Page 4: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/4.jpg)
Open Source
Licensed under the Apache License 2.0
Committers and contributors from diverse organizations, such as Facebook and Trend Micro
![Page 5: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/5.jpg)
About RDBMS
RDBMSs have many limitations:
Read and write throughput is limited (transactional databases)
Scaling up relies on specialized hardware, which is expensive
![Page 6: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/6.jpg)
Background
2006: Google publishes its paper on Bigtable
2007: First usable HBase release
2010: HBase becomes an Apache top-level project
![Page 7: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/7.jpg)
Overview of HBase
HBase is part of the Hadoop ecosystem.
Apache Hadoop is an open-source system for reliably storing and processing data across many commodity computers.
Hadoop provides:
Fault tolerance
Scalability
![Page 8: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/8.jpg)
Hadoop's components
MapReduce (processing): fault-tolerant distributed processing
HDFS (storage): self-healing, high-bandwidth clustered storage
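The MapReduce model can be illustrated in a single process with the classic word-count example. This is a hypothetical sketch of the map, shuffle, and reduce phases only. Real Hadoop runs many map and reduce tasks across machines and re-executes tasks that fail, which is where its fault tolerance comes from.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


lines = ["hbase runs on hdfs", "hdfs stores files"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hdfs"])  # 2
```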
![Page 9: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/9.jpg)
Difference between Hadoop/HDFS and HBase
HDFS is a distributed file system that is well suited for storing large files, but it does not provide fast lookups of individual records.
HBase is built on top of HDFS and provides fast record lookups (and updates) for large tables.
HDFS is based on GFS (the Google File System).
![Page 10: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/10.jpg)
HBase is
Distributed: uses HDFS for storage
Column-oriented
Multi-dimensional
A storage system
![Page 11: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/11.jpg)
HBase is NOT a SQL database
No joins, no query engine, no data types, no SQL
No fixed schema (only column families are declared up front)
![Page 12: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/12.jpg)
Storage Model
Column-oriented database (column families)
A table consists of rows, each of which has a primary key (the row key)
Each row may have any number of columns
The table schema only defines column families (each column family can have any number of columns)
Each cell value has a timestamp
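The storage model can be sketched as a toy table whose schema declares only column families. This is a hypothetical Python model, not HBase's API: a `put` is accepted for any qualifier inside a declared family, and each row stores only the columns actually written to it, so rows can differ in their columns. Timestamps are left out here to keep the example short.

```python
class Table:
    """Hypothetical toy table whose schema declares column families only.

    Cells are stored sparsely: a row holds only the columns written to it.
    """

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family, sep, _qualifier = column.partition(":")
        # Only the family must be declared; any qualifier is allowed.
        if not sep or family not in self.families:
            raise ValueError(f"column {column!r} is not in a declared family")
        self.rows.setdefault(row, {})[column] = value


people = Table(families=["Info"])
people.put("1", "Info:name", "Sakis")
people.put("1", "Info:age", "21")
people.put("1", "Info:sex", "Male")
people.put("2", "Info:name", "Themis")  # row 2 simply has fewer columns
print(people.rows)
# {'1': {'Info:name': 'Sakis', 'Info:age': '21', 'Info:sex': 'Male'}, '2': {'Info:name': 'Themis'}}
```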
![Page 13: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/13.jpg)
Static columns (RDBMS): every row has the same fixed, typed columns
int varchar int varchar int
int varchar int varchar int
int varchar int varchar int
![Page 14: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/14.jpg)
Something different: each row has its own set of columns
Row1 → ColA = Value, ColB = Value, ColC = Value
Row2 → ColX = Value, ColY = Value
![Page 15: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/15.jpg)
A Big Map
Row key + column key + timestamp => value

| Row Key | Column Key | Timestamp | Value |
|---|---|---|---|
| 1 | Info:name | 1273516197868 | Sakis |
| 1 | Info:age | 1273871824184 | 21 |
| 1 | Info:sex | 1273746281432 | Male |
| 2 | Info:name | 1273863723227 | Themis |
| 2 | Info:name | 1273973134238 | Andreas |

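The example rows above can be read literally as a map keyed by (row key, column key, timestamp). Row 2 has two versions of `Info:name`, and a read that names only the row and column returns the version with the highest timestamp, which is what an HBase Get returns by default. The sketch below is a hypothetical in-memory model; `cells` and `get` are illustrative names, not HBase API.

```python
# (row key, column key, timestamp) -> value, using the example rows above.
cells = {
    ("1", "Info:name", 1273516197868): "Sakis",
    ("1", "Info:age", 1273871824184): "21",
    ("1", "Info:sex", 1273746281432): "Male",
    ("2", "Info:name", 1273863723227): "Themis",
    ("2", "Info:name", 1273973134238): "Andreas",
}


def get(cells, row, column, as_of=None):
    """Return the newest value for (row, column) with timestamp <= as_of.

    With as_of=None every version is eligible, so the latest one wins.
    Returns None if no matching version exists.
    """
    versions = [
        (ts, value)
        for (r, c, ts), value in cells.items()
        if r == row and c == column and (as_of is None or ts <= as_of)
    ]
    return max(versions)[1] if versions else None


print(get(cells, "2", "Info:name"))                       # Andreas
print(get(cells, "2", "Info:name", as_of=1273900000000))  # Themis
```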
![Page 16: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/16.jpg)
RDBMS vs HBase

| | RDBMS | HBase |
|---|---|---|
| Data layout | Row-oriented | Column-oriented |
| Query language | SQL | Get/put/scan/etc. * |
| Security | Authentication/Authorization | Work in progress |
| Max data size | TBs | Hundreds of PBs |
| Read/write throughput limits | 1000s of queries per second | Millions of queries per second |

![Page 17: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/17.jpg)
Terms and Daemons
Region: a subset of a table's rows (a contiguous range of row keys)
Region Server (slave): serves data for reads and writes
Master: responsible for coordinating the slaves; assigns regions, detects failures of Region Servers, and controls some admin functions
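Because rows are sorted by key, a region can be described by its start key, and finding the region that holds a given row is a binary search over the sorted start keys. A minimal sketch, assuming three hypothetical regions and a hypothetical assignment of regions to Region Servers; in HBase the Master makes the assignment and clients find regions through a catalog table rather than a Python list.

```python
import bisect

# Hypothetical layout: each region starts at its (inclusive) start key and
# ends where the next region starts (exclusive). b"" means "from the beginning".
region_start_keys = [b"", b"g", b"p"]

# Hypothetical assignment made by the Master: region index -> Region Server.
region_servers = ["rs1", "rs2", "rs1"]


def locate_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


for key in [b"apple", b"hbase", b"zookeeper"]:
    idx = locate_region(key)
    print(key, "-> region", idx, "on", region_servers[idx])
```

Using `bisect_right` makes a row key equal to a start key (such as `b"g"`) land in the region that starts there, matching the inclusive-start, exclusive-end convention.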
![Page 18: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/18.jpg)
Distributed coordination
To manage master election and server availability, HBase uses ZooKeeper
ZooKeeper runs as its own cluster (an ensemble) and provides distributed coordination primitives
It is an excellent tool for building cluster management systems
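The key ZooKeeper primitive here is the ephemeral node: it belongs to a client session and is deleted automatically when that session ends, for example because the process crashed. The sketch below simulates this in memory to show how one master wins the election, how a backup master takes over, and how a failed server drops out of the live set. `FakeCoordinator` and the helper functions are hypothetical, and the znode paths are illustrative; this is not the ZooKeeper client API.

```python
class FakeCoordinator:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble.

    Ephemeral nodes are owned by a session and vanish when it expires.
    """

    def __init__(self):
        self.nodes = {}  # path -> owning session id

    def create_ephemeral(self, path, session):
        # Atomic create: fails if the node already exists.
        if path in self.nodes:
            return False
        self.nodes[path] = session
        return True

    def expire_session(self, session):
        # A crashed client's ephemeral nodes are removed automatically.
        self.nodes = {p: s for p, s in self.nodes.items() if s != session}


def try_become_master(zk, server):
    # Whoever creates the master node first is the active master.
    return zk.create_ephemeral("/hbase/master", server)


def live_region_servers(zk):
    # Each live Region Server keeps one ephemeral node under /hbase/rs/.
    return sorted(p.rsplit("/", 1)[1] for p in zk.nodes if p.startswith("/hbase/rs/"))


zk = FakeCoordinator()
print(try_become_master(zk, "master-a"))  # True: first creator wins
print(try_become_master(zk, "master-b"))  # False: stays a backup master
zk.expire_session("master-a")             # master-a dies; its node disappears
print(try_become_master(zk, "master-b"))  # True: the backup takes over

zk.create_ephemeral("/hbase/rs/rs1", "rs1")
zk.create_ephemeral("/hbase/rs/rs2", "rs2")
zk.expire_session("rs1")                  # rs1 fails
print(live_region_servers(zk))            # ['rs2']
```

A real backup master would set a ZooKeeper watch on the master node and be notified when it disappears, instead of retrying the create.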
![Page 19: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/19.jpg)
HBase Architecture
![Page 20: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/20.jpg)
HBase Interfaces
Java
Thrift (Ruby, PHP, Python, Perl, C++, ...)
HBase Shell
![Page 21: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/21.jpg)
Use HBase if
You need random writes, random reads, or both
You need to perform many thousands of operations per second on multiple TB of data
Your access patterns are simple (lookups and scans by row key rather than joins or ad hoc queries)
![Page 22: Introduction to Hbase](https://reader035.vdocument.in/reader035/viewer/2022062305/56816369550346895dd442b7/html5/thumbnails/22.jpg)
Thank You