Row and Column Security for Hive Data with Big SQL
© 2015 IBM Corporation
SQL on Hadoop Matters for Big Data Analytics
For BI Tools like Cognos
Visualizations from Cognos 10.2.2
Hive is Really 3 Things…
Storage Format, Metastore, and Execution Engine
SQL Execution Engine: Hive (open source), running on MapReduce
Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, others…
Hive Metastore (open source)
Applications
Hive “Execution Engine”
SQL → Hive → Map → Reduce → Output
References the Hive metastore to understand the data
Translates SQL into MapReduce jobs
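To make the translation step concrete, here is a plain-Python model (not Hive code) of how a simple GROUP BY query decomposes into map, shuffle, and reduce phases. The table name, column names, and rows are hypothetical, and real Hive query plans are considerably more involved than this single-stage sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, str]


def map_phase(rows: Iterable[Row], key_col: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (group key, 1) pair for every input row
    for row in rows:
        yield row[key_col], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: collect all values emitted for the same key
    # (the MapReduce framework does this between the map and reduce stages)
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: aggregate each key's values; COUNT(*) is the sum of the 1s
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(rows: Iterable[Row], key_col: str) -> Dict[str, int]:
    """Model of: SELECT key_col, COUNT(*) FROM t GROUP BY key_col."""
    return reduce_phase(shuffle_phase(map_phase(rows, key_col)))


if __name__ == "__main__":
    # Hypothetical rows from a hypothetical "sales" table
    sales = [
        {"store": "NY", "item": "a"},
        {"store": "SF", "item": "b"},
        {"store": "NY", "item": "c"},
    ]
    print(group_by_count(sales, "store"))  # {'NY': 2, 'SF': 1}
```

Each SQL operator that needs data from many rows (grouping, joining, sorting) forces a shuffle like this one, which is a large part of why MapReduce-based execution is slow for interactive SQL.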
Big SQL Preserves the Open Source Foundation
Leverages the Hive metastore and storage formats
No lock-in: the data is part of Hadoop, not Big SQL, and you can fall back to the open source Hive engine at any time
SQL Execution Engines: IBM Big SQL (IBM) and Hive (open source)
Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, others…
Hive Metastore (open source)
Applications
Performance Test – TPC-DS Workload
20-node (physical) cluster
TPC-DS is the Transaction Processing Performance Council's Decision Support benchmark, an industry-standard benchmark for SQL
Hive 1.2.1: IBM Open Platform V4.1, 20 nodes
Big SQL V4.1: IBM Open Platform V4.1, 20 nodes
*Not an official TPC-DS Benchmark.
Performance Test Summary
Big SQL V4 vs. Hive 1.2.1 @ 1 TB
Big SQL was faster in 99 of 99 queries
On average, Big SQL was 21X faster
Excluding the top 5 and bottom 5 results, Big SQL was 19X faster
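The slides do not say exactly how these summary figures were computed. A minimal sketch, assuming the per-query speedup is Hive's elapsed time divided by Big SQL's and that "average" means the arithmetic mean of those ratios, shows how the three numbers (queries won, mean speedup, and mean speedup with the 5 largest and 5 smallest ratios dropped) could be derived from per-query timings. The function name `summarize` is hypothetical, and the timings in the usage line are made up for illustration; they are not the benchmark's data.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize(hive_secs: Sequence[float], bigsql_secs: Sequence[float],
              trim: int = 5) -> Tuple[int, float, float]:
    """Return (queries won by Big SQL, mean speedup, trimmed mean speedup).

    Assumes speedup = Hive elapsed time / Big SQL elapsed time per query,
    and that "excluding top/bottom N" drops the N largest and N smallest ratios.
    """
    if len(hive_secs) != len(bigsql_secs):
        raise ValueError("need one timing per query for each engine")
    if len(hive_secs) <= 2 * trim:
        raise ValueError("not enough queries left after trimming")
    ratios = sorted(h / b for h, b in zip(hive_secs, bigsql_secs))
    wins = sum(1 for r in ratios if r > 1)  # queries where Big SQL was faster
    trimmed = ratios[trim:len(ratios) - trim]  # drop the extremes at both ends
    return wins, mean(ratios), mean(trimmed)


if __name__ == "__main__":
    # Made-up timings for four queries (NOT benchmark data); trim one from each end
    print(summarize([10, 20, 30, 40], [1, 2, 3, 80], trim=1))  # (3, 7.625, 10.0)
```

Trimming matters because a mean of ratios is dominated by a few very large speedups; dropping the extremes gives a figure that better reflects a typical query.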
ONLY BIG SQL COULD RUN
THE COMPLETE WORKLOAD
Actually, we originally set out to run 10TB, but …
Performance Test Summary
Big SQL @ 10 TB vs. Hive @ 1 TB
How does Big SQL perform with 10X the data?
Big SQL was still faster in 89 of 99 queries
On average, Big SQL was still 3.8X faster
Excluding the top 5 and bottom 5 results, Big SQL was still 3.2X faster
And we’re really good with lots of users…
Workload throughput shows a clear benefit with workload management (WLM) enabled
Enhanced Security - Good to Know
Role Based Access Control
Row Level Security
Column Level Security
Separation of Duties
Security Administrator
Database Administrator
Workload Manager
Others…
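Row and column level security means the same query returns different rows and different column values depending on who runs it. The sketch below does not use Big SQL syntax; it models the idea in plain Python under stated assumptions: a hypothetical row rule that lets users see only rows for their own branch unless they hold a hypothetical AUDITOR role, and a hypothetical column mask that shows non-auditors only the last four digits of an `ssn` column. The table, columns, role, and function names are all illustrative.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Row = Dict[str, str]


@dataclass(frozen=True)
class User:
    name: str
    branch: str
    roles: FrozenSet[str] = frozenset()


def row_permission(user: User, row: Row) -> bool:
    # Hypothetical row rule: AUDITOR sees every row, others only their own branch
    return "AUDITOR" in user.roles or row["branch"] == user.branch


def mask_ssn(user: User, value: str) -> str:
    # Hypothetical column mask: non-auditors see only the last four digits
    if "AUDITOR" in user.roles:
        return value
    return "XXX-XX-" + value[-4:]


def secure_select(user: User, table: List[Row]) -> List[Row]:
    """Filter rows with the row rule, then mask the ssn column in what remains."""
    result = []
    for row in table:
        if not row_permission(user, row):
            continue  # row is invisible to this user
        visible = dict(row)  # copy so the stored data is never modified
        visible["ssn"] = mask_ssn(user, row["ssn"])
        result.append(visible)
    return result


if __name__ == "__main__":
    customers = [
        {"name": "Ann", "branch": "NY", "ssn": "123-45-6789"},
        {"name": "Bob", "branch": "SF", "ssn": "987-65-4321"},
    ]
    teller = User("tom", "NY")
    auditor = User("ada", "HQ", frozenset({"AUDITOR"}))
    print(secure_select(teller, customers))
    # [{'name': 'Ann', 'branch': 'NY', 'ssn': 'XXX-XX-6789'}]
    print(secure_select(auditor, customers))  # both rows, ssn unmasked
```

In Big SQL the equivalent rules are declared in SQL on the table itself (Db2-style row permissions and column masks) rather than in application code, so every application and BI tool that queries the table gets the same filtering; consult the Big SQL documentation for the exact statements.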
Recap: Big SQL Preserves the Open Source Foundation
SQL Execution Engines: IBM Big SQL (IBM) and Hive (open source)
Hive Storage Model (open source): CSV, Tab Delim., Parquet, RC, others…
Hive Metastore (open source)
Applications
Big SQL Makes Hive
FASTER and more SECURE