BigSQL = Postgres + Hadoop
Denis Lussier

Post on 03-Feb-2018


Page 1: BigSQL = Postgres + Hadoop - OpenSCG

BigSQL = Postgres + Hadoop
Denis Lussier

Page 2

BigSQL Best of Both Worlds

Agenda

•  OpenSCG Facts
•  Community Distributions
•  BigSQL = Hadoop + Postgres
•  Demo
•  Moving Forward
•  Q&A

Page 3

OpenSCG Facts

•  Started operations in early 2010
•  Profitable, no outside investment
•  40+ member team
•  Headquartered in Bridgewater, NJ
•  Offices in
    •  San Mateo, CA
    •  Hyderabad, India
•  Healthy, controlled growth

Experts committed to helping our clients gain strategic advantage leveraging PostgreSQL, Hadoop and Java. Keen focus on high availability and performance.

Page 4

OpenSCG Community Projects

•  OpenJDK: cross-platform installers for x86
•  Eclipse BIRT: for Java & report developers
•  pgOn: operations network web console
•  tPostgres: Postgres with SQL Server compatibility
•  PostgresHA: PostgreSQL that’s highly available
•  BenchmarkSQL: TPC-C-like benchmark for major RDBMSs
•  pgHive: Postgres & Hadoop connector
•  PostgreSQL RPM & DEB packages
•  And now BigSQL!

Page 5

Postgres

•  World’s most advanced open source database solution
•  Enterprise class, including MVCC, streaming replication & rich data type support (to name a few!)
•  Robust transaction support with strong ANSI-SQL compliance

Hadoop

•  Big data distributed framework
•  Reliable, massively scalable & proven
•  Failures handled at the application layer, allowing commodity hardware

BigSQL Best of Both Worlds

Page 6

•  POSTGRESQL: World’s leading Open Source RDBMS
•  HADOOP: World’s leading Big Data distributed framework
•  BigSQL: Always 100% Open & Free! No Strings!

Page 7

BigSQL Postgres Components

•  PostgreSQL: Advanced RDBMS (postgresql.org)
•  pgHA: Postgres high availability (postgresha.org)
•  pgBouncer: PG connection controller (pgfoundry/pgbouncer)
•  pgJDBC: JDBC4 PostgreSQL driver (jdbc.postgresql.org)
•  PostGIS: PG spatial DB extender (postgis.net)

Page 8

BigSQL Hadoop Components

•  Hadoop: Big Data distributed framework (hadoop.apache.org)
•  Hive: SQL-like queries via map reduce (hive.apache.org)
•  Zookeeper: Cluster coordinator & lock manager (zookeeper.apache.org)
•  Sqoop: RDBMS/HDFS data transfer (sqoop.apache.org)
•  Flume: Streaming log data into HDFS (flume.apache.org)

Page 9

BigSQL Hadoop Components

•  HBase: Random, real-time IO over HDFS (hbase.apache.org)
•  Pig: Platform for parallel data analysis (pig.apache.org)
•  HCatalog: Share Hive schemas with Pig (apache.org/hcatalog)
•  Avro: Data serialization system (avro.apache.org)

Page 10

BigSQL Additional Components

•  BenchmarkSQL: Java benchmark example (sourceforge/benchmarksql)
•  Ambari: Provision and manage Hadoop clusters (incubator.apache.org/ambari)
•  Ganglia: Cluster-wide metrics collection (ganglia.info)
•  Nagios: Monitoring and alerting (nagios.org)

Page 11

BigSQL Architecture 2013

[Architecture diagram; labeled components:]

•  HIVE: Web UI, Console UI, Driver (Compiler, Optimizer, Executor), Postgres Metastore
•  SQL Parallel Query
•  HADOOP Cluster (HDFS + Map-Reduce): Name Node, Job Tracker, Data Node, Task Tracker
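The deck describes Hive as providing SQL-like queries via map reduce, with a driver that compiles and executes them on the Hadoop cluster. As a rough illustration only, the Python sketch below shows how a query such as SELECT dept, SUM(amount) FROM sales GROUP BY dept could be decomposed into map, shuffle, and reduce phases. It runs in memory on one machine and is not Hive's implementation; the SALES rows, the column names, and every function name are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sample rows; the table and column names are illustrative only.
SALES = [
    {"dept": "books", "amount": 12},
    {"dept": "games", "amount": 30},
    {"dept": "books", "amount": 8},
]


def map_phase(rows):
    """Emit one (key, value) pair per input row, as a mapper would."""
    for row in rows:
        yield row["dept"], row["amount"]


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Aggregate each key's values, as a reducer computing SUM would."""
    return {key: sum(values) for key, values in grouped.items()}


def group_by_sum(rows):
    """Chain the three phases to answer the GROUP BY ... SUM query."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(group_by_sum(SALES))
```

On a real cluster the map and reduce steps would run as separate tasks on different nodes, and the framework would perform the shuffle between them; the sketch only mirrors the data flow.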

Page 12

BigSQL Moving Forward

•  Out of Box Configurations for:
    •  HBase & Pig
    •  Oozie
    •  Ambari, Ganglia & Nagios
•  Deeper Postgres and Hadoop Integration
•  Additional Examples

Page 13

BigSQL Q & A

www.openscg.com
[email protected]
1200 Rt 22 East, Suite 2000
Bridgewater, NJ 08807
(908) 203-4725