BigSQL = Postgres + Hadoop Denis Lussier
BigSQL Best of Both Worlds
Agenda
• OpenSCG Facts
• Community Distributions
• BigSQL = Hadoop + Postgres
• Demo
• Moving Forward
• Q&A
OpenSCG Facts
• Started Operations in early 2010
• Profitable, No Outside Investment
• 40+ Member Team
• Headquartered in Bridgewater, NJ
• Offices in
  • San Mateo, CA
  • Hyderabad, India
• Healthy Controlled Growth
Experts committed to helping our clients gain
strategic advantage leveraging PostgreSQL,
Hadoop and Java. Keen focus on high
availability and performance.
OpenSCG Community Projects
• OpenJDK Cross platform installers for x86
• Eclipse BIRT for Java & report developers
• pgOn operations network web console
• tPostgres Postgres with SQL Server compatibility
• PostgresHA PostgreSQL that’s highly available
• BenchmarkSQL TPC-C-like benchmark for major RDBMSs
• pgHive Postgres & Hadoop connector
• PostgreSQL RPM & DEB packages
• And Now BigSQL!
Postgres
• World’s most advanced open source database solution
• Enterprise class including MVCC, streaming replication & rich data type support (to name a few!)
• Robust transaction support with strong ANSI-SQL compliance
Hadoop
• Big data distributed framework
• Reliable, massively scalable & proven
• Failures handled at the application layer allowing commodity hardware
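As a rough illustration of failures being handled at the application layer, the toy scheduler below retries a task on the next node when one is down. This is only a sketch: the node names, the task name and the `run_task` / `run_with_retry` helpers are hypothetical and are not Hadoop APIs.

```python
# Toy sketch only: retry a failed task on another node.
# Node names, task name and helpers are hypothetical, not Hadoop APIs.

def run_task(node, task, broken_nodes):
    """Pretend to run a task; raise if the chosen node is down."""
    if node in broken_nodes:
        raise RuntimeError(f"{node} is unavailable")
    return f"{task} done on {node}"

def run_with_retry(task, nodes, broken_nodes):
    """Try each node in turn until one completes the task."""
    errors = []
    for node in nodes:
        try:
            return run_task(node, task, broken_nodes)
        except RuntimeError as exc:
            # Record the failure and move on to the next node.
            errors.append(str(exc))
    raise RuntimeError(f"{task} failed on all nodes: {errors}")

result = run_with_retry(
    "map-0001", ["node-a", "node-b", "node-c"], broken_nodes={"node-a"}
)
print(result)
```

Because the software expects nodes to fail and reschedules the work, no single machine needs to be especially reliable.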
BigSQL Best of Both Worlds
POSTGRESQL World’s leading Open Source RDBMS
HADOOP World’s leading Big Data distributed framework
BigSQL Always 100% Open & Free! No Strings!
BigSQL Postgres Components
• PostgreSQL: Advanced RDBMS (postgresql.org)
• pgHA: Postgres high availability (postgresha.org)
• pgBouncer: PG connection controller (pgfoundry/pgbouncer)
• pgJDBC: JDBC4 PostgreSQL driver (jdbc.postgresql.org)
• PostGIS: PG spatial DB extender (postgis.net)
BigSQL Hadoop Components
• Hadoop: Big Data distributed framework (hadoop.apache.org)
• Hive: SQL-like queries via map reduce (hive.apache.org)
• ZooKeeper: Cluster coordinator & lock manager (zookeeper.apache.org)
• Sqoop: RDBMS/HDFS data transfer (sqoop.apache.org)
• Flume: Streaming log data into HDFS (flume.apache.org)
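To make "SQL-like queries via map reduce" concrete, here is a minimal single-process sketch of how an aggregate such as `SELECT page, SUM(hits) FROM logs GROUP BY page` can be expressed as map, shuffle and reduce steps. The table name, column names and sample rows are assumptions made up for illustration, and this is not how Hive itself compiles or runs a query.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table of (page, hits) records.
rows = [("home", 3), ("about", 1), ("home", 2), ("docs", 5), ("about", 4)]

def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for page, hits in records:
        yield page, hits

def shuffle_phase(pairs):
    """Group values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values; here this plays the role of SUM(hits)."""
    return {key: sum(values) for key, values in grouped.items()}

totals = reduce_phase(shuffle_phase(map_phase(rows)))
print(sorted(totals.items()))
```

On a real cluster the map and reduce steps would run in parallel across many nodes; the grouping by key is what corresponds to `GROUP BY`.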
BigSQL Hadoop Components
• HBase: Random, real-time IO over HDFS (hbase.apache.org)
• Pig: Platform for parallel data analysis (pig.apache.org)
• HCatalog: Share Hive schemas with Pig (apache.org/hcatalog)
• Avro: Data serialization system (avro.apache.org)
BigSQL Additional Components
• BenchmarkSQL: Java benchmark example (sourceforge/benchmarksql)
• Ambari: Provision and manage Hadoop clusters (incubator.apache.org/ambari)
• Ganglia: Cluster-wide metrics collection (ganglia.info)
• Nagios: Monitoring and alerting (nagios.org)
BigSQL Architecture 2013
[Architecture diagram. Components shown: HIVE (Web UI, Console UI, Driver with Compiler, Optimizer and Executor, Postgres Metastore); SQL Parallel Query; HADOOP Cluster (HDFS + Map-Reduce) with Name Node, Job Tracker, Data Node and Task Tracker.]
BigSQL Moving Forward
• Out of Box Configurations for:
  • HBase & Pig
  • Oozie
  • Ambari, Ganglia & Nagios
• Deeper Postgres and Hadoop Integration
• Additional Examples
www.openscg.com
[email protected]
1200 Rt 22 East – Suite 2000, Bridgewater, NJ 08807
(908) 203-4725
BigSQL Q & A