postgres foreign data wrappers
Postgres for Integrating MongoDB, Hadoop and Others with FDWs
© EnterpriseDB Corporation and Ashnik Pte Ltd. 2014 All Rights Reserved.
The changing data landscape
• Data is getting generated from different sources
• Data is getting generated by machines
• Interactions from end users have changed: social media
• Interaction and access patterns from web users have changed
The actual ‘Big Data’ challenges are not just about Data!
• Contrary to the myth, Big Data challenges are not just about data, storage, or scalability with data growth
• There are different databases available to suit the requirements of different data models
• Storage is not costly anymore
• Different tiers of storage are available, offering the needed IOPS
• The real challenge is managing and running analytics
• The challenge is in integrating and correlating the data from different sources
• New skill sets are needed for managing and querying different data sources
• Unlike the relational model, there is no standard language for accessing NoSQL databases, since they all offer different underlying models
• SQL interface tools for NoSQL databases are getting popular, but the conversion of SQL to NoSQL is mostly unintuitive and unoptimized
Heterogeneous data-ecosystem has become inevitable!
Big Data for most customers consists of
What does one usually do with it?
• Load it into a database system that suits the data representation best
What do you want to do with this ‘Data’?
• Correlate data from various sources
• Get a centralized view of data
• Run analytics on the data
How EDB is stepping up to cater to your ‘Big Data’ requirements
• Core PostgreSQL features to support ‘Big Data’
  • Flexible datatypes: JSON / JSONB and key-value store
  • Unlogged tables to improve performance
• Foreign Data Wrappers (FDW)
  • Use PostgreSQL as a central interface to connect to other systems to gather data and issue queries or joins
  • Push-down of WHERE clauses and selected columns improves performance
• Postgres Plus Advanced Server features
  • Resource Management to run mixed workloads more effectively
  • EDB*Loader to load data from various sources
Postgres Data Types catering to ‘new age Data’ requirements
• HSTORE
  • Key-value pairs
  • Simple, fast and easy
  • Postgres v8.2: pre-dates many NoSQL-only solutions
  • Ideal for flat data structures that are sparsely populated
• JSON
  • Hierarchical document model
  • Introduced in Postgres 9.2, perfected in 9.3
• JSONB
  • Binary version of JSON
  • Faster, more operators and even more robust
  • Postgres 9.4
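A minimal sketch of the HSTORE key-value type listed above, assuming a PostgreSQL 9.1+ server where the contrib `hstore` extension is available. The table `product_attrs` and its keys are hypothetical, chosen only to show sparse, flat attributes:

```sql
-- hstore is a contrib module; CREATE EXTENSION requires PostgreSQL 9.1 or later
CREATE EXTENSION IF NOT EXISTS hstore;

-- Hypothetical table holding sparse, flat product attributes as key/value pairs
CREATE TABLE product_attrs (
    id    integer PRIMARY KEY,
    attrs hstore
);

-- Each row carries only the keys it needs
INSERT INTO product_attrs (id, attrs) VALUES
    (1, 'color => black, weight_g => 140'),
    (2, 'color => red');

-- -> returns the value stored under a key as text (NULL when the key is absent)
SELECT id, attrs -> 'weight_g' AS weight_g FROM product_attrs ORDER BY id;

-- ? tests whether a key is present
SELECT id FROM product_attrs WHERE attrs ? 'weight_g';
```

Values always come back as text, which is why hstore suits flat data; nested documents are what JSON / JSONB are for.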
A simple JSON Example
Creating a table with a JSONB field:
CREATE TABLE json_data (data JSONB);
Simple JSON data element:
{"name": "Apple Phone", "type": "phone", "brand": "ACME", "price": 200, "category": ["SMARTPHONE","PHONE"], "available": true, "warranty_years": 1}
Inserting this data element into the table json_data:
INSERT INTO json_data (data) VALUES
('{"name": "Apple Phone", "type": "phone",
   "brand": "ACME", "price": 200,
   "category": ["SMARTPHONE","PHONE"],
   "available": true, "warranty_years": 1}');
A simple query for JSON data with ‘SQL’
SELECT DISTINCT data->>'name' AS products
FROM json_data;

            products
--------------------------------
 Cable TV Basic Service Package
 AC3 Case Black
 Phone Service Basic Plan
 AC3 Phone
 AC3 Case Green
 Phone Service Family Plan
 AC3 Case Red
 AC7 Phone

This query does not return JSON data: it returns the text values associated with the key ‘name’.
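To make the text-versus-JSON distinction concrete, here is a minimal sketch assuming PostgreSQL 9.4+ (for JSONB and its GIN operator class). The table `json_demo` is a hypothetical stand-in for `json_data`, and its two documents are made up for illustration:

```sql
-- Hypothetical stand-in for the json_data table used on these slides
CREATE TABLE json_demo (data JSONB);

INSERT INTO json_demo (data) VALUES
    ('{"name": "AC3 Phone", "brand": "ACME", "price": 200, "available": true}'),
    ('{"name": "AC3 Case Red", "brand": "ACME", "price": 12, "available": false}');

-- -> returns a JSONB value; ->> returns the same value as text
SELECT data -> 'price'  AS price_json,   -- type jsonb
       data ->> 'price' AS price_text    -- type text
FROM json_demo;

-- @> (containment) matches documents that include the given key/value pairs,
-- so the boolean is compared as JSON rather than as text
SELECT data ->> 'name' AS product
FROM json_demo
WHERE data @> '{"available": true}';

-- A GIN index on the JSONB column can serve @> containment queries
CREATE INDEX json_demo_data_idx ON json_demo USING GIN (data);
```

Whether the planner actually uses the GIN index depends on table size and statistics; on a two-row table it will normally prefer a sequential scan.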
Querying JSON with ANSI SQL
SELECT DISTINCT
       product_type,
       data->>'brand' AS Brand,
       data->>'available' AS Availability
FROM json_data
JOIN products ON (products.product_type = json_data.data->>'name')
WHERE (json_data.data->>'available')::boolean = true;

 product_type | brand | availability
--------------+-------+--------------
 AC3 Phone    | ACME  | true

(ANSI SQL and JSON operators in a single query)
No need for programmatic logic to combine SQL and NoSQL in the application: Postgres does it all.
Unlogged Tables – Faster (and unreliable) writes
• Every write in PostgreSQL is essentially two writes due to the Write Ahead Log (WAL)
• WAL guarantees durability and supports replication
• Unlogged tables are freed from this constraint
• But the tables are no longer crash safe!
• Can see a good performance gain (~13-17%)
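A minimal sketch of an unlogged table, using a hypothetical staging table `clickstream_staging`. Because writes to it skip the WAL, its contents are truncated after a crash or unclean shutdown and are not replicated to standbys, so it suits data that can be reloaded. The final `SET LOGGED` step assumes PostgreSQL 9.5 or later:

```sql
-- UNLOGGED skips WAL for this table: faster writes, but not crash safe
CREATE UNLOGGED TABLE clickstream_staging (
    event_time timestamptz NOT NULL,
    user_id    integer,
    url        text
);

INSERT INTO clickstream_staging VALUES (now(), 42, '/index.html');

-- relpersistence is 'u' for unlogged tables and 'p' for regular (permanent) ones
SELECT relname, relpersistence
FROM pg_class
WHERE relname = 'clickstream_staging';

-- PostgreSQL 9.5+: convert to a regular, WAL-logged table once the data must be durable
ALTER TABLE clickstream_staging SET LOGGED;
```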
Foreign Data Wrappers
• Make external data sources look like local tables
• Use SQL
  • SELECT syntax, including useful clauses like DISTINCT, ORDER BY, GROUP BY and more
  • JOIN external data with internal tables
  • FUNCTIONS for comparison, math, string, pattern matching, date/time, etc.
  • Starting in 9.3: INSERT / UPDATE / DELETE too
• Predicate pushdown: filter data on remote sources first!
  • Available features: SELECT and WHERE clause pushdown
  • Roadmap: Join, Group/Aggregate, Sort and Limit clause pushdown
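The FDW workflow above can be sketched with `postgres_fdw`, the wrapper shipped with PostgreSQL since 9.3. The server name `remote_oltp`, its connection options, the credentials and the remote `orders` table are all hypothetical placeholders. Creating these objects only writes catalog entries; the remote server is contacted when a query actually runs:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Describe the remote PostgreSQL server (hypothetical host and database)
CREATE SERVER remote_oltp
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'oltp.example.com', port '5432', dbname 'sales');

-- Map the local user to remote credentials (placeholder values)
CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_oltp
    OPTIONS (user 'report_reader', password 'secret');

-- The foreign table looks like a local table but its rows live on the remote server
CREATE FOREIGN TABLE remote_orders (
    order_id    integer,
    customer_id integer,
    amount      numeric(10, 2),
    order_date  date
)
SERVER remote_oltp
OPTIONS (schema_name 'public', table_name 'orders');

-- EXPLAIN VERBOSE prints the "Remote SQL" sent to the server, which shows
-- whether the WHERE clause and the column list were pushed down
EXPLAIN VERBOSE
SELECT order_id, amount
FROM remote_orders
WHERE order_date >= DATE '2014-01-01';
```

In the plan output, the Remote SQL line should contain the date filter and only the requested columns, which is the SELECT / WHERE pushdown listed above.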
FDW Example: How it works
How HDFS_FDW works
A simple HDFS_FDW example
Simple access to HDFS_FDW via SQL
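The HDFS_FDW slides are diagrams in the original deck, so here is a rough, hedged sketch of the same flow. EnterpriseDB's `hdfs_fdw` reaches HDFS data through a Hive server, so each foreign table maps to a Hive table. The option names used below (`host`, `port`, `dbname`, `table_name`) are assumed from the project's README and may differ between versions; the server address, the Hive database and the `weblogs` columns are hypothetical:

```sql
-- Assumes the hdfs_fdw extension (EnterpriseDB) is installed on this server
CREATE EXTENSION IF NOT EXISTS hdfs_fdw;

-- Hypothetical Hive server address; 10000 is Hive's default Thrift port
CREATE SERVER hdfs_server
    FOREIGN DATA WRAPPER hdfs_fdw
    OPTIONS (host '127.0.0.1', port '10000');

-- Authentication options for the mapping depend on the hdfs_fdw version (assumption)
CREATE USER MAPPING FOR CURRENT_USER SERVER hdfs_server;

-- Hypothetical Hive table 'weblogs' in the Hive database 'default'
CREATE FOREIGN TABLE weblogs (
    client_ip        text,
    uri              text,
    http_status_code text,
    bytes_returned   text
)
SERVER hdfs_server
OPTIONS (dbname 'default', table_name 'weblogs');

-- Plain SQL against data stored in HDFS; the WHERE clause is a pushdown candidate
SELECT uri, count(*) AS hits
FROM weblogs
WHERE http_status_code = '404'
GROUP BY uri
ORDER BY hits DESC
LIMIT 10;
```

Given the pushdown roadmap above, the grouping, sorting and limit in this query are evaluated in Postgres, while the filter is what can be sent to the remote side.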
How do FDWs help you play with your ‘Big Data’?
• Build a Logical Data Hub
• Single point of access for all the data
• Centralized view of data
• Build correlations on entities from various sources
• Easy way of accessing data for your users: SQL queries
[Figure: The Logical Data Warehouse, combining Customer Interactions, OLTP Live Data, Purchase History, Web Logs and Transaction Logs]
Run Mixed Workloads More Effectively with PPAS v9.4
[Figure: Postgres Plus Advanced Server Resource Manager (CPU & I/O) dividing resources between Reporting and Transactions workloads, shown as 80% and 20%]
Resource Manager – What does it mean?
• DBA assigns CPU & I/O to job groups
• Allocates and prioritizes consumption of resources
• Low priority jobs don’t hurt high priority jobs
• You can run OLAP and OLTP on the same database with different resource priorities
Example
• Create resource groups and assign a CPU rate limit (and a dirty buffer write rate limit)
CREATE RESOURCE GROUP resgrp_a;
CREATE RESOURCE GROUP resgrp_b;
ALTER RESOURCE GROUP resgrp_a SET cpu_rate_limit = .25;
ALTER RESOURCE GROUP resgrp_a SET dirty_rate_limit = 12288;
• Use these resource groups during a session
SET edb_resource_group TO resgrp_a;
Benefits of Resource Manager
• Statements executed are limited in CPU usage or in writing to shared_buffers, per resource group
• The limits are adjusted regularly based on current usage
• If multiple processes in the same group are being executed, their aggregate usage is limited
• Allows you to run different workloads on the same server
Faster data load with EDB*Loader
• Conventional path, direct path, and parallel direct path loading methods
• Data loading from standard input and remote loading, particularly useful for large data sources on remote hosts
• Input data with delimiter-separated or fixed-width fields
• Bad file for collecting rejected records
• Discard file for collecting records that do not meet the selection criteria of any target table
• Log file for recording the EDB*Loader session and any error messages
Putting it all together - Postgres as a Logical Data Hub
• Combine relational and non-relational data in Postgres
• Increased write performance with unlogged tables
• Foreign Data Wrappers allow you to access different data sources centrally
• Build analytics and reports which combine data from multiple sources
• Use SQL queries to do aggregation and capitalize on pushdown of predicates and clauses
• Choose your favourite reporting and analytical tools
  • All they need is the ability to work with SQL, which is a given today
Putting it all together - Postgres as a Logical Data Hub …cont’d
• Use the Resource Manager to manage resources
  • With user mappings for FDWs and resource groups, set resource limits
  • e.g. reduce the resources for the user that is used to access the data from HDFS
• Use EDB*Loader to bulk load the data
  • The FDW framework allows foreign tables to be writable
  • Bulk load into foreign tables: data gets pumped into remote data sources
  • Central bulk load mechanism
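The resource-limit point can be sketched as follows, assuming Postgres Plus Advanced Server 9.4 or later (resource groups are not part of community PostgreSQL). The group `resgrp_hdfs`, the role `hdfs_reader` and the foreign server `hdfs_server` are hypothetical, and the server is assumed to exist already:

```sql
-- PPAS only: a low-priority resource group for HDFS access (hypothetical name)
CREATE RESOURCE GROUP resgrp_hdfs;
ALTER RESOURCE GROUP resgrp_hdfs SET cpu_rate_limit = .1;

-- Hypothetical role used only for queries against HDFS foreign tables
CREATE ROLE hdfs_reader LOGIN PASSWORD 'change_me';

-- edb_resource_group is a configuration parameter, so a per-role default
-- places every session opened by hdfs_reader in resgrp_hdfs
ALTER ROLE hdfs_reader SET edb_resource_group TO resgrp_hdfs;

-- Assumes a foreign server named hdfs_server already exists
CREATE USER MAPPING FOR hdfs_reader SERVER hdfs_server;
```

Sessions of other roles keep their own groups, so a heavy HDFS scan run as `hdfs_reader` should be throttled without taking CPU away from OLTP users.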
For further Reference
• Postgres Hangout: 3 Things you should know about JSON Features in PostgreSQL v9.4
  • https://www.youtube.com/watch?v=xrBIOuEG5rU
• Postgres Hangout: Building Hybrid Data Store using PostgreSQL and MongoDB
  • https://www.youtube.com/watch?v=8h8orDvQ0Yk
• EnterpriseDB’s github
  • https://www.github.com/EnterpriseDB
• EnterpriseDB’s blogs on FDW
  • http://www.enterprisedb.com/postgres-plus-edb-blog/archive/foreign-data-wrappers
• PostgreSQL’s wiki page on FDW
  • https://wiki.postgresql.org/wiki/Foreign_data_wrappers
About Ashnik
• Open Source Solution provider including EnterpriseDB, MongoDB, Penta, NGINX and Red Hat
• EnterpriseDB Partner for the South East Asia region
• Provides EnterpriseDB products and solutions via a chain of partners and distributors in ASEAN countries
• Team of certified Postgres professionals
• EDB Training Partner
• Provides services and support for Postgres deployments in the ASEAN region
Questions?