Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Pat Patterson, Principal Developer Evangelist, [email protected], @metadaddy

Upload: pat-patterson

Post on 10-May-2015


DESCRIPTION

PostgreSQL 9.1 introduced ‘Foreign Data Wrappers’ (FDW) – a partial implementation of the SQL/MED standard for handling access to remote data sources. FDW allows PostgreSQL to expose remote data as foreign tables which then behave similarly to native PostgreSQL tables, in particular, allowing remote data to be queried with SQL statements. This session provides an overview of Foreign Data Wrappers, looks at the native interface for writing FDWs in C, and contrasts this with Multicorn, an open source framework that allows FDWs to be developed in Python. We will show a real-world Python FDW that retrieves business data from salesforce.com, with a sample client application that demonstrates how foreign data can be combined with data held in native PostgreSQL tables using a simple SQL JOIN.

TRANSCRIPT

Page 1: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Pat Patterson
Principal Developer Evangelist

[email protected]
@metadaddy

Page 2: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Agenda

Foreign Data Wrappers

Writing FDWs in C

Multicorn

Database.com FDW for PostgreSQL

FDW in action

Page 3: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Why Foreign Data Wrappers?

External data sources look like local tables!

– Other SQL database
  • MySQL, Oracle, SQL Server, etc.
– NoSQL database
  • CouchDB, Redis, etc.
– File
– LDAP
– Web services
  • Twitter!

Page 4: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Why Foreign Data Wrappers?

Make the database do the work

– SELECT syntax
  • DISTINCT, ORDER BY, etc.
– Functions
  • COUNT(), MIN(), MAX(), etc.
– JOIN external data to internal tables
– Use standard apps, libraries for data analysis, reporting

Page 5: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Foreign Data Wrappers

2003 – SQL Management of External Data (SQL/MED)

2011 – PostgreSQL 9.1 implementation
– Read-only
– SELECT-clause optimization
– WHERE-clause push-down
  • Minimize data requested from external source

Future Improvements
– JOIN push-down
  • Where two foreign tables are in the same server
– Support cursors
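WHERE-clause push-down, sketched in Python: the wrapper receives the query's conditions and turns the ones it understands into a filter for the remote source, so less data crosses the wire. This is an illustration only. The `Qual` tuple is a stand-in that mirrors the attribute names Multicorn uses (`field_name`, `operator`, `value`); `PUSHABLE_OPERATORS`, `quote_literal`, `build_remote_query` and the backslash escaping rule are assumptions made for the example.

```python
from collections import namedtuple

# Stand-in for the condition objects handed to a wrapper (assumption for
# this sketch; the attribute names follow Multicorn's Qual).
Qual = namedtuple("Qual", ["field_name", "operator", "value"])

# Only operators the remote side is assumed to understand are pushed down.
PUSHABLE_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def quote_literal(value):
    """Render a Python value as a literal for the remote query."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes (assumed remote syntax).
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'%s'" % text


def build_remote_query(table, columns, quals):
    """Ask the remote source only for the columns and rows that are needed."""
    query = "SELECT %s FROM %s" % (", ".join(columns), table)
    conditions = [
        "%s %s %s" % (q.field_name, q.operator, quote_literal(q.value))
        for q in quals
        if q.operator in PUSHABLE_OPERATORS
    ]
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


query = build_remote_query(
    "Contact",
    ["firstname", "lastname"],
    [Qual("lastname", "=", "O'Brien"), Qual("firstname", "~~", "P%")],
)
print(query)
```

The `~~` (LIKE) condition is simply left out of the remote query. That should be safe as long as the database re-checks every condition on the rows it gets back, which is how Multicorn describes its behaviour; a wrapper that pushes down nothing is still correct, just slower.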

Page 6: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

FDWs in PostgreSQL

‘Compiled language’ (C) interface

Implement a set of callbacks

    typedef struct FdwRoutine
    {
        NodeTag type;

        /* These functions are required. */
        GetForeignRelSize_function GetForeignRelSize;
        GetForeignPaths_function GetForeignPaths;
        GetForeignPlan_function GetForeignPlan;
        ExplainForeignScan_function ExplainForeignScan;
        BeginForeignScan_function BeginForeignScan;
        IterateForeignScan_function IterateForeignScan;
        ReScanForeignScan_function ReScanForeignScan;
        EndForeignScan_function EndForeignScan;

        /* These functions are optional. */
        AnalyzeForeignTable_function AnalyzeForeignTable;
    } FdwRoutine;

Page 7: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

FDWs in PostgreSQL

Much work!

• CouchDB FDW
• https://github.com/ZhengYang/couchdb_fdw/
• couchdb_fdw.c > 1700 LoC

Page 8: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Multicorn

http://multicorn.org/

PostgreSQL 9.1+ extension

Python framework for FDWs

Implement two methods…

Page 9: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Multicorn

    from multicorn import ForeignDataWrapper


    class ConstantForeignDataWrapper(ForeignDataWrapper):

        def __init__(self, options, columns):
            super(ConstantForeignDataWrapper, self).__init__(options, columns)
            self.columns = columns

        def execute(self, quals, columns):
            # Return 20 rows; each cell holds "<column name> <row index>".
            for index in range(20):
                line = {}
                for column_name in self.columns:
                    line[column_name] = '%s %s' % (column_name, index)
                yield line
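A variation on the same two methods, as a sketch that runs without PostgreSQL: it fills in only the columns the query asks for (the `columns` argument of `execute()`) and reads a row count from the table options. The base class here is a stand-in, not Multicorn's real one, and `RequestedColumnsWrapper` and the `rows` option are hypothetical names invented for this example.

```python
class ForeignDataWrapper(object):
    """Stand-in for multicorn.ForeignDataWrapper so the sketch runs anywhere."""

    def __init__(self, options, columns):
        pass


class RequestedColumnsWrapper(ForeignDataWrapper):
    """Hypothetical wrapper that only produces the columns a query needs."""

    def __init__(self, options, columns):
        super(RequestedColumnsWrapper, self).__init__(options, columns)
        # 'rows' is a made-up table option; options are assumed to be strings.
        self.row_count = int(options.get("rows", "20"))

    def execute(self, quals, columns):
        # 'columns' names the columns requested by the current query.
        for index in range(self.row_count):
            yield dict((name, "%s %s" % (name, index)) for name in columns)


# The table has two columns, but the query only needs one of them.
fdw = RequestedColumnsWrapper({"rows": "3"}, {"test": None, "test2": None})
rows = list(fdw.execute(quals=[], columns=["test2"]))
print(rows)
```

Calling `execute()` by hand like this is only a quick way to check the generator logic; in a real deployment PostgreSQL drives the wrapper through the Multicorn extension.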

Page 10: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Database.com FDW for PostgreSQL

OAuth login to Database.com / Force.com
– Refresh on token expiry

Force.com REST API
– SOQL query
  • SELECT firstname, lastname FROM Contact

Request thread puts records in Queue, execute() method gets them from Queue

JSON parsing – skip embedded metadata

< 250 lines of code
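The request-thread-plus-Queue pattern can be sketched roughly as follows. This is not the code of the actual FDW: the HTTP and OAuth parts are replaced by two hard-coded, made-up JSON pages, and `fetch_pages`, `FAKE_PAGES` and `DONE` are names invented for the sketch. The one detail taken from the REST API is that each record carries an embedded `attributes` object, which is dropped before the row is handed on.

```python
import json
import queue
import threading

# Sentinel telling the consumer that no more records are coming.
DONE = object()

# Made-up sample pages standing in for REST API query responses.
FAKE_PAGES = [
    '{"done": false, "records": [{"attributes": {"type": "Contact"}, '
    '"FirstName": "Ada", "LastName": "Lovelace"}]}',
    '{"done": true, "records": [{"attributes": {"type": "Contact"}, '
    '"FirstName": "Alan", "LastName": "Turing"}]}',
]


def fetch_pages(pages, record_queue):
    """Producer: parse each page and queue its records without metadata."""
    try:
        for page in pages:
            for record in json.loads(page)["records"]:
                record.pop("attributes", None)  # skip embedded metadata
                record_queue.put(record)
    finally:
        record_queue.put(DONE)  # always unblock the consumer


def execute(pages):
    """Consumer: yield rows as soon as the request thread delivers them."""
    record_queue = queue.Queue()
    worker = threading.Thread(target=fetch_pages, args=(pages, record_queue))
    worker.daemon = True
    worker.start()
    while True:
        record = record_queue.get()
        if record is DONE:
            break
        yield record
    worker.join()


rows = list(execute(FAKE_PAGES))
print(rows)
```

The point of the split is that the first rows can be returned to PostgreSQL while later pages are still being fetched, instead of waiting for the whole result set.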

Page 11: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Demo

Page 12: Unlocking Proprietary Data with PostgreSQL Foreign Data Wrappers

Conclusion

Foreign Data Wrappers make the whole world look like tables!

Writing FDWs in C is hard!
– Or, at least, time consuming!

Writing FDWs in Python via Multicorn is easy!
– Or, at least, quick!

Try it for yourself!
