Scaling PostgreSQL with Stado

Who Am I?

• Jim Mlodgenski

– Founder of Cirrus Technologies

– Former Chief Architect of EnterpriseDB

– Co-organizer of NYCPUG

Agenda

• What is Stado?

• Architecture

• Query Flow

• Scaling

• Limitations

What is Stado?

• Continuation of GridSQL

• “Shared-Nothing”, distributed data architecture.

–Leverage the power of multiple commodity servers while appearing as a single database to the application

• Essentially... Open Source Greenplum, Netezza or Teradata

Stado Details

• Designed for Parallel Querying

• Not just “Read-Only”, can execute UPDATE, DELETE

• Data Loader for parallel loading

• Standard connectivity via PostgreSQL compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)

What Stado Is Not

• A replication solution like Slony or Bucardo

• A high availability solution like Synchronous Replication in PostgreSQL 9.1

• A scalable transactional solution like Postgres-XC

• An elastic, eventually consistent NoSQL database

Architecture

• Loosely coupled, shared-nothing architecture

• Data repositories

–Metadata database

–Stado database

• Stado processes

–Central coordinator

–Agents

Configuration

• Can be configured for multiple logical “nodes” per physical server

–Take advantage of multi-core processors

• Tables may be either replicated or partitioned

–Replicated tables for static lookup data or dimensions

–Partitioned tables for large fact tables

Partitioning

• Tables may simultaneously use Stado Partitioning with Constraint Exclusion Partitioning

– Large queries scan a much smaller subset of data by using subtables

– Since each subtable is also partitioned across nodes, they are scanned in parallel

– Queries execute much faster

Creating Tables

• Tables can be partitioned or replicated

CREATE TABLE STATE_CODES (
    STATE_CD varchar(2) PRIMARY KEY,
    USPS_CD varchar(2),
    NAME varchar(100),
    GNISIS varchar(8))
REPLICATED;

Creating Tables

CREATE TABLE roads (
    gid integer NOT NULL,
    statefp character varying(2),
    countyfp character varying(3),
    linearid character varying(22),
    fullname character varying(100),
    rttyp character varying(1),
    mtfcc character varying(5),
    the_geom geometry)
PARTITIONING KEY gid ON ALL;

Query Optimization

• Cost Based Optimizer

–Takes into account Row Shipping (expensive)

• Looks for joins with replicated tables

–Can be done locally

–Looks for joins between tables on partitioned columns
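The locality rules on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Stado's optimizer: the `Table` class and `join_strategy` function are hypothetical, the `road_names` table and the `tract_id` partition key for `census_tract` are invented for the example, and the sketch assumes that two tables partitioned on their join columns use the same hash function over the same set of nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    partition_key: Optional[str] = None  # None means the table is REPLICATED


def join_strategy(left: Table, left_col: str, right: Table, right_col: str) -> str:
    """Classify an equi-join as node-local or as requiring row shipping."""
    # A replicated table has a full copy on every node, so any join
    # against it can be answered without moving rows.
    if left.partition_key is None or right.partition_key is None:
        return "local"
    # If both sides are joined on their partitioning columns, matching
    # rows hash to the same node (assuming the same hash and node set).
    if left.partition_key == left_col and right.partition_key == right_col:
        return "local"
    # Otherwise rows must be shipped between nodes, which is expensive.
    return "ship rows"


roads = Table("roads", partition_key="gid")
state_codes = Table("state_codes")                               # replicated
road_names = Table("road_names", partition_key="gid")            # hypothetical table
census_tract = Table("census_tract", partition_key="tract_id")   # assumed key

print(join_strategy(roads, "statefp", state_codes, "state_cd"))
print(join_strategy(roads, "gid", road_names, "gid"))
print(join_strategy(roads, "statefp", census_tract, "state_cd"))
```

A real cost-based optimizer would weigh the size of the rows to ship rather than return a fixed label, but the three cases are the ones the slide names.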

Two Phase Aggregation

• SUM

–SUM(stat1)

–SUM2(SUM(stat1))

• AVG

–SUM(stat1) / COUNT(stat1)

–SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
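A minimal sketch of the two phases, assuming three nodes with made-up `stat1` values. The function names `node_phase` and `coordinator_phase` are hypothetical and only mirror the SUM / SUM2 split on the slide: each node computes a local SUM and COUNT, and the coordinator adds up those partial results.

```python
# Hypothetical data: each inner list is the stat1 column stored on one node.
node_rows = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]


def node_phase(values):
    """Phase 1, run on every node: local SUM(stat1) and COUNT(stat1)."""
    return sum(values), len(values)


def coordinator_phase(partials):
    """Phase 2, run once on the coordinator: SUM2 over the partial results."""
    total_sum = sum(s for s, _ in partials)
    total_count = sum(c for _, c in partials)
    return total_sum, total_count


partials = [node_phase(rows) for rows in node_rows]
total_sum, total_count = coordinator_phase(partials)

# AVG = SUM2(SUM(stat1)) / SUM2(COUNT(stat1))
global_avg = total_sum / total_count

# For contrast: averaging the per-node averages weights each node equally,
# regardless of how many rows it holds, and gives a different answer.
naive_avg = sum(s / c for s, c in partials) / len(partials)

print(total_sum, total_count, global_avg, naive_avg)
```

This is why AVG is shipped as a SUM and a COUNT rather than as a per-node average: only the sum and count combine correctly when nodes hold different numbers of rows.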

Query 1

SELECT sum(st_length_spheroid(the_geom,
       'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
       as interstate_miles
FROM roads
WHERE rttyp = 'I';

 interstate_miles
------------------
 84588.5425986619
(1 row)

Query 1 : Results

[Chart: execution time in seconds versus number of nodes (1, 4, 8, 12, 16), comparing linear scaling with actual results]

Nodes   Actual (sec)
1       101.2080566
4       25.6410708
8       14.3321144
12      5.4738612
16      4.8214672

Query 2

SELECT s.name as state, c.name as county, a.population, b.road_length,
       a.population/b.road_length as person_per_km
FROM (SELECT state_cd, county_cd, sum(population) as population
      FROM census_tract
      GROUP BY 1, 2) a,
     (SELECT statefp, countyfp,
             sum(st_length_spheroid(the_geom, 'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
      FROM roads
      GROUP BY 1, 2) b,
     state_codes s, county_codes c
WHERE a.state_cd = b.statefp
  AND a.county_cd = b.countyfp
  AND a.state_cd = c.state_cd
  AND a.county_cd = c.county_cd
  AND c.state_cd = s.state_cd
ORDER BY 5 DESC
LIMIT 20;

 state                | county          | population | road_length      | person_per_km
----------------------+-----------------+------------+------------------+------------------
 New York             | New York        | 1537195    | 1465.35561969273 | 1049.02521909483
 New York             | Kings           | 2465326    | 2785.37685011507 | 885.096032839562
 New York             | Bronx           | 1332650    | 1638.47925579201 | 813.345665066614
 New York             | Queens          | 2229379    | 4343.78066667893 | 513.234707521383
 New Jersey           | Hudson          | 608975     | 1474.86512729116 | 412.902162191933
 California           | San Francisco   | 776733     | 2125.05706617179 | 365.51159607175
 Pennsylvania         | Philadelphia    | 1517550    | 5067.19918355051 | 299.484970894054
 District of Columbia | Washington      | 572059     | 2191.33029860109 | 261.055579054054
 New York             | Richmond        | 443728     | 1758.77468237864 | 252.293829588156
 Massachusetts        | Suffolk         | 689807     | 2805.37242915611 | 245.887851762877
 New Jersey           | Essex           | 793633     | 3359.22581976629 | 236.254733257324
 Virginia             | Alexandria City | 128283     | 577.98117468444  | 221.950135434841
 Puerto Rico          | San Juan        | 434374     | 1994.26820504899 | 217.811224638829
 Virginia             | Arlington       | 189453     | 967.505165121908 | 195.816008874876
 New Jersey           | Union           | 522541     | 2827.74655887522 | 184.790605919029
 Maryland             | Baltimore City  | 651154     | 3707.01218958787 | 175.654669231717
 Puerto Rico          | Catano          | 30071      | 174.765650431886 | 172.064704509654
 Hawaii               | Honolulu        | 876156     | 5098.8482067881  | 171.834101441493
 Puerto Rico          | Toa Baja        | 94085      | 558.532996996738 | 168.450208861249
 Puerto Rico          | Carolina        | 186076     | 1122.20560229076 | 165.812752690026
(20 rows)

Query 2 : Results

[Chart: execution time in seconds versus number of nodes (1, 4, 8, 12, 16), comparing linear scaling with actual results]

Nodes   Actual (sec)
1       3983.1002548
4       1007.1235182
8       563.6259202
12      365.152858
16      282.7345952
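The timing tables for Query 1 and Query 2 can be turned into speedup and parallel efficiency figures with a short calculation. The sketch below uses only the numbers reported on the slides; the helper name `speedup_table` is hypothetical, speedup is taken as the single-node time divided by the N-node time, and efficiency as speedup divided by N (1.0 means exactly linear).

```python
def speedup_table(timings):
    """Map node count -> (speedup, efficiency) relative to the 1-node run."""
    base = timings[1]
    return {n: (base / t, base / t / n) for n, t in sorted(timings.items())}


# Measured times in seconds, copied from the result tables above.
query1 = {1: 101.2080566, 4: 25.6410708, 8: 14.3321144,
          12: 5.4738612, 16: 4.8214672}
query2 = {1: 3983.1002548, 4: 1007.1235182, 8: 563.6259202,
          12: 365.152858, 16: 282.7345952}

q1 = speedup_table(query1)
q2 = speedup_table(query2)

for name, table in (("Query 1", q1), ("Query 2", q2)):
    print(name)
    for nodes, (speedup, efficiency) in table.items():
        print(f"  {nodes:>2} nodes: speedup {speedup:6.2f}, efficiency {efficiency:4.2f}")
```

On these numbers Query 2 stays somewhat below linear at every node count, while Query 1 comes out faster than linear at 12 and 16 nodes. The slides do not say what causes the difference.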

Scalability

Limitations

• SQL Support

–Uses its own parser and optimizer, so:

• No Window Functions

• No Stored Procedures

• No Full Text Search

Transaction Performance

• Single-row INSERT, UPDATE, or DELETE statements are slow compared to a single PostgreSQL instance

–The data must make an additional network trip to be committed

–All partitioned rows must be hashed to be mapped to the proper node

–All replicated rows must be committed to all nodes

• Use “gs-loader” for bulk loading for better performance
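The extra work per single-row write can be pictured with a small routing sketch. Everything here is an assumption made for illustration: the node names are invented, and CRC32 stands in for Stado's actual hash function, which the slides do not describe.

```python
import zlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def node_for_key(key, nodes=NODES):
    """Hash the partitioning key to pick exactly one node.

    CRC32 is a stand-in; the real hash function is not given in the slides.
    """
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]


def targets_for_write(table_kind, key=None, nodes=NODES):
    """Return the nodes a single-row write must reach before it can commit."""
    if table_kind == "replicated":
        # Every node holds a full copy, so every node must commit the row.
        return list(nodes)
    # Partitioned: the coordinator hashes the key, then forwards the row.
    return [node_for_key(key, nodes)]


print(targets_for_write("partitioned", key=42))
print(targets_for_write("replicated"))
```

Either way the row passes through the coordinator first, which is the additional network trip the slide mentions, and a replicated row fans out to every node.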

High Availability

• No heartbeat or fail-over control in the coordinator

– High Availability for each PostgreSQL node must be configured separately

– Streaming replication can be ideal for this

• Getting a consistent backup of the entire Stado database is difficult

– Must ensure no transactions are occurring

– Backup each node separately

Adding Nodes

• Requires Downtime

–Data must be manually reloaded to partition the data to the new node

• With planning, the process can be fast with no mapping of data

–Run multiple PostgreSQL instances on each physical server and move the PostgreSQL instances to new hardware as needed
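The planning approach above can be sketched as follows, under stated assumptions: 16 logical nodes, invented host names, round-robin placement, and CRC32 as a stand-in hash. The point is that a row's logical node depends only on its key, so moving PostgreSQL instances to new hosts changes the placement map but not the data mapping.

```python
import zlib

LOGICAL_NODES = 16  # assumed: fixed up front, more than the initial host count


def logical_node(key):
    """Row -> logical node. Depends only on the key, never on the hosts."""
    return zlib.crc32(str(key).encode("utf-8")) % LOGICAL_NODES


def placement(hosts):
    """Spread the logical nodes round-robin over the physical hosts."""
    return {n: hosts[n % len(hosts)] for n in range(LOGICAL_NODES)}


before = placement(["hostA", "hostB"])                    # hypothetical hosts
after = placement(["hostA", "hostB", "hostC", "hostD"])

# Whole PostgreSQL instances that move to new hardware; their rows stay put
# inside the instance, so nothing has to be re-hashed or reloaded.
moved_instances = [n for n in range(LOGICAL_NODES) if before[n] != after[n]]

# For contrast: growing the hash space itself from 2 to 4 nodes changes the
# target node for many individual rows, which is what forces a reload.
keys = range(1000)
rehashed_rows = sum(
    1 for k in keys
    if zlib.crc32(str(k).encode("utf-8")) % 2 != zlib.crc32(str(k).encode("utf-8")) % 4
)

print(len(moved_instances), "instances relocated;", rehashed_rows, "of 1000 rows would re-hash")
```

The trade-off is that the number of logical nodes caps how far the cluster can grow this way, so it has to be chosen with future hardware in mind.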

Summary

• Stado can tremendously improve query performance

• Stado can scale linearly as more nodes are added

• Stado is open source, so if the limitations are an issue, submit a patch

Download Stado at: http://stado.us

Jim Mlodgenski

Email: [email protected]

@jim_mlodgenski

NYC PostgreSQL User Group

http://nycpug.org
