Scaling PostgreSQL with Stado
Who Am I?
• Jim Mlodgenski
– Founder of Cirrus Technologies
– Former Chief Architect of EnterpriseDB
– Co-organizer of NYCPUG
Agenda
• What is Stado?
• Architecture
• Query Flow
• Scaling
• Limitations
What is Stado?
• Continuation of GridSQL
• “Shared-Nothing”, distributed data architecture.
–Leverage the power of multiple commodity servers while appearing as a single database to the application
• Essentially an open source Greenplum, Netezza, or Teradata
Stado Details
• Designed for parallel querying
• Not just “read-only”: can also execute UPDATE and DELETE
• Data loader for parallel loading
• Standard connectivity via PostgreSQL-compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)
What Stado Is Not
• A replication solution like Slony or Bucardo
• A high availability solution like Synchronous Replication in PostgreSQL 9.1
• A scalable transactional solution like Postgres-XC
• An elastic, eventually consistent NoSQL database
Architecture
• Loosely coupled, shared-nothing architecture
• Data repositories
–Metadata database
–Stado database
• Stado processes
–Central coordinator
–Agents
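To make the coordinator/agent split concrete, here is a minimal Python sketch of scatter-gather execution, assuming each agent just filters rows it holds in memory for its node. The `Agent` and `Coordinator` classes are hypothetical stand-ins, not Stado's actual components or protocol; real agents hand SQL to their local PostgreSQL instance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


class Agent:
    """Hypothetical agent: owns the rows stored on one node."""

    def __init__(self, node_id: int, rows: List[Row]):
        self.node_id = node_id
        self.rows = rows

    def execute(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Each agent scans only its local data.
        return [row for row in self.rows if predicate(row)]


class Coordinator:
    """Hypothetical coordinator: fans a step out to all agents, then merges."""

    def __init__(self, agents: List[Agent]):
        self.agents = agents

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Scatter: run the same step on every agent concurrently.
        with ThreadPoolExecutor(max_workers=len(self.agents)) as pool:
            partials = list(pool.map(lambda agent: agent.execute(predicate), self.agents))
        # Gather: concatenate the per-node results for the client.
        return [row for part in partials for row in part]


agents = [
    Agent(0, [{"gid": 1, "rttyp": "I"}, {"gid": 2, "rttyp": "M"}]),
    Agent(1, [{"gid": 3, "rttyp": "I"}]),
]
coordinator = Coordinator(agents)
interstates = coordinator.query(lambda row: row["rttyp"] == "I")
print(sorted(row["gid"] for row in interstates))  # [1, 3]
```

The application only talks to the coordinator, which is what lets several servers appear as a single database.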
Configuration
• Can be configured for multiple logical “nodes” per physical server
–Take advantage of multi-core processors
• Tables may be either replicated or partitioned
– Replicated tables for static lookup data or dimensions
– Partitioned tables for large fact tables
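A minimal sketch of the two placement options, assuming four logical nodes and using `zlib.crc32` as an illustrative hash. Stado's real hash function and node numbering are not shown in the slides, so `node_for_key` and `place_row` are hypothetical helpers.

```python
from typing import Dict, List, Optional
from zlib import crc32

NUM_NODES = 4  # hypothetical: four logical nodes


def node_for_key(key: object, num_nodes: int = NUM_NODES) -> int:
    """Map a partitioning-key value to a node.

    crc32 is an illustrative stand-in, not Stado's actual hash function.
    """
    return crc32(str(key).encode("utf-8")) % num_nodes


def place_row(row: Dict[str, object], replicated: bool,
              key_col: Optional[str] = None,
              num_nodes: int = NUM_NODES) -> List[int]:
    """Return the ids of the nodes that store a copy of `row`."""
    if replicated:
        # Lookup/dimension tables: every node keeps a full copy.
        return list(range(num_nodes))
    if key_col is None:
        raise ValueError("a partitioned table needs a partitioning key")
    # Fact tables: exactly one node owns each row, chosen by hashing the key.
    return [node_for_key(row[key_col], num_nodes)]


state = {"state_cd": "36", "name": "New York"}
road = {"gid": 1001, "rttyp": "I"}
print(place_row(state, replicated=True))                 # [0, 1, 2, 3]
print(place_row(road, replicated=False, key_col="gid"))  # a single node id
```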
Partitioning
• Tables may combine Stado partitioning (across nodes) with PostgreSQL constraint exclusion partitioning (into subtables)
– Large queries scan a much smaller subset of data by using subtables
– Since each subtable is also partitioned across nodes, they are scanned in parallel
– Queries execute much faster
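The two levels can be sketched in Python, assuming a hypothetical `sales` table whose rows are hashed across four nodes and then split into yearly subtables on each node. The table, the subtable names, and the crc32 hash are all illustrative assumptions, not anything Stado defines.

```python
from typing import Dict, List, Tuple
from zlib import crc32

NUM_NODES = 4
# Hypothetical yearly subtables present on every node; ranges are [lo, hi).
SUBTABLES: Dict[str, Tuple[int, int]] = {
    "sales_2010": (2010, 2011),
    "sales_2011": (2011, 2012),
    "sales_2012": (2012, 2013),
}


def place(key: int, year: int) -> Tuple[int, str]:
    """Level 1: hash the key to a node. Level 2: range-route the year to a subtable."""
    node = crc32(str(key).encode("utf-8")) % NUM_NODES
    for name, (lo, hi) in SUBTABLES.items():
        if lo <= year < hi:
            return node, name
    raise ValueError(f"no subtable accepts year {year}")


def scan_plan(year_lo: int, year_hi: int) -> Dict[int, List[str]]:
    """Subtables each node scans for WHERE year >= year_lo AND year < year_hi.

    Constraint exclusion drops subtables whose range cannot overlap the
    predicate; the survivors are scanned on all nodes in parallel.
    """
    survivors = [name for name, (lo, hi) in SUBTABLES.items()
                 if lo < year_hi and year_lo < hi]
    return {node: survivors for node in range(NUM_NODES)}


print(scan_plan(2011, 2012))  # every node scans only sales_2011
```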
Creating Tables
• Tables can be partitioned or replicated
CREATE TABLE STATE_CODES (
STATE_CD varchar(2) PRIMARY KEY,
USPS_CD varchar(2),
NAME varchar(100),
GNISIS varchar(8))
REPLICATED;
Creating Tables
CREATE TABLE roads (
gid integer NOT NULL,
statefp character varying(2),
countyfp character varying(3),
linearid character varying(22),
fullname character varying(100),
rttyp character varying(1),
mtfcc character varying(5),
the_geom geometry)
PARTITIONING KEY gid ON ALL;
Query Optimization
• Cost-based optimizer
– Takes into account row shipping (expensive)
• Looks for joins with replicated tables
– Can be done locally
• Looks for joins between tables on partitioned columns
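The locality rule behind these bullets can be sketched as a small Python check. It is a simplified model assuming inner joins, and assuming that two tables partitioned on their join columns use the same hash function, node list, and key types. `TableInfo`, `join_is_local`, and the `tract_id` key for `census_tract` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableInfo:
    name: str
    replicated: bool
    partition_col: Optional[str] = None  # None for replicated tables


def join_is_local(left: TableInfo, right: TableInfo,
                  left_col: str, right_col: str) -> bool:
    """True if each node can compute its share of an inner join without shipping rows."""
    if left.replicated or right.replicated:
        # A replicated table has every row on every node, so each node joins
        # its local partition against the full copy.
        return True
    # Both partitioned: matching rows are co-located only when both tables
    # are partitioned on the join columns (same key value -> same node).
    return left.partition_col == left_col and right.partition_col == right_col


roads = TableInfo("roads", replicated=False, partition_col="gid")
state_codes = TableInfo("state_codes", replicated=True)
census_tract = TableInfo("census_tract", replicated=False, partition_col="tract_id")  # hypothetical key

print(join_is_local(roads, state_codes, "statefp", "state_cd"))   # True
print(join_is_local(roads, census_tract, "statefp", "state_cd"))  # False: rows must be shipped
```

When the check fails, a cost-based optimizer has to weigh the cost of moving rows between nodes, which is why row shipping is singled out as expensive.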
Two Phase Aggregation
• SUM
–SUM(stat1)
–SUM2(SUM(stat1))
• AVG
–SUM(stat1) / COUNT(stat1)
–SUM2 (SUM(stat1)) / SUM2 (COUNT(stat1))
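A minimal Python sketch of the two phases, assuming each node's `stat1` values are a plain list with `None` standing in for NULL. Phase 1 runs on every node. Phase 2 plays the role of `SUM2` on the coordinator. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Partial = Tuple[float, int]


def phase1(local_values: List[Optional[float]]) -> Partial:
    """Per node: SUM(stat1) and COUNT(stat1); NULLs (None) are skipped."""
    present = [v for v in local_values if v is not None]
    return sum(present), len(present)


def phase2_sum(partials: List[Partial]) -> float:
    """Coordinator: SUM2(SUM(stat1)) adds up the per-node sums."""
    return sum(s for s, _ in partials)


def phase2_avg(partials: List[Partial]) -> Optional[float]:
    """Coordinator: SUM2(SUM(stat1)) / SUM2(COUNT(stat1))."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None  # AVG of no rows is NULL


nodes = [[1, 2, 3], [10], [None, 4]]  # stat1 values stored on three nodes
partials = [phase1(values) for values in nodes]
print(phase2_sum(partials))  # 20
print(phase2_avg(partials))  # 4.0
```

AVG is split into SUM and COUNT because averaging the per-node averages would be wrong whenever nodes hold different numbers of rows. Here that gives (2 + 10 + 4) / 3, about 5.33, instead of 4.0.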
Query 1
SELECT sum(st_length_spheroid(the_geom,
'SPHEROID["GRS_1980",6378137,298.257222101]'))/1609.344
as interstate_miles
FROM roads
WHERE rttyp = 'I';
interstate_miles
------------------
84588.5425986619
(1 row)
Query 1 : Results
[Chart: Query 1 time (seconds) vs. number of nodes, actual vs. linear scaling]
 Nodes | Actual (sec)
-------+--------------
     1 | 101.2080566
     4 |  25.6410708
     8 |  14.3321144
    12 |   5.4738612
    16 |   4.8214672
Query 2
SELECT s.name as state, c.name as county, a.population, b.road_length,
a.population/b.road_length as person_per_km
FROM (SELECT state_cd, county_cd, sum(population) as population
FROM census_tract
GROUP BY 1, 2) a,
(SELECT statefp, countyfp,
sum(st_length_spheroid(the_geom, 'SPHEROID["GRS_1980",6378137,298.257222101]'))/1000 as road_length
FROM roads
GROUP BY 1, 2) b,
state_codes s, county_codes c
WHERE a.state_cd = b.statefp
AND a.county_cd = b.countyfp
AND a.state_cd = c.state_cd
AND a.county_cd = c.county_cd
AND c.state_cd = s.state_cd
ORDER BY 5 DESC
LIMIT 20;
state | county | population | road_length | person_per_km
----------------------+-----------------+------------+------------------+------------------
New York | New York | 1537195 | 1465.35561969273 | 1049.02521909483
New York | Kings | 2465326 | 2785.37685011507 | 885.096032839562
New York | Bronx | 1332650 | 1638.47925579201 | 813.345665066614
New York | Queens | 2229379 | 4343.78066667893 | 513.234707521383
New Jersey | Hudson | 608975 | 1474.86512729116 | 412.902162191933
California | San Francisco | 776733 | 2125.05706617179 | 365.51159607175
Pennsylvania | Philadelphia | 1517550 | 5067.19918355051 | 299.484970894054
District of Columbia | Washington | 572059 | 2191.33029860109 | 261.055579054054
New York | Richmond | 443728 | 1758.77468237864 | 252.293829588156
Massachusetts | Suffolk | 689807 | 2805.37242915611 | 245.887851762877
New Jersey | Essex | 793633 | 3359.22581976629 | 236.254733257324
Virginia | Alexandria City | 128283 | 577.98117468444 | 221.950135434841
Puerto Rico | San Juan | 434374 | 1994.26820504899 | 217.811224638829
Virginia | Arlington | 189453 | 967.505165121908 | 195.816008874876
New Jersey | Union | 522541 | 2827.74655887522 | 184.790605919029
Maryland | Baltimore City | 651154 | 3707.01218958787 | 175.654669231717
Puerto Rico | Catano | 30071 | 174.765650431886 | 172.064704509654
Hawaii | Honolulu | 876156 | 5098.8482067881 | 171.834101441493
Puerto Rico | Toa Baja | 94085 | 558.532996996738 | 168.450208861249
Puerto Rico | Carolina | 186076 | 1122.20560229076 | 165.812752690026
(20 rows)
Query 2 : Results
[Chart: Query 2 time (seconds) vs. number of nodes, actual vs. linear scaling]
 Nodes | Actual (sec)
-------+--------------
     1 | 3983.1002548
     4 | 1007.1235182
     8 |  563.6259202
    12 |  365.152858
    16 |  282.7345952
Scalability
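As a rough check on the scaling claim, the timings from the Query 1 and Query 2 results can be turned into speedup (time on 1 node divided by time on N nodes) and parallel efficiency (speedup divided by N). This Python sketch only reuses the published numbers. It measures nothing new.

```python
from typing import Dict, List, Tuple

QUERY1: Dict[int, float] = {1: 101.2080566, 4: 25.6410708, 8: 14.3321144,
                            12: 5.4738612, 16: 4.8214672}
QUERY2: Dict[int, float] = {1: 3983.1002548, 4: 1007.1235182, 8: 563.6259202,
                            12: 365.152858, 16: 282.7345952}


def scaling_report(timings: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (nodes, speedup, efficiency) relative to the 1-node run."""
    base = timings[1]
    rows = []
    for nodes, secs in sorted(timings.items()):
        speedup = base / secs
        rows.append((nodes, speedup, speedup / nodes))
    return rows


for name, timings in (("Query 1", QUERY1), ("Query 2", QUERY2)):
    print(name)
    for nodes, speedup, eff in scaling_report(timings):
        print(f"  {nodes:>2} nodes: speedup {speedup:5.2f}x, efficiency {eff:5.0%}")
```

With these numbers, Query 1 reaches about 18.5x on 12 nodes, which is better than linear. Query 2 reaches about 14.1x on 16 nodes, roughly 88% efficiency.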
Limitations
• SQL Support
  – Uses its own parser and optimizer, so:
    • No window functions
    • No stored procedures
    • No full text search
Transaction Performance
• Single-row INSERT, UPDATE, or DELETE statements are slow compared to a single PostgreSQL instance
–The data must make an additional network trip to be committed
–All partitioned rows must be hashed to be mapped to the proper node
–All replicated rows must be committed to all nodes
• Use “gs-loader” for bulk loading for better performance
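A back-of-the-envelope Python model of why batching helps. It counts coordinator-to-node round trips under the simplifying assumption that every single-row statement costs one trip per target node. This is a hypothetical cost model with a crc32 stand-in hash. It does not describe how gs-loader works internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from zlib import crc32


def node_for_key(key: int, num_nodes: int) -> int:
    # Illustrative stand-in hash, not Stado's.
    return crc32(str(key).encode("utf-8")) % num_nodes


def single_row_trips(num_rows: int, num_nodes: int, replicated: bool = False) -> int:
    """Coordinator-to-node round trips when every row is its own INSERT."""
    return num_rows * (num_nodes if replicated else 1)


def batch_by_node(keys: Iterable[int], num_nodes: int) -> Dict[int, List[int]]:
    """Group partitioned rows by target node so each node gets one batch."""
    batches: Dict[int, List[int]] = defaultdict(list)
    for key in keys:
        batches[node_for_key(key, num_nodes)].append(key)
    return dict(batches)


batches = batch_by_node(range(10_000), num_nodes=4)
print(single_row_trips(10_000, 4))                   # 10000
print(single_row_trips(10_000, 4, replicated=True))  # 40000
print(len(batches))                                  # at most 4 trips, one per node
```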
High Availability
• No heartbeat or fail-over control in the coordinator
– High Availability for each PostgreSQL node must be configured separately
– Streaming replication can be ideal for this
• Getting a consistent backup of the entire Stado database is difficult
– Must ensure that no transactions are occurring
– Backup each node separately
Adding Nodes
• Requires Downtime
–Data must be manually reloaded to partition the data to the new node
• With planning, the process can be fast, with no remapping of data
–Run multiple PostgreSQL instances on each physical server and move the PostgreSQL instances to new hardware as needed
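The difference between these two bullets can be sketched in Python. The sketch assumes rows are assigned by `hash % node_count` (crc32 as a stand-in) and a hypothetical fixed pool of 16 logical PostgreSQL instances. Changing the node count rehashes most rows. With a fixed pool, only whole instances change hosts and no row changes its logical node.

```python
from typing import Dict, List
from zlib import crc32

LOGICAL_NODES = 16  # hypothetical: fixed number of PostgreSQL instances


def node_for_key(key: int, num_nodes: int) -> int:
    # Illustrative stand-in hash, not Stado's.
    return crc32(str(key).encode("utf-8")) % num_nodes


def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Share of rows that land on a different node when the node count changes."""
    moved = sum(node_for_key(k, old_nodes) != node_for_key(k, new_nodes)
                for k in range(num_keys))
    return moved / num_keys


def placement(num_servers: int) -> Dict[int, int]:
    """Spread the fixed logical nodes round-robin over physical servers."""
    return {ln: ln % num_servers for ln in range(LOGICAL_NODES)}


def instances_to_move(old_servers: int, new_servers: int) -> List[int]:
    """Logical nodes whose host changes; their rows keep the same logical node."""
    old, new = placement(old_servers), placement(new_servers)
    return [ln for ln in range(LOGICAL_NODES) if old[ln] != new[ln]]


print(f"{moved_fraction(20_000, 4, 5):.0%} of rows move when going from 4 to 5 nodes")
print(instances_to_move(4, 8))  # [4, 5, 6, 7, 12, 13, 14, 15]
```

For a well-spread hash, going from 4 to 5 nodes should relocate about 80% of rows, because `h % 4 == h % 5` only when `h % 20` is 0 to 3. Doubling from 4 to 8 servers with 16 logical nodes instead moves 8 intact instances.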
Summary
• Stado can improve query performance tremendously
• Stado can scale linearly as more nodes are added
• Stado is open source so if the limitations are an issue, submit a patch
Download Stado at: http://stado.us
Jim Mlodgenski
Email: [email protected]
Twitter: @jim_mlodgenski
NYC PostgreSQL User Group: http://nycpug.org