pg columnstore index · 2020. 8. 20. · accelerating postgresql query performance columnar vs. row...
Post on 04-Oct-2020
PG Columnstore Index (& why you should care)
August 2020
© 2020 Swarm64, Inc.
The PostgreSQL high performance innovators
Developers of Swarm64 Data Accelerator for PostgreSQL
Deep PostgreSQL & hardware-level engineering expertise
Berlin ■ Boston ■ Palo Alto
Agenda
Accelerating PostgreSQL query performance
Columnar vs. Row oriented storage
Columnstore indexing
Q&A
[Chart: TPC-H Query Response Times (seconds, lower is better, timeout at 900s); x-axis: Q1 to Q22; y-axis: 0 to 900; series: Postgres 12, Postgres 12 + Swarm64 DA]
Faster querying = do more with PostgreSQL
Migrating SQL Server and Oracle to PostgreSQL
  o Especially if using SQL Server columnstore index or Oracle in-memory column store
Mixed workloads, large-scale reporting & charting
  o High concurrency, query complexity, data volume
Open source data warehousing – cut DWH costs 90%
  o Alternative to Oracle, SQL Server, Netezza, Redshift
Storage in RDBMS (well, in the wider sense)
Database storage typology: are you OLTP or OLAP?
Columnstore: typical for OLAP
Rowstore: typical for OLTP
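To make the two layouts concrete, here is a minimal Python sketch. The three-row table, its column names, and the helper functions are all made up for illustration; real engines store pages on disk, not Python lists.

```python
# Hypothetical three-row table; names and values are illustrative only.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 5.0},
]

# Columnstore layout: the same data, but each column is kept together.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def fetch_row(row_table, row_id):
    """OLTP-style access: one whole record, which a rowstore keeps together."""
    return next(r for r in row_table if r["id"] == row_id)


def sum_column_rowstore(row_table, name):
    """OLAP-style access on a rowstore: every record is visited in full."""
    return sum(r[name] for r in row_table)


def sum_column_columnstore(col_table, name):
    """OLAP-style access on a columnstore: only the needed column is read."""
    return sum(col_table[name])
```

Both sum functions return the same result; the difference is how much unrelated data has to be touched to get there.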
A row-storage table meets a columnar index at a database.
Index (column storage)
Table (row storage)
Perfectly indexed part: read columnar
Not-yet-indexed rows: read row-wise (if needed)
Columnstore Index (hybrid)
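The hybrid read path can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Swarm64's implementation: the class name `HybridColumnIndex`, the `indexed_rows` watermark, and the append-only table are all hypothetical.

```python
class HybridColumnIndex:
    """Toy columnar copy of a row table; it may lag behind the table."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.indexed_rows = 0  # how many table rows the index already covers

    def refresh(self, table):
        """Copy rows that arrived since the last refresh into column form."""
        for row in table[self.indexed_rows:]:
            for name in self.columns:
                self.columns[name].append(row[name])
        self.indexed_rows = len(table)

    def scan(self, table, name):
        """Yield one column: columnar for the indexed part, row-wise for the rest."""
        yield from self.columns[name]
        for row in table[self.indexed_rows:]:
            yield row[name]


# Usage: the third row is inserted after the last refresh, so it is read row-wise.
table = [{"id": 1, "qty": 7}, {"id": 2, "qty": 9}]
index = HybridColumnIndex(["id", "qty"])
index.refresh(table)
table.append({"id": 3, "qty": 4})
quantities = list(index.scan(table, "qty"))
```

The table stays the single source of truth; the index only decides how much of the scan can take the columnar path.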
Why care about an index? (because it's minimal change, great performance)
Keep your data in its native format
Compression on columns, faster data reads
Adds a decoupling mechanism, keeps single source of truth
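One reason columns compress well is that neighbouring values in a column are often identical. The slides do not say which compression scheme is used, so the run-length encoding below is only an assumed, generic example; `rle_encode` and `rle_decode` are hypothetical helper names.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical column with repeated neighbouring values.
region = ["EU", "EU", "EU", "US", "US", "EU"]
encoded = rle_encode(region)
```

Six stored values shrink to three runs here; fewer bytes on disk means fewer bytes to read.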
What are the downsides? (bit more overhead, here and there)
Writing/reading might be more expensive
I/O advantage reduces the more columns are selected
Index has to be maintained, can be somewhat out of sync
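The shrinking I/O advantage is simple arithmetic. The sketch below assumes uncompressed, fixed-width columns and a made-up table of 16 columns at 8 bytes each; `columnar_read_fraction` is a hypothetical helper, not part of any product.

```python
def columnar_read_fraction(column_widths, selected):
    """Share of the row width a columnar scan reads for the selected columns."""
    total = sum(column_widths.values())
    return sum(column_widths[name] for name in selected) / total


# Assumption: 16 columns, 8 bytes each (illustrative numbers only).
widths = {f"c{i}": 8 for i in range(16)}

four_columns = columnar_read_fraction(widths, ["c0", "c1", "c2", "c3"])
all_columns = columnar_read_fraction(widths, list(widths))
```

Selecting four columns reads a quarter of the bytes a full row scan would; selecting all sixteen reads everything, so the columnar layout no longer saves I/O.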
Wait, doesn't PostgreSQL have this already? (it depends)
Zedstore
  Introduces a different table format
  PG advantage of row-based storage gone?
Fujitsu VCI
  Seems to be available only for Fujitsu Enterprise PostgreSQL
  Hybrid implementation, but targets in-memory
The big picture, as of now
Swarm64 DA Columnstore Index
The Swarm64 DA Columnstore Index: benefits (compared to Swarm64 DA FDW-based acceleration)
Direct I/O: no page cache, more RAM for operations
Make use of WAL replication (again)
Backup & restore just work
autovacuum all the way
Accelerated Postgres for mixed workloads
Better query planning
Columnstore indexing
Faster query execution
~20x faster responses to complex queries
4x more simultaneous databases, users per server
100% drop-in to existing Postgres databases
Swarm64 Data Accelerator (DA) 5.0
Swarm64 DA 5.0: boosting the Postgres engine
• Compressed columnstore indexes
• Smart skipping of irrelevant data
• 10x-100x lower I/O load
Query execution
Data access & optimization
Query planning
• Query rewriting for speed, resource efficiency, & parallelism
• Optimized cost functions for more efficient OLAP & HTAP
• Adaptive resource management system
• Automatic up-to-date statistics
• Faster JOINs
• More parallelism & better workload distribution
• Faster data movement
• Full ACID consistency
Interface: 100% Postgres
Anatomy of a query – faster & more efficient resource utilization
Direct access into compressed column indexes
Smart skipping of irrelevant data sections
3x faster JOINs
2x lower RAM consumption
Adding a “shuffle node” between JOINs and faster data movement
Keep query execution parallel for as long as possible
Scan & filter
Merge (Aggregate, Distinct …)
JOIN 2
SORT & aggregate
JOIN 1
Minutes
Seconds
More parallel threads (scales as you add vCores)
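The slides do not describe how "smart skipping of irrelevant data sections" is implemented. One common way to get this effect is to keep a minimum and maximum per block and skip blocks whose range cannot match the filter. The Python sketch below shows that generic technique under that assumption; `build_blocks`, `scan_range`, and the block size are hypothetical.

```python
def build_blocks(values, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return blocks


def scan_range(blocks, low, high):
    """Return values v with low <= v < high, plus the number of blocks read."""
    matches, blocks_read = [], 0
    for block in blocks:
        # The block's range cannot overlap [low, high): skip it unread.
        if block["max"] < low or block["min"] >= high:
            continue
        blocks_read += 1
        matches.extend(v for v in block["values"] if low <= v < high)
    return matches, blocks_read


# Hypothetical sorted column of 100 values in blocks of 10.
blocks = build_blocks(list(range(100)), 10)
matches, blocks_read = scan_range(blocks, 25, 35)
```

In this example only 2 of the 10 blocks are read. Skipping works best when the column is at least roughly ordered, as a ship date often is.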
Example: TPC-H Query №6
Detail: the inner workings of accelerated TPC-H query 6
Columns: l_discount, l_extendedprice, l_quantity, l_shipdate
Filter:
  l_shipdate >= date '1993-01-01'
  AND l_shipdate < date '1993-01-01' + INTERVAL '1' YEAR
  AND l_discount BETWEEN 0.05 - 0.01 AND 0.05 + 0.01
  AND l_quantity < 24
Aggregate: SUM(l_extendedprice * l_discount)
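The query needs only four columns, which is why it suits a columnar scan. Below is a minimal Python sketch that evaluates the same filter and aggregate over column lists. The four-row `lineitem` data is invented for illustration, and `query6` is a hypothetical function, not how PostgreSQL or Swarm64 DA executes the query.

```python
from datetime import date
from decimal import Decimal

# Hypothetical miniature lineitem table, stored column by column.
lineitem = {
    "l_shipdate": [date(1993, 3, 1), date(1993, 6, 15),
                   date(1994, 1, 1), date(1993, 9, 9)],
    "l_discount": [Decimal("0.05"), Decimal("0.07"),
                   Decimal("0.05"), Decimal("0.04")],
    "l_quantity": [10, 5, 3, 30],
    "l_extendedprice": [Decimal("100.00"), Decimal("200.00"),
                        Decimal("300.00"), Decimal("400.00")],
}


def query6(table):
    """Apply the Q6 predicates and sum l_extendedprice * l_discount."""
    start, end = date(1993, 1, 1), date(1994, 1, 1)  # one-year window
    low = Decimal("0.05") - Decimal("0.01")
    high = Decimal("0.05") + Decimal("0.01")
    total = Decimal("0")
    for shipdate, discount, quantity, price in zip(
        table["l_shipdate"], table["l_discount"],
        table["l_quantity"], table["l_extendedprice"],
    ):
        if start <= shipdate < end and low <= discount <= high and quantity < 24:
            total += price * discount
    return total


revenue = query6(lineitem)
```

Only the first row passes all three predicates: the second fails on discount, the third on ship date, the fourth on quantity.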
Show & Tell
Example: TPC-H schema + query №14
Native PostgreSQL
Query Time: 15.60s
Scan Time: 13.62s
CREATE EXTENSION swarm64da
Query Time: 7.34s
Scan Time: 6.02s
PG + Swarm64 DA extension + columnstore index applied
Query Time: 4.81s
Scan Time: 3.25s
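The three query 14 measurements can be turned into speedup factors with a few lines of Python. The timings are the ones from the slides; the dictionary labels and the `speedup` helper are only illustrative.

```python
# Timings in seconds, copied from the three query 14 slides.
timings = {
    "native": {"query": 15.60, "scan": 13.62},
    "extension": {"query": 7.34, "scan": 6.02},
    "extension + columnstore index": {"query": 4.81, "scan": 3.25},
}


def speedup(baseline, accelerated):
    """How many times faster the accelerated run is than the baseline."""
    return baseline / accelerated


for name, t in timings.items():
    q = speedup(timings["native"]["query"], t["query"])
    s = speedup(timings["native"]["scan"], t["scan"])
    print(f"{name}: query {q:.1f}x, scan {s:.1f}x")
```

Relative to native PostgreSQL, the extension alone gives roughly 2.1x on query time, and adding the columnstore index gives roughly 3.2x on query time and 4.2x on scan time.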
20x faster TPC-H
PG 12.3 as basis, SF1000
Swarm64 finishes all 22 queries in 80 secs or less
Commodity 2U server, 144 vCores, SSD array
[Chart: TPC-H Query Response Times (seconds, lower is better, timeout at 900s); x-axis: Q1 to Q22; y-axis: 0 to 900; series: Postgres 12, Postgres 12 + Swarm64 DA]
A complete benchmark
Increased resource efficiency for mixed workloads
[Chart: scale 0% to 100%; categories: CPU Cycles, RAM, I/O, Concurrent Users or Queries; series: Postgres 12, Postgres 12 with Swarm64 DA 5.0]
Better mixed workload density: more queries per hour/concurrent users on the same hardware
4x increase
Getting Started with Swarm64
Pricing & availability
Price: Swarm64 DA $33 / vCore / month
PostgreSQL compatibility: PostgreSQL 11 & up, EnterpriseDB EPAS
Platforms: Linux – on premises or cloud (any)
Proven acceleration timeline
Proof of concept
• 2 weeks
• Show and prove performance gains
• Validate system requirements & costs
Design/Plan
• Project plan & timeline
• Little-to-no code or SQL changes
Build
• Provision system
• Migrate data
• Testing
Deploy
• Run anywhere
• Compatible with entire Postgres tools ecosystem
• Scale elastically
• Apply new acceleration upgrades over time
Try it for free...
Works with free PostgreSQL (v. 11 +), EDB Postgres
Run it in your data center
Run it on the cloud
• Start a Swarm64-accelerated PG instance on AWS in 5 minutes
• Runs on all the other clouds too
Thank you!
andy@swarm64.com
sebastian@swarm64.com