Some Conference
PgConf.EU 2011
Managing Terabytes
Problems and solutions with operating large Postgres installations
Selena Deckelmann, Prime Radiant, @selenamarie
About me.
The Environment
• 1.6 TB, 1 cluster, Version 8.2
• 1.1 TB, 1 cluster, Version 8.3
• 8.4/9.0 Dev systems
• Working toward 9.0 into prod (May 2011)
• pgpool, Redis, RabbitMQ, NFS
Some stats
• daily peak: ~3000 commits per second
• average writes: 4 MBps
• average reads: 8 MBps
What’s good
• Most queries are fast!
• Benchmarks say we’re pushing the limits of the hardware
• Developers love working with Postgres
And lots more. But...
The Problems
1. System resource exhaustion
2. Everything is slow: Huge catalogs, Backups
3. Handling VACUUM problems: Bloat, Transaction wraparound
4. Upgrades: Minor, Major
System Resource Exhaustion
Running out of inodes
Problem: UFS on Solaris
“The only way to add more inodes to a UFS filesystem is:
1. destroy the filesystem and create a new filesystem with a higher inode density
2. enlarge the filesystem - growfs man page”
Running out of inodes
Solution 0: Delete files.
Solution 1: Sharding/bigger filesystem
Solution 2: xfs
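Inode exhaustion is easier to handle when it is seen coming. A minimal monitoring sketch, assuming a Unix-like host where Python's `os.statvfs` is available; the function names and the 90% threshold are illustrative and not from the talk.

```python
import os


def inode_usage(path="/"):
    """Return (used, total, fraction_used) inodes for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems allocate inodes dynamically and report 0 total;
    # avoid dividing by zero in that case.
    fraction = used / total if total else 0.0
    return used, total, fraction


def inode_alert(path="/", threshold=0.90):
    """True when inode usage is at or above the (hypothetical) threshold."""
    _, total, fraction = inode_usage(path)
    return total > 0 and fraction >= threshold
```

Pointing `inode_usage` at the mount that holds the data directory, on a schedule, gives early warning before any of the solutions above become urgent.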
Running out of file descriptors
Problem: Too many open files by the database.
selena@lulu:~ #508 18:43 :) sudo lsof -p 19121 | wc
40 355 4151
Solution: You need a connection pooler.
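The `lsof` check above looks at one backend at a time. A rough sketch for totalling descriptors across many backends, assuming a Linux-style procfs (`/proc/<pid>/fd`); on Solaris the layout and tooling differ, and `lsof` also lists entries that are not descriptors, so the numbers will not match exactly. The function names are hypothetical.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count open file descriptors for pid by listing <proc_root>/<pid>/fd."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def total_open_fds(pids, proc_root="/proc"):
    """Sum descriptors over several pids, skipping ones that exited or are unreadable."""
    total = 0
    for pid in pids:
        try:
            total += count_open_fds(pid, proc_root)
        except (FileNotFoundError, PermissionError):
            continue
    return total
```

Because every client connection is its own backend process with its own descriptors, the total grows with connection count, which is why a pooler helps.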
Running out of file descriptors
Solution: You need a connection pooler.
Recommended:
• pgbouncer (threaded, online upgrade)
• pgpool-II (failover)
Everything is slow.
Huge Catalogs
409,994 tables
Maintenance problem
Minor mistake in parent table definitions:
not null default nextval('important_sequence'::text)
vs
not null default nextval('important_sequence'::regclass)
Huge Catalogs
Problem: Slow scans of catalog data
Solution: Upgrade to Postgres 8.4 or higher
But really: Avoid making a cluster with >400k tables.
Stats collection
9,019,868 total data points for table stats
4,550,770 total data points for index stats
Problem: This is slow to write. (128 MB written every second or so)
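A back-of-the-envelope sketch of what those numbers add up to. The data point counts and the 128 MB figure come from the slide; treating "every second or so" as exactly one write per second, and MB as 2^20 bytes, are assumptions made only for the arithmetic.

```python
TABLE_STAT_POINTS = 9_019_868   # from the slide
INDEX_STAT_POINTS = 4_550_770   # from the slide
STATS_FILE_MB = 128             # size of each stats write, from the slide
WRITES_PER_SECOND = 1           # assumption: "every second or so" taken literally


def total_points():
    """Total table and index data points tracked by the stats collector."""
    return TABLE_STAT_POINTS + INDEX_STAT_POINTS


def bytes_per_point():
    """Average bytes of stats file per data point (assuming MB = 2**20 bytes)."""
    return STATS_FILE_MB * 1024 * 1024 / total_points()


def daily_write_tib():
    """Stats volume written per day, in TiB, under the one-write-per-second assumption."""
    return STATS_FILE_MB * WRITES_PER_SECOND * 86_400 / 1024 / 1024
```

Under these assumptions that is about 13.6 million data points at roughly 10 bytes each, rewritten to the tune of about 10.5 TiB per day.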
Stats collection
9,019,868 total data points for table stats
4,550,770 total data points for index stats
Solution: Move stats file to RAM.
stats_temp_directory (8.4 or higher)
There’s a trivial patch for earlier versions.
Stats collection
9,019,868 total data points for table stats
4,550,770 total data points for index stats
Problem: This is slow to read.
Stats collection
9,019,868 total data points for table stats
4,550,770 total data points for index stats
Solution: Supposedly, this is better in 8.4 and higher (fewer writes per minute).
Still probably not fast.
Backups
pg_dump takes longer and longer...
  backup  |    duration
----------+-----------------
 20091122 | 02:44:36.821475
 20091123 | 02:46:20.003507
 20091124 | 02:47:06.260705
 20091206 | 07:13:04.174964
 20091213 | 05:00:01.082676
 20091220 | 06:24:49.433043
 20091227 | 05:35:20.551477
 20100103 | 07:36:49.651492
 20100110 | 05:55:02.396163
 20100117 | 07:32:33.277559
 20100124 | 06:22:46.522319
 20100131 | 10:48:13.060888
 20100207 | 21:21:47.77618
 20100214 | 14:32:04.638267
 20100221 | 11:34:42.353244
 20100228 | 11:13:02.102345
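A small sketch for tracking that trend automatically. The dates and durations are copied from the listing above; the helper names are made up for illustration.

```python
from datetime import timedelta

# (backup date, pg_dump duration) pairs copied from the slide
BACKUPS = [
    ("20091122", "02:44:36.821475"), ("20091123", "02:46:20.003507"),
    ("20091124", "02:47:06.260705"), ("20091206", "07:13:04.174964"),
    ("20091213", "05:00:01.082676"), ("20091220", "06:24:49.433043"),
    ("20091227", "05:35:20.551477"), ("20100103", "07:36:49.651492"),
    ("20100110", "05:55:02.396163"), ("20100117", "07:32:33.277559"),
    ("20100124", "06:22:46.522319"), ("20100131", "10:48:13.060888"),
    ("20100207", "21:21:47.77618"), ("20100214", "14:32:04.638267"),
    ("20100221", "11:34:42.353244"), ("20100228", "11:13:02.102345"),
]


def parse_duration(text):
    """Parse an HH:MM:SS.ffffff interval string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def slowest(backups):
    """Return the (date, duration) row with the longest duration."""
    return max(backups, key=lambda row: parse_duration(row[1]))


def growth_factor(backups):
    """Ratio of the last backup's duration to the first one's."""
    return parse_duration(backups[-1][1]) / parse_duration(backups[0][1])
```

On this data the slowest run is the one from 20100207, and the most recent dump takes roughly four times as long as the first one listed.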
Backups
Problem: pg_dump is too slow.
Solutions:
• patching pg_dump for SELECT ... LIMIT
• crank down shared_buffers
• Stop using pg_dump for backups
• 64-bit might help
How not to migrate to a 64-bit system
1. Install 32-bit Postgres and libraries on a 64-bit system.
2. Install 64-bit Postgres/libs of the same version.
3. Copy “hot backup” from 32-bit sys over to 64-bit sys.
4. Run pg_dump from 64-bit version on 32-bit Postgres.
A single warm standby is not a backup.
But lots of people use them that way!
Ship WAL from Solaris x86 -> Linux
It did work!
Handling VACUUM problems
Bloat
Problem: Lots of dead tuples in tables.
• Frequent UPDATEs to long tables of log data
• Frequent DELETEs without a VACUUM
• A terabyte of dead tuples
Fixing bloat
Solution: Write custom scripts to clean up
• VACUUM for small things
• CLUSTER for everything else
• Considered TRUNCATE
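The actual cleanup scripts are not shown in the talk. As a rough sketch of the "VACUUM for small things, CLUSTER for everything else" rule, assuming a size cutoff of 1 GiB (an arbitrary illustrative value) and trusted, already-quoted identifiers:

```python
def cleanup_statement(table, index, size_bytes, small_limit=1024 ** 3):
    """Pick a cleanup statement for a bloated table.

    small_limit is a hypothetical cutoff; tune it for the installation.
    table and index are assumed to be trusted, already-quoted identifiers.
    """
    if size_bytes <= small_limit:
        # Small tables: a plain VACUUM is cheap and does not block writers.
        return f"VACUUM ANALYZE {table};"
    # Larger tables: rewrite the table in index order.
    # This syntax is for 8.3 and later; 8.2 uses "CLUSTER index ON table".
    return f"CLUSTER {table} USING {index};"
```

CLUSTER takes an exclusive lock on the table and needs enough free disk space for a full copy while it runs, so a script like this would normally be run during a maintenance window.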
Catalog Bloat
Application allowed users to initiate ALTER TABLE.
Regular VACUUM couldn’t fix it.
VACUUM FULL of the catalog takes 2+ hours.
Use of NOTIFY/LISTEN can also cause bloat.
Transaction wraparound avoidance
Problem: autovacuum set off too frequently
Watch age(datfrozenxid)
Solution: Increase autovacuum_freeze_max_age (default is 200 million, we increase to one billion)
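A sketch of what "watch age(datfrozenxid)" can look like in a monitoring script. The query is standard SQL against `pg_database`; the helper function, its name, and the returned keys are illustrative. The one billion default mirrors the setting mentioned above.

```python
# Query to run against the cluster; returns one row per database.
AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;"

# Transaction IDs are 32-bit and compared circularly, so ages near 2**31 mean wraparound.
WRAPAROUND_LIMIT = 2 ** 31


def freeze_headroom(xid_age, freeze_max_age=1_000_000_000):
    """Summarise how close a database is to forced vacuuming and to wraparound.

    xid_age is the value of age(datfrozenxid) for one database.
    freeze_max_age should match the server's autovacuum_freeze_max_age.
    """
    return {
        "until_forced_autovacuum": max(freeze_max_age - xid_age, 0),
        "until_wraparound": max(WRAPAROUND_LIMIT - xid_age, 0),
        "forced_autovacuum_due": xid_age >= freeze_max_age,
    }
```

Raising the setting buys fewer forced vacuums at the cost of less headroom, which is why the age needs watching once it has been increased.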
Upgrades
Minor upgrades
Problem: Restarting Postgres causes bad application performance.
• Requires a stop/start of the database
• Unexpected CHECKPOINT
• Cold cache
Minor upgrades
Solutions:
• Plan for a CHECKPOINT before shutdown
• Warm the cache (Queries that exercise indexes, maybe table scans)
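One way to write the restart plan down so it is followed the same way every time. This sketch only builds the command lines and does not run them; the data directory and warm-up queries are placeholders, and connection options for `psql` are left out.

```python
def restart_plan(data_dir, warm_queries):
    """Return the ordered commands for a planned restart as argv lists.

    data_dir and warm_queries are placeholders supplied by the caller.
    """
    steps = [
        # Checkpoint first so the checkpoint done at shutdown has little left to flush.
        ["psql", "-c", "CHECKPOINT"],
        # Fast shutdown and start; -w waits for the operation to finish.
        ["pg_ctl", "restart", "-D", data_dir, "-m", "fast", "-w"],
    ]
    # After the restart, run queries that pull hot indexes and tables back into cache.
    steps += [["psql", "-c", query] for query in warm_queries]
    return steps
```

For example, `restart_plan("/path/to/data", ["SELECT count(*) FROM some_hot_table"])` yields three commands that can be reviewed before being handed to `subprocess.run`.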
Major Version upgrades
Problem: Major upgrades are a PITA.
• <=8.2 - no pg_upgrade :(
• Time your restores.
• Document your SLAs.
Major Version upgrades
Solutions: :(
• >=8.3 - pg_upgrade
• Time your restores.
• Document your SLAs.
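"Time your restores" can be as simple as wrapping the restore command in a timer and keeping the numbers. A minimal sketch using only the standard library; the function name is illustrative and the command to time is whatever restore procedure is actually in use.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(argv, check=False)
    return result.returncode, time.monotonic() - start
```

A call such as `time_command(["pg_restore", "-d", "scratch_db", "latest.dump"])` (database and file names are placeholders) gives a figure to compare against the documented SLA.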
Major Version upgrades
Solutions: :(
• Write tools to migrate data
• Shard
• Trigger-based replication
The Problems
1. System resource exhaustion
2. Everything is slow: Huge catalogs, Backups
3. Handling VACUUM problems: Bloat, Transaction wraparound
4. Upgrades: Minor, Major
The Solutions
1. System resource exhaustion
   Choose a better filesystem, Pooling
2. Everything is slow: Huge catalogs, Backups
   Don’t do that, Monitor & Binary backups
3. Handling VACUUM problems: Bloat, Transaction wraparound
   Developer education, Monitoring, Cleanup, *_freeze_max_age
4. Upgrades: Minor, Major
   Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade)
Managing Terabytes
Problems and solutions with operating large Postgres installations
Selena Deckelmann, Prime Radiant, @selenamarie
Thanks!