Bigger data with PostgreSQL 9
DESCRIPTION
Companies today collect more and more data, and they want to ask more and more questions of their data warehouses. The time when a data warehouse only needed to be tuned for read queries, with the ETL/ELT running once a night, is over. With databases growing beyond 500 GB, hundreds and sometimes thousands of inserts, updates and deletes every second, and select queries whose motto is "the more tables we join, the more fun", we faced new challenges in configuring and tuning those databases (and servers). DBA skills alone were not enough; good system engineering skills were needed too to keep the database running smoothly under the heavy workload. We discovered that we needed more than one server. Luckily, PostgreSQL 9 now provides streaming replication. In this talk I will discuss how we took on these challenges and how we set up our backup and replication strategy, all with as little effort as possible by using the right tools for the job.
TRANSCRIPT
Slide 1
© by Numius nv | Open systems, Smarter people
Bigger data with PostgreSQL 9
Datawarehousing in the 21st century.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Slide 2
The presenter
• Bert Desmet
• Consultant @ Deloitte
• System Engineer / DBA for deloitteanalytics.eu
• 'devop'?
Slide 3
Agenda
• Introduction
• Release the elephants!
• Impacting factors
• Divide et impera
• Basic configuration
• Passing the speed limits
• Keep your database fit
Slide 4
Big data?
● 44x data growth between 2009 and 2020!
● About 35.2 zettabytes by 2020
● 80% of data is unstructured
● The volume will grow by a whopping 650% in the next 5 years
● 80% of organisations will use cloud analytics
● By 2014 80% of enterprises will want a SaaS-based BI system
Slide 5
Know your limits
● DB2
● More load
● Scaling
● Speed
● Data size
● Pricing
Slide 6
Release the elephants!
Slide 7
PostgreSQL 9
● Good for big databases
● Easy maintenance
● Scales!
● Very fast
● Extendable
Impacting factors
Slide 9
Highly impacting operations
• Dataload
• In bulk (ETL)
• Row by row. Up to 100k rows / minute
• Datafetch (Reporting)
• We do like joins. The more the better.
Slide 10
Extra problems
• A lot of I/O
• A lot of CPU power (index creation)
• A lot of locks
Slide 11
The solution?
• Use at least 2 servers
• Set up binary replication
• Put a lot of RAM in your servers.
Slide 12
Dataflow
Slide 13
Divide et impera
Slide 14
Replication with postgres
• 8.3 Warm Standby
• 9.0 Async. Binary Replication
• 9.1 Synchronous Replication
• 9.2 Cascading Replication
• 9.3 More improvements towards failover / switching masters
• 9.4 Multimaster Binary Replication?
Slide 15
Configure replication
• wal_level = hot_standby
• checkpoint_segments >= 32
• checkpoint_completion_target >= 0.8
• hot_standby = on
• hot_standby_feedback = on
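The settings above can be checked mechanically before a server is promoted to production. The following is a minimal sketch, not part of the original talk: the function names and the simplified parser are assumptions, and it only understands plain `name = value` lines (it does not follow `include` directives or accept synonyms such as `true` for `on`).

```python
import re

# Expected values and minimums, taken from the slide.
REQUIRED = {
    "wal_level": "hot_standby",
    "hot_standby": "on",
    "hot_standby_feedback": "on",
}
MINIMUMS = {
    "checkpoint_segments": 32,
    "checkpoint_completion_target": 0.8,
}

# name, optional '=', optionally single-quoted value (simplified grammar).
LINE = re.compile(r"^\s*([A-Za-z0-9_.]+)\s*=?\s*'?([^'#\s]*)'?")


def parse_conf(text):
    """Return {name: value} for simple `name = value` lines, ignoring comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = LINE.match(line)
        if match and match.group(2):
            settings[match.group(1).lower()] = match.group(2)
    return settings


def check_replication_settings(text):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    settings = parse_conf(text)
    problems = []
    for name, wanted in REQUIRED.items():
        if settings.get(name) != wanted:
            problems.append(f"{name} should be {wanted}")
    for name, minimum in MINIMUMS.items():
        value = settings.get(name)
        if value is None or float(value) < minimum:
            problems.append(f"{name} should be >= {minimum}")
    return problems
```

Feeding it the contents of postgresql.conf returns the settings that still need attention. Note that `wal_level` matters on the master while `hot_standby` and `hot_standby_feedback` take effect on the standby, so in practice each server may only need part of this list.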
Slide 16
Slide 17
Keep it simple, stupid
• 2ndQuadrant is pretty awesome
• Barman for backups
• repmgr for replication management
Slide 18
Basic configuration
Slide 19
Raise those memory limits!
• shared_buffers = 1/8 to ¼ of RAM
• work_mem = 128MB to 1GB
• maintenance_work_mem = 512MB to 1GB
• temp_buffers = 128MB to 1GB
• effective_cache_size = ¾ of RAM
• wal_buffers = 32MB
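To make the ratios above concrete, here is a small sketch that turns a server's RAM size into postgresql.conf lines. The function names are hypothetical, and for the settings given as a range on the slide it simply picks the low end as a starting point.

```python
def suggest_memory_settings(ram_mb, shared_fraction=0.25):
    """Return memory settings (values in MB) for a server with ram_mb of RAM.

    shared_fraction must stay within the slide's 1/8 to 1/4 range.
    """
    if not 0.125 <= shared_fraction <= 0.25:
        raise ValueError("shared_fraction should be between 1/8 and 1/4")
    return {
        "shared_buffers": int(ram_mb * shared_fraction),
        "effective_cache_size": ram_mb * 3 // 4,
        "work_mem": 128,              # low end of the slide's 128MB to 1GB range
        "maintenance_work_mem": 512,  # low end of 512MB to 1GB
        "temp_buffers": 128,          # low end of 128MB to 1GB
        "wal_buffers": 32,
    }


def render_conf(settings):
    """Format the settings as postgresql.conf lines with an MB unit suffix."""
    return "\n".join(f"{name} = {value}MB" for name, value in settings.items())
```

For a 64 GB server, `render_conf(suggest_memory_settings(65536))` yields `shared_buffers = 16384MB` and `effective_cache_size = 49152MB`. Keep in mind that `work_mem` is allocated per sort or hash operation, so values near 1GB are only safe when few queries run concurrently.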
Slide 20
Tune the planner for correct planning
• random_page_cost = 3
• cpu_tuple_cost = 0.1
• constraint_exclusion = on
• from_collapse_limit >= 12
• join_collapse_limit >= 12
Slide 21
Passing the speed limits
Slide 22
Use partitions
• Think about the partition key!
• Trigger based for row-by-row inserts
• Rule based for bulk inserts
• Make sure you add constraints
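As an illustration of the constraints bullet, the sketch below generates the DDL for one monthly child table using table inheritance, which is how partitioning works in PostgreSQL 9. The CHECK constraint is what lets the planner skip partitions when constraint_exclusion is enabled. The function name and the monthly scheme are assumptions, and table and column names are trusted as-is rather than quoted.

```python
from datetime import date


def monthly_partition_ddl(parent, column, year, month):
    """Build CREATE TABLE for one monthly child of `parent`, bounded by a CHECK on `column`."""
    start = date(year, month, 1)
    # First day of the following month; December rolls over to January.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = f"{parent}_{year:04d}_{month:02d}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )
```

Calling `monthly_partition_ddl("sales", "sold_at", 2013, 12)` produces a `sales_2013_12` table that accepts only December 2013 rows. The half-open range (`>=` start, `<` end) keeps adjacent partitions from overlapping. Routing inserts to the right child still needs the trigger or rule mentioned above.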
Slide 23
Use indexes
• Learn to read query explains
• Use http://explain.depesz.com/
• Don’t over-index
Slide 24
Other sane things to do
• Use unique indexes
• Auto created when defining a primary key
• Use clustered indexes
• And cluster those tables regularly
Slide 25
Use partial indexes
• Can only be found in Postgres and MySQL.
• Really useful on big tables
• Disadvantage: no ‘moving’ indexes, because the index condition must be a constant. E.g. an index for the current day.
Keep your database fit
Slide 27
Vacuum
• Disable autovacuum for datawarehouses
• Vacuum once a day
• Check regularly whether the vacuums actually run!
• Prevents data loss
• Prevents the database from growing out of control, size-wise
Slide 28
Analyze
• Analyze once a day
• Together with vacuum
• VACUUM ANALYZE <schema>.<table>;
• default_statistics_target >= 300
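With autovacuum disabled, the daily run has to be scripted. A minimal sketch, assuming a hand-maintained list of (schema, table) pairs; the function names are hypothetical:

```python
def quote_ident(name):
    """Double-quote an identifier, doubling embedded quotes, as SQL requires."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_analyze_statements(tables):
    """Build one VACUUM ANALYZE statement per (schema, table) pair."""
    return [
        f"VACUUM ANALYZE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]
```

The resulting statements can be written to a file and run from cron with psql. VACUUM cannot run inside a transaction block, so the statements should be executed one by one rather than wrapped in a single transaction.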
Slide 29
Check for bloat!
• Free space on tables.
• Indexes are not optimized anymore
• Use Nagios check_postgres.pl
Slide 30
Prevent bloat
• Vacuum full
• Offline!
• Only when a PK is not available
• Repack
• Online!
• Orders the tables (clustered index)
• Needs a PK on the table
• Reindex
• Reindex regularly.
Slide 31
Partial indexes?
• Write a script
• Use a cronjob
• Recreate your time-aware indexes every day. Will be fast.
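The script in question can be very small. The sketch below builds the two statements needed to rebuild a partial index whose condition starts at a given day; the function name and the index naming scheme are assumptions, and table and column names are trusted as-is.

```python
from datetime import date


def daily_partial_index_sql(table, column, day):
    """Return statements that rebuild a partial index covering rows from `day` onward."""
    index = f"{table}_{column}_recent_idx"
    return [
        f"DROP INDEX IF EXISTS {index};",
        f"CREATE INDEX {index} ON {table} ({column}) "
        f"WHERE {column} >= DATE '{day.isoformat()}';",
    ]
```

A cron job would call this with `date.today()` and send the statements to the database. Because the date is baked into the index as a literal, the planner only uses the index for queries whose own condition implies it, for example a filter on the same or a later date.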
Slide 32
Slide 33
Questions?
• Postgres has an awesome community
• IRC: #postgresql @ freenode
• Check the mailing list
Slide 34