DESCRIPTION

Matt Smiley

This is a basic primer aimed primarily at developers and DBAs new to Postgres. The format is a Q/A-style tour with examples, based on common questions and pitfalls. It begins with a quick tour of the relevant parts of the Postgres catalog, aiming to answer simple but important questions like: How many rows does the optimizer think my table has? When was it last analyzed? Which other tables also have a column named "foo"? How often is this index used?

TRANSCRIPT

Basic Query Tuning Primer

Pg West 2009

2009-10-17

Basic tool kit for query tuning

For Diagnosis:

EXPLAIN and EXPLAIN ANALYZE

Query pg_class

Query pg_stats

SET enable_[nestloop|hashagg|...] = [on|off]

SET [random_page|...]_cost = [n]

RESET all

Postgres server logs. May need to adjust log_* GUCs such as: log_temp_files, log_lock_waits, log_min_duration_statement, log_checkpoints, log_statement_stats, etc. (see the sketch after this list)

auto_explain (new contrib module in Pg 8.4)

Occasionally helpful: gdb, dtrace, sar, iostat -x, etc.
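As a starting point for log-based diagnosis, a minimal postgresql.conf sketch; the threshold values are illustrative, not recommendations:

# Log any statement running longer than 250 ms.
log_min_duration_statement = 250
# Log every temp file created (threshold in kB; 0 = log all).
log_temp_files = 0
# Log when a session waits longer than deadlock_timeout for a lock.
log_lock_waits = on
# Log checkpoint activity.
log_checkpoints = on

All four can be applied with a reload (SIGHUP); no server restart is needed.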

For Remedy:

ANALYZE [table]

ALTER TABLE [table] ALTER COLUMN [column] SET STATISTICS [n]

SET default_statistics_target = [n], work_mem = [n]

CREATE INDEX

Reorganize the SQL (e.g. filter large tables directly, avoid joins, refactor to temp tables or WITH clause, etc.)

Rarely appropriate: SET [gucs], table-level autovac storage-params, denormalize columns, partition large table
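Tying the diagnosis and remedy tools together, a minimal tuning-loop sketch (my_tab1/my_tab2 are the toy tables used later in this deck):

pg841=# explain analyze select count(*) from my_tab1 inner join my_tab2 using (key) ;  -- baseline
pg841=# set enable_nestloop = off ;  -- temporarily discourage one join method
pg841=# explain analyze select count(*) from my_tab1 inner join my_tab2 using (key) ;  -- alternative plan
pg841=# reset all ;  -- always restore session defaults

If the alternative plan wins, don't ship the enable_* override; fix the underlying cause (ANALYZE, SET STATISTICS, a new index, or rewritten SQL) so the optimizer picks the better plan on its own.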

System tuning is not query tuning

Relevance: System tuning affects all queries, but it optimizes for an aggregate workload, not an individual query.

The overall performance of a system is the product of many factors, including but not limited to:

Hardware platform: cpu speed/count, memory speed/amount, disk type/count, raid layout, controller model

System settings: kernel cache page size, tcp buffer size, dirty page flush policy, io scheduler, etc.

Filesystem settings: type, readahead, direct/buffered io, extent size, journaling policy, internal/external journal, atime/mtime maint., existence of snapshots, etc.

Postgres settings: version, GUCS (default_statistics_target, shared_buffers, temp_buffers, work_mem, wal_buffers, *_cost, enable_*, effective_cache_size, fsync), WAL on separate disks, etc.

Total workload: cumulative effect of all tasks running at a particular point in time, all of which compete for resources (e.g. cpu, memory, cache, disk I/O, locks).

Accumulated state: contents of caches (Pg, kernel, controller), physical file fragmentation on disk, optimizer statistics (specific to cluster; not portable via pg_dump)

What is an execution plan?

To see a query's execution plan, use the EXPLAIN command before the query.

To also run the query and capture how much time was spent in each step of the plan, use EXPLAIN ANALYZE.

Original SQL:

pg841=# select count(*) from orders where created_ts >= '2009-09-01'::timestamp ;
 count
-------
 65472
(1 row)

Execution plan with estimated costs and row-counts:

pg841=# explain select count(*) from orders where created_ts >= '2009-09-01'::timestamp ;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=2100.85..2100.86 rows=1 width=0)
   ->  Seq Scan on orders  (cost=0.00..1937.00 rows=65538 width=0)
         Filter: (created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
(3 rows)

Execution plan with actual runtimes and row-counts:

pg841=# explain analyze select count(*) from orders where created_ts >= '2009-09-01'::timestamp ;
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2100.85..2100.86 rows=1 width=0) (actual time=251.210..251.212 rows=1 loops=1)
   ->  Seq Scan on orders  (cost=0.00..1937.00 rows=65538 width=0) (actual time=0.018..140.495 rows=65472 loops=1)
         Filter: (created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 251.269 ms
(4 rows)

Each step in the execution plan is a node in a tree hierarchy. The leftmost (top) node pulls data from its children, which do the same for their children.

Nodes with no children start working first, since they gather the data from tables or indexes to be processed by the rest of the plan nodes.

RunOrder  QUERY PLAN
-------   ---------------------------------------------------------------------------------------------------------------
4th       HashAggregate  (cost=1119.28..1120.75 rows=98 width=12)
3rd         ->  Nested Loop  (cost=0.00..1118.79 rows=98 width=12)
1st               ->  Index Scan using idx_order_details_product_id on order_details  (cost=0.00..397.99 rows=98 width=8)
                        Index Cond: (product_id = 1000)
2nd               ->  Index Scan using orders_pkey on orders  (cost=0.00..7.34 rows=1 width=12)
                        Index Cond: (orders.order_id = order_details.order_id)
                        Filter: (orders.created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
(7 rows)

Reading an execution plan:
In what order do the nodes run?


pg841=# explain
select
    orders.created_ts::date as day,
    sum(order_details.qty) as num_products_ordered
from
    orders
    inner join order_details using (order_id)
where
    orders.created_ts >= '2009-09-01'::timestamp
    and order_details.product_id = 1000
group by
    orders.created_ts::date ;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1119.28..1120.75 rows=98 width=12)
   ->  Nested Loop  (cost=0.00..1118.79 rows=98 width=12)
         ->  Index Scan using idx_order_details_product_id on order_details  (cost=0.00..397.99 rows=98 width=8)
               Index Cond: (product_id = 1000)
         ->  Index Scan using orders_pkey on orders  (cost=0.00..7.34 rows=1 width=12)
               Index Cond: (orders.order_id = order_details.order_id)
               Filter: (orders.created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
(7 rows)

The two numbers for cost and actual time represent when the 1st and last rows will be output by that node.

Different kinds of nodes have different amounts of lag between when they receive their first input and when they produce their first output. This startup cost is implied by the difference between the node's first cost value and that of its slowest-starting child.

Here, the Hash node has a high startup cost: it cannot feed data to its parent (Hash Join) until it has received and processed all of the data from its child Seq Scan node. So the Seq Scan node's final cost (1937.00) becomes the Hash node's initial cost (1937.00). In practice, the child's actual completion time (151.703) is a lower bound for the Hash node's first-output time (288.962).

                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3013.22..45974.93 rows=651270 width=12) (actual time=289.152..6297.282 rows=654720 loops=1)
   Hash Cond: (order_details.order_id = orders.order_id)
   ->  Seq Scan on order_details  (cost=0.00..14902.00 rows=1000000 width=12) (actual time=0.037..2026.802 rows=1000000 loops=1)
   ->  Hash  (cost=1937.00..1937.00 rows=65538 width=8) (actual time=288.962..288.962 rows=65472 loops=1)
         ->  Seq Scan on orders  (cost=0.00..1937.00 rows=65538 width=8) (actual time=0.014..151.703 rows=65472 loops=1)
               Filter: (created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 7349.662 ms
(7 rows)

Reading an execution plan:
What do the numbers mean for "cost" and "actual time"?

pg841=# set enable_indexscan = off ;
SET

pg841=# explain analyze
select
    orders.cust_id,
    order_details.product_id,
    order_details.qty
from
    orders
    inner join order_details using (order_id)
where
    orders.created_ts >= '2009-09-01'::timestamp ;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=3013.22..45974.93 rows=651270 width=12) (actual time=289.152..6297.282 rows=654720 loops=1)
   Hash Cond: (order_details.order_id = orders.order_id)
   ->  Seq Scan on order_details  (cost=0.00..14902.00 rows=1000000 width=12) (actual time=0.037..2026.802 rows=1000000 loops=1)
   ->  Hash  (cost=1937.00..1937.00 rows=65538 width=8) (actual time=288.962..288.962 rows=65472 loops=1)
         ->  Seq Scan on orders  (cost=0.00..1937.00 rows=65538 width=8) (actual time=0.014..151.703 rows=65472 loops=1)
               Filter: (created_ts >= '2009-09-01 00:00:00'::timestamp without time zone)
 Total runtime: 7349.662 ms
(7 rows)

Access methods are used by leaf nodes to pull data from tables/indexes.

Join methods specify which type of algorithm will be used to implement each of the query's joins.

                                               QUERY PLAN
------------------------------------------------------------------------------------------------------
 Sort  (cost=248.54..248.79 rows=99 width=8)
   Sort Key: o.order_id
   ->  Nested Loop  (cost=4.34..245.26 rows=99 width=8)
         ->  Nested Loop  (cost=4.34..234.94 rows=10 width=4)
               ->  Seq Scan on customers c  (cost=0.00..194.00 rows=1 width=4)
                     Filter: (name = 'JOHN DOE'::text)
               ->  Bitmap Heap Scan on orders o  (cost=4.34..40.82 rows=10 width=8)
                     Recheck Cond: (o.cust_id = c.cust_id)
                     ->  Bitmap Index Scan on idx_orders_cust_id  (cost=0.00..4.34 rows=10 width=0)
                           Index Cond: (o.cust_id = c.cust_id)
         ->  Index Scan using order_details_pk on order_details od  (cost=0.00..0.91 rows=10 width=8)
               Index Cond: (od.order_id = o.order_id)
(12 rows)

                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=18850.37..18850.38 rows=1 width=8)
   ->  Hash Join  (cost=53.18..18801.97 rows=9679 width=8)
         Hash Cond: (od.order_id = o.order_id)
         ->  Seq Scan on order_details od  (cost=0.00..14902.00 rows=1000000 width=8)
         ->  Hash  (cost=41.00..41.00 rows=974 width=4)
               ->  Index Scan using pidx_orders_order_id_not_shipped on orders o  (cost=0.00..41.00 rows=974 width=4)
                     Filter: is_paid
(7 rows)

Reading an execution plan:
What are access methods and join methods?

pg841=# explain
select
    o.order_id as order_id,
    od.qty as qty_ordered
from
    customers c
    inner join orders o on o.cust_id = c.cust_id
    inner join order_details od on od.order_id = o.order_id
where
    c.name = 'JOHN DOE'
order by
    o.order_id ;

pg841=# explain
select
    count(distinct o.order_id) as num_orders,
    sum(od.qty) as num_products
from
    orders o
    inner join order_details od on od.order_id = o.order_id
where
    o.shipped_ts is null
    and o.is_paid ;

Create and ANALYZE a small toy table.

pg841=# create temp table my_tab1 as select * from orders limit 100 ;
SELECT
pg841=# analyze my_tab1 ;
ANALYZE

EXPLAIN a simple query against it.

pg841=# explain select * from my_tab1 where created_ts > '2009-10-14 08:14:26'::timestamp - '1 day'::interval ;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Seq Scan on my_tab1  (cost=0.00..2.25 rows=17 width=25)
   Filter: (created_ts > '2009-10-13 08:14:26'::timestamp without time zone)
(2 rows)

The planner was able to combine the filter's two literals. The table is accessed by SeqScan, since there are no indexes yet on this temp table. Add an index, force the planner not to SeqScan, and compare the same query's cost estimates for IndexScan versus SeqScan.

pg841=# create index idx_my_tab1_test on my_tab1 ( created_ts ) ;
CREATE INDEX
pg841=# set enable_seqscan = off ;
SET
pg841=# explain select * from my_tab1 where created_ts > '2009-10-14 08:14:26'::timestamp - '1 day'::interval ;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Index Scan using idx_my_tab1_test on my_tab1  (cost=0.00..8.55 rows=17 width=25)
   Index Cond: (created_ts > '2009-10-13 08:14:26'::timestamp without time zone)
(2 rows)
pg841=# reset all ;
RESET

The optimizer prefers (assigns a lower cost to) a SeqScan of this table because it is so tiny (1 block). An IndexScan would require reading a 2nd block (the index itself) and doing extra comparisons.

Play with small toy queries.

Notes

Run your query with EXPLAIN (or, if practical, EXPLAIN ANALYZE), and look for nodes where the cost, row-count, or actual time significantly increases compared to the node's children.

In this example, the SQL is missing its join criteria. The estimated cost and row-count skyrocket in the Nested Loop node, because it is returning the cross-product of all rows from both its input nodes.

pg841=# explain
pg841-# select
pg841-#     customers.cust_id as customer_id,
pg841-#     max(customers.name) as customer_name,
pg841-#     count(distinct orders.order_id) as num_orders,
pg841-#     max(orders.shipped_ts) as latest_shipment_datetime
pg841-# from orders, customers, products /* THIS JOIN TO products IS SPURIOUS */
pg841-# where
pg841-#     orders.cust_id = customers.cust_id
pg841-#     and orders.created_ts >= now() - '30 days'::interval
pg841-# group by
pg841-#     customers.cust_id
pg841-# order by num_orders desc
pg841-# limit 10
pg841-# ;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15307415.27..15307415.29 rows=10 width=29)
   ->  Sort  (cost=15307415.27..15307440.27 rows=10000 width=29)
         Sort Key: (count(DISTINCT orders.order_id))
         ->  GroupAggregate  (cost=185.03..15307199.17 rows=10000 width=29)
               ->  Nested Loop  (cost=185.03..10207124.17 rows=509990000 width=29)
                     ->  Merge Join  (cost=0.03..7139.17 rows=50999 width=29)
                           Merge Cond: (customers.cust_id = orders.cust_id)
                           ->  Index Scan using customers_pkey on customers  (cost=0.00..318.48 rows=10000 width=17)
                           ->  Index Scan using idx_orders_cust_id on orders  (cost=0.00..6158.23 rows=50999 width=16)
                                 Filter: (orders.created_ts >= (now() - '30 days'::interval))
                     ->  Materialize  (cost=185.00..285.00 rows=10000 width=0)
                           ->  Seq Scan on products  (cost=0.00..175.00 rows=10000 width=0)
(12 rows)

What does a "bad plan" look like? Does it imply possible tune-ups?

Notes

Rerun the modified query with EXPLAIN to confirm that it looks better. The final cost estimate is much lower, and there are no more huge jumps in cost from one step to the next.

pg841=# explain
pg841-# select
pg841-#     customers.cust_id as customer_id,
pg841-#     max(customers.name) as customer_name,
pg841-#     count(distinct orders.order_id) as num_orders,
pg841-#     max(orders.shipped_ts) as latest_shipment_datetime
pg841-# from orders, customers
pg841-# where
pg841-#     orders.cust_id = customers.cust_id
pg841-#     and orders.created_ts >= now() - '30 days'::interval
pg841-# group by
pg841-#     customers.cust_id
pg841-# order by num_orders desc
pg841-# limit 10
pg841-# ;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=10243.47..10243.50 rows=10 width=29)
   ->  Sort  (cost=10243.47..10268.47 rows=10000 width=29)
         Sort Key: (count(DISTINCT orders.order_id))
         ->  GroupAggregate  (cost=9214.91..10027.37 rows=10000 width=29)
               ->  Sort  (cost=9214.91..9342.40 rows=50997 width=29)
                     Sort Key: customers.cust_id
                     ->  Hash Join  (cost=294.00..4005.92 rows=50997 width=29)
                           Hash Cond: (orders.cust_id = customers.cust_id)
                           ->  Seq Scan on orders  (cost=0.00..2437.00 rows=50997 width=16)
                                 Filter: (created_ts >= (now() - '30 days'::interval))
                           ->  Hash  (cost=169.00..169.00 rows=10000 width=17)
                                 ->  Seq Scan on customers  (cost=0.00..169.00 rows=10000 width=17)
(12 rows)

After previewing the new plan with EXPLAIN, run EXPLAIN ANALYZE to confirm runtime is better.

Retest after rewriting the query to remove the spurious join...

Notes

Here's a trivial query joining 2 tables.

pg841=# explain select count(*) from my_tab1 inner join my_tab2 using (key) ;
                                       QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=42.26..42.27 rows=1 width=0)
   ->  Nested Loop  (cost=0.00..42.01 rows=100 width=0)
         ->  Seq Scan on my_tab1  (cost=0.00..2.00 rows=100 width=4)
         ->  Index Scan using idx_my_tab2 on my_tab2  (cost=0.00..0.39 rows=1 width=4)
               Index Cond: (my_tab2.key = my_tab1.key)
(5 rows)

Notice that the Nested Loop node does not specify any join conditions, even though the SQL does. The Nested Loop is effectively cross-joining its 2 inputs. Why would it do that, when the SQL says to join on column key?

Because the join condition has been pushed down into child #2's Index Cond. So for each row from child #1 (Seq Scan), the parent (Nested Loop) calls child #2 (Index Scan), passing it info from child #1's row. If this session forbids the use of indexes, the join filter won't be pushed down.

pg841=# set enable_indexscan = off ;
SET
pg841=# set enable_bitmapscan = off ;
SET

pg841=# explain select count(*) from my_tab1 inner join my_tab2 using (key) ;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=229.35..229.36 rows=1 width=0)
   ->  Nested Loop  (cost=2.10..229.10 rows=100 width=0)
         Join Filter: (my_tab1.key = my_tab2.key)
         ->  Seq Scan on my_tab1  (cost=0.00..2.00 rows=100 width=4)
         ->  Materialize  (cost=2.10..3.10 rows=100 width=4)
               ->  Seq Scan on my_tab2  (cost=0.00..2.00 rows=100 width=4)
(6 rows)

pg841=# reset all ;
RESET

Join Conditions can sometimes be implemented by a child node.

# Setup:

pg841=# create temp table my_tab1 as select * from generate_series(1, 100) s(key) ;
SELECT
pg841=# create temp table my_tab2 as select * from generate_series(1, 100) s(key) ;
SELECT
pg841=# create index idx_my_tab1 on my_tab1 ( key ) ;
CREATE INDEX
pg841=# create index idx_my_tab2 on my_tab2 ( key ) ;
CREATE INDEX
pg841=# analyze my_tab1 ;
ANALYZE
pg841=# analyze my_tab2 ;
ANALYZE
pg841=# set enable_hashjoin = off ;
SET
pg841=# set enable_mergejoin = off ;
SET

Stale or missing statistics:
When was my table last analyzed?

Review the last time when each table was analyzed (either manually or by autovacuum). Use your knowledge of how and when your data changes to decide if the statistics are likely to be out of date.

pg841=# select
pg841-#     schemaname,
pg841-#     relname,
pg841-#     last_analyze as last_manual,
pg841-#     last_autoanalyze as last_auto,
pg841-#     greatest(last_analyze, last_autoanalyze)
pg841-# from
pg841-#     pg_stat_user_tables
pg841-# where
pg841-#     relname = 'my_tab'
pg841-# ;

-[ RECORD 1 ]------------------------------
schemaname  | public
relname     | my_tab
last_manual | 2009-10-03 18:45:57.627593-07
last_auto   | 2009-10-03 23:08:32.914092-07
greatest    | 2009-10-03 23:08:32.914092-07

Based on your knowledge of your application, find tables you think may have changed significantly since the optimizer statistics were last collected.

Autovacuum handles much of this, but it is percentage-based (by default, a table is re-analyzed after more than 10% of its rows have changed since the last analyze). Some data distributions need fresh statistics at regular intervals (e.g. an order_date column whose upper bound increases daily).
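For a table that needs more frequent automatic analyzes, Pg 8.4's table-level autovacuum storage parameters (mentioned in the toolkit slide) can lower the threshold; a sketch, with an illustrative 2% scale factor:

pg841=# alter table orders set ( autovacuum_analyze_scale_factor = 0.02 ) ;
ALTER TABLE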

Tip: psql's \x metacommand toggles between column-oriented output and row-oriented output. Handy for ad-hoc queries that return few rows with wide/many columns.

Stale or missing statistics:
How many rows does the optimizer think my table has?
How are my columns' values distributed?

Many bad plans are due to poor cardinality estimates for one or more nodes. Sometimes this is due to stale or missing statistics. For example, if a column was added or a significant percentage of rows were inserted, deleted, or modified, then the optimizer statistics should be refreshed.

You can view the table-level optimizer statistics in pg_class:

pg841=# select reltuples, relpages from pg_class where relname = 'my_tab' ;
 reltuples | relpages
-----------+----------
      1000 |        5
(1 row)

And the more detailed column-level optimizer statistics are shown in pg_stats:

pg841=# select * from pg_stats where tablename = 'my_tab' and attname = 'bar' ;
-[ RECORD 1 ]-----+---------------------------
schemaname        | public
tablename         | my_tab
attname           | bar
null_frac         | 0
avg_width         | 4
n_distinct        | 13
most_common_vals  | {1,2}
most_common_freqs | {0.707,0.207}
histogram_bounds  | {0,3,4,5,6,7,8,9,10,11,12}
correlation       | 0.876659

Compare actual to estimated rows in table.

Cardinality = total input rows * filter selectivity

Total input rows is the easiest part to check. Filter selectivity is a topic for later.

The easiest way to identify the selectivity of each filter is to run EXPLAIN with only that one filter in the WHERE clause.

Combine filters' selectivity using the basic probability rules for independent conditions:

P(a and b) = P(a) * P(b)
P(a or b) = P(a) + P(b) - P(a and b)

Optimizer has no notion of joint variation (e.g. city=Seattle and state=WA).
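A worked example under those independence rules (the numbers are hypothetical): suppose my_tab has 1,000,000 rows, EXPLAIN with only foo = 10 estimates 100,000 rows (selectivity 0.1), and EXPLAIN with only bar = 10 estimates 50,000 rows (selectivity 0.05). Then:

selectivity(foo = 10 and bar = 10) = 0.1 * 0.05 = 0.005
estimated rows = 1,000,000 * 0.005 = 5,000

If foo and bar are actually correlated (like city and state), the true row-count can be far higher than 5,000, and a plan built on that estimate may perform badly.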

A new index is usually helpful if it greatly reduces the number of rows the query must visit.

This table has an index for all possible combinations of its columns.

pg841=# \d my_tab
     Table "public.my_tab"
 Column |  Type   | Modifiers
--------+---------+-----------
 foo    | integer |
 bar    | integer |
Indexes:
    "idx_my_tab_bar" btree (bar)
    "idx_my_tab_bar_foo" btree (bar, foo)
    "idx_my_tab_foo" btree (foo)
    "idx_my_tab_foo_bar" btree (foo, bar)

At least 1 is unnecessary, and up to 2 could be dropped without forcing any query to use a Seq Scan.

pg841=# explain select * from my_tab where foo = 10 and bar = 10 ;
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Index Scan using idx_my_tab_bar_foo on my_tab  (cost=0.00..8.27 rows=1 width=8)
   Index Cond: ((bar = 10) AND (foo = 10))
(2 rows)

pg841=# drop index idx_my_tab_bar_foo ;
DROP INDEX
pg841=# drop index idx_my_tab_foo_bar ;
DROP INDEX

pg841=# explain select * from my_tab where foo = 10 and bar = 10 ;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Index Scan using idx_my_tab_foo on my_tab  (cost=0.00..8.27 rows=1 width=8)
   Index Cond: (foo = 10)
   Filter: (bar = 10)
(3 rows)

When will adding a new index improve query performance?

Notes

An incomplete/missing join in the SQL leads to poor performance and usually incorrect query results. When in doubt, compare your foreign key constraints to the query's joined columns.

Complex plans are difficult to troubleshoot and are more susceptible to mis-optimization. A series of 5 short queries is usually easier to correctly maintain and tune than 1 large query with 10 joins.

Filtering a large table primarily by joining it to small filtered tables is much less efficient than directly filtering columns on the large table itself. Consider either: denormalizing the filterable columns onto the large table, or

separately querying the small filterable tables to fetch the join-key values which can then be used in the main query as filters directly applied to the large table.

Column statistics quickly grow stale for columns with an ever-increasing upper bound. Regularly ANALYZE such columns, and consider increasing their SET STATISTICS limit (see the sketch after this list).

The same SQL can use different plans when run in dissimilar sessions or with literal values substituted for bind variables. When diagnosing a slow query, be sure you're looking at the same execution plan the application used. If your application code sets GUCs in its session or uses bind variables, do the same in your psql session.

Large discrepancies in estimated vs. actual row-counts imply an estimation error. Check the child nodes for similarly inaccurate row-counts, and either adjust statistics or rewrite the query.

Lock contention erratically affects performance and may lead to deadlock. Enable log_lock_waits (Pg >= 8.4), and check server logs. Monitor pg_stat_activity.waiting.

Avoid unnecessary risks. When tuning anything in production, always have a rollback plan, and tell any coworkers who may encounter side-effects.
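The statistics tune-up mentioned above for ever-growing columns, as a sketch (the table, column, and target value of 500 are illustrative):

pg841=# alter table orders alter column created_ts set statistics 500 ;
ALTER TABLE
pg841=# analyze orders ( created_ts ) ;
ANALYZE

A larger statistics target keeps more histogram buckets for the column, and a per-column ANALYZE is cheap enough to schedule frequently.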

Some common pitfalls and how to avoid them

Examples of avoiding filtering by joining:

Original:

select * from big_tab join tiny_tab using (key) where tiny_tab.foo = 'bar' ;

Option A: Denormalize

alter table big_tab add column foo text ;
update big_tab set foo = tiny_tab.foo from tiny_tab where big_tab.key = tiny_tab.key ;
analyze big_tab ( foo ) ;
select * from big_tab join tiny_tab using (key) where big_tab.foo = 'bar' ;

Option B: Refactor

select key from tiny_tab where foo = 'bar' ;  -- Returns 10, 12, 14
select * from big_tab join tiny_tab using (key) where big_tab.key in (10, 12, 14) ;

Thanks for coming!

Questions?

Notes

SUPPLEMENTARY
MATERIAL

Which other tables also have a column named "cust_id"?

Relevance: Find join candidates, missing foreign key constraints, or otherwise related tables in a large/complex schema.

pg841=# select
pg841-#     attrelid::regclass as table_name,
pg841-#     attname as column_name
pg841-# from pg_attribute
pg841-# where attname ~ 'cust_id'
pg841-# ;
       table_name       | column_name
------------------------+-------------
 customers              | cust_id
 other_namespace.orders | cust_id
(2 rows)

Using a regexp for column name matching can be helpful in finding undocumented foreign keys. Some tables may have multiple columns referencing the same FK (e.g. billing_addr_id and shipping_addr_id could both refer to addresses.addr_id).
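A variant of the query above for that scenario, as a sketch (the 'addr_id' pattern is illustrative; the extra predicates just exclude system columns and dropped columns):

pg841=# select
pg841-#     attrelid::regclass as table_name,
pg841-#     attname as column_name
pg841-# from pg_attribute
pg841-# where attname ~ 'addr_id'
pg841-#   and attnum > 0
pg841-#   and not attisdropped
pg841-# ;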

How often is this index used?

Relevance: Unused/obsolete indexes may be dropped to save disk/cache space and speed up DML.

3 measures of how used an index is:

idx_scan = How many times was this index scanned?

idx_tup_read = How many rows was this index used to read from the table?

idx_tup_fetch = How many of those table rows were valid "live" rows?

pg841=# select
pg841-#     indexrelname,
pg841-#     idx_scan,
pg841-#     idx_tup_read,
pg841-#     idx_tup_fetch
pg841-# from pg_stat_user_indexes
pg841-# where relname = 'my_tab'
pg841-# order by indexrelname
pg841-# ;
    indexrelname    | idx_scan | idx_tup_read | idx_tup_fetch
--------------------+----------+--------------+---------------
 idx_my_tab_bar     |        2 |           20 |            17
 idx_my_tab_foo     |        1 |            3 |             3
 idx_my_tab_foo_bar |        1 |            1 |             1
(3 rows)

Caveats:

idx_scan increments for each iteration of the outer loop when the index is the inner row-source of a nested loop.

idx_tup_fetch is not incremented by a BitmapIndexScan, only by a normal IndexScan.

How often is this index used? (cont.)

Points to remember:

Counters increase monotonically since the last time a superuser ran pg_stat_reset().

If an index was used more often in the past than recently, its counter may still be high.

Newly created indexes have had less time to accumulate hits.

An index may exist to support a rare but otherwise slow operation (e.g. FK check during delete from referenced table).

Before dropping an index consider:

Any queries using the dropped index would try to fall back to another index with columns relevant to the query's filters or sorting requirements. If the resulting filtering selectivity is similar, performance will probably be similar.

If no other index can be used, the query will instead SeqScan the table. This may be more or less efficient, depending on index selectivity, caching of table/index blocks, and relative speed of sequential vs. scattered I/O for uncached blocks.
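To shortlist drop candidates, a sketch combining the usage counters with index size (pg_relation_size/pg_size_pretty are standard functions; keep the caveats above about reset counters and young indexes in mind):

pg841=# select
pg841-#     schemaname,
pg841-#     relname,
pg841-#     indexrelname,
pg841-#     idx_scan,
pg841-#     pg_size_pretty(pg_relation_size(indexrelid)) as index_size
pg841-# from pg_stat_user_indexes
pg841-# where idx_scan = 0
pg841-# order by pg_relation_size(indexrelid) desc
pg841-# ;

Also remember that indexes backing primary-key or unique constraints may show idx_scan = 0 yet still be required for correctness.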

What functions/aggregates are available?

Relevance: Know what your database can do for you.

In Pg 8.3 and earlier:

Normal functions are shown by \df. It includes built-in functions (from pg_catalog) as well as user-defined functions (in your search_path, e.g. public):

pg838=# \df
                        List of functions
   Schema   |  Name   | Result data type | Argument data types
------------+---------+------------------+---------------------
 pg_catalog | abs     | bigint           | bigint
...
 public     | my_func | integer          | integer
...
(1853 rows)

Aggregate functions are similarly shown by \da.

pg838=# \da
                         List of aggregate functions
   Schema   | Name | Result data type | Argument data types | Description
------------+------+------------------+---------------------+-------------
 pg_catalog | avg  | double precision | real                |
...
(117 rows)

What functions/aggregates are available? (cont.)

In Pg 8.4:

The \df meta-command now:

omits the built-in functions by default, listing only user-defined functions

includes aggregates, in addition to normal functions

pg841=# \df
                          List of functions
 Schema |  Name   | Result data type | Argument data types |  Type
--------+---------+------------------+---------------------+--------
 public | my_aggr | anyarray         | anyelement          | agg
 public | my_func | integer          | integer             | normal
(2 rows)

To list the built-in functions/aggregates in Pg 8.4, use either of these:

pg841=# \dfS
pg841=# \df pg_catalog.*
...
(2208 rows)

What functions/aggregates are available? (cont.)

Wild cards (*?) are allowed, as with most psql metacommands:

pg841=# \dfS bool*
                              List of functions
   Schema   |   Name   | Result data type | Argument data types |  Type
------------+----------+------------------+---------------------+--------
 pg_catalog | bool     | boolean          | integer             | normal
 pg_catalog | bool_and | boolean          | boolean             | agg
...
(16 rows)

For more options, see psql's built-in help:

pg841=# \?
...
Informational
  (options: S = show system objects, + = additional detail)
...
  \df[antw][S+] [PATRN]  list [only agg/normal/trigger/window] functions
...

How is function my_func implemented?

Relevance: If a function-call is a query's bottleneck, seeing its code may suggest a tune-up.

View the source code for your user-defined functions written in an interpreted language (e.g. SQL, plpgsql, plperl, etc.):

pg841=# \df+ my_func
List of functions
-[ RECORD 1 ]-------+----------------
Schema              | public
Name                | my_func
Result data type    | integer
Argument data types | integer
Type                | normal
Volatility          | immutable
Owner               | mss
Language            | sql
Source code         | select $1 + 1
Description         |

Or (in Pg 8.4) open the function in an editor. (Be careful what you change!)

pg841=# \ef my_func(int)

Alternately, query the catalog directly:

pg841=# select prosrc from pg_proc where oid = 'my_func(int)'::regprocedure ;
     prosrc
-----------------
 select $1 + 1

What are the other concurrent sessions doing?

Relevance: Know your system's workload: other concurrent sessions' queries compete for system resources (disk I/O, CPU time, cache, work mem, locks, network bandwidth, temp space, etc.).

The pg_stat_activity view gives 1 row per session:

pg841=# select *
pg841-# from pg_stat_activity
pg841-# where procpid != pg_backend_pid()
pg841-# order by xact_start
pg841-# ;
-[ RECORD 1 ]-+------------------------------------------------------
datid         | 16480
datname       | pg841
procpid       | 2646
usesysid      | 10
usename       | mss
current_query | select count(1) from orders o1 cross join orders o2 ;
waiting       | f
xact_start    | 2009-10-04 15:35:25.637107-07
query_start   | 2009-10-04 15:35:41.666956-07
backend_start | 2009-10-04 15:34:08.653364-07
client_addr   | 127.0.0.1
client_port   | 59733

The meaning of each column is listed in the docs:
http://www.postgresql.org/docs/8.4/static/monitoring-stats.html

Is lock contention affecting my session?

Relevance: While waiting for a lock, a query is doing no useful work.

First we'll create some lock contention:

Session #1: create index idx_customers_foo on customers ( foo ) ;
Session #2: analyze customers ;
Session #3: vacuum customers ;

Diagnosis, the easy way:

pg_stat_activity reports which sessions are waiting due to lock contention, but it gives few details.

pg841=# select procpid, usename, xact_start, query_start, current_query
pg841-# from pg_stat_activity
pg841-# where waiting
pg841-# order by query_start
pg841-# ;
 procpid | usename |     xact_start      |     query_start     |    current_query
---------+---------+---------------------+---------------------+---------------------
    7165 | mss     | 2009-10-10 12:58:08 | 2009-10-10 12:58:14 | analyze customers ;
    7189 | mss     | 2009-10-10 12:59:01 | 2009-10-10 12:59:01 | vacuum customers ;
(2 rows)

Sometimes the stalled queries are easy to read (as above), so you can guess what lock is needed. Whenever it is not obvious, you can ask the database, by looking in pg_locks.

Long-lasting contention is harmful because it:

degrades apparent performance of waiters

can impair scalability of the application, if it is recurring

can potentially lead to deadlock (i.e. if the waiter already holds locks that the blocker will later require)

Run this query in a new session (or at least a new transaction: pg_stat_activity's contents are frozen at first read within a transaction).

Here we look for any query's lock contention. You can filter by procpid or regexp on current_query to narrow results, but for most systems there should be no result rows anyway (lasting lock contention should be rare).

Is lock contention affecting my session? (cont.)

Diagnosis, the meticulous way:

Look in pg_locks for more details about which lock each of the stalled sessions needs next but cannot get.

pg841=# select
pg841-#     pid,
pg841-#     locktype,
pg841-#     mode,
pg841-#     relation::regclass as relname,
pg841-#     transactionid as target_xid,
pg841-#     virtualxid as target_vxid,
pg841-#     virtualtransaction as my_vxid,
pg841-#     granted
pg841-# from pg_locks
pg841-# where pid in (7189, 7165)
pg841-# order by pid, locktype, mode, relation
pg841-# ;
 pid  |  locktype  |           mode           |          relname           | target_xid | target_vxid | my_vxid | granted
------+------------+--------------------------+----------------------------+------------+-------------+---------+---------
 7165 | relation   | AccessShareLock          | pg_class                   |            |             | 3/2     | t
 7165 | relation   | AccessShareLock          | pg_namespace               |            |             | 3/2     | t
 7165 | relation   | AccessShareLock          | pg_class_oid_index         |            |             | 3/2     | t
 7165 | relation   | AccessShareLock          | pg_class_relname_nsp_index |            |             | 3/2     | t
 7165 | relation   | AccessShareLock          | pg_namespace_nspname_index |            |             | 3/2     | t
 7165 | relation   | AccessShareLock          | pg_namespace_oid_index     |            |             | 3/2     | t
 7165 | relation   | ShareUpdateExclusiveLock | customers                  |            |             | 3/2     | f
 7165 | virtualxid | ExclusiveLock            |                            |            | 3/2         | 3/2     | t
 7189 | relation   | ShareUpdateExclusiveLock | customers                  |            |             | 4/4     | f
 7189 | virtualxid | ExclusiveLock            |                            |            | 4/4         | 4/4     | t
(10 rows)

Who is blocking my query?

Relevance: To prevent recurring lock contention you must identify both the waiters and the blocker.

May be obvious from examining the waiters' stalled current_query. If not, narrow the list of suspected blockers by:

(1) Find which resource is contended (e.g. which table or row).
(2) Find which other transactions hold a conflicting lock on that resource.

We did the first step on the previous slides; the second step again visits pg_locks.

pg841=# select
pg841-#     pid,
pg841-#     locktype,
pg841-#     mode,
pg841-#     relation::regclass as relname,
pg841-#     transactionid as target_xid,
pg841-#     virtualxid as target_vxid,
pg841-#     virtualtransaction as my_vxid,
pg841-#     granted
pg841-# from pg_locks
pg841-# where relation = 'customers'::regclass
pg841-# order by pid, locktype, mode, relation
pg841-# ;
 pid  | locktype |           mode           |  relname  | target_xid | target_vxid | my_vxid | granted
------+----------+--------------------------+-----------+------------+-------------+---------+---------
 7141 | relation | AccessShareLock          | customers |            |             | 2/4     | t
 7141 | relation | ShareLock                | customers |            |             | 2/4     | t
 7165 | relation | ShareUpdateExclusiveLock | customers |            |             | 3/2     | f
 7189 | relation | ShareUpdateExclusiveLock | customers |            |             | 4/4     | f
(4 rows)
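The manual matching above can be automated with a self-join on pg_locks. A rough sketch for table-level locks: it pairs each waiter with every granted lock on the same relation, without checking whether the modes actually conflict, so interpret the output with the conflict rules below in mind:

pg841=# select
pg841-#     w.pid as waiting_pid,
pg841-#     w.mode as waiting_mode,
pg841-#     h.pid as holding_pid,
pg841-#     h.mode as holding_mode,
pg841-#     w.relation::regclass as relname
pg841-# from pg_locks w
pg841-#     join pg_locks h on h.relation = w.relation and h.granted and h.pid <> w.pid
pg841-# where w.locktype = 'relation'
pg841-#   and not w.granted
pg841-# ;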

Which locking modes conflict?

Lock modes: which lock modes conflict? Examples of commands that set these modes on each table.

Locktypes transactionid and virtualxid use lock modes Exclusive and Share. Exclusive is used to lock your own transactionid, and Share is used to wait for a different transaction (e.g. when waiting for a row-level lock).

Some concurrency rules:

* You can SELECT anytime except during a schema change (logical or physical) other than CREATE [object] (which are invisible to other transactions, anyway).
* You can INSERT, UPDATE, DELETE to the same table, but not to the same row.
* You can CREATE INDEX during another CREATE INDEX on the same table.
* You cannot CREATE INDEX during INSERT, UPDATE, DELETE, or VACUUM.
* You cannot VACUUM during a VACUUM or any schema changes (logical or physical).
* Even an AccessShare lock will block an AccessExclusive lock, which is why commands like ALTER TABLE are often blocked by a read-only query.
* When running INSERT or UPDATE on non-key fields, your transaction only sets locks on your target table.
* When running DELETE or UPDATE on a key field, in addition to the normal RowExclusiveLock, your transaction sets a RowShareLock (i.e. SELECT FOR UPDATE) on all tables with foreign keys referencing your table. These read-locks only conflict with INSERTs into any of the child tables.
* You cannot INSERT rows into a child table in concurrent transactions if the child rows reference the same parent row (because FK-checking sets row-level locks in exclusive mode on the foreign table).
* You cannot UPDATE or DELETE parent rows while a concurrent transaction INSERTs child rows referencing them (because FK-checking sets row-level locks in exclusive mode on the foreign table).
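To see the AccessShare-blocks-AccessExclusive rule in action, a two-session sketch using the customers table from earlier:

Session #1: begin ; select count(*) from customers ;       -- holds AccessShareLock until commit
Session #2: alter table customers add column note text ;   -- blocks, waiting for AccessExclusiveLock
Session #1: commit ;                                       -- session #2's ALTER TABLE now proceeds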

This example supposes that an important query only cares about a predefined set of rare values in a column (i.e. find which orders are not yet paid for). A normal index would suffice, but a partial index is even more efficient. A partial index is only usable if its own WHERE clause is included in the query's.

pg841=# select is_paid, count(1) from orders group by is_paid ;
 is_paid | count
---------+-------
 f       |   945
 t       | 99055
(2 rows)

pg841=# explain analyze select count(*) from orders where not is_paid ;
                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1689.46..1689.47 rows=1 width=0) (actual time=32.861..32.863 rows=1 loops=1)
   ->  Seq Scan on orders  (cost=0.00..1687.00 rows=983 width=0) (actual time=0.036..31.221 rows=945 loops=1)
         Filter: (NOT is_paid)
 Total runtime: 32.921 ms
(4 rows)

pg841=# create index pidx_orders on orders ( order_id ) where not is_paid ;
CREATE INDEX

pg841=# explain analyze select count(*) from orders where not is_paid ;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=47.27..47.28 rows=1 width=0) (actual time=5.282..5.284 rows=1 loops=1)
   ->  Index Scan using pidx_orders on orders  (cost=0.00..44.81 rows=983 width=0) (actual time=0.097..3.569 rows=945 loops=1)
 Total runtime: 5.352 ms
(3 rows)

When will adding a new index improve query performance?
Partial index example

Notes


Lock mode: typical commands that acquire it, and the modes it conflicts with. (The slide's conflict-matrix layout did not survive extraction; the conflict lists below are restated from the Pg 8.4 documentation.)

AccessShare: SELECT (and ANALYZE before Pg 8.2). Conflicts only with AccessExclusive.

RowShare: SELECT FOR UPDATE / FOR SHARE. Conflicts with Exclusive, AccessExclusive.

RowExclusive: INSERT, UPDATE, DELETE. Conflicts with Share, ShareRowExclusive, Exclusive, AccessExclusive.

ShareUpdateExclusive: VACUUM (without FULL), ANALYZE (Pg >= 8.2), CREATE INDEX CONCURRENTLY. Conflicts with ShareUpdateExclusive, Share, ShareRowExclusive, Exclusive, AccessExclusive.

Share: CREATE INDEX (Share on table, AccessExclusive on index). Conflicts with RowExclusive, ShareUpdateExclusive, ShareRowExclusive, Exclusive, AccessExclusive.

ShareRowExclusive: not automatically set on a table/index by any command. Conflicts with RowExclusive, ShareUpdateExclusive, Share, ShareRowExclusive, Exclusive, AccessExclusive.

Exclusive: not automatically set on a table/index by any command. Conflicts with every mode except AccessShare.

AccessExclusive: ALTER/DROP TABLE, REINDEX, CLUSTER, VACUUM FULL. Conflicts with all modes, including AccessShare.
