materialized views in postgresql
Post on 07-Dec-2014
© 2013 EDB All rights reserved. 1
Materialized views in PostgreSQL
Ashutosh Bapat | 28th March, 2014
● Theoretical background
● PostgreSQL's support
● Use cases
(SQL) View
● “Virtual relation” defined by a query
● Represents the result of the query
● Can be queried like a table
● Referencing a view in a query requires the defining query to be executed each time
View: emp_with_good_salary
SELECT emp_name
FROM emp
WHERE salary > 15000;
Table: emp
emp_name  salary
Kiran     10000
Mohan     20000
Leela     30000
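The slide's example can be sketched in PostgreSQL as follows. The column types are assumptions, since the slide gives only column names and sample rows; run it in a scratch database.

```sql
-- Assumed column types; the slide shows only names and sample data.
CREATE TABLE emp (emp_name varchar(100), salary integer);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- The view stores only the query text, not its result.
CREATE VIEW emp_with_good_salary AS
    SELECT emp_name
    FROM emp
    WHERE salary > 15000;

-- Every reference re-runs the defining query against emp.
SELECT * FROM emp_with_good_salary;
```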
Materialized View (MV)
● A “view” whose query results are stored in the database
● Referencing a materialized view does not require execution of the query
● Needs to be “maintained” to keep up with changes in the underlying objects (tables or views)
● Can be indexed, unlike a non-materialized view
Table: emp
emp_name  salary
Kiran     10000
Mohan     20000
Leela     30000
MV: emp_with_good_salary
emp_name  salary
Mohan     20000
Leela     30000
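A minimal sketch of the same example as a materialized view, showing that the stored result goes stale until it is refreshed. The column types and the index name are assumptions; run it in a scratch database.

```sql
-- Assumed column types; the slide shows only names and sample data.
CREATE TABLE emp (emp_name varchar(100), salary integer);
INSERT INTO emp VALUES ('Kiran', 10000), ('Mohan', 20000), ('Leela', 30000);

-- The defining query runs once here and its result is stored.
CREATE MATERIALIZED VIEW emp_with_good_salary AS
    SELECT emp_name, salary
    FROM emp
    WHERE salary > 15000;

-- A later change to emp is not reflected in the stored result...
UPDATE emp SET salary = 25000 WHERE emp_name = 'Kiran';
SELECT * FROM emp_with_good_salary;   -- reads the rows captured at creation time

-- ...until the materialized view is refreshed.
REFRESH MATERIALIZED VIEW emp_with_good_salary;

-- Unlike a plain view, the stored result can be indexed (hypothetical index name).
CREATE INDEX emp_with_good_salary_salary_idx ON emp_with_good_salary (salary);
```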
Materialized Views in PostgreSQL
● Creation
– CREATE MATERIALIZED VIEW
● Maintenance
– REFRESH MATERIALIZED VIEW
● Destruction
– DROP MATERIALIZED VIEW
● Supported from 9.3
● Enhancements in 9.4
– REFRESH MATERIALIZED VIEW CONCURRENTLY
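The commands listed on this slide can be exercised end to end with a small sketch like the one below. The object names mv_example and mv_example_id_idx are hypothetical, and the CONCURRENTLY form assumes PostgreSQL 9.4 or later.

```sql
-- Creation: the defining query runs once and its result is stored.
CREATE MATERIALIZED VIEW mv_example AS
    SELECT n AS id, n * n AS square
    FROM generate_series(1, 5) AS n;

-- Maintenance: re-run the defining query and replace the stored result.
REFRESH MATERIALIZED VIEW mv_example;

-- 9.4 and later: refresh without blocking concurrent SELECTs on the MV.
-- This form requires a unique index on the materialized view.
CREATE UNIQUE INDEX mv_example_id_idx ON mv_example (id);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_example;

-- Destruction
DROP MATERIALIZED VIEW mv_example;
```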
Refreshing MV
● Lazy refresh
– Materialized view usually contains stale data
– REFRESH periodically or whenever suitable, independent of DML activity
● Aggressive refresh
– Materialized view contains the latest data in serializable transactions and nearly fresh data at other isolation levels
– REFRESH using triggers/rules
What's not supported in 9.4
● Incremental refresh
– Refreshing only those rows affected by changes to the underlying table
– Being worked on by the community
● Using materialized views for query optimization
– Having the planner use MVs automatically
● Auto-refresh
– Refreshing a materialized view automatically when the underlying tables change
Reporting using stale data
● Very frequently updated tables
● Approximate reports are fine
● Create materialized view(s) for reporting queries
● Refresh every night, or on a weekly/monthly basis
Reporting region-wise sales
● Table schema
CREATE TABLE salesman (
    salesman_no integer PRIMARY KEY,
    name varchar(100),
    region varchar(100)
);
CREATE TABLE invoice (
    invoice_no integer PRIMARY KEY,
    salesman_no integer REFERENCES salesman,
    invoice_amt numeric(13, 2),
    invoice_date date
);
● Reporting query
SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region
ORDER BY region_sale
LIMIT 10;
Reporting region-wise sales
EXPLAIN ANALYZE
SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region
ORDER BY region_sale
LIMIT 10;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=44294.16..44294.18 rows=10 width=234) (actual time=2609.868..2609.870 rows=10 loops=1)
   ->  Sort  (cost=44294.16..44294.66 rows=200 width=234) (actual time=2609.860..2609.861 rows=10 loops=1)
         Sort Key: (sum(i.invoice_amt))
         Sort Method: top-N heapsort  Memory: 26kB
         ->  HashAggregate  (cost=44287.84..44289.84 rows=200 width=234) (actual time=2609.347..2609.366 rows=26 loops=1)
               ->  Hash Join  (cost=559.84..39828.84 rows=891800 width=234) (actual time=29.751..1374.305 rows=1000000 loops=1)
                     Hash Cond: (i.salesman_no = s.salesman_no)
                     ->  Seq Scan on invoice i  (cost=0.00..15288.00 rows=891800 width=20) (actual time=0.048..398.745 rows=1000000 loops=1)
                     ->  Hash  (cost=345.15..345.15 rows=5015 width=222) (actual time=29.602..29.602 rows=10000 loops=1)
                           Buckets: 1024  Batches: 2  Memory Usage: 685kB
                           ->  Seq Scan on salesman s  (cost=0.00..345.15 rows=5015 width=222) (actual time=0.009..5.221 rows=10000 loops=1)
 Total runtime: 2610.316 ms
Reporting region-wise sales
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT sum(i.invoice_amt) region_sale, s.region region
FROM salesman s, invoice i
WHERE i.salesman_no = s.salesman_no
GROUP BY s.region;

EXPLAIN ANALYZE
SELECT * FROM sales_by_region ORDER BY region_sale LIMIT 10;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=19.17..19.19 rows=10 width=250) (actual time=0.065..0.066 rows=10 loops=1)
   ->  Sort  (cost=19.17..19.89 rows=290 width=250) (actual time=0.064..0.064 rows=10 loops=1)
         Sort Key: region_sale
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Seq Scan on sales_by_region  (cost=0.00..12.90 rows=290 width=250) (actual time=0.007..0.013 rows=26 loops=1)
 Total runtime: 0.094 ms
(6 rows)
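Since this report tolerates stale data, a periodic refresh is enough. The sketch below assumes the salesman and invoice tables and the sales_by_region materialized view from these slides already exist, that the server is PostgreSQL 9.4 or later, and that some external scheduler issues the REFRESH; the index name is hypothetical.

```sql
-- GROUP BY s.region yields one row per region, so region can carry the
-- unique index that REFRESH ... CONCURRENTLY requires.
CREATE UNIQUE INDEX sales_by_region_region_idx ON sales_by_region (region);

-- Issued periodically (for example nightly) by a scheduler of your choice.
-- Reports can keep reading the old contents while the refresh runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_by_region;
```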
Complex queries
● Relatively stable underlying tables
● Complex and slow-running queries
● Bonus
– Stale data not tolerable: use triggers to refresh
– Faster query results: use indexes on MV
Shortest route problem
● Table schema
CREATE TABLE roads (source char, dest char, length numeric(5, 2));
● Slow query
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[]
    FROM roads
    WHERE source = 'A'
    UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source
      AND not (r.dest = ANY(p.path))
)
SELECT * FROM paths WHERE dest = 'L' ORDER BY length LIMIT 1;
SRP: without MV
EXPLAIN ANALYZE output
WITH RECURSIVE paths (source, dest, length, path) AS ( ORDER BY length LIMIT 1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=686.43..686.43 rows=1 width=56) (actual time=897.159..897.159 rows=1 loops=1)
   CTE paths
     ->  Recursive Union  (cost=0.00..581.31 rows=4667 width=76) (actual time=0.039..720.175 rows=138640 loops=1)
           ->  Seq Scan on roads  (cost=0.00..27.52 rows=7 width=28) (actual time=0.036..0.061 rows=5 loops=1)
                 Filter: (source = 'A'::bpchar)
                 Rows Removed by Filter: 75
           ->  Hash Join  (cost=2.28..46.04 rows=466 width=76) (actual time=9.528..38.388 rows=8665 loops=16)
                 Hash Cond: (r.source = p.dest)
                 Join Filter: (r.dest <> ALL (p.path))
                 ->  Seq Scan on roads r  (cost=0.00..24.00 rows=1400 width=28) (actual time=0.010..0.025 rows=80 loops=16)
                 ->  Hash  (cost=1.40..1.40 rows=70 width=56) (actual time=9.159..9.159 rows=8665 loops=16)
                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                       ->  WorkTable Scan on paths p  (cost=0.00..1.40 rows=70 width=56) (actual time=0.008..3.959 rows=8665 loops=16)
   ->  Sort  (cost=105.12..105.18 rows=23 width=56) (actual time=897.154..897.154 rows=1 loops=1)
         Sort Key: paths.length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  CTE Scan on paths  (cost=0.00..105.01 rows=23 width=56) (actual time=0.696..896.652 rows=912 loops=1)
               Filter: (dest = 'L'::bpchar)
               Rows Removed by Filter: 137728
 Total runtime: 900.970 ms
(20 rows)
SRP: Materialized View
CREATE MATERIALIZED VIEW paths AS
WITH RECURSIVE paths (source, dest, length, path) AS (
    SELECT source, dest, length::float, '{}'::bpchar[]
    FROM roads
    UNION ALL
    SELECT p.source, r.dest, p.length + r.length, p.path || ARRAY[r.source]
    FROM paths p, roads r
    WHERE p.dest = r.source
      AND not (r.dest = ANY(p.path))
)
SELECT * FROM paths;

EXPLAIN ANALYZE
SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length DESC LIMIT 1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=10623.33..10623.33 rows=1 width=56) (actual time=125.326..125.327 rows=1 loops=1)
   ->  Sort  (cost=10623.33..10623.35 rows=10 width=56) (actual time=125.324..125.324 rows=1 loops=1)
         Sort Key: length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on paths  (cost=0.00..10623.28 rows=10 width=56) (actual time=0.283..124.988 rows=912 loops=1)
               Filter: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
               Rows Removed by Filter: 281233
 Total runtime: 125.377 ms
(8 rows)
SRP: MV with indexes
CREATE INDEX i_paths_source on paths(source, dest);

EXPLAIN ANALYZE
SELECT * FROM paths WHERE source = 'A' and dest = 'L' ORDER BY length DESC LIMIT 1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=31.80..31.80 rows=1 width=56) (actual time=1.265..1.265 rows=1 loops=1)
   ->  Sort  (cost=31.80..31.81 rows=7 width=56) (actual time=1.264..1.264 rows=1 loops=1)
         Sort Key: length
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on paths  (cost=4.49..31.76 rows=7 width=56) (actual time=0.327..0.982 rows=912 loops=1)
               Recheck Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
               ->  Bitmap Index Scan on i_paths_source  (cost=0.00..4.49 rows=7 width=0) (actual time=0.304..0.304 rows=912 loops=1)
                     Index Cond: ((source = 'A'::bpchar) AND (dest = 'L'::bpchar))
 Total runtime: 1.317 ms
(9 rows)
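One possible further step, not taken in the slides, is to include length in the index so that the shortest route for a given pair can be read in index order. This is a sketch assuming the paths materialized view above exists; the index name is hypothetical, and whether the planner actually skips the sort depends on its cost estimates.

```sql
-- Hypothetical index name. With source and dest fixed by equality, index
-- entries are ordered by length, so the planner may return the first match
-- without a separate sort step.
CREATE INDEX i_paths_source_dest_length ON paths (source, dest, length);

SELECT *
FROM paths
WHERE source = 'A' AND dest = 'L'
ORDER BY length   -- ascending: shortest route first, as in the original slow query
LIMIT 1;
```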
SRP: latest data using triggers
CREATE FUNCTION refresh_mvs() RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
    REFRESH MATERIALIZED VIEW paths;
    RETURN NULL;
END;
$$;

CREATE TRIGGER paths_trig
    AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE ON roads
    FOR EACH STATEMENT EXECUTE PROCEDURE refresh_mvs();
SRP: latest data using triggers
SELECT * FROM paths WHERE source = 'T';
 source | dest | length | path
--------+------+--------+------
(0 rows)

EXPLAIN ANALYZE INSERT INTO roads VALUES ('T', 'Z', 100.4);
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Insert on roads  (cost=0.00..0.01 rows=1 width=0) (actual time=0.033..0.033 rows=0 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Trigger paths_trig: time=9080.960 calls=1
 Total runtime: 9081.028 ms
(4 rows)

SELECT * FROM paths WHERE source = 'T';
 source | dest | length | path
--------+------+--------+------
 T      | Z    |  100.4 | {}
(1 row)
Caching foreign data
● Materialized views on foreign tables
– Data availability in case of foreign server failure
– Faster data access
– Possibly stale data
● Aggressive refresh
– Triggers on foreign tables not supported
● Being discussed in the community
– External method for firing REFRESH when foreign data changes
● Lazy refresh
– Fire REFRESH periodically
Caching foreign data
postgres=# \d+ remote_emp
                             Foreign table "public.remote_emp"
 Column |         Type          | Modifiers | FDW Options | Storage  | Stats target | Description
--------+-----------------------+-----------+-------------+----------+--------------+-------------
 empno  | numeric(4,0)          |           |             | main     |              |
 ename  | character varying(10) |           |             | extended |              |
 job    | character varying(10) |           |             | extended |              |
Server: local_ppas
FDW Options: (schema_name 'public', table_name 'emp')
Has OIDs: no

postgres=# create materialized view cached_remote_emp as select * from remote_emp;

postgres=# explain analyze select * from cached_remote_emp;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on cached_remote_emp  (cost=0.00..16.90 rows=690 width=88) (actual time=0.020..0.024 rows=14 loops=1)
 Planning time: 0.076 ms
 Total runtime: 0.068 ms
(3 rows)

postgres=# explain analyze select * from remote_emp;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Foreign Scan on remote_emp  (cost=100.00..131.93 rows=731 width=88) (actual time=0.834..0.836 rows=14 loops=1)
 Planning time: 0.077 ms
 Total runtime: 1.451 ms
(3 rows)
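The lazy-refresh option for this cache might look like the sketch below. It assumes the remote_emp foreign table and the cached_remote_emp materialized view shown above already exist; the scheduler that issues the REFRESH and the index name are assumptions.

```sql
-- Lazy refresh: re-read the foreign table and replace the cached copy.
-- Issued periodically by an external scheduler of your choice.
REFRESH MATERIALIZED VIEW cached_remote_emp;

-- The cache is stored locally, so it can be indexed like any other
-- materialized view (hypothetical index name).
CREATE INDEX cached_remote_emp_empno_idx ON cached_remote_emp (empno);
```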
Thank you