You Can Do That Without Perl? Lists and Recursion and Trees, Oh My! YAPC10, Pittsburgh 2009 Copyright © 2009 David Fetter [email protected] All Rights Reserved

Upload: postgresql-experts-inc

Post on 18-Dec-2014


Page 1: You can do THAT without Perl?

You Can Do That Without Perl?

Lists and Recursion and Trees, Oh My!
YAPC10, Pittsburgh 2009

Copyright © 2009 David Fetter [email protected] All Rights Reserved

Page 2: You can do THAT without Perl?

Subject: The YAPC::NA 2009 Survey

Page 3: You can do THAT without Perl?

Lists: Windowing Functions

Page 4: You can do THAT without Perl?

What are Windowing Functions?

PGCon2009, 21-22 May 2009 Ottawa

Page 5: You can do THAT without Perl?

What are Windowing Functions?

• Similar to classical aggregates, but they do more!
• Provide access to a set of rows from the current row
• Introduced in SQL:2003, with more detail in SQL:2008
  – SQL:2008 (http://wiscorp.com/sql200n.zip) 02: SQL/Foundation
    • 4.14.9 Window tables
    • 4.15.3 Window functions
    • 6.10 <window function>
    • 7.8 <window clause>
• Supported by Oracle, SQL Server, Sybase and DB2
  – No open source RDBMS so far except PostgreSQL (Firebird trying)
• Used mainly in OLAP, but also useful in OLTP
  – Analysis and reporting by rankings, cumulative aggregates

Page 6: You can do THAT without Perl?

How they work and what you get


Page 7: You can do THAT without Perl?

How they work and what you get

• Windowed table
  – Operates on a windowed table
  – Returns a value for each row
  – Returned value is calculated from the rows in the window

Page 8: You can do THAT without Perl?

How they work and what you get

• You can use…
  – New window functions
  – Existing aggregate functions
  – User-defined window functions
  – User-defined aggregate functions

Page 9: You can do THAT without Perl?

How they work and what you get

• Completely new concept!
  – With Windowing Functions, you can reach outside the current row

Page 10: You can do THAT without Perl?

How they work and what you get

[Aggregates] SELECT key, SUM(val) FROM tbl GROUP BY key;

Page 11: You can do THAT without Perl?

How they work and what you get

[Windowing Functions] SELECT key, SUM(val) OVER (PARTITION BY key) FROM tbl;
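The contrast between the two statements can be tried out with a small sketch. It is written in Python and uses the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). The three sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (key TEXT, val INTEGER)")
# Hypothetical sample data: two rows for key 'a', one for key 'b'.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 10)])

# Classical aggregate: one output row per group.
grouped = conn.execute(
    "SELECT key, SUM(val) FROM tbl GROUP BY key ORDER BY key"
).fetchall()

# Window function: every input row survives and carries its partition's sum.
windowed = conn.execute(
    "SELECT key, val, SUM(val) OVER (PARTITION BY key) FROM tbl ORDER BY key, val"
).fetchall()

print(grouped)
print(windowed)
```

The GROUP BY query collapses the table to one row per key, while the OVER (PARTITION BY key) query returns all three rows, each with the sum for its own key.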

Page 12: You can do THAT without Perl?

How they work and what you get

depname    empno  salary
develop    10     5200
sales      1      5000
personnel  5      3500
sales      4      4800
sales      6      5500
personnel  2      3900
develop    7      4200
develop    9      4500
sales      3      4800
develop    8      6000
develop    11     5200

Who is paid the most relative to their department average?

Page 13: You can do THAT without Perl?

How they work and what you get

SELECT depname, empno, salary,
       avg(salary) OVER (PARTITION BY depname)::int,
       salary - avg(salary) OVER (PARTITION BY depname)::int AS diff
FROM empsalary
ORDER BY diff DESC

depname    empno  salary  avg   diff
develop    8      6000    5020   980
sales      6      5500    5025   475
personnel  2      3900    3700   200
develop    11     5200    5020   180
develop    10     5200    5020   180
sales      1      5000    5025   -25
personnel  5      3500    3700  -200
sales      4      4800    5025  -225
sales      3      4800    5025  -225
develop    9      4500    5020  -520
develop    7      4200    5020  -820
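As a rough, runnable version of the department-average query, the sketch below loads the eleven salary rows from the slides (with sales employee 6 at 5500, as in the result table) into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here, so the `::int` cast is written as `CAST(... AS INTEGER)`; the alias `dept_avg` and the `empno DESC` tie-breaker are additions for this example.

```python
import sqlite3

empsalary = [
    ("develop", 10, 5200), ("sales", 1, 5000), ("personnel", 5, 3500),
    ("sales", 4, 4800), ("sales", 6, 5500), ("personnel", 2, 3900),
    ("develop", 7, 4200), ("develop", 9, 4500), ("sales", 3, 4800),
    ("develop", 8, 6000), ("develop", 11, 5200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE empsalary (depname TEXT, empno INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO empsalary VALUES (?, ?, ?)", empsalary)

query = """
SELECT depname, empno, salary,
       CAST(avg(salary) OVER (PARTITION BY depname) AS INTEGER) AS dept_avg,
       salary - CAST(avg(salary) OVER (PARTITION BY depname) AS INTEGER) AS diff
FROM empsalary
ORDER BY diff DESC, empno DESC  -- tie-breaker added so the order is deterministic
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

Each row keeps its own salary while also seeing the average of its department, which is exactly what a plain GROUP BY cannot provide in a single pass.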

Page 14: You can do THAT without Perl?

Anatomy of a Window

• Represents a set of rows abstractly as:
  • A partition
    – Specified by PARTITION BY clause
    – Never moves
    – Can contain:
      • A frame
        – Specified by ORDER BY clause and frame clause
        – Defined in a partition
        – Moves within a partition
        – Never goes across two partitions
• Some functions take values from a partition. Others take them from a frame.

Page 15: You can do THAT without Perl?

A partition

• Specified by PARTITION BY clause in OVER()
• Lets you subdivide the table, much like the GROUP BY clause
• Without a PARTITION BY clause, the whole table is in a single partition

func(args) OVER (partition-clause order-clause frame-clause)

partition-clause: PARTITION BY expr, …

Page 16: You can do THAT without Perl?

A frame

• Specified by ORDER BY clause and frame clause in OVER()
• Lets you say how far the set extends
• Also defines the ordering of the set
• Without order and frame clauses, the whole partition is a single frame

func(args) OVER (partition-clause order-clause frame-clause)

order-clause: ORDER BY expr [ASC|DESC] [NULLS FIRST|LAST], …

frame-clause: (ROWS|RANGE) BETWEEN UNBOUNDED (PRECEDING|FOLLOWING) AND CURRENT ROW
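To see what the frame clause changes, here is a small sketch comparing three frames over the same ordering. It assumes SQLite 3.25+ through Python's sqlite3 module as a stand-in for PostgreSQL, and uses the four values 5, 5, 3, 1 that the later slides use; the column aliases are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (val INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(5,), (5,), (3,), (1,)])

query = """
SELECT val,
       sum(val) OVER (ORDER BY val DESC
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS by_rows,
       sum(val) OVER (ORDER BY val DESC
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS by_range,
       sum(val) OVER (ORDER BY val DESC
                      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS whole
FROM tbl
"""
# Sort in Python so the two tied rows (val = 5) come out in a fixed order.
frames = sorted(conn.execute(query).fetchall())

for row in frames:
    print(row)
```

With ROWS the frame ends at the current physical row, so the two tied rows get 5 and 10. With RANGE the frame also covers the current row's peers, so both tied rows get 10. The unbounded frame gives 14 everywhere.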

Page 17: You can do THAT without Perl?

The WINDOW clause

• You can specify a common window definition in one place and name it, for convenience
• You can use the window at the same query level

SELECT target_list, … wfunc() OVER w
FROM table_list, …
WHERE qual_list, …
GROUP BY groupkey_list, …
HAVING groupqual_list, …
WINDOW w AS (partition-clause order-clause frame-clause)
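A named window can be sketched as follows, again assuming SQLite 3.25+ via Python's sqlite3 module in place of PostgreSQL. The table contents and the aliases `pos` and `running` are hypothetical; the point is only that two functions share one definition of `w`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (key TEXT, val INTEGER)")
# Hypothetical sample data.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [("a", 1), ("a", 2), ("b", 10)])

query = """
SELECT key, val,
       row_number() OVER w AS pos,
       sum(val)     OVER w AS running
FROM tbl
WINDOW w AS (PARTITION BY key ORDER BY val)
ORDER BY key, val
"""
named = conn.execute(query).fetchall()

for row in named:
    print(row)
```

Defining `w` once keeps the two OVER clauses in sync; changing the partitioning or ordering then only has to happen in one place.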

Page 18: You can do THAT without Perl?

Built-in Windowing Functions


Page 19: You can do THAT without Perl?

List of built-in Windowing Functions

• row_number()
• rank()
• dense_rank()
• percent_rank()
• cume_dist()
• ntile()
• lag()
• lead()
• first_value()
• last_value()
• nth_value()

Page 20: You can do THAT without Perl?

row_number()

• Returns the number of the current row

SELECT val, row_number() OVER (ORDER BY val DESC) FROM tbl;

val  row_number()
5    1
5    2
3    3
1    4

Note: row_number() always returns incrementing values, independent of the frame.

Page 21: You can do THAT without Perl?

rank()

• Returns the rank of the current row, with gaps

SELECT val, rank() OVER (ORDER BY val DESC) FROM tbl;

val  rank()
5    1
5    1
3    3    ← gap
1    4

Note: rank() OVER () returns 1 for all rows, since all rows are peers of each other.

Page 22: You can do THAT without Perl?

dense_rank()

• Returns the rank of the current row, without gaps

SELECT val, dense_rank() OVER (ORDER BY val DESC) FROM tbl;

val  dense_rank()
5    1
5    1
3    2    ← no gap
1    3

Note: dense_rank() OVER () returns 1 for all rows, since all rows are peers of each other.

Page 23: You can do THAT without Perl?

percent_rank()

• Returns relative rank: (rank() - 1) / (total rows - 1)

SELECT val, percent_rank() OVER (ORDER BY val DESC) FROM tbl;

val  percent_rank()
5    0
5    0
3    0.666666666666667
1    1

Values are always between 0 and 1 inclusive.

Note: percent_rank() OVER () returns 0 for all rows, since all rows are peers of each other.

Page 24: You can do THAT without Perl?

cume_dist()

• Returns relative rank: (number of preceding rows or peers) / (total rows)

SELECT val, cume_dist() OVER (ORDER BY val DESC) FROM tbl;

val  cume_dist()
5    0.5     = 2 / 4
5    0.5     = 2 / 4
3    0.75    = 3 / 4
1    1       = 4 / 4

Note: cume_dist() OVER () returns 1 for all rows, since all rows are peers of each other.

The result can be emulated by "count(*) OVER (ORDER BY val DESC)::float / count(*) OVER ()"; the cast avoids integer division.
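The four ranking functions and the count(*) emulation of cume_dist() can be compared side by side. This is a sketch under the assumption that SQLite 3.25+ (through Python's sqlite3 module) behaves like PostgreSQL for these functions; the alias `emulated` is made up, and the cast to REAL plays the role of a float cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (val INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(5,), (5,), (3,), (1,)])

query = """
SELECT val,
       rank()         OVER w,
       dense_rank()   OVER w,
       percent_rank() OVER w,
       cume_dist()    OVER w,
       CAST(count(*) OVER w AS REAL) / count(*) OVER () AS emulated
FROM tbl
WINDOW w AS (ORDER BY val DESC)
ORDER BY val DESC
"""
ranks = conn.execute(query).fetchall()

for row in ranks:
    print(row)
```

The tied rows share rank 1; rank() then jumps to 3 while dense_rank() continues with 2, and the emulated column should match cume_dist() on every row.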

Page 25: You can do THAT without Perl?

ntile()

• Returns the number of the bucket the current row falls into

SELECT val, ntile(3) OVER (ORDER BY val DESC) FROM tbl;

val  ntile(3)
5    1
5    1
3    2
1    3

Rows are divided evenly among the buckets; if there is a remainder, the extra rows go to the buckets at the head. Here 4 % 3 = 1, so bucket 1 gets one extra row.

Note: ntile() OVER () returns the same bucket numbers as above, since ntile() ignores the frame and works on the partition.

Page 26: You can do THAT without Perl?

lag()

• Returns the value of the previous row

SELECT val, lag(val) OVER (ORDER BY val DESC) FROM tbl;

val  lag(val)
5    NULL
5    5
3    5
1    3

Note: lag() acts on the partition, not the frame.

Page 27: You can do THAT without Perl?

lead()

• Returns the value of the next row

SELECT val, lead(val) OVER (ORDER BY val DESC) FROM tbl;

val  lead(val)
5    5
5    3
3    1
1    NULL

Note: lead() acts on the partition, not the frame.
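A common use of lag() and lead() is comparing a row with its neighbours. The sketch below assumes SQLite 3.25+ via Python's sqlite3 module as a stand-in for PostgreSQL; the `day` column is a hypothetical addition that gives the rows an unambiguous order, and `delta` is an illustrative alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (day INTEGER, val INTEGER)")
# Same values as the slides (5, 5, 3, 1), keyed by a made-up day number.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, 5), (2, 5), (3, 3), (4, 1)])

query = """
SELECT day, val,
       lag(val)  OVER w AS prev_val,
       lead(val) OVER w AS next_val,
       val - lag(val) OVER w AS delta
FROM tbl
WINDOW w AS (ORDER BY day)
ORDER BY day
"""
neighbours = conn.execute(query).fetchall()

for row in neighbours:
    print(row)
```

The first row has no predecessor and the last has no successor, so lag() and lead() return NULL there (None in Python), and any arithmetic on them is NULL as well.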

Page 28: You can do THAT without Perl?

first_value()

• Returns the first value of the frame

SELECT val, first_value(val) OVER (ORDER BY val DESC) FROM tbl;

val  first_value(val)
5    5
5    5
3    5
1    5

Page 29: You can do THAT without Perl?

last_value()

• Returns the last value of the frame

SELECT val, last_value(val) OVER (ORDER BY val DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM tbl;

val  last_value(val)
5    1
5    1
3    1
1    1

Note: the frame clause is necessary. With only the order clause, the frame runs from the first row through the current row and its peers.

Page 30: You can do THAT without Perl?

nth_value()

• Returns the n-th value of the frame

SELECT val, nth_value(val, val) OVER (ORDER BY val DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM tbl;

val  nth_value(val, val)
5    NULL
5    NULL
3    3
1    5

Note: the frame clause is necessary. With only the order clause, the frame runs from the first row through the current row and its peers.
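The effect of the frame clause on last_value() is easy to miss, so here is a small comparison. It is a sketch that assumes SQLite 3.25+ through Python's sqlite3 module in place of PostgreSQL; the aliases `default_frame`, `full_frame` and `second` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (val INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?)", [(5,), (5,), (3,), (1,)])

query = """
SELECT val,
       last_value(val) OVER (ORDER BY val DESC) AS default_frame,
       last_value(val) OVER (ORDER BY val DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS full_frame,
       nth_value(val, 2) OVER (ORDER BY val DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second
FROM tbl
ORDER BY val DESC
"""
last_values = conn.execute(query).fetchall()

for row in last_values:
    print(row)
```

With the default frame, last_value() just echoes the current row's value, because the frame stops at the current row's peers. Only the explicit unbounded frame yields the last value of the whole partition.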

Page 31: You can do THAT without Perl?

Aggregates (all peers)

• Returns the same value along the frame

SELECT val, sum(val) OVER () FROM tbl;

val  sum(val)
5    14
5    14
3    14
1    14

Note: all rows are peers of each other.

Page 32: You can do THAT without Perl?

Cumulative aggregates

• Returns different values along the frame

SELECT val, sum(val) OVER (ORDER BY val DESC) FROM tbl;

val  sum(val)
5    10
5    10
3    13
1    14

Note: row #1 and row #2 return the same value since they are peers. The result of row #3 is sum(val of row #1…#3).

Page 33: You can do THAT without Perl?

Recursion and Trees, Oh My: Common Table Expressions

Page 34: You can do THAT without Perl?

Thanks to:

• The ISO SQL Standards Committee

• Yoshiyuki Asaba

• Ishii Tatsuo

• Jeff Davis

• Gregory Stark

• Tom Lane

• etc., etc., etc.

Page 35: You can do THAT without Perl?

Recursion

Page 36: You can do THAT without Perl?

Recursion in General

• Initial condition
• Recursion step
• Termination condition

Page 37: You can do THAT without Perl?

E1 List Table

CREATE TABLE employee(
    id INTEGER NOT NULL,
    boss_id INTEGER,
    UNIQUE(id, boss_id)/*, etc., etc. */
);

INSERT INTO employee(id, boss_id)
VALUES
    (1,NULL), /* El capo di tutti capi */
    (2,1),(3,1),(4,1),
    (5,2),(6,2),(7,2),
    (8,3),(9,3),
    (10,4),
    (11,5),(12,5),
    (13,6),
    (14,7),
    (15,8),
    (1,9);

Page 38: You can do THAT without Perl?

Tree Query Initiation

WITH RECURSIVE t(node, path) AS (
    SELECT id, ARRAY[id]
    FROM employee
    WHERE boss_id IS NULL /* Initiation Step */
UNION ALL
    SELECT e1.id, t.path || ARRAY[e1.id]
    FROM employee e1
    JOIN t ON (e1.boss_id = t.node)
    WHERE e1.id <> ALL (t.path)
)
SELECT
    CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END ||
    REPEAT('--', array_upper(path,1)-2) ||
    node AS "Branch"
FROM t
ORDER BY path;

Page 39: You can do THAT without Perl?

Tree Query Recursion

WITH RECURSIVE t(node, path) AS (
    SELECT id, ARRAY[id]
    FROM employee
    WHERE boss_id IS NULL
UNION ALL
    SELECT e1.id, t.path || ARRAY[e1.id] /* Recursion */
    FROM employee e1
    JOIN t ON (e1.boss_id = t.node)
    WHERE e1.id <> ALL (t.path)
)
SELECT
    CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END ||
    REPEAT('--', array_upper(path,1)-2) ||
    node AS "Branch"
FROM t
ORDER BY path;

Page 40: You can do THAT without Perl?

Tree Query Termination

WITH RECURSIVE t(node, path) AS (
    SELECT id, ARRAY[id]
    FROM employee
    WHERE boss_id IS NULL
UNION ALL
    SELECT e1.id, t.path || ARRAY[e1.id]
    FROM employee e1
    JOIN t ON (e1.boss_id = t.node)
    WHERE e1.id <> ALL (t.path) /* Termination Condition */
)
SELECT
    CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END ||
    REPEAT('--', array_upper(path,1)-2) ||
    node AS "Branch"
FROM t
ORDER BY path;

Page 41: You can do THAT without Perl?

Tree Query Display

WITH RECURSIVE t(node, path) AS (
    SELECT id, ARRAY[id]
    FROM employee
    WHERE boss_id IS NULL
UNION ALL
    SELECT e1.id, t.path || ARRAY[e1.id]
    FROM employee e1
    JOIN t ON (e1.boss_id = t.node)
    WHERE e1.id <> ALL (t.path)
)
SELECT
    CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END ||
    REPEAT('--', array_upper(path,1)-2) ||
    node AS "Branch" /* Display */
FROM t
ORDER BY path;

Page 42: You can do THAT without Perl?

Tree Query Initiation

  Branch
----------
 1
 +-2
 +---5
 +-----11
 +-----12
 +---6
 +-----13
 +---7
 +-----14
 +-3
 +---8
 +-----15
 +---9
 +-4
 +---10
 +-9
(16 rows)
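The three parts of the recursion (initial condition, recursion step, termination condition) can be exercised with a runnable sketch. It assumes SQLite through Python's sqlite3 module rather than PostgreSQL. Since SQLite has no array type, the path is kept as a slash-delimited string and the cycle check uses instr(), both substitutions made for this example; sorting and drawing happen in Python. The data is the employee list as printed on the slide, including the (1, 9) row that closes a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER NOT NULL, boss_id INTEGER)")
employees = [(1, None), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 2), (8, 3),
             (9, 3), (10, 4), (11, 5), (12, 5), (13, 6), (14, 7), (15, 8), (1, 9)]
conn.executemany("INSERT INTO employee VALUES (?, ?)", employees)

query = """
WITH RECURSIVE t(node, path) AS (
    SELECT id, '/' || id || '/'              -- initial condition: the root
    FROM employee
    WHERE boss_id IS NULL
  UNION ALL
    SELECT e.id, t.path || e.id || '/'       -- recursion step: add one level
    FROM employee e
    JOIN t ON e.boss_id = t.node
    WHERE instr(t.path, '/' || e.id || '/') = 0   -- termination: never revisit a node
)
SELECT node, path FROM t
"""

def as_ints(path):
    """Turn '/1/2/5/' into [1, 2, 5] so paths sort numerically."""
    return [int(part) for part in path.strip("/").split("/")]

tree = sorted(conn.execute(query).fetchall(), key=lambda row: as_ints(row[1]))

branches = []
for node, path in tree:
    depth = len(as_ints(path))
    prefix = "+-" + "--" * (depth - 2) if depth > 1 else ""
    branches.append(prefix + str(node))

print("\n".join(branches))
```

Without the instr() check, the (1, 9) row would send the recursion around the loop 1 → 3 → 9 → 1 forever; with it, each path stops as soon as it would repeat a node.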

Page 43: You can do THAT without Perl?

Travelling Salesman Problem

Given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city?

Page 44: You can do THAT without Perl?

TSP Schema

CREATE TABLE pairs (
    from_city TEXT NOT NULL,
    to_city TEXT NOT NULL,
    distance INTEGER NOT NULL,
    PRIMARY KEY(from_city, to_city),
    CHECK (from_city < to_city)
);

Page 45: You can do THAT without Perl?

TSP Data

INSERT INTO pairs
VALUES
    ('Bari','Bologna',672),
    ('Bari','Bolzano',939),
    ('Bari','Firenze',723),
    ('Bari','Genova',944),
    ('Bari','Milan',881),
    ('Bari','Napoli',257),
    ('Bari','Palermo',708),
    ('Bari','Reggio Calabria',464),
    ....

Page 46: You can do THAT without Perl?

TSP Program: Symmetric Setup

WITH RECURSIVE both_ways(
    from_city,
    to_city,
    distance
) /* Working Table */
AS (
    SELECT from_city, to_city, distance
    FROM pairs
UNION ALL
    SELECT to_city AS "from_city", from_city AS "to_city", distance
    FROM pairs
),

Page 47: You can do THAT without Perl?

TSP Program: Symmetric Setup

WITH RECURSIVE both_ways(
    from_city,
    to_city,
    distance
)
AS (
    /* Distances One Way */
    SELECT from_city, to_city, distance
    FROM pairs
UNION ALL
    SELECT to_city AS "from_city", from_city AS "to_city", distance
    FROM pairs
),

Page 48: You can do THAT without Perl?

TSP Program: Symmetric Setup

WITH RECURSIVE both_ways(
    from_city,
    to_city,
    distance
)
AS (
    SELECT from_city, to_city, distance
    FROM pairs
UNION ALL
    /* Distances Other Way */
    SELECT to_city AS "from_city", from_city AS "to_city", distance
    FROM pairs
),

Page 49: You can do THAT without Perl?

TSP Program: Path Initialization Step

paths (
    from_city,
    to_city,
    distance,
    path
)
AS (
    SELECT from_city, to_city, distance, ARRAY[from_city] AS "path"
    FROM both_ways b1
    WHERE b1.from_city = 'Roma'
UNION ALL

Page 50: You can do THAT without Perl?

TSP Program: Path Recursion Step

    SELECT
        b2.from_city,
        b2.to_city,
        p.distance + b2.distance,
        p.path || b2.from_city
    FROM both_ways b2
    JOIN paths p
        ON (
            p.to_city = b2.from_city
            AND b2.from_city <> ALL (p.path[2:array_upper(p.path,1)]) /* Prevent re-tracing */
            AND array_upper(p.path,1) < 6
        )
)

Page 51: You can do THAT without Perl?

TSP Program: Timely Termination Step

    SELECT
        b2.from_city,
        b2.to_city,
        p.distance + b2.distance,
        p.path || b2.from_city
    FROM both_ways b2
    JOIN paths p
        ON (
            p.to_city = b2.from_city
            AND b2.from_city <> ALL (p.path[2:array_upper(p.path,1)]) /* Prevent re-tracing */
            AND array_upper(p.path,1) < 6 /* Timely Termination */
        )
)

Page 52: You can do THAT without Perl?

TSP Program: Filter and Display

SELECT path || to_city AS "path", distance
FROM paths
WHERE to_city = 'Roma'
AND ARRAY['Milan','Firenze','Napoli'] <@ path
ORDER BY distance, path
LIMIT 1;

Page 53: You can do THAT without Perl?

TSP Program: Filter and Display

davidfetter@tsp=# \i travelling_salesman.sql
               path               | distance
----------------------------------+----------
 {Roma,Firenze,Milan,Napoli,Roma} |     1553
(1 row)

Time: 11679.503 ms
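The same brute-force idea can be tried on a toy instance. The sketch below is a simplified variant, not the query from the slides: it assumes SQLite via Python's sqlite3 module instead of PostgreSQL, keeps the path as a comma-joined string instead of an array, requires the tour to visit every city, and uses four made-up single-letter cities with invented distances. The `hops` column is a hypothetical helper that counts edges so the recursion stops on time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pairs (
    from_city TEXT NOT NULL,
    to_city   TEXT NOT NULL,
    distance  INTEGER NOT NULL,
    PRIMARY KEY (from_city, to_city),
    CHECK (from_city < to_city)
)""")
# Invented distances between four hypothetical cities.
conn.executemany("INSERT INTO pairs VALUES (?, ?, ?)", [
    ("A", "B", 1), ("A", "C", 4), ("A", "D", 3),
    ("B", "C", 2), ("B", "D", 5), ("C", "D", 1),
])

query = """
WITH RECURSIVE both_ways(from_city, to_city, distance) AS (
    SELECT from_city, to_city, distance FROM pairs
  UNION ALL
    SELECT to_city, from_city, distance FROM pairs
),
paths(to_city, distance, path, hops) AS (
    SELECT to_city, distance, from_city || ',' || to_city, 1
    FROM both_ways
    WHERE from_city = 'A'
  UNION ALL
    SELECT b.to_city, p.distance + b.distance, p.path || ',' || b.to_city, p.hops + 1
    FROM both_ways b
    JOIN paths p ON p.to_city = b.from_city
    WHERE p.hops < 4                                    -- timely termination
      AND (instr(p.path, b.to_city) = 0                 -- prevent re-tracing
           OR (b.to_city = 'A' AND p.hops = 3))         -- allow the final leg home
)
SELECT path, distance
FROM paths
WHERE to_city = 'A' AND hops = 4
ORDER BY distance, path
LIMIT 1
"""
best = conn.execute(query).fetchall()
print(best)
```

The instr() check only works here because the city names are single letters; longer names would need delimiters around each entry. As in the slides, the search enumerates every path, so the row count grows factorially with the number of cities.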

Page 54: You can do THAT without Perl?

TSP Program: Gritty Details

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=44.14..44.15 rows=1 width=68) (actual time=16131.326..16131.327 rows=1 loops=1)
   CTE both_ways
     ->  Append  (cost=0.00..4.64 rows=132 width=19) (actual time=0.011..0.090 rows=132 loops=1)
           ->  Seq Scan on pairs  (cost=0.00..1.66 rows=66 width=19) (actual time=0.010..0.026 rows=66 loops=1)
           ->  Seq Scan on pairs  (cost=0.00..1.66 rows=66 width=19) (actual time=0.004..0.029 rows=66 loops=1)
   CTE paths
     ->  Recursive Union  (cost=0.00..38.72 rows=31 width=104) (actual time=0.102..10631.385 rows=1092883 loops=1)
           ->  CTE Scan on both_ways b1  (cost=0.00..2.97 rows=1 width=68) (actual time=0.100..0.251 rows=11 loops=1)
                 Filter: (from_city = 'Roma'::text)
           ->  Hash Join  (cost=0.29..3.51 rows=3 width=104) (actual time=180.125..958.459 rows=182145 loops=6)
                 Hash Cond: (b2.from_city = p.to_city)
                 Join Filter: (b2.from_city <> ALL (p.path[2:array_upper(p.path, 1)]))
                 ->  CTE Scan on both_ways b2  (cost=0.00..2.64 rows=132 width=68) (actual time=0.001..0.056 rows=132 loops=5)
                 ->  Hash  (cost=0.25..0.25 rows=3 width=68) (actual time=172.292..172.292 rows=22427 loops=6)
                       ->  WorkTable Scan on paths p  (cost=0.00..0.25 rows=3 width=68) (actual time=123.167..146.659 rows=22427 loops=6)
                             Filter: (array_upper(path, 1) < 6)
   ->  Sort  (cost=0.79..0.79 rows=1 width=68) (actual time=16131.324..16131.324 rows=1 loops=1)
         Sort Key: paths.distance, ((paths.path || paths.to_city))
         Sort Method:  top-N heapsort  Memory: 17kB
         ->  CTE Scan on paths  (cost=0.00..0.78 rows=1 width=68) (actual time=36.621..16127.841 rows=4146 loops=1)
               Filter: (('{Milan,Firenze,Napoli}'::text[] <@ path) AND (to_city = 'Roma'::text))
 Total runtime: 16180.335 ms
(22 rows)

Page 55: You can do THAT without Perl?

Hausdorff-Besicovitch-WTF Dimension

WITH RECURSIVE
Z(Ix, Iy, Cx, Cy, X, Y, I)
AS (
    SELECT Ix, Iy, X::float, Y::float, X::float, Y::float, 0
    FROM
        (SELECT -2.2 + 0.031 * i, i FROM generate_series(0,101) AS i) AS xgen(x,ix)
    CROSS JOIN
        (SELECT -1.5 + 0.031 * i, i FROM generate_series(0,101) AS i) AS ygen(y,iy)
UNION ALL
    SELECT
        Ix, Iy, Cx, Cy,
        X * X - Y * Y + Cx AS X,
        Y * X * 2 + Cy,
        I + 1
    FROM Z
    WHERE X * X + Y * Y < 16::float
    AND I < 100
),

Page 56: You can do THAT without Perl?

Filter

Zt (Ix, Iy, I) AS (
    SELECT Ix, Iy, MAX(I) AS I
    FROM Z
    GROUP BY Iy, Ix
    ORDER BY Iy, Ix
)

Page 57: You can do THAT without Perl?

Display

SELECT array_to_string(
    array_agg(
        SUBSTRING(
            ' .,,,-----++++%%%%@@@@#### ',
            GREATEST(I,1),
            1
        )
    ),''
)
FROM Zt
GROUP BY Iy
ORDER BY Iy;
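The recursive step at the heart of the fractal query is the iteration z → z² + c, repeated until the point escapes or the iteration cap is reached. The sketch below isolates that step for two hand-picked points instead of a full grid. It assumes SQLite via Python's sqlite3 module rather than PostgreSQL, and the `points` table with its `name` column is a hypothetical helper for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (name TEXT, cx REAL, cy REAL)")
# Two hypothetical sample points: the origin never escapes, (2, 2) escapes at once.
conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                 [("inside", 0.0, 0.0), ("outside", 2.0, 2.0)])

query = """
WITH RECURSIVE z(name, cx, cy, x, y, i) AS (
    SELECT name, cx, cy, cx, cy, 0            -- initial condition: z = c
    FROM points
  UNION ALL
    SELECT name, cx, cy,
           x * x - y * y + cx,                -- real part of z*z + c
           2 * x * y + cy,                    -- imaginary part of z*z + c
           i + 1
    FROM z
    WHERE x * x + y * y < 16.0                -- stop once the point has escaped
      AND i < 100                             -- or when the iteration cap is hit
)
SELECT name, max(i) FROM z GROUP BY name ORDER BY name
"""
escape = conn.execute(query).fetchall()
print(escape)
```

The maximum iteration count per point is what the filter step aggregates, and the display step then maps that count to a character.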

Page 58: You can do THAT without Perl?
Page 59: You can do THAT without Perl?

Questions?
Comments?
Straitjackets?