dublin 4x3-final-slideshare

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. | Window functions in MySQL 8, Dag H. Wanvik, Senior database engineer, MySQL optimizer team, Sep. 2017

Upload: dag-h-wanvik

Post on 28-Jan-2018

TRANSCRIPT

Page 1: Dublin 4x3-final-slideshare

Window functions in MySQL 8

Dag H. Wanvik

Senior database engineer

MySQL optimizer team

Sep. 2017

Page 2: Dublin 4x3-final-slideshare

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Page 3: Dublin 4x3-final-slideshare

Program Agenda

1. Introduction: what & why
2. What's supported
3. Ranking and analytical window functions
4. Implementation & performance

Page 4: Dublin 4x3-final-slideshare

PART I

Gentle intro in which we meet the SUM aggregate used as a window function and get introduced to window partitions and window frames

Page 5: Dublin 4x3-final-slideshare

Why window functions?

● Part of the SQL standard since 2003, with later additions
● Frequently requested feature(s) for data analysis
● Improves readability and often performance
● Most vendors support it, but to varying degrees (YMMV)

Page 6: Dublin 4x3-final-slideshare

Why window functions?

SELECT name o_name, department_id, salary AS o_salary,
       (SELECT SUM(salary) AS sum
        FROM employee
        WHERE salary <= o_salary
          AND NOT (salary = o_salary AND o_name > name)) AS sum
FROM employee
ORDER BY sum, name;

SELECT name, department_id, salary,
       SUM(salary) OVER w AS sum
FROM employee
WINDOW w AS (ORDER BY salary, name ROWS UNBOUNDED PRECEDING)
ORDER BY sum, name;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Dag      |            10 |   NULL |   NULL |
| Frederik |            10 |  60000 |  60000 |
| Jon      |            10 |  60000 | 120000 |
| Lena     |            20 |  65000 | 185000 |
| Paula    |            20 |  65000 | 250000 |
| Michael  |            10 |  70000 | 320000 |
| William  |            30 |  70000 | 390000 |
| Nils     |          NULL |  75000 | 465000 |
| Nils     |            20 |  80000 | 545000 |
| Erik     |            10 | 100000 | 645000 |
| Rose     |            30 | 300000 | 945000 |
+----------+---------------+--------+--------+

● Readability
● Performance on my laptop, 50,000 rows: 16m vs 0.14s
● Or use a self join, but that is tricky

Page 7: Dublin 4x3-final-slideshare

What's a SQL window function?

Short answer: a function that gets its arguments from a set of rows: a window defined by a partition and a frame.

OK, but

● what is partitioned data?
● what is a frame?
● what does a window function look like? Hint: the OVER keyword

Page 8: Dublin 4x3-final-slideshare

Running ex: employees

SELECT name, department_id, salary
FROM employee
ORDER BY department_id, name;

+----------+---------------+--------+
| name     | department_id | salary |
+----------+---------------+--------+
| Nils     |          NULL |  75000 |
| Dag      |            10 |   NULL |
| Erik     |            10 | 100000 |
| Frederik |            10 |  60000 |
| Jon      |            10 |  60000 |
| Michael  |            10 |  70000 |
| Lena     |            20 |  65000 |
| Nils     |            20 |  80000 |
| Paula    |            20 |  65000 |
| Rose     |            30 | 300000 |
| William  |            30 |  70000 |
+----------+---------------+--------+

Page 9: Dublin 4x3-final-slideshare

Salaries per dept

Query: find sums of salaries per department

+----------+---------------+--------+
| name     | department_id | salary |
+----------+---------------+--------+
| Nils     |          NULL |  75000 |
| Dag      |            10 |   NULL |
| Erik     |            10 | 100000 |
| Frederik |            10 |  60000 |
| Jon      |            10 |  60000 |
| Michael  |            10 |  70000 |
| Lena     |            20 |  65000 |
| Nils     |            20 |  80000 |
| Paula    |            20 |  65000 |
| Rose     |            30 | 300000 |
| William  |            30 |  70000 |
+----------+---------------+--------+

SELECT department_id, SUM(salary)
FROM employee
GROUP BY department_id
ORDER BY department_id;

+---------------+-------------+
| department_id | SUM(salary) |
+---------------+-------------+
|          NULL |       75000 |
|            10 |      290000 |
|            20 |      210000 |
|            30 |      370000 |
+---------------+-------------+

Page 10: Dublin 4x3-final-slideshare

Grouping loss

SELECT department_id, SUM(salary)
FROM employee
GROUP BY department_id;

Identity of names and salaries lost.

Diagram: the rows for Lena, Nils and Paula collapse into a single ∑ row.

Page 11: Dublin 4x3-final-slideshare

Windowing

SELECT name, department_id, salary
FROM employee
ORDER BY department_id, name;

+----------+---------------+--------+
| name     | department_id | salary |
+----------+---------------+--------+
| Nils     |          NULL |  75000 |
| Dag      |            10 |   NULL |
| Erik     |            10 | 100000 |
| Frederik |            10 |  60000 |
| Jon      |            10 |  60000 |
| Michael  |            10 |  70000 |
| Lena     |            20 |  65000 |
| Nils     |            20 |  80000 |
| Paula    |            20 |  65000 |
| Rose     |            30 | 300000 |
| William  |            30 |  70000 |
+----------+---------------+--------+

Page 12: Dublin 4x3-final-slideshare

Windowing, rows kept

SELECT name, department_id, salary,
       SUM(salary) OVER () sum
FROM employee
ORDER BY department_id, name;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
| Erik     |            10 | 100000 | 945000 |
| Frederik |            10 |  60000 | 945000 |
| Jon      |            10 |  60000 | 945000 |
| Michael  |            10 |  70000 | 945000 |
| Lena     |            20 |  65000 | 945000 |
| Nils     |            20 |  80000 | 945000 |
| Paula    |            20 |  65000 | 945000 |
| Rose     |            30 | 300000 | 945000 |
| William  |            30 |  70000 | 945000 |
+----------+---------------+--------+--------+

Page 13: Dublin 4x3-final-slideshare

Windowing, «grouped»

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id) sum
FROM employee
ORDER BY department_id, name;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL | 290000 |
| Erik     |            10 | 100000 | 290000 |
| Frederik |            10 |  60000 | 290000 |
| Jon      |            10 |  60000 | 290000 |
| Michael  |            10 |  70000 | 290000 |
| Lena     |            20 |  65000 | 210000 |
| Nils     |            20 |  80000 | 210000 |
| Paula    |            20 |  65000 | 210000 |
| Rose     |            30 | 300000 | 370000 |
| William  |            30 |  70000 | 370000 |
+----------+---------------+--------+--------+

Page 14: Dublin 4x3-final-slideshare

Partition == frame

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id) sum
FROM employee
ORDER BY department_id, name;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL | 290000 |
| Erik     |            10 | 100000 | 290000 |
| Frederik |            10 |  60000 | 290000 |
| Jon      |            10 |  60000 | 290000 |
| Michael  |            10 |  70000 | 290000 |
| Lena     |            20 |  65000 | 210000 |
| Nils     |            20 |  80000 | 210000 |
| Paula    |            20 |  65000 | 210000 |
| Rose     |            30 | 300000 | 370000 |
| William  |            30 |  70000 | 370000 |
+----------+---------------+--------+--------+

All salaries in the partition are added: the window frame is the entire partition

Page 15: Dublin 4x3-final-slideshare

Windowing, rows kept

SELECT name, salary, department_id,
       SUM(salary) OVER (PARTITION BY department_id) sum
FROM employee
ORDER BY department_id;

Identity of department names and salaries kept: no rows are lost

(Diagram: the rows for Lena, Nils and Paula, with ∑.)

=> A window function is similar to a scalar function: it adds a result column

=> BUT: it can read data from rows other than its own, within its WINDOW partition or frame
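The contrast between grouping and windowing can be tried out with a small script. This is only a sketch: it assumes Python's standard sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL 8, and the column types and the `dept_sum` alias are illustrative choices, not taken from the slides.

```python
import sqlite3

# The running-example rows from the slides: (name, department_id, salary).
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# GROUP BY collapses each department into one row.
grouped = conn.execute("""
    SELECT department_id, SUM(salary)
    FROM employee
    GROUP BY department_id
    ORDER BY department_id
""").fetchall()

# The window function keeps every row and adds the department sum as a column.
windowed = conn.execute("""
    SELECT name, department_id, salary,
           SUM(salary) OVER (PARTITION BY department_id) AS dept_sum
    FROM employee
    ORDER BY department_id, name
""").fetchall()

print(len(grouped), "grouped rows;", len(windowed), "windowed rows")
for row in windowed:
    print(row)
```

The grouped query returns one row per department, while the windowed query returns all eleven employee rows, each carrying its department total.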

Page 16: Dublin 4x3-final-slideshare

Default partition

SELECT name, department_id, salary,
       SUM(salary) OVER () sum
FROM employee
ORDER BY department_id, name;

No partition specified: the window frame is the entire result set

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
| Erik     |            10 | 100000 | 945000 |
| Frederik |            10 |  60000 | 945000 |
| Jon      |            10 |  60000 | 945000 |
| Michael  |            10 |  70000 | 945000 |
| Lena     |            20 |  65000 | 945000 |
| Nils     |            20 |  80000 | 945000 |
| Paula    |            20 |  65000 | 945000 |
| Rose     |            30 | 300000 | 945000 |
| William  |            30 |  70000 | 945000 |
+----------+---------------+--------+--------+

Page 17: Dublin 4x3-final-slideshare

Cumulative sum

SELECT name, department_id, salary,
       SUM(salary) OVER (ORDER BY department_id, name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

No partition specified: the window frame grows

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL |  75000 |
| Erik     |            10 | 100000 | 175000 |
| Frederik |            10 |  60000 | 235000 |
| Jon      |            10 |  60000 | 295000 |
| Michael  |            10 |  70000 | 365000 |
| Lena     |            20 |  65000 | 430000 |
| Nils     |            20 |  80000 | 510000 |
| Paula    |            20 |  65000 | 575000 |
| Rose     |            30 | 300000 | 875000 |
| William  |            30 |  70000 | 945000 |
+----------+---------------+--------+--------+

Page 22: Dublin 4x3-final-slideshare

Cumulative sum, partitioned

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

The window frame grows within each partition

New partition: reset sum

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Nils     |          NULL |  75000 |  75000 |
| Dag      |            10 |   NULL |   NULL |
| Erik     |            10 | 100000 | 100000 |
| Frederik |            10 |  60000 | 160000 |
| Jon      |            10 |  60000 | 220000 |
| Michael  |            10 |  70000 | 290000 |
| Lena     |            20 |  65000 |  65000 |
| Nils     |            20 |  80000 | 145000 |
| Paula    |            20 |  65000 | 210000 |
| Rose     |            30 | 300000 | 300000 |
| William  |            30 |  70000 | 370000 |
+----------+---------------+--------+--------+
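A small script can put the unpartitioned and the partitioned cumulative sums side by side. This is a sketch under assumptions: Python's sqlite3 module (SQLite 3.25 or later) stands in for MySQL 8, and the aliases `running` and `dept_running` are made up for this illustration.

```python
import sqlite3

# The running-example rows from the slides: (name, department_id, salary).
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

rows = conn.execute("""
    SELECT name, department_id, salary,
           -- one frame growing over the whole result set
           SUM(salary) OVER (ORDER BY department_id, name
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS running,
           -- the frame restarts at every new department
           SUM(salary) OVER (PARTITION BY department_id
                             ORDER BY name
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS dept_running
    FROM employee
    ORDER BY department_id, name
""").fetchall()

running = [r[3] for r in rows]
dept_running = [r[4] for r in rows]
for row in rows:
    print(row)
```

The `running` column never resets, while `dept_running` starts over at the first row of each department. Dag's partitioned sum is NULL because his frame contains only a NULL salary.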

Page 25: Dublin 4x3-final-slideshare

Parts of window function

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

Function call + OVER keyword signals a window function

Page 26: Dublin 4x3-final-slideshare

Parts of window function

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

An optional partition specification:

PARTITION BY <expression> {, <expression>}*

● A partition divides up the result set in disjoint sets
● A window function does not see rows in partitions other than that of the current row for which it is being evaluated

Page 27: Dublin 4x3-final-slideshare

Parts of window function

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

An optional ordering specification:

ORDER BY <expression> {, <expression>}*

● Orders the rows within the partition
● Not the same as a final query ORDER BY, and makes no guarantees of final query row ordering.

Page 28: Dublin 4x3-final-slideshare

ORDER BY: growing frame

● Some window functions need row ordering to be useful, e.g. RANK
● Peers: same value for ORDER BY expression(s)

SELECT name, department_id, salary,
       SUM(salary) OVER (ORDER BY salary DESC) AS sum
FROM employee;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Rose     |            30 | 300000 | 300000 |
| Erik     |            10 | 100000 | 400000 |
| Nils     |            20 |  80000 | 480000 |
| Nils     |          NULL |  75000 | 555000 |
| Michael  |            10 |  70000 | 625000 |
| William  |            30 |  70000 | 695000 |
| Lena     |            20 |  65000 | 760000 |
| Paula    |            20 |  65000 | 825000 |
| Frederik |            10 |  60000 | 885000 |
| Jon      |            10 |  60000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
+----------+---------------+--------+--------+

Page 32: Dublin 4x3-final-slideshare

ORDER BY: growing frame (PEERS)

● Some window functions need row ordering to be useful, e.g. RANK
● Peers: same value for ORDER BY expression(s)

SELECT name, department_id, salary,
       SUM(salary) OVER (ORDER BY salary DESC) AS sum
FROM employee;

+----------+---------------+--------+--------+
| name     | department_id | salary | sum    |
+----------+---------------+--------+--------+
| Rose     |            30 | 300000 | 300000 |
| Erik     |            10 | 100000 | 400000 |
| Nils     |            20 |  80000 | 480000 |
| Nils     |          NULL |  75000 | 555000 |
| Michael  |            10 |  70000 | 695000 |
| William  |            30 |  70000 | 695000 |
| Lena     |            20 |  65000 | 825000 |
| Paula    |            20 |  65000 | 825000 |
| Frederik |            10 |  60000 | 945000 |
| Jon      |            10 |  60000 | 945000 |
| Dag      |            10 |   NULL | 945000 |
+----------+---------------+--------+--------+

What happened here?
Answer: two rows with the same salary are peers, so they share one frame and get the same sum. This is an example of an (implicit) RANGE frame.
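The peer effect can be reproduced by computing the same sum twice, once with the implicit frame and once with an explicit ROWS frame. This is a sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) in place of MySQL 8. The `name` tiebreaker in the ROWS window and in the final ORDER BY is an addition made here so that the output is deterministic, and the aliases are illustrative.

```python
import sqlite3

# The running-example rows from the slides: (name, department_id, salary).
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

rows = conn.execute("""
    SELECT name, salary,
           -- implicit frame: RANGE up to the current row, peers included
           SUM(salary) OVER (ORDER BY salary DESC) AS range_sum,
           -- explicit physical frame: one row at a time
           SUM(salary) OVER (ORDER BY salary DESC, name
                             ROWS BETWEEN UNBOUNDED PRECEDING
                                      AND CURRENT ROW) AS rows_sum
    FROM employee
    ORDER BY salary DESC, name
""").fetchall()

range_sum = {(name, salary): r for name, salary, r, _ in rows}
rows_sum = {(name, salary): r for name, salary, _, r in rows}
for row in rows:
    print(row)
```

Michael and William both earn 70000: with the implicit RANGE frame each sees the other's salary, while the ROWS frame adds them one after the other.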

Page 33: Dublin 4x3-final-slideshare

Parts of window function

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

An optional frame specification

● A subset of rows within a partition
● Extent can depend on the current row
● Default frame: partition
● Not all window functions heed the frame

Page 34: Dublin 4x3-final-slideshare

Frame anatomy

Diagram: a partition with the possible frame bounds UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, m FOLLOWING and UNBOUNDED FOLLOWING (n: numeric or temporal)

Examples:

● ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
● RANGE CURRENT ROW (the current row and its peers)
● ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING
● ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
● RANGE INTERVAL 6 DAY PRECEDING
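Two of the ROWS frames listed above can be checked on a toy table. This is a sketch only: the table `t` with the values 1 to 6 is hypothetical, the aliases are made up, and Python's sqlite3 module (SQLite 3.25 or later) is assumed as a stand-in for MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table: one column holding the values 1..6.
conn.execute("CREATE TABLE t (n INT)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 7)])

rows = conn.execute("""
    SELECT n,
           -- up to five rows centered on the current row
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS centered,
           -- the current row and up to three rows after it
           SUM(n) OVER (ORDER BY n
                        ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS ahead
    FROM t
    ORDER BY n
""").fetchall()

centered = [r[1] for r in rows]
ahead = [r[2] for r in rows]
for row in rows:
    print(row)
```

Near the edges of the partition the frame is simply cut off: the first row's centered frame holds only 1, 2 and 3, and the last row's look-ahead frame holds only 6.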

Page 40: Dublin 4x3-final-slideshare

ROWS vs. RANGE

● Frame boundaries: physical (ROWS) or logical (RANGE)
● ROWS: bound N is a number of rows. Peers are ignored.
● RANGE requires ORDER BY on a single numeric or temporal expression
● RANGE: bounds N and M select the rows whose value for the ascending ORDER BY expression lies within N lower (PRECEDING) and M higher (FOLLOWING) of the value of the current row. Peers are always included in the frame.

Ex: ORDER BY date RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW

specifies all rows within the last week.

● Default (std):

OVER (ORDER BY n) == OVER (ORDER BY n RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
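A logical RANGE bound and the default-frame equivalence can both be checked with a short script. This is a sketch under assumptions: Python's sqlite3 module stands in for MySQL 8 (SQLite 3.28 or later is needed for RANGE with a numeric offset), the 5000 offset and the aliases are invented for this illustration, and the NULL salary is filtered out to keep the example simple.

```python
import sqlite3

# The running-example rows from the slides: (name, department_id, salary).
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

rows = conn.execute("""
    SELECT name, salary,
           -- logical bound: employees whose salary is within 5000 of this one
           COUNT(*) OVER (ORDER BY salary
                          RANGE BETWEEN 5000 PRECEDING
                                    AND 5000 FOLLOWING) AS near,
           -- default frame when only ORDER BY is given ...
           SUM(salary) OVER (ORDER BY salary) AS implicit_sum,
           -- ... written out explicitly
           SUM(salary) OVER (ORDER BY salary
                             RANGE BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS explicit_sum
    FROM employee
    WHERE salary IS NOT NULL
    ORDER BY salary, name
""").fetchall()

near = [r[2] for r in rows]
for row in rows:
    print(row)
```

The `near` count depends on salary values, not on row positions, which is what makes the bound logical. The last two columns agree on every row.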

Page 41: Dublin 4x3-final-slideshare

Determinacy

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee;

In general, window queries are not deterministic unless one orders on enough expressions to designate the row uniquely.

Minimum guarantee by the SQL standard: several equivalent non-deterministic orderings in the same query give the same order (within the query).

Page 42: Dublin 4x3-final-slideshare

Determinacy

SELECT name, department_id, salary,
       SUM(salary) OVER (PARTITION BY department_id
                         ORDER BY name
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) sum
FROM employee
ORDER BY department_id, name;

In general, window queries are not deterministic unless one orders on enough expressions to designate the row uniquely.

Minimum guarantee by the SQL standard: several equivalent non-deterministic orderings in the same query give the same order (within the query).

A final ORDER BY is still required if ordering is desired: no guarantees from the window.

Page 43: Dublin 4x3-final-slideshare

Example: salary analysis

Question: find the employees with the largest difference between their wage and that of the department average

SELECT name, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg,
       salary - AVG(salary) OVER (PARTITION BY department_id) AS diff
FROM employee
ORDER BY diff DESC;

+----------+---------------+--------+-------------+--------------+
| name     | department_id | salary | avg         | diff         |
+----------+---------------+--------+-------------+--------------+
| Rose     |            30 | 300000 | 185000.0000 |  115000.0000 |
| Erik     |            10 | 100000 |  72500.0000 |   27500.0000 |
| Nils     |            20 |  80000 |  70000.0000 |   10000.0000 |
| Nils     |          NULL |  75000 |  75000.0000 |       0.0000 |
| Michael  |            10 |  70000 |  72500.0000 |   -2500.0000 |
| Lena     |            20 |  65000 |  70000.0000 |   -5000.0000 |
| Paula    |            20 |  65000 |  70000.0000 |   -5000.0000 |
| Frederik |            10 |  60000 |  72500.0000 |  -12500.0000 |
| Jon      |            10 |  60000 |  72500.0000 |  -12500.0000 |
| William  |            30 |  70000 | 185000.0000 | -115000.0000 |
| Dag      |            10 |   NULL |  72500.0000 |         NULL |
+----------+---------------+--------+-------------+--------------+

Page 44: Dublin 4x3-final-slideshare

Example: salary analysis

Question: find the employees with the largest difference between their wage and that of the department average

SELECT name, department_id, salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg,
       salary - AVG(salary) OVER (PARTITION BY department_id) AS diff
FROM employee
ORDER BY diff DESC;

● Here: two distinct windows
● A query can have any number of windows
● Logically evaluated in multiple phases

Page 45: Dublin 4x3-final-slideshare

Example: named window

Question: find the employees with the largest difference between their wage and that of the department average

SELECT name, department_id, salary,
       AVG(salary) OVER w AS avg,
       salary - AVG(salary) OVER w AS diff
FROM employee
WINDOW w AS (PARTITION BY department_id)
ORDER BY diff DESC;

Named window w: defined in the WINDOW clause

References to w: the two OVER w clauses

● Multiple window functions per window
● Will be evaluated in same phase (efficiency)
● Better readability
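The named-window query can be run as follows. This is a sketch: Python's sqlite3 module (SQLite 3.25 or later) is assumed in place of MySQL 8, the alias `dept_avg` replaces the slide's `avg`, and `name` is added to the final ORDER BY here so that ties come out in a fixed order. SQLite returns the averages as floats rather than MySQL's fixed-point decimals.

```python
import sqlite3

# The running-example rows from the slides: (name, department_id, salary).
EMPLOYEES = [
    ("Nils", None, 75000), ("Dag", 10, None), ("Erik", 10, 100000),
    ("Frederik", 10, 60000), ("Jon", 10, 60000), ("Michael", 10, 70000),
    ("Lena", 20, 65000), ("Nils", 20, 80000), ("Paula", 20, 65000),
    ("Rose", 30, 300000), ("William", 30, 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department_id INT, salary INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", EMPLOYEES)

# Both window functions refer to the same named window w.
rows = conn.execute("""
    SELECT name, department_id, salary,
           AVG(salary) OVER w AS dept_avg,
           salary - AVG(salary) OVER w AS diff
    FROM employee
    WINDOW w AS (PARTITION BY department_id)
    ORDER BY diff DESC, name
""").fetchall()

for row in rows:
    print(row)
```

Rose comes out on top with a difference of 115000 above her department average, and Dag sorts last because his NULL salary gives a NULL difference.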

Page 46: Dublin 4x3-final-slideshare

Ex: moving AVG, smoothing

CREATE TABLE sales(id INT AUTO_INCREMENT PRIMARY KEY, date DATE, sale INT);
...;
SELECT * FROM sales;

+----+------------+------+
| id | date       | sale |
+----+------------+------+
|  1 | 2017-03-01 |  200 |
|  2 | 2017-04-01 |  300 |
|  3 | 2017-05-01 |  400 |
|  4 | 2017-06-01 |  200 |
|  5 | 2017-07-01 |  600 |
|  6 | 2017-08-01 |  100 |
|  7 | 2017-03-01 |  400 |
|  8 | 2017-04-01 |  300 |
|  9 | 2017-05-01 |  500 |
| 10 | 2017-06-01 |  400 |
| 11 | 2017-07-01 |  600 |
| 12 | 2017-08-01 |  150 |
+----+------------+------+

Page 47: Dublin 4x3-final-slideshare

Ex: moving AVG, smoothing

● Sum up sales per month

SELECT MONTH(date), SUM(sale)
FROM sales
GROUP BY MONTH(date);

+-------------+-----------+
| MONTH(date) | SUM(sale) |
+-------------+-----------+
|           3 |       600 |
|           4 |       600 |
|           5 |       900 |
|           6 |       600 |
|           7 |      1200 |
|           8 |       250 |
+-------------+-----------+

Page 53: Dublin 4x3-final-slideshare

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Ex: moving AVG, smoothing● Moving AVG over 3 months

SELECT MONTH(date), SUM(sale), AVG(SUM(sale)) OVER w AS sliding_avgFROM salesGROUP BY MONTH(date)WINDOW w AS (ORDER BY MONTH(date) RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING);

+-------------+-----------+-------------+| MONTH(date) | SUM(sale) | sliding_avg |+-------------+-----------+-------------+| 3 | 600 | 600.0000 || 4 | 600 | 700.0000 || 5 | 900 | 700.0000 || 6 | 600 | 900.0000 || 7 | 1200 | 683.3333 || 8 | 250 | 725.0000 |+-------------+-----------+-------------+

Page 54: Dublin 4x3-final-slideshare


Ex: moving AVG, smoothing

● Moving AVG over 3 months

SELECT MONTH(date), SUM(sale),
       AVG(SUM(sale)) OVER w AS sliding_avg
FROM sales
GROUP BY MONTH(date)
WINDOW w AS (ORDER BY MONTH(date)
             RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING);

+-------------+-----------+-------------+
| MONTH(date) | SUM(sale) | sliding_avg |
+-------------+-----------+-------------+
|           3 |       600 |    600.0000 |
|           4 |       600 |    700.0000 |
|           5 |       900 |    700.0000 |
|           6 |       600 |    900.0000 |
|           7 |      1200 |    683.3333 |
|           8 |       250 |    725.0000 |
+-------------+-----------+-------------+
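The sliding_avg column can be checked by hand. The following is a minimal Python sketch of what a RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING frame selects for each row; the function name range_moving_avg is hypothetical and this models only the semantics, not how MySQL evaluates the query.

```python
def range_moving_avg(rows, before=1, after=1):
    """rows: (key, value) pairs sorted by key.

    For each row, average the values of all rows whose key lies in
    [key - before, key + after], i.e. a RANGE frame on the ORDER BY key.
    """
    out = []
    for key, _ in rows:
        frame = [v for k, v in rows if key - before <= k <= key + after]
        out.append((key, sum(frame) / len(frame)))
    return out


# Monthly sums taken from the slide: (MONTH(date), SUM(sale))
monthly = [(3, 600), (4, 600), (5, 900), (6, 600), (7, 1200), (8, 250)]
sliding = range_moving_avg(monthly)
```

The first and last months have only two rows in their frame, which is why they average two values rather than three.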

Page 55: Dublin 4x3-final-slideshare


Windowing in an SQL query

JOIN → GROUP BY, HAVING → WINDOW1 → ... → WINDOWn → ORDER BY/DISTINCT/LIMIT

● Window functions see the query result set after grouping/HAVING
  - filtering on wf results requires a subquery
● Ordering of the windowing steps is not semantically significant
● Window functions can't use window functions in the same query (without using subqueries)
● In practice, ordering matters. The optimizer is allowed to
  - reorder to minimize the sorting required
  - merge window phases if equivalent

Page 56: Dublin 4x3-final-slideshare


Program Agenda

• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance

Page 57: Dublin 4x3-final-slideshare


PART II

in which we learn which window functions are supported by MySQL

Page 58: Dublin 4x3-final-slideshare


MySQL 8.0

● Most aggregate functions in MySQL can be used as window functions:

  COUNT, SUM, AVG, MAX, MIN, STDDEV_POP (& synonyms), STDDEV_SAMP, VAR_POP (& synonym), VAR_SAMP

  Limitation: No DISTINCT in aggregates yet

● All SQL standard specialized window functions:

  ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE

● Next phase (probably post-GA), more aggregates:

  BIT_OR, BIT_XOR, BIT_AND, JSON_ARRAYAGG, JSON_OBJECTAGG [ GROUP_CONCAT ]

Page 59: Dublin 4x3-final-slideshare


Std compliance: extensions

Target SQL standard semantics, but

● Expression (not only column) allowed in PARTITION BY

Benefit: more flexible

● Missing ORDER BY tolerated even if useless (all rows are peers), except for RANGE <value>, which requires a single ORDER BY expression

● Frame clause tolerated even for window functions that operate on the entire partition (std is stricter)

Benefit: Many wfs can use same window definition

Page 60: Dublin 4x3-final-slideshare


Std compliance: restrictions

Target SQL standard semantics, but

● Valued frame bounds must be static in query
● No GROUPS in frame clause (Feature T620)
● No EXCLUDE in frame clause
● No DISTINCT in aggregates with windowing
● IGNORE NULLS not supported
● FROM LAST not supported (NTH_VALUE)
● No nested window functions (Feature T619)
● No row pattern recognition in window clause (Feature R020)
● Operator subqueries with windowing only if materializable (WL#10431)

Page 61: Dublin 4x3-final-slideshare


Program Agenda

• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance

Page 62: Dublin 4x3-final-slideshare


PART III

in which we learn about the specialized window functions

Ranking: ROW_NUMBER, RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE

Analytical: LEAD, LAG, NTH_VALUE, FIRST_VALUE, LAST_VALUE

NTH_VALUE, FIRST_VALUE and LAST_VALUE (blue on the slide) use frames; the others work on the entire partition.

Page 63: Dublin 4x3-final-slideshare


ROW_NUMBER

● Assign number to row in ascending order

Example: give employees a number according to their salary

SELECT name, department_id AS dept, salary,
       ROW_NUMBER() OVER w AS `row#`
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC, name ASC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+
| name     | dept | salary | row# |
+----------+------+--------+------+
| Nils     | NULL |  75000 |    1 |
| Erik     |   10 | 100000 |    1 |
| Michael  |   10 |  70000 |    2 |
| Frederik |   10 |  60000 |    3 |
| Jon      |   10 |  60000 |    4 |
| Dag      |   10 |   NULL |    5 |
| Nils     |   20 |  80000 |    1 |
| Lena     |   20 |  65000 |    2 |
| Paula    |   20 |  65000 |    3 |
| Rose     |   30 | 300000 |    1 |
| William  |   30 |  70000 |    2 |
+----------+------+--------+------+

Page 64: Dublin 4x3-final-slideshare


RANK

● Rows that are equal w.r.t. the window ordering (peers) have the same rank

Example: rank employees within each department according to their salary

SELECT name, department_id AS dept, salary, .. ,
       RANK() OVER w AS `rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+------+
| name     | dept | salary | row# | rank |
+----------+------+--------+------+------+
| Nils     | NULL |  75000 |    1 |    1 |
| Erik     |   10 | 100000 |    1 |    1 |
| Michael  |   10 |  70000 |    2 |    2 |
| Frederik |   10 |  60000 |    3 |    3 |
| Jon      |   10 |  60000 |    4 |    3 |
| Dag      |   10 |   NULL |    5 |    5 |
| Nils     |   20 |  80000 |    1 |    1 |
| Lena     |   20 |  65000 |    2 |    2 |
| Paula    |   20 |  65000 |    3 |    2 |
| Rose     |   30 | 300000 |    1 |    1 |
| William  |   30 |  70000 |    2 |    2 |
+----------+------+--------+------+------+

Frederik and Jon are peer rows w.r.t. the ordering, so rank 4 is skipped.

Page 65: Dublin 4x3-final-slideshare


DENSE_RANK

● Rows that are equal w.r.t. the window ordering (peers) have the same rank

Example: rank employees within each department according to their salary

SELECT name, department_id AS dept, salary, .. ,
       DENSE_RANK() OVER w AS dense
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+------+-------+
| name     | dept | salary | row# | rank | dense |
+----------+------+--------+------+------+-------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |
| William  |   30 |  70000 |    2 |    2 |     2 |
+----------+------+--------+------+------+-------+

Frederik and Jon are peer rows w.r.t. the ordering; DENSE_RANK does not skip rank 4.
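The difference between ROW_NUMBER, RANK and DENSE_RANK comes down to what happens when a row is a peer of the previous one. Below is a small Python sketch of that logic for a single partition; number_and_rank is a hypothetical helper, and it assumes the input is already sorted by the window ordering (salary DESC, NULL last), with None standing in for NULL.

```python
def number_and_rank(salaries):
    """Return (row_number, rank, dense_rank) per row of one sorted partition."""
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real salary
    for row_number, salary in enumerate(salaries, start=1):
        if salary != prev:      # not a peer of the previous row
            rank = row_number   # RANK jumps to the row number (leaves gaps)
            dense += 1          # DENSE_RANK just counts distinct values
            prev = salary
        out.append((row_number, rank, dense))
    return out


# Department 10 from the slide, ordered by salary DESC
dept10 = [100000, 70000, 60000, 60000, None]
result = number_and_rank(dept10)
```

For the two 60000 rows the row number keeps counting (3, 4) while both ranks stay at 3; the next row then gets rank 5 but dense rank 4.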

Page 66: Dublin 4x3-final-slideshare


PERCENT_RANK

● Relative rank: (rank - 1) / (total partition rows - 1), or 0 if there is only one row in the partition

Example: rank employees within each department according to their salary

SELECT name, department_id AS dept, salary, .. ,
       PERCENT_RANK() OVER w AS `%rank`
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+------+-------+-------+
| name     | dept | salary | row# | rank | dense | %rank |
+----------+------+--------+------+------+-------+-------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |
+----------+------+--------+------+------+-------+-------+

Frederik and Jon (rank 3 of 5 rows): (3-1)/(5-1) = 0.5

Page 67: Dublin 4x3-final-slideshare


CUME_DIST

● Cumulative relative rank: (preceding rows incl. peers) / (total partition rows)

Example: cumulative rank of employees within each department according to their salary

SELECT name, department_id AS dept, salary, .. ,
       CUME_DIST() OVER w AS cume
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+------+-------+-------+---------+
| name     | dept | salary | row# | rank | dense | %rank | cume    |
+----------+------+--------+------+------+-------+-------+---------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |       1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |     0.2 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |     0.4 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |     0.8 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |     0.8 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |       1 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 | 0.33333 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |       1 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |       1 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |     0.5 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |       1 |
+----------+------+--------+------+------+-------+-------+---------+

Frederik and Jon (4 rows up to and including their peers, of 5): 4/5 = 0.8
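Both relative ranks follow directly from the position of a row's peer group within the sorted partition. The Python sketch below works through the two formulas for one partition; percent_rank_and_cume_dist is a hypothetical name, None stands in for NULL, and the input is assumed to be sorted by the window ordering already.

```python
def percent_rank_and_cume_dist(ordered):
    """Return (percent_rank, cume_dist) per row of one sorted partition."""
    n = len(ordered)
    out = []
    for value in ordered:
        first = ordered.index(value)               # rows strictly before the peer group
        last = n - 1 - ordered[::-1].index(value)  # position of the last peer
        rank = first + 1
        percent_rank = 0.0 if n == 1 else (rank - 1) / (n - 1)
        cume_dist = (last + 1) / n                 # preceding rows incl. peers / total
        out.append((percent_rank, cume_dist))
    return out


# Department 10 from the slide, ordered by salary DESC
dept10 = [100000, 70000, 60000, 60000, None]
result = percent_rank_and_cume_dist(dept10)
```

PERCENT_RANK looks at where the peer group starts, CUME_DIST at where it ends, which is why the two 60000 rows get 0.5 and 0.8 respectively.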

Page 68: Dublin 4x3-final-slideshare


NTILE

● Divides an ordered partition into a specified number of groups, aka buckets, as evenly as possible and assigns a bucket number to each row in the partition. In spite of the name, not the same as percentile!

SELECT name, department_id AS dept, salary, .. ,
       NTILE(3) OVER w AS `3-tile`
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)
ORDER BY department_id, `row#`;

+----------+------+--------+------+------+-------+-------+---------+--------+
| name     | dept | salary | row# | rank | dense | %rank | cume    | 3-tile |
+----------+------+--------+------+------+-------+-------+---------+--------+
| Nils     | NULL |  75000 |    1 |    1 |     1 |     0 |       1 |      1 |
| Erik     |   10 | 100000 |    1 |    1 |     1 |     0 |     0.2 |      1 |
| Michael  |   10 |  70000 |    2 |    2 |     2 |  0.25 |     0.4 |      1 |
| Frederik |   10 |  60000 |    3 |    3 |     3 |   0.5 |     0.8 |      2 |
| Jon      |   10 |  60000 |    4 |    3 |     3 |   0.5 |     0.8 |      2 |
| Dag      |   10 |   NULL |    5 |    5 |     4 |     1 |       1 |      3 |
| Nils     |   20 |  80000 |    1 |    1 |     1 |     0 | 0.33333 |      1 |
| Lena     |   20 |  65000 |    2 |    2 |     2 |   0.5 |       1 |      2 |
| Paula    |   20 |  65000 |    3 |    2 |     2 |   0.5 |       1 |      3 |
| Rose     |   30 | 300000 |    1 |    1 |     1 |     0 |     0.5 |      1 |
| William  |   30 |  70000 |    2 |    2 |     2 |     1 |       1 |      2 |
+----------+------+--------+------+------+-------+-------+---------+--------+
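"As evenly as possible" means that when the row count is not divisible by the bucket count, the leading buckets each take one extra row. A short Python sketch of that assignment (ntile here is a hypothetical helper that only needs the partition size, since the bucket depends on row position alone):

```python
def ntile(n_rows, buckets):
    """Bucket number for each of n_rows ordered rows, split into `buckets` groups."""
    size, extra = divmod(n_rows, buckets)
    out = []
    for b in range(1, buckets + 1):
        # The first `extra` buckets hold one row more than the rest.
        out.extend([b] * (size + (1 if b <= extra else 0)))
    return out


dept10 = ntile(5, 3)  # five employees in department 10, NTILE(3)
```

With fewer rows than buckets, the trailing buckets simply stay empty, as for department 30 in the table (two rows, buckets 1 and 2).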

Page 69: Dublin 4x3-final-slideshare


LEAD, LAG

● Returns value evaluated at the row that is offset rows after/before the current row within the partition; if there is no such row, instead returns an optional default expression (which must be of the same type as value).

● Both offset and default expr are evaluated with respect to the current row. If omitted, offset defaults to 1 and default expr to NULL.

Syntax: LEAD( <expr> [, <offset> [, <default expr> ] ] ) [ <RESPECT NULLS> ]

Example: LEAD(date) OVER (..)

Note: “IGNORE NULLS” not supported; RESPECT NULLS is the default but can be specified.

● Any window frame is ignored
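As a rough model of these semantics, LEAD and LAG are position lookups relative to the current row, with a fallback when the lookup falls outside the partition. The Python sketch below assumes a constant offset and a constant default (the slide notes both may be expressions evaluated per row); lead and lag are hypothetical helper names and None stands in for NULL.

```python
def lead(values, offset=1, default=None):
    """values: one partition in window order. Look `offset` rows ahead."""
    n = len(values)
    return [values[i + offset] if 0 <= i + offset < n else default
            for i in range(n)]


def lag(values, offset=1, default=None):
    """Same lookup, but `offset` rows back."""
    return lead(values, -offset, default)


salaries = [300000, 100000, None]
ahead = lead(salaries)
ahead_with_default = lead(salaries, 1, 77000)
behind = lag(salaries)
```

Note the difference between a NULL that is an actual value in the target row (second row of ahead_with_default stays None) and a missing row, which is the only case where the default is used.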

Page 73: Dublin 4x3-final-slideshare


LEAD

SELECT name, department_id AS dept, salary,
       LEAD(salary, 1) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);

+----------+------+--------+--------+
| name     | dept | salary | lead   |
+----------+------+--------+--------+
| Rose     |   30 | 300000 | 100000 |
| Erik     |   10 | 100000 |  80000 |
| Nils     |   20 |  80000 |  75000 |
| Nils     | NULL |  75000 |  70000 |
| Michael  |   10 |  70000 |  70000 |
| William  |   30 |  70000 |  65000 |
| Lena     |   20 |  65000 |  65000 |
| Paula    |   20 |  65000 |  60000 |
| Frederik |   10 |  60000 |  60000 |
| Jon      |   10 |  60000 |   NULL |
| Dag      |   10 |   NULL |   NULL |
+----------+------+--------+--------+

Page 74: Dublin 4x3-final-slideshare


LEAD

SELECT name, department_id AS dept, salary,
       LEAD(salary, 1, 77000) OVER w AS `lead`
FROM employee
WINDOW w AS (ORDER BY salary DESC);

+----------+------+--------+--------+
| name     | dept | salary | lead   |
+----------+------+--------+--------+
| Rose     |   30 | 300000 | 100000 |
| Erik     |   10 | 100000 |  80000 |
| Nils     |   20 |  80000 |  75000 |
| Nils     | NULL |  75000 |  70000 |
| Michael  |   10 |  70000 |  70000 |
| William  |   30 |  70000 |  65000 |
| Lena     |   20 |  65000 |  65000 |
| Paula    |   20 |  65000 |  60000 |
| Frederik |   10 |  60000 |  60000 |
| Jon      |   10 |  60000 |   NULL |
| Dag      |   10 |   NULL |  77000 |
+----------+------+--------+--------+

The last row (Dag) has no following row, so it gets the default, 77000.

Page 75: Dublin 4x3-final-slideshare


LEAD - gap detection

● Classic example:

CREATE TABLE t(i INT);
INSERT INTO t VALUES (1), (2), (4), (5), (6), (8), (9), (10);

SELECT i, l
FROM (SELECT i, LEAD(i) OVER (ORDER BY i) AS l FROM t) d
WHERE i + 1 <> l;

+------+------+
| i    | l    |
+------+------+
|    2 |    4 |
|    6 |    8 |
+------+------+
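The same gap detection can be written as a pairwise comparison of each value with its successor, which is all LEAD(i) provides here. A minimal Python sketch using the values from the slide:

```python
values = sorted([1, 2, 4, 5, 6, 8, 9, 10])

# Pair each value with the next one (its LEAD) and keep the pairs that
# are not consecutive. zip() stops before the last value, which mirrors
# the SQL: there LEAD(i) is NULL for the last row, so the WHERE
# condition is not true and the row is filtered out.
gaps = [(i, l) for i, l in zip(values, values[1:]) if i + 1 != l]
```

Each resulting pair gives the last value before a gap and the first value after it.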

Page 76: Dublin 4x3-final-slideshare


FIRST_VALUE, LAST_VALUE

Returns value evaluated at the first or last row in the frame of the current row within the partition; if the frame is empty, returns NULL.

Note: “IGNORE NULLS” is not supported; RESPECT NULLS is the default but can be specified.

Page 77: Dublin 4x3-final-slideshare


FIRST_VALUE

● Difference between employee wages and best paid in department

SELECT name, department_id AS dept, salary,
       FIRST_VALUE(salary) OVER w - salary AS diff
FROM employee
WINDOW w AS (PARTITION BY department_id
             ORDER BY salary DESC)

+----------+------+--------+--------+
| name     | dept | salary | diff   |
+----------+------+--------+--------+
| Nils     | NULL |  75000 |      0 |
| Erik     |   10 | 100000 |      0 |
| Michael  |   10 |  70000 |  30000 |
| Frederik |   10 |  60000 |  40000 |
| Jon      |   10 |  60000 |  40000 |
| Dag      |   10 |   NULL |   NULL |
| Nils     |   20 |  80000 |      0 |
| Lena     |   20 |  65000 |  15000 |
| Paula    |   20 |  65000 |  15000 |
| Rose     |   30 | 300000 |      0 |
| William  |   30 |  70000 | 230000 |
+----------+------+--------+--------+
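With ORDER BY salary DESC, every frame in a partition starts at the partition's first row, so FIRST_VALUE(salary) is the top salary for every row in this query. A small Python sketch of the resulting diff column for one partition; diff_to_first is a hypothetical helper, None stands in for NULL, and the input is assumed to be sorted by salary DESC with NULL last.

```python
def diff_to_first(salaries):
    """first salary in the partition minus each row's salary; NULL propagates."""
    first = salaries[0]
    return [None if s is None or first is None else first - s
            for s in salaries]


# Department 10 from the slide, ordered by salary DESC
dept10 = diff_to_first([100000, 70000, 60000, 60000, None])
```

As in SQL arithmetic, subtracting a NULL salary yields NULL rather than an error.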

Page 78: Dublin 4x3-final-slideshare


NTH_VALUE

Returns value evaluated at the nth row in the frame of the current row within the partition; if there is no nth row (frame is too small), NTH_VALUE returns NULL.

Note: “IGNORE NULLS” is not supported; RESPECT NULLS is used by default but can be specified.

Note: For NTH_VALUE, “FROM LAST” is not supported; FROM FIRST is used by default but can be specified.

Page 79: Dublin 4x3-final-slideshare


Program Agenda

• Introduction: what & why
• What's supported
• Ranking and analytical wfs
• Implementation & performance

Page 80: Dublin 4x3-final-slideshare


PART IV

is where we look at implementation and performance aspects

Page 81: Dublin 4x3-final-slideshare


Windowing in an SQL query

JOIN → GROUP BY, HAVING → WINDOW1 → ... → WINDOWn → ORDER BY/DISTINCT/LIMIT

● Window functions see the query result set after grouping/HAVING
  - filtering on wf results requires a subquery
● Ordering of the windowing steps is not semantically significant
● Window functions can't use window functions in the same query (without using subqueries)
● In practice, ordering matters. The optimizer is allowed to
  - reorder to minimize the sorting required
  - merge window phases if equivalent

Page 82: Dublin 4x3-final-slideshare


Processing window functions

JOIN → GROUP BY → WINDOW1 → ... → WINDOWn → ORDER BY/DISTINCT/LIMIT

Each windowing step: sort by concatenation of PARTITION BY and ORDER BY

● Tmp table between each windowing step
● Optimization: re-order windows to eliminate sorting steps when PARTITION BY/ORDER BY expressions are equal

SELECT name, SUM(salary) OVER () FROM employee LIMIT 3

+----------+---------------------+
| name     | SUM(salary) OVER () |
+----------+---------------------+
| Dag      |              945000 |
| Erik     |              945000 |
| Frederik |              945000 |
+----------+---------------------+

Need to read all rows to get SUM before we can output row 1: need buffering to merge original row with result of window function

Page 83: Dublin 4x3-final-slideshare


Processing window functions

JOIN → GROUP BY → WINDOW1 → ... → WINDOWn → ORDER BY/DISTINCT/LIMIT

Row addressable buffer aka frame buffer: in-memory tmp table; overflows automatically to disk if needed

Permits re-reading rows when frame moves

Page 84: Dublin 4x3-final-slideshare


Streaming window functions

Frame buffer (row addressable buffer) not needed for streaming window functions:

● ROW_NUMBER, RANK, DENSE_RANK
● Aggregates with ROWS frame and dynamically growing upper bound, e.g.

  SELECT SUM(salary) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
  FROM employee;

● Non-streaming (need buffer), e.g. CUME_DIST, SUM() OVER ()

=> whether a buffer is needed is a function of which window function and which frame specification is used

If in doubt, check with EXPLAIN FORMAT=JSON
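The streaming/non-streaming distinction can be illustrated outside the server. The Python sketch below is only a model of the idea, not of MySQL's executor: running_sum can emit a result as soon as it has read the row, while total_over_all (both names hypothetical) must hold every row until the last one has been seen. None stands in for NULL and, as with SUM, is skipped.

```python
def running_sum(rows):
    """SUM(...) OVER (ROWS UNBOUNDED PRECEDING): one pass, no row buffer."""
    total = None
    for value in rows:
        if value is not None:
            total = value if total is None else total + value
        yield total  # result is known as soon as the row is read


def total_over_all(rows):
    """SUM(...) OVER (): the total is only known after the last row,
    so all rows are buffered before the first one can be output."""
    buffered = list(rows)
    non_null = [v for v in buffered if v is not None]
    total = sum(non_null) if non_null else None
    return [(v, total) for v in buffered]


salaries = [None, 100000, 60000]
streamed = list(running_sum(salaries))
buffered = total_over_all(salaries)
```

The generator needs constant memory regardless of partition size; the buffered variant needs memory proportional to the partition, which corresponds to the frame buffer described on the slide.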

Page 85: Dublin 4x3-final-slideshare


EXPLAIN: streaming

EXPLAIN format=json SELECT SUM(salary) OVER (ROWS UNBOUNDED PRECEDING) FROM employee;

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.35"
    },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "functions": [
            "sum"
          ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [
          "salary"
        ]
      }
    }
  }
}

Page 86: Dublin 4x3-final-slideshare


EXPLAIN: frame buffer

EXPLAIN format=json SELECT COUNT(salary) OVER () FROM employee;

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1.35"
    },
    "windowing": {
      "windows": [
        {
          "name": "<unnamed window>",
          "frame_buffer": {
            "using_temporary_table": true,
            "optimized_frame_evaluation": true
          },
          "functions": [
            "count"
          ]
        }
      ],
      "table": {
        "table_name": "employee",
        "access_type": "ALL",
        "rows_examined_per_scan": 11,
        "rows_produced_per_join": 11,
        "filtered": "100.00",
        "cost_info": {
          "read_cost": "0.25",
          "eval_cost": "1.10",
          "prefix_cost": "1.35",
          "data_read_per_join": "1K"
        },
        "used_columns": [
          "salary"
        ]
      }
    }
  }
}

Page 87: Dublin 4x3-final-slideshare


Frame buffer processing

JOIN → GROUP BY → WINDOW1 → ... → WINDOWn → ORDER BY/DISTINCT/LIMIT

Row addressable buffer aka frame buffer: in-memory; overflows automatically to disk if needed

Permits re-reading rows when frame moves

SELECT name, SUM(salary) OVER () FROM employee LIMIT 3

+----------+---------------------+
| name     | SUM(salary) OVER () |
+----------+---------------------+
| Dag      |              945000 |
| Erik     |              945000 |
| Frederik |              945000 |
+----------+---------------------+

Optimization 1: Compute SUM only once (static frame)

But what if frame changes?

Page 88: Dublin 4x3-final-slideshare


Frame buffer processing

● Case: expanding frame

SELECT name, salary, SUM(salary) OVER (ORDER BY name) AS `sum`
FROM employee LIMIT 3

+----------+--------+--------+
| name     | salary | sum    |
+----------+--------+--------+
| Dag      |   NULL |   NULL |
| Erik     | 100000 | 100000 |
| Frederik |  60000 | 160000 |
+----------+--------+--------+

Optimization 2: Remember SUM and adjust: here add next row's contribution: (NULL+100k)+60k=160k

But what if we have a moving frame?

Page 89: Dublin 4x3-final-slideshare


Inversion

● Case: moving frame: Sales over this month and last

SELECT MONTH(date) AS month,
       SUM(sale) AS monthly,
       SUM(SUM(sale)) OVER (ORDER BY MONTH(date) RANGE 1 PRECEDING) AS `this&last`
FROM sales
GROUP BY MONTH(date);

+-------+---------+-----------+
| month | monthly | this&last |
+-------+---------+-----------+
|     3 |     600 |       600 |
|     4 |     600 |      1200 |
|     5 |     900 |      1500 |
|     6 |     600 |      1500 |
|     7 |    1200 |      1800 |
|     8 |     250 |      1450 |
+-------+---------+-----------+

Optimization 3: Remember SUM and adjust: here remove the contribution of the row at 2 PRECEDING, which has just left the frame, then add the current row's contribution: inversion

NOTE: Ok only if: a + b - a + c = b + c

This isn't always the case for aggregates

Penalty if we can't do this for large window frames: O(#partition size) with inversion vs. O(#partition size * #frame size) without
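To make the cost difference concrete, here is a Python sketch of the two strategies for the this&last column: recomputing the frame for every row versus keeping a running total and inverting the rows that leave the frame. Both function names are hypothetical, the rows are assumed sorted by key with non-NULL integer values, and this is an illustration of the idea rather than MySQL's implementation.

```python
def sliding_sum_recompute(rows, preceding=1):
    """Re-add every row of the frame for each output row:
    O(partition size * frame size)."""
    return [sum(v for k, v in rows if key - preceding <= k <= key)
            for key, _ in rows]


def sliding_sum_inversion(rows, preceding=1):
    """Keep one running total; add the current row and subtract rows
    that slid out of the frame: O(partition size)."""
    out, total, start = [], 0, 0
    for key, value in rows:
        total += value
        while rows[start][0] < key - preceding:
            total -= rows[start][1]  # inversion: undo a departed row
            start += 1
        out.append(total)
    return out


# (month, monthly) pairs from the slide
monthly = [(3, 600), (4, 600), (5, 900), (6, 600), (7, 1200), (8, 250)]
this_and_last = sliding_sum_inversion(monthly)
```

For integers both variants agree exactly, which is the a + b - a + c = b + c condition from the slide; with a two-row frame the saving is small, but it grows with the frame size.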

Page 90: Dublin 4x3-final-slideshare


Floating aggregates, standard deviation and variance aggregates

Conditional inversion

When: a + b - a + c ≠ b + c

● recompute: slow, but guaranteed same results as grouped aggregates

● invert: linear performance

mysql> show variables like '%high%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| windowing_use_high_precision | ON    |
+------------------------------+-------+

SET windowing_use_high_precision= off;

For variance, the differences are only in the last few significant digits, due to the incremental algorithm yielding slightly different results (usually insignificant).

For floats, this can matter when summing very large and very small numbers:

mysql> select 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307;
+-------------------------------------------------------+
| 1.7976931348623157E+307 + 1 - 1.7976931348623157E+307 |
+-------------------------------------------------------+
|                                                     0 |
+-------------------------------------------------------+
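The floating-point case is easy to reproduce in any language that uses IEEE 754 doubles. The Python sketch below models a frame that slides from [big, 1.0] to [1.0]; the variable names are illustrative only.

```python
big = 1.7976931348623157e307

# The frame first holds [big, 1.0]; the running total absorbs the 1.0
# because big is far too large for an increment of 1 to register.
running = big + 1.0

# The frame then slides so that only [1.0] remains.
inverted = running - big     # inversion: subtract big's contribution
recomputed = sum([1.0])      # recomputation: add up what is in the frame
```

Inversion yields 0.0 where recomputation yields 1.0, which is the a + b - a + c ≠ b + c situation in which the two strategies give different answers.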

Page 91: Dublin 4x3-final-slideshare


Performance hints

● Use named windows to eliminate several windowing steps whenever possible
  Possibility: analyze and collapse windows where semantics allow it. We might want to add this capability to the optimizer if there is large demand.

● Streaming window functions are faster than those that need buffering

● MAX/MIN do not support inversion, so they can be slow with large frames unless the expression is a prefix of the ORDER BY expressions

● JSON_OBJECTAGG is not invertible, so it can be slow with large frames

Page 92: Dublin 4x3-final-slideshare


Q&A

Thank you for using MySQL!

Blog: http://mysqlserverteam.com/mysql-8-0-2-introducing-window-functions/

Page 93: Dublin 4x3-final-slideshare


Safe Harbor Statement

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
