still using windows 3.1? - modern sql · select * from t1 cross join lateral (select * from t2...

Still using Windows 3.1? So why stick with

SQL-92?

@ModernSQL - https://modern-sql.com/ @MarkusWinand

SQL:1999

LATERAL

Select-list sub-queries must be scalar[0]:

LATERAL Before SQL:1999

SELECT…,(SELECTcolumn_1FROMt1WHEREt1.x=t2.y)AScFROMt2…

(an atomic quantity that can hold only one value at a time[1])

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar

✗,column_2

More thanone column? ⇒Syntax error

✗,column_2

More thanone column? ⇒Syntax error

}More thanone row?

⇒Runtime error!

SELECT*FROMt1CROSSJOINLATERAL(SELECT*FROMt2WHEREt2.x=t1.x)derived_tableON(true)

LATERAL Since SQL:1999Lateral derived queries can see table names defined before:

LATERAL Since SQL:1999

Valid due to

LATERAL

keyword

Lateral derived queries can see table names defined before:

Valid due to

LATERAL

keyword

Useless, but still required

Valid due to

LATERAL

keyword

Use CROSS JOIN to omit the ON clause

But WHY?

Use-CasesLATERAL

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

Use-CasesLATERAL

FROMtCROSSJOINLATERAL(SELECT…FROM…WHEREt.c=…ORDERBY…LIMIT10)derived_table

‣ Top-N per group

Use-CasesLATERAL

‣ Top-N per group

Use-CasesLATERAL

Add proper indexfor Top-N query

https://use-the-index-luke.com/sql/partial-results/top-n-queries

‣ Top-N per group

‣ Also useful to find most recent news from several subscribed topics (“multi-source top-N”).

Use-CasesLATERAL

‣ Top-N per group

‣ Also useful to find most recent news from several subscribed topics (“multi-source top-N”).

‣ Table function arguments

(TABLE often implies LATERAL)

Use-CasesLATERAL

FROMtJOINTABLE(your_func(t.c))

LATERAL Availability1999

5.1 MariaDB8.0.14 MySQL

9.3 PostgreSQLSQLite

9.1 DB2 LUW11gR1[0] 12cR1 Oracle

2005[1] SQL Server[0]Undocumented. Requires setting trace event 22829.[1]LATERAL is not supported as of SQL Server 2016 but [CROSS|OUTER]APPLY can be used for the same effect.

WITHRECURSIVE(Common Table Expressions)

WITHRECURSIVECREATETABLEt(idINTEGER,parentINTEGER)

WITHRECURSIVE Since SQL:1999

SELECTt.id,t.parentFROMtWHEREt.id=?

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtWHEREt.parent=?

WITHRECURSIVEprev(id,parent)AS(

SELECT*FROMprev

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtJOINprevONt.parent=prev.id

SELECT*FROMprev

WITHRECURSIVE Use Cases

“[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer performance superior to that of the dedicated graph databases.” event.cwi.nl/grades2013/07-welc.pdf

‣ Processing graphs

Shortest route from person A to B in LinkedIn/Facebook/…

https://wiki.postgresql.org/wiki/Loose_indexscan

† n … # distinct values, N … # of table rows. Suitable index required

‣ Finding distinct values

in n*log(N)† time complexity

WITHRECURSIVEgen(n)AS(SELECT1UNIONALLSELECTn+1FROMgenWHEREn<?)SELECT*FROMgen

https://wiki.postgresql.org/wiki/Loose_indexscan

† n … # distinct values, N … # of table rows. Suitable index required

‣ Finding distinct values

in n*log(N)† time complexity

‣ Row generators

To fill gaps (e.g., in time series), generate test data

AvailabilityWITHRECURSIVE1999

5.1 10.2 MariaDB8.0 MySQL

8.4 PostgreSQL3.8.3[0] SQLite

7.0 DB2 LUW11gR2 Oracle

2005 SQL Server[0]Only for top-level SELECT statements

GROUPINGSETS

Only one GROUPBY operation at a time:

GROUPINGSETS Before SQL:1999

SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

Monthly revenue

SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

Monthly revenue Yearly revenue

SELECTyear,sum(revenue)FROMtblGROUPBYyear

GROUPINGSETS Before SQL:1999SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

UNIONALL

GROUPINGSETS Since SQL:1999SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

UNIONALL

SELECTyear,month,sum(revenue)FROMtblGROUPBYGROUPINGSETS((year,month),(year))

GROUPINGSETS Availability1999

5.1[0] MariaDB5.0[1] MySQL

5 DB2 LUW9iR1 Oracle

2008 SQL Server[0]Only ROLLUP (properitery syntax).[1]Only ROLLUP (properitery syntax). GROUPING function since MySQL 8.0.

SQL:2003

OVERand

PARTITIONBY

OVER (PARTITION BY) The ProblemTwo distinct concepts could not be used independently:

‣Merge rows with the same key properties

‣ GROUPBY to specify key properties

‣ DISTINCT to use full row as key

OVER (PARTITION BY) The ProblemTwo distinct concepts could not be used independently:

‣Merge rows with the same key properties

‣ GROUPBY to specify key properties

‣ DISTINCT to use full row as key

‣ Aggregate data from related rows ‣ Requires GROUPBY to segregate the rows

‣ COUNT, SUM, AVG, MIN, MAX to aggregate grouped rows

OVER (PARTITION BY) The Problem

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

No SELECTc1

,c2FROMt

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

No SELECTc1

,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTc1,SUM(c2)totFROMtGROUPBYc1

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

SELECTc1,c2FROMtJOIN()taON(t.c1=ta.c1)

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

OVER (PARTITION BY) Since SQL:2003

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

SELECTc1,c2FROMtFROMt

OVER (PARTITION BY) Since SQL:2003

Yes ⇠

ws ⇢

SELECTc1,c2FROMt

,SUM(c2)OVER(PARTITIONBYc1)

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

Look here

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

)PARTITIONBYdep

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

)PARTITIONBYdep

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

)PARTITIONBYdep

OVERand

ORDERBY(Framing & Ranking)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,FROMtransactionst

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

(SELECTSUM(value)FROMtransactionst2WHEREt2.id<=t.id)

FROMtransactionst

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

Range segregation (<=)not possible with

GROUP BY orPARTITION BY

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYid

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDING

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10

22 2 +20

22 3 -10

333 4 +50

333 5 -30

333 6 -20

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

1 1 +10 +10

22 2 +20 +20

22 3 -10 +10

333 4 +50 +50

333 5 -30 +20

333 6 -20 .0

PARTITIONBYacnt

OVER (ORDER BY) Since SQL:2003With OVER(ORDERBYn) a new type of functions make sense:

n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DISTA 1 1 1 0 0.25B 2 2 2 0.33… 0.75B 3 2 2 0.33… 0.75X 4 4 3 1 1

‣ Aggregates without GROUPBY

Use CasesOVER (SQL:2003)

‣ Running totals, moving averages

Use Cases

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

‣ Ranking

Use Cases

OVER (SQL:2003)

‣ Ranking‣ Top-N per Group

Use Cases

SELECT*FROM(SELECTROW_NUMBER()OVER(PARTITIONBY…ORDERBY…)rn,t.*FROMt)numbered_tWHERErn<=3

OVER (SQL:2003)

‣ Avoiding self-joins

Use Cases

OVER (SQL:2003)

‣ Avoiding self-joins

[… many more …]

Use Cases

OVER (SQL:2003)

8.4 PostgreSQL3.25.0 SQLite

7.0 DB2 LUW8i Oracle

2005[0] 2012 SQL Server[0]No framing

OVER (SQL:2003) Availability

8.4 PostgreSQL3.25.0 SQLite

7.0 DB2 LUW8i Oracle

2005[0] 2012 SQL Server[0]No framing

OVER (SQL:2003) AvailabilityImpalaSpark

BigQuery Hive

Inverse Distribution Functions (percentiles)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Number rows

Pick middle one

Number rows

Pick middle one

Number rows

Pick middle one

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Median

Which value?

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

0 0.25 0.5 0.75 11

PERCENTILE_DISC

0 0.25 0.5 0.75 11

PERCENTILE_DISC0 0.25 0.5 0.75 1

PERCENTILE_DISC

0 0.25 0.5 0.75 11

PERCENTILE_DISC(0.5)

0 0.25 0.5 0.75 11

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

PERCENTILE_CONT

0 0.25 0.5 0.75 11

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

PERCENTILE_CONT

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

PERCENTILE_CONT(0.5)

Inverse Distribution Functions Availability1999

5.1 10.3[0] MariaDBMySQL

11.1 DB2 LUW9iR1 Oracle

2012[0] SQL Server[0]Only as window function (requires OVER clause)

TABLESAMPLE

TABLESAMPLE Since SQL:2003

SELECT*FROMtblTABLESAMPLEsystem(10)REPEATABLE(0)

User provided seed

Or: bernoulli(10)

better sampling,but slow.

User provided seed

Or: bernoulli(10)

better sampling,but slow.

User provided seed

TABLESAMPLE Availability1999

5.1 MariaDBMySQL

9.5[0] PostgreSQLSQLite

8.2[0] DB2 LUW8i[0] Oracle

2005[0] SQL Server[0]Not for derived tables

SQL:2008

Copying rows from another table is easy:

INSERTINTO<target>SELECT…FROM<source>WHERENOTEXISTS(SELECT*FROM<target>WHERE…)

MERGE The Problem

Copying rows from another table is easy:

INSERTINTO<target>SELECT…FROM<source>WHERENOTEXISTS(SELECT*FROM<target>WHERE…)

MERGE The Problem

Both, <target> and <source>

are in scope here.

Deleting rows that exist in another table is also possible:

DELETEFROM<target>WHEREEXISTS(SELECT*FROM<source>WHERE…)

MERGE The Problem

Deleting rows that exist in another table is also possible:

DELETEFROM<target>WHEREEXISTS(SELECT*FROM<source>WHERE…)

MERGE The Problem

are in scope here.

Updating rows from another table is awkward:

UPDATE<target>SET…WHERE…

MERGE The Problem

UPDATE<target>SET…WHERE…

MERGE The Problem

Requires a name(table or updatable view)

but not a subquery

UPDATE<target>SET…WHERE…SET(col1,col2)=(SELECTcol1,col2FROM<source>WHERE…)WHERE…

MERGE The Problem

are in scope here.

SQL:2008 introduced merge to improve this situation:

MERGEINTO<target>USING<source>ON<joincondition>WHENMATCHED[AND<cond>]THEN[UPDATE…|DELETE…]WHENNOTMATCHED[AND<cond>]THENINSERT…

‣It has always two tables in scope,the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go.

MERGE Since SQL:2008

5.1 MariaDBMySQLPostgreSQLSQLite

9.1 DB2 LUW9iR1[0] Oracle

2008 SQL Server[0]No AND condition.

FETCHFIRST

FETCHFIRST The ProblemLimit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn<=10

SQL:2003 introduced ROW_NUMBER() to number rows.But this still requires wrapping to limit the result.

And how about databases not supporting ROW_NUMBER()?

SQL:2003 introduced ROW_NUMBER() to number rows.But this still requires wrapping to limit the result.

And how about databases not supporting ROW_NUMBER()?

Dammit! Let's takeLIMIT

SELECT*FROMdataORDERBYxFETCHFIRST10ROWSONLY

FETCHFIRST Since SQL:2008SQL:2008 introduced the FETCHFIRST…ROWSONLY clause:

FETCHFIRST Availability1999

5.1 MariaDB3.19.3[0] MySQL6.5[1] 8.4 PostgreSQL

2.1.0[1] SQLite

7.0 DB2 LUW12cR1 Oracle

7.0[2] 2012 SQL Server[0]Earliest mention of LIMIT. Probably inherited from mSQL[1]Functionality available using LIMIT[2]SELECT TOP n ... SQL Server 2000 also supports expressions and bind parameters

SQL:2011

OFFSET

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn>10andrn<=20

OFFSET The ProblemHow to fetch the rows after a limit?

(pagination anybody?)

SELECT*FROMdataORDERBYxOFFSET10ROWSFETCHNEXT10ROWSONLY

OFFSET Since SQL:2011SQL:2011 introduced OFFSET, unfortunately!

OFFSET

OFFSETGrab coasters & stickers!

https://use-the-index-luke.com/no-offset

OFFSET Since SQL:20111999

5.1 MariaDB3.20.3[0] 4.0.6[1] MySQL6.5 PostgreSQL

2.1.0 SQLite

9.7[2] 11.1 DB2 LUW12cR1 Oracle

2012 SQL Server[0]LIMIT[offset,]limit: "With this it's easy to do a poor man's next page/previous page WWW application."[1]The release notes say "Added PostgreSQL compatible LIMIT syntax"[2]Requires enabling the MySQL compatibility vector: db2set DB2_COMPATIBILITY_VECTOR=MYS

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

balance50907030

……………

diff+50+40-20-40

balance50907030

……………

diff+50+40-20-40

balance50907030

……………

diff+50+40-20-40

balance50907030

……………

diff+50+40-20-40

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) Since SQL:2011SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that:

SELECT*,balance-COALESCE(LAG(balance)OVER(ORDERBYx),0)FROMt

Available functions:LEAD/LAGFIRST_VALUE/LAST_VALUENTH_VALUE(col,n)FROMFIRST/LASTRESPECT/IGNORENULLS

OVER (LEAD, LAG, …) Since SQL:20111999

5.1 10.2[0] MariaDB8.0[0] MySQL

8.4[0] PostgreSQL3.25.0[0] SQLite

9.5[1] 11.1 DB2 LUW8i[1] 11gR2 Oracle

2012[1] SQL Server[0]No IGNORENULLS and FROMLAST[1]No NTH_VALUE

System Versioning (Time Traveling)

INSERTUPDATEDELETE

are DESTRUCTIVE

System Versioning The Problem

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,end_tsTIMESTAMP(9)GENERATEDALWAYSASROWEND,

PERIODFORSYSTEM_TIME(start_ts,end_ts)

PERIODFORSYSTEM_TIME(start_ts,end_ts))WITHSYSTEMVERSIONING

ID Data start_ts end_ts1 X 10:00:00

UPDATE...SETDATA='Y'...

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00

DELETE...WHEREID=1

INSERT...(ID,DATA)VALUES(1,'X')

System Versioning Since SQL:2011

DELETE...WHEREID=1

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00 12:00:00

Although multiple versions exist, only the “current” one is visible per default.

After 12:00:00, SELECT*FROMt doesn’t return anything anymore.

With FOR…ASOF you can query anything you like: SELECT*FROMtFORSYSTEM_TIMEASOFTIMESTAMP'2019-05-0810:30:00'

ID Data start_ts end_ts

1 X 10:00:00 11:00:00

5.1 10.3 MariaDBMySQLPostgreSQLSQLite

10.1 DB2 LUW10gR1[0] 11gR1[1] Oracle

2016 SQL Server[0]Short term using Flashback.[1]Flashback Archive. Proprietery syntax.

SQL:2016 (released: 2016-12-15)

JSON_TABLE

JSON Since SQL:2016