still using windows 3.1? - modern sql · select * from t1 cross join lateral (select * from t2...

Post on 30-May-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Still using Windows 3.1? So why stick with

SQL-92?

@ModernSQL - https://modern-sql.com/ @MarkusWinand

SQL:1999

LATERAL

Select-list sub-queries must be scalar[0]:

LATERAL Before SQL:1999

SELECT…,(SELECTcolumn_1FROMt1WHEREt1.x=t2.y)AScFROMt2…

(an atomic quantity that can hold only one value at a time[1])

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar

Select-list sub-queries must be scalar[0]:

LATERAL Before SQL:1999

SELECT…,(SELECTcolumn_1FROMt1WHEREt1.x=t2.y)AScFROMt2…

(an atomic quantity that can hold only one value at a time[1])

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar

✗,column_2

More thanone column? ⇒Syntax error

Select-list sub-queries must be scalar[0]:

LATERAL Before SQL:1999

SELECT…,(SELECTcolumn_1FROMt1WHEREt1.x=t2.y)AScFROMt2…

(an atomic quantity that can hold only one value at a time[1])

[0] Neglecting row values and other workarounds here; [1] https://en.wikipedia.org/wiki/Scalar

✗,column_2

More thanone column? ⇒Syntax error

}More thanone row?

⇒Runtime error!

SELECT*FROMt1CROSSJOINLATERAL(SELECT*FROMt2WHEREt2.x=t1.x)derived_tableON(true)

LATERAL Since SQL:1999Lateral derived queries can see table names defined before:

SELECT*FROMt1CROSSJOINLATERAL(SELECT*FROMt2WHEREt2.x=t1.x)derived_tableON(true)

LATERAL Since SQL:1999

Valid due to

LATERAL

keyword

Lateral derived queries can see table names defined before:

SELECT*FROMt1CROSSJOINLATERAL(SELECT*FROMt2WHEREt2.x=t1.x)derived_tableON(true)

LATERAL Since SQL:1999

Valid due to

LATERAL

keyword

Useless, but still required

Lateral derived queries can see table names defined before:

SELECT*FROMt1CROSSJOINLATERAL(SELECT*FROMt2WHEREt2.x=t1.x)derived_tableON(true)

LATERAL Since SQL:1999

Valid due to

LATERAL

keyword

Lateral derived queries can see table names defined before:

Use CROSS JOIN to omit the ON clause

But WHY?

Use-CasesLATERAL

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

Use-CasesLATERAL

FROMtCROSSJOINLATERAL(SELECT…FROM…WHEREt.c=…ORDERBY…LIMIT10)derived_table

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

Use-CasesLATERAL

FROMtCROSSJOINLATERAL(SELECT…FROM…WHEREt.c=…ORDERBY…LIMIT10)derived_table

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

Use-CasesLATERAL

Add proper indexfor Top-N query

https://use-the-index-luke.com/sql/partial-results/top-n-queries

FROMtCROSSJOINLATERAL(SELECT…FROM…WHEREt.c=…ORDERBY…LIMIT10)derived_table

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

‣ Also useful to find most recent news from several subscribed topics (“multi-source top-N”).

Use-CasesLATERAL

FROMtCROSSJOINLATERAL(SELECT…FROM…WHEREt.c=…ORDERBY…LIMIT10)derived_table

‣ Top-N per group

inside a lateral derived tableFETCHFIRST (or LIMIT, TOP)applies per row from left tables.

‣ Also useful to find most recent news from several subscribed topics (“multi-source top-N”).

‣ Table function arguments

(TABLE often implies LATERAL)

Use-CasesLATERAL

FROMtJOINTABLE(your_func(t.c))

LATERAL Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 MariaDB8.0.14 MySQL

9.3 PostgreSQLSQLite

9.1 DB2 LUW11gR1[0] 12cR1 Oracle

2005[1] SQL Server[0]Undocumented. Requires setting trace event 22829.[1]LATERAL is not supported as of SQL Server 2016 but [CROSS|OUTER]APPLY can be used for the same effect.

WITHRECURSIVE(Common Table Expressions)

WITHRECURSIVECREATETABLEt(idINTEGER,parentINTEGER)

WITHRECURSIVECREATETABLEt(idINTEGER,parentINTEGER)

WITHRECURSIVECREATETABLEt(idINTEGER,parentINTEGER)

WITHRECURSIVE Since SQL:1999

SELECTt.id,t.parentFROMtWHEREt.id=?

WITHRECURSIVE Since SQL:1999

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtWHEREt.parent=?

WITHRECURSIVE Since SQL:1999

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtWHEREt.parent=?

WITHRECURSIVE Since SQL:1999

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtWHEREt.parent=?

WITHRECURSIVE Since SQL:1999

WITHRECURSIVEprev(id,parent)AS(

)

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtWHEREt.parent=?

SELECT*FROMprev

WITHRECURSIVE Since SQL:1999

WITHRECURSIVEprev(id,parent)AS(

)

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtJOINprevONt.parent=prev.id

SELECT*FROMprev

WITHRECURSIVE Since SQL:1999

WITHRECURSIVEprev(id,parent)AS(

)

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtJOINprevONt.parent=prev.id

SELECT*FROMprev

WITHRECURSIVE Since SQL:1999

WITHRECURSIVEprev(id,parent)AS(

)

SELECTt.id,t.parentFROMtWHEREt.id=?UNIONALLSELECTt.id,t.parentFROMtJOINprevONt.parent=prev.id

SELECT*FROMprev

WITHRECURSIVE Since SQL:1999

WITHRECURSIVE Use Cases

“[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer performance superior to that of the dedicated graph databases.” event.cwi.nl/grades2013/07-welc.pdf

‣ Processing graphs

Shortest route from person A to B in LinkedIn/Facebook/…

WITHRECURSIVE Use Cases

“[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer performance superior to that of the dedicated graph databases.” event.cwi.nl/grades2013/07-welc.pdf

https://wiki.postgresql.org/wiki/Loose_indexscan

† n … # distinct values, N … # of table rows. Suitable index required

‣ Processing graphs

Shortest route from person A to B in LinkedIn/Facebook/…

‣ Finding distinct values

in n*log(N)† time complexity

WITHRECURSIVE Use Cases

WITHRECURSIVEgen(n)AS(SELECT1UNIONALLSELECTn+1FROMgenWHEREn<?)SELECT*FROMgen

“[…] for certain classes of graphs, solutions utilizing relational database technology […] can offer performance superior to that of the dedicated graph databases.” event.cwi.nl/grades2013/07-welc.pdf

https://wiki.postgresql.org/wiki/Loose_indexscan

† n … # distinct values, N … # of table rows. Suitable index required

‣ Processing graphs

Shortest route from person A to B in LinkedIn/Facebook/…

‣ Finding distinct values

in n*log(N)† time complexity

‣ Row generators

To fill gaps (e.g., in time series), generate test data

WITHRECURSIVE Use Cases

AvailabilityWITHRECURSIVE1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.2 MariaDB8.0 MySQL

8.4 PostgreSQL3.8.3[0] SQLite

7.0 DB2 LUW11gR2 Oracle

2005 SQL Server[0]Only for top-level SELECT statements

GROUPINGSETS

Only one GROUPBY operation at a time:

GROUPINGSETS Before SQL:1999

Only one GROUPBY operation at a time:

GROUPINGSETS Before SQL:1999

SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

Monthly revenue

Only one GROUPBY operation at a time:

GROUPINGSETS Before SQL:1999

SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

Monthly revenue Yearly revenue

SELECTyear,sum(revenue)FROMtblGROUPBYyear

GROUPINGSETS Before SQL:1999SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

SELECTyear,sum(revenue)FROMtblGROUPBYyear

GROUPINGSETS Before SQL:1999SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

SELECTyear,sum(revenue)FROMtblGROUPBYyear

UNIONALL

,null

GROUPINGSETS Since SQL:1999SELECTyear,month,sum(revenue)FROMtblGROUPBYyear,month

SELECTyear,sum(revenue)FROMtblGROUPBYyear

UNIONALL

,null

SELECTyear,month,sum(revenue)FROMtblGROUPBYGROUPINGSETS((year,month),(year))

GROUPINGSETS Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1[0] MariaDB5.0[1] MySQL

9.5 PostgreSQLSQLite

5 DB2 LUW9iR1 Oracle

2008 SQL Server[0]Only ROLLUP (properitery syntax).[1]Only ROLLUP (properitery syntax). GROUPING function since MySQL 8.0.

SQL:2003

OVERand

PARTITIONBY

OVER (PARTITION BY) The ProblemTwo distinct concepts could not be used independently:

OVER (PARTITION BY) The ProblemTwo distinct concepts could not be used independently:

‣Merge rows with the same key properties

‣ GROUPBY to specify key properties

‣ DISTINCT to use full row as key

OVER (PARTITION BY) The ProblemTwo distinct concepts could not be used independently:

‣Merge rows with the same key properties

‣ GROUPBY to specify key properties

‣ DISTINCT to use full row as key

‣ Aggregate data from related rows ‣ Requires GROUPBY to segregate the rows

‣ COUNT, SUM, AVG, MIN, MAX to aggregate grouped rows

OVER (PARTITION BY) The Problem

OVER (PARTITION BY) The Problem

SELECTc1,c2FROMt

SELECTc1,c2FROMt

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No SELECTc1

,c2FROMt

SELECTc1,c2FROMt

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No SELECTc1

,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

SELECTc1,SUM(c2)totFROMtGROUPBYc1

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

SELECTc1,SUM(c2)totFROMtGROUPBYc1

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMtJOIN()taON(t.c1=ta.c1)

SELECTc1,SUM(c2)totFROMtGROUPBYc1

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMtJOIN()taON(t.c1=ta.c1)

SELECTc1,SUM(c2)totFROMtGROUPBYc1

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMtJOIN()taON(t.c1=ta.c1)

SELECTc1,SUM(c2)totFROMtGROUPBYc1

,tot

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) The Problem

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMtJOIN()taON(t.c1=ta.c1)

SELECTc1,SUM(c2)totFROMtGROUPBYc1

,tot

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) Since SQL:2003

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMtFROMt

SELECTc1,SUM(c2)totFROMtGROUPBYc1

OVER (PARTITION BY) Since SQL:2003

Yes ⇠

Mer

ge ro

ws ⇢

No

No ⇠ Aggregate ⇢ Yes

SELECTc1,c2FROMt

SELECTDISTINCTc1,c2FROMt

SELECTc1,c2FROMt

FROMt

,SUM(c2)OVER(PARTITIONBYc1)

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

Look here

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

OVER (PARTITION BY) How it works

dep salary1 1000 600022 1000 600022 1000 6000333 1000 6000333 1000 6000333 1000 6000

SELECTdep,salary,SUM(salary)OVER()FROMemp

OVER (PARTITION BY) How it works

)

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

OVER (PARTITION BY) How it works

)

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

OVER (PARTITION BY) How it works

)PARTITIONBYdep

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

OVER (PARTITION BY) How it works

)PARTITIONBYdep

SELECTdep,salary,SUM(salary)OVER()FROMemp

dep salary1 1000 100022 1000 200022 1000 2000333 1000 3000333 1000 3000333 1000 3000

OVER (PARTITION BY) How it works

)PARTITIONBYdep

OVERand

ORDERBY(Framing & Ranking)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,

(SELECTSUM(value)FROMtransactionst2WHEREt2.id<=t.id)

FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) The Problem

SELECTid,value,

(SELECTSUM(value)FROMtransactionst2WHEREt2.id<=t.id)

FROMtransactionst

Range segregation (<=)not possible with

GROUP BY orPARTITION BY

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

(SELECTSUM(value)FROMtransactionst2WHEREt2.id<=t.id)

FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYid

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDING

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +30

22 3 -10 +20

333 4 +50 +70

333 5 -30 +40

333 6 -20 +20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10

22 2 +20

22 3 -10

333 4 +50

333 5 -30

333 6 -20

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

OVER (ORDER BY) Since SQL:2003

SELECTid,value,

FROMtransactionst

SUM(value)OVER(

)

acnt id value balance

1 1 +10 +10

22 2 +20 +20

22 3 -10 +10

333 4 +50 +50

333 5 -30 +20

333 6 -20 .0

ORDERBYidROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW

PARTITIONBYacnt

OVER (ORDER BY) Since SQL:2003With OVER(ORDERBYn) a new type of functions make sense:

n ROW_NUMBER RANK DENSE_RANK PERCENT_RANK CUME_DISTA 1 1 1 0 0.25B 2 2 2 0.33… 0.75B 3 2 2 0.33… 0.75X 4 4 3 1 1

‣ Aggregates without GROUPBY

Use CasesOVER (SQL:2003)

‣ Aggregates without GROUPBY

‣ Running totals, moving averages

Use Cases

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

‣ Aggregates without GROUPBY

‣ Running totals, moving averages

‣ Ranking

Use Cases

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

‣ Aggregates without GROUPBY

‣ Running totals, moving averages

‣ Ranking‣ Top-N per Group

Use Cases

SELECT*FROM(SELECTROW_NUMBER()OVER(PARTITIONBY…ORDERBY…)rn,t.*FROMt)numbered_tWHERErn<=3

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

‣ Aggregates without GROUPBY

‣ Running totals, moving averages

‣ Ranking‣ Top-N per Group

‣ Avoiding self-joins

Use Cases

SELECT*FROM(SELECTROW_NUMBER()OVER(PARTITIONBY…ORDERBY…)rn,t.*FROMt)numbered_tWHERErn<=3

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

‣ Aggregates without GROUPBY

‣ Running totals, moving averages

‣ Ranking‣ Top-N per Group

‣ Avoiding self-joins

[… many more …]

Use Cases

SELECT*FROM(SELECTROW_NUMBER()OVER(PARTITIONBY…ORDERBY…)rn,t.*FROMt)numbered_tWHERErn<=3

AVG(…)OVER(ORDERBY…ROWSBETWEEN3PRECEDINGAND3FOLLOWING)moving_avg

OVER (SQL:2003)

1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.2 MariaDB8.0 MySQL

8.4 PostgreSQL3.25.0 SQLite

7.0 DB2 LUW8i Oracle

2005[0] 2012 SQL Server[0]No framing

OVER (SQL:2003) Availability

1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.2 MariaDB8.0 MySQL

8.4 PostgreSQL3.25.0 SQLite

7.0 DB2 LUW8i Oracle

2005[0] 2012 SQL Server[0]No framing

OVER (SQL:2003) AvailabilityImpalaSpark

NuoDB

BigQuery Hive

Inverse Distribution Functions (percentiles)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

Number rows

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

Number rows

Pick middle one

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

Number rows

Pick middle one

SELECTd1.valFROMdatad1JOINdatad2ON(d1.val<d2.valOR(d1.val=d2.valANDd1.id<d2.id))GROUPBYd1.valHAVINGcount(*)=(SELECTFLOOR(COUNT(*)/2)FROMdatad3)

Inverse Distribution Functions The ProblemGrouped rows cannot be ordered prior to aggregation.

(how to get the middle value (median) of a set)

Number rows

Pick middle one

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Median

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Median

Which value?

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

0 0.25 0.5 0.75 11

2

3

4

PERCENTILE_DISC

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

0 0.25 0.5 0.75 11

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

0 0.25 0.5 0.75 11

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC(0.5)

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

0 0.25 0.5 0.75 11

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_CONT

PERCENTILE_DISC(0.5)

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

1

2

3

4

0 0.25 0.5 0.75 11

2

3

4

0 0.25 0.5 0.75 11

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_CONT

PERCENTILE_DISC(0.5)0 0.25 0.5 0.75 1

1

2

3

4

PERCENTILE_CONT(0.5)

PERCENTILE_DISC(0.5)

SELECTPERCENTILE_DISC(0.5)WITHINGROUP(ORDERBYval)FROMdata

Since SQL:2003Inverse Distribution Functions

Two variants: ‣ for discrete values (categories) ‣ for continuous values (linear interpolation)

Inverse Distribution Functions Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.3[0] MariaDBMySQL

9.4 PostgreSQLSQLite

11.1 DB2 LUW9iR1 Oracle

2012[0] SQL Server[0]Only as window function (requires OVER clause)

TABLESAMPLE

TABLESAMPLE Since SQL:2003

SELECT*FROMtblTABLESAMPLEsystem(10)REPEATABLE(0)

User provided seed

value

TABLESAMPLE Since SQL:2003

SELECT*FROMtblTABLESAMPLEsystem(10)REPEATABLE(0)

Or: bernoulli(10)

better sampling,but slow.

User provided seed

value

TABLESAMPLE Since SQL:2003

SELECT*FROMtblTABLESAMPLEsystem(10)REPEATABLE(0)

Or: bernoulli(10)

better sampling,but slow.

User provided seed

value

TABLESAMPLE Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 MariaDBMySQL

9.5[0] PostgreSQLSQLite

8.2[0] DB2 LUW8i[0] Oracle

2005[0] SQL Server[0]Not for derived tables

SQL:2008

MERGE

Copying rows from another table is easy:

INSERTINTO<target>SELECT…FROM<source>WHERENOTEXISTS(SELECT*FROM<target>WHERE…)

MERGE The Problem

Copying rows from another table is easy:

INSERTINTO<target>SELECT…FROM<source>WHERENOTEXISTS(SELECT*FROM<target>WHERE…)

MERGE The Problem

Both, <target> and <source>

are in scope here.

Deleting rows that exist in another table is also possible:

DELETEFROM<target>WHEREEXISTS(SELECT*FROM<source>WHERE…)

MERGE The Problem

Deleting rows that exist in another table is also possible:

DELETEFROM<target>WHEREEXISTS(SELECT*FROM<source>WHERE…)

MERGE The Problem

Both, <target> and <source>

are in scope here.

Updating rows from another table is awkward:

UPDATE<target>SET…WHERE…

MERGE The Problem

Updating rows from another table is awkward:

UPDATE<target>SET…WHERE…

MERGE The Problem

Requires a name(table or updatable view)

but not a subquery

Updating rows from another table is awkward:

UPDATE<target>SET…WHERE…SET(col1,col2)=(SELECTcol1,col2FROM<source>WHERE…)WHERE…

MERGE The Problem

Both, <target> and <source>

are in scope here.

SQL:2008 introduced merge to improve this situation:

MERGEINTO<target>USING<source>ON<joincondition>WHENMATCHED[AND<cond>]THEN[UPDATE…|DELETE…]WHENNOTMATCHED[AND<cond>]THENINSERT…

‣It has always two tables in scope,the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go.

MERGE Since SQL:2008

SQL:2008 introduced merge to improve this situation:

MERGEINTO<target>USING<source>ON<joincondition>WHENMATCHED[AND<cond>]THEN[UPDATE…|DELETE…]WHENNOTMATCHED[AND<cond>]THENINSERT…

‣It has always two tables in scope,the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go.

MERGE Since SQL:2008

SQL:2008 introduced merge to improve this situation:

MERGEINTO<target>USING<source>ON<joincondition>WHENMATCHED[AND<cond>]THEN[UPDATE…|DELETE…]WHENNOTMATCHED[AND<cond>]THENINSERT…

‣It has always two tables in scope,the source can be a derived table (subquery). ‣It can do insert, update, or delete in one go.

MERGE Since SQL:2008

MERGE Since SQL:20081999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 MariaDBMySQLPostgreSQLSQLite

9.1 DB2 LUW9iR1[0] Oracle

2008 SQL Server[0]No AND condition.

FETCHFIRST

FETCHFIRST The ProblemLimit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn<=10

FETCHFIRST The ProblemLimit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn<=10

FETCHFIRST The ProblemLimit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SQL:2003 introduced ROW_NUMBER() to number rows.But this still requires wrapping to limit the result.

And how about databases not supporting ROW_NUMBER()?

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn<=10

FETCHFIRST The ProblemLimit the result to a number of rows. (LIMIT, TOP and ROWNUM are all proprietary)

SQL:2003 introduced ROW_NUMBER() to number rows.But this still requires wrapping to limit the result.

And how about databases not supporting ROW_NUMBER()?

Dammit! Let's takeLIMIT

SELECT*FROMdataORDERBYxFETCHFIRST10ROWSONLY

FETCHFIRST Since SQL:2008SQL:2008 introduced the FETCHFIRST…ROWSONLY clause:

FETCHFIRST Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 MariaDB3.19.3[0] MySQL6.5[1] 8.4 PostgreSQL

2.1.0[1] SQLite

7.0 DB2 LUW12cR1 Oracle

7.0[2] 2012 SQL Server[0]Earliest mention of LIMIT. Probably inherited from mSQL[1]Functionality available using LIMIT[2]SELECT TOP n ... SQL Server 2000 also supports expressions and bind parameters

SQL:2011

OFFSET

SELECT*FROM(SELECT*,ROW_NUMBER()OVER(ORDERBYx)rnFROMdata)numbered_dataWHERErn>10andrn<=20

OFFSET The ProblemHow to fetch the rows after a limit?

(pagination anybody?)

SELECT*FROMdataORDERBYxOFFSET10ROWSFETCHNEXT10ROWSONLY

OFFSET Since SQL:2011SQL:2011 introduced OFFSET, unfortunately!

SELECT*FROMdataORDERBYxOFFSET10ROWSFETCHNEXT10ROWSONLY

OFFSET Since SQL:2011SQL:2011 introduced OFFSET, unfortunately!

OFFSET

SELECT*FROMdataORDERBYxOFFSET10ROWSFETCHNEXT10ROWSONLY

OFFSET Since SQL:2011SQL:2011 introduced OFFSET, unfortunately!

OFFSETGrab coasters & stickers!

https://use-the-index-luke.com/no-offset

OFFSET Since SQL:20111999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 MariaDB3.20.3[0] 4.0.6[1] MySQL6.5 PostgreSQL

2.1.0 SQLite

9.7[2] 11.1 DB2 LUW12cR1 Oracle

2012 SQL Server[0]LIMIT[offset,]limit: "With this it's easy to do a poor man's next page/previous page WWW application."[1]The release notes say "Added PostgreSQL compatible LIMIT syntax"[2]Requires enabling the MySQL compatibility vector: db2set DB2_COMPATIBILITY_VECTOR=MYS

OVER

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) The ProblemDirect access of other rows of the same window is ugly.

(E.g., calculate the difference to the previous rows)

SELECTbalance,balance-COALESCE(MAX(balance)OVER(ORDERBY…ROWSBETWEEN1PRECEDINGAND1PRECEDING),0)diffFROMt

balance50907030

……………

diff+50+40-20-40

OVER (SQL:2011) Since SQL:2011SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that:

SELECT*,balance-COALESCE(LAG(balance)OVER(ORDERBYx),0)FROMt

OVER (SQL:2011) Since SQL:2011SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that:

SELECT*,balance-COALESCE(LAG(balance)OVER(ORDERBYx),0)FROMt

Available functions:LEAD/LAGFIRST_VALUE/LAST_VALUENTH_VALUE(col,n)FROMFIRST/LASTRESPECT/IGNORENULLS

OVER (SQL:2011) Since SQL:2011SQL:2011 introduced LEAD, LAG, NTH_VALUE, … for that:

OVER (LEAD, LAG, …) Since SQL:20111999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.2[0] MariaDB8.0[0] MySQL

8.4[0] PostgreSQL3.25.0[0] SQLite

9.5[1] 11.1 DB2 LUW8i[1] 11gR2 Oracle

2012[1] SQL Server[0]No IGNORENULLS and FROMLAST[1]No NTH_VALUE

System Versioning (Time Traveling)

INSERTUPDATEDELETE

are DESTRUCTIVE

System Versioning The Problem

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,end_tsTIMESTAMP(9)GENERATEDALWAYSASROWEND,

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,end_tsTIMESTAMP(9)GENERATEDALWAYSASROWEND,

PERIODFORSYSTEM_TIME(start_ts,end_ts)

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

CREATETABLEt(...,start_tsTIMESTAMP(9)GENERATEDALWAYSASROWSTART,end_tsTIMESTAMP(9)GENERATEDALWAYSASROWEND,

PERIODFORSYSTEM_TIME(start_ts,end_ts))WITHSYSTEMVERSIONING

System Versioning Since SQL:2011Table can be system versioned, application versioned or both.

ID Data start_ts end_ts1 X 10:00:00

UPDATE...SETDATA='Y'...

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00

DELETE...WHEREID=1

INSERT...(ID,DATA)VALUES(1,'X')

System Versioning Since SQL:2011

ID Data start_ts end_ts1 X 10:00:00

UPDATE...SETDATA='Y'...

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00

DELETE...WHEREID=1

INSERT...(ID,DATA)VALUES(1,'X')

System Versioning Since SQL:2011

ID Data start_ts end_ts1 X 10:00:00

UPDATE...SETDATA='Y'...

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00

DELETE...WHEREID=1

INSERT...(ID,DATA)VALUES(1,'X')

System Versioning Since SQL:2011

ID Data start_ts end_ts1 X 10:00:00

UPDATE...SETDATA='Y'...

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00

DELETE...WHEREID=1

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00 12:00:00

INSERT...(ID,DATA)VALUES(1,'X')

System Versioning Since SQL:2011

Although multiple versions exist, only the “current” one is visible per default.

After 12:00:00, SELECT*FROMt doesn’t return anything anymore.

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00 12:00:00

System Versioning Since SQL:2011

ID Data start_ts end_ts1 X 10:00:00 11:00:001 Y 11:00:00 12:00:00

With FOR…ASOF you can query anything you like: SELECT*FROMtFORSYSTEM_TIMEASOFTIMESTAMP'2019-05-0810:30:00'

ID Data start_ts end_ts

1 X 10:00:00 11:00:00

System Versioning Since SQL:2011

System Versioning Since SQL:20111999

2001

2003

2005

2007

2009

2011

2013

2015

2017

5.1 10.3 MariaDBMySQLPostgreSQLSQLite

10.1 DB2 LUW10gR1[0] 11gR1[1] Oracle

2016 SQL Server[0]Short term using Flashback.[1]Flashback Archive. Proprietery syntax.

SQL:2016 (released: 2016-12-15)

JSON_TABLE

JSON Since SQL:2016

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

JSON Since SQL:2016

id a1

42 foo

43 bar

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

JSON Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

id a142 foo43 bar

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

id a142 foo43 bar

Bind Parameter

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

id a142 foo43 bar

SQL/JSON Path ‣ Query language to

select elements from a JSON document ‣Defined in the

SQL standardBind

Parameter

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

id a142 foo43 bar

SQL/JSON Path ‣ Query language to

select elements from a JSON document ‣Defined in the

SQL standardBind

Parameter

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

[{"id":42,"a1":"foo"},{"id":43,"a1":"bar"}]

id a142 foo43 bar

SQL/JSON Path ‣ Query language to

select elements from a JSON document ‣Defined in the

SQL standardBind

Parameter

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

JSON_TABLE Since SQL:2016

SELECT*FROMJSON_TABLE(?,'$[*]'COLUMNS(idINTPATH'$.id',a1VARCHAR(…)PATH'$.a1'))r

INSERTINTOtarget_table

JSON_TABLE Since SQL:2016

JSON_TABLE Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

MariaDB8.0 MySQL

PostgreSQLSQLite

DB2 LUW12cR1 Oracle

SQL Server

MATCH_RECOGNIZE (Row Pattern Matching)

Example: Logfile

Row Pattern Matching The Problem

Time

30 minutes

Example: Logfile

Row Pattern Matching The Problem

Time

30 minutes

Example: Logfile

Row Pattern Matching The Problem

Time

30 minutes

Example: Logfile

Row Pattern Matching The Problem

Example: Logfile

Time

30 minutes

Session 1 Session 2

Session 3

Session 4

Row Pattern Matching The Problem

Time

30 minutes

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes

Start-of-group tags

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes

Start-of-group tags

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes

number sessions

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes

number sessions

2222 2 33 3 44 42 3 41

Row Pattern Matching The Problem

SELECTcount(*)sessions,avg(duration)avg_durationFROM(SELECTMAX(ts)-MIN(ts)durationFROM(SELECTts,SUM(grp_start)OVER(ORDERBYts)session_noFROM(SELECTts,CASEWHENts>=LAG(ts,1,DATE'1900-01-01')OVER(ORDERBYts)+INTERVAL'30'minuteTHEN1ELSE0ENDgrp_startFROMlog)tagged)numberedGROUPBYsession_no)grouped

Time

30 minutes 2222 2 33 3 44 42 3 41

Row Pattern Matching The Problem

Regular ExpressionsRow Pattern Matching

.\S*

Regular ExpressionsRow Pattern Matching

.\S*Regular Expression

any character non-white space

Regular ExpressionsRow Pattern Matching

.\S*{

Rail track diagram by regexper.com

any character non-white space

Regular ExpressionsRow Pattern Matching

.\S*{ {Rail track diagram by regexper.com

any character non-white spaceany character non-white space

Regular ExpressionsRow Pattern Matching

.\S*{ { {Rail track diagram by regexper.com

MATCH_RECOGNIZE applies regular expressions on rows.

Row Pattern Matching

MATCH_RECOGNIZE applies regular expressions on rows.

Row Pattern Matching

‣Define the pattern variables (“characters”):

DEFINEdeletedASstatus=99

You can also look at other rows for that:

DEFINEcontinuationASts<PREV(ts)+INTERVAL'30'MINUTE

MATCH_RECOGNIZE applies regular expressions on rows.

Row Pattern Matching

‣Define the pattern variables (“characters”):

DEFINEdeletedASstatus=99

You can also look at other rows for that:

DEFINEcontinuationASts<PREV(ts)+INTERVAL'30'MINUTE

MATCH_RECOGNIZE applies regular expressions on rows.

Row Pattern Matching

‣Define the pattern variables (“characters”):

DEFINEdeletedASstatus=99

You can also look at other rows for that:

DEFINEcontinuationASts<PREV(ts)+INTERVAL'30'MINUTE

‣Use this alphabet in an regular expression:

PATTERN(deletedcontinuation*)

Since SQL:2016Row Pattern Matching

Time

30 minutes

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

Oracle doesn’t support avg on intervals — query doesn’t work as shown

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

definecontinued

Oracle doesn’t support avg on intervals — query doesn’t work as shown

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

Oracle doesn’t support avg on intervals — query doesn’t work as shown

undefinedpattern variable: matches any row

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

any numberof “cont”

rows

Oracle doesn’t support avg on intervals — query doesn’t work as shown

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

Very muchlike GROUP BY

Oracle doesn’t support avg on intervals — query doesn’t work as shown

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

Very muchlike SELECT

Oracle doesn’t support avg on intervals — query doesn’t work as shown

SELECTCOUNT(*)sessions,AVG(duration)avg_durationFROMlogMATCH_RECOGNIZE(ORDERBYtsMEASURESLAST(ts)-FIRST(ts)ASdurationONEROWPERMATCHPATTERN(anycont*)DEFINEcontASts<PREV(ts)+INTERVAL'30'minute)t

Since SQL:2016Row Pattern Matching

Time

30 minutes

Oracle doesn’t support avg on intervals — query doesn’t work as shown

Row Pattern Matching Availability1999

2001

2003

2005

2007

2009

2011

2013

2015

2017

MariaDBMySQLPostgreSQLSQLite

DB2 LUW12cR1 Oracle

SQL Server

https://www.flickr.com/photos/mfoubister/25367243054/

https://www.flickr.com/photos/mfoubister/25367243054/

A lot has happened

since SQL-92

https://www.flickr.com/photos/mfoubister/25367243054/

SQL has evolved

beyond the relational idea

A lot has happened

since SQL-92

https://www.flickr.com/photos/mfoubister/25367243054/

SQL has evolved

beyond the relational idea

If you use SQL for

CRUD operations only, you are doing it wrong

A lot has happened

since SQL-92

https://www.flickr.com/photos/mfoubister/25367243054/

SQL has evolved

beyond the relational idea

If you use SQL for

CRUD operations only, you are doing it wrong

A lot has happened

since SQL-92

https://modern-sql.com@ModernSQL by @MarkusWinand

top related