May 22, 2008 • 08:00 a.m. – 09:00 a.m.
Platform: DB2 for z/OS
Suresh Sane
DST Systems Inc.
Recursive SQL – Unleash the Power!
Session: G13
Recursive SQL is one of the most fascinating and powerful (and dangerous!) features offered in DB2 for z/OS Version 8. In this session, we will introduce the feature and show numerous examples of how it can be used to achieve things you would not have imagined being possible with SQL – all in one SQL statement! Fasten your seat belts and come join us in this exciting journey!
Session Outline
1. Recursion basics
• Theory and introduction
2. Case studies - mathematical
• String of numbers
• Factorial
• Primes
• Fibonacci Series
3. Case studies - business
• Org chart
• Generating test data
• Missing data
• Rollup
• Allocation
• Weighted allocation
• RI children
• Cheapest fare
• Account linking
4. Performance aspects
• Comparison to procedural logic (Org chart, Rollup, Allocate)
5. Pitfalls and recommendations
• Best practices
About the Instructor
Suresh Sane
♦ Co-author of IBM Redbooks SG24-6418 (May 2002), SG24-7083 (March 2004), SG24-7111 (July 2006)
♦ Educational seminars and presentations at IDUG North America, Asia Pacific, Canada and Europe
♦ IDUG Solutions Journal article – Winter 2000
♦ Numerous DB2 courses at various locations
♦ IBM Certified Solutions Expert for both platforms for Application Development and Database Administration
Suresh Sane works as a Database Architect at DST Systems in Kansas City, MO and manages the Database Consulting Group, which provides strategic direction for deployment of database technology at DST and has overall responsibility for the DB2 curriculum for about 1,500 technical associates.
Contact Information:
Suresh Sane
DST Systems, Inc.
1055 Broadway
Kansas City, MO 64105
USA
(816) 435-3803
About DST Systems
http://www.dstsystems.com
♦ Leading provider of computer software solutions and services, NYSE listed – “DST”
♦ Revenue $2.24 billion
♦ 110 million+ shareowner accounts
♦ 24,000 MIPS
♦ 150 TB DASD
♦ 145,000 workstations
♦ 462,000 DB2 objects
♦ Non-mainframe: 600 servers (DB2, Oracle, Sybase) with 2.3 million objects
If you have ever invested in a mutual fund, have had a prescription filled, or are a cable or satellite television subscriber, you may have already had dealings with our company.
DST Systems, Inc. is a publicly traded company (NYSE: DST) with headquarters in Kansas City, MO. Founded in 1969, it employs about 12,000 associates domestically and internationally.
The three operating segments - Financial Services, Output Solutions and Customer Management - are further enhanced by DST’s advanced technology and e-commerce solutions.
It’s cool but I will never use it…
“…400,000 customers in a hierarchy of about 4,000 nodes resulting in over 2.5 million queries, the application took days to calculate total sales.”
“…a single recursive query processed … in about 5 minutes”
Dan Luksetich, IDUG Solutions Journal, May 2004.
♦ Of pure academic interest … “cool stuff” but
♦ … does it have any business value?
♦ THINK AGAIN!
See ref #7 for details on this article. I have to admit my first reaction was that this feature was “cool” but of little business value. Dan’s article provoked my interest in this fascinating area of SQL.
We will begin with a simple explanation which introduces the concept of recursion in general (not just for SQL).
Recursion Simplified
What is 4! ? It is 4 * 3!
But what is 3! ? It is 3 * 2!
But what is 2! ? It is 2 * 1!
But what is 1! ? It is 1
So 2! = 2*1 = 2
And 3! = 3*2 = 6
And 4! = 4*6 = 24
A simple example to illustrate the concept.
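The same unwinding can be written as a few lines of procedural code. This is a minimal Python sketch for illustration only (the function name is an assumption, not something from the session):

```python
def factorial(n):
    """Return n! by reducing the problem to (n - 1)! until n reaches 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 1                      # base case: "But what is 1! ? It is 1"
    return n * factorial(n - 1)       # recursive step: "4! is 4 * 3!"
```

Each call waits for the smaller problem to be answered, then multiplies on the way back out, exactly as in the slide.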
Case 1 – String of Numbers
WITH NUMBERS (LEVEL, NEXTONE) AS                  -- CTE
(
  SELECT 1, 1 FROM SYSIBM.SYSDUMMY1               -- Prime
  UNION ALL
  SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS        -- Pump more (the reference to NUMBERS is what makes this recursive)
  WHERE LEVEL < 100
)
SELECT NEXTONE FROM NUMBERS ORDER BY NEXTONE      -- Use the result

Result: 1 2 3 4 5 …

Coloring scheme used throughout this presentation to denote each of the 4 parts: CTE, Prime, Pump more, Use the result.
The best way to introduce recursive SQL? Look at a simple example. Let’s dive right in.
The definition – WITH NUMBERS – is an example of a Common Table Expression (CTE), which is similar to the Nested Table Expression (NTE) most of us are already familiar with. It is required for using recursive SQL.
Also note the coloring scheme used consistently throughout this presentation to denote the 4 parts of the SQL.
Recursion Basics
♦ A technique where a function calls itself to perform some part of the task
♦ Requires:
• An initialization select (“priming the pump”)
• An iterative select (“pump more”) – from the previous, how do I get the next one?
• A main select (“use the result”)
♦ Must use a common table expression (CTE)
♦ Allows for looping “procedural” logic – all in one statement!
• DO WHILE, REPEAT etc. from SQL Procedures can sometimes be replaced by one SQL statement
Some of the basics. The “pump” terminology is borrowed from Tink Tysor (see ref #1). Tink provides a very thorough introduction to this fascinating topic.
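One way to picture the three parts is as a loop that keeps feeding the newest rows back into the iterative select until nothing new comes out. The Python sketch below is a conceptual model only; it is not a description of how DB2 evaluates the statement internally, and the names `run_recursive_cte`, `prime` and `pump` are made up for this illustration. It replays Case 1 (the string of numbers).

```python
def run_recursive_cte(prime, pump, max_iterations=1000):
    """Conceptual model: seed the result, then feed the newest rows back
    into the iterative step until it produces no more rows."""
    result = list(prime())                 # initialization select ("prime the pump")
    newest = result
    for _ in range(max_iterations):        # safety net against runaway recursion
        newest = [row for prev in newest for row in pump(prev)]
        if not newest:
            break
        result.extend(newest)              # rows accumulate as with UNION ALL
    return result


def prime():
    return [(1, 1)]                        # SELECT 1, 1 FROM SYSIBM.SYSDUMMY1


def pump(row):
    level, _ = row
    if level < 100:                        # WHERE LEVEL < 100
        yield (level + 1, level + 1)       # SELECT LEVEL + 1, LEVEL + 1


# Main select ("use the result"): SELECT NEXTONE FROM NUMBERS ORDER BY NEXTONE
numbers = sorted(nextone for _, nextone in run_recursive_cte(prime, pump))
```

The `max_iterations` guard plays the same role as the LEVEL predicate: it is there so that a mistake in `pump` cannot loop forever.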
Rules for CTE
♦ First full select of first UNION must not reference the CTE
♦ All selects within CTE cannot use DISTINCT
• This is a major limitation when cycles are present – DISTINCT on the outer query is expensive
• Need an ability to specify DISTINCT without the level
♦ All selects within CTE cannot use GROUP BY or HAVING
♦ Include only 1 reference to the CTE
♦ Initialization select and iterative select columns must match (data types, lengths, CCSIDs)
♦ UNION must be a UNION ALL
♦ Outer joins cannot be part of any recursion cycle
♦ Subquery cannot be part of any recursion cycle
Some of the restrictions on what can be coded within the common table expression (CTE).
Let us look at some mathematical applications starting with some simple examples. We will look at Factorial, Generating primes and Fibonacci series.
If mathematics scares you, do not get discouraged – these examples are actually the easiest way to illustrate the concept. We will build on this foundation and cover several business cases in the next section.
Case 2 – Factorial
WITH NUMBERS (LEVEL, FACTO) AS
(
  SELECT 1, 1 FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, FACTO * (LEVEL + 1) FROM NUMBERS
  WHERE LEVEL < 12
)
SELECT LEVEL AS NUMBER, FACTO AS FACTORIAL FROM NUMBERS ORDER BY LEVEL

NUMBER FACTORIAL
1 1
2 2
3 6
4 24
5 120
…
Just a little bit more complex – we simply generate the numbers and report them in the “use the result” section of the SQL. The limit of 12 is presumably deliberate: 13! (6,227,020,800) no longer fits in an INTEGER.
Case 3 – Generating Primes - SQL
WITH NUMBERS (LEVEL, PRIME) AS
(
  SELECT 1, 1 FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS
  WHERE LEVEL < 5000
)
SQL Continued on next slide
SQL to generate prime numbers.
Case 3 – Generating Primes – SQL (cont)
SELECT X.PRIME
FROM (SELECT NUMBERS.LEVEL, NUMBERS.PRIME FROM NUMBERS) AS X
WHERE NOT EXISTS
  (SELECT 1 FROM NUMBERS Y
   WHERE Y.PRIME BETWEEN 2 AND SQRT(X.PRIME)
   AND MOD(X.PRIME, Y.PRIME) = 0)
ORDER BY X.PRIME

Prime: 1 2 3 5 7 11
Non-prime (has a factor): 4 6 8 9 10
Actual SQL to generate prime numbers, continued.
Note that we can stop checking for factors once we pass the square root of that number (a big gain in performance – for example, for 5000 we can stop at 70 instead of checking all the way up to 5000).
On a theoretical note – the answer to the question “Is 1 a prime?” depends on the definition. Let’s leave that discussion to Math forums. Here, I assume it is a prime.
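The NOT EXISTS test with the square-root cutoff can be mirrored in a short Python sketch. This is an illustration under the same assumption the session makes (1 is treated as a prime); the function name is hypothetical.

```python
import math


def primes_up_to(limit):
    """List n in 1..limit that have no factor between 2 and sqrt(n).

    Like the SQL on the slide, 1 is kept because nothing in 2..sqrt(1)
    can divide it.
    """
    found = []
    for n in range(1, limit + 1):
        # Y.PRIME BETWEEN 2 AND SQRT(X.PRIME) AND MOD(X.PRIME, Y.PRIME) = 0
        has_factor = any(n % d == 0 for d in range(2, math.isqrt(n) + 1))
        if not has_factor:
            found.append(n)
    return found
```

For 5000 the inner loop never looks past `math.isqrt(5000)`, which is 70, matching the figure quoted in the note above.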
Case 4 – Fibonacci Series
♦ The first two numbers in the series are one and one.
♦ To obtain each number of the series, you simply add the two numbers that came before it. In other words, each number of the series is the sum of the two numbers preceding it.
1 1 2 3 5 8 13 21
[Figures: The Parthenon with the “Golden ratio”; The Fibonacci Spiral]
Some historical background information about Fibonacci Series (made popular by the recent “Da Vinci Code”):
•A sequence of numbers first created by Leonardo Fibonacci in 1202.
•It is a deceptively simple series, but its ramifications and applications are nearly limitless.
•The number 1.618..., or Phi, is the value that the ratio of each number to the previous number approaches as the series grows. Called the “Golden ratio”, it has implications in art, architecture and numerous other disciplines.
Case 4 – Fibonacci Series - SQL
WITH NUMBERS (LEVEL, NEXTNUM, TOTAL) AS
(
  SELECT 1, 1, 1 FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1
  , (NEXTNUM + TOTAL)
  , (NEXTNUM + TOTAL + TOTAL)
  FROM NUMBERS
  WHERE LEVEL = LEVEL AND LEVEL <= 22
)
SQL Continued on next slide

LEVEL NEXTNUM TOTAL
1 1 1
2 2 3
3 5 8
4 13 21
5 34 55
…
SQL to generate Fibonacci series.
Case 4 – Fibonacci Series – SQL (cont)
SELECT A.NEXTNUM FROM NUMBERS A WHERE A.LEVEL <= 22
UNION ALL
SELECT B.TOTAL FROM NUMBERS B WHERE B.LEVEL <= 22
ORDER BY 1

Series: 1 1 2 3 5 8 13 21 34 55 …
Actual SQL to generate Fibonacci series, continued.
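The trick in this case is that each CTE row carries two consecutive Fibonacci numbers, and the main select stacks both columns into one series. A small Python sketch of the same idea (function names are assumptions for illustration):

```python
def fibonacci_pairs(levels):
    """Build (LEVEL, NEXTNUM, TOTAL) rows the way the CTE does."""
    nextnum, total = 1, 1                      # SELECT 1, 1, 1
    pairs = [(1, nextnum, total)]
    for level in range(2, levels + 1):
        # NEXTNUM + TOTAL and NEXTNUM + TOTAL + TOTAL, both from the previous row
        nextnum, total = nextnum + total, nextnum + total + total
        pairs.append((level, nextnum, total))
    return pairs


def fibonacci_series(levels):
    """Stack both columns and sort, like the UNION ALL ... ORDER BY 1."""
    pairs = fibonacci_pairs(levels)
    return sorted([n for _, n, _ in pairs] + [t for _, _, t in pairs])
```

Five levels are enough to reproduce the ten numbers shown on the slide.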
Now let us start to explore recursive SQL for business applications.
Case studies - Business
♦ Case 5 – Org chart
♦ Case 6 – Generating test data
♦ Case 7 – Missing data
♦ Case 8 – Rollup
♦ Case 9 – Even allocation
♦ Case 10 – Weighted allocation
♦ Case 11 – RI children
♦ Case 12 – Cheapest fare
♦ Case 13 – Account linking
A list of case studies we explore in this section.
Case 5 - Org Chart
1-Wolf
2-Sontag 3-Truscott 4-Rodwell
5-Hamman 6-Jacoby 7-Wei 8-Kelsey 9-Reese
10-Schapiro
Case of how recursive SQL can be used effectively for traversing hierarchical structures.
Case 5 – RECASE05 Table
EMPID EMPNAME MGRID
1 Wolf -
2 Sontag 1
3 Truscott 1
4 Rodwell 1
5 Hamman 2
6 Jacoby 2
7 Wei 3
8 Kelsey 4
9 Reese 4
10 Schapiro 9
Contents of the table RECASE05 that defines the hierarchy.
Case 5 - SQL
WITH OC (LEVEL, MGRID, MGRNAME, EMPID, EMPNAME) AS
(
  SELECT 0, 0, ' ', EMPID, EMPNAME
  FROM RECASE05
  WHERE MGRID IS NULL
  UNION ALL
  SELECT BOSS.LEVEL + 1, SUB.MGRID, BOSS.EMPNAME, SUB.EMPID, SUB.EMPNAME
  FROM OC BOSS, RECASE05 SUB
  WHERE BOSS.EMPID = SUB.MGRID                    -- My direct reports
  AND BOSS.LEVEL < 5
)
SELECT OC.LEVEL, OC.MGRNAME, OC.EMPNAME
FROM OC
WHERE LEVEL > 0
ORDER BY OC.LEVEL, OC.MGRID, OC.EMPID
Actual SQL to generate the org chart.
The initialization select (WHERE MGRID IS NULL) could have been written instead as: (WHERE EMPID = 1) .
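For readers without a DB2 subsystem at hand, the org chart query can be tried in Python with the built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for DB2, so the dialect differs slightly (the RECURSIVE keyword, TEXT columns), and the helper name `org_chart` is made up. The table contents are those of RECASE05 above.

```python
import sqlite3

# Rows of RECASE05 as listed on the slide: (EMPID, EMPNAME, MGRID)
EMPLOYEES = [
    (1, "Wolf", None), (2, "Sontag", 1), (3, "Truscott", 1), (4, "Rodwell", 1),
    (5, "Hamman", 2), (6, "Jacoby", 2), (7, "Wei", 3), (8, "Kelsey", 4),
    (9, "Reese", 4), (10, "Schapiro", 9),
]

ORG_CHART_SQL = """
WITH RECURSIVE oc (level, mgrid, mgrname, empid, empname) AS (
    SELECT 0, 0, '', empid, empname            -- prime: the root (no manager)
    FROM recase05
    WHERE mgrid IS NULL
    UNION ALL
    SELECT boss.level + 1, sub.mgrid, boss.empname, sub.empid, sub.empname
    FROM oc boss, recase05 sub                 -- pump: my direct reports
    WHERE boss.empid = sub.mgrid
      AND boss.level < 5                       -- depth guard
)
SELECT level, mgrname, empname
FROM oc
WHERE level > 0
ORDER BY level, mgrid, empid
"""


def org_chart():
    """Load the sample hierarchy into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase05 (empid INTEGER, empname TEXT, mgrid INTEGER)")
        conn.executemany("INSERT INTO recase05 VALUES (?, ?, ?)", EMPLOYEES)
        return conn.execute(ORG_CHART_SQL).fetchall()
    finally:
        conn.close()
```

Calling `org_chart()` returns one (level, manager, employee) tuple per reporting relationship, in the same order as the result slide that follows.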
Case 5 - Intermediate Result
Common Table Expression OC
LEVEL MGRID MGRNAME EMPID EMPNAME
0 0 NULL 1 WOLF
1 1 WOLF 2 SONTAG
1 1 WOLF 3 TRUSCOTT
1 1 WOLF 4 RODWELL
2 2 SONTAG 5 HAMMAN
2 2 SONTAG 6 JACOBY
2 3 TRUSCOTT 7 WEI
2 4 RODWELL 8 KELSEY
2 4 RODWELL 9 REESE
3 9 REESE 10 SCHAPIRO
Contents of the common table expression OC as the SQL is executed.
Case 5 - Result
LEVEL MGRNAME EMPNAME
1 WOLF SONTAG
1 WOLF TRUSCOTT
1 WOLF RODWELL
2 SONTAG HAMMAN
2 SONTAG JACOBY
2 TRUSCOTT WEI
2 RODWELL KELSEY
2 RODWELL REESE
3 REESE SCHAPIRO
… and the result.
Case 6 – Generating Test Data
RECASE06
EMPID SMALLINT 1 thru 10000
FNAME CHAR(20) 3-7 chars
LNAME CHAR(20) 3-10 chars
SALARY DEC(7,2) 1000 to 5000
HIREDATE DATE 21 years thru 1 year
Table structure for a table containing various data types which needs to be populated with test data.
If you needed 10,000 rows of test data in this table, you would typically use a table editor (or SPUFI) to insert some rows and repeat them. How “random” would this data really be? In real life, not really random at all. The recursive technique lets you generate far more random data quite easily.
Case 6 – Generating Test Data - SQL
INSERT INTO RECASE06
WITH NUMBERS (LEVEL, NEXTONE) AS
(
  SELECT 1, 1 FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, LEVEL + 1 FROM NUMBERS
  WHERE LEVEL < 10
)
SELECT INTEGER(ROUND(RAND()*9999,0)) + 1                              -- 1 thru 10,000
, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ', INTEGER(ROUND(RAND()*17,0))+1, 1)
       CONCAT SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)
       <<< REPEAT 5 TIMES >>>                                         -- 5 sets of letters + vowels
  , INTEGER(ROUND(RAND()*4,0)) + 3)                                   -- Min 3, max 7
SQL Continued on next slide
SQL used for this purpose.
Case 6 – Generating Test Data – SQL (cont)
, LEFT(SUBSTR('BCDFGHJKLMNPRSTVWZ', INTEGER(ROUND(RAND()*17,0))+1, 1)
       CONCAT SUBSTR('AEIOUY', INTEGER(ROUND(RAND()*5,0))+1, 1)
       <<< REPEAT 5 TIMES >>>                                         -- Same for Last name
  , INTEGER(ROUND(RAND()*7,0)) + 3)                                   -- Min 3, max 10
, DECIMAL((1000.00 + RAND()*4000),7,2)                                -- Min 1000, max 5000
, CURRENT DATE - 1 YEAR - INTEGER(20*365*RAND()) DAYS                 -- 1 year thru 21 years
FROM NUMBERS
SQL continued.
Case 6 – Generating Test Data - Result
EMPID FNAME LNAME SALARY HIREDATE
1369 ZITO RYDI 2250.93 2003-02-03
2714 KUMU PUDIN 3112.80 1997-02-02
3485 FOCAC VOGYNO 1155.08 1991-09-11
5351 CINIFI DELOV 2651.32 2003-02-16
6297 CIKEZ TAGINIMUW 1252.73 1990-10-29
6474 COLO VOLECECU 2948.95 1993-01-28
6629 LIHU DACACOMILY 4721.00 1989-06-07
8463 VIVO VOLEZO 1634.31 2002-01-15
8494 FALIHED SUPO 1542.59 1988-06-06
8787 MUDOD SIC 3874.86 1999-06-04
…
…and the result from one of my test runs (we come up with some interesting names!). It could be adjusted to reflect the regional demographics.
Case 7 – Missing Data Non-recursive
EX7EMPL
EMPID EMPNAME
1 Alan Sontag
2 Bobby Wolf
3 Dorothy Truscott
EX7TIME
EMPID FRIDAY_DATE HOURS
1 2005-03-04 41.0
1 2005-03-11 42.0
1 2005-03-18 43.0
1 2005-03-25 44.0
2 2005-03-04 39.0
2 2005-03-18 46.0
For the month, notice that empid 3 has not reported any time; 2 has entered only partially.
A simple case involving a set of two tables with time reported by week. Some employees may fail to report their time.
Case 7 – Missing Data Non-recursive
Required Report
EMPNAME FRIDAY_DATE TOTHOURS
Alan Sontag 2005-03-04 41.0
Alan Sontag 2005-03-11 42.0
Alan Sontag 2005-03-18 43.0
Alan Sontag 2005-03-25 44.0
Bobby Wolf 2005-03-04 39.0
Bobby Wolf 2005-03-11 0.0
Bobby Wolf 2005-03-18 46.0
Bobby Wolf 2005-03-25 0.0
Dorothy Truscott 2005-03-04 0.0
Dorothy Truscott 2005-03-11 0.0
Dorothy Truscott 2005-03-18 0.0
Dorothy Truscott 2005-03-25 0.0
List all employees and the reported time for March 2005. If an employee failed to report time, show it as zeroes.
We need a report that includes any missing information. Since we are dealing with optional data, our first thought would be a Left Outer Join.
However, how do you create a “left” table that has all the data when rows themselves are missing? See next slide to see how we can generate such a “left” table.
Case 7 – Missing Data Non-recursive
SELECT CART.EMPNAME, CART.FRIDAY_DATE
, SUM(COALESCE(TIME.HOURS,0.0)) AS TOTHOURS
FROM
( SELECT EMPL.EMPID, EMPL.EMPNAME, FRIDAYS.FRIDAY_DATE
  FROM EX7EMPL EMPL                                   -- Employees
  INNER JOIN
  ( SELECT DISTINCT FRIDAY_DATE                       -- All Fridays
    FROM EX7TIME
    WHERE FRIDAY_DATE BETWEEN '2005-03-01' AND '2005-03-31'
  ) AS FRIDAYS
  ON 1=1                                              -- Cartesian Product
) AS CART
SQL Continued on next slide
CART is a nested table expression (NTE) holding every employee-Friday combination. It serves as the “left” table mentioned in the previous slide.
Case 7 – Missing Data Non-recursive (cont)
LEFT OUTER JOIN EX7TIME TIME                          -- … and their time if any
ON CART.EMPID = TIME.EMPID
AND CART.FRIDAY_DATE = TIME.FRIDAY_DATE
GROUP BY CART.EMPNAME, CART.FRIDAY_DATE
ORDER BY CART.EMPNAME, CART.FRIDAY_DATE
We now perform a Left Outer Join of CART with the TIME table to show the time reported, if any.
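Notice that if all employees conspire and no one enters any time, this query fails: the list of Fridays is derived from EX7TIME itself, so with no time rows there are no Fridays and the report comes back empty.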
Case 7 – Missing Data - Recursive
WITH DATES (LEVEL, NEXTDAY) AS
(
  SELECT 1, DATE('2005-03-01') FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT LEVEL + 1, NEXTDAY + 1 DAY
  FROM DATES
  WHERE LEVEL < 32
)
SQL Continued on next slide
An alternative solution to the same problem using recursive SQL.
Also note that unlike the previous solution that requires at least one employee to enter the time, this solution works irrespective of who has entered the time.
Case 7 – Missing Data – Recursive (cont)
SELECT EMPL.EMPNAME
, DATES.NEXTDAY AS FRIDAY_DATE
, COALESCE(TIME.HOURS,0.0) AS TOTHOURS
FROM DATES
INNER JOIN EX7EMPL EMPL
ON 1 = 1                                              -- Cartesian Product
LEFT OUTER JOIN EX7TIME TIME
ON DATES.NEXTDAY = TIME.FRIDAY_DATE
AND EMPL.EMPID = TIME.EMPID
WHERE NEXTDAY BETWEEN '2005-03-01' AND '2005-03-31'
AND DAYOFWEEK(NEXTDAY) = 6                            -- is a Friday
ORDER BY EMPL.EMPNAME, DATES.NEXTDAY
SQL to generate the missing data, continued.
Alternatively, we could generate just the Friday dates by pushing these constructs into the CTE itself. However, care must be taken to start with a Friday (2005-03-04), otherwise the initial select adds zero rows to the CTE. You must also increment by 7 since first non-Friday will terminate the loop (this is left as an exercise for the reader).
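One possible take on that exercise is sketched below in Python with the built-in sqlite3 module. SQLite stands in for DB2 here, so the date arithmetic uses SQLite's `date(..., '+7 days')` rather than DB2's `+ 7 DAYS`, and the helper name `march_fridays` is an assumption. The CTE starts on a Friday (2005-03-04) and steps by 7.

```python
import sqlite3

FRIDAYS_SQL = """
WITH RECURSIVE dates (level, friday) AS (
    SELECT 1, '2005-03-04'                     -- must start on a Friday
    UNION ALL
    SELECT level + 1, date(friday, '+7 days')  -- step a whole week at a time
    FROM dates
    WHERE date(friday, '+7 days') <= '2005-03-31'
      AND level < 10                           -- depth guard, as recommended
)
SELECT friday FROM dates ORDER BY friday
"""


def march_fridays():
    """Return the Fridays of March 2005 as ISO date strings."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(FRIDAYS_SQL)]
    finally:
        conn.close()
```

Because only Fridays are ever generated, the DAYOFWEEK test in the main select is no longer needed with this variant.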
Case 8 – Rollup
1-CORP (500)
  2-I.T. (400)
    4-DB SVC (1000)
    5-PROG (5000)
  3-H.R. (100)
    6-EMPL REL (50)
    7-BENEFITS (30)
Case of a hierarchy where the salary amounts must be rolled up to the higher cost center (e.g. Database Services and Programming are to be rolled up to the IT salary budget).
Case 8 – Rollup SQL
WITH ROLLUP (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BUDGET) AS
(
  SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BUDGET_AMT    -- me
  FROM RECASE08
  UNION ALL
  SELECT A.ACCT_NUM, A.ACCT_NAME, A.PARENT_ACCT_NUM, B.TOTAL_BUDGET
  FROM RECASE08 A, ROLLUP B
  WHERE A.ACCT_NUM = B.PARENT_ACCT_NUM                       -- and my boss
  AND B.PARENT_ACCT_NUM IS NOT NULL
)
SQL Continued on next slide
SQL used for this purpose.
The table RECASE08 contains:
ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
PARENT_ACCT_NUM INTEGER
BUDGET_AMT DEC(9,2) NOT NULL
Case 8 – Rollup SQL (cont)
SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BUDGET) AS TOTAL_BUDGET
FROM ROLLUP
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM
SQL continued.
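The rollup can be tried out with Python's built-in sqlite3 module. This is a sketch, not the session's own test harness: SQLite stands in for DB2, the CTE is renamed `rolled`, the helper name is made up, and the parent assignments (DB SVC and PROG under I.T., EMPL REL and BENEFITS under H.R.) are inferred from the hierarchy slide.

```python
import sqlite3

# (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BUDGET_AMT); parents are an assumption
ACCOUNTS = [
    (1, "CORP", None, 500), (2, "I.T.", 1, 400), (3, "H.R.", 1, 100),
    (4, "DB SVC", 2, 1000), (5, "PROG", 2, 5000),
    (6, "EMPL REL", 3, 50), (7, "BENEFITS", 3, 30),
]

ROLLUP_SQL = """
WITH RECURSIVE rolled (acct_num, acct_name, parent_acct_num, total_budget) AS (
    SELECT acct_num, acct_name, parent_acct_num, budget_amt   -- me
    FROM recase08
    UNION ALL
    SELECT a.acct_num, a.acct_name, a.parent_acct_num, b.total_budget
    FROM recase08 a, rolled b                                 -- and my boss
    WHERE a.acct_num = b.parent_acct_num
      AND b.parent_acct_num IS NOT NULL
)
SELECT acct_num, acct_name, SUM(total_budget)
FROM rolled
GROUP BY acct_num, acct_name
ORDER BY acct_num
"""


def rolled_up_budgets():
    """Load the sample accounts and return (acct_num, name, rolled-up total)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE recase08 (acct_num INTEGER, acct_name TEXT,"
            " parent_acct_num INTEGER, budget_amt INTEGER)"
        )
        conn.executemany("INSERT INTO recase08 VALUES (?, ?, ?, ?)", ACCOUNTS)
        return conn.execute(ROLLUP_SQL).fetchall()
    finally:
        conn.close()
```

Each amount travels up one level per iteration, and the recursion stops by itself once every amount has reached the root, whose parent is NULL.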
Case 8 – Rollup Result
ACCT_NUM ACCT_NAME TOTAL_BUDGET
1 CORP 500 → 7080
2 I.T. 400 → 6400
3 H.R. 100 → 180
4 DB SVC 1000
5 PROG 5000
6 EMPL REL 50
7 BENEFITS 30
… and the result.
Case 9 – Even Allocation
1-CORP (6000)
  2-I.T. (4000)
    4-DB SVC (5000)
    5-PROG (1000)
  3-H.R. (1000)
    6-EMPL REL (100)
    7-BENEFITS (50)
Case of a hierarchy where bonus amounts are to be allocated evenly across all subordinates. For example, the $6,000 for CORP is to be split into 2 parts ($3,000 each) for IT and HR. We must allocate from the top down.
Case 9 – Even Allocation SQL
WITH ALLOC (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BONUS) AS
(
  SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BONUS_AMT     -- me
  FROM RECASE10
  UNION ALL
  SELECT A.ACCT_NUM, A.ACCT_NAME, A.PARENT_ACCT_NUM
  , B.TOTAL_BONUS /                                          -- Divide evenly
    (SELECT COUNT(*) FROM RECASE10 X
     WHERE X.PARENT_ACCT_NUM = A.PARENT_ACCT_NUM)
  FROM RECASE10 A, ALLOC B
  WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM                       -- and my children
)
SQL Continued on next slide
SQL used for this purpose.
The table RECASE10 contains:
ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
PARENT_ACCT_NUM INTEGER
BONUS_AMT DEC(9,2) NOT NULL
Case 9 – Even Allocation SQL (cont)
SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BONUS)
FROM ALLOC
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM
SQL continued.
Case 9 – Even Allocation Result
ACCT_NUM ACCT_NAME TOTAL_BONUS
1 CORP 6000
2 I.T. 4000 → 7000
3 H.R. 1000 → 4000
4 DB SVC 5000 → 8500
5 PROG 1000 → 4500
6 EMPL REL 100 → 2100
7 BENEFITS 50 → 2050
…and the result.
Case 10 – Weighted Allocation
1-CORP (5000, R=null)
  2-I.T. (4000, R=4)
    5-DB SVC (5000, R=5)
    6-PROG (1000, R=3)
  3-H.R. (1000, R=1)
    7-EMPL REL (100, R=3)
    8-BENEFITS (50, R=2)
Similar hierarchy as before, but this time we need to accomplish a weighted allocation based on rating. For example, the $5,000 bonus for CORP is to be divided using the ratings of 4 and 1 for IT and HR, so IT will obtain 4 / (4+1) = 4/5 of the $5,000. As before, we must process from the top down.
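The arithmetic of the weighted split can be checked with a small Python sketch. It uses the amounts and ratings from the hierarchy slide (CORP 5000; I.T. rated 4 and H.R. rated 1), numbers the accounts 1 to 7 as the result slide does, and the function name is hypothetical. Each account ends up with its own bonus plus its rating-weighted share of the parent's total.

```python
# acct_num -> (name, parent, own bonus, rating); numbering follows the result slide
ACCOUNTS = {
    1: ("CORP", None, 5000, None),
    2: ("I.T.", 1, 4000, 4),
    3: ("H.R.", 1, 1000, 1),
    4: ("DB SVC", 2, 5000, 5),
    5: ("PROG", 2, 1000, 3),
    6: ("EMPL REL", 3, 100, 3),
    7: ("BENEFITS", 3, 50, 2),
}


def weighted_totals(accounts):
    """Return acct_num -> own bonus + rating-weighted share of the parent's total."""
    totals = {}

    def total_for(acct):
        if acct not in totals:
            _, parent, bonus, rating = accounts[acct]
            share = 0.0
            if parent is not None:
                # SUM(RATING) over all children of the same parent
                sibling_ratings = sum(
                    r for (_, p, _, r) in accounts.values() if p == parent
                )
                share = total_for(parent) * rating / sibling_ratings
            totals[acct] = bonus + share
        return totals[acct]

    for acct in accounts:
        total_for(acct)
    return totals
```

For I.T. this gives 4000 + 5000 * 4/5 = 8000, and DB SVC then receives 5/8 of that 8000 on top of its own 5000.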
Case 10 – Weighted Allocation SQL
WITH ALLOC (ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, TOTAL_BONUS) AS
(
  SELECT ACCT_NUM, ACCT_NAME, PARENT_ACCT_NUM, BONUS_AMT
  FROM RECASE11
  UNION ALL
  SELECT A.ACCT_NUM, A.ACCT_NAME, A.PARENT_ACCT_NUM
  , B.TOTAL_BONUS * A.RATING /                               -- Divide based on rating
    (SELECT SUM(RATING) FROM RECASE11 X
     WHERE X.PARENT_ACCT_NUM = A.PARENT_ACCT_NUM)
  FROM RECASE11 A, ALLOC B
  WHERE B.ACCT_NUM = A.PARENT_ACCT_NUM                       -- My children
)
SQL Continued on next slide
SQL for this purpose.
The table RECASE11 contains:
ACCT_NUM INTEGER NOT NULL (PK)
ACCT_NAME CHAR(20) NOT NULL
RATING SMALLINT
PARENT_ACCT_NUM INTEGER
BONUS_AMT DEC(9,2) NOT NULL
Case 10 – Weighted Allocation SQL (cont)
SELECT ACCT_NUM, ACCT_NAME, SUM(TOTAL_BONUS)
FROM ALLOC
GROUP BY ACCT_NUM, ACCT_NAME
ORDER BY ACCT_NUM
SQL continued.
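Case 10 – Weighted Allocation Result
ACCT_NUM ACCT_NAME TOTAL_BONUS
1 CORP 5000
2 I.T. 4000 → 8000
3 H.R. 1000 → 2000
4 DB SVC 5000 → 10000
5 PROG 1000 → 4000
6 EMPL REL 100 → 1300
7 BENEFITS 50 → 850
Ratings used for each split: I.T. : H.R. = 4 : 1, DB SVC : PROG = 5 : 3, EMPL REL : BENEFITS = 3 : 2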
… and the result.
Case 11 – RI Children
A
B C D
E F G H I
J
An example of a hierarchy that shows the referential integrity (RI) implemented using foreign keys. These relationships are visible in the catalog table SYSIBM.SYSRELS.
Notice the recursive relationship between B and itself, as well as the circular definition of constraints from D to H to J and back to D.
Case 11 – RI Children SQL
WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS
(
  SELECT 0, 'Z', 'your start table'
  FROM SYSIBM.SYSDUMMYU                                      -- Unicode in catalog
  UNION ALL
  SELECT PARENT.LEVEL + 1
  , SUBSTR(CHILD.REFTBNAME,1,1)
  , SUBSTR(CHILD.TBNAME,1,1)
  FROM OC PARENT, SYSIBM.SYSRELS CHILD
  WHERE PARENT.CHILDTAB = CHILD.REFTBNAME                    -- my children
  AND CHILD.CREATOR = …
  AND CHILD.REFTBCREATOR = …
  AND CHILD.REFTBNAME <> CHILD.TBNAME                        -- Eliminate self-ref
  AND PARENT.LEVEL < 10
)
SQL Continued on next slide
SQL used for this purpose.
Case 11 – RI Children SQL (cont)
SELECT DISTINCT OC.PARENTTAB, OC.CHILDTAB
FROM OC
WHERE OC.LEVEL > 0
ORDER BY OC.PARENTTAB, OC.CHILDTAB
SQL continued.
Notice that the CTE generates much more data than necessary (e.g. node D is revisited multiple times) and the DISTINCT is needed to eliminate the duplicates. A very handy SQL enhancement would be for DB2 to allow us to specify DISTINCT (without level) within the CTE itself. V(future) perhaps?
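To see how much extra data the cycle produces, the traversal can be simulated in Python. This is a conceptual sketch only: the edge list is taken from the result slide for this case (the self-reference on B is already excluded, as in the SQL), and the function name and level cap mirror the SQL rather than any DB2 internals.

```python
# (parent table, child table) pairs as listed on the result slide
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("B", "F"), ("C", "G"),
    ("C", "J"), ("D", "H"), ("D", "I"), ("H", "J"), ("J", "D"),
]


def ri_children(start, edges, max_level=10):
    """Walk parent -> child edges level by level, like the CTE with LEVEL < 10."""
    rows = []
    frontier = [(0, "Z", start)]               # prime: dummy parent 'Z'
    while frontier:
        frontier = [
            (level + 1, parent, child)
            for (level, _, tab) in frontier
            if level < max_level                # depth guard stops the D-H-J cycle
            for (parent, child) in edges
            if parent == tab
        ]
        rows.extend(frontier)                   # UNION ALL keeps every duplicate
    return rows


rows = ri_children("A", EDGES)
distinct_pairs = sorted({(parent, child) for _, parent, child in rows})
```

`rows` contains the same pairs many times because of the D to H to J to D loop, while `distinct_pairs` collapses them to the 11 relationships, which is the job the outer DISTINCT does in the SQL.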
Case 11 – RI Children Result
PARENTTAB CHILDTAB
A B
A C
A D
B E
B F
C G
C J
D H
D I
H J
J D
… and the result.
Case 12 – Cheapest Fare
[Figure: route map with fares between MCI (Kansas City), ORD (Chicago), STL (St Louis), IAH (Houston) and DFW (Dallas). Start city: MCI. End city: DFW.]
A case study to list and evaluate multiple paths.
Case 12 – RECASE12 Table
FROM_CITY TO_CITY FARE
MCI ORD 100
MCI STL 200
MCI DFW 500
ORD STL 50
ORD IAH 100
STL IAH 25
STL DFW 150
IAH DFW 50
Contents of the table RECASE12 that defines the available routes and the associated fares.
Case 12 – Cheapest Fare SQL
WITH ROUTING (LEVEL, FROM_CITY, TO_CITY, CHAIN, FARE) AS
(
  SELECT 0
  , CAST('-->' AS CHAR(3))
  , CAST('MCI' AS CHAR(3))
  , CAST('-->MCI' AS VARCHAR(60))
  , 0
  FROM SYSIBM.SYSDUMMY1
  UNION ALL
  SELECT PARENT.LEVEL + 1
  , CAST(SUBSTR(CHILD.FROM_CITY,1,3) AS CHAR(3))
  , CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3))
  , PARENT.CHAIN CONCAT '->' CONCAT CAST(SUBSTR(CHILD.TO_CITY,1,3) AS CHAR(3))   -- Build itinerary
  , PARENT.FARE + CHILD.FARE                                                     -- Add fare
  FROM ROUTING PARENT, RECASE12 CHILD
  WHERE PARENT.TO_CITY = CHILD.FROM_CITY                                         -- From here to?
  AND PARENT.LEVEL < 10
)
SQL Continued on next slide
SQL used for this purpose.
Case 12 – Cheapest Fare SQL (cont)
SELECT DISTINCT SUBSTR(ROUTING.CHAIN,4,56) AS ROUTE, ROUTING.FARE
FROM ROUTING
WHERE ROUTING.LEVEL > 0
AND ROUTING.CHAIN LIKE '-->MCI%'
AND ROUTING.CHAIN LIKE '%DFW'
ORDER BY ROUTE
SQL continued.
The length of the generated path (60 bytes, of which 56 are reported) is arbitrary and truncated here to make the output readable.
Case 12 – Cheapest Fare Result
ROUTE FARE
MCI->DFW 500
MCI->ORD->IAH->DFW 250
MCI->ORD->STL->DFW 300
MCI->ORD->STL->IAH->DFW 225
MCI->STL->DFW 350
MCI->STL->IAH->DFW 275
… and the result.
A further enhancement (including the flight start and end days/times) would allow us to select connecting flights (e.g. next one must leave 1 hour to 4 hours from the arrival of the previous flight etc) and prepare a true itinerary like Travelocity or other search engines (… all with no pop-up ads!!).
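The path enumeration can be reproduced with Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for DB2 (so `||` replaces CONCAT and the CASTs are dropped), the helper name is made up, and the MCI to STL fare of 200 is taken from the route map and the result rows. Sorting by fare puts the cheapest itinerary first.

```python
import sqlite3

# (FROM_CITY, TO_CITY, FARE); MCI -> STL at 200 is assumed from the map and results
FARES = [
    ("MCI", "ORD", 100), ("MCI", "STL", 200), ("MCI", "DFW", 500),
    ("ORD", "STL", 50), ("ORD", "IAH", 100), ("STL", "IAH", 25),
    ("STL", "DFW", 150), ("IAH", "DFW", 50),
]

ROUTING_SQL = """
WITH RECURSIVE routing (level, to_city, chain, fare) AS (
    SELECT 0, 'MCI', 'MCI', 0                          -- prime: start city
    UNION ALL
    SELECT parent.level + 1,
           child.to_city,
           parent.chain || '->' || child.to_city,      -- build itinerary
           parent.fare + child.fare                    -- add fare
    FROM routing parent, recase12 child
    WHERE parent.to_city = child.from_city             -- from here to?
      AND parent.level < 10                            -- depth guard
)
SELECT chain, fare
FROM routing
WHERE to_city = 'DFW'
ORDER BY fare
"""


def routes_to_dfw():
    """Return every MCI-to-DFW itinerary with its total fare, cheapest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase12 (from_city TEXT, to_city TEXT, fare INTEGER)")
        conn.executemany("INSERT INTO recase12 VALUES (?, ?, ?)", FARES)
        return conn.execute(ROUTING_SQL).fetchall()
    finally:
        conn.close()
```

The first row returned is the four-leg route through ORD, STL and IAH at 225, which beats the 500 nonstop.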
Case 13 – Account Linking
[Figure: accounts and the customers on each; account 4 is the start.]
Account Customers
1 10, 20, 30
2 10, 40, 50
3 20, 60, 70
4 30, 80 (Start)
5 40, 90
6 50, 60, 110
7 80, 120
8 90, 100
9 110, 130
10 140
11 140, 150
A case study to group accounts that share at least one customer. Such linked accounts may receive favorable pricing.
Case 13 – Account Linking SQL
WITH LINKING (LEVEL, ACCT_NUM, CUST_NUM) AS
(
  SELECT 0, ACCT_NUM, CUST_NUM
  FROM RECASE13
  WHERE ACCT_NUM = 4                                  -- Starting acct num
  UNION ALL
  SELECT PARENT.LEVEL + 1, CHILD2.ACCT_NUM, CHILD2.CUST_NUM
  FROM LINKING PARENT, RECASE13 CHILD1, RECASE13 CHILD2
  WHERE PARENT.CUST_NUM = CHILD1.CUST_NUM             -- Link by customer
  AND PARENT.ACCT_NUM <> CHILD1.ACCT_NUM              -- Diff acct
  AND CHILD1.ACCT_NUM = CHILD2.ACCT_NUM               -- All customers for that acct
  AND PARENT.LEVEL < 5
)
SELECT DISTINCT ACCT_NUM FROM LINKING
SQL used for this purpose.
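A runnable version of the linking query is sketched below with Python's built-in sqlite3 module. SQLite stands in for DB2, the helper name is an assumption, and the account-to-customer data is read off the figure for this case (accounts 1 to 11, starting from account 4), so treat the sample rows as a reconstruction.

```python
import sqlite3

# account -> customers, as read from the figure (a reconstruction)
ACCOUNT_CUSTOMERS = {
    1: [10, 20, 30], 2: [10, 40, 50], 3: [20, 60, 70], 4: [30, 80],
    5: [40, 90], 6: [50, 60, 110], 7: [80, 120], 8: [90, 100],
    9: [110, 130], 10: [140], 11: [140, 150],
}

LINKING_SQL = """
WITH RECURSIVE linking (level, acct_num, cust_num) AS (
    SELECT 0, acct_num, cust_num
    FROM recase13
    WHERE acct_num = 4                              -- starting acct num
    UNION ALL
    SELECT parent.level + 1, child2.acct_num, child2.cust_num
    FROM linking parent, recase13 child1, recase13 child2
    WHERE parent.cust_num = child1.cust_num         -- link by customer
      AND parent.acct_num <> child1.acct_num        -- diff acct
      AND child1.acct_num = child2.acct_num         -- all customers for that acct
      AND parent.level < 5                          -- depth guard
)
SELECT DISTINCT acct_num FROM linking ORDER BY acct_num
"""


def linked_accounts():
    """Return the accounts reachable from account 4 through shared customers."""
    rows = [(acct, cust) for acct, custs in ACCOUNT_CUSTOMERS.items() for cust in custs]
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE recase13 (acct_num INTEGER, cust_num INTEGER)")
        conn.executemany("INSERT INTO recase13 VALUES (?, ?)", rows)
        return [row[0] for row in conn.execute(LINKING_SQL)]
    finally:
        conn.close()
```

With this data, accounts 1 through 9 form one linked group, while accounts 10 and 11 share customer 140 only with each other and are never reached from account 4.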
OK, it is complex and compact but from a performance perspective, how does recursive SQL look? We will explore this issue by comparing it with procedural logic (COBOL program) that accomplishes the same tasks.
Hierarchy
Level 0 : 1
Level 1 : 4
Level 2 : 16
Level 3 : 64
Level 4 : 256
Level 5 : 1,024
Level 6 : 4,096
Level 7 : 16,384
Level 8 : 65,536
(total 87,381 nodes)
The hierarchical structure used for the case study – each node has 4 children and the hierarchy is 8 levels deep.
Table Structure
RECASE14
…
ACCT_NUM INTEGER
ACCT_NAME CHAR(20)
PARENT_ACCT_NUM INTEGER
AMOUNT DEC(15,2)
ACCT_LEVEL INTEGER
Annotations on the slide: Index K0 (uniq), Index K1, Index K2, Derived column
Table structure and the indexes available.
Procedural Logic Setting Level
SELECT C.PARENT_ACCT_NUM, C.ACCT_NUM
FROM RECASE14 P, RECASE14 C
WHERE P.ACCT_LEVEL = :WS-LOOP-LEVEL
AND P.ACCT_NUM = C.PARENT_ACCT_NUM                    -- my children
ORDER BY PARENT_ACCT_NUM, ACCT_NUM

SET LEVEL = 0 FOR ROOT
FOR EACH LEVEL (0 TO 7 BY 1)
  FOR EACH ROW OF CHILD-CURSOR
    UPDATE CHILD WITH LEVEL = PARENT + 1
  NEXT CHILD
NEXT LEVEL
SQL and logic for setting the level, which is used later in all 3 cases.
Procedural Logic for Org Chart
<<< using the pre-set level-number >>>
SELECT BOSS.ACCT_LEVEL, BOSS.ACCT_NAME, SUB.ACCT_NAME
FROM RECASE14 BOSS, RECASE14 SUB
WHERE SUB.PARENT_ACCT_NUM = BOSS.ACCT_NUM             -- my children
ORDER BY BOSS.ACCT_LEVEL, BOSS.ACCT_NUM, SUB.ACCT_NUM
Logic for Org chart.
Procedural Logic for Rollup
<<< using the pre-set level-number >>>
FOR EACH LEVEL 8 TO 1 BY -1
  FOR EACH ROW OF CHILD-CURSOR
    ADD AMOUNT TO PARENT AND UPDATE
  NEXT CHILD
NEXT LEVEL
SHOW ALL ROWS
Logic for rollup.
Procedural Logic for Allocate
<<< using the pre-set level-number >>>
FOR EACH LEVEL 0 TO 7 BY 1
  FOR EACH ROW OF PARENT-CURSOR
    DETERMINE # OF CHILDREN
    FOR EACH ROW OF CHILD-CURSOR
      ADD PRO-RATED AMOUNT AND UPDATE CHILD
    NEXT CHILD
  NEXT PARENT
NEXT LEVEL
SHOW ALL ROWS
Logic for allocation.
Benchmarks
Comparison of procedural logic with recursive logic
[Chart: CPU sec (axis 0 to 35) by function (Org chart, Roll up, Allocate), Procedural vs. Recursive]
… and a comparison of how they stack up.
Unlike the Dan Luksetich case mentioned earlier, my results are not dazzling in favor of recursive SQL – it is better, but not by an order of magnitude.
Admittedly, other variables will also impact the benchmarks. For example, if the procedural logic is needlessly complex or inefficient, recursive SQL can look much better. I have avoided a biased approach – the best SQL and the best procedural logic are being compared. Other variables that could affect the results are the presence of indexes, the depth of the hierarchy and how balanced the hierarchy is (very evenly balanced in this “lab” exercise). The size and number of SORTWORK datasets for DSNDB07 may also affect the performance.
Let’s summarize what we have learned and point out some pitfalls.
Limiting the Depth
♦ Prone to infinite cycles unless controlled properly
♦ DB2 imposes no limit on the depth of recursion but issues a warning (SQLSTATE = 01605, SQLCODE = +347) for an SQL statement that does not use a control variable to limit the depth
• Warning is essentially useless in SPUFI (displayed too late)
♦ Warning is based on the absence of:
• An integer column that increments by a constant
• A predicate of the type LEVEL < constant or LEVEL < :hv
• LEVEL < (subselect query) is allowed but issues a warning
• The SQL parser logic is quite primitive – for example:
  • a loop from 10 to 1 is not infinite but will still generate a warning
  • Subtracting -1 (same as adding 1) will still generate a warning
The potential to create an infinite loop is ever present since DB2 will not limit it to a specified depth.
Causing Infinite Loops
WITH OC (LEVEL, PARENTTAB, CHILDTAB) AS
(
  SELECT 0, 'Z', 'your start table'
  FROM SYSIBM.SYSDUMMYU
  UNION ALL
  SELECT PARENT.LEVEL + 1
  , SUBSTR(CHILD.REFTBNAME,1,1)
  , SUBSTR(CHILD.TBNAME,1,1)
  FROM OC PARENT, SYSIBM.SYSRELS CHILD
  WHERE PARENT.CHILDTAB = CHILD.REFTBNAME
  AND CHILD.CREATOR = …
  AND CHILD.REFTBCREATOR = …
  AND CHILD.REFTBNAME <> CHILD.TBNAME
  AND PARENT.LEVEL < 10                               <<< Removed >>>
)

LEVEL PARENTTAB CHILDTAB
DSNT404I SQLCODE = 347, WARNING: THE RECURSIVE COMMON TABLE EXPRESSION OC MAY CONTAIN AN INFINITE LOOP
DSNT418I SQLSTATE = 01605 SQLSTATE RETURN CODE
DSNT415I SQLERRP = DSNXODML SQL PROCEDURE DETECTING ERROR
DSNT416I SQLERRD = 0 0 0 1209090530 0 0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD = X'00000000' X'00000000' X'00000000' X'481141E2' X'00000000' X'00000000' SQL DIAGNOSTIC INFORMATION
Without the extra protection offered by the level clause, we get the warning and a possible infinite loop.
Recommendations
♦ Use controls to limit the depth of recursion
♦ Consider possible future changes that could cause infinite loops (e.g. recursive or circular references)
♦ Very useful for reporting data that does not exist!
♦ Allows looping logic that would otherwise require procedural code or SQL Procedures
♦ In most cases, simplifies processing logic
♦ In specific instances, can lead to a huge performance gain
Powerful but dangerous in the wrong hands.
But it seems too hard…
“Don't blame me for the fact that competent programming, as I view it as an intellectual possibility, will be too difficult for 'the average programmer', you must not fall into the trap of rejecting a surgical technique because it is beyond the capabilities of the barber in his shop around the corner.”
E. W. Dijkstra
♦ Seems too complex for “the average programmer”?♦ Perhaps, but consider what Prof Dijkstra had to say…
Just because it is not for the masses, let it not stop you.
Summary
1. Recursion basics
• Introducing the new feature
2. Case studies - mathematical
• String of numbers, factorial, primes and Fibonacci
• It’s not that hard!
3. Case studies - business
• Org chart, Generating test data, Missing data, Rollup, Even allocation, Weighted allocation, RI children, Cheapest fare, Account linking
• A little complex but compact
4. Performance aspects
• Org chart, rollup, allocate
• Compact and a better performer
5. Pitfalls and recommendations
• Powerful but dangerous!
I trust this session has empowered you with the knowledge to exploit Recursive SQL fully. Good Luck!
References
1. Recursive SQL for Dummies – B.L. “Tink” Tysor - IDUG NA 2005 - Session A1
2. Recursive SQL – Unleash the Power! – Suresh Sane - IDUG EU 2007 - Session E5
3. DB2 UDB for z/OS V8 – Everything You Ever Wanted to Know, … and More – SG24-6079
4. DB2 UDB for z/OS Version 8 Performance Topics – SG24-6465
5. Having Fun with Complex SQL – Suresh Sane - IDUG AP 2005 - Session B5
6. Parlez-Vous Klingon – Alexander Kopac - IDUG NA 2007 - Session ALT
7. “Rinse, Lather, Repeat: Utilizing Recursive SQL on DB2 UDB for z/OS” – Daniel L. Luksetich – IDUG Solutions Journal May 2004
Some of the many useful references.
Not really…in the unlikely event that we actually have time ….I doubt it!
Thank you and good luck with Recursive SQL!