Oct 5th 2009 4pm

Platform: z/OS

Kurt Struyf

Competence Partners

Session: E03

Practical SQL performance tuning, for developers and DBA


Agenda

- One SQL, one access path
- Index, stage 1, stage 2
- Sort impact
- SQL examples of suboptimal coding and its improvements
- Access path fields in the plan_table
- Other CPU saving techniques


Static SQL: One SQL = One access path

SELECT ... FROM ...
WHERE NAME BETWEEN :HV1-LOW AND :HV1-HIGH
  AND FIRSTNAME BETWEEN :HV2-LOW AND :HV2-HIGH
  AND BIRTHDATE BETWEEN :HV3-LOW AND :HV3-HIGH
  AND ZIPCODE BETWEEN :HV4-LOW AND :HV4-HIGH

Our table has 4 indexes:

- IX1 on NAME
- IX2 on FIRSTNAME
- IX3 on BIRTHDATE
- IX4 on ZIPCODE

AT BIND TIME: DB2 CHOOSES IX3

AT RUN TIME: User only fills out a value for ZIPCODE

AT RUN TIME: DB2 USES IX3, which doesn't filter anything

ONE SQL = ONE access path

DB2 determines for each SQL statement the best way to resolve the query. The result of this calculation is the access path. For a static SQL statement, the access path is chosen at bind time. As a general rule, once DB2 has chosen it, DB2 won't change this access strategy at execution time, even if another access path would have been better at run time. This simplifies the truth somewhat, but it is accurate in most cases.

The example on the slide illustrates this with a very simple static query. Our table has 4 indexes, and our SELECT has a BETWEEN on each of the 4 indexed columns. If the user fills out nothing for a column, its host variables hold the low value and the high value. If the user provides a value, both host variables for that column hold the provided value.

At bind time, DB2 chooses IX3 as the best possible access path, given the parameters known at that time.

If at execution time our user doesn't fill out a value for COL3 but does provide a value for COL1, DB2 doesn't change its access path to IX1. It uses IX3, which doesn't filter anything.

We'll explain more later; the purpose here is just to show that DB2 chooses one access path and sticks to it. This access path can be cheap or more expensive, but DB2 estimates that, within the parameters known at bind time, it is the cheapest.
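To make the run-time situation concrete, the sketch below writes out what the static statement effectively evaluates when the user supplies only a ZIPCODE. The table name CLIENT, the column types and all literal values are assumptions for illustration (the literals are stand-ins for the low and high values). In the real program the values arrive through the host variables, and DB2 still uses the access path that was bound earlier.

```sql
-- Illustrative only: CLIENT and all literal values are hypothetical.
-- Host variable contents when the user fills out only ZIPCODE = '1000':
--   NAME, FIRSTNAME, BIRTHDATE : low value .. high value (no filtering)
--   ZIPCODE                    : '1000' .. '1000'
SELECT NAME, FIRSTNAME, BIRTHDATE, ZIPCODE
  FROM CLIENT
 WHERE NAME      BETWEEN ' '          AND 'ZZZZZZZZZZ'   -- every name qualifies
   AND FIRSTNAME BETWEEN ' '          AND 'ZZZZZZZZZZ'   -- every first name qualifies
   AND BIRTHDATE BETWEEN '0001-01-01' AND '9999-12-31'   -- every date qualifies
   AND ZIPCODE   BETWEEN '1000'       AND '1000';        -- the only real filter
```

With the access path fixed on the BIRTHDATE index, the matching range covers the whole index, so every entry is visited and the ZIPCODE condition only filters afterwards.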



Matching columns

Matching columns is an indication of how well an index is used:

- more matching columns means better index use
- always start with the first index column
- after an = predicate and after one IN predicate, matching can continue with the next index column

Example: Index on (Name, Clientno, Salarycode)

Predicate                                                         Matching
----------------------------------------------------------------  --------
1. Name = 'Smith' AND Clientno = 20 AND Salarycode = 56
2. Name = 'Smith' OR Clientno > 20 AND Salarycode = 56
3. Name IN ('Smith', 'Doe') AND Clientno > 20 AND Salarycode = 56
4. Name IN ('Smith', 'Doe') AND Salarycode > 0
5. Name <> 'Smith' AND Clientno = 56
6. Clientno = 56 AND Salarycode = 0
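The rules above can be checked against PLAN_TABLE. The sketch below assumes a hypothetical table CLIENT carrying the index from the example, and an existing PLAN_TABLE. The comments give the MATCHCOLS value one would expect from the rules on this slide for predicates 1, 3, 4 and 6. The actual value depends on the DB2 version and the statistics, so treat the comments as expectations to verify, not guaranteed results.

```sql
-- Hypothetical table CLIENT with the index from the example.
CREATE INDEX CLIENT_IX1
    ON CLIENT (NAME, CLIENTNO, SALARYCODE);

-- Predicate 1: equal predicates on all three index columns.
-- Expected MATCHCOLS = 3.
EXPLAIN PLAN SET QUERYNO = 1 FOR
  SELECT NAME FROM CLIENT
   WHERE NAME = 'Smith' AND CLIENTNO = 20 AND SALARYCODE = 56;

-- Predicate 3: one IN, then a range. Matching does not continue past
-- the range column, so expected MATCHCOLS = 2.
EXPLAIN PLAN SET QUERYNO = 3 FOR
  SELECT NAME FROM CLIENT
   WHERE NAME IN ('Smith', 'Doe') AND CLIENTNO > 20 AND SALARYCODE = 56;

-- Predicate 4: IN on NAME, but CLIENTNO is skipped.
-- Expected MATCHCOLS = 1.
EXPLAIN PLAN SET QUERYNO = 4 FOR
  SELECT NAME FROM CLIENT
   WHERE NAME IN ('Smith', 'Doe') AND SALARYCODE > 0;

-- Predicate 6: no predicate on the first index column.
-- Expected MATCHCOLS = 0.
EXPLAIN PLAN SET QUERYNO = 6 FOR
  SELECT NAME FROM CLIENT
   WHERE CLIENTNO = 56 AND SALARYCODE = 0;

-- Compare the expectations with what DB2 actually chose.
SELECT QUERYNO, ACCESSTYPE, MATCHCOLS, ACCESSNAME
  FROM PLAN_TABLE
 WHERE QUERYNO IN (1, 3, 4, 6)
 ORDER BY QUERYNO;
```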


Stage 1: Keep it positive and simple but no index!

= : equal to

> : larger than

>= : larger than or equal to


Stage 2: All the rest!!

- All functions, such as SUBSTR, CONCAT, CHAR()
- Mismatching data types: Colchar_6 = 1234567
- Host variable checking: AND :HV1 = 5
- Decryption
- CURRENT DATE BETWEEN col1 AND col2
- Sorting

[Diagram: DB2 with its RDS (Stage 2) and DM (Stage 1, Index) components]

By definition, all functions require more processing power than the data manager can provide, so they are resolved in stage 2. These functions also include any arithmetic on a column, such as adding to or subtracting from it.

Mismatching data types are a bit more complex. As a general rule of thumb, when the data type of the host variable doesn't match the data type of the column, the predicate is stage 2. This is cutting it a bit short; it is more correct to say that the predicate is stage 2 if the host variable is bigger than the column data type. Many exceptions exist, but it is best to use the correct data type.

Host variable checking is done in stage 2. It should NEVER be done in SQL and should always be done in COBOL.
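A function or arithmetic on a column can often be moved to the other side of the comparison so that the column stands alone. The sketch below shows two such rewrites. The table CLIENT and its columns are hypothetical, and whether a given form is treated as stage 1 depends on the DB2 version, so the outcome is worth confirming with EXPLAIN.

```sql
-- Hypothetical table CLIENT (NAME CHAR(20), SALARY DECIMAL(9,2)).

-- Function on the column: resolved in stage 2.
SELECT NAME
  FROM CLIENT
 WHERE SUBSTR(NAME, 1, 3) = 'SMI';

-- Same rows with the column left untouched: LIKE with a leading constant.
SELECT NAME
  FROM CLIENT
 WHERE NAME LIKE 'SMI%';

-- Arithmetic on the column: resolved in stage 2.
SELECT NAME
  FROM CLIENT
 WHERE SALARY + 100 > :HV;

-- Same condition with the arithmetic moved to the host variable side.
SELECT NAME
  FROM CLIENT
 WHERE SALARY > :HV - 100;
```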


In COBOL checking: "stage 3". NEVER (ab)use this!

All DB2 columns that CAN be checked in SQL SHOULD be checked in SQL.

So BETTER a Stage 2 predicate than NO predicate.

[Diagram: DB2 with its RDS (Stage 2) and DM (Stage 1, Index) components, followed by checking IN COBOL]

Saying that stage 2 predicates are expensive doesn't mean that you shouldn't use them. If the only way to write the predicate on a COLUMN is as a stage 2 predicate, write it as a stage 2 predicate. Do not pass the row on to COBOL and check it there; that is obviously even more expensive. If such a thing as stage 3 existed, this would be it.


Index, Stage1, Stage2

[Diagram: DB2 with its RDS (Stage 2) and DM (Stage 1, Index) components]

This time around you should understand this slide, and know that there are more and less expensive ways of writing a query, depending on where DB2 can resolve its WHERE predicates: how many rows are filtered out as early as possible (in the index), and how many are carried on to stage 1 or even stage 2.


SQL processing

DM (stage 1)

1) matching index predicates (when the index is accessed)
2) other indexable stage 1 predicates (index screening)
3) non-indexable stage 1 predicates on index pages
4) stage 1 predicates on the data
5) rows passed to RDS

RDS (stage 2)

1) stage 2 predicates
2) sort

Selected rows passed to the user

DB2 always resolves its WHERE predicates in the same manner.

First, it resolves the matching index predicates, in the sequence of the index columns.

Second, it resolves all the screening predicates in the index.

Third, it resolves all non-indexable WHERE predicates that are stage 1 and can be resolved on the index pages.

Fourth, it resolves all stage 1 predicates on the data.

Then all stage 2 predicates are resolved, and lastly all returned rows are sorted.


Order of evaluating predicates

Within each of the above non-index steps:

1) all equal predicates
2) all range predicates and col IS NOT NULL
3) all other predicates

Within each of the above sub-steps: the order in which they appear

Within all the non-index steps of the previous slide, the same logic is followed. For example, in step 4 (stage 1 on the data pages), DB2 first resolves all equal predicates, then all range predicates, and then all the rest (e.g. not equal to).

Within each sub-step, the order in the SQL statement is followed. For example, if we have two equal predicates that have to be resolved on the data pages, DB2 takes their physical sequence in the SQL statement to determine the order in which to resolve them.

We'll explain with a little example on the next slide.


Example

SELECT *
FROM MYTABLE
WHERE C1 > ?              1  index
  AND :HV <> 5            6  stage 2
  AND C5 = ?              3  stage 1
  AND C4 = ?              4  stage 1
  AND DATE(C2) < ?        5  stage 2
  AND C3 = ?              2  index
ORDER BY C2               7  stage 2

INDEX (C1, C3)



Select *

SELECT * is almost never to be used. SELECT ONLY the COLUMNS that are needed!

Reason:

- Program maintenance
- CPU cost per extra column
- SORT file becomes bigger
- Maybe not index only


Select *

Even for:  WHERE EXISTS (SELECT * ...)
Better:    WHERE EXISTS (SELECT 1 ...)

           SELECT col5, ... WHERE col5 = 'AB'
Better:    SELECT 'AB' ... WHERE col5 = 'AB'
Best:      SELECT ... WHERE col5 = 'AB'

           SELECT col1, col2 ... ORDER BY col2
Better:    SELECT col1 ... ORDER BY col2 (if col2 is there just for the ORDER BY)
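Written out in full, the three "better" forms on this slide might look as follows. CLIENT, ORDERS, MYTABLE and their columns are hypothetical names chosen for illustration.

```sql
-- Existence test: the select list of the subquery is never returned,
-- so a constant is enough.
SELECT C.CLIENTNO
  FROM CLIENT C
 WHERE EXISTS (SELECT 1
                 FROM ORDERS O
                WHERE O.CLIENTNO = C.CLIENTNO);

-- COL5 is fixed by the predicate, so the program already knows its
-- value and does not need DB2 to return it.
SELECT COL1, COL2
  FROM MYTABLE
 WHERE COL5 = 'AB';

-- COL2 is needed only for sequencing, so it stays out of the select list.
SELECT COL1
  FROM MYTABLE
 ORDER BY COL2;
```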


Other easy improvements

:hv between col1 and col2 col1 >= :hv and col2 0

COL :hv COL in ( , , , , , )

COL not 5


Other easy improvements

SELECT DISTINCT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...

Always results in extra SORT

SELECT COL1, COL2, COUNT(C1)
FROM TABLE
WHERE ...
GROUP BY COL1, COL2

Same results, SORT can be avoided

V9

Before version 9, although the two queries are logically alike, there was a clear difference between them. Using DISTINCT would always result in an extra sort, whereas the second query could avoid the sort with adequate indexing. For instance, an index on (COL1, COL2) would have avoided a sort in the second query.

Since version 9, a query with the DISTINCT clause can also avoid the extra sort.

Another important change is that, since V9, an index on (COL2, COL1) can also be used to avoid an extra sort. That of course means that the sequence of your result set could change, so an ORDER BY clause should be included if you want to guarantee the V8 sequence.
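A minimal sketch of the two forms, and of the ORDER BY that pins the sequence. MYTABLE, COL1, COL2 and the index name are hypothetical. Whether the sort is actually avoided depends on the DB2 version and the available indexes, which PLAN_TABLE will show.

```sql
-- Hypothetical index that can deliver the rows already grouped.
CREATE INDEX MYTABLE_IX1
    ON MYTABLE (COL1, COL2);

-- Form 1: duplicate removal with DISTINCT.
SELECT DISTINCT COL1, COL2
  FROM MYTABLE;

-- Form 2: the same result set via GROUP BY.
SELECT COL1, COL2
  FROM MYTABLE
 GROUP BY COL1, COL2;

-- If the program relies on the sequence of the rows, state it explicitly
-- instead of depending on the index DB2 happens to use.
SELECT COL1, COL2
  FROM MYTABLE
 GROUP BY COL1, COL2
 ORDER BY COL1, COL2;
```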


More easy improvements

Col1 = 'A' OR Col1 = 'B'  ->  Col1 IN ('A', 'B')

Col1>= :hv1 and COL1= :hv1 AND

Col1 = :hv1 or (col1=:hv1 or

Col1 >:hv1 and Col2 = :hv2 col1>:hv1 and col2 =:hv2)

Col1 = :hva (always 5)  ->  Col1 = 5

:hv = 5  ->  IN COBOL !!!


Even more easy improvements

Col1 NOT BETWEEN 10 AND 50  ->  col1 < 10
                                union all
                                col1 > 50

Existence checking  ->  select 1
                        from table
                        where col1 = :hv
                        fetch first 1 row only

Col1 NOT IN ('A', 'B', 'C')  ->  if possible, Col1 IN (the rest);
                                 will be cheaper even when the list is bigger
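Written out in full, the first two rewrites might look like this. MYTABLE and its columns are hypothetical. The UNION ALL form returns the same rows as NOT BETWEEN because the two ranges cannot overlap, so no duplicate removal is needed. As with NOT BETWEEN, rows where COL1 is NULL are not returned.

```sql
-- NOT BETWEEN 10 AND 50 replaced by two positive range predicates.
SELECT COL1, COL2
  FROM MYTABLE
 WHERE COL1 < 10
UNION ALL
SELECT COL1, COL2
  FROM MYTABLE
 WHERE COL1 > 50;

-- Existence check: stop after the first qualifying row.
SELECT 1
  FROM MYTABLE
 WHERE COL1 = :HV
 FETCH FIRST 1 ROW ONLY;
```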



Determine Access Path

- Optimization Service Center: newest generation of Visual Explain
- Plan_table: see next slide; might require some exercise; not everything is in it
- DSN_STATEMNT_TABLE: contains the cost columns


DB2 Plan_table

SELECT QBLOCKNO, PROGNAME, PLANNO, METHOD,
       TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY, PREFETCH
FROM PLAN_TABLE
WHERE QUERYNO = 30303
ORDER BY QBLOCKNO, PLANNO ;

QBLOCKNO  PROGNAME  PLANNO  METHOD  TNAME    ACCESSTYPE  MATCHCOLS  ACCESSNAME  INDEXONLY
1         DSNESM68  1       0       AATEHA1  I           2          AAX0EHA1    N
1         DSNESM68  2       1       AATEHB1  I           2          AAX0EHB1    N
1         DSNESM68  3       3                            0                      N

Qblockno: indicates the number of blocks necessary to resolve the query. General rule: more blocks = less performing.

Progname: represents the program/package name.
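The rows shown above are written when a statement is explained. As a sketch, a single statement can be explained from SPUFI or a similar tool as below. The query reuses MYTABLE and its columns from the earlier predicate example purely for illustration, and a PLAN_TABLE is assumed to exist under the current SQLID. For static SQL in a program, binding with EXPLAIN(YES) fills the table in the same way.

```sql
-- Explain one statement and tag its rows with QUERYNO 30303.
EXPLAIN PLAN SET QUERYNO = 30303 FOR
  SELECT C1, C3
    FROM MYTABLE
   WHERE C1 > ?
     AND C3 = ?
   ORDER BY C2;

-- Read back the access path, one row per step.
SELECT QBLOCKNO, PLANNO, METHOD, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, INDEXONLY
  FROM PLAN_TABLE
 WHERE QUERYNO = 30303
 ORDER BY QBLOCKNO, PLANNO;
```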


Access path: planno, method

Planno: the number of steps AND the sequence in which a query is resolved. General rule: more steps = less performing.

Method: expresses what kind of access is done

- 0 : First access
- 1 : Nested Loop Join
- 3 : extra sort needed

Tname: table name to be accessed

Access type: how that data is accessed


DSN_STATEMNT_TABLE

Amongst others:

COST_CATEGORY:

A: Indicates that DB2 had enough information to make a cost estimate without using default values.

B: Indicates that some condition exists for which DB2 was forced to use default values.

PROCMS: The estimated processor cost, in milliseconds, for the SQL statement

PROCSU: The estimated processor cost, in service units, for the SQL statement
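A sketch of how these columns could be read for the statement with QUERYNO 30303 from the Plan_table slide. In DB2 the table is named DSN_STATEMNT_TABLE; it is assumed here to exist under the same authorization ID as the PLAN_TABLE.

```sql
-- Cost estimate for one explained statement.
-- REASON gives the cause when COST_CATEGORY is 'B'.
SELECT QUERYNO, PROGNAME, STMT_TYPE,
       COST_CATEGORY, PROCMS, PROCSU, REASON
  FROM DSN_STATEMNT_TABLE
 WHERE QUERYNO = 30303;
```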


DSN_PREDICAT_TABLE

- Contains all predicates and how they are used
- Extremely useful for index design
- Replaces the old spreadsheet technique


Access Path Follow Up

[Diagram with the labels: Identify every query using QUERYNO; New binds; Specific plan_tables; General plan_tables; Insert; Changes; Transfer; LAN; Excel; Email]

It is also best to set up an automated way of following up on your access path changes and notifying your DBA and the responsible developers.
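One possible building block for such a follow-up, sketched under assumptions: OLD_PLAN and NEW_PLAN are hypothetical tables with the PLAN_TABLE layout, holding the rows from before and after a rebind. The query lists the steps whose access path changed. Statements that gained or lost steps would need an additional outer-join check.

```sql
-- Steps that exist in both snapshots but are resolved differently.
SELECT N.PROGNAME, N.QUERYNO, N.QBLOCKNO, N.PLANNO,
       O.ACCESSTYPE AS OLD_ACCESSTYPE, N.ACCESSTYPE AS NEW_ACCESSTYPE,
       O.ACCESSNAME AS OLD_ACCESSNAME, N.ACCESSNAME AS NEW_ACCESSNAME,
       O.MATCHCOLS  AS OLD_MATCHCOLS,  N.MATCHCOLS  AS NEW_MATCHCOLS
  FROM NEW_PLAN N
  JOIN OLD_PLAN O
    ON O.PROGNAME = N.PROGNAME
   AND O.QUERYNO  = N.QUERYNO
   AND O.QBLOCKNO = N.QBLOCKNO
   AND O.PLANNO   = N.PLANNO
 WHERE O.METHOD     <> N.METHOD
    OR O.ACCESSTYPE <> N.ACCESSTYPE
    OR O.ACCESSNAME <> N.ACCESSNAME
    OR O.MATCHCOLS  <> N.MATCHCOLS
 ORDER BY N.PROGNAME, N.QUERYNO, N.QBLOCKNO, N.PLANNO;
```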



Multi Row fetch

- Technique to save up to 60% of DB2 CPU
- Easy to use
- Fetches a rowset into an array
- Program can control the size of the rowset

!! due to compiler limits !!

- elementary item: max. 16 MB
- complete working storage: max. 128 MB


Multi Row Fetch

To be able to use this, the cursor should be DECLAREd for rowset positioning, for example:

    EXEC SQL
      DECLARE cursor-name CURSOR
        WITH ROWSET POSITIONING FOR
        SELECT column1, column2
        FROM table-name;
    END-EXEC

instead of

    EXEC SQL
      DECLARE cursor-name CURSOR FOR
        SELECT column1, column2
        FROM table-name;
    END-EXEC

Then you can FETCH multiple rows at a time from the cursor.


Multi Row Fetch

On the FETCH statement the amount of rows requested can be specified, for example:

    EXEC SQL
      FETCH NEXT ROWSET FROM cursor-name
        FOR :rowset-size ROWS
        INTO ...
    END-EXEC

instead of

    EXEC SQL
      FETCH cursor-name
        INTO ...
    END-EXEC

The rowset size can be defined as a constant or a variable, for example:

    01 rowset-size PIC S9(09) COMP-5.


Multi Row fetch

- Do not use single and multiple row fetch for the same cursor in one program
- Be aware of compiler limits: elementary item: max. 16 MB
- Last FETCH on a rowset can be incomplete
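Putting the pieces together, the SQL side of a rowset loop might look like the sketch below. The cursor C1, the table CLIENT and the host variable arrays :NAME-ARR and :ZIP-ARR are hypothetical. In COBOL each statement would be wrapped in EXEC SQL ... END-EXEC, and the arrays would be declared with an OCCURS clause at least as large as the rowset size. The comments describe how the incomplete last rowset is normally detected.

```sql
-- Cursor declared for rowset positioning.
DECLARE C1 CURSOR WITH ROWSET POSITIONING FOR
  SELECT NAME, ZIPCODE
    FROM CLIENT;

OPEN C1;

-- Repeat this FETCH until SQLCODE = +100.
-- After each FETCH, SQLERRD(3) in the SQLCA holds the number of rows
-- actually returned. The last rowset can come back with SQLCODE +100
-- and fewer rows than requested, so process SQLERRD(3) rows before
-- leaving the loop.
FETCH NEXT ROWSET FROM C1
  FOR :ROWSET-SIZE ROWS
  INTO :NAME-ARR, :ZIP-ARR;

CLOSE C1;
```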



Multi Row Fetch

Performance results may differ, depending on the amount of columns and their data type, but mainly:

- < 5 rows: poor performance (worse than before)
- 10 - 100 rows: best performance
- > 100 rows: no improvement anymore (same as 10 - 100 rows)

The following data is based upon treatment of 1 million rows (in seconds CPU).

                             Via row   Via rowset   Gain on DB2 in CPU seconds
FETCH                        16        6            10 (-60%)
FETCH + UPDATE via row       76        66           10 (-15%)
FETCH + UPDATE via rowset    76        60           16 (-35%)

Gain: 10 seconds of CPU per one million rows when using rowset pointers.


Sequences

Easy, fast and cheap way to generate unique numbers if:

- Holes are allowed
- The order isn't important

Use: NEXT VALUE FOR yy.xxxxxxxx statement

BASIC SYNTAX:

    CREATE SEQUENCE yy.xxxxxxxx
      START WITH 1
      INCREMENT BY 1
      NO MINVALUE
      NO MAXVALUE
      NO CYCLE
      CACHE 200;
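As a usage sketch, the sequence can be referenced directly in an INSERT. MYSCHEMA.ORDER_SEQ stands in for yy.xxxxxxxx, and the ORDERS table, its columns and the host variables are hypothetical.

```sql
-- Take the next number from the sequence while inserting the row.
INSERT INTO ORDERS (ORDER_ID, CLIENTNO)
  VALUES (NEXT VALUE FOR MYSCHEMA.ORDER_SEQ, :CLIENTNO);

-- Retrieve the value just generated in this session, for example to
-- insert dependent rows.
SELECT PREVIOUS VALUE FOR MYSCHEMA.ORDER_SEQ
  INTO :ORDER-ID
  FROM SYSIBM.SYSDUMMY1;
```

A value that has been handed out is not given back when the unit of work rolls back, which is one reason the technique only fits when holes are allowed.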


Sequences

[Chart: Effect of concurrency on elapsed time. X-axis: amount of jobs (1 to 3); y-axis: duration; series: own table, seq object]

[Chart: Effect on CPU usage. X-axis: amount of jobs (1 to 3); y-axis: CPU; series: own table, seq object]

- Better response times
- Less CPU need



Questions ?
