indexing strategies for db2 udb for iseries

IBM Global Services - IBM eServer iSeries

© 2005 IBM Corporation

Indexing Strategies for DB2 UDB for iSeries

Shantan [email protected]

IBM eServer iSeries


Agenda

• Types of indexes• How are indexes used• General approach to creating indexes• The “perfect” index• Examples

The material in this presentation was taken from the white paper: Indexing and Statistics Strategies for DB2 UDB for the iSeries.• ibm.com/servers/enable/site/education/abstracts/indxng_abs.html

IBM eServer iSeries


How are Indexes used &

Why are they important

IBM eServer iSeries


Binary Radix Index

• Key values are compressed– Common patterns are stored once– Unique portion stored in “leaf” pages– Positive impact on size and depth of the index tree

• Algorithm used to find values– Binary search

• Very efficient process to find a unique value

– Modified to fit the data structure

• Maintenance– Index data is automatically spread across all available disk units– Tree is automatically rebalanced to maintain an efficient structure

IBM eServer iSeries


Binary Radix Index

Database Table

001 ARKANSAS

002 MISSIPPI

003 MISSOURI

004 IOWA

005 ARIZONA

… …

ADVANTAGES: Quick access to a single key

value (million-entry index, on average, only 20 tests)

Also efficient for small, selected range of key values (low cardinality)

DISADVANTAGES: Table rows retrieved in order of key values

(not physical order) which equates to many random I/O’s when selecting a large number of keys (high cardinality)

No way to predict which physical index pages are next when traversing the index for large number of key values

ROOTROOT

Test Node

Test Node

MISSMISS

ISSIPPI002

ISSIPPI002

OURI003

OURI003

IOWA004

IOWA004IZONA

005

IZONA005

KANSAS001

KANSAS001

ARAR

IBM eServer iSeries


Encoded Vector Index (EVI)

• New index object for delivering fast data access in decision support and query reporting environments

– Complementary alternative to existing index object (binary radix tree structure – keyed logical file or SQL index)

– Advanced technology from IBM Research, that is variation on bitmap indexing– Easy to access data statistics improve query optimizer decision making

• Can only be created through an SQL interface– CREATE ENCODED VECTOR INDEX Library/EVI_Name on Library/Table_Name (Column) WITH n

DISTINCT VALUES

IBM eServer iSeries


Encoded Vector Index (EVI)

Vector

Row Number Code

1 1

2 17

3 18

4 9

5 2

6 7

7 38

8 38

9 1

Symbol Table

Key ValueCode First Row Last Row Count

Arizona 1 1 80005 5000

Arkansas 2 5 99760 7300

…

Virginia 37 1222 30111 340

Wyoming 38 7 83000 2760

What is it?: New type of index *FILE object type, LF attribute Composed of two parts

Symbol table contains information for each distinct key value. Each key value is assigned a unique code

Code is 1, 2, or 4 bytes depending on number of distinct key values Rather then a bit array for each distinct key value, the index has one array of codes

(The Vector)

IBM eServer iSeries


Index Selection

• Selection criteria is applied to ranges of index entries to quickly get a subset of rows before the table is retrieved.

– Advantages:• Only those index entries that are within a selected range are processed• Provides quick access to rows in an OLTP environment

– Potential Disadvantages:• Can perform poorly when a large number of rows are selected

– Requires a separate Random I/O against the table to extract the values

– Rule of Thumb:• Used when only asking for or expecting a few rows returned from the index• Used when sequencing the rows is required for ordering or grouping• The selection columns match the first (n) key fields of the index

IBM eServer iSeries


General Approach to Creating Indexes

IBM eServer iSeries


Proactive Query Tuning

• Remember the goal of creating indexes is to give the optimizer the statistics and implementation choices it needs while it is choosing an access plan for the query.

– Requires an understanding of the database model and types of queries that will be run against it– Build indexes for the largest or most commonly used queries– For ad-hoc (OLAP) or less frequently used queries build single key EVIs over the local selection

columns used in the queries– Make sure that statistics exist for the most and least selective columns for the query

• This may mean creating an index that will never be used to implement the query but only to provide the correct statistics

– Customize this approach to your own environment and query needs

IBM eServer iSeries


Reactive Query Tuning

• Reactive query tuning really means, develop the application and any initial indexes and then run the application to see what gets used or created by the optimizer.

– Usually highlights the slower running queries, even on a subset of the entire database records (test database)– Useful for tuning existing applications that are not performing as expected– Use the feedback from the optimizer to discover:

• Any indexes or statistics the optimizer recommends for local selection• Any temporary indexes used for the query• The implementation method(s) that the optimizer has chosen to run the queries

– Use the index advisor to help guide you as to what local selection columns may provide the best index coverage– Create permanent indexes over the same columns that any temporary indexes were created upon. Try to eliminate

the temporary index builds• This also applies to temporary hash tables built over the entire table with no selection applied

IBM eServer iSeries


Other Indexing Tips

• Avoid null capable columns if expecting to use index only access. Index only access is not available when a key column in the index is null capable

• Avoid derived expressions in local selection. Access via an index may not be used for predicates that have derived values.– T1.ShipDate > (CURRENT DATE – 10 DAYS)– UPPER(T1.CustomerName) = “SMITH”

• Index access is not used for predicates where both operands come from the same table– T1.ShipDate > T1.OrderDate

• Consider index only access if all of the columns used in the query are represented in the index as key columns• Use the most selective columns as keys in the index

– Preference should be given to columns used in equal comparisons• For key columns that are unique, specify the UNIQUE keyword when creating the index

IBM eServer iSeries


The “Perfect” Index

IBM eServer iSeries


Perfect Index Guidelines• Order of the columns in an index is very important. Optimizer may not use an index if the columns

are in an incorrect order. Use the following guideline:– Equal predicates first. Predicates using the “=“ operator generally eliminate the largest number of non-

participating rows and should therefore be first in the index– If all of the predicates have an equal operator, then order the columns as follows:

• Selection predicates + join predicates• Join predicates + selection predicates• Selection predicates + group by columns• Selection predicates + order by columns

– Always place the most selective columns as the first key in the index– Create perfect indexes ahead of time for pre-determined queries or queries that produce a standard report– Indexes will take up system resources, find a balance between query performance and system (index)

maintenance– A binary radix index is the fastest data access method available for a query that is highly selective and

returns a small number of rows

IBM eServer iSeries


Using a Query Graph

• Queries can also be represented as a graph to help visualize what columns should be considered for index creation:

– Separate all of the tables and major functions in the query into different nodes of the graph• Create a different node for each table, grouping, ordering or join requirement

– Push all of the selection to the lowest level possible in the graph– Process the columns starting at the bottom of the graph to determine which ones should be included

into any indexes– Use the Perfect Index Guidelines to determine what should be included into the index

• Columns at the top of the graph may be better suited for a column stat rather than a permanent index.

IBM eServer iSeries


Examples

IBM eServer iSeries


One-Table Query

SELECT Customer, Customer_Number, Item_NumberFROM ItemsWHERE Year = 2000

AND Quarter = 4AND ReturnFlag = ‘R’AND ShipMode = ‘AIR’

ORDER BY Customer_Number, Item_Number

CREATE INDEX Perfect_Index ON Items (Year, Quarter, ShipMode, ReturnFlag, Customer_Number, Item_Number)

Place the most selective selection predicates first in this index based upon the database model and other queries use of the same columns.

Query has four local selection predicates and two ORDER BY columns. Follow the general guidelines of Selection predicates + order by columns:

IBM eServer iSeries


One-Table Query Graph

Always start looking at the bottom of the graph for the columns to place into the index first. Then work your way up the graph looking for additional columns.

WHERE Year = 2000AND Quarter = 4 AND ReturnFlag = ‘R’ AND ShipMode = ‘AIR’

Final Select

Table (Items)

Customer_Number, Item_NumberORDER BY

IBM eServer iSeries


Three-Table Join Query

SELECT T3.Year, T1.Customer_Name, SUM(T2.Revenue_WO_Tax)FROM CustDim T1, SalesFact T2, TimeDim T3WHERE T2.CustKey = T1.CustKey

AND T2.TimeKey = T3.TimeKeyAND T3.Year IN (2000, 2001)AND T3.Quarter = 1AND T1.Continent = ‘America’AND T1.Country = ‘United States’AND T1.Region = ‘Central’AND T1.Territory = ‘Five’

GROUP BY T3.Year, T1.Customer_NameORDER BY T1.Customer_Name, T3.Year

Query has two join predicates and six selection predicates. Focus first on the selection predicates for each table in the query.

IBM eServer iSeries


Three-Table Join Query Graph

WHERE T1.Continent = ‘America’AND T1.Country = ‘United States’ AND T1.Region = ‘Central’AND T1.Territory = ‘Five’

Final Select

Table (CustDim)

T1.Customer_Name, T3.Year

Table (SalesFact)WHERE T3.Year IN (2000, 2001)

AND T3.Quarter = 1

T2.CustKey = T1.CustKey

T2.TimeKey = T3.TimeKey

T3.Year, T1.Customer_Name

ORDER BYGROUP BY

Table (TimeDim)

The columns at the top of the graph may be better suited for a column stat rather than an index.

IBM eServer iSeries






CREATE INDEX Perfect_TimeDim_Index ON TimeDim (Year, Quarter, TimeKey)

The TimeDim table has two equal selection predicates and one join predicate. Use the general guideline for Selection predicates + join predicates:

IBM eServer iSeries






CREATE INDEX Perfect_CustDim_Index ON CustDim (Continent, Country, Region, Territory, CustKey)

The CustDim table has four equal selection predicates and one join predicate. Again use the general guideline for Selection predicates + join predicates:

IBM eServer iSeries






CREATE INDEX Perfect_SalesFact_Index1 ON SalesFact (CustKey)CREATE INDEX Perfect_SalesFact_Index2 ON SalesFact (TimeKey)

The SalesFact table only has two join predicates. Since we don’t know the order in which the tables will be joined we must provide the optimizer with the flexibility of having indexes with both combinations of the join predicates.

IBM eServer iSeries


Non-Equal Predicates


AND T2.TimeKey = T3.TimeKeyAND T3.Year >= 2000AND T3.Quarter = 1AND T1.Continent = ‘America’AND T1.Country = ‘United States’AND T1.Region = ‘Central’AND T1.Territory = ‘Five’


CREATE INDEX Perfect_TimeDim_Index ON TimeDim (Quarter, TimeKey, Year)

Inequalities tend to return more rows because they deal with a range of values rather then a specific value for an equal operator. Thus they should be placed at the end of the index.

IBM eServer iSeries


Non-Equal Predicates Graph

WHERE T1.Continent = ‘America’AND T1.Country = ‘United States’ AND T1.Region = ‘Central’AND T1.Territory = ‘Five’

Final Select

Table (CustDim)

T1.Customer_Name, T3.Year

Table (SalesFact)WHERE T3.Year >= 2000

AND T3.Quarter = 1

T2.CustKey = T1.CustKey

T2.TimeKey = T3.TimeKey

T3.Year, T1.Customer_Name

ORDER BYGROUP BY

Table (TimeDim)

The non-equal predicate can be added as the last key in the index, or a column stat could be created.

IBM eServer iSeries


EVI One-Table Query


AND Quarter = 4AND ReturnFlag = ‘R’AND ShipMode = ‘AIR’

ORDER BY Customer_Number, Item_Number

CREATE ENCODED VECTOR INDEX Perfect_EVI1 ON Items (Year)CREATE ENCODED VECTOR INDEX Perfect_EVI2 ON Items (Quarter)CREATE ENCODED VECTOR INDEX Perfect_EVI3 ON Items (ReturnFlag)CREATE ENCODED VECTOR INDEX Perfect_EVI4 ON Items (ShipMode)

Dynamic bitmaps will be created from the EVI indexes and the results will be ANDed together to satisfy the query request.

When a query is not very selective (20% - 70%) then skip sequential is usually the best method. EVI’s can be scanned more efficiently. EVI’s can be used for selection in these queries, however they cannot not be used for ordering, grouping or joins.

IBM eServer iSeries


EVI One-Table Query


AND Month IN (1, 2, 3)AND ReturnFlag = ‘R’AND ShipMode = ‘RAIL’

CREATE ENCODED VECTOR INDEX Perfect_EVI5 ON Items (Month)

Either a new index over the Month column could be created or we can leave that selection to be processed as post dynamic bitmap selection.

In this case, the EVI indexes we already created on Year, ReturnFlag and ShipMode can be reused for this new query.

IBM eServer iSeries


Indexing Strategy Summary

• Indexes don’t have to be used by the optimizer to be considered helpful• Proactively create indexes that you know will be useful based upon the database model• Use tools to help you reactively create indexes that are still required• Remember, the perfect index can be different for every single query, try to find the right

blend of indexes for your environment• Read the white paper – “Indexing and Statistics Strategies for DB2 UDB for the iSeries”

– ibm.com/servers/enable/site/education/abstracts/indxng_abs.html

IBM eServer iSeries


Additional Information

• DB2 UDB for iSeries home page – http://www.iseries.ibm.com/db2• Newsgroups

– USENET: comp.sys.ibm.as400.misc, comp.database.ibm-db2– iSeries Network (NEWS/400 Magazine) SQL & DB2 Forum – http://www.iseriesnetwork.com/Forums/main.cfm?CFApp=59

• Education Resources – Classroom & Online– http://www.iseries.ibm.com/db2/db2educ_m.htm – http://www.iseries.ibm.com/developer/education/ibo/index.html

• DB2 UDB for iSeries Publications– Online Manuals: http://www.iseries.ibm.com/db2/books.htm – Porting Help: http://www.iseries.ibm.com/developer/db2/porting.html – DB2 UDB for iSeries Redbooks (http://ibm.com/redbooks )

• Stored Procedures & Triggers on DB2 UDB for iSeries (SG24-6503)• DB2 UDB for AS/400 Object Relational Support (SG24-5409)• SQL Query Engine

– (http://publib-b.boulder.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg2456598.html)

– SQL/400 Developer’s Guide by Paul Conte & Mike Cravitz• http://iseriesnetwork.com/str/books/Uniquebook2.cfm?NextBook=183

– iSeries and AS/400 SQL at Work by Howard Arner• http://www.sqlthing.com/books.htm

• Please send questions or comments to [email protected]

indexing strategies for db2 udb for iseries

Documents

entry index

index advisor

binary radix index key

index tree algorithm

new type of index

physical index pages

ranges of index entries

best index coverage