Importance of Database Design
-
5/26/2018 Importance of Database Design
1/16
Global Business Services
2007 IBM Corporation
Importance of database design
Task management system for an insurance company as an example
Anne Lesell
Juha Puroranta
-
Agenda
Requirements & challenges
Logical data model and problems related to it
Resolution and the physical data model
Lessons learned
Recommendations
-
Requirements
Web-based task management system for an insurance company
~3000 employees using the application daily
~1.2 million tasks inserted into the system per year
Avg. 8 req/sec at peak hours
Avg. 2 req/sec during normal load
Max. response time for a query: 1 second
Max. time for entry and modification of a task: 0.5 seconds
Dynamic SQL is used to allow various search criteria
Application: JSF-based web application on WAS v6
Database: DB2 for z/OS
-
Challenges
Three major functions with different search criteria, so the same indexes cannot be used for all of them:
tasks of an individual employee
tasks related to an individual customer
tasks of an individual business unit
Several different views of the data in each function, so even more indexes are required
Company conventions (e.g. dynamic SQL forbidden, variable-length and nullable fields not allowed) had to be overcome by reasoning and explaining the benefits
-
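Logical data model
Entities and their key columns:
TEHTÄVÄ (task): TEHTAVANO
KÄSITTELYTIETO (processing information): TEHTAVANO, LISAYSAIKA
SIIRTOVIESTI (transfer message): TEHTAVANO
MUIDEN_HOITAMAT (handled by others): OMISTAJA, TEHTAVANO, TAPAHTUMA_AIKA
SIIRTO (transfer): TEHTAVANO
HISTORIA (history): TEHTAVANO, AIKA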
-
Problems with the logical data model
Too many indexes
Some of the indexes were too big
The row size of the tehtävä table was large: only 4 rows would have fit into one page, so full table scans would have taken a long time
Data is retrieved in chunks (28 rows per query = 2 web pages)
Calculating how many rows fulfill the search criteria would have required a separate SQL query, which in some cases could not have been answered from indexes alone
Response times would have been too slow
-
Resolution
Splitting the tehtävä table of the logical data model into three tables in the physical data model:
The search criteria for unfinished tasks are different from those for finished tasks
Data of finished tasks does not change in the database, which allowed optimization of the data (clustering index)
The description [varchar(1000)] of the task is needed only on one page, while the other data of the table is needed more frequently
Not all data used in the software is stored as-is in the database; some is derived from other information instead, e.g. the status of a task:
open task = row in the tekem table & owner id = null
task in progress = row in the tekem table & owner id != null
completed task = row in the tehdyt table
Negotiating a reduction of functional requirements:
limiting search criteria (nice-to-have criteria were removed)
calculating sums on only certain pages
Negotiating non-functional requirements:
allowing longer response times for reports provided to managers of business units
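The derivation of the task status described above could be sketched roughly as follows. This is only an illustration: the function name and the English status labels are assumptions; only the table names tekem and tehdyt and the owner id rule come from the slide.

```python
from typing import Optional


def task_status(table: str, owner_id: Optional[str]) -> str:
    """Derive the status of a task from where its row lives and who owns it.

    The status is never stored; it follows from the table name and owner id.
    The labels "open", "in progress" and "completed" are illustrative.
    """
    if table == "tehdyt":
        # Rows in the table of finished tasks are always completed.
        return "completed"
    if table == "tekem":
        # Unfinished tasks: no owner means open, an owner means in progress.
        return "open" if owner_id is None else "in progress"
    raise ValueError(f"unexpected table: {table!r}")


statuses = [
    task_status("tekem", None),
    task_status("tekem", "E123"),   # "E123" is a made-up owner id
    task_status("tehdyt", "E123"),
]
```

Deriving the status this way means no status column has to be updated or indexed when a task changes hands or is completed.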
-
Logical vs. physical data model
Logical data model (entity: key columns):
TEHTÄVÄ: TEHTAVANO
KÄSITTELYTIETO: TEHTAVANO, LISAYSAIKA
SIIRTOVIESTI: TEHTAVANO
MUIDEN_HOITAMAT: OMISTAJA, TEHTAVANO, TAPAHTUMA_AIKA
SIIRTO: TEHTAVANO
HISTORIA: TEHTAVANO, AIKA
Physical data model (table: key columns):
TEKEM: TEHTUN
KASTIETO: TEHTUN, LAIKA
SVIESTI: TEHTUN
MUIDHOI: TEHTUN, OMID
SIIRTO: TEHTUN
HISTORIA: HAIKAM
TEHDYT: TEHTUN
TKUVAUS: kuvausid
-
How indexes are identified
Rough index design principles
Index for primary key and foreign keys
Cluster order such that processing big result sets will use sequential I/O, not random
Aiming for three-star indexes:
* optimal matching columns (indexable predicates (z/OS) / range delimiting (LUW))
* avoid sorting
* no table access, index only
Column order in index:
Start the index with columns in equal predicates and IS NULL predicates, high-cardinality columns first (indexable / range delimiting + boolean)
Add the column in the most selective range predicate (indexable, stage-1 / range delimiting, index-sargable, boolean term) for index screening
Add the remaining columns, so that ORDER BY / GROUP BY will not result in a sort
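The column-order rules above can be sketched as a small helper. This is a simplified illustration, not the procedure actually used in the project: the Predicate type, the cardinalities and the filter factors are hypothetical, and IN-list predicates are not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Predicate:
    column: str
    kind: str             # "equal", "is_null" or "range"
    cardinality: int      # distinct values in the column (assumed known)
    filter_factor: float  # fraction of rows that qualify; smaller = more selective


def index_column_order(predicates: Iterable[Predicate],
                       order_by: Iterable[str]) -> List[str]:
    preds = list(predicates)
    # 1. Equal and IS NULL predicates first, high-cardinality columns first.
    equal = sorted((p for p in preds if p.kind in ("equal", "is_null")),
                   key=lambda p: -p.cardinality)
    # 2. Then the column of the most selective range predicate.
    ranges = sorted((p for p in preds if p.kind == "range"),
                    key=lambda p: p.filter_factor)
    columns = [p.column for p in equal]
    if ranges:
        columns.append(ranges[0].column)
    # 3. Then the remaining columns: ORDER BY columns, then other range columns.
    for col in list(order_by) + [p.column for p in ranges[1:]]:
        if col not in columns:
            columns.append(col)
    return columns


# Hypothetical statistics for columns of the tekem table.
example = index_column_order(
    [
        Predicate("tehtyyppi", "equal", cardinality=20, filter_factor=0.05),
        Predicate("omid", "equal", cardinality=3000, filter_factor=0.0003),
        Predicate("tehtun", "range", cardinality=1_000_000, filter_factor=0.5),
        Predicate("apvm", "range", cardinality=365, filter_factor=0.1),
    ],
    order_by=["tehtun"],
)
print(example)
```

A real design also has to weigh how many indexes the table can afford, which this sketch ignores.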
-
How response times are calculated
VQUBE for DB2 for z/OS (very quick upper bound estimation)
The formula depends on the version of DB2, the computer model and the disk workload
We estimate the worst case, and the average only if it is meaningful
The aim is to avoid negative surprises in response times in real production
The formula applies only to I/O-bound SQL; CPU-bound queries need a different formula
The formula can also be used for DB2 for LUW, where it is often too pessimistic
VQUBE for DB2 for z/OS on a z990 with a processor of more than 400 MIPS:
LocalResponseTime = #TR x 10 ms + #TS x 0.2 ms + #sortr x 0.002 ms
where #TR is the number of random touches, #TS the number of sequential touches and #sortr the number of sorted rows
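As a rough illustration, the formula on this slide can be written as a small function. The function and parameter names are assumptions; the default coefficients are the ones stated on this slide, and the input counts in the example call are made up.

```python
def local_response_time_ms(random_touches: int,
                           sequential_touches: int,
                           sorted_rows: int,
                           ms_per_random: float = 10.0,
                           ms_per_sequential: float = 0.2,
                           ms_per_sorted_row: float = 0.002) -> float:
    """Upper-bound estimate of the local response time in milliseconds.

    Coefficients are parameters because, as the slide notes, they depend on
    the DB2 version, the hardware and the disk workload.
    """
    return (random_touches * ms_per_random
            + sequential_touches * ms_per_sequential
            + sorted_rows * ms_per_sorted_row)


# Hypothetical query: 10 random touches, 1000 sequential touches, 500 sorted rows.
estimate = local_response_time_ms(10, 1000, 500)
print(f"{estimate:.1f} ms")
```

Because a random touch costs 10 ms against a fraction of a millisecond for the other terms, the random touches usually dominate the estimate.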
-
Example of how the calculation is done
============================================
30 Task searches - own tasks (in progress)
============================================
SELECT t.tmvipu, t.ktmvipu, t.onvipu, t.tehtun, t.tehtyyppi,
t.kampno, t.tehot, t.asno, t.versio, t.tpvm, t.saika,
t.lapvm, t.tprty, t.omid, t.omorgy, t.kasryh, t.aikam
FROM tekem t
WHERE t.omid = ?
AND t.tehtyyppi IN (?,?,...)
AND t.tehtun > ?
AND t.apvm , apvm
Tasks in progress for one employee = 200
tehtun > = 200
       TR   TS   sort
t3      1    3
T     200
LRT = 201 TR x 10 ms + 3 TS x 0.1 ms + sortr x 0.01 ms ≈ 2010 ms
If only the first 50 rows are fetched, which is possible because there is no sort:
       TR   TS   sort
t3      1    3
T      50
LRT = 51 TR x 10 ms + 3 TS x 0.1 ms + sortr x 0.01 ms ≈ 510 ms
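The two estimates on this slide can be reproduced with a few lines of arithmetic. The sketch uses the coefficients that appear in this slide's own calculation (10 ms, 0.1 ms, 0.01 ms) and assumes that the first row of each table is the index (1 random and 3 sequential touches), the second row is the table (one random touch per fetched row), and nothing is sorted. The function name is hypothetical.

```python
def lrt_ms(random_touches: int, sequential_touches: int, sorted_rows: int = 0) -> float:
    """Local response time with the coefficients used in this slide's example."""
    return random_touches * 10 + sequential_touches * 0.1 + sorted_rows * 0.01


def estimate_for(rows_fetched: int) -> float:
    index_random, index_sequential = 1, 3   # touches on t3 in the slide's table
    table_random = rows_fetched             # one random touch on T per row
    return lrt_ms(index_random + table_random, index_sequential)


all_rows = estimate_for(200)   # every task in progress for one employee
first_50 = estimate_for(50)    # only the first 50 rows

print(round(all_rows), "ms")
print(round(first_50), "ms")
```

Both results are dominated by the random touches on the table, which is why limiting the number of rows fetched cuts the estimate almost proportionally.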
-
Lessons learned
Involve a database specialist in the project during the analysis and design phase (include this already in the project plan).
At the testing phase it might be too late, and fixing problems then might require a lot of refactoring of code and database.
In the Teha project no modifications to the database or software were needed after the performance test; only an accidentally missing index had to be added. The actual response times were almost identical to the expected response times.
Anne spent around 50 h calculating the expected response times based on the logical data model, refactoring, and recalculating based on the new model.
About 15 man-days would probably have been needed for the modifications alone if the problems had been identified only during performance testing. In addition, the performance tests would have had to be redone, requiring resources from both IBM and the customer.
Calculate the expected response times of the queries before design and especially prior to implementation of the data access layer.
-
Recommendations
Gather what kind of queries need to be run against the database and what the response time requirements are for each of them.
Convert the queries into SQL statements and calculate the expected response time for each of them.
Verify that the response times are satisfactory. Keep in mind that often more than one SQL query has to be run against the database for a single HTTP request.
Gather the search and sorting criteria.
The search criteria for the different queries should match on at least a couple of columns. In addition, a couple of other search criteria based on other columns could be used.
The results should be sorted by columns used as search criteria, not by additional columns that are not used as search criteria.
Use caching of data in the presentation layer when possible instead of doing table joins.
E.g. customer name vs. customer id, organizational unit name vs. org unit id
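The caching recommendation above might look roughly like this in a presentation layer. It is a minimal sketch under stated assumptions: the LookupCache class, the loader function and the sample data are all hypothetical, and the loader stands in for a single query against a lookup table.

```python
from typing import Callable, Dict, Optional


class LookupCache:
    """Caches an id -> display name mapping that is loaded once, on first use."""

    def __init__(self, load_all: Callable[[], Dict[int, str]]) -> None:
        self._load_all = load_all
        self._names: Optional[Dict[int, str]] = None

    def name_for(self, key: int) -> str:
        if self._names is None:
            # One query for the whole lookup table instead of a join per request.
            self._names = self._load_all()
        return self._names.get(key, f"<unknown {key}>")


def load_customer_names() -> Dict[int, str]:
    # Hypothetical stand-in for reading customer ids and names from the database.
    return {101: "Customer A", 102: "Customer B"}


customers = LookupCache(load_customer_names)

# Task rows as returned by a query that selects only ids: (task id, customer id).
task_rows = [(1, 101), (2, 102), (3, 101)]
page = [(task_id, customers.name_for(cust_id)) for task_id, cust_id in task_rows]
print(page)
```

This suits lookup data that changes rarely; a real cache would also need some way to refresh entries when names change.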
-
Recommendations
Gather characteristics of the database tables and their columns
Is the data stored in the table permanent or can it be altered?
How many rows will be stored in the table?
What is the expected growth rate of the rows in the table, or does the row count remain constant or within certain boundaries? Will the data be entered in chunks or at a constant rate?
In which columns does the data change, how frequently, and in which columns does it not change after it has been entered?
E.g. task id never changes, classification of the task changes rarely, assigned employee changes frequently
This allows the order of the columns in the table to be optimized so that writes to the rollback logs are minimal
What is the range of possible values in each column and are those values evenly used?
If the range of values is small and the table is huge, the column might not be a good candidate for an index.
If the range of values is big, but only 1 or 2 of them are used in 98% of the rows, the column might not be a good candidate for an index.
Which of the columns are nullable and what proportion of the rows contain a null value in that column?
Depending on the queries run against the database, such a column is normally not a good one to use in an index.
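The column characteristics listed above (range of values, skew, proportion of nulls) can be gathered from a sample of column values with a few lines of code. This is an illustrative sketch: the function name, the returned keys and the sample data are assumptions, not part of the original material.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Sequence


def column_profile(values: Sequence[Optional[Hashable]], top_n: int = 2) -> Dict[str, float]:
    """Summarise one column: distinct values, null fraction and value skew.

    top_share is the fraction of all rows (nulls included) that hold one of
    the top_n most frequent non-null values.
    """
    total = len(values)
    counts = Counter(v for v in values if v is not None)
    nulls = total - sum(counts.values())
    top = sum(count for _, count in counts.most_common(top_n))
    return {
        "distinct": len(counts),
        "null_fraction": nulls / total if total else 0.0,
        "top_share": top / total if total else 0.0,
    }


# Hypothetical sample: 4 distinct values, but 2 of them cover 98% of the rows.
sample = ["A"] * 90 + ["B"] * 8 + ["C", "D"]
profile = column_profile(sample)
print(profile)
```

A high top_share or a high null_fraction flags a column that deserves a closer look before it is put into an index.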