Communicating with the Outside

Upload: jake

Post on 12-Jan-2016


Page 1: Communicating with the Outside

Communicating with the Outside

Page 2: Communicating with the Outside

Hardware [Processor(s), Disk(s), Memory]

Operating System

Concurrency Control, Recovery

Storage Subsystem, Indexes

Query Processor

Application

Page 3: Communicating with the Outside

Accessing the Database

1. 4GL: Power++, Visual Basic

2. Programming language + Call Level Interface
  ODBC: Open DataBase Connectivity
  JDBC: Java based API
  OCI (C++/Oracle), CLI (C++/DB2)
  Perl/DBI

Page 4: Communicating with the Outside

ODBC

An application using the ODBC interface relies on:
  A driver manager, which loads a driver that is responsible for the interaction with the target DB system
  The driver, which implements the ODBC functions and interacts with the DB server

Page 5: Communicating with the Outside

API pitfalls

Cost of portability
  Layer of abstraction on top of ODBC drivers to hide discrepancies across drivers with different conformance levels.
  Beware of performance problems in this layer of abstraction:
    Use of meta-data description when submitting queries and accessing the result set
    Iterations over the result set

Page 6: Communicating with the Outside

Characterization of ODBC drivers

Conformance level: portable applications, level 1 ODBC functions. Core functions for allocating and deallocating handles for DB environment, connection, and statements; for connecting to a DB server; for executing SQL statements; for receiving results; for controlling transactions; and for error handling

SQL dialect: Generic ODBC drivers transform SQL statements that follow ODBC SQL grammar into the dialect of a particular DB system. Specific ODBC drivers assume SQL statements follow the dialect of the system they are connected to.

Client-server communication mechanism: Generic ODBC drivers rely on a DB-independent communication layer; specific ODBC drivers exploit the particular mechanisms supported by the DB server, which is much faster.

Page 7: Communicating with the Outside

Specific vs generic ODBC drivers

SQL Server, Oracle, DB2 have specific ODBC drivers
  Allow a program to run on different DB systems as long as queries are expressed in the dialect of the target system

DataDirect Technologies and OpenLink provide generic ODBC drivers
  Provide transparent portability
  But worse performance and loss of features

Page 8: Communicating with the Outside

ODBC vs. OCI

ODBC vs. OCI on Oracle8iEE on Windows 2000

Iteration over a result set one record at a time. Prefetching is performed.

OCI has low overhead when the number of records transferred is small.

ODBC performs better as the number of records transferred increases: it does a better job at prefetching.

[Figure: throughput (records/sec, 0 to 80) against number of records transferred (0 to 10000), for OCI and ODBC]

Page 9: Communicating with the Outside

Client-Server Mechanisms

DB server produces results that are consumed by the application client
  The client is asynchronous with respect to the server
  Communication buffer on the database server, one per connection

Two performance issues:

1. If a client does not consume results fast enough, then the server stops producing results and holds resources until it can output the result. A client program executing on a very slow machine can affect the performance of the whole DB system.

2. Data is sent either when the communication buffer is full or when a batch is finished executing.
  Small buffer: frequent transfer overhead
  Large buffer: time to first record increases
  No actual impact on a 100 Mb network. More sensitive in an intranet with low bandwidth.

Page 10: Communicating with the Outside

Object-Orientation Considered Harmful

authorized(user, type)

doc(id, type, date)

What are the document instances a user can see?

SQL:
  select doc.id, doc.date
  from authorized, doc
  where doc.type = authorized.type
  and authorized.user = <input>

If each document type is encapsulated in an object and each document instance is another object, the risk is the following:

Find types t authorized for user input:
  select authorized.type as t
  from authorized
  where user = <input>

For each type t issue the query:
  select id, date
  from doc
  where type = <t>;

The join is executed in the application and not in the DB!
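The difference between the two access patterns can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the sample rows, the user names and the function names are hypothetical, and SQLite runs in-process, so the count of statements issued stands in for client-server roundtrips.

```python
import sqlite3

# Hypothetical sample data for the authorized(user, type) and doc(id, type, date) schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authorized (user TEXT, type TEXT);
    CREATE TABLE doc (id INTEGER PRIMARY KEY, type TEXT, date TEXT);
    INSERT INTO authorized VALUES ('ann', 'memo'), ('ann', 'report'), ('bob', 'memo');
    INSERT INTO doc VALUES (1, 'memo', '2016-01-01'), (2, 'report', '2016-01-02'),
                           (3, 'invoice', '2016-01-03'), (4, 'memo', '2016-01-04');
""")


def docs_join_in_db(user):
    """One statement: the database executes the join."""
    rows = conn.execute(
        "SELECT doc.id, doc.date FROM authorized, doc "
        "WHERE doc.type = authorized.type AND authorized.user = ?",
        (user,),
    ).fetchall()
    return sorted(rows), 1  # (result, number of statements issued)


def docs_join_in_app(user):
    """One statement for the authorized types, then one more per type."""
    statements = 1
    types = conn.execute(
        "SELECT type FROM authorized WHERE user = ?", (user,)
    ).fetchall()
    rows = []
    for (doc_type,) in types:
        statements += 1
        rows += conn.execute(
            "SELECT id, date FROM doc WHERE type = ?", (doc_type,)
        ).fetchall()
    return sorted(rows), statements
```

Both functions return the same documents; the second issues one statement per authorized type on top of the initial lookup, so its cost grows with the number of types.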

Page 11: Communicating with the Outside

Tuning the Application Interface

1. Avoid user interaction within a transaction

2. Minimize the number of roundtrips between the application and the database

3. Retrieve needed columns only

4. Retrieve needed rows only

5. Minimize the number of query compilations

Page 12: Communicating with the Outside

1. Avoid User Interaction within a Transaction

User interaction within a transaction forces locks to be held for a long time.

Careful transaction design (possibly transaction chopping) avoids this problem.
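One possible shape for such a design, sketched with Python's sqlite3 module: the user is consulted before the transaction starts, and the transaction itself only rechecks and applies the choice. The seat table, the `book_seat` function and the `ask_user` callback are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, holder TEXT)")  # hypothetical table
conn.executemany("INSERT INTO seat (id, holder) VALUES (?, NULL)", [(1,), (2,), (3,)])
conn.commit()


def book_seat(ask_user, holder):
    """Return True if the seat chosen by the user could be booked."""
    # Step 1, outside any transaction: read the options and wait for the user.
    free = [row[0] for row in conn.execute(
        "SELECT id FROM seat WHERE holder IS NULL ORDER BY id")]
    choice = ask_user(free)  # may take arbitrarily long; no locks are held here

    # Step 2, a short transaction: recheck the seat, since it may have been
    # taken while the user was deciding, and book it in the same statement.
    with conn:
        cur = conn.execute(
            "UPDATE seat SET holder = ? WHERE id = ? AND holder IS NULL",
            (holder, choice),
        )
    return cur.rowcount == 1
```

The price of this split is that the second step can fail when the data changed in the meantime, so the application has to be prepared to ask the user again.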

Page 13: Communicating with the Outside

2. Minimize the Number of Roundtrips to the Database

(a) Avoid Loops

Application programming languages offer looping facilities (SQL statements, cursors, positioned updates)
  Embedding an SQL SELECT statement inside the loop means there will be application-to-DB interaction in every iteration of the loop
  Better idea to retrieve a significant amount of data outside the loop and then use the loop to process the data

Package several SQL statements within one call to the database server
  Embedded procedural language (Transact SQL) with control flow facilities.
  Many such languages are product specific, so they can reduce the portability of the application code

Page 14: Communicating with the Outside

Loops

Fetch 2000 records
  Loop: 200 queries
  No loop: 1 query

Crossing the application interface too often hurts performance.

[Figure: throughput (records/sec, 0 to 600), loop vs. no loop]
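The setup of this experiment (2000 records fetched with 200 queries or with 1 query) can be mimicked with Python's sqlite3 module. This is a sketch, not the original benchmark: the item table and the function names are hypothetical, and with an in-process database only the statement count is meaningful, not the timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, payload TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [(i, f"row {i}") for i in range(2000)],
)


def fetch_with_loop(chunk=10):
    """Issue one SELECT per chunk of ids: 2000 / 10 = 200 statements."""
    rows, statements = [], 0
    for low in range(0, 2000, chunk):
        rows += conn.execute(
            "SELECT id, payload FROM item WHERE id >= ? AND id < ? ORDER BY id",
            (low, low + chunk),
        ).fetchall()
        statements += 1
    return rows, statements


def fetch_without_loop():
    """Issue a single SELECT, then process the rows in the application."""
    rows = conn.execute("SELECT id, payload FROM item ORDER BY id").fetchall()
    return rows, 1
```

Both functions return the same 2000 rows; against a remote server each of the 200 statements in the first version would pay a separate roundtrip.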

Page 15: Communicating with the Outside

2. Minimize the Number of Roundtrips to the Database

(b) Use User Defined Functions (UDFs) when they select out a high number of records.

Scalar UDFs can be integrated within a query condition: they are part of the execution plan and they are executed on the DB server

Page 16: Communicating with the Outside

User Defined Functions

Function computes the number of working days between two dates.

Function executed either on the database site (UDF) or on the application site

Applying the UDF yields good performance when it helps reduce significantly the amount of data sent back to the application.

80% of the records: the records where the number of working days between the date of shipping and the date the receipt was sent is greater than 5 working days

20% of the records: … is smaller than 5 working days

[Figure: throughput (queries/msec, 0 to 6) at 20% and 80% of the records, UDF vs. application function]
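A working-days function used as a scalar UDF inside a query condition might look as follows in Python's sqlite3 module. Several things here are assumptions: the shipment table and its columns are hypothetical, the function counts Monday-to-Friday days and ignores holidays, and SQLite calls the function in-process, so the sketch shows the shape of the two approaches rather than the network savings.

```python
import sqlite3
from datetime import date, timedelta


def working_days(start_iso, end_iso):
    """Count Monday-to-Friday days from start (inclusive) to end (exclusive)."""
    start, end = date.fromisoformat(start_iso), date.fromisoformat(end_iso)
    span = (end - start).days
    return sum(1 for i in range(span) if (start + timedelta(days=i)).weekday() < 5)


conn = sqlite3.connect(":memory:")
conn.create_function("working_days", 2, working_days)  # register the scalar UDF
conn.execute("CREATE TABLE shipment (id INTEGER PRIMARY KEY, shipped TEXT, receipt_sent TEXT)")
conn.executemany("INSERT INTO shipment VALUES (?, ?, ?)", [
    (1, "2016-01-04", "2016-01-08"),  # Monday to Friday of the same week
    (2, "2016-01-04", "2016-01-18"),  # two full weeks
    (3, "2016-01-08", "2016-01-11"),  # Friday to the following Monday
])


def filter_in_db():
    """The UDF is part of the query condition; only matching rows come back."""
    rows = conn.execute(
        "SELECT id FROM shipment WHERE working_days(shipped, receipt_sent) > 5"
    ).fetchall()
    return [row[0] for row in rows], len(rows)  # (ids, rows transferred)


def filter_in_app():
    """Every row crosses the application interface before being filtered."""
    rows = conn.execute("SELECT id, shipped, receipt_sent FROM shipment").fetchall()
    ids = [i for i, shipped, sent in rows if working_days(shipped, sent) > 5]
    return ids, len(rows)
```

The two functions select the same shipments, but the application-side version transfers every row first, which is why the UDF pays off when it filters out a large share of the records.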

Page 17: Communicating with the Outside

3. Retrieve Needed Columns Only

Reasons why it is a good idea:

Avoid transferring unnecessary data

Retrieving an unneeded column might prevent the use of a covering index. Ex: if there is a dense composite index on last name and first name, then a query asking for all the first names of people whose last name is ‘Costa’ can be answered by the index alone, provided no irrelevant attributes are also selected

In the experiment the subset contains ¼ of the attributes. Reducing the amount of data that crosses the application interface yields significant performance improvement.

[Figure: throughput (queries/msec, 0 to 1.75), no index vs. index, for all attributes vs. covered subset]
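The covering-index effect in the ‘Costa’ example can be observed in SQLite through EXPLAIN QUERY PLAN. This is a sketch with a hypothetical person table; the exact wording of the plan is SQLite-specific and may differ between versions, so the code only looks for the phrase "COVERING INDEX".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, last_name TEXT,
                         first_name TEXT, address TEXT);
    CREATE INDEX person_name ON person (last_name, first_name);
    INSERT INTO person (last_name, first_name, address) VALUES
        ('Costa', 'Ana', '1 Main St'), ('Costa', 'Rui', '2 Main St'),
        ('Silva', 'Eva', '3 Main St');
""")


def plan(sql):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text


# Only the needed column: the composite index alone can answer the query.
needed_only = plan("SELECT first_name FROM person WHERE last_name = 'Costa'")

# All columns: the index finds the rows, but the table must be read as well.
all_columns = plan("SELECT * FROM person WHERE last_name = 'Costa'")

print(needed_only)
print(all_columns)
```

Selecting the extra address column is enough to lose the covering plan, because that column is not stored in the index.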

Page 18: Communicating with the Outside

4. Retrieve Needed Rows Only

If the user is only viewing a small subset of a very large result set, it is best to
  Only transfer that subset
  Only compute that subset
  Use constructs such as TOP or FETCH FIRST

Avoid cursors

Applications that allow the formulation of ad-hoc queries should permit users to cancel them.
  Avoids holding resources when a user realizes she made a mistake or when a query takes too long to terminate
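As a sketch of retrieving only the rows a user will see, the following uses Python's sqlite3 module. SQLite spells the row limit as LIMIT, which stands in here for TOP or FETCH FIRST in other systems; the event table and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, label TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO event (label) VALUES (?)",
    [(f"event {i}",) for i in range(10000)],
)


def first_page(page_size=20):
    """Ask the database for only the rows the user will actually see."""
    return conn.execute(
        "SELECT id, label FROM event ORDER BY id LIMIT ?", (page_size,)
    ).fetchall()


def first_page_wasteful(page_size=20):
    """Anti-pattern: transfer the whole result set, then discard most of it."""
    rows = conn.execute("SELECT id, label FROM event ORDER BY id").fetchall()
    return rows[:page_size], len(rows)  # (page, rows transferred)
```

Both functions show the same page, but the second moves all 10000 rows across the application interface to display 20 of them.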

Page 19: Communicating with the Outside

Cursors

Query fetches 200,000 records of 56 bytes each

Response time is a few seconds with a SQL query and more than an hour iterating over a cursor.

[Figure: throughput (records/sec, 0 to 5000), cursor vs. SQL]

Page 20: Communicating with the Outside

5. Minimize the Number of Query Compilations

Query parsing and optimization is a form of overhead to avoid
  The compilation of simple queries requires from 10,000 to 30,000 instructions; for complicated queries, it requires from 100,000 to several million instructions
  Compilation requires read access to the system catalog. It can cause lock blocking if other transactions are accessing the catalog at the same time

A well-tuned online relational environment should rarely perform query compilations

The query plan resulting from a compilation should be quickly accessible by the processor – use procedure cache for this purpose and be sure the cache is big enough.

Page 21: Communicating with the Outside

Benefits of precompiled queries

Prepared execution yields better performance when the query is executed more than once:
  No compilation
  No access to catalog.

Prepared execution plans become obsolete if indexes are added or the size of the relation changes.

[Figure: throughput (queries/sec, 0 to 0.6) for direct vs. prepared execution, x-axis 0 to 6]

Experiment performed on Oracle8iEE on Windows 2000.
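The contrast between direct and prepared execution can be sketched from the application side with Python's sqlite3 module, which is not the Oracle setup of the experiment. The sketch relies on one property of that module: it keeps a per-connection cache of compiled statements keyed by SQL text, so a parameterized statement submitted repeatedly can be reused, while statements built by pasting in literal values all have different texts. The account table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO account VALUES (?, ?)", [(i, 100 * i) for i in range(1, 6)])


def lookup_direct(ids):
    """Build a new SQL text per id, so each text must be compiled separately."""
    texts, balances = set(), []
    for i in ids:
        sql = f"SELECT balance FROM account WHERE id = {int(i)}"
        texts.add(sql)
        balances.append(conn.execute(sql).fetchone()[0])
    return balances, len(texts)  # (result, distinct statement texts submitted)


def lookup_prepared(ids):
    """Reuse one parameterized SQL text, which can be compiled once and cached."""
    sql = "SELECT balance FROM account WHERE id = ?"
    balances = [conn.execute(sql, (i,)).fetchone()[0] for i in ids]
    return balances, 1
```

The results are identical; what differs is how many distinct statements the database has to parse and optimize, which grows with the number of lookups in the direct version.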