query execution section 15.1 shweta athalye cs257: database systems id: 118 section 1
TRANSCRIPT
Query ExecutionSection 15.1
Shweta AthalyeCS257: Database Systems
ID: 118Section 1
Agenda
Query Processor Query compilation Physical Query Plan Operators
Scanning Tables Table Scan Index scan
Sorting while scanning tables Model of computation for physical operators Parameters for measuring cost I/O cost for scan operators Iterators
Query processor
The query processor is the group of components of a DBMS that turns user queries and data-modification commands into a sequence of database operations and executes those operations
query processor is responsible for supplying details regarding how the query is to be executed
The major parts of the query processor
Query compilation
Query compilation itself is a multi-step process consisting of : Parsing: in which a parse tree representing
query and its structure is constructed Query rewrite: in which the parse tree is
converted to an initial query plan Physical plan generation: where the abstract
query plan is turned into a physical query plan
Outline of query compilation
Physical Query Plan Operators
Physical query plans are built from operators
Each of the operators implement one step of the plan.
Physical operators can be implementations of the operator of relational algebra.
They can also be non relational algebra operators like “scan” which scans tables.
Scanning Tables
One of the most basic things in a physical query plan.
Necessary when we want to perform join or union of a relation with another relation.
Two basic approaches to locating the tuples of a relation R Table-scan
Relation R is stored in secondary memory with its tuples arranged in blocks
it is possible to get the blocks one by one This operation is called Table Scan
Two basic approaches to locating the tuples of a relation R
Index-scan there is an
index on any attribute of Relation R
Use this index to get all the tuples of R
This operation is called Index Scan
Sorting While Scanning Tables
Why do we need sorting while scanning? the query could include an ORDER BY
clause. Requiring that a relation be sorted Various algorithms for relational-algebra
operations require one or both of their arguments to be sorted relation
physical-query-plan operator sort-scan takes a relation R and a specification of the attributes on which the sort is to be made, and produces R in that sorted order
Model of Computation for Physical Operators
Choosing physical plan operators wisely is an essential for a good query processor.
Cost for an operation is measured in number of disk i/o operations.
If an operator requires the final answer to a query to be written back to the disk, the total cost will depend on the length of the answer and will include the final write back cost to the total cost of the query.
Improvements in cost
Major improvements in cost of the physical operators can be achieved by avoiding or reducing the number of disk i/o operations
This can be achieved by passing the answer of one operator to the other in the main memory itself without writing it to the disk.
Parameters for Measuring Costs
Parameters that affect the performance of a query Buffer space availability in the main memory
at the time of execution of the query Size of input and the size of the output
generated The size of memory block on the disk and the
size in the main memory also affects the performance
I/0 Cost for Scan Operators
If Relation R is clustered, i.e. it is stored in approximately B blocks then the total number of disk operations required is B
If R is clustered but requires a two phase multi way merge sort then the total number of disk i/o required will be 3B.
However, if R is not clustered, then the number of required disk I/0's is generally much higher.
Iterators for Implementation of Physical Operators
Many physical operators can be implemented as an iterator
It is a group of three functions that allows a consumer of the result of the physical operator to get the result one tuple at a time
Iterator
The three functions forming the iterator are:
Open: This function starts the process of getting
tuples. It initializes any data structures needed to
perform the operation
Iterator
GetNext This function returns the next tuple in the
result Adjusts data structures as necessary to
allow subsequent tuples to be obtained If there are no more tuples to return,
GetNext returns a special value NotFound
Iterator
Close This function ends the iteration after all
tuples it calls Close on any arguments of the
operator