Query Execution: Hashing-Based Two-Pass Algorithms

—Gaurang Patel— (205)

Upload: quentin-lott

Post on 30-Dec-2015




Agenda:
-- Terminology
-- Hash Basics
-- Partitioning relation by hashing
-- Hash-based Grouping and Aggregation
-- Union, Intersection and Difference
-- Hash-Join Algorithm
-- Conclusion

Terminology:

Query optimization:
-- Logical query plan
-- Physical plan

Query execution:
-- Query processor: group of DBMS components
-- Converts user queries into database operations
-- Operation
-- Relation: the arguments of an operation

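Hash Basics:
-- Large data: relations too big to fit in main memory
-- Hash functions are used to store large relations in buckets
-- M memory buffers are available
-- Gain a factor of M in the size of relations that can be handled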

Partitioning relation by hashing:

Algorithm:
-- Tuples with the same hash-key value go to the same bucket
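The partitioning step can be sketched as follows. This is a minimal in-memory illustration, not the slides' own code: the names `partition`, `key`, and `num_buckets` are hypothetical, Python lists stand in for disk-resident buckets, and the choice of M - 1 buckets assumes one of the M buffers is reserved for reading input.

```python
def partition(tuples, key, num_buckets):
    """Split `tuples` into `num_buckets` lists by hashing `key(t)`.

    Lists stand in for buckets that would be flushed to disk
    one block at a time in a real system (an assumption here).
    """
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        # Equal key values always hash to the same bucket index.
        buckets[hash(key(t)) % num_buckets].append(t)
    return buckets


M = 4  # assumed number of memory buffers
R = [(1, "a"), (2, "b"), (5, "c"), (1, "d"), (4, "e")]

# One buffer reads input, so M - 1 buckets can be filled.
buckets = partition(R, key=lambda t: t[0], num_buckets=M - 1)
```

Every tuple lands in exactly one bucket, and tuples that share a key value, such as the two with key 1, always end up together.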

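Hash-based Grouping and Aggregation:
-- Hash key depends only on the grouping attributes
-- First pass: partition the relation into buckets
-- Second pass: process each bucket in turn; only one record per group is needed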
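A rough sketch of hash-based grouping with a SUM aggregate is shown below. The function names and the SUM aggregate are assumptions chosen for illustration, and in-memory lists again stand in for buckets on disk.

```python
def partition(tuples, key, num_buckets):
    """Split tuples into buckets by hashing key(t)."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(key(t)) % num_buckets].append(t)
    return buckets


def group_sum(relation, group_key, value, M):
    """Two-pass grouping: partition, then aggregate bucket by bucket."""
    result = {}
    # First pass: hash on the grouping attribute only.
    for bucket in partition(relation, group_key, M - 1):
        # Second pass: one running total (one record) per group.
        totals = {}
        for t in bucket:
            g = group_key(t)
            totals[g] = totals.get(g, 0) + value(t)
        # A group lives entirely in one bucket, so totals are final.
        result.update(totals)
    return result


sales = [("x", 1), ("y", 2), ("x", 3), ("z", 5), ("y", 4)]
sums = group_sum(sales, group_key=lambda t: t[0], value=lambda t: t[1], M=4)
```

Because the hash key depends only on the grouping attribute, no group is split across buckets, which is why each bucket can be aggregated independently.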

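Union, Intersection and Difference:
-- Binary operations: use the same hash function for both arguments

Union: R ∪ S
-- First pass: hash R and S into M-1 buckets each, 2(M-1) buckets in total
-- Second pass: take the union of each pair of buckets, avoiding duplicates
-- Same approach for R ∩ S and R – S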
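The bucket-pair idea for the set operations can be sketched like this. It assumes set semantics and whole-tuple hashing; `set_op` and its `op` argument are hypothetical names, and the buckets are kept in memory rather than on disk.

```python
def partition(tuples, num_buckets):
    """Split tuples into buckets by hashing the whole tuple."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t) % num_buckets].append(t)
    return buckets


def set_op(R, S, M, op):
    """Apply a set operation to R and S one bucket pair at a time."""
    # Same hash function for both arguments: 2(M - 1) buckets total.
    r_buckets = partition(R, M - 1)
    s_buckets = partition(S, M - 1)
    out = []
    for r_b, s_b in zip(r_buckets, s_buckets):
        r_set, s_set = set(r_b), set(s_b)  # sets drop duplicates
        if op == "union":
            out.extend(r_set | s_set)
        elif op == "intersection":
            out.extend(r_set & s_set)
        elif op == "difference":
            out.extend(r_set - s_set)
        else:
            raise ValueError(f"unknown operation: {op}")
    return out


R = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
S = [(3, "c"), (4, "d")]
union = sorted(set_op(R, S, 4, "union"))
inter = sorted(set_op(R, S, 4, "intersection"))
diff = sorted(set_op(R, S, 4, "difference"))
```

Identical tuples from R and S hash to buckets with the same index, so each pair can be processed without looking at any other bucket.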

I/O operations needed:
-- B(R) + B(S) to read the buckets in the second pass
-- 2 more per block for hashing (one read, one write)
-- Total: 3(B(R) + B(S))

For a two-pass algorithm:
-- min(B(R), B(S)) ≤ M²
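The cost figures above can be checked with a small calculation. The helper name `two_pass_hash_cost` and the example block counts are made up for illustration, and the M² bound is treated as the approximate limit stated on the slide.

```python
def two_pass_hash_cost(b_r, b_s, m):
    """Return (disk I/Os, feasible) for a two-pass hash-based operation.

    b_r, b_s: number of blocks in R and S; m: memory buffers.
    """
    # Read each block, write it to a bucket, then read the bucket back.
    cost = 3 * (b_r + b_s)
    # Approximate requirement: the smaller relation's buckets
    # must each fit in memory during the second pass.
    feasible = min(b_r, b_s) <= m * m
    return cost, feasible


# Hypothetical sizes: B(R) = 1000, B(S) = 500 blocks.
enough_memory = two_pass_hash_cost(1000, 500, 101)
too_little_memory = two_pass_hash_cost(1000, 500, 20)
```

With 101 buffers the bound 500 ≤ 10201 holds; with only 20 buffers, 500 exceeds 400 and two passes would not be enough.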

Hash-Join Algorithm:

R(X,Y) ⋈ S(Y,Z)
-- Same as the other binary operations
-- Only difference: the hash key is the join attribute Y

I/O operations:
-- 3(B(R) + B(S))
-- Two passes require min(B(R), B(S)) ≤ M²
-- Further techniques reduce the number of I/O operations
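The hash-join steps can be sketched as below, assuming R holds (x, y) pairs and S holds (y, z) pairs. The function names are hypothetical, and buckets are kept in memory instead of being written to disk.

```python
def partition(tuples, key, num_buckets):
    """Split tuples into buckets by hashing key(t)."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(key(t)) % num_buckets].append(t)
    return buckets


def hash_join(R, S, M):
    """Natural join of R(X, Y) and S(Y, Z) on Y."""
    # First pass: partition both relations on the join attribute Y.
    r_buckets = partition(R, lambda t: t[1], M - 1)
    s_buckets = partition(S, lambda t: t[0], M - 1)
    out = []
    # Second pass: join each pair of matching buckets in memory.
    for r_b, s_b in zip(r_buckets, s_buckets):
        index = {}
        for x, y in r_b:
            index.setdefault(y, []).append(x)  # build on R's bucket
        for y, z in s_b:
            for x in index.get(y, []):         # probe with S's bucket
                out.append((x, y, z))
    return out


R = [(1, 10), (2, 20), (3, 10)]
S = [(10, "a"), (30, "b"), (20, "c")]
joined = sorted(hash_join(R, S, M=4))
```

Tuples that agree on Y always fall into buckets with the same index, so a matching pair can never be missed by joining bucket i of R only with bucket i of S.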