an array-based algorithm for simultaneous multidimensional aggregates by yihong zhao, prasad m....

An Array-Based Algorithm for Simultaneous Multidimensional

AggregatesBy Yihong Zhao, Prasad M.

Desphande and Jeffrey F. Naughton

Presented by Kia Hall

for CIS 661

(taught by Professor Megalooikonomou)

Outline of Presentation• Purpose of Paper • ROLAP vs. MOLAP systems• Array Storage• Basic Array-Based Algorithm• Multi-Way Array Algorithm

– Single Pass

– Multi-Pass

• Performance• Conclusions

Purpose

• Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications.

• The “Cube” operator computes group-by aggregates over all possible subsets of the specified dimensions.

• The purpose of this paper is to present an efficient algorithm to compute the Cube for Multidimensional OLAP (MOLAP) systems.

• Although is designed for MOLAP systems it can also be used for Relational OLAP (ROLAP) systems when table data is converted to an array, cubed as if in a MOLAP system, and then converted back to a table.

“Cube” example• Consider an example with dimensions product, store and

date, and “measure” (data value) sales.

• To compute the “Cube” would be to compute sales grouped by all subsets of these dimensions, which include the following:

– By product, store and date

– By product and store, By product and date, By store and date

– By product, By sales, By date

– Overall Sales

ROLAP vs. MOLAP systems

• ROLAP systems by definition use relational tables as their data structure

• A “cell” is represented in the system as a tuple, with some attributes that identify the location of the tuple in the multidimensional space, and other attributes that contain the data value corresponding to that data cell

• MOLAP systems store their data as sparse arrays, the the data’s position within the sparse array encoding the relevant attribute information

• Critical to MOLAP efficiency in computing the “Cube” is to simultaneously compute spatially-delimited partial aggregates so that a cell does not have to be revisited for each sub-aggregate.

Array Storage

• There are three major issues relating to the storage of the array that must be resolved

– It is likely in a multidimensional application that the array is too large to fit in memory

– It is likely that many of the cells in the array are empty, because there is no data for that combination of coordinates

– In many cases an array will need to be loaded from data that is not in array format (e.g., from a relational table or from an external load file)

Resolving Storage Issues• A large n-dimensional array that can not fit into memory is

divided into small size n-dimensional (corresponding to disk blocking size) chunks and each chunk is stored as one object on disk

• Sparse chunks (with data density less than 40%) use a “chunk-offset compression” where for each valid array entry a pair, (offsetInChunk, data), is stored

• To load data from formats other than arrays, a partition-based loading algorithm is used that takes as input the table, each dimension size and a predefined chunk size, and returns a (possibly compressed) chunked array

Efficient Computation

• The basic algorithm (which will be improved upon) computes the cube of a chunked array in multiple passes, computing each “group-by” in a separate pass

• For a three-dimensional array (with dimensions ABC) the aggregates to be computed can be viewed as a lattice with ABC at the root, with AB, BC, and AC as children; with AC having children A and C, and so forth.

• To compute the cube efficiently a tree is embedded in this lattice and each aggregate is computed from its parent in the tree

Minimum Spanning Tree• From the dimension sizes of the array and the sizes of the

chunks used to store the array, the following can be computed– The size of the array corresponding to each node in the

lattice– How much storage will be needed to use one of these

arrays to compute a child• From the above information a minimum spanning tree can

be defined• For each node n in the lattice, its parent in the minimum

spanning tree is the node n’ which has the minimum size and from which n can be computed

Basic Array Cubing Algorithm

1. Construct the minimum size spanning tree for the group-bys of the Cube

2. Compute any group-by Di1Di2 . . . Dik of a Cube from the “parent” Di1Di2 . . . Dik+1 which has the minimum size

3. Read in each chunk of Di1Di2 . . . Dik+1 along the dimension Dik+1 and aggregate each chunk to a chunk of Di1Di2 . . . Dik

4. Once the chunk of Di1Di2 . . . Dik is complete, we output the chunk to disk and use the memory for for the next chuck of Di1Di2 . . . Dik, keeping only one chunk in memory at a time

Efficiency Improvement• To improve on the basic algorithm we want to modify it to

compute all the children of a parent in a single pass of the parent

• A data Cube for an n-dimensional array consists of 2n group-bys. Ideally, if the memory were large enough hold all group-bys, total overlap could be achieved and the Cube would be finished with one scan of the array

• The “Multi-Way Array Cubing Algorithm” attempts to minimize the memory needed for each computation, so that maximum overlap can be achieved

• The single pass version of this algorithm assumes ideal memory; the multi-pass modification, realistically supposes that multiple passes may be necessary

Dimension Order and Memory

• The logical order used in reading in an array of chunks is the “dimension order”; this order is independent of the actual physical order

• Dimension order should be exploited to reduce the memory required by each group-by

• Once the dimension order is determined, a general rule can be formulated to determine what chunks of each group-by need to stay in memory to avoid rescanning

• By theorem, optimal dimension order for an array A, with dimensions D1D2 . . . Dk is O = D1, D2, . . ., Dk.

Memory Allocation Rule• Memory Allocation Rule 1: For a group-by (Dj1 . . . Djn-1 )

of the array (D1, . . ., Dn) read in the dimension order O = D1, D2, . . ., Dk, if (Dj1 . . . Djn-1 ) contains a prefix of (D1, . . ., Dn) with length p, 0pn-1, we allocate p

i=1 | Di| x n-1

i=p+1 | Ci| units of array element to (Dj1 . . . Djn-1 ) group-by, where | Di| is the size of dimension i and | Ci| is the chunk size of dimension I

• This memory allocation rule is used to build a Minimum Memory Spanning Tree (MMST)

Minimum Memory Spanning Trees

• A MMST for a Cube (D1, . . ., Dn) in a dimension order O=(Dj1, . . ., Djn) has n+1 levels with the root (Dj1, . . ., Djn) at level n

• Using the first rule, the memory required at each level of a Minimum Memory Spanning Tree (MMST) can be calculated using the following memory rule:

• Memory Allocation Rule 2: The total memory requirement for level j of the MMST for a dimension order O=(D1, . . ., Dn) is given by:

– n-ji=1 | Di| + C(j, 1)(n-j-1

i=1 | Di|)c + C(j+1, 2)(n-j-2i=1 |

Di|)c2+ . . . + C(n-1, n-j)cn-j

Single Pass Multi-Way Cubing Algorithm

• The single pass algorithm assumes there is sufficient memory required by the MMST

• In this case, all group-bys can be computed recursively in a single scan of the input array

• By theorem, the required memory is computed as follows:

– Theorem: For a chunked multidimensional array A with the size n

i=1 |Di| where |Di| =d for all i, and each array chunk has the size n

i=1 |Ci| where |Ci| =c for all i, the total amount of memory to compute the Cube of the array in one scan of A is less than cn+(d+1+c)n-1.

Multi-Pass Multi-Way Cubing Algorithm

• Let T be the MMST for the optimal dimension ordering O and MT be the memory required for T, calculated using Memory Allocation Rule 2

• If M MT, we cannot allocate the required memory for some of the subtrees of the MMST, called “incomplete trees.”

• Extra steps are required to compute the group-bys included in incomplete trees

Multi-Pass Multi-Way Algorithm (cont)1. Create the MMST for a dimension order O2. Add T to the ToBeComputed List3. For each tree T’ in the ToBeComputed List 4. { Create the working subtree W and incomplete subtrees Is5. Allocate memory to the subtrees6. Scan the array chunk of the root of T’ in the order O7. { Aggregate each chunk to the group-bys in W8. Generate intermediate results for Is9. Write complete chunks of W to disk10. Write intermediate results to the partitions of Is

}11. For each I12. { Generate the chunks from the partitions of I13. Write the completed chunks of I 14. Add I to ToBeComputed

}}

Testing Conditions

• Testing was done using three data sets in which one of the following attributes varied, while the other two remained constant:

– Number of valid data entries

– Dimension size

– Number of dimensions

• A popular ROLAP (table) cubing algorithm was compared with the MOLAP (array) Multi-Way Algorithm

• The MOLAP Algorithm consistently had better performance time

ROLAP vs. MOLAP Performance• The data table sizes are significantly larger than the

compressed arrays (and compressed chunks) of MOLAP• A significant percentage of time (55-60%) is spent sorting

intermediate results• Tuple comparisons are expensive because there are

multiple fields to be compared• About 10-12% is spent copying data, done while

generating result tuples• Since is MOLAP is position-based cells of the array are

aggregated based on their position without multiple sorts• The MOLAP Algorithm is relatively CPU intensive (80%)

as compared with the ROLAP Algorithm (70%)

Conclusion

• Multi-Way Array Algorithm overlaps the computation of different group-bys, while using minimal memory for each group-by.

• Performance results show that the Algorithm performs much better than previously published ROLAP algorithms

• The performance benefits are so substantial that in the testing done for this paper it was faster to load an array from a table, cube the array, then dump the cubed array into tables than to cube the table directly.

• Thus, the Algorithm is valuable in both ROLAP and MOLAP systems

an array-based algorithm for simultaneous multidimensional aggregates by yihong zhao, prasad m....

Documents

chunked array slide

array format

table data

data cell molap systems

molap systems rolap

valid array entry

data structure

data density