v4 qlik view-datastorage

QlikView Data Storage

Key Objectives

• How QlikView stores data

• How to determine memory savings to be gained from implementing structure changes

• The importance of load order

QlikView Data Storage

• Storage in a QlikView document is a two level organization. The first level stores distinct lists of values. The second level stores table data consisting of pointers

Storage Level 1

There is exactly one list for each field included in the document, i.e. if you have 25 fields in your document, you will have 25 lists. As data is loaded during the reload process, the lists are populated. The order of the values, therefore, reflects the load order of the data

Storage Level 2

The second level of storage in a QlikView document consists of tables containing pointer values (or NULL) in each cell. These pointer values correspond to the “index” of the appropriate value in the distinct lists located in the first level storage.

Storage Level 1

Storage Level 2

Memory Calculation Example• Consider a key field consisting of potentially 550,000 distinct values in a

fact table containing 100 million rows of data.

• To calculate the storage required for this field, we need to consider the list storage as well as the table pointer storage.

• Remember that these are minimum amounts, since there is, of course, additional storage that QlikView will require to maintain the data.

• For the list data, let's assume that each value is a 10 byte character (text) representation.

• If we simply multiple 10 bytes by 550,000, we will get to approximately 5.5 MB. This will be the requirement to store the list, regardless of the number of times this field is referenced in logical tables.

• Next, we need to calculate the pointer requirements for the logical table including this field. Since 550,000 distinct values can be represented by 20 bits (2 to the 20th power), we will need a pointer size of 2.5 bytes for each row of non-null data.

Memory Calculation Example

• As mentioned above, we have 100 million rows of data in our fact table. Since 100 million times 2.5 is 250 million, we will need approximately 250 MB to store the pointers for this field in the logical table.

• So, total storage required for this single field in one logical table would be approximately 256 MB. Since this is a key field, by definition, it also exists in one or more additional logical tables. If we assume a dimension table with a single row of data for each key value, then we need to add an additional 1.4 MB for the dimension table pointer storage. This brings our total to about 258 MB for this single field, almost all of which is pointer storage.

• Of course, in documents with small fact tables (less than 1 million rows), this is not as much of an issue. But when dealing with large fact tables, you need to be aware of the repercussions of adding and deleting fields

Why is Memory Important?

• Memory usage is just one aspect of the performance of a QlikView document. In a 64-bit environment, memory seems to be limitless.

• But even 64-bit systems have a limited amount of RAM installed, and most systems will live with that amount for quite awhile. It is safe to assume that data, users, and - in the case of QlikView Server - documents will continue to expand.

• The more memory taken up needlessly means the less memory available for expansion - both fixed and dynamic.

• So it is important to understand what your thresholds are, and that once a threshold is breached, users will generally suffer.

Thank You

v4 qlik view-datastorage

Documents