
qlikview.com

September 2013

QLIKVIEW DATA FLOWS

TECHNICAL BRIEF

A QlikView Technical Brief



Table of Contents

Introduction

Overview

Data Sourcing

Loading and Modeling Data

Provisioning Data

Using The Data for Analytics

Data Governance and Security

Learn More




Introduction

Business Discovery relies on the connection, transformation, distribution and, ultimately,

analysis of data. This paper provides an introductory overview of the data flows through a typical QlikView deployment and describes the role of individual systems. We explain how

data is sourced from multiple, heterogeneous sources, how it is manipulated to make it

consistent and logical, and how it is distributed to where users can interact with the QlikView

applications.

Overview

There are four main systems involved in building a QlikView enterprise system: QlikView

Desktop, QlikView Publisher, QlikView Server, and Clients. To understand the data flow, we

need to understand the role of these systems and where they are situated in the system

architecture (see Figure 1).

Figure 1 - QlikView Architectural Overview





QlikView Desktop: This is the main tool for creating QlikView applications. The

application designer uses this tool to specify where data is sourced, how it is manipulated,

and how it is displayed. The application presentation is handled by Clients, but application

data processing is managed by the Server.

Clients: This is where users use the QlikView application to view and interact with data.

The application can be part of a standalone executable or part of a web page. The client

side of applications is designed to consume few computational resources.

QlikView Server: This system serves applications and data to clients, performs application

calculations, and manages security.

QlikView Publisher: This system provides a means of controlling how the data used by

applications is updated.

The sections of this paper will follow how data is sourced, loaded and modeled, provisioned,

and then analyzed in QlikView. It also discusses how data is governed and secured.

[Diagram: Data Sourcing, Loading & Modeling Data, Provisioning Data, Data for Analytics, Governance and Security]




Data Sourcing

QlikView extracts data from multiple, heterogeneous sources (e.g. databases, spreadsheets,

web pages and ‘Big Data’ sources such as Hadoop and Google BigQuery) and creates a homogeneous data set suitable for analysis and visualization.

Figure 2: Loading data from multiple sources into QlikView

Data is sourced from a multitude of systems, from standard ODBC, OLE DB and JDBC

data stores (such as Oracle), spreadsheets and web pages (HTML, XML, etc.) to systems

that require custom connectors (such as Salesforce.com, SAP and Google BigQuery). For

most data sources, a connection is made using a wizard that simplifies the connection

process and allows the application designer control over how data is read. For example, the

designer can choose not to read in certain fields or to rename them. The presence of a data

warehouse is not required, although if it already exists, it is easily leveraged by QlikView.

Accessing Hadoop-based Big Data systems is straightforward too. Where an ODBC or

JDBC driver does not exist, QlikView has an open standard data exchange protocol (called

QVX) that can be used to build custom connectors to data sources that do not offer

standard connectivity.
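The field-level control described above (skipping some fields and renaming others as data is read) can be illustrated with a short sketch. It is written in Python rather than in QlikView's own Load Script, and the source data, the field names and the `load_selected` helper are all hypothetical.

```python
import csv
import io

# Hypothetical source extract; in practice this would come from a database or file.
SOURCE = io.StringIO(
    "cust_id,cust_nm,internal_flag\n"
    "1,Acme,Y\n"
    "2,Globex,N\n"
)

# Map of source field -> name to use in the loaded data set.
# Fields not listed here (e.g. internal_flag) are simply not read in.
FIELD_MAP = {"cust_id": "CustomerID", "cust_nm": "CustomerName"}


def load_selected(source, field_map):
    """Read rows, keeping only mapped fields and renaming them."""
    reader = csv.DictReader(source)
    return [
        {new: row[old] for old, new in field_map.items()}
        for row in reader
    ]


customers = load_selected(SOURCE, FIELD_MAP)
print(customers)
```

Keeping the mapping in one dictionary means the choice of which fields to read and what to call them lives in a single place.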

[Figure 2 labels: heterogeneous data sources, locations and formats (SFDC, SAP, ‘Big Data’, data warehouse, ‘standard’ databases, web pages, spreadsheets, other) reach QlikView through SFDC, QVX, SAP, ODBC and OLE DB connectors, either for in-memory data loading and modeling or for direct query (Direct Discovery).]




Loading and Modeling Data

QlikView’s primary method of conducting data analysis is to use its in-memory engine. Since

its inception in 1993, QlikView has used an in-memory approach to data analytics and has built on this technology for over 20 years to offer the best in-memory analytics solution in

the industry. In addition, QlikView introduced a direct query capability, Direct Discovery, to

allow a measure of direct data access to the underlying data systems.

Data is loaded by QlikView from the various source systems into the in-memory engine

via the Load Script. The Load Script is contained within a QlikView application and uses a

SQL-like language to connect to source systems and perform data modeling. The data is

loaded when the Load Script is executed. Using QlikView Publisher, data loading can

occur on a periodic basis and/or based on triggers.

Figure 3: Example QlikView Load Script




Once data is loaded into a QlikView application, it’s held in-memory. What this means is

that QlikView applications require one-time data source access to read in a dataset and

store that historical data. For new data (‘delta’ or ‘updated’ data), QlikView can simply

load this new data and append it to the historical data without having to do a full reload.

In addition, QlikView utilizes sophisticated algorithms to compress the data (sometimes

up to 90% from the size on disk in a database) to make optimal use of the in-memory

store. For more information, please see the blog post: http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers
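The idea of loading only the ‘delta’ and appending it to the historical data can be sketched as follows. This is a minimal Python illustration, not QlikView Load Script; the row layout, the `Modified` field and the `incremental_load` helper are assumptions made for the example, and the sketch handles newly inserted rows only (not updated or deleted ones).

```python
from datetime import date

# Historical data already held by the application (hypothetical rows).
historical = [
    {"OrderID": 1, "Modified": date(2013, 9, 1), "Amount": 100},
    {"OrderID": 2, "Modified": date(2013, 9, 2), "Amount": 250},
]

# Current content of the source system (hypothetical).
source = historical + [
    {"OrderID": 3, "Modified": date(2013, 9, 3), "Amount": 75},
]


def incremental_load(stored, source_rows):
    """Append only rows newer than the latest one already stored."""
    last_loaded = max(row["Modified"] for row in stored)
    delta = [row for row in source_rows if row["Modified"] > last_loaded]
    return stored + delta


updated = incremental_load(historical, source)
print(len(updated))
```

Only the rows past the last loaded timestamp are appended, so the cost of each refresh depends on the amount of new data rather than on the full history.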

Application developers also use the Load Script to model data from the various source

systems prior to inserting it into the in-memory engine. In reality, business intelligence tools

must cope with data that is incomplete, poorly labeled, or duplicated across multiple sources.

Linking data from different data sources requires the use of a key, but the same data can be

labeled in different ways across different sources (e.g. “Sales,” “Sales Revenue,” and “Sales

Numbers” might all be the same data – see Figure 4). QlikView can easily merge these

similar data fields from different tables into a single, consolidated view (e.g. converting the three “Sales” fields into a single field called “Sales $” – see Figure 4).

Figure 4: Renaming of fields from different sources
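As a rough illustration of the renaming shown in Figure 4, the sketch below maps three differently labeled fields onto one consolidated “Sales $” field. It is plain Python with made-up rows; the `ALIASES` table and the `normalize` helper are hypothetical and only mimic what a Load Script would do with field renaming.

```python
# Hypothetical extracts from three systems that label the same measure differently.
crm_rows = [{"Region": "North", "Sales": 120}]
erp_rows = [{"Region": "South", "Sales Revenue": 340}]
web_rows = [{"Region": "East", "Sales Numbers": 90}]

# Every source-specific label maps to one consolidated field name.
ALIASES = {
    "Sales": "Sales $",
    "Sales Revenue": "Sales $",
    "Sales Numbers": "Sales $",
}


def normalize(rows, aliases):
    """Rename aliased fields; leave all other fields untouched."""
    return [{aliases.get(k, k): v for k, v in row.items()} for row in rows]


consolidated = []
for rows in (crm_rows, erp_rows, web_rows):
    consolidated.extend(normalize(rows, ALIASES))

total_sales = sum(row["Sales $"] for row in consolidated)
print(total_sales)
```

Once every source uses the same field name, a single aggregation covers all three systems.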




A more subtle problem is slightly different data formats for the same underlying data. For

example, one data source might store dates in a single “YYYY-MM-DD” field while another

might have separate Year, Month, and Date fields. The application designer must be able to

consolidate all date fields into a single, representative view.
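The date consolidation described above might look like the following sketch, which converts both representations into one date value. It uses Python's standard `datetime` module; the two source layouts and the helper names are assumptions for illustration only.

```python
from datetime import date, datetime

# Source A stores the date in one "YYYY-MM-DD" text field (hypothetical rows).
source_a = [{"OrderID": 1, "OrderDate": "2013-09-15"}]

# Source B stores it as separate Year, Month and Date fields.
source_b = [{"OrderID": 2, "Year": 2013, "Month": 9, "Date": 16}]


def from_single_field(row):
    """Parse a "YYYY-MM-DD" string into a date."""
    return datetime.strptime(row["OrderDate"], "%Y-%m-%d").date()


def from_split_fields(row):
    """Build a date from separate Year, Month and Date fields."""
    return date(row["Year"], row["Month"], row["Date"])


# Both sources end up with the same single OrderDate field.
orders = [
    {"OrderID": r["OrderID"], "OrderDate": from_single_field(r)} for r in source_a
] + [
    {"OrderID": r["OrderID"], "OrderDate": from_split_fields(r)} for r in source_b
]
print(orders)
```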

The Load Script allows fields to be renamed, separated, joined, or otherwise manipulated.

For example, the developer can do table joins, or create a ‘Name’ field by combining ‘First

Name’ and ‘Last Name’ fields. Because QlikView reads data sources directly, it’s possible to

manipulate fields across multiple data sources. For example, the developer could conditionally read in

salesperson data (HR database) only where the salesperson has made a sale (Sales database).
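The two manipulations mentioned above, deriving a ‘Name’ field and conditionally reading HR rows based on the Sales data, can be sketched together. This is a Python illustration with hypothetical rows and field names, not the Load Script syntax a QlikView developer would actually write.

```python
# Hypothetical extracts: employees from an HR database, sales from a Sales database.
hr_employees = [
    {"EmployeeID": 1, "First Name": "Ann", "Last Name": "Lee"},
    {"EmployeeID": 2, "First Name": "Bob", "Last Name": "Ray"},
    {"EmployeeID": 3, "First Name": "Cy", "Last Name": "Fox"},
]
sales = [
    {"EmployeeID": 1, "Amount": 500},
    {"EmployeeID": 3, "Amount": 200},
]

# Keys of everyone who appears in the Sales data.
sellers = {row["EmployeeID"] for row in sales}

# Read in only employees with at least one sale, deriving 'Name' on the way.
sales_people = [
    {
        "EmployeeID": emp["EmployeeID"],
        "Name": f"{emp['First Name']} {emp['Last Name']}",
    }
    for emp in hr_employees
    if emp["EmployeeID"] in sellers
]
print(sales_people)
```

Building the set of seller keys first keeps the membership check cheap even when the HR extract is large.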

Figure 5: QlikView Data Model Viewer

QlikView provides a data model viewer (see Figure 5) that makes it easy to see the

associations that have been made within the engine as well as providing information about

the data such as density, field names, table names, and so on. It can also help find data model

problems, which can then be fixed in the scripting environment.

The QlikView engine provides a unique associative capability to the data that has been

loaded. This means that data that is sourced from multiple systems can be treated as a

single data entity within the engine for the purpose of analytics, regardless of where the

data came from. QlikView applies associations between the data from the various systems

by automatically mapping fields that have the same name and same data type. This allows

users to interrogate and make discoveries in their data as if it were a single table of data,

rather than data coming from a variety of disparate and unconnected systems. In Figure

5, one can see that automatic associations are made between the ‘Facts’ table and the

‘Employees’ table, for example, via the ‘EmployeeID’ field.
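To make the idea of name-based association concrete, the sketch below links two tables through whatever field names they share and then follows a selection from one table to the other. It is a deliberately simplified Python model, not QlikView's engine: the tables, the helper functions and the choice to match on field name alone (ignoring data type) are assumptions.

```python
# Hypothetical tables, each a list of rows keyed by field name.
tables = {
    "Facts": [
        {"EmployeeID": 1, "Amount": 500},
        {"EmployeeID": 2, "Amount": 300},
        {"EmployeeID": 1, "Amount": 150},
    ],
    "Employees": [
        {"EmployeeID": 1, "Name": "Ann Lee"},
        {"EmployeeID": 2, "Name": "Bob Ray"},
    ],
}


def shared_fields(left, right):
    """Fields present in both tables; these act as the association keys."""
    return set(left[0]) & set(right[0])


def associated_rows(tables, selected_table, field, value, other_table):
    """Rows of other_table linked to a selection made in selected_table."""
    keys = sorted(shared_fields(tables[selected_table], tables[other_table]))
    chosen = [r for r in tables[selected_table] if r[field] == value]
    wanted = {tuple(r[k] for k in keys) for r in chosen}
    return [
        r for r in tables[other_table]
        if tuple(r[k] for k in keys) in wanted
    ]


links = shared_fields(tables["Facts"], tables["Employees"])
ann_facts = associated_rows(tables, "Employees", "Name", "Ann Lee", "Facts")
print(links, sum(r["Amount"] for r in ann_facts))
```

Selecting an employee by name narrows the ‘Facts’ rows through the shared ‘EmployeeID’ field, which is the behavior the paragraph above describes for Figure 5.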




Provisioning Data

QlikView offers a set of file-based data persistence options. In fact, every QlikView

application (a “.qvw” file) itself contains all the data needed for the application. This data within the .qvw file on disk, which is binary encoded, represents the data that was loaded

during the previous execution of the Load Script. The Load Script is also contained within

the .qvw file, as is the entire presentation layer.

Larger deployments typically use a data staging layer. This is to a) provide atomic data

packages that are optimized for a particular analytic need (e.g. a ‘Finance’ package that

contains data from various Finance and Ops systems), and b) provide an optimized data

loading environment for QlikView. QlikView developers can create a “.qvd” file which is an

optimized QlikView data file that can be loaded rapidly into a “.qvw” application.

Typical deployments of QlikView include a “QVD Layer” containing a number of .qvd files

(e.g. a Finance QVD, Sales QVD, Q1 QVD and so on) that application developers can use

off the shelf to build their own specific QlikView applications and promote the reuse of

consistent data across many QlikView apps. See Figure 6.

Figure 6: Example QVD Layer
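The staging pattern behind a QVD layer (extract once, persist a data package, let several applications reuse it) can be sketched in Python. The .qvd format itself is QlikView-specific, so JSON files in a temporary directory stand in for it here; the `stage` and `load_staged` helpers and the ‘finance’ package are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# A temporary directory stands in for the shared staging area.
staging_dir = Path(tempfile.mkdtemp())


def stage(name, rows):
    """Extract once: persist a data package for later reuse."""
    path = staging_dir / f"{name}.json"
    path.write_text(json.dumps(rows))
    return path


def load_staged(name):
    """Applications read the staged package rather than the source system."""
    return json.loads((staging_dir / f"{name}.json").read_text())


# Hypothetical 'Finance' package built from finance and operations data.
stage("finance", [
    {"Account": "Revenue", "Amount": 1000},
    {"Account": "Costs", "Amount": -400},
])

# Two separate applications reuse the same staged data.
profit_app = sum(r["Amount"] for r in load_staged("finance"))
revenue_app = [r for r in load_staged("finance") if r["Account"] == "Revenue"]
print(profit_app, revenue_app)
```

Because both applications read the same staged package, they work from identical figures without each querying the source systems.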




Using The Data for Analytics

Once the data is loaded into the in-memory associative engine, a wide variety of powerful,

real-time analytics capabilities become available. This is because of the rapid and highly flexible nature of QlikView’s in-memory technology. Developers can create sophisticated

analytics applications that give business users a very rich set of analysis capabilities and allow

business users to conduct their own analysis and interrogate their data the way they wish.

Using the Expression Language that is accessed via most visualization objects in QlikView,

the in-memory data can be dynamically aggregated, manipulated, and compared on-the-fly.

New dimensions that did not previously exist in the data model can be calculated on-the-fly.

New hierarchies can be defined, and different groupings (or sets) of data can be

isolated for the purpose of comparative analytics.
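The capabilities listed above (on-the-fly aggregation, calculated dimensions and isolated sets for comparison) can be approximated in a few lines of Python. This is a sketch of the concepts only, not the QlikView Expression Language; the rows, the `aggregate` helper and the `size_band` dimension are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory rows.
rows = [
    {"Year": 2012, "Region": "North", "Sales": 100},
    {"Year": 2012, "Region": "South", "Sales": 80},
    {"Year": 2013, "Region": "North", "Sales": 130},
    {"Year": 2013, "Region": "South", "Sales": 60},
]


def aggregate(rows, dimension, measure, where=lambda r: True):
    """Sum `measure` per value of `dimension`, computed on demand."""
    totals = defaultdict(int)
    for r in rows:
        if where(r):
            totals[dimension(r)] += r[measure]
    return dict(totals)


def size_band(row):
    """A calculated dimension: it does not exist as a field in the data."""
    return "Large" if row["Sales"] >= 100 else "Small"


def region(row):
    return row["Region"]


by_band = aggregate(rows, size_band, "Sales")

# Two sets of the same data isolated for comparison.
sales_2012 = aggregate(rows, region, "Sales", lambda r: r["Year"] == 2012)
sales_2013 = aggregate(rows, region, "Sales", lambda r: r["Year"] == 2013)
growth = {name: sales_2013[name] - sales_2012[name] for name in sales_2012}
print(by_band, growth)
```

Nothing here is pre-aggregated: each result is computed from the detail rows at the moment it is requested, which is the property the section emphasizes.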




There has been a lot of discussion in the marketplace about ‘in-memory:’

The term in-memory really doesn’t even begin to paint the full picture for

someone about what analytics capabilities are available in a product. People

should investigate what exactly they are getting when they acquire an in-

memory solution. With QlikView, it is the ability to use in-memory technology

to do on-demand calculation (i.e. nothing needs to be pre-calculated or

pre-aggregated) across an entire multi-table data model, in a completely

associative manner that makes QlikView truly unique in this regard.

For a more in-depth understanding of how QlikView works under the

covers, see the blog post at: http://community.qlikview.com/blogs/qlikviewdesignblog/2013/07/15/logical-inference-and-aggregations

The Expression environment contains hundreds of functions that developers

can utilize to build dynamic and highly relevant apps. These functions are

grouped (see Figure 7) and cover topics such as Aggregation, Financial,

Mapping, Number Interpolation and so on.

Figure 7: Categories of the hundreds of functions available in the Expression Language

With the in-memory analytics engine, QlikView apps can be built to do the following:

• Calculated Dimensions

• Aggregations on-the-fly (e.g. statistics)

• Hierarchies on-the-fly

• Set Analysis

• Comparative Analysis

• Conditional Display




Data Governance and Security

How can you know that the sales revenue figure used by the accounting department

is the same as that used by sales and marketing? How can you be sure that numbers

are calculated the same way across different applications? This problem is given more urgency and importance by regulatory and reporting laws that require traceability. Ensuring

consistency and accountability is the essence of data governance.

The QlikView Governance Dashboard and QlikView Expressor provide data governance

and centralized, controlled data provisioning for QlikView applications, respectively. The

Governance Dashboard provides a comprehensive view of the data that flows into QlikView,

how the data is manipulated, and who is using what and when. QlikView Expressor allows

for the provisioning of consistent and traceable rules for calculating business quantities

such as sales revenue, employee costs, and profit. Data stewards use QlikView Expressor

to provide common business rule definitions across a QlikView deployment.

Security is about controlling who has access to what data. All QlikView deployments require authentication, which is handled via Integrated Windows Authentication or a 3rd party Single

Sign-On solution. Once the user’s identity is established, there is the issue of authorization

to access different data sets. Authorization can be set at the application level, the application section

level, the row level and the individual data element level. QlikView uses a number of industry

standard and proprietary technologies to provide detailed control over what data users

can see. In a QlikView system, all communications between the client and the server use

either HTTPS or the QlikView proprietary QVP protocol and no ports are opened between

the client and the server. For more information, please reference the QlikView Security

Overview Technology White Paper.
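Row-level authorization, one of the levels mentioned above, amounts to filtering the data by what an authenticated user is permitted to see. The Python sketch below shows that idea only; it does not reproduce QlikView's actual security mechanism, and the `ACCESS` table, the user names and the `visible_rows` helper are hypothetical.

```python
# Hypothetical access table: Region values each authenticated user may see.
ACCESS = {
    "alice": {"North", "South"},
    "bob": {"South"},
}

sales_rows = [
    {"Region": "North", "Sales": 100},
    {"Region": "South", "Sales": 80},
    {"Region": "East", "Sales": 40},
]


def visible_rows(user, rows, access):
    """Return only the rows the user is authorized to see."""
    allowed = access.get(user, set())  # unknown users are authorized for nothing
    return [row for row in rows if row["Region"] in allowed]


bob_view = visible_rows("bob", sales_rows, ACCESS)
guest_view = visible_rows("guest", sales_rows, ACCESS)
print(bob_view, guest_view)
```

Defaulting unknown users to an empty set means a missing entry denies access rather than granting it.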




Learn More

QlikView Architectural Overview

QlikView Governance Overview

QlikView Security Overview

QlikView Design Blog Post: Logical Inference and Aggregations

QlikView Design Blog Post: Symbol Tables and Bit Stuffed Pointers

© 2013 QlikTech International AB. All rights reserved. QlikTech, QlikView, Qlik, Q, Simplifying Analysis for Everyone, Power of Simplicity, New Rules, The Uncontrollable Smile and other QlikTech products and services as well as their respective logos are trademarks or registered trademarks of QlikTech International AB. All other company names, products and services used herein are trademarks or registered trademarks of their respective owners. The information published herein is subject to change without notice. This publication is for informational purposes only, without representation or warranty of any kind, and QlikTech shall not be liable for errors or omissions with respect to this publication. The only warranties for QlikTech products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting any additional warranty.
