technical brief qlikview data flows en

  • 8/20/2019 Technical Brief QlikView Data Flows En

    1/13

    qlikview.com

    September 2013

    QLIKVIEW DATA FLOWS

    TECHNICAL BRIEF

    A QlikView Technical Brief

    Table of Contents

    Introduction

    Overview

    Data Sourcing

    Loading and Modeling Data

    Provisioning Data

    Using The Data for Analytics

    Data Governance and Security

    Learn More

    Introduction

    Business Discovery relies on the connection, transformation, distribution and, ultimately,

    analysis of data. This paper provides an introductory overview of the data flows through a

    typical QlikView deployment and describes the role of individual systems. We explain how

    data is sourced from multiple, heterogeneous sources, how it is manipulated to make it

    consistent and logical, and how it is distributed where users can interact with the QlikView

    applications.

    Overview

    There are four main systems involved in building a QlikView enterprise system: QlikView

    Desktop, QlikView Publisher, QlikView Server, and Clients. To understand the data flow, we

    need to understand the role of these systems and where they are situated in the system

    architecture (see Figure 1).

    Figure 1 - QlikView Architectural Overview

    QlikView Desktop: This is the main tool for creating QlikView applications. The

    application designer uses this tool to specify where data is sourced, how it is manipulated,

    and how it is displayed. The application presentation is handled by Clients, but application

    data processing is managed by the Server.

    Clients: This is where users use the QlikView application to view and interact with data.

    The application can be part of a standalone executable or part of a web page. The client

    side of applications is designed to consume few computational resources.

    QlikView Server: This system serves applications and data to clients, performs application

    calculations, and manages security.

    QlikView Publisher: This system provides a means of controlling how the data used by

    applications is updated.

    The sections of this paper will follow how data is sourced, loaded and modeled, provisioned,

    and then analyzed in QlikView. Additionally, discussion of how data is governed and secured

    is included.

    [Figure: Data Sourcing → Loading & Modeling Data → Provisioning Data → Data for Analytics → Governance and Security]

    Data Sourcing

    QlikView extracts data from multiple, heterogeneous sources (e.g. databases, spreadsheets,

    web pages and ‘Big Data’ sources such as Hadoop and Google BigQuery) and creates a

    homogeneous data set suitable for analysis and visualization.

    Figure 2: Loading data from multiple sources into QlikView

    Data is sourced from a multitude of systems, from standard ODBC, OLE DB and JDBC

    data stores (such as Oracle), spreadsheets and web pages (HTML, XML, etc.) to systems

    that require custom connectors (such as Salesforce.com, SAP and Google BigQuery). For

    most data sources, a connection is made using a wizard that simplifies the connection

    process and allows the application designer control over how data is read. For example, the

    designer can choose not to read in certain fields or to rename them. The presence of a data

    warehouse is not required, although if it already exists, it is easily leveraged by QlikView.

    Accessing Hadoop-based Big Data systems is straightforward too. Where an ODBC or

    JDBC driver does not exist, QlikView has an open standard data exchange protocol (called

    QVX) that can be used to build custom connectors to data sources that do not offer

    standard connectivity.
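
    The designer's control over which fields are read, and under what names, can be illustrated with a minimal Python sketch. It is not QlikView script: the sample data, the field names and the helper `read_selected` are all hypothetical, and a real connection would go through one of the connectors named above rather than an in-line string.

```python
import csv
import io

# Hypothetical source extract; in practice this would come from a file or database.
SOURCE_CSV = """cust_id,cust_nm,internal_flag,country
1,Acme,Y,SE
2,Globex,N,US
"""

# Fields to keep, mapped to the names the application should use.
# 'internal_flag' is deliberately left out, so it never reaches the result.
FIELD_MAP = {"cust_id": "CustomerID", "cust_nm": "CustomerName", "country": "Country"}


def read_selected(text, field_map):
    """Read CSV text, keeping only the mapped fields and renaming them."""
    reader = csv.DictReader(io.StringIO(text))
    return [{new: row[old] for old, new in field_map.items()} for row in reader]


customers = read_selected(SOURCE_CSV, FIELD_MAP)
print(customers)
```

    Keeping the selection and renaming in one mapping makes it easy to see at a glance which source fields an application depends on.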

    [Figure 2 labels: SFDC, SAP, ‘Big Data’, data warehouse, ‘standard’ databases, web pages, spreadsheets and other sources (heterogeneous data sources, locations and formats) reach QlikView through SFDC, QVX, SAP, ODBC and OLE DB connectors, via in-memory data loading and modeling or direct query (Direct Discovery).]

    Loading and Modeling Data

    QlikView’s primary method of conducting data analysis is to use its in-memory engine. Since

    its inception in 1993, QlikView has used an in-memory approach to data analytics and, for

    over 20 years, has built on this technology to offer the best in-memory analytics solution in

    the industry. In addition, QlikView introduced a direct query capability, Direct Discovery, to

    allow a measure of direct data access to the underlying data systems.

    Data is loaded by QlikView from the various source systems into the in-memory engine

    via the Load Script. The Load Script is contained within a QlikView application and uses a

    SQL-like language to connect to source systems and perform data modeling. Data is

    loaded when the Load Script is executed. Using QlikView Publisher, data loading can

    occur on a periodic basis and/or based on triggers.

    Figure 3: Example QlikView Load Script

    Once data is loaded into a QlikView application, it is held in memory. This means that a

    QlikView application needs to read the full historical dataset from the data source only

    once. For new data (‘delta’ or ‘updated’ data), QlikView can simply load the new data and

    append it to the historical data without having to do a full reload.

    In addition, QlikView utilizes sophisticated algorithms to compress the data (sometimes

    up to 90% from the size on disk in a database) to make optimal use of the in-memory

    store. For more information, please see the blog post:

    http://community.qlikview.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers
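
    The delta-load idea described above (read the history once, then append only new or updated rows) can be sketched in Python as follows. This is an illustration under stated assumptions, not QlikView's implementation: the function `incremental_load`, the `id` key and the `modified` timestamp field are hypothetical, and timestamps are assumed to be ISO-formatted strings so that they compare correctly as text.

```python
def incremental_load(history, source_rows, key="id", stamp="modified"):
    """Return history plus only those source rows newer than anything already held."""
    # Latest timestamp already stored; an empty history means everything is new.
    last_seen = max((row[stamp] for row in history), default="")
    delta = [row for row in source_rows if row[stamp] > last_seen]
    # Updated rows replace their older versions; brand-new rows are appended.
    delta_keys = {row[key] for row in delta}
    kept = [row for row in history if row[key] not in delta_keys]
    return kept + delta


history = [
    {"id": 1, "modified": "2013-01-01", "amount": 10},
    {"id": 2, "modified": "2013-01-02", "amount": 20},
]
source = history + [
    {"id": 2, "modified": "2013-02-01", "amount": 25},  # updated row
    {"id": 3, "modified": "2013-02-02", "amount": 30},  # new row
]
refreshed = incremental_load(history, source)
print(refreshed)
```

    Only the two rows stamped after the last stored timestamp are taken from the source; the untouched historical row is reused as-is.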

    Application developers also use the Load Script to model data from the various source

    systems prior to inserting it into the in-memory engine. In reality, business intelligence tools

    must cope with data that is incomplete, poorly labeled, or duplicated across multiple sources.

    Linking data from different data sources requires the use of a key, but the same data can be

    labeled in different ways across different sources (e.g. “Sales,” “Sales Revenue,” and “Sales

    Numbers” might all be the same data – see Figure 4). QlikView can easily merge these

    similar data fields from different tables into a single, consolidated view (e.g. converting the

    three “Sales” fields into a single field called “Sales $” – see Figure 4).

    Figure 4: Renaming of fields from different sources
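
    As a rough illustration of the renaming shown in Figure 4, the Python sketch below maps the three differently labeled sales fields onto a single “Sales $” field. The field labels come from the text; the source names (`crm`, `erp`, `web`), the sample values and the helper `harmonize` are assumptions made for the example.

```python
# Each source labels the same measure differently; all map to one target name.
ALIASES = {"Sales": "Sales $", "Sales Revenue": "Sales $", "Sales Numbers": "Sales $"}


def harmonize(rows, aliases):
    """Rename aliased fields to their common name; other fields pass through unchanged."""
    return [{aliases.get(name, name): value for name, value in row.items()} for row in rows]


# Hypothetical extracts from three systems.
crm = [{"Region": "North", "Sales": 100}]
erp = [{"Region": "South", "Sales Revenue": 250}]
web = [{"Region": "West", "Sales Numbers": 75}]

combined = harmonize(crm + erp + web, ALIASES)
total = sum(row["Sales $"] for row in combined)
print(combined, total)
```

    Once every source uses the same field name, the rows can be analyzed together, for example by summing “Sales $” across all three systems.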

    A more subtle problem is slightly different data formats for the same underlying data. For

    example, one data source might store dates in a single “YYYY-MM-DD” field while another

    might have separate Year, Month, and Date fields. The application designer must be able to

    consolidate all date fields into a single, representative view.
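
    A minimal Python sketch of this consolidation is given below. The separate Year, Month and Date field names follow the text; the single-field name `OrderDate` and the two helper functions are hypothetical.

```python
from datetime import date


def from_iso(row):
    """Source A: a single 'YYYY-MM-DD' string field (hypothetical name 'OrderDate')."""
    return date.fromisoformat(row["OrderDate"])


def from_parts(row):
    """Source B: separate Year, Month and Date fields, here held as strings."""
    return date(int(row["Year"]), int(row["Month"]), int(row["Date"]))


a = from_iso({"OrderDate": "2013-09-05"})
b = from_parts({"Year": "2013", "Month": "9", "Date": "5"})
print(a, b, a == b)
```

    Converting both shapes to one date type means records from either source can be compared, sorted and grouped on the same field.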

    The Load Script allows fields to be renamed, separated, joined, or otherwise manipulated.

    For example, the developer can do table joins, or create a ‘Name’ field by combining ‘First

    Name’ and ‘Last Name’ fields. Because QlikView reads data sources directly, it’s possible to

    manipulate fields across multiple data sources. For example, the developer could conditionally

    read in salesperson data (HR database) only where the salesperson has made a sale (Sales database).
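
    The two manipulations mentioned above, building a ‘Name’ field and conditionally reading HR records based on the Sales data, might look like this in a Python sketch. The sample rows and the `EmployeeID` values are invented for illustration; this shows the logic only, not Load Script syntax.

```python
# Hypothetical extract from the HR database.
hr_employees = [
    {"EmployeeID": 1, "First Name": "Ann", "Last Name": "Berg"},
    {"EmployeeID": 2, "First Name": "Bo", "Last Name": "Carlsson"},
    {"EmployeeID": 3, "First Name": "Cia", "Last Name": "Dahl"},
]

# Hypothetical extract from the Sales database.
sales = [
    {"EmployeeID": 1, "Amount": 500},
    {"EmployeeID": 3, "Amount": 120},
    {"EmployeeID": 1, "Amount": 80},
]

# IDs of employees who appear in at least one sale.
sellers = {row["EmployeeID"] for row in sales}

# Keep only those employees, and combine the two name fields into one.
sales_people = [
    {"EmployeeID": e["EmployeeID"], "Name": f'{e["First Name"]} {e["Last Name"]}'}
    for e in hr_employees
    if e["EmployeeID"] in sellers
]
print(sales_people)
```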

    Figure 5: QlikView Data Model Viewer

    QlikView provides a data model viewer (see Figure 5) that makes it easy to see the

    associations that have been made within the engine as well as providing information about

    the data such as density, field names, table names, and so on. It can also help find data model

    problems, which can then be fixed in the scripting environment.

    The QlikView engine provides a unique associative capability to the data that has been

    loaded. This means that data that is sourced from multiple systems can be treated as a

    single data entity within the engine for the purpose of analytics, regardless of where the

    data came from. QlikView applies associations between the data from the various systems

    by automatically mapping fields that have the same name and same data type. This allows

    users to interrogate and make discoveries in their data as if it were a single table of data,

    rather than data coming from a variety of disparate and unconnected systems. In Figure

    5, one can see that automatic associations are made between the ‘Facts’ table and the

    ‘Employees’ table, for example, via the ‘EmployeeID’ field.
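
    The name-based association rule can be sketched in Python as a search for field names shared between tables. This is a simplification under stated assumptions: it matches on field name only and ignores data type, the ‘Products’ table and all fields other than ‘EmployeeID’ are hypothetical, and the function `find_associations` is not part of any QlikView API.

```python
from itertools import combinations

# Table name -> field names. 'Facts', 'Employees' and 'EmployeeID' follow Figure 5;
# everything else is made up for the example.
tables = {
    "Facts": ["EmployeeID", "ProductID", "Amount"],
    "Employees": ["EmployeeID", "Name", "Office"],
    "Products": ["ProductID", "ProductName"],
}


def find_associations(tables):
    """Return {(table_a, table_b): [shared field names]} for every linked pair."""
    links = {}
    for (a, fields_a), (b, fields_b) in combinations(tables.items(), 2):
        shared = sorted(set(fields_a) & set(fields_b))
        if shared:
            links[(a, b)] = shared
    return links


associations = find_associations(tables)
print(associations)
```

    Tables with no field name in common, such as ‘Employees’ and ‘Products’ here, are not linked directly; they are related only through ‘Facts’.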

    Provisioning Data

    QlikView offers a set of file-based data persistence options. In fact, every QlikView

    application (a “.qvw” file) itself contains all the data needed for the application. This data

    within the .qvw file on disk, which is binary encoded, represents the data that was loaded

    during the previous execution of the Load Script. The Load Script is also contained within

    the .qvw file, as is the entire presentation layer.

    Larger deployments typically use a data staging layer. This is to a) provide atomic data

    packages that are optimized for a particular analytic need (e.g. a ‘Finance’ package that

    contains data from various Finance and Ops systems), and b) provide an optimized data

    loading environment for QlikView. QlikView developers can create a “.qvd” file which is an

    optimized QlikView data file that can be loaded rapidly into a “.qvw” application.

    Typical deployments of QlikView include a “QVD Layer” containing a number of .qvd files

    (e.g. a Finance QVD, Sales QVD, Q1 QVD and so on) that application developers can use

    off the shelf to build their own specific QlikView applications and promote the reuse of

    consistent data across many QlikView apps. See Figure 6.

    Figure 6: Example QVD Layer
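
    The staging pattern behind a QVD Layer (extract once, then let several applications load the same staged file) can be sketched in Python. JSON files in a temporary directory stand in for .qvd files here purely for illustration; the real .qvd format is different, and the helpers `stage` and `load_staged` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def stage(name, rows, layer):
    """Write one extract to the staging layer as a reusable file."""
    path = layer / f"{name}.json"
    path.write_text(json.dumps(rows))
    return path


def load_staged(name, layer):
    """Load a staged extract without touching the original source system."""
    return json.loads((layer / f"{name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    layer = Path(tmp)
    # Extract step: run once per refresh, per subject area.
    stage("finance", [{"Account": "4000", "Amount": 1200}], layer)
    stage("sales", [{"Region": "North", "Amount": 300}], layer)
    # Two different applications reuse the same staged data.
    app_a = load_staged("finance", layer)
    app_b = load_staged("finance", layer) + load_staged("sales", layer)

print(app_a, app_b)
```

    Because both applications read the same staged finance file, they are guaranteed to see the same finance data, which is the reuse benefit the QVD Layer is meant to provide.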

    Using The Data for Analytics

    Once the data is loaded into the in-memory associative engine, a wide variety of powerful,

    real-time analytics capabilities becomes available. This is because of the rapid and highly

    flexible nature of QlikView’s in-memory technology. Developers can create sophisticated

    analytics applications that give business users a very rich set of analysis capabilities and allow

    business users to conduct their own analysis and interrogate their data the way they wish.

    Using the Expression Language that is accessed via most visualization objects in QlikView,

    the in-memory data can be dynamically aggregated, manipulated, and compared on-the-fly.

    New dimensions can be calculated on-the-fly that were not previously in existence in the

    data model. New hierarchies can be defined, and different groupings (or sets) of data can be

    isolated for the purpose of comparative analytics.
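
    To make these ideas concrete, the Python sketch below aggregates in-memory rows on demand, derives a dimension that does not exist in the data, and compares two sets of rows. It only mimics the concepts; it is not the QlikView Expression Language, and the sample data, the `aggregate` helper and the ‘Large’/‘Small’ banding rule are assumptions.

```python
from collections import defaultdict

# Hypothetical in-memory rows.
rows = [
    {"Year": 2012, "Region": "North", "Amount": 100},
    {"Year": 2012, "Region": "South", "Amount": 150},
    {"Year": 2013, "Region": "North", "Amount": 130},
    {"Year": 2013, "Region": "South", "Amount": 90},
]


def aggregate(rows, dimension, measure="Amount", where=lambda r: True):
    """Sum `measure` per value of `dimension`, over the rows selected by `where`."""
    totals = defaultdict(int)
    for r in rows:
        if where(r):
            totals[dimension(r)] += r[measure]
    return dict(totals)


# Calculated dimension: a size band that is not stored anywhere in the data.
by_band = aggregate(rows, lambda r: "Large" if r["Amount"] >= 100 else "Small")

# Two sets of data isolated for comparison: 2012 versus 2013, by region.
sales_2012 = aggregate(rows, lambda r: r["Region"], where=lambda r: r["Year"] == 2012)
sales_2013 = aggregate(rows, lambda r: r["Region"], where=lambda r: r["Year"] == 2013)
growth = {region: sales_2013[region] - sales_2012[region] for region in sales_2012}
print(by_band, growth)
```

    Nothing is pre-aggregated: each result is computed from the detail rows at the moment it is requested, so a new dimension or comparison needs no change to the stored data.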

    There has been a lot of discussion in the marketplace about ‘in-memory:’

    The term in-memory really doesn’t even begin to paint the full picture for

    someone about what analytics capabilities are available in a product. People

    should investigate what exactly they are getting when they acquire an in-

    memory solution. With QlikView, it is the ability to use in-memory technology

    to do on-demand calculation (i.e. nothing needs to be pre-calculated or

    pre-aggregated) across an entire multi-table data model, in a completely

    associative manner that makes QlikView truly unique in this regard.

    For a more in-depth understanding of how QlikView works under the

    covers, see the blog post at:

    http://community.qlikview.com/blogs/qlikviewdesignblog/2013/07/15/logical-inference-and-aggregations

    The Expression environment contains hundreds of functions that developers

    can utilize to build dynamic and highly relevant apps. These functions are

    grouped (see Figure 7) and cover topics such as Aggregation, Financial,

    Mapping, Number Interpolation and so on.

    Figure 7: Categories of the hundreds of functions available in

    the Expression Language

    With the in-memory analytics engine, QlikView apps can be built to provide the following:

    • Calculated Dimensions

    • Aggregations on-the-fly (e.g. statistics)

    • Hierarchies on-the-fly

    • Set Analysis

    • Comparative Analysis

    • Conditional Display

    Data Governance and Security

    How can you know that the sales revenue figure used by the accounting department

    is the same as that used by sales and marketing? How can you be sure that numbers

    are calculated the same way across different applications? This problem is given more

    urgency and importance by regulatory and reporting laws that require traceability. Ensuring

    consistency and accountability is the essence of data governance.

    The QlikView Governance Dashboard and QlikView Expressor provide data governance

    and centralized, controlled data provisioning for QlikView applications, respectively. The

    Governance Dashboard provides a comprehensive view of the data flowing into QlikView,

    how the data is manipulated, and who is using what and when. QlikView Expressor allows

    for the provisioning of consistent and traceable rules for calculating business quantities

    such as sales revenue, employee costs, and profit. Data stewards use QlikView Expressor

    to provide common business rule definitions across a QlikView deployment.

    Security is about controlling who has access to what data. All QlikView deployments require

    authentication, which is handled via Integrated Windows Authentication or a third-party Single

    Sign-On solution. Once the user’s identity is established, there is the issue of authorization

    to access different data sets. Authorization can be set at the application, application section,

    row and individual data element levels. QlikView uses a number of industry

    standard and proprietary technologies to provide detailed control over what data users

    can see. In a QlikView system, all communications between the client and the server use

    either HTTPS or the QlikView proprietary QVP protocol and no ports are opened between

    the client and the server. For more information, please reference the QlikView Security

    Overview Technology White Paper.
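
    Row-level authorization, one of the levels listed above, can be illustrated with a small Python sketch that filters rows by an already authenticated user's permissions. It is a conceptual example only and does not represent QlikView's own security mechanism: the `ACCESS` table, the user names and the `visible_rows` helper are hypothetical.

```python
# Hypothetical access table: user -> regions that user may see.
ACCESS = {"anna": {"North"}, "bjorn": {"North", "South"}}

rows = [
    {"Region": "North", "Amount": 100},
    {"Region": "South", "Amount": 150},
    {"Region": "West", "Amount": 75},
]


def visible_rows(user, rows, access=ACCESS, field="Region"):
    """Return only the rows the user is authorized to see; unknown users see nothing."""
    allowed = access.get(user, set())
    return [r for r in rows if r[field] in allowed]


print(visible_rows("anna", rows))
print(visible_rows("bjorn", rows))
```

    Defaulting to an empty permission set means a user missing from the access table sees no rows rather than all of them.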

    Learn More

    QlikView Architectural Overview

    QlikView Governance Overview

    QlikView Security Overview

    QlikView Design Blog Post: Logical Inference and Aggregations

    QlikView Design Blog Post: Symbol Tables and Bit Stuffed Pointers

    © 2013 QlikTech International AB. All rights reserved. QlikTech, QlikView, Qlik, Q, Simplifying Analysis for Everyone, Power of Simplicity, New Rules, The Uncontrollable Smile and other QlikTech products and services as well as their respective logos are trademarks or registered trademarks of QlikTech International AB. All other company names, products and services used herein are trademarks or registered trademarks of their respective owners. The information published herein is subject to change without notice. This publication is for informational purposes only, without representation or warranty of any kind, and QlikTech shall not be liable for errors or omissions with respect to this publication. The only warranties for QlikTech products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting any additional warranty.