david freriks principal solution architect · big data ecosystem –much more than just hadoop big...

Big Data & QlikView

Democratizing Big Data Analytics

David Freriks – Principal Solution Architect

TDWI – Vancouver Agenda

– What really is Big Data?

– How do we separate hype from reality?

– How does that relate to actually finding useful business information?

– Why is Qlik unique in leading the industry in delivering Big Data solutions?

– Demo



• Most people think of Hadoop….



A Brief History of Hadoop

Timeline axis: 2005, 2008, 2011, 2013

• Google releases a paper on GFS; Nutch, a distributed search platform, builds on its ideas

• Cutting joins Yahoo; estimates a billion-page index will cost $500k plus $30k/month to support

• Hadoop promoted to top-level Apache project; predictive search index creation time reduced from 12 days to 8 hours

• A 1,400-node Yahoo cluster sorts 500 GB in 59 seconds; Cloudera launches

• Yahoo spins remaining Hadoop folks out into Hortonworks

• 3rd Hadoop World conference attracts 2,300 developers, up from 275 in 2010

• Cloudera adds real-time search, based on Lucene, also created by Cutting

• HDFS: Hadoop Distributed File System

• MapReduce: Processing framework for writing scalable data applications

• Pig: Procedural language that abstracts lower-level MapReduce

• Zookeeper: Highly reliable distributed coordination

• Hive: System for querying data on top of HDFS (SQL-like query)

• HBase: Database for random, real-time read/write access

• Mahout: Scalable machine learning libraries

• Spark: In-memory large-scale data processing, up to 100x faster than Hadoop MapReduce

• Shark: SQL engine on top of Spark

• Cassandra: Scalable multi-master database with no single points of failure

And on, and on…

Hadoop

Example Apache Hadoop or Next-Gen Components
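To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is plain Python, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group emitted values by key, as a framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine every value seen for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input lines, for illustration only.
counts = word_count(["big data", "fast data", "Big Data"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on many nodes over data stored in HDFS; the sketch only shows the shape of the computation.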

Big Data: Expanding on 3 fronts

• Data Velocity: Batch, Periodic, Near Real Time, Real Time

• Data Volume: MB, GB, TB, PB

• Data Variety: Table, Database, Web, XML, Audio, Social, Video

What is “Big Data”?

• Big Data is: Nebulous

• Big Data is: Really Big or Not

• Big Data is: Mostly Useless Noise

• Big Data is: Slow

• Big Data is: Difficult

Big Data Ecosystem – Much More Than Just Hadoop

Big Insights & Streams

Big Data Appliance

HANA

Open source Distributed Processing Frameworks

Big Data Analytic Appliances

Massively Parallel Processing Platforms

Big data Integration

Packaged Mapreduce platforms

Data Visualization, Statistical & In-memory Analytics


Splunk >

Who / What / Why

• Telecom

– What: Usage and Location Analysis, Call Detail Records (CDRs), Next Product to Buy (NPTB), Real-time Bandwidth Allocation

– Why: Operational Excellence, Customer Retention, Profitability

• Financial Services

– What: New Account Risk Screens, Fraud Detection, Trading Risk, Real-Time P&L, Portfolio Analysis

– Why: Improve Profit, Minimize Risk

• Utilities

– What: Smart Metering Analysis

– Why: Operational Excellence

• Retail

– What: 360° Customer View, Brand Sentiment Analysis, Up Sell/Cross Sell, Clickstream Analysis

– Why: Increase Revenues, Customer Loyalty, Brand Awareness

• Manufacturing

– What: Supply Chain & Logistics, Assembly Line QA, Proactive Maintenance

– Why: Operational Excellence, Profitability

Source: Gartner “50 Real World Examples of Big Data and Analytics”, 2013

Some uses of Big Data today





• You need to have Ga-zinga-bytes of data to deploy a Big Data solution

– Typical Cloudera Cluster is 15-20 nodes, < 10TB of data

– Hadoop storage is roughly 3 to 4 times cheaper than an EDW

• Hadoop is all you need

– Hadoop is an enabling technology that provides the foundation for Big Data solutions

– Focus today is on data management

• The RDBMS is dead

– RDBMS is still critical – but not for high volume, low quality analytics

• QlikView can’t handle Big Data

– The reality is that a human can’t handle Big Data

– It’s all about the use case

Popular “Big Data” Myths

• Big Data is rapidly shifting from how much data you can handle to how quickly you can deliver value

– Volume of data is just one factor, and a less and less critical one

– Context is key and difficult to pinpoint

• Big Data:

– Hadoop is designed to support petabytes and beyond

• Fast Data:

– Teradata, SAP HANA, Netezza, HBase, MongoDB, ParStream, etc.

• Big Data is slow & cheap, Fast Data is neither

• A Big Data Solution requires components that address both

– Hadoop is the data system that combines the Fast and Big platforms

– QlikView is the platform that supports both scenarios simultaneously

Big Data vs. Fast Data vs. Right Data

Where Big Data fits today: The new BI architecture

[Figure labels: Unstructured/semi-structured data; Web data; Docs & text data; Audio/video data; Structured data; Machine data; Operational systems; Data Accelerator?; Data Warehouse?; Big Data Repository]

“Many organizations lack the skills required to exploit big data.”

“Most of these skills are in short supply and rare in the market at large.”

“Data science encompasses hard skills.”

Big Data comes with big challenges

The Big Data bottleneck

Reports

Data Scientists

Business Users

Source: Gartner Big Data Hype Cycle Report 2013


“Organizations have trouble finding qualified professionals to manage big data and providing training to those already on board.”

Big Data comes with big challenges

Source: Ventana Research, The Challenge of Big Data Benchmark Research, November 2013

Obstacles to Big Data Analytics

Organizations are challenged in staffing and training

• Staffing: 79%

• Training: 77%

• Real-Time: 67%

• License Cost: 64%

• Integration: 64%









[Figure labels: Operational systems; Machine data, web data, cloud data; Hadoop cluster; Data warehouse; Google BigQuery]

Insight Comes from Data, in Context

Big Data Business Needs

DATA: Clinical, Claims, Monitoring, others

• Descriptive Analytics: How are we doing?

– Example: How many claims did we pay today?

• Predictive Analytics: What might happen in the future?

– Example: Which of tomorrow’s claims might be requesting an Emergency Room (ER) admission?

• Prescriptive Analytics: Best course of action given objectives, requirements & constraints

– Example: What would be effective steps to reduce probability of ER admission?









Who are we - QlikView

• What Is QlikView?

– QlikView is a Business Discovery platform: user-driven BI supporting the creation and consumption of dynamic apps for analyzing information

– QlikView apps allow non-technical users to explore visual views of information and ask streams of questions, through simple interactions such as clicks and taps

– QlikView’s patented software engine dynamically calculates new views of information, instantly, based on user selections

QlikView - A New Kind of Software Company

• Leader in Business Discovery – user-driven BI

• 28,000+ customers in 100 countries

• 1,500 global partners

• 1,500 employees across 28 offices in 23 countries

• No. 1 fastest-growing enterprise technology company (ZDNet)

• Gartner Magic Quadrant Leader for 3 consecutive years

Broad Base of 28,000 Customers

These are Tools… And this is How BI has been done…

This is a Platform

[Figure: axes Analytical Quotient vs. Usefulness. Stages: Managed Reporting; Ad-Hoc Reporting; Dashboards / Visualization; OLAP / Analysis; Exploration; Associative / Statistical; Predictive. Callout: QlikView’s Sweet Spot]

The Evolution of Business Intelligence

1) Associative Query Language + Full Search (not another query tool)

2) Core Technology: True in-memory, columnar database with built-in visualization, analytics, and ELT in a single product

3) Designed for Heterogeneous & Complex Data (again, not just another query tool)

4) Application / Mobile Design First (Mobile, Desktop, Tablet: design once, consume anywhere)

What Makes QlikView Unique?

How traditional BI and visualization tools work

• Limited view and access to data

• Forced down linear drill paths

• Need to involve IT to modify

• What-if and on-the-fly analysis is limited

QlikView Natural Analytics™

• Freedom to explore data from any point in analysis in a dynamic, interactive interface

• Answer any question on the fly, in real time

• Easily see connections and disconnects in data

QlikView’s Natural Analytics™ makes data analysis a natural part of every business process – for everyone

The Green, The White and The Gray
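In QlikView list boxes, selected values are shown in green, values associated with the selection in white, and excluded values in gray. The sketch below mimics that classification over a tiny list of records. The records, field names, and the `selection_states` function are hypothetical, and this is not how the QlikView engine is implemented; it only illustrates how "connections and disconnects" fall out of a selection.

```python
# Hypothetical records, for illustration only.
rows = [
    {"region": "East", "product": "A"},
    {"region": "East", "product": "B"},
    {"region": "West", "product": "C"},
]


def selection_states(rows, field, selected, other_field):
    """Classify values of other_field after selecting one value in field."""
    matching = [r for r in rows if r[field] == selected]
    possible = {r[other_field] for r in matching}
    all_values = {r[other_field] for r in rows}
    return {
        "green": {selected},             # the user's selection
        "white": possible,               # values associated with the selection
        "gray": all_values - possible,   # values excluded by the selection
    }


states = selection_states(rows, "region", "East", "product")
print(states)
```

Selecting region East leaves products A and B possible and marks product C as excluded, which is the "disconnect" a user would see in gray.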

The Visualization Bottleneck

[Figure: axes Response Time vs. Query Size, with a Big Data region. Tools shown: Tableau, Spotfire, MSTR, Analytics Desktop, Datameer]

Connectivity to every Big Data Source

[Figure labels: NoSQL Databases; Real-time; Batch; Hadoop; MPP Warehouse; SAP HANA; BigQuery; Advanced Analytics]

Storage medium: speed (time per TB), price ($ per TB)

• Hard Disk Drives (HDD): 3300 s, $50

• Solid State Storage (SSD): 300 to 1000 s, $500

• Random Access Memory (RAM): 1 s, $4500

• Keep data in memory when the value obtained from processing it is high

• Leave data on disk when it is inactive or the value from processing it is low

Value

Size

The Big Data Value Chain
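The trade-off behind the two rules above can be worked through with the per-TB figures from the slide. The sketch below is a rough illustration, not a sizing tool: the SSD speed uses the slower end of the quoted range, and the `place` rule (keep data in RAM only when its value per TB covers the RAM price per TB) is a hypothetical threshold chosen for the example.

```python
# Per-TB figures copied from the slide; SSD uses the slower end (1000 s/TB) of its range.
TIERS = {
    "HDD": {"seconds_per_tb": 3300, "usd_per_tb": 50},
    "SSD": {"seconds_per_tb": 1000, "usd_per_tb": 500},
    "RAM": {"seconds_per_tb": 1, "usd_per_tb": 4500},
}


def tier_profile(size_tb):
    """Return {tier: (scan_seconds, storage_cost_usd)} for a dataset of size_tb."""
    return {
        name: (size_tb * t["seconds_per_tb"], size_tb * t["usd_per_tb"])
        for name, t in TIERS.items()
    }


def place(value_usd_per_tb):
    # Hypothetical rule: memory only pays off when the value obtained from
    # processing the data is at least the RAM price per TB.
    return "RAM" if value_usd_per_tb >= TIERS["RAM"]["usd_per_tb"] else "HDD"


profile = tier_profile(2)  # a 2 TB dataset
for name, (seconds, cost) in profile.items():
    print(f"{name}: {seconds} s to scan, ${cost} to store")
```

For 2 TB this gives 6600 s and $100 on HDD versus 2 s and $9000 in RAM, which is the gap the "value versus size" chart is pointing at.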

Flexible Big Data deployment models

• Direct Discovery: billions of rows via Direct Discovery, combined with aggregates / detail

• In-Memory: hundreds of millions of rows loaded into memory, aggregates / detail

• Combine Big Data and traditional data sources (EDW data, data warehouse)

• Combine data sources using pure In-Memory

Today’s challenge:

What to do with Big Data? Who should do it?

• IT: What to do with this?

• Business: How to define requirements?

QlikView as a catalyst for implementing Big Data

QlikView gives business users, not just data scientists, the ability to discover with Big Data

More Access > More Questions > More Use > Higher ROI of Big Data

IT & Business


QlikView In-Memory approach

• Loads compressed data into memory

• Enables associative search and analysis

• Supports hundreds of millions to billions of rows of data

In-Memory

QlikView Direct Discovery Approach

• Combines the associative capabilities of the QlikView in-memory dataset with a query model where:

– The aggregated query result is passed back to a QlikView object without being loaded into the QlikView data model

– The result set is still part of the associative experience

– Capability to drill to detail records

QlikView Application

QlikView In-Memory Data Model

Direct Discovery

Batch Load
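The query model described above (a small dimension held in memory, aggregates computed at the source, detail fetched only on demand) can be sketched with Python's built-in `sqlite3` module standing in for the external Big Data source. This is not QlikView script or the Direct Discovery implementation; the `sales` table, its rows, and the function names are hypothetical.

```python
import sqlite3

# Stand-in for the external source (hypothetical schema and rows).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 10.0), ("East", "B", 5.0), ("West", "A", 7.5), ("West", "A", 2.5)],
)

# A small dimension is loaded into memory up front, like the in-memory data model.
regions_in_memory = [
    row[0] for row in source.execute("SELECT DISTINCT region FROM sales ORDER BY region")
]


def aggregate_for_selection(selected_regions):
    """Push the aggregation down to the source; only the summary rows come back."""
    placeholders = ",".join("?" for _ in selected_regions)
    sql = (
        "SELECT region, SUM(amount) FROM sales "
        f"WHERE region IN ({placeholders}) GROUP BY region ORDER BY region"
    )
    return source.execute(sql, list(selected_regions)).fetchall()


def drill_to_detail(region):
    """Fetch record-level rows only when the user asks for them."""
    sql = "SELECT product, amount FROM sales WHERE region = ? ORDER BY rowid"
    return source.execute(sql, (region,)).fetchall()


summary = aggregate_for_selection(["East"])
details = drill_to_detail("West")
print(regions_in_memory, summary, details)
```

The fact table never leaves the source: a selection made on the in-memory dimension is translated into a filtered aggregate query, and detail rows are pulled only for the drill-down.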

100% in-memory when:

• All the necessary (i.e. relevant and contextual) data can fit in memory

• Users require only aggregated or summary data (e.g. hourly or daily averages), or record-level detail over a limited time period

• Query performance of the external source is not satisfactory

Direct Discovery when:

• Data cannot fit in memory and document chaining is not sufficient

• Users require access to record-level detail stored in a large fact table that will not fit in memory

• Network bandwidth limits the ability to copy data to the QlikView server

The design of Direct Discovery lets you alternate between these approaches with absolutely no change to the application itself

A Hybrid Approach for Tackling Big Data
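The criteria above can be read as a simple decision rule. The sketch below is one simplified encoding of them, assuming three yes/no inputs; the `Workload` fields and the `choose_approach` function are hypothetical names, and the source's criterion about external query performance is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    fits_in_memory: bool       # all relevant data fits in RAM
    needs_record_detail: bool  # users need record-level rows from the large fact table
    bandwidth_limited: bool    # network limits copying data to the QlikView server


def choose_approach(w):
    # Everything fits and can be copied over: load it all in memory.
    if w.fits_in_memory and not w.bandwidth_limited:
        return "in-memory"
    # Only summaries are needed: load aggregates, which are small.
    if not w.needs_record_detail:
        return "in-memory aggregates"
    # Detail is needed but cannot be loaded: query the source directly.
    return "direct discovery"


print(choose_approach(Workload(fits_in_memory=False, needs_record_detail=True, bandwidth_limited=False)))
```

A real deployment would weigh these factors per table rather than per application, since the hybrid model lets in-memory and Direct Discovery tables coexist.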
