microstrategy primedocshare01.docshare.tips/files/25873/258739848.pdf · 2016-12-20 · 2 speaker...
TRANSCRIPT
2
Speaker Introduction
Bala Chandran – Dir. Enterprise BI, MicroStrategy
• 15 years of experience implementing and designing Big Data and Analytics solutions
• Hands-on experience with many MPP and in-memory systems
• @BG_Chandran – ask questions #MSTRPrime
3
High Performance Is No Longer A “Nice To Have” In Analytical Applications
Drivers Of High Performance
1. Users expect “Google-like” performance from analytic applications, especially on mobile devices
2. Exploding data volumes & variety require in-memory consolidation and aggregation
3. Modern analytical applications contain 100s of visualizations, distributed to 1000s of users daily
5
Performance Directly Correlates to Revenue
• Google found that a 500ms slowdown equals a 20% decrease in ad revenue.
• Amazon finds a 100ms slowdown can mean a 1% decrease in revenue.
• Yahoo! found that a 400ms improvement translated to a 9% increase in traffic.
• Mozilla mapped a 2.2s improvement to 60 million additional Firefox downloads.
Torbit.com
http://blog.edgecast.com/post/42404930702/ecommerce-performance-website-speed-impacts-your#sthash.1Hn7Y4dr.dpuf
8
INTRODUCING MicroStrategy PRIME
PARALLEL RELATIONAL IN-MEMORY ENGINE
• Flexible schema & partitioned data
• Linear scalability to 1,000s of CPUs
• 3x to 10x faster
• 7x to 20x more users
• Tightly-coupled interactive exploration
11
MicroStrategy PRIME In Action At Facebook
“We have this thing that’s running. It’s one of the most amazing things I’ve seen. It’s running against the entire Facebook user base, 1.1 billion users.”
Guy Bayes, Head of Enterprise BI, Facebook
• 200+ petabytes of Hadoop source data
• 30+ terabytes analyzed in PRIME
• 200+ node cluster
• 3500+ cores
• 175 billion rows
12
Traditional Technologies Cannot Deliver Performance At High Scale
Custom Approaches Are Expensive And Risky
[Chart: Data Scale vs. User Scale, positioning Hadoop, MPP databases, and in-memory DBs]
High Scale Information Driven Apps
Custom Development: Java + Transactional DB clusters + Web 2.0 + In-memory + BI Tools + …
• Expensive
• Complex
• Risky
• Slow
16
MicroStrategy PRIME – Purpose Built For Performance @ Scale
[Chart: Data Scale vs. User Scale, positioning Hadoop, MPP databases, in-memory DBs, and MicroStrategy PRIME]
MicroStrategy PRIME: first out-of-the-box solution in the market
17
Example Applications
• CRM analysis across a large customer base
• Interactive analysis: large clickstream data
• Merchant analytics for a credit card issuer
• Store manager application for a large chain
MicroStrategy PRIME – Interactive Big Data Exploration
Application Characteristics
• Large data volumes
• Sub-3-second response time
• Highly dimensional data
• Complex dashboards with multiple visualizations
• Highly interactive app with users filtering and slicing across many dimensions
• Web & mobile deployments
• Large user populations
19
MicroStrategy PRIME - 7x more users and 3x faster than the next best in-memory technology
[Chart: 7x More Users; 3x Faster]
• Complex analytical dashboard
• High user interactivity
• 200 GB data set with 50+ dimensions
• Equivalent hardware configurations – 30 nodes
20
MicroStrategy PRIME is like In-Memory on steroids
            OLAP Services (SMP architecture)   PRIME (MPP architecture)
Data Size   100 GB limit                       No theoretical limit; tested to 4.6 TB
Data Rows   2B limit                           No theoretical limit; tested to 200B
Load Rate   8 GB/hr                            No theoretical limit; tested to 7 TB/hr
21
MicroStrategy PRIME – World’s First Technology to Combine 3 Key Breakthroughs
1. In-Memory Data Store
2. Massively Parallel Processing on Commodity HW
3. Look-Ahead Analytics – Integrated Data & Visualization Layers
Interactive exploration of terabyte datasets by 100,000s of users
23
1. In-Memory Data Store – How much Faster Is It?
• Traditional Disk speed is a banana slug with a top speed of 0.007 mph
• In-Memory is an F-18 Hornet with a max speed of 1,190 mph
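Taken literally, the two speeds in this analogy differ by a factor of about 170,000. A quick check of that arithmetic, using only the figures quoted above (this is the ratio within the presenter's analogy, not a measured disk-to-memory latency ratio):

```python
# Speeds quoted on the slide, in miles per hour.
banana_slug_mph = 0.007   # stands in for traditional disk
f18_hornet_mph = 1190     # stands in for in-memory

# How many times faster the F-18 is than the banana slug.
speed_ratio = f18_hornet_mph / banana_slug_mph
print(f"Analogy ratio: roughly {speed_ratio:,.0f}x")
```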
25
2. Massively Parallel Processing On Commodity Hardware
• Distribute data across 1000’s of nodes
• Parallel Query execution and loading
• Inexpensive Commodity Hardware
[Diagram: Traditional BI – query engines contend for shared memory (bottleneck). PRIME Parallel Execution – query engines each with their own memory, distributed data, parallel execution]
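Distributing data across nodes and running the same query on every partition in parallel can be sketched as follows. This is a minimal illustration, not PRIME's implementation: the hash-partitioning scheme, the `partition_rows`, `count_matching`, and `parallel_count` helpers, and the use of threads to stand in for nodes are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_rows(rows, num_nodes, key):
    """Hash-partition rows so each hypothetical node owns a disjoint slice."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions

def count_matching(partition, predicate):
    """Work done locally on one node: scan only that node's partition."""
    return sum(1 for row in partition if predicate(row))

def parallel_count(partitions, predicate):
    """Run the same scan on every partition concurrently and add the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        local_counts = pool.map(lambda p: count_matching(p, predicate), partitions)
        return sum(local_counts)

# Illustrative data only: 1,000 rows spread across 4 simulated nodes.
rows = [{"customer_id": i, "region": "east" if i % 2 else "west"} for i in range(1000)]
partitions = partition_rows(rows, num_nodes=4, key="customer_id")
east_count = parallel_count(partitions, lambda r: r["region"] == "east")
```

Threads in a single Python process only model the control flow; in a real scale-out deployment each partition would sit in the memory of a separate machine.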
26
2. Parallel Processing: Scaling The Solution
[Chart: PRIME Parallel Efficiency]
http://blog.delloem.com/2010/12/talking-hpc-with-sagiv-tech/image001/
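Parallel efficiency is conventionally defined as speedup divided by the number of workers, so a value of 1.0 means perfectly linear scaling. A small sketch of that standard definition follows; the timings in the example are made-up inputs for illustration, not PRIME measurements.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, num_workers):
    """Speedup per worker; 1.0 corresponds to linear scalability."""
    return speedup(t_serial, t_parallel) / num_workers

# Hypothetical timings: 120 s on 1 worker, 4 s on 32 workers.
efficiency = parallel_efficiency(120.0, 4.0, 32)
```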
27
Parallel Processing: Breaking The Problem Down
Vertical Scaling (Scale-up): Generally refers to adding more processors and RAM, or buying a more robust server.
Pros
• Less power consumption / cooling
• Less network hardware than scaling horizontally
Cons
• More expensive
• Greater risk of hardware failure
• Limited upgradeability
Horizontal Scaling (Scale-out): Generally refers to adding more servers with fewer processors and less RAM.
Pros
• More cost effective than scaling vertically
• Easier to run fault-tolerance
• Easy to upgrade
Cons
• Bigger footprint in the data center
• Higher utility cost (electricity and cooling)
• Possible need for more networking equipment (switches/routers)
28
Data Movement: The Performance Killer
[Chart labels: 90+% YoY growth; 50% YoY growth]
Oracle, 2012
http://www.edn.com/design/communications-networking/4313434/The-evolution-to-network-flow-processing
29
Minimizing Data Movement: Bringing Query To The Data
• Query partitioned and executed on core where data lives
• Only summary information is sent across the network
[Diagram: PRIME Parallel Execution – query engines each with their own memory, distributed data, parallel execution]
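A minimal sketch of bringing the query to the data: each partition computes a small partial aggregate where its rows live, and only those summaries are combined centrally. The partition layout, the `sales` field, and the helper functions are hypothetical and only illustrate the pattern.

```python
def local_summary(partition, column):
    """Runs where the data lives: reduce a partition to (sum, count)."""
    values = [row[column] for row in partition]
    return sum(values), len(values)

def global_average(partitions, column):
    """Coordinator: combine per-partition summaries; raw rows never move."""
    summaries = [local_summary(p, column) for p in partitions]
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    return total / count, summaries

# Illustrative partitions, as if held in memory on three different nodes.
partitions = [
    [{"sales": 10.0}, {"sales": 20.0}],
    [{"sales": 30.0}],
    [{"sales": 40.0}, {"sales": 50.0}, {"sales": 60.0}],
]
average, summaries = global_average(partitions, "sales")
```

Here three (sum, count) pairs are combined instead of six rows being moved, and the saving grows with partition size. Sending sum and count rather than a local average matters: averaging the per-partition averages would give the wrong answer when partitions differ in size.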
30
Commodity Hardware vs. Specialized Appliances
Example PRIME configuration
• 100 clusters of 2 worker nodes; 1 cluster of 20 master nodes
• Each node: 16 cores, 144 GB RAM
• Total: 1920 cores, ~17 TB RAM
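The totals follow from the per-node figures by multiplication. The sketch below checks that arithmetic using only numbers from the slide. Note that the stated totals (1920 cores, ~17 TB) correspond to 120 nodes, whereas the node counts as transcribed (100 clusters of 2 workers plus 20 masters) add up to 220 nodes, so one of the figures may have been garbled in extraction.

```python
CORES_PER_NODE = 16
RAM_GB_PER_NODE = 144

def cluster_totals(num_nodes):
    """Return (total cores, total RAM in GB) for identical nodes."""
    return num_nodes * CORES_PER_NODE, num_nodes * RAM_GB_PER_NODE

# Node count implied by the slide's stated total of 1920 cores.
implied_nodes = 1920 // CORES_PER_NODE
implied_cores, implied_ram_gb = cluster_totals(implied_nodes)

# Node count as transcribed: 100 clusters of 2 workers plus 20 masters.
listed_nodes = 100 * 2 + 20
listed_cores, listed_ram_gb = cluster_totals(listed_nodes)

print(implied_nodes, implied_cores, implied_ram_gb)
print(listed_nodes, listed_cores, listed_ram_gb)
```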
31
3. Look Ahead Analytics – Tightly Integrated Data & Visualization Layers
Traditional BI: loosely coupled Visualization Layer and Data Layer
• Data layer has no knowledge of analytics layer design
• Connections optimized for the lowest common denominator
PRIME – Look Ahead Analytics: tightly integrated Visualization Layer and Data Layer
• Tightly integrated layers enable optimization
• Analytics layer globally optimizes queries sent to data based on data structures
• Data layer “looks ahead” and plans based on knowledge of dashboard
[Diagram labels: Analytics layer optimizes queries for data; Data layer analyzes dashboard and optimizes structures]
32
Taking Co-Location One Step Further: In-Process Analytics
Traditional BI
• Even if you install BI and DB on the same server, they run in separate processes
[Diagram: Query Processing and App Engine, shown with Process 0 and Process 1]
MicroStrategy PRIME
• Query Engine and Application Engine run in-process (in-process analytics)
33
3. Look Ahead Analytics – The Secret Sauce
Typical PRIME Application
• 75+ data sets
• Multiple views of similar data
• Share joins, filters and cohorts
Look Ahead Analytics
• Visualizations with identical information processed once
• Filtering and cohorts processed once and reused; processed into machine code
• Re-use of joined results for analytics with similar information
• … Many more
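One way to picture "processed once and reused" is a cache keyed by the filter definition, so that several visualizations sharing a cohort trigger a single scan. This is only a sketch of the general idea under assumed names (`SharedFilterCache`, `rows_for`); the slides do not describe how PRIME implements it, and the machine-code compilation step is not modeled here.

```python
class SharedFilterCache:
    """Evaluate each distinct filter once and share the result across visualizations."""

    def __init__(self, rows):
        self.rows = rows
        self._cache = {}
        self.scans = 0  # number of full scans actually performed

    def rows_for(self, filter_key, predicate):
        # Scan only the first time a given filter is requested.
        if filter_key not in self._cache:
            self.scans += 1
            self._cache[filter_key] = [r for r in self.rows if predicate(r)]
        return self._cache[filter_key]

# Illustrative data: customers aged 18 to 67.
rows = [{"age": a, "spend": a * 2} for a in range(18, 68)]
cache = SharedFilterCache(rows)

def is_under_30(row):
    return row["age"] < 30

# Three hypothetical visualizations on one dashboard share the same cohort.
cohort_size = len(cache.rows_for("age<30", is_under_30))
total_spend = sum(r["spend"] for r in cache.rows_for("age<30", is_under_30))
max_spend = max(r["spend"] for r in cache.rows_for("age<30", is_under_30))
```

All three values come from one scan of the data, since the second and third requests hit the cache.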
35
MicroStrategy PRIME - Architecture
[Architecture diagram]
SOURCE DATA
• Parallel data loading
Analytics Engines (DATA partitions)
• Parallel query execution
• Optimized in-memory data structure
• Data partitioning within and across nodes
Application Engines
• Visualization API
• Web and mobile output API
Commodity hardware
Tightly coupled for minimal computational distance
36
MicroStrategy PRIME Co-exists With Existing Enterprise Databases
[Diagram: SOURCE DATA, Data Warehouse, MicroStrategy PRIME]
• Does not replace databases
• Functions as hot data layer for apps requiring high performance
• Load from databases or directly from files and Hadoop