Five Critical Success Factors for Big Data and Traditional BI

Upload: inside-analysis

Post on 03-Jun-2015

Category:

Technology


DESCRIPTION

The Briefing Room with Dr. Robin Bloor and VelociData

Live Webcast Dec. 10, 2013

Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7909837&rKey=b0bac7d09bf1a638

Most Big Data discussions focus on analytics, but business users need more than that. They need speed, because most opportunities these days are transient and must be acted on quickly. Bottlenecks in the delivery of analytic results often occur on the gathering and transformation side, where massive volumes of data must be validated, converted, masked or otherwise transformed before hitting the analytics engine. Big Data is rapidly overrunning conventional approaches, creating requirements for accelerated, hybrid systems.

Register for this episode of the Briefing Room to hear veteran IT Analyst Dr. Robin Bloor, as he explains how a combination of innovations is dramatically changing how companies can solve serious data transformation challenges. Robin will be briefed by Ron Indeck of VelociData, who will tout their record-breaking data operations appliance. He'll also discuss five critical success factors for achieving optimal performance, including the necessary infrastructure for executing data transformations at wire speed.

Visit InsideAnalysis.com for more information.

TRANSCRIPT

Page 1: Five Critical Success Factors for Big Data and Traditional BI

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Page 2: Five Critical Success Factors for Big Data and Traditional BI

The Briefing Room

Five Critical Success Factors for Big Data and Traditional BI

Page 3: Five Critical Success Factors for Big Data and Traditional BI

Twitter Tag: #briefr

The Briefing Room

Welcome

Host: Eric Kavanagh

[email protected]

Page 4: Five Critical Success Factors for Big Data and Traditional BI

Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Page 5: Five Critical Success Factors for Big Data and Traditional BI

Topics

This Month: INNOVATORS

January: ANALYTICS

February: BIG DATA

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Page 6: Five Critical Success Factors for Big Data and Traditional BI

Data Discovery & Visualization

INNOVATORS

Page 7: Five Critical Success Factors for Big Data and Traditional BI

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

[email protected]

Page 8: Five Critical Success Factors for Big Data and Traditional BI

VelociData

•  VelociData offers purpose-built big data operations appliances

•  Its solutions combine field-programmable gate arrays (FPGAs), graphics processing units (GPUs) and central processing units (CPUs) to enable high-speed parallelism

•  VelociData can improve data transformation and data quality performance by several orders of magnitude

Page 9: Five Critical Success Factors for Big Data and Traditional BI

Guests: Ron Indeck and Chris O’Malley

Ron Indeck is President, CTO and Founder of VelociData

Chris O’Malley is CEO of VelociData

Page 10: Five Critical Success Factors for Big Data and Traditional BI

VelociData Solving the Need for Speed in Big DataOps

www.velocidata.com

[email protected] | @velocidata | tel.: 314.785.0601

Fall 2013

The Bloor Group, December 10, 2013

Page 11: Five Critical Success Factors for Big Data and Traditional BI

Dr. Ronald Indeck – Founder and President, VelociData

•  Founder and CTO, Exegy

•  Former Professor, Washington University

•  Das Family Distinguished Professor

•  Director, Center for Security Technologies

•  Former President, Institute of Electrical & Electronics Engineers (IEEE) Magnetics Society

•  Past Recipient Bar Association Inventor of the Year

Page 12: Five Critical Success Factors for Big Data and Traditional BI

Five Critical Success Factors for Leveraging Data

1.  Don’t ignore data ingest and transformation

2.  Data integration speed and cost really count

3.  Hadoop alone does not solve the problem

4.  VelociData eliminates data ingest bottlenecks

5.  Big Data project risks can be mitigated effectively

Page 13: Five Critical Success Factors for Big Data and Traditional BI

Why Data is Breaking the Seams of Conventional Options

Competitive advantage comes from seizing the opportunities presented by transient business moments. This is creating a crisis between the growth of data sources and the relentless quest for faster insights.

•  Volume: Data volume is growing exponentially, at 55% annually

•  Variety: Numerous new data sources must be harnessed

•  Velocity: Data moving at differing speeds (batch, streaming, archived) must be reconciled

These factors are compounded by Hadoop, which offers data management at ~80% less cost than conventional approaches and so justifies storing everything over longer periods of time. This is spawning business ideas for monetizing data, creating use cases that require massive acceleration of data operations to handle the scale and complexity of the 3Vs.

Following conventional best practices no longer satisfies critical business applications

CSF #1: Don’t ignore data ingest and transformation
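
To make the 55% annual growth figure concrete, the sketch below compounds it over several years and computes the implied doubling time. The 100 TB starting volume is a made-up illustration, not a number from the presentation, and a constant growth rate is assumed.

```python
import math

ANNUAL_GROWTH = 0.55  # growth rate quoted on the slide


def projected_volume(start_tb: float, years: int, rate: float = ANNUAL_GROWTH) -> float:
    """Volume after `years` of compound growth at `rate` per year."""
    return start_tb * (1 + rate) ** years


def doubling_time_years(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, for illustration only
    for year in range(6):
        print(f"year {year}: {projected_volume(start_tb, year):8.1f} TB")
    print(f"doubling time: {doubling_time_years():.2f} years")
```

Under these assumptions the volume doubles roughly every 1.6 years and grows about ninefold in five years, which is why ingest capacity sized for today is quickly outgrown.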

Page 14: Five Critical Success Factors for Big Data and Traditional BI

What are Conventional Options for Accelerating DataOps?

Conventional options for improving data operations performance under the following requirements:

•  High volume (e.g., 10M+ row, densely populated tables)

•  High growth (e.g., >60% annually)

•  Multiple varieties and sources (structured and unstructured)

•  High velocity (e.g., data available in less than an hour)

[Table: each option below is rated on Cost, Performance, Scalability, and Complexity; the ratings are not recoverable from the slide]

•  Add cores to existing ETL processes

•  Add MIPS to existing IBM mainframe data integration jobs

•  Push down optimization (ELT)

•  Hadoop (ELT)

•  Entirely new engineered system platform

CSF #2: Data integration speed and cost really count

CSF #3: Hadoop alone doesn’t solve the problem


Page 15: Five Critical Success Factors for Big Data and Traditional BI

VelociData Solution Palette

Suite | Solution | Example | Conventional (records/second) | VelociData (records/second)
Data Transform | Lookup and Replace | Data enrichment by populating fields from a master file | <3000 | 600,000
Data Transform | Lookup and Replace | Cardio Pulmonologist → CP | 500 | 700,000
Data Transform | Type Conversions | XML → Fixed; Binary → Char | 1000-2000 | 800,000
Data Transform | Type Conversions | 2013-01-02 → 01/02/2013 | 1000-3000 | 800,000
Data Transform | Format Conversions | Rearrange, add, drop, or resize fields to change layouts | 1000 | 650,000
Data Transform | Surrogate Key Generation | Hash multiple field values into a unique pseudo-key | 3000 | > 1,000,000
Data Transform | Surrogate Key Generation | Generate MD5 or SHA hash keys | 3000 | > 1,000,000
Data Transform | Data Masking | Obfuscate data for non-production uses: Persistent or Dynamic; Format preserving encryption; AES-256 | 500-1000 | > 1,000,000
Data Quality | USPS Address Processing | Standardization, verification, and cleansing (CASS certification in process) | 600 | 400,000
Data Quality | Domain Data Validation | Validate a value based on a list of acceptable values (e.g., all states in the US; all countries in the world) | 1000-3000 | 750,000
Data Quality | Field Validation | Validates based on patterns such as emails, dates, phone numbers, … | 1000-3000 | > 1,000,000
Data Quality | Field Validation | Data type validation and bounds checking | 3000 | > 1,000,000
Data Platform Offload | Mainframe Data Offload | Copybook parsing & data layout discovery; EBCDIC, COMP, COMP-3, … → ASCII, Integer, Float, … | 200 | > 200,000


Results are system dependent; the figures are intended to provide an order-of-magnitude comparison

CSF #4: VelociData eliminates data ingest bottlenecks
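
For readers unfamiliar with the transform categories in the solution palette, here is a minimal software sketch of three of them: lookup and replace, a date format conversion, and surrogate key generation by hashing. It only illustrates what each operation does to one record; it says nothing about how the appliance implements them. The field names, the one-entry master table, and the choice of SHA-256 with a unit-separator delimiter are all assumptions made for the example.

```python
import hashlib
from datetime import datetime

# Hypothetical master file mapping specialty names to codes.
SPECIALTY_CODES = {"Cardio Pulmonologist": "CP"}


def lookup_and_replace(value: str, table: dict[str, str]) -> str:
    """Replace a value with its master-file code; unknown values pass through."""
    return table.get(value, value)


def convert_date(iso_date: str) -> str:
    """Convert YYYY-MM-DD to MM/DD/YYYY; raises ValueError on malformed input."""
    return datetime.strptime(iso_date, "%Y-%m-%d").strftime("%m/%d/%Y")


def surrogate_key(*fields: str) -> str:
    """Hash several field values into one pseudo-key (SHA-256 hex digest)."""
    # The separator is assumed never to occur inside a field, so that
    # ("ab", "c") and ("a", "bc") produce different keys.
    joined = "\x1f".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def transform(record: dict[str, str]) -> dict[str, str]:
    """Apply the three transforms to one record (field names are hypothetical)."""
    out = dict(record)
    out["specialty"] = lookup_and_replace(record["specialty"], SPECIALTY_CODES)
    out["visit_date"] = convert_date(record["visit_date"])
    out["key"] = surrogate_key(
        record["last_name"], record["first_name"], record["visit_date"]
    )
    return out


if __name__ == "__main__":
    print(transform({
        "last_name": "Doe",
        "first_name": "Jane",
        "specialty": "Cardio Pulmonologist",
        "visit_date": "2013-01-02",
    }))
```

A hash-based key is only unique up to the collision resistance of the hash, which is why the slide calls it a pseudo-key.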

Page 16: Five Critical Success Factors for Big Data and Traditional BI

The New World Data Challenges Being Solved

•  Credit card company reduces MIPS and improves performance to integrate historical and fresh data into Hadoop analytics process by processing 10 million records per minute

•  Financial processing network masks 5 million fields per second of production data to sell opportunity information to retailers

•  Health benefits provider enables customer support by shortening a data integration process from 16 hours to 45 seconds

•  Property casualty company shortens a daily task of processing 450 million records from 5 hours to less than 1 hour

•  Retailer now processes XML data to integrate 360-degree customer data from in-store, online, and mobile sources in real time

CSF #5: Big Data project risks can be mitigated effectively
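
The customer figures on this slide use different units, so the arithmetic below converts them to records per second and speedup factors for easier comparison. It uses only the numbers quoted above; "less than 1 hour" is treated as exactly one hour, which makes the resulting rate a lower bound.

```python
def per_second(records: float, seconds: float) -> float:
    """Average throughput in records per second."""
    return records / seconds


# Credit card company: 10 million records per minute.
credit_card_rate = per_second(10_000_000, 60)

# Health benefits provider: 16 hours reduced to 45 seconds.
health_speedup = (16 * 3600) / 45

# Property casualty company: 450 million records, 5 hours before, under 1 hour after.
pc_rate_before = per_second(450_000_000, 5 * 3600)
pc_rate_after_min = per_second(450_000_000, 1 * 3600)  # lower bound

if __name__ == "__main__":
    print(f"credit card: {credit_card_rate:,.0f} records/s")
    print(f"health benefits speedup: {health_speedup:,.0f}x")
    print(f"property casualty: {pc_rate_before:,.0f} -> at least {pc_rate_after_min:,.0f} records/s")
```

The results (about 167,000 records/s, a 1,280x reduction in elapsed time, and 25,000 rising to at least 125,000 records/s) fall in the same range as the per-operation rates quoted in the solution palette.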

Page 17: Five Critical Success Factors for Big Data and Traditional BI

VelociData: Continuous Innovation

•  3Q13

   •  Format Preserving Encryption and Data Masking

   •  Extensive Mainframe Data Conversion

   •  Extensive XML Processing

•  4Q13

   •  Expanded Hashing and Key Generation Options

   •  Additional Mainframe Record Types

   •  Scalable Deployment Management
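
Mainframe data conversion appears both in this roadmap and in the offload row of the solution palette (EBCDIC, COMP-3 → ASCII, Integer, Float). As a rough illustration of what that conversion involves, the sketch below decodes an EBCDIC text field and a COMP-3 (packed decimal) field in plain Python. The code page `cp037` and the function names are assumptions for the example; real copybooks specify the layout, scale, and code page, and this is not a description of VelociData's implementation.

```python
from decimal import Decimal


def ebcdic_to_text(raw: bytes) -> str:
    """Decode EBCDIC bytes to text; cp037 (US/Canada) is an assumed code page."""
    return raw.decode("cp037")


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COBOL COMP-3 (packed decimal) field.

    Every byte holds two BCD digits except the last, whose low nibble is
    the sign: 0xB or 0xD means negative, other sign nibbles mean positive
    or unsigned. `scale` is the number of implied decimal places.
    """
    if not raw:
        raise ValueError("empty COMP-3 field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign_nibble < 0x0A:
        raise ValueError("invalid packed decimal data")
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)


if __name__ == "__main__":
    print(ebcdic_to_text(b"\xC8\xC5\xD3\xD3\xD6"))
    print(unpack_comp3(b"\x12\x34\x5C", scale=2))
    print(unpack_comp3(b"\x12\x3D"))
```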

Page 18: Five Critical Success Factors for Big Data and Traditional BI

Let’s Start the Conversation Now

Helpful Resources:

Alternatives for Data Integration: http://velocidata.com/our-solution

Industry Analyst Research Reports: http://velocidata.com/resources

Data Ops – Meeting Big Data Organizational Challenges: http://velocidata.com/blog

Join us on social media:

Twitter: @VelociData

LinkedIn: http://www.linkedin.com/company/velocidata?trk=company_name

Google+: https://plus.google.com/112063174918659483670/posts

Phone: +1-314-785-0601

E-Mail: [email protected] / [email protected]

We will send a follow-up email containing this presentation and links to contact us

For more information visit: http://velocidata.com

Page 19: Five Critical Success Factors for Big Data and Traditional BI

Questions?

Page 20: Five Critical Success Factors for Big Data and Traditional BI

How We Achieve Orders of Magnitude in Acceleration

VelociData Big Data Operations Appliance

•  Purpose-built solutions that combine a mix of software, firmware, and massively parallel hardware to provide acceleration often approaching wire speeds

•  Heterogeneous compute environment that includes FPGAs, GPUs, and CPUs to offer a level of internal parallelism that can dramatically outperform software on general-purpose computers

•  “Business Micro Supercomputer” in a 4U rack form factor
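
"Wire speed" is generally understood as processing data as fast as the network link can deliver it. The sketch below turns that idea into a records-per-second ceiling. The 10 Gb/s link and 200-byte record size are hypothetical values chosen for illustration (the presentation gives neither), and protocol overhead is ignored.

```python
def wire_speed_records_per_second(link_gbps: float, record_bytes: int) -> float:
    """Upper bound on records/second a link can carry, ignoring protocol overhead."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / record_bytes


def link_utilization(records_per_second: float, link_gbps: float, record_bytes: int) -> float:
    """Fraction of the link's capacity that a given processing rate consumes."""
    return records_per_second / wire_speed_records_per_second(link_gbps, record_bytes)


if __name__ == "__main__":
    LINK_GBPS = 10      # hypothetical link speed
    RECORD_BYTES = 200  # hypothetical average record size
    ceiling = wire_speed_records_per_second(LINK_GBPS, RECORD_BYTES)
    print(f"wire-speed ceiling: {ceiling:,.0f} records/s")
    for rate in (3_000, 1_000_000):
        share = link_utilization(rate, LINK_GBPS, RECORD_BYTES)
        print(f"{rate:>9,} records/s uses {share:.2%} of the link")
```

Under these assumptions a transform running at a few thousand records per second leaves the link almost idle, which is the gap the appliance is positioned to close.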

Page 21: Five Critical Success Factors for Big Data and Traditional BI

Business Value for Most Architectures

Big Data Operations Appliance to Maximize Data Transformation Acceleration to Wire Speed

[Diagram] Data sources: CSV, XML, zOS Data, RDBMS, Social Media, Sensor, Hadoop

Wire Rate Transformations: Normalize, Encrypt/Mask, Cleanse, Enrich

Destinations: Hadoop, ETL Server, Data Warehouse, Database Appliances, BI Tools, Downstream zOS Process, Cloud


Page 22: Five Critical Success Factors for Big Data and Traditional BI

Platform Processes Offloaded to VelociData

Wire-rate transformations: purpose-built for better price performance

•  MPP Platforms (Teradata, Netezza): Is using the MPP platform for ELT and push down optimization not an optimal use of resources?

•  ETL Server: ETL server having trouble keeping up with exploding data growth?

•  Mainframe: Too expensive to keep adding mainframe MIPS?

•  Hadoop: Are self-service business analytics users frustrated with the time required to transform unstructured and legacy data into something useful for decision making?

VelociData feeds Hadoop pre-processed, quality data for real-time BI efforts

Seamlessly offload to VelociData the heavy lifting ETL/ELT processes from Ab Initio, IBM, and Informatica

Page 23: Five Critical Success Factors for Big Data and Traditional BI

Common ETL Bottlenecks

•  Lookup & replace

•  Field validation: datatype validation

•  Field validation: bounds checking

•  Aggregation

•  USPS address standardization

•  Business rules

•  Entity resolution

•  Exception / error handling

Candidates for Acceleration

[Diagram] Extract, Transform, Load: Staging DB, ETL Server, Primary RDBMS

Inputs: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop
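
The field validation steps named among the bottlenecks (datatype validation and bounds checking), together with the pattern and domain checks from the solution palette, can be sketched in software as below. The field names, the loose email pattern, the truncated list of states, and the age bounds are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Deliberately loose pattern, for illustration only.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Truncated, hypothetical domain list; a real one would hold every state.
US_STATES = {"MO", "IL", "CA", "NY"}


def is_valid_email(value: str) -> bool:
    """Pattern validation."""
    return EMAIL_PATTERN.fullmatch(value) is not None


def is_valid_date(value: str, fmt: str = "%Y-%m-%d") -> bool:
    """Datatype validation: the value must parse as a real calendar date."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def is_int_in_bounds(value: str, low: int, high: int) -> bool:
    """Datatype validation plus bounds checking for an integer field."""
    try:
        number = int(value)
    except ValueError:
        return False
    return low <= number <= high


def validate(record: dict[str, str]) -> list[str]:
    """Return the names of the fields that fail validation."""
    checks = {
        "email": is_valid_email,
        "state": lambda v: v in US_STATES,  # domain validation
        "birth_date": is_valid_date,
        "age": lambda v: is_int_in_bounds(v, 0, 130),
    }
    return [name for name, check in checks.items() if not check(record.get(name, ""))]


if __name__ == "__main__":
    print(validate({"email": "a@b.com", "state": "MO", "birth_date": "1980-02-29", "age": "44"}))
    print(validate({"email": "nope", "state": "ZZ", "birth_date": "2013-02-30", "age": "200"}))
```

Each check is independent per field and per record, which is what makes this kind of work a natural candidate for parallel execution.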

Page 24: Five Critical Success Factors for Big Data and Traditional BI

ETL Processes Offloaded to VelociData

•  Aggregation

•  Business rules

•  Entity resolution

•  Exception / error handling

•  Lookup & replace

•  Field validation: datatype validation

•  Field validation: bounds checking

•  USPS address standardization

[Diagram] Extract, Transform, Load: Primary RDBMS, Staging DB, ETL Server

•  Keep Existing Input Interfaces

•  Accelerate Bottlenecks at Wire Speed

•  Reduce ETL Server Workload

•  Faster Total Processing Time

Inputs: CSV, Mainframe, XML, RDBMS, Social Media, Sensor, Hadoop

Page 25: Five Critical Success Factors for Big Data and Traditional BI

Perceptions & Questions

Analyst: Robin Bloor

Page 26: Five Critical Success Factors for Big Data and Traditional BI
Page 27: Five Critical Success Factors for Big Data and Traditional BI

Technology Evolution (Bloor Curve)

Page 28: Five Critical Success Factors for Big Data and Traditional BI

Disruption on Disruption

•  We are no longer certain that the pattern still holds

•  We used to encounter new technologies that were 10x because of Moore’s Law

•  Now we encounter new technologies that are 100x or even 1000x

•  This is not because of Moore’s Law but because of parallelism

Page 29: Five Critical Success Factors for Big Data and Traditional BI

Parallelism Will Become the Norm

•  This is not just about software

•  It is also about hardware architectures

•  But it affects all software

•  Eventually everything will execute in parallel

•  Everything will go much faster

Page 30: Five Critical Success Factors for Big Data and Traditional BI

CPUs, GPUs and FPGAs

•  CPUs, GPUs and FPGAs are commodities

•  They can be harnessed to deliver extreme parallelism on a single server

•  The use of such chips can deliver acceleration above 100x for some applications

Page 31: Five Critical Success Factors for Big Data and Traditional BI

The Memory Cascade

•  On-chip speed v RAM

   •  L1 (32K) = 100x

   •  L2 (246K) = 30x

   •  L3 (8-20Mb) = 8.6x

•  RAM v SSD

   •  RAM = 300x

•  SSD v Disk

   •  SSD = 10x
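
The ratios on this slide are each stated relative to the next slower tier. The short calculation below chains them to express every tier relative to disk. It assumes the ratios can simply be multiplied, which is a simplification, and uses only the figures quoted on the slide.

```python
# Speed ratios quoted on the slide.
CACHE_VS_RAM = {"L1": 100, "L2": 30, "L3": 8.6}
RAM_VS_SSD = 300
SSD_VS_DISK = 10


def speed_relative_to_disk() -> dict[str, float]:
    """Chain the per-tier ratios into a single scale with disk = 1."""
    ssd = float(SSD_VS_DISK)
    ram = RAM_VS_SSD * ssd
    tiers = {"Disk": 1.0, "SSD": ssd, "RAM": ram}
    for name, ratio in CACHE_VS_RAM.items():
        tiers[name] = ratio * ram
    return tiers


if __name__ == "__main__":
    for tier, factor in sorted(speed_relative_to_disk().items(), key=lambda kv: kv[1]):
        print(f"{tier:>4}: {factor:>12,.0f}x disk")
```

On these numbers RAM is 3,000x faster than disk and L1 cache about 300,000x, which illustrates why keeping data on-chip and in memory matters so much for acceleration.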

Page 32: Five Critical Success Factors for Big Data and Traditional BI

Going Forward

The old limitations are no longer SO LIMITING

Page 33: Five Critical Success Factors for Big Data and Traditional BI

•  Can one VelociData Appliance serve many applications?

•  What of data cleansing functionality (e.g., cleansing rules, deduplication, etc.)?

•  Please explain wire-speed in a little more detail.

Page 34: Five Critical Success Factors for Big Data and Traditional BI

•  How long does it take to implement and what is the process? Please describe.

•  With Hadoop, what are the possibilities?

•  What does the roadmap look like?

Page 35: Five Critical Success Factors for Big Data and Traditional BI

Page 36: Five Critical Success Factors for Big Data and Traditional BI

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: INNOVATORS

January: ANALYTICS

February: BIG DATA

Page 37: Five Critical Success Factors for Big Data and Traditional BI

Thank You for Your Attention