The changing landscape of data management

The Future of Enterprise Data Management: Building a Scalable and Dynamic Big Data Infrastructure
Graham Pymm | Cloudera Systems Engineer

Upload: sogeti-nederland-bv

Post on 04-Jul-2015


Category:

Data & Analytics



DESCRIPTION

Enterprises live in a connected world built on a foundation of information. This information flows through organizations, individuals, and assets, producing artifacts in the form of data. Used properly, this data can act as the lifeblood of an organization, giving it complete visibility into a world that until today was hidden. It comes as no surprise that we are in the middle of a shift in how organizations leverage and manage data today to drive the competitive advantages of tomorrow. At the center of this paradigm shift sit IT professionals, who have to balance not only the traditional needs of an organization but also its future needs. In this session, we will discuss the changing landscape of data management, focusing on the future needs of the business and how IT is in a leading position to guide organizations through this shift.

TRANSCRIPT

Page 1: The changing landscape of data management

The Future of Enterprise Data Management Building a Scalable and Dynamic Big Data Infrastructure

Graham Pymm | Cloudera Systems Engineer

Page 2: The changing landscape of data management

2 © 2014 Cloudera, Inc. All rights reserved.

The Challenges of Data Management

•  Connection
    •  Human
    •  Machine

•  Consumerisation

•  Experimentation

•  The Economics of Data Management

Page 3: The changing landscape of data management


Skybox delivers high-res imagery of any spot on earth multiple times per day.

The Challenge:
•  High-res images of the globe require significant binary data processing
•  RDBMS + Spatial software is costly & not scalable enough to meet demands for unlimited historical storage, long-term temporal analysis

The Solution:
•  Ingests 1TB raw satellite data every day
•  Cloudera indexes data and enables real-time search

How many cars are there in each parking lot? Can we use that information to refine our prediction?

How will seasonality affect earnings next quarter?

Page 4: The changing landscape of data management


Global payments processor finds the largest incidence of fraud in their history.

The Challenge:
•  Statisticians want unconstrained analysis; limited EDW compute resources
•  Expensive ETL processes; Annual IT spend >$1B; DR mandatory

The Solution:
•  EDW augmentation with Cloudera improves fraud detection + grows merchant report revenues from $200M to $1B business
•  Using Cloudera for DR saves $30M

How can we correlate organized activity on millions of global accounts to detect fraud?

You live in Malaysia. Did you really buy a boat in Portugal yesterday?

Page 5: The changing landscape of data management


Automate data-driven R&D decisions, reduce time to market from years to months.

The Challenge:
•  1,000+ research scientists developing products in silos
•  Time to market for new product is 5-10 years

The Solution:
•  Cloudera PB-scale platform for single view of all R&D data
•  Scientists directly access Cloudera; auditing & access control for security

How do seed selection, planting density, ground temperature, & soil composition impact yields?

How much corn did my farm produce last year?

Page 6: The changing landscape of data management


Determine affinity strengths between services to bundle and promote effectively, optimizing average subscriber value and minimizing acquisition cost and churn.

The Challenge:
•  Disparate data is hard to correlate and analyze for sufficiently personalized product bundling, cross-sell, and up-sell opportunities served in real time

The Solution:
•  Cloudera offsets the latency and constraints of EDWs to expand data available for multivariate regressions, also driving down the cost of ad hoc modeling

How do we combine all data to identify the most valuable customers to target for optimal profit margins?

When and where does the network experience the highest usage demands?

Page 7: The changing landscape of data management


Government organization can find patterns within PB-scale data to identify suspicious behavior in real time.

The Challenge:
•  Logs generated as users access various web services via VPN
•  Identifying suspicious behavior requires real-time, large-scale data

The Solution:
•  Large-scale log analysis: find patterns & identify suspicious behavior in near real time

Which one of these people is likely to be carrying a bomb?

Do you have any liquids in your luggage?

Page 8: The changing landscape of data management


Improving pain management in premature babies & reducing asthma-related visits to the emergency room.

The Challenge:
•  Legacy systems could only retain 3 days’ worth of data streams from bedside monitors

The Solution:
•  Streaming bedside monitor data, growing 50GB per week
•  Combining 20 years’ EPA data with asthma research; interactive SQL

How do ambient light and noise impact outcomes for infants in our neonatal ICU?

How do we analyze more than 3 days’ worth of patient data?

Page 9: The changing landscape of data management


Large Global Bank delivers online analytics for merchants.

The Challenge:
•  Small merchants have no visibility of customer spend at merchant premises
•  Need agile delivery without reliance on existing overburdened infrastructure

The Solution:
•  Use Cloudera Search to provide real-time transactional analytics for customer spend
•  Provide view of aggregated customer spend in the locale

What did my customers spend at my competitors?

What did my customers spend with me?

Page 10: The changing landscape of data management


Expanding Data Requires A New Approach

What we usually do: Copy Data to Applications

Process-centric businesses use:
•  Structured data mainly
•  Internal data only
•  “Important” data only
•  Multiple copies of data

What we should do: Bring Applications to Data

Information-centric businesses use all data: multi-structured, internal & external data of all types

[Diagram: App and Data boxes illustrating the two approaches]

Page 11: The changing landscape of data management


Page 12: The changing landscape of data management


The Old Way: Moving Data to Compute
Huge Investment in Specialized Systems that Treat Data as a Commodity

[Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
[Data sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]

Major Challenges

Missing Data
•  Leaving data behind
•  Risk and compliance
•  High cost of storage

Complex Architecture
•  Many special-purpose systems
•  Moving data around
•  No complete views

Cost of Analytics
•  Existing systems strained
•  No agility
•  “BI backlog”

Time to Data
•  Up-front modeling
•  Transforms slow
•  Transforms lose data

Page 13: The changing landscape of data management


The New Way: Bringing Compute to Data
Maximize Benefit from All Your Data for Mission-Critical Jobs and Innovation

[Diagram labels: SERVERS, MARTS, EDWS, DOCUMENTS, STORAGE, SEARCH, ARCHIVE]
[Data sources: ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]

Major Benefits

Active Compliance Archive
•  Full fidelity original data
•  Indefinite time, any source
•  Lowest cost storage

Diverse Analytic Platform
•  Bring applications to data
•  Combine different workloads on common data (i.e., SQL + Search)
•  True analytic agility

Self-Service Exploratory BI
•  Simple search and BI tools
•  “Schema on read” agility
•  Reduce BI user backlog requests

Persistent Storage
•  One source of data for all analytics
•  Persist state of transformed data
•  Significantly faster and cheaper
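
The “schema on read” idea named above can be illustrated with a minimal Python sketch. It is not Cloudera's implementation and uses no Hadoop APIs; the sample records, the `read_with_schema` helper, and the two schemas are all hypothetical, chosen only to show raw data being kept at full fidelity while each query applies its own schema when it reads.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no up-front modeling).
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "country": "MY"}',
    '{"user": "bob", "amount": "7", "channel": "web"}',
    'not valid json',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: keep and cast only the fields a query needs."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # The malformed line remains in storage; this reader simply skips it.
            continue
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Fields missing from a record become None instead of failing the read.
            row[field] = cast(value) if value is not None else None
        yield row


# Two different "applications" read the same raw data with different schemas.
spend_schema = {"user": str, "amount": float}
channel_schema = {"user": str, "channel": str}

spend_rows = list(read_with_schema(RAW_EVENTS, spend_schema))
channel_rows = list(read_with_schema(RAW_EVENTS, channel_schema))
total_spend = sum(row["amount"] for row in spend_rows)
```

Because nothing is transformed or dropped at write time, a new question only needs a new schema dictionary rather than a reload of the data, which is the agility the slide contrasts with up-front modeling.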

Page 14: The changing landscape of data management


Hadoop and the Enterprise Data Hub
An Open-Source Data Engine at the Core and Built for the Modern Enterprise

[Diagram components: MAPREDUCE, IMPALA, SOLR, SPARK, SPARK STREAMING; YARN; HDFS, HBASE; CLOUDERA NAVIGATOR; CLOUDERA MANAGER; SENTRY]

Key Attributes
•  Secure, Governed, and Compliant
•  Unified and Managed
•  Open Architecture and Scalable
•  Open-Source and Cost-Effective

Page 15: The changing landscape of data management


The Journey to the Full Potential of an EDH
Operational Efficiency Buys Option on Exploration, Data Science, and Convergence

[Diagram labels: Operational Efficiency, Information Advantage; Cheap Storage, ETL Acceleration, EDW Optimization, Exploration, Data Science, Converged Applications; IT, Business]

Page 16: The changing landscape of data management

Thank You. Check out our real time Search demo at our stand! Or try it out yourself at Cloudera Live!