Smarter Management for Your Data Growth
DESCRIPTION
Matt Aslett (The 451 Group) and Deirdre Mahon (RainStor) examine the evolving data management landscape and how RainStor's Online Data Retention (OLDR) repository fits into the equation.
TRANSCRIPT
Smarter Management for Your Data Growth
Retain Critical Data Online at a Fraction of the Cost
April 2011
Agenda
Introductions
Changing Data Management Landscape & Trends: From Operational to Analytical
Cloud and Hadoop: Where Do They Fit?
RainStor and How It Works
Analytics Data Retention Use Case
Economics
Q&A
Matt Aslett, The 451 Group
Deirdre Mahon, VP Marketing – RainStor
Ramon Chen, VP Product Management - RainStor
© 2011 by The 451 Group. All rights reserved
Matthew Aslett, The 451 Group
Total Data: The changing data management landscape
The 451 Group
451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
Overview
The changing data management landscape
One overarching trend: Total Data
Impacting four technology areas: operational database, analytic database, data archiving and machine-generated data
The trends driving data management
Trends driving data management
The volume, variety and velocity of data have never been greater, and they are still growing
The value of data has never been better understood
The capabilities for processing data have never been better:
Higher processor performance and density are enabling advanced processing on commodity hardware
Software enhancements are designed to make the best use of processing performance and scalable architecture
Advanced and in-database analytics bring processing to the data, reducing latency and improving efficiency
The data deluge problem is also a big data opportunity
Introducing Total Data
A concept defined by The 451 Group to describe new approaches to data management that go beyond restrictive silos
Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques
Processing any data that might be applicable to analytics, whether it sits in the operational database, data warehouse, Hadoop or archive; whether it is structured, semi-structured or unstructured; relational or non-relational; on-premise or in the cloud
Inspired by ‘Total Football’
Total Football meets Total Data
“You make space, you come into space. And if the ball doesn’t come, you leave this space and another player will come into it.”
Bernardus Hulshoff, Ajax 1966-77
Abandonment of restrictive (self-imposed) rules about individual roles and responsibilities
Enabled and relied on fluidity and flexibility to respond to changing requirements
Relied on, and exploited, improved performance levels
Data management – in theory
[Diagram: enterprise app and operational database, data cleansing/sampling/MDM, EDW, reporting/BI and data archive, on top of the infrastructure layer]
The application is the primary source of data
The relational database is sacrosanct
The enterprise data warehouse is the single source of the truth (or is supposed to be)
Offline data archiving
Infrastructure primarily exists to support the data/application layer
Data management – in practice
[Diagram: the same stack, now with multiple operational databases and a distributed data layer]
The relational database is sacrosanct
Distributed data layer to meet scalability and performance demands
New opportunities for real-time BI
Polyglot persistence: use the most appropriate data storage for the application
Data management – in practice
[Diagram: adds departmental analytic databases, each with its own reporting]
The enterprise data warehouse is the single source of the truth
Data is copied into departmental or regional data marts
Data warehouse administrators are fighting a losing battle for control
Data management – in practice
[Diagram: operational databases, distributed data, EDW, departmental analytic databases, data archive and infrastructure]
Higher processor performance and density are enabling advanced processing on commodity hardware
Advanced in-database analytics bring processing to the data, reducing latency and improving efficiency
Data management – in practice
[Diagram: adds Hadoop with its own reporting/BI]
Hadoop and associated analysis tools (Hive, Pig) for batch processing of large, complex data sets
Taking further advantage of hardware economics
Data management – in practice
[Diagram: operational databases, EDW, analytic databases, data archive and Hadoop, each with reporting/BI]
Integrating Hadoop with the data warehouse for ETL and for two-step data analysis
Greater acceptance that the EDW is part of a broader data analytics architecture
Data location, data location, data location
Not the end of the EDW, but the EDW becomes one of many sources of BI rather than the only one
The issue of data location becomes paramount
Choose the right storage technology, software and hardware:
EDW, Hadoop or archive
On-premise or in the cloud
Memory, disk or SSD
Understand the requirements:
Value and temperature of the data
Ensure data can be queried using existing tools/skills
Cost
EDW requirements/characteristics
High-performance query/analysis response
Ability to support multiple users concurrently
Capacity for multi-terabyte storage and scale
Fast data load and staging for data transformation
Ability to operate with BI/analytics tools
Security and governance
Cost: $20k-$50k per TB
Alternatives
Do nothing and suffer the consequences
Deploy appliances and/or Hadoop for specific use cases
Offload to an online repository
Data management – in practice
[Diagram: operational databases, EDW, analytic databases, data archive and Hadoop, each with reporting/BI]
Offline data archiving
Traditionally, data was archived for legal requirements
Previously there was little need for querying/analytics
Data management – in practice
[Diagram: the data archive becomes a data repository with its own reporting]
Regulations have increased the need to query archived data
Focus shifts to how to enable querying easily and cost-effectively
The archive becomes an online repository for historical data
Data management – in practice
[Diagram: full stack including data repository, Hadoop and the infrastructure layer]
Infrastructure primarily exists to support the data/application layer
“Machine-generated data” is an untapped source of data
Data management – in practice
[Diagram: the infrastructure layer relabelled “Datastructure”, with its own reporting/BI]
Infrastructure as a source of data for analysis and integration with application data: ‘datastructure’
Likely to transform into data-generating and data-processing infrastructure as analytics capabilities are applied directly to the data source
Data management – in practice
[Diagram: adds a cloud infrastructure layer with its own Hadoop/DW, analytic databases, data archive and reporting]
Cloud as both a source of data and a data storage and processing layer
Total Data
[Diagram: the complete on-premise and cloud data management architecture]
More flexible approach to data management
Greater opportunities for business intelligence
Data location, data location, data location
Avoid data movement and duplication – retain governance
Virtual data marts and data clouds
Data virtualization to provide access to multiple data sources
Data virtualization
[Diagram: the Total Data architecture spanning on-premise and cloud]
Data virtualization
[Diagram: a data virtualization layer providing virtual data marts, each with its own reporting, over the operational databases, EDW, data repository, datastructure, analytic DB, Hadoop/DW, cloud infrastructure and data archive]
Who is RainStor?
Specialized database for cost-effective reduction, retention and on-demand retrieval of historical structured data
At 10x less cost
OEM partner model
Cloud or on-premise
Partner Case Studies
Sector: Telco. Solution: message (SMS/MMS) and traffic log management. Retaining thousands of messages per second while keeping them accessible for regulatory purposes.
Sector: Horizontal. Solution: Teradata Data Retention Machine. Retaining BI and analytical data long term in a RainStor-powered Data Retention Machine at low cost per TB stored, eliminating tape.
Sector: Various/Horizontal. Solution: Information Lifecycle Management. Retaining historical data from highly complex packaged applications while keeping it accessible for business and regulatory purposes.
HP. Sector: Telco. Solution: CDR/IPDR retention and lawful intercept (HP Dragon). Retaining billions of CDRs per day in immutable form and enabling cost-effective queries by regulatory authorities.
Data Retention Solution Requirements
[Diagram: Transactional (OLTP); Analytical (OLAP); Static Machine-Generated Data (MGD); Online Data Retention (OLDR); Database Archiving / Application Retirement; Data Warehouse Archiving / Data Warehouse Appliance; Compliance / Query]
Where RainStor Fits
[Diagram: the Total Data architecture (operational databases, EDW, analytic databases, data repository, datastructure, Hadoop/DW, cloud infrastructure and data archive) with an added Application Archive / Retired layer]
RainStor’s Focus
Multi-billions of records; strict compliance; RDBMSs break; analytics required
Communications: OSS, BSS, ISS
10s of petabytes retained
Volumes are rising; regulated; infrastructure needs reaching telco scale
Security: network forensics, cyber-security
Utilities: SmartGrid, e-meter
Big data volumes; needs to be online and queryable
Found the needle: where’s the haystack?
Data security will account for over 60% of new enterprise security spending in the next 3 years
Global mobile data traffic will grow 26-fold between 2010 and 2015 (6.3 exabytes per month)
SmartGrid to generate 1 exabyte of data in the US alone in the next 2 years
How Does RainStor Do It?
Reduce
SIZE: Massive de-dupe, ~97% savings in storage
HARDWARE: On commodity server/disk infrastructure
RESOURCES: Without specialist DBA support
Retain
PRESERVED: Massive record volumes in original form
IMMUTABLE: Tamper-proofed with audit trail
CONFIGURABLE: With retention and expiry policies
Retrieve
STANDARDS: SQL and BI tools via ODBC/JDBC
PERFORMANT: Fast queries for large, complex data sets
FLEXIBLE: With schema evolution and point-in-time access
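The IMMUTABLE point says retained data is tamper-proofed with an audit trail, but the slides do not say how. One common, generic way to make an append-only store tamper-evident is a hash chain, in which each entry's digest also covers the previous entry's digest. The sketch below assumes that technique; it is not RainStor's mechanism, and the function names and sample call records are hypothetical.

```python
import hashlib
from typing import List, Tuple

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry

def _digest(prev_hash: str, record: str) -> str:
    # Each digest covers the previous digest, so editing any earlier entry breaks the chain.
    return hashlib.sha256(f"{prev_hash}|{record}".encode("utf-8")).hexdigest()

def append(log: List[Tuple[str, str]], record: str) -> None:
    """Append a record with its chained digest; entries are never updated in place."""
    prev = log[-1][1] if log else GENESIS
    log.append((record, _digest(prev, record)))

def verify(log: List[Tuple[str, str]]) -> bool:
    """Recompute every digest; any mismatch means the log was altered."""
    prev = GENESIS
    for record, stored in log:
        if _digest(prev, record) != stored:
            return False
        prev = stored
    return True

audit_log: List[Tuple[str, str]] = []
for cdr in ["call-001,60s", "call-002,125s", "call-003,8s"]:  # hypothetical CDR strings
    append(audit_log, cdr)
print(verify(audit_log))                         # True
audit_log[1] = ("call-002,5s", audit_log[1][1])  # tamper with one record, keep its old digest
print(verify(audit_log))                         # False
```

A chain like this only detects changes if the latest digest is also kept somewhere an attacker cannot rewrite; otherwise every digest could simply be recomputed after an edit.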
RainStor’s Disruptive Technology: patented, 4 layers of compression
Data reduction through value and pattern de-duplication
Further algorithmic-level and byte-level compression
Fast queries in stored format without re-inflation
[Diagram: de-duplication of repeated sample records, e.g. “Peter Smith, Pharma, $40,000” and “Paul, Finance, $35,000”]
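To make “value and pattern de-duplication” and “queries in stored format without re-inflation” concrete, here is a minimal sketch using plain dictionary encoding: each distinct value is stored once per column, each distinct combination of value ids is stored once, and a simple equality count runs on the ids alone. This is a generic textbook technique assumed for illustration, not RainStor’s patented algorithm; the function names are hypothetical, and the column split of the slide’s sample rows is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, ...]

def dedupe_values(rows: Sequence[Row]) -> Tuple[List[List[str]], List[Tuple[int, ...]]]:
    """Value de-duplication: keep each distinct value once per column and replace it by an id."""
    n_cols = len(rows[0]) if rows else 0
    dictionaries: List[List[str]] = [[] for _ in range(n_cols)]
    lookups: List[Dict[str, int]] = [{} for _ in range(n_cols)]
    encoded: List[Tuple[int, ...]] = []
    for row in rows:
        ids = []
        for col, value in enumerate(row):
            if value not in lookups[col]:
                lookups[col][value] = len(dictionaries[col])
                dictionaries[col].append(value)
            ids.append(lookups[col][value])
        encoded.append(tuple(ids))
    return dictionaries, encoded

def dedupe_patterns(encoded: Sequence[Tuple[int, ...]]) -> Tuple[List[Tuple[int, ...]], List[int]]:
    """Pattern de-duplication: store each distinct id combination once; rows become references."""
    patterns: List[Tuple[int, ...]] = []
    index: Dict[Tuple[int, ...], int] = {}
    refs: List[int] = []
    for ids in encoded:
        if ids not in index:
            index[ids] = len(patterns)
            patterns.append(ids)
        refs.append(index[ids])
    return patterns, refs

def count_equal(dictionaries, patterns, refs, col: int, value: str) -> int:
    """Count rows where column == value by comparing integer ids; rows are never rebuilt."""
    if value not in dictionaries[col]:
        return 0
    target = dictionaries[col].index(value)
    matching = {p for p, ids in enumerate(patterns) if ids[col] == target}
    return sum(1 for r in refs if r in matching)

def decode(dictionaries, patterns, refs) -> List[Row]:
    """Re-inflate the original rows (only needed when full records are requested)."""
    return [tuple(dictionaries[c][i] for c, i in enumerate(patterns[r])) for r in refs]

rows = [
    ("Peter Smith", "Pharma", "$40,000"),
    ("Peter Smith", "Pharma", "$40,000"),
    ("Paul", "Finance", "$35,000"),
    ("Peter Smith", "Pharma", "$40,000"),
    ("Paul", "Finance", "$35,000"),
]
dictionaries, encoded = dedupe_values(rows)
patterns, refs = dedupe_patterns(encoded)
print(sum(len(d) for d in dictionaries), "stored values instead of", sum(len(r) for r in rows))
print(count_equal(dictionaries, patterns, refs, 1, "Pharma"))  # 3
```

The further algorithmic-level and byte-level compression layers mentioned on the slide are not modelled here.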
Analytics/DW
Offload Warehouse Data to Online Archive: High Performance and Lower Cost
Augment existing warehouse and analytics systems by providing access to years of history
Run queries on RainStor and import the results into the data warehouse
Re-instate data from the data retention repository back into the warehouse for deep analytics
Benefits: lower TCO (admin, storage, CPU); compliant data retention; unlimited scalability; more data sources for broader analysis
[Diagram: source DB (e.g. Oracle) holding 5 quarters of data, alongside an archive holding 50 quarters]
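The offload slide describes three moves: keep recent data in the warehouse, query the retention repository and import only the results, and re-instate archived data when deep analysis needs it. The sketch below illustrates that flow under loud assumptions: two in-memory `sqlite3` databases stand in for the warehouse and the repository (the slides say RainStor is queried with SQL over ODBC/JDBC, which this does not use), and the `sales` table, its columns and every function name are hypothetical. Only the 5-quarter/50-quarter split comes from the slide.

```python
import sqlite3

def build(rows):
    """Hypothetical warehouse and archive, each with a toy sales(quarter, amount) table."""
    warehouse = sqlite3.connect(":memory:")
    archive = sqlite3.connect(":memory:")
    for db in (warehouse, archive):
        db.execute("CREATE TABLE sales (quarter INTEGER, amount REAL)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    warehouse.commit()
    return warehouse, archive

def offload(warehouse, archive, keep_quarters: int) -> int:
    """Move rows older than the newest `keep_quarters` quarters into the archive."""
    (latest,) = warehouse.execute("SELECT MAX(quarter) FROM sales").fetchone()
    cutoff = latest - keep_quarters + 1
    old = warehouse.execute(
        "SELECT quarter, amount FROM sales WHERE quarter < ?", (cutoff,)
    ).fetchall()
    archive.executemany("INSERT INTO sales VALUES (?, ?)", old)
    warehouse.execute("DELETE FROM sales WHERE quarter < ?", (cutoff,))
    archive.commit()
    warehouse.commit()
    return len(old)

def import_history_summary(warehouse, archive) -> int:
    """Run the query on the archive and load only its (small) result into the warehouse."""
    summary = archive.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
    ).fetchall()
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS history_summary (quarter INTEGER PRIMARY KEY, total REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO history_summary VALUES (?, ?)", summary)
    warehouse.commit()
    return len(summary)

def reinstate(warehouse, archive, first_quarter: int, last_quarter: int) -> int:
    """Copy a range of quarters back into the warehouse; the archive keeps its copy."""
    rows = archive.execute(
        "SELECT quarter, amount FROM sales WHERE quarter BETWEEN ? AND ?",
        (first_quarter, last_quarter),
    ).fetchall()
    warehouse.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    warehouse.commit()
    return len(rows)

warehouse, archive = build([(q, 100.0) for q in range(1, 56)])  # 55 quarters of toy data
print(offload(warehouse, archive, keep_quarters=5))  # 50 quarters moved to the archive
```

Importing an aggregate rather than raw rows is the design choice that keeps the warehouse small; re-instating raw rows is reserved for the cases where detail is genuinely needed.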
RainStor Cloud
[Diagram: RainStor VM software appliance running on Amazon EC2, with data stored in S3 and accessed via ODBC/JDBC]
1. Send: compressed, de-duplicated data is sent to the cloud, resulting in quicker and cheaper uploads.
2. Store: encrypted data is stored in private containers, ensuring security and easy management.
3. Search: data is accessed on demand using standard SQL tools, leveraging the elasticity of the cloud.
How Do the Economics Stack Up?
Key criteria: standard RDBMS/warehouse vs. RainStor
Storage: 2 PB vs. 100 TB (20x compression)
Servers for data load and query: 100 vs. 10
Admin: multiple DBAs vs. no design, tuning or maintenance
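A quick arithmetic check of the comparison, assuming decimal units (1 PB = 1,000 TB) and reading the “~97% savings in storage” claim from the “How Does RainStor Do It?” slide as savings = 1 − 1/ratio. The function names are illustrative only. It shows that the table’s 20x figure corresponds to 95% savings, while ~97% implies roughly 33x.

```python
def compressed_size(raw: float, ratio: float) -> float:
    """Stored size after compression at the given ratio (same units as raw)."""
    return raw / ratio

def savings_fraction(ratio: float) -> float:
    """Fraction of storage saved at a compression ratio: 1 - 1/ratio."""
    return 1.0 - 1.0 / ratio

def ratio_for_savings(savings: float) -> float:
    """Compression ratio implied by a savings fraction: 1 / (1 - savings)."""
    return 1.0 / (1.0 - savings)

raw_tb = 2 * 1000  # 2 PB, assuming 1 PB = 1,000 TB
print(f"{compressed_size(raw_tb, 20):.0f} TB")               # 100 TB
print(f"{savings_fraction(20):.0%} saved at 20x")            # 95% saved at 20x
print(f"{ratio_for_savings(0.97):.1f}x for ~97% savings")    # 33.3x for ~97% savings
print(f"{100 / 10:.0f}x fewer servers for load and query")   # 10x fewer servers for load and query
```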
Quick summary
The growing volume, variety and velocity of data is a problem, but it is also an opportunity
Requires a broader approach to data management
Deploy appliances and Hadoop for specific use cases, and an online repository for historical data
‘Datastructure’ will become increasingly valuable, not only as a source of data but also as a source of intelligence
Data location and the role of data virtualization will come into greater focus
Q&A
FULL TIME
Thank you