Data Domain Protection for Emerging Big Data Platforms
Yatin Patil – Product Management
Jeff St. Cyr – Manager, Global Technology Office
The Industry Challenge
© Copyright 2017 Dell Inc. 3
A new breed of applications spreads into enterprises
Racing towards production use
How will you protect your critical data in these apps?
Often with transformative business outcomes
• Major credit card issuer: $2B potential fraud incidents identified before any money was lost
• Electric car manufacturer: Growth that outperforms the market. Can proactively identify & fix issues
• Weather forecaster: Accurate forecasts saved $$$. Fewer insurance claims. Optimal energy use
• Leading grocery chain: 45 straight quarters of growth. Increased repeat business
But customers struggle with data protection and they are speaking with one voice
… expecting 10x data growth, we need a proper backup & DR strategy for Hadoop & Greenplum systems, just like how we back up our other systems
- a Wall Street bank
(for Hadoop) we want a backup strategy involving daily incrementals & rollback to the last known good point.
- a consumer electronics company
Hadoop needs a backup & DR story. We need a way to verifiably delete customer data after the mandated retention time
- a large multi-national accounting firm
Current homegrown utilities are too slow for weekly data ingests of 15-40TB
- a data analytics & software-as-a-service company
Backup and DR are critical for our Hadoop & Cassandra systems
- an electric utility
Data lakes aggregate new & existing data sources
[Diagram: THE ENTERPRISE DATA LAKE, aggregating transactional/in-memory systems, analytical systems, content/file shares, NAS, NoSQL & other databases, data warehouses, event streams, and unstructured & semi-structured data, and producing business insights]
Big data applications lack a robust backup solution
[Diagram: THE ENTERPRISE DATA LAKE, with transactional/in-memory systems, analytical systems, content/file shares, NAS, NoSQL databases, data warehouses, event streams, unstructured & semi-structured data, and business insights]
Callouts on the diagram:
• Protected by existing backup solutions
• Crude or no backup story
• Snapshots & replication aren’t really a backup strategy
• Needs enterprise-grade data protection
Dell EMC database protection strategy
• ANY APPLICATION: Covering a broad range of mission-critical applications
• ANY SPEED, ANY SLO: Stay within your backup window and recover your data from any point in time
• ANY ADMINISTRATOR: Leverage your entire IT team to complete the backups needed for your organization
• ANY LOCATION: No matter where your data lives
The Industry Solution
Efficient, flexible, cloud-enabled protection
FLEXIBLE
• Back up directly from enterprise apps or primary storage
• Deploy protection storage however you want it
CLOUD-ENABLED
• Natively tier deduped data to the cloud for modern long-term retention
• Deliver data protection as a service with logical data isolation
EFFICIENT
• Reduce storage requirements by 10–30x with variable-length deduplication
• Gain industry-leading speed, scalability, and reliability
… Powered by Data Domain software
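The 10–30x reduction quoted on this slide translates into physical capacity by simple division. The sketch below illustrates that arithmetic only; the 600 TB workload is a made-up figure, and real deduplication ratios depend on the data and the retention policy.

```python
def physical_storage_tb(logical_tb: float, dedup_ratio: float) -> float:
    """Physical capacity needed to hold `logical_tb` of backups at a given dedup ratio."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_tb / dedup_ratio


if __name__ == "__main__":
    logical_tb = 600.0  # hypothetical amount of logical backup data
    for ratio in (10, 30):  # the two ends of the 10-30x range on the slide
        needed = physical_storage_tb(logical_tb, ratio)
        print(f"{ratio}x dedup: {needed:.0f} TB of physical storage")
```

At the low end of the range the hypothetical 600 TB needs 60 TB of protection storage; at the high end it needs 20 TB.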
DD Boost – industry-leading protocol
Application integration with Data Domain Boost
When you want:
• Faster backup performance
• Network Bandwidth Reduction
• Increased Database Availability
• Reduced impact on database server
• Simplified configuration management
                    DD6800              DD9800
Speed (DD Boost)    32 TB/hr            68 TB/hr
Speed (other)       14 TB/hr            31 TB/hr
Logical capacity    2.8–14.4 PB [1]     10–50 PB [1]
                    8.4–43.2 PB [2]     30–150 PB [2]
Usable capacity     Up to 288 TB [1]    Up to 1 PB [1]
                    Up to 864 TB [2]    Up to 3 PB [2]
[1] Total capacity on Active Tier only
[2] Total capacity with DD Cloud Tier software for long-term retention
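To relate these rated speeds to a backup window, divide the amount of data by the throughput. The sketch below does that for the upper end of the 15-40TB weekly ingest quoted by one customer earlier in the deck. It assumes the rated figure is sustained for the whole job, which real workloads may not achieve, so the results are best-case estimates.

```python
def backup_hours(data_tb: float, throughput_tb_per_hr: float) -> float:
    """Best-case hours to move `data_tb` at a sustained throughput."""
    if throughput_tb_per_hr <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hr


# Rated speeds copied from the slide's table, in TB/hr.
RATED_TB_PER_HR = {
    ("DD6800", "DD Boost"): 32,
    ("DD6800", "other"): 14,
    ("DD9800", "DD Boost"): 68,
    ("DD9800", "other"): 31,
}

if __name__ == "__main__":
    ingest_tb = 40  # upper end of the weekly ingest range quoted earlier
    for (model, protocol), rate in RATED_TB_PER_HR.items():
        hours = backup_hours(ingest_tb, rate)
        print(f"{model} via {protocol}: {hours:.2f} h for {ingest_tb} TB")
```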
DD Boost File System Plugin
DD Boost/BoostFS are ideal for large, clustered databases & Hadoop
Application agnostic agent
Integrate any application without an SDK
Supported for:
– MongoDB and Mongo Ops Manager
– MySQL
Simple file system interface + DD Boost efficiency
• Access via a file system mount point: cp file /mount_point/ddboost
• Sends data to Data Domain via DD Boost, with the network & storage efficiencies of DD Boost and Data Domain
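Because BoostFS is presented as an ordinary mount point, any tool that writes files can use it. The sketch below is the scripted equivalent of the `cp` command above, with a size check added. The function name and the check are illustrative, and the demo uses a temporary directory as a stand-in for the real mount point.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_backup_mount(source: Path, mount_point: Path) -> Path:
    """Copy one file into the backup mount point and return the destination path."""
    if not mount_point.is_dir():
        raise FileNotFoundError(f"{mount_point} is not a directory; is it mounted?")
    destination = mount_point / source.name
    shutil.copy2(source, destination)  # copy2 also preserves timestamps where possible
    if destination.stat().st_size != source.stat().st_size:
        raise OSError(f"size mismatch after copying {source}")
    return destination


if __name__ == "__main__":
    # Demo against a temporary directory; in practice mount_point would be
    # the BoostFS mount (for example /mount_point/ddboost from the slide).
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dump.bin"
        src.write_bytes(b"example backup payload")
        mount = Path(tmp) / "mount_point"
        mount.mkdir()
        print(copy_to_backup_mount(src, mount))
```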
Platform Integration
Introducing the Hadoop application agent
True point-in-time backup and recovery of Hadoop data to Data Domain
Supports leading commercial distros: Cloudera & Hortonworks
Backup & recovery operations controlled by Hadoop admins
Linux CLI interface facilitates scripting & extensibility
Protecting Hadoop with Data Domain
Hadoop Application Agent: a backup application for Hadoop
• An HDFS-integrated backup app for Hadoop
  – Point-in-time backup & recovery
  – Back up HDFS directories & HBase tables
• Empowers Hadoop admins to back up their data
  – Using Hadoop native tooling (MapReduce, distcp)
• Storage agnostic: DAS and NAS configurations
• Add multiple Data Domains to grow capacity
• Most efficient data path to backup storage
• Supported on:
  – Cloudera Enterprise 5.4 – 5.9
  – Hortonworks Data Platform 2.2 – 2.5
[Diagram: a Hadoop cluster (Name Node and Data Nodes running HDFS on DAS or shared storage) operated by the Hadoop admin; the backup data path runs from the Hadoop App Agent through the DD Boost File System Plug-in to the Data Domain backup solution]
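The slides do not show the agent's own command line, so the sketch below is not the agent. It only illustrates, with stock Hadoop commands, the general pattern the bullets describe: fix a point in time, then copy it out with distcp. It assumes the source directory has already been made snapshottable by an HDFS administrator; the snapshot naming scheme and the destination URI are hypothetical. The commands are built and printed, not executed.

```python
import shlex
from datetime import datetime, timezone
from typing import List


def snapshot_name(now: datetime) -> str:
    """Timestamped snapshot name (this naming scheme is hypothetical)."""
    return now.strftime("backup-%Y%m%dT%H%M%SZ")


def build_backup_commands(hdfs_dir: str, dest_uri: str, name: str) -> List[List[str]]:
    """Return the two commands: take an HDFS snapshot, then distcp it out."""
    hdfs_dir = hdfs_dir.rstrip("/")
    if not hdfs_dir:
        raise ValueError("refusing to snapshot the HDFS root in this sketch")
    create = ["hdfs", "dfs", "-createSnapshot", hdfs_dir, name]
    # Snapshots are exposed read-only under <dir>/.snapshot/<name>.
    copy = [
        "hadoop", "distcp",
        f"{hdfs_dir}/.snapshot/{name}",
        f"{dest_uri.rstrip('/')}/{name}",
    ]
    return [create, copy]


if __name__ == "__main__":
    name = snapshot_name(datetime.now(timezone.utc))
    # Both paths below are placeholders, not values from the slides.
    for cmd in build_backup_commands("/data/warehouse", "file:///backup/hadoop", name):
        print(shlex.join(cmd))
```

Copying from the snapshot path rather than the live directory is what gives the copy a consistent point in time, since files under `.snapshot` do not change while distcp runs.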
Protecting MongoDB with Data Domain
File system ease of use with the power of DD Boost
[Diagram: the MongoDB Server writes to the Linux mount point /backup, which sends data to Data Domain via DD Boost]
• Supported on:
  – MongoDB Ops Manager 2.6.3.0, 3.3, 3.4
• Dump MongoDB to Data Domain (via BoostFS)
  – Get storage efficiency due to deduplication
  – Network bandwidth efficiency due to Boost
  – Use parallel connections
  – mongodump --db testdb --numParallelCollections 5 --out /backup/
• Simple to deploy
  – Install BoostFS on the MongoDB server/Ops Manager server
  – Create a mount point /backup
  – Mount Data Domain Storage Unit using BoostFS
• Supports WiredTiger & MMAPv1 storage engines
• Best practices
  – Use Ops Manager v3.4 (larger file size writes)
  – Up to 63 streams per BoostFS plug-in
  – mongodump writes backup files uncompressed from WiredTiger or MMAPv1 storage engines
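For scripted dumps, the mongodump invocation above can be assembled programmatically. The sketch below builds the argument list and rejects a parallelism setting above the 63-stream limit from the best practices. Treating each parallel collection as one BoostFS stream is an assumption made for illustration; the slides do not state how the two map to each other. The command is built and printed, not run.

```python
import shlex
from typing import List

MAX_BOOSTFS_STREAMS = 63  # "Up to 63 streams per BoostFS plug-in" from the slide


def build_mongodump(db: str, out_dir: str, parallel_collections: int = 4) -> List[str]:
    """Build a mongodump command that writes into the BoostFS mount point."""
    if not 1 <= parallel_collections <= MAX_BOOSTFS_STREAMS:
        raise ValueError(
            f"parallel_collections must be between 1 and {MAX_BOOSTFS_STREAMS}"
        )
    return [
        "mongodump",
        "--db", db,
        "--numParallelCollections", str(parallel_collections),
        "--out", out_dir,
    ]


if __name__ == "__main__":
    # Same values as the example on the slide.
    print(shlex.join(build_mongodump("testdb", "/backup/", 5)))
```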
Pivotal Greenplum data protection
DD Boost
Greenplum cluster
• Supported versions:
  – 4.2.1 through 4.3.10
• gpcrondump utility
  – Wrapper utility around gp_dump
  – Dumps to a Data Domain storage unit via DD Boost
  – Compressed by default; with object consistency
• Use gpcrondump to back up
  – Databases, schemas, & tables
  – gpcrondump -x mydatabase -z -v --ddboost
• Use gpdbrestore for recovery
  – GP_RESTORE and GPDBRESTORE
  – Greenplum database system is online and running
  – Has the same primary segment instances as the system backed up
  – Database being restored exists but is empty, or use -e to drop & create
  – gpdbrestore -t backup_timestamp -v --ddboost
• Incremental backup and restore (GPDB 4.2.5)
  – AO tables and partitions
  – ALTER TABLE, INSERT, TRUNCATE, DROP & RECREATE
DD Boost integrates into Greenplum’s database backup commands
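The backup and restore commands on this slide pair up through the backup timestamp. The sketch below builds both argument lists using only the flags shown on the slide. The 14-digit `YYYYMMDDHHMMSS` timestamp format is an assumption based on the Greenplum utility reference rather than on the slides, and the `-z` flag is understood there to switch off the default compression. Nothing is executed.

```python
import re
import shlex
from typing import List


def build_gpcrondump(database: str, no_compression: bool = True,
                     verbose: bool = True) -> List[str]:
    """Build the gpcrondump command for a DD Boost backup of one database."""
    cmd = ["gpcrondump", "-x", database]
    if no_compression:
        cmd.append("-z")  # assumed meaning: do not compress the dump files
    if verbose:
        cmd.append("-v")
    cmd.append("--ddboost")
    return cmd


def build_gpdbrestore(timestamp: str, drop_and_create: bool = False,
                      verbose: bool = True) -> List[str]:
    """Build the gpdbrestore command for a given backup timestamp."""
    if not re.fullmatch(r"\d{14}", timestamp):  # assumed YYYYMMDDHHMMSS key
        raise ValueError("timestamp must be 14 digits")
    cmd = ["gpdbrestore", "-t", timestamp]
    if drop_and_create:
        cmd.append("-e")  # drop & create the target database, per the slide
    if verbose:
        cmd.append("-v")
    cmd.append("--ddboost")
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_gpcrondump("mydatabase")))
    # The timestamp below is a made-up example value.
    print(shlex.join(build_gpdbrestore("20170508120000", drop_and_create=True)))
```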
Protecting MySQL Databases with Data Domain
DD Boost
• Supported versions:
  – 5.6 & 5.7
Integration and qualification of 3 different applications
• Dump MySQL to Data Domain (via BoostFS)
  – Get storage efficiency due to deduplication
  – Network bandwidth efficiency due to Boost
  – mysqlbackup command
• Mydumper
• Percona XtraBackup

Backup Application    MySQL Enterprise Backup    Mydumper    XtraBackup
Single                30%                        35%         25%
Multiple              N/A                        50%         40%
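Each of the three qualified tools can write straight into the BoostFS mount point. The sketch below shows one minimal invocation per tool, as a starting point rather than a qualified configuration: the `/backup` path is a placeholder, credentials and tuning options are omitted, and the slides do not show the exact commands that were tested. The commands are built and printed, not run.

```python
import shlex
from typing import List


def build_mysql_backup(tool: str, target_dir: str) -> List[str]:
    """Build a minimal backup command for one of the three tools on the slide."""
    if tool == "mysqlbackup":  # MySQL Enterprise Backup
        return ["mysqlbackup", f"--backup-dir={target_dir}", "backup"]
    if tool == "mydumper":
        return ["mydumper", "--outputdir", target_dir]
    if tool == "xtrabackup":  # Percona XtraBackup
        return ["xtrabackup", "--backup", f"--target-dir={target_dir}"]
    raise ValueError(f"unknown tool: {tool}")


if __name__ == "__main__":
    for tool in ("mysqlbackup", "mydumper", "xtrabackup"):
        # /backup stands in for the BoostFS mount point.
        print(shlex.join(build_mysql_backup(tool, "/backup")))
```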
Protecting EnterpriseDB with Data Domain
DD Boost
Integration and qualification of 3 different applications
• Dump EnterpriseDB to Data Domain (via BoostFS)
  – Get storage efficiency due to deduplication
  – Network bandwidth efficiency due to Boost
• Backup tools qualified
  – BART: Enterprise
  – pg_dump: Standard (Community)
  – pg_rman: Open Source
• Supported distros
  – Standard v9.5 & 9.6
  – Enterprise v9.5 & 9.6
DD Boost for everyone
• Expanding the benefits of DD Boost to even more applications with DD Boost File System Plug-in
• Can be deployed in minutes to reduce backup windows and storage capacity
• Same advanced DD Boost features in a file system format
What does AVT do? Why do I need it?
• AVT saves you time and money by ensuring your application will benefit from using BoostFS, BEFORE going into production.
• AVT is a POC in a box that measures the benefits of BoostFS with your workload:
• Shorter backup windows compared to NFS
• Greater storage efficiencies with data deduplication
• Recommended for any application or workload that uses NFS for data protection and is NOT listed in the Integration Guide.
What guide? This one: https://community.emc.com/docs/DOC-55465
Where to find AVT?
Step 1. Go to the Data Domain Community page: https://community.emc.com/community/products/data-domain#install
Step 2. Scroll down to Application Validation Tool
Step 3. Select Download
Want to win a levitating death star speaker?
• Follow @DellEMCProtect while at Dell EMC World
• 2 winners will be chosen daily from Monday May 8 to Thursday May 11
• All winners will be notified through Twitter Direct Message
NO PURCHASE NECESSARY. Ends 05/11/2017. To enter and for Official Rules, visit http://thecoreblog.emc.com/dell-emc-world-follow-win-sweepstakes-2017/
Learn more: join the conversation
@DellEMCProtect
Dell EMC Storage and Data Protection
Dell EMC Data Protection Community
Data Protection on EMC.com
Mozy.com
Spanning.com
You may also be interested in these sessions …
Session | Breakout Session Title | First Session | Second Session
dps.01 | Enterprise Copy Data Management: Primary & Protection Copy Management Best Practices | Mon 01:30 | -
dps.06 | Dell EMC Data Domain: What's New For 2017 | Mon 08:30 | Wed 01:30
dps.07 | Dell EMC Data Protection Suite: What's New For 2017 | Tue 03:00 | Wed 12:00
dps.13 | Data Domain Protection For Microsoft Applications: SQL, SharePoint & Exchange | Wed 03:00 | Thu 11:30
dps.14 | Data Domain Protection For Large Enterprise Databases: Oracle, SAP & IBM | Mon 12:00 | Thu 08:30
bof.14 | Birds Of A Feather: Data Domain Ask The Experts | Tue 01:30 | -