© Copyright 2010 EMC Corporation. All rights reserved.
Next-Generation Data Protection
Deduplication: a key element of next-generation backup
Piotr Nogaś
BRS EMEA EE TC Manager

F1000 Storage Professionals' Pain Points
What are your top storage-related pain points?
[Bar chart, 0% to 80%, one series per survey wave: Q4 '07, Q3 '08, Q2 '09, Q4 '09, Q2 '10. Categories: Other; Managing Storage Equipment; Power Management; Vendor Management; Application Recoveries and/or Backup Retention; Regulatory Compliance; Data Mobility; Storage Provisioning; Archiving and Archive Management; Dealing With Performance Problems; Lack of Integrated Tools; Managing Complexity; Backup Administration and Management; Managing Costs; Proper Capacity Forecasting and Storage Reporting; Managing Storage Growth.]
F1000 sample (8/11/10): Q4 '07, n=151; Q3 '08, n=140; Q2 '09, n=155; Q4 '09, n=185; Q2 '10, n=166. *Note that due to multiple responses per interview, total exceeds 100%.

Data Center Waves: past and future
• storage consolidation
• server consolidation
• server virtualization
  virtual environments protection:
  – VMware (API)
  – Hyper-V
  – Xen (host based)
  – virtual partitions via FC/VTL
  need to emulate hundreds of tape drives with high concurrency
• storage virtualization
  – deduplication appliances for backup
  – deduplication on tier 1 storage (primary)
• virtualization everywhere
  – VDI
  – cloud

Optimization Technologies Center Stage
• Storage optimization: deduplication
• Server & primary storage optimization: server/storage consolidation/virtualization
• Network optimization: WAN optimization

Backup Environments Transformation: root causes
Unabated data growth
• Backup = 4 to 30 times production capacity
• Full backups kept for months or years
• New requirements to keep more data for longer periods
[Chart: data needing protection, protected vs. unprotected, in zettabytes (0 to 16), 2010 to 2020. Callout: Unprotected in 2010 = size of entire digital universe in 2018.]
Source: IDC Digital Universe Study, sponsored by EMC, May 2010; chart does not include data that does not need protection
[Chart: Digital Information Created and Replicated Worldwide, in exabytes (0 to 2,500), 2008 to 2012. Five times growth in four years.]
Source: IDC Digital Universe white paper, sponsored by EMC, May 2009

Major Trends Driving the Transformation of Backup Environments: server virtualization
• Increased complexity
• Virtual machine sprawl
• High utilization, little bandwidth for backup
[Charts: CPU utilization, 0% to 100%. Old paradigm: 20% resource utilization. New paradigm: 80% resource utilization on shared physical resources (VMware ESX Server, hardware).]

What is Data Deduplication?
“The process of detecting and identifying the unique data segments within a given set of information, enabling the elimination of redundancy when stored or moved.”
[Figure: Deduplication of Data Set 1, Data Set 2, and Data Set 3. Before: total segments = 39. After: unique segments = 6.]
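The definition above can be illustrated with a minimal Python sketch. It assumes fixed 4-byte segments and SHA-1 fingerprints purely for illustration; the function name `deduplicate` and the sample data sets are hypothetical and do not describe any EMC product's internals.

```python
import hashlib

def deduplicate(data_sets, segment_size=4):
    """Split each data set into segments and keep one copy per unique fingerprint."""
    store = {}           # fingerprint -> segment bytes (the unique segments)
    total_segments = 0
    for data in data_sets:
        for i in range(0, len(data), segment_size):
            segment = data[i:i + segment_size]
            total_segments += 1
            # Store the segment only the first time its fingerprint is seen.
            store.setdefault(hashlib.sha1(segment).digest(), segment)
    return total_segments, len(store)

# Three made-up data sets that share most of their segments.
data_sets = [
    b"AAAABBBBCCCCAAAA",
    b"AAAABBBBDDDDCCCC",
    b"BBBBCCCCDDDDAAAA",
]
total, unique = deduplicate(data_sets)
print(f"before: {total} segments, after: {unique} unique segments")
```

The same idea scales from this toy input to the slide's example of 39 segments reduced to 6: only segments with a new fingerprint consume storage.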

Why Deduplicated Backup?
Deduplicate for capacity and SLA. Backups contain plenty of duplicated data by design: a standard backup schedule with 91 days retention (full + 6 diff/incr) can contain the same data 15+ times. Keep logical copies vs. physical copies.
Replicate smarter. Move only deduplicated data over existing networks (WAN) with up to 99% bandwidth efficiency for cost-effective disaster recovery.
Recover reliably. Continuous fault detection and self-healing ensure data recoverability to meet SLAs.
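The retention arithmetic behind the "same data many times" claim can be sketched as follows. This is a simplified model under stated assumptions: one full every 7 days, daily incrementals in between, and a hypothetical 2% daily change rate. The function names are illustrative only.

```python
def retained_backups(retention_days=91, cycle_days=7):
    """Count fulls and incrementals kept under a weekly full + 6 daily incrementals schedule."""
    fulls = retention_days // cycle_days
    incrementals = retention_days - fulls
    return fulls, incrementals

def logical_copies(daily_change=0.02, retention_days=91, cycle_days=7):
    """Backed-up data as a multiple of primary capacity (assumed model, not a measurement)."""
    fulls, incrementals = retained_backups(retention_days, cycle_days)
    # Each full holds the whole data set; each incremental holds one day's changes.
    return fulls + incrementals * daily_change

fulls, incrementals = retained_backups()
print(f"{fulls} fulls and {incrementals} incrementals retained")
print(f"logical data is about {logical_copies():.1f}x primary capacity")
```

Under these assumptions an unchanged block is stored once per retained full. Differential backups, which accumulate changes since the last full, would push the multiple higher than daily incrementals do.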

“Almost” Makes a Difference: deduplication principles
• Fixed vs. dynamic block – capacity requirements
• Number of streams per appliance
• Robustness/security: MD5 vs. SHA-1
  http://en.wikipedia.org/wiki/MD5#Collision_vulnerabilities
  http://en.wikipedia.org/wiki/SHA-1#Comparison_of_SHA_functions
• Time for SHA-1:
  – HW support
  – Multicore CPUs
• Smart not hard – rainbow tables in deduplication algorithms
  http://pl.wikipedia.org/wiki/T%C4%99czowe_tablice
  http://kestas.kuliukas.com/RainbowTables/
  – Reducing CPU and disk cycles
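As a rough illustration of the fingerprinting trade-off listed above, the sketch below compares MD5 and SHA-1 digest sizes with Python's standard `hashlib` and uses a fingerprint index to skip segments that were already seen. The helper `is_duplicate` and the in-memory `set` index are assumptions for illustration, not a description of any appliance's index.

```python
import hashlib

segment = b"example backup segment"

# MD5 produces a 128-bit digest, SHA-1 a 160-bit digest.
for name in ("md5", "sha1"):
    digest = hashlib.new(name, segment).digest()
    print(name, len(digest) * 8, "bits")

def is_duplicate(index, segment):
    """Return True if the segment's SHA-1 fingerprint is already indexed; record it otherwise."""
    fingerprint = hashlib.sha1(segment).digest()
    if fingerprint in index:
        return True          # index hit: no need to store the segment again
    index.add(fingerprint)
    return False

index = set()
first = is_duplicate(index, segment)    # first sighting, segment must be stored
second = is_duplicate(index, segment)   # same content again, lookup only
print(first, second)
```

A lookup that resolves in the index avoids a disk write, which is one way to read the slide's "reducing CPU and disk cycles" point.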

Architecture Advantage: Variable vs. Fixed
Variable segment deduplication significantly reduces: power, cooling, management, complexity
[Figure: footprint of 100 TB of data under different reduction methods. Method labels: File Level, Fixed Block, Variable Block, Whitespace Reduction. Footprints shown: 100 TB lives on 100 TB, 50 TB, 33 TB, 25 TB (4:1 is 25 TB), and 5 TB.]
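Why variable segments can outperform fixed blocks is easiest to see when data shifts by a single byte. The sketch below is a toy comparison, assuming 256-byte fixed blocks and a simple content-defined chunker that cuts wherever the sum of the last 16 bytes has its low 8 bits at zero. Both choices are illustrative assumptions and not the algorithm of any product named in this deck.

```python
import hashlib
import random

def fixed_chunks(data, size=256):
    """Cut at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, mask=0xFF):
    """Content-defined chunking: cut after any byte where the rolling window sum matches the mask."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]      # keep the sum over the last `window` bytes
        if i + 1 >= window and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def shared_fraction(old_chunks, new_chunks):
    """Fraction of new chunks whose fingerprint already exists among the old chunks."""
    known = {hashlib.sha1(c).digest() for c in old_chunks}
    hits = sum(1 for c in new_chunks if hashlib.sha1(c).digest() in known)
    return hits / len(new_chunks)

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = b"X" + original                     # one byte inserted at the front

fixed_hit = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
variable_hit = shared_fraction(variable_chunks(original), variable_chunks(edited))
print(f"fixed blocks reused: {fixed_hit:.0%}, variable segments reused: {variable_hit:.0%}")
```

With fixed blocks, the inserted byte shifts every block boundary, so nothing matches the earlier backup. With content-defined boundaries, the cut points move with the data and almost every segment after the insertion is recognized as a duplicate.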

Factors Impacting Deduplication Ratios: small variations can have big impact
• Type of data: more user-created, unstructured content* = higher deduplication ratio
• Data change rate: less change = higher deduplication ratio
• Retention policy: longer retention policy = higher deduplication ratio
• Full to incremental backup ratio: more full backups = higher deduplication ratio
*Encrypted and compressed data are not ideal deduplication candidates
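The direction of these effects can be checked with a small model. The sketch assumes weekly full backups only, with a hypothetical fraction of the data changing between fulls; `dedup_ratio` is an illustrative name, and the numbers are not vendor figures.

```python
def dedup_ratio(weekly_fulls, weekly_change):
    """Logical data divided by physical data, both in units of primary capacity."""
    logical = weekly_fulls                               # every full is a complete copy
    physical = 1 + (weekly_fulls - 1) * weekly_change    # first full plus changed data only
    return logical / physical

baseline = dedup_ratio(13, 0.05)           # about 3 months of fulls, 5% weekly change
longer_retention = dedup_ratio(52, 0.05)   # a year of fulls, same change rate
more_change = dedup_ratio(13, 0.20)        # same retention, 20% weekly change

print(f"baseline {baseline:.1f}:1")
print(f"longer retention {longer_retention:.1f}:1")
print(f"higher change rate {more_change:.1f}:1")
```

In this model, longer retention raises the ratio and a higher change rate lowers it, matching the relationships listed on the slide.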

Real World Results: Avamar daily full backups vs. traditional daily full backups
Data Type | Amount of Primary Data Backed Up | Amount of Data Moved Daily
Windows file systems | 3,573 GB | 6.1 GB
Mix of Windows, Linux, and UNIX file systems | 5,097 GB | 11.7 GB
Engineering files on NAS (NDMP backups) | 3,265 GB | 24.2 GB
Mix of 20% databases, 80% file systems (Windows and UNIX) | 9,583 GB | 80.0 GB
Mix of Linux file systems and databases | 7,831 GB | 104.2 GB
Source: EMC
While results will vary by data type and mix, Avamar can dramatically improve backup performance and efficiency
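To put the table in proportion, the following sketch computes the share of primary data moved each day for every row. The figures are copied from the table; only the variable names are added.

```python
# (data type, primary data backed up in GB, data moved daily in GB), from the table above
results = [
    ("Windows file systems", 3573, 6.1),
    ("Mix of Windows, Linux, and UNIX file systems", 5097, 11.7),
    ("Engineering files on NAS (NDMP backups)", 3265, 24.2),
    ("Mix of 20% databases, 80% file systems", 9583, 80.0),
    ("Mix of Linux file systems and databases", 7831, 104.2),
]

fractions = []
for name, primary_gb, moved_gb in results:
    fraction = moved_gb / primary_gb
    fractions.append(fraction)
    print(f"{name}: {fraction:.2%} of primary data moved daily")
```

Across these five environments the daily transfer ranges from roughly 0.2% to 1.3% of the protected data, with the database-heavy mixes at the upper end.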

VMware Guest Backup: Smart not Hard – Avamar backup versus traditional backup
[Charts: CPU usage, network usage, and disk usage from 1:20 p.m. to 1:40 p.m., traditional backup vs. Avamar.]

VMware vStorage API: Smart not Hard – Avamar
Key Features:
• Integrated with vStorage API
• Single-step file & image-level backups & restores
• Option to leverage changed block feature – greatly reduces backup processing
• Restore to the original or a new virtual machine, or configure a new virtual machine
• Round-robin VM backup capability across multiple proxies
vStorage API virtual proxy server with Avamar agent
Avamar client software runs on the proxy server
[Diagram: VMware Image Backup. Labels: resource pool, virtual machines, VMware virtualization layer, x86 architecture, physical server, SAN storage, mount, Avamar server, Avamar software agent.]
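The round-robin distribution of VM backups across proxies mentioned in the feature list can be pictured with a short sketch. The VM and proxy names are made up, and `assign_round_robin` is a hypothetical helper; it only shows the scheduling idea, not Avamar's implementation.

```python
from itertools import cycle

def assign_round_robin(vms, proxies):
    """Map each VM to a proxy, cycling through the proxies in order."""
    assignment = {proxy: [] for proxy in proxies}
    for vm, proxy in zip(vms, cycle(proxies)):
        assignment[proxy].append(vm)
    return assignment

vms = [f"vm{n:02d}" for n in range(1, 8)]       # vm01 .. vm07 (hypothetical names)
proxies = ["proxy-a", "proxy-b", "proxy-c"]     # hypothetical proxy servers
plan = assign_round_robin(vms, proxies)
for proxy, assigned in plan.items():
    print(proxy, assigned)
```

Spreading image backups this way keeps any single proxy from becoming the bottleneck when many virtual machines are protected in the same window.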

NETWORKER AND DATA DOMAIN
Data Domain Boost Integration
• Deduplication distributed to backup servers and Microsoft application clients
  – Increases backup speed
  – Reduces network traffic
• Clone-controlled replication
  – Schedules replication
  – Catalog awareness of replicated copies
• Ease of use
  – Automated configuration
  – Monitoring and reporting
[Diagram: NetWorker with DD Boost connected to Data Domain with DD Boost.]

Capacity management: a single SIS makes a difference
• Restrictions with multiple SIS
  – No storage node/media server load balancing between SIS
  – Management overhead (multiple appliance instances and configurations, e.g. replication)
  – Efficiency (more reserved storage required)
• Performance considerations
  – Wise use of SAN/LAN infrastructure with client-side deduplication and deduplicated replication
  – Leverage 10GbE with OST/DD Boost
• SLA improvement
  – Backup windows and recovery time objective
  – Reduce backup job load on production systems

Industry's Most Scalable Inline Deduplication Systems
Model | Speed (DD Boost) | Speed (other) | Logical capacity | Raw capacity | Usable capacity
DD140 | 490 GB/hr | 450 GB/hr | 9–43 TB | 1.5 TB | 0.86 TB
DD610 | 1.3 TB/hr | 675 GB/hr | 40–195 TB | Up to 6 TB | Up to 3.98 TB
DD630 | 2.1 TB/hr | 1.1 TB/hr | 84–420 TB | Up to 12 TB | Up to 8.4 TB
DD670 | 5.4 TB/hr | 3.6 TB/hr | 0.6–2.7 PB | Up to 76 TB | Up to 55.9 TB
DD860 | 9.8 TB/hr | 5.1 TB/hr | 1.4–7.1 PB | Up to 192 TB | Up to 142 TB
DD890 | 14.7 TB/hr | 8.1 TB/hr | 2.9–14.2 PB | Up to 384 TB | Up to 285 TB
Global Deduplication Array | 26.3 TB/hr | 10.7 TB/hr | 5.7–28.5 PB | Up to 768 TB | Up to 570 TB
DD Archiver | 9.8 TB/hr | 4.3 TB/hr | 5.7–28.5 PB | Up to 768 TB | Up to 570 TB
Product families: DD140 Remote Office Appliance; DD600 Appliance Series; DD800 Appliance Series; Global Deduplication Array; DD Archiver
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption
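The relationship between the logical and usable capacity rows can be made explicit with a little arithmetic. The sketch below divides each model's logical capacity range by its usable capacity, using the slide's figures and assuming 1 PB = 1,000 TB; the dictionary layout is just an illustration.

```python
# model: (logical low in TB, logical high in TB, usable in TB), figures from the slide
systems = {
    "DD140": (9, 43, 0.86),
    "DD610": (40, 195, 3.98),
    "DD630": (84, 420, 8.4),
    "DD670": (600, 2700, 55.9),
    "DD860": (1400, 7100, 142),
    "DD890": (2900, 14200, 285),
    "Global Deduplication Array": (5700, 28500, 570),
    "DD Archiver": (5700, 28500, 570),
}

# Implied data reduction ratio at the low and high end of each logical range.
implied = {
    model: (low / usable, high / usable)
    for model, (low, high, usable) in systems.items()
}
for model, (low_ratio, high_ratio) in implied.items():
    print(f"{model}: {low_ratio:.1f}:1 to {high_ratio:.1f}:1")
```

Every model works out to roughly 10:1 at the low end and close to 50:1 at the high end, which suggests the logical capacity range simply reflects an assumed span of deduplication ratios applied to the usable capacity.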

Replicate Smarter with Existing Networks: 99% bandwidth efficiency
• Move data offsite over existing networks for fastest time-to-DR readiness
• Map method to application recovery requirements and DR policies
[Diagram: deduplicated backup of Home and DB data, replicated over the WAN from on premise to off premise with flexible replication.]
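What 99% bandwidth efficiency means for a replication window can be estimated with simple arithmetic. The sketch assumes a hypothetical 1,000 GB backup and a 100 Mbit/s WAN link and ignores protocol overhead; `transfer_hours` is an illustrative helper.

```python
def transfer_hours(data_gb, link_mbps, reduction=0.0):
    """Hours needed to send data_gb over a link, after removing a fraction via deduplication."""
    bits_to_send = data_gb * 8e9 * (1 - reduction)
    seconds = bits_to_send / (link_mbps * 1e6)
    return seconds / 3600

full = transfer_hours(1000, 100)                       # no data reduction
deduplicated = transfer_hours(1000, 100, reduction=0.99)

print(f"without deduplication: {full:.1f} hours")
print(f"with 99% reduction: {deduplicated * 60:.1f} minutes")
```

Under these assumptions a transfer that would occupy the link for most of a day shrinks to a few minutes, which is why replication over an existing WAN becomes practical.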

Recover Reliably from Disk: highest levels of data integrity
• Backups are the data store of last resort
• Improve your recovery SLAs with the advantages of disk-based data protection
Verification: all data is read and verified after it is written
Self-healing: continuous on-the-fly error detection and correction
[Diagram: Home, DB, and VM data sets, before and after verification.]
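The read-after-write verification idea can be sketched generically with checksums. This is a minimal illustration using SHA-256 and a temporary file; the helper names are hypothetical and the sketch does not represent the Data Domain implementation. A real system would also need to make sure the read comes from the disk rather than a cache.

```python
import hashlib
import os
import tempfile

def checksum(path):
    """SHA-256 of the file contents as currently stored."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def write_verified(path, data):
    """Write data, read it back, and fail loudly if the stored bytes differ."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    if checksum(path) != expected:
        raise IOError(f"verification failed for {path}")
    return expected

def scrub(path, expected):
    """Later integrity check: does the stored data still match its checksum?"""
    return checksum(path) == expected

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.bin")
    expected = write_verified(path, b"backup segment payload")
    ok_before = scrub(path, expected)
    with open(path, "r+b") as f:        # simulate silent corruption of one byte
        f.write(b"X")
    ok_after = scrub(path, expected)

print(ok_before, ok_after)
```

Detecting the mismatch is the first half of self-healing; the second half, repairing from redundant data, depends on the storage layout and is not shown here.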

NetWorker with Data Domain
• Use with existing disk-based or virtual tape library capabilities
• Use with DD Boost
  – Improved performance
  – Clone-controlled replication
  – Automated configuration
  – Monitoring and reporting
[Diagram: file systems and applications backed up by NetWorker to Data Domain in the primary data center, with replication over the WAN to Data Domain at a remote site.]

Backup/archiving as a service:
• Application/database native tools via:
  – NFS
  – CIFS
  – VTL
• RMAN "AS COPY" clause: use case for cloning production to Dev & QA
• Archiving and backing up
  – SourceOne
  – etc.
Backups and archives share the same data.

Oracle RMAN to Disk: national supermarket chain testimonial
“EMC Data Domain is just disk to me. Changing RMAN scripts to go straight to Data Domain disk was simple.”
DBA Manager
“We used to have to go through our backup team for recovery requests and 90% of our actual restore time was spent waiting on tape and administration. With Data Domain, I don't have to wait for someone else to satisfy a restore request or a tape recall.”
DBA Manager

Weekly Full Backup – With Deduplication
RMAN> RUN {
  ALLOCATE CHANNEL CH1 DEVICE TYPE DISK FORMAT '/dd/backup/ora.weekly/%U';
  ALLOCATE CHANNEL CH2 DEVICE TYPE DISK FORMAT '/dd/backup/ora.weekly/%U';
  # Image copy of the whole database, including the current control file
  BACKUP AS COPY TAG 'MAY9' DATABASE INCLUDE CURRENT CONTROLFILE;
  # Back up archived logs not yet backed up, then delete the input logs
  BACKUP TAG 'MAY9' ARCHIVELOG ALL NOT BACKED UP DELETE ALL INPUT;
}
Weekly: full image backups
After: with deduplication
Deduplication applied to fulls requiring much less disk
[Diagram: 1 TB target DB; weekly full image backups stored with deduplication; labels: Full, 500 GB, 500 GB.]

Data Domain Archiver: cost-optimized long-term retention
• Data Domain system for backup and archive
– Active tier: short-term data protection; less than 90 days
– Archive tier: scalable long-term retention; multiple years
• High-throughput deduplication storage
– Up to 9.8 TB/hr
• Cost optimized for long-term retention
– Up to 570 TB usable, 28.5 PB logical capacity
– Low cost per gigabyte while maintaining high throughput
– Fault isolation of archive units for long-term recoverability
• Easily integrates with all leading backup and archive
applications
• Leverage existing Data Domain system advantages
– Supports DD Replicator and DD Retention Lock software options
– Data Domain Data Invulnerability Architecture to ensure data integrity
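The two-tier split described above can be expressed as a simple age rule. The sketch assumes the 90-day boundary from the slide and a hypothetical helper `tier_for`; how DD Archiver actually schedules data movement is not described in the deck and is not modeled here.

```python
from datetime import date, timedelta

ACTIVE_TIER_DAYS = 90   # from the slide: the active tier holds less than 90 days of data

def tier_for(backup_date, today):
    """Pick a tier from the backup's age (illustrative policy only)."""
    age_days = (today - backup_date).days
    return "active" if age_days < ACTIVE_TIER_DAYS else "archive"

today = date(2010, 12, 1)
recent = tier_for(today - timedelta(days=10), today)
boundary = tier_for(today - timedelta(days=90), today)
old = tier_for(today - timedelta(days=400), today)
print(recent, boundary, old)
```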

Data Deduplication a Must-have Feature
“Deduplication has become a must-have feature for vendors in the backup/recovery market. The value of data reduction technologies, such as deduplication, cannot be understated.
In May 2007, Gartner called deduplication a transformational technology with the potential for significant cost savings and expanded QoS capabilities (see "Data Deduplication Is Poised to Transform Backup and Recovery"). We reiterate this assessment, and we frequently advise clients to investigate deduplication technologies for use in addressing current and anticipated storage challenges.”
New Storage Solutions Can Modernize Data Life Cycle Management
Sheila Childs and Dave Russell, Gartner
February 24, 2010

The Move to Deduplication Is On!
Storage Networking Wave 13 Study
“Heat Index” Rank: 1
“Deduplication is now in use by 40% of F1000, with use having accelerated rapidly over the last year.”
Wave | In use now | In pilot/evaluation | In near-term plan | In long-term plan | Not in plan
Wave 13 | 40% | 4% | 14% | 22% | 20%
Wave 12 | 27% | 8% | 15% | 25% | 26%
***Wave 11 | 24% | 12% | 16% | 28% | 21%
**Wave 10 | 15% | 15% | 14% | 31% | 25%
**Wave 9 | 9% | 9% | 12% | 25% | 45%
*Wave 8 | 22% | 7% | 15% | 16% | 41%
Source: TheInfoPro Wave 13 Storage Study (Q4 2009), January 2010. F1000 Sample: Wave 8, n=148; Wave 9, n=150; Wave 10, n=151; Wave 11, n=127; Wave 12, n=147; Wave 13, n=183
*Technology was previously categorized as deduplication
**Technology was previously categorized as deduplication/capacity optimized storage/single backup instance store
***Technology was previously categorized as single backup instance store software

Storage Networking Technology In Use Expansion Index
(Gauges Changes in Spending on Already-adopted Technology)
Lead in Use Vendors – F1000
Methodology
The TIP In Use Expansion Index is designed to illustrate levels of spending change for technologies with a minimum of 10% in use. It takes into account the size of an organization's total storage budget and provides a weight for current spending patterns. The weights range from -1.0 for the > 50% Less response to 1.0 for > 50% More. Technologies with 0% (No Change) receive no weight. The final score is normalized on a scale from 0 to 100, with the top score going to those technologies that have the greatest current spending within the TIP research network of users. A “!” vendor has at least twice the number of responses as the closest competitor.
Q4 '09 Rank | Q2 '10 Rank | Technology | Wave 13 Lead in Use Vendor | Wave 14 Lead in Use Vendor | Wave 13 2nd in Use Vendor | Wave 14 2nd in Use Vendor
2 | 1 | Solid-state Disk Drives (SSD) | EMC! | EMC! | HDS/IBM | Oracle
1 | 2 | 8Gbps Fibre Channel | Brocade | Brocade | QLogic | Emulex
N/A | 3 | Multiprotocol Storage Systems (FC/NAS/IP/FCoE) | N/A | NetApp | N/A | EMC
9 | 4 | Backup Data Reduction/Deduplication | EMC! | EMC! | NetApp | NetApp
3 | 5 | Virtual Server Image Storage | VMware | VMware | EMC | EMC
9 | 6 | Serial-attached SCSI Drives (SAS) | HP | HP | EMC | EMC
N/A | 7 | TCP/IP Offload Engine (TOE) | N/A | Intel | N/A | HP
6 | 8 | 10Gbps Ethernet for Storage | Cisco! | Cisco | NetApp | EMC
13 | 9 | File Replication (Sync) | NetApp | NetApp | EMC | EMC
7 | 10 | NPIV – Virtualized I/O | Brocade/IBM | Cisco | HP | IBM
18 | 11 | Block Replication (Sync) | EMC! | EMC! | IBM | IBM
11 | 12 | Remote Block Mirroring and/or Wide-area Replication (Async) | EMC! | EMC! | NetApp | NetApp
12 | 13 | Fixed Content and/or Content-addressed Storage (CAS) Arrays | EMC! | EMC! | IBM | IBM/HP
15 | 14 | Virtual Tape Libraries (VTL) for Open Systems | EMC! | EMC! | IBM | EMC!
13 | 15 | Remote File Mirroring and/or Wide-area Replication (Async) | NetApp | NetApp/EMC! | EMC | IBM
N/A | 16 | IP SAN/iSCSI Storage Arrays | N/A | EMC | N/A | NetApp
4 | 17 | Online Data Reduction/Deduplication | NetApp | NetApp! | EMC | EMC
4 | N/A | Fabric-based Intelligence | Cisco | N/A | EMC | N/A
8 | N/A | NAS Gateways | EMC | N/A | NetApp | N/A
16 | N/A | Wide-area File Services (WAFS) | Cisco | N/A | Riverbed | N/A
17 | N/A | IP SAN Storage Arrays | EMC | N/A | NetApp | N/A
19 | N/A | 4Gbps Fibre Channel | QLogic | N/A | Brocade | N/A

Top 8 EMC BRS Deduplication Use Cases
Use Case | EMC BRS Solution | Challenge | Areas Impacted
1 | Avamar | Resource Contention; VM Sprawl | Reduce backup windows by 10X. Image level backup/restore. 98% less data moved across the network. Free client agents.
2 | Avamar | Performance; File Recovery | Avamar NDMP accelerator node deduplicates native NDMP backup stream. No client agents needed.
3 | Data Domain | Dump & Chase; Frequent Log Backups | Native database backup tools create database and transaction log dumps direct to Data Domain deduplication file system. Efficient replication. No client agents required. One step backup and recovery by DBAs.
4 | Avamar | Bandwidth Limitations | 98% less data moved across the network. Perfect for low speed links. Free client agents.
5 | Avamar | Field Teams Data Loss | 98% less data moved across the network. Perfect for low speed links. IT or user directed backups/restores designed for end user systems.
6 | Data Domain | Tape Vaulting; Backward Compatibility | Auto tier active backups to fault isolated archive tier up to 550TB of deduplicated data. Eliminate tape for long term retention.
7 | Data Domain | High Tape Utilization; Disaster Recovery | Native iSeries BRMS backup facility writes to emulated IBM TS3500 tape library over Fibre Channel. Efficient replication. Fast backup and recovery.
8 | Data Domain | Batch Operations & Backups | Native FICON connection to zSeries mainframe with BusTech writing to Data Domain deduplication file system.
Deduplication Benefits Summary
• Shrink storage requirements
• Increase retention periods
• Shorten backup/recovery windows
• Improve bandwidth efficiency
• Simplify data management
• Lower costs
– Less storage, power, bandwidth
– Reduce/eliminate use of tape

Deduplication Significantly Improves Business Efficiencies
• Control data protection costs
  – Storage/data center efficiency
  – Reduced effort required on backup
• Simplify data management
  – Improved data recovery SLAs
  – Automated data replication for ensured disaster recovery readiness
• Improve risk management
  – Pass disaster recovery audits
  – Reduce data loss
  – Future-proofing
Before Data Domain…
18 Cabinets of IBM Tape
After Data Domain…
1 DD690 and 2 Expansion Shelves