NetApp Deduplication Concepts


Page 1: Netapp Deduplication concepts

NetApp Deduplication

Deduplication refers to the elimination of redundant data in storage. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored; however, an index of all the data is retained in case that data is ever required. Deduplication reduces the required storage capacity because only unique data is stored.

Page 4: Netapp Deduplication concepts

NetApp deduplication provides block-level deduplication within the entire flexible volume. Essentially, deduplication removes duplicate blocks, storing only unique blocks in the flexible volume, and it creates a small amount of additional metadata in the process.

Notable features of deduplication include:

1. It works with a high degree of granularity: that is, at the 4KB block level.
2. It operates on the active file system of the flexible volume.
3. It is a background process that can be configured to run automatically, be scheduled, or be run manually through the command-line interface (CLI) or NetApp System Manager.
4. It is enabled and managed by using a simple CLI or a GUI such as System Manager.

Page 5: Netapp Deduplication concepts

HOW DEDUPLICATION WORKS

The core enabling technology of deduplication is fingerprints. These are unique digital signatures for every 4KB data block in the flexible volume.

When deduplication runs for the first time on a flexible volume with existing data, it scans the blocks in the flexible volume and creates a fingerprint database, which contains a sorted list of all fingerprints for the used blocks in the flexible volume. After the fingerprint file is created, the fingerprints are checked for duplicates; when a match is found, a byte-by-byte comparison of the blocks is done first to make sure that the blocks are indeed identical. If they are identical, the block's pointer is updated to the existing data block, the duplicate data block is released, and the inode is updated.
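To make the flow concrete, here is a minimal sketch of this pass in Python: a fingerprint is computed for every 4KB block, the catalogue is checked for matches, and a byte-by-byte comparison guards against false matches before a pointer is redirected. The names (fingerprint, dedupe_pass) and the use of SHA-256 are illustrative assumptions, not the actual WAFL checksum or Data ONTAP internals.

```python
import hashlib

BLOCK_SIZE = 4096  # deduplication operates at the 4KB block level

def fingerprint(block: bytes) -> bytes:
    # A fingerprint: a small digital signature of the block's contents.
    # (WAFL uses its own block checksum; SHA-256 is an illustrative stand-in.)
    return hashlib.sha256(block).digest()

def dedupe_pass(blocks: list[bytes]) -> dict[int, int]:
    """Return a mapping of duplicate block index -> surviving block index."""
    catalogue: dict[bytes, int] = {}   # fingerprint -> first block seen
    redirects: dict[int, int] = {}
    for i, block in enumerate(blocks):
        fp = fingerprint(block)
        if fp in catalogue and blocks[catalogue[fp]] == block:
            # Fingerprints matched and the byte-by-byte comparison confirmed
            # the blocks are identical: record the pointer redirect so the
            # duplicate block can be released.
            redirects[i] = catalogue[fp]
        else:
            catalogue[fp] = i
    return redirects
```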

Page 7: Netapp Deduplication concepts

HOW DEDUPLICATION WORKS

When you 'sis' a volume (enable SIS on it), the behavior of that volume changes, and the changes take place in two phases:

PHASE 1: SIS enabled (pre-process): before the block is written to the array

Collecting fingerprints.

Note: This applies to new blocks. For the existing data blocks that were written before SIS was enabled, a scan must be run on the existing data to pull those fingerprints into the catalogue.

Page 9: Netapp Deduplication concepts

PHASE 2: SIS start (post-process): after the block is written to the array

Sorting, comparing, and deduping.

Phase 1: The moment SIS is enabled, every time SIS notices a block write request coming in, the SIS process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store the fingerprint in its catalogue file. Note: this request interrupts the write stream and results in a 7% performance penalty for all writes into any volume with SIS enabled.
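A rough sketch of this Phase 1 behavior, assuming the fingerprint is simply a copy of the checksum that WAFL already computes for each block; make_write_hook and the catalogue dict are illustrative names, not Data ONTAP internals:

```python
from typing import Callable

def make_write_hook(catalogue: dict[int, bytes],
                    checksum: Callable[[bytes], bytes]) -> Callable[[int, bytes], None]:
    """Wrap the write path so every block write also records a fingerprint."""
    def write_block(block_no: int, data: bytes) -> None:
        # SIS borrows a copy of the checksum that WAFL generates for the block
        # anyway and stores it in its catalogue file; this extra step in the
        # write path is the source of the write penalty noted above.
        catalogue[block_no] = checksum(data)
        # ... the actual write of `data` to the array would happen here ...
    return write_block
```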

Page 10: Netapp Deduplication concepts

Phase 2: At some point you dedupe the volume using the 'sis start' command, run either manually or automatically. SIS then goes through the process of comparing fingerprints from the fingerprint database catalogue file, validating the data, and deduping the blocks that pass the validation phase.

Page 12: Netapp Deduplication concepts

Important Note

Nothing about the basic data structure of the WAFL file system has changed; we are simply traversing a different path in the file structure to get to the desired data block. That is why NetApp dedupe usually has no perceivable impact on read performance: all we have done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or, most likely, not change at all. It all depends on the pattern of the file system data structure and the pattern of requests coming from the application.

Page 13: Netapp Deduplication concepts

What is a Fingerprint?

A fingerprint is a small digital representation of a larger data object. Basically, it is a checksum generated by WAFL for each block for the purpose of consistency checking.

Is the fingerprint generated by SIS? No. Each time a WAFL block is created, a checksum is generated for the purpose of consistency checking. NetApp deduplication (SIS) simply borrows a copy of this checksum and stores it in a catalogue as the fingerprint.

Page 14: Netapp Deduplication concepts

What happens during post-process deduplication? The fingerprint catalogue is sorted and searched for identical fingerprints. When a fingerprint match is made, the associated data blocks are retrieved and scanned byte by byte. Assuming successful validation, the inode pointer metadata of the duplicate block is redirected to the original block. The duplicate block is marked as free and returned to the system, eligible for reuse.

Page 15: Netapp Deduplication concepts

Volume (or Data Constituent) and Aggregate Deduplication Overhead

For each volume with deduplication enabled, up to 4% of the physical amount of data written to that volume is required to store the volume deduplication metadata,

and

for each aggregate that contains any volumes with deduplication enabled, up to 3% of the physical amount of data contained in all of those deduplication-enabled volumes within the aggregate is required to store the aggregate deduplication metadata.
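As a worked example of these ceilings (a sketch only; the 4% and 3% figures above are documented maximums, and actual metadata use may be lower):

```python
VOLUME_OVERHEAD = 0.04     # up to 4% per deduplicated volume
AGGREGATE_OVERHEAD = 0.03  # up to 3% across deduplicated volumes in the aggregate

def dedupe_metadata_gb(volume_sizes_gb: list[float]) -> tuple[float, float]:
    """Return (total volume metadata, aggregate metadata) upper bounds in GB."""
    volume_md = sum(s * VOLUME_OVERHEAD for s in volume_sizes_gb)
    aggregate_md = sum(volume_sizes_gb) * AGGREGATE_OVERHEAD
    return volume_md, aggregate_md

# Two 500GB deduplicated volumes: up to 40GB of volume metadata
# plus up to 30GB of aggregate metadata.
print(dedupe_metadata_gb([500, 500]))  # (40.0, 30.0)
```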

Page 16: Netapp Deduplication concepts

Thin and Thick Provisioning

Page 17: Netapp Deduplication concepts

Thin Provisioning

Definition: A thin-provisioned volume is a volume for which storage is not set aside up front; instead, storage for the volume is allocated as it is needed. The storage architecture uses aggregates to virtualize the physical storage into pools for logical allocation. The volumes and LUNs see the logical space, and the aggregate controls the physical space. This architecture provides the flexibility to create multiple volumes and LUNs that can exceed the physical space available in the aggregate. All volumes and LUNs in the aggregate use the available storage within the aggregate as a shared storage pool, which allows them to allocate space efficiently as data is written to it rather than preallocating (reserving) the space. This is called thin provisioning.

Page 19: Netapp Deduplication concepts

Thick Provisioning

Definition: In virtual storage, thick provisioning is a type of storage allocation in which the amount of storage capacity on a volume is preallocated on physical storage (the aggregate) at the time the volume is created.
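The difference can be made concrete with a small sketch: a thick volume reserves its full size from the aggregate at creation time, while a thin volume consumes physical space only as data is written, so the logical sizes of thin volumes may oversubscribe the aggregate. Aggregate and Volume here are illustrative classes, not ONTAP objects.

```python
class Aggregate:
    def __init__(self, physical_gb: float):
        self.physical_gb = physical_gb
        self.reserved_gb = 0.0  # space guaranteed to thick volumes up front
        self.used_gb = 0.0      # space actually consumed by writes

class Volume:
    def __init__(self, aggr: Aggregate, size_gb: float, thick: bool):
        if thick:
            # Thick provisioning: reserve the full size at creation time.
            if aggr.reserved_gb + size_gb > aggr.physical_gb:
                raise RuntimeError("aggregate cannot guarantee this volume")
            aggr.reserved_gb += size_gb
        # Thin provisioning reserves nothing: the volume's logical size may
        # exceed the physical space available in the aggregate.
        self.aggr, self.size_gb, self.thick = aggr, size_gb, thick

    def write(self, gb: float) -> None:
        # Physical space is consumed from the shared pool only as data
        # is actually written.
        if self.aggr.used_gb + gb > self.aggr.physical_gb:
            raise RuntimeError("aggregate out of physical space")
        self.aggr.used_gb += gb

aggr = Aggregate(physical_gb=1000)
thin_a = Volume(aggr, 800, thick=False)  # 800 + 800 GB logical on a
thin_b = Volume(aggr, 800, thick=False)  # 1000 GB aggregate: allowed
thin_a.write(300)                        # physical use grows only on writes
```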

Page 20: Netapp Deduplication concepts

Multi-Tenancy: What Is It?

Page 21: Netapp Deduplication concepts

Secure Multi-Tenancy – Definition

Supporting multiple “tenants” (users, customers, etc.) from a single shared infrastructure while keeping all data isolated and secure.

Customers concerned with security and privacy require secure multi-tenancy:

– Government agencies
– Financial companies
– Service providers
– Etc.

Page 22: Netapp Deduplication concepts

Multi-Tenancy and Cloud Infrastructure

Page 23: Netapp Deduplication concepts

Secure Multi-tenancy for virtualized environments

Page 24: Netapp Deduplication concepts

Secure Multi-tenancy for virtualized environments

Solution: The only validated solution to support end-to-end multi-tenancy across applications and data. Data is securely isolated from the virtual server and network through to the virtual storage.

Page 25: Netapp Deduplication concepts

Introducing MultiStore

Page 26: Netapp Deduplication concepts

MultiStore and vFiler

MultiStore is a logical partitioning of the network and storage resources in Data ONTAP, and it provides a secure storage consolidation solution.

When enabled, the MultiStore license creates a logical unit called vFiler0, which contains all of the storage and network resources of the physical FAS unit. Additional vFiler units can then be created with storage and network resources assigned specifically to them.

Page 27: Netapp Deduplication concepts

What is a vFiler?

A vFiler is a lightweight instance of the Data ONTAP multiprotocol server; all of the system's resources are shared between vFiler units.

The storage units in vFilers are FlexVol volumes and qtrees.

The network units are IP addresses, VLANs, VIFs, aliases, and IPspaces.

vFiler units are not hypervisors: a vFiler's resources cannot be accessed or discovered by any other vFiler unit.

Page 28: Netapp Deduplication concepts

MultiStore configuration:

– Up to 65 secure partitions (vFiler units) on a single storage system (64 + vFiler0)
– IP storage based (NFS, CIFS, and iSCSI servers)
– Additional storage and network resources can be moved, added, or deleted
– NFS, CIFS, iSCSI, HTTP, NDMP, FTP, SSH, and SFTP protocols are supported
– Protocols can be enabled or disabled per vFiler
– Destroying a vFiler does not destroy data

Page 29: Netapp Deduplication concepts

MultiStore: One Physical System, Multiple Virtual Storage Partitions

Page 30: Netapp Deduplication concepts

What Makes MultiStore Secure?

MultiStore provides multiple layers of security:

– IPspaces
– Administrative separation
– Protocol separation
– Storage separation

An IPspace has a dedicated routing table.

Each physical interface (Ethernet port) or logical interface (VLAN) is bound to a single IPspace.

Page 31: Netapp Deduplication concepts

What Makes MultiStore Secure?

A single IPspace may have multiple physical and logical interfaces bound to it.

Each customer has a unique IPspace.

Use of VLANs or VIFs is a best practice with IPspaces.
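A minimal sketch of these binding rules, assuming simple illustrative classes (IPspace and StorageSystem are not Data ONTAP internals): an interface can be bound to exactly one IPspace, one IPspace can hold many interfaces, and each IPspace keeps its own routing table.

```python
class IPspace:
    def __init__(self, name: str):
        self.name = name
        self.routing_table: dict[str, str] = {}  # destination -> next hop
        self.interfaces: set[str] = set()

class StorageSystem:
    def __init__(self):
        self.binding: dict[str, IPspace] = {}  # interface -> its one IPspace

    def bind(self, interface: str, ipspace: IPspace) -> None:
        # A physical port or VLAN interface belongs to exactly one IPspace;
        # a single IPspace may have many interfaces bound to it.
        if interface in self.binding:
            raise ValueError(f"{interface} is already bound to IPspace "
                             f"{self.binding[interface].name}")
        self.binding[interface] = ipspace
        ipspace.interfaces.add(interface)

system = StorageSystem()
tenant_a, tenant_b = IPspace("tenant-a"), IPspace("tenant-b")
system.bind("e0a-10", tenant_a)  # VLAN 10 on port e0a, tenant A's IPspace
system.bind("e0a-20", tenant_b)  # VLAN 20 on the same port, tenant B
```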

Page 33: Netapp Deduplication concepts

File Services Consolidation

Page 34: Netapp Deduplication concepts

Application Hosting

Page 35: Netapp Deduplication concepts

Always-On Data Mobility

Page 36: Netapp Deduplication concepts

Always-On Data Mobility

No planned downtime for:

– Storage capacity expansion
– Scheduled maintenance outages
– Software upgrades

Page 37: Netapp Deduplication concepts

Adding Mobility to Multi-Tenancy

Page 38: Netapp Deduplication concepts

Automated Disaster Recovery (DR Site)