Virtual Tape Libraries:

The Best of Tape and Disk Backup

By Jim Lee, Director of Marketing, Yosemite Technologies www.yosemitetech.com

Tape backup has traditionally been the mainstay of enterprise data protection when long-term data protection is required. As disk technologies have improved and economies of scale have driven down prices, disk adoption rates have surged, and disk now appears poised to eclipse tape as the dominant backup platform and relegate tape to a minor archival and disaster recovery role. Numerous surveys of the enterprise have indicated adoption of disk-based backup reached the ninetieth percentile by the end of 2005 and that disk will soon be the dominant backup medium for the enterprise. But the flexibility, plunging costs, and commoditisation of disk have quickly led to a proliferation of ad hoc and confusing disk backup implementations which fail to provide many of the core data protection benefits of tape and even introduce new problems in an organisation’s backup regime. This article briefly reviews the disk-versus-tape debate, examines the pros and cons of simple disk backup, and proposes the virtual tape library (VTL), which uses disk as the backup medium, as the best replacement for traditional tape backup. Finally, it discusses why not all VTLs are created equal and what to look for in a VTL when considering one for enterprise tape backup replacement.

Tape is Problematic as a Backup Medium

Tape-based backup has been the dominant medium and method for data protection since the mainframe computing era and has faced little challenge from competing technologies until recently. The problems with tape as a backup medium are well known. The most critical flaws exposed when relying on tape as a backup medium are:

• Medium or cartridge failure during backup or restore

• Unreported failures or incomplete backups

• Misplaced or mislabelled media

• Long backup windows and verifies

• Slow and serial access times

In addition to being driven by the need for a better backup medium, the rapid adoption of disk backup as a complement to traditional tape has also been driven by the expansion of data protection and service level requirements for the modern enterprise. First, tape is simply too slow and unreliable to serve as the primary backup medium for the ‘always-on’, 24x7 operational mode of most enterprises. Second, disk-based backup can take on many forms due to its superior flexibility, performance, and manageability, thus substantially increasing the solution space for the data protection of more complex IT infrastructures. Lastly, the rapid evolution in disk technologies and economies of scale have allowed disk backup to drop sufficiently in cost to be regarded as superior to tape for most equivalent backup tasks. The relative strengths and weaknesses of disk- and tape-based backup schemes are shown in Table 1 below.

Table 1 - Comparison of Disk versus Tape for Primary Data Protection

Feature Disk Tape

Reliability x

Performance x

Efficiency x

Flexibility x

Expense x

Portability x

Multi-level x

Scalability x

Durability x

Redundancy x

Inexpensive expansion x

Simple Disk Backup Falls Short on Promise

The first application of disk for backup was the simplest possible: replace tape as a medium and bring the strengths of disk to bear on the problems which weaken tape’s effectiveness in a data protection regime. The

‘Virtual Tape Libraries: The Best of Tape and Disk Backup’ By Yosemite Technologies, page 2 of 8

implementation is straightforward: instead of writing data serially through a tape driver and out to a tape device, write the data in tape format to disk folders and later write it to tape for longer-term storage. As a straight tape replacement, backup-to-disk (B2D) at first appears to offer several immediate advantages over traditional tape:

• increased write and read performance, which dramatically shortens backup windows and verify times, so that backups can be done more often and verified within the backup window; moreover, restores can be accomplished much more rapidly

• RAID levels offering superior data redundancy and fault tolerance, with a much higher mean time between failures (MTBF), which reduces the risk of data loss

• simplified configuration and management utilising familiar disk administrative tools

However, since most B2D solutions simply write the backup instances and metadata to disk-based folders, disk-based backups are subject to file system limitations. The reliance on the file system adds another layer to backup media management (even though in this case it is the more familiar disk management layer), which makes disk backups more complex than they need to be. Rarely can disk capacity be added as inexpensively and ‘on the fly’ as it can with tape, which simply requires more cartridges.

With IT storage budgets under continual pressure, the flexibility, plunging cost, and relative commoditisation of disk have led to a proliferation of ad hoc and confusing disk backup configurations, as well as a rush to replace tape with disk without regard to the impact on data protection policies, backup integrity, or efficiency. In simple backup-to-disk configurations, traditional backup techniques such as generational backup sets and tape rotations have proved difficult to implement, and the desire for continued access to their benefits has forced a difficult choice: either continue to use tape or lose its benefits in the data protection scheme.

Thus in many cases disk as a straight tape replacement has proved challenging, if not problematic, as integration, file size limits, format issues, and distribution of data across media increase the complexity and limitations of reliable and efficient B2D solutions. As a result, reliance on tape has been prolonged. There are some key attributes of tape backups that simple B2D does not provide, as noted in Table 1 above, and, as a consequence, despite the many advantages of B2D, tape has remained an important component in the enterprise data protection regime.

A summary of the pros and cons is given in Table 2 below. While it is evident that B2D delivers much of the promised increase in reliability, performance, and simplicity over tape, it does not sufficiently replace core tape features, and there is ample room for improvement of this simple implementation of disk-based data protection.

Table 2 - Pros and Cons of Disk-based (B2D) Backup

Pros                          Cons
Shorter backup window         Separate backup catalogue
Faster restore                Expensive media increments
Increased media reliability   No generational media sets
Data redundancy (RAID)        File system limitations on size
Flexible targets              Limited portability of backup media
Random access to data         Entire volumes containing multiple backups are vulnerable to virus infection
Simple data copying

In sum, while disk as a medium has great promise, effective disk backup solutions must be built on leveraging disk’s advantages through new backup strategies and techniques to ensure new problems and weaknesses are not introduced and that most of the benefits of the technology disk is replacing are not lost.
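For readers unfamiliar with the generational media sets and tape rotations discussed above, the following is a minimal sketch of one common rotation, grandfather-father-son. The article does not name a particular scheme, so the choice of scheme, the Friday weekly backup, the weekend gap, and the set names are all assumptions made for illustration.

```python
import datetime

DAILY_NAMES = ["Mon", "Tue", "Wed", "Thu"]


def gfs_media_set(day):
    """Pick the media set for a backup date under an assumed
    grandfather-father-son rotation (hypothetical naming)."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday >= 5:
        return None  # assumption: no backups run at weekends
    if weekday == 4:  # Friday
        next_friday = day + datetime.timedelta(days=7)
        if next_friday.month != day.month:
            # Last Friday of the month: the monthly "grandfather" set
            return f"monthly-{day.year}-{day.month:02d}"
        # Other Fridays: weekly "father" sets, reused each month
        return f"weekly-{(day.day - 1) // 7 + 1}"
    # Monday to Thursday: daily "son" sets, reused each week
    return f"daily-{DAILY_NAMES[weekday]}"


print(gfs_media_set(datetime.date(2006, 3, 1)))   # daily-Wed
print(gfs_media_set(datetime.date(2006, 3, 3)))   # weekly-1
print(gfs_media_set(datetime.date(2006, 3, 31)))  # monthly-2006-03
```

With physical tape, each returned name corresponds to a cartridge or group of cartridges; a disk-based scheme needs some equivalent notion of a named, reusable medium to support the same policy.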

‘Virtualising’ Tape

As we have noted, simply replacing tape as the backup medium is far from optimal for three main reasons: it does not leverage all of disk’s qualities, it does not minimise the risks from disk’s weaknesses, and it does not replace several core benefits of tape as a medium, such as generational backup sets and media rotations. As a result, although disk is rapidly eclipsing tape in the majority of data protection applications, more advanced uses of disk hold much more promise than simple B2D.

In addition to merely replacing tape as a medium, the increased flexibility, simpler management, superior reliability, lower latency, and higher performance of disk allow the integration of new data protection techniques such as near-line, on-line, snapshot, and continuous data protection, as well as emulation of other media, into the enterprise backup regime without compromising the level of data protection. These techniques take much greater advantage of the relative strengths of disk over tape and combine more easily with tape for those few characteristics where tape is still superior.

One such promising technology which is quickly gaining acceptance is the virtual tape library (VTL). A VTL is essentially a disk-based file storage system with an interface which emulates a tape drive, so that it appears to a backup application as a physical tape library. A VTL is typically composed of a platform (essentially an operating system and server) and an application which presents a tape device interface to the operating system’s tape drivers, so that backup applications can write to virtual tapes just as they do to real physical tapes. VTLs combine many of the characteristics and advantages of both tape and disk listed in Table 1 above. Among them are speed, manageability, reliability, scalability, quick backup, verification, and restore, and support for generational sets (save-sets) and media rotation. The typical VTL offers significantly reduced media management, easy setup, data/image mobility, and substantial performance gains. However, VTLs, like any complex technology, are vulnerable to trade-offs made in their design. Most VTLs are designed by vendors other than backup application vendors and are loosely coupled with backup applications. It is this lack of integration that becomes problematic.
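The idea of presenting tape semantics on top of disk files can be sketched in a few lines. This is a toy illustration only, not how any particular VTL product works: the class names, the `.vtape` file extension, the barcode `MON001`, and the length-prefixed record format are all hypothetical.

```python
import os
import tempfile


class VirtualTape:
    """Hypothetical virtual cartridge: a disk file with tape-like,
    sequential read and append-only write semantics."""

    def __init__(self, path):
        self.path = path
        self.position = 0  # byte offset of the next record to read
        open(self.path, "ab").close()  # create the backing file if missing

    def write_record(self, data):
        # Like tape, writes always go to the end of the medium.
        with open(self.path, "ab") as f:
            f.write(len(data).to_bytes(4, "big"))  # 4-byte length prefix
            f.write(data)

    def rewind(self):
        self.position = 0

    def read_record(self):
        """Return the next record, or None at end of tape."""
        with open(self.path, "rb") as f:
            f.seek(self.position)
            header = f.read(4)
            if len(header) < 4:
                return None
            data = f.read(int.from_bytes(header, "big"))
            self.position = f.tell()
        return data


class VirtualTapeLibrary:
    """Hypothetical library: maps barcodes to virtual cartridges on disk."""

    def __init__(self, directory):
        self.directory = directory
        self.slots = {}

    def add_cartridge(self, barcode):
        tape = VirtualTape(os.path.join(self.directory, barcode + ".vtape"))
        self.slots[barcode] = tape
        return tape

    def load(self, barcode):
        tape = self.slots[barcode]
        tape.rewind()  # a freshly loaded tape starts at the beginning
        return tape


def demo():
    with tempfile.TemporaryDirectory() as directory:
        library = VirtualTapeLibrary(directory)
        library.add_cartridge("MON001")
        tape = library.load("MON001")
        tape.write_record(b"file-a contents")
        tape.write_record(b"file-b contents")
        tape.rewind()
        # Read sequentially until the end-of-tape marker (None).
        return list(iter(tape.read_record, None))


print(demo())  # [b'file-a contents', b'file-b contents']
```

A real VTL presents this behaviour at the device-driver level rather than as a Python class, which is what lets an unmodified backup application treat it as a physical library.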

The VTL: Great Innovation, Poor Execution

Conventional VTLs are third-party appliances which are loosely coupled to a backup application, since they are accessed through tape drivers in the same fashion as physical tape devices. This loose coupling has a number of significant disadvantages: key among them are separate catalogues, high cost of ownership, and often a serious I/O bottleneck.

The first critical drawback of an external VTL is the lack of integration with the backup application. VTLs are often restricted to operating as a particular device type, such as LTO, and they must emulate physical tape operations in order to maintain their masquerade. In order to use disk, which is written and read differently from the tape they are emulating, VTLs create an additional data path in the disk subsystem that is not visible from the backup application. Because of this additional data path, most external VTLs must maintain a separate catalogue recording the location of the data, which is invisible to the backup application. This separate catalogue must then be searched during restore and verify operations before data can be located, staged, and found in the correct place by the backup catalogue. In a typical VTL this compound catalogue is not integrated and cross-indexed with the catalogue of the backup application and can therefore slow down the restore process significantly.

A second disadvantage of VTL appliances is that they are built on separate platforms and restrict choices in hardware and software options, thus increasing the cost of expansion and ruling out the use of existing capacity. Table 3 below shows the substantial increase in cost of ownership of an external VTL.
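The separate-catalogue problem described above can be illustrated with a small sketch. All paths, barcodes, and offsets below are hypothetical, and the dictionaries stand in for what would in practice be database-backed catalogues; the point is only the difference between a chained lookup and a single one.

```python
# Backup application's catalogue: file -> (virtual tape barcode, record number)
backup_catalogue = {
    "/home/alice/report.doc": ("VT0007", 42),
}

# External VTL's private catalogue, invisible to the backup application:
# (barcode, record number) -> (disk image, byte offset)
vtl_catalogue = {
    ("VT0007", 42): ("/vtl/pool3/vt0007.img", 1_048_576),
}

# Integrated alternative: one cross-indexed catalogue maps the file
# straight to its location on disk.
integrated_catalogue = {
    "/home/alice/report.doc": ("/vtl/pool3/vt0007.img", 1_048_576),
}


def locate_external(path):
    """Two lookups: the application finds the virtual tape position,
    then the VTL resolves that position to disk."""
    tape_location = backup_catalogue[path]
    return vtl_catalogue[tape_location], 2  # (disk location, lookups made)


def locate_integrated(path):
    """One lookup against the integrated catalogue."""
    return integrated_catalogue[path], 1


print(locate_external("/home/alice/report.doc"))
print(locate_integrated("/home/alice/report.doc"))
```

Both functions return the same disk location; the external path simply needs a second catalogue search, in a catalogue the backup application cannot see, before the restore can begin.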


The third significant disadvantage is a negative impact on backup and restore performance. Modern backup architectures for large data sets are typically three-tier, utilising multiple output devices and parallel data streams. Typical VTL appliances are built on small server platforms and isolate the VTL function and I/O on one device, even though the backup application and source data might be distributed over many servers. This reliance on a single device creates a sizeable bottleneck for these data streams during backup and restore operations. Figure 1 shows both the I/O bottleneck created by isolating the entire VTL on one device and the reliance on proprietary disk, which increases costs. For small and medium-sized enterprises with substantial data sets, this can significantly throttle I/O traffic.

Figure 1 - External VTL Appliance has I/O bottlenecks and expensive proprietary disk

In summary, there are key drawbacks of external VTLs that cause them to fall short of delivering on the full promise of backup to disk, as summarised below. The lack of backup application integration reduces performance on restores, hardware restrictions increase the total cost of ownership (TCO), and concentrating the implementation in one device on the edge of the backup architecture creates an I/O bottleneck. These drawbacks materially decrease the potential advantage this marriage of tape policies with disk-based implementation could provide. But there is an answer, and it lies in eliminating the loose coupling of the VTL with the backup application. It is important to note that the limitations above, even the ones based on disk characteristics, can be designed out of a VTL implementation, but this requires more significant integration with the backup application and architecture than most third-party VTLs provide.

Summary of Drawbacks of External VTLs

• Separate catalogues lengthen restore times since two catalogues must be managed

• File system limitations limit backup choices and flexibility

• Restrictions on storage increase TCO and narrow choices

• Inability to use cost-effective JBOD instead of expensive bundled storage

• Existing storage capacity cannot be utilised

• All I/O must go through the VTL hardware, creating a significant potential bottleneck and reducing the effectiveness of three-tier backup architectures

VTLs Done Right

A VTL serves only one application: the backup application. So it is counter-productive to encapsulate the function in one device and place it at one point in the network, ignoring the principles of modern three-tier backup architecture. A more elegant design would embed the VTL in the backup application. The advantages of such an architecture are numerous, and they resolve the critical issues with external VTL implementations discussed previously.

First, an embedded VTL shares the backup catalogue with the backup application, eliminating clumsy and time-consuming two-step restore operations. Second, by embedding the VTL in the backup application, the VTL has access to all resources and devices of the backup application, and this allows the VTL to utilise multiple data streams and multiple devices. This provides the scalability promised by a large-scale three-tier backup architecture and does not force the backup data flow into a bottleneck. Third, embedding the VTL in the backup application means that heterogeneous disk from multiple platforms can be used to build the VTL, allowing it to be composed from existing unused capacity as well as cost-competitive disk. Figure 2 provides an overall picture of an embedded VTL’s advantages.

Figure 2 - Embedded VTL with balanced I/O utilising heterogeneous disk and existing capacity
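The difference between funnelling all streams through one appliance and spreading them across several disk targets can be shown with a deliberately simple throughput model. The model ignores network limits, contention, and protocol overhead, and every number in it is invented for illustration; none comes from the article or from a measured system.

```python
def external_vtl_throughput(stream_rates, appliance_limit):
    """All streams pass through one appliance, so its I/O limit caps the total."""
    return min(sum(stream_rates), appliance_limit)


def embedded_vtl_throughput(stream_rates, target_limits):
    """Each stream writes to its own disk target and is capped only by that target."""
    return sum(min(rate, limit) for rate, limit in zip(stream_rates, target_limits))


# Hypothetical figures in MB/s: four media servers each producing 80 MB/s.
streams = [80, 80, 80, 80]

print(external_vtl_throughput(streams, appliance_limit=200))            # 200
print(embedded_vtl_throughput(streams, target_limits=[100, 100, 100, 100]))  # 320
```

Under these assumed figures the single appliance saturates at 200 MB/s while the distributed targets absorb the full 320 MB/s; adding a fifth stream would raise only the second number.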

The cost savings with an embedded VTL are substantial as evidenced by the TCO comparison in Table 3 below.


Table 3 - High-cost Disk Backup with VTL Appliances

VTL appliances are expensive to purchase and expand

                                                 VTL Appliance    Embedded VTL
Initial Purchase (5TB VTL + Backup Application)  $53,000          $17,273
Relative cost comparison                         3x               -
Expansion Cost per TB                            $4,800-13,000    $400-600
Relative cost comparison                         15-20x           -
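To see how the figures in Table 3 compound as capacity grows, the following sketch totals purchase and expansion cost for a target capacity. The prices are taken from the table; the 10 TB target and the assumption that expansion is priced linearly per terabyte are illustrative only.

```python
def total_cost(initial, per_tb_range, target_tb, base_tb=5):
    """Return (low, high) total cost in dollars for reaching target_tb,
    assuming linear per-TB expansion pricing beyond the initial base_tb."""
    low, high = per_tb_range
    extra_tb = max(target_tb - base_tb, 0)
    return initial + extra_tb * low, initial + extra_tb * high


# Growing from the initial 5 TB to a hypothetical 10 TB
appliance = total_cost(53_000, (4_800, 13_000), target_tb=10)
embedded = total_cost(17_273, (400, 600), target_tb=10)

print(appliance)  # (77000, 118000)
print(embedded)   # (19273, 20273)
```

Under these assumptions, the gap widens with every terabyte added, because the per-terabyte expansion price differs far more than the initial purchase price does.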

What to look for in a VTL

- single catalogue integrated with backup application for one-step restores

- hardware and platform independence allowing economical expansion

- heterogeneous storage support to allow integration of existing capacity

- flexible targets allowing other media to be easily integrated

- policy-based copy to other media or VTL to automate data protection options

- high-level integration at the backup application level for three-tier architecture support and performance

Summary

In summary, disk-based backup has tremendous potential that goes far beyond simple replacement of tape as a medium. Virtual tape libraries (VTLs) are potentially one of the most comprehensive and flexible applications of disk-based backup available because they can combine the advantages of disk technology with traditional tape policies, the flexibility of multiple media targets, and integration and utilisation of existing backup infrastructures. However, many third-party implementations are built on proprietary server platforms which isolate the VTL from the backup application, effectively limiting the potential benefits and gains from VTL technology. The ultimate solution is a VTL embedded in the backup application, allowing the following benefits: 1) the multi-processing/multi-streaming of a modern three-tier backup infrastructure, 2) an integrated and uniform backup catalogue, 3) utilisation of existing disk capacity across heterogeneous systems, and 4) the ability to use general-purpose disk instead of proprietary VTL disk. The embedded VTL provides the best of the benefits of both disk and tape media, ensuring scalable, comprehensive, and reliable data protection.
