Tivoli Software

© 2010 IBM Corporation

Using Machine Learning Techniques to Enhance the Performance of an Automatic Backup and Recovery System

Amir Ronen, Dan Pelleg (Machine Learning Group, HRL)
Eran Raichstein (IBM Software Group)

Amir Ronen


Motivation

IBM’s Fastback
- Automatic backup and recovery system
- Incremental backup of disk volumes to a repository
- Instant restore (IR): allows applications to start working immediately after recovery
- Xpress mount: allows access to backup data without recovering it (e.g. for taking tape dumps)

Goal
- Accelerate IR and mount via machine learning and algorithmic techniques
- Minimum intervention in Fastback’s internals
- Benefits: minimize bugs, easy upgrading, generality, …


Outline

- The Fastback system
- Algorithm for automatic determination of read-ahead
  - Basic observations
  - The algorithm
  - Experiments in the Fastback system
- Prefetching
  - Theoretical model and observation
  - Basic prefetching algorithms
  - Frequent pattern based algorithms
  - Controlling and combining prefetch algorithms
- Summary


FastBack’s Instant Restore and Mount

Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in process.

1. Activate Instant Restore
2. Read IOs from un-recovered areas trigger block fetch from the repository
3. All other reads are performed as usual

[Figure: production server with its typical production disk; Xpress Restore Server and repository; new production server with its new production disk]

From an architectural perspective, mount is somewhat similar.
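The read path in steps 2 and 3 can be pictured with a small sketch. This is only an illustration: the class name, the per-block bookkeeping and the dictionary standing in for the repository are assumptions made here, not FastBack interfaces.

```python
class InstantRestoreVolume:
    """Toy model of a volume that is usable while it is still being restored."""

    def __init__(self, repository, num_blocks):
        self.repository = repository            # hypothetical: block number -> bytes
        self.disk = [None] * num_blocks         # local disk contents
        self.recovered = [False] * num_blocks   # which blocks are already local
        self.repository_fetches = 0

    def read(self, block):
        if not self.recovered[block]:
            # Step 2: a read from an un-recovered area triggers a block fetch.
            self.disk[block] = self.repository[block]
            self.recovered[block] = True
            self.repository_fetches += 1
        # Step 3: all other reads are served from the local disk as usual.
        return self.disk[block]


repo = {i: f"data-{i}".encode() for i in range(4)}
volume = InstantRestoreVolume(repo, num_blocks=4)
first = volume.read(2)    # fetched from the repository
second = volume.read(2)   # served locally
```

Only the first read of a block pays the repository round trip, which is why the later slides focus on how many blocks to bring per fetch and which blocks to bring ahead of time.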


CNF: An Algorithm for Readahead Amount Determination




The problem
- A block is needed from the repository
- Suppose that we are allowed to bring additional subsequent blocks
- How many to bring?
  - Too many may slow down the system (in particular if they will not be used)
  - Too few will cause high total latency


Simple cost model: T ≈ T1 + n·T2 + ε
- T1: “fixed” latency
- T2: time to bring one block
- n: number of blocks
- ε: noise (assumed zero)

Key idea
- Suppose that we choose n such that T1 = n·T2
- The cost never more than doubles: the request takes T1 + n·T2 = 2·T1, and any request pays at least T1
- In many settings n can be large
- The algorithm is 2-competitive
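A small worked example of the break-even choice, assuming the noise term is zero. The function names and the numeric values are made up for illustration.

```python
def break_even_readahead(t1, t2):
    """Number of blocks n for which the transfer time n*T2 matches the fixed latency T1."""
    if t1 < 0 or t2 <= 0:
        raise ValueError("need t1 >= 0 and t2 > 0")
    return max(1, round(t1 / t2))


def request_time(t1, t2, n):
    """Cost model from the slide with the noise term set to zero."""
    return t1 + n * t2


t1, t2 = 60.0, 3.0                    # made-up values, arbitrary time units
n = break_even_readahead(t1, t2)      # 20 blocks
ratio = request_time(t1, t2, n) / request_time(t1, t2, 1)
```

Here bringing 20 blocks costs 120 time units against 63 for a single block, so the request is less than twice as slow even if none of the extra 19 blocks is ever used.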


Problem 1
- The latency T1 and the block cost T2 are not known
- May vary over time

Solution
- Hold a window of last k requests (e.g. 200)
- Use linear regression to estimate T1 and T2
- Update can be done in O(1)

Example estimate: Latency ~ 6.5, Block cost ~ 3
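One way to get the O(1) update is to keep running sums over the window and to solve the one-variable least-squares fit in closed form. The sketch below assumes each sample is a pair (n, t) of blocks requested and time observed; the class name and the tolerance are illustrative choices, not details from the talk.

```python
from collections import deque


class WindowedRegression:
    """Least-squares fit of t = T1 + n*T2 over the last k (n, t) samples."""

    def __init__(self, k=200):
        self.k = k
        self.samples = deque()
        self.sn = self.st = self.snn = self.snt = 0.0

    def add(self, n, t):
        # O(1): add the new sample to the running sums and drop the oldest one.
        self.samples.append((n, t))
        self.sn += n
        self.st += t
        self.snn += n * n
        self.snt += n * t
        if len(self.samples) > self.k:
            old_n, old_t = self.samples.popleft()
            self.sn -= old_n
            self.st -= old_t
            self.snn -= old_n * old_n
            self.snt -= old_n * old_t

    def estimate(self):
        """Return (T1, T2), or None when the n-values do not vary enough."""
        m = len(self.samples)
        denom = m * self.snn - self.sn ** 2
        if m < 2 or denom <= 1e-9:
            return None
        t2 = (m * self.snt - self.sn * self.st) / denom
        t1 = (self.st - t2 * self.sn) / m
        return t1, t2


reg = WindowedRegression(k=200)
for n in range(1, 11):
    reg.add(n, 6.5 + 3 * n)           # noiseless synthetic samples
t1_hat, t2_hat = reg.estimate()       # close to (6.5, 3.0)
```

Running sums accumulate floating-point error as samples enter and leave, so recomputing them from the stored window now and then is a reasonable safeguard.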


Problem 2
- What if the n-values are all similar, so that the regression cannot be estimated?

Sampling ideas
- We only need a few samples
- If mean(n) is large, we sample small values
- If mean(n) is small, we sample 2*mean(n)
- Low amortized cost
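A possible reading of the sampling rule is sketched below. The slide does not say what counts as "large", how small the small values are, or how often to sample, so `large_threshold`, the division by 4 and `sample_probability` are assumptions picked only to make the example concrete.

```python
import random


def exploratory_n(recent_n, large_threshold=32, sample_probability=0.05, rng=random):
    """Return a readahead size to try for exploration, or None to leave the request alone."""
    if not recent_n or rng.random() >= sample_probability:
        return None                        # most requests are not used as samples
    mean_n = sum(recent_n) / len(recent_n)
    if mean_n >= large_threshold:
        return max(1, round(mean_n / 4))   # mean is large: try a small value (assumed factor)
    return max(1, round(2 * mean_n))       # mean is small: try twice the mean


large_case = exploratory_n([64, 64, 64], sample_probability=1.0)   # 16
small_case = exploratory_n([4, 4, 4], sample_probability=1.0)      # 8
```

Keeping the sampling probability low is what keeps the amortized cost of the occasional mis-sized request small.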


The Algorithm
- Hold a window of the last k requests
- At each step update the linear regression (refresh from time to time)
- If regression is possible:
  - Estimate T1, T2
  - Compute desired n value
  - If the system asked for less, recommend readahead
- Otherwise:
  - Sample as described

Additional heuristics
- Unreasonable values, smoothing, mis-estimation, …
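The decision step can be sketched as a single function. For brevity this version refits the regression from the whole window on each call rather than updating it incrementally, treats non-positive estimates as "unreasonable", and takes the sampling rule as a parameter; all three are simplifications assumed here.

```python
def recommend_readahead(window, requested,
                        sample=lambda mean_n: max(1, round(2 * mean_n))):
    """window: recent (n, t) pairs; requested: number of blocks the system asked for.

    Returns (blocks_to_fetch, reason).
    """
    m = len(window)
    if m == 0:
        return requested, "no data"
    sn = sum(n for n, _ in window)
    st = sum(t for _, t in window)
    snn = sum(n * n for n, _ in window)
    snt = sum(n * t for n, t in window)
    denom = m * snn - sn * sn
    if denom > 0:                              # n-values vary: regression is possible
        t2 = (m * snt - sn * st) / denom
        t1 = (st - t2 * sn) / m
        if t1 > 0 and t2 > 0:                  # heuristic: ignore unreasonable estimates
            desired = max(1, round(t1 / t2))   # n with T1 = n*T2
            if requested < desired:
                return desired, "readahead"
            return requested, "as requested"
    return sample(sn / m), "sample"            # otherwise: sample


history = [(n, 60 + 3 * n) for n in (1, 2, 4, 8)]
grow = recommend_readahead(history, requested=4)        # (20, "readahead")
keep = recommend_readahead(history, requested=32)       # (32, "as requested")
probe = recommend_readahead([(4, 72)] * 5, requested=4)  # (8, "sample")
```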


Impact on Fastback

Added latency per each request

Outperformed the predetermined values

Speedup of up to 4x

mounting continuous and fragmented data


Comments & open issues

The algorithm may be applicable elsewhere

Extensions to more complicated cost models

Analyzing executions of parallel copies of the algorithm


Block Prediction and Prefetching for Enhancing Instant Restore




Motivation
- IR needs to fetch blocks from the repository according to its workload
- Ideally, blocks will be predicted and brought before they are needed

Comments
- The network is not preemptive, so prefetching can also be harmful
- Typical workloads are parallel processes, each with some locality of reference


A model for the prefetch problem

Workload is an unknown sequence of events L1, …, Ln. Each Lj is either:
- An access to a block Bj
- A process event

The system is composed of a CPU and a network that can run in parallel. At each step j the system can do one of the following:

1. Process (Lj is a process event, cost = 1 unit)

2. Access its local memory (If Lj is an access event and Bj is already in the local memory, cost = 1 unit)

3. Fetch a block from the repository (this occupies the network for C time units, can be done in parallel to 1 or 2)


A model for the prefetch problem (cont.)

Slowdown: Let L = L1, …, Ln be a workload. The slowdown of the system on L is the ratio between the total system time and the time to perform the workload locally, i.e. Tsys / n.

[Figure: CPU and network timelines under the Delta rule for the workload B17, Process, B18, Process, …, with C = 2. Network: Fetch 17, Fetch 18. CPU: Access, Process, Access, Process, …]

Slowdown is ~1. Without prefetching, slowdown is around 2.
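The two slowdown figures can be reproduced with a small simulation of the model. The sketch makes several simplifying assumptions: a block is identified by an integer, `None` stands for a process event, the network serves one fetch at a time without preemption, and the Delta prefetch is issued as soon as the triggering access begins rather than being held in a queue.

```python
def simulate(workload, fetch_cost, delta_prefetch=False):
    """Return the slowdown Tsys / n of a workload in the CPU + network model.

    workload: list of events; an int is an access to that block, None is a process event.
    """
    arrival = {}    # block -> time at which it is available locally
    net_free = 0    # time at which the network becomes idle
    now = 0         # CPU clock
    for event in workload:
        if event is not None:
            if event not in arrival:
                # Demand fetch: occupies the network for fetch_cost time units.
                start = max(now, net_free)
                net_free = start + fetch_cost
                arrival[event] = net_free
            now = max(now, arrival[event])      # stall until the block is local
            if delta_prefetch and event + 1 not in arrival:
                # Fetch the next block in parallel with the CPU work.
                start = max(now, net_free)
                net_free = start + fetch_cost
                arrival[event + 1] = net_free
        now += 1                                # the access or process step itself
    return now / len(workload)


workload = []
for block in range(17, 67):
    workload += [block, None]                   # B17, Process, B18, Process, ...

no_prefetch = simulate(workload, fetch_cost=2)                       # 2.0
with_delta = simulate(workload, fetch_cost=2, delta_prefetch=True)   # 1.02
```

Because a fetch that has started cannot be interrupted, a wrong prefetch in this model delays any demand fetch queued behind it, which is the sense in which prefetching can be harmful.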


Simple prefetch algorithms

Delta rule
- Whenever Bj is accessed, put Bj+1 in the queue
- Whenever the network is idle, prefetch in LIFO order
- Very effective rule, simple to implement

No prefetch
- Can be shown to be 2-competitive!

Order by frequency
- In train time, order blocks by their frequency

OPT
- Hypothetical optimal offline algorithm
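A minimal sketch of the Delta rule and of the frequency ordering follows. It reads Bj+1 as the block whose number follows that of Bj, which is an interpretation; the class and function names are hypothetical.

```python
from collections import Counter


class DeltaPrefetcher:
    """Delta rule: after an access to block b, block b + 1 becomes a prefetch candidate."""

    def __init__(self):
        self._stack = []     # candidates; the most recent one is at the end
        self.local = set()   # blocks already local or already requested

    def on_access(self, block):
        self.local.add(block)
        self._stack.append(block + 1)

    def next_prefetch(self):
        """Call when the network is idle; returns a block to fetch, or None."""
        while self._stack:
            candidate = self._stack.pop()     # LIFO: newest candidate first
            if candidate not in self.local:
                self.local.add(candidate)
                return candidate
        return None


def order_by_frequency(train_trace):
    """Blocks of a training trace, most frequently accessed first."""
    return [block for block, _ in Counter(train_trace).most_common()]


prefetcher = DeltaPrefetcher()
prefetcher.on_access(5)
prefetcher.on_access(9)
order = [prefetcher.next_prefetch() for _ in range(3)]   # [10, 6, None]
ranking = order_by_frequency([3, 1, 3, 2, 1, 3])         # [3, 1, 2]
```

LIFO order favors the successor of the most recent access, which fits the locality-of-reference observation made earlier in the talk.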


Frequent pattern mining based algorithms

CMiner (Li et al., FAST 2004)
- Identifies reoccurring block sub-sequences in train time
- Problematic runtime and space complexity in our settings

[Figure labels: B-tree; Hot item; A,E,L; Z]


Novel variants of CMiner

CMiner()
- Identifies generic frequent delta rules
- Efficient runtime and space complexity

CMiner-OBF
- A two-level variant of CMiner

[Figure: example delta rule, an access to Bj triggering fetches]
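To make "frequent delta rules" concrete, the sketch below counts differences between the block numbers of nearby accesses in a training trace and keeps the common ones. It is a naive illustration written for this text and not the CMiner variant from the talk; `max_gap` and `min_support` are assumed parameters.

```python
from collections import Counter


def frequent_deltas(trace, max_gap=3, min_support=2):
    """Block-number differences that recur between accesses at most max_gap positions apart."""
    counts = Counter()
    for i, block in enumerate(trace):
        for later in trace[i + 1 : i + 1 + max_gap]:
            delta = later - block
            if delta != 0:
                counts[delta] += 1
    # Most frequent first; drop deltas seen fewer than min_support times.
    return [delta for delta, count in counts.most_common() if count >= min_support]


trace = [10, 11, 50, 51, 90, 91]
rules = frequent_deltas(trace, max_gap=1, min_support=3)   # [1]
```

A rule mined this way says "after an access to block b, block b + delta is likely to be needed", independent of the value of b, so the table of rules stays small compared with storing per-block sub-sequences.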


Simulations

Setup
- Used traces from OLTP financial transactions and from an SQL stress tool
- Simulated the system under various parameters and measured slowdown at various time points


Simulations (cont)

Simple delta rules were hard to beat

CMiner() often improves upon them but not always

Some schemes are harmful


Summary and open issues

Automatic read-ahead determination
- Highly effective
- Can be applicable elsewhere
- Calls for more generalized cost models

Block prediction and prefetch
- Simple delta rules seem hard to beat
- Potential for improvement
- Novel frequent pattern mining based algorithms. Might be interesting in other contexts (e.g. caching)
