Tivoli Software
© 2010 IBM Corporation
Using Machine Learning Techniques to Enhance The Performance of an Automatic Backup and Recovery System
Amir Ronen, Dan Pelleg, Machine Learning Group, HRL
Eran Raichstein (IBM Software Group)
Amir Ronen
Motivation
IBM’s Fastback: automatic backup and recovery system
- Incremental backup of disk volumes to a repository
- Instant restore (IR): allows applications to start working immediately after recovery
- Xpress mount: allows access to backed-up data without recovering it (e.g. for taking tape dumps)
Goal
- Accelerate IR and mount via machine learning and algorithmic techniques
- Minimum intervention in Fastback’s internals
  - Benefits: minimize bugs, easy upgrading, generality, …
Outline
- The Fastback system
- Algorithm for automatic determination of read-ahead
  - Basic observations
  - The algorithm
  - Experiments in the Fastback system
- Prefetching
  - Theoretical model and observation
  - Basic prefetching algorithms
  - Frequent pattern based algorithms
  - Controlling and combining prefetch algorithms
- Summary
FastBack’s Instant Restore and Mount
Instant Restore allows users to start using applications on the same disk to which the volume is being restored, while the restore operation is still in progress.
1. Activate Instant Restore
2. Read IOs from un-recovered areas trigger a block fetch from the repository
3. All other reads are performed as usual
[Figure labels: Production server, Typical Production Disk, New Production server, New Production Disk, Xpress Restore Server, repository]
From an architectural perspective, mount is somewhat similar.
The problem
- A block is needed from the repository
- Suppose that we are allowed to bring additional subsequent blocks
- How many to bring?
  - Too many may slow down the system (in particular if they will not be used)
  - Too few will cause high total latency
Simple cost model: T ~ T1 + n·T2 + ε
- T1: “fixed” latency
- T2: time to bring one block
- n: number of blocks
- ε: noise (assumed zero)
Key idea
- Suppose that we choose n such that T1 = n·T2
- The cost never more than doubles
- In many settings n can be large
- The algorithm is 2-competitive
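The balancing argument can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the values 6.5 and 3 are used purely as example inputs (no units are given in the slides), and rounding n down to an integer is an assumption made here so that n·T2 never exceeds T1.

```python
import math

def request_cost(t1: float, t2: float, n: int) -> float:
    """Cost of one repository request under T = T1 + n*T2 (noise taken as zero)."""
    return t1 + n * t2

def balanced_readahead(t1: float, t2: float) -> int:
    """Largest whole n with n*T2 <= T1, but never fewer than one block."""
    return max(1, math.floor(t1 / t2))

t1, t2 = 6.5, 3.0                      # illustrative inputs, not measured values
n = balanced_readahead(t1, t2)         # 2 blocks, since 2*3.0 <= 6.5 < 3*3.0
single = request_cost(t1, t2, 1)       # fetching only the requested block
batched = request_cost(t1, t2, n)      # fetching n blocks in one request

# The transfer part n*T2 is at most T1, so the batched request costs at most
# 2*T1, which is no more than twice the cost of fetching a single block.
print(n, single, batched, batched <= 2 * single)
```

If the extra blocks turn out to be useful, the fixed latency T1 is paid once instead of n times; if they are not, the request still costs at most twice what it would have cost anyway.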
Problem 1
- The latency T1 and the block cost T2 are not known
- They may vary over time
Solution
- Hold a window of the last k requests (e.g. 200)
- Use linear regression to estimate T1 and T2
- The update can be done in O(1)
Example estimates: latency ~ 6.5, block cost ~ 3
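One way to get an O(1) update is to keep running sums over the window and adjust them as requests enter and leave. The following is a minimal sketch under that assumption; the class and method names are hypothetical and the slides do not say how Fastback implements the regression.

```python
from collections import deque

class WindowRegression:
    """Least-squares fit of T = T1 + n*T2 over the last k requests."""

    def __init__(self, k: int = 200):
        self.k = k
        self.window = deque()
        # Running sums of n, T, n*n and n*T over the current window.
        self.sn = self.st = self.snn = self.snt = 0.0

    def _accumulate(self, n: float, t: float, sign: float) -> None:
        self.sn += sign * n
        self.st += sign * t
        self.snn += sign * n * n
        self.snt += sign * n * t

    def add(self, n: float, t: float) -> None:
        """Record a request of n blocks that took time t; O(1) per call."""
        self.window.append((n, t))
        self._accumulate(n, t, +1.0)
        if len(self.window) > self.k:
            old_n, old_t = self.window.popleft()
            self._accumulate(old_n, old_t, -1.0)

    def estimate(self):
        """Return (T1, T2), or None when the n-values do not vary enough."""
        m = len(self.window)
        denom = m * self.snn - self.sn ** 2
        if m < 2 or abs(denom) < 1e-9:
            return None
        t2 = (m * self.snt - self.sn * self.st) / denom   # slope: cost per block
        t1 = (self.st - t2 * self.sn) / m                 # intercept: fixed latency
        return t1, t2

reg = WindowRegression(k=200)
for i in range(500):
    blocks = 1 + i % 8                      # synthetic request sizes 1..8
    reg.add(blocks, 6.5 + 3.0 * blocks)     # synthetic, noise-free timings
print(reg.estimate())
```

Running sums accumulate floating-point rounding error over time, so a long-lived instance would probably recompute them from the window occasionally.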
Problem 2
- What if the n-values are so similar that we cannot estimate T1 and T2?
Sampling ideas
- We only need a few samples
- If mean(n) is large, we sample small values
- If mean(n) is small, we sample 2*mean(n)
- Low amortized cost
The Algorithm
- Hold a window of the last k requests
- At each step update the linear regression (refresh from time to time)
- If regression is possible:
  - Estimate T1, T2
  - Compute the desired n value
  - If the system asked for less, recommend read-ahead
- Otherwise:
  - Sample as described
Additional heuristics
- Unreasonable values, smoothing, mis-estimation, …
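The decision step can be sketched as a single function. This is an illustration under several assumptions that the slides do not state: the threshold `large` that separates a "large" from a "small" mean, reading "sample small values" as "do not read ahead on this request", rounding T1/T2 down, and treating non-positive estimates as "unreasonable". All names are hypothetical.

```python
import math
from statistics import mean

def fit(window):
    """Least squares for T = T1 + n*T2; None if the n-values do not vary."""
    m = len(window)
    sn = sum(n for n, _ in window)
    st = sum(t for _, t in window)
    snn = sum(n * n for n, _ in window)
    snt = sum(n * t for n, t in window)
    denom = m * snn - sn * sn
    if m < 2 or denom == 0:
        return None
    t2 = (m * snt - sn * st) / denom
    return (st - t2 * sn) / m, t2

def recommend(window, requested, large=8):
    """Number of blocks to read for a request of `requested` blocks.

    `window` holds (n, elapsed_time) pairs for recent requests.
    `large` is an assumed threshold, not a value from the slides.
    """
    est = fit(window)
    if est is not None and est[0] > 0 and est[1] > 0:
        t1, t2 = est
        target = max(1, math.floor(t1 / t2))   # n such that n*T2 is about T1
        return max(requested, target)          # read ahead only if asked for less
    # Regression impossible or estimates unreasonable: sample a different n
    # so that later fits see some variation.
    avg = mean(n for n, _ in window) if window else 1
    if avg >= large:
        return requested                       # sample a small value
    return max(requested, int(2 * avg))        # sample about 2*mean(n)

varied = [(n, 6.5 + 3.0 * n) for n in (1, 2, 3, 4, 5, 6, 7, 8)]
print(recommend(varied, 1), recommend(varied, 5))
```

Because a sampled request adds a different n-value to the window, the regression usually becomes possible again after one or two samples, which is consistent with the low amortized cost claimed for sampling.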
Impact on Fastback
Added latency per each request
Outperformed the predetermined values
Speedup of up to 4x
mounting continuous and fragmented data
Comments & open issues
The algorithm may be applicable elsewhere
Extensions to more complicated cost models
Analyzing executions of parallel copies of the algorithm
Block Prediction and Prefetching for Enhancing Instant Restore
Motivation
IR needs to fetch blocks from the repository according to its workload
Ideally, blocks will be predicted and brought before they are needed
Comments
- The network is not preemptive, so prefetching can also be harmful
- Typical workloads are parallel processes, each with some locality of reference
A model for the prefetch problem
The workload is an unknown sequence of events L1, …, Ln. Each Lj is either:
- An access to a block Bj
- A process event
The system is composed of a CPU and a network that can run in parallel. At each step j the system can do one of the following:
1. Process (Lj is a process event, cost = 1 unit)
2. Access its local memory (if Lj is an access event and Bj is already in the local memory, cost = 1 unit)
3. Fetch a block from the repository (this occupies the network for C time units and can be done in parallel with 1 or 2)
A model for the prefetch problem (cont.)
Slowdown: Let L = L1, …, Ln be a workload. The slowdown of the system on L is the ratio between the total system time and the time to perform the workload locally, i.e. Tsys / n.
[Figure: timeline for the workload B17, Process, B18, Process, … with C = 2. Rows: CPU (Access, Process) and Network (Fetch 17, Fetch 18). Label: Delta.]
Slowdown is ~1. Without prefetching, slowdown is around 2.
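The two slowdown figures can be reproduced with a small simulation of the model. The sketch below is an interpretation, not the authors' simulator: it assumes integer time units, a single non-preemptive network, a missing block that is fetched on demand while the CPU waits, and a delta rule that queues block Bj+1 at the moment Bj is accessed. The workload encoding (an integer for an access, `None` for a process event) and the function name are hypothetical.

```python
def simulate(workload, C, prefetch=True):
    """Return the slowdown Tsys / n for a workload under the model above."""
    t = 0            # CPU clock
    net_free = 0     # time at which the network becomes idle
    arrival = {}     # block -> time at which it is available locally
    queue = []       # pending prefetch candidates, served in LIFO order
    for event in workload:
        if event is not None:                  # access to block `event`
            block = event
            if block not in arrival:           # not local, not in flight: demand fetch
                if block in queue:
                    queue.remove(block)
                start = max(t, net_free)       # wait for the network if it is busy
                arrival[block] = start + C
                net_free = arrival[block]
            t = max(t, arrival[block])         # CPU waits until the block is local
            nxt = block + 1                    # delta rule: queue the next block
            if prefetch and nxt not in arrival and nxt not in queue:
                queue.append(nxt)
        if queue and net_free <= t:            # network idle: start one prefetch
            nb = queue.pop()                   # LIFO order
            arrival[nb] = t + C
            net_free = t + C
        t += 1                                 # the access or process step itself
    return t / len(workload)

# Workload from the figure: B17, Process, B18, Process, ... (50 blocks), C = 2.
workload = []
for b in range(17, 67):
    workload += [b, None]

print(simulate(workload, C=2, prefetch=True))    # close to 1
print(simulate(workload, C=2, prefetch=False))   # 2.0
```

With prefetching, only the very first block is waited for, so the total time is 2 + 100 units for 100 events. Without it, every access pays the C = 2 fetch before its own unit of work.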
Simple prefetch algorithms
Delta rule
- Whenever Bj is accessed, put Bj+1 in the queue
- Whenever the network is idle, prefetch in LIFO order
- Very effective rule, simple to implement
No prefetch
- Can be shown to be 2-competitive!
Order by frequency
- At training time, order blocks by their frequency
OPT
- Hypothetical optimal offline algorithm
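The order-by-frequency baseline amounts to a counting pass over a training trace. A minimal sketch follows, assuming the trace is simply a list of block numbers and that prefetching walks the resulting order whenever the network is idle. The function names are hypothetical.

```python
from collections import Counter

def frequency_order(train_trace):
    """Blocks seen in the training trace, most frequently accessed first."""
    counts = Counter(train_trace)
    return [block for block, _ in counts.most_common()]

def next_prefetch(order, local_blocks):
    """First block in the frequency order that is not yet local, or None."""
    for block in order:
        if block not in local_blocks:
            return block
    return None

order = frequency_order([5, 9, 5, 7, 5, 9])   # block 5 three times, 9 twice, 7 once
print(order)                                   # [5, 9, 7]
print(next_prefetch(order, {5}))               # 9
```

Unlike the delta rule, this policy ignores the current access position entirely, so it can only help with blocks that are hot across the whole trace.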
Frequent pattern mining based algorithms
CMiner (Li et al., FAST 2004)
- Identifies recurring block sub-sequences at training time
- Problematic runtime and space complexity in our settings
[Figure labels: B-tree, Hot item, "A,E,L", Z]
Novel variants of CMiner
CMiner()
- Identifies generic frequent delta rules
- Efficient runtime and space complexity
CMiner-OBF
- A two-level variant of CMiner
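The slides do not describe how the frequent delta rules are mined. As one possible reading, offered only as an assumption, a "generic delta rule" could be an offset d such that an access to block B is often followed shortly by an access to block B + d. The sketch below counts such offsets in a training trace; the function name, the lookahead window, and the support threshold are all hypothetical, and this is not the CMiner algorithm of Li et al.

```python
from collections import Counter

def frequent_deltas(trace, lookahead=3, min_support=2):
    """Offsets d = later - block seen at least `min_support` times,
    where `later` is accessed within `lookahead` positions after `block`."""
    counts = Counter()
    for i, block in enumerate(trace):
        for later in trace[i + 1 : i + 1 + lookahead]:
            delta = later - block
            if delta != 0:
                counts[delta] += 1
    return [d for d, c in counts.most_common() if c >= min_support]

trace = [10, 14, 20, 24, 30, 34]
# Consecutive offsets are 4, 6, 4, 6, 4, so only +4 reaches a support of 3.
print(frequent_deltas(trace, lookahead=1, min_support=3))   # [4]
```

Counting offsets instead of concrete block sequences keeps the table small, because one rule covers every block in the volume. That is in line with the efficiency claim above, though the actual data structures used are not given.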
Simulations
Setup
- Used traces from OLTP financial transactions and from an SQL stress tool
- Simulated the system under various parameters and measured slowdown at various time points
Simulations (cont)
Simple delta rules were hard to beat
CMiner() often improves upon them, but not always
Some schemes are harmful
Summary and open issues
Automatic read-ahead determination
- Highly effective
- Can be applicable elsewhere
- Calls for more generalized cost models
Block prediction and prefetch
- Simple delta rules seem hard to beat
- Potential for improvement
- Novel frequent pattern mining based algorithms; might be interesting in other contexts (e.g. caching)