Automatic Detection and Repair of Errors in Data Structures
Brian Demsky, Martin Rinard
Laboratory for Computer Science, Massachusetts Institute of Technology
Motivation
[Figure: broken data structure whose objects hold the field values F = 20, G = 5; F = 20, G = 10; I = 5; J = 2]
Errors
• Missing elements
• Inappropriate sharing
• Dangling references
• Out-of-bounds array indices
• Inconsistent values
Goal
[Figure: the repair algorithm transforms the broken data structure (F = 20, G = 5; F = 20, G = 10; I = 5; J = 2) into a consistent data structure (F = 10, G = 5; F = 20, G = 10; F = 2, G = 1; I = 3; J = 2)]
Goal
[Figure: the same transformation from broken to consistent data structure, with the repair algorithm taking consistency properties from the developer as input]
What Does the Repair Algorithm Produce?
• A data structure that
  • Satisfies the consistency properties, and
  • Is heuristically close to the broken data structure
• Not necessarily the same data structure as a (hypothetical) correct program would produce
• But enough to keep the program going
Precursors
• Data structure repair has historically appeared in systems with extreme reliability goals
  • 5ESS switch – hand-coded audit routines
  • IBM MVS operating system – hand-coded failure recovery routines
• Key component of these systems
Where Is This Likely To Be Useful?
• Not for transient errors in systems with slack – you can just reboot
  • Must be willing to lose volatile state
  • Must be willing to wait for system to come back up
• Permanent data structures
  • File systems
  • Application files (Word, PowerPoint, …)
• Autonomous systems
• Critical systems
Architecture
[Figure: Broken Bits → (Model Definition & Translation) → Broken Abstract Model → (Internal Consistency Properties) → Repaired Abstract Model → (External Consistency Properties) → Repaired Bits]
Architecture Rationale
Why go through an abstract model?
• Simple, uniform structure
  • Sets of objects
  • Relations between objects
• Simplifies both
  • Expression of consistency properties
  • Repair algorithm
• Enables system to support full range of efficient, heavily encoded data structures
File System Example
[Figure: directory entries "abst" (firstBlock 0) and "intro" (firstBlock 2); disk blocks 0 to 3 with nextBlock values 1, -5, 1, -1]
struct Entry {
  byte name[Length];
  int firstBlock;
}
struct Block {
  int nextBlock;
  byte data[BlockSize];
}
struct Disk {
  Entry dir[NumEntries];
  Block block[NumBlocks];
}
Disk D;
Model Definition
• Sets of objects
  set blocks of integer : partition used | free;
• Relations between objects – values of object fields, referencing relationships between objects
  relation next : used, used;
[Figure: set blocks partitioned into used and free; relation next from used to used]
Model Translation
Bits are translated to sets and relations in the abstract model using statements of the form:
Quantifiers, Condition ⇒ Inclusion Constraint
for i in 0..NumEntries, 0 ≤ D.dir[i].firstBlock and D.dir[i].firstBlock < NumBlocks ⇒ D.dir[i].firstBlock in used
for b in used, 0 ≤ D.block[b].nextBlock and D.block[b].nextBlock < NumBlocks ⇒ ⟨b, D.block[b].nextBlock⟩ in next
for ⟨b,n⟩ in next, true ⇒ n in used
for b in 0..NumBlocks, not (b in used) ⇒ b in free
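The four translation rules can be read as a small fixed-point computation over the disk fields. The Python sketch below is one possible reading, not the tool's implementation: `translate`, `first_block`, and `next_block` are hypothetical names, and the sample values assume the file system example has directory entries with firstBlock 0 and 2 and disk blocks with nextBlock 1, -5, 1, -1.

```python
def translate(first_block, next_block):
    """Build the abstract model (used, free, next) from the raw disk fields."""
    num_blocks = len(next_block)
    used, nxt = set(), set()
    # Rule 1: every in-range firstBlock denotes a used block.
    for fb in first_block:
        if 0 <= fb < num_blocks:
            used.add(fb)
    # Rules 2 and 3: follow in-range nextBlock fields until nothing changes.
    worklist = list(used)
    while worklist:
        b = worklist.pop()
        n = next_block[b]
        if 0 <= n < num_blocks:
            nxt.add((b, n))
            if n not in used:
                used.add(n)
                worklist.append(n)
    # Rule 4: every block that is not used is free.
    free = set(range(num_blocks)) - used
    return used, free, nxt


# Assumed values from the file system example (nextBlock -5 is out of range).
used, free, nxt = translate(first_block=[0, 2], next_block=[1, -5, 1, -1])
```

Out-of-range fields such as -5 simply contribute nothing to the model, so corrupted values never enter the sets or relations.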
Model in Example
[Figure: the example file system and its model: blocks = {0, 1, 2, 3}, used = {0, 1, 2}, free = {3}, next = {⟨0,1⟩, ⟨2,1⟩}]
Internal Consistency Properties
Quantifiers, Body
• Body is first-order property of basic propositions
  • Inequality constraints on values of numeric fields
    • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Presence of required number of objects
    • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Topology of region surrounding each object
    • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
    • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
• Example: for b in used, size(next.b) ≤ 1
Internal Consistency Violations
Evaluate consistency properties, find violations
for b in used, size(next.b) ≤ 1 is false for b = 1
[Figure: model with next = {⟨0,1⟩, ⟨2,1⟩}; block 1 has two incoming next edges]
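Evaluating the example property amounts to counting, for each used block, how many blocks point to it. A minimal Python sketch, assuming the model is held as plain sets (`find_violations` and its parameters are hypothetical names, not part of the tool):

```python
from collections import defaultdict


def find_violations(used, nxt, limit=1):
    """Return the blocks b in used for which size(next.b) <= limit is false."""
    incoming = defaultdict(set)
    for src, dst in nxt:
        incoming[dst].add(src)  # next.b is the set of blocks whose next is b
    return sorted(b for b in used if len(incoming[b]) > limit)


# Model from the example: blocks 0 and 2 both point to block 1.
violations = find_violations(used={0, 1, 2}, nxt={(0, 1), (2, 1)})
```

Each returned block is a binding for the quantified variable b that the repair step can then work on.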
Repairing Violations of Internal Consistency Properties
• Violation provides binding for quantified variables
• Convert Body to disjunctive normal form
  (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
  p1 … pn, q1 … qm are basic propositions
• Choose a conjunction to satisfy
• Repair violated basic propositions in conjunction
Repairing Violations of Basic Propositions
• Inequality constraints on values of numeric fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Compute value of expression, assign field
• Presence of required number of objects
  • size(S) = C, size(S) ≤ C, size(S) ≥ C
  • Remove or insert objects from/to set
• Topology of region surrounding each object
  • size(V.R) = C, size(V.R) ≤ C, size(V.R) ≥ C
  • size(R.V) = C, size(R.V) ≤ C, size(R.V) ≥ C
  • Remove or insert pairs from/to relation
• Inclusion constraints: V in S, V1 in V2.R, ⟨V1,V2⟩ in R
  • Add object or pair to set or relation
Repair in Example
for b in used, size(next.b) ≤ 1 is false for b = 1
Must repair size(next.1) ≤ 1
Can remove either ⟨0,1⟩ or ⟨2,1⟩ from next
[Figure: model before repair, with next = {⟨0,1⟩, ⟨2,1⟩}]
Repair in Example
[Figure: model after repair; one pair has been removed from next, leaving a single next edge into block 1]
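The repair of size(next.1) ≤ 1 can be sketched as removing surplus pairs from the relation. In this Python sketch the choice of which pair to keep (the lowest-numbered predecessor) is an assumption made for illustration; the slides only say that either pair may be removed. `repair_incoming` is a hypothetical name.

```python
def repair_incoming(nxt, b, limit=1):
    """Remove pairs (x, b) from nxt until size(next.b) <= limit holds."""
    incoming = sorted(src for src, dst in nxt if dst == b)
    # Assumption: keep the lowest-numbered predecessors and drop the rest.
    to_remove = {(src, b) for src in incoming[limit:]}
    return nxt - to_remove


repaired = repair_incoming({(0, 1), (2, 1)}, b=1)
```

Returning a new set rather than mutating the input keeps the broken model available for comparison with the repaired one.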
Acyclic Repair Dependences
• Questions
  • Isn’t it possible for the repair of one constraint to invalidate another constraint?
  • What about infinite repair loops?
  • What about unsatisfiable specifications?
• Answer
  • We require specifications to have no cyclic repair dependences between constraints
  • So all repair sequences terminate
  • Repair can fail only because of resource limitations
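Checking for cyclic repair dependences is a graph cycle check once the dependences are known. The slides do not say how the dependences are derived, so the Python sketch below assumes they are already given as a mapping from each constraint to the constraints its repair may invalidate; `has_cyclic_dependence` and the constraint names are hypothetical.

```python
def has_cyclic_dependence(deps):
    """deps maps each constraint to the constraints its repair may invalidate."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(c):
        color[c] = GREY
        for d in deps.get(c, ()):
            state = color.get(d, WHITE)
            if state == GREY:  # back edge: d is on the current path
                return True
            if state == WHITE and visit(d):
                return True
        color[c] = BLACK
        return False

    return any(color.get(c, WHITE) == WHITE and visit(c) for c in deps)


acyclic = {"c1": ["c2"], "c2": ["c3"], "c3": []}
cyclic = {"c1": ["c2"], "c2": ["c1"]}
```

A specification whose graph passes this check admits an ordering of constraints in which no repair can reintroduce an earlier violation, which is the intuition behind the termination claim.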
External Consistency Constraints
Quantifiers, Condition ⇒ Body
• Body of form V = E, V.F = E, V.F[I] = E
• Example
  for b in free, true ⇒ D.block[b].nextBlock = -2
  for ⟨i,j⟩ in next, true ⇒ D.block[i].nextBlock = j
  for b in used, size(b.next) = 0 ⇒ D.block[b].nextBlock = -1
• Repair simply performs assignments
• Translates model repairs to bit repairs
Repair in Example
[Figure: inconsistent file system (disk block nextBlock values 1, -5, 1, -1) and repaired file system (nextBlock values 1, -1, -1, -2); directory entries "abst" (firstBlock 0) and "intro" (firstBlock 2) are unchanged]
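The three example external consistency constraints translate directly into assignments to the nextBlock fields. A Python sketch under the assumption that the repaired model keeps the pair ⟨0,1⟩ and that the broken nextBlock values are 1, -5, 1, -1 (`write_back` is a hypothetical name):

```python
def write_back(used, free, nxt, next_block):
    """Apply the three example external consistency rules to nextBlock."""
    result = list(next_block)
    has_next = {src for src, _ in nxt}
    for b in free:  # for b in free: nextBlock = -2
        result[b] = -2
    for i, j in nxt:  # for (i, j) in next: nextBlock of i = j
        result[i] = j
    for b in used:  # for b in used with size(b.next) = 0: nextBlock = -1
        if b not in has_next:
            result[b] = -1
    return result


repaired_bits = write_back(used={0, 1, 2}, free={3}, nxt={(0, 1)},
                           next_block=[1, -5, 1, -1])
```

Because each rule only assigns fields, this step cannot fail on a consistent model; it simply overwrites whatever corrupted values the bits held.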
What About Corrupted Pointers?
• Sets may contain pointers to structs
• System only allows valid structs in model
  • struct must be completely in valid memory
  • one struct may be nested inside another struct (but must agree on memory format)
[Figure: valid and invalid memory regions, showing valid structs and an invalid struct]
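The "completely in valid memory" rule can be illustrated with a range check. This Python sketch assumes valid memory is described by a list of half-open address ranges and, as a simplification, rejects a struct that spans two separate ranges; `struct_is_valid` and the addresses are hypothetical, and the nesting rule is not modeled.

```python
def struct_is_valid(addr, size, valid_regions):
    """True if [addr, addr + size) lies completely inside one valid region."""
    return any(start <= addr and addr + size <= end
               for start, end in valid_regions)


regions = [(0x1000, 0x2000)]  # hypothetical half-open valid range
```

A pointer that fails this check would be kept out of the model, so later consistency checks never dereference it.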
When to Test for Consistency and Repair
• Persistent data structures
  • Repair can be independent activity, or
  • Repair when data written out or read in
• Volatile data structures in running program
  • Under programmer control
  • Transaction-based approach
    • Identify transaction start and end
    • Repair at start, end, or both
  • Failure-based approach
    • Wait until program fails
    • Repair and restart from latest safe point
Experience
• We acquired three benchmarks
  • Simplified Linux file system
  • Freeciv interactive game
  • Microsoft Word files
• We developed specifications for all three
  • Less than a week of development time
  • Most of time spent figuring out Freeciv
• Each benchmark has
  • Workload
  • Fault insertion methodology
• Ran benchmarks with and without repair
Simplified Linux File System
[Figure: disk blocks labeled superblock, group block, inode bitmap block, block bitmap block, inode block (containing inodes), and directory block]
Some Consistency Properties
• inode bitmap consistent with inode usage
• block bitmap consistent with block usage
• directory entries refer to valid inodes
• files contain valid blocks only
• files do not share blocks
Results
• Workload – write and verify several files
• Fault insertion – crash file system
  • Inode and block bitmap errors
  • Partially initialized directory and inode entries
• Without repair
  • Incorrect file contents because of inode and disk block sharing
• With repair
  • Bitmaps repaired preventing illegal sharing, correct file contents
Freeciv
[Figure: terrain grid (rows PO MM, OO MP, PO MM, PP MP; O = Ocean, P = Plain, M = Mountain) and city structures with loc: 3,0 and loc: 2,3]
Consistency Properties
• Tiles have valid terrain values
• Cities are not in the ocean
• Each city has exactly one reference from city location grid
• City locations are consistent in
  • City structures and
  • tile grid
Results
• Workload – Freeciv software plays against itself
• Fault insertion – randomly corrupt terrain values
• Without repair – program fails (seg fault)
• With repair
  • Game runs just fine
  • But game plays out differently because of the different terrain values
Microsoft Word Files
• Files consist of a sequence of streams
• Streams stored using FAT-based data structure
[Figure: file layout with header, FAT block, and disk blocks]
Consistency Properties
• The FAT blocks exist
• FAT contains valid values only
  • -1 – terminates FAT streams
  • -2 – indicates free blocks
  • Valid disk block index – next block in stream
• FAT streams properly terminated
• Free blocks properly marked
• Streams contain valid blocks only
• Streams do not share blocks
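Several of these properties can be checked with one walk over the FAT. The Python sketch below is a simplified illustration, not the specification used in the experiments: it assumes the FAT is a flat list of integers and that the first block of each stream is known, and it ignores the header and the blocks that hold the FAT itself. `check_fat` and `stream_starts` are hypothetical names.

```python
def check_fat(fat, stream_starts):
    """Return a list of violated properties for a FAT and its stream starts."""
    errors = []
    n = len(fat)
    # FAT contains valid values only: -1, -2, or a valid block index.
    for i, v in enumerate(fat):
        if not (v in (-1, -2) or 0 <= v < n):
            errors.append(f"invalid FAT value at {i}")
    owner = {}  # block index -> stream that contains it
    for s, start in enumerate(stream_starts):
        b, seen = start, set()
        while b != -1:
            if not (0 <= b < n) or fat[b] == -2:
                errors.append(f"stream {s} contains an invalid or free block")
                break
            if b in seen:
                errors.append(f"stream {s} is not terminated (cycle)")
                break
            if b in owner:
                errors.append(f"block {b} shared by streams {owner[b]} and {s}")
                break
            owner[b] = s
            seen.add(b)
            b = fat[b]
    # Free blocks properly marked: any block outside all streams must be -2.
    for i, v in enumerate(fat):
        if i not in owner and v != -2:
            errors.append(f"free block {i} not marked free")
    return errors


consistent = check_fat([1, -1, -1, -2], stream_starts=[0, 2])
sharing = check_fat([1, -1, 1, -2], stream_starts=[0, 2])
```

A checker like this only detects violations; repairing them would follow the same remove-or-insert steps described for the abstract model.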
Results
• Workload – several Microsoft Word files
• Fault insertion – scramble FAT
• Without repair
  • If blocks containing the FAT were incorrectly marked as free, Word successfully loads file
  • Otherwise, “The document name or path is not valid”
• With repair
  • Word loads all files
Related Work
• Hand-coded repair
  • Lucent 5ESS switch
  • IBM MVS operating system
• Transactions
  • Identify actions that leave system consistent
  • If action fails, roll back to consistent state
• Checkpoint and recovery
  • Reboot system from scratch
  • Logging for roll-forward
• Self-stabilizing algorithms