Posted on 21-Dec-2015
Data Structure Repair
Brian Demsky
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Motivation
[Figure: a broken data structure, drawn as linked records with fields F, G, I, and J]
Errors
• Missing elements
• Inappropriate sharing
• Dangling references
• Out of bounds array indices
• Inconsistent values
Goal
[Figure: Broken Data Structure → Repair Algorithm → Consistent Data Structure]
Goal
[Figure: Broken Data Structure → Repair Algorithm → Consistent Data Structure, with Consistency Properties from the developer as an input to the Repair Algorithm]
What Does the Repair Algorithm Produce?
• A data structure that
  • Satisfies the consistency properties, and
  • Is heuristically close to the broken data structure
• Not necessarily the same data structure as the (hypothetical) correct program would produce
• But enough to keep the program operating successfully
Where Is This Likely To Be Useful?
• Less useful when it is acceptable to reboot
  • Must be OK to lose volatile state
  • Must be OK to wait for reboot
  • Cause of error must go away after reboot
• Persistent data structures (file systems, application files)
• Autonomous and/or safety-critical systems
  • Monitor/control unstable physical phenomena
• Largely independent subcomputations
  • Moving time window
Basic Approach
[Figure: Broken Bits → Model Translation → Broken Abstract Model → Abstract Repair → Repaired Abstract Model → Automatically Generated Concrete Repair → Repaired Bits]
Developer and System Responsibilities
[Figure: the same pipeline, annotated in six numbered steps with the Model Definition Rules and Consistency Constraints supplied by the developer and the Model Translation, Consistency Check, Abstract Repair, and Automatically Generated Concrete Repair performed by the system]
Architecture Rationale
Why use the abstract model?
• Model construction separates objects into sets
  • Reachability properties
  • Field values
• Different constraints for objects in different sets
• Appropriate division of complexity
  • Data structure representation complexity encapsulated in model definition rules
  • Consistency property complexity encapsulated in (clean, uniform) model constraints
Talk Outline
• File System Example
• Model Definition Rules
• Consistency Constraints
• Abstract Repairs
• Concrete Repairs
• Benchmarks
• Specification Inference
• Related Work
• Future Directions
• Conclusion
File System Example
struct disk {
  int blockbitmap;
  entry dir[numentries];
  block block[numblocks];
}

struct entry {
  byte name[Length];
  int firstblock;
}

struct block {
  int nextblock;
  byte data[blocksize];
}

struct blockbitmap subtype block {
  int nextblock;
  bit bitmap[numblocks];
}
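[Figure: example disk with directory entries and disk blocks. d.blockbitmap holds -5, an out-of-bounds index; the directory entry "intro" has firstblock = 2; block 2's nextblock is 3, and block 3 ends the chain with -1]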
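File System Model
• Sets of objects
    set Block of block : Used | Free
    set Used of block : Bitmap
• Relations between objects
    relation Next : Used, Used
    relation BlockStatus : Block, boolean
[Figure: Block is partitioned into Used and Free, Bitmap is a subset of Used, Next relates Used blocks to Used blocks, and BlockStatus relates Blocks to booleans]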
Model Translation
Bits are translated to sets and relations in the abstract model using statements of the form:
  Quantifiers, Condition => Inclusion Constraint

  ∀ i ∈ [0..numentries-1], 0 ≤ d.dir[i].firstblock => d.block[d.dir[i].firstblock] ∈ Used
  ∀ b ∈ Used, 0 ≤ b.nextblock => <b, d.block[b.nextblock]> ∈ Next
  ∀ b ∈ Used, 0 ≤ b.nextblock => d.block[b.nextblock] ∈ Used
  ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free
  true => d.block[d.blockbitmap] ∈ Bitmap
  ∀ j ∈ [0..numblocks-1], ∀ b ∈ Bitmap, true => <d.block[j], b.bitmap[j]> ∈ BlockStatus
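To make the rule semantics concrete, here is a minimal C sketch that evaluates the first five rules over a toy disk. It iterates the two Next/Used rules to a fixed point and applies the Free rule last, because that rule tests non-membership in Used and so must see the finished Used set. The fixed sizes, the boolean-array encoding of sets, the names `struct model` and `build_model`, and the choice to ignore out-of-range indices (including an upper-bound check the rules above do not state) are assumptions for illustration, not the system's generated code; the BlockStatus rule is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative sizes; the specification leaves them symbolic. */
#define NUMBLOCKS 4
#define NUMENTRIES 1
#define BLOCKSIZE 8

struct entry { char name[8]; int firstblock; };
struct block { int nextblock; unsigned char data[BLOCKSIZE]; };
struct disk {
    int blockbitmap;
    struct entry dir[NUMENTRIES];
    struct block block[NUMBLOCKS];
};

/* Hypothetical abstract model: each set is a membership array indexed by block. */
struct model {
    bool in_used[NUMBLOCKS], in_free[NUMBLOCKS], in_bitmap[NUMBLOCKS];
    bool next[NUMBLOCKS][NUMBLOCKS];   /* next[a][b]: <block a, block b> in Next */
};

static bool valid_index(int i) { return i >= 0 && i < NUMBLOCKS; }

void build_model(const struct disk *d, struct model *m) {
    assert(d != NULL && m != NULL);
    memset(m, 0, sizeof *m);
    /* Directory rule: each entry's first block is Used. */
    for (int i = 0; i < NUMENTRIES; i++)
        if (valid_index(d->dir[i].firstblock))
            m->in_used[d->dir[i].firstblock] = true;
    /* Bitmap rule: the block named by blockbitmap is in Bitmap, and therefore
       in Used, because Bitmap is declared as a subset of Used. */
    if (valid_index(d->blockbitmap)) {
        m->in_bitmap[d->blockbitmap] = true;
        m->in_used[d->blockbitmap] = true;
    }
    /* Next rules: follow nextblock links out of Used blocks to a fixed point. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 0; b < NUMBLOCKS; b++) {
            int n = d->block[b].nextblock;
            if (!m->in_used[b] || !valid_index(n)) continue;
            m->next[b][n] = true;
            if (!m->in_used[n]) { m->in_used[n] = true; changed = true; }
        }
    }
    /* Free rule: runs last because it tests non-membership in Used. */
    for (int b = 0; b < NUMBLOCKS; b++)
        m->in_free[b] = !m->in_used[b];
}
```

With `blockbitmap = -5`, `dir[0].firstblock = 2`, and block 2 linking to block 3, this sketch yields Used = {2, 3}, Free = {0, 1}, and an empty Bitmap.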
Model for File System Example
[Figure: the model built from the example disk: blocks 2 and 3 are in Used, with <block 2, block 3> in Next; blocks 0 and 1 are in Free; the Bitmap set is empty]
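Consistency Constraints in Example
|Bitmap| = 1
∀ u ∈ Used, u.BlockStatus = true
∀ f ∈ Free, f.BlockStatus = false
∀ b ∈ Used, |Next.b| ≤ 1
(There is exactly one bitmap block, the bitmap marks used blocks true and free blocks false, and no block is the successor of more than one block.)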
Detecting Inconsistencies
Evaluate the consistency properties and find violations: |Bitmap| = 1 is violated because the Bitmap set is empty.
Repairing Violations of Model Consistency Properties
• The violation provides a binding for the quantified variables
• Convert the body to disjunctive normal form:
    (p1 ∧ … ∧ pn) ∨ … ∨ (q1 ∧ … ∧ qm)
  where p1 … pn, q1 … qm are basic propositions
• Choose a conjunction to satisfy
• Repair the violated basic propositions in that conjunction
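One simple way to "choose a conjunction" is to pick the disjunct with the fewest currently violated basic propositions, since it needs the fewest repairs. The C sketch below assumes each basic proposition has already been evaluated under the violation's variable binding; the cost model (one unit per violated proposition) and the names `struct conjunction` and `choose_conjunction` are assumptions, and the actual system may weigh repairs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One disjunct (p1 && ... && pn) of the DNF body; holds[k] is the truth value
   of basic proposition k under the binding supplied by the violation. */
struct conjunction {
    const bool *holds;
    size_t nprops;
};

/* Returns the index of the conjunction with the fewest violated propositions
   (ties go to the earliest); *cost receives that number. */
size_t choose_conjunction(const struct conjunction *dnf, size_t n, size_t *cost) {
    assert(dnf != NULL && n > 0 && cost != NULL);
    size_t best = 0, best_cost = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        size_t c = 0;
        for (size_t k = 0; k < dnf[i].nprops; k++)
            if (!dnf[i].holds[k]) c++;     /* proposition k needs a repair */
        if (c < best_cost) { best = i; best_cost = c; }
    }
    *cost = best_cost;
    return best;
}
```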
Repairing Violations of Basic Propositions
• Inequality constraints on values of numeric fields
  • V.R = E, V.R < E, V.R ≤ E, V.R ≥ E, V.R > E
  • Compute the value of the expression, assign the relation
• Presence of the required number of objects
  • |S| = C, |S| ≥ C, |S| ≤ C
  • Remove or insert objects from/to the set
• Topology of the region surrounding each object
  • |V.R| = C, |V.R| ≥ C, |V.R| ≤ C
  • |R.V| = C, |R.V| ≥ C, |R.V| ≤ C
  • Remove or insert tuples from/to the relation
• Inclusion constraints: V ∈ S, V1 ∈ V2.R, <V1, V2> ∈ R
  • Remove or add the object or tuple from/to the set or relation
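For the set-size case, the abstract repair simply inserts or removes objects. A minimal C sketch, assuming sets are boolean membership arrays and that the caller supplies the objects eligible for insertion (for |Bitmap| = 1 these would be the Free blocks); the function name and the index-order choice of which objects to add or drop are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Repairs |S| = target for a set stored as a membership array over n objects.
   Members are removed, or objects allowed by `candidate` are inserted, in
   index order until the size matches. Returns false if there are too few
   candidates to reach the target. */
bool repair_set_size(bool *in_set, const bool *candidate, size_t n, size_t target) {
    assert(in_set != NULL && candidate != NULL);
    size_t size = 0;
    for (size_t i = 0; i < n; i++) size += in_set[i];
    for (size_t i = 0; i < n && size > target; i++)      /* too many: remove */
        if (in_set[i]) { in_set[i] = false; size--; }
    for (size_t i = 0; i < n && size < target; i++)      /* too few: insert */
        if (!in_set[i] && candidate[i]) { in_set[i] = true; size++; }
    return size == target;
}
```

The |S| ≥ C and |S| ≤ C forms only need the insertion or the removal loop, respectively.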
Repairing Inconsistencies
Repair the violation of |Bitmap| = 1 by adding a block to the Bitmap set.
Goal-Directed Reasoning Translates Abstract Repairs Into Concrete Repairs
• Abstract repairs add objects (or tuples) to sets (or relations), or remove them
• Goal: find concrete data structure updates with the same effect
  1) Find the model definition rules that construct the relevant set or relation
  2) Basic strategy:
     • For removals, appropriately falsify the guards of all of these model definition rules
     • For additions, appropriately satisfy the guard of one of these model definition rules
Goal-Directed Reasoning in Example
• Abstract Repair: add block 0 to the Bitmap set
• Model Definition Rules:
    ∀ i ∈ [0..numentries-1], 0 ≤ d.dir[i].firstblock => d.block[d.dir[i].firstblock] ∈ Used
    ∀ b ∈ Used, 0 ≤ b.nextblock => <b, d.block[b.nextblock]> ∈ Next
    ∀ b ∈ Used, 0 ≤ b.nextblock => d.block[b.nextblock] ∈ Used
    ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free
    true => d.block[d.blockbitmap] ∈ Bitmap
    ∀ j ∈ [0..numblocks-1], ∀ b ∈ Bitmap, true => <d.block[j], b.bitmap[j]> ∈ BlockStatus
• Relevant model definition rule: true => d.block[d.blockbitmap] ∈ Bitmap
• Required condition: d.block[d.blockbitmap] = block 0
• Data structure update: d.blockbitmap = index of block 0 in the d.block array
Repair in Example
[Figure: original and updated file system; the update changes d.blockbitmap from -5 to 0, so block 0 becomes the block bitmap, while the directory entry "intro" and the block chain are unchanged]
Reasoning at Compile Time
• Compile specifications into repair algorithms
• Goal-directed reasoning takes place at compile time
• Consider the possibility that |Bitmap| = 0
  • Abstract repair
    • Choose a block in the Free set
    • Add the block to the Bitmap set
  • Concrete repair
    • Find the relevant model definition rule: true => d.block[d.blockbitmap] ∈ Bitmap
    • Goal-directed reasoning finds the following update: d.blockbitmap = index of the block in the d.block array
    • Check that the block is an element of the d.block array, which follows from the rule that defines Free:
      ∀ b ∈ [0..numblocks-1], d.block[b] ∉ Used => d.block[b] ∈ Free
Multiple Repairs
• Some broken data structures may require multiple repairs
  • Reconstruct the model
  • Re-evaluate the consistency constraints
  • Perform any required additional repairs
Architecture
[Figure: the pipeline applied repeatedly: Broken Bits → Model Translation → Broken Abstract Model → Abstract Repair → Repaired Abstract Model → Automatically Generated Concrete Repair → partially repaired bits → … → Repaired Bits]
Model Recomputation
[Figure: recomputed model of the example: block 0 is now in Bitmap, and BlockStatus relates each block to true or false]
• Re-evaluate the constraints and find violations of ∀ u ∈ Used, u.BlockStatus = true and ∀ f ∈ Free, f.BlockStatus = false
• Repair these violations by modifying the BlockStatus relation
Repaired File System
[Figure: repaired file system: d.blockbitmap = 0, and block 0 holds the bitmap 1011, marking blocks 0, 2, and 3 as used and block 1 as free]
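The BlockStatus repair works through the bitmap bits that the BlockStatus rule reads. A minimal C sketch, assuming every block is either Used or Free (as the declaration `set Block of block : Used | Free` requires) and that the bitmap is stored as a bool array; the function name is hypothetical. With blocks 0, 2, and 3 in Used and block 1 in Free, it produces the bit pattern 1011.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Concrete repair for BlockStatus violations: the BlockStatus rule relates
   d.block[j] to b.bitmap[j], so setting bitmap[j] to true for Used blocks and
   false for Free blocks satisfies both "u.BlockStatus = true" and
   "f.BlockStatus = false". Returns the number of bits changed. */
size_t repair_block_status(bool *bitmap, const bool *used, size_t numblocks) {
    assert(bitmap != NULL && used != NULL);
    size_t changed = 0;
    for (size_t j = 0; j < numblocks; j++) {
        if (bitmap[j] != used[j]) {
            bitmap[j] = used[j];   /* replace <block j, wrong> with <block j, right> */
            changed++;
        }
    }
    return changed;
}
```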
Acyclic Repair Dependences
• Questions
  • Isn't it possible for the repair of one constraint to invalidate another constraint?
  • What about infinite repair loops?
  • What about unsatisfiable specifications?
• Answer
  • We require specifications to have no cyclic repair dependences between constraints
  • So all repair sequences terminate
  • Repair can fail only because of resource limitations
Repair Dependence Graph
[Figure: repair dependence graph with these nodes:]
1. |Bitmap| = 1
2. Add block to Bitmap
3. d.blockbitmap = indexof(bfree)
4. Satisfy Rule 6 (BlockStatus)
5. f.BlockStatus = false
6. Replace <f,true> with <f,false> in BlockStatus
7. b.bitmap[j] = false for j = indexof(f)
8. Remove <f,true> from BlockStatus by removing Bitmap
Repair Dependence Graph
[Figure: the repair dependence graph without node 8:]
1. |Bitmap| = 1
2. Add block to Bitmap
3. d.blockbitmap = indexof(bfree)
4. Satisfy Rule 6 (BlockStatus)
5. f.BlockStatus = false
6. Replace <f,true> with <f,false> in BlockStatus
7. b.bitmap[j] = false for j = indexof(f)
When to Test for Consistency and Repair
• Persistent data structures
  • Repair can be an independent activity, or
  • Repair when data is written out or read in
• Volatile data structures in a running program
  • Under programmer control
  • Transaction-based approach
    • Identify transaction start and end
    • Repair at start, end, or both
  • Failure-based approach
    • Wait until the program fails
    • Repair and restart from the latest safe point
Experience
• We acquired five benchmarks (written in C/C++)
  • AbiWord
  • x86 emulator
  • CTAS (air-traffic control tool)
  • Simplified Linux file system
  • Freeciv interactive game
• We developed specifications for all five
  • Little development time (days, not weeks)
  • Most of the time was spent figuring out Freeciv and CTAS
• Each benchmark has
  • A workload
  • A bug or fault insertion methodology
• Ran the benchmarks with and without repair
AbiWord
• Open-source word processing program
• Approximately 360,000 lines of C++ code
• AbiWord represents documents using a piece table
• Consistency properties:
  • Piece table has a section fragment
  • Piece table has a paragraph fragment
  • Doubly-linked list of fragments is well formed
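The third property, a well-formed doubly-linked list, can be checked and repaired in one forward pass. This C sketch uses a hypothetical `struct frag` rather than AbiWord's actual piece-table classes. It assumes the forward `next` chain is trustworthy and uses it to rebuild broken `prev` pointers; a real repair would also need to handle a damaged forward chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; AbiWord's real fragment classes are C++ and differ. */
struct frag {
    struct frag *prev, *next;
};

/* One forward pass enforcing head->prev == NULL and n->next->prev == n.
   The forward chain is treated as authoritative (assumed acyclic and
   NULL-terminated) and each broken back pointer is rebuilt from it.
   Returns the number of pointers repaired. */
size_t repair_prev_links(struct frag *head) {
    size_t fixed = 0;
    if (head == NULL) return 0;
    if (head->prev != NULL) { head->prev = NULL; fixed++; }
    for (struct frag *n = head; n->next != NULL; n = n->next) {
        if (n->next->prev != n) {
            n->next->prev = n;
            fixed++;
        }
    }
    return fixed;
}
```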
[Figure: AbiWord screen shot]
Results
• Workload: import a (valid) Microsoft Word document that crashes AbiWord
• Bug that creates inconsistent documents with text before the section fragment
• Without repair
  • AbiWord crashes when loading the document
• With repair
  • AbiWord is able to open and successfully process the document
Parallel x86 emulator
• Parallel x86 emulator for the RAW machine
  • Multi-tile architecture
  • Emulator runs x86 binaries on RAW
• Contains an L2 cache of translated x86 assembly instructions
• Maintains a constant L2 cache size
• Consistency property:
  • Computed size of the L2 cache is consistent with its actual size
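The emulator's property compares a recorded size against the sizes of the items actually in the cache. A minimal C sketch with a hypothetical list-based cache; treating the item list as authoritative and overwriting the recorded total is an assumption about how the repair resolves the mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache layout: a linked list of translated items plus a
   running total that the emulator updates on insertion and eviction. */
struct item { size_t size; struct item *next; };
struct l2cache { struct item *items; size_t total_size; };

/* Consistency property: total_size equals the sum of the item sizes.
   Returns 1 if the recorded total had to be repaired, 0 otherwise. */
int check_and_repair_size(struct l2cache *c) {
    assert(c != NULL);
    size_t actual = 0;
    for (const struct item *it = c->items; it != NULL; it = it->next)
        actual += it->size;
    if (c->total_size == actual) return 0;
    c->total_size = actual;   /* repair: make the recorded size match */
    return 1;
}
```

For instance, if an insertion adds an item's size twice, the next check restores the recorded total to the true sum.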
Results
• Workload: gzip benchmark on the x86 emulator
• Bug that (sometimes) adds the size of a cache item twice when it is inserted
• Without repair
  • Actual cache size goes to zero
  • x86 emulator crashes
• With repair
  • Actual cache size is the same as computed size
  • Program runs correctly
CTAS
• Set of air-traffic control tools
  • Traffic management
  • Arrival planning
  • Flow visualization
  • Shortcut planning
• Deployed in centers around the country (Dallas/Ft. Worth, Los Angeles, Denver, Miami, Minneapolis/St. Paul, Atlanta, Oakland)
• Approximately 1 million lines of C/C++ code
[Figure: CTAS screen shot]
Results
• Workload: recorded radar feed from DFW
• Fault insertion
  • Simulate an error in flight plan processing
  • Bad airport index in the flight plan data structure
• Without repair
  • System crashes (segmentation fault)
• With repair
  • Aircraft has a different origin or destination
  • System continues to execute
  • Anomaly eventually flushed from the system
Aspects of CTAS
• Lots of independent subcomputations
  • System processes hundreds of aircraft; a problem with one should not affect the others
• Multipurpose system (visualization, arrival planning, shortcuts, …)
  • A problem in one purpose should not affect the others
• Sliding time window: anomalies are eventually flushed
• Rebooting is ineffective: the system will crash again as soon as it sees the problematic flight plan
Simplified Linux File System
[Figure: disk layout with superblock, group block, inode bitmap block, block bitmap block, inode blocks, directory block, and data blocks]
Some Consistency Properties
• Inode bitmap consistent with inode usage
• Block bitmap consistent with block usage
• Directory entries refer to valid inodes
• Files contain valid blocks only
• Files do not share blocks
Results
• Workload: write and verify several files
• Simulated power failure
  • Inode and block bitmap errors
  • Partially initialized directory and inode entries
• Without repair
  • Incorrect file contents because of inode and disk block sharing
• With repair
  • Bitmaps repaired, preventing illegal sharing; correct file contents
Freeciv
[Figure: terrain grid of tiles (O = Ocean, P = Plain, M = Mountain) and the city structures referenced from grid tiles]
Consistency Properties
• Tiles have valid terrain values
• Cities are not in the ocean
• Each city has exactly one reference from the grid
[Figure: Freeciv screen shot]
Results
• Workload: Freeciv software plays against itself
• Fault insertion: randomly corrupt terrain values
• Without repair: program crashes (segmentation fault)
• With repair
  • Game runs just fine
  • But the game plays out differently because of the different terrain values
Experience Developing Specifications
• Specifications are small compared to system size
• Specifications are straightforward to develop once you understand the consistency properties
• Potential to omit properties
• Overhead of understanding the data structures
Specification Inference
• Automatically infer specifications using the dynamic invariant detection tool Daikon
• Developer simply reviews the generated specification
• Successfully inferred specifications for two of our benchmarks
  • CTAS
  • Freeciv
CTAS Specification
• Inferred specification contained
  • All constraints in the hand-coded specification
  • Additional constraints on the arrival and departure runways
• Different abstractions in the manually developed and inferred specifications
Freeciv Specification
• Inferred specification is missing properties about city placement (Daikon limitation)
• Inferred specification contains previously overlooked properties about
  • The continents field of a tile
  • The initial position of the players
• Similar abstractions in manually developed and inferred specifications
Related Work
• Hand-coded repair
  • Lucent 5ESS switch
  • IBM MVS operating system
• Integrity maintenance in databases
  • Deriving Production Rules for Constraint Maintenance (Ceri, Widom)
  • Automatic Generation of Production Rules for Integrity Maintenance (Ceri et al.)
  • Constraint analysis: A design process for specifying operations on objects (Urban et al.)
• Consistency management with repair actions (Nentwich et al.)
Related Work
• Constraint mechanisms in programming languages
  • Kaleidoscope (Lopez)
  • Alphonse (Hoover)
• Self-stabilizing algorithms (Dijkstra)
• Log-based recovery for database systems
• Recovery-oriented computing
  • Microrecovery & Microreboot (Candea, Fox)
  • Undo framework (Brown, Patterson)
• Specification languages
  • Alloy (Jackson)
  • UML
Future Directions
• Explore other mechanisms to decouple software systems
  • Data dependences
  • Control dependences
• More frequent consistency checking
  • Use page protection mechanisms in hardware to incrementally check specifications
  • Static analysis to eliminate unnecessary checks
Conclusion
• Data structure repair is an exciting way to (potentially) improve reliability
• The specification-based approach promises to make the technique more widely applicable
• Automatic inference of specifications promises to make developing data structure consistency specifications even easier
• Moving towards a more robust, probabilistic, continuous concept of system behavior
Implementation
• Size of system: 26,200 lines
• Compiler
  • 20,400 lines of Java code
  • 2,500 lines of parser definitions
• Runtime: 3,200 lines of C code
Time to Check Consistency & Perform Repairs
Application    Time to Check Consistency (ms)    Time to Check and Repair (ms)
AbiWord        0.06                              0.55
CTAS           0.07                              0.15
Freeciv        3.62                              15.66
File system    4.22                              263.14
Lines of Code
Application     Lines of Code
AbiWord         360,000
x86 emulator    65,000
CTAS            >1 million
Freeciv         73,000
File system     700