Using Declarative Invariants for Protecting File-System Integrity
By Kuei (Jack) Sun, Daniel Fryer, Ashvin Goel and Angela Demke Brown, University of Toronto.
1
Using Declarative Invariants for Protecting File-System Integrity
By Kuei (Jack) Sun, Daniel Fryer, Ashvin Goel and Angela Demke Brown
University of Toronto
2
File systems have bugs
• Cause corruption and/or data loss
• Existing reliability techniques, e.g., journaling, RAID, don’t help
Existing recovery solutions
• Restore from backup, but slow, risks loss of data
• Offline checker (i.e. fsck), but too slow
Can we do better?
Motivation
3
Eliminate Bugs
• Static Analysis
  Does not scale
  Bugs may be input dependent
Tolerate Bugs
• N-version programming (e.g., EnvyFS)
  High overheads (performance and storage)
  Can only check features common to all versions
• Micro reboot of file system (e.g. Membrane)
  Requires detectable failures
  Many corruption bugs are fail-silent
Possible Alternatives?
4
Verify correctness at runtime
• Benefit: Make silent failures detectable
What are we checking?
• Same thing fsck checks
When are we checking for consistency?
• When file system claims to be consistent
  Leverage transactions provided for crash recovery
  I.e. check at commit time
How are we doing the checks?
• Convert global fsck checks into local checks on transactions
Our Approach
5
File systems have consistency properties
• E.g. all in-use data blocks are marked in the block allocation bitmap
• This is a global property that fsck checks
For each property, we derive an invariant
• Invariant must hold in any transaction to preserve the corresponding property
• E.g. when Block N is allocated, bit N must be set in the allocation bitmap, in the same transaction
• Invariants operate on changes in transactions, requiring local checks
No Really… How?
6
The focus of work…
Change records encode the updates in a transaction
Recon Data Flow
[Diagram components: Write Cache, Read Cache, Modified Block, Original Block, Logical Difference Engine, Change Record, Invariant Checker, Invariants, Violation?]
7
Example: write 8 bytes of data into an empty file with inode #1
Change Record
Change Record [type, id, field, old, new] | Description
[inode, 1, 3, 0, 7]     | For inode #1, set direct block pointer (field 3 in inode) from 0 to 7, i.e., allocate block 7 to inode #1
[b_freemap, 7, _, 0, 1] | Set bit 7 on (from 0 to 1)
[inode, 1, 2, 0, 8]     | For inode #1, set size (field 2 of inode) from 0 to 8
[Callouts: Block 7 allocated; Direct block; Bit set]
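As a rough illustration of how a logical difference could yield the three records above, the following Python sketch compares an original and a modified copy of each structure field by field. The list-based inode layout (size at index 2, direct block pointer at index 3), the 16-bit bitmap, and the helper names `diff_inode` and `diff_bitmap` are assumptions made for this example, not Recon's actual data structures.

```python
def diff_inode(inode_id, old_fields, new_fields):
    """Emit one (type, id, field, old, new) record per changed inode field."""
    return [
        ("inode", inode_id, field, old, new)
        for field, (old, new) in enumerate(zip(old_fields, new_fields))
        if old != new
    ]


def diff_bitmap(old_bits, new_bits):
    """Emit one record per flipped bit; bitmap records carry no field (None)."""
    return [
        ("b_freemap", bit, None, old, new)
        for bit, (old, new) in enumerate(zip(old_bits, new_bits))
        if old != new
    ]


# Hypothetical inode layout: index 2 is the size, index 3 the direct block pointer.
inode_before = [0, 0, 0, 0]
inode_after = [0, 0, 8, 7]

# Hypothetical 16-block allocation bitmap; writing the file sets bit 7.
bitmap_before = [0] * 16
bitmap_after = [1 if bit == 7 else 0 for bit in range(16)]

records = diff_inode(1, inode_before, inode_after) + diff_bitmap(bitmap_before, bitmap_after)
for record in records:
    print(record)
```

Unchanged fields produce no record, so the size of the output tracks the size of the update rather than the size of the block.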
8
Clean and concise
Works well with recursive structures
Easy to reason about
Recon: C (Imperative)
SQCK: SQL (Query) ?
Recon: Datalog (Logic)
How to Express Invariants?
9
R1_violation(IN, BN) :- block_allocated(IN, BN),
    not(change(b_freemap, BN, _, _, 1)).
R2_violation(BN) :- change(b_freemap, BN, _, _, 1),
    not(block_allocated(_, BN)).
Datalog Invariant Checking
Datalog facts: change(type, id, field, old, new)
change(inode, 1, 3, 0, 7).
change(b_freemap, 7, _, 0, 1).
change(inode, 1, 2, 0, 8).
Change records are trivially converted to Datalog facts
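The two violation rules can be read as set differences over the facts of a single transaction. The Python sketch below mimics that reading. The slide does not define `block_allocated`, so the sketch assumes it holds whenever an inode change sets a direct block pointer (field 3 here) to a non-zero block number; that definition and the `POINTER_FIELDS` constant are illustrative assumptions.

```python
# Assumption: field 3 is the only direct block pointer in this toy inode.
POINTER_FIELDS = {3}


def block_allocated(facts):
    """(inode, block) pairs for pointer fields set to a non-zero block."""
    return {
        (ident, new)
        for (kind, ident, field, old, new) in facts
        if kind == "inode" and field in POINTER_FIELDS and new != 0
    }


def bits_set(facts):
    """Block numbers whose allocation bit was set to 1 in this transaction."""
    return {
        ident
        for (kind, ident, field, old, new) in facts
        if kind == "b_freemap" and new == 1
    }


def r1_violations(facts):
    """Blocks given to an inode without the matching bitmap bit being set."""
    marked = bits_set(facts)
    return sorted(pair for pair in block_allocated(facts) if pair[1] not in marked)


def r2_violations(facts):
    """Bitmap bits set without any inode pointing at the block."""
    owned = {block for (_, block) in block_allocated(facts)}
    return sorted(block for block in bits_set(facts) if block not in owned)


facts = [
    ("inode", 1, 3, 0, 7),
    ("b_freemap", 7, None, 0, 1),
    ("inode", 1, 2, 0, 8),
]
print(r1_violations(facts), r2_violations(facts))

# Dropping the bitmap update turns the same transaction into an R1 violation.
buggy = [fact for fact in facts if fact[0] != "b_freemap"]
print(r1_violations(buggy))
```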
10
On each transaction
• Add facts from change records into Datalog knowledge base
• Check all invariants on Datalog facts
Problem
• Set of facts grows over time
  Slows invariant checking
• Facts need to persist across reboots
  Introduces more consistency problems
Insight
• After commit, all facts in the transaction are incorporated in file system
Invariant Checking
11
FS state is available in Recon caches
We provide Datalog primitives to access caches
Can discard all facts after transaction commit
Querying File System State
[Diagram components: Disk, Read Cache, Write Cache, Primitives, Datalog Interpreter, Change Record, Fact, Invariants, Violation?]
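The point that facts can be dropped once a transaction commits can be sketched as a small per-transaction checker. This is a minimal Python illustration under stated assumptions: the class name `CommitTimeChecker`, the dictionary of invariant functions, and the sample invariant `bit_without_owner` are all hypothetical and do not describe Recon's interfaces.

```python
class CommitTimeChecker:
    """Collects the facts of one transaction and checks them at commit."""

    def __init__(self, invariants):
        # invariants: name -> function(facts) returning a list of violations
        self.invariants = invariants
        self.facts = []

    def add_change(self, record):
        self.facts.append(record)

    def commit(self):
        report = {}
        for name, check in self.invariants.items():
            found = check(self.facts)
            if found:
                report[name] = found
        # After commit the changes are part of the file-system state,
        # so the facts are discarded instead of accumulating.
        self.facts.clear()
        return report


def bit_without_owner(facts):
    """Illustrative invariant: every bit set needs a pointer to that block."""
    owned = {new for (kind, _, field, _, new) in facts if kind == "inode" and field == 3}
    return sorted(
        ident for (kind, ident, _, _, new) in facts
        if kind == "b_freemap" and new == 1 and ident not in owned
    )


checker = CommitTimeChecker({"bit_without_owner": bit_without_owner})

checker.add_change(("inode", 1, 3, 0, 7))
checker.add_change(("b_freemap", 7, None, 0, 1))
first = checker.commit()

checker.add_change(("b_freemap", 9, None, 0, 1))
second = checker.commit()
print(first, second, len(checker.facts))
```

Because the fact list is emptied on every commit, checking cost depends on the size of the current transaction only, and nothing has to survive a reboot.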
12
Example: Directory Cycle Detection
path(X, P) :- dir_get_parent(X, P).
path(X, A) :- dir_get_parent(X, P), path(P, A).
cycle(X) :- path(X, X).
No cycle for this tree!
Using Primitives
[Diagram: directory tree / → a → b → c, with a "Primitive" callout]
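The three rules amount to following parent links from a directory and asking whether the walk ever returns to its starting point. The Python sketch below spells that out. Here `get_parent` is a stand-in for the `dir_get_parent` primitive, and the convention that it returns `None` for the root directory is an assumption of this example; the inode numbers follow the later slides (root is 0, a is 1, b is 2, c is 3).

```python
def ancestors(inode, get_parent):
    """All directories reachable from `inode` by following parent links."""
    seen = []
    current = get_parent(inode)
    while current is not None and current not in seen:
        seen.append(current)
        current = get_parent(current)
    return seen


def has_cycle(inode, get_parent):
    """Mirrors cycle(X) :- path(X, X): X is one of its own ancestors."""
    return inode in ancestors(inode, get_parent)


# Tree / -> a -> b -> c with inodes 0, 1, 2, 3; the root has no parent here.
tree = {1: 0, 2: 1, 3: 2}
print([has_cycle(inode, tree.get) for inode in (1, 2, 3)])

# After moving /a into /a/b/c, the parent of a (1) becomes c (3).
moved = {1: 3, 2: 1, 3: 2}
print([has_cycle(inode, moved.get) for inode in (1, 2, 3)])
```

The `seen` list doubles as a guard against looping forever, which a Datalog engine gets for free from fixpoint evaluation.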
13
Implemented for a simple test file system
• TestFS implemented at user level
• Designed to be a simplified version of Ext3
  All TestFS invariants are applicable to Ext3
TestFS has 12 Datalog invariants
  Ext3 has 33 invariants in C
  Invariants are independent
  Total number of lines of invariant code is 38
Current Status
14
Datalog invariants for ext3, btrfs file systems
• Currently, Ext3/Btrfs Recon is implemented in OS
• We plan to implement it in a hypervisor to provide strong fault model
• Don’t need to port Datalog to kernel!
Customize Datalog interpreter
• Optimize for file-system specific operations
Future Work
15
The Recon framework allows detecting arbitrary metadata corruption through runtime consistency checking
When a transaction commits, Recon checks invariants to ensure file system consistency
Invariants can be expressed in Datalog clearly and concisely
Conclusion
16
Using Declarative Invariants for Protecting File-System Integrity
By Kuei (Jack) Sun, Daniel Fryer, Ashvin Goel and Angela Demke Brown
Questions?
17
Workload
• ~203K commands
  e.g. mkdir, rmdir, rm, touch, cd, write to file
Evaluation
       | Original   | With Checking | Overhead
User   | 17.8±0.2s  | 36.4±0.1s     | 2.04x
System | 22.5±0.1s  | 23.1±0.1s     | 1.03x
Sleep  | 545.4±9.1s | 604.9±9.0s    | 1.11x
Total  | 585.8±9.2s | 664.4±9.1s    | 1.13x
18
Example: move /a into /a/b/c
Directory Cycle Detection
[Diagram: the directory tree /, a, b, c before and after the move, with a legend distinguishing child entries from parent entries]
19
Change records for move /a into /a/b/c
Directory Cycle Detection
Datalog Fact [type, id, field, old, new] | Description
change(dir_block, 0, 1, ‘a’, φ).  | Remove entry ‘a’ from root node (inode #0)
change(dir_block, 1, 0, ‘..’, φ). | Remove entry ‘..’ from inode #1
change(dir_block, 3, 1, φ, ‘a’).  | Add entry ‘a’ to inode #3 (inode for directory ‘c’)
change(dir_block, 1, 3, φ, ‘..’). | Add entry ‘..’ to inode #1 and set it to inode #3 (directory ‘c’ is its parent)
20
cycle(3).
• path(3, 3).
  parent(3, 3).
  parent(3, ?), path(?, 3).
  parent(3, 2), path(2, 3).
  parent(3, 2), parent(2, 3).
  parent(3, 2), parent(2, ?), path(?, 3).
  parent(3, 2), parent(2, 1), path(1, 3).
  parent(3, 2), parent(2, 1), parent(1, 3).
We have a match, a.k.a: violation!
Invariant Checking
path(IN, PIN) :- dir_get_parent(IN, PIN).
path(IN, AIN) :- dir_get_parent(IN, PIN), path(PIN, AIN).
cycle(IN) :- path(IN, IN).
[Diagram: directories a, b and c labelled with inode numbers 1, 2 and 3]
21
Problem:
• The set of change records that we have is insufficient.
• From the transaction alone, we cannot deduce the parents of ‘c’ and ‘b’. We only know that the parent of ‘a’ is ‘c’.
Solution:
• Primitives are predicates written in C that can query the read and write caches in Recon
Primitives
change(dir_block, 3, 1, φ, ‘a’).
change(dir_block, 1, 3, φ, ‘..’).
[Diagram: directories a, b and c labelled with inode numbers 1, 2 and 3]
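One way to picture such a primitive is as a lookup that prefers the uncommitted state in the write cache and falls back to the on-disk state in the read cache. The deck describes the real primitives as C predicates; the Python sketch below only models the idea, and it assumes both caches can be viewed as maps from a directory's inode number to its parent's inode number. The function name `make_dir_get_parent` is hypothetical.

```python
def make_dir_get_parent(read_cache, write_cache):
    """Build a parent lookup that sees pending updates before on-disk state."""

    def dir_get_parent(inode):
        if inode in write_cache:
            return write_cache[inode]  # updated in the current transaction
        return read_cache.get(inode)   # unchanged: fall back to cached disk state

    return dir_get_parent


# On-disk tree: / (0) -> a (1) -> b (2) -> c (3).
read_cache = {1: 0, 2: 1, 3: 2}
# The move changes only the '..' entry of a (inode 1) to point at c (inode 3).
write_cache = {1: 3}

dir_get_parent = make_dir_get_parent(read_cache, write_cache)

# Walk upward from c; the parents of c and b come from the read cache,
# while the parent of a comes from the write cache.
chain = []
current = 3
while current is not None and current not in chain:
    chain.append(current)
    current = dir_get_parent(current)

cycle_found = current is not None
print(chain, cycle_found)
```

With the change records alone, only the last hop (a to c) is known; the cached state supplies the two hops that close the loop.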