Building Undo for Operators:An Undoable E-mail Store
Aaron Brown
ROC Research GroupUniversity of California, Berkeley
Winter 2003 ROC Research Retreat14 January 2003
Slide 2
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 3
Recap: Undo for Operators
• Dependability is achieved at the interface of the system and its operator
• Undo goal: support the operator– create a forgiving operations environment
» tolerate human error, allow trial-and-error experiments
– provide last-resort recovery from unknown damage» retroactive repair of software bugs, broken upgrades,
unknown operator changes, external attacks
– be easily comprehensible by operator and developer– preserve user data while refreshing system state
Slide 4
Recap: The Three-R’s Undo Model• Provide Undo as virtual time travel
– Rewind: roll back all system state, physically– Repair: make changes to past timeline to avert
original disaster– Replay: roll state forward logically, merging
original timeline with effects of repairs
• Key properties of 3R’s Undo– recovery from problems at any system layer– recovery from unanticipated problems– no assumptions about correct application
behavior
Slide 5
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 6
Building an E-mail Undo Prototype• Target: an undoable e-mail store service
– a leaf node in the Internet e-mail network– accessed via IMAP and SMTP
• Built around existing e-mail store service– e.g., sendmail+imapd, iPlanet, Exchange
• Extensible to other apps– architected around a reusable undo core with e-
mail-specific extensions
• Written in Java
Slide 7
Undo System Architecture
E-mail Service
Time-travelStorage
UndoManager
Verbs
control
TimelineLog
ControlUIUser
E-mail Proxy
IMAP, SMTP
IMAP, SMTP
Includes: - user state - application - OS
Slide 8
Architecture Design Points
• Proxy-based design targeted at services
• Application-neutral core– undo manager, timeline log, time-travel storage, UI– contains all undo-cycle logic
• E-mail-specific semantics encapsulated– proxy: encodes e-mail interactions into verbs– verbs: define e-mail semantics
» model of preserved state» model of acceptable external (observed) consistency
Slide 9
Verbs
• Key construct in undo architecture– represent end-user state changes, state
exposure– building block of timeline and history– used to detect and manage inconsistencies
• Essential for reasoning about undo– verbs define exactly what state is preserved
during an undo cycle– provide framework for defining application’s
model of acceptable external inconsistency
Slide 10
Verbs and the Undo Cycle
• Verb flow
E-mail Service
Includes: - user state - application - OS
Time-travelStorage
UndoManager
Verbs
control Timeline
Log
ControlUIUser
E-mail Proxy
Replay
Normal operation
Slide 11
Comparison to Optimistic Replication• Insight: undo can be recast as
replication—not in space, but in time– two virtual replicas diverge at the Rewind point– verb history defines an update log to one replica– repairs define updates to the other replica– Replay involves reconciling the two update
streams, detecting and handling inconsistencies
• Verb-based design is heavily influenced by replica systems– verbs map to replica system “actions” or
“updates”
Slide 12
Comparison to Optimistic Replication (2)• But Undo has key differences:
– reconciliation between a logical log (verb history) and a physical state (repaired system)» so log-merging schemes (e.g., IceCube) don’t apply
– not all inconsistencies are bad» no need to check for inconsistency until externalization» no complex, error-prone guards on verb execution
– inconsistencies can be handled after-the-fact» post-conditions on externalized data are sufficient» fewer assumptions of application correctness» if necessary, can rewind undesired inconsistency-
producing verb
Slide 13
E-mail example: original timeline
Systemboundary
Systemstate
Verbs
Historylog
Time
Inbox
Folder1olleH
Copy
Copy
Externalizes:—
olleH
Fetch
Fet
ch
m
Externalizes:m
+ Signature(m)=“olleH”
Hello
olleH
Deliver
Deliver
m
Externalizes:—
+ input “Hello”
Slide 14
olleH
Copy
Copy
Externalizes:—
olleH
Fetch
Fet
ch
m
Externalizes:m
+ Signature(m)=“olleH”
Hello
olleH
Deliver
Deliver
m
Externalizes:—
+ input “Hello”
XHello
Deliver
Externalizes:—
+ input “Hello”
Hello
Deliver
m
E-mail example: replay timeline
Systemboundary
Systemstate
Verbs
Historylog
Time
Inbox
Folder1 Hello
Copy
Copy
Externalizes:—
Hello
Fetch
Fet
ch
m
Externalizes:m
+ Signature(m)=“olleH”
mismatch! => inconsistency
Slide 15
Verbs and External Inconsistency• Detecting inconsistency
– verbs that externalize state record a signature of externalized data and define a comparison function» comparison fn. defines the external consistency
model
• Handling inconsistency– verbs define compensation routines, invoked
when inconsistencies are found– later non-commuting verbs are squashed to
prevent data loss
Slide 16
Verbs for E-mail
• SMTP & IMAP protocols mapped into 13 verbs– SMTP: Deliver– IMAP: Append, Fetch, Store, Copy, List, Status,
Select, Expunge, Close, Create, Rename, Delete– set could easily be extended with user and
subscription management functionality
• Each verb is a Java class implementing a generic, app-neutral Verb interface
Slide 17
Example Verb: STORE (set msg flags)
• Tag– input: target folder, message IDs, flag value– externalized output: resultant flags of messages
• Sequencing tests– commutes with & independent of SMTP verbs, all
IMAP verbs except Fetch/Store/Close on same folder
• Consistency check– new flags must match original flags for all messages
in common; no original messages should be missing
• Inconsistency handler– leave user a message explaining and listing changes
Slide 18
An External Consistency Model for E-mail• Retrieval (IMAP)
– tracked state includes message bodies, key message headers (to/from/subject/cc), flags, folder lists, and execution status of state-altering commands
– inconsistency if objects are missing or altered on replay or commands fail; order & new objects ignored
– compensation via explanatory messages and creation of “lost-and-found” containers
• Delivery (SMTP)– only possible inconsistency is in execution status– one tricky case: originally-failed delivery succeeds
» delay bounce to provide window for undo-repair
Slide 19
Undo System Architecture
• Next up:– proxy– undo manager– time-travel
storage layer
E-mail Service
Time-travelStorage
UndoManager
Verbs
control
TimelineLog
ControlUIUser
E-mail Proxy
IMAP, SMTP
IMAP, SMTP
Slide 20
Proxying E-mail
• IMAP proxy loop is straightforward– accept client connection and open server connection– parse commands and instantiate corresponding verbs– invoke undo manager to schedule, execute, log verb
» passing a handle to the client and server connections
• SMTP is more complicated– failed SMTP deliveries must be logged for later retry– proxy poses as a server that always accepts deliveries
» deliver verb is created after client has been ACK’d» some finesse to avoid being an open-relay
– if verb fails, proxy sends a bounce after a delay
Slide 21
Undo Manager Implementation• Timeline log
– BerkeleyDB recno database storing serialized verbs– also stores control records to make undo manager
state persistent and recoverable
• Verb scheduling– all verb execution passes through undo manager– scoreboard-like structure sequences verbs according
to independence, commutativity, ordering properties
• External inconsistency management– undo manager coordinates verb APIs to ensure that
external inconsistencies are detected and handled
Slide 22
A Time-Travel Storage Layer
• Base: Network Appliance Filer with snapshots
• Java wrapper for Filer’s management CLI– provides direct control of snapshot create/restore– periodically takes snapshots, aging out old ones
» hierarchical scheduling allows the 31 snapshots to span one month, with density inversely proportional to age
• Rewind restores the closest prior snapshot, then replays up to the exact rewind point
• Challenge: making snapshot restore undoable– solution: implement rollback by copying old snapshot
forward to present (expensive, but necessary)
Slide 23
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 24
Code complexity
• Entire system is about 23K lines of Java– split evenly between app-neutral and
e-mail-specific code– bulk of e-mail code is in verbs
• Implementing and debugging the e-mail verbs and proxy took roughly two man-months
Slide 25
Measuring Overhead for Undo• Workload
– modified SPECmail2000 e-mail benchmark» simulates traffic seen by an ISP mail server» modified to use IMAP instead of POP, all mail local
– configured with 5000 users» about 56 SMTP cxns/minute, 149 IMAP cxns/minute
• Setup– mail server: Linux, sendmail + UW imapd, 5000
users– undo proxy: Win2k, Sun 1.4.1 JDK– workload generator: Win2k, Sun 1.4.1 JDK– storage: all mailboxes and logs on NetApp Filer
» mailboxes accessed via NFS, logs via CIFS
Slide 26
Session Length (seconds)0.01 0.1 1 10 100
Per
cen
tag
e o
f S
essi
on
s
0
20
40
60
80
100
IMAP, no undoIMAP, undoSMTP, no undoSMTP, undo
Overhead Results: Space and Time• Space overhead
– 5GB/day for timeline log= 1GB/day per 1000 users
– 325KB per 1000 mail folder name translations
– per 120GB disk:» 7 weeks of timeline for
1000 ISP users» or 350 million folder
name translations
• Time overhead– cumulative distribution
plot of IMAP and SMTP session lengths
Slide 27
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 28
Next Steps
• Perform more analysis of prototype– speed of rewind, replay, entire undo cycle
• Measure efficacy of undo with human expts.– compare recovery time from pre-defined
scenarios with and without undo» based on survey of e-mail administrators
• Examine other applications to understand generality of undo architecture– auctions, e-commerce, others?
Slide 29
Conclusions
• We can build a real Undo implementation– for a real application– with tolerable overhead– with a reusable, application-independent core– based on a verb model that enables reasoning
about state and undo behavior
• Look forward to final analysis at next retreat!
Building Undo for Operators:An Undoable E-mail Store
Aaron BrownROC Research Group
University of California, Berkeley
[email protected]://roc.cs.berkeley.edu/
Slide 31
Backup Slides
Slide 32
Summary: What’s in a Verb?
• Encapsulation of a user-service interaction– action specification– procedure to execute verb via proxy– tag: input parameters, signature of externalized data,
verb execution status
• Commutativity/independence test functions
• Consistency management functions– routine to check two tags for external inconsistency– inconsistency handler– squash routine
Slide 33
Verbs and the Undo Timeline
• Recorded history of verbs must be consistent with actual verb execution– but logging at proxy may not match execution order
• Solution: partial serialization of verbs– undo manager tracks arriving & executing verbs in
scoreboard– arriving verbs that commute are executed in parallel– non-commuting verbs stall until dependencies clear
• Guarantees consistent recorded timeline without requiring hooks into application
Slide 34
Verb Issue: Naming State
• Verbs need to specify names of target state– folders, messages, users
• Names must be time-invariant and unaffected by changes to execution timeline– IMAP protocol names can’t guarantee this
• Solution: allocate “UndoIDs” to objects– UndoID never changes and is restored on replay– undo system maintains translations to IMAP names
» messages: embed UndoID in header field» folders: track UndoIDs in parallel database