Iron Pyrite
• a.k.a. Fool’s Gold
• Looks pretty
• Initial Perceived Value
• Not much you can do with it in the real world
A Good User Experience:
• Delivers value to the user
• Reduces effort to get the job done
• Is invisible
• Just works
• But what if it doesn’t?
Mistaeks We’re Made
• Designing for the happy path
• Static Assets
• PowerPoint/Keynote
• Animations (if lucky)
• “Nobody thought about that”
• “That would never happen”
Kinds of Failures
• Net-Split
• Bandwidth
• Latency
• Site is “Unresponsive”
• Dependent Service Down?
• Others*
* http://rgoarchitects.com/Files/fallacies.pdf
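One way to make latency and an "unresponsive" dependency visible in code is to put a deadline on every remote call and report the outcome explicitly. The following is only a minimal sketch: `call_with_deadline` and `slow_service` are hypothetical names invented for illustration, and a real system would more likely use the timeout support of its own HTTP or RPC client.

```python
import concurrent.futures
import time
from typing import Any, Callable, Tuple


def call_with_deadline(fn: Callable[[], Any], seconds: float) -> Tuple[str, Any]:
    """Run fn in a worker thread and classify the outcome.

    Returns ("ok", value), ("timeout", None), or ("error", exception).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return ("ok", future.result(timeout=seconds))
    except concurrent.futures.TimeoutError:
        # Too slow: from the user's point of view this is the same as "down".
        return ("timeout", None)
    except Exception as exc:
        # The dependency answered, but with a failure.
        return ("error", exc)
    finally:
        # Do not block the caller while the slow call finishes in the background.
        pool.shutdown(wait=False)


def slow_service() -> str:
    """Hypothetical dependency that answers later than we are willing to wait."""
    time.sleep(0.3)
    return "late"


print(call_with_deadline(lambda: 42, seconds=1.0))
print(call_with_deadline(slow_service, seconds=0.05))
```

The caller always gets an answer within roughly the deadline, so it can decide what to show the user instead of hanging.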
Your User Doesn’t Care
• It is on you, not a service provider
• How can you still deliver value when things go wrong?
• This is where the user’s experience really begins
Provide Some Levity
• Be Helpful when the user messed up
• Github 404 pages
• erlang.org’s not found page
Provide Some Levity
• Acknowledge you messed up (and apologize)
• Twitter Fail Whale
• Reddit down time messages*
* https://github.com/reddit/error-pages/blob/876f3e689206551722fbe77374e7739f54b52847/504.reallydown.html#L152
Make It A Game???
• Track how many times they have failed
• Give them “rewards” for failures
• Real, or emotional
• DOOM faces?
• Give them something to be grateful about
• People can’t be grateful and upset at the same time
Can one failure take out your entire system?
• How can we isolate that part of the system?
• Can we safely restart it from scratch?
• Is it critical to the entire system?
• No, really, is it critical to the entire system?
• Can we work without it until it gets better?
• Do we have a different, unrelated way of getting that information?
• Can we provide “The Next Best Thing”?
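The questions above can be sketched as a fallback chain: try the preferred source, then an unrelated one, and finally settle for "The Next Best Thing". This is a rough illustration under assumptions, not a recommended implementation; `first_available`, `personalized_recommendations`, and `cached_top_rated` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_available(sources: Sequence[Callable[[], T]], default: T) -> T:
    """Try each source in order; return the first result, else the default."""
    for source in sources:
        try:
            return source()
        except Exception:
            # This source failed; keep the failure isolated and try the next one.
            continue
    return default


def personalized_recommendations() -> List[str]:
    """Hypothetical preferred source, simulated here as being down."""
    raise ConnectionError("recommendation service is down")


def cached_top_rated() -> List[str]:
    """Hypothetical unrelated source that does not depend on the first one."""
    return ["Top Rated A", "Top Rated B"]


result = first_available(
    [personalized_recommendations, cached_top_rated],
    default=[],
)
print(result)  # ['Top Rated A', 'Top Rated B']
```

The empty default is the last resort: the feature disappears, but the rest of the system keeps working.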
Partial / Reduced Functionality
• Netflix
• Streaming
• vs Personal Recommendations
• vs Top Rated
• vs Queue/Watchlist Management
Partial / Reduced Functionality
• Amazon
• Orders
• vs “Customers Also Bought”
• vs Inventory
• vs Ratings
• vs Reviews
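A page like the ones above could be assembled so that only the truly critical part is allowed to fail the request. The sketch below assumes a hypothetical `build_page` helper and made-up loaders (`load_orders`, `load_reviews`); it says nothing about how Netflix or Amazon actually do this.

```python
from typing import Callable, Dict, List, Optional

Loader = Callable[[], List[str]]


def build_page(
    critical: Dict[str, Loader],
    optional: Dict[str, Loader],
) -> Dict[str, Optional[List[str]]]:
    """Load every section; only a critical section may fail the whole page."""
    page: Dict[str, Optional[List[str]]] = {}
    for name, load in critical.items():
        # No try/except here: without this section the page has no value.
        page[name] = load()
    for name, load in optional.items():
        try:
            page[name] = load()
        except Exception:
            # Mark the section as unavailable so the UI can simply hide it.
            page[name] = None
    return page


def load_orders() -> List[str]:
    """Hypothetical critical loader."""
    return ["order-1001"]


def load_reviews() -> List[str]:
    """Hypothetical optional loader, simulated here as being down."""
    raise TimeoutError("review service did not answer")


page = build_page(
    critical={"orders": load_orders},
    optional={"reviews": load_reviews},
)
print(page)  # {'orders': ['order-1001'], 'reviews': None}
```

Deciding which sections go in `critical` is the "No, really, is it critical?" question expressed in code.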
Automated Assistance?
• Analyze Event Streams and Analytics
• Common user behavior
• Uncommon behavior for a specific user
• Can we helpfully take control on behalf of the user?
• Assisted driving
• Don’t let the user mess up
Network is “down”
• Airplane mode
• International Travel
• Local ISP is having issues
• Bandwidth throttling
• Local network is down
Network is “down”
• Natural disasters
• Backhoes
• Backhoes 💖 datacenters
• Cleaning Crews
• That one server under the desk
Internet of Things
• Televisions / Home Electronics
• Lightbulbs / Door locks / Toilets
• RFID Inventory
• Oil Pipeline Sensors
Multiple Sensor Devices
• What happens when one fails?
• Scary Time…
• Planes
• Cars
• Medical Devices
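One simple answer to "what happens when one fails?" is to combine redundant readings so that a single dead or wildly wrong sensor cannot decide the result. The sketch below uses a median for that; `fused_reading` and its `min_sensors` parameter are hypothetical, and real planes, cars, and medical devices rely on far more rigorous engineering than this.

```python
from statistics import median
from typing import Optional, Sequence


def fused_reading(readings: Sequence[Optional[float]], min_sensors: int = 2) -> float:
    """Combine redundant sensor readings; None means the sensor did not report."""
    live = [r for r in readings if r is not None]
    if len(live) < min_sensors:
        # Too little information left: refuse to guess and let the caller react.
        raise RuntimeError("too few working sensors to trust the reading")
    # With three live sensors, the median ignores one wildly wrong value.
    return median(live)


print(fused_reading([100.0, 100.5, 900.0]))  # 100.5
print(fused_reading([101.0, 99.0, None]))    # 100.0
```

When too few sensors remain, the function raises instead of returning a number, which forces the surrounding system to choose a safe behavior explicitly.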
–Leslie Lamport
“There has been considerable debate over the years about what constitutes a distributed system. It would appear that the following definition has been adopted at SRC:
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt
Self Healing / Regenerative
• How can I know that something is wrong?
• What is supervising/monitoring this?
• Can I safely “turn it off and on again”?
• Can this all be invisible to the user?
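The "turn it off and on again" idea can be sketched as a small supervising loop that restarts a failed task from scratch and gives up after a limit. This is a simplified illustration: `supervise`, `max_restarts`, and `flaky_worker` are hypothetical names, and a production version would at least add logging and a delay between restarts.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def supervise(task: Callable[[], T], max_restarts: int = 3) -> T:
    """Run task, restarting it from scratch on failure, up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return task()
        except Exception as exc:
            if restarts >= max_restarts:
                # Restarting is not helping; escalate instead of looping forever.
                raise RuntimeError("restart limit reached") from exc
            restarts += 1


attempts: Dict[str, int] = {"count": 0}


def flaky_worker() -> str:
    """Hypothetical task that fails twice before its dependency is ready."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("dependency not ready")
    return "ok"


outcome = supervise(flaky_worker)
print(outcome, attempts["count"])  # ok 3
```

From the caller's side the two failures never surface, which is the sense in which recovery can stay invisible to the user. The restart limit matters as much as the restart itself, because a task that can never succeed should be reported rather than retried forever.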
–Peter Senge, The Fifth Discipline
“Systems thinking is a discipline for seeing wholes. It is a framework for seeing interrelationships rather than things, for seeing patterns of change rather than static ‘snapshots.’”
–Nassim Nicholas Taleb, Antifragile: Things That Gain From Disorder
“Some things benefit from shocks; they thrive and grow when exposed to volatility,
randomness, disorder, and stressors and love adventure, risk, and uncertainty.”
Call to Action
Start thinking about these things.
Ask how things can break.
You might not be doing anything “critical” at this point in your career.
But you never know what the future holds for you
(or one of your coworkers).