My keynote at NoSQL Now! on August 21st, 2013
Your Code is Wrong
Nathan Marz (@nathanmarz)
Let’s start with an example
Storm’s “reportError” method
(Storm is a realtime computation system, like Hadoop but for realtime)
Storm architecture
Nimbus: master node (similar to Hadoop JobTracker)
Zookeeper: used for cluster coordination
Supervisors: run worker processes
Storm’s “reportError” method
Used to show errors in the Storm UI
Error info is stored in Zookeeper
What happens when a user deploys code like this?
Denial-of-service on Zookeeper, and the cluster goes down
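As a hypothetical illustration of this failure mode (not the code from the talk), the Python sketch below has a component report an error for every malformed tuple, against a stub store that only counts writes. `CountingStore`, `report_error`, `execute`, and the `/errors/...` path are illustrative names, not Storm's API.

```python
class CountingStore:
    """Hypothetical stand-in for Zookeeper that only counts writes."""

    def __init__(self):
        self.writes = 0

    def write(self, path, data):
        self.writes += 1


def report_error(store, component, error):
    # Unthrottled: every reported error becomes one coordination-store write.
    store.write(f"/errors/{component}", repr(error))


def execute(store, tup):
    # Hypothetical user logic that fails on every malformed tuple.
    try:
        int(tup)
    except ValueError as e:
        report_error(store, "my-bolt", e)


store = CountingStore()
for tup in ["not-a-number"] * 10_000:
    execute(store, tup)
print(store.writes)  # 10000
```

The write rate tracks the rate of bad input, not anything the store was sized for: a stream of malformed tuples becomes a stream of Zookeeper writes.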
Robust!
[Diagram: designed input space vs. actual input space]
Your code is wrong
Your code is literally wrong
Why do you believe your code is correct?
[Diagram: your code rests on a tree of dependencies (Dependency 1 through Dependency 3,000,000, several of them shared), which rests on hardware, electronics, chemistry, atomic physics, and quantum mechanics]
“I think I can safely say that nobody understands quantum mechanics.”
Richard Feynman
Your code is wrong
All the software you’ve used has had bugs in it
Including the software you’ve written
Your code is sometimes correct
That’s good enough!
Treat code as nondeterministic
Embrace “your code is wrong” to design better software
Robust!
[Diagram: designed input space vs. actual input space]
An example
Learning from Hadoop
[Diagram: a single JobTracker managing several jobs; when the JobTracker dies, all of its running jobs are lost]
Your code is wrong
So your processes will crash
Storm’s daemons are process fault-tolerant
Storm
[Diagram: Nimbus managing several running topologies; when Nimbus dies, the topologies keep running and Nimbus can be restarted]
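Storm's daemons are built to fail fast and keep their state outside the process (in Zookeeper or on disk), so a crashed daemon can simply be restarted. The Python sketch below illustrates that idea and is not Storm's implementation: a hypothetical worker script keeps its progress in a file, standing in for Zookeeper or local disk, and "crashes" until it has done three units of work, while `supervise` restarts it after every nonzero exit.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical fail-fast worker: reads progress from an external state file,
# does one unit of work, persists progress, and exits 1 ("crashes") until
# three units are done.
WORKER = r"""
import sys
path = sys.argv[1]
with open(path) as f:
    done = int(f.read() or 0)
done += 1
with open(path, "w") as f:
    f.write(str(done))
sys.exit(0 if done >= 3 else 1)
"""


def supervise(state_path, max_restarts=10):
    """Run the worker, restarting it after each crash; return its exit codes."""
    exit_codes = []
    for _ in range(max_restarts + 1):
        proc = subprocess.run([sys.executable, "-c", WORKER, state_path])
        exit_codes.append(proc.returncode)
        if proc.returncode == 0:
            break
    return exit_codes


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        state_path = os.path.join(tmp, "state")
        open(state_path, "w").close()  # start with no recorded progress
        return supervise(state_path)


if __name__ == "__main__":
    print(run_demo())  # [1, 1, 0]
```

Because progress lives outside the process, each restart resumes from the recorded state instead of starting over; the crash costs time, not correctness.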
Robust!
[Diagram: designed input space vs. actual input space]
The impact of code being wrong
Robust!
[Diagram: designed input space vs. actual input space]
Failures! Bad performance! Security holes!
Irrelevant!
Design principle #1
Measuring and monitoring are the foundation of solid engineering
Measuring: Under what range of inputs does my software function well?
Monitoring: What’s the actual input space of my software?
Measure & monitor:
Latency
Throughput
Stack traces
Buffer sizes
Memory usage
CPU usage
Number of threads spawned
...
How you monitor your software is as important as its functionality
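As a hedged illustration of monitoring the actual input space, here is a small Python sketch that records the size of every input alongside its processing time and reports how often inputs fall outside the range the code was designed and measured for. `InputSpaceMonitor` and `designed_max_size` are hypothetical names; a real system would export these numbers to a metrics system rather than keep them in memory.

```python
import time


class InputSpaceMonitor:
    """Hypothetical monitor comparing observed inputs with the designed range."""

    def __init__(self, designed_max_size):
        self.designed_max_size = designed_max_size
        self.sizes = []
        self.latencies = []

    def observe(self, fn, payload):
        # Time the call and remember how big the input was.
        start = time.perf_counter()
        result = fn(payload)
        self.latencies.append(time.perf_counter() - start)
        self.sizes.append(len(payload))
        return result

    def out_of_range_fraction(self):
        if not self.sizes:
            return 0.0
        outside = sum(1 for s in self.sizes if s > self.designed_max_size)
        return outside / len(self.sizes)

    def max_latency(self):
        return max(self.latencies, default=0.0)


monitor = InputSpaceMonitor(designed_max_size=1_000)
for payload in ["x" * 10, "x" * 500, "x" * 50_000, "x" * 20]:
    monitor.observe(str.upper, payload)
print(monitor.out_of_range_fraction())  # 0.25
```

A rising out-of-range fraction is the signal that the actual input space has drifted away from the designed one.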
Design principle #2
Embrace immutability
[Diagram: an application reading from and writing to a read/write database (MySQL, MongoDB, Riak, Cassandra, HBase)]
Your code is wrong
So data will be corrupted
And you may not know why
Architecture based on immutability
[Diagram: the application appends to immutable, ever-growing data; views are computed from that data and serve the application's reads]
Lambda architecture
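A minimal Python sketch of why immutability helps when code is wrong, using a hypothetical user-location example: a buggy write to a mutable store destroys the previous value, while an append-only log keeps every fact, so the view can be rebuilt once the bug is found. The data and function names are illustrative only.

```python
# Mutable approach: the buggy write overwrites the only copy.
mutable_db = {"alice": "San Francisco"}
mutable_db["alice"] = None  # hypothetical bug writes garbage

# Immutable approach: facts are only ever appended, never updated.
events = [(1, "alice", "San Francisco")]
events.append((2, "alice", None))  # same bug, recorded as a new fact


def current_location(events, user, is_valid=lambda loc: True):
    """Derive a view: the latest location for `user` that passes `is_valid`."""
    latest = None
    for _timestamp, who, location in sorted(events, key=lambda e: e[0]):
        if who == user and is_valid(location):
            latest = location
    return latest


print(mutable_db["alice"])                # None: the old value is gone
print(current_location(events, "alice"))  # None: the view reflects the bug
print(current_location(events, "alice",
                       is_valid=lambda loc: loc is not None))  # San Francisco
```

Fixing the view is just a recomputation over data that was never lost; with the mutable store there is nothing left to recompute from.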
Design principle #3
Minimize dependencies
The less that can go wrong, the less that will go wrong
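A back-of-the-envelope way to see this, under the simplifying and hypothetical assumption that dependencies fail independently and the system works only if every one of them works: with n dependencies that each behave with probability p, the whole stack behaves with probability p^n.

```python
def stack_reliability(p, n):
    """Probability that all n independent dependencies behave, each with probability p."""
    return p ** n


for n in (10, 100, 1_000):
    print(n, round(stack_reliability(0.999, n), 3))
```

Real dependencies are not independent, so the numbers are illustrative only; the point is that every added dependency multiplies in another chance to fail.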
Example: Storm’s usage of Zookeeper
Worker locations stored in Zookeeper
All workers must know the locations of the other workers in order to send them messages
Two ways to get location updates
1. Poll Zookeeper
2. Use Zookeeper “watch” feature to get push notifications
Method 2 is faster but relies on another feature
Storm uses both methods
If watch feature fails, locations still propagate via polling
Eliminating this dependence is justified by the small amount of code required
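The combined approach can be sketched in Python with a hypothetical in-memory store in place of Zookeeper (this is not the ZooKeeper client API). The worker registers a watch callback for fast updates and also polls; both paths funnel into one version-checked update, so if the watch never fires, polling still converges on the latest locations.

```python
class FakeCoordinationStore:
    """Hypothetical stand-in for Zookeeper: versioned data plus optional watches."""

    def __init__(self, watches_work=True):
        self.version = 0
        self.locations = {}
        self.callbacks = []
        self.watches_work = watches_work  # simulate a broken watch feature

    def set_locations(self, locations):
        self.version += 1
        self.locations = dict(locations)
        if self.watches_work:
            for callback in self.callbacks:
                callback(self.version, dict(self.locations))

    def get(self):
        return self.version, dict(self.locations)

    def watch(self, callback):
        self.callbacks.append(callback)


class Worker:
    def __init__(self, store):
        self.store = store
        self.version = -1
        self.locations = {}
        store.watch(self._update)  # method 2: push notifications

    def _update(self, version, locations):
        # Accept only newer data, so push and poll can race safely.
        if version > self.version:
            self.version, self.locations = version, locations

    def poll(self):
        # Method 1: pull; in practice this would run on a periodic timer.
        self._update(*self.store.get())


broken = FakeCoordinationStore(watches_work=False)
worker = Worker(broken)
broken.set_locations({"worker-1": "host-a:6700"})
print(worker.locations)  # {} because the watch never fired
worker.poll()
print(worker.locations)  # {'worker-1': 'host-a:6700'}
```

The version check is what makes the redundancy cheap: a late push or a stale poll can never overwrite newer data.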
Design principle #4
Explicitly respect functional input ranges
Storm’s “reportError” method
Implement self-throttling to avoid overloading other systems
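One way to respect a downstream system's functional input range is a fixed-window throttle: allow at most N error reports per interval and drop (but count) the rest. The Python sketch below is hypothetical and not Storm's code; the `sink` callable stands in for the Zookeeper write. Later Storm releases expose similar knobs as `topology.error.throttle.interval.secs` and `topology.max.error.report.per.interval`; check the configuration reference for your version.

```python
class ThrottledErrorReporter:
    """Hypothetical error reporter that caps writes per time window."""

    def __init__(self, sink, max_per_interval, interval_secs, clock):
        self.sink = sink                  # e.g. a function that writes to Zookeeper
        self.max_per_interval = max_per_interval
        self.interval_secs = interval_secs
        self.clock = clock                # injectable; time.monotonic in practice
        self.window_start = clock()
        self.reported_in_window = 0
        self.dropped = 0                  # worth exposing as a metric

    def report_error(self, error):
        now = self.clock()
        if now - self.window_start >= self.interval_secs:
            # Start a new window and reset the budget.
            self.window_start = now
            self.reported_in_window = 0
        if self.reported_in_window < self.max_per_interval:
            self.reported_in_window += 1
            self.sink(error)
        else:
            self.dropped += 1


now = [0.0]
written = []
reporter = ThrottledErrorReporter(written.append, max_per_interval=5,
                                  interval_secs=10, clock=lambda: now[0])
for _ in range(1_000):
    reporter.report_error("boom")
print(len(written), reporter.dropped)  # 5 995
now[0] = 10.0
reporter.report_error("boom")
print(len(written))  # 6
```

However fast errors arrive, the downstream store sees at most five writes per ten seconds; the drop counter keeps the lost detail visible to monitoring.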
Design principle #5
Embrace recomputation
“Your code is wrong” meanings:
1. Designed input space differs from actual input space
2. The logic of your code is wrong
3. Requirements are constantly changing
You must be able to change your code to match shifting requirements
Example: blogging software
New requirement: search
Have to build a search index
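Because every post is still in the immutable master dataset, the new search view can be produced by running a new function over all of it. A minimal Python sketch, with a hypothetical in-memory list of posts and a naive tokenizer standing in for a real search engine:

```python
import re
from collections import defaultdict

# Hypothetical master dataset: every blog post ever written, never modified.
posts = [
    (1, "Storm is a realtime computation system"),
    (2, "Notes on Hadoop batch jobs"),
    (3, "Running Storm topologies on a cluster"),
]


def build_search_index(posts):
    """Recompute an inverted index (word -> post ids) from the full dataset."""
    index = defaultdict(set)
    for post_id, text in posts:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(post_id)
    return dict(index)


index = build_search_index(posts)
print(sorted(index["storm"]))  # [1, 3]
```

If the tokenizer later turns out to be wrong, fix it and rerun the same function; the index is disposable because the posts are not.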
Recomputation gives you so much more
[Diagram: immutable, ever-growing data, with views computed from it serving the application]
Building software is no different from any other engineering discipline
The underlying challenges are the same
What will break it?
What are the limits of my dependencies?
How can I add redundancy to increase robustness?
Can I isolate failures?
Our raw materials are ideas instead of matter
Thank you