Moneytree - Data Aggregation with SWF
DESCRIPTION
An outline of how Moneytree uses Amazon SWF to coordinate its backend aggregation workflow. Focuses on how to run a large-scale distributed system with a few developers while still sleeping at night.
TRANSCRIPT
Who Am I?
Ross Sharrott
Founder & CTO of Moneytree
American
10 Years in Japan (Feb 24!)
Previously Senior IT Manager
Love distributed architectures in the cloud
What is Moneytree?
Internet banking is fragmented; not simple
Email is Simple
For mail we use just ONE app!
Gmail Yahoo! Work, etc.
Radically simplify your relationship with money
and make it beautiful.
Data Aggregator
Our Goals:
Download accounts for 1M people every day
Deliver new data in < 1 minute
2-3 developers
Sleep at night
First Idea
I know…I’ll use a queue!
Original Queue Based Process
Download Data
Process Statements
Store Data
1 Account / Many Statements
Download Data
Process Statements
Post Process Statements
Store Data + Additional Information
But we had a problem…
To determine a CC balance, we need information from multiple statements
We needed a post statement process
What We Needed
Download Data
Process Statements
• Statement 1
• Statement 2
• Many More
Post Process
Queue Falls Down
I know…I’ll use a queue!
Queues are linear
Where are we in the process?
Logged in yet? Processing data?
What do you do when a job fails?
How do you relate jobs to one workflow?
Enter SWF
AWS Managed Service
Coordinates Workflows / Maintains history
Provides multiple queues called Task Lists
Handle decision points with Deciders
Perform tasks with Activity Workers
Real World – A Restaurant
SWF World – A Restaurant
Decider – does nothing, makes decisions
Workflow Starter – takes orders
Activity Worker – makes food
Activity Worker – distributes food
SWF – maintains history, distributes tasks
Activity Worker
Very similar to any queue worker
Handles a specific task
Polls a Task List to get new info
Reports activity success or failure
Puts results in a DB or on S3, etc.
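The activity worker loop above can be sketched roughly as follows. This is a minimal illustration, not Moneytree's code: an in-memory `queue.Queue` stands in for an SWF Task List, a plain dict stands in for the DB or S3, and `run_activity_worker` and `count_lines` are hypothetical names introduced here.

```python
import queue
from typing import Any, Callable, Dict, List, Tuple


def run_activity_worker(
    task_list: "queue.Queue[dict]",
    handler: Callable[[Any], Any],
    result_store: Dict[str, Any],
) -> List[Tuple[str, str]]:
    """Drain a task list, handling one task at a time.

    Returns a (task_id, status) report for every task polled.
    """
    reports: List[Tuple[str, str]] = []
    while True:
        try:
            task = task_list.get_nowait()  # a real worker would long-poll here
        except queue.Empty:
            return reports  # nothing left; a real worker would poll again
        try:
            # Store the result out of band (DB, S3, ...), keyed by task id.
            result_store[task["id"]] = handler(task["input"])
            reports.append((task["id"], "completed"))
        except Exception as exc:
            # Report the failure instead of crashing the worker.
            reports.append((task["id"], f"failed: {exc}"))


def count_lines(statement: str) -> int:
    """Hypothetical task handler: count the lines of a downloaded statement."""
    if not statement:
        raise ValueError("empty statement")
    return len(statement.splitlines())
```

The worker only reports success or failure; deciding what happens after a failure is left to the decider.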
Workflow Decider
Uses workflow history to make decisions
Schedules tasks
Handles rescheduling failures & timeouts
Reacts to external events (Signals)
Reacts to completion events
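As a rough sketch of what a decider does, the function below replays a simplified history and returns the next decisions for a download, process statements, post process workflow. The event and decision names mirror SWF's, but the flat dict shapes, the activity ids, and the `MAX_ATTEMPTS` retry budget are assumptions made for illustration, not the real SWF payloads or Moneytree's logic.

```python
from typing import Any, Dict, List

MAX_ATTEMPTS = 3  # assumed retry budget; not stated in the talk


def decide(history: List[dict]) -> List[dict]:
    """Replay the whole history and return the next decisions (may be empty)."""
    scheduled: Dict[str, int] = {}   # activity id -> times scheduled
    completed: Dict[str, Any] = {}   # activity id -> result
    running = set()                  # scheduled but not yet finished

    for event in history:
        kind, act_id = event["type"], event.get("id")
        if kind == "ActivityTaskScheduled":
            scheduled[act_id] = scheduled.get(act_id, 0) + 1
            running.add(act_id)
        elif kind == "ActivityTaskCompleted":
            completed[act_id] = event.get("result")
            running.discard(act_id)
        elif kind in ("ActivityTaskFailed", "ActivityTaskTimedOut"):
            running.discard(act_id)

    def schedule(act_id: str) -> dict:
        return {"decision": "ScheduleActivityTask", "id": act_id}

    # 1. Reschedule anything that failed or timed out, within the budget.
    retries = []
    for act_id, attempts in scheduled.items():
        if act_id not in completed and act_id not in running:
            if attempts >= MAX_ATTEMPTS:
                return [{"decision": "FailWorkflowExecution", "reason": act_id}]
            retries.append(schedule(act_id))
    if retries:
        return retries

    # 2. Work is still in flight: nothing to decide yet.
    if running:
        return []

    # 3. Otherwise advance to the next stage of the workflow.
    if "download" not in scheduled:
        return [schedule("download")]
    statements = [f"statement:{name}" for name in completed["download"]]
    pending = [s for s in statements if s not in scheduled]
    if pending:
        return [schedule(s) for s in pending]  # fan out, one task per statement
    if "post_process" not in scheduled:
        return [schedule("post_process")]
    return [{"decision": "CompleteWorkflowExecution"}]
```

Because the function derives everything from the history it is given, it keeps no state of its own, so any decider process can pick up any decision task.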
Moneytree’s Workflow
Download Data
Statement
Post Process
Statement
Moneytree’s SWF Architecture
1 Day of Work
Yesterday:
70,000 Workflows
Average Completion Time: 1 Minute
575,000 Decision Tasks
146,000 Statements Processed
70,000 Aggregation Tasks
70,000 Post Process Tasks
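A quick back-of-the-envelope check on the figures above, using only the numbers from this slide; the variable names are just for illustration.

```python
workflows = 70_000
decision_tasks = 575_000
statements = 146_000
aggregation_tasks = 70_000
post_process_tasks = 70_000
seconds_per_day = 24 * 60 * 60

# Every statement, aggregation, and post process task is one activity task.
activity_tasks = statements + aggregation_tasks + post_process_tasks

decisions_per_workflow = decision_tasks / workflows
statements_per_workflow = statements / workflows
decisions_per_activity = decision_tasks / activity_tasks
workflows_per_second = workflows / seconds_per_day

print(f"{decisions_per_workflow:.1f} decision tasks per workflow")
print(f"{statements_per_workflow:.1f} statements per workflow")
print(f"{decisions_per_activity:.1f} decision tasks per activity task")
print(f"{workflows_per_second:.2f} workflows started per second on average")
```

That works out to roughly 8 decision tasks and 2 statements per workflow, or about 2 decision tasks for every activity task, which is why decider throughput and SWF API limits matter as much as worker capacity.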
Data Aggregator
Our Goals:
1M people every day
Deliver new data in < 1 minute
2-3 developers
Sleep at night
How To Sleep At Night
Make Workers Scalable
Avoid SWF API Throttling
Expect Failures
Measure Everything
Make Workers Scalable
Separate concerns into individual workers
Scale each worker process individually
Automate scaling your workers
Make workers idempotent
You can always try again
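One way to read "idempotent" here: running the same task twice leaves the stored data exactly as it was after the first run. The sketch below assumes each statement has a natural key of (account id, statement id); the dict-backed store and the `store_statement` name are hypothetical.

```python
from typing import Dict, List, Tuple

StatementKey = Tuple[str, str]  # (account_id, statement_id), an assumed natural key


def store_statement(
    db: Dict[StatementKey, List[dict]],
    account_id: str,
    statement_id: str,
    transactions: List[dict],
) -> bool:
    """Upsert a statement by its natural key.

    Returns True if the store changed, False if this was a harmless repeat.
    """
    key = (account_id, statement_id)
    if db.get(key) == transactions:
        return False  # same task ran again: nothing to do
    db[key] = list(transactions)  # overwrite rather than append
    return True
```

Because the write overwrites by key instead of appending, a decider can safely reschedule a task whose outcome is unknown, for example after a timeout.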
Avoid API Throttling
Don’t call GetWorkflowExecutionHistory
Stress test your implementation
Limits are by Region, not domain!
Get your limits raised
We hit limits on day 1
Use exponential retry
Have a circuit breaker
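The last two points, exponential retry and a circuit breaker, might look something like the sketch below. It is a generic illustration rather than Moneytree's implementation: `ThrottledError` and `CircuitOpenError` are hypothetical exception types, and the delays, thresholds, and jitter are assumed values. The sleep function and clock are injectable so the logic can be exercised without waiting.

```python
import random
import time
from typing import Callable, Optional


class ThrottledError(Exception):
    """Hypothetical error raised when the API rejects a call for rate reasons."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


def call_with_backoff(fn: Callable, max_attempts: int = 5, base_delay: float = 0.5,
                      max_delay: float = 30.0, sleep: Callable = time.sleep):
    """Retry fn on throttling, doubling the delay ceiling after each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # random jitter spreads retries out


class CircuitBreaker:
    """Stop calling a struggling API after repeated consecutive failures."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except ThrottledError:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

Backoff handles short bursts of throttling; the breaker covers the longer case where continuing to retry would only add load to an API that is already rejecting calls.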
Expect Failures
Cloud = Failures
Dyno / EC2 instance restarts
Network & Service outages
Don’t wait for failed processes
Use aggressive timeouts
Use heartbeats for long processes
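To make the heartbeat idea concrete: a long-running worker reports progress at intervals, and whoever is watching treats a missing heartbeat as a failure instead of waiting for the full task timeout. SWF offers this through its RecordActivityTaskHeartbeat action and a per-activity heartbeat timeout; the sketch below does not call SWF and uses hypothetical names (`process_with_heartbeat`, `is_stalled`) and an assumed interval.

```python
from typing import Any, Callable, Iterable, List


def process_with_heartbeat(
    items: Iterable[Any],
    work: Callable[[Any], Any],
    heartbeat: Callable[[str], None],
    every: int = 10,
) -> List[Any]:
    """Process items one by one, sending a heartbeat after every `every` items."""
    results = []
    for index, item in enumerate(items, start=1):
        results.append(work(item))
        if index % every == 0:
            heartbeat(f"{index} items done")  # tells the coordinator we are alive
    return results


def is_stalled(last_heartbeat_at: float, now: float, heartbeat_timeout: float) -> bool:
    """True once more than heartbeat_timeout seconds have passed without a heartbeat."""
    return now - last_heartbeat_at > heartbeat_timeout
```

With a short heartbeat timeout, a worker that dies with its instance is noticed within seconds and the task can be rescheduled, even if the task itself is allowed to run for much longer.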
Monitor Everything
Use Performance Monitoring
10x increase in performance = 10x workers
New Relic & CloudWatch
Centralize Logging
Cloud resources disappear with their logs
Papertrail / Logentries
Log Everything & Set Up Alerts
If you don’t log it, you can’t fix it
Sleep At Night
Make Workers Scalable
Avoid SWF API Throttling
Expect Failures
Measure Everything
Thank You!
Moneytree is hiring!
iOS Developers
API Developers / AWS Dev Ops
Technology Ninjas
Ross Sharrott Founder / [email protected]
@moneytreejp