architecture for scale [appfirst]

Architecture for Scale: A Case Study

AppFirst, INC. www.AppFirst.com

•  Automation, Optimization, and Architecture Design

o  Autopilot software

o  Automated stock trading platform

o  Medical device software

o  Adaptive control

o  Distributed queue technologies

Shaun Krueger

Lead Software Engineer

•  NYC based software start-up

•  Application

o  Operational Intelligence, Miss Nothing Data

o  Aggregate data from remote servers

o  Provide information for web apps and APIs

•  A Few Metrics Today

o  Hundreds of thousands of summaries per minute from tens of thousands of servers

o  Around a GB per remote server per day, TBs daily

o  Query & retrieve information in < 100 ms

o  Data store for up to 1 year

AppFirst Collects, Aggregates, and Correlates Information from Production Applications

Simplified Architecture

Design for Scale

• Micro scale
o  Application Components

• Macro scale
o  The Entire Service

Micro Scale: Data Processing

Requirements:
• Process a constant stream of data
o  3 snapshots per minute, per remote server
• Create summaries in real time
o  Up to 1 minute behind wall clock time
• Provide query results in < 100 ms
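To make the requirements concrete, here is a rough back-of-the-envelope sketch of the inbound load at different fleet sizes. It uses only the figures quoted in the slides (3 snapshots per minute per remote server, around 1 GB per remote server per day); the function name and the decimal GB-to-TB conversion are assumptions for illustration.

```python
SNAPSHOTS_PER_MINUTE_PER_SERVER = 3  # from the requirements slide
GB_PER_SERVER_PER_DAY = 1.0          # "around a GB", so treat as approximate


def load_estimate(servers: int) -> dict:
    """Estimate inbound load for a given number of remote servers."""
    per_minute = servers * SNAPSHOTS_PER_MINUTE_PER_SERVER
    return {
        "snapshots_per_minute": per_minute,
        "snapshots_per_day": per_minute * 60 * 24,
        # Decimal units (1 TB = 1000 GB) are an assumption of this sketch.
        "tb_per_day": servers * GB_PER_SERVER_PER_DAY / 1000,
    }


if __name__ == "__main__":
    for servers in (100, 1_000, 10_000, 100_000):
        print(servers, load_estimate(servers))
```

The load grows linearly with the number of remote servers, which is why the remote servers dominate capacity planning.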

Micro Scale: Efficiency

We found that:
• Summaries of the data were needed to keep queries < 100 ms
o  Server
o  Process
o  Process sets
o  Topology
• A time series was needed for each summary type
o  Minute
o  Hour
o  Day
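The idea of precomputed summaries per time bucket can be sketched as follows. This is a minimal illustration, not AppFirst's implementation: the snapshot shape `(server_id, timestamp, cpu)` and the count/avg/max fields are hypothetical, and only the "Server" summary type is shown.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_start(ts: float, granularity: str) -> datetime:
    """Truncate a Unix timestamp to the start of its minute, hour, or day (UTC)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if granularity == "minute":
        return dt.replace(second=0, microsecond=0)
    if granularity == "hour":
        return dt.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return dt.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


def summarize(snapshots, granularity):
    """Group (server_id, ts, cpu) snapshots into per-server time buckets."""
    buckets = defaultdict(list)
    for server_id, ts, cpu in snapshots:
        buckets[(server_id, bucket_start(ts, granularity))].append(cpu)
    return {
        key: {"count": len(vals), "avg": sum(vals) / len(vals), "max": max(vals)}
        for key, vals in buckets.items()
    }
```

A query for "server X, last hour" then reads a handful of precomputed rows instead of scanning raw snapshots, which is what keeps it within the latency budget.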

We tried:
• Flat files
• Network file systems
• Distributed file systems
• Relational databases
• NoSQL key-value store
• Memory based SQL databases
• Distributed shared memory

"Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King"

Jim Gray, Microsoft, December 2006

Micro Scale: We learned the hard way

Micro Scale: Solution

Aggregation:
• HPC pipeline processing model
• RAM based data model
• Queues as message bus
• Stateless processing
• Adaptive control
• Queries are fully abstracted

Horizontal scale may require that you revisit your design

Micro Scale

We all know we need to scale horizontally

Stateless
• Any data processing with any time constraint
• Processes can be run on any server
• Processes can be migrated
• Multiple processes can be added as load varies
• All data stored in distributed shared memory
• Message passing between components
• Send keys and not data
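The "send keys and not data" pattern can be sketched as below. This is an illustration under stated assumptions: a plain `dict` stands in for distributed shared memory and `queue.Queue` stands in for the message bus, and the function names and key format are hypothetical.

```python
import queue


def publish_snapshot(store: dict, bus: queue.Queue, key: str, payload: list) -> None:
    """Write the payload to shared memory, then send only its key on the bus."""
    store[key] = payload
    bus.put(key)


def process_next(store: dict, bus: queue.Queue, worker_name: str) -> str:
    """Stateless worker step: everything it needs arrives via the bus and the store.

    Raises queue.Empty if there is no pending work.
    """
    key = bus.get_nowait()
    payload = store[key]  # fetch the data by key from shared memory
    summary_key = f"summary:{key}"
    store[summary_key] = {"worker": worker_name, "total": sum(payload)}
    return summary_key


if __name__ == "__main__":
    store, bus = {}, queue.Queue()
    publish_snapshot(store, bus, "server-42:minute-0", [1, 2, 3])
    print(process_next(store, bus, "worker-a"), store)
```

Because the worker holds no state between calls, any worker on any server can take the next key, and more workers can be added as load varies.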

Cluster
• Use components that cluster
• Don’t do backups, use replication
• Redis, memcached, and HBase can be clustered
• PostgreSQL, MySQL, and RabbitMQ don’t really cluster

Macro Scale: Application Capacity

Load:
• Most significant load impact from remote servers
• User interaction, APIs, and queries do not load the system as much as remote servers
• Support 100, 1,000, 10,000, 100,000 remote servers

Will a design that supports 10,000 remote servers scale to support 100,000 remote servers?

Infinite Scale

• Paralyzes the design team
• Fosters bad behavior
• Unrealistic expectations
• Developers forced to take unrealistic action

• But... you don’t want to say no to the business
• The whole purpose is to add users
• When the business brings a customer with 10,000 servers, you want to say: bring it on

Macro Scale: Capacity

We started with a snapshot:
• Supported 1,000 remote servers
• Micro scale results made it possible to scale out
o  Fairly flexible application component design
• Scale out to 10,000 remote servers
o  This is a financial calculation
• Scaled out in linear fashion
o  Data processing
o  Storage
• Started in linear fashion then determined actual requirements

Macro Scale Solution: The Pod

Pod Architecture:
• Segmented infrastructure along the lines of load sources
• Create infrastructure to support specific load
• Instantiate additional infrastructure with additional load
• When a pod gets to 85-90% capacity spin out a new pod
• Capacity of a pod is a financial calculation
• Scale within a pod in 1000 server increments
• Need to automate the deployment of a pod
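The spin-out rule can be sketched as a simple placement policy. The class names, the 10,000-server pod capacity, and the choice to stop filling a pod once it reaches the threshold are all assumptions made for this illustration; the slides only state that a new pod is spun out at 85-90% capacity and that pod capacity is a financial calculation.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    capacity: int     # remote servers the pod was sized for (hypothetical figure)
    servers: int = 0  # remote servers currently assigned


class PodManager:
    def __init__(self, pod_capacity: int = 10_000, spin_out_pct: int = 85):
        if pod_capacity * spin_out_pct // 100 <= 0:
            raise ValueError("pod capacity and threshold must allow at least one server")
        self.pod_capacity = pod_capacity
        self.spin_out_pct = spin_out_pct
        self.pods = [Pod("pod-0", pod_capacity)]

    def _fill_limit(self, pod: Pod) -> int:
        """Number of servers at which this pod counts as full."""
        return pod.capacity * self.spin_out_pct // 100

    def add_servers(self, count: int) -> None:
        """Assign servers to the newest pod, spinning out a new pod at the threshold."""
        remaining = count
        while remaining > 0:
            pod = self.pods[-1]
            room = self._fill_limit(pod) - pod.servers
            if room <= 0:
                # In practice this step is an automated pod deployment.
                self.pods.append(Pod(f"pod-{len(self.pods)}", self.pod_capacity))
                continue
            placed = min(room, remaining)
            pod.servers += placed
            remaining -= placed
```

For example, onboarding customers in 1,000-server increments would call `add_servers(1_000)` repeatedly, and a second pod appears once the first crosses its threshold.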

[Diagram: Pod 0 and Pod 1]

Write Your Own
• Adaptive software
• RabbitMQ replacement
• Network bridges

Metrics are king
• Business metrics
• Application metrics

Time Series Data
• Issues relate to a specific time
• Complete state information for any given minute
• You don’t know what information is needed before a problem occurs, so collect all data every minute

Don’t trust the data
• Clocks are skewed
• Encodings fail
• Save all bad data & replay
• Think defensively
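A defensive ingest path along these lines might look like the sketch below. The JSON record format, the `ts` field, the 300-second skew tolerance, and the function names are hypothetical; the point is only that undecodable input is saved for replay rather than dropped, and that sender clocks are not trusted blindly.

```python
import json

MAX_CLOCK_SKEW_S = 300  # hypothetical tolerance for sender clock skew


def ingest(raw: bytes, received_at: float, good: list, quarantine: list) -> bool:
    """Parse one record; save anything unparseable instead of discarding it."""
    try:
        record = json.loads(raw.decode("utf-8"))
        ts = float(record["ts"])
    except (UnicodeDecodeError, ValueError, KeyError, TypeError):
        quarantine.append((received_at, raw))  # keep the raw bytes for replay
        return False
    if abs(ts - received_at) > MAX_CLOCK_SKEW_S:
        # Sender clock looks skewed: keep its value, but index by receive time.
        record["reported_ts"] = ts
        record["ts"] = received_at
    good.append(record)
    return True


def replay(quarantine: list, good: list, repair) -> list:
    """Re-ingest quarantined records after applying a repair function.

    Returns the records that are still bad after the repair attempt.
    """
    still_bad = []
    for received_at, raw in quarantine:
        ingest(repair(raw), received_at, good, still_bad)
    return still_bad
```

Keeping the raw bytes means a parser fix deployed later can recover data that was rejected earlier.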

The Pod Rocks
• Isolated
• Distributed
• Located where needed
• Behind the firewall

Conclusions

• Stateless Data
o  Key to horizontal scale

• Disk is tape
o  RAM based design is critical, not optional

• Cluster
o  Use components that cluster, not just master/slave

• Design for infinite scale does not work

• Pod approach is an answer for infinite scale

Thank You!

Shaun Krueger www.appfirst.com

