Architecture for Scale [AppFirst]
DESCRIPTION
It’s one thing to support many data sources with megabytes of data. It’s a completely different problem to support thousands of data sources producing terabytes of data every day. How do you create systems that scale infinitely? The answer is: you don’t. You cannot design for infinite scalability. Rather, consider a pod approach in which each pod supports a defined capacity. Scalability results from deploying multiple cooperating pods. Systems that handle extremely large data sources with significant processing requirements are difficult to validate at best. Attempting to deploy such a system without well-understood capacity limits is destined for failure. This was first presented at Cloud Expo NYC.
TRANSCRIPT
![Page 1: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/1.jpg)
Architecture for Scale A Case Study
AppFirst, INC. www.AppFirst.com
![Page 2: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/2.jpg)
• Automation, Optimization, and Architecture Design
o Autopilot software
o Automated stock trading platform
o Medical device software
o Adaptive control
o Distributed queue technologies
Shaun Krueger
Lead Software Engineer
![Page 3: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/3.jpg)
• NYC based software start-up
• Application
o Operational Intelligence, Miss Nothing Data
o Aggregate data from remote servers
o Provide information for web apps and APIs
• A Few Metrics Today
o Hundreds of thousands of summaries per minute from tens of thousands of servers
o Around 1 GB per remote server per day, TBs daily
o Query & retrieve information in < 100 ms
o Data stored for up to 1 year
AppFirst Collects, Aggregates, and Correlates Information from Production Applications
![Page 4: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/4.jpg)
Simplified Architecture
![Page 5: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/5.jpg)
Design for Scale
• Micro scale
o Application Components
• Macro scale
o The Entire Service
![Page 6: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/6.jpg)
Micro Scale: Data Processing
Requirements:
• Process a constant stream of data
o 3 snapshots per minute, per remote server
• Create summaries in real-time
o Up to 1 minute behind wall clock time
• Provide query results in < 100 ms
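As a rough back-of-the-envelope check on these requirements, the sketch below converts the stated rate of 3 snapshots per minute per remote server into aggregate load. The fleet sizes are the ones the talk mentions elsewhere; the function name `snapshot_load` is hypothetical.

```python
SNAPSHOTS_PER_SERVER_PER_MINUTE = 3  # from the requirements above


def snapshot_load(remote_servers: int) -> tuple[int, float]:
    """Return (snapshots per minute, snapshots per second) for a fleet size."""
    per_minute = remote_servers * SNAPSHOTS_PER_SERVER_PER_MINUTE
    return per_minute, per_minute / 60


for servers in (1_000, 10_000, 100_000):
    per_minute, per_second = snapshot_load(servers)
    print(f"{servers:>7} servers: {per_minute:>7} snapshots/min, {per_second:,.0f}/s")
```

At 10,000 remote servers this works out to 30,000 snapshots per minute, or 500 per second, all of which must be summarized within the one-minute lag budget.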
![Page 7: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/7.jpg)
Micro Scale: Efficiency
We found that:
• Summaries of the data were needed in order to keep queries < 100 ms
o Server
o Process
o Process sets
o Topology
• Time series needed for each summary type
o Minute
o Hour
o Day
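One way to picture the minute, hour, and day series is as successive rollups of the same summary. The sketch below is a minimal illustration, assuming each summary is a single numeric value keyed by a Unix timestamp and that a rollup is a simple sum. The talk does not say how AppFirst actually aggregates, and `rollup` is a hypothetical helper.

```python
from collections import defaultdict

MINUTE, HOUR, DAY = 60, 3600, 86400  # bucket widths in seconds


def rollup(samples: dict[int, float], width: int) -> dict[int, float]:
    """Sum samples into buckets of `width` seconds, keyed by bucket start."""
    buckets: dict[int, float] = defaultdict(float)
    for timestamp, value in samples.items():
        buckets[timestamp - timestamp % width] += value
    return dict(buckets)


# Three hypothetical per-minute summaries for one server: two fall in the
# first hour, one in the second hour of the same day.
per_minute = {0: 1.0, 60: 2.0, 3600: 4.0}
per_hour = rollup(per_minute, HOUR)
per_day = rollup(per_hour, DAY)
```

Serving a query from a precomputed series of the right granularity avoids scanning raw snapshots at query time, which is the reason the slide gives for keeping summaries.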
We tried:
• Flat files
• Network file systems
• Distributed file systems
• Relational databases
• NoSQL key-value store
• Memory based SQL databases
• Distributed shared memory
![Page 8: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/8.jpg)
Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King
Jim Gray, Microsoft, December 2006
Micro Scale: We learned the hard way
![Page 9: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/9.jpg)
Micro Scale: Solution
Aggregation:
• HPC pipeline processing model
• RAM based data model
• Queues as message bus
• Stateless processing
• Adaptive control
• Queries are fully abstracted
Horizontal scale may require that you revisit your design
![Page 10: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/10.jpg)
Micro Scale
We all know we need to scale horizontally
Stateless
• Any data processing with any time constraint
• Processes can be run on any server
• Processes can be migrated
• Multiple processes can be added as load varies
• All data stored in distributed shared memory
• Message passing between components
• Send keys and not data
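The "send keys and not data" idea can be sketched as follows. This is a minimal single-process illustration, assuming a plain dictionary stands in for the distributed shared memory and a `queue.Queue` stands in for the message bus. The names `shared_store`, `bus`, `publish`, and `worker`, and the key format, are hypothetical and not from the talk.

```python
import queue

shared_store: dict[str, list[float]] = {}  # stand-in for distributed shared memory
bus: queue.Queue = queue.Queue()           # stand-in for the message bus


def publish(key: str, snapshot: list[float]) -> None:
    """Write the payload to the shared store, then send only its key."""
    shared_store[key] = snapshot
    bus.put(key)


def worker() -> None:
    """Stateless step: all it needs comes from the key and the shared store."""
    while True:
        try:
            key = bus.get_nowait()
        except queue.Empty:
            return
        values = shared_store[key]
        shared_store[f"summary:{key}"] = [sum(values) / len(values)]


publish("server-42:minute-0", [1.0, 2.0, 3.0])
worker()
```

Because the worker holds nothing between messages, any number of copies could run on any server and drain the same bus, which is what lets processes be added or migrated as load varies.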
Cluster
• Use components that cluster
• Don’t do backups, use replication
• Redis, memcached, and HBase can be clustered
• PostgreSQL, MySQL, and RabbitMQ don’t really cluster
![Page 11: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/11.jpg)
Macro Scale: Application Capacity
Load:
• Most significant load impact from remote servers
• User interaction, APIs, and queries do not load the system as much as remote servers
• Support 100, 1,000, 10,000, 100,000 remote servers
Will a design that supports 10,000 remote servers scale to support 100,000 remote servers?
![Page 12: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/12.jpg)
Infinite Scale
• Paralyzes the design team
• Fosters bad behavior
• Unrealistic expectations
• Developers forced to take unrealistic action
• But... you don’t want to say no to the business
• The whole purpose is to add users
• When the business brings a customer with 10,000 servers you want to say: bring it on
![Page 13: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/13.jpg)
Macro Scale: Capacity
We started with a snapshot:
• Supported 1,000 remote servers
• Micro scale results made it possible to scale out
o Fairly flexible application component design
• Scale out to 10,000 remote servers
• This is a financial calculation
• Scaled out in linear fashion
o Data processing
o Storage
• Started in linear fashion then determined actual requirements
![Page 14: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/14.jpg)
Macro Scale Solution: The Pod
Pod Architecture:
• Segmented infrastructure along the lines of load sources
• Create infrastructure to support specific load
• Instantiate additional infrastructure with additional load
• When a pod gets to 85-90% capacity spin out a new pod
• Capacity of a pod is a financial calculation
• Scale within a pod in 1,000 server increments
• Need to automate the deployment of a pod
Pod 0 Pod 1
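The spin-out rule can be sketched as a small allocator. This is an illustration only: the 85% threshold and the 1,000-server increment come from the slide, while the pod capacity of 10,000 servers, the class names, and the placement policy (always fill the newest pod) are assumptions.

```python
from dataclasses import dataclass, field

INCREMENT = 1_000          # servers added per scaling step (from the slide)
SPIN_OUT_THRESHOLD = 0.85  # lower end of the 85-90% range on the slide


@dataclass
class Pod:
    capacity: int  # assumed fixed per pod; the slide calls it a financial calculation
    servers: int = 0

    @property
    def utilization(self) -> float:
        return self.servers / self.capacity


@dataclass
class Fleet:
    pod_capacity: int
    pods: list[Pod] = field(default_factory=list)

    def add_increment(self) -> int:
        """Place one increment of servers; return the index of the pod used."""
        # Spin out a new pod once the newest one has reached the threshold.
        if not self.pods or self.pods[-1].utilization >= SPIN_OUT_THRESHOLD:
            self.pods.append(Pod(self.pod_capacity))
        self.pods[-1].servers += INCREMENT
        return len(self.pods) - 1


fleet = Fleet(pod_capacity=10_000)  # hypothetical capacity
placements = [fleet.add_increment() for _ in range(10)]
```

With these assumed numbers, the first nine increments land in pod 0, which is then at 90% utilization, and the tenth triggers pod 1. Each pod's load stays inside limits that were validated in advance, instead of one system growing without a known ceiling.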
![Page 15: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/15.jpg)
Write Your Own
• Adaptive software
• RabbitMQ replacement
• Network bridges
Metrics are king
• Business metrics
• Application metrics
Time Series Data
• Issues relate to a specific time
• Complete state information for any given minute
• Don’t know what info is needed before a problem occurs; all data every minute
Don’t trust the data
• Clocks are skewed
• Encodings fail
• Save all bad data & replay
• Think defensively
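A minimal sketch of this defensive stance, assuming records arrive as UTF-8 encoded JSON with a `ts` field in Unix seconds. The field name, the 5 minute skew tolerance, and the in-memory quarantine list are all assumptions for illustration; the talk does not describe the actual wire format or storage.

```python
import json
from typing import Optional

MAX_SKEW_SECONDS = 300  # assumed tolerance for sender clock skew

quarantine: list[bytes] = []  # raw bad records, kept verbatim for replay


def ingest(raw: bytes, now: float) -> Optional[dict]:
    """Parse one record; quarantine it instead of dropping it on any failure."""
    try:
        record = json.loads(raw.decode("utf-8"))
        if abs(now - float(record["ts"])) > MAX_SKEW_SECONDS:
            raise ValueError("timestamp outside skew tolerance")
        return record
    except (UnicodeDecodeError, ValueError, KeyError, TypeError):
        quarantine.append(raw)
        return None


def replay(now: float) -> list[dict]:
    """Retry quarantined records, e.g. after a parser fix; failures re-queue."""
    pending = list(quarantine)
    quarantine.clear()
    recovered = []
    for raw in pending:
        record = ingest(raw, now)
        if record is not None:
            recovered.append(record)
    return recovered


now = 1_000_000.0
ok = ingest(b'{"ts": 1000000, "cpu": 0.5}', now)
bad_encoding = ingest(b"\xff\xfe", now)  # not valid UTF-8
skewed = ingest(b'{"ts": 0}', now)       # clock far outside tolerance
```

Keeping the raw bytes, not a partially parsed form, means a later fix to decoding or skew handling can be applied to exactly what was received.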
The Pod Rocks
• Isolated
• Distributed
• Located where needed
• Behind the firewall
![Page 16: Architecture for Scale [AppFirst]](https://reader038.vdocument.in/reader038/viewer/2022100523/547e1cf75806b5cc5e8b4619/html5/thumbnails/16.jpg)
Conclusions
• Stateless Data
o Key to horizontal scale
• Disk is tape
o RAM based design is critical, not optional
• Cluster
o Use components that cluster, not just master/slave
• Design for infinite scale does not work
• Pod approach is an answer for infinite scale