Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
DAT207 - Accelerating Application Performance with Amazon ElastiCache
Omer Zaki (AWS) / Nick Dor (GREE) / James Kenigsberg (2U)
November 14, 2013
Speakers
• Omer Zaki – Senior Product Manager, AWS
• Nick Dor – Senior Director of Engineering, GREE International, Inc.
• James Kenigsberg – Chief Technology Officer, 2U, Inc.
What is a Cache?
• Specialized data store that keeps frequently accessed data in memory
• Memory is orders of magnitude faster than disk
Why Use a Cache?
• “Latency is the mother of interactivity”*
• Handle hot data, handle spikes
• Reduce load on backend
• For most web applications, workloads are read-heavy, often as high as 80-90% reads vs. writes
* http://highscalability.com/blog/2009/7/25/latency-is-everywhere-and-it-costs-you-sales-how-to-crush-it.html
Caches, caches, caches
• Types – browser cache, proxy cache, server cache, database cache, file system cache
• Characteristics – persistence, scalability, data model, warming
• Architecture – side cache, read through, write back
• Options – Memcached, Redis, etc.
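To make the "side cache" architecture above concrete, here is a minimal cache-aside sketch in Python. It is an illustration under stated assumptions, not a production client: a plain dict stands in for Memcached/Redis, another dict stands in for the database, and the class name `CacheAside` and the 300-second TTL are hypothetical.

```python
import time


class CacheAside:
    """Side-cache (cache-aside) sketch: the app checks the cache first, falls
    back to the database on a miss, and then populates the cache itself."""

    def __init__(self, db, ttl_seconds=300):
        self.db = db              # dict standing in for the database (hypothetical)
        self.ttl = ttl_seconds    # hypothetical expiry for cached entries
        self.cache = {}           # key -> (value, expires_at); stands in for Memcached/Redis
        self.db_reads = 0         # counts how often we had to go to the database

    def get(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]       # hit: no database round trip
        self.db_reads += 1        # miss or expired: read from the database
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def update(self, key, value):
        # Write to the database, then invalidate so the next read reloads fresh data
        self.db[key] = value
        self.cache.pop(key, None)


store = CacheAside({"user:1": "Andy"})
store.get("user:1")    # miss: loaded from the database and cached
store.get("user:1")    # hit: served from the cache
print(store.db_reads)  # 1
```

In a read-through cache the cache layer itself would perform the database load on a miss; in a write-back cache, `update` would write to the cache and defer the database write.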
Memcached
• Free, open-source, high-performance, in-memory key-value store
• Developed for LiveJournal in 2003
• Used by many of the world's top websites – YouTube, Facebook, Twitter, Pinterest, Tumblr, …
Memcached: Architecture
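[Diagram: App (API, client library) → memcached servers; App → database]
• The client library keeps a persistent TCP session to each server; each server can handle a large number of TCP sessions
• No communication between servers
• Which memcached server? The client library decides: server = server_list[hash(key) mod n]
• API calls:
  – value = get(key)
  – set(key, value, expiry)
  – add(key, value, expiry)
  – replace(key, value, expiry)
• The app reads from and updates the cache, and reads from and writes to the database
Source: http://architects.dzone.com/news/notes-memcached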
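The client-side server selection and the four API calls above can be sketched in Python as follows. This is an in-memory model, not a real memcached client: each "server" is a dict, the server names are hypothetical, and `zlib.crc32` is an assumed choice of stable hash. The `add`/`replace` rules follow memcached's semantics (add only if absent, replace only if present), and an expiry of 0 means "never expire".

```python
import time
import zlib


class ModuloClient:
    """Client-side sharding sketch: servers never talk to each other, so the
    client alone decides which server owns a key (server_list[hash(key) mod n])."""

    def __init__(self, server_names):
        self.server_list = list(server_names)
        # One dict per server stands in for a memcached node: key -> (value, expires_at)
        self.stores = {name: {} for name in self.server_list}

    def server_for(self, key):
        # crc32 is stable across processes (Python's built-in hash() for str is salted)
        return self.server_list[zlib.crc32(key.encode()) % len(self.server_list)]

    def _live(self, store, key):
        entry = store.get(key)
        if entry is None:
            return None
        if entry[1] is not None and entry[1] <= time.monotonic():
            del store[key]  # lazily drop expired items
            return None
        return entry

    def get(self, key):
        entry = self._live(self.stores[self.server_for(key)], key)
        return None if entry is None else entry[0]

    def set(self, key, value, expiry=0):
        # expiry in seconds; 0 means "no expiry", as in memcached
        expires_at = time.monotonic() + expiry if expiry else None
        self.stores[self.server_for(key)][key] = (value, expires_at)
        return True

    def add(self, key, value, expiry=0):
        # add: store only if the key is not already present
        if self._live(self.stores[self.server_for(key)], key) is not None:
            return False
        return self.set(key, value, expiry)

    def replace(self, key, value, expiry=0):
        # replace: store only if the key already exists
        if self._live(self.stores[self.server_for(key)], key) is None:
            return False
        return self.set(key, value, expiry)


client = ModuloClient(["cache-a", "cache-b", "cache-c"])  # hypothetical server names
client.set("player:42", "level 7", expiry=60)
print(client.get("player:42"))          # level 7
print(client.add("player:42", "x"))     # False: key already exists
print(client.replace("player:99", "x"))  # False: key does not exist yet
```

A known drawback of `mod n` placement is that changing `n` (adding or removing a server) sends most keys to a different server.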
Redis
• High speed, in-memory, key-value data store
• Data structure support – strings, lists, sets, sorted sets
• Asynchronous replication
• Optional durability (persistence via snapshot or append-only file)
• Pub/sub functionality
Redis: Architecture
[Diagram: App clients send cache updates to the Redis master and reads to a Redis read replica; a MySQL DB instance backs the application]
Amazon ElastiCache
• Web service that lets you easily create and use cache clusters in the cloud
• Memcached, Redis compatible
• Managed, scalable, secure
• Pay-as-you-go and flexible, so you can add capacity when you need it
Amazon ElastiCache Architecture
Where is Amazon ElastiCache used?
• Gaming
• Social
• Media & Entertainment
• Mobile
• E-Commerce
• Ad Tech
• Many more…
Sample Deployment: Gaming
• Auto Scaling front end
• Amazon ElastiCache
• Amazon RDS
• Amazon S3
• Amazon CloudFront
Leaderboard with a Redis sorted set:
ZADD leaderboard 556 "Andy"
ZADD leaderboard 819 "Barry"
ZADD leaderboard 105 "Carl"
ZADD leaderboard 1312 "Derek"
ZREVRANGE leaderboard 0 -1
1) "Derek"
2) "Barry"
3) "Andy"
4) "Carl"
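To show what the sorted-set commands above compute, here is a small in-memory model of `ZADD` and `ZREVRANGE` in Python. It is a hypothetical stand-in (`Leaderboard` is not part of any Redis client library); against a real deployment you would send these commands to Redis through a client. Ties are broken by descending member name, which matches how Redis orders equal scores in `ZREVRANGE`.

```python
class Leaderboard:
    """In-memory stand-in for a Redis sorted set (ZADD / ZREVRANGE semantics)."""

    def __init__(self):
        self.scores = {}  # member -> score

    def zadd(self, score, member):
        added = member not in self.scores
        self.scores[member] = score  # re-adding a member just updates its score
        return int(added)            # like ZADD: number of newly added members

    def zrevrange(self, start, stop):
        # Highest score first; equal scores ordered by descending member name
        ranked = sorted(self.scores, key=lambda m: (self.scores[m], m), reverse=True)
        n = len(ranked)
        if start < 0:
            start = max(start + n, 0)
        if stop < 0:
            stop += n  # negative indices count from the end, as in Redis
        if stop < 0 or start > stop:
            return []
        return ranked[start:stop + 1]  # stop is inclusive


board = Leaderboard()
for score, name in [(556, "Andy"), (819, "Barry"), (105, "Carl"), (1312, "Derek")]:
    board.zadd(score, name)
print(board.zrevrange(0, -1))  # ['Derek', 'Barry', 'Andy', 'Carl']
```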
Design Patterns
• Low latency / high throughput store
• Database offloading
• Session management
• In-memory storage for difficult or time-consuming tasks
• Leaderboards
• High-speed sorting
• Atomic counters
• Queuing systems
• Activity streams
Amazon ElastiCache at GREE
Nick Dor – Sr. Director, Engineering
GREE International, Inc.
November 14, 2013
GREE International
• 2004 – GREE is founded in Japan
• 2011 – establishes an office in the US
  – Hosting games in traditional datacenters
  – 2 weeks to procure and provision new servers + 1 week to set up the application
  – ITIL practices (Dev / Ops separation)
• 2012 – acquires Funzio
  – AWS hosted
  – Quick provisioning of servers (minutes), but still manual setup (days)
  – Hybrid hosting environment
• 2013 – consolidates in AWS
  – Migrated games from traditional datacenters to AWS
  – Automated application setup
  – DevOps practices
(c) GREE
GREE Games
• All mobile, all free-to-play
  – iOS & Android smartphones
  – Big focus on tablets
• Role Playing Games (RPG+)
  – Multi-million dollar franchise, top-grossing titles
  – Some of the oldest games on the App Store
• Hardcore
  – Deeper, more intense gameplay mechanics
• Real-Time Strategy (RTS)
  – Fast action, small unit management
• Casino & Casual Games
  – Familiar games, wider audience, casual play
Some Scale
• Over 60 ELB endpoints hosted in AWS
  – Games, shared services, analytics infrastructure
• 1,200 Amazon EC2 instances
• 400 Amazon ElastiCache nodes
• 260 Amazon RDS database servers
• 1 TB of daily logs from app servers
• Millions of monthly active users
Example Game Architecture – RPG+
• Application servers – PHP
  – Game events → analytics
• Cache layer – Memcached on Amazon ElastiCache
• Batch processing servers – Node.js (moving to Go)
  – Batch database writes
• Database – MySQL on Amazon RDS
[Diagram: Elastic Load Balancing; App servers ×4; Cache nodes ×4; Batch servers ×2; RDS databases ×3 with a failover DB]
Caching Strategy
• Game architecture predates stable NoSQL
  – We wanted similar performance at scale
  – Keep combined average internal response times below 500 ms
• Memcache is authoritative
  – Still use an RDBMS; potential data loss is limited
• Allows for a cheaper/simpler DB layer
  – Always do full row replacements (i.e., no current_row_value + 1)
Data Flow
• Reads
  – ELB → App → Cache
• Writes (synchronous)
  – ELB → App → Cache → DB
  – ELB → App → Cache → Batch → DB
  – Standard write-through
  – No blind writes; always fetch the current version
• Writes (asynchronous)
  – Batch → DB
  – Batch writes to DB every 30 seconds
[Diagram: Elastic Load Balancing; App servers ×4; Cache nodes ×4; Batch servers ×2; RDS databases ×3]
Batch Processor
• 80% of game write traffic is asynchronous
• Example: player items (loot) after multiple quests
  – 10 items in 30 seconds; the app server sends 10 writes downstream
  – The batch processor sends only the last record, with the final item count, to the DB
• Greatly reduced writes on the DB
  – Shard at the table and DB server level for larger games
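The loot example above amounts to write coalescing: keep only the latest full record per key and flush on an interval. A minimal Python sketch follows, assuming a dict in place of MySQL on RDS and a simulated `now` clock passed in by the caller (both hypothetical; a real batch processor would run on a timer). It relies on the full-row-replacement rule, since the last record must be complete on its own.

```python
class BatchProcessor:
    """Coalesces writes per key and periodically flushes only the latest record."""

    def __init__(self, db, flush_interval=30.0):
        self.db = db                      # dict standing in for MySQL on RDS (hypothetical)
        self.flush_interval = flush_interval
        self.pending = {}                 # key -> latest full record
        self.last_flush = 0.0
        self.db_writes = 0

    def enqueue(self, key, record, now):
        # A newer full record for the same key replaces the older one
        self.pending[key] = record
        if now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        for key, record in self.pending.items():
            self.db[key] = record
            self.db_writes += 1
        self.pending.clear()
        self.last_flush = now


db = {}
batch = BatchProcessor(db)
for i in range(1, 11):  # 10 loot drops within 30 seconds of simulated time
    batch.enqueue("player:42:items", {"item_count": i}, now=i * 2.0)
batch.flush(now=30.0)
print(batch.db_writes, db["player:42:items"])  # 1 {'item_count': 10}
```

Ten app-server writes reach the database as one write carrying the final item count.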
Memcache Writes - Key Facts
• The app handles memcache key hashing & sharding
  – DB rows are usually just a key, version, timestamp & JSON blob
  – Look familiar?
• NEVER do blind writes
  – Always fetch the current value from memcache, perform the operation, then write
• If there is a version collision, simply fail
  – Extremely rare; the application will retry for some calls
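The "no blind writes" rule above can be sketched as fetch, modify, then write the full row back only if the version is unchanged. This is a single-process Python model under stated assumptions: `VersionedStore` and `save_player` are hypothetical names, a dict stands in for the authoritative cache, and the row layout mirrors the key/version/timestamp/JSON-blob shape from the slide. Across real app servers, the version check must itself be atomic on the cache server (for example with memcached CAS).

```python
import json
import time


class VersionedStore:
    """Stand-in for the authoritative cache: key -> full row dict (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def fetch(self, key):
        return self.rows.get(key)

    def write_if_version(self, key, new_row, expected_version):
        current = self.rows.get(key)
        current_version = current["version"] if current else 0
        if current_version != expected_version:
            return False   # version collision: someone else wrote first, so fail
        self.rows[key] = new_row  # full row replacement, never a partial update
        return True


def save_player(store, key, mutate):
    """Fetch the current row, apply the change, then write the whole row back."""
    row = store.fetch(key)
    version = row["version"] if row else 0
    data = json.loads(row["blob"]) if row else {}
    mutate(data)
    new_row = {"key": key, "version": version + 1,
               "timestamp": time.time(), "blob": json.dumps(data)}
    return store.write_if_version(key, new_row, version)


store = VersionedStore()
save_player(store, "player:42", lambda d: d.update(gold=100))
# A writer still holding version 0 is rejected instead of overwriting newer data
stale = store.write_if_version(
    "player:42", {"key": "player:42", "version": 1, "timestamp": 0, "blob": "{}"}, 0)
print(stale)  # False
```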
Memcache Writes – High Concurrency
• Player vs. Player events (World Domination)
  – These have much higher concurrency
  – Match-making, battles/results, leaderboards
• Here we do relative updates at the MC layer
  – Yes, we contradict ourselves here a little
• If we get a version collision/failure
  – The app server reloads the MC value and tries again, up to 5 times
  – Usually we succeed on the 2nd or 3rd try
  – This happens VERY fast in the code
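The reload-and-retry loop described above might look like the Python sketch below. It is an assumption-laden model, not GREE's code: `CounterStore` stores `(value, version)` pairs in a dict, and the `on_attempt` hook is a hypothetical test device that lets a simulated rival writer win the first race.

```python
class CounterStore:
    """In-memory stand-in for a cache holding (value, version) pairs (hypothetical)."""

    def __init__(self):
        self.data = {}  # key -> (value, version)

    def get(self, key):
        return self.data.get(key, (0, 0))

    def put_if_version(self, key, value, expected_version):
        _, version = self.get(key)
        if version != expected_version:
            return False  # another writer got there first
        self.data[key] = (value, version + 1)
        return True


def relative_update(store, key, delta, max_attempts=5, on_attempt=None):
    """Apply value += delta; on a version collision, reload and retry up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        value, version = store.get(key)      # reload the current value on every try
        if on_attempt:
            on_attempt(attempt)              # hypothetical hook to simulate a rival writer
        if store.put_if_version(key, value + delta, version):
            return attempt                   # number of tries it took
    return None                              # gave up after max_attempts


store = CounterStore()


def rival(attempt):
    # On the first try only, another app server updates the same key in between
    if attempt == 1:
        value, version = store.get("pvp:score")
        store.put_if_version("pvp:score", value + 10, version)


attempts = relative_update(store, "pvp:score", 5, on_attempt=rival)
print(attempts, store.get("pvp:score"))  # 2 (15, 2)
```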
Failure Scenarios
• Memcache node fails
  – Go straight to the database; versioning is key here
• Hashing compartmentalizes the impact
  – During a failure, only players assigned to that node are affected
  – Usually only a small performance drop
• Node comes back online
  – Cache is refilled organically
  – DB load for that subset of operations decreases over time
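The failure behavior above can be sketched with the same modulo placement: when a node is marked down, reads for its keys go straight to the database while every other key is still served from cache. Everything here is hypothetical (`ShardedReader`, the node names, dicts standing in for nodes and the database), and the sketch covers reads only.

```python
import zlib


class ShardedReader:
    """Reads through a set of cache nodes, falling back to the DB for a failed node."""

    def __init__(self, nodes, db):
        self.order = list(nodes)
        self.nodes = {name: {} for name in self.order}  # each dict stands in for a node
        self.db = db                                     # dict standing in for the database
        self.down = set()                                # names of failed nodes
        self.db_reads = 0

    def node_for(self, key):
        return self.order[zlib.crc32(key.encode()) % len(self.order)]

    def get(self, key):
        node = self.node_for(key)
        if node in self.down:
            self.db_reads += 1        # node failed: go straight to the database
            return self.db.get(key)
        cache = self.nodes[node]
        if key in cache:
            return cache[key]
        self.db_reads += 1            # miss: load from the DB and refill organically
        value = self.db.get(key)
        cache[key] = value
        return value


db = {f"player:{i}": {"level": i} for i in range(100)}
reader = ShardedReader(["node-a", "node-b", "node-c", "node-d"], db)
for key in db:
    reader.get(key)                   # warm every node
reader.db_reads = 0
reader.down.add("node-b")             # simulate a failed node
for key in db:
    reader.get(key)
affected = sum(1 for key in db if reader.node_for(key) == "node-b")
print(reader.db_reads == affected)    # True: only node-b's players hit the DB
```

When a replacement node comes back empty, its keys are refilled one miss at a time, so the extra database load tapers off as the cache warms.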
Why Amazon ElastiCache?
• Fairly stable
  – Fails less often than Amazon EC2
• Automatic node replacement
  – Same node name/DNS
• Good performance
  – Highest performance with larger instances (network layer)
• Configuration endpoint
  – Application can dynamically add/remove nodes
  – Automatically rebalances hashes to accommodate new nodes
  – No more manual memcache migrations – YAY!
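Rebalancing matters because of how keys are placed. With plain `hash(key) mod n`, going from 4 to 5 nodes sends about 4 in 5 keys to a different node in expectation. Consistent hashing is a common technique for limiting that churn. The Python sketch below assumes it (it is not a claim about what GREE or any particular ElastiCache client uses), and its node names and 100 virtual nodes per server are hypothetical.

```python
import bisect
import hashlib


def stable_hash(text):
    # First 32 bits of MD5: stable across processes, unlike Python's salted hash()
    return int(hashlib.md5(text.encode()).hexdigest()[:8], 16)


class HashRing:
    """Consistent-hash ring with `replicas` virtual points per server."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end
        i = bisect.bisect(self.ring, (stable_hash(key),))
        return self.ring[i % len(self.ring)][1]


def modulo_node(nodes, key):
    return nodes[stable_hash(key) % len(nodes)]


keys = [f"player:{i}" for i in range(10_000)]
old_nodes = ["node-1", "node-2", "node-3", "node-4"]
new_nodes = old_nodes + ["node-5"]

mod_moved = sum(modulo_node(old_nodes, k) != modulo_node(new_nodes, k) for k in keys) / len(keys)

old_ring, new_ring = HashRing(old_nodes), HashRing(old_nodes)
new_ring.add("node-5")
ring_moved = sum(old_ring.node_for(k) != new_ring.node_for(k) for k in keys) / len(keys)

print(f"mod n: {mod_moved:.0%} of keys moved, ring: {ring_moved:.0%} of keys moved")
```

On the ring, the only keys that move are the ones the new node takes over (roughly its 1/5 share), so existing nodes keep the rest of their cached data.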
Newer Games - Architecture
• MUCH more modern in terms of architecture/technology
• Shift towards real-time games
  – Longer play sessions; higher player engagement
  – Will impact our caching model: fewer, but larger, pools
• Streaming, queuing
  – Go, nsqd
• Moving (finally) to memcached
  – Had used old memcache libraries for a long time
Future Trends in Caching at GREE
• Check and Set (CAS) tokens
  – A sort of internal versioning in memcached
  – Ensures data is the latest before updating
  – Atomic updates to a single key
• Investigate a real NoSQL implementation
• Redis – promising
  – Need to see how I/O performance holds up when hitting disk
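Memcached exposes CAS through the `gets` command, which returns a unique token along with the value, and the `cas` command, which stores only if that token is still current. It replies `STORED`, `EXISTS`, or `NOT_FOUND`. The Python class below is a hypothetical in-memory model of those semantics for illustration, not a client library.

```python
import itertools


class CasStore:
    """In-memory model of memcached's gets/cas: every store bumps a unique token."""

    def __init__(self):
        self.items = {}                   # key -> (value, cas_unique)
        self._tokens = itertools.count(1)

    def set(self, key, value):
        self.items[key] = (value, next(self._tokens))

    def gets(self, key):
        return self.items.get(key)        # (value, cas_unique) or None

    def cas(self, key, value, cas_unique):
        current = self.items.get(key)
        if current is None:
            return "NOT_FOUND"
        if current[1] != cas_unique:
            return "EXISTS"               # updated by someone else since our gets
        self.items[key] = (value, next(self._tokens))
        return "STORED"


mc = CasStore()
mc.set("gold:42", 100)
value, token = mc.gets("gold:42")
mc.set("gold:42", 150)                    # another app server writes in between
first = mc.cas("gold:42", value + 10, token)
value, token = mc.gets("gold:42")         # reload and retry with the fresh token
second = mc.cas("gold:42", value + 10, token)
print(first, second, mc.gets("gold:42"))  # EXISTS STORED (160, 3)
```

On a real server, the compare and the store happen atomically on that one key, which replaces the application-level version field used earlier.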
How 2U is Building the World’s Premier Online Learning Programs
James Kenigsberg, CTO, 2U, Inc.
November 14, 2013
2U partners with top universities to deliver the world’s best online programs
- Real degrees
- Real live classes
- Real faculty
- Real outcomes
[Chart: programs by level – Graduate: 3, Undergraduate: 1]
Our best-in-class, proprietary technology platform can be integrated across numerous university clients, program verticals, and individual classes
[Diagram: Learning and Management Stack – University Partner, Prospect Mgmt, App Process, Online Campus, Content Mgmt, CRM, Security]
2U Online Campus
Far more than what you think of as a Learning Management System... the 2U Online Campus represents the single hub for students’ asynchronous study, live class sessions, and dynamic social tools to create a rich, online student community.
“No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.”
- Heraclitus
2009
• Surly French Canadians
• Configuring our own load balancers
• No MySQL clustering
• Save us! SOS…
Set Amazon’s servers on fire, not ours
2010
• Amazon to the rescue!
• Release of Amazon RDS for databases
• Release of Elastic Load Balancing for load balancing
• Caching helps students communicate!
• Memcache
• No file redundancy
2011 (100 instances, 2 engineers)
• Redundancy!
• GlusterFS
• More Availability Zones
• Using new AWS services as fast as they release them
• Amazon S3 – Backups
• Amazon SES – Outbound email
• Amazon Route 53 – A lifesaver! (Zerigo outage)
2012 (200 instances, 3 engineers)
• Stack growth
• API layer
• Amazon ElastiCache
• DevOps!
  – Puppet
  – Jenkins
  – AWS CloudFormation
2013 (400 instances, 4 engineers)
• Amazon S3 to the rescue
• More AWS!
• Amazon Redshift data warehouse
Amazon is committed to customers
We are committed to changing your life
Impact: Education
• 1,704 graduates – 3,287,000 K–12 students
• 1,097 current students – 2,116,000 K–12 students
• Through 2019 – 12,496,000 K–12 students
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
DAT207
Want more caching? Attend "Amazon ElastiCache Architecture and Design Patterns"
Friday @ 11:30am – 12:30pm
Lido 3006