Couchbase, John Bryce Israel Training: Use Cases
TRANSCRIPT
Why companies use Couchbase
Perry Krug
Sr. Solutions Architect
Common Use Cases
Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro
Session Store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki
High Availability Cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz
Content & Metadata Store
• Couchbase document store with Elastic Search
• Example customers include: McGraw Hill
3rd Party Data Aggregation
• Couchbase stores social media and data feeds
• Example customers include: Sambacloud
Use Cases & Customers
Web app or use case: Couchbase solution
• Content Store & Metadata System: Couchbase document store + Elastic Search
• Social Game & Mobile App: Couchbase stores game and player data
• Ad Targeting: Couchbase stores user information for fast access
• User Profile Store: Couchbase Server as a key-value store
• Session Store: Couchbase Server as a key-value store
• High Availability Caching Tier: Couchbase Server as a memcached tier replacement
• Chat/Messaging Platform: Couchbase Server
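The session store and user profile store rows both use Couchbase Server as a plain key-value store: one key per session or user, one value per key. A minimal sketch of that access pattern follows. It assumes an in-memory dict as a stand-in for the bucket (no Couchbase SDK calls are shown) and a hypothetical `session::<id>` key scheme; the expiry mimics the per-item expiration that memcached-style stores support.

```python
import json
import time
from typing import Callable, Optional


class SessionStore:
    """Hypothetical in-memory stand-in for a key-value bucket holding sessions."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (JSON-encoded session, absolute expiry time in clock seconds)
        self._data: dict = {}

    def set(self, session_id: str, session: dict, ttl_seconds: float) -> None:
        # One key per session; the whole session is stored as a single JSON value.
        key = f"session::{session_id}"
        self._data[key] = (json.dumps(session), self._clock() + ttl_seconds)

    def get(self, session_id: str) -> Optional[dict]:
        key = f"session::{session_id}"
        entry = self._data.get(key)
        if entry is None:
            return None
        payload, expires_at = entry
        if self._clock() >= expires_at:
            # Expired items behave as misses; drop them lazily on read.
            del self._data[key]
            return None
        return json.loads(payload)


# Usage with a controllable clock so expiry can be demonstrated deterministically.
now = [0.0]
store = SessionStore(clock=lambda: now[0])
store.set("abc123", {"user": "u1", "cart": ["sku-9"]}, ttl_seconds=1800)
print(store.get("abc123"))  # {'user': 'u1', 'cart': ['sku-9']}
now[0] = 1801.0
print(store.get("abc123"))  # None
```

A user profile store follows the same pattern with a `user::<id>` key and, typically, no expiry.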
Use Case: Content and Metadata Store
Types of Data
• Content metadata
• Content: articles, text
• Landing pages for website
• Digital content: eBooks, magazines, research material
Application Requirements
• Flexibility to store any kind of content
• Fast access to content metadata (most accessed objects) and content
• Full-text search across the data set
• Scales horizontally as more content is added to the system
Why NoSQL and Couchbase
• Fast access to metadata and content via object-managed cache
• JSON provides schema flexibility to store all types of content and metadata
• Indexing and querying provide real-time analytics capabilities across the data set
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of content and metadata grows
McGraw Hill Education Labs Learning Portal
Use Case: Content and metadata store
Building a self-adapting, interactive learning portal with Couchbase
The Problem
As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
• Scale to millions of learners
• Serve MHE as well as third-party content, including open content
• Support learning apps
• Self-adapt via usage data
The Challenge
The backend is an Interactive Content Delivery Cloud that must:
• Allow for elastic scaling under spike periods
• Catalog and deliver content from many sources
• Provide consistent low latency for metadata and stats access
• Support full-text search for content discovery
• Offer tunable content ranking and recommendation functions
Experimented with a combination of:
• XML databases
• SQL/MR engines
• In-memory data grids
• Enterprise search servers
The Learning Portal
• Designed and built as a collaboration between MHE Labs and Couchbase
• Serves as a proof of concept and testing harness for Couchbase + ElasticSearch integration
• Available for download and further development as open source code
Techniques Used
• Document modeling
• Metadata and content storage
• View querying to support content browsing
• Elastic Search integration (full-text search)
  - Content updated in near real time
  - Search content summaries
  - Relevancy boosted based on user preferences
• Real-time content updates
• Event logging for offline analysis
Couchbase 2.0 + Elasticsearch
• Store full-text articles as well as document metadata for image, video and text content in Couchbase
• Combine user preference statistics with custom relevancy scoring to provide personalized search results
• Log user behavior to calculate user preference statistics (e.g. video > text)
• Continuously accept updates from Couchbase with new content and stats
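The step that combines user preference statistics with custom relevancy scoring, together with popularity boosting from view counts, can be illustrated with a little arithmetic. The slides do not give the formula, so the sketch below uses one chosen purely for illustration: the text relevancy score is multiplied by a log-damped popularity boost (from a document's view count) and by a preference boost (the share of the user's views that went to that content type). In the portal this boosting is applied to Elasticsearch results; the Python here only shows the arithmetic, and all names are hypothetical.

```python
import math


def personalized_score(text_score: float, doc_views: int, doc_type: str,
                       user_type_views: dict) -> float:
    """Hypothetical scoring: relevancy x popularity boost x preference boost."""
    # Popularity boost from content view counts, log-damped so viral items don't swamp relevance.
    popularity = 1.0 + math.log1p(doc_views)
    # Preference boost: fraction of this user's views that went to this content type.
    total = sum(user_type_views.values())
    share = user_type_views.get(doc_type, 0) / total if total else 0.0
    return text_score * popularity * (1.0 + share)


def rerank(hits: list, user_type_views: dict) -> list:
    """Order (doc_id, text_score, doc_views, doc_type) hits by personalized score, best first."""
    return sorted(
        hits,
        key=lambda h: personalized_score(h[1], h[2], h[3], user_type_views),
        reverse=True,
    )


# A user who mostly watches video (video > text) gets the video hit ranked first.
prefs = {"video": 30, "text": 10}
hits = [("article-7", 1.0, 0, "text"), ("video-3", 1.0, 0, "video")]
print([doc_id for doc_id, *_ in rerank(hits, prefs)])  # ['video-3', 'article-7']
```

Multiplying the boosts keeps the text score in charge: a document with zero relevancy stays at zero no matter how popular it is.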
Data Model
Content Metadata Bucket
• Stores content metadata for media objects and content for articles
• Includes tags, contributors, type information
• Includes pointer to the media
User Profiles Bucket
• Stores user view details per type
• Updated every time a user views a doc, with a running count
• To be used for customizing ES search results per user preference
Content Stats Bucket
• Stores content view details
• Updated every time a document is viewed
• To be used for boosting ES search results based on popularity
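The three buckets can be pictured as three key-value maps of JSON documents. The sketch below uses plain dicts as stand-ins for the buckets, plus a hypothetical key scheme and hypothetical field names (`type`, `tags`, `contributors`, `media_url`, `views_by_type`, `views`). It shows how one view event updates both running counts.

```python
import json

# Hypothetical stand-ins for the three buckets: key -> JSON-compatible document.
content_metadata: dict = {}
user_profiles: dict = {}
content_stats: dict = {}

# Content Metadata bucket: metadata plus a pointer to the media, not the media itself.
content_metadata["video::101"] = {
    "type": "video",
    "title": "Intro to Fractions",
    "tags": ["math", "fractions"],
    "contributors": ["mhe-labs"],
    "media_url": "https://example.com/media/101.mp4",
}


def record_view(user_id: str, doc_id: str) -> None:
    """Update both running counts for one view of one document."""
    doc_type = content_metadata[doc_id]["type"]
    # User Profiles bucket: views per content type, used for per-user preference boosting.
    profile = user_profiles.setdefault(f"user::{user_id}", {"views_by_type": {}})
    by_type = profile["views_by_type"]
    by_type[doc_type] = by_type.get(doc_type, 0) + 1
    # Content Stats bucket: views per document, used for popularity boosting.
    stats = content_stats.setdefault(f"stats::{doc_id}", {"views": 0})
    stats["views"] += 1


record_view("u1", "video::101")
record_view("u1", "video::101")
print(json.dumps(user_profiles["user::u1"]))  # {"views_by_type": {"video": 2}}
```

Against a real cluster, many viewers update the same documents concurrently, so these read-modify-write steps would need atomic counter operations or CAS-based retries rather than the plain dict updates shown here.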
Architecture
Use Case: Social and Mobile Gaming
Types of Data
• User account information
• User game profile info
• User's social graph
• State of the game
• Player badges and stats
Application Requirements
• Ability to support rapid growth
• Fast response times for an awesome user experience
• Game uptime: 24x7x365
• Easy to update apps with new features
Why NoSQL and Couchbase
• Scalability ensures that games are ready to handle the millions of users that come with viral growth
• High performance guarantees players are never left waiting to make their next move
• Always-on operations mean zero interruption to game play (and revenue)
• Flexible data model means games can be developed rapidly and updated easily with new features
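The flexible data model is what lets a game ship a feature without a schema migration: documents written by an older game version simply lack the new fields. Below is a minimal sketch of reading a stored player document and filling in fields added later. The field names and defaults are hypothetical.

```python
import copy
import json

# Fields introduced in a later game version, with defaults for older documents (hypothetical).
NEW_FIELD_DEFAULTS = {"badges": [], "stats": {"games_played": 0}}


def load_player(raw: str) -> dict:
    """Parse a stored player document and add any fields it predates."""
    player = json.loads(raw)
    for field, default in NEW_FIELD_DEFAULTS.items():
        # Existing values win; deep-copy the default so documents never share a list or dict.
        player.setdefault(field, copy.deepcopy(default))
    return player


old_doc = '{"name": "Ana", "level": 7}'  # written before badges/stats existed
player = load_player(old_doc)
print(player)  # {'name': 'Ana', 'level': 7, 'badges': [], 'stats': {'games_played': 0}}
```

Whether to write the upgraded document back or only upgrade it on read is a design choice the slides leave open.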
Social Gaming at Tencent Stomp Games
Use Case: Social gaming
Building a social game with an awesome user experience that can scale to millions of players
The Problem
Social gaming is all about the experience. Application needs:
• User-centric data (read: key-value access)
• Scalability
• Easy and simple backend
The Challenge
The backend must be a platform for multiple games, and must be:
• Scalable
• Highly available
• Extremely performant (latency and throughput)
• Cost effective
• Operationally easy to maintain
Experimented with several databases: Couchbase, MongoDB, dbShards, MySQL Cluster
Evaluation considerations, compared across Couchbase, MongoDB, dbShards and MySQL Cluster (NDB): sharding strategy, replication, failover support, scalability, customized data support, system compatibility, coding effort, performance, protocol, upgrade difficulty, data persisting method, MapReduce / join, SQL compatibility, licensing, price, bulk price, management / monitoring tool, hardware requirements, supported OS, operation knowledge, operation training, operation difficulty, developer company size, market penetration, support, successful use cases
The architecture
Draw Something by OMGPOP
As Usage Grew, Game Data Went Non-Linear: Draw Something by OMGPOP
(Chart: daily active users, millions)
In Contrast… The Simpsons: Tapped Out
(Chart: daily active users, millions)
Use Case: 3rd Party Data Aggregation
Types of Data
• Social media feeds: Twitter, Facebook, LinkedIn
• Blogs, news, press articles
• Data service feeds: Hoovers, Reuters
Application Requirements
• Flexibility to store any kind of content
• Flexibility to handle schema changes
• Full-text search across the data set
• High-speed data ingestion
• Scales horizontally as more content is added to the system
Why NoSQL and Couchbase
• JSON provides schema flexibility to store all types of content and metadata
• Fast access to individual documents via built-in cache, high write throughput
• Indexing and querying provide real-time analytics capabilities across the data set
• Integration with ElasticSearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of aggregated content grows
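Schema flexibility is easiest to see with two feeds that name the same things differently. The sketch below keeps each raw item untouched inside a small common envelope, so new or changed feed schemas need no migration. The per-source field names and the `source::id` key scheme are assumptions for illustration; the slides do not specify them.

```python
import json

# Hypothetical per-source mappings: where each feed keeps its identifier and its main text.
ID_FIELD = {"twitter": "id_str", "rss": "guid"}
TEXT_FIELD = {"twitter": "text", "rss": "description"}


def to_document(source: str, item: dict) -> tuple:
    """Wrap a raw feed item in a common envelope and return (key, JSON value)."""
    key = f"{source}::{item[ID_FIELD[source]]}"
    doc = {
        "source": source,
        # A common text field gives full-text search one place to look.
        "text": item.get(TEXT_FIELD[source], ""),
        # The original item is kept as-is, whatever its schema.
        "payload": item,
    }
    return key, json.dumps(doc)


tweet = {"id_str": "1", "text": "Launch day!", "user": {"screen_name": "samba"}}
post = {"guid": "g-9", "title": "Release notes", "description": "What's new"}
for source, item in [("twitter", tweet), ("rss", post)]:
    key, value = to_document(source, item)
    print(key, json.loads(value)["text"])
```

Adding a new feed only means adding its two mappings; documents already stored are untouched.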
3rd Party Data Aggregation at Sambacloud
Use Case: 3rd party data aggregation
Building a data and content aggregation and management platform
The Problem
• More and more data and content are coming in from external sources: social media, data services, press and news, blogs
• A single content store is required for all this information, able to handle different formats and schemas
The Challenge
The platform must support:
• A flexible data model to handle any schema and constant changes to schemas
• Elastic scaling, particularly in cloud environments
• Consistent low-latency access and the ability to handle incoming streams
• Full-text search for content
• Lightweight analytics for sorting / ranking
The Technologies
• Work: Agile Projects
• Share: Any Content
• Organize: Channels
• Recommend: Analytics
SambaCloud Content Services: REST API, HTML5
Use Case: High Availability Caching
Types of Data
• Application objects
• Popular search query results
• Session information
• Heavily accessed web landing pages
Application Requirements
• Consistently low response times for document / key lookups
• High availability: 24x7x365
• Operationally easy to migrate / upgrade / maintain with the app online
Why NoSQL and Couchbase
• Replacement for the entire caching tier
• Sub-millisecond latency with consistently high read / write throughput
• Always-on operations, even for database upgrades and maintenance, with zero downtime
• memcached compatibility for easy migration to Couchbase without any application changes
• High availability and disaster recovery with intra-cluster and cross-datacenter replication (XDCR)
Challenges with a Memcached Tier
Cold cache
• Symptoms: Slowdown or collapse of the data service layer due to a heavily overloaded RDBMS when memcached nodes go down (on failure or for maintenance)
• Couchbase solution: Data is automatically replicated across the Couchbase cluster, providing high availability of data even on failures
Heavy RDBMS contention
• Symptoms: Multiple requests for data items that do not exist in the cache result in a sudden shift of load to the relational database, causing heavy contention
• Couchbase solution: By replicating data across the cluster, Couchbase Server provides consistent performance without shifting load to the RDBMS layer
Lack of scalability
• Symptoms: Adding or removing memcached nodes is complicated and causes unpredictable application performance degradation
• Couchbase solution: Auto-sharding and online rebalancing in Couchbase Server provide easy, non-disruptive expansion of the cluster
Complex monitoring
• Symptoms: Managing individual memcached nodes increases operational complexity, and there is no single consistent view of the caching layer
• Couchbase solution: Couchbase Server provides a built-in admin console for cluster-wide management and monitoring, as well as RESTful APIs for easy automation and third-party integration
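The cold-cache and contention rows come down to one property: when a node fails, its data is still in memory somewhere else. The sketch below is a simplified model of auto-sharding with one replica. It assumes a fixed number of shards ("vBuckets") and CRC32-based key hashing. Couchbase's actual client hashing and vBucket counts differ in detail, and all function names are hypothetical. Failing over a node promotes replicas instead of sending misses to the RDBMS.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: fixed shard count, independent of the number of nodes


def vbucket_for(key: str) -> int:
    """Map a key to a shard (simplified CRC32 hashing)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(nodes: list) -> dict:
    """Give every vBucket an active node and a replica on a different node (needs >= 2 nodes)."""
    n = len(nodes)
    return {vb: (nodes[vb % n], nodes[(vb + 1) % n]) for vb in range(NUM_VBUCKETS)}


def fail_over(vb_map: dict, dead: str) -> dict:
    """Remove a failed node: promote its replicas to active; affected vBuckets lose their replica."""
    new_map = {}
    for vb, (active, replica) in vb_map.items():
        if active == dead:
            new_map[vb] = (replica, None)  # replica already holds the data, so no cold cache
        elif replica == dead:
            new_map[vb] = (active, None)   # still served; in this model a rebalance rebuilds the replica
        else:
            new_map[vb] = (active, replica)
    return new_map


vb_map = build_map(["node1", "node2", "node3"])
after = fail_over(vb_map, "node2")
vb = vbucket_for("session::abc123")
print(vb_map[vb][0], "->", after[vb][0])
```

Because keys map to vBuckets rather than directly to servers, clients only need the updated vBucket map after a failover; no key is rehashed.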
Before and After: Replacing the Caching Tier with Couchbase Server
Memcached Tier Replacement: How it Works
• Fully memcached protocol compatible
• Easy to replace a tier of individual memcached servers with a Couchbase Server cluster
• The cluster receives reads and writes, keeps frequently accessed items in memory, and persists, shards and replicates the data across the cluster
• Reads and writes remain as low-latency and high-throughput as with memcached
• The user gets all the scalability and high-availability advantages of a Couchbase Server cluster
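Protocol compatibility is what makes the swap transparent: the application keeps sending the same memcached commands, and only the servers answering them change. The sketch below builds and parses the memcached text-protocol `set` and `get` exchanges as plain bytes, so it runs without a server. Socket handling, ports and client configuration are deliberately left out, and the helper names are hypothetical.

```python
from typing import Optional


def set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol storage command: set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def get_command(key: str) -> bytes:
    """Build a single-key retrieval command."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_reply(reply: bytes) -> Optional[bytes]:
    """Parse a single-key reply: VALUE <key> <flags> <bytes>\\r\\n<data>\\r\\nEND\\r\\n, or END on a miss."""
    if reply == b"END\r\n":
        return None  # cache miss
    header, rest = reply.split(b"\r\n", 1)
    parts = header.split()
    if parts[0] != b"VALUE":
        raise ValueError(f"unexpected reply line: {header!r}")
    length = int(parts[3])
    # Slice by the declared length so values containing \r\n are handled correctly.
    data, trailer = rest[:length], rest[length:]
    if trailer != b"\r\nEND\r\n":
        raise ValueError("malformed reply")
    return data


print(set_command("user::42", b'{"name":"Ana"}'))
print(parse_get_reply(b'VALUE user::42 0 14\r\n{"name":"Ana"}\r\nEND\r\n'))
```

The byte strings are the same whether a single memcached server or a Couchbase Server cluster sits behind the client, which is why the slides describe the migration as requiring no application changes.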
Use Case: Ad Targeting
Types of Data
• User profile: preferences and psychographic data
• Ad serving history by user
• Ad buying history by advertiser
• Ad serving history by advertiser
Application Requirements
• High performance to meet a limited ad-serving budget; the time allowance is typically <40 msec
• Scalability to handle hundreds of millions of user profiles and a rapidly growing amount of data
• 24x7x365 availability to avoid ad revenue loss
Why NoSQL and Couchbase
• Sub-millisecond reads/writes mean less time is spent on data access, leaving more time for ad logic processing, so better-optimized ads are served
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of user and ad data grows
• Always-on operations = always-on revenue. You will never miss the opportunity to serve an ad because of downtime.
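The point about sub-millisecond reads is simple arithmetic against the <40 msec allowance. The sketch below assumes, purely for illustration, that one ad decision needs 10 sequential lookups (profile, serving history, buying history and so on). Neither the lookup count nor the slower comparison latency comes from the slides.

```python
def time_left_for_ad_logic(budget_ms: float, lookups: int, latency_ms: float) -> float:
    """Time remaining in the ad-serving budget after sequential data lookups."""
    return budget_ms - lookups * latency_ms


BUDGET_MS = 40.0  # typical time allowance from the slide
LOOKUPS = 10      # hypothetical number of sequential reads per ad decision

for latency in (0.5, 3.0):  # a sub-millisecond store vs. a hypothetical slower store
    left = time_left_for_ad_logic(BUDGET_MS, LOOKUPS, latency)
    print(f"{latency} ms per lookup -> {left} ms left for ad logic")
```

Under these assumptions, the sub-millisecond store leaves 35 ms for ad logic, against 10 ms for the slower one.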
Couchbase is the Complete Solution
Easy Scalability
• Grow the cluster without application changes and without downtime, with a single click
Consistent High Performance
• Consistent sub-millisecond read and write response times with consistently high throughput
Always On 24x7x365
• No downtime for software upgrades, hardware maintenance, etc.
Flexible Data Model
• JSON document model with no fixed schema
Proven Easy, Online Scalability
Scaling
• Fully online throughout
• A single REST call or click adds or removes an arbitrary number of nodes
• Data movement during rebalance is parallelized and throttled to prevent overload
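Because data is split into a fixed number of shards, adding nodes only means moving some shards, not rehashing every key. The sketch below is a simplified model that assumes 1024 vBuckets and uses a hypothetical `rebalance` helper. It computes an even target per node and keeps every vBucket in place that fits under its current owner's target; only the remainder moves.

```python
from collections import Counter


def rebalance(assignment: dict, nodes: list) -> dict:
    """Spread vBuckets evenly over `nodes`, keeping as many in place as the targets allow."""
    base, extra = divmod(len(assignment), len(nodes))
    # Even target per node; the first `extra` nodes take one more vBucket.
    target = {node: base + (1 if i < extra else 0) for i, node in enumerate(nodes)}
    counts = Counter()
    new_assignment, to_move = {}, []
    for vb in sorted(assignment):
        owner = assignment[vb]
        if owner in target and counts[owner] < target[owner]:
            new_assignment[vb] = owner  # stays on its node: no data movement
            counts[owner] += 1
        else:
            to_move.append(vb)
    for vb in to_move:
        # Move each remaining vBucket to the first node still below its target.
        node = next(n for n in nodes if counts[n] < target[n])
        new_assignment[vb] = node
        counts[node] += 1
    return new_assignment


old = {vb: ("a", "b", "c")[vb % 3] for vb in range(1024)}  # 3-node cluster
new = rebalance(old, ["a", "b", "c", "d"])                   # add node "d"
moved = sum(old[vb] != new[vb] for vb in old)
print(moved, dict(Counter(new.values())))
```

Going from three nodes to four moves a quarter of the vBuckets (256 of 1024), all of them onto the new node; the other three quarters are not touched.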
Couchbase: High throughput that scales linearly
Linear throughput scalability
High throughput with 1.4 GB/sec data transfer rate using 4 servers
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
Proven Rapid Growth Scalability: Draw Something by OMGPOP
(Chart: daily active users, millions, February to March 2012)
Consistent High Performance
• Consistent, predictable sub-millisecond latency: apps need fast, predictable access to data; it's not good enough to be fast some of the time
• Consistent, predictable throughput: the throughput capacity of your data layer should be independent of the mix of reads and writes
Consistent low latency with varying doc sizes
Consistently low latencies, in microseconds, for varying document sizes with a mixed workload
http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
High throughput that scales linearly
(Chart: LinkedIn 4-node cluster)
Always On 24x7x365
• Online upgrades: balance in nodes with new versions
• Online backup
• Online compaction
• Built-in monitoring plus REST interface: cluster-wide to per-node drill-down
• Full admin REST interface for easy integration
Availability
(Chart: availability of CACHE 1, CACHE 2, CACHE 3 and Couchbase on a 0 to 100 scale)
Flexible Data Model
Relational vs Document Data Model
• Relational data model: highly structured table organization with rigidly defined data formats and record structure
• Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format
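To make the contrast concrete, the same user can be spread across several rigidly structured tables or held as one nested document. The sketch below uses made-up rows and a hypothetical `user_document` helper to fold the related rows into a single JSON document.

```python
import json

# Relational shape: separate tables linked by user_id (illustrative rows).
users = [{"user_id": 1, "name": "Dana"}]
addresses = [{"user_id": 1, "city": "Tel Aviv"}, {"user_id": 1, "city": "Haifa"}]
orders = [{"order_id": 10, "user_id": 1, "total": 42.5}]


def user_document(user_id: int) -> dict:
    """Fold one user's related rows into a single nested document."""
    user = next(u for u in users if u["user_id"] == user_id)
    # The related rows are combined here, once, instead of being joined on every read.
    return {
        "type": "user",
        "name": user["name"],
        "addresses": [{"city": a["city"]} for a in addresses if a["user_id"] == user_id],
        "orders": [
            {"order_id": o["order_id"], "total": o["total"]}
            for o in orders
            if o["user_id"] == user_id
        ],
    }


print(json.dumps(user_document(1), indent=2))
```

A second user could carry extra fields, or none of the nested lists, without any change to a table definition.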
Comparisons
Couchbase Server vs. MongoDB
Easy Scalability
• Couchbase ✔ With one click, horizontally grow the cluster, even scale across data centers
• MongoDB ✖ Complex multi-step scaling, no write scaling across data centers
Consistent, High Performance
• Couchbase ✔ Consistent sub-millisecond reads/writes; consistently high throughput
• MongoDB ✖ High and inconsistent latency; lower throughput
Flexible Data Model
• Couchbase ✔ Schemaless data model for rapid development
• MongoDB ✔ Schemaless data model for rapid development
Always On 24x7x365
• Couchbase ✔ No downtime for software upgrades, hardware maintenance, etc.
• MongoDB ✖ Difficult online upgrade; not all maintenance is online
Couchbase Server Leadership vs. Cassandra
Easy Scalability
• Couchbase ✔ With one click, horizontally grow the cluster, even scale across data centers
• Cassandra ✖ Complex multi-step scaling, coarse-grained growth recommended
Consistent, High Performance
• Couchbase ✔ Consistent sub-millisecond reads/writes and high throughput
• Cassandra ✖ High and inconsistent latency; medium throughput
Flexible Data Model
• Couchbase ✔ Schemaless data model for rapid development
• Cassandra ✖ Very complex columnar data model
Always On 24x7x365
• Couchbase ✔ No downtime for software upgrades, hardware maintenance, etc.
• Cassandra ✔ Online upgrades and online maintenance
Read performance comparison: NoSQL databases
(Chart: read latencies against throughput; x-axis: operations per second; y-axis: 95th percentile latency (ms); series: MongoDB, Cassandra, Couchbase)
• MongoDB cannot handle throughput above ~8,000 ops/sec
• Couchbase handles ~3X the throughput with significantly lower latency
Third-party data: Altoros
Write performance comparison: NoSQL databases
(Chart: insert/update latencies against throughput; x-axis: operations per second; y-axis: 95th percentile latency (ms); series: MongoDB, Cassandra, Couchbase)
• MongoDB latency shoots up beyond 6,000 ops/sec
• Couchbase latency stays consistently low even at 20,000 ops/sec
Third-party data: Altoros
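Both charts plot 95th-percentile latency: the latency that 95% of requests stay at or under, so a few slow outliers can move it even when the average looks fine. The slides do not say which percentile definition the benchmark used. The sketch below uses the common nearest-rank definition with made-up sample data.

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# 100 requests: 94 fast, 6 slow. The mean hides the tail; p95 does not.
latencies_ms = [1.0] * 94 + [40.0] * 6
print(sum(latencies_ms) / len(latencies_ms), percentile(latencies_ms, 95))
```

Here the mean is about 3.3 ms while the 95th percentile is 40 ms, which is why latency-under-load comparisons are usually reported at a high percentile rather than as an average.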
Thank you!
Get Couchbase http://www.couchbase.com/download