Using Couchbase and XDCR for Real-Time Advertising Applications: Couchbase Connect 2014
DESCRIPTION
Mirror Image provides a fully managed, globally distributed, load-balanced real-time platform designed specifically for online and mobile advertising applications. Our customers’ advertising applications must respond very quickly, typically in under 200 ms, regardless of where the request originates geographically. Some of these applications require very large data sets, with hundreds of millions of rows and hundreds or thousands of values per row. Until recently, because consistency across a geographically distributed database was so hard to achieve, our applications managed customers’ data sets in flat files. Those files were limited in size, which restricted the use cases we could implement. Learn how Couchbase Server with XDCR is enabling Mirror Image to work with much larger data sets and to implement and execute a wider range of advertising technology use cases for our customers.
TRANSCRIPT
Joe Lichtenberg
VP Advertising and Analytics, Mirror Image
Couchbase Connect
October 21, 2014
Using Couchbase and XDCR for Real-Time Advertising Applications
Media Delivery
Content Logic
Mirror Image 1.0 (c. 1997): Content Delivery Network
Media Delivery / Edge Computing
Content Logic / Content Logic
Mirror Image 2.0: Edge Computing + Media Delivery
Media Delivery / Edge Computing
Content Logic / Content Logic
Dynamic Delivery: Edge Computing + Media Delivery
Synchronized Operations Worldwide
Obligatory Marketing Slide
Extensive functional capabilities
Personalized expert customization and support
High capacity, worldwide, real time infrastructure
Edge Computing
Geo-Distributed Database*
Live and On-Demand Video
Object Delivery
SSL Delivery
Token-Based Access Control
Reporting and Analytics
Knowledgeable, dedicated, in-house support team
Fully staffed, 24 x 7 Network Operations Center (NOC)
Expert professional services resources
High capacity, centralized server model
Exceptional performance, scalability, and availability with SLA guarantees
Elastic scaling within and across geographies
Worldwide coverage
15+ years of experience
Mirror Image’s Dynamic Delivery Network
powers hundreds of billions of real-time requests for mission critical applications
throughout the Advertising and Advertising Technology ecosystem
Why Focus on Advertising and Ad-Tech?
• Lots of prospects
• Requires fast response times to each visitor’s browser / device
Ad Tech Ecosystem
Source: Luma Partners
Source: www.iab.net/data
Ad Tech Ecosystem – Another View
Request/Response/Feedback Cycle
• Request Logic
– Customized behavior based on explicit and derived request attributes such as IP address, user-agent, query parameters, cookie values, geo-location, …
• Response Logic
– Personalized JavaScript, XML, HTML; transparent pixel GIFs; cookie modifications; transformations; token substitutions…
• Edge Data Sets
– Shared data sets such as IAB Spiders & Bots, WURFL (Device DB), IP-GeoLocation
– Customer’s Key-Value Data
• Customized Data Collection
– Delivery of log files to customers as part of feedback cycle
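The request/response/feedback cycle on this slide can be sketched roughly as follows. This is a minimal illustration only, not Mirror Image's implementation: the `Request` class, the `handle` function, the bot markers, and the IP-prefix table are all hypothetical stand-ins for the shared edge data sets (IAB Spiders & Bots, IP geolocation) and the customer's key-value data.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    ip: str
    user_agent: str
    query: dict = field(default_factory=dict)
    cookies: dict = field(default_factory=dict)

# Hypothetical stand-ins for the shared edge data sets named on the slide.
BOT_AGENT_MARKERS = ("bot", "spider", "crawler")               # in place of IAB Spiders & Bots
GEO_BY_IP_PREFIX = {"203.0.113.": "AU", "198.51.100.": "US"}   # in place of an IP geolocation DB

def derive_attributes(req):
    """Request logic: derive attributes from the explicit request fields."""
    ua = req.user_agent.lower()
    is_bot = any(marker in ua for marker in BOT_AGENT_MARKERS)
    country = next(
        (c for prefix, c in GEO_BY_IP_PREFIX.items() if req.ip.startswith(prefix)),
        "unknown",
    )
    return {"is_bot": is_bot, "country": country}

def handle(req, customer_kv, log):
    """Run one request through request logic, response logic, and data collection."""
    attrs = derive_attributes(req)
    user_id = req.cookies.get("uid")
    # Edge data set lookup: the customer's key-value data, keyed by cookie value.
    segment = customer_kv.get(user_id, "default") if user_id else "default"
    # Response logic: personalized JavaScript, or an empty body for bots.
    if attrs["is_bot"]:
        body = ""
    else:
        body = "var segment = '{}'; var country = '{}';".format(segment, attrs["country"])
    # Customized data collection: one log record per request for the feedback cycle.
    log.append({"ip": req.ip, "uid": user_id, "segment": segment, **attrs})
    return body
```

In this sketch the log is an in-memory list; the slide describes delivering log files to customers, which would replace the `log.append` call.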
Edge Computing Flow
• IAB Spiders & Bots Data
• Mobile Device Data Sets
• IP – Geolocation Data Sets
• Custom Key-Value Data In Memory*
What Problems Does The CS Implementation Solve?
• Customers and prospects have requirements that we work with their large and growing data sets in real-time at the edge of the internet
• Our “flat file” implementation required moving the entire contents of the files into memory on each server at startup, and…
• … customers were bumping up against data size limitations
Edge Computing Flow
• IAB Spiders & Bots Data
• Mobile Device Data Sets
• IP – Geolocation Data Sets
• Custom Key-Value Data In Memory
• Couchbase Key-Value Distributed Database
• Real-time GUID database
• Real-time cookie matching
• Contextual targeting
• Ad fraud / brand safety
• Cross-device user matching / audience de-duplication
• In session execution of batch-developed analytics
• Any web or mobile database-backed application with geographically dispersed users requiring fast end-to-end response times
Geo-Distributed Database: Use Cases
Geo-Distributed Database: Requirements
• High performance lookup and write capabilities at the edge
• Ability to manage large custom data sets for each customer application
• Low latency replication (aka XDCR)
• Ability to replicate to different regions per application / different data per region
• Key-value lookups only; no need for complex queries or SQL
• Reliability
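The requirement to replicate to different regions per application can be pictured with a small sketch. Everything here is an assumption for illustration: the `REPLICATION_PLAN` table, the application names, and the region names are hypothetical, and plain dicts stand in for the edge databases. In Couchbase itself, XDCR is configured per bucket between clusters, so a real deployment would express this plan as replication settings and not as application code.

```python
# Hypothetical plan: which edge regions receive each application's data.
REPLICATION_PLAN = {
    "app-a": {"us-east", "eu-west"},
    "app-b": {"ap-south"},
}

def regions_for(app):
    """Return the set of regions configured for an application (empty if none)."""
    return REPLICATION_PLAN.get(app, set())

def replicate(app, key, value, edges):
    """Copy one key-value pair to every edge region configured for the app.

    `edges` maps a region name to a dict standing in for that region's store.
    Returns the sorted list of regions that received the value.
    """
    targets = regions_for(app)
    for region in targets:
        edges[region][(app, key)] = value
    return sorted(targets)
```

Only key-value operations appear here, which matches the stated requirement that lookups are by key and need no query language.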
Implementation Details
• Master hub sites have a 3-node Couchbase cluster for reliability
• Edge sites start with a single instance for real-time RW access, and scale out via clustering based on demand
• Edges can failover to nearby edges for availability
• Limited number of buckets
– Multi-tenant
– Geographic regions
– Customer-specific bucket(s) are optional
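Two of the points above, failover to nearby edges and a small number of multi-tenant buckets, can be sketched together. This is a hedged illustration under stated assumptions: `EdgeSite`, `tenant_key`, and `lookup` are hypothetical names, the `customer::key` prefix is one common way to share a bucket between tenants and is not confirmed by the slides, and a dict stands in for each edge's Couchbase instance.

```python
class EdgeDown(Exception):
    """Raised when an edge site cannot serve a request."""

class EdgeSite:
    def __init__(self, name, data=None, up=True):
        self.name = name
        self.data = dict(data or {})   # stand-in for the edge's key-value store
        self.up = up

    def get(self, key):
        if not self.up:
            raise EdgeDown(self.name)
        return self.data.get(key)

def tenant_key(customer, key):
    """Namespace a key by customer so several tenants can share one bucket."""
    return "{}::{}".format(customer, key)

def lookup(customer, key, edges):
    """Try edges in order (nearest first) and return (edge name, value).

    Falls over to the next edge when one is down; raises EdgeDown if none answer.
    """
    full_key = tenant_key(customer, key)
    for edge in edges:
        try:
            return edge.name, edge.get(full_key)
        except EdgeDown:
            continue
    raise EdgeDown("no reachable edge")
```

A caller would pass the local edge first and nearby edges after it, so the extra latency of a remote lookup is only paid when the local site is unavailable.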
Implementation Details
• Phase 1 (in production): Read-only edges; master hubs; data import / ETL servers
• Phase 2: Extends read-only behavior to virtual locations
• Phase 3: Adds write support on edges, bidirectional XDCR back to each master hub
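The difference between the read-only edges of Phase 1 and the writable edges of Phase 3 can be sketched as a toy hub-and-edge model. The `Hub` and `EdgeNode` classes are hypothetical, propagation here is synchronous and in-process, and conflict resolution between concurrent writes is deliberately left out; real XDCR replicates asynchronously between clusters and applies its own conflict resolution.

```python
class Hub:
    """Stand-in for a master hub that fans data out to its attached edges."""
    def __init__(self):
        self.data = {}
        self.edges = []

    def attach(self, edge):
        self.edges.append(edge)
        edge.hub = self
        edge.data.update(self.data)       # initial sync from hub to edge

    def publish(self, key, value, origin=None):
        """Store a value at the hub and push it to every edge except the origin."""
        self.data[key] = value
        for edge in self.edges:
            if edge is not origin:
                edge.data[key] = value

class EdgeNode:
    def __init__(self, name, writable=False):
        self.name = name
        self.writable = writable          # False models Phase 1, True models Phase 3
        self.data = {}
        self.hub = None

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        if not self.writable:
            raise PermissionError("{} is a read-only edge".format(self.name))
        self.data[key] = value
        self.hub.publish(key, value, origin=self)   # edge-to-hub leg of the replication
```

In the read-only phase all data enters through `Hub.publish` (for example from the ETL servers); in the read/write phase an edge write travels back to the hub, which then distributes it to the other edges.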
GDD “Marchitecture” - Read Only
GDD “Marchitecture” - Read/Write
Master Hub Configuration
Customizable ETL Servers
FTP Upload Storage
Couchbase Edge Instance
ETL Master Couchbase 3-Node Cluster
ECF Server Farm
Distributed Star Network Architecture
• Globally distributed, consistent data -> High performance globally
• Schema-less data model -> Flexibility
• Scalable -> Handles traffic and data growth and spikes
• Fully managed service -> Including setup, maintenance, replication, synchronization, backup, security, and 24x7 monitoring
• Subscription-based billing model -> No CAPEX, pay as you go
• Uses Couchbase Server -> Proven; active developer community
• Data privacy -> No conflicts of interest with our customers
GDD Key Benefits (or “Marketing Slide #2”)