Data Modeling for Microservices with Cassandra and Spark
TRANSCRIPT
Strata + Hadoop World NYC Sept 26-29, 2016
Jeff Carpenter, Choice Hotels International
Agenda
1 IT Transformation – Distribution and Analytics
2 Creating a Data Architecture
3 Data Modeling for Microservices
4 Using Metadata for Diagnostics and Analytics
5 Challenges
IT Capabilities
Corporate IT
• Guest
• Franchise Relations
• Hotel Management
• Business Intelligence
• Distribution (this talk)
Distribution - Central Reservation System
[Diagram labels: CRS, Web and Mobile, External Channels, Customer & Loyalty, Billing, Property Systems, Reporting & Analytics]
Domains:
• Distribution Domain
• Guest Domain
• Franchisee Domain
• Hotel Management Domain
• Business Intelligence Domain
Current Reservation System – By The Numbers
25 years
6,000 hotels
50
transactions / second
4,000
distribution channels
1 instance
New Systems: Distribution and Data Platforms
[Diagram labels: Distribution Platform, Data Platform, History, Realtime data]
See: "Choice Hotels's journey to better understand its customers through self-service analytics"
This talk: how we model data and use the self-service platform
Distribution Platform - Architecture Tenets
Cloud-native
Microservices
Open Source Infrastructure
Extensibility
Stable, Scalable, Secure
What is a Microservice? (one definition)
[Diagram labels: Client, REST API, Composing Service, Entity Service, Message Driven Service, AMQ Events, DB, Persistence, Data Ownership]
How can we design our data architecture & models to be…
• Scalable?
• Extensible?
• Maintainable?
• Analytics-ready?
Our Data Stack
• Non-relational storage
• Long Term Storage
• Logging
• Reporting & Analytics
• Metrics
Data Modeling – Then and Now
[Timeline labels: Isolated Systems, Data Dictionary, SOA and Canonical Data Model, Services own data]
Conceptual Data Model
• Identifying domains and relationships
Logical Data Model
• Identifying data types and relationships
Physical Models
• Java APIs
• RESTful APIs (JSON)
• Events (JSON)
• Cassandra Schemas
Conceptual Data Model - Domains
• rates
• inventory
• hotels
• reservations
• offers
Conceptual Data Model – Domain Relationships
[Diagram labels: Hotel Management Domain, Guest Domain, Distribution Domain; entities: hotels, guest, stay, loyalty, rates, inventory, offers, reservations]
Logical Data Model – Identifying Types
Rates Domain
• Composite Rate Service
• Rate Plan Service
• Rate Service
Rate Plan
• id
• code
• hotelId
• effectiveDates
• Conditions
Rate
• id
• ratePlanId
• productId
• hotelId
• dateSpan
Price
• condition
• amount
Product
• id
• code
• hotelId
• features
• …
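The types above could be sketched as Python dataclasses. This is only an illustration: the slide lists field names, so every field type here is an assumption, as is nesting Price inside Rate and the helper `rates_for_plan`.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

DateSpan = Tuple[date, date]  # assumed representation: (start, end)


@dataclass
class Price:
    condition: str  # assumed: a free-form condition label
    amount: float   # assumed: a plain number, no currency handling


@dataclass
class RatePlan:
    id: str
    code: str
    hotel_id: str
    effective_dates: DateSpan
    conditions: List[str] = field(default_factory=list)


@dataclass
class Product:
    id: str
    code: str
    hotel_id: str
    features: List[str] = field(default_factory=list)


@dataclass
class Rate:
    id: str
    rate_plan_id: str  # reference by id rather than an embedded RatePlan
    product_id: str
    hotel_id: str
    date_span: DateSpan
    prices: List[Price] = field(default_factory=list)  # assumed nesting


def rates_for_plan(rates: List[Rate], plan: RatePlan) -> List[Rate]:
    """Resolve the Rate -> Rate Plan relationship by id (hypothetical helper)."""
    return [r for r in rates
            if r.rate_plan_id == plan.id and r.hotel_id == plan.hotel_id]
```

Referencing the rate plan by id instead of embedding it keeps each type owned by one service, which is the point the slide's service boundaries suggest.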
Standardizing Common Data Types
• Instead of a Canonical Data Model, we standardize basic building blocks
– Feature, Category, Brand
– Geospatial
– Financial
– Time
– Contact information
Address
• lines[]
• city
• subdivision
• country
• postalCode
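A shared building block such as Address might look like the sketch below. The field names come from the slide; the field types, the defaults, and the `to_json` / `from_json` helpers are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:
    lines: List[str] = field(default_factory=list)
    city: str = ""
    subdivision: str = ""  # assumed meaning: state, province, or similar
    country: str = ""
    postalCode: str = ""   # camelCase kept to match the slide's field name


def to_json(address: Address) -> str:
    """Serialize with sorted keys so every service emits the same shape."""
    return json.dumps(asdict(address), sort_keys=True)


def from_json(text: str) -> Address:
    """Rebuild an Address from the JSON produced by to_json."""
    return Address(**json.loads(text))
```

Each service can embed this type in its own JSON without agreeing on a full canonical model.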
Data Types → Microservice Identification
Services and the domains they own:
• Hotel Service (Hotel Domain)
• Rates Service (Rates Domain)
• Inventory Service (Inventory Domain)
• Offer Service (Offer Domain)
• Reservation Service (Reservation Domain)
Clients:
• Data Maintenance Apps
• Internal / External Client Apps
Physical Data Models
Physical Models
• Java APIs
• RESTful APIs (JSON)
• Events (JSON)
• Cassandra Schemas
JSON = primary definition of the data type owned by each service
Key Data Types → RESTful Resource Paths
• Offer Service: /offers
• Reservation Service: /reservations
• Hotel Service: /hotels
• Rates Service: /rates
• Inventory Service: /inventory
Java and RESTful APIs – common pattern
REST                              Java API
GET /types/<id>                   Type getTypeById()
GET /types?<query parameters>     Type[] searchType(TypeSearchCriteria)
POST /types/ (JSON body)          createType(Type)
PUT /types/ (JSON body)           updateType(Type)
DELETE /types/<id>                deleteType(TypeId)
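The REST-to-method pattern on the "common pattern" slide can be sketched as a small dispatcher. This is an illustration in Python rather than the Java of the slide; `TypeService` and `dispatch` are hypothetical names, and storage is an in-memory dict rather than a real persistence tier.

```python
from typing import Any, Dict, List, Optional


class TypeService:
    """In-memory stand-in for one entity service (hypothetical, no persistence)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def get_type_by_id(self, type_id: str) -> Optional[Dict[str, Any]]:
        return self._items.get(type_id)

    def search_type(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [item for item in self._items.values()
                if all(item.get(k) == v for k, v in criteria.items())]

    def create_type(self, item: Dict[str, Any]) -> None:
        self._items[item["id"]] = item

    def update_type(self, item: Dict[str, Any]) -> None:
        if item["id"] not in self._items:
            raise KeyError(item["id"])
        self._items[item["id"]] = item

    def delete_type(self, type_id: str) -> None:
        self._items.pop(type_id, None)


def dispatch(service, method, path, query=None, body=None):
    """Map a REST call on /types to the matching service method."""
    parts = [p for p in path.split("/") if p]  # "/types/42" -> ["types", "42"]
    if not parts or parts[0] != "types":
        raise ValueError(f"unknown resource: {path}")
    type_id = parts[1] if len(parts) > 1 else None
    if method == "GET" and type_id:
        return service.get_type_by_id(type_id)
    if method == "GET":
        return service.search_type(query or {})
    if method == "POST":
        return service.create_type(body)
    if method == "PUT":
        return service.update_type(body)
    if method == "DELETE" and type_id:
        return service.delete_type(type_id)
    raise ValueError(f"unsupported call: {method} {path}")
```

Because every service follows the same five operations, a client that knows one service's API can guess the shape of the others.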
Cassandra Data Modeling (an idealized view)
Cassandra Data Modeling – Access Patterns
[Workflow diagram; transitions between steps are labeled with queries Q1 to Q9]
Steps shown with Q1 to Q5:
• View hotels near POI
• View hotel info
• Show POIs near hotel
• Shop for rooms at hotel
• View room details
• Book a room
Steps shown with Q6 to Q9:
• View reservation by confirmation number
• View hotel reservations for a date
• Find reservation by guest name
• View guest details
Cassandra Data Modeling – Chebotko Diagrams
[Diagram: tables supporting queries Q1 to Q5. K = partition key column, C↑ = clustering column in ascending order]

hotels_by_poi
  poi_name      K
  hotel_id      C↑
  name
  phone
  address

hotels
  hotel_id      K
  name
  phone
  address

pois_by_hotel
  hotel_id      K
  poi_name      C↑
  description

available_rooms_by_hotel_date
  hotel_id      K
  date          C↑
  room_number   C↑
  is_available

amenities_by_room
  hotel_id      K
  room_id       K
  amenity_name  C↑
  description
Cassandra Data Modeling - Physical
hotel keyspace

hotels_by_poi
  poi_name      text      K
  hotel_id      text      C↑
  name          text
  phone         text
  address       address

hotels
  hotel_id      text      K
  name          text
  phone         text
  address       address

pois_by_hotel
  hotel_id      text      K
  poi_name      text      C↑
  description   text

amenities_by_room
  hotel_id      text      K
  room_number   smallint  K
  amenity_name  text      C↑
  description   text

available_rooms_by_hotel_date
  hotel_id      text      K
  date          date      C↑
  room_number   smallint  C↑
  is_available  boolean

address (user-defined type)
  street             text
  city               text
  state_or_province  text
  postal_code        text
  country            text
Cassandra Data Modeling - Schemas
CREATE KEYSPACE hotel
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
CREATE TYPE hotel.address (
  street text,
  city text,
  state_or_province text,
  postal_code text,
  country text
);
CREATE TABLE hotel.hotels_by_poi (
  poi_name text,
  hotel_id text,
  name text,
  phone text,
  address frozen<address>,
  PRIMARY KEY ((poi_name), hotel_id)
) WITH CLUSTERING ORDER BY (hotel_id ASC);
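To make the primary key of hotels_by_poi concrete, here is a toy in-memory model of how the table behaves: rows are grouped by the partition key (poi_name) and returned in clustering order (hotel_id ascending). This is a sketch of the data layout only, not of how Cassandra stores data; the `insert` and `select_by_poi` helpers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# partition key (poi_name) -> rows keyed by clustering column (hotel_id)
Table = Dict[str, Dict[str, dict]]


def insert(table: Table, poi_name: str, hotel_id: str, **columns) -> None:
    """Upsert: writing the same primary key again overwrites the row."""
    table[poi_name][hotel_id] = {"poi_name": poi_name, "hotel_id": hotel_id, **columns}


def select_by_poi(table: Table, poi_name: str) -> List[dict]:
    """Read one partition and return its rows in ascending hotel_id order."""
    partition = table.get(poi_name, {})
    return [partition[hotel_id] for hotel_id in sorted(partition)]


hotels_by_poi: Table = defaultdict(dict)
```

A query that names the partition key touches a single partition, which is why the table is designed around the access pattern rather than around the entity.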
And now…
Back to reality
Access Patterns and Denormalization
Access patterns:
• Locate hotel by identifier
• Find hotels within X miles of point Y
• Find hotels by city, state, country
• Find hotels by postal code
• Hotels by amenity
• Find hotels by brand
• Hotels by this
• Hotels by that
• Hotels by something else
Keyspace hotel:
• hotels_by_id
• hotels_by_brand
• hotels_by_postal_code
• …
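Denormalizing into one table per access pattern means each hotel write fans out to several tables. The sketch below builds the text of a CQL logged batch for that fan-out. The table names come from the slide; the column lists and the `build_logged_batch` helper are assumptions for illustration, and no driver call is shown.

```python
from typing import Dict, List, Tuple

# Table names are from the slide; the column lists are assumed.
TABLES: Dict[str, List[str]] = {
    "hotels_by_id": ["hotel_id", "name", "brand", "postal_code"],
    "hotels_by_brand": ["brand", "hotel_id", "name"],
    "hotels_by_postal_code": ["postal_code", "hotel_id", "name"],
}


def build_logged_batch(hotel: Dict[str, str], keyspace: str = "hotel") -> Tuple[str, List[str]]:
    """Return (CQL text, bind values) writing one hotel to every denormalized table."""
    statements: List[str] = []
    values: List[str] = []
    for table, columns in TABLES.items():
        placeholders = ", ".join("?" for _ in columns)
        statements.append(
            f"  INSERT INTO {keyspace}.{table} ({', '.join(columns)}) VALUES ({placeholders});"
        )
        values.extend(hotel[c] for c in columns)
    cql = "BEGIN BATCH\n" + "\n".join(statements) + "\nAPPLY BATCH;"
    return cql, values
```

Every new access pattern adds a table to this list, so the cost of denormalization shows up on the write path.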
Metadata
Request Context
• Requestor
• Tracking ID
• Token
• Locale
[Diagram labels: Incoming Request, Service, Logs, ELK Stack, Events, AMQ]
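One way to carry the request context through a service is to build it once at the boundary and attach it to every log record. In this sketch the four fields come from the slide; the header names `X-Requestor` and `X-Tracking-Id`, the default locale, and both helper functions are hypothetical.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RequestContext:
    requestor: str
    token: str
    locale: str = "en-US"  # assumed default
    tracking_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def context_from_headers(headers: dict) -> RequestContext:
    """Build the context at the service boundary (header names are assumptions)."""
    kwargs = {
        "requestor": headers.get("X-Requestor", "unknown"),
        "token": headers.get("Authorization", ""),
        "locale": headers.get("Accept-Language", "en-US"),
    }
    if "X-Tracking-Id" in headers:  # reuse the caller's ID when one is supplied
        kwargs["tracking_id"] = headers["X-Tracking-Id"]
    return RequestContext(**kwargs)


def log_line(ctx: RequestContext, message: str) -> str:
    """One JSON log record; the token is deliberately kept out of logs."""
    return json.dumps({
        "trackingId": ctx.tracking_id,
        "requestor": ctx.requestor,
        "locale": ctx.locale,
        "message": message,
    })
```

Reusing an incoming tracking ID, and minting one only when none arrives, is what lets logs and events from different services be joined on a single value.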
Asynchronous events
Event
• Type
  • Create
  • Update
  • Delete
• Request Context
• Old entity
• New entity
Request Context
• Requestor
• Tracking ID
• Token
• Locale
Entity (old/new)
• Id
• …
Sample Inventory Event
{
  "type": "UPDATE",
  "trackingId": "0da7b794-f2c3-…",
  "requestor": "Legacy CRS",
  "newEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "22",
    "totalCount": "25"
  },
  "oldEntity": {
    "hotelId": "AZ123",
    "productId": "NSK",
    "date": "2016-05-20",
    "consumedCount": "20",
    "totalCount": "25"
  }
}
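Because each event carries both the old and the new entity, a subscriber can work out what changed without calling back to the owning service. A minimal sketch, using the field names from the sample inventory event; `make_event` and `changed_fields` are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def make_event(tracking_id: str, requestor: str,
               old: Optional[Dict[str, Any]],
               new: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Derive the event type from which of old/new is present."""
    if old is None and new is None:
        raise ValueError("an event needs an old or a new entity")
    event_type = "CREATE" if old is None else "DELETE" if new is None else "UPDATE"
    event = {"type": event_type, "trackingId": tracking_id, "requestor": requestor}
    if new is not None:
        event["newEntity"] = new
    if old is not None:
        event["oldEntity"] = old
    return event


def changed_fields(event: Dict[str, Any]) -> Dict[str, tuple]:
    """For an UPDATE, map each changed field to (old value, new value)."""
    old = event.get("oldEntity", {})
    new = event.get("newEntity", {})
    return {k: (old.get(k), new.get(k))
            for k in sorted(set(old) | set(new))
            if old.get(k) != new.get(k)}
```

Applied to the sample event, `changed_fields` reports only `consumedCount`, which moved from "20" to "22".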
Putting It Together – Diagnostics
[Diagram labels: Incoming Request, Service, C* (four nodes), Data History, Logs, Metrics, Data Platform, ELK Stack, Metrics Store]
Putting It Together – Long Term Storage
[Diagram labels: C* (four nodes), Data Platform, ELK Stack, Metrics Store, Long Term Storage]
Separating Active and History Data
Rate + Inventory Data
[Timeline: Time axis with a "Now" marker]
Yesterday’s data is ancient history
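The split between active and history data can be expressed as a simple date cutoff. This sketch assumes each rate or inventory row carries an ISO `date` field, as in the sample inventory event, and that anything dated before today counts as history; the slide does not state the exact rule.

```python
from datetime import date
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def split_active_and_history(rows: Iterable[Row], today: date) -> Tuple[List[Row], List[Row]]:
    """Rows dated today or later stay active; anything earlier is history."""
    active: List[Row] = []
    history: List[Row] = []
    for row in rows:
        row_date = date.fromisoformat(row["date"])
        (active if row_date >= today else history).append(row)
    return active, history
```

Keeping only the active side in the operational store keeps it small, while the history side can move to long-term storage.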
History architecture
Data Platform - Cloudera
[Diagram labels: Service, AMQ, Kafka, S3, Other subscribers, Spark, Impala*, History Service, Customer Service Apps, four cluster nodes; flows: History capture, History retrieval]
Microservice Data Challenges
No Joins?
Data Maintenance
Data Integrity
Cascading Deletes
Transactions
Distributed Transactions, Anyone?
Booking Client:
• Reserve the inventory
• Commit the contract
[Diagram labels: Booking Client, Inventory Service, inventory, Reservation Service, reservations, Data Maintenance Apps, Data synchronization]
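Without a distributed transaction, the two booking steps can be coordinated by undoing the first when the second fails. The sketch below uses hypothetical in-memory stand-ins for the two services; the `fail` flag only simulates an outage.

```python
class InventoryService:
    """Hypothetical in-memory stand-in for the Inventory Service."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.consumed = 0

    def reserve(self) -> None:
        if self.consumed >= self.total:
            raise RuntimeError("sold out")
        self.consumed += 1

    def release(self) -> None:
        self.consumed -= 1


class ReservationService:
    """Hypothetical stand-in; `fail` simulates an outage on commit."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.reservations = []

    def commit(self, guest: str) -> str:
        if self.fail:
            raise RuntimeError("reservation service unavailable")
        self.reservations.append(guest)
        return f"CONF-{len(self.reservations)}"


def book(inventory: InventoryService, reservations: ReservationService, guest: str) -> str:
    inventory.reserve()                    # step 1: reserve the inventory
    try:
        return reservations.commit(guest)  # step 2: commit the contract
    except Exception:
        inventory.release()                # compensate: undo step 1
        raise
```

The compensation runs in the caller, so a crash between the two steps still leaves inventory reserved; that gap is what a separate verification or synchronization pass has to close.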
Alternatives to Distributed Transactions
Approach                     Example                                         Scope
C* Lightweight Transaction   Updating inventory counts                       Data Tier
C* Logged Batch              Writing to multiple denormalized hotel tables   Data Tier
Retrying failed calls        Data synchronization, reservation processing    Service
Compensating transactions    Verifying reservation processing                System
[Side axis labels: Eventual consistency, Strong consistency]
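The first row of the table, a lightweight transaction on inventory counts, is a compare-and-set: the write applies only if the row still holds the value the caller read. The sketch below models that behavior in memory and adds a bounded retry; `InventoryRow` and `consume_one` are hypothetical, and no Cassandra driver is involved.

```python
class InventoryRow:
    """In-memory model of one inventory row (hypothetical)."""

    def __init__(self, consumed: int, total: int) -> None:
        self.consumed = consumed
        self.total = total

    def update_if(self, expected_consumed: int, new_consumed: int) -> bool:
        """Apply the write only if the row still holds the expected value.

        Mirrors the shape of a CQL conditional update:
        UPDATE ... SET consumed = ? WHERE ... IF consumed = ?
        """
        if self.consumed != expected_consumed:
            return False  # another writer changed the row: not applied
        self.consumed = new_consumed
        return True


def consume_one(row: InventoryRow, max_attempts: int = 3) -> bool:
    """Read, then conditionally write; re-read and retry when the condition fails."""
    for _ in range(max_attempts):
        seen = row.consumed
        if seen >= row.total:
            return False  # nothing left to sell
        if row.update_if(seen, seen + 1):
            return True
    return False
```

The condition keeps two concurrent bookings from both selling the last room, at the cost of an extra round of coordination on every write.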
Final Thoughts
Data Models > Microservices
Events = Streams
Use Metadata Everywhere
Now Available!
Cassandra: The Definitive Guide, 2nd Edition
Completely reworked for Cassandra 3.X:
• Data modeling in CQL
• SASI indexes
• Materialized views
• Lightweight transactions
• DataStax drivers
• New chapters on security, deployment, and integration
Contact Info
@choicehotels
careers.choicehotels.com
@jscarp
jeffreyscarpenter