JAX London slides
TRANSCRIPT
Microservices in a Streaming World
What are microservices really about?
GUI
UI Service
Orders Service
Returns Service
Fulfilment Service
Payment Service
Stock Service
Collaboration?
Orders Service
Autonomy?
Internal World
External World
Service Boundary
User-Orders Statement getOpenOrders(),
Less Surface Area = Less Coupling
Encapsulate State
Encapsulate Functionality
Encapsulation => Loose Coupling
Independence gives services their value
Orders Service
Stock Service
Email Service
Fulfillment Service
Independently Deployable
It’s hard to scale individual applications
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Inverse Conway Maneuver
The 'Inverse Conway Maneuver' recommends evolving your team and organizational structure to promote your desired architecture.
Org Structure
Software Architecture
But independence comes at a cost
$$$
Change 1 Change 2
Redeploy
Singly-deployable apps are easy
User-Orders Statement getOpenOrders(),
Independently Deployable
Synchronized changes are painful
Single Sign On Business Service authorise(),
It’s unlikely that a Business Service would need the internal SSO state / function to change
Single Sign On
SSO has a tightly bounded context
Business services are different
Catalog Authorisation
Most business services share the same core stream of facts.
Most services live in here
We need encapsulation to hide internal state
But we need the freedom to slice & dice shared data like any other dataset
But data systems have little to do with encapsulation
Service Database
Data on inside
Data on outside
Interface hides data
Interface amplifies data
Databases amplify the data they hold
The data dichotomy: Data systems are about exposing data. Services are about hiding it.
This affects services in one of two ways
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface, as datasets grow
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
Services can end up looking like kooky home-grown databases
...and data amplifies this service boundary problem
(2) Give up and move whole datasets en masse
getAllOrders()
Data Copied Internally
This leads to a different problem
Data diverges over time
Orders Service
Stock Service
Email Service
Fulfilment Service
The more mutable copies, the more data will diverge over time
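The divergence problem can be shown with a minimal Python sketch (the service names and data here are illustrative, not from the talk): two services each take a mutable copy of the same orders dataset, mutate it independently, and end up with conflicting versions of the same fact.

```python
# The shared "orders" dataset held by the source service (illustrative data).
orders = {"o1": {"status": "OPEN"}, "o2": {"status": "OPEN"}}

# Each downstream service copies the whole dataset internally.
email_copy = {k: dict(v) for k, v in orders.items()}
fulfilment_copy = {k: dict(v) for k, v in orders.items()}

# Each service then mutates only its own copy.
email_copy["o1"]["status"] = "NOTIFIED"
fulfilment_copy["o1"]["status"] = "SHIPPED"

# The same fact (order o1's status) now exists in two conflicting versions.
diverged = email_copy["o1"] != fulfilment_copy["o1"]
print(diverged)  # True
```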
Nice neat services
Service contract too limiting
Can we change both services together, easily? NO
Broaden Contract
Eek $$$
NO
YES
Is it a shared database?
Frack it! Just give me ALL the data
Data diverges (many different versions of the same facts)
Let's encapsulate
Let's centralise to one copy
YES
Start here
There is competition between these forces
Divergence Accessibility Encapsulation
Shared database
Service Interfaces
Better Accessibility
Better Independence
No perfect solution
Is there a better way?
Data on the Inside
Data on the Outside
Service Boundary
Make data-on-the-outside a 1st class citizen
Use a Stream Processing tool-kit
Kafka is a Streaming Platform
Kafka: a Streaming Platform
The Log
Connectors
Producer Consumer
Streaming Engine
The Log?
Shard on the way in
Producing Services
Kafka
Consuming Services
Each shard is a queue
Consumers share load
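The three slides above can be sketched in a few lines of Python (this is a conceptual stand-in, not the Kafka client API; Kafka's real partitioner hashes the key with murmur2, crc32 is used here as a simple deterministic substitute): messages are sharded by key on the way in, each shard is an ordered queue, and consumers in a group share the shards between them.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Hash the message key to pick a shard, so all messages for one key
    # land on the same shard and therefore stay in order.
    return zlib.crc32(key.encode()) % NUM_SHARDS

# Shard on the way in: each shard is an append-only queue.
shards = defaultdict(list)
for key, value in [("user-a", 1), ("user-b", 2), ("user-a", 3)]:
    shards[shard_for(key)].append((key, value))

# Two consumers in one group split the shards between them (load sharing).
consumers = {0: [], 1: []}
for shard_id, messages in shards.items():
    consumers[shard_id % 2].extend(messages)
```

Because both messages for "user-a" hash to the same shard, their relative order is preserved end to end, while unrelated keys can be processed in parallel by different consumers.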
Load Balanced Services
Fault Tolerant Services
Build ‘Always On’ Services
Rely on Fault Tolerant Broker
Services can “Rewind & Replay” the log
Rewind & Replay
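Rewind & replay can be illustrated with a small Python sketch (the log contents and event names are illustrative): because the log is durable and strongly ordered, a service's state is just a projection of it, so the service can rewind to offset zero and replay to rebuild exactly the same state after a crash.

```python
# A durable, ordered log of (key, event) records (illustrative data).
log = [("o1", "CREATED"), ("o2", "CREATED"), ("o1", "SHIPPED")]

def replay(log, from_offset=0):
    """Rebuild service state by replaying the log from a given offset."""
    state = {}
    for key, event in log[from_offset:]:
        state[key] = event  # the latest event per key wins
    return state

state = replay(log)    # initial build
rebuilt = replay(log)  # rewind to offset 0 and replay, e.g. after a crash
assert state == rebuilt == {"o1": "SHIPPED", "o2": "CREATED"}
```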
Compacted Log (retains only latest version)
[diagram: several versions of each message compact down to only the latest version]
Two primitives: (a) Log (append only) (b) Compacted Log (update)
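The two primitives can be sketched side by side in Python (a conceptual model, not Kafka's on-disk format): a plain log appends and retains every record, while a compacted log retains only the latest record per key.

```python
# A sequence of (key, value) records arriving over time (illustrative data).
events = [("k1", "v1"), ("k2", "v1"), ("k1", "v2"), ("k1", "v3")]

# (a) Plain log: append only, every record retained.
plain_log = list(events)

# (b) Compacted log: an update semantic, the latest value per key wins.
compacted = {}
for key, value in events:
    compacted[key] = value

print(len(plain_log))  # 4
print(compacted)       # {'k1': 'v3', 'k2': 'v1'}
```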
Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Stateful
A place to keep data-on-the-outside
Now add Stream Processing
KStreams: A database engine for data-in-flight
Max(price) from orders where ccy='GBP' over a 1-day window, emitting every second
Continuously Running Queries
Features: similar to database query engine
Join, Filter, Aggregate
View
Window
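A continuously running query like the max-price example above can be sketched in plain Python (this is a simplified model, not the KStreams API: it emits on every event rather than every second, and ignores out-of-order data): each arriving order is filtered, bucketed into a 1-day window, and folded into a running aggregate.

```python
DAY = 86_400  # window size in seconds

# Running state of the continuous query: window start -> max(price) for GBP.
window_max = {}

def on_order(ts: int, ccy: str, price: float):
    """Process one order event; return the updated result, or None if filtered."""
    if ccy != "GBP":                 # filter: where ccy = 'GBP'
        return None
    start = ts - ts % DAY            # window: assign event to its 1-day bucket
    window_max[start] = max(window_max.get(start, price), price)  # aggregate
    return window_max[start]         # emit the updated result downstream

on_order(10, "GBP", 5.0)
on_order(20, "USD", 99.0)            # filtered out
emitted = on_order(30, "GBP", 7.5)
print(emitted)  # 7.5
```

Unlike a database query, this never terminates: state lives across events and each new record updates the emitted answer.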
What is Stream Processing?
Query Engine vs Stream Processor:
Database (table, index, query engine): finite source
Stream Processor: infinite source
What is Stateful Stream Processing?
Query Engine vs Stateful Stream Processor:
Database (table, index, query engine): finite source
Stateful Stream Processor: infinite & finite sources
Table = Compacted Stream
KStreams
Kafka
Stateful Stream Processing
stream
Compacted stream
Join
Stream Data
Stream-Tabular Data
Windowed Stream
Locally Cached Table (disk resident)
Kafka Kafka Streams
Email Confirmation Example
Orders
Customer (Compacted)
Join
Customer Stream
Create Confirmation Emails by Joining Order "Events" with Customer information
Kafka Email Service
Orders Stream
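The email-confirmation join above can be sketched in Python (a conceptual model, not the Kafka Streams API; the field names and message format are illustrative): the compacted customer stream is materialised into a local table, and each order event is joined against it.

```python
# Local table, materialised from the compacted customer stream.
customers = {}

def on_customer(customer_id: str, email: str):
    """Consume the compacted customer stream: upsert into the local table."""
    customers[customer_id] = email

def on_order(order_id: str, customer_id: str):
    """Consume the order stream: stream-table join to build the email."""
    email = customers.get(customer_id)
    if email is None:
        return None  # no matching customer record yet
    return f"To {email}: order {order_id} confirmed"

on_customer("c1", "jo@example.com")
msg = on_order("o1", "c1")
print(msg)  # To jo@example.com: order o1 confirmed
```

Because the customer table is just a cache of the compacted stream, every instance of the email service can rebuild it locally, which is what lets the join scale out.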
Scales Out
Embeddable
Orders Service Kafka
Orders Service
Orders Service
Analytic function, Event Driven, all in one.
Business Services
Join shared datasets without putting pressure on source services
Email Service
Legacy App Orders Payments Stock
KSTREAMS
Shared state is only cached in the service, so there is no way for it to diverge
Email Service
Orders Payments Stock
Lives here!
Cached here!
KSTREAMS
Services work "Event Driven", but with the power to join, filter and aggregate embedded in each service
customer, orders, catalogue, payments
But sometimes you have to move data
Replicate it, so both copies are identical
Keep Immutable
Iterate via view regeneration
Regenerate from the Log
Fix at source
If you have to move data en masse, fix at source and regenerate from the log
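"Fix at source and regenerate from the log" can be shown with a small Python sketch (the data and correction are illustrative): because the downstream view is a pure function of the immutable log, a correction is appended at the source and the view is rebuilt by replay, with no in-place surgery on the copy.

```python
def build_view(log):
    """The downstream view is a pure projection of the log."""
    view = {}
    for key, value in log:
        view[key] = value  # latest record per key wins
    return view

# The source log contains a bad record: 100 was written in error.
log = [("price", 90), ("price", 100)]
view = build_view(log)                 # the view currently shows 100

log.append(("price", 95))              # fix at source: append a correction
view = build_view(log)                 # regenerate the view from the log
print(view["price"])  # 95
```

The original records stay immutable; the history of the mistake and its correction is preserved in the log, and every downstream copy converges by replaying the same sequence.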
Oracle
Connectors make this easier
The Log
Connector Connector
Producer Consumer
Streaming Engine
Cassandra
So…
When building services, consider more than just REST
The data dichotomy: Data systems are about exposing data. Services are about hiding it.
Remember:
Shared data is a reality for most organisations
Catalog
Embrace data that lives in between services
Avoid data services that have complex / amplifying interfaces
Share Data via Immutable Streams
Embed Function into each service
Distributed Log, Embedded KStreams
Always on, Event-Driven Services
The Log (streams & tables)
Ingestion Services
Services with polyglot persistence
Simple Services
Streaming Services
Keep it simple, Keep it moving
Twitter: @benstopford
Slides at benstopford.com
Blog series arriving end of the month at http://confluent.io/blog