JAX London slides
TRANSCRIPT
Microservices in a Streaming World
What are microservices really about?
GUI
UI Service
Orders Service
Returns Service
Fulfilment Service
Payment Service
Stock Service
Collaboration?
Orders Service
Autonomy?
Internal World
External World
Service Boundary
User-Orders Statement getOpenOrders(),
Less Surface Area = Less Coupling
Encapsulate State
Encapsulate Functionality
Encapsulation => Loose Coupling
Independence gives services their value
Orders Service
Stock Service
Email Service
Fulfillment Service
Independently Deployable
It’s hard to scale individual applications
What happens when we grow?
Companies are inevitably a collection of applications. They must work together to some degree.
Inverse Conway Maneuver
The 'Inverse Conway Maneuver' recommends evolving your team and organizational structure to promote your desired architecture.
Org Structure
Software Architecture
But independence comes at a cost
$$$
Change 1 Change 2
Redeploy
Singly-deployable apps are easy
User-Orders Statement getOpenOrders(),
Independently Deployable
Synchronized changes are painful
Single Sign On Business Service authorise(),
It’s unlikely that a Business Service would need the internal SSO state / function to change
Single Sign On
SSO has a tightly bounded context
Business services are different
Catalog Authorisation
Most business services share the same core stream of facts.
Most services live in here
We need encapsulation to hide internal state
But we need the freedom to slice & dice shared data like any other dataset
But data systems have little to do with encapsulation
Service Database
Data on inside
Data on outside
Interface hides data
Interface amplifies data
Databases amplify the data they hold
The data dichotomy: Data systems are about exposing data. Services are about hiding it.
This affects services in one of two ways
getOpenOrders(fulfilled=false, deliveryLocation=CA, orderValue=100, operator=GreaterThan)
Either (1) we constantly add to the interface, as datasets grow
getOrder(Id)
getOrder(UserId)
getAllOpenOrders()
getAllOrdersUnfulfilled(Id)
getAllOrders()
Services can end up looking like kooky home-grown databases
...and data amplifies this service boundary problem
(2) Give up and move whole datasets en masse
getAllOrders()
Data Copied Internally
This leads to a different problem
Data diverges over time
Orders Service
Stock Service
Email Service
Fulfilment Service
The more mutable copies, the more data will diverge over time
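The divergence problem can be shown with a minimal Python sketch (the service names and data here are illustrative, not from the talk): two services each take a mutable copy of the same orders dataset, mutate it independently, and end up with conflicting versions of the same fact.

```python
# The shared "orders" dataset held by the source service (illustrative data).
orders = {"o1": {"status": "OPEN"}, "o2": {"status": "OPEN"}}

# Each downstream service copies the whole dataset internally.
email_copy = {k: dict(v) for k, v in orders.items()}
fulfilment_copy = {k: dict(v) for k, v in orders.items()}

# Each service then mutates only its own copy.
email_copy["o1"]["status"] = "NOTIFIED"
fulfilment_copy["o1"]["status"] = "SHIPPED"

# The same fact (order o1's status) now exists in two conflicting versions.
diverged = email_copy["o1"] != fulfilment_copy["o1"]
print(diverged)  # True
```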
Nice neat services
Service contract too limiting
Can we change both services together, easily? NO
Broaden Contract
Eek $$$
NO
YES
Is it a shared database?
Frack it! Just give me ALL the data
Data diverges (many different versions of the same facts)
Let's encapsulate
Let's centralise to one copy
YES
Start here
There is competition between these forces
Divergence Accessibility Encapsulation
Shared database
Service Interfaces
Better Accessibility
Better Independence
No perfect solution
Is there a better way?
Data on the Inside
Data on the Outside
Service Boundary
Make data-on-the-outside a 1st class citizen
Use a Stream Processing tool-kit
Kafka is a Streaming Platform
Kafka: a Streaming Platform
The Log
Connectors
Producer Consumer
Streaming Engine
The Log?
Shard on the way in
Producing Services
Kafka
Consuming Services
Each shard is a queue
Consumers share load
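The three slides above can be sketched in a few lines of Python (this is a conceptual stand-in, not the Kafka client API; Kafka's real partitioner hashes the key with murmur2, crc32 is used here as a simple deterministic substitute): messages are sharded by key on the way in, each shard is an ordered queue, and consumers in a group share the shards between them.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4

def shard_for(key: str) -> int:
    # Hash the message key to pick a shard, so all messages for one key
    # land on the same shard and therefore stay in order.
    return zlib.crc32(key.encode()) % NUM_SHARDS

# Shard on the way in: each shard is an append-only queue.
shards = defaultdict(list)
for key, value in [("user-a", 1), ("user-b", 2), ("user-a", 3)]:
    shards[shard_for(key)].append((key, value))

# Two consumers in one group split the shards between them (load sharing).
consumers = {0: [], 1: []}
for shard_id, messages in shards.items():
    consumers[shard_id % 2].extend(messages)
```

Because both messages for "user-a" hash to the same shard, their relative order is preserved end to end, while unrelated keys can be processed in parallel by different consumers.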
Load Balanced Services
Fault Tolerant Services
Build ‘Always On’ Services
Rely on Fault Tolerant Broker
Services can “Rewind & Replay” the log
Rewind & Replay
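Rewind & replay can be illustrated with a small Python sketch (the log contents and event names are illustrative): because the log is durable and strongly ordered, a service's state is just a projection of it, so the service can rewind to offset zero and replay to rebuild exactly the same state after a crash.

```python
# A durable, ordered log of (key, event) records (illustrative data).
log = [("o1", "CREATED"), ("o2", "CREATED"), ("o1", "SHIPPED")]

def replay(log, from_offset=0):
    """Rebuild service state by replaying the log from a given offset."""
    state = {}
    for key, event in log[from_offset:]:
        state[key] = event  # the latest event per key wins
    return state

state = replay(log)    # initial build
rebuilt = replay(log)  # rewind to offset 0 and replay, e.g. after a crash
assert state == rebuilt == {"o1": "SHIPPED", "o2": "CREATED"}
```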
Compacted Log (retains only latest version)
[diagram: several versions of each message compact down to only the latest version]
Two primitives: (a) Log (append only) (b) Compacted Log (update)
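The two primitives can be sketched side by side in Python (a conceptual model, not Kafka's on-disk format): a plain log appends and retains every record, while a compacted log retains only the latest record per key.

```python
# A sequence of (key, value) records arriving over time (illustrative data).
events = [("k1", "v1"), ("k2", "v1"), ("k1", "v2"), ("k1", "v3")]

# (a) Plain log: append only, every record retained.
plain_log = list(events)

# (b) Compacted log: an update semantic, the latest value per key wins.
compacted = {}
for key, value in events:
    compacted[key] = value

print(len(plain_log))  # 4
print(compacted)       # {'k1': 'v3', 'k2': 'v1'}
```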
Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Stateful
A place to keep data-on-the-outside
Now add Stream Processing
KStreams: A database engine for data-in-flight
Max(price) from orders where ccy='GBP' over a 1-day window, emitting every second
Continuously Running Queries
Features: similar to database query engine
Join, Filter, Aggregate
View
Window
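A continuously running query like the max-price example above can be sketched in plain Python (this is a simplified model, not the KStreams API: it emits on every event rather than every second, and ignores out-of-order data): each arriving order is filtered, bucketed into a 1-day window, and folded into a running aggregate.

```python
DAY = 86_400  # window size in seconds

# Running state of the continuous query: window start -> max(price) for GBP.
window_max = {}

def on_order(ts: int, ccy: str, price: float):
    """Process one order event; return the updated result, or None if filtered."""
    if ccy != "GBP":                 # filter: where ccy = 'GBP'
        return None
    start = ts - ts % DAY            # window: assign event to its 1-day bucket
    window_max[start] = max(window_max.get(start, price), price)  # aggregate
    return window_max[start]         # emit the updated result downstream

on_order(10, "GBP", 5.0)
on_order(20, "USD", 99.0)            # filtered out
emitted = on_order(30, "GBP", 7.5)
print(emitted)  # 7.5
```

Unlike a database query, this never terminates: state lives across events and each new record updates the emitted answer.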
What is Stream Processing?
Query Engine vs Stream Processor:
Database (table, index, query engine): finite source
Stream Processor: infinite source
What is Stateful Stream Processing?
Query Engine vs Stateful Stream Processor:
Database (table, index, query engine): finite source
Stateful Stream Processor: infinite & finite sources
Table = Compacted Stream
KStreams
Kafka
Stateful Stream Processing
stream
Compacted stream
Join
Stream Data
Stream-Tabular Data
Windowed Stream
Locally Cached Table (disk resident)
Kafka Kafka Streams
Email Confirmation Example
Orders
Customer (Compacted)
Join
Customer Stream
Create Confirmation Emails by Joining Order "Events" with Customer information
Kafka Email Service
Orders Stream
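The email-confirmation join above can be sketched in Python (a conceptual model, not the Kafka Streams API; the field names and message format are illustrative): the compacted customer stream is materialised into a local table, and each order event is joined against it.

```python
# Local table, materialised from the compacted customer stream.
customers = {}

def on_customer(customer_id: str, email: str):
    """Consume the compacted customer stream: upsert into the local table."""
    customers[customer_id] = email

def on_order(order_id: str, customer_id: str):
    """Consume the order stream: stream-table join to build the email."""
    email = customers.get(customer_id)
    if email is None:
        return None  # no matching customer record yet
    return f"To {email}: order {order_id} confirmed"

on_customer("c1", "jo@example.com")
msg = on_order("o1", "c1")
print(msg)  # To jo@example.com: order o1 confirmed
```

Because the customer table is just a cache of the compacted stream, every instance of the email service can rebuild it locally, which is what lets the join scale out.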
Scales Out
Embeddable
Orders Service Kafka
Orders Service
Orders Service
Analytic function, Event Driven, all in one.
Business Services
Join shared datasets without putting pressure on source services
Email Service
Legacy App Orders Payments Stock
KSTREAMS
Shared state is only cached in the service, so there is no way for it to diverge
Email Service
Orders Payments Stock
Lives here!
Cached here!
KSTREAMS
Services work "Event Driven", but with the power to join, filter and aggregate embedded in each service
customer, orders, catalogue, payments
But sometimes you have to move data
Replicate it, so both copies are identical
Keep Immutable
Iterate via view regeneration
Regenerate from the Log
Fix at source
If you have to move data en masse, fix at source and regenerate from the log
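"Fix at source and regenerate from the log" can be shown with a small Python sketch (the data and correction are illustrative): because the downstream view is a pure function of the immutable log, a correction is appended at the source and the view is rebuilt by replay, with no in-place surgery on the copy.

```python
def build_view(log):
    """The downstream view is a pure projection of the log."""
    view = {}
    for key, value in log:
        view[key] = value  # latest record per key wins
    return view

# The source log contains a bad record: 100 was written in error.
log = [("price", 90), ("price", 100)]
view = build_view(log)                 # the view currently shows 100

log.append(("price", 95))              # fix at source: append a correction
view = build_view(log)                 # regenerate the view from the log
print(view["price"])  # 95
```

The original records stay immutable; the history of the mistake and its correction is preserved in the log, and every downstream copy converges by replaying the same sequence.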
Oracle
Connectors make this easier
The Log
Connector Connector
Producer Consumer
Streaming Engine
Cassandra
So…
When building services, consider more than just REST
The data dichotomy: Data systems are about exposing data. Services are about hiding it.
Remember:
Shared data is a reality for most organisations
Catalog
Embrace data that lives in between services
Avoid data services that have complex / amplifying interfaces
Share Data via Immutable Streams
Embed Function into each service
Distributed Log, Embedded KStreams
Always on, Event-Driven Services
The Log (streams & tables)
Ingestion Services
Services with polyglot persistence
Simple Services
Streaming Services
Keep it simple, Keep it moving
Twitter: @benstopford
Slides at benstopford.com
Blog series arriving end of the month at http://confluent.io/blog