strategic design by architecture and organisation @ finn.no - javazone 2016

Sebastian Verheughe Chief Enterprise Architect @ FINN.no JavaZone - 8 September 2016 Strategic Design Architecture and Organization

Upload: sebastian-verheughe

Post on 22-Jan-2018


Sebastian Verheughe, Chief Enterprise Architect @ FINN.no

JavaZone - 8 September 2016

Strategic Design: Architecture and Organization

Norway´s Classifieds Marketplace

~ 250 services

~ 150 deploys / day

~ 1.4 B NOK

~ 120 developers

#1

One Global Product and Technology Organization

Building Global Platforms & Services

6800 employees

30 countries

200 million users

part of [company logo, not preserved in the transcript]

Strategic

How to use architecture and organization to…

• deliver more effectively and efficiently on the business strategy

• address complexity without needing total rewrites each time

• minimize risk and maintain quality over time...

What to Expect

• I will be talking about the big picture in general terms

• You will learn about real challenges we observe at FINN

• You will learn how we try to solve these challenges

• I will provide you with some recommendations…

Partitioning of FINN

Chapter I

The Value of Domain Driven Design

“Getting service boundaries wrong can be expensive. It can lead to a larger number of cross-service changes, overly coupled components, and in general could be worse than just having a single monolithic system”

Sam Newman, Thoughtworks

Extract from the article Microservices for greenfields?

Dependency Graph of FINN

With microservices, each service is simpler but the distributed environment is more complex

Conceptual Architecture of FINN

Not only microservices, but also a monolithic front-end and a legacy system / database

[Diagram: front-ends (FE) and an API on top of many microservices (µs) and a legacy system (Big Ball of Mud)]

The highest complexity seems to be related to a few central structures, and legacy database integration in particular makes it hard to split.

Risk of Unhandled Complexity

Key areas with unhandled complexity may damage your business

[Chart: complexity over time. Phases: Simple (full speed ahead), The Twilight Zone, Complex (What happened?). Markers: Missed Business Opportunities, Point of Total Rewrite (or worse)]

Architecture Challenges @ FINN

• A few central structures with high complexity, used in many contexts, prove very hard to modify, and changes to them can cause unintended behavior elsewhere.

• High coupling between services and long request call chains hurt performance and availability (see the 8 fallacies of distributed computing).

• Clients aggregating services become monolithic front-ends, bottlenecks and single points of failure, and require / desire standardization.
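A rough way to see why long call chains hurt availability: if every service in a synchronous chain must answer for the request to succeed, the availabilities multiply. The sketch below assumes independent failures and an illustrative 99.9% availability per service; neither number nor assumption comes from the talk.

```java
// Sketch: availability of a synchronous call chain, assuming independent failures.
class ChainAvailability {

    /** Availability of a request that must pass through `hops` services in sequence. */
    static double chainAvailability(double perServiceAvailability, int hops) {
        return Math.pow(perServiceAvailability, hops);
    }

    public static void main(String[] args) {
        // 0.999 per service is an illustrative figure, not a FINN measurement.
        for (int hops : new int[] {1, 5, 10, 20}) {
            System.out.printf("%2d hops -> %.4f%n", hops, chainAvailability(0.999, hops));
        }
    }
}
```

With these assumed numbers, a chain of ten services is available roughly 99.0% of the time, an order of magnitude more downtime than any single service in it.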

Partitioning Strategy

Should we partition by business entities, business objectives or both?

Customer Service

Agreement Service

Store Service

Login Service

User Service

Authentication Service

Enrollment Service

Review Service

Challenges with Services Based on Entities

They don´t do one thing well

• support multiple biz challenges

• have unnecessary dependencies

• are complex / hard to change

• will always be suboptimal solutions

• impact performance and availability (single point of failure)

• grow in complexity for each new business capability using them

[Diagram: a Front-End Client and two services around a single Customer Service whose Customer model holds Id, Password, Billing Address, Credit Card, Purchases and Shipping Address]

Mr. Customer, what do you do?

Benefits with Services Based on Objectives

Auth, Shipping, Trade, Payment, Billing

They do one thing well

• have fewer dependencies

• are simpler to change

• allow for optimal models (of entities in context)

• improve performance and availability

• are only as complex as the business objective they deliver on!

Well, I know why we need Mr. Billing around…

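[Diagram: a Front-End Client in front of objective-based services, each with its own narrow model of the customer inside a transactional business boundary: (Customer Id, Billing Address); (Customer Lead Id, Interests); (Customer Id, Password); (Customer Seller Id, Customer Buyer Id, Object Id); (Customer Id, Shipping Address)]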
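The contrast between the two slides can be sketched in code. The field names are taken from the slides; the class names, the `invoiceHeader` helper and the sample values are hypothetical, and the sketch assumes Java 16+ for records.

```java
import java.util.List;

// Illustrative only: all type and method names are assumptions, not FINN's code.
class CustomerModels {

    // Entity-style model: every context depends on the same wide class.
    record GlobalCustomer(String id, String password, String billingAddress,
                          String creditCard, List<String> purchases, String shippingAddress) {}

    // Objective-style models: one narrow view of "customer" per business objective.
    record AuthCustomer(String customerId, String password) {}
    record BillingCustomer(String customerId, String billingAddress) {}
    record ShippingCustomer(String customerId, String shippingAddress) {}
    record Trade(String sellerId, String buyerId, String objectId) {}

    // Billing code sees only the billing model, so changes to the
    // auth or shipping models cannot ripple into it.
    static String invoiceHeader(BillingCustomer customer) {
        return "Invoice for " + customer.customerId() + ", " + customer.billingAddress();
    }

    public static void main(String[] args) {
        System.out.println(invoiceHeader(new BillingCustomer("c-42", "Example Street 1, Oslo")));
    }
}
```

The same customer is modeled several times on purpose; the only thing the models share is the identifier.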

Why do we Default to Global Entities / Models?

We have a global information model / information architect

Someone created one global model of all entities and their relationships, and you implemented it.

We learned object-oriented programming at school

There we usually solved problems within a single context, and came to believe that you should never model the same entity twice (DRY).

We live in a world filled with entities around us…

We try to clone the world as-is. It might just be harder for the human brain to work with abstract models of specific problems, so we default to easy…

Aligning the Architecture with the Business

Technical solutions align with business capabilities (objectives / domains)

[Diagram: like cuts of meat (Rump, Rib, Tenderloin), the Business Domain is cut into Product, Sales, Advertising, Billing and Communication, each with a matching Tech Solution]

They will be stable, with a minimum of dependencies between them, and form transactional boundaries from a user / business perspective

Target Architecture (we call it direction in FINN)

The front-ends aggregate responses from services clustered by domain

Partitioning applies to both front-end (components) and back-end

Capability / domain (problem space) & services (solution space)

[Diagram: front-ends composed of components (COMP) on top of the domains Authentication, Product, Sales, Billing, Shipping and Advertising; each domain contains microservices (µs) or a Service / System, connected by a Message BUS (utilizing eventual consistency)]

Recommendations

Invest heavily in building competence around distributed systems design - NOW!

Writing a single microservice is easy; testing and running hundreds together is hard. Distribution requires rethinking service boundaries, integration and the handling of failures. Handle the increase in the number of services with automation and standardization in tools and infrastructure.

Tomorrow: use a circuit breaker (Hystrix), use pub-sub between two services to communicate, mock dependencies between two services for testing, read “8 fallacies of distributed systems”.
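To make the circuit breaker idea concrete, here is a minimal hand-rolled sketch. It is not Hystrix and does not use Hystrix's API; the class name, the thresholds and the fallback value are assumptions for illustration, and the class is deliberately not thread-safe.

```java
import java.util.function.Supplier;

// Minimal count-based circuit breaker. Illustrative only, single-threaded use.
class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast once open
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit has never opened or was reset

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    /** Runs the remote call, or returns the fallback immediately while the circuit is open. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // fail fast, leave the struggling service alone
        }
        try {
            T result = remoteCall.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // open (or re-open) the circuit
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, 5_000);
        for (int i = 0; i < 5; i++) {
            String answer = breaker.<String>call(
                    () -> { throw new IllegalStateException("service down"); },
                    () -> "cached default");
            System.out.println(answer + " (open=" + breaker.isOpen() + ")");
        }
    }
}
```

The point of the pattern is the early return: once the circuit is open, callers get the fallback without adding load or latency to a dependency that is already failing.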

Design your services around business objectives, with several entity models

Entity services become monolithic & complex, and result in a lot of unnecessary dependencies. Instead, create different models of entities for the different problems you try to solve.

Tomorrow: select a complex entity service (or table) and divide it by its different known contexts.

Guide to Objective Based Service Boundaries

The service name should reveal the business purpose, ideally in the form of a verb and optionally a subject, e.g. authentication / authenticate user (this does not always work).

Evaluate each field of your global entity model, and determine which data & relations are only of interest in a given context that serves a business purpose (e.g. authentication).

Evaluate which fields are updated at the same time (in different use cases). These form transactional boundaries around data groups and should be good candidates.

Ensure that data / fields are only mastered by one service (to avoid sync conflicts).

Never share / integrate via the database (complicated to break away from later…).
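The "fields updated at the same time" step of the guide can be sketched as a small grouping exercise: list the use cases, note which fields each one writes, and group fields that are written by exactly the same use cases. The use-case and field names below are made up for illustration, and real boundaries will still need judgment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: fields written by the same set of use cases are candidate data groups.
class FieldGrouping {

    /** Maps each distinct set of updating use cases to the fields it writes. */
    static Map<Set<String>, Set<String>> groupByUpdatingUseCases(Map<String, Set<String>> fieldsByUseCase) {
        // Invert the input: field -> use cases that write it.
        Map<String, Set<String>> useCasesByField = new TreeMap<>();
        fieldsByUseCase.forEach((useCase, fields) ->
                fields.forEach(field ->
                        useCasesByField.computeIfAbsent(field, k -> new TreeSet<>()).add(useCase)));

        // Fields with an identical set of writers end up in the same group.
        Map<Set<String>, Set<String>> groups = new LinkedHashMap<>();
        useCasesByField.forEach((field, useCases) ->
                groups.computeIfAbsent(useCases, k -> new TreeSet<>()).add(field));
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical use cases against a global customer table.
        Map<String, Set<String>> fieldsByUseCase = Map.of(
                "reset password", Set.of("password"),
                "update billing details", Set.of("billingAddress", "creditCard"),
                "change delivery address", Set.of("shippingAddress"));

        groupByUpdatingUseCases(fieldsByUseCase).forEach((useCases, fields) ->
                System.out.println(fields + " written by " + useCases));
    }
}
```

In this made-up input, `billingAddress` and `creditCard` land in one group, which hints at a billing boundary separate from authentication and shipping.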

Organization Strategy

Chapter II

Organizational Observations @ FINN

Many features required several hand-offs between teams in order to be solved.

This was really slowing down development speed.

Teams were not able to deliver a complete user / business capability by themselves.

We observed sub-optimal APIs, badly placed or duplicated business logic, presentation inconsistency, and lack of full-stack operational responsibility (serving the end user).

Layered teams sometimes became overly focused on technical perfection.

This resulted in bottlenecks in the organization, maybe because they lacked clear business goals.

Static teams made it difficult to move resources to changing business challenges.

Conway's Law

“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”

 Melvin E. Conway

Extract from the article How do Committees Invent?

Aligning the Organization with the Business

The organization should align with the business (and the technical solution)

The goal was an organization able to deliver more effectively on the business challenges

Inverse Conway Maneuver: Organize to promote your desired architecture (end state)

[Diagram: the same cuts (Rump, Rib, Tenderloin), now with Org Ownership aligned to each Business Domain and Tech Solution: Product, Sales, Advertising, Billing, Communication and Infrastructure]

FINN Technology Organization

Organized primarily by business domains, secondarily by cross-cutting concerns

[Diagram: Spaces A to D, each containing three domains (Domains A to L), as FULL STACK verticals between the Front-End Platform(s) (Web, iOS, Android) and the cross-cutting layer (Servers, Databases, Data Bus, Mon Tools, Deploy PL). Axis labels: EASIER TO CHANGE, HARDER TO CHANGE]

All cross-cutting teams are structured as enabling teams, meaning they should deliver self-service solutions upfront to the vertical functional teams.

Spaces Allow for Flexible Developer Allocation

A space contains all kinds of developers, assigned to dynamic teams to solve prioritized tasks. Size from 15-25, small enough to know each other well.

Space Leads: two persons in charge of organization and flow of prioritized tasks affecting domains within the space. They decide the teams needed to support the currently prioritized business challenges.

Developers: free to move within the space. Grouped into teams depending on the current problems needing to be solved / prioritized.

Technical Domain Experts (tech leads): responsible for the long-term development of the technical solution (QoS) delivering a full-stack business capability.

Architects (4 in FINN): responsible for everything that affects multiple domains (e.g. domain boundaries, integration, front-end architecture…)

[Diagram: Space A with the domains Advertising, Product Sales and Billing, staffed by technical domain experts (TDE), space leads (SL) and developers (DEV); the developers of the same space are grouped into 2-4 teams]

Transitioning the Apps Team

About 25% of visits come from native apps, and the share is growing, but we had only one apps team.

Since we believe that our main business challenges are not related to native app development, it makes sense to focus on full-stack, functional and more autonomous teams instead.

[Diagram: the organization chart again, with the Front-End Platform(s) (Web, iOS, Android) spanning Spaces A to D and Domains A to L]

Decentralized Authority (Framing)

The TDEs are free to choose solutions within their domain, no sign-offs needed

[Diagram: a Cross-Domain Front-End on top of five domains, each with its own TDE]

The architects only govern integration and boundaries between the domains and the front-end, in addition to helping the implementation of the strategy succeed.

Reorganization Observations (one year in)

• A single team has proven that they can improve a full-stack capability autonomously, and it seems faster (we have no good way of measuring).

• Tech leads are starting to take full-stack responsibility for the capability they deliver.

• The teams can freely choose how they solve each business challenge within a domain; there are fewer technical “religious” discussions, and no exploitation of new technologies.

• One year in, the systems do not automagically partition themselves by org. structure; this needs to be driven in each case. The inverse Conway maneuver only removed barriers.

• The organization structure (silos) works as a barrier when cross-domain work is needed, and developers would rather wait than contribute if changes are needed in someone else's domain. It may be a smaller problem now, but we need to change the culture and expectations.

Organizational Recommendations

Primarily align your teams with your business domains, not tech layers

Teams then focus on business problems first, and are more efficient because they can take all decisions internally in order to solve a business problem. Staff the team with the necessary competence (e.g. full-stack) to reach their goal. Use Conway´s law to your advantage.

Decentralize decision making whenever applicable

When teams understand which problem they must solve, allowing them to take internal decisions improves development speed, and probably results in the best design (for that specific problem). Minimize central governance, but where needed make it effective and clear to the org.

Structure cross-cutting teams as enabling teams, not bottlenecks or guards

Cross-cutting functions such as infrastructure or architecture should always deliver tools and rules (for cross-cutting concerns) up-front, not act as guards slowing down development.

Make sure your organization culture encourages cross-organizational collaboration

Being Strategic about Complexity

Chapter III

Living With an Imperfect World

Chances are that your current system is not exactly how you want it to be…

What to do about it…

[Diagram: front-ends on top of five domains connected by a Message BUS, alongside a Big Ball of Mud and a Medium Ball of Mud]

Complexity & Business Paralysis

Business: We need to be able to “something important”!

Tech: Oh, that’s part of a complex area, we need to fix that first!

Business: Well, can you fix it this month?

Tech: Funny guy, it will take 9 months to rewrite that mess!

Business: Bugger, what about a simple meaningless feature?

Tech: Sure, with nanoservices it will be in production tomorrow!

The Total Rewrite Disease

Thoughts about why this happens…

• We take decisions about existing structures (systems and structures), not about the future user / business challenges we need to solve.

• We take the decisions we can take, and when organization and ownership are not aligned with user / business challenges, neither will our decisions be.

• High complexity, and lack of knowledge about how to address such complex systems strategically, forces us to solve a larger problem than necessary.

Fastest Minimum Viable Capability

Identifying the smallest step needed to deliver a modified or new capability

1. Identifying What Strategic Capabilities to Improve

In short: fix the basics, focus on advantages, leave the rest when possible…

Basics: customers expect / require…

Advantages: why customers choose you…

The rest

Avoid feature driven… Ignore how for now…

2. Understanding How & Where to Deliver

When you know what, identify where it should go and how to deliver it.

[Diagram: front-ends, five domains, a Big Ball of Mud and a Medium Ball of Mud, with one part marked as the capability model that needs improvement…]

About this time, an architecture strategy might be helpful…

Slicing a Part of a Legacy Model @ FINN

Illustration of how we modified parts of a legacy model without needing to rewrite all dependencies at the same time, or at all (Strangler Application).

[Diagram: a Front-End and a New Narrower Model next to the Legacy System (Evil Monolithic Big Ball of Mud)]

1. Identify domain

2. Create new master

3. (A)sync data

4. Move master and get business value quickly

5, 6, 7, … Rewrite other dependencies gradually (if any biz value)

The goal is not to have the new model behave identically to the legacy model. The entire point is to change the capability in this case.
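Steps 2 to 4 can be sketched as follows. This is a toy, in-memory illustration under stated assumptions: the class names, the `billing_address` column and the synchronous write-back are all hypothetical, and a real setup would more likely sync asynchronously.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a new, narrower master that keeps the legacy model in sync.
class StranglerSketch {

    /** Stand-in for the legacy system's wide customer table (hypothetical). */
    static class LegacyCustomerTable {
        final Map<String, Map<String, String>> rows = new HashMap<>();

        void upsertField(String customerId, String column, String value) {
            rows.computeIfAbsent(customerId, k -> new HashMap<>()).put(column, value);
        }
    }

    /** Step 2: the new model, master for one capability only. */
    static class BillingAddressService {
        private final Map<String, String> addressByCustomer = new HashMap<>();
        private final LegacyCustomerTable legacy;

        BillingAddressService(LegacyCustomerTable legacy) {
            this.legacy = legacy;
        }

        void changeBillingAddress(String customerId, String address) {
            addressByCustomer.put(customerId, address);                  // step 4: the master lives here
            legacy.upsertField(customerId, "billing_address", address);  // step 3: sync back to legacy
        }

        String billingAddress(String customerId) {
            return addressByCustomer.get(customerId);
        }
    }

    public static void main(String[] args) {
        LegacyCustomerTable legacy = new LegacyCustomerTable();
        BillingAddressService billing = new BillingAddressService(legacy);
        billing.changeBillingAddress("c-42", "Example Street 1, Oslo");

        // Untouched legacy dependencies keep reading the old table and still see fresh data.
        System.out.println(legacy.rows.get("c-42").get("billing_address"));
    }
}
```

Because the legacy table stays populated, the remaining readers of it do not have to be rewritten on day one, which is what keeps the step small.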

Faster Delivery of the Business Strategy

Tech: What would success look like for “something important”?

Business: Well, if these users could do this when doing that…

Tech: Right, so you don’t need this and that also to use it?

Business: No, I only care about these users being able to do that!

Tech: Why didn´t you say so, I can actually fix that in 3 weeks.

Business: Really? Thanks a lot…

Recommendations

Always question large rewrites without a clear strategic user / business value

Start by trying to understand what the business is trying to accomplish, or rephrased, which ability it needs to deliver using IT. E.g. able to deploy continuously, able to recommend products, able to register the user. Ask ”why” 5 times.

Deliver new / modified capabilities outside of legacy monoliths

Existing capabilities are often used by many solutions, but the cost and risk of rewriting all dependencies is very high. Instead, only deliver the modified capability to those who need it. Syncing the data with the legacy solution minimizes the amount of rewriting needed.

Always find the simplest / smallest solution that delivers on strategic capabilities

When you know what, you can start discussing how best to achieve it. Suppress the temptation to “upgrade everything” you think is bad in the existing solution. Find the simplest solution that delivers the capability in a good way (should be possible to test).

Overflow…

Chapter IV

Infrastructure / Cloud

Default to Cloud Services

Unless you have special needs, accept the fact that companies of all sizes choosing cloud services will have an advantage

It´s simple, fast, scalable, and lets you focus on the business!

FINN is cloudifying…

Register and deploy to e.g. Heroku or Google today and see for yourself (application containers, message buses, etc.)

Data Pipeline (accessing distributed data)

With smart products and data analytics, how to best access distributed data?

To avoid exponential complexity, move the complexity to the infrastructure

[Diagram: publishers (P) publish and consumers (C) subscribe via a Message BUS (not ESB) - Kafka]

Publish Domain Events (not commands)
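The difference between an event and a command can be shown with a tiny in-memory bus. This is only a stand-in for illustration and does not use Kafka's client API; the `AdPublished` event and the two subscribers are hypothetical, and the sketch assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory publish/subscribe sketch. A real deployment would put a message bus here.
class DomainEventBus {

    /** A domain event: a fact that has already happened, named in the past tense. */
    record AdPublished(String adId, String sellerId) {}

    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new HashMap<>();

    <E> void subscribe(Class<E> eventType, Consumer<E> handler) {
        subscribers.computeIfAbsent(eventType, k -> new ArrayList<>())
                   .add(event -> handler.accept(eventType.cast(event)));
    }

    /** The publisher does not know, or care, who consumes the event. */
    void publish(Object event) {
        for (Consumer<Object> subscriber : subscribers.getOrDefault(event.getClass(), List.of())) {
            subscriber.accept(event);
        }
    }

    public static void main(String[] args) {
        DomainEventBus bus = new DomainEventBus();
        // Two independent consumers react to the same fact in their own way.
        bus.subscribe(AdPublished.class, e -> System.out.println("search: index ad " + e.adId()));
        bus.subscribe(AdPublished.class, e -> System.out.println("statistics: count ad for " + e.sellerId()));

        // A command would instead be addressed to one receiver, e.g. "IndexAd".
        bus.publish(new AdPublished("ad-1", "seller-9"));
    }
}
```

Adding a third consumer requires no change to the publisher, which is how the complexity moves from the services into the infrastructure.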

Pipeline Challenges @ FINN

Challenges we experienced at FINN when learning to use a pipeline

Reliable messaging is more than a product feature on the message bus

We had serious reliability issues related to order and payment in FINN (not good). Creating a single transaction boundary between the service update and the domain event was not trivial. Running and configuring Kafka ourselves was also not without issues.

Introducing event-based communication is hard; developers are used to RPC

For the developer, it´s easier just to make an RPC call to another service. And in the beginning, there is no data available to consume, and no motivation to publish since the publisher and consumer are two different teams. You may need to enforce publishing until it is the norm.

FINN - Technology Stack

Mostly Java (some Scala)

Moving from Sybase to PostgreSQL

Ramping up Node in the front end

Moving data around with the bus Kafka (separate JavaZone presentation)

Using Cassandra for high frequency writing (counting page clicks etc.)

Service calls using REST (default) or Thrift

Key Takeaways

1. Partition by business objective (domains), avoid global entity models

2. Organize by business objective, not by competence, technology or process

3. Take the smallest possible step to reach strategic capability, no total rewrites

But more importantly, it all requires a development organization with good knowledge of the business domain and a great culture for working together across organizational boundaries.

The End

Open for Questions…