strategic design by architecture and organisation @ finn.no - javazone 2016
TRANSCRIPT
S
Sebastian VerheugheChief Enterprise Architect @ FINN.no
JavaZone - 8 September 2016
Strategic DesignArchitecture and Organization
6800
employees
30
countries 200
million users
One Global Product and Technology Organization
Building Global Platforms & Services
part of
How to use architecture and organization to…
• deliver more effectively and efficiently on the business strategy
• address complexity without needing total rewrites each time
• minimize risk and maintain quality over time...
Strategic
• I will be talking about the big picture in general terms
• You will learn about real challenges we observe at FINN
• You will learn how we try to solve these challenges
• I will provide you with some recommendations…
What to Expect
The Value of Domain Driven Design
“Getting service boundaries wrong can be expensive. It can lead to a larger
number of cross-service changes, overly coupled components, and in general
could be worse than just having a single monolithic system”
Sam Newman, Thoughtworks
Extract from the article Microservices for greenfields?
Dependency Graph of FINN
With microservices, each service is simpler but the distributed environment is more complex
Conceptual Architecture of FINN
Not only microservices, monolithic front-end and legacy system/database
Front-EndFEFEFE
Legacy
(Big Ball of Mud)µsµsµsµsµsµsµsµsµs
API
The highest complexity seems to be related to a few central structures,
and especially legacy database integration makes it hard to split.
µsµsµsµsµsµs
Front-End
Risk of Unhandled Complexity
Key areas with unhandled complexity, may damage your business
Co
mp
lexity
Time
Point of Total Rewrite
(or worse)
Missed Business
Opportunities
Simplefull speed ahead
The
Twilight
Zone
ComplexWhat happened?
Architecture Challenges @ FINN
• A few central structures with high complexity used in many contexts proves
very hard to modify and can cause unintended behavior elsewhere.
• High coupling between services and long request call chains affects
performance and availability negatively (8 fallacies of distributed computing).
• Clients aggregating services become monolithic front-ends, bottleneck, single-
points-of-failures, and require / desire standardization.
Should we partition by business entities, business objectives or both?
Partitioning Strategy
Customer Service
Agreement Service
Store Service
Login Service
User Service
Authentication Service
Enrollment ServiceReview Service
Challenges with Services Based on Entities
They don´t do one thing well
• support multiple biz challenges
• have unnecessary dependencies
• are complex / hard to change
• Will always be suboptimal solutions
• impact performance and availability (single point of failure)
• grow in complexity for each new business capability using them
CustomerCustomer
Service
Id
Password
Billing Address
Credit Card
Purchases
Shipping Address
Mr. Customer, what do you do?
Front-End Client
Service Service
Benefits with Services Based on Objectives
Auth ShippingTradePaymentBilling
They do one thing well
• have fewer dependencies
• are simpler to change
• allow for optimal models (of entities in context)
• improves performance and availability.
• are only as complex as the business objective they deliver on!
Well, I know why we need Mr. Billing around…
Front-End Client
Customer Id
Billing Address
…
Customer Lead Id
Interests
…
Customer Id
Password
…
Customer Seller Id
Customer Buyer Id
Object Id
Customer Id
Shipping Address
…
Transactional business boundary
Why do we Default to Global Entities / Models?
We have a global information model / information architect
that created one global model of all entities and their relationships, and you
implemented it.
We learned object-oriented programming at school
usually solving problems within a single context, believing that you should never model
the same entity twice (DRY).
We live in a world filled with entities around us…
and we try to clone the world as-is. It might just be harder for the human brain to work
with abstract models of specific problems, so we default to easy…
Aligning the Architecture with the Business
Technical solutions align with business capabilities (objectives / domains)
Ru
mp
Rib
Tenderloin
Business
Domain
Tech
Solution
Product
SalesAdvertising Billing Communication
They will be stable, with a minimum of dependencies between them,
and form transactional boundaries from a user / business perspective
Target Architecture (we call it direction in FINN)
The front-ends aggregate responses from services clustered by domain
Partitioning applies to both front-end (components) and back-endCapability / domain (problem space) & services (solution space)
Front-End
Message BUS (utilizing eventual consistency)
Front-End
Authentication Product Sales Billing ShippingAdvertising
µs
COMP
µs
µs
µsµs
µs
µs µs µsµs
Service /
System
COMP COMP
COMP COMP
COMP
COMP COMP
COMP
Recommendations
Invest heavily in building competence around distributed systems design - NOW!
Writing a single microservices is easy, testing and running hundreds together is hard.
Distribution requires rethinking of service boundaries, integration and handling of failures. Handle
the increase in number of services by automation and standardization in tools and infrastructure.
Tomorrow: Use a circuit breaker (Hystrix), use pub-sub between two services to communicate,
mock dependencies between two services for testing, read “8 fallacies of distributed systems”.
Design you services around business objectives, with several entity models
Entity services become monolithic & complex, and result in a lot of unnecessary dependencies.
Instead create different models of entities for the different problems you try to solve.
Tomorrow: select a complex entity service (or table) and divide it by different known contexts
Guide to Objective Based Service Boundaries
The service name should reveal the business purpose. Ideally in the form of verb and
optionally subject. E.g. authentication / authenticate user (does not always work)
Evaluate each field of your global entity model, determine data & relations that are
only of interest in a given context that serves a business purpose (e.g. authentication).
Evaluate which fields are updated at the same time (different use cases), these form
transactional boundaries around data groups and should be good candidates.
Ensure that data / fields are only mastered by one service (to avoid sync conflicts)
Never share / integrate via the database (complicated to break away from later…)
Organizational Observations @ FINN
Many features required several hand-offs between teams in order to be solved.
This was really slowing down development speed.
Teams where not able deliver a complete user / business capability by themselves.
We observed sub-optimal APIs, badly placed or duplicated business logic, presentation
inconsistency, and lack of full-stack operational responsibility (serving the end user).
Layered teams sometimes became overly focused about technical perfection.
This resulted in bottlenecks in the organization, maybe because they lacked clear
business goals.
Static teams made it difficult to move resources to changing business challenges.
Conway's Law
“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”
Melvin E. Conway
Extract from the article How do Committees Invent?
Aligning the Organization with the Business
The organization should align with the business (and the technical solution)
The goal was an organization that more effectively was able to deliver on the business challenges
Inverse Conway Maneuver: Organize to promote your desired architecture (end state)
Ru
mp
Rib
Tenderloin
Org Ownership
Business
Domain
Tech
Solution
Product
SalesAdvertising Billing Communication
Infrastructure
FINN Technology OrganizationOrganized primarily by business domains, secondarily by cross-cutting concerns
Front-End Platform(s)
Space A
Dom
ain
A
Dom
ain
B
Dom
ain
C
Space BD
om
ain
D
Dom
ain
E
Dom
ain
F
Space C
Dom
ain
G
Dom
ain
H
Dom
ain
I
Space D
Dom
ain
J
Do
ma
in K
Dom
ain
L
HARDER TO CHANGE
EA
SIE
RT
O C
HA
NG
E
FU
LL S
TA
CK
WebiOSAndroid
ServersDatabasesData BusMon ToolsDeploy PL
All cross-cutting teams are
structured as enabling teams,
meaning they should deliver
self-service solutions upfront
to the vertical functional teams.
Spaces allows for Flexible Developer AllocationA space contains all kinds of developers, assigned to dynamic teams to solve
prioritized tasks. Size from 15-25, small enough to know each other well.
Space Leads, two persons in charge of organization and flow of
prioritized tasks affecting domains within the space. Decides the
teams to support the current prioritized business challenges.
Developers, free to move within the space. Grouped into teams
depending on the current problems needing to be solved / prioritized.
Technical Domain Experts (tech leads), responsible for the long term
development of the technical solution (QoS) delivering a full-stack
business capability.
Architects (4 in FINN), responsible for everything that affects multiple
domains (e.g. domain boundaries, integration, front-end architecture…
Space A
Adve
rtis
ing
Pro
duct S
ale
s
Bill
ing
TDE
TDE
TDE
DEV
DEV
DEV
SLSL
DEV
DEV
DEV
DEV
DEV
DEV
Same space
2-4 teams
Teams
Transitioning the Apps Team
About 25% of visits are from native apps and growing, but we had one team.
Since we believe that our main biz challenges are not related to native apps development,
it makes sense to focus on full-stack functional and more autonomous teams instead.
Front-End Platform(s)
Space A
Dom
ain
A
Dom
ain
B
Dom
ain
C
Space BD
om
ain
D
Dom
ain
E
Dom
ain
F
Space C
Dom
ain
G
Dom
ain
H
Dom
ain
I
Space D
Dom
ain
J
Dom
ain
K
Do
ma
in L
WebiOSAndroid
Decentralized Authority (Framing)
The TDEs are free to choose solutions within their domain, no sign-offs needed
Cross-Domain Front-End
Domain Domain Domain DomainDomain
TDE TDE TDE TDE TDE
The architects only govern integration and boundaries between the domains and the front-end,in addition to help succeeding with the implementation of the strategy.
Reorganization Observations (one year in)
• A single team has proven that they can improve a full-stack capability autonomously,
and it seems like faster (we have no god way of measuring)
• Tech leads are starting to take full-stack responsibility for the capability they deliver.
• The teams can freely choose how they within a domain solve each business challenge,
there are less technical “religious” discussions. And no exploitation of new technologies.
• One year in, the systems does not automagically partition themselves by org. structure,
but needs to be driven in each case. The inverse Conway maneuver only removed barriers.
• The organization structure (silos) works as barriers when cross domain work is needed,
and developers rather wait than contribute if changes are needed in someone else domain. It
may be a smaller problem now, but we need to change the culture and expectations.
Organizational Recommendations
Primarily align your teams with your business domains, not tech layers
Teams focusing on business problems first, and who are more efficient as they can take all
decisions internally in order to solve a business problem. Staff the team with the necessary
competence (e.g. full-stack) to reach their goal. Use Conway´s law to your advantage.
Decentralize decision making whenever applicable
When teams understand which problem they must solve, allowing them to take internal decisions
improved development speed, and probably results in the best design (for that specific problem).
Minimize central governance, but where needed make it effective and clear to the org.
Structure cross-cutting teams as enabling teams, not bottlenecks or guards
Cross-cutting functions such as infrastructure or architecture should always deliver tools and
rules (for cross-cutting concerns) up-front, not acting as guards slowing down development.
Make sure your organization culture encourages cross-organizational collaboration
Living With an Imperfect World
Chances are that your current system is not exactly how you want it to be…
What to do about it…
Domain Domain Domain DomainDomain
Message BUS
Big Ball of
Mud
front-end front-end
Medium
Ball of
Mud
Complexity & Business Paralysis
Business: We need to be able to “something important”!
Tech: Oh, that’s part of a complex area, we need to fix that first!
Business: Well, can you fix it this month?
Tech: Funny guy, it will take 9 months to rewrite that mess!
Business: Bugger, what about a simple meaningless feature?
Tech: Sure, with nanoservices it will be in production tomorrow!
The Total Rewrite Disease
Thoughts about why this happens…
• We take decisions about existing structures (systems and structures), not
about future user / business challenges we need to solve.
• We take the decision we can take, and when organization and ownership is
not aligned with user / business challenges, neither will our decisions be it.
• High complexity and lack of knowledge about how to address such complex
systems strategically forces us to solve a larger problem than necessary.
Identifying the smallest step needed to deliver a modified or new capability
Fastest Minimum Viable Capability
1. Identifying What Strategic Capabilities to Improve
In short: fix the basics, focus on advantages, leave the rest when possible…
THE REST
Basics
Advantages
Avoid feature driven…Ignore how for now…
Customers expect / require…
Why customers choose you…
2. Understanding How & Where to Deliver
When you know what, identify where it should go and how to deliver it.
Domain Domain Domain DomainDomain
Big Ball of
Mud
front-end front-end
Medium
Ball of
Mud
Capability model that needs improvement…
About this time, it might be helpful with an architecture strategy…
Slicing a Part of a Legacy Model @ FINN
Illustration of how we modified parts of a legacy model without needing to
rewrite all dependencies at the same time / at all. (Strangler Application)
Evil
Monolithic
Big Ball of Mud
Legacy
System
Front-End
1. Identify Domain
New
Narrower
Model
2. Create new master
3. (a)sync data
4. Move master and get business value quick 5, 6, 7, … Rewrite other dependencies gradually (if any biz value)
The goal is not to have the
new model behave identically
to the legacy model.
The entire point is to change
the capability in this case.
Faster Delivery of the Business Strategy
Tech: What would success look like for “something important”?
Business: Well, if these users could do this when doing that…
Tech: Right, so you don’t need this and that also to use it?
Business: No, I only care about these users being able to do that!
Tech: Why didn´t you say so, I can actually fix that in 3 weeks.
Business: Really? Thanks a lot…
Recommendations
Always question large rewrites without a clear strategic user / business value
Start by trying to understand what the business is trying to accomplish – or rephrased which ability
it needs to deliver using IT. E.g. able to deploy continuously, able to recommending products, able
to register the user. Ask ”why” 5 times.
Deliver new / modified capabilities outside of legacy monoliths
Existing capabilities are often used by many solutions, but the cost and risk of rewriting all
dependencies is very high. Instead, only deliver the modified capability to those who need it.
Syncing the data with the legacy solution minimizes the amount of rewriting needed.
Always find the simplest / smallest solution that delivers on strategic capabilities
When you know what, you can start discussing how best to achieve it. Suppress the temptation to
“upgrade everything” you think as bad with the existing solution. Find the simplest solution that
delivers the capability in a good way (should be possible to test).
Infrastructure / Cloud
Default to Cloud Services
Unless you have special needs accept the fact that
companies of all sizes choosing cloud services will have an advantage
It´s simple, fast, scalable, and let´s you focus on the business!
FINN is cloudifying…
Register and deploy to e.g. Heroku or Google today and see for yourself
(application containers, message busses, etc.)
Data Pipeline (accessing distributed data)
With smart products and data analytics, how to best access distributed data?
To avoid exponential complexity, move the complexity to the infrastructure
PPPP
CCCC CCCC
PPPP
Message BUS (not ESB) - Kafka
pub
subPublish Domain Events
(not commands)
Pipeline Challenges @ FINN
Challenges we experienced at FINN when learning to use a pipeline
Reliable messaging is more than a product feature on the message bus
We had serious reliability issues related to order and payment in FINN (not good).
Creating a single transaction boundary between the service update and domain event was
not trivial. Running and configuring Kafka our self was also not without issues.
Introducing event based communication is hard, developers are used to RPC
For the developer, it´s easier just to make an RPC call another service. And in the
beginning, there are no data available to consume, and no motivation to publish since the
publisher and consumer are two different teams. You may need to enforce publishing until
it is the norm.
FINN - Technology Stack
Mostly Java (some Scala)
Moving from Sybase to PostgreSQL
Ramping up Node in the front end
Moving data around with the bus Kafka (separate JavaZone presentation)
Using Cassandra for high frequency writing (counting page clicks etc.)
Service calls using REST (default) or Thrift
Key Takeaways
1. Partition by business objective (domains), avoid global entity models
2. Organize by business objective, not by competence, technology or process
3. Take the smallest possible step to reach strategic capability, no total rewrites
But more importantly, it all requires an development organization with good
knowledge of the business domain and a great culture for working together
across organizational boundaries.
The End
Open for Questions…[email protected]