Building an enterprise data fabric at Royal Bank of Scotland with MongoDB
MongoDB Days UK
5 November 2015
Introduction
This talk will explain our journey to build a data fabric, and some things we learnt along the way…
Mike Fulke
Corporate & Institutional Banking
Royal Bank of Scotland
Developer & Architect
Development Manager for CIB’s Data Fabric
We started building Q4 2014
We went live Q2 2015
Currently rebuilding infrastructure to be enterprise ready
What’s the problem?
• Our systems have evolved in a business-aligned way for decades
• So there’s lots of functional duplication across the system stacks
• Copying data led to a cottage industry of reconciliation and “controls”
• We must become leaner and more sustainable…so we’re simplifying
• In parallel, improving data quality is a key outcome in the control space
Incumbent Architecture (high-level)
Incumbent Data Flows (sample)
Today in the investment bank we have…
• Duplicated Processing
• Multiple Copies of Data
• Cache Engineering
What are we doing about it?
Rationalise the data layer…behind a common API, implemented by a multi-tenant PaaS, tackling common data problems…
…reducing people and systems costs, improving hardware utilisation, lowering data centre footprint, offering modern capabilities…
Our Enterprise Data Fabric
“Data Fabric provides data storage, query and distribution as a service, enabling application developers to concentrate on business functionality.”
Data Fabric Definition
Data Fabric is
• Data Storage, Query & Distribution
• Secure. Low Cost. Performant
Data Fabric supports
• Get, Put, Delete, Query, Watch
• Audit, versioning, history
• Access control and entitlements
Data Fabric technology is
• modern, industry standard, open source, cloud scale
Definition
• Service interface (API)
• Persistent store (DB)
• Performant (Cache)
• Recording any data item (schemaless)
• Data services (read, write, stream)
Why?
• Implement once (PaaS)
• Simplify application development
• Consolidate data, reducing duplication
• Reduce coupling, reduce costs
• Decommission Coherence
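The operations listed on this slide (Get, Put, Delete, Query, Watch, plus versioning and history) can be pictured as a small service interface. What follows is a minimal in-memory sketch, not the actual Data Fabric API: the class `InMemoryFabric`, its method names and the use of `Map<String, Object>` for schemaless items are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical illustration only: names and signatures are not taken from the talk.
final class InMemoryFabric {
    // Every version of every item is retained, so history is always available.
    private final Map<String, List<Map<String, Object>>> versionsByKey = new HashMap<>();
    private final List<BiConsumer<String, Map<String, Object>>> watchers = new ArrayList<>();

    // Put: append a new version, notify watchers, return the 1-based version number.
    synchronized int put(String key, Map<String, Object> item) {
        Map<String, Object> stored = Collections.unmodifiableMap(new HashMap<>(item));
        List<Map<String, Object>> versions =
                versionsByKey.computeIfAbsent(key, k -> new ArrayList<>());
        versions.add(stored);
        notifyWatchers(key, stored);
        return versions.size();
    }

    // Get: the latest version, or null if the key is absent or deleted.
    synchronized Map<String, Object> get(String key) {
        List<Map<String, Object>> versions = versionsByKey.get(key);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        return versions.get(versions.size() - 1);
    }

    // Delete: record a tombstone (null) so earlier versions survive the delete.
    synchronized boolean delete(String key) {
        if (get(key) == null) {
            return false;
        }
        versionsByKey.get(key).add(null);
        notifyWatchers(key, null);
        return true;
    }

    // Query: latest live versions that satisfy the predicate.
    synchronized List<Map<String, Object>> query(Predicate<Map<String, Object>> predicate) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String key : versionsByKey.keySet()) {
            Map<String, Object> latest = get(key);
            if (latest != null && predicate.test(latest)) {
                result.add(latest);
            }
        }
        return result;
    }

    // Watch: the listener receives (key, item) on put and (key, null) on delete.
    synchronized void watch(BiConsumer<String, Map<String, Object>> listener) {
        watchers.add(listener);
    }

    // History: all recorded versions of a key, oldest first; null marks a delete.
    synchronized List<Map<String, Object>> history(String key) {
        return new ArrayList<>(versionsByKey.getOrDefault(key, Collections.emptyList()));
    }

    private void notifyWatchers(String key, Map<String, Object> item) {
        for (BiConsumer<String, Map<String, Object>> watcher : watchers) {
            watcher.accept(key, item);
        }
    }
}
```

Keeping every version in an append-only list is one simple way to get audit and history without extra machinery; a real service would also need entitlements and a persistent store, which this sketch leaves out.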
Technology Stack
RBSAgile
Technology “preferences”…
• Open Source
• Wide adoption
• Modern
• Cheap
Development Approach
• Integration architecture
• “Internal Open Source”
• Contributions Welcome
• Self-service approach
• 2-week iterations
• Fully Automated Testing
• 100% uptime
• Fortnightly releases, intraday
(Diagram labels: Collaboration, Internal, 2-week iteration)
Data Fabric – where the query result is…
• Enumerable: “pass the query through the data”
• Observable: “pass the data through the query”
…naturally BSON allows the same data representation both at rest and “in transit”…
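The Enumerable/Observable distinction can be illustrated by using one predicate in both directions. This is a minimal sketch, assuming documents are plain `Map`s; the class `QueryModes` and its two methods are hypothetical and are not taken from Data Fabric.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration: the same query used as a pull and as a push.
final class QueryModes {
    // Enumerable: pass the query through the data at rest; the result is a finite list.
    static List<Map<String, Object>> enumerate(List<Map<String, Object>> stored,
                                               Predicate<Map<String, Object>> query) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : stored) {
            if (query.test(doc)) {
                result.add(doc);
            }
        }
        return result;
    }

    // Observable: the query stands still and data in transit is passed through it.
    // The returned sink accepts each arriving document and forwards only the matches.
    static Consumer<Map<String, Object>> observe(Predicate<Map<String, Object>> query,
                                                 Consumer<Map<String, Object>> subscriber) {
        return doc -> {
            if (query.test(doc)) {
                subscriber.accept(doc);
            }
        };
    }
}
```

Because both methods take the same `Predicate`, a client could in principle replay history with `enumerate` and then keep receiving updates through `observe` without restating the query.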
Data Fabric – MongoDB Cluster Topology
• Single replica set at first
• Migrated to four shards
• Split across data centres and halls
• Replica member priority
• Replica tag sets based on node location
• Homogeneous nodes - running MongoDB and Data Fabric
• Private networks for intra-cluster traffic
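Member priority and location-based tag sets are both expressed in the replica set configuration document. The sketch below builds such a document as nested `Map`s. The fields `_id`, `members`, `host`, `priority` and `tags` are standard MongoDB replica set configuration fields; the host names, the tag keys `dc` and `hall`, the priority values and the class `ReplicaSetConfigSketch` are invented for illustration and do not describe the RBS cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: hosts, tags and priorities are made up.
final class ReplicaSetConfigSketch {
    // One replica set member, tagged with its (assumed) data centre and hall.
    static Map<String, Object> member(int id, String host, double priority,
                                      String dataCentre, String hall) {
        Map<String, Object> tags = new LinkedHashMap<>();
        tags.put("dc", dataCentre);
        tags.put("hall", hall);

        Map<String, Object> member = new LinkedHashMap<>();
        member.put("_id", id);
        member.put("host", host);
        member.put("priority", priority);
        member.put("tags", tags);
        return member;
    }

    // A three-member configuration spread over two data centres.
    static Map<String, Object> config(String replicaSetName) {
        List<Map<String, Object>> members = new ArrayList<>();
        members.add(member(0, "node-a.example:27017", 2, "dc1", "hall1"));
        members.add(member(1, "node-b.example:27017", 1, "dc1", "hall2"));
        members.add(member(2, "node-c.example:27017", 1, "dc2", "hall1"));

        Map<String, Object> config = new LinkedHashMap<>();
        config.put("_id", replicaSetName);
        config.put("members", members);
        return config;
    }

    // Higher priority makes a member more likely to be elected primary,
    // so the highest-priority host is the one the operator prefers.
    @SuppressWarnings("unchecked")
    static String preferredPrimary(Map<String, Object> config) {
        List<Map<String, Object>> members = (List<Map<String, Object>>) config.get("members");
        Map<String, Object> best = members.get(0);
        for (Map<String, Object> candidate : members) {
            double candidatePriority = ((Number) candidate.get("priority")).doubleValue();
            double bestPriority = ((Number) best.get("priority")).doubleValue();
            if (candidatePriority > bestPriority) {
                best = candidate;
            }
        }
        return (String) best.get("host");
    }
}
```

Tags of this kind can then be referenced from read preferences or write concerns, which is presumably what tag sets based on node location are used for here.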
Data Fabric Cluster
Kafka – routing, at the heart of the platform
Data Fabric – grammars, ANTLR, Groovy, Mongo
Data Fabric API
• Predicates
• Projections
• Observable
• Enumerable
ANTLR
• Generating Java
• Syntax validation
• Error reporting
• “Visitor” opportunities
Rewriters
• Mongo syntax
• QueryBuilder
• Groovy syntax
• BsonDocument as Map
…allows a unified grammar for specifying both Enumerable and Observable queries.
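One way to picture the rewriter idea: a single parsed predicate is translated both into MongoDB filter syntax (for Enumerable queries against the store) and into an in-memory predicate (for Observable queries over a stream). In the sketch below a hand-built, hypothetical `Comparison` node stands in for the ANTLR parse tree, and only three numeric operators are supported. The class and method names are assumptions; `$gt`, `$lt` and `$eq` are standard MongoDB query operators.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration: a single comparison such as "qty > 100".
final class Comparison {
    final String field;
    final String op;   // one of ">", "<", "=="
    final long value;

    Comparison(String field, String op, long value) {
        if (!op.equals(">") && !op.equals("<") && !op.equals("==")) {
            throw new IllegalArgumentException("Unsupported operator: " + op);
        }
        this.field = field;
        this.op = op;
        this.value = value;
    }

    // Rewriter 1: MongoDB filter syntax as a nested Map, e.g. {qty={$gt=100}}.
    Map<String, Object> toMongoFilter() {
        return Collections.<String, Object>singletonMap(
                field, Collections.<String, Object>singletonMap(mongoOperator(), value));
    }

    // Rewriter 2: an in-memory predicate over Map-shaped documents.
    // A missing or non-numeric field never matches.
    Predicate<Map<String, Object>> toPredicate() {
        return doc -> {
            Object actual = doc.get(field);
            if (!(actual instanceof Number)) {
                return false;
            }
            double n = ((Number) actual).doubleValue();
            switch (op) {
                case ">":
                    return n > value;
                case "<":
                    return n < value;
                default: // "==" (validated in the constructor)
                    return n == value;
            }
        };
    }

    private String mongoOperator() {
        switch (op) {
            case ">":
                return "$gt";
            case "<":
                return "$lt";
            default: // "==" (validated in the constructor)
                return "$eq";
        }
    }
}
```

With a real grammar, each rewriter would typically be a visitor over the parse tree, which may be what the “Visitor” opportunities bullet refers to.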
MongoDB Async Java Driver
1. Started with the old synchronous API, BasicDBObject etc.
2. Upgraded to v3.0.4 Async Java Driver
3. Moving to v3.1.0 shortly
4. Migrated the Mongo interactions over several iterations
Some nice features for Data Fabric…
RawBsonDocument
• Handle BSON streams without encoding or decoding
• Allows more efficient servers
• Reduced heap usage and churn
• Higher throughput
• Simpler API than LazyBSONObject
MongoCollection – transactional ops
• findOneAndUpdate
• findOneAndReplace
• Simpler API than previously
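Handling BSON without decoding it is possible because of the format itself: every BSON document begins with a little-endian int32 giving its total length in bytes, so a server can frame and forward documents while treating their contents as opaque. Below is a minimal sketch of that framing step using only `java.nio`. The class `BsonFramer` is hypothetical and is not part of the MongoDB driver; in the driver, `RawBsonDocument` plays the role of the undecoded byte holder.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: frames BSON documents without decoding their fields.
final class BsonFramer {
    // Smallest legal BSON document: 4-byte length plus the terminating 0x00.
    private static final int MIN_DOCUMENT_LENGTH = 5;

    // Splits a buffer of back-to-back BSON documents into one byte[] per document,
    // reading nothing but each document's length prefix.
    static List<byte[]> frames(byte[] stream) {
        ByteBuffer buffer = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> documents = new ArrayList<>();
        while (buffer.remaining() >= 4) {
            // Absolute read: peeks at the length without moving the position.
            int length = buffer.getInt(buffer.position());
            if (length < MIN_DOCUMENT_LENGTH || length > buffer.remaining()) {
                throw new IllegalArgumentException("Truncated or malformed BSON document");
            }
            byte[] document = new byte[length];
            buffer.get(document); // copies the whole document, prefix included
            documents.add(document);
        }
        if (buffer.hasRemaining()) {
            throw new IllegalArgumentException("Trailing bytes after last document");
        }
        return documents;
    }
}
```

No field names or values are parsed and no per-field objects are allocated, which is consistent with the reduced heap usage and churn listed on the slide.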
Next Steps for Data Fabric
Continue onboarding systems
JDBC and Reporting tools
Joins with ourselves
Joins with other data sources
Funky aggregations
Near-caching - maybe?
Open source - gradually?
Contact us if you are interested!
Technology Estate Simplification
Cost Reduction
• £m license cost avoidance (Coherence)
• Plans to decommission hundreds of servers
- Coherence
- Oracle/SQL databases
Simplification
• 2 foundational applications refactored off Coherence
• Supporting data needs of a dozen applications
Velocity
• Develop new applications in days
• No need for database administration
• Self-service data service
• Promotes collaboration and data sharing