MongoDB with RDBMS for a Portal Application
TRANSCRIPT
CIGNEX Datamatics Confidential www.cignex.com
Webinar:
MongoDB with RDBMS for a Portal Application To achieve Performance, Scalability and Data Privacy
Date: 30th Sept 2015
Presenters:
Nikhil Naib
Big Data Solution architect
CIGNEX Datamatics
Nirav Shah
Sr. Director – Marketing & Corporate Communication
CIGNEX Datamatics
CIGNEX Datamatics: Established in 2000, USA | UK | India
• #1 Pure Play Open Source Services Company
• 8 Open Source Products
• 14 Open Source Books Authored
• 13+ Global Offices
• 5+ Business Engagement Platforms
• 5000+ Open Source Community Contributions
• 500+ Open Source Implementations
• 500+ Open Source Consultants
Solutions We Provide
• Portals, Content & Collaboration: Portals, Enterprise Integration, Identity Relationship Management
• Enterprise Content Management: Document & Web Content Management, Learning/Knowledge Management, Imaging and Scanning - OCR/Digitization, Enterprise & NLP Search, BPM/Workflow
• E-Commerce: B2B, B2C
• Internet of Things (IoT)
• Big Data Analytics: Data Integration, Information Delivery, Data Analysis
Business Engagement Platforms
• Panoramyx™ Big Data Blueprint Platform
• Vitalstatistyx™ IoT Platform
• DEEP™ Digital Employee Engagement Platform
• RMP™ Reputation Management Platform
• FMP™ Franchise Management Platform
Nikhil Naib
As a solution architect, Nikhil identifies the best-fit technology stack, aligned with the business needs of all stakeholders and the development team. With 12+ years of experience, he has delivered finished products and blueprints (POCs) with quick turnaround times.
As a MongoDB Certified DBA, Nikhil has hands-on experience with 7 medium- to large-scale MongoDB implementations for solutions such as e-commerce, content management, and reputation management.
Nikhil takes pleasure in imparting training on different Big Data technologies.
Today’s Topics
• Key Challenges of Enterprise Portal Application
• MongoDB – The Leading NoSQL Database
• Case Study: Global e-learning Platform – Solution architecture
– Approach & Best Practices: Storage Engine, Schema Design, Data Migration, Sharding, Performance & Monitoring
– Benefits
• Best Practices & Learning – Augmentation with MongoDB
• Q & A
Key Challenges of Enterprise Portal Application
Does your application face any of the following challenges?
• Analytical & operational processing on the same data, reducing performance
• Scaling according to business needs
• Proprietary database with higher TCO
• Global application with geography-specific data
• Thousands to millions of queries per second (reads & writes)
• Agile application rollouts
Solution: Augment SQL with NoSQL
Aspect | SQL Databases | NoSQL Databases
Types | One type (minor variations) | Many: key-value stores, document databases, wide-column stores, graph databases
Examples | (product logos on slide) | (product logos on slide) & more
Schema Design | Structure and data types defined in advance | Dynamic
Scalability | Vertical | Horizontal
Data Querying | Select, Insert, and Update statements | Object-oriented APIs & more
MongoDB – The Leading NoSQL Database
• Reduces operational overhead by up to 95%
• Auto-sharding with global distribution; up to 50 replica set members
• 7-10X better write performance
• Up to 80% less storage with compression
• 5,000,000+ downloads, 600+ customers
• Deployment automation
• Integrated caching
• Dynamic schema design
Source: www.mongodb.com
Augment Your Portal Application Database with MongoDB
Augmented Database Architecture
[Diagram components:
 Applications: Portals, Content & Collaboration | Enterprise Content Management | Big Data Analytics | e-Commerce Portals
 Databases: Application’s RDBMS Database | MongoDB Shards (Shard 1 ... Shard n)
 Infrastructure (OS & virtualization, multi-data center deployment)
 Alongside all layers: Monitoring & Management | Security & Auditing]
CIGNEX Datamatics’ MongoDB Solutions
Single View of “X” (Customer, Employee, Partner and more)
Internet of Things
Product Catalogue
Data Hub
Personalization User Data Management
Reputation Management
Social Listening
Content Management & Delivery
Case Study Global e-Learning Platform
Efficient User Data Management
5x Improvement in Content Management & Delivery
Data Privacy with Geographically Distributed Data
Client Overview
• Large networking company that designs, manufactures, and sells networking equipment
– Group: Corporate Affairs division which invests in scalable and self-sustaining programs that use technology to meet some of society's biggest challenges
• CSR Program: An e-learning portal offering IT skills and career building program to learning institutions and individuals worldwide
– 2M+ Students across 160+ countries
– 20,000 instructors
– 146 million online exams conducted so far
Challenges and Proposed Augmentation with MongoDB (NoSQL)
Challenges with existing RDBMS:
• Restricted scalability
• Performance issues (6-16 sec response time)
• Complex queries (organization-to-user relationship takes 3 to 4 table joins)
• Highly normalized schema
Proposed Augmentation:
• Optimized storage engine with highly de-normalized data with embedded documents
• Replication & sharding to support horizontal scalability & high availability
• Document-oriented database with dynamic schema supporting many data types
• Tag Aware sharding facilitating geo-awareness of app data to comply with data privacy laws and reduce network latency
Proposed Portal Application Architecture
[Diagram components:
 Global Learning & Knowledge Sharing Platform
 Liferay Portal on the Application Server
 PostgreSQL – Liferay application data
 Custom tables & fields
 Jersey – RESTful web services in Java
 MongoDB – user data storage & processing: mongos, configuration server, mongod Geo 1, mongod Geo 2, mongod Geo 3]
Approach – Augmenting Application RDBMS with MongoDB
Storage Engine
Performance & Monitoring
Sharding
Data Migration
Schema Design
MongoDB 3.0 Data Storage Engine - WiredTiger
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Why WiredTiger?
1. Document-level locking
2. Data compression (up to 80% with the Snappy algorithm and up to 90% using zlib; index compression up to 50%)
3. 7x-10x higher throughput than the previous version
4. Ability to saturate all CPU cores, and to store indexes and data on separate mounts for optimum utilization of IOPS
5. 100% backwards compatible
6. Non-disruptive upgrade (no downtime during migration)
Best Practices:
1. Use the XFS file system with WiredTiger, as there are known issues with WiredTiger on ext4
2. Stay up to date with minor version releases of 3.0, as they contain important fixes for both WiredTiger & sharding
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Approach:
1. Understand the functionality of each Liferay portlet.
2. Understand external data points and the nature of the data (structured & unstructured).
3. Understand the RDBMS schema design, including each table and its fields.
4. Understand the SQL queries covering all CRUD operations, along with triggers, views, cursors, and stored procedures.
5. Create a schema for the MongoDB collections.
Approach – Augmenting Portal Application’s RDBMS with MongoDB
create table LOCATION_Country (
id_ LONG not null primary key,
name VARCHAR(75) null,
ageLimit INTEGER,
coppaAgeLimit INTEGER,
isoCountryCode VARCHAR(75) null,
verified BOOLEAN,
embargo BOOLEAN );
create table LOCATION_State (
id_ LONG not null primary key,
name VARCHAR(75) null,
isoStateCode VARCHAR(75) null,
isoCountryCode VARCHAR(75) null,
verified BOOLEAN );
create table LOCATION_City (
id_ LONG not null primary key,
name VARCHAR(100) null,
displayName VARCHAR(100) null,
isoStateCode VARCHAR(75) null,
isoCountryCode VARCHAR(75) null,
population LONG,
latitude VARCHAR(75) null,
longitude VARCHAR(75) null,
verified BOOLEAN );
Collection: location {
  { _id : ObjectID generated by MongoDB,
    displayCountryId : "id_ from location_country",
    displayCountryName : "name from location_country",
    ageLimit : "agelimit from location_country",
    coppaAgeLimit : "coppaagelimit from location_country",
    isoCountryCode : "isocountrycode from location_country",
    countryVerificationStatus : "verified from location_country",
    embargo : "embargo from location_country" }
  { _id : ObjectID generated by MongoDB,
    displayCountryId : "id_ from location_country",
    displayStateId : "id_ from location_state",
    displayStateName : "name from location_state",
    isoStateCode : "isostatecode from location_state",
    stateVerificationStatus : "verified from location_state" }
  { _id : ObjectID generated by MongoDB,
    countryId : "id_ from location_country",
    stateId : "id_ from location_state",
    cityId : "id_ from location_city",
    cityName : "name from location_city",
    displayName : "displayname from location_city",
    population : "population from location_city",
    latitude : "latitude from location_city",
    longitude : "longitude from location_city",
    cityVerificationStatus : "verified from location_city" }
}
PostgreSQL Schema (first listing) and MongoDB Schema (second listing)
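As a rough illustration of the mapping above, the sketch below turns rows from the three LOCATION_* tables into documents for a single location collection. It is a minimal sketch in plain Python: rows are plain dicts, the function names are hypothetical, and no MongoDB driver is called. The _id field is left out because it is generated on insert.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def country_doc(c: Row) -> Row:
    """Country-level document built from one LOCATION_Country row."""
    return {
        "displayCountryId": c["id_"],
        "displayCountryName": c["name"],
        "ageLimit": c["ageLimit"],
        "coppaAgeLimit": c["coppaAgeLimit"],
        "isoCountryCode": c["isoCountryCode"],
        "countryVerificationStatus": c["verified"],
        "embargo": c["embargo"],
    }


def state_doc(s: Row, country_id: int) -> Row:
    """State-level document; it carries the parent country id directly."""
    return {
        "displayCountryId": country_id,
        "displayStateId": s["id_"],
        "displayStateName": s["name"],
        "isoStateCode": s["isoStateCode"],
        "stateVerificationStatus": s["verified"],
    }


def city_doc(c: Row, country_id: int, state_id: int) -> Row:
    """City-level document; it carries both parent ids directly."""
    return {
        "countryId": country_id,
        "stateId": state_id,
        "cityId": c["id_"],
        "cityName": c["name"],
        "displayName": c["displayName"],
        "population": c["population"],
        "latitude": c["latitude"],
        "longitude": c["longitude"],
        "cityVerificationStatus": c["verified"],
    }


def build_location_docs(countries: List[Row], states: List[Row],
                        cities: List[Row]) -> List[Row]:
    """Resolve the ISO-code references once, at migration time."""
    country_ids = {c["isoCountryCode"]: c["id_"] for c in countries}
    state_ids = {(s["isoCountryCode"], s["isoStateCode"]): s["id_"]
                 for s in states}
    docs = [country_doc(c) for c in countries]
    docs += [state_doc(s, country_ids[s["isoCountryCode"]]) for s in states]
    docs += [
        city_doc(c, country_ids[c["isoCountryCode"]],
                 state_ids[(c["isoCountryCode"], c["isoStateCode"])])
        for c in cities
    ]
    return docs
```

The lookups by ISO code happen once while the documents are built, so reads against the collection no longer need the equivalent of a join.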
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Best Practices & Learning:
1. Analyze the data access patterns of the application.
2. Define indexes by identifying common queries.
3. Use the explain method & the MongoDB profiler for query optimization.
4. Create the collections to de-normalize the schema for optimal performance.
5. Reconsider the schema design for a collection once the number of indexes on it reaches 10.
6. Carefully design and tune the connection pooling strategy for your application.
7. There is no support for multi-document transactions in MongoDB 3.0 per se, but there are workarounds available (https://docs.mongodb.org/v3.0/tutorial/perform-two-phase-commits/)
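One way to act on points 2 and 5 is to tally which field combinations the application filters on and treat the most frequent ones as index candidates. The sketch below is only an illustration in plain Python: the function names and the field names in the usage note are hypothetical, and in practice the filter list would come from application logs or the profiler output.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Fields = Tuple[str, ...]

MAX_INDEXES = 10  # review threshold quoted in the best practices above


def candidate_indexes(query_filters: Iterable[Fields],
                      top_n: int) -> List[Tuple[Fields, int]]:
    """Return the top_n most frequent filter field combinations with counts."""
    counts = Counter(tuple(f) for f in query_filters)
    return counts.most_common(top_n)


def needs_schema_review(index_count: int) -> bool:
    """True once a collection has reached the index threshold."""
    return index_count >= MAX_INDEXES
```

For example, candidate_indexes([("userId",), ("orgId", "role"), ("userId",)], 1) reports ("userId",) as the most common filter, which makes it the first index candidate to verify with explain.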
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Approach:
1. Create SQL queries for the migration scripts
2. Use the MongoDB Java Driver for interacting with MongoDB & the JDBC driver for talking to PostgreSQL
3. Execute the migration scripts against the RDBMS and fill the MongoDB collections
4. We hosted the RDBMS on a read-optimized instance to expedite the execution of migration queries.
[Diagram: the application RDBMS database and external data sources (social media, Salesforce) feed a Java-based custom ETL tool]
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Best Practices:
1. Use the Bulk API of MongoDB for bulk ingestion
2. Leverage Java’s support for multithreading to run concurrent inserts and reduce the data migration time
3. The migration process should be fault tolerant. After a failure it should resume from where it left off, NOT restart from scratch.
4. Reuse infrastructure by deploying the migration scripts on the same server as the services layer
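Points 1 to 3 can be combined in one small loop: split the rows into batches, write each batch with one bulk call from a worker thread, and record finished batches so a rerun skips them. The sketch below is a minimal illustration in Python rather than the Java tool described in the webinar. The Checkpoint class, the migrate function and the bulk_insert callable are hypothetical; bulk_insert stands in for whatever bulk-write call the driver provides.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from threading import Lock
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


class Checkpoint:
    """Remembers, in a small JSON file, which batch numbers have completed."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._lock = Lock()
        self.done: Set[int] = set()
        if os.path.exists(path):
            with open(path) as fh:
                self.done = set(json.load(fh))

    def mark(self, batch_no: int) -> None:
        # Serialize file writes, since several worker threads report here.
        with self._lock:
            self.done.add(batch_no)
            with open(self.path, "w") as fh:
                json.dump(sorted(self.done), fh)


def migrate(rows: List[Row], batch_size: int,
            bulk_insert: Callable[[List[Row]], None],
            checkpoint: Checkpoint, workers: int = 4) -> int:
    """Insert all unfinished batches concurrently; return how many were tried."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    pending = [(n, b) for n, b in enumerate(batches) if n not in checkpoint.done]

    def run(item) -> None:
        batch_no, batch = item
        bulk_insert(batch)         # one bulk call per batch
        checkpoint.mark(batch_no)  # recorded only after the write succeeded

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run, item) for item in pending]
    # Leaving the with-block waits for every submitted batch to finish.
    for future in futures:
        future.result()  # re-raises the first failure, if there was one
    return len(pending)
```

If one batch fails, the others still complete and are checkpointed, so calling migrate again with the same checkpoint file retries only the failed batch. This assumes the source rows are read in a stable order, so that batch numbers mean the same thing on every run.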
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Approach: Use “Tag Aware” sharding which brought Geo-Awareness to the application data.
The ideal shard key:
1. High cardinality, which makes it easy for MongoDB to split the chunks.
2. Higher “randomness”
3. Targeted queries
4. May need to be computed
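To make the idea of a computed, geo-aware shard key concrete, the sketch below models a compound key of (region, user id) and a list of tag ranges that pin each region's key range to a tag. It is a simulation in plain Python, not MongoDB configuration: the region names, tag names, MAX_ID sentinel and function names are all hypothetical, and the inclusive-lower, exclusive-upper bounds are this sketch's own convention.

```python
from dataclasses import dataclass
from typing import List, Tuple

Key = Tuple[str, int]

MAX_ID = 2 ** 63  # assumed upper sentinel for user ids


@dataclass(frozen=True)
class TagRange:
    low: Key    # inclusive lower bound of the shard key range
    high: Key   # exclusive upper bound
    tag: str    # tag of the shard(s) that should hold this range


def shard_key(region: str, user_id: int) -> Key:
    """Computed compound key: the region prefix lets queries target one
    geography, the user id supplies the cardinality."""
    return (region, user_id)


def region_range(region: str, tag: str) -> TagRange:
    """Tag range covering every user id within one region."""
    return TagRange((region, 0), (region, MAX_ID), tag)


def tag_for(key: Key, ranges: List[TagRange]) -> str:
    """Return the tag whose range contains the key."""
    for r in ranges:
        if r.low <= key < r.high:
            return r.tag
    raise LookupError(f"no tag range covers {key!r}")


RANGES = [
    region_range("APAC", "geo-apac"),
    region_range("EU", "geo-eu"),
    region_range("US", "geo-us"),
]
```

With these ranges, tag_for(shard_key("EU", 42), RANGES) resolves to the "geo-eu" tag, so that user's data would stay on shards carrying that tag. The region prefix alone has low cardinality, which is why the user id is appended.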
Approach – Augmenting Portal Application’s RDBMS with MongoDB
[Diagram: geo-distributed sharded cluster
 App Tier: load balancer in front of app servers in Geography 1, Geography 2 ... Geography n, each with a mongos; PostgreSQL – Liferay RDBMS
 Data Tier: config server (mongod); one replica set per geography (Geo 1, Geo 2 ... Geo n), each with a primary mongod and a secondary mongod in its home geography, secondaries in the other geographies, and an arbiter mongod]
Approach – Augmenting Portal Application’s RDBMS with MongoDB
Best Practices:
1. Plan for sharding well in advance and test it with production-like workloads to see the impact.
2. Involve the network administration team while planning for sharding, as they are the go-to people for cross-data-center connectivity issues.
3. Deploy the shard router (mongos) on the application server so that one network call is saved.
4. Balance the non-sharded collections across the different shards so that all the shards receive the same amount of traffic.
5. Ensure that indexes fit in RAM: index size < RAM
6. Use an appropriate write concern and read preference for the use case. Choosing an appropriate read preference helps to scale reads and to deal with network latency issues.
7. Using replica set tag sets in combination with an appropriate write concern & read preference helps a lot in addressing data privacy concerns.
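The effect of tag sets on reads (points 6 and 7) can be illustrated with a small simulation. Read preference tag sets are given as an ordered list; each one is tried in turn and the first that matches at least one member wins, with an empty tag set matching any member. The sketch below models only that matching step in plain Python; the Member class, host names and tag values are hypothetical, and real drivers also take member state and latency into account.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Member:
    host: str
    tags: Dict[str, str]


def matches(member: Member, tag_set: Dict[str, str]) -> bool:
    """A member matches when it carries every key/value pair in the tag set."""
    return all(member.tags.get(k) == v for k, v in tag_set.items())


def eligible_members(members: List[Member],
                     tag_sets: List[Dict[str, str]]) -> List[Member]:
    """Try the tag sets in order; return the members matching the first
    tag set that matches anyone, or an empty list if none does."""
    for tag_set in tag_sets:
        hits = [m for m in members if matches(m, tag_set)]
        if hits:
            return hits
    return []
```

Listing only a region's own tag set, with no empty fallback, means a read is never served from another geography, which is the property the data privacy use case depends on.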
Performance Test Results
Functionality | Only RDBMS | Augmented RDBMS with MongoDB
Portal Dashboard | ~16 sec | ~3-4 sec
User Enrollment | ~17 sec | ~2-3 sec
User Profile | ~10 sec | ~3 sec
Instructor Dashboard | ~13 sec | ~3 sec
Course Assignment | ~12 sec | ~2-3 sec
Search | ~29 sec | ~3-4 sec
Note: Performance depends on many parameters such as network latency, server configuration, number of concurrent users and more.
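The response times in the table translate into speedup factors with simple division. The short sketch below does that arithmetic on the approximate figures quoted above; the dictionary layout and the function name are just for illustration, and the results are only as precise as the rounded timings.

```python
from typing import Dict, Tuple

# (RDBMS-only seconds, fastest augmented seconds, slowest augmented seconds)
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "Portal Dashboard": (16, 3, 4),
    "User Enrollment": (17, 2, 3),
    "User Profile": (10, 3, 3),
    "Instructor Dashboard": (13, 3, 3),
    "Course Assignment": (12, 2, 3),
    "Search": (29, 3, 4),
}


def speedup_range(before: float, fastest: float,
                  slowest: float) -> Tuple[float, float]:
    """Return (lowest, highest) speedup factor for one functionality."""
    return (before / slowest, before / fastest)


for name, (before, fastest, slowest) in RESULTS.items():
    low, high = speedup_range(before, fastest, slowest)
    print(f"{name}: {low:.1f}x to {high:.1f}x faster")
```

On these figures every functionality improves by more than a factor of three, with Search showing the largest gain.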
• Cluster Management & Monitoring Approach
– MMS - MongoDB Monitoring and Management Service Tool
• Automation – Provision, Upgrade & Scale
• Backups – Continuous backup, Point-in-Time Recovery
• Monitoring – Dashboard with alerts
MongoDB MMS –Dashboard
• Performance
– Web page response time reduced from 6-16 sec to < 2 Sec
– Migration of 500GB+ data completed in 4 hours
– Geo based Tag Aware sharding to reduce the network latency, as the data can stay close to the application server
• Scalability
– Auto Sharding to accommodate new geographies
• Data Privacy
– Geo based Tag Aware sharding to comply with the data privacy laws of different countries/zones
• Lower TCO
– Open source technology reduced licensing costs and vendor dependency, while out-of-the-box features accelerated development.
Benefits - Delivered with CIGNEX Datamatics Expertise
Best Practices & Learning – Augmentation with MongoDB
MongoDB scales & shines !!
1. Identify indexes carefully. A larger number of indexes can bring down write throughput.
2. Plan early for sharding. DO NOT go to production without benchmarking the shard key.
3. Do not forget to set ulimit & noatime. They provide significant performance gains.
4. Use the Bulk API of MongoDB. Easy for bulk ingestion.
5. Use Java’s support for multithreading. Concurrent inserts reduce the data migration time.
6. Use MongoDB Ops Manager. Monitoring & managing a sharded cluster is a painful process without Ops Manager.
7. Use SSDs & RAID-10. They provide excellent throughput.
8. Use MongoDB Enterprise Edition. It provides excellent support and high-class security features.
CIGNEX Datamatics - Big Data Analytics & IoT Case Studies
Improve performance through real-time intelligence by efficient device management & issue identification
GPS Services Company | Networking Company
Increase customer satisfaction & revenue due to uninterrupted video experience anywhere, anytime, on any device
Modernization of legacy Quote Portal resulting in competitive advantage – Quote in 5 minutes
Insurance Company
First-mover advantage with timely launch of Sentiment and Trending Analysis service
SaaS Start-up Company | B2B Market Intelligence Services
100% increase in conversion rate with Single View of Business and Market Intelligence
E-Learning Community Portal
5x efficient User Data Management with improved application performance and data security
CIGNEX Datamatics Big Data Analytics Expertise
Team Size: 70+
• 70+ Certified Global Consultants on various Big Data Technologies
• Partnership - MongoDB Advanced Partner, Talend Gold Partner, Cloudera Authorized Partner
• Frameworks
• Panoramyx™
• Vitalstatistyx™
• Platforms
• Reputation Management Platform™
• Findability Platform™
• Services
• OpeRA™ - Feasibility Study/ Assessment, Workshops
• Big Data Consulting (MongoDB, Hadoop, Talend)
• Development to Production
• Support & Maintenance
Big Data Platforms
Business Intelligence Expertise
CIGNEX Datamatics: Big Data Analytics & IoT Expertise
Panoramyx™
• Hybrid Data Lake Architecture
• Data “Blending” & “Enrichment”
• Key “Analytics Engines”
• Single “Panoramic” view of X (X = Customer, Employee, Partner and more)
OpeRA™
• Open Source Readiness Assessment for Big Data Analytics
• Blueprint for Legacy Modernization
• Migration/Adoption Guidelines
• Recommendations for TCO Reduction, Performance Optimization
Vitalstatistyx™
• Internet of Things (IoT) Reference Architecture
• APIs Integration with Sensors, Devices
• Hardware Vendor connect
• Real-time & Predictive analytics with Reports/Dashboard
Analytics Engines
• Powerful, Flexible Analytics “Engines” & Point Solutions
• Findability Platform (Social Listening & Personality Insights)
• Reputation Management Platform
Systems of Insight
Q & A
1) Quick Assessment – http://operaonline.cignex.com
2) Test Drive Big Data Analytics
Engage us for Proof-of-Concept (PoC) @ US5K
Thank you
www.cignex.com
Contact Us
Sales: [email protected] | Jobs – [email protected] | Others – [email protected]
facebook.com/CIGNEXTechnologies youtube.com/cignexglobal twitter.com/cignex www.cignex.com