MongoDB with RDBMS for a Portal Application


Upload: cignex-datamatics

Post on 12-Apr-2017


CIGNEX Datamatics Confidential www.cignex.com

Webinar:

MongoDB with RDBMS for a Portal Application To achieve Performance, Scalability and Data Privacy

Date: 30th Sept 2015

Presenters:

Nikhil Naib

Big Data Solution architect

CIGNEX Datamatics

Nirav Shah

Sr. Director – Marketing & Corporate Communication

CIGNEX Datamatics

CIGNEX Datamatics: Established in 2000, USA | UK | India

#1 Pure Play Open Source Services Company

• Open Source Products: 8
• Open Source Books Authored: 14
• Global Offices: 13+
• Business Engagement Platforms: 5+
• Open Source Community Contributions: 5000+
• Open Source Implementations: 500+
• Open Source Consultants: 500+

Solutions We Provide

• Portals, Content & Collaboration: Portals, Enterprise Integration, Identity Relationship Management
• Enterprise Content Management: Document & Web Content Management, Learning/Knowledge Management, Imaging and Scanning (OCR/Digitization), Enterprise & NLP Search, BPM/Workflow
• E-Commerce: B2B, B2C
• Internet of Things (IoT), Big Data Analytics, Data Integration, Information Delivery, Data Analysis

Business Engagement Platforms

• Panoramyx™ Big Data Blueprint Platform
• Vitalstatistyx™ IoT Platform
• DEEP™ Digital Employee Engagement Platform
• RMP™ Reputation Management Platform
• FMP™ Franchise Management Platform

Nikhil Naib

As a solution architect, Nikhil identifies the best-fit technology stack, aligned with the business needs of all stakeholders and the development team. With 12+ years of experience, he delivers finished products and blueprints (POCs) with quick turnaround times.

As a MongoDB Certified DBA, Nikhil has hands-on experience with 7 medium- to large-scale MongoDB implementations for solutions such as e-commerce, content management and reputation management.

Nikhil takes pleasure in imparting training on different Big Data technologies.

Today’s Topics

• Key Challenges of Enterprise Portal Application
• MongoDB – The Leading NoSQL Database
• Case Study: Global e-learning Platform – Solution architecture
– Approach & Best Practices: Storage Engine, Schema Design, Data Migration, Sharding, Performance & Monitoring
– Benefits
• Best Practices & Learning – Augmentation with MongoDB
• Q & A

Key Challenges of Enterprise Portal Application

Does your application face any of the following challenges?

• Analytical and operational processing on the same data, reducing performance
• Scaling according to business needs
• Proprietary database with higher TCO
• Global application with geography-specific data
• Thousands to millions of queries per second (reads & writes)
• Agile application rollouts

Solution: Augment SQL with NoSQL

SQL Databases vs. NoSQL Databases

• Types: SQL has one type (minor variations); NoSQL has many (key-value stores, document databases, wide-column stores, graph databases)
• Examples: shown as product logos on the slide, & more
• Schema Design: SQL defines structure and data types in advance; NoSQL is dynamic
• Scalability: SQL scales vertically; NoSQL scales horizontally
• Data Querying: SQL uses Select, Insert, and Update statements; NoSQL uses object-oriented APIs
• & more

MongoDB – The Leading NoSQL Database

• Reduces operational overhead by up to 95%
• Auto-sharding with global distribution, up to 50 replica set members
• 7-10x better write performance
• Up to 80% less storage with compression
• 5,000,000+ downloads, 600+ customers
• Deployment automation
• Integrated caching
• Dynamic schema design

Source: www.mongodb.com

Augment Your Portal Application Database with MongoDB

Augmented Database Architecture (diagram)

• Applications: Portals, Content & Collaboration; Enterprise Content Management; Big Data Analytics; e-Commerce Portals
• Databases: Application’s RDBMS Database; MongoDB Shards (Shard 1 ... Shard n)
• Infrastructure (OS & Virtualization, multi-data center deployment)
• Across all layers: Monitoring & Management; Security & Auditing

CIGNEX Datamatics’ MongoDB Solutions

Single View of “X” (Customer, Employee, Partner and more)

Internet of Things

Product Catalogue

Data Hub

Personalization User Data Management

Reputation Management

Social Listening

Content Management & Delivery

Case Study: Global e-Learning Platform

Efficient User Data Management

5x Improvement in Content Management & Delivery

Data Privacy with Geographically Distributed Data

Client Overview

• Large networking company that designs, manufactures, and sells networking equipment

– Group: Corporate Affairs division which invests in scalable and self-sustaining programs that use technology to meet some of society's biggest challenges

• CSR Program: An e-learning portal offering IT skills and career building program to learning institutions and individuals worldwide

– 2M+ Students across 160+ countries

– 20,000 instructors

– 146 million online exams conducted so far

Challenges and Proposed Augmentation with MongoDB (NoSQL)

Challenges with existing RDBMS

• Restricted scalability
• Performance issues (6-16 sec response time)
• Complex queries (organization-to-user relationship takes 3 to 4 table joins)
• Highly normalized schema

Proposed Augmentation

• Optimized storage engine with highly de-normalized data in embedded documents
• Replication & sharding to support horizontal scalability & high availability
• Document-oriented database with dynamic schema supporting many data types
• Tag-aware sharding that brings geo-awareness to the application data, to comply with data privacy laws and reduce network latency

Proposed Portal Application Architecture (diagram)

• Global Learning & Knowledge Sharing Platform
• Liferay Portal on the application server
• Jersey – RESTful Web Services in Java
• PostgreSQL – Liferay application data, custom tables & fields
• MongoDB – user data storage & processing: mongos, configuration server, mongod Geo 1, mongod Geo 2, mongod Geo 3


Approach – Augmenting Application RDBMS with MongoDB

Storage Engine

Performance & Monitoring

Sharding

Data Migration

Schema Design

MongoDB 3.0 Data Storage Engine - WiredTiger


Approach – Augmenting Portal Application’s RDBMS with MongoDB

Why WiredTiger?

1. Document-level locking
2. Data compression (up to 80% with the Snappy algorithm and up to 90% with zlib; index compression up to 50%)
3. 7x-10x higher throughput than the previous version
4. Ability to saturate all CPU cores, and to store indexes and data on separate mounts for optimum utilization of IOPS
5. 100% backwards compatible
6. Non-disruptive upgrade (no downtime during migration)

Best Practices:

1. Use the XFS file system with WiredTiger, as there are known issues with WiredTiger on ext4.
2. Stay up to date with the minor releases of 3.0, as they carry important fixes for both WiredTiger and sharding.

Schema Design

Approach:

1. Understand the functionality of each Liferay portlet.
2. Understand the external data points and the nature of the data (structured & unstructured).
3. Understand the RDBMS schema design, including each table and its fields.
4. Understand the SQL queries covering all CRUD operations, along with triggers, views, cursors and stored procedures.
5. Create a schema for the MongoDB collections.


PostgreSQL Schema

-- Liferay's generic LONG type is written here as BIGINT, its PostgreSQL equivalent.
create table LOCATION_Country (
    id_ BIGINT not null primary key,
    name VARCHAR(75) null,
    ageLimit INTEGER,
    coppaAgeLimit INTEGER,
    isoCountryCode VARCHAR(75) null,
    verified BOOLEAN,
    embargo BOOLEAN
);

create table LOCATION_State (
    id_ BIGINT not null primary key,
    name VARCHAR(75) null,
    isoStateCode VARCHAR(75) null,
    isoCountryCode VARCHAR(75) null,
    verified BOOLEAN
);

create table LOCATION_City (
    id_ BIGINT not null primary key,
    name VARCHAR(100) null,
    displayName VARCHAR(100) null,
    isoStateCode VARCHAR(75) null,
    isoCountryCode VARCHAR(75) null,
    population BIGINT,
    latitude VARCHAR(75) null,
    longitude VARCHAR(75) null,
    verified BOOLEAN
);

MongoDB Schema

Collection: location

// Country document
{
    _id : ObjectId generated by MongoDB,
    displayCountryId : "id_ from location_country",
    displayCountryName : "name from location_country",
    ageLimit : "ageLimit from location_country",
    coppaAgeLimit : "coppaAgeLimit from location_country",
    isoCountryCode : "isoCountryCode from location_country",
    countryVerificationStatus : "verified from location_country",
    embargo : "embargo from location_country"
}

// State document
{
    _id : ObjectId generated by MongoDB,
    displayCountryId : "id_ from location_country",
    displayStateId : "id_ from location_state",
    displayStateName : "name from location_state",
    isoStateCode : "isoStateCode from location_state",
    stateVerificationStatus : "verified from location_state"
}

// City document
{
    _id : ObjectId generated by MongoDB,
    countryId : "id_ from location_country",
    stateId : "id_ from location_state",
    cityId : "id_ from location_city",
    cityName : "name from location_city",
    displayName : "displayName from location_city",
    population : "population from location_city",
    latitude : "latitude from location_city",
    longitude : "longitude from location_city",
    cityVerificationStatus : "verified from location_city"
}
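The mapping above can be sketched in a few lines of code. This is a minimal illustration rather than the project's actual ETL code (which was written in Java): it uses Python for brevity, takes rows as plain dictionaries, copies only a subset of the fields, and leaves out `_id` because MongoDB generates it on insert. The function name and the sample rows are hypothetical.

```python
def build_location_documents(countries, states, cities):
    """Flatten country, state and city rows into documents for one collection."""
    # ISO codes are the only link between the three tables, so index by them.
    country_by_iso = {c["isoCountryCode"]: c for c in countries}
    state_by_iso = {(s["isoCountryCode"], s["isoStateCode"]): s for s in states}

    docs = []
    for c in countries:
        docs.append({
            "displayCountryId": c["id_"],
            "displayCountryName": c["name"],
            "isoCountryCode": c["isoCountryCode"],
            "embargo": c["embargo"],
        })
    for s in states:
        # Resolve the parent country once, at migration time, instead of joining at read time.
        docs.append({
            "displayCountryId": country_by_iso[s["isoCountryCode"]]["id_"],
            "displayStateId": s["id_"],
            "displayStateName": s["name"],
            "isoStateCode": s["isoStateCode"],
        })
    for city in cities:
        state = state_by_iso[(city["isoCountryCode"], city["isoStateCode"])]
        docs.append({
            "countryId": country_by_iso[city["isoCountryCode"]]["id_"],
            "stateId": state["id_"],
            "cityId": city["id_"],
            "cityName": city["name"],
            "population": city["population"],
        })
    return docs


# Hypothetical sample rows, shaped like the three PostgreSQL tables.
countries = [{"id_": 1, "name": "India", "isoCountryCode": "IN", "embargo": False}]
states = [{"id_": 10, "name": "Gujarat", "isoStateCode": "GJ", "isoCountryCode": "IN"}]
cities = [{"id_": 100, "name": "Ahmedabad", "isoStateCode": "GJ",
           "isoCountryCode": "IN", "population": 5000000}]

documents = build_location_documents(countries, states, cities)
```

Each city document carries its country and state ids, so a lookup that needed joins in the RDBMS becomes a single read from the collection.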


Best Practices & Learning:

1. Analyze the data access patterns of the application.
2. Define indexes by identifying common queries.
3. Use the explain method and the MongoDB profiler for query optimization.
4. De-normalize the schema when creating collections, for optimal performance.
5. Reconsider the schema design of a collection once the number of indexes on it reaches 10.
6. Carefully design and tune the connection pooling strategy for your application.
7. MongoDB 3.0 does not support multi-document transactions per se, but workarounds are available (https://docs.mongodb.org/v3.0/tutorial/perform-two-phase-commits/).

Data Migration

Approach:

1. Create SQL queries for the migration scripts.
2. Use the MongoDB Java driver to interact with MongoDB and the JDBC driver to talk to PostgreSQL.
3. Execute the migration scripts against the RDBMS and fill the MongoDB collections.
4. Host the RDBMS on a read-optimized instance to expedite the migration queries, as we did.

Java-Based Custom ETL Tool (diagram): the application RDBMS database and external data sources (social media, Salesforce) feed the ETL tool.

Best Practices:

1. Use the Bulk API of MongoDB for bulk ingestion.
2. Leverage Java’s support for multithreading to run inserts concurrently and reduce the data migration time.
3. Make the migration process fault tolerant: after a failure it should resume from where it left off, NOT from scratch.
4. Reuse infrastructure by deploying the migration scripts on the same server as the services layer.
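The first three practices (batched writes, concurrent workers, resuming after a failure) can be sketched together. This is a simplified model, not the project's Java tool: it is written in Python, `bulk_insert` is a hypothetical stand-in for one bulk write to MongoDB, and the checkpoint is an in-memory set where a real migration would persist it to disk or a database.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_batches(rows, size):
    """Yield (batch_number, rows) pairs of at most `size` rows each."""
    for start in range(0, len(rows), size):
        yield start // size, rows[start:start + size]


def migrate(rows, bulk_insert, checkpoint, batch_size=2, workers=4):
    """Insert pending batches concurrently; return the errors that occurred."""
    lock = threading.Lock()

    def run(batch):
        batch_no, chunk = batch
        bulk_insert(chunk)            # one bulk write per batch
        with lock:
            checkpoint.add(batch_no)  # record success only after the write

    # Skip every batch that a previous run already completed.
    pending = [b for b in split_batches(rows, batch_size) if b[0] not in checkpoint]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run, b) for b in pending]
    return [f.exception() for f in futures if f.exception() is not None]


# Demo: 7 rows in batches of 2; the batch holding id_ 3 fails on its first attempt.
rows = [{"id_": i} for i in range(1, 8)]
target, target_lock = [], threading.Lock()
state = {"failed": False}


def flaky_bulk_insert(chunk):
    if any(r["id_"] == 3 for r in chunk) and not state["failed"]:
        state["failed"] = True
        raise RuntimeError("simulated network error")
    with target_lock:
        target.extend(chunk)


checkpoint = set()
first_errors = migrate(rows, flaky_bulk_insert, checkpoint)
second_errors = migrate(rows, flaky_bulk_insert, checkpoint)  # retries only the failed batch
```

One caveat of this design: if a process dies after the write but before the checkpoint is recorded, the batch is written again on restart, so a real implementation would also want idempotent writes (for example, stable `_id` values).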

Sharding

Approach: Use “Tag Aware” sharding, which brings geo-awareness to the application data.

The ideal shard key:

1. Has high cardinality, which makes it easy for MongoDB to split the chunks
2. Has higher “randomness”
3. Supports targeted queries
4. May need to be computed
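How tag-aware sharding pins data to a geography can be modelled without a cluster. The sketch below is a plain Python simulation, not MongoDB code: the shard names, tags and the compound shard key `(region, userId)` are hypothetical, and in a real deployment the tags and ranges would be declared with the mongo shell helpers `sh.addShardTag()` and `sh.addTagRange()`.

```python
# Tags assigned to shards (hypothetical shard names and tags).
SHARD_TAGS = {
    "shard-eu": {"EU"},
    "shard-us": {"US"},
    "shard-apac": {"APAC"},
}

# Tag ranges over a compound shard key (region, userId).
# As in MongoDB, the lower bound is inclusive and the upper bound exclusive.
TAG_RANGES = [
    (("EU", 0), ("EU", float("inf")), "EU"),
    (("US", 0), ("US", float("inf")), "US"),
]


def shards_for(key):
    """Return the shards allowed to hold the chunk containing this shard key."""
    for low, high, tag in TAG_RANGES:
        if low <= key < high:
            return sorted(name for name, tags in SHARD_TAGS.items() if tag in tags)
    # Keys outside every tagged range may be placed on any shard.
    return sorted(SHARD_TAGS)


eu_user = shards_for(("EU", 42))
untagged_user = shards_for(("BR", 7))
```

Putting the region first in the key is one way a shard key "may need to be computed": the application derives it from the user's location, which keeps that user's documents on shards in the matching geography and lets queries that include the region be routed to a single shard.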

App Tier and Data Tier (diagram)

• App tier: a load balancer in front of the app servers for Geography 1, Geography 2, ... Geography n, each with its own mongos
• Data tier: for each geography (Geo 1, Geo 2, ... Geo n), a mongod primary, a mongod secondary and a mongod arbiter in that geography, plus secondaries in the other geographies
• Config server (mongod)
• PostgreSQL – Liferay RDBMS

Best Practices:

1. Plan for sharding well in advance and test it with production-like workloads to see the impact.
2. Involve the network administration team while planning for sharding, as they are the people to go to for cross-data-center connectivity issues.
3. Deploy the shard router (mongos) on the application server to save one call over the network.
4. Balance the non-sharded collections across the shards so that all shards receive a similar amount of traffic.
5. Ensure that indexes fit in RAM: index size < RAM.
6. Use an appropriate write concern and read preference for each use case. Choosing an appropriate read preference helps scale reads and deal with network latency.
7. Replica set tag sets, combined with an appropriate write concern and read preference, go a long way toward addressing data privacy concerns.
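To make practices 6 and 7 concrete, here is a small model of how a read restricted to secondaries is matched against replica set tag sets. It is a Python simulation of the matching rule only, not driver code: the host names and the `geo` tag are hypothetical. A member matches a tag set when it carries every key/value pair in it, tag sets are tried in order, and an empty tag set matches any member.

```python
# Hypothetical replica set: one member per line, tagged with its geography.
MEMBERS = [
    {"host": "geo1-a", "state": "PRIMARY", "tags": {"geo": "1"}},
    {"host": "geo1-b", "state": "SECONDARY", "tags": {"geo": "1"}},
    {"host": "geo2-a", "state": "SECONDARY", "tags": {"geo": "2"}},
]


def matches(member_tags, tag_set):
    """A member matches when it has every key/value pair of the tag set."""
    return all(member_tags.get(key) == value for key, value in tag_set.items())


def eligible_secondaries(members, tag_sets):
    """Return hosts for the first tag set that matches at least one secondary."""
    secondaries = [m for m in members if m["state"] == "SECONDARY"]
    for tag_set in tag_sets:
        found = [m["host"] for m in secondaries if matches(m["tags"], tag_set)]
        if found:
            return found
    return []


local_only = eligible_secondaries(MEMBERS, [{"geo": "1"}])
with_fallback = eligible_secondaries(MEMBERS, [{"geo": "3"}, {}])
no_match = eligible_secondaries(MEMBERS, [{"geo": "3"}])
```

For data privacy the fallback matters: listing only the local geography's tag set (no empty fallback) means a read fails rather than being served from another region.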

Performance & Monitoring

Performance Test Results

Functionality | Only RDBMS | Augmented RDBMS with MongoDB
Portal Dashboard | ~16 sec | ~3-4 sec
User Enrollment | ~17 sec | ~2-3 sec
User Profile | ~10 sec | ~3 sec
Instructor Dashboard | ~13 sec | ~3 sec
Course Assignment | ~12 sec | ~2-3 sec
Search | ~29 sec | ~3-4 sec

Note: Performance depends on many parameters such as network latency, server configuration, number of concurrent users and more.

Cluster Management & Monitoring Approach

• MMS – MongoDB Monitoring and Management Service tool
– Automation: provision, upgrade & scale
– Backups: continuous backup, point-in-time recovery
– Monitoring: dashboard with alerts

MongoDB MMS Dashboard (screenshot)

Benefits - Delivered with CIGNEX Datamatics Expertise

• Performance
– Web page response time reduced from 6-16 sec to < 2 sec
– Migration of 500GB+ of data completed in 4 hours
– Geo-based tag-aware sharding reduces network latency, as the data can stay close to the application server
• Scalability
– Auto-sharding to accommodate new geographies
• Data Privacy
– Geo-based tag-aware sharding to comply with the data privacy laws of different countries/zones
• Lower TCO
– Open source technology reduced licensing costs and vendor dependency, and out-of-the-box features accelerated development

Best Practices & Learning – Augmentation with MongoDB

1. Identify indexes carefully. A large number of indexes can bring down the write throughput.
2. Plan early for sharding. DO NOT go to production without benchmarking the shard key.
3. Do not forget to set ulimit & noatime. They provide significant performance gains.
4. Use the Bulk API of MongoDB. It makes bulk ingestion easy.
5. Use Java’s support for multithreading. Concurrent inserts reduce the data migration time.
6. Use MongoDB Ops Manager. Monitoring & managing a sharded cluster is a painful process without Ops Manager.
7. Use SSDs & RAID-10. They provide excellent throughput.
8. Use MongoDB Enterprise Edition. It provides excellent support and high-class security features.

MongoDB scales & shines!!


CIGNEX Datamatics - Big Data Analytics & IoT Case Studies

Improve performance through real-time intelligence by efficient device management & issue identification

GPS Services Company, Networking Company

Increase customer satisfaction & revenue due to uninterrupted video experience anywhere, anytime, on any device

Modernization of legacy Quote Portal resulting in competitive advantage – quote in 5 minutes

Insurance Company

First-mover advantage with timely launch of Sentiment and Trending Analysis service

SaaS Start-up Company, B2B Market Intelligence Services

100% increase in conversion rate with Single View of Business and Market Intelligence

E-Learning Community Portal

5x efficient user data management with improved application performance and data security

CIGNEX Datamatics Big Data Analytics Expertise

Team Size: 70+

• 70+ certified global consultants on various Big Data technologies
• Partnerships: MongoDB Advanced Partner, Talend Gold Partner, Cloudera Authorized Partner
• Frameworks
– Panoramyx™
– Vitalstatistyx™
• Platforms
– Reputation Management Platform™
– Findability Platform™
• Services
– OpeRA™ - Feasibility Study/Assessment, Workshops
– Big Data Consulting (MongoDB, Hadoop, Talend)
– Development to Production
– Support & Maintenance

Big Data Platforms

Business Intelligence Expertise


CIGNEX Datamatics: Big Data Analytics & IoT Expertise

Panoramyx™
• Hybrid Data Lake Architecture
• Data “Blending” & “Enrichment”
• Key “Analytics Engines”
• Single “Panoramic” view of X (X = Customer, Employee, Partner and more)

OpeRA™
• Open Source Readiness Assessment for Big Data Analytics
• Blueprint for Legacy Modernization
• Migration/Adoption Guidelines
• Recommendations for TCO Reduction, Performance Optimization

Vitalstatistyx™
• Internet of Things (IoT) Reference Architecture
• APIs Integration with Sensors, Devices
• Hardware Vendor connect
• Real-time & Predictive analytics with Reports/Dashboard

Analytics Engines
• Powerful, Flexible Analytics “Engines” & Point Solutions
• Findability Platform (Social Listening & Personality Insights)
• Reputation Management Platform

Systems of Insight

Q & A

1) Quick Assessment – http://operaonline.cignex.com

2) Test Drive Big Data Analytics: engage us for a Proof-of-Concept (PoC) @ US$5K