Three Big Data Case Studies

Upload: atidan-technologies-pvt-ltd

Post on 19-Jan-2015

Category: Technology


DESCRIPTION

Covers the fundamentals of Big Data with real-life examples, and helps you find out whether or not you need Big Data.

TRANSCRIPT

Page 1: Three Big Data Case Studies

THREE BIG DATA CASE STUDIES

Page 2: Three Big Data Case Studies

Great use cases of Big Data

• Big Data Exploration: Find, visualize, and understand all big data to improve decision making
• Enhanced 360° View of the Customer: Extend existing customer views (CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: Lower risk, detect fraud, and monitor cyber security in real time
• Data Warehouse Augmentation: Integrate big data and data warehouse capabilities to increase operational efficiency
• Operations Analysis: Analyze a variety of machine data for improved business results

Page 3: Three Big Data Case Studies

Why Big Data

• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue

[Diagram labels: data sources (File Systems, Relational Data, Content Mgmt, Email, CRM, Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources); Data Views; Applications/Users]

Page 4: Three Big Data Case Studies

Atidan Approach

• Implement a Hadoop-centric reference architecture
• Move enterprise batch processing to Hadoop
• Make Hadoop the single point of truth
• Massively reduce ETL by transforming within Hadoop
• Move results and aggregates back to legacy systems for consumption
• Retain, within Hadoop, source files at the finest granularity for re-use

Top Criteria

• Allow users to use familiar consumption interfaces (web, mobile)

• Enable businesses to unlock previously unusable data

[Diagram: Big Data Architecture, High Level. Stage labels: Unlock Big Data; Simplify Your Warehouse; Preprocess Raw Data; Ingest]

Page 5: Three Big Data Case Studies
Page 6: Three Big Data Case Studies

Atidan Case Study: Usage Analysis using Hadoop

• Business Need
  • A large conglomerate had to analyze the last 10 years of usage of its web applications using IIS logs
  • The logs received from IIS were stored in multiple files (e.g., daily logs)
  • The data contained free text, was unstructured, and included irrelevant data
  • The exact analysis criteria, parameters, and desired outcome were not known in advance
• Solution
  • A traditional RDBMS could not handle the problem because of the type and volume of the data and the uncertainty around the ultimate analysis criteria
  • Atidan delivered a Hadoop-based solution that transformed the raw data into reports
  • The solution was tolerant of data inconsistencies
  • Hadoop provided elasticity for incremental data addition
  • Scalability in the petabyte range
  • Depending on data size and complexity, processing can be scaled from one node to 100 nodes
  • The schema-less architecture made it possible to change the data model and analytics dynamically, even at a late stage in the project
  • The organization gained completely new and unexpected insights into employee, customer, and vendor/partner behavior
  • Correlations were established between employees' usage patterns and both attrition and productivity
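The kind of aggregation this case study describes can be sketched in a single process. The following is a minimal illustration in plain Python, not the Hadoop job Atidan delivered: it assumes the logs are in the IIS W3C extended format, where a `#Fields:` directive names the columns, and the sample lines, user names, and the helper names `parse_iis_lines` and `usage_summary` are hypothetical. Rows that do not match the declared fields are skipped, which is one simple way to tolerate inconsistent data.

```python
from collections import Counter


def parse_iis_lines(lines):
    """Yield one dict per request; skip directives and malformed rows."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#"):
            continue  # other directives and blank lines
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # tolerate free text and inconsistent rows
        yield dict(zip(fields, values))


def usage_summary(lines):
    """Count requests by response status, by month, and by user."""
    by_status, by_month, by_user = Counter(), Counter(), Counter()
    for rec in parse_iis_lines(lines):
        by_status[rec.get("sc-status", "?")] += 1
        by_month[rec.get("date", "")[:7]] += 1  # "YYYY-MM"
        by_user[rec.get("cs-username", "-")] += 1
    return by_status, by_month, by_user


# Hypothetical sample; real logs would be read from the daily files.
SAMPLE_LOG = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time cs-method cs-uri-stem cs-username sc-status",
    "2001-01-15 10:02:11 GET /app/home.aspx alice 200",
    "2001-01-15 10:02:15 GET /app/missing.aspx bob 404",
    "corrupted free text row",
    "2001-02-03 09:30:00 POST /app/save.aspx alice 201",
]

by_status, by_month, by_user = usage_summary(SAMPLE_LOG)
print(by_status)
print(by_month)
print(by_user)
```

Because each counter is built from independent per-record updates, the same logic splits naturally into a map step (parse and emit keys) and a reduce step (sum per key), which is what lets it scale out across nodes.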

Page 7: Three Big Data Case Studies

Atidan Case Study: Usage Analysis using Hadoop

[Chart: Request Types. Bars by response status, including Accepted, Created (201), and OK (200); the remaining labels are truncated in the transcript]
[Chart: Monthly Requests. Bars by month, January to November, for 2001 and 2002]
[Chart: Users. Bars per user]

Page 8: Three Big Data Case Studies

Big Query - Hive

• The volume of data collected and analyzed in industry for business intelligence (BI) is growing rapidly, making traditional warehousing solutions prohibitively expensive
• MapReduce is low level and complex to write
• Hive provides a high-level, SQL-like query language
• This allows for ad-hoc analysis
• The business need not know in advance which patterns to look for
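To illustrate what a SQL-like query layer buys, here is a small sketch of an ad-hoc question expressed as one declarative statement instead of a hand-written map and reduce. It uses Python's built-in `sqlite3` module purely as a local stand-in; it is not Hive, and the `requests` table, its columns, and the sample rows are hypothetical. The statement uses only common SQL (`GROUP BY`, `COUNT`, `ORDER BY`), but whether it runs unchanged on a particular Hive version is an assumption.

```python
import sqlite3

# In-memory database standing in for a Hive table over log data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (day TEXT, username TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("2001-01-15", "alice", 200),
        ("2001-01-15", "bob", 404),
        ("2001-02-03", "alice", 201),
        ("2001-02-04", "bob", 404),
    ],
)

# An ad-hoc question asked after the data was loaded:
# which response statuses occur most often?
AD_HOC_QUERY = """
    SELECT status, COUNT(*) AS hits
    FROM requests
    GROUP BY status
    ORDER BY hits DESC, status
"""
rows = conn.execute(AD_HOC_QUERY).fetchall()
print(rows)
```

A different question (for example, hits per user per month) only requires changing the query text, not writing and deploying new job code.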

Page 9: Three Big Data Case Studies
Page 10: Three Big Data Case Studies

Atidan Case Study: Customer data collection (KYC) using Hadoop

• Business Need
  • A financial institution had to periodically collect customer data
  • Customers are very reluctant to provide updated data
  • This customer data has to be cross-checked against the billions of transactions the institution receives per day
  • The institution wants to collate data that is available in the public domain from known social media sites
  • The data contained free text, was unstructured, and included irrelevant data
• Solution
  • A graph database is constructed over the extracted social data to analyze transactions
  • Atidan delivered a Hadoop-based solution that transformed the raw data into a graph database
  • Aggregate customer information from existing sources, social media, and government sources
  • Analyze transactions to find hidden patterns
  • Enable link analysis and risk monitoring
  • Facilitate decision making (new products) and customer discovery
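Link analysis over such a graph can be sketched with a breadth-first search. This is a minimal in-memory illustration, not the graph database used in the project: the node naming scheme (`cust:`, `acct:`, `addr:`), the sample edges, and the helpers `build_graph` and `linked_entities` are all hypothetical. It shows how customers become connected through transactions and shared attributes collected from other sources.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linked_entities(graph, start, max_hops):
    """Return {entity: hop_count} for entities within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


# Hypothetical edges: ownership, a transaction, and a shared address.
EDGES = [
    ("cust:A", "acct:1"),
    ("cust:B", "acct:2"),
    ("acct:1", "acct:2"),   # a transaction between the two accounts
    ("cust:B", "addr:X"),
    ("cust:C", "addr:X"),   # same address found in an external source
    ("cust:D", "acct:9"),   # unrelated customer
]

graph = build_graph(EDGES)
links = linked_entities(graph, "cust:A", max_hops=3)
print(links)
```

Here customer A is linked to customer B through a transaction, and a wider search would also reach customer C through B's shared address, while customer D stays unconnected. Queries of this shape are awkward as relational joins because the number of hops is not fixed in advance.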

Page 11: Three Big Data Case Studies

Atidan Case Study: Customer data collection (KYC) using Hadoop

[Diagram labels: Customer Information sources (Web, Social, Channel Partners, Utility Providers, Aadhaar/UIDAI); Big Data Processing; Graph Database; analyses (Customer Clustering, Income/Expense changes, Corporate structure changes, AML, Peer group analysis, Pattern Analysis)]

Page 12: Three Big Data Case Studies

Advantages of Hadoop (KYC) Solution to Banks

• Lowers the cost of follow-up with users
• Reduces losses by highlighting risky users early
• Graph-database-based AML
• Insights into
  • New products
  • New customers
  • New loans to existing customers
  • New investment opportunities for customers
• Reduces operational errors
• Traceability of data source

[Diagram labels: AML; Graph Queries; Due Diligence; Risk; Credit Scoring; Mitigation; Analysis; Peer groups; New Prospects; Insights; New Products; New Customers]

Page 13: Three Big Data Case Studies
Page 14: Three Big Data Case Studies

Atidan Case Study: Email scanning and categorization using MongoDB

Business Need

Retrieve potentially millions of daily emails from a common webmail account, categorize them, and post them to each user's page for frontend access.

The existing process had significant performance, reliability, and scalability issues. Users also received a lot of spam.

Solution

Atidan proposed a MongoDB-Drupal based solution with the following approach:

• A scheduler was created to pull only headers from the common webmail account shared by all users
• The headers were stored in an intermediate catalog in MongoDB
• The data was transformed based on the recipient address and user preferences, and spam was removed. The email body was fetched only for the filtered records, which were saved into the final catalog in MongoDB
• Emails from the final catalog were pushed to the frontend platform (Drupal)

Key Takeaways

• Leverage the power of MongoDB for processing 'Big Data' in the form of millions of daily emails. It is much faster, easy to scale, and very flexible
• The task was split into multiple sub-tasks, and a better algorithm was used for performance and efficiency
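The headers-first approach can be sketched as follows. This is an illustration only: the project used Node.js and MongoDB, whereas this sketch uses Python with plain lists and dicts standing in for the mailbox and the two catalogs. The field names, addresses, preference structure, and function names are hypothetical. The point it shows is that bodies are fetched only for messages that survive filtering.

```python
def fetch_headers(mailbox):
    """Step 1: pull headers only (no bodies) from the shared mailbox."""
    return [
        {"id": m["id"], "to": m["to"], "from": m["from"], "subject": m["subject"]}
        for m in mailbox
    ]


def filter_headers(headers, preferences, spam_senders):
    """Step 3a: keep mail for known recipients, apply preferences, drop spam."""
    kept = []
    for h in headers:
        prefs = preferences.get(h["to"])
        if prefs is None or h["from"] in spam_senders:
            continue  # unknown recipient or known spam sender
        if h["from"] in prefs["blocked"]:
            continue  # recipient opted out of this sender
        kept.append(h)
    return kept


def build_final_catalog(mailbox, preferences, spam_senders):
    """Run the pipeline and return the records destined for the frontend."""
    intermediate = fetch_headers(mailbox)           # step 2: intermediate catalog
    bodies = {m["id"]: m["body"] for m in mailbox}  # stand-in for fetch-by-id
    final = []
    for h in filter_headers(intermediate, preferences, spam_senders):
        # Step 3b: the body is looked up only for filtered records.
        final.append({**h, "body": bodies[h["id"]]})
    return final


# Hypothetical sample data.
MAILBOX = [
    {"id": 1, "to": "u1@example.com", "from": "news@example.org",
     "subject": "Weekly", "body": "Hello u1"},
    {"id": 2, "to": "u1@example.com", "from": "spam@example.net",
     "subject": "You won", "body": "Click here"},
    {"id": 3, "to": "u2@example.com", "from": "news@example.org",
     "subject": "Weekly", "body": "Hello u2"},
    {"id": 4, "to": "nobody@example.com", "from": "news@example.org",
     "subject": "Weekly", "body": "Hello nobody"},
]
PREFERENCES = {
    "u1@example.com": {"blocked": set()},
    "u2@example.com": {"blocked": {"news@example.org"}},
}
SPAM_SENDERS = {"spam@example.net"}

final_catalog = build_final_catalog(MAILBOX, PREFERENCES, SPAM_SENDERS)
print(final_catalog)
```

When most incoming mail is spam or irrelevant, filtering on headers first avoids downloading and storing the majority of message bodies, which is where the performance gain in this design comes from.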

Page 15: Three Big Data Case Studies

Atidan Case Study: Email scanning and categorization using MongoDB

Page 16: Three Big Data Case Studies

Technologies used

• Node.js (data transformation)
• MongoDB (database)
  • Schema-less
  • RESTful service to access data from the browser
• Drupal (frontend)
• The basic unit of data storage and transfer was the JSON object
• Storage and querying
  • NoSQL/simple/schema-less database
• Advantages
  • Highly scalable, very flexible, simple
• Connectivity
  • Node.js (server-side JavaScript)

Page 17: Three Big Data Case Studies

Thank you!

www.atidan.com
