Essential Tools For Your Big Data Arsenal

Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB

Upload: mongodb

Post on 08-May-2015


Category:

Technology



DESCRIPTION

For some, Hadoop is synonymous with “Big Data,” but Hadoop is just one component of a successful Big Data architecture. Depending on the application, it may not even be the most important part. NoSQL solutions like MongoDB also play a dominant role in storage and real-time data processing, helping companies keep pace with the scale of their data requirements. NoSQL figures even more prominently in helping enterprises consume a wide variety of data sources at speeds not currently possible in Hadoop. NoSQL, then, is a useful complement both to Hadoop and to the transaction-based data of traditional RDBMSs. Tackling Big Data is not a one-tool job, so orchestrating the appropriate NoSQL database with Hadoop and an RDBMS is essential. In this session, we’ll dig deep into the different types of NoSQL, identifying how they differ and the types of Big Data workloads for which they’re best suited. We’ll also explore the trade-offs one makes in choosing NoSQL databases like MongoDB or Neo4j over an RDBMS like MySQL, when it makes sense to use both Hadoop and NoSQL, and when it’s more appropriate to use NoSQL on its own.

TRANSCRIPT

Page 1: Essential Tools For Your Big Data Arsenal

Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB

Essential Tools For Your Big Data Arsenal

Page 2: Essential Tools For Your Big Data Arsenal

The Big Data Unknown

Page 3: Essential Tools For Your Big Data Arsenal


Top Big Data Challenges?

Translation? Most struggle to know what Big Data is, how to manage it and who can manage it

Source: Gartner

Page 4: Essential Tools For Your Big Data Arsenal


Understanding Big Data – It’s Not Very “Big”

from Big Data Executive Summary – 50+ top executives from Government and F500 firms

64% - Ingest diverse, new data in real-time

15% - More than 100TB of data

20% - Less than 100TB (average of all? <20TB)

Page 5: Essential Tools For Your Big Data Arsenal

Innovation As Iteration

Page 6: Essential Tools For Your Big Data Arsenal

“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

Page 7: Essential Tools For Your Big Data Arsenal


Back in 1970…Cars Were Great!

Page 8: Essential Tools For Your Big Data Arsenal


So Were Computers!

Page 9: Essential Tools For Your Big Data Arsenal


Lots of Great Innovations Since 1970

Page 10: Essential Tools For Your Big Data Arsenal


Including the Relational Database

Page 11: Essential Tools For Your Big Data Arsenal


RDBMS Makes Development Hard

[Diagram: an Application sits on an Object Relational Mapping layer over a Relational Database; labels: Code, XML Config, DB Schema]

Page 12: Essential Tools For Your Big Data Arsenal


And Even Harder To Iterate

[Diagram: a table with columns Name, Pet, Phone, Email; each change is marked New Column or New Table]

3 months later…
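To make the iteration cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (contact, phone, pet) are hypothetical, loosely borrowed from the slide's Name/Pet/Phone/Email example. It only illustrates that each new single-valued attribute means a schema change and each multi-valued attribute means a new table; it says nothing about how long such a migration takes in a production system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of a hypothetical schema: one table, two data columns.
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO contact (name, email) VALUES ('Ada', 'ada@example.com')")

# Iteration 1: a new single-valued attribute needs a new column.
conn.execute("ALTER TABLE contact ADD COLUMN pet TEXT")

# Iteration 2: a contact can have several phones, so a new table is needed.
conn.execute(
    "CREATE TABLE phone (contact_id INTEGER REFERENCES contact(id), number TEXT)"
)
conn.execute("INSERT INTO phone VALUES (1, '555-0100'), (1, '555-0101')")

# Existing rows carry NULL for the new column until they are backfilled.
pet = conn.execute("SELECT pet FROM contact WHERE id = 1").fetchone()[0]

# Reading one contact back now takes a join across both tables.
phones = [
    row[0]
    for row in conn.execute(
        "SELECT p.number FROM contact c JOIN phone p ON p.contact_id = c.id "
        "WHERE c.id = 1 ORDER BY p.number"
    )
]

# The current column list reflects the migration that was applied.
columns = [row[1] for row in conn.execute("PRAGMA table_info(contact)")]
```

Application code and any mapping layer would also need to change to match each of these steps, which is the friction the slide points at.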

Page 13: Essential Tools For Your Big Data Arsenal


From Complexity to Simplicity

RDBMS

MongoDB

{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
}
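As a rough illustration of why the document shape is easier to work with, the sketch below holds the same employee record as a plain Python dict. It is an assumption-laden stand-in, not MongoDB driver code: the list named collection is an in-memory substitute for a real collection, and the "pets" field is a hypothetical later addition. The _id is left out because MongoDB generates one on insert when it is absent.

```python
import copy
import json

# The employee record from the slide as a plain Python dict.
# _id is omitted; MongoDB generates one on insert when it is absent.
employee = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}

# Embedded data is read directly; no join or mapping layer is involved.
plans_by_type = {b["type"]: b["plan"] for b in employee["benefits"]}

# Iterating on the model: a later version of the record adds a field
# (hypothetical "pets" attribute) without touching the older record.
employee_v2 = copy.deepcopy(employee)
employee_v2["pets"] = ["dog"]

# In-memory stand-in for a collection: both shapes coexist side by side.
collection = [employee, employee_v2]
serialized = [json.dumps(doc, sort_keys=True) for doc in collection]
```

With a live server, a dict shaped like this could be passed to pymongo's `insert_one`; that step is left out so the sketch runs without a database.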

Page 14: Essential Tools For Your Big Data Arsenal


So…Use Open Source

Page 15: Essential Tools For Your Big Data Arsenal


Big Data != Big Upfront Payment

Page 16: Essential Tools For Your Big Data Arsenal


RDBMS Is Expensive To Scale

“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”

IBM Press Release 28 Aug, 2012

Page 17: Essential Tools For Your Big Data Arsenal


Spoiled for choice

Rank  DBMS                  Database model          Score     Change
1     Oracle                Relational DBMS         1583.84    54.23
2     MySQL                 Relational DBMS         1331.34    25.58
3     Microsoft SQL Server  Relational DBMS         1207      -106.78
4     PostgreSQL            Relational DBMS         177.01     -5.22
5     DB2                   Relational DBMS         175.83      3.58
6     MongoDB               NoSQL Document Store    149.48     -2.71
7     Microsoft Access      Relational DBMS         142.49     -4.21
8     SQLite                Relational DBMS         77.88      -4.9
9     Sybase                Relational DBMS         73.66      -1.68
10    Teradata              Relational DBMS         54.41       3.32

DB-Engines.com Database Ranking

Page 18: Essential Tools For Your Big Data Arsenal


Remember the Long Tail?

Page 19: Essential Tools For Your Big Data Arsenal


It Didn’t Work Out So Well

Page 20: Essential Tools For Your Big Data Arsenal


Use Popular, Well-Known Technologies

Source: Silicon Angle, 2012

Page 21: Essential Tools For Your Big Data Arsenal


Ask the Right Questions…

“Organizations already have people who know their own data better than mystical data scientists….Learning Hadoop [or MongoDB] is easier than learning the company’s business.”

(Gartner, 2012)

Page 22: Essential Tools For Your Big Data Arsenal


Leverage Existing Skills

Page 23: Essential Tools For Your Big Data Arsenal


Search as a Sign?

Page 24: Essential Tools For Your Big Data Arsenal

When To Use Hadoop, NoSQL

Page 25: Essential Tools For Your Big Data Arsenal


Enterprise Big Data Stack

Diagram layers:
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management: Online Data and Offline Data (RDBMS, EDW, Hadoop)
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Alongside all layers: Management & Monitoring; Security & Auditing

Page 26: Essential Tools For Your Big Data Arsenal


Consideration – Online vs. Offline

Online
• Real-time
• Low-latency
• High availability

vs.

Offline
• Long-running
• High-latency
• Availability is lower priority

Page 27: Essential Tools For Your Big Data Arsenal


Consideration – Online vs. Offline

Online vs. Offline

Page 28: Essential Tools For Your Big Data Arsenal


Hadoop Is Good for…

• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake

Page 29: Essential Tools For Your Big Data Arsenal


MongoDB/NoSQL Is Good for…

• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub

Page 30: Essential Tools For Your Big Data Arsenal

How To Use The Two Together?

Page 31: Essential Tools For Your Big Data Arsenal


Finding Waldo

Page 32: Essential Tools For Your Big Data Arsenal


Customer example: Online Travel

Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)

Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine

MongoDB Connector for Hadoop

Page 33: Essential Tools For Your Big Data Arsenal


Predictive Analytics

Government
• Predictive analytics system for crime, health issues
• Diverse, unstructured (incl. geospatial) data from 30+ agencies
• Correlate data in real-time

Algorithms
• Long-form trend analysis
• MongoDB data dumped into Hadoop, analyzed, re-inserted into MongoDB for better real-time response

MongoDB + Hadoop
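The dump, analyze, re-insert loop described on this slide can be sketched in a few lines of Python. Everything here is an in-memory stand-in under stated assumptions: the incidents list plays the role of an operational MongoDB collection, analyze() plays the role of a Hadoop batch job, and the field names (district, agency, count) are hypothetical. It does not use the MongoDB Connector for Hadoop or any of its APIs.

```python
from collections import defaultdict

# Stand-in for an operational MongoDB collection (hypothetical fields).
incidents = [
    {"district": "north", "agency": "police", "count": 4},
    {"district": "north", "agency": "health", "count": 2},
    {"district": "south", "agency": "police", "count": 1},
]


def export_batch(collection):
    """Step 1: dump a snapshot of operational data for offline analysis."""
    return [dict(doc) for doc in collection]


def analyze(batch):
    """Step 2: long-running batch job, written in map/reduce style."""
    totals = defaultdict(int)
    for doc in batch:
        # map: emit (district, count); reduce: sum the counts per district
        totals[doc["district"]] += doc["count"]
    return [{"_id": d, "total_incidents": t} for d, t in sorted(totals.items())]


def reinsert(summaries):
    """Step 3: write results back, keyed for low-latency lookup."""
    return {doc["_id"]: doc for doc in summaries}


summary_collection = reinsert(analyze(export_batch(incidents)))

# The online application answers from the precomputed summary
# instead of scanning raw incidents on every request.
north_total = summary_collection["north"]["total_incidents"]
```

The design point is the split of work: the slow aggregation runs offline, and the online side only performs a keyed lookup against its results.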

Page 34: Essential Tools For Your Big Data Arsenal


Data Hub

Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection

Churn Analysis
• Customer action analysis
• Churn prediction algorithms

MongoDB Connector for Hadoop

Page 35: Essential Tools For Your Big Data Arsenal


Machine Learning

Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions

Algorithms
• User segmentation
• Recommendation engine
• Prediction engine

MongoDB Connector for Hadoop

Page 36: Essential Tools For Your Big Data Arsenal


MongoDB + Hadoop Connector

• Makes MongoDB a Hadoop-enabled file system
• Read and write to live data, in-place
• Copy data between Hadoop and MongoDB
• Full support for data processing
  – Hive
  – MapReduce
  – Pig
  – Streaming
  – EMR

MongoDB Connector for Hadoop

Page 37: Essential Tools For Your Big Data Arsenal

@mjasay