Your Big Data Arsenal - Strata 2013

Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB: Essential Tools For Your Big Data Arsenal

Uploaded by mjasay on 10-May-2015


DESCRIPTION

Matt Asay presents at Strata 2013 on how NoSQL fits into the Big Data landscape, particularly how MongoDB and Hadoop work well together. Not an infomercial.

TRANSCRIPT

Page 1: Your Big Data Arsenal - Strata 2013

Matt Asay (@mjasay), VP, Business Development & Strategy, MongoDB

Essential Tools For Your Big Data Arsenal

Page 2

The Big Data Unknown

Page 3

Top Big Data Challenges?

Translation? Most organizations struggle to understand what Big Data is, how to manage it, and who can manage it.

Source: Gartner

Page 4

Understanding Big Data – It’s Not Very “Big”

From the Big Data Executive Summary: 50+ top executives from government and Fortune 500 firms

64% - Ingest diverse, new data in real-time

15% - More than 100TB of data

20% - Less than 100TB (average of all? <20TB)

Page 5

Innovation As Iteration

Page 6

“I have not failed. I've just found 10,000 ways that won't work.” ― Thomas A. Edison

Page 7

Back in 1970…Cars Were Great!

Page 8

So Were Computers!

Page 9

Lots of Great Innovations Since 1970

Page 10

Including the Relational Database

Page 11

RDBMS Makes Development Hard

Diagram: Application → Object Relational Mapping → Relational Database, with code, XML config, and a DB schema to maintain

Page 12

And Even Harder To Iterate

Diagram: a table with columns Name, Pet, Phone, Email, surrounded by additions labeled New Table, New Table, New Column, New Column; caption: "3 months later…"
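A minimal sketch of the iteration cost this slide describes, assuming Python's standard-library sqlite3 module as a stand-in for any relational database and a plain list of dicts as a stand-in for a document store (this is not the MongoDB driver API). The table name `people` and the sample values are hypothetical; the columns come from the slide's labels.

```python
import sqlite3


def relational_add_email():
    """Relational path: a new attribute needs a schema migration before it can be stored."""
    conn = sqlite3.connect(":memory:")  # sqlite3 stands in for any RDBMS
    conn.execute("CREATE TABLE people (name TEXT, pet TEXT, phone TEXT)")
    conn.execute("INSERT INTO people VALUES ('Ada', 'cat', '555-0100')")
    try:
        # Fails: the table was defined without an email column.
        conn.execute("INSERT INTO people (name, email) VALUES ('Bob', 'bob@example.com')")
    except sqlite3.OperationalError:
        # The migration step: alter the schema, then retry the write.
        conn.execute("ALTER TABLE people ADD COLUMN email TEXT")
        conn.execute("INSERT INTO people (name, email) VALUES ('Bob', 'bob@example.com')")
    rows = conn.execute(
        "SELECT name, pet, phone, email FROM people ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


def document_add_email():
    """Document path: each record carries its own fields, so there is no migration step."""
    people = [{"name": "Ada", "pet": "cat", "phone": "555-0100"}]
    # Hypothetical new record with a field the first record never had.
    people.append({"name": "Bob", "email": "bob@example.com"})
    return people
```

Calling `relational_add_email()` returns `[('Ada', 'cat', '555-0100', None), ('Bob', None, None, 'bob@example.com')]`: the existing row gets NULL for the new column. Here the migration is a single statement, but on a large production schema it can require coordinated changes; the document path simply stores the new field on new records and leaves older records as they were.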

Page 13

From Complexity to Simplicity

RDBMS vs. MongoDB:

{
  _id: ObjectId("4c4ba5e5e8aabf3"),
  employee_name: "Dunham, Justin",
  department: "Marketing",
  title: "Product Manager, Web",
  report_up: "Neray, Graham",
  pay_band: "C",
  benefits: [
    { type: "Health", plan: "PPO Plus" },
    { type: "Dental", plan: "Standard" }
  ]
}
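To show what the single document on this slide replaces, here is a minimal Python sketch that stores the same employee record relationally with the standard-library sqlite3 module and then reassembles it. It is an illustration under stated assumptions, not the behavior of MongoDB or of any particular ORM; the table and column names (`employees`, `benefits`, `benefit_type`, `benefit_plan`) are hypothetical, and the `_id` field is left out for simplicity.

```python
import sqlite3

# The employee document from the slide as a Python dict (_id omitted for simplicity).
DOCUMENT = {
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def store_relationally(doc):
    """Normalize the document: the embedded benefits array becomes a child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, employee_name TEXT, "
        "department TEXT, title TEXT, report_up TEXT, pay_band TEXT)"
    )
    conn.execute(
        "CREATE TABLE benefits (employee_id INTEGER REFERENCES employees(id), "
        "benefit_type TEXT, benefit_plan TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO employees (employee_name, department, title, report_up, pay_band) "
        "VALUES (?, ?, ?, ?, ?)",
        (doc["employee_name"], doc["department"], doc["title"],
         doc["report_up"], doc["pay_band"]),
    )
    emp_id = cur.lastrowid
    # One child row per array element, linked back by the foreign key.
    conn.executemany(
        "INSERT INTO benefits (employee_id, benefit_type, benefit_plan) VALUES (?, ?, ?)",
        [(emp_id, b["type"], b["plan"]) for b in doc["benefits"]],
    )
    return conn, emp_id


def load_relationally(conn, emp_id):
    """Reassemble the nested shape with a JOIN plus application-side regrouping."""
    # Note: an inner JOIN returns no rows for an employee without benefits;
    # a LEFT JOIN would be needed to handle that case.
    rows = conn.execute(
        "SELECT e.employee_name, e.department, e.title, e.report_up, e.pay_band, "
        "b.benefit_type, b.benefit_plan "
        "FROM employees e JOIN benefits b ON b.employee_id = e.id "
        "WHERE e.id = ? ORDER BY b.rowid",
        (emp_id,),
    ).fetchall()
    fields = ["employee_name", "department", "title", "report_up", "pay_band"]
    doc = dict(zip(fields, rows[0][:5]))
    doc["benefits"] = [{"type": r[5], "plan": r[6]} for r in rows]
    return doc
```

The relational version needs two tables, a foreign key, a join, and regrouping code to get back the shape that the document stores directly.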

Page 14

So…Use Open Source

Page 15

Big Data != Big Upfront Payment

Page 16

RDBMS Is Expensive To Scale

“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”

IBM Press Release 28 Aug, 2012

Page 17

Spoiled for choice

Rank  DBMS                  Database Model        Score    Change
1     Oracle                Relational DBMS       1583.84  54.23
2     MySQL                 Relational DBMS       1331.34  25.58
3     Microsoft SQL Server  Relational DBMS       1207     -106.78
4     PostgreSQL            Relational DBMS       177.01   -5.22
5     DB2                   Relational DBMS       175.83   3.58
6     MongoDB               NoSQL Document Store  149.48   -2.71
7     Microsoft Access      Relational DBMS       142.49   -4.21
8     SQLite                Relational DBMS       77.88    -4.9
9     Sybase                Relational DBMS       73.66    -1.68
10    Teradata              Relational DBMS       54.41    3.32

DB-Engines.com Database Ranking

Page 18

Remember the Long Tail?

Page 19

It Didn’t Work Out So Well

Page 20

Use Popular, Well-Known Technologies

Source: Silicon Angle, 2012

Page 21

Ask the Right Questions…

“Organizations already have people who know their own data better than mystical data scientists….Learning Hadoop [or MongoDB] is easier than learning the company’s business.”

(Gartner, 2012)

Page 22

Leverage Existing Skills

Page 23

Search as a Sign?

Page 24

When To Use Hadoop, NoSQL

Page 25

Enterprise Big Data Stack

Diagram layers:
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management: Online Data (RDBMS); Offline Data (RDBMS, EDW, Hadoop)
• Infrastructure: OS & Virtualization, Compute, Storage, Network
• Across all layers: Management & Monitoring; Security & Auditing

Page 26

Consideration – Online vs. Offline

Online:
• Real-time
• Low-latency
• High availability

Offline:
• Long-running
• High-latency
• Availability is a lower priority
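The online and offline characteristics on this slide describe two different access patterns. A minimal Python sketch, with hypothetical data and function names, contrasts them: the online path answers a request with a single keyed lookup, while the offline path scans every record to produce an aggregate. Plain dicts stand in for the actual data stores; this illustrates the shape of the work, not a benchmark.

```python
from collections import defaultdict

# Hypothetical data: user profiles served to a live app, and a log of purchase events.
PROFILES = [{"user": "u1", "segment": "frequent"}, {"user": "u2", "segment": "new"}]
EVENTS = [
    {"user": "u1", "amount": 120.0},
    {"user": "u2", "amount": 40.0},
    {"user": "u1", "amount": 80.0},
]

# Online: data is indexed by key up front so each request touches one record.
PROFILE_INDEX = {p["user"]: p for p in PROFILES}


def serve_request(user_id):
    """Online path: one keyed lookup per request (average O(1) for a dict)."""
    return PROFILE_INDEX.get(user_id)


def batch_spend_totals(events):
    """Offline path: a full scan over every event; higher latency is acceptable here."""
    totals = defaultdict(float)
    for event in events:
        totals[event["user"]] += event["amount"]
    return dict(totals)
```

The online function's cost does not grow with the size of the event log, while the offline function's cost grows with every record it scans, which is why the two are usually run on different schedules and often on different systems.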

Page 27

Consideration – Online vs. Offline

Online vs. Offline

Page 28

Hadoop Is Good for…

• Risk Modeling
• Churn Analysis
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Search Quality
• Data Lake
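Several of these workloads (churn analysis, transaction analysis, risk modeling) are typically batch jobs over large logs, the kind of work Hadoop's MapReduce model was built for. The following minimal Python sketch simulates the three MapReduce phases in memory on hypothetical `customer_id,event` log lines, counting events per customer. It shows the programming model only, not Hadoop's APIs or how Hadoop distributes the phases across a cluster.

```python
from collections import defaultdict

# Hypothetical input: one "customer_id,event" record per line, as in a raw log file.
LINES = [
    "c1,login",
    "c2,login",
    "c1,support_call",
    "c3,login",
    "c1,support_call",
]


def map_phase(lines):
    """Map: emit one (key, value) pair per record; here (customer_id, 1)."""
    for line in lines:
        customer_id, _event = line.split(",", 1)
        yield customer_id, 1


def shuffle_phase(pairs):
    """Shuffle: group values by key (Hadoop also sorts keys at this stage)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into one result, keys in sorted order."""
    return {key: sum(values) for key, values in sorted(groups.items())}


event_counts = reduce_phase(shuffle_phase(map_phase(LINES)))
```

Because each map call sees one record and each reduce call sees one key's values, both phases can be spread over many machines; that independence is what lets the model scale to the long-running offline analyses listed above.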

Page 29

MongoDB/NoSQL Is Good for…

• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub

Page 30

How To Use The Two Together?

Page 31

Finding Waldo

Page 32

Customer example: Online Travel

Travel

• Flights, hotels and cars

• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)

• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine

Algorithms

MongoDB Connector for Hadoop

Page 33

Predictive Analytics

Government

• Predictive analytics system for crime, health issues

• Diverse, unstructured (incl. geospatial) data from 30+ agencies

• Correlate data in real time

• Long-form trend analysis
• MongoDB data dumped into Hadoop, analyzed, and re-inserted into MongoDB for better real-time response

Algorithms

MongoDB + Hadoop
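The round trip described on this slide (dump MongoDB data into Hadoop, analyze it, re-insert the results into MongoDB) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a dict keyed by `_id` stands in for the MongoDB collection, a list of JSON lines stands in for the dump, a `Counter` stands in for the Hadoop job, and the field names (`agency`, `region`, `region_report_count`) are hypothetical.

```python
import json
from collections import Counter

# Hypothetical documents standing in for a MongoDB collection, keyed by _id.
live = {
    1: {"_id": 1, "agency": "health", "region": "north"},
    2: {"_id": 2, "agency": "police", "region": "north"},
    3: {"_id": 3, "agency": "health", "region": "south"},
}


def export_dump(collection):
    """Step 1: dump every document as one JSON line, the shape a batch job ingests."""
    return [json.dumps(doc, sort_keys=True) for doc in collection.values()]


def batch_analyze(dump_lines):
    """Step 2 (offline): scan the whole dump and count records per region."""
    return Counter(json.loads(line)["region"] for line in dump_lines)


def write_back(collection, region_counts):
    """Step 3: store the result on each live document for fast real-time reads."""
    for doc in collection.values():
        doc["region_report_count"] = region_counts[doc["region"]]


write_back(live, batch_analyze(export_dump(live)))
```

Writing the precomputed value back onto each document lets the real-time side read the batch result with a single lookup instead of recomputing it per request.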

Page 34

Data Hub

Insurance

• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection

• Customer action analysis

• Churn prediction algorithms

Churn Analysis

MongoDB Connector for Hadoop

Page 35

Machine Learning

Ad-Serving

• Catalogs and products

• User profiles
• Clicks
• Views
• Transactions

• User segmentation
• Recommendation engine
• Prediction engine

Algorithms

MongoDB Connector for Hadoop

Page 36

MongoDB + Hadoop Connector

• Makes MongoDB a Hadoop-enabled file system
• Read and write live data in place
• Copy data between Hadoop and MongoDB
• Full support for data processing
  – Hive
  – MapReduce
  – Pig
  – Streaming
  – EMR

MongoDB Connector for Hadoop
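Of the processing options listed, Hadoop Streaming is the simplest to illustrate: Streaming runs any executable as mapper and reducer, passing records as lines on stdin and stdout with tab-separated key/value output, and sorts the mapper output by key before the reducer sees it. The following minimal Python sketch assumes one JSON document per input line (a hypothetical export format; the connector's own Python helpers are not shown) and counts documents per `department`, a field borrowed from the employee example earlier in this deck. `run_locally` only simulates the sort step that Hadoop performs between the phases.

```python
import json
from itertools import groupby


def mapper(lines):
    """Streaming-style mapper: one JSON document per line in, 'key<TAB>value' lines out."""
    for line in lines:
        doc = json.loads(line)
        yield f"{doc['department']}\t1"


def reducer(sorted_lines):
    """Streaming-style reducer: input is sorted by key, so equal keys arrive adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    """Local simulation: map, sort by key (done by Hadoop between phases), reduce."""
    return list(reducer(sorted(mapper(lines))))
```

Under Hadoop Streaming, `mapper` and `reducer` would each be wrapped in a small script that iterates over `sys.stdin` and prints each yielded line; the grouping logic in `reducer` relies entirely on that sorted input.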

Page 37

@mjasay