Exploring the Wider World of Big Data - Vasalis Kapsalis

The Big Data Landscape

Upload: netappuk

Post on 26-Jan-2015

Category:

Technology


DESCRIPTION

Every second of every day you hear about electronic systems creating ever-increasing quantities of data. Systems in markets such as finance, media, healthcare, government and scientific research feature strongly in the Big Data processing conversation, and extracting business value from Big Data is forecast to bring customer and competitive advantage. In this session, hear Vas Kapsalis, NetApp Big Data Business Development Manager, discuss his views and experience on the wider world of Big Data.

TRANSCRIPT

Page 1: Exploring the Wider World of Big Data- Vasalis Kapsalis

The Big Data Landscape

Page 2: Exploring the Wider World of Big Data- Vasalis Kapsalis

Entering a New Era of Scale

Page 3: Exploring the Wider World of Big Data- Vasalis Kapsalis

Convergence of Technology Disrupters

Create Opportunity

NetApp Confidential - Internal Use Only

Cloud

Social

Mobile

Internet of Things

Big Data

Page 4: Exploring the Wider World of Big Data- Vasalis Kapsalis

Traditional structured and replicated data mix shift is driven by:

− Efficiency (deduplication, compression, thin provisioning, SATA)

− Growth in a new category of storage consumers using cloud / content depots

Unstructured data (files and objects in traditional storage + content depots / cloud) will be the largest storage category by 2014

− Content depots / cloud expected to be 95% unstructured data

[Chart: Revenue Share by Segment. Segments: traditional structured, traditional replicated, traditional unstructured, content depots / public cloud]

Unstructured Data Growth Dominates

Page 5: Exploring the Wider World of Big Data- Vasalis Kapsalis

Not Even to The “Peak”

– 40 zettabytes: estimated size of the digital universe in 2020
– 5 billion smartphones
– 30 billion pieces of new content to Facebook per month
– 80% unstructured data

[Figure: hype cycle, visibility against time. Phases: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]

Page 6: Exploring the Wider World of Big Data- Vasalis Kapsalis

Big Data Is All Data From Everywhere

Transactional Data

Machine Data

Social Data

Enterprise Content

Fundamentally changes your business

The Jetway

The Call Center

Page 7: Exploring the Wider World of Big Data- Vasalis Kapsalis

Big Data Vendor Landscape: A Lot of Hype and Buzz – Everyone is Jumping In

Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015

NoSQL: $2 billion per annum by 2015

Most firms are taking a pragmatic approach

Big data is in the very early stages of maturity

Best practices are not mature

IDC Big Data Survey
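
The market forecast on this slide ($3.2 billion in 2010 to $16.9 billion in 2015) implies a compound annual growth rate that the slide does not state. A minimal sketch of that arithmetic follows; the function name `cagr` is illustrative and not part of the original deck.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures taken from the slide: $3.2 billion (2010) to $16.9 billion (2015).
rate = cagr(3.2, 16.9, 2015 - 2010)
print(f"Implied compound annual growth rate: {rate:.1%}")
```

Under these figures the implied rate comes out just under 40% per year.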

[Chart: Funding for Hadoop and NoSQL, Jan-08 to Nov-11, vertical axis 0 to 400. Funding rounds marked: Cloudera series B, MapR series A, Cloudera series C, 10gen series D, MapR series B, DataStax series B, Neo Technology series A, Opera Solutions series A, Platfora series A, Couchbase series C, Cloudera series D]

"The Big Data market is expanding rapidly … For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation. Use cases are already present across industries and geographic regions."

Dan Vesset, Vice President, IDC

451 Research

Page 8: Exploring the Wider World of Big Data- Vasalis Kapsalis

Data Growth Impact on Business

[Figure: Business Velocity against time, 2010 to 2020. Drivers: Complexity, Volume, Speed. An Inflection Point separates two outcomes: Information Becomes a Propellant to Business; Data Becomes a Burden to IT Infrastructure]

“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze

Page 9: Exploring the Wider World of Big Data- Vasalis Kapsalis

Why Should You Care? It’s the Value of Your Data

Top line revenue
– Leverage their data assets into business advantage

Bottom line savings
– Lower the cost of compliance
– Manage ever growing data efficiently

Over 1PB of data

Growth of 175% YOY

90 days of data within 24 hours of a failure

5 Billion Records

Anywhere, Anytime

Faster time to market

50% Increase in Revenue

Page 10: Exploring the Wider World of Big Data- Vasalis Kapsalis

NetApp Big Data

Page 11: Exploring the Wider World of Big Data- Vasalis Kapsalis

Why NetApp? Practical solutions that solve today’s problems

Get Control

NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.

Break Through

Break through the limits. With NetApp, you can take on even the most massive and complex data projects.

Gain Insight

Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.

Page 12: Exploring the Wider World of Big Data- Vasalis Kapsalis

Experience Managing Data at Scale

[Figure: customers by data under management. Customer counts: 100, 50, 10, 4. Capacity tiers: 100 PB, 50 PB, 20 PB, 10 PB. Callout: NetApp’s Largest Customer]

Page 13: Exploring the Wider World of Big Data- Vasalis Kapsalis

NetApp Big Data Strategy

Best of breed storage for Big Data Applications

Create deep integration and value add

Build on open standards with best-in-class partnerships

Validate with Ecosystem Leaders
– Complete server, network and storage “Racks”
– Delivered via trusted high-value partners

Open

Best-of-Breed

Choice

Page 14: Exploring the Wider World of Big Data- Vasalis Kapsalis

Industry-Leading Storage Innovation

Flash Arrays for ultra-high performance

E-Series for price-performance at scale

StorageGRID for web scale object storage

Clustered Data ONTAP for Shared Infrastructure

Corporate Data Centers

Cloud Data Centers

Page 15: Exploring the Wider World of Big Data- Vasalis Kapsalis

Big Data Building Blocks

Big Content: retain forever, multi-site distribution

Big Bandwidth: ingest, process, stream

Big Analytics: reduce, analyze, report

Cloud (Private/Public): retain, distribute

[Diagram labels: Applications, Extract, Retain, Distribute, Store, Retrieve]

Page 16: Exploring the Wider World of Big Data- Vasalis Kapsalis

Page 17: Exploring the Wider World of Big Data- Vasalis Kapsalis

Analytics Oriented Business Processing

RDBMS (General Purpose DB)
– Data organized to align with schemas
– Fixed consistency model
– Complex queries supported
– Volume based data management

Columnar DB (Analytics Oriented)
– Data organized in column files
– Tabular interface without rigid schemas
– Fast column scans
– Multiple consistency models
– Transaction granular data management

Document Store (Transaction Oriented)
– Data organized in data structures in memory
– Schemaless transaction store for structured data
– High transactional performance

K-V Store (Metadata Service Oriented)
– Data organized in key value pairs
– Suitable for metadata services with CMS’
– Associated with object services

[Diagram labels: Transaction Processing, Realtime Analytics, Business Applications, Memory Ingest, Disk/Flash Tier, Query-based Retrieval, Commit, Persisted Commit, Federated Database Store (Build/Buy/Partner)]

Transaction granular data resilience, recoverability & protection at line speeds

Data organization optimized by query interface

Performance optimized query service
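
To make the contrast between schema-aligned rows and "data organized in column files" with "fast column scans" more concrete, here is a minimal sketch in Python. The sample records, field names and the helper `to_columns` are hypothetical and purely illustrative; real columnar databases add compression, indexing and on-disk formats that this sketch does not attempt to model.

```python
# Hypothetical sample records; field names are illustrative only.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 75.5},
    {"id": 3, "region": "EU", "amount": 30.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented scan: every full record is visited to read a single field.
row_total = sum(record["amount"] for record in rows)

# Column-oriented scan: only the "amount" column is read.
column_total = sum(columns["amount"])

print(row_total, column_total)
```

Both scans give the same answer; the difference is how much data has to be touched to get it, which is why analytics workloads that aggregate a few fields over many records tend to favor the columnar layout.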

Page 18: Exploring the Wider World of Big Data- Vasalis Kapsalis

Analytics Technologies to look out for!

Old World

Relational DBs (Row-oriented RDBMS’)

New World

Columnar DBs (Analytics Oriented)

Document Stores (Transaction Oriented)

Key-Value Stores (Content/Object Service)

Graph DBs (Niche)

Datacenter to Multi-Datacenter

• ACID constrained
• Complete query set
• Limited availability

• High consistency
• Rich query set
• Good availability

• Tuneable consistency
• Limited query set
• Highest/WAN availability

Page 19: Exploring the Wider World of Big Data- Vasalis Kapsalis

Analytics & Enterprise Apps Environment

Sensors

Applications

Logs

Location/GPS

Mobile Devices

Storage (All other storage, i.e. internal DAS)

Content Repositories

Shared Storage Infrastructure

Storage File Systems

Data Management

Analytics

Applications

Reporting/Dashboard/Visualization

ETL

OLAP

OLTP

Other Data Sources

OLAP ETL

Storage Data Management

NFS/sNFS/pNFS

NetApp Confidential – Limited Use

Page 20: Exploring the Wider World of Big Data- Vasalis Kapsalis

Some problems require an Enterprise Class Hadoop solution

Enterprise Class Hadoop

Packaged ready-to-deploy modular Hadoop cluster

– The data has intrinsic value $$$
– Capacity and compute requirements expanding very fast
– Higher storage performance
– Real human consequences if the system fails (threats, treatments, financial losses)
– System has to allow for asymmetric growth

Commodity, Off the Shelf Hadoop

Values associated with early adopters of Hadoop

– Social media space
– Contributors to Apache
– Strong bias to JBOD
– Skeptical of ALL vendors

Enterprise Class Hadoop

Packaged ready-to-deploy modular compute intensive Hadoop cluster

Compute intensive applications

Video, imaging analysis

Extremely tight Service Level expectations

Severe financial consequences if the data analytic application or service is run late

Enterprise Class Hadoop

Packaged ready-to-deploy modular storage intensive Hadoop cluster

Storage intensive applications

Additional CPUs do not help run time

Financial ticker data analysis

Extremely tight Service Level expectations

Need deeper storage per datanode

[Chart axes: Compute Power, Storage Capacity]

Page 21: Exploring the Wider World of Big Data- Vasalis Kapsalis

NetApp Open Solution for Hadoop

Easy to Deploy, Manage and Scale

Uses High Performance storage

– Resilient and Compact

– RAID Protection of Data

– Less Network Congestion

Raw Capacity and density

– 120TB or 180TB in 4U

– Fully serviceable storage system

Reliability

– Hardware RAID and hot swap prevent job restarts caused by a node going offline after a media failure

– Reliable metadata (Name Node)

Enterprise Class Hadoop

Map Reduce

NameNode

DataNodes / TaskTracker

DataNodes / TaskTracker

:

HDFS

Secondary NameNode

4 separate shared nothing partitions

JobTracker

FAS2040

E2660

Page 22: Exploring the Wider World of Big Data- Vasalis Kapsalis

NetApp Open Solution for Hadoop: Validated Benefits for the Enterprise

Improved cluster performance by 62%

Completed jobs 200% faster under drive failure

Delivered linear performance scalability as nodes, data grew

Per-server capacity increase of 1.5x

"The NetApp Open Solution for Hadoop improves capacity and performance efficiency and recoverability compared to a server-based DAS deployment." - ESG, 2012

Page 23: Exploring the Wider World of Big Data- Vasalis Kapsalis

Optimizing Performance and Staying Healthy

Source: Garrett, Brian and Lockner, Julie, “NetApp Open Solution for Hadoop”, ESG Report, May 2012, http://bit.ly/LyYG0t

[Figure labels: Network Overhead, Useful Work]

Availability and Resiliency

Burst Handling and Queuing

Oversubscription Ratio

Data Node Network Speed

Network Latency

Source: Cisco: http://bit.ly/yL54Ts

Page 24: Exploring the Wider World of Big Data- Vasalis Kapsalis

DAS vs. NetApp footprint

DAS Option
– Per server (2RU): CPU 2x8 cores, RAM 48GB, disk 24 TB
– 1 rack (42RU): 20 servers (320 cores, 960GB RAM, 480TB storage)
– 6 racks: 120 servers (1920 cores, 5.7TB RAM, 2.8 PB storage)

NetApp Option
– Per server (1RU): CPU 2x8 cores, RAM 48GB, disk 2 TB (8TB max; optional PXE boot, diskless)
– 1 rack (42RU), CPU and memory: 24 servers (6:1), 384 cores, 1.152TB RAM
– 1 rack (42RU), storage: 4 E2660, 720TB
– 4 racks: 96 servers (1536 cores, 4.6TB RAM, 2.8 PB storage)
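
The rack totals on this slide follow from the per-server figures by simple multiplication. The sketch below reproduces that arithmetic; the `RackConfig` class and its field names are illustrative, and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) are an assumption, since the slide does not say which convention it uses.

```python
from dataclasses import dataclass


@dataclass
class RackConfig:
    """Per-rack figures taken from the slide; the class itself is illustrative."""
    servers_per_rack: int
    cores_per_server: int
    ram_gb_per_server: int
    storage_tb_per_rack: int

    def totals(self, racks):
        """Scale the per-rack figures to a number of racks (decimal units assumed)."""
        servers = self.servers_per_rack * racks
        return {
            "servers": servers,
            "cores": servers * self.cores_per_server,
            "ram_tb": servers * self.ram_gb_per_server / 1000,
            "storage_pb": self.storage_tb_per_rack * racks / 1000,
        }


# DAS option: 20 servers per rack, 2x8 cores, 48GB RAM, 24TB disk per server.
das = RackConfig(20, 2 * 8, 48, 20 * 24)
# NetApp option: 24 servers per rack, 2x8 cores, 48GB RAM, 720TB on 4 E2660 per rack.
netapp = RackConfig(24, 2 * 8, 48, 720)

print("DAS, 6 racks:   ", das.totals(6))
print("NetApp, 4 racks:", netapp.totals(4))
```

Computed this way, both options reach 2.88 PB, and the RAM totals come to 5.76 TB and about 4.61 TB; the slide's 2.8 PB, 5.7TB and 4.6TB appear to be the same values truncated. The comparison the slide is drawing is that the same capacity fits in 4 racks instead of 6, with fewer servers and cores.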

Page 25: Exploring the Wider World of Big Data- Vasalis Kapsalis

Case Study: ASUP NetApp Analytics

Gateways

• 800K ASUPs every week

• 40% coming over the weekend

[Diagram labels: Extract, Transform, Load; Data Warehouse; Data Marts]

ETL

• Data needs to be parsed and loaded in 15 minutes

Data Warehouse

• Only 5% of data goes into the data warehouse; the rest is unstructured, and it is growing by 7-10 TB per month

• No easy way to access this unstructured content

Reporting

• Numerous mining requests are not satisfied currently

• Huge untapped potential of valuable insight

Finally, the incoming load doubles every 16 months!
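
A load that doubles every 16 months can be turned into a growth factor for any planning horizon. The sketch below shows that calculation; the function name `growth_factor` is illustrative, and applying the factor to the 800K weekly ASUPs quoted on this slide assumes the doubling applies to message count, which the slide does not spell out.

```python
def growth_factor(months, doubling_months=16):
    """Multiplier on load after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)


# Weekly ASUP volume from the slide; projecting it forward is an assumption.
weekly_asups_now = 800_000

for horizon in (12, 16, 32, 48):
    projected = weekly_asups_now * growth_factor(horizon)
    print(f"{horizon:>2} months: x{growth_factor(horizon):.2f}, "
          f"about {projected:,.0f} ASUPs per week")
```

A 16-month doubling period works out to roughly 68% growth per year, which is the kind of rate that makes a fixed-size ETL window progressively harder to meet.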

NetApp Proprietary - Limited Use Only

Page 26: Exploring the Wider World of Big Data- Vasalis Kapsalis

Case Study: NetApp Large-Scale Analytics

Challenge: 4 weeks to run a query on 24 billion unstructured records
NetApp solution: 10-node Hadoop cluster
Benefit: Time reduced from 4 weeks to 10.5 hours

Challenge: Impossible to run a query on 240 billion unstructured records
Benefit: Previously impossible, now achievable in just 18 hours
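
The first case study result (a query time cut from 4 weeks to 10.5 hours) can be expressed as a speedup factor. A minimal sketch follows; the helper names are illustrative, and the records-per-hour figures are derived from the slide's numbers rather than reported by it.

```python
HOURS_PER_WEEK = 7 * 24


def speedup(before_hours, after_hours):
    """How many times faster the new run time is than the old one."""
    return before_hours / after_hours


def records_per_hour(records, hours):
    """Average scan throughput implied by a record count and a run time."""
    return records / hours


before = 4 * HOURS_PER_WEEK  # 4 weeks, from the slide
after = 10.5                 # hours, from the slide

print(f"Speedup on 24 billion records: {speedup(before, after):.0f}x")
print(f"24 billion in 10.5 h: {records_per_hour(24e9, 10.5):,.0f} records/hour")
print(f"240 billion in 18 h:  {records_per_hour(240e9, 18):,.0f} records/hour")
```

The derived throughput for the 240 billion record run is higher than for the 24 billion record run; the slide does not say whether the same 10-node cluster was used for both, so the two figures should not be read as a like-for-like comparison.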

Page 27: Exploring the Wider World of Big Data- Vasalis Kapsalis

Big Data System Integrators Solutions Built on NetApp®

Integrated Big Data Solutions and Expertise

Planning and implementation expertise for Big Data

Turn-key solution stacks and Big Data services

Page 28: Exploring the Wider World of Big Data- Vasalis Kapsalis

Next Steps - Team with the Experts

Strategic Assessment
– Business goals
– Data growth needs
– Use case discovery (partner delivery)

Consult
– Solution architecture and design (NetApp delivery)

Deploy
– Installation and implementation (NetApp delivery)
– Solution implementation (partner delivery)

Support options: Global support available from NetApp and partners

Page 29: Exploring the Wider World of Big Data- Vasalis Kapsalis
