Transcript
Page 1: EMC Hadoop Starter Kit - ViPR Edition

© Copyright 2014 EMC Corporation. All rights reserved.

EMC Hadoop Starter Kit - ViPR Edition

EMC Open Innovation Lab

Page 2: EMC Hadoop Starter Kit - ViPR Edition

The Digital Universe

Less than 1% of the World’s Data is Analyzed

By 2020, the Internet will connect 7.6B people and 200B things (sensors, machines, cars, appliances…)

Data Volumes

2000: 2 Exabytes a year

2011: 2 Exabytes a day

Page 3: EMC Hadoop Starter Kit - ViPR Edition

Location & Types Of Big Data

Types: Structured Data, Unstructured Data

Locations: Enterprise, Partner, Public

Examples: Forecast Data, Location Data, Credit Data, Shipping Data, Social Data, Video Data, Telemetry Data

Location & Types Of Big (& Fast!) Data

Page 4: EMC Hadoop Starter Kit - ViPR Edition

Hadoop Challenges

Depends on HDFS for data repository
– Must make legacy data accessible through HDFS

Hadoop HDFS inefficiencies:
– 3 copies for protection
– No advanced data efficiency: de-duplication, thin provisioning
– Security
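The "3 copies for protection" point can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the HDFS default replication factor of 3 and ignores any other overhead (metadata, temporary space). The function names are hypothetical and not part of Hadoop or ViPR.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed when every block is stored
    `replication_factor` times (3 is the HDFS default)."""
    if usable_tb < 0 or replication_factor < 1:
        raise ValueError("usable_tb must be >= 0 and replication_factor >= 1")
    return usable_tb * replication_factor


def usable_fraction(replication_factor: int = 3) -> float:
    """Share of raw capacity that holds unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return 1 / replication_factor


if __name__ == "__main__":
    # 100 TB of data with three copies occupies 300 TB of raw disk.
    print(raw_capacity_tb(100))
    # Only about a third of the raw capacity stores unique data.
    print(round(usable_fraction(3), 3))
```

Under these assumptions, a data set takes three times its own size in raw disk. This is the inefficiency the slide refers to.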

Integration with robust traditional data center products: compute virtualization, enterprise storage

Page 5: EMC Hadoop Starter Kit - ViPR Edition

Hadoop Storage Options

Hadoop HDFS

• Leverage Hadoop distro HDFS data services

• Compute and data converged on a cluster of servers

Storage Array

• Name node and Data node services from storage array (e.g., EMC Isilon)

Storage OS

• Name node and Data node services from storage OS (e.g., EMC ViPR)

Page 6: EMC Hadoop Starter Kit - ViPR Edition

ViPR HDFS

HDFS is becoming the de facto file system for distributed applications

ViPR is a great platform for HDFS
– Addresses limitations of off-the-shelf HDFS
– Brings HDFS to existing storage hardware
– Enables HDFS/object/file scenarios
– Flexible software model allows colocation

Page 7: EMC Hadoop Starter Kit - ViPR Edition

Support Mixed Workloads

Object, File and HDFS operations on the same data

[Diagram labels: Virtual Array; Isilon; 3rd Party; VNX5500]

ViPR Data Services offer three bucket options:

– Object
– HDFS
– Object and HDFS

Object and HDFS gives users access to the same data through either S3 or HDFS

– Full compatibility with existing object based APIs

▪ Amazon S3, OpenStack Swift, Atmos
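The three bucket options listed above can be pictured as a small access matrix. The following is a hypothetical model for illustration only, not the ViPR API: the enum, the protocol names, and the assumption that an Object-only bucket is not reachable over HDFS (and vice versa) are inferred from the option names on the slide.

```python
from enum import Enum


class BucketType(Enum):
    """Hypothetical labels mirroring the three options on the slide."""
    OBJECT = "Object"
    HDFS = "HDFS"
    OBJECT_AND_HDFS = "Object and HDFS"


# Object APIs named on the slide: Amazon S3, OpenStack Swift, Atmos.
OBJECT_APIS = frozenset({"s3", "swift", "atmos"})

# Assumed mapping of bucket type to the protocols that can reach it.
ALLOWED_PROTOCOLS = {
    BucketType.OBJECT: OBJECT_APIS,
    BucketType.HDFS: frozenset({"hdfs"}),
    BucketType.OBJECT_AND_HDFS: OBJECT_APIS | {"hdfs"},
}


def can_access(bucket_type: BucketType, protocol: str) -> bool:
    """Return True if `protocol` may be used against this bucket type."""
    return protocol.lower() in ALLOWED_PROTOCOLS[bucket_type]


if __name__ == "__main__":
    for bucket_type in BucketType:
        print(bucket_type.value, sorted(ALLOWED_PROTOCOLS[bucket_type]))
```

In this model only the combined bucket type accepts both an object call and an HDFS call on the same data, which is the mixed-workload case the slide highlights.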

[Diagram labels: Object; HDFS; Object & HDFS]

Page 8: EMC Hadoop Starter Kit - ViPR Edition

Simple, Easy, Cost Effective

EMC Starter Kit for Hadoop – ViPR Edition

Deployment guides for major Hadoop distributions:
– Pivotal, Cloudera, and Hortonworks

Four-step deployment:
– Deploy preferred Hadoop distribution
– Deploy EMC ViPR with Object and HDFS data services
– Configure Hadoop distribution to use ViPR HDFS target
– Validation process
▪ Load data file via S3 interface
▪ Test MapReduce job
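The slide does not say which MapReduce job is used for validation. A word count is the conventional smoke test, so the sketch below assumes one and simulates its map, shuffle, and reduce phases locally in plain Python. It does not talk to Hadoop or ViPR, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word, lowercased."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases end to end on an in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data", "Big Hadoop data"]
    print(word_count(sample))
```

In a real validation run the input would be the file loaded through the S3 interface and the job would read it through the HDFS target. Checking the output against a locally computed count like this one is one way to confirm both access paths see the same data.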
