
Posted on 28-Nov-2014


Big Data on AWS

Paul Duffy

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Characteristics of Big Data

The cost of data generation is falling rapidly

Dramatic increase in volume, velocity and variety of data

BIG DATA

A collection of tools, techniques and technologies that allow you to work productively with data at any scale.

Big Data is Getting Bigger

2.7 Zettabytes in 2012

Over 90% will be unstructured

Data spread across a wide array of silos

Features driven by MapReduce

Variable data structures and sources

Computer Generated

• Application server logs (web sites, games)

• Sensor data (weather, water, smart grids)

• Images/videos (traffic, security cameras)

Human Generated

• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year

• Blogs/Reviews/Emails/Pictures

• Social Graphs: Facebook, LinkedIn, Contacts

The Role of Data Is Changing

Traditional analytics required a fixed data model based on questions known in advance

Big Data promotes data exploration and experimentation, which leads to innovation
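One way to picture the shift from a fixed data model to exploration is “schema-on-read”: raw records are stored as they arrive and their structure is interpreted only when a question is asked. The sketch below is a hypothetical illustration in plain Python; the record fields and the `field_profile`/`ask` helpers are invented for the example and are not part of any AWS service.

```python
import json
from collections import Counter

# Hypothetical raw records: each source emits a different shape, and no
# schema is declared before the data is stored.
RAW_LINES = [
    '{"source": "web", "user": "u1", "path": "/home", "ms": 120}',
    '{"source": "sensor", "station": "s9", "temp_c": 21.5}',
    '{"source": "web", "user": "u2", "path": "/cart", "ms": 340}',
    '{"source": "social", "user": "u1", "text": "great game!"}',
]


def field_profile(lines):
    """Count how often each field appears across all records."""
    counts = Counter()
    for line in lines:
        record = json.loads(line)  # structure is interpreted only at read time
        counts.update(record.keys())
    return counts


def ask(lines, predicate, field):
    """Ad-hoc question: collect `field` from every record matching `predicate`."""
    results = []
    for line in lines:
        record = json.loads(line)
        if predicate(record) and field in record:
            results.append(record[field])
    return results


if __name__ == "__main__":
    print(field_profile(RAW_LINES).most_common())
    # A question nobody planned for when the data was collected:
    print(ask(RAW_LINES, lambda r: r.get("source") == "web", "ms"))
```

Because nothing about the records is fixed up front, a new question only needs a new predicate, not a new data model.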

Generation → Collection & storage → Computation & analytics → Collaboration & sharing

Lower costs, faster throughput

Increased pressure on traditional IT and tools

Big data requires tools designed for data collection and computation at any volume, velocity or format.

Software

• Designed for distribution

• Easy programming models

• Flexible language choice

• Platform for abstraction and ecosystem

• Good example: Hadoop

Infrastructure

• Designed for distribution

• Easy programming models

• Flexible language choice

• Platform for abstraction and ecosystem

• Good example: Cloud computing


How the Cloud Is Big Data’s Best Friend

How do we define the cloud?

By Benefits!

Cloud benefits

• Elasticity

• Fast time to market

• Focus on core competency

• Pay per use

• No CapEx

Why Is the Cloud Big Data’s Best Friend?

We know we want to collect, store, organize, analyze and share our data.

But we have limited resources.

The Cloud Optimizes Precious IT Resources, i.e. Skilled People

“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”

- 2011 IDC Digital Universe Study
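Dividing the two growth factors from the IDC quote gives the change in workload per IT staff member over the decade, assuming the workload is spread evenly across staff:

```latex
\frac{\text{growth in files/containers}}{\text{growth in IT staff}} = \frac{75}{1.5} = 50
```

In other words, each person would be responsible for roughly 50 times as many containers as at the start of the decade.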

Deploying a Hadoop cluster is hard

The Old IT World

• 70%: Managing all of the “undifferentiated heavy lifting”

• 30%: Using big data

Cloud-Based Infrastructure

• 70%: Analyzing and using big data

• 30%: Configuring cloud assets

Cloud computing: Reusability, Managed Services, Scale, Innovation

The Cloud Optimizes Capacity Resources

Elastic compute capacity for four workload patterns: on and off, fast growth, variable peaks and predictable peaks.

[Figure: capacity over time, comparing traditional IT capacity and elastic cloud capacity with your IT needs; regions labelled “waste” and “customer dissatisfaction”]
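The waste and customer-dissatisfaction regions of the capacity chart can be quantified with a few lines of arithmetic. The sketch below assumes a hypothetical 24-hour demand curve and an idealised elastic capacity that tracks demand exactly (real scaling has some lag); `waste_and_shortfall` is an illustrative helper, not an AWS API.

```python
from typing import Sequence, Tuple

# Hypothetical hourly demand (in server units) over one day with a daytime peak.
DEMAND = [20, 18, 15, 15, 20, 35, 60, 90, 100, 95, 80, 70,
          65, 70, 85, 100, 110, 95, 70, 50, 40, 30, 25, 22]


def waste_and_shortfall(demand: Sequence[int],
                        capacity: Sequence[int]) -> Tuple[int, int]:
    """Compare provisioned capacity with demand, hour by hour.

    waste     -- unit-hours provisioned but idle (capacity above demand)
    shortfall -- unit-hours of demand left unserved (demand above capacity)
    """
    waste = sum(max(c - d, 0) for c, d in zip(capacity, demand))
    shortfall = sum(max(d - c, 0) for c, d in zip(capacity, demand))
    return waste, shortfall


if __name__ == "__main__":
    hours = len(DEMAND)
    # Traditional IT: one fixed level, sized either for the peak or the average.
    for label, level in [("sized for peak", max(DEMAND)),
                         ("sized for average", sum(DEMAND) // hours)]:
        print(label, waste_and_shortfall(DEMAND, [level] * hours))
    # Idealised elastic capacity simply follows demand each hour.
    print("elastic", waste_and_shortfall(DEMAND, DEMAND))
```

Sizing for the peak trades shortfall for waste, sizing for the average incurs both, and capacity that follows demand avoids both in this idealised model.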

The Cloud Empowers Users to Balance Cost and Time

1 instance for 500 hours = 500 instances for 1 hour
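The equality above holds because pay-per-use pricing charges for instance-hours, not for wall-clock time. The sketch below assumes a perfectly parallel job, a flat hypothetical price per instance-hour, and whole-job billing with no per-instance minimums; `Plan` and `plan_for` are illustrative names only.

```python
from dataclasses import dataclass

# Hypothetical on-demand price per instance-hour; real prices vary by
# instance type and region.
PRICE_PER_INSTANCE_HOUR = 0.10


@dataclass
class Plan:
    instances: int
    hours: float

    @property
    def instance_hours(self) -> float:
        return self.instances * self.hours

    def cost(self, price: float = PRICE_PER_INSTANCE_HOUR) -> float:
        # Pay-per-use: cost depends only on instance-hours consumed.
        return self.instance_hours * price


def plan_for(total_instance_hours: float, instances: int) -> Plan:
    """Spread a perfectly parallel job of `total_instance_hours` across `instances`."""
    return Plan(instances=instances, hours=total_instance_hours / instances)


if __name__ == "__main__":
    for n in (1, 500):
        p = plan_for(500, n)
        print(f"{n} instance(s): {p.hours} h wall-clock, cost {p.cost():.2f}")
```

Both plans cost the same; only the time to the answer changes, which is the trade-off the slide describes.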

The Cloud Reduces Cost for Experimentation

The Cloud Enables Collection and Storage of Big Data

Storage Costs are Declining

Amazon Simple Storage Service (S3)

[Figure: growth in the number of objects stored in S3, reaching 1 trillion]

750k+ peak transactions per second

Global Accessibility

Regions:

• US-WEST (N. California)

• EU-WEST (Ireland)

• ASIA PAC (Tokyo)

• ASIA PAC (Singapore)

• US-WEST (Oregon)

• SOUTH AMERICA (Sao Paulo)

• US-EAST (Virginia)

• GOV CLOUD

Amazon DynamoDB

• Managed NoSQL database service

• Unlimited size

• Unlimited scale

• Flexible key/value store

• Consistent, low latencies (single-digit milliseconds, SSD)

• Robust, durable data storage

• Integrated analytics with Elastic MapReduce
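A key/value store scales out by partitioning: DynamoDB, for example, decides where an item lives by hashing its partition (hash) key. The toy sketch below assumes a fixed number of in-memory partitions and SHA-256 as the hash; it only illustrates the idea and is neither DynamoDB's implementation nor its API.

```python
import hashlib
from typing import Any, Dict, List


class PartitionedKeyValueStore:
    """Toy key/value store that spreads items across partitions by key hash.

    Hypothetical illustration only; not DynamoDB's internals or API.
    """

    def __init__(self, num_partitions: int = 4) -> None:
        self.partitions: List[Dict[str, Any]] = [{} for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> Dict[str, Any]:
        # Stable hash (unlike Python's per-process randomised hash()) so the
        # same key always maps to the same partition.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, item: Dict[str, Any]) -> None:
        self._partition_for(key)[key] = item

    def get(self, key: str) -> Any:
        return self._partition_for(key).get(key)


if __name__ == "__main__":
    store = PartitionedKeyValueStore(num_partitions=4)
    for i in range(1000):
        store.put(f"user#{i}", {"score": i})
    print(store.get("user#42"))
    print([len(p) for p in store.partitions])  # items spread across partitions
```

Adding capacity in such a design means adding partitions, which is why a well-distributed key matters for throughput.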

Amazon Elastic MapReduce

• On-demand, managed analytics platform

• Powered by Hadoop

• Integrated with Spot instances to lower costs

• Vibrant ecosystem of tools

• Elastic clusters

• Flexible programming model (Java, Python, Ruby, etc.)
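The flexible programming model typically means Hadoop Streaming, where the map and reduce steps can be ordinary scripts in languages such as Python. The sketch below shows the classic word count with both steps in one file and `sorted()` standing in for the shuffle/sort phase Hadoop performs between them; on a real cluster the two functions would be split into separate streaming scripts, which is not shown here.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum the counts for each word.

    Expects pairs grouped by word, which Hadoop's shuffle/sort guarantees.
    """
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_locally(lines: Iterable[str]) -> dict:
    # sorted() simulates the shuffle/sort that Hadoop runs across the cluster.
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data on the cloud", "the cloud is big data's best friend"]
    print(run_locally(sample))
```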

Big Data on the Cloud in the Real World

Big Data Verticals

• Media/Advertising: Targeted Advertising; Image and Video Processing

• Oil & Gas: Seismic Analysis

• Retail: Recommendations; Transaction Analysis

• Life Sciences: Genome Analysis

• Financial Services: Monte Carlo Simulations; Risk Analysis

• Security: Anti-virus; Fraud Detection; Image Recognition

• Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics

Visualizations

Bank – Monte Carlo Simulations

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter

23 Hours to 20 Minutes
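Monte Carlo risk simulation suits the cloud because each simulated path is independent, so paths can be spread across as many instances as the user is willing to pay for (23 hours to 20 minutes is roughly a 69x reduction in wall-clock time). The sketch below is a generic one-period Value-at-Risk estimate under an assumed normal return model with hypothetical parameters; it is not Bankinter's model.

```python
import random
from typing import List


def simulate_portfolio_losses(n_paths: int, value: float, mu: float,
                              sigma: float, seed: int = 0) -> List[float]:
    """Simulate one-period losses, assuming normally distributed returns.

    Each path is independent, so in practice the paths can be split across
    many machines and the results concatenated.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [-value * rng.gauss(mu, sigma) for _ in range(n_paths)]


def value_at_risk(losses: List[float], confidence: float = 0.99) -> float:
    """Empirical VaR: the loss exceeded in only (1 - confidence) of paths."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    losses = simulate_portfolio_losses(100_000, value=1_000_000,
                                       mu=0.0005, sigma=0.02)
    print(f"99% one-period VaR: {value_at_risk(losses):,.0f}")
    print(f"mean loss: {sum(losses) / len(losses):,.0f}")
```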

The Taste Test (recommendations): http://www.etsy.com/tastetest

Gift Ideas for Facebook Friends (recommendations): etsy.com/gifts
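The slides do not describe how these recommendations are computed. As a hypothetical illustration of the general technique, the sketch below implements simple item-to-item co-occurrence: items that often appear together in users' histories are suggested to someone who liked one of them. The data and helper names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def build_cooccurrence(baskets: List[Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items appears in the same user's history."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co: Dict[str, Counter], liked: Set[str], k: int = 3) -> List[str]:
    """Score unseen items by how often they co-occur with items the user liked."""
    scores: Counter = Counter()
    for item in liked:
        for other, count in co.get(item, Counter()).items():
            if other not in liked:
                scores[other] += count
    return [item for item, _ in scores.most_common(k)]


if __name__ == "__main__":
    # Hypothetical favourites from four users.
    baskets = [
        {"teapot", "mug", "tea towel"},
        {"teapot", "mug"},
        {"mug", "coaster"},
        {"teapot", "tea towel", "apron"},
    ]
    co = build_cooccurrence(baskets)
    print(recommend(co, {"teapot"}))
```

Counting pairs is itself a MapReduce-friendly job, since each user's history can be processed independently before the counts are summed.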

Targeted Ad

User recently purchased a sports movie and is searching for video games (1.7 million per day)

Click Stream Analysis
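The targeted-ad example boils down to grouping click-stream events by user and checking whether a user's history matches a pattern. The sketch below does this in memory over hypothetical events; at production volumes the same group-by-user step would typically be distributed, for example as a MapReduce job keyed by user ID. The event format and `users_matching` helper are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical click-stream events: (user_id, action, category).
Event = Tuple[str, str, str]

EVENTS: List[Event] = [
    ("u1", "purchase", "sports movie"),
    ("u1", "search", "video games"),
    ("u2", "search", "video games"),
    ("u3", "purchase", "sports movie"),
    ("u3", "view", "books"),
]


def users_matching(events: Iterable[Event],
                   required: Set[Tuple[str, str]]) -> List[str]:
    """Return users whose history contains every (action, category) pair."""
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for user, action, category in events:
        seen[user].add((action, category))
    return sorted(user for user, pairs in seen.items() if required <= pairs)


if __name__ == "__main__":
    segment = {("purchase", "sports movie"), ("search", "video games")}
    print(users_matching(EVENTS, segment))  # candidates for a video-game ad
```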


Thank you…
