BIG Data on AWS

Paul Duffy

Uploaded by: amazon-web-services-latin-america

Posted on 28-Nov-2014

Page 1: Big Data on AWS

BIG Data on AWS

Paul Duffy

Page 2

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Page 3

Characteristics of Big Data

Page 4

The cost of data generation is falling rapidly

Dramatic increase in volume, velocity and variety of data

Page 5

BIG DATA

A collection of tools, techniques and technologies that allow you to work productively with data at any scale.

Page 6

Big Data is Getting Bigger

2.7 Zettabytes in 2012

Over 90% will be unstructured

Data spread across a wide array of silos

Page 7

Features driven by MapReduce
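
The slide names MapReduce without showing how it works. As a rough illustration (not taken from the deck), the classic word-count job can be sketched in plain Python: a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group. Hadoop runs the same three phases distributed across many machines; here everything runs in one process, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key; for word count, just sum them."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document could be mapped on a different machine; here they are mapped in turn.
    all_pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(all_pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

Because each map call only sees its own document and each reduce call only sees one key, both phases can be spread across as many machines as are available.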

Page 8

Variable data structures and sources

Computer Generated

• Application server logs (web sites, games)

• Sensor data (weather, water, smart grids)

• Images/videos (traffic, security cameras)

Human Generated

• Twitter “Fire Hose”: 50m tweets/day, 1,400% growth per year

• Blogs/Reviews/Emails/Pictures

• Social Graphs: Facebook, LinkedIn, Contacts

Page 9

The Role of Data is Changing

Page 10

Traditional analytics required a fixed data model, based on pre-known questions

Big Data promotes data exploration and experimentation, which leads to innovation

Page 11

Generation

Collection & storage

Computation & analytics

Collaboration & sharing

Page 12

Generation

Collection & storage

Computation & analytics

Collaboration & sharing

Lower costs, faster throughput

Increased pressure on traditional IT and tools

Page 13

Require tools designed for data collection and computation at any volume, velocity or format.

Page 14

Software

• Designed for distribution

• Easy programming models

• Flexible language choice

• Platform for abstraction and ecosystem

• Good example: Hadoop
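
“Flexible language choice” in Hadoop comes largely from Hadoop Streaming: any program that reads lines on standard input and writes tab-separated key/value lines on standard output can act as a mapper or reducer, and the framework sorts mapper output by key before it reaches the reducer. The sketch below assumes that contract; the single-script `map`/`reduce` mode switch and the function names are illustrative choices, not part of Hadoop, and `simulate_job` stands in locally for Hadoop's sort step.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Streaming mapper: emit 'word<TAB>1' for every word on every input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Streaming reducer: input arrives sorted by key, so equal keys are adjacent."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate_job(lines: Iterable[str]) -> List[str]:
    """Local stand-in for a streaming job: map, sort (the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Hypothetical invocation: `python wc.py map` or `python wc.py reduce`.
    mode = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if mode == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif mode == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        for out in simulate_job(["to be or not to be"]):
            print(out)
```

In a real job the same script would be handed to the streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the Hadoop installation.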

Page 15

Infrastructure

• Designed for distribution

• Easy programming models

• Flexible language choice

• Platform for abstraction and ecosystem

• Good example: Cloud computing

Page 16

Software

Infrastructure

Page 17

How the Cloud Is Big Data’s Best Friend

Page 18

How do we define the cloud?

By Benefits!

Page 19

Cloud

Elasticity

Fast Time to Market

Focus on Core Competency

Pay Per Use

No CapEx

Page 20

Why is the Cloud Big Data’s Best Friend?

Page 21

We know we want to collect, store, organize, analyze and share our data.

But we have limited resources.

Page 22

The Cloud Optimizes Precious IT Resources

i.e., Skilled People

Page 23

“Over the next decade, the number of files or containers that encapsulate the information in the digital universe will grow by 75x, while the pool of IT staff available to manage them will grow only slightly, at 1.5x.”

- 2011 IDC Digital Universe Study
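
Taking the quoted IDC figures at face value, the change in workload per IT staff member follows from a single ratio: files grow by a factor of 75 while staff grow by a factor of 1.5, so the number of files each person must manage grows by

```latex
\frac{\text{files}_{\text{later}} / \text{staff}_{\text{later}}}
     {\text{files}_{\text{now}} / \text{staff}_{\text{now}}}
  = \frac{75}{1.5} = 50
```

that is, roughly a 50-fold increase in files per person over the decade.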

Page 24

Deploying a Hadoop cluster is hard

Page 25

Chart: The Old IT World vs. Cloud computing

The Old IT World: 70% managing all of the “undifferentiated heavy lifting”, 30% using Big Data

Page 26

Chart: The Old IT World vs. Cloud computing (cloud-based infrastructure)

The Old IT World: 70% managing all of the “undifferentiated heavy lifting”, 30% using Big Data

Cloud computing: 30% configuring cloud assets, 70% analyzing and using Big Data

Page 27

Reusability

Managed Services

Scale

Innovation

Page 32

The Cloud Optimizes Capacity Resources

Page 33

Elastic Compute Capacity

On and Off

Fast Growth

Variable peaks

Predictable peaks

Page 34

Elastic Compute Capacity

On and Off

Fast Growth

Predictable peaks

Variable peaks

Fixed capacity above demand: WASTE

Fixed capacity below demand: CUSTOMER DISSATISFACTION

Page 35

Elastic Compute Capacity

Chart: capacity over time, comparing “Your IT needs” with “Traditional IT capacity” and “Elastic cloud capacity”
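
The trade-off in the chart can be made concrete with a little arithmetic. The sketch below uses a made-up hourly demand series (not data from the deck) and compares fixed capacity sized for the peak (idle capacity, i.e. waste), fixed capacity sized for the average (unmet demand, i.e. dissatisfied customers) and idealized elastic capacity that tracks demand exactly. Real scaling lags demand somewhat, so the elastic case is a best-case assumption.

```python
from typing import List, Tuple

# Hypothetical hourly demand (instances needed per hour); not data from the deck.
DEMAND: List[int] = [2, 3, 10, 4, 2, 8, 3, 2]


def capacity_effects(demand: List[int], capacity: List[int]) -> Tuple[int, int]:
    """Return (wasted, unmet) instance-hours for an hourly capacity plan.

    wasted: capacity provisioned but not used (over-provisioning, "waste").
    unmet: demand above capacity (under-provisioning, "customer dissatisfaction").
    """
    if len(demand) != len(capacity):
        raise ValueError("demand and capacity must cover the same hours")
    wasted = sum(max(c - d, 0) for d, c in zip(demand, capacity))
    unmet = sum(max(d - c, 0) for d, c in zip(demand, capacity))
    return wasted, unmet


if __name__ == "__main__":
    hours = len(DEMAND)
    plans = {
        "fixed at peak": [max(DEMAND)] * hours,
        "fixed at average": [round(sum(DEMAND) / hours)] * hours,
        "elastic (ideal)": list(DEMAND),  # capacity follows demand hour by hour
    }
    for name, plan in plans.items():
        wasted, unmet = capacity_effects(DEMAND, plan)
        print(f"{name:17} wasted={wasted:3} unmet={unmet:3}")
```

With these numbers, sizing for the peak wastes 46 instance-hours, sizing for the average leaves 10 instance-hours of demand unserved, and the ideal elastic plan does neither.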

Page 36

Elastic Compute Capacity

Fast Growth

On and Off

Predictable peaks

Variable peaks

Page 37

The Cloud Empowers Users to Balance Cost and Time

Page 38

1 instance for 500 hours = 500 instances for 1 hour

“I like this!”

“I scale”
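
The equality on this slide rests on per-instance-hour billing: cost depends only on instances × hours, so spreading the same work across more instances changes the elapsed time but not the bill, provided the job parallelizes perfectly. The rate below is a placeholder, not an actual AWS price, and real jobs rarely scale perfectly linearly.

```python
def compute_cost(instances: int, hours: float, rate_per_instance_hour: float) -> float:
    """Per-instance-hour billing: cost depends only on instances x hours."""
    return instances * hours * rate_per_instance_hour


def wall_clock_hours(job_instance_hours: float, instances: int) -> float:
    """Elapsed time for a job of fixed size, assuming it parallelizes perfectly."""
    return job_instance_hours / instances


if __name__ == "__main__":
    RATE = 0.10  # hypothetical price per instance-hour, not an actual AWS rate
    JOB = 500.0  # job size in instance-hours, as on the slide
    for n in (1, 500):
        hours = wall_clock_hours(JOB, n)
        print(f"{n:>3} instance(s): {hours:g} h elapsed, cost {compute_cost(n, hours, RATE):.2f}")
```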

Page 39

The Cloud Reduces Cost for Experimentation

Page 40

The Cloud Enables Collection and Storage of Big Data

Page 41

Storage Costs are Declining

Page 42

Simple Storage Service

Chart: growth of Amazon S3, reaching 1 trillion objects

750k+ peak transactions per second
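
As an illustration of how an application might push data into S3, the sketch below writes a batch of log records as one newline-delimited JSON object under a date-based key prefix. It relies on the boto3 client method `put_object(Bucket=..., Key=..., Body=...)`; the bucket name, key layout and helper names are hypothetical. The client is passed in, so the logic can be exercised with an in-memory stand-in; in practice it would be `boto3.client("s3")`.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def batch_key(prefix: str, when: datetime, batch_id: int) -> str:
    """Build a date-partitioned object key, e.g. logs/2014/11/28/batch-00007.json."""
    return f"{prefix}/{when:%Y/%m/%d}/batch-{batch_id:05d}.json"


def upload_log_batch(s3_client: Any, bucket: str, prefix: str,
                     records: List[Dict[str, Any]], batch_id: int,
                     when: datetime) -> str:
    """Serialize records as JSON lines and store them as a single S3 object."""
    body = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    key = batch_key(prefix, when, batch_id)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key


class FakeS3Client:
    """In-memory stand-in accepting the same put_object keyword arguments."""

    def __init__(self) -> None:
        self.objects: Dict[tuple, bytes] = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> dict:
        self.objects[(Bucket, Key)] = Body
        return {}


if __name__ == "__main__":
    fake = FakeS3Client()
    key = upload_log_batch(fake, "example-log-bucket", "logs",
                           [{"path": "/", "status": 200}], 7,
                           datetime(2014, 11, 28, tzinfo=timezone.utc))
    print(key)
```

Grouping many small records into larger objects keeps the request count down, and a date-based prefix makes it easy to point a later analytics job at a single day.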

Page 43

Global Accessibility

Regions:

US-WEST (N. California)

EU-WEST (Ireland)

ASIA PAC (Tokyo)

ASIA PAC (Singapore)

US-WEST (Oregon)

SOUTH AMERICA (Sao Paulo)

US-EAST (Virginia)

GOV CLOUD

Page 44

Amazon DynamoDB

Managed NoSQL database service

Unlimited size

Unlimited scale

Flexible key/value store

Consistent, low latencies (single-digit milliseconds, SSD)

Robust, durable data storage

Integrated analytics with Elastic MapReduce
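
To illustrate the key/value model, the sketch below stores and reads a user profile through a boto3 DynamoDB `Table` resource, whose `put_item(Item=...)` and `get_item(Key=...)` methods return a response dict (containing an `"Item"` entry when the key exists). The table name `user_profiles` and its partition key `user_id` are hypothetical; a small in-memory stand-in replaces the real table so the sketch runs without AWS.

```python
from typing import Any, Dict, Optional


def save_profile(table: Any, user_id: str, attributes: Dict[str, Any]) -> None:
    """Write one item; every item must include the table's key attribute."""
    item = {"user_id": user_id, **attributes}
    table.put_item(Item=item)


def load_profile(table: Any, user_id: str) -> Optional[Dict[str, Any]]:
    """Read one item by primary key; returns None when no item exists."""
    response = table.get_item(Key={"user_id": user_id})
    return response.get("Item")


class FakeTable:
    """In-memory stand-in for a boto3 Table resource keyed on user_id."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def put_item(self, Item: Dict[str, Any]) -> dict:
        self._items[Item["user_id"]] = dict(Item)
        return {}

    def get_item(self, Key: Dict[str, Any]) -> dict:
        item = self._items.get(Key["user_id"])
        return {"Item": dict(item)} if item is not None else {}


if __name__ == "__main__":
    # With real AWS this would be: boto3.resource("dynamodb").Table("user_profiles")
    table = FakeTable()
    save_profile(table, "u-123", {"country": "BR", "games_played": 42})
    print(load_profile(table, "u-123"))
```

Apart from the key, items need no fixed schema, which is what “flexible key/value store” refers to. One difference from the stand-in: the real boto3 resource returns numbers as `decimal.Decimal` and rejects Python floats on write.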

Page 45

Amazon Elastic MapReduce

On-demand, managed analytics platform

Powered by Hadoop

Integrated with Spot instances to lower costs

Vibrant ecosystem of tools

Elastic clusters

Flexible programming model (Java, Python, Ruby, etc.)
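
Spot usage in Elastic MapReduce is configured per instance group. The sketch below only builds the request for boto3's EMR `run_job_flow` call, keeping the master and core nodes on On-Demand capacity and placing the interruptible task nodes on Spot. The cluster name, release label, instance type and counts are assumptions for illustration, and the role names are the defaults created by `aws emr create-default-roles`; check the current EMR documentation before relying on any of them.

```python
from typing import Any, Dict


def build_cluster_request(name: str, core_nodes: int, spot_task_nodes: int,
                          instance_type: str = "m5.xlarge") -> Dict[str, Any]:
    """Build keyword arguments for emr_client.run_job_flow(**request)."""

    def group(role: str, market: str, count: int) -> Dict[str, Any]:
        return {
            "Name": f"{role.lower()}-{market.lower()}",
            "InstanceRole": role,  # MASTER, CORE or TASK
            "Market": market,  # ON_DEMAND or SPOT
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    groups = [group("MASTER", "ON_DEMAND", 1), group("CORE", "ON_DEMAND", core_nodes)]
    if spot_task_nodes > 0:
        # Task nodes hold no HDFS data, so losing a Spot instance only costs rework.
        groups.append(group("TASK", "SPOT", spot_task_nodes))

    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label; pick a current one
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": groups,
            # Terminate once all steps finish, so the cluster is paid for only while used.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_cluster_request("log-analysis", core_nodes=2, spot_task_nodes=8)
    print([(g["InstanceRole"], g["Market"], g["InstanceCount"])
           for g in request["Instances"]["InstanceGroups"]])
```

With credentials configured, `boto3.client("emr").run_job_flow(**request)` would submit it. The work itself would go under a `Steps` key; without steps, this request's cluster would shut down right after starting.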

Page 46

Big Data on the Cloud in the Real World

Page 47

Big Data Verticals

Media/Advertising: Targeted Advertising; Image and Video Processing

Oil & Gas: Seismic Analysis

Retail: Recommendations; Transactions Analysis

Life Sciences: Genome Analysis

Financial Services: Monte Carlo Simulations; Risk Analysis

Security: Anti-virus; Fraud Detection; Image Recognition

Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics

Page 48

Visualizations

Page 49

Bank – Monte Carlo Simulations

“The AWS platform was a good fit for its unlimited and flexible computational power to our risk-simulation process requirements. With AWS, we now have the power to decide how fast we want to obtain simulation results, and, more importantly, we have the ability to run simulations not possible before due to the large amount of infrastructure required.” – Castillo, Director, Bankinter

23 Hours to 20 Minutes
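
Monte Carlo risk simulations parallelize well because each batch of scenarios is independent, which is why adding instances can shrink run time (23 hours to 20 minutes is roughly a 69x speed-up). The sketch below estimates a one-day 99% value-at-risk for a single position, assuming normally distributed daily returns; the position size, volatility, scenario counts and batch split are hypothetical, and this is not Bankinter's model. Each batch draws from its own seeded generator so it could run on a separate machine, and the results are merged at the end.

```python
import random
from typing import List


def simulate_losses(seed: int, scenarios: int, position: float,
                    mean_return: float, volatility: float) -> List[float]:
    """One independent batch: draw daily returns and record the loss in each scenario."""
    rng = random.Random(seed)  # per-batch generator, so batches can run on any machine
    return [-position * rng.gauss(mean_return, volatility) for _ in range(scenarios)]


def value_at_risk(losses: List[float], confidence: float) -> float:
    """Empirical `confidence` quantile of the losses (simple, non-interpolated)."""
    ordered = sorted(losses)
    index = min(int(confidence * len(ordered)), len(ordered) - 1)
    return ordered[index]


def run_simulation(batches: int, scenarios_per_batch: int) -> float:
    # Hypothetical inputs: a 1,000,000 position with 2% daily volatility and zero drift.
    position, mean_return, volatility = 1_000_000.0, 0.0, 0.02
    all_losses: List[float] = []
    for batch in range(batches):
        # On a cluster each batch would be a separate task; here results are merged in turn.
        all_losses.extend(simulate_losses(batch, scenarios_per_batch,
                                          position, mean_return, volatility))
    return value_at_risk(all_losses, 0.99)


if __name__ == "__main__":
    var99 = run_simulation(batches=10, scenarios_per_batch=10_000)
    print(f"Estimated one-day 99% VaR: {var99:,.0f}")
```

For these inputs the closed-form normal answer is about 2.33 × 0.02 × 1,000,000 ≈ 46,500, so the estimate should land close to that. The simulation approach pays off for portfolios where no such formula exists.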

Page 50

The Taste Test http://www.etsy.com/tastetest

Recommendations

Page 51

etsy.com/gifts

Recommendations

Gift Ideas for Facebook Friends
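
The deck does not describe how Etsy computes these recommendations. As a generic illustration of the idea, the sketch below counts how often pairs of items are favorited by the same user and recommends the items that co-occur most often with a given one (“people who liked this also liked”). The users and items are invented; production systems add normalization and filtering and work on far larger data, which is where a cluster comes in.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical user -> favorited items.
FAVORITES: Dict[str, Set[str]] = {
    "ana": {"mug", "scarf", "teapot"},
    "ben": {"mug", "teapot"},
    "caio": {"scarf", "hat"},
    "dora": {"mug", "teapot", "hat"},
}


def co_occurrence_counts(favorites: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """For each item, count how many users also favorited each other item."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in favorites.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts: Dict[str, Counter], item: str, k: int = 3) -> List[str]:
    """Items most often favorited together with `item`; ties broken alphabetically."""
    ranked = sorted(counts.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


if __name__ == "__main__":
    counts = co_occurrence_counts(FAVORITES)
    print(recommend(counts, "mug"))
```

Counting pairs per user is itself a map step (emit item pairs) followed by a reduce step (sum per pair), so the same logic scales out with MapReduce.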

Page 53

Click Stream Analysis

Targeted Ad

User recently purchased a sports movie and is searching for video games (1.7 Million per day)
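
The targeting rule on this slide combines two click-stream signals: a recent purchase in one category and a current search in another. The sketch below applies such a rule to a small list of events; the event fields, category names and the seven-day window are hypothetical, and a real pipeline would evaluate rules like this over millions of events per day rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ClickEvent:
    user_id: str
    timestamp: int  # seconds since epoch
    action: str  # e.g. "purchase" or "search"
    category: str  # e.g. "sports_movie" or "video_games"


def users_to_target(events: Iterable[ClickEvent], now: int,
                    window_seconds: int = 7 * 24 * 3600) -> Set[str]:
    """Users who bought a sports movie and searched for video games within the window."""
    bought: Set[str] = set()
    searched: Set[str] = set()
    for e in events:
        if now - e.timestamp > window_seconds:
            continue  # too old to count as "recently"
        if e.action == "purchase" and e.category == "sports_movie":
            bought.add(e.user_id)
        elif e.action == "search" and e.category == "video_games":
            searched.add(e.user_id)
    return bought & searched


NOW = 10_000_000  # hypothetical "current" timestamp
EVENTS = [
    ClickEvent("u1", NOW - 3600, "purchase", "sports_movie"),
    ClickEvent("u1", NOW - 60, "search", "video_games"),
    ClickEvent("u2", NOW - 60, "search", "video_games"),  # no purchase
    ClickEvent("u3", NOW - 30 * 24 * 3600, "purchase", "sports_movie"),  # too old
    ClickEvent("u3", NOW - 60, "search", "video_games"),
]

if __name__ == "__main__":
    print(users_to_target(EVENTS, NOW))
```

Only `u1` matches with the default window; widening the window to a month would also pick up `u3`.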

Page 54

Big Data on the Cloud in the Real World

How the Cloud Is Big Data’s Best Friend

Characteristics of Big Data

Page 55

Thank you…