washington, d.c.d36cz9buwru1tt.cloudfront.net/146cb-300-big-data... · 2013 aws worldwide public...

Post on 01-Aug-2020

2013 AWS Worldwide Public Sector Summit Washington, D.C.

Big Data in the Cloud: Accelerating Innovation in the Public Sector

Jamie Kinney│Principal Solutions Architect

jkinney@amazon.com │ @jamiekinney

BIG DATA

Technologies and techniques for working productively with data, at any scale

The more data you collect, the more VALUE you can derive from it

Bigger is Better!

YOU DON’T HAVE THE CHOICE…

Large Hadron Collider (CERN): 27 TB per day

Unconstrained data growth (scale: GB, TB, PB, EB, ZB)

Compute, Storage, Big Data

95% of the 1.2 zettabytes of data in the digital universe is unstructured

70% of this is user-generated content

Unstructured data growth is explosive, with the compound annual growth rate (CAGR) estimated at 62% from 2008 to 2012.

Source: IDC
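To make the quoted growth rate concrete, the sketch below compounds a 62% CAGR over the period 2008 to 2012. The function name `compound_growth` is hypothetical, and treating the period as four annual compounding steps is an assumption rather than something the slide states.

```python
def compound_growth(rate: float, years: int) -> float:
    """Return the multiplier after compounding `rate` once per year for `years` years."""
    return (1.0 + rate) ** years


# 62% CAGR from the slide; four annual steps (2008 -> 2012) is an assumption.
factor = compound_growth(0.62, 4)
print(f"Growth factor over 4 years: {factor:.2f}x")
```

Under these assumptions the volume of unstructured data grows to roughly seven times its starting size over the period.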

Big Data Verticals

Media/Advertising: Targeted Advertising; Image and Video Processing

Oil & Gas: Seismic Analysis

Retail: Recommendations; Transaction Analysis

Life Sciences: Genome Analysis

Financial Services: Monte Carlo Simulations; Risk Analysis

Security: Anti-virus; Fraud Detection; Image Recognition

Social Network/Gaming: User Demographics; Usage Analysis; In-game Metrics

VOLUME

VELOCITY

VARIETY

COLLECT │ STORE │ ANALYZE │ SHARE

AWS Import/Export

AWS Direct Connect

COLLECT │ STORE │ ANALYZE │ SHARE

AMAZON S3

Objects in S3 (chart, Q4 2006 to Q2 2013): 2 trillion objects

1.1 million peak transactions per second

AMAZON DYNAMODB

AMAZON REDSHIFT

AMAZON RDS

HBase on AMAZON EMR

COLLECT │ STORE │ ANALYZE │ SHARE

AMAZON EC2

Instance Types (chart: memory in GB plotted against EC2 Compute Units)

Instance families: Standard, 2nd Gen Standard, Micro, High-Memory, High-CPU, Cluster Compute, Cluster GPU, High I/O, High-Storage, Cluster High-Mem

hi1.4xlarge: 60.5 GB of memory, 35 EC2 Compute Units, 2 x 1024 GB SSD instance storage, 64-bit platform

cc1.4xlarge: 23 GB of memory, 33.5 EC2 Compute Units, 1690 GB of instance storage, 64-bit platform

c1.xlarge: 7 GB of memory, 20 EC2 Compute Units, 1690 GB of instance storage, 64-bit platform

m1.small: 1.7 GB of memory, 1 EC2 Compute Unit, 160 GB of instance storage, 32-bit or 64-bit platform

m1.medium: 3.75 GB of memory, 2 EC2 Compute Units, 410 GB of instance storage, 32-bit or 64-bit platform

m1.large (EBS Optimizable): 7.5 GB of memory, 4 EC2 Compute Units, 850 GB of instance storage, 64-bit platform

m1.xlarge (EBS Optimizable): 15 GB of memory, 8 EC2 Compute Units, 1690 GB of instance storage, 64-bit platform

m2.xlarge: 17.1 GB of memory, 6.5 EC2 Compute Units, 420 GB of instance storage, 64-bit platform

m2.2xlarge: 34.2 GB of memory, 13 EC2 Compute Units, 850 GB of instance storage, 64-bit platform

m2.4xlarge (EBS Optimizable): 68.4 GB of memory, 26 EC2 Compute Units, 1690 GB of instance storage, 64-bit platform

t1.micro: 613 MB of memory, up to 2 EC2 Compute Units, EBS storage only, 32-bit or 64-bit platform

c1.medium: 1.7 GB of memory, 5 EC2 Compute Units, 350 GB of instance storage, 32-bit or 64-bit platform

cg1.4xlarge: 22 GB of memory, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla “Fermi” M2050 GPUs, 1690 GB of instance storage, 64-bit platform

cc2.8xlarge: 60.5 GB of memory, 88 EC2 Compute Units, 3370 GB of instance storage, 64-bit platform

m3.xlarge: 15 GB of memory, 13 EC2 Compute Units

m3.2xlarge (EBS Optimizable): 30 GB of memory, 26 EC2 Compute Units

hs1.8xlarge: 117 GB of memory, 35 EC2 Compute Units, 24 x 2 TB instance storage, 64-bit platform

cr1.8xlarge: 244 GB of memory, 88 EC2 Compute Units, 2 x 120 GB SSD instance storage, 64-bit platform
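One way to use a catalogue like this is to pick the smallest instance type that meets a workload's memory and compute needs. The sketch below does that for a subset of the types listed here. The names `InstanceType` and `smallest_fit` are hypothetical, and ranking by EC2 Compute Units is only a stand-in for price, which the slide does not give.

```python
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    memory_gb: float
    compute_units: float


# A subset of the instance types from the slide (memory and ECU figures as listed).
INSTANCE_TYPES = [
    InstanceType("m1.small", 1.7, 1),
    InstanceType("m1.large", 7.5, 4),
    InstanceType("m1.xlarge", 15, 8),
    InstanceType("c1.xlarge", 7, 20),
    InstanceType("m2.4xlarge", 68.4, 26),
    InstanceType("cc2.8xlarge", 60.5, 88),
    InstanceType("cr1.8xlarge", 244, 88),
]


def smallest_fit(min_memory_gb: float, min_compute_units: float) -> Optional[InstanceType]:
    """Return the qualifying type with the fewest compute units, then the least memory."""
    candidates = [
        t for t in INSTANCE_TYPES
        if t.memory_gb >= min_memory_gb and t.compute_units >= min_compute_units
    ]
    if not candidates:
        return None
    # Assumption: fewer compute units roughly means cheaper; real prices are not modeled.
    return min(candidates, key=lambda t: (t.compute_units, t.memory_gb))


choice = smallest_fit(min_memory_gb=32, min_compute_units=20)
print(choice.name if choice else "no match")
```

A real selection would also weigh storage, network performance, and current pricing, none of which this sketch models.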

GPU GRAPHICS PROCESSING UNIT

CLUSTER GPU QUADRUPLE EXTRA LARGE

2x Intel Xeon X5570, quad-core, Nehalem architecture

2x NVIDIA Tesla Fermi M2050 GPUs

22 GB of memory, 1.7 TB of storage

$0.35 / hour (Amazon EC2 Spot)

PARALLELIZATION

ON A SINGLE INSTANCE

COST: 1 x 4h x $2.1 = $8.4

RENDERING TIME: 4h

ON MULTIPLE INSTANCES

COST: 2 x 2h x $2.1 = $8.4

RENDERING TIME: 2h
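The arithmetic behind this slide can be sketched in a few lines: with hourly pricing, splitting a job across more instances leaves the cost unchanged while shortening the wall-clock time. The function `render_cost` is a hypothetical helper. It assumes the work divides evenly across instances and that billing is not rounded up to whole hours.

```python
from typing import Tuple


def render_cost(total_instance_hours: float, instances: int, hourly_rate: float) -> Tuple[float, float]:
    """Return (cost, wall_clock_hours) for a job split evenly across `instances`."""
    wall_clock_hours = total_instance_hours / instances
    cost = instances * wall_clock_hours * hourly_rate
    return cost, wall_clock_hours


# Figures from the slide: a 4-hour rendering job at $2.1 per instance-hour.
single = render_cost(4, instances=1, hourly_rate=2.1)
double = render_cost(4, instances=2, hourly_rate=2.1)
print(f"1 instance:  ${single[0]:.2f} in {single[1]:.0f}h")
print(f"2 instances: ${double[0]:.2f} in {double[1]:.0f}h")
```

In practice few jobs scale perfectly linearly, so the real wall-clock time with two instances is likely to be somewhat longer than half.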

What are Spot Instances?

(Diagram: unused capacity in the Availability Zones of a Region is sold at a discount. The discounts shown are 50%, 56%, 66%, 59%, 54%, and 63%.)

ON MULTIPLE SPOT INSTANCES

COST: 4 x 1h x $0.35 = $1.4

RENDERING TIME: 1h
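The comparison between the earlier $2.1 hourly rate and the $0.35 Spot price can be worked through as follows. This is a minimal sketch: `job_cost` is a hypothetical helper, the constant names are illustrative, and the Spot price is simply the figure quoted on the slide. Spot prices vary over time and Spot Instances can be interrupted, neither of which is modeled here.

```python
REGULAR_RATE = 2.1   # $/hour, the rate used on the parallelization slide
SPOT_RATE = 0.35     # $/hour, the Amazon EC2 Spot price quoted on the slide


def job_cost(instances: int, hours_each: float, hourly_rate: float) -> float:
    """Total cost when each of `instances` runs for `hours_each` hours."""
    return instances * hours_each * hourly_rate


regular = job_cost(2, 2, REGULAR_RATE)   # 2 x 2h x $2.1
spot = job_cost(4, 1, SPOT_RATE)         # 4 x 1h x $0.35
savings = 1 - spot / regular
print(f"Regular: ${regular:.2f}, Spot: ${spot:.2f}, savings: {savings:.0%}")
```

With these figures the same four instance-hours cost about one sixth as much on Spot, and using four instances instead of two also halves the rendering time.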

"Hadoop is a reliable storage and data analysis system"

HDFS MapReduce

Deploying a Hadoop cluster is hard

AMAZON EMR HADOOP + AWS
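For readers new to the MapReduce model named above, the sketch below runs a word count as explicit map, shuffle, and reduce steps in a single Python process. It illustrates the programming model only. It does not use Hadoop or the Amazon EMR API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in a line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data in the cloud", "big data at any scale"])
print(counts)
```

In a real Hadoop cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data between them over the network.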

Amazon Elastic MapReduce: Clusters launched by customers (chart: y-axis from 0 to 6,000,000 clusters, x-axis from 5/22/2010 to 3/16/2013)

Amazon EMR: 5.5M clusters launched by customers since May 2010

Massive Scale

USE THE RIGHT TOOL FOR THE RIGHT JOB

RDBMS (Amazon RDS): Interactive Reporting (<1sec); Multistep Transactions; Lots of Updates/Deletes

Hadoop (Amazon EMR): Affordable Storage/Compute; Structured or Not (Agility); Resilient Auto Scalability

Data Warehouse (Steady State)

Expand to 25 instances

Data Warehouse (Batch Processing)

Shrink to 9 instances

Data Warehouse (Steady State)
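The benefit of resizing can be estimated by counting instance-hours. The sketch below compares a cluster that expands to 25 instances only for the batch window with one that stays at 25 all day. The instance counts come from the slide. The length of the batch window is a hypothetical input, and `daily_instance_hours` is an illustrative helper rather than part of any AWS API.

```python
STEADY_INSTANCES = 9    # steady-state size, from the slide
BATCH_INSTANCES = 25    # batch-processing size, from the slide


def daily_instance_hours(batch_hours: float) -> dict:
    """Compare an elastic cluster with one fixed at batch size over a 24-hour day."""
    if not 0 <= batch_hours <= 24:
        raise ValueError("batch_hours must be between 0 and 24")
    elastic = BATCH_INSTANCES * batch_hours + STEADY_INSTANCES * (24 - batch_hours)
    fixed = BATCH_INSTANCES * 24
    return {"elastic": elastic, "fixed": fixed, "saved": fixed - elastic}


# batch_hours=4 is a hypothetical batch window, not a figure from the talk.
result = daily_instance_hours(batch_hours=4)
print(result)
```

The shorter the batch window, the larger the share of instance-hours saved by shrinking back to the steady-state size.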

COLLECT │ STORE │ ANALYZE │ SHARE

PUBLIC DATA SETS

http://aws.amazon.com/publicdatasets

COLLECT │ STORE │ ANALYZE │ SHARE

INNOVATE

“Want to increase innovation? Lower the cost of failure”

Joi Ito

AWS LOWERS THE COST OF INNOVATION

Testing a new idea is cheap

Georgetown University

Next-generation sequencing and whole-genome analysis to identify the causes of premature birth

Solution Overview

Alignment, mapping, variant-calling

Downstream variant analytic pipelines

Hosted data portal including MongoDB

Genomic data storage (raw and processed)

Access to the 1000 Genomes public data set

SEC MIDAS & Tradeworx

Real-time analysis of 20 billion messages/day

Reconstruct any market, any day in history

Solution Overview

Data Servers

Analytic Servers

Market reconstruction processing

Store historical stock ‘tick’ information
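To give a sense of the throughput implied by 20 billion messages per day, the sketch below converts the daily volume into an average per-second rate. The helper `average_rate` is hypothetical. Spreading the load evenly over 24 hours is an assumption, as is the alternative of a 6.5-hour trading session, and real traffic is bursty rather than uniform.

```python
MESSAGES_PER_DAY = 20_000_000_000  # from the slide


def average_rate(messages_per_day: int, active_hours: float = 24.0) -> float:
    """Average messages per second if the daily volume arrives within `active_hours`."""
    return messages_per_day / (active_hours * 3600)


# Uniform arrival over 24 hours (an assumption).
print(f"{average_rate(MESSAGES_PER_DAY):,.0f} messages/second over 24 hours")
# Concentrated in a 6.5-hour trading session (also an assumption).
print(f"{average_rate(MESSAGES_PER_DAY, active_hours=6.5):,.0f} messages/second over 6.5 hours")
```

Either way the system has to sustain hundreds of thousands of messages per second on average, with peaks well above that.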

The Results

“For the growing team of quant types now employed at the SEC, MIDAS is becoming the world’s greatest data sandbox. And the staff is planning to use it to make the SEC a leader in its use of market data”

Elisse B. Walter, Chairman of the SEC

“This basically propels the SEC from zero to 60 in one fell swoop, going from being way behind even the most basic market participant to being on par if not ahead of the vast majority of market participants, in terms of their system and analytical capabilities”

Gregg E. Berman, Associate Director of the Office of Analytics and Research

Thank You
