Big Data Analysis: Powered by the Cloud

Upload: amazon-web-services-korea

Post on 20-Dec-2014

Category:

Technology


DESCRIPTION

Opening Keynote at ZDNet Advanced Computing Conference by Abhishek Sinha (Business Development Manager APAC)

TRANSCRIPT

Page 1: Big Data Analysis: Powered by the Cloud

Cloud

Page 2: Big Data Analysis: Powered by the Cloud

What is big data

Data analysis Pipeline

How customers are using the pipeline

Page 4: Big Data Analysis: Powered by the Cloud

When your data sets become so large that you have to start innovating how to collect, store, organize, analyze and share them

Page 5: Big Data Analysis: Powered by the Cloud

What does big data look like?

Page 6: Big Data Analysis: Powered by the Cloud

The 3 Vs: Volume, Velocity, Variety

Page 7: Big Data Analysis: Powered by the Cloud

Where is this data coming from?

Page 8: Big Data Analysis: Powered by the Cloud

Human generated

Machine generated

Tweet

Surf the internet

Buy and sell products

Upload images and videos

Play games

Check in at restaurants

Search for cafes

Find deals

Watch content online

Look for directions

Use social media

Page 9: Big Data Analysis: Powered by the Cloud

Human generated

Machine generated

Networks and security devices

Mobile phones

Cell phone towers

Smart grids

Smart meters

Telematics from cars

Sensors on machines

Videos from traffic and security cameras

Page 10: Big Data Analysis: Powered by the Cloud

What is it used for?

Page 13: Big Data Analysis: Powered by the Cloud

Data for competitive advantage

Customer segmentation

Financial modeling

System analysis

Line-of-sight

Replacing human decisions

Business intelligence

Innovating new business and revenue models

Page 16: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Lower cost, increased throughput

Constraint

Page 18: Big Data Analysis: Powered by the Cloud

Very high barrier to turning data into information.

Infrastructure capacity

Technical Skills

Questions to ask

Cheap experimentation

Page 19: Big Data Analysis: Powered by the Cloud

Amazon Web Services Cloud

Page 20: Big Data Analysis: Powered by the Cloud

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= Remove constraints

Page 21: Big Data Analysis: Powered by the Cloud

Remove constraints = More experimentation

More experimentation = More innovation

More Innovation = Competitive edge

Page 22: Big Data Analysis: Powered by the Cloud

Amazon Web Services

Removes constraints

Focus on your data

Leave undifferentiated heavy lifting to us

Page 23: Big Data Analysis: Powered by the Cloud

HOW

Page 24: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Page 26: Big Data Analysis: Powered by the Cloud

Clickstream data from 500+ websites and VoD platform

Corporate data center

AWS Import/Export

Amazon Simple Storage Service (S3)

Amazon Elastic MapReduce

BI Users

Page 27: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Page 28: Big Data Analysis: Powered by the Cloud

More than 25 Million Streaming Members

50 Billion Events Per Day

30 Million plays every day

2 billion hours of video in 3 months

4 million ratings per day

3 million searches

Device location, time, day, week, etc.

Social data

Page 29: Big Data Analysis: Powered by the Cloud

10 TB of streaming data per day
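
To put the daily figures above into per-second terms, here is a back-of-the-envelope sketch. It assumes the load is spread evenly over 24 hours and that 1 TB means 10^12 bytes; neither assumption comes from the slides, and real traffic is likely to be bursty.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(daily_total: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_total / SECONDS_PER_DAY


# 50 billion events per day (figure quoted on the slide)
events_per_sec = per_second(50e9)

# 10 TB of streaming data per day, using decimal units (assumption)
mb_per_sec = per_second(10 * 1e12) / 1e6

print(f"{events_per_sec:,.0f} events/s")
print(f"{mb_per_sec:,.1f} MB/s")
```

Under these assumptions the averages come to roughly 579 thousand events per second and about 116 MB per second, which gives a feel for the sustained ingest rate the storage layer has to absorb.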

Page 30: Big Data Analysis: Powered by the Cloud

What is S3?

Highly scalable data storage

Access via APIs

Fast (850K requests per sec)

Highly available & durable (99.999999999% durability)

Economical ($0.095 per GB)*

Web store
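
As a rough illustration of what the durability figure above means in practice, the sketch below converts it into an expected number of lost objects. It assumes the 99.999999999% figure is an annual, per-object probability and that losses are independent; the object count is a hypothetical example, not a number from the slides.

```python
from fractions import Fraction

# 99.999999999% durability ("eleven nines"), kept exact with Fraction
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)

# Assumption: the figure is a per-object, per-year probability
ANNUAL_LOSS_PROBABILITY = 1 - DURABILITY  # 1e-11


def expected_losses_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost in one year."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years before a single object is lost."""
    return 1 / expected_losses_per_year(object_count)


objects = 10_000_000  # hypothetical bucket size
print(float(expected_losses_per_year(objects)))
print(int(years_per_expected_loss(objects)))
```

With ten million stored objects, this works out to an expected loss of one object about every 10,000 years. `Fraction` is used instead of floats because subtracting a number this close to 1 loses precision in floating point.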

Page 31: Big Data Analysis: Powered by the Cloud

Data consumed in multiple ways

S3

EMR

Prod Cluster (EMR)

Recommendation Engine

Ad-hoc Analysis

Personalization

Page 37: Big Data Analysis: Powered by the Cloud

Velocity of data

Amazon DynamoDB

Page 38: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Page 39: Big Data Analysis: Powered by the Cloud

“Who buys video games?”

Page 40: Big Data Analysis: Powered by the Cloud

Per day:

3.5 billion records

13 TB of click stream logs

71 million unique cookies

Page 42: Big Data Analysis: Powered by the Cloud

Results:

500% return on ad spend

17,000% reduction in procurement time

Page 43: Big Data Analysis: Powered by the Cloud

“Who is using our service?”

Page 44: Big Data Analysis: Powered by the Cloud

Identified early mobile usage

Invested heavily in mobile development

Finding signal in the noise of logs

Page 45: Big Data Analysis: Powered by the Cloud

In January 2013:

9,432,061 unique mobile devices used the Yelp mobile app.

4 million+ calls. 5 million+ directions.

Page 49: Big Data Analysis: Powered by the Cloud

What is EMR?

Map-Reduce engine

Integrated with tools

Hadoop-as-a-service

Massively parallel

Cost effective AWS wrapper

Integrated with AWS services
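
For readers new to the Map-Reduce model that EMR runs at scale, here is a minimal single-process sketch of the three stages (map, shuffle, reduce). It is plain Python, not the EMR or Hadoop API; the log format and the function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical clickstream log lines: "<cookie_id> <device> <url>"
LOG_LINES = [
    "c1 mobile /home",
    "c2 desktop /search",
    "c1 mobile /biz/42",
    "c3 mobile /home",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (device, 1) pair for each log line."""
    _cookie, device, _url = line.split()
    yield device, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as a framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run_job(LOG_LINES)
print(counts)
```

In a real cluster the map and reduce functions run in parallel across many nodes and the shuffle moves data over the network; the structure of the job stays the same.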

Page 50: Big Data Analysis: Powered by the Cloud

Source: http://nerds.airbnb.com/redshift-performance-cost

Table size       Query type           Hive                    Redshift
3 billion rows   Simple range query   1680 seconds (28 min)   360 seconds (6 min)
1 million rows   2 complex joins      182 seconds             8 seconds

$13.60/hour on Redshift versus $57/hour on Hive
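
The comparison above can be reduced to speedup and cost ratios with a few lines of arithmetic. This sketch uses only the figures quoted on the slide; the dictionary layout and names are illustrative choices.

```python
# Query times in seconds, as quoted on the slide
BENCHMARKS = {
    "simple range query, 3 billion rows": {"hive": 1680, "redshift": 360},
    "2 complex joins, 1 million rows": {"hive": 182, "redshift": 8},
}

# Hourly cluster cost in USD, as quoted on the slide
HOURLY_COST = {"hive": 57.00, "redshift": 13.60}


def speedup(hive_seconds: float, redshift_seconds: float) -> float:
    """How many times faster the Redshift query ran."""
    return hive_seconds / redshift_seconds


for name, times in BENCHMARKS.items():
    ratio = speedup(times["hive"], times["redshift"])
    print(f"{name}: {ratio:.1f}x faster on Redshift")

cost_ratio = HOURLY_COST["hive"] / HOURLY_COST["redshift"]
print(f"Hive cluster costs {cost_ratio:.1f}x more per hour")
```

On these numbers Redshift is a little under 5 times faster on the range query and close to 23 times faster on the joins, at roughly a quarter of the hourly cost.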

Page 51: Big Data Analysis: Powered by the Cloud

Every day is crucial and costly

Page 52: Big Data Analysis: Powered by the Cloud

Challenge: To run a virtual screen with a higher accuracy algorithm & 21 million compounds

Page 54: Big Data Analysis: Powered by the Cloud

Metric                  Count
Compute hours of work   109,927 hours
Compute days of work    4,580 days
Compute years of work   12.55 years
Ligand count            ~21 million ligands

Using Cycle Computing and Amazon Web Services

Page 55: Big Data Analysis: Powered by the Cloud

3 Hours for $4828.85/hr
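
The figures for this run can be cross-checked with simple arithmetic. The sketch below assumes the run lasted exactly 3 hours at a flat hourly rate and treats "compute hours" as core-hours when estimating parallelism; both are assumptions, not statements from the slides.

```python
COMPUTE_HOURS = 109_927   # total compute hours of work, from the slides
WALL_CLOCK_HOURS = 3      # run length quoted on the slide
COST_PER_HOUR = 4828.85   # USD per hour quoted on the slide

# Unit conversions: hours -> days -> years (365-day year)
compute_days = COMPUTE_HOURS / 24
compute_years = compute_days / 365

# Assumption: flat rate for exactly three hours
total_cost = WALL_CLOCK_HOURS * COST_PER_HOUR

# Assumption: compute hours are core-hours, evenly spread over the run
avg_parallelism = COMPUTE_HOURS / WALL_CLOCK_HOURS

print(f"{compute_days:,.0f} days, {compute_years:.2f} years")
print(f"total cost ${total_cost:,.2f}")
print(f"average parallelism ~{avg_parallelism:,.0f}")
```

The conversions agree with the 4,580 days and 12.55 years on the slide. Under the stated assumptions the whole run costs about $14,500 and implies tens of thousands of cores working at once.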

Page 56: Big Data Analysis: Powered by the Cloud

Instead of $20+ million in infrastructure

Page 57: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Page 58: Big Data Analysis: Powered by the Cloud

Open web index.

3.4 billion records.

Available to all.

1000 Genomes project

Page 60: Big Data Analysis: Powered by the Cloud

Generation

Collect

Store

Collaboration & sharing

Analysis and Computation

Page 61: Big Data Analysis: Powered by the Cloud

Thank you! aws.amazon.com/big-data

May 21st, COEX Auditorium, Seoul

One day Free training

Walk through of services

http://aws.amazon.com/apac/awsday/seoul/