AWS Big Data Analytics IP Expo 2013

Big Data Analytics David de Santiago Business Development Manager, Analytics EMEA

Uploaded by: amazon-web-services

Posted on 15-Jan-2015

DESCRIPTION

Many companies recognize data analytics as an opportunity to better understand their customers and gain a lead on their competition. The ability to get better insight from vast amounts of unstructured data, coming from a multitude of sources, can give businesses an advantage in industries where even the smallest improvement can make a big difference. Amazon Web Services offers a range of big data, analytics and storage solutions that companies such as NASDAQ, Bankinter and S&P Capital use as a highly secure and agile platform. Join this session to learn how these services let customers start on a small scale and grow as their business requires, giving them the agility they need to deliver cutting-edge solutions to their customers without any upfront capital expenditure (CAPEX).

TRANSCRIPT

Page 1: AWS Big Data Analytics IP Expo 2013

Big Data Analytics

David de Santiago

Business Development Manager, Analytics EMEA

Page 2

Overview

1. Introducing Big Data

2. From data to actionable information

3. Analytics and Cloud Computing

Page 3

1. Introducing Big Data

Page 4

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 5

The cost of data generation is falling

Page 6

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Page 7

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Highly constrained

Page 8

[Chart: data volume, generated data vs. data available for analysis]

Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

Page 9

Elastic and highly scalable + No upfront capital expense + Only pay for what you use + Available on-demand = Remove constraints

Page 10

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Lower cost, higher throughput

Highly constrained

Page 11

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Accelerated

Page 12

Big Data

Technologies and techniques for working productively with data, at any scale.

Page 13

2. From data to actionable information

Page 14

Page 15

Per day:

3.5 billion records

13 TB of clickstream logs

71 million unique cookies

Page 16

User recently bought a home theatre system

and is now looking at sports games

Targeted Ad
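A minimal sketch of the targeting idea on this slide, assuming a hypothetical clickstream schema of time-ordered (event_type, category) pairs and a hypothetical hand-written rule table that maps a recent purchase plus the category currently being browsed to an ad. This is not the method used in the case study; a production system would learn such associations from data rather than hard-code them.

```python
from typing import Optional

# Hypothetical rule table: (recently purchased category, currently browsed category) -> ad
AD_RULES = {
    ("home_theatre", "sports_games"): "Sports game bundle for your new home theatre",
}


def pick_targeted_ad(events: list[tuple[str, str]]) -> Optional[str]:
    """Choose an ad from one user's clickstream.

    `events` is a time-ordered list of (event_type, category) pairs, where
    event_type is "purchase" or "view" (assumed schema).
    """
    last_purchase = None
    current_view = None
    for event_type, category in events:
        if event_type == "purchase":
            last_purchase = category
        elif event_type == "view":
            current_view = category
    if last_purchase is None or current_view is None:
        return None  # not enough history to target
    return AD_RULES.get((last_purchase, current_view))


clickstream = [
    ("view", "home_theatre"),
    ("purchase", "home_theatre"),
    ("view", "sports_games"),
]
print(pick_targeted_ad(clickstream))
```

Users with no purchase history, or combinations missing from the table, get no targeted ad (`None`).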

Page 17

Results:

500% return on ad spend

17,000% reduction in procurement time

“We couldn’t have done it”

Page 18

Page 19

Finding signal in the noise of logs

Identified early mobile usage

Invested heavily in mobile development

Page 20

In January 2013, 9,432,061 unique mobile devices used the Yelp mobile app.

Other features powered by EMR:
• People Who Viewed This Also Viewed
• Review highlights
• Autocomplete as you type on search
• Search spelling suggestions
• Top searches
• Ads
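The slide says "People Who Viewed This Also Viewed" is powered by EMR. The single-process sketch below is not Yelp's code. It shows how co-view counts can be expressed as a map step (emit every pair of items viewed in the same session) and a reduce step (sum the counts per pair). The session data and item names are hypothetical. On a Hadoop cluster the same two functions would run distributed across nodes.

```python
from collections import Counter, defaultdict
from itertools import permutations


def map_session(viewed: list[str]):
    """Map step: emit ((item, other_item), 1) for every ordered pair viewed in one session."""
    for a, b in permutations(set(viewed), 2):
        yield (a, b), 1


def reduce_pairs(mapped) -> Counter:
    """Reduce step: sum the counts for each (item, other_item) key."""
    totals = Counter()
    for key, count in mapped:
        totals[key] += count
    return totals


def also_viewed(sessions: list[list[str]], top_n: int = 3) -> dict[str, list[str]]:
    totals = reduce_pairs(pair for s in sessions for pair in map_session(s))
    by_item = defaultdict(list)
    for (a, b), count in totals.items():
        by_item[a].append((count, b))
    # Highest co-view count first; ties broken alphabetically so output is deterministic
    return {
        a: [b for _, b in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]]
        for a, pairs in by_item.items()
    }


sessions = [
    ["cafe_a", "cafe_b", "bakery_c"],
    ["cafe_a", "cafe_b"],
    ["cafe_a", "bakery_c", "bar_d"],
]
print(also_viewed(sessions)["cafe_a"])
```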

Page 21

Open web index.

3.4 billion records.

Available to all.

Page 22

Tweeting about Flu

Source: M. J. Paul and M. Dredze, You Are What You Tweet: Analyzing Twitter for Public Health, 2011
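The sketch below is not the method of the cited paper. It only illustrates the simplest form of the idea: filter a tweet stream for flu-related terms and count matching tweets per day. The keyword list and the (ISO date, text) tweet format are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real study needs a far more careful classifier
FLU_TERMS = re.compile(r"\b(flu|influenza|fever)\b", re.IGNORECASE)


def daily_flu_mentions(tweets: list[tuple[str, str]]) -> Counter:
    """Count tweets per day whose text mentions a flu-related term.

    `tweets` is a list of (ISO date, text) pairs (assumed format).
    """
    counts = Counter()
    for day, text in tweets:
        if FLU_TERMS.search(text):
            counts[day] += 1
    return counts


tweets = [
    ("2011-01-10", "Stuck in bed with the flu again"),
    ("2011-01-10", "Great game last night"),
    ("2011-01-11", "Fever and chills, staying home"),
    ("2011-01-11", "Influenza shot done"),
]
print(daily_flu_mentions(tweets))
```

The last example tweet shows why keyword matching alone is crude: a vaccination tweet is counted as a flu mention even though it does not report illness.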

Page 23

Full parse for impact of social networks

300 lines of Ruby code. 14 hours. $100.

Page 24

3. Analytics and Cloud Computing

Page 25

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 26

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase

Page 27

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 & Elastic MapReduce

Page 28

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift

Page 29

Amazon Redshift

Fully Managed Data Warehouse

Scales to 1.6 PB

Faster, Simpler, Cheaper

Page 30

Amazon Redshift

Pricing option | Effective hourly price per TB | Effective annual price per TB
On-Demand | $0.425 | $3,723
1 Year Reservation | $0.250 | $2,190
3 Year Reservation | $0.114 | $999
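The annual column follows from the hourly column: an effective hourly price per TB multiplied by the 8,760 hours in a (non-leap) year. The short sketch below reproduces the table from the slide's hourly prices. The dictionary and function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

# Effective hourly price per TB in USD, as quoted on the slide
HOURLY_PRICE_PER_TB = {
    "On-Demand": 0.425,
    "1 Year Reservation": 0.250,
    "3 Year Reservation": 0.114,
}


def annual_price_per_tb(hourly: float) -> float:
    """Convert an effective hourly price per TB into an annual price per TB."""
    return hourly * HOURS_PER_YEAR


for option, hourly in HOURLY_PRICE_PER_TB.items():
    print(f"{option}: ${annual_price_per_tb(hourly):,.0f}")
```

The 3-year figure is $998.64 before rounding, which the slide shows as $999.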

Page 31

“Two months to migrate to Amazon Redshift.”

“Towards the end of last year our data volumes literally broke the existing database. We were no longer able to scale the database or do anything useful, like running queries.”

Greg Johnson, Head of Analytics, Nokia

Page 32

Elastic MapReduce: How does it work?

[Diagram: EMR cluster and S3]

1. Put the data into S3 (or HDFS).

2. Launch your cluster. Choose:
• Hadoop distribution
• How many nodes
• Node type (hi-CPU, hi-memory, etc.)
• Hadoop apps (Hive, Pig, HBase)

3. Get the results.
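Hadoop jobs on an EMR cluster follow the MapReduce model. A map function turns each input record into key/value pairs. The framework then groups the values by key (the shuffle), and a reduce function combines each group. The single-process Python sketch below counts words in a few hypothetical log lines and only illustrates that data flow. On a real cluster, Hadoop distributes the map and reduce work across the nodes you chose in step 2 and reads its input from S3 or HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


logs = ["GET /index GET /search", "GET /search", "POST /login"]
print(run_job(logs))
```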

Page 33

Elastic MapReduce: How does it work?

[Diagram: EMR cluster and S3]

You can easily resize the cluster.

Page 34

Elastic MapReduce: How does it work?

[Diagram: EMR cluster and S3]

Use Spot nodes to save time and money.

Page 35

Elastic MapReduce: How does it work?

[Diagram: EMR cluster and S3]

Launch parallel clusters against the same data source (tune each for the workload).

Page 36

Elastic MapReduce: How does it work?

[Diagram: EMR cluster and S3]

When the work is complete, you can terminate the cluster (and stop paying).

Page 37

Thousands of Customers, 5+ Million Clusters

Page 38

Give it a try.

Cost to run a 100-node EMR cluster: £4.90/hour
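The slide's figure works out to about £0.049 per node-hour (£4.90 ÷ 100 nodes). The back-of-the-envelope sketch below assumes that rate applies uniformly to every node and ignores how partial hours are billed. Under those assumptions it shows why terminating the cluster matters: cost scales with nodes × hours, so a cluster only runs up charges while it exists.

```python
CLUSTER_RATE_GBP_PER_HOUR = 4.90  # 100-node EMR cluster, figure from the slide
NODES_ON_SLIDE = 100
RATE_PER_NODE_HOUR = CLUSTER_RATE_GBP_PER_HOUR / NODES_ON_SLIDE  # about £0.049


def job_cost(nodes: int, hours: float) -> float:
    """Estimated cost in GBP of running `nodes` nodes for `hours` hours.

    Assumes a flat per-node-hour rate and no rounding of partial hours.
    """
    return nodes * hours * RATE_PER_NODE_HOUR


print(f"100 nodes for 3 hours: £{job_cost(100, 3):.2f}")
print(f"20 nodes for 15 hours: £{job_cost(20, 15):.2f}")
```

Both example jobs use 300 node-hours and so cost the same. Under these assumptions, a job that parallelises well finishes five times sooner on the larger cluster at no extra cost.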

Page 39

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

S3, Glacier, Storage Gateway, DynamoDB, Redshift, RDS, HBase

EC2 & Elastic MapReduce

EC2 & S3, CloudFormation, Elastic MapReduce, RDS, DynamoDB, Redshift

AWS Data Pipeline

Page 40

AWS Data Pipeline

Data-intensive orchestration and automation

Reliable and scheduled

Easy to use, drag and drop

Execution and retry logic

Map data dependencies

Create and manage temporary compute resources
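The sketch below does not use the AWS Data Pipeline API. It is a plain-Python illustration of two ideas from the slide: running activities in the order implied by their data dependencies, and retrying an activity that fails. The activity names, the `run_pipeline` helper and the retry limit are all hypothetical. Dependency ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(
    activities: dict[str, Callable[[], None]],
    depends_on: dict[str, set[str]],
    max_attempts: int = 3,
) -> list[str]:
    """Run activities in dependency order, retrying each failed one.

    Returns the order in which activities completed. Raises RuntimeError if
    an activity still fails after `max_attempts` tries.
    """
    completed = []
    for name in TopologicalSorter(depends_on).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                activities[name]()
                break
            except Exception as exc:  # retry on any failure
                if attempt == max_attempts:
                    raise RuntimeError(f"{name} failed after {attempt} attempts") from exc
        completed.append(name)
    return completed


calls = {"load": 0}


def flaky_load() -> None:
    # Fails on the first call, then succeeds, to exercise the retry logic
    calls["load"] += 1
    if calls["load"] < 2:
        raise IOError("transient failure")


activities = {
    "copy_logs_to_s3": lambda: None,
    "emr_aggregate": lambda: None,
    "load": flaky_load,
}
deps = {"emr_aggregate": {"copy_logs_to_s3"}, "load": {"emr_aggregate"}}
print(run_pipeline(activities, deps))
```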

Page 41

Anatomy of a pipeline

Page 42

Arbitrarily complex pipelines

Page 43

Thanks. [email protected]

To Learn More:

aws.amazon.com/elasticmapreduce

aws.amazon.com/datapipeline

aws.amazon.com/big-data

aws.amazon.com/redshift

aws.amazon.com/rds

Page 44

Thank you!