Structured, Unstructured and Streaming Big Data on AWS


Uploaded by amazon-web-services, posted 15-Apr-2017


TRANSCRIPT

Page 1: Structured, Unstructured and Streaming Big Data on the AWS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Markku Lepistö

Principal Technology Evangelist, APAC

Structured, Unstructured and Streaming Big Data

on Amazon Web Services

Page 2

Agenda

1:00pm - 2:00pm  Registration - Lunch & Meet AWS SAs
2:00pm - 2:20pm  Welcome & Introduction
2:20pm - 3:40pm  Structured, unstructured and streaming Big Data on the AWS Platform
3:40pm - 4:00pm  Break
4:00pm - 5:15pm  Building an Amazon Redshift Data Warehouse
5:15pm - 5:30pm  Q&A
5:30pm           Close

Page 3

Page 4

Big Data End to End Framework

Page 5

Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Apache Storm

PIG

Amazon Machine Learning

Amazon EMR

Amazon Glacier

Amazon DynamoDB

Page 6

“I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section.”

Page 7

Simplify Big Data Processing

Data → Ingest → Store → Process → Analyse → Answers

Page 8

[Diagram: INGEST → STORE. Sources: databases, flat files, IoT devices and Android/iOS apps, producing database data, file data and streaming data (sales data, customer data, web logs, server logs, clickstream data, sensor data). Store: databases.]

Page 9

[Diagram: INGEST → STORE. Sources: databases, flat files, IoT devices and Android/iOS apps, producing database data, file data and streaming data (sales data, customer data, web logs, server logs, clickstream data, sensor data). Store: databases (Amazon Redshift, Amazon RDS).]

Page 10

Data Tier

Starting point: Traditional Relational Database

Search: Amazon CloudSearch
Cache: Amazon ElastiCache
Object Store: Amazon S3
RDBMS: Amazon RDS
NoSQL: Amazon DynamoDB
Data Warehouse: Amazon Redshift

Workloads: rich search, hot reads, complex queries and transactions, webscale transactions, logging analytics

Page 11

               Amazon RDS        Amazon Redshift
Scaling        Vertical          Horizontal
Storage        Row               Column
Workload       Transactional     Analytical
Architecture   SMP               MPP
Type           SQL Relational    SQL Relational
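The Storage and Workload rows go together: a row store keeps each record in one place, which suits transactional lookups of whole rows, while a column store keeps each column together, which suits analytical scans that touch a few columns across many rows. A minimal in-memory Python sketch of the two layouts, using made-up sales data; it illustrates the idea only and is not how either service lays out data on disk.

```python
# Hypothetical sales rows; the values are made up for illustration.
rows = [
    {"order_id": 1, "region": "APAC", "amount": 120.0},
    {"order_id": 2, "region": "EMEA", "amount": 80.0},
    {"order_id": 3, "region": "APAC", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of per-column lists."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


def total_amount_row_store(records):
    # A row layout walks every record, even though only "amount" is needed.
    return sum(record["amount"] for record in records)


def total_amount_column_store(columns):
    # A column layout scans just the one contiguous "amount" list.
    return sum(columns["amount"])


def fetch_order_row_store(records, order_id):
    # A transactional point lookup gets the whole row from one place.
    return next(record for record in records if record["order_id"] == order_id)


columns = to_columns(rows)
print(total_amount_row_store(rows))        # 250.0
print(total_amount_column_store(columns))  # 250.0
```

Both functions return the same answer; the difference is how much unrelated data each layout has to touch, which grows with the number of columns in a real table.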

Page 12

[Diagram: INGEST → STORE. Sources: databases, flat files, event producers and Android/iOS apps, producing database data, file data and streaming data (sales data, customer data, web logs, server logs, clickstream data, sensor data). Store: databases (Amazon Redshift, Amazon RDS) and, via an application, storage (Amazon S3).]

Page 13

Impala PIG

Amazon EMR

Page 14

Amazon S3

Amazon Redshift

Amazon EMR

Amazon Glacier

Amazon DynamoDB

Amazon Machine Learning

Applications

Page 15

             Amazon Redshift   Amazon S3
Scaling      Add nodes         Automatic
Speed        Fastest           Fast
Cost         Higher            Lower
Durability   Configurable      Built-in

Page 16

[Diagram: INGEST → STORE. Sources: databases, flat files, event producers and Android/iOS apps, producing database data, file data and streaming data (sales data, customer data, web logs, server logs, clickstream data, sensor data). Store: Amazon Redshift, Amazon RDS, Amazon S3 and, through a stream processor, Amazon Kinesis.]

Page 17

Why Stream Storage?

[Diagram: sensors writing to stream storage: Amazon Kinesis or Apache Kafka.]

Page 18

[Diagram: many data sources write to an Amazon Kinesis stream that spans three Availability Zones; consumers such as AWS Lambda read the stream for logging, metrics, analysis and processing, with output to S3, DynamoDB and Redshift.]
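One reason to place stream storage between producers and processors is that records stay in the stream for a retention period, so several applications (logging, metrics, analysis) can read the same data independently, each at its own pace. A minimal in-memory Python sketch of that idea; `StreamSketch` and `ConsumerSketch` are hypothetical names, this is not the Kinesis or Kafka client API, and retention, sharding and replication are left out.

```python
from dataclasses import dataclass, field


@dataclass
class StreamSketch:
    """Append-only, in-memory stand-in for stream storage (hypothetical, not a real API)."""
    records: list = field(default_factory=list)

    def put(self, data):
        self.records.append(data)
        return len(self.records) - 1  # position of the new record, like a sequence number

    def read_from(self, position, limit=100):
        # Reading never removes records, so other consumers still see them.
        return self.records[position:position + limit]


@dataclass
class ConsumerSketch:
    """Each consumer keeps its own checkpoint, so readers do not interfere."""
    name: str
    checkpoint: int = 0

    def poll(self, stream):
        batch = stream.read_from(self.checkpoint)
        self.checkpoint += len(batch)
        return batch


stream = StreamSketch()
for event in ["a", "b", "c"]:
    stream.put(event)

logging_app = ConsumerSketch("logging")
metrics_app = ConsumerSketch("metrics")
print(logging_app.poll(stream))  # ['a', 'b', 'c']
stream.put("d")
print(metrics_app.poll(stream))  # ['a', 'b', 'c', 'd']
print(logging_app.poll(stream))  # ['d']
```

A queue that deletes a message once it is read would give each record to only one of these consumers; keeping the records and tracking per-consumer positions is what lets them share one copy of the data.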

Page 19

              Amazon Kinesis   Apache Kafka
Ordering      Yes              Yes
Persistence   24 Hours         Configurable
Record size   50 KB            Configurable
Scaling       High             High
Latency       Low              Low
Managed       Yes              No
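Ordering in Amazon Kinesis is per shard: the stream hashes each record's partition key with MD5 into a 128-bit integer, and the shard whose hash key range contains that value receives the record, so records that share a partition key stay in order on one shard. A minimal Python sketch, assuming the shards split the hash key space evenly (as for a stream created with a fixed shard count and never resharded) and reading the digest as a big-endian integer; `hash_key` and `shard_for` are hypothetical helpers, not part of an AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer with MD5 (big-endian, an assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose range contains the key, assuming evenly split ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


events = [("user-42", "click"), ("user-7", "view"), ("user-42", "buy")]
by_shard = {}
for key, action in events:
    # Same partition key -> same shard, so user-42's click stays ahead of its buy.
    by_shard.setdefault(shard_for(key, 4), []).append((key, action))
print(by_shard)
```

Ordering holds only within a shard, so events that must stay in sequence (here, one user's clicks) need a shared partition key; events under different keys may land on different shards and be read in any relative order.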

Page 20

“The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen.”
- Sami Yliharju, Services Lead

Page 21

[Diagram: INGEST → STORE → PROCESS. Sources: event producers on Android/iOS, databases and flat files, producing database data, event data and streaming data. Store: Amazon Kinesis, Amazon S3, Amazon RDS, Amazon Redshift. Process: interactive (Amazon Redshift, Impala), batch (Amazon EMR, Hadoop) and streaming.]

Page 22

             Amazon Redshift   Impala
Scaling      2 PB+             Nodes
Storage      Native            HDFS/S3
BI Tools     High              Medium
Durability   High              High
Latency      Low               Low
Managed      Fully             Semi (EMR)

Page 23

[Diagram: INGEST → STORE → PROCESS. Sources: event producers on Android/iOS, databases and flat files, producing database data, event data and streaming data. Store: Amazon Kinesis, Amazon S3, Amazon RDS, Amazon Redshift. Process: interactive (Amazon Redshift, Impala), batch (Amazon EMR, Hadoop, PIG) and streaming.]

Page 24

PIG

SQL on Hadoop

Eats anything

New Processing Engine

Page 25

Amplab Big Data Benchmark https://amplab.cs.berkeley.edu/benchmark/

Page 26

[Diagram: INGEST → STORE → PROCESS. Sources: event producers on Android/iOS, databases and flat files, producing database data, event data and streaming data. Store: Amazon Kinesis, Amazon S3, Amazon RDS, Amazon Redshift. Process: interactive (Amazon Redshift, Impala), batch (Amazon EMR, Hadoop, PIG) and streaming (Apache Storm, AWS Lambda).]

Page 27

[Diagram: INGEST → STORE → PROCESS → ANALYSE. Sources: event producers on Android/iOS, databases and flat files, producing database data, event data and streaming data. Store: Amazon Kinesis, Amazon S3, Amazon RDS, Amazon Redshift. Process: interactive (Amazon Redshift, Impala), batch (Amazon EMR, Hadoop, PIG) and streaming (Apache Storm, AWS Lambda). Analyse: Amazon Machine Learning.]

Page 28

Use Cases

Page 29

FOMO

Page 30

SQL Analytics

[Diagram: the ingest, store, process and analyse architecture with Kinesis producers and consumers, Amazon EMR (Hadoop) and Amazon Machine Learning, highlighting the SQL analytics path: database data flowing into Amazon Redshift.]

Page 31

Clickstream Analysis - Batch

[Diagram: the ingest, store, process and analyse architecture with Kinesis producers and consumers, highlighting the batch path: event data processed on Amazon Elastic MapReduce (Amazon EMR, Hadoop).]
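In the batch path, click events accumulate first and a Hadoop job on Amazon EMR aggregates them periodically. The classic shape of such a job is map, shuffle, reduce. A minimal single-process Python sketch of a page-view count in that shape, assuming a hypothetical space-separated log format and made-up sample lines; it runs locally and mirrors only the structure of a MapReduce job, not the Hadoop API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical clickstream log lines: "<timestamp> <user_id> <url>"
LOG_LINES = [
    "2015-06-01T10:00:00 u1 /home",
    "2015-06-01T10:00:05 u2 /products",
    "2015-06-01T10:00:09 u1 /products",
    "2015-06-01T10:01:00 u3 /home",
]


def map_line(line):
    """Map phase: emit (url, 1) for each click."""
    _timestamp, _user, url = line.split()
    yield url, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: sum the counts for each url."""
    return {url: sum(values) for url, values in grouped.items()}


mapped = chain.from_iterable(map_line(line) for line in LOG_LINES)
page_views = reduce_counts(shuffle(mapped))
print(page_views)  # {'/home': 2, '/products': 2}
```

On a cluster the map calls run in parallel over chunks of the input, and the shuffle moves each key's values to a single reducer; the per-function logic stays the same.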

Page 32

Clickstream Analysis - Near Real Time

[Diagram: event producers write streaming data to Amazon Kinesis; Kinesis consumers process it and deliver the results to Amazon S3 and Amazon Redshift.]
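In the near-real-time path, Kinesis consumers read click events continuously and pass them on to Amazon S3 and Amazon Redshift. Both are generally loaded more efficiently in larger batches than one event at a time, so a consumer commonly buffers records and flushes them in micro-batches. A minimal Python sketch of such a buffer; `MicroBatcher` and the sink callable are hypothetical, and the thresholds are placeholder values, not recommendations.

```python
import time


class MicroBatcher:
    """Buffer streamed records and flush them in batches (hypothetical helper).

    A flush happens when the buffer reaches max_records, or when a record
    arrives after max_age_seconds have passed since the first buffered record.
    """

    def __init__(self, sink, max_records=500, max_age_seconds=60.0, clock=time.monotonic):
        self.sink = sink  # callable that receives a list of records, e.g. an S3 writer
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.buffer = []
        self.first_record_at = None

    def add(self, record):
        if not self.buffer:
            self.first_record_at = self.clock()
        self.buffer.append(record)
        too_many = len(self.buffer) >= self.max_records
        too_old = self.clock() - self.first_record_at >= self.max_age_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(list(self.buffer))  # hand over a copy, then reset the buffer
            self.buffer.clear()
            self.first_record_at = None


batches = []
batcher = MicroBatcher(batches.append, max_records=3)
for click in range(7):
    batcher.add({"click_id": click})
batcher.flush()  # e.g. on shutdown, flush whatever is left
print([len(batch) for batch in batches])  # [3, 3, 1]
```

The age check here runs only when a record arrives; a long-running consumer would also flush on a timer so a quiet stream does not leave records waiting. Larger batches lower per-write overhead at the cost of more delay before data is queryable.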

Page 33

Demo

Real-time Twitter analytics using Amazon Kinesis, AWS Lambda and open source software
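In this demo, Tweets are put onto an Amazon Kinesis stream and AWS Lambda processes them. Lambda delivers Kinesis records to a function in batches, with each record's payload base64-encoded under `Records[].kinesis.data`. A minimal Python handler sketch that counts hashtags per batch; it assumes each record holds one Tweet serialized as JSON with a `text` field, and it is not the code used in the demo.

```python
import base64
import json
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def handler(event, context):
    """Count hashtags in a batch of Kinesis records delivered to AWS Lambda.

    Assumes each record's data is a UTF-8 JSON Tweet with a "text" field.
    """
    counts = Counter()
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        tweet = json.loads(payload)
        counts.update(tag.lower() for tag in HASHTAG.findall(tweet.get("text", "")))
    # A real function would persist or forward these counts; this sketch returns them.
    return dict(counts)


def make_record(tweet):
    """Build a minimal Kinesis-style record for local testing (hypothetical helper)."""
    data = base64.b64encode(json.dumps(tweet).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


event = {"Records": [make_record({"text": "Loving #AWS and #BigData"}),
                     make_record({"text": "#aws re:Invent"})]}
print(handler(event, None))  # {'aws': 2, 'bigdata': 1}
```

Because Lambda invokes the function once per batch, per-batch counts like these would still need to be merged downstream to get running totals over the whole feed.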

Page 34

vs

Page 35

Demo: Live Twitter Feed Analysis

Twitter Stream → Amazon Kinesis → AWS Lambda

Twitter on a typical day*:
• More than 500 million Tweets sent
• Average 5,700 TPS (Tweets per second)

* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
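The two Twitter figures agree: 500 million Tweets spread over the 86,400 seconds in a day is about 5,787 per second. The same arithmetic gives a first-cut Amazon Kinesis shard count, since each shard accepts up to 1,000 records per second and 1 MB per second of writes. A minimal Python sketch; the average Tweet payload size is a hypothetical parameter, and a real stream would be sized for peak rather than average traffic.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Per-shard write limits for Amazon Kinesis: 1,000 records/s and 1 MB/s.
MAX_RECORDS_PER_SHARD = 1_000
MAX_BYTES_PER_SHARD = 1_000_000  # decimal MB used here, a conservative simplification


def average_rate(events_per_day: int) -> float:
    """Average events per second over a full day."""
    return events_per_day / SECONDS_PER_DAY


def shards_needed(records_per_second: float, avg_record_bytes: int) -> int:
    """Shards required so neither the record-count nor the byte limit is exceeded."""
    by_count = math.ceil(records_per_second / MAX_RECORDS_PER_SHARD)
    by_bytes = math.ceil(records_per_second * avg_record_bytes / MAX_BYTES_PER_SHARD)
    return max(by_count, by_bytes)


rate = average_rate(500_000_000)
print(round(rate))                                  # 5787
print(shards_needed(rate, avg_record_bytes=2_500))  # 15, for a hypothetical 2.5 KB per Tweet
```

With small records the record-count limit dominates (6 shards at this rate); once the average payload grows past about 1 KB, the byte limit takes over, which is why the payload size assumption matters.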

Page 36

Thank You!