Big Data Analytics on AWS

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Dickson Yue, Solutions Architect 17 June 2016 Big Data Analytics on AWS Digital Innovation & e-Commerce Track

Upload: amazon-web-services

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Big Data Analytics on AWS

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Dickson Yue, Solutions Architect

17 June 2016

Big Data Analytics on AWS Digital Innovation & e-Commerce Track

Page 2: Big Data Analytics on AWS
Page 3: Big Data Analytics on AWS

How to get started?

Page 4: Big Data Analytics on AWS

Data Answers

START HERE WITH A BUSINESS CASE

Revenue Lift

Market acquisition

Product recommendation

Improve user experience

Operational intelligence

Page 5: Big Data Analytics on AWS

Data to Answers: Ingest/Collect, Store, Process/Analyze, Consume/Visualize

Time to answer (latency), throughput, cost

Page 6: Big Data Analytics on AWS

Data to Answers: Ingest/Collect, Store, Process/Analyze, Consume/Visualize

Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS

Amazon EMR

Amazon Redshift

Amazon Machine Learning

Storage Processing Visualize

Amazon Elasticsearch Service

QuickSight

ElastiCache

Page 7: Big Data Analytics on AWS

Tracking Clickstream, user retention

Page 8: Big Data Analytics on AWS

Use case

Data source
•  Page
•  Click event
•  Web log
•  Thing event

Answer
•  User retention
•  High spending customer navigation pattern
•  Product recommendation
•  User journey in the shop
•  UX improvement
•  What deal/ad to try next

Page 9: Big Data Analytics on AWS

JavaScript (Snowplow)

AWS SDK

Logstash

Fluentd

Ingest Store

@ 30km/s a.k.a 300 rps

HTTP Post

Amazon S3

Storage

Page 10: Big Data Analytics on AWS

@ 100km/s Ingest Store

JavaScript (Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

Amazon Kinesis

Firehose

API Server Streaming Buffer

24hrs-7days

Web Servers

Amazon S3

Storage Data lake
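
The ingest path above (trackers and log agents posting events into Amazon Kinesis Firehose, which delivers to the S3 data lake) can be sketched on the producer side as follows. This is a minimal illustration, not the presenter's code: the event fields (`user_id`, `page`, `event`) and the helper names are assumptions. It serializes each click as newline-terminated JSON and groups records into batches of at most 500, the per-call record limit of Firehose `PutRecordBatch`.

```python
import json
from typing import Dict, Iterable, Iterator, List

# PutRecordBatch accepts at most 500 records per call.
MAX_RECORDS_PER_BATCH = 500


def to_firehose_record(event: Dict) -> Dict[str, bytes]:
    """Serialize one event as newline-terminated JSON.

    The trailing newline keeps the objects Firehose writes to S3
    line-delimited, which is convenient for later processing.
    """
    line = json.dumps(event, separators=(",", ":")) + "\n"
    return {"Data": line.encode("utf-8")}


def batches(
    events: Iterable[Dict], size: int = MAX_RECORDS_PER_BATCH
) -> Iterator[List[Dict[str, bytes]]]:
    """Yield lists of Firehose records, each list no longer than `size`."""
    batch: List[Dict[str, bytes]] = []
    for event in events:
        batch.append(to_firehose_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical clickstream sample: 1,200 click events from 7 users.
clicks = [
    {"user_id": f"u{i % 7}", "page": "/product/42", "event": "click"}
    for i in range(1200)
]
batch_sizes = [len(b) for b in batches(clicks)]
print(batch_sizes)
```

Each batch could then be passed to the AWS SDK, for example boto3's `put_record_batch(DeliveryStreamName=..., Records=batch)` on a Firehose client, with the stream name being whatever the delivery stream is called. The sketch does not enforce the 4 MiB per-request size limit or retry records reported in `FailedPutCount`; a production producer would need both.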

Page 11: Big Data Analytics on AWS

@ 100km/s Ingest Store

JavaScript (Snowplow)

AWS SDK

LOG4J

Flume

Fluentd

HTTP Post

Amazon S3

Amazon Kinesis

Firehose

API Gateway

API Server Streaming Buffer

24hrs-7days

Storage Data lake

Page 12: Big Data Analytics on AWS

Amazon S3

Storage Data lake

Store Process/Analyze

EMR

Redshift

Redshift EMR ETL

Visualize

JDBC ODBC

JDBC ODBC

QuickSight

Page 13: Big Data Analytics on AWS

Amazon S3

Store Process

EMR

Visualize

JDBC ODBC

Redshift Basket

CRM ERP DBs

Log file

QuickSight

Page 14: Big Data Analytics on AWS

Day-14 retention over time

User retention and growth

N-day retention
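
As a rough illustration of the N-day retention metric named above, the sketch below computes it from raw (user, day) activity events. The definition used here is an assumption, since the slide does not spell one out: a user belongs to the cohort of the first day they were seen, and counts as retained if they were active again exactly N days later.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def n_day_retention(
    events: Iterable[Tuple[str, date]], n: int
) -> Dict[date, float]:
    """Return, per cohort day, the share of new users active again n days later."""
    # Collect the set of active days for every user.
    active_days: Dict[str, Set[date]] = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)

    cohort_size: Dict[date, int] = defaultdict(int)
    retained: Dict[date, int] = defaultdict(int)
    for days in active_days.values():
        first_day = min(days)  # cohort = first day the user was seen
        cohort_size[first_day] += 1
        if first_day + timedelta(days=n) in days:
            retained[first_day] += 1

    return {day: retained[day] / cohort_size[day] for day in sorted(cohort_size)}


# Hypothetical activity log: (user, day) pairs.
d0 = date(2016, 6, 1)
events = [
    ("a", d0), ("a", d0 + timedelta(days=14)),
    ("b", d0), ("b", d0 + timedelta(days=3)),
    ("c", d0 + timedelta(days=1)), ("c", d0 + timedelta(days=15)),
]
day14 = n_day_retention(events, 14)
print(day14)
```

In this sample the 1 June cohort has two users, one of whom returns on day 14, and the 2 June cohort has one user who returns. At the scale discussed in the deck the same aggregation would more likely be expressed as a query in Redshift or a job on EMR over the S3 data.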

Page 15: Big Data Analytics on AWS

Social listening Social CRM, Chatbot

Page 16: Big Data Analytics on AWS

Use case

Data
•  Brand page activity
•  Post #hashtag
•  User profile

Answer
•  Campaign performance
•  Customer service automation
•  Building Chatbot

Page 17: Big Data Analytics on AWS

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers AWS SDK

Amazon Kinesis

Firehose

Store

Amazon S3 Data Lake

Elasticsearch (last 120 mins)

Analysts

AWS SDK

Page 18: Big Data Analytics on AWS
Page 19: Big Data Analytics on AWS

Why do we need machine learning for this?

The social media stream is high-volume, and most of the messages are not CS-actionable

Page 20: Big Data Analytics on AWS

Logstash

AWS SDK

Ingest Store

Bot AWS SDK

App

Crawlers AWS SDK

Amazon Kinesis

Process

AWS Lambda

Analyze

AWS SDK

Machine learning

Notification

Action

Support issue

Database

Feature request

Keep training the ML model with new data

Action

Amazon S3
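
The processing step in the pipeline above (a Lambda function reading social posts from Kinesis, asking a model whether each one is actionable, and emitting actions) might look roughly like the sketch below. The event shape follows the standard Kinesis-to-Lambda format, where each record carries base64-encoded data under `record["kinesis"]["data"]`. Everything else is an assumption for illustration: the post fields (`user`, `text`), the keyword list, and the `classify` stub, which stands in for a call to a trained model and is not a real ML service API.

```python
import base64
import json
from typing import Dict, List

# Hypothetical keywords; a real deployment would query a trained model instead.
ACTIONABLE_KEYWORDS = {"refund", "broken", "late", "cancel"}


def classify(message: str) -> str:
    """Stand-in for a model prediction: label a message as actionable or not."""
    words = set(message.lower().split())
    return "support_issue" if words & ACTIONABLE_KEYWORDS else "ignore"


def handler(event: Dict, context: object = None) -> List[Dict]:
    """Lambda-style entry point for a batch of Kinesis records."""
    actions = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        post = json.loads(payload)
        label = classify(post["text"])
        if label != "ignore":
            actions.append({"user": post["user"], "label": label})
    return actions


def _kinesis_record(post: Dict) -> Dict:
    """Build a test record shaped like the ones Lambda receives from Kinesis."""
    data = base64.b64encode(json.dumps(post).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


sample_event = {
    "Records": [
        _kinesis_record({"user": "alice", "text": "My order arrived broken"}),
        _kinesis_record({"user": "bob", "text": "Love the new store design"}),
    ]
}
result = handler(sample_event)
print(result)
```

Only the first sample post matches a keyword, so only it produces an action. In the architecture on the slide the returned actions would feed the notification and database steps, and the raw posts kept in Amazon S3 would be used to keep retraining the model.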

Page 21: Big Data Analytics on AWS

AWS SDK

Ingest Store

Bot AWS SDK

Messenger

Amazon Kinesis

Process

AWS Lambda

Analysts

Machine learning

Action

Bot

App

Get prediction

Keep training the ML model with new data Amazon S3

Page 22: Big Data Analytics on AWS

Operational intelligence (OI) from a business view, with custom sources

Page 23: Big Data Analytics on AWS

Refrigerator

POS

Door sensor

Water

Camera

Storefront

Kitchen

Lambda

SQS

AWS IoT

SQSPoller

HTTP Event Collector

Serverless Architecture

Page 24: Big Data Analytics on AWS
Page 25: Big Data Analytics on AWS
Page 26: Big Data Analytics on AWS

Our Big Data Scale

•  Total ~25 PB DW on Amazon S3
•  Read ~10% DW daily
•  Write ~10% of read data daily
•  ~550 billion events daily
•  ~350 active platform users

Page 27: Big Data Analytics on AWS

Predict what you want to watch before you watch it.

Page 28: Big Data Analytics on AWS

Netflix Prize - best collaborative filtering algorithm

Page 29: Big Data Analytics on AWS

Storage Compute Service Tools

Big Data Portal

API Portal

Big Data API

AWS S3

Page 30: Big Data Analytics on AWS

Data to Answers: Ingest/Collect, Store, Process/Analyze, Consume/Visualize

START WITH A BUSINESS CASE

MATCH AVAILABLE DATA

CHOOSE BEST FIT

Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS

Amazon EMR

Amazon Redshift

Amazon Machine Learning

Storage Processing Visualize

Amazon Elasticsearch Service

QuickSight

ElastiCache

Page 31: Big Data Analytics on AWS

Source DBs

3rd Party Data

Log Data

Reporting

Analysis

Processing

Data Lake

S3

Source of truth

Page 32: Big Data Analytics on AWS

Remember to complete your evaluations!

Page 33: Big Data Analytics on AWS

Thank you

Page 34: Big Data Analytics on AWS

CRM ERP DBs

Log file

AWStats

days

MB

2002 Big bang

Page 35: Big Data Analytics on AWS

<2005 Hello world

Page/Event tracking

GA

hours

GB

Page 36: Big Data Analytics on AWS

SOLOMO

minutes - hours

TB

<2008 New customer service

New System monitoring New QA

Page 37: Big Data Analytics on AWS

IoT

O2O

seconds – hours PB

2016 Fast and big

data driven marketing

Page 38: Big Data Analytics on AWS

Analytics

ETL

Interactive data exploration

Interactive slice & dice

Real-time analytics & iterative/ML algorithms and more ...

Different Big Data Processing Needs