b3 - business intelligence apps on aws

Post on 15-Jan-2015

Category:

Technology

DESCRIPTION

Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from the raw data import, consumption and storage down to the information production. We will also cover best practices for services such as Amazon Redshift or Amazon RDS, and how to use applications such as SAP Hana, Jaspersoft and others.

TRANSCRIPT

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Business Intelligence Applications on AWS

Steffen Krause, Amazon Web Services

@sk_bln

Overview

Designing BI & big data solutions in the cloud

Not the only way to do it (but one that we have seen)

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Data has gravity

Data, App, App

Compute, Storage, Big Data

http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/

…and inertia at volume…

Latency, Throughput

…easier to move applications to the data

Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html

S3 as a “single source of truth”

S3

Getting your Data into AWS

Amazon S3

Corporate  Data  Center  

•  Console Upload

•  FTP

•  AWS Import Export

•  S3 API

•  Direct Connect

•  Storage Gateway

•  3rd Party Commercial Apps

•  Tsunami UDP

Write directly to a data source

Your application

Amazon S3

DynamoDB  

Any  other  data  store  

Amazon S3

Amazon  EC2    

Queue, pre-process and then write

Amazon  Simple  Queue  Service  (SQS)  

Amazon S3

DynamoDB  

Any  other  data  store  
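The queue, pre-process, then write pattern can be sketched without any AWS dependency. In this minimal illustration, Python's in-memory `queue.Queue` stands in for Amazon SQS and a plain list stands in for the target data store. The event fields (`user`, `action`) and the helper names `preprocess` and `drain` are hypothetical, chosen only to show the flow.

```python
import json
import queue


def preprocess(raw: str) -> dict:
    """Parse a JSON event and normalise it before storage."""
    event = json.loads(raw)
    event["user"] = event["user"].strip().lower()
    return event


def drain(q: queue.Queue, store: list) -> int:
    """Consume every queued message, pre-process it, and append it to the store."""
    written = 0
    while True:
        try:
            raw = q.get_nowait()
        except queue.Empty:
            return written
        store.append(preprocess(raw))
        q.task_done()
        written += 1


# Stand-ins: the queue plays the role of SQS, the list plays the role of S3 or DynamoDB.
incoming = queue.Queue()
incoming.put('{"user": " Alice ", "action": "login"}')
incoming.put('{"user": "BOB", "action": "search"}')

store = []
written = drain(incoming, store)
```

Decoupling producers from the writer this way lets the pre-processing workers be scaled or restarted independently of the application that generates the events.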

Amazon  SQS  

Amazon S3

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Choose depending upon design

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Hadoop based Analysis

Amazon S3 Amazon EMR

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

EMR is Hadoop in the Cloud

Amazon Elastic MapReduce (EMR)?

How does EMR work?

EMR Cluster

S3

•  Put the data into S3

•  Choose: Hadoop distribution, number of nodes, types of nodes, custom configs, Hive/Pig/etc.

•  Launch the cluster using the EMR console, CLI, SDK, or APIs

•  Get the output from S3

•  You can also store everything in HDFS

Resize Nodes

EMR Cluster

You  can  easily  add  and  remove  nodes  

1 instance for 100 hours = 100 instances for 1 hour

Small instance = $5.50 (including EMR; $4.40 without)

1 instance for 1000 hours = 1000 instances for 1 hour

Small instance = $55 (including EMR; $44 without)
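The arithmetic behind these figures can be checked with a short sketch. The hourly rates below are not stated on the slide; they are derived from it ($4.40 / 100 hours for the instance alone, $5.50 / 100 hours including EMR). The function name `cluster_cost` is hypothetical, and the sketch ignores any rounding of partial hours.

```python
from decimal import Decimal

# Hourly rates implied by the slide figures (assumption: derived, not quoted).
EC2_SMALL_HOURLY = Decimal("0.044")      # 4.40 / 100 instance-hours
EMR_SURCHARGE_HOURLY = Decimal("0.011")  # (5.50 - 4.40) / 100 instance-hours


def cluster_cost(instances: int, hours: int, with_emr: bool = True) -> Decimal:
    """Cost depends only on total instance-hours, not on how they are split."""
    rate = EC2_SMALL_HOURLY
    if with_emr:
        rate += EMR_SURCHARGE_HOURLY
    return rate * instances * hours


print(cluster_cost(1, 100), cluster_cost(100, 1))
print(cluster_cost(1000, 1, with_emr=False))
```

Because the cost is linear in instance-hours, running 100 instances for one hour costs the same as one instance for 100 hours while finishing far sooner.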

 

When  you  turn  off  your  cloud  resources,  you  actually  stop  paying  for  them  

SQL based processing

Amazon S3 Amazon EMR

Amazon Redshift

Pre-processing framework

Petabyte-scale columnar data warehouse

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the AWS cloud

What is Amazon Redshift?

Easy to provision and scale

No upfront costs, pay as you go

High performance at a low price

Open and flexible with support for popular BI tools

Demo: Amazon Redshift

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Your choice of BI Tools

Amazon S3 Amazon EMR

Amazon Redshift

Pre-processing framework

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Demo Jaspersoft as a BI Frontend

Sharing results and visualizations

Amazon S3 Amazon EMR

Amazon Redshift

Web App Server Visualization tools

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Sharing results and visualizations

Amazon S3 Amazon EMR

Amazon Redshift

Business Intelligence Tools

Business Intelligence Tools

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Geospatial Visualizations

Amazon S3 Amazon EMR

Amazon Redshift

Business Intelligence Tools

Business Intelligence Tools

GIS tools on Hadoop

GIS tools

Visualization tools

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Rinse and Repeat

Amazon S3 Amazon EMR

Amazon Redshift

Visualization tools

Business Intelligence Tools

Business Intelligence Tools

GIS tools on Hadoop

GIS tools

AWS Data Pipeline

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

The complete architecture

Amazon S3 Amazon EMR

Amazon Redshift

Visualization tools

Business Intelligence Tools

Business Intelligence Tools

GIS tools on Hadoop

GIS tools

AWS Data Pipeline

Amazon  SQS  

DynamoDB  

Any  SQL  or  NoSQL  Store  

Log aggregation tools

Real Time

Amazon Kinesis

•  Real-time processing

•  Massive scale

•  Integrated

•  Use cases: real-time log analysis, real-time data analytics, social media monitoring, financial transactions, online machine learning

Amazon Kinesis Data Flow

Data Sources

AWS Endpoint

App.1 [Aggregate & De-Duplicate]

App.2 [Metric Extraction]

App.3 [Sliding Window Analysis]

App.4 [Machine Learning]

S3

DynamoDB

Redshift

Availability Zone

Shard 1 Shard 2 Shard N

Availability Zone Availability Zone
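How a record ends up on a particular shard can be sketched as follows. Kinesis hashes each record's partition key with MD5 into a 128-bit hash key, and each shard owns a range of that hash space. The sketch assumes equally sized, contiguous ranges, which is a simplification: real streams can have uneven ranges after shards are split or merged. The function name `shard_for` is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the 0-based index of the shard whose hash range contains the key.

    Assumes shard_count equally sized contiguous hash ranges (a simplification).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so keys in the remainder at the top of the space land on the last shard.
    return min(hash_key // range_size, shard_count - 1)


# Records with the same partition key always map to the same shard.
first = shard_for("user-42", 4)
second = shard_for("user-42", 4)
```

Choosing a partition key with many distinct values (a user or session ID rather than a constant) spreads records more evenly across shards.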

Use cases

SkillPages

Customer Use Case

Everyone Needs Skilled People

At Home At Work In Life

Repeatedly

Who they are

What they can do

Your real life connections to them

Examples of what they can do

Data Architecture

Data Analyst

Raw Data

Get Data

Join via Facebook

Add a Skill Page

Invite Friends

Web Servers Amazon S3 User Action Trace Events

EMR Hive Scripts Process Content

•  Process log files with regular expressions to parse out the info we need.

•  Process cookies into useful, searchable data such as Session, UserId, and API security token.

•  Filter out surplus info such as internal Varnish logging.
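The three processing steps above can be illustrated in a few lines. The slides describe Hive scripts on EMR; the sketch below uses plain Python instead, purely to show the parsing logic. The log layout, the `varnish` marker used for filtering, and the sample line are all assumptions and do not come from the SkillPages system.

```python
import re

# Hypothetical log layout: ip, [timestamp], "METHOD path", status, "cookie string".
LINE_RE = re.compile(
    r'(?P<ip>\S+) \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)" '
    r'(?P<status>\d{3}) "(?P<cookies>[^"]*)"'
)


def parse_cookies(cookie_string: str) -> dict:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    pairs = (item.split("=", 1) for item in cookie_string.split(";") if "=" in item)
    return {name.strip(): value.strip() for name, value in pairs}


def parse_line(line: str):
    """Return a flat, searchable record, or None for surplus or unparseable lines."""
    if "varnish" in line.lower():  # assumed marker for internal Varnish logging
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    cookies = parse_cookies(record.pop("cookies"))
    record["session"] = cookies.get("Session")
    record["user_id"] = cookies.get("UserId")
    return record


sample = (
    '203.0.113.7 [15/Jan/2014:10:00:00 +0000] "GET /skills/42" 200 '
    '"Session=abc123; UserId=77"'
)
record = parse_line(sample)
```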

Amazon S3

Aggregated Data

Raw Events

Internal Web

Excel Tableau

Amazon Redshift

We found that Amazon Redshift offers the performance we needed while freeing us from the licensing costs of our previous solution. With Amazon Redshift and Tableau, anyone in the company can set up any queries they like, from how users are reacting to a feature, to growth by demographic or geography, to the impact sales efforts have had in different areas. It’s very flexible.

Jon Hoffman, Software Engineer, Foursquare

[Charts: distribution by Gender (Female, Male) and by Age (0 to 80)]

Foursquare

Gorilla Coffee

Gray's Papaya

Amorino

When do people go to a place?
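A question like this reduces to grouping check-in events by time of day. The sketch below is a minimal illustration with made-up timestamps; it is not Foursquare's pipeline, which the slides describe as Hive, Ruby, and Mahout jobs on Elastic MapReduce. The function name `checkins_by_hour` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def checkins_by_hour(timestamps):
    """Count check-ins per hour of day from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


# Made-up sample data for one venue.
sample_checkins = [
    "2014-01-15T08:05:00",
    "2014-01-15T08:40:00",
    "2014-01-15T13:10:00",
]
histogram = checkins_by_hour(sample_checkins)
```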

Stack – analysis and sharing

Application Stack

Scala/Liftweb API Machines WWW Machines Batch Jobs

Scala Application code

Mongo/Postgres/Flat Files Databases Logs

Data Stack

Amazon S3 Database Dumps Log Files

Hadoop Elastic Map Reduce

Hive/Ruby/Mahout Analytics Dashboard Map Reduce Jobs

mongoexport postgres dump Flume

Everything that was a limited resource

is now a programmable resource

Resources

•  Hadoop Technology and Use Cases: http://www.powerof60.com/

•  http://aws.amazon.com/de

•  Start with the Free Tier: http://aws.amazon.com/de/free/

•  25 US$ credits for new German customers: http://aws.amazon.com/de/campaigns/account/

•  Twitter: @AWS_Aktuell

•  Facebook: http://www.facebook.com/awsaktuell

•  Webinars: http://aws.amazon.com/de/about-aws/events/
