aws webcast - tableau big data solution showcase

From weeks to hours: how Tableau and AWS changed big data analytics AWS Big Data Solution Showcase The recording of this webinar is available here: https://connect.awswebcasts.com/p8hwp1gyvtd

Upload: amazon-web-services

Post on 18-Dec-2014


Category:

Technology



DESCRIPTION

Two years ago, if someone had claimed they could stand up a petabyte-scale data warehouse in under an hour and then have a non-technical business user querying it live 30 minutes later without knowing any SQL or coding language, they would have been laughed out of the room. These days, that’s called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have shifted the entire paradigm by which organizations not only store and access their data, but ultimately how they innovate with it. The fast, scalable, and inexpensive services that AWS provides for housing data, combined with Tableau’s flexible and user-friendly visual analytics solution, mean that within hours an organization can securely put the power of its massive data assets into the hands of its domain experts without expensive overhead or lengthy ramp-up time.

Attend this webinar to learn how Amazon Web Services and Tableau Software are leveraged together every day to:

• Empower visual ad-hoc data discovery against big data

• Revolutionize corporate reporting and dashboards

• Promote data-driven decision making at every level

The presentation will include:

• A live demonstration of AWS and Tableau working together

• A real customer case study focused on fraud detection and online video metrics

• Live Q&A and an opportunity to trial both solutions

TRANSCRIPT

Page 1: AWS Webcast - Tableau Big Data Solution Showcase

From weeks to hours: how Tableau and AWS changed big data analytics

AWS Big Data Solution Showcase

The recording of this webinar is available here:

https://connect.awswebcasts.com/p8hwp1gyvtd

Page 2: AWS Webcast - Tableau Big Data Solution Showcase

Introductions

• Paul Lilford

– Channel Director, Technology Partners, Tableau

• Dustin Smith

– Product Marketing Manager, Tableau

• Rahul Bhartia

– Ecosystem Solution Architect, AWS

Page 3: AWS Webcast - Tableau Big Data Solution Showcase

Agenda

Everything you need to be up and running AWS + Tableau

– AWS big data related services

– Tableau analytics on AWS

– Live demo

– Customer success story: Mixpo

– Q & A

Page 4: AWS Webcast - Tableau Big Data Solution Showcase

Big data & AWS

Technologies and techniques for working productively with data, at any scale.

Page 5: AWS Webcast - Tableau Big Data Solution Showcase

Big data and AWS

Big data

• Potentially massive datasets

• Iterative, experimental style of data manipulation and analysis

• Frequently not steady-state workload; peaks and valleys

• Hard to configure/manage the infrastructure

Cloud computing

• Massive, virtually unlimited capacity

• Iterative, experimental style of infrastructure deployment/usage

• Elasticity for highly variable workloads

• Managed services for data storage and analysis

Page 6: AWS Webcast - Tableau Big Data Solution Showcase

AWS Data Services

Data

• Variety: Structured, Unstructured, Text, Binary

• Volume: Gigabytes, Terabytes, Petabytes

• Velocity: Millisecond, Second, Minute, Hour, Day

Services

• Instance: EC2, EBS

• Relational: Redshift, RDS

• Hadoop: EMR

• NoSQL: DynamoDB

• Stream: Kinesis

• Storage: S3, Glacier

• Caching: ElastiCache

• Orchestrate: Data Pipeline

Page 7: AWS Webcast - Tableau Big Data Solution Showcase

Store anything

Object storage

Scalable

Designed for 99.999999999% durability

Amazon S3

Page 8: AWS Webcast - Tableau Big Data Solution Showcase

Real-time processing

High throughput; elastic

Easy to use

EMR, S3, Redshift, DynamoDB Integration

Amazon Kinesis

Page 9: AWS Webcast - Tableau Big Data Solution Showcase

NoSQL Database

Seamless scalability

Zero admin

Single digit millisecond latency

Amazon DynamoDB

Page 10: AWS Webcast - Tableau Big Data Solution Showcase

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon Redshift

Page 11: AWS Webcast - Tableau Big Data Solution Showcase

Hadoop/HDFS clusters

Hive, Pig, Impala, HBase

Easy to use; fully managed

On-demand and spot pricing

S3, DynamoDB, Redshift and Kinesis

Amazon Elastic MapReduce

Page 12: AWS Webcast - Tableau Big Data Solution Showcase

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

aws.amazon.com/solutions/case-studies/big-data

Page 13: AWS Webcast - Tableau Big Data Solution Showcase

Tableau & AWS

Page 14: AWS Webcast - Tableau Big Data Solution Showcase

• The Opportunity of the Cloud

Time to Implement

Total Cost of Ownership

Access. Anywhere. Anytime. Any Device.

Page 20: AWS Webcast - Tableau Big Data Solution Showcase

Amazon Web Services and Tableau together make seeing, exploring, analyzing, and reporting off of Big Data an achievable everyday task for the everyday person.

Page 21: AWS Webcast - Tableau Big Data Solution Showcase

Flexible: Transform all types of data into self-service analytics

Page 24: AWS Webcast - Tableau Big Data Solution Showcase

Amazon EMR, Amazon RDS, Amazon Redshift

Tableau Desktop

ODBC

Page 33: AWS Webcast - Tableau Big Data Solution Showcase

Amazon EMR, Amazon RDS, Amazon Redshift

Tableau Desktop

ODBC

Tableau Server

Page 38: AWS Webcast - Tableau Big Data Solution Showcase

Amazon EMR, Amazon RDS, Amazon Redshift

Tableau Desktop

ODBC

Tableau Server

virtual private cloud

Tableau Server

Page 39: AWS Webcast - Tableau Big Data Solution Showcase

Amazon EMR, Amazon RDS, Amazon Redshift

Tableau Desktop

ODBC

Tableau Server

virtual private cloud

Tableau Server

Tableau Online

Page 40: AWS Webcast - Tableau Big Data Solution Showcase

Demo: Tableau Desktop connected live to Amazon Redshift (Customer Behavior Metrics – credit application web session tracking)

Page 41: AWS Webcast - Tableau Big Data Solution Showcase

Customer Success Story

Page 42: AWS Webcast - Tableau Big Data Solution Showcase

Angie Moe, Director of Analytics, Mixpo

Page 44: AWS Webcast - Tableau Big Data Solution Showcase

Replays

# of Clicks

Page Location

Volume

Page 47: AWS Webcast - Tableau Big Data Solution Showcase

Mixpo Web Servers

PostgreSQL Database Environment

1 Billion Views Monthly

Page 48: AWS Webcast - Tableau Big Data Solution Showcase

Mixpo Web Servers

PostgreSQL Database Environment

1 Billion Views Monthly

Query Times: Hours/Days

Page 49: AWS Webcast - Tableau Big Data Solution Showcase

Mixpo Web Servers

PostgreSQL Database Environment

Amazon Redshift

Query Times (after): Minutes

Page 50: AWS Webcast - Tableau Big Data Solution Showcase

Results of Mixpo’s Redshift + Tableau Implementation

Existing Analytics Faster

Innovation: Fraud Detection

New Methodology

Page 51: AWS Webcast - Tableau Big Data Solution Showcase

Curious?

Page 52: AWS Webcast - Tableau Big Data Solution Showcase

How to Get Started

Tableau: www.tableausoftware.com/products/trial

Redshift: http://aws.amazon.com/redshift/getting-started/

Page 53: AWS Webcast - Tableau Big Data Solution Showcase

Additional Resources

Tableau + Amazon Redshift Solution Page: http://tabsoft.co/AWSRedshift

Tableau + Amazon Redshift Mixpo Case Study: http://tabsoft.co/AWSRedshiftMixpo

Tableau Getting Started Kit: http://tabsoft.co/20dayquickstart

Tableau + Redshift Test Drive: https://www.slalom.com/aws

Redshift FAQ Document: http://aws.amazon.com/redshift/faqs/

Page 54: AWS Webcast - Tableau Big Data Solution Showcase

Q&A Session Transcription

Question, Answer(s), Resource(s)

I've noticed that Tableau Extracts for detailed data take a long time to create (even using RDS). Any recommendations on how to reduce how long it takes to create the initial extract?

Tableau Data Extracts that take a long time to create can usually be traced back to one of two things: 1. A slow data environment, or 2. "Long Data" - a table that has quite a few columns (100+). If RDS is the data source, then it might be a number-of-columns issue. Try excluding any data columns you're not using in your analysis when you take a Tableau Data Extract. While setting up the extract, there is an option to hide unused columns, which keeps them out of the Tableau Data Extract.

* http://bensullins.com/leveraging-your-tableau-server-to-create-large-data-extracts/

Can we host Tableau Server locally within our internal network?

Tableau Server can absolutely be hosted internally within your organization's network and still take full advantage of hosted Amazon Web Services data environments like Redshift, EMR, and RDS. Depending on your organization's use case for sharing interactive analytics, you can also run more than one instance: some Tableau customers deploy one instance of Tableau Server to an internal network for internal reporting and/or staging, and host a second instance of Tableau Server on an EC2 instance in order to serve customers or partners with analytic reports and applications without having to open ports in their firewall.

* http://www.tableausoftware.com/learn/whitepapers/ensuring-high-availability

* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf

What challenges do you find for organizations adopting Tableau? Do you run into embedded structures that might be threatened by how it empowers non-technical end users?

Many organizations begin adopting both Tableau Desktop and Tableau Server from the business side, and after some time IT becomes involved to help manage and further support Tableau deployments. Oftentimes the IT group is very excited to help support Tableau adoption once they realize that it lets them focus on strategic projects as opposed to supporting analytic efforts (refreshing local data sources, reporting queue, etc.). Since Tableau supports a true self-service Business Intelligence model where business users can engage with data directly, IT is able to stay focused on platform health. When Tableau is combined with AWS solutions like Redshift, EMR, and RDS, the overhead for IT to manage data environments becomes even less. Hosting Tableau Server in AWS EC2 goes even further to help IT organizations manage the capabilities and costs of their overall platform.

* http://www.tableausoftware.com/drive

Where can I get more information about Tableau Server on VPC?

Tableau has a published quickstart guide on hosting Tableau Server in the AWS cloud leveraging a VPC. You can also refer to our walkthrough guide on our community forum page.

* http://downloads.tableausoftware.com/quickstart/feature-guides/aws.pdf

* http://community.tableausoftware.com/thread/135464

Is this HIPAA secure?

Tableau Answer: Tableau is used by many healthcare organizations in the United States that must meet HIPAA compliance. This is accomplished in several ways, all depending on the unique data environments and requirements of each institution. Please see the Tableau Forum thread where this is discussed by several of those healthcare institutions.

AWS Answer: Yes, Redshift is HIPAA compliant, and you can take advantage of features like built-in encryption to run HIPAA-compliant workloads on AWS.

*http://community.tableausoftware.com/message/194129

Page 55: AWS Webcast - Tableau Big Data Solution Showcase

Q&A Session Transcription (cont.)

Question, Answer(s), Resource(s)

Using Tableau 8, it was not possible to mix data from Oracle and SQL Server in one analysis. Is this still true?

Tableau has the ability to take query results from multiple data sources such as Redshift, SQL Server, Oracle, Salesforce, Splunk, and Hadoop (to name a few) and actually aggregate them on the fly. We call this process Data Blending, and it requires no SQL query writing since Tableau can dynamically detect like fields and use those as blending keys. This capability is incredibly powerful, especially when you need to quickly evaluate the value/veracity of data sources that you may want to add to an Amazon Redshift environment.

* http://www.tableausoftware.com/videos/data-integration
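The idea behind blending can be sketched in a few lines of Python. This is only a conceptual illustration, not Tableau's implementation: each source is aggregated separately, and the results are then matched on a shared field. The sample rows, the field names (`region`, `views`, `revenue`), and the helper functions are all hypothetical.

```python
from collections import defaultdict


def aggregate(rows, key, measure):
    """Sum `measure` per `key` within a single data source."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


def blend(primary, secondary):
    """Match secondary aggregates onto primary aggregates by shared key.

    Keys that exist only in the secondary source are dropped; keys missing
    from the secondary source get None.
    """
    return {key: (value, secondary.get(key)) for key, value in primary.items()}


# Hypothetical query results from two separate sources.
redshift_rows = [
    {"region": "West", "views": 120.0},
    {"region": "West", "views": 80.0},
    {"region": "East", "views": 50.0},
]
crm_rows = [
    {"region": "West", "revenue": 10.0},
    {"region": "North", "revenue": 7.0},
]

views = aggregate(redshift_rows, "region", "views")
revenue = aggregate(crm_rows, "region", "revenue")
blended = blend(views, revenue)
print(blended)
```

Because each source is aggregated before the match, neither source needs to be loaded into the other's database first.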

I'm using Tableau with Redshift with some billions of rows of aggregated data. The queries, especially when using joins, are tens of seconds or minutes -- which is just too much for explorative analysis (I'd want max 10 seconds per query). Are there easy ways to sample the data in Tableau?

Tableau doesn't have an automatic way to sample data from a connection. If performance is an issue with queries coming from a Redshift environment, I highly suggest exploring some of the tuning techniques listed in the joint Tableau and Amazon whitepaper.

*http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance
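One manual workaround, which was not covered in the webinar and is offered here only as an assumption-laden sketch, is to point Tableau at a sampled query instead of the full table. The Python helper below builds such a query using Redshift's `RANDOM()` function; the helper name and the table name `web_sessions` are hypothetical, and the resulting sample size is only approximately the requested fraction.

```python
def sampled_query(table, fraction):
    """Build a SQL string that keeps roughly `fraction` of a table's rows.

    Each row is kept when RANDOM() falls below `fraction`, so the sample
    differs from run to run.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Allow only simple identifiers such as schema.table_name.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError("unexpected characters in table name")
    return f"SELECT * FROM {table} WHERE RANDOM() < {fraction}"


# Hypothetical table name; roughly a 1% sample.
query = sampled_query("web_sessions", 0.01)
print(query)
```

Aggregates computed on such a sample need to be scaled or treated as estimates, so this suits exploration rather than final reporting.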

Can I build analytics in Tableau by connecting to an MDM source and big data information from AWS cloud services? How are the keys and joins resolved?

Tableau Answer: Tableau helps both business and IT groups jointly keep data safe and secure inside organizations. MDM solutions often play a role in how this is accomplished and often differ depending on the technology, approach, and goal of the organization itself.

Does any university teach Tableau?

Many universities have started incorporating Tableau into their academic programs for a variety of courses. In support of academic institutions using Tableau for learning environments, Tableau has started the "Tableau for Teaching" program, which allows any full-time student (elementary, high school, collegiate) as well as instructors at fully accredited institutions to use Tableau for free.

* http://www.tableausoftware.com/academic

Is it possible to get the number of CPUs on the Tableau Server that was handling 23 million rows?

Technical specification recommendations for Tableau Server implementations are readily available.

* http://www.tableausoftware.com/products/techspecs

Is this how it looks for an end user or is this the admin interface?

The majority of the demonstration during the webinar used Tableau Desktop, which would be considered the report author's view. Hosted Tableau Server views designed purely for interactive consumption do not offer the creation aspect seen in Tableau Desktop. Please see the accompanying link that shows a final Tableau Server example.

*https://demodepot.tableausoftware.com/views/SecuritiesTechnical/1#1

Page 56: AWS Webcast - Tableau Big Data Solution Showcase

Q&A Session Transcription (cont.)

Question, Answer(s), Resource(s)

Can you share a dashboard with another Tableau Desktop Professional user without creating an extract (by sharing the connection to Redshift)?

Tableau allows workbook files that do not require extracted data to be shared between Tableau Desktop users. The Tableau Workbook file (extension .twb) contains the analytics but no local data, just a memory of how to connect back up to Amazon Redshift.

*http://www.theinformationlab.co.uk/2013/12/02/tableau-file-types-and-extensions/
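Since a .twb file is an XML document, the stored connection details can be inspected with any XML parser. The sketch below uses a small hand-written sample rather than a real workbook; the element and attribute names (`connection`, `class`, `server`, `dbname`) and the server address are assumptions made for illustration and may not match every Tableau version.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a .twb file; a real workbook holds much more.
SAMPLE_TWB = """
<workbook>
  <datasources>
    <datasource name='sessions'>
      <connection class='redshift'
                  server='example.redshift.amazonaws.com'
                  dbname='analytics' />
    </datasource>
  </datasources>
</workbook>
"""


def list_connections(twb_xml):
    """Return the attributes of every <connection> element in the XML."""
    root = ET.fromstring(twb_xml.strip())
    return [dict(conn.attrib) for conn in root.iter("connection")]


connections = list_connections(SAMPLE_TWB)
for conn in connections:
    print(conn)
```

The sample contains connection metadata only and no rows of data, which is the property that lets a workbook be shared without an extract.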

How does the speed of querying a dataset on Redshift compare with querying a Tableau Data Extract data source on a Tableau server?

Performance of Tableau queries against Amazon Redshift as a data source vs. a Tableau Data Extract hosted on Tableau Server depends entirely on the type of data and the complexity of the query. From a scalability standpoint, Amazon Redshift may be the better choice for bigger datasets given its ability to elastically provision more compute power.

If you are building out a data model for Tableau dashboards, should you use vertical or horizontal data structures for your data marts?

Tableau works best with vertical data structures.
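As a rough illustration, assuming "vertical" means a long layout with one row per measurement and "horizontal" means a wide layout with one column per measurement, the reshaping looks like this in Python. The field names (`ad_id`, `replays`, `clicks`) and the helper are hypothetical.

```python
def to_vertical(rows, id_field, measure_fields):
    """Turn one wide row per entity into one long row per (entity, metric)."""
    vertical = []
    for row in rows:
        for measure in measure_fields:
            vertical.append(
                {id_field: row[id_field], "metric": measure, "value": row[measure]}
            )
    return vertical


# Horizontal (wide) layout: one column per metric.
wide = [
    {"ad_id": "A1", "replays": 3, "clicks": 12},
    {"ad_id": "A2", "replays": 0, "clicks": 5},
]

# Vertical (long) layout: one row per ad and metric.
tall = to_vertical(wide, "ad_id", ["replays", "clicks"])
for row in tall:
    print(row)
```

In the long layout a new metric becomes additional rows rather than a new column, so the table schema stays unchanged as metrics are added.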

How does the Mac version of Tableau connect to Redshift?

Tableau Desktop Professional for the Mac leverages the same ODBC-based connection approach for working with Amazon Redshift as Tableau Desktop for Windows does.

You can find the drivers here: http://www.tableausoftware.com/support/drivers

Do you have a testing version of Tableau, with some testing datasets, that would allow one to practice designing dashboards and day-to-day analytics, please? :)

For anyone interested in using Tableau to experiment with building visual analytics and leveraging Amazon Redshift, I highly recommend trying the AWS test drive page set up by Slalom Consulting.

* https://www.slalom.com/aws

We have a non-performing platform hosted locally with slow response times when you interact in Tableau. Would simply putting the Tableau extract on Redshift result in a boost in performance?

Tableau Answer: If the data environment your organization is using internally is slow or not set up for analytics, I would recommend looking into Amazon Redshift or RDS. Neither of these options would even require you to take a Tableau Data Extract. Tableau customer Mixpo had almost exactly this scenario and saw tremendous results leveraging Redshift.

*http://www.tableausoftware.com/learn/webinars/explore-big-data-analytics-amazon-redshift

Can Tableau Server be clustered for HA?

Tableau Answer: Tableau Server can absolutely be clustered to ensure an HA (highly available) environment. There are no restrictions on the Tableau side, but there are cursor limitations on the Redshift side. Please refer to the whitepaper for more details.

*http://www.tableausoftware.com/sites/default/files/whitepapers/high_availablility_reduced_downtime.pdf

What are the challenges one can encounter while working with Tableau on Redshift? How complete is the integration of Tableau and Redshift? For example, will all the analytical functions that Tableau generates in its SQL be available in Redshift?

Tableau Answer: Every organization's data and analytical requirements are unique. Knowing how to tune performance in both Tableau and Redshift is very helpful and is covered in the joint Tableau and Amazon Redshift whitepaper.

*http://www.tableausoftware.com/learn/whitepapers/tuning-your-amazon-redshift-and-tableau-software-deployment-better-performance

Page 57: AWS Webcast - Tableau Big Data Solution Showcase

THANK YOU