BI/Analytics for NoSQL: Review of Architectures

Upload: dataversity

Post on 27-May-2015


DESCRIPTION

NoSQL is great for running your apps: it is flexible and scalable. Traditional SQL-centric BI tools, however, are challenging if not impossible to use with data in NoSQL systems. We will cover existing implementations and the broad set of architectures organizations use for "doing BI" on top of NoSQL systems. We will discuss the challenges, strengths, and war stories of these architectures, along with practical advice for those who are adopting or building out these solutions. In particular, we hope to help attendees answer the following questions:

• How do I enable AdHoc, self-service reporting on data in my NoSQL system?
• My NoSQL system is massively scalable, but my users complain that reports are slow. How do I improve report performance on my NoSQL data?
• How do I integrate my NoSQL data with my existing Data Warehouse or BI systems?
• Some of my data is in traditional RDBMSes. How do I build BI based on NoSQL plus additional outside data?
• How do I do simple reporting on NoSQL data, but also do the rich, complex analytics that only my NoSQL system allows (graph analytics, social media analytics, etc.)?

TRANSCRIPT

Page 1: BI/Analytics on NoSQL: Review of Architectures

BI/Analytics for NoSQL: Review of Architectures

Page 2: BI/Analytics on NoSQL: Review of Architectures

What we'll answer in 50 minutes

• Who is this guy?
• How do I enable AdHoc, self-service reporting on NoSQL?
• How do I improve the performance of dashboards on top of NoSQL?
• How do I integrate NoSQL data with my other data not inside NoSQL?
• How do I enable easy-to-build simple reports but also preserve the ability to run rich NoSQL queries?

Page 3: BI/Analytics on NoSQL: Review of Architectures

Nicholas Goodman

• Open Source BI thought leader
  – 50+ Open Source BI customer projects
  – Blogger, whitepapers, etc.
• Entrepreneur
  – DynamoBI Corporation
  – Bayon Technologies, Inc.
• Data Geek, hacker, tinkerer, committer

GOAL: Share perspectives, research, opinions.
DISCLAIMER: Your Mileage ...

Page 4: BI/Analytics on NoSQL: Review of Architectures

How do we answer those Q's?

Page 5: BI/Analytics on NoSQL: Review of Architectures

Promise of “Big Data”

• NoSQL/Hadoop/MapReduce Systems
  – Keep more of it
  – Cost effective analysis
  – “Massive scale” data, now accessible to everyone (elastic)
  – Not just SQL queries, more complex analysis

ACCOMPLISHED: WEB SCALE, MASSIVE, NEVER-BEFORE-SEEN SCALE OF DATA STORAGE AND PROCESSING

Page 6: BI/Analytics on NoSQL: Review of Architectures

Reality Check!

• Petabytes? Y
• Cheap Storage? Y
• Raw Processing? Y
• Rich Query Languages? Y
• Flexible data structures? Y
• Reliable, Fault Tolerant? Y

• Fast Queries? N
• Ad Hoc access? N
• Accessibility to commodity BI tools? N
• Easy report authoring? N
• Levels of Aggregation? N
• Integrated Data? N

Big Data has solved the INFRASTRUCTURE of raw/core data storage but has provided less value for what BUSINESS users want from analytics.

Page 7: BI/Analytics on NoSQL: Review of Architectures

Data Gaps too!

• Code, Developers
• MR, Rich Graph/Access
• Hierarchical, Unstructured

• Analysts w/ Excel, Dashboards
• Simple 2D (tables, charts)
• Filtering and easy analytics

Page 8: BI/Analytics on NoSQL: Review of Architectures

Levels of Aggregation

[Figure: the same data shown at several row counts (100 BILLION, 100 MILLION, 1 MILLION, 10K), labeled "1 ROW TO 1 BILLION ROWS"]

SAME DATA AT VARIOUS LEVELS OF AGGREGATION IS HUGELY IMPORTANT IN REAL-LIFE IMPLEMENTATIONS!
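
To make the idea of aggregation levels concrete, here is a minimal Python sketch that rolls the same raw rows up to coarser and coarser summaries. The event layout, the sample values, and the `rollup` helper are all assumptions made for illustration; none of them come from the slides.

```python
from collections import defaultdict

# Hypothetical raw events: (day, country, page, views).
# A real system would hold billions of these rows.
RAW_EVENTS = [
    ("2011-03-01", "US", "/home", 3),
    ("2011-03-01", "US", "/docs", 2),
    ("2011-03-01", "DE", "/home", 1),
    ("2011-03-02", "US", "/home", 4),
    ("2011-03-02", "DE", "/docs", 5),
]


def rollup(rows, key_indexes):
    """Sum the last column of each row, grouped by the columns in key_indexes."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]
    return dict(totals)


# The same data at three levels of aggregation, from finer to coarser.
by_day_country = rollup(RAW_EVENTS, (0, 1))
by_day = rollup(RAW_EVENTS, (0,))
grand_total = rollup(RAW_EVENTS, ())
```

Each coarser level has fewer rows than the one before it, which is why a dashboard reading `by_day` can stay fast even when the raw event set is very large.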

Page 9: BI/Analytics on NoSQL: Review of Architectures

Architectures

• NoSQL reports
• NoSQL thru and thru
• NoSQL + MySQL
• NoSQL as ETL Source
• NoSQL programs in BI Tools
• NoSQL via BI Database (SQL)

Page 10: BI/Analytics on NoSQL: Review of Architectures

NoSQL reports

• Pay Developer to build applications for reports

[Diagram: Apps]

Advantages:
• 100% Richness of NoSQL
• Up to date, current
• Excellent performance on large datasets
• Custom built, beautiful reports/dashboards
• Single system to manage

Drawbacks:
• $$, developer driven process
• No commodity BI tools
• Managing rollups/summaries
• Schema-less = Harder!
• Hard to integrate other reporting information

Page 11: BI/Analytics on NoSQL: Review of Architectures

NoSQL thru and thru

• Pay Developer to build FLEXIBLE applications for reports

[Diagram: Advanced Apps, Indices, Aggs]

Advantages:
• All of NoSQL report advantages
• Managed aggregations, rollups
• “Guided Adhoc” available inside application
• Higher performance for dashboards/summaries

Drawbacks:
• $$, developer driven process
• $$, app required for aggs
• No commodity BI tools
• Hard to integrate other reporting information
• Limited AdHoc (only developer built combinations)
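
As a rough illustration of "managed aggregations" and why AdHoc stays limited to developer-built combinations, the following Python sketch keeps one predefined rollup up to date on every write. The `RollupStore` class, its field names, and the sample documents are hypothetical; they are not the API of any system named in this deck.

```python
from collections import Counter


class RollupStore:
    """Toy document store that maintains one predefined aggregate on write."""

    def __init__(self):
        self.docs = []
        # The only rollup the (hypothetical) developer chose to maintain.
        self.by_product = Counter()

    def insert(self, doc):
        self.docs.append(doc)
        self.by_product[doc["product"]] += 1  # aggregate updated at write time

    def count_by(self, field):
        if field == "product":
            # Served from the maintained rollup: cheap, dashboard friendly.
            return dict(self.by_product)
        # Not pre-aggregated: falls back to a full scan of every document.
        return dict(Counter(doc.get(field) for doc in self.docs))


store = RollupStore()
store.insert({"product": "Firefox", "os": "Windows"})
store.insert({"product": "Firefox", "os": "Linux"})
store.insert({"product": "Thunderbird", "os": "Windows"})

fast = store.count_by("product")  # answered from the rollup
slow = store.count_by("os")       # answered by scanning all documents
```

Any grouping that was not anticipated when the application was built takes the slow path, which is the "only developer built combinations" limitation in practice.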

Page 12: BI/Analytics on NoSQL: Review of Architectures

NoSQL + MySQL

• Pay Developer to build FLEXIBLE applications for reports

[Diagram: App, ETL, MySQL]

Advantages:
• Less IT $$ since developers aren't “building reports”
• Rich, NoSQL analysis left in place (ETL + NoSQL)
• Easy, Ad Hoc reporting via commodity BI tools
• Easier to understand data for self service reports

Drawbacks:
• Data freshness (24 hrs old)
• Once into MySQL no rich NoSQL application use (M/R)
• BI Tool can connect ONLY to data in MySQL, not NoSQL
• Aggregations still self managed in MySQL
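
A minimal sketch of this pattern in Python follows, under several stated assumptions: the standard-library `sqlite3` module stands in for MySQL so the example runs anywhere, a list of dictionaries stands in for the NoSQL store, and the `daily_events` table and document fields are invented for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical semi-structured documents, as they might sit in a NoSQL store.
DOCS = [
    {"user": "a", "day": "2011-03-01", "events": [{"type": "click"}, {"type": "view"}]},
    {"user": "b", "day": "2011-03-01", "events": [{"type": "view"}]},
    {"user": "a", "day": "2011-03-02", "events": [{"type": "click"}]},
]


def summarize(docs):
    """The 'heavy lifting' step: collapse nested events into daily counts."""
    counts = defaultdict(int)
    for doc in docs:
        for event in doc["events"]:
            counts[(doc["day"], event["type"])] += 1
    return [(day, etype, n) for (day, etype), n in sorted(counts.items())]


def load(conn, rows):
    """Push the summarized rows into a relational table for BI tools."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_events ("
        "day TEXT, event_type TEXT, n INTEGER, PRIMARY KEY (day, event_type))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
load(conn, summarize(DOCS))

# What a commodity BI tool would then issue: plain SQL over flat rows.
report = conn.execute(
    "SELECT event_type, SUM(n) FROM daily_events "
    "GROUP BY event_type ORDER BY event_type"
).fetchall()
```

Only the summarized rows reach the SQL side, so the report is easy to write but is only as fresh as the last load and cannot reach back into the original documents.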

Page 13: BI/Analytics on NoSQL: Review of Architectures

NoSQL as ETL Data Source

• NoSQL treated like any other data source

[Diagram: Informatica, Teradata]

Advantages:
• Allows use of consolidated BI tool for AdHoc
• Enables integrated (combined) datasets for reporting
• Aggregations often “managed”
• Best of Breed tools

Drawbacks:
• ETL Development Expense
• Data Latency
• Loss of NoSQL language richness
• Traditional DW tools are $$
• Scaling issues with DW Database

Page 14: BI/Analytics on NoSQL: Review of Architectures

NoSQL programs in BI Tools

• Write a program in BI tool that flattens data, output into report

Advantages:
• Rich use of NoSQL native language
• Direct, up to date access
• Access to 100% of dataset
• Leverage “guided” report parameter pages
• Less expensive than apps

Drawbacks:
• Developer required to write program ($$)
• Slow-er (aggs, summaries)
• Lacks integration with other datasets
• Still (usually) no AdHoc access
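
The "program that flattens data" can be sketched in a few lines of Python. This is an illustration only: the `flatten` and `to_rows` helpers, the dotted column names, and the sample documents are assumptions, and a real BI tool would supply its own scripting hook for this step.

```python
def flatten(doc, parent_key="", sep="."):
    """Turn nested dictionaries into one flat dict with dotted column names."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def to_rows(docs, columns):
    """Project each document onto a fixed column list; missing fields become None."""
    rows = []
    for doc in docs:
        flat = flatten(doc)
        rows.append([flat.get(column) for column in columns])
    return rows


# Hypothetical hierarchical documents with an inconsistent shape.
DOCS = [
    {"id": 1, "user": {"name": "ann", "geo": {"country": "US"}}},
    {"id": 2, "user": {"name": "bob"}},
]

table = to_rows(DOCS, ["id", "user.name", "user.geo.country"])
```

The fixed column list is where the developer's judgment enters: someone has to decide which parts of a schema-less document become report columns, which is why this approach still needs a developer.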

Page 15: BI/Analytics on NoSQL: Review of Architectures

NoSQL via BI Database (SQL)

• Enable NoSQL data access via SQL (gasp!)

[Diagram: Live Query; Cached, 24hr data]

Advantages:
• Easy reports, easy (SQL)
• Integration with other data
• ETL is simple INSERT/MERGEs
• Live, up to date access
• High performance, cached data
• AdHoc access to Live + Cached
• Aggregations/Summaries

Drawbacks:
• Another system in between
• Still needs to be refreshed, nightly
• Not all capabilities for NoSQL richness available via SQL
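
One way to picture "AdHoc access to Live + Cached" is the Python sketch below. It is a toy under stated assumptions: `sqlite3` plays the BI database, a Python list plays the NoSQL system, and the table names, the `id`-based freshness check, and the sample orders are all hypothetical rather than a description of any particular product.

```python
import sqlite3

# Stand-in for the NoSQL system: a growing list of order documents.
store = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 5.0},
]


def as_row(doc):
    return (doc["id"], doc["region"], doc["amount"])


def refresh_cache(conn, docs):
    """The nightly refresh: copy everything currently in the store into SQL."""
    conn.execute("DROP TABLE IF EXISTS orders_cached")
    conn.execute(
        "CREATE TABLE orders_cached (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders_cached VALUES (?, ?, ?)", map(as_row, docs))
    conn.commit()


def sales_by_region(conn, docs):
    """Answer one SQL query from the cached rows plus anything newer in the store."""
    last_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM orders_cached").fetchone()[0]
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS orders_live (id INTEGER, region TEXT, amount REAL)"
    )
    conn.execute("DELETE FROM orders_live")
    # The 'live query': fetch only documents newer than the cache.
    live = [as_row(d) for d in docs if d["id"] > last_id]
    conn.executemany("INSERT INTO orders_live VALUES (?, ?, ?)", live)
    return conn.execute(
        "SELECT region, SUM(amount) FROM "
        "(SELECT * FROM orders_cached UNION ALL SELECT * FROM orders_live) "
        "GROUP BY region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
refresh_cache(conn, store)                                  # last night's refresh
store.append({"id": 3, "region": "east", "amount": 2.5})    # arrives after the refresh
report = sales_by_region(conn, store)
```

The report author only ever writes SQL; whether a row came from the cache or from the live fetch is hidden behind the intermediate layer, which is also the "another system in between" cost.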

Page 16: BI/Analytics on NoSQL: Review of Architectures

Mozilla: NoSQL thru and thru (DB)

• Socorro Project: Crash reports, optionally sent to Mozilla
• https://crash-stats.mozilla.com

Page 17: BI/Analytics on NoSQL: Review of Architectures

X: NoSQL via SQL

• Using “Splunk” (i.e., a commercial NoSQL-eee data aggregator/etc)
• Desire to use Tableau for advanced analytics/visualization

Page 18: BI/Analytics on NoSQL: Review of Architectures

Meteor Solutions: NoSQL thru and thru

• Using Cloudant BigCouch solution (SaaS)
• High performance set of multi purpose indices on pre defined aggregations
• Up to date aggregation/reports
• Better fit for Social Media graph structures over relational DB
• Custom built BI applications (dashboards/reports) providing a flexible guided view through data

[Diagram: Advanced Apps]

Page 19: BI/Analytics on NoSQL: Review of Architectures

A, B, C: NoSQL + MySQL

[Diagram: App, ETL, MySQL]

• Many, many companies (3 we've worked with)
• All “web related” companies (semi structured, some, mostly volume)
• Heavy lifting and storage, and “ETL/Data preparation” inside Hadoop
• Push summarized, aggregated data into MySQL for analysis by easy dashboarding/BI tools