webinar: introducing the mongodb connector for bi 2.0 with tableau
TRANSCRIPT
Introducing the MongoDB Connector for BI 2.0 with Tableau
Buzz Moschetti, Enterprise Architect
[email protected] @buzzmoschetti
Vaidy Krishnan, Senior Product Marketing Manager
Agenda
• Introduction to MongoDB
• What is The BI Connector?
• Analytics with Tableau on MongoDB
• Demo
• Best Practices
MongoDB: The Post-Relational General Purpose Database
Document Data Model
Open-Source
Fully Featured
High Performance
Scalable
{
  name: "John Smith",
  pfxs: ["Dr.", "Mr."],
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138
  }
}
MongoDB Company Overview
600+ employees
2500+ customers
Over $311 million in funding
Offices in NY & Palo Alto and across EMEA and APAC
db-engines.com Ranks ~300 Databases
Nexus Architecture
Scalability & Performance
Always On, Global Deployments
Flexibility
Expressive Query Language & Secondary Indexes
Strong Consistency
Enterprise Management & Integrations
• Deep Dive
Major Sweet Spots
Big Data
Product & Asset Catalogs
Security & Fraud
Internet of Things
Database-as-a-Service
Mobile Apps
Customer Data Management
Single View
Social & Collaboration
Content Management
Intelligence Agencies
Top Investment and Retail Banks
Top Global Shipping Company
Top Industrial Equipment Manufacturer
Top Media Company
Top Investment and Retail Banks
Complex Data Management
Top Investment and Retail Banks
Embedded/ISV
Cushman & Wakefield
Agenda
• Introduction To MongoDB
• What is The BI Connector?
• Analytics with Tableau on MongoDB
• Demo
• Best Practices
MongoDB Query Language is Powerful
> db.results.values.aggregate([
  {$match: { runnum:23, timeSeriesPath: "CDSSpread.12M//1909468128" },
  {$project: { timeSeriesPath: "$timeSeriesPath", values: foml }},
  {$unwind: {path: "$values", idx: "v_idx"}},
  {$match: {values: {$gt: 60}, {$or: [ {idx: 0}, {idx: {$size: . . .},
  {$group: {_id: {a: "$timeSeriesPath", b: term: "$idx"},
            n: {$sum:1}, max: {$max: "$values"}, min: {$min: "$values"}},
            sdev: {$stdDevPop: "$values"}}
  ,{$lookup: { from: "deskLimits", localField: "instID", foreignField: "instID", as: "inst"}},
  {$match: {maxDeskLimit: {$gt: {$cond: [ {$gt: [2, $max]}, 2, $max]}}}},
  {$group: {_id: "$deskID", total: {$sum: "$max"}}}
]);
Able To Leap Tall Buildings in a Single Bound!
> db.foo.insert({_id:1, "poly": [ [0,0], [2,12], [4,0], [2,5], [0,0] ] });
> db.foo.insert({_id:2, "poly": [ [2,2], [5,8], [6,0], [3,1], [2,2] ] });
> db.foo.aggregate([
  // Convert each [x, y] pair into {x, y, len: 0}
  {$project: {"conv": {$map: { input: "$poly", as: "z", in: {
      x: {$arrayElemAt: ["$$z", 0]}, y: {$arrayElemAt: ["$$z", 1]},
      len: {$literal: 0} }}}}}
  ,{$addFields: {first: {$arrayElemAt: [ "$conv", 0 ]} }}
  // Walk the points, accumulating the distance from the previous point
  ,{$project: {"qqq":
      {$reduce: { input: "$conv", initialValue: "$first", in: {
          x: "$$this.x", y: "$$this.y",
          len: {$add: ["$$value.len", // len = oldlen + newLen
              {$sqrt: {$add: [
                  {$pow: [ {$subtract: ["$$value.x", "$$this.x"]}, 2 ]},
                  {$pow: [ {$subtract: ["$$value.y", "$$this.y"]}, 2 ]}
              ]}}
          ]}
      }}}
  }}
  ,{$project: {"len": "$qqq.len"}}
]);
{ "_id" : 1, "len" : 35.10137973546188 }
{ "_id" : 2, "len" : 19.346952903339393 }
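As a cross-check on the pipeline above, the same perimeter calculation can be sketched in plain JavaScript with no database involved. The function name `perimeter` and the `polys` object are illustrative and not part of the webinar; the coordinates are the two documents inserted above.

```javascript
// Sum the Euclidean distance between consecutive points of a closed ring
// (the last point repeats the first, as in the "poly" arrays above).
function perimeter(poly) {
  let len = 0;
  for (let i = 1; i < poly.length; i++) {
    const dx = poly[i][0] - poly[i - 1][0];
    const dy = poly[i][1] - poly[i - 1][1];
    len += Math.sqrt(dx * dx + dy * dy); // len = oldLen + newLen
  }
  return len;
}

// Same coordinates as the two documents inserted into db.foo.
const polys = {
  1: [[0, 0], [2, 12], [4, 0], [2, 5], [0, 0]],
  2: [[2, 2], [5, 8], [6, 0], [3, 1], [2, 2]],
};

for (const [id, poly] of Object.entries(polys)) {
  console.log(id, perimeter(poly).toFixed(4)); // 35.1014 and 19.3470
}
```

The results should agree with the `len` values returned by the aggregation to within floating-point rounding.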
… But It Doesn’t Natively Speak SQL
> db.restaurants.sql("select * from restaurants where cuisine = 'Peruvian'");
2017-01-12T14:57:23.930-0500 E QUERY [main] TypeError: db.restaurants.sql is not a function
The MongoDB BI Connector: A “SQL Bridge”
[Diagram: MongoDB, MongoDB BI Connector, Anything That Speaks MySQL]
select A.fn, A.LN, P.prodType, T.amt, T.td
from tx T
JOIN product P on T.product = P.prod
JOIN acct A on T.acct = A.acct
where A.acct in ('A5', 'A10')
and T.td = '2015-03-01 00:00:00'
and P.prodType = 'CAR'
db.tx.aggregate([
  {$match: {td: ISODate("2015-03-01 00:00:00")},
  {$lookup: {from: "acct", localField: "acct" …
  {$match: {acct: {$in: ["A5", "A10"]}},
  {$lookup: {from: "product", localField: "prod"
  {$match: {prodType: "CAR"}}
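To make the bridge idea concrete, here is a minimal in-memory sketch of what the SQL above asks for, written as a sequence of filter and lookup steps in the same order as the pipeline on the slide. It is not the BI Connector's actual translation logic. The sample rows, the names in them, and the function `bridgeQuery` are hypothetical, and the lookups assume each `acct` and `prod` key is unique.

```javascript
// Hypothetical sample rows; field names follow the SQL on the slide.
const tx = [
  { acct: "A5",  product: "P1", amt: 100, td: "2015-03-01 00:00:00" },
  { acct: "A7",  product: "P1", amt: 250, td: "2015-03-01 00:00:00" },
  { acct: "A10", product: "P2", amt: 75,  td: "2015-03-01 00:00:00" },
  { acct: "A10", product: "P1", amt: 300, td: "2015-03-02 00:00:00" },
];
const acct = [
  { acct: "A5",  fn: "Ann", LN: "Lee" },
  { acct: "A7",  fn: "Bob", LN: "Ray" },
  { acct: "A10", fn: "Cy",  LN: "Dow" },
];
const product = [
  { prod: "P1", prodType: "CAR" },
  { prod: "P2", prodType: "BOAT" },
];

function bridgeQuery(tx, acct, product) {
  return tx
    .filter(t => t.td === "2015-03-01 00:00:00")                     // $match on td
    .map(t => ({ ...t, A: acct.find(a => a.acct === t.acct) }))      // $lookup into acct
    .filter(t => t.A && ["A5", "A10"].includes(t.A.acct))            // $match acct in (A5, A10)
    .map(t => ({ ...t, P: product.find(p => p.prod === t.product) })) // $lookup into product
    .filter(t => t.P && t.P.prodType === "CAR")                      // $match prodType
    .map(t => ({ fn: t.A.fn, LN: t.A.LN, prodType: t.P.prodType, amt: t.amt, td: t.td }));
}

const rows = bridgeQuery(tx, acct, product);
console.log(rows); // one row: Ann Lee, CAR, 100
```

Filtering on `td` first keeps the later lookups small, which mirrors the stage order shown on the slide.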
Authentication & Entitlements are ALSO Bridged
[Diagram: MongoDB, MongoDB BI Connector]
biUser?mechanism=MONGODB-CR,source=authDB
password=*******
client = connect(biUser, *******);
A Mapping File is The Key Ingredient
schema:
- db: food
  tables:
  - table: restaurants
    collection: restaurants
    columns:
    - Name: _id
      MongoType: bson.ObjectId
      SqlName: _id
      SqlType: varchar
    - Name: address.building
      MongoType: string
      SqlName: address.building
      SqlType: varchar
[Diagram: MongoDB, MongoDB BI Connector]
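The mapping above flattens a nested field such as `address.building` into a single dotted column name. A rough sketch of that flattening idea follows. It is an illustration only, not how the mapping tooling is implemented: the function `toColumns` and the sample document are hypothetical, and only string fields and nested documents are handled, because those are the only cases whose type names appear on the slide.

```javascript
// Walk a document and emit one column entry per string field,
// using dotted names for fields inside nested documents.
function toColumns(doc, prefix = "") {
  const cols = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      cols.push(...toColumns(value, name)); // recurse into the nested document
    } else if (typeof value === "string") {
      cols.push({ Name: name, MongoType: "string", SqlName: name, SqlType: "varchar" });
    }
    // Arrays and other types are skipped in this sketch.
  }
  return cols;
}

// Hypothetical document shaped like an entry in the restaurants collection.
const sampleDoc = {
  name: "Example Cafe",
  address: { building: "10", street: "3rd St." },
  grades: ["A", "B"],
};

const columns = toColumns(sampleDoc);
console.log(columns.map(c => c.Name)); // [ 'name', 'address.building', 'address.street' ]
```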
Mapping Generator to Get You Started
[Diagram: MongoDB, MongoDB BI Connector]
mongodrdl -d food -c restaurants -o food.drdl
mongosqld --schema=food.drdl
Agenda
• Introduction To MongoDB
• What is The BI Connector?
• Analytics with Tableau on MongoDB
• Demo
• Best Practices
Tableau’s Big Data Focus
Connectivity: Access to all data
Performance: Fast interaction with all data
Discovery: Finding the right data
Analytics for All your Data
Broad access to Big Data platforms
Visual analytics without coding
Platform query performance
Consistent visual interface
Hybrid data architecture
Big Data Connectivity Roadmap
[Timeline axis: 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, with a "Today" marker]
Tableau v6.1.4: Cloudera Hadoop
Tableau v7.0.10: Hortonworks Hadoop
Tableau v8.2.3: IBM BigInsights
Tableau v9.0: Spark SQL
Tableau v5.2: Pivotal Greenplum & HAWQ
Tableau v7.0.10: Cloudera Impala
Tableau v7.0.7: MapR Hadoop
Tableau v7.0.10: Datastax Enterprise & Cassandra
Tableau v8.1.4: Splunk
Tableau v8.0.1: Amazon Redshift
Tableau v8.2.3: MarkLogic
Tableau v8.3.2: Amazon EMR
Tableau v8.0: Google BigQuery
Cold, Warm, Hot Framework
• The Data Lake
• Store Everything and Anything
• Unknown Questions with Unknown Answers
• Unstructured / Data Mining / Data Science

• Data Warehouses
• Data marts prepared for entity analytics
• Known questions with unknown answers
• Regularly refreshed business concepts

• In-memory computing
• Precomputed aggregates to answer specific questions
• Known questions with known answers
• Dashboards

[Figure: data pyramid. Tiers: Aggregated data, Prepared data, Large data (raw or prepared). Axes: Data Size, Performance]
Cold, Warm, Hot Strategy
[Figure: data pyramid. Tiers: Aggregated data, Prepared data, Large data (raw or prepared). Axes: Data Size, Performance]
Cold, Warm, Hot Strategy with Optimized MongoDB
How do we see customers using Tableau on MongoDB?
• Use Case
  – Data Exploration/Mining
  – Ad-Hoc Report Conceptual Modeling
  – Query directly/Explore Concepts to Migrate to Analytically Optimized Data Stores
MongoDB – Verticals & Use Cases
• Financial Services: Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time.
• Government: Identify social program fraud within seconds based on program history, citizen profile, and geospatial data.
• High Tech: Identify unique individuals across any type of device, browser, or app and use a holistic behavioral model to advertise to them.
• Retail: Set up a digital geo-fence around your brick-and-mortar locations to push in-store incentives to shoppers in real time.
Agenda
• Introduction To MongoDB
• What is The BI Connector?
• Analytics With Tableau on MongoDB
• Demo
• Best Practices
Basic MongoDB Optimizations
✔ DO:
• Model for use
• Index effectively
• Use prejoined array tables
• Leverage custom pipelines in DRDL
• Let dates (SQL timestamp) and decimal types flow without conversion to string
✗ AVOID:
• Casts
• Date arithmetic
• Cross-collection
• Non-equijoins
• Subqueries
Tableau Data Extracts – When to use them?
Extracts Recommended
• Slow SQL to MQL translation
• Smaller dataset sizes needed
• Offline analysis required
• Reduce “big query” impact on nominal workload performance**
Live Connection Recommended
• Fast SQL to MQL translation
• Larger dataset sizes needed
• Real-time analysis required
Optimize your Tableau Data Extracts
• Extract Sampling Techniques
• Filters
  • Keep only well-known dimensions and measures
  • Use short date ranges
• Aggregates
  • Aggregate dimensions and measures when possible
  • Roll-up dates when possible
• Samples
  • Utilize Custom SQL with sample function
• Top N
  • May be skewed since non-random sampling
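The warning that Top N extracts may be skewed can be seen with a small numeric sketch. The data set (amounts 1 to 1000), the sample size of 100, and the seed are all arbitrary assumptions for illustration. `mulberry32` is a small, widely circulated seeded generator, used here only so the run is repeatable; it stands in for whatever sample function the data source offers.

```javascript
// Small seeded pseudo-random generator so the sketch is repeatable.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

const mean = xs => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Hypothetical measure: amounts 1..1000.
const amounts = Array.from({ length: 1000 }, (_, i) => i + 1);

// Top N: the 100 largest amounts (non-random, so the mean is biased upward).
const topN = [...amounts].sort((a, b) => b - a).slice(0, 100);

// Random sample: 100 amounts drawn uniformly, with replacement.
const rand = mulberry32(42);
const sample = Array.from({ length: 100 },
  () => amounts[Math.floor(rand() * amounts.length)]);

console.log("full mean:  ", mean(amounts)); // 500.5
console.log("top-N mean: ", mean(topN));    // 950.5
console.log("sample mean:", mean(sample));  // close to the full mean
```

The Top N extract reports a mean almost twice the true value, while the random sample lands near it, which is why Top N suits "largest items" questions rather than summaries of the whole data set.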
General Techniques for Improvement
Partition field as filter
Single denormalized table
Monitor for long running queries
• Data blending large datasets
  – Executed on the Tableau client side
• Cull unnecessary joins
  – …and take advantage of prejoined tables in the BI Connector
  – Imperfectly implemented on many big data systems
  – Assume referential integrity
• Inefficient formulas
MongoDB
Leverage a multi-tiered approach based on your data
[Figure: data pyramid. Tiers: Aggregated data, Prepared data, Raw data (large). Labels: TDE+, Fast analytical database, MongoDB]
Human Scale of Data
• Chunks of Human Consumable Data
• Aggregation of Data Tiers
  • Year to Quarter to Month to Week to Day to Records
  • Region to Country to State to County to Zip Code
• Drill Down to Raw Data with Context
• Use Aggregates for Guided Drilling
• Use Action Filters to Navigate the Pyramid
[Figure: Single Consumable Chunk of Data at the Human Scale (Dashboard). Aggregation Level: Year (4), Month (48), Week (105), Day (90), Raw Data]
[Figure labels: Filter Year, Filter Month, Filter Week, Filter Day; Select Week, Select Month, Select Dimension; In the Weeds]
Action Filters: Big Data Secret Weapon
• Use Action Filters to Jump from Tier to Tier with a filter context
• Drill Down to the Details
• Leave the Data in the Appropriate Data Architecture
  • Hot - Analytical Query
  • Warm - Entity Query
  • Cold - Data Discovery
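The tier-to-tier navigation described here can be sketched as two small operations: roll up the records to a coarse tier, then carry a filter context down to the next tier. This is only an illustration of the idea, not Tableau's action filter mechanism. The records and the helpers `rollUp` and `drill` are hypothetical, and the tiers rely on ISO date strings so that a prefix identifies a year, month, or day.

```javascript
// Hypothetical raw records: one per event, with an ISO date and an amount.
const records = [
  { date: "2015-01-15", amt: 10 },
  { date: "2015-01-20", amt: 5 },
  { date: "2015-02-03", amt: 7 },
  { date: "2016-02-11", amt: 20 },
  { date: "2016-03-09", amt: 1 },
];

// Aggregate amounts by a date prefix: 4 chars = year, 7 = month, 10 = day.
function rollUp(rows, keyLength) {
  const totals = {};
  for (const r of rows) {
    const key = r.date.slice(0, keyLength);
    totals[key] = (totals[key] || 0) + r.amt;
  }
  return totals;
}

// Apply a filter context (a selected year, month, or day) before drilling.
function drill(rows, prefix) {
  return rows.filter(r => r.date.startsWith(prefix));
}

const byYear = rollUp(records, 4);                    // top tier
const months2015 = rollUp(drill(records, "2015"), 7); // user selected 2015
const rawJan2015 = drill(records, "2015-01");         // down to raw records

console.log(byYear);      // { '2015': 22, '2016': 21 }
console.log(months2015);  // { '2015-01': 15, '2015-02': 7 }
console.log(rawJan2015.length); // 2
```

Each step only touches the rows that match the current selection, which is the point of keeping the dashboard at a human scale and fetching detail on demand.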
HOT
• Dashboard or Document Acceleration
• High Performance
• Aggregations
• Persistence
WARM
• Row Level Security
• Live Connections
• Core Report Development
COLD
• Data Mining
• Detailed Data
• Raw Data
• Machine Learning
Don’t forget the laws of Data Motion
1. Do you have sufficient infrastructure/hardware to deal with the kind of data that will be analyzed? ~ Law of inertia: nothing moves until sufficient force is applied to move it.
2. Have you chosen an underlying data source that matches your performance aspirations, and have you engineered it for interactive performance? ~ Law of dynamics: Force = mass * acceleration.
3. Have you designed your Tableau vizzes so that the queries run efficiently? ~ For every action (viz) there is an equal and opposite reaction (from the data source).
Q & A
Thank You!
Buzz Moschetti, Enterprise Architect
[email protected] @buzzmoschetti
Vaidy Krishnan, Senior Product Marketing Manager