10 Commandments for BI in Big Data, Shant Hovsepian, Arcadia Data [FirstMark's Data Driven]
What’s so special about BI on Big Data?
12.14.15
Shant Hovsepian @superdupershant
Presentation prepared for Data Driven NYC #42
Co-Founder & CTO
Came out of stealth mode in June and just announced our GA product release.
Rapidly growing and focused on the Fortune 2000.
We see lots of customers struggling with data, Big and Small.
You don’t use previous-generation architectures to store Big Data, so why use previous-generation BI tools to analyze it?
Create business value from Big Data
Data Driven NYC #42 12.14.15 3
– OUR FOUNDING VISION –
@BigDataBorat
#BigDataSeacrest
#BigDataMoses
The 10 Commandments of BI on Big Data
Thou shalt not move Big Data
Moving Big Data Is Expensive
On-Cluster BI is now possible
Push all the computation down close to the data
Be careful about having to extract data out to data marts and cubes
- There are lots of native analysis engines out there; make sure your BI tools support them.
- ODBC/JDBC connectors aren’t always enough.
- Having to extract data out of the system is slow and defeats the purpose of having a specialized architecture.
- Extracts and cubes in situ aren’t so bad, as long as they’re not a required first step to analysis.
- YARN and Mesos have made it possible to run a BI server right next to the data.
- The benefits of unified management, performance, and workload management are huge when the infrastructure is converged.
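A minimal sketch of "push the computation down close to the data", assuming Python's built-in `sqlite3` in-memory database as a stand-in for an on-cluster engine (the `events` table and its columns are hypothetical). Both functions return the same totals, but the extract version ships every row to the BI layer, while the pushdown version lets the engine aggregate so only one row per group leaves it.

```python
import sqlite3
from collections import defaultdict


def make_db(rows):
    # In-memory SQLite stands in for an on-cluster SQL engine (illustrative assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    return conn


def totals_by_extract(conn):
    # Anti-pattern: move every row out to the BI layer, then aggregate there.
    rows = conn.execute("SELECT region, amount FROM events").fetchall()
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals), len(rows)  # (result, rows moved out of the engine)


def totals_by_pushdown(conn):
    # Push the aggregation down: the engine does the GROUP BY next to the data.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM events GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    data = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 1.0), ("north", 4.0)]
    conn = make_db(data)
    print(totals_by_extract(conn))
    print(totals_by_pushdown(conn))
```

On five rows the difference is trivial; on billions of rows, the number of rows moved is exactly what makes extracts slow.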
Thou shalt not steal or violate corporate security policy
Security is Serious
- All the serious Big Data infrastructure vendors have implemented some form of security; your BI tool should support it.
- BI software shouldn’t require re-implementing all the access control rules.
- RBAC: Role-Based Access Control.
- Single sign-on, especially for embedded use cases.
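The idea that the BI layer should defer to the platform's existing role-based rules, rather than keep its own copy, can be sketched as follows. This is a hypothetical, minimal model: `Policy`, `can_read` and `run_dashboard_query` are illustrative names, not any vendor's API, and a real deployment would ask the platform's own authorization service (using the SSO identity) instead of an in-process dictionary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    # Hypothetical stand-in for the RBAC rules the data platform already enforces.
    role_grants: dict[str, set[str]] = field(default_factory=dict)  # role -> readable tables
    user_roles: dict[str, set[str]] = field(default_factory=dict)   # user -> roles

    def can_read(self, user: str, table: str) -> bool:
        # A user may read a table if any of their roles has been granted it.
        return any(
            table in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )


def run_dashboard_query(policy: Policy, user: str, table: str) -> str:
    # The BI layer checks the shared policy instead of maintaining its own ACLs.
    if not policy.can_read(user, table):
        raise PermissionError(f"{user} may not read {table}")
    return f"SELECT * FROM {table}"  # placeholder for the real query
```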
Thou shalt not pay for every user or megabyte
Be wary of pricing models that penalize you for increased adoption
- We’ve seen Big Data deployments quadruple in size and adoption within a couple of months.
- Keep an eye out for licensing models that bill by users or data size; these can grow much more quickly than you anticipate.
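The arithmetic behind this warning is simple, but it is worth seeing side by side. The sketch below uses made-up unit prices (`PRICE_PER_USER`, `PRICE_PER_TB`, `FLAT_PRICE` are hypothetical, chosen only to make the numbers visible) and the "quadruple in size and adoption" scenario from the slide.

```python
# Hypothetical yearly prices, for illustration only.
PRICE_PER_USER = 1_000   # per named user
PRICE_PER_TB = 2_000     # per terabyte under management
FLAT_PRICE = 300_000     # site license, independent of users and data


def annual_cost(model: str, users: int, terabytes: int) -> int:
    """Yearly bill under one of three simplified licensing models."""
    if model == "per_user":
        return users * PRICE_PER_USER
    if model == "per_tb":
        return terabytes * PRICE_PER_TB
    if model == "flat":
        return FLAT_PRICE
    raise ValueError(f"unknown pricing model: {model}")


def growth_multiplier(model: str, before: tuple, after: tuple) -> float:
    """How many times the bill grows when (users, terabytes) goes from before to after."""
    return annual_cost(model, *after) / annual_cost(model, *before)
```

With `before = (100, 50)` and `after = (400, 200)`, both the per-user and per-terabyte bills grow 4x while the flat license stays put; the point is the multiplier, not the made-up prices.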
Thou shalt covet thy neighbor’s visualizations
First-Class Support for Collaboration
Share / Publish
- Export to PDF or email is expected by everyone.
- Publish to a server to preserve interactivity instead of a static image.
- Supporting source data updates after publishing is even better.
- Preserve data lineage and how each result was derived.
- Network effects: a GitHub for BI, where you can clone and fork.
Collaborative exploration is needed because in some cases no single person understands the entire data set.
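"GitHub for BI" with preserved lineage can be modeled very simply: every fork keeps a pointer to what it was forked from. This is a hypothetical sketch; the `Dashboard` class and its fields are illustrative names, not a real product API.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional

_next_id = itertools.count(1)  # simple unique IDs for the illustration


@dataclass
class Dashboard:
    # Hypothetical model of a shared BI artifact; field names are illustrative.
    title: str
    query: str
    parent: Optional["Dashboard"] = None
    id: int = field(default_factory=lambda: next(_next_id))

    def fork(self, title: str, query: Optional[str] = None) -> "Dashboard":
        """Clone this dashboard, optionally changing its query, and remember the origin."""
        return Dashboard(title, self.query if query is None else query, parent=self)

    def lineage(self) -> List[str]:
        """Titles from this dashboard back to the original it descends from."""
        chain, node = [], self
        while node is not None:
            chain.append(node.title)
            node = node.parent
        return chain
```

Keeping the parent link, rather than copying and forgetting, is what lets a colleague trace a surprising chart back to the query it started from.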
Thou shalt analyze thine data in its natural form
This Is What Big Data Looks Like
- Free-form text
- Key-value pairs
- JSON / semi-structured
- Tables
Key-value example, a FIX 4.2 order message (^A stands for the SOH field delimiter, byte 0x01):
8=FIX.4.2^A9=145^A35=D^A34=4^A49=ABC_DEFG01^A52=20090323-15:40:29^A56=CCG^A115=XYZ^A11=NF0542/03232009^A54=1^A38=100^A55=CVS^A40=1^A59=0^A47=A^A60=20090323-15:40:29^A21=1^A207=N^A10=139^A
Don’t let your BI solution tell you otherwise.
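Analyzing the FIX record "in its natural form" mostly means splitting it on its own delimiters instead of forcing it through an ETL schema first. A minimal sketch, assuming the message uses either the real SOH byte or the `^A` caret notation shown on the slide; `TAG_NAMES` covers only a few standard FIX 4.2 tags, and `parse_fix`/`name_fields` are hypothetical helper names.

```python
SOH = "\x01"  # FIX field delimiter, often rendered as ^A

# A few standard FIX 4.2 tag numbers and their field names (partial list).
TAG_NAMES = {
    "8": "BeginString", "9": "BodyLength", "35": "MsgType", "49": "SenderCompID",
    "56": "TargetCompID", "52": "SendingTime", "11": "ClOrdID", "54": "Side",
    "38": "OrderQty", "55": "Symbol", "40": "OrdType", "10": "CheckSum",
}


def parse_fix(message: str) -> dict:
    """Split a FIX message into {tag: value}, accepting SOH or the '^A' notation."""
    fields = {}
    for pair in message.replace("^A", SOH).split(SOH):
        if not pair:
            continue  # the trailing delimiter leaves an empty piece
        tag, _, value = pair.partition("=")  # split on the first '=' only
        fields[tag] = value
    return fields


def name_fields(fields: dict) -> dict:
    """Swap known tag numbers for readable names; unknown tags keep their number."""
    return {TAG_NAMES.get(tag, tag): value for tag, value in fields.items()}
```

A plain dict is a simplification: FIX repeating groups reuse tag numbers, and this sketch keeps only the last occurrence.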
Thou shalt not wait endlessly for thy results
No Surprise Here, Things Should Be Fast
Tricks legacy BI tools use to achieve performance:
Take Samples of the Data
- Instant gratification, though the results may not be correct initially.
- How far down can the samples be pushed? You need to be cognizant of blocking operations.
Build an OLAP Cube
- This works pretty well once you’ve got a good idea of which metrics matter.
- Don’t get stuck with “cube first, results later”.
- Make sure your cubes can live on the cluster or scale out easily.
Create Temp Tables
- This can be as simple as fancy caching. Make sure some of the tables can be intelligently reused.
- Materialize complex expressions so you don’t have to recalculate them every time.
- Store them on the cluster where they belong. Be wary of extracts out.
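The "instant gratification, not necessarily correct" trade-off of sampling can be made concrete with a Bernoulli sample: keep each row with probability `fraction`, then scale the sampled sum by `1 / fraction` (the Horvitz-Thompson estimator for this design). This is a sketch of that one technique, not how any particular BI tool samples; `estimate_sum` is a hypothetical name.

```python
import random


def estimate_sum(values, fraction, seed=None):
    """Estimate sum(values) from a Bernoulli sample keeping each row with probability `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded so a given query can be re-run reproducibly
    sample = [v for v in values if rng.random() < fraction]
    # Each kept row stands in for 1 / fraction rows of the full data.
    return sum(sample) / fraction, len(sample)
```

Scaling by `1 / fraction` is only valid for additive aggregates such as SUM and COUNT. Pushing the sample below a join (sampling both sides keeps roughly `fraction ** 2` of the matches) or below a DISTINCT changes what is being estimated, which is one reason to be careful about how far down samples are pushed.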
Thou shalt not build reports but apps instead
What comes to mind when I say reports?
- Traffic Report
- Weather Report
- Book Report
- Report Card
What comes to mind when I say apps?
Visual Information Seeking Mantra
“Overview first, zoom and filter, then details on demand” (Ben Shneiderman)
Rails made web apps easy; BI tools should do the same.
- Templates and reusable components.
- Decouple the data from the app to make it easy to manage and mass-produce multiple apps.
Async data from multiple sources.
- Pull in new data asynchronously without having to refresh the entire app.
- Support auxiliary data sources and APIs to bring in richer content.
Interact with visual elements, not text boxes.
- We don’t want to deal with control boxes and parameter hell.
- We want to interact with the visual elements actually drawn and have the visualization update accordingly.
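"Async data from multiple sources" without refreshing the whole app can be sketched with `asyncio`: start every fetch at once and update each panel as soon as its own data arrives, leaving the others untouched. The sources here are hypothetical; `asyncio.sleep` stands in for query or API latency, and `fetch_source`/`refresh_panels` are illustrative names.

```python
import asyncio


async def fetch_source(name, delay, payload):
    # Hypothetical data source: the sleep stands in for query or API latency.
    await asyncio.sleep(delay)
    return name, payload


async def refresh_panels(panels, sources):
    """Fetch all sources concurrently; update each panel the moment its data lands."""
    tasks = [asyncio.create_task(fetch_source(*source)) for source in sources]
    arrival_order = []
    for next_done in asyncio.as_completed(tasks):
        name, payload = await next_done
        panels[name] = payload      # only this panel changes; others keep their state
        arrival_order.append(name)
    return panels, arrival_order
```

Using `as_completed` rather than waiting for every source means a slow API never holds back a panel whose data is already available.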
Thou shalt use intelligent tools
“Smart” BI tools will help the user out.
- Suggest visualizations to create.
- Built-in search for everything.
- Automatically maintain models and caches so the burden isn’t on the end user.
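One simple way a tool can "suggest visualizations" is a rule table keyed on column types. The sketch below is a deliberately crude, hypothetical heuristic (`column_kind` and `suggest_chart` are illustrative names, and the rules are assumptions, not any product's logic); a real tool would read types from the catalog and weigh far more signals.

```python
from datetime import date


def column_kind(values):
    """Very rough type inference over a column's values."""
    if not values:
        return "empty"
    if all(isinstance(v, date) for v in values):  # datetime is a subclass of date
        return "temporal"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"


# Hypothetical rules: (x kind, y kind) -> suggested chart type.
CHART_RULES = {
    ("temporal", "quantitative"): "line",
    ("categorical", "quantitative"): "bar",
    ("quantitative", "quantitative"): "scatter",
}


def suggest_chart(x_values, y_values):
    """Suggest a chart for an (x, y) pair of columns, falling back to a table."""
    kinds = (column_kind(x_values), column_kind(y_values))
    return CHART_RULES.get(kinds, "table")
```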
Thou shalt go beyond the basics
You don’t ask the same questions of your Big Data as you do of your small data.
Make sure some of that functionality is available in an easy-to-use manner.
Big Data is a gold mine of predictive and advanced analytics use cases.
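As one small example of exposing an advanced technique in an easy-to-use way, a "show trend and forecast" button could sit on top of an ordinary least-squares line fit. This is a minimal sketch of plain OLS (`fit_trend` and `forecast` are hypothetical names), not a recommendation of a particular forecasting model.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(xs, ys, x_new):
    """Extend the fitted trend line to a new x value."""
    slope, intercept = fit_trend(xs, ys)
    return slope * x_new + intercept
```

For example, points on the line y = 2x + 1 at x = 0..3 fit to slope 2.0 and intercept 1.0, and the forecast at x = 5 is 11.0.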
Thou shalt use Arcadia Data
Just kidding.
Arcadia Data Converged Analytics Platform
arcadiadata.com
Thank you.