How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fast Business Intelligence Experience on Hadoop
![Page 1: How Auto Microcubes Work with Indexing & Caching to Deliver a Consistently Fast Business Intelligence Experience on Hadoop](https://reader035.vdocument.in/reader035/viewer/2022062905/586f74c81a28ab10258b5d33/html5/thumbnails/1.jpg)
About Jethro
• What does Jethro do?
  – BI acceleration on big data
  – Reporting, dashboards, discovery, ad hoc
• How does it work?
  – Indexing and caching server
  – Combines columnar SQL database design with search-indexing technology
• Partnerships
  – BI: Tableau, Qlik
  – Hadoop: Cloudera, MapR, Hortonworks
SQL on Hadoop – Complementary Approaches
SQL-on-Hadoop solutions:
• Full-scan (read all rows): Hive / Tez, Impala, Presto, SparkSQL, Drill, HAWQ, IBM Big SQL, Actian, Tajo, …
• Index-access (read ONLY the needed rows): JethroData
Comparison:
• Full-scan: optimal for predictive analytics & reporting
• Index-access: optimal for interactive BI
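To make the full-scan versus index-access distinction concrete, here is a toy Python sketch, not Jethro's implementation. The row data and the helper names `full_scan`, `build_index` and `index_access` are hypothetical; the point is only that an index lookup reads the matching rows while a full scan reads every row.

```python
from collections import defaultdict

# Toy rows (hypothetical data, shaped like the Sales example in these slides).
rows = [
    {"city": "NY", "item": "iPhone7"},
    {"city": "NY", "item": "Samsung7"},
    {"city": "NY", "item": "iPhone6"},
    {"city": "LA", "item": "iPhone7"},
    {"city": "LA", "item": "Samsung6"},
]


def full_scan(rows, column, value):
    """Read every row and keep the matches. Returns (row_ids, rows_read)."""
    matches = [i for i, row in enumerate(rows) if row[column] == value]
    return matches, len(rows)


def build_index(rows, column):
    """Map each distinct value of `column` to the ids of the rows holding it."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        index[row[column]].append(i)
    return index


def index_access(index, value):
    """Look the value up and read only the matching rows. Returns (row_ids, rows_read)."""
    matches = index.get(value, [])
    return matches, len(matches)
```

With these five rows, filtering on `city = 'LA'` returns rows 3 and 4 either way, but the full scan reads 5 rows and the index lookup reads only 2.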
What Is Jethro for BI Tools?
An indexing & caching server
• BI tool uses live DB access
  – Sends SQL queries via ODBC / JDBC
• Jethro key performance features
  1. Full indexing – every column is indexed
  2. Result cache – every query is cached
  3. Auto cubes – every repeatable query pattern
• Everything stored in Hadoop
  – Cache, aggregations, index & column files, …
• Incrementally updated
  – Every day / hour / minute
(Diagram labels: BI Tool, Live Access, HDFS)
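A minimal sketch of a result cache, assuming results are keyed by the query text with whitespace collapsed and that the whole cache is invalidated when an incremental load arrives. `ResultCache`, `on_incremental_load` and the executor callback are hypothetical names; the slides do not describe how Jethro actually keys or invalidates its cache.

```python
from typing import Callable, Dict, List


class ResultCache:
    """Hypothetical result cache: one entry per distinct query text."""

    def __init__(self, execute: Callable[[str], List[tuple]]):
        self._execute = execute          # runs a query against the data (stand-in)
        self._entries: Dict[str, List[tuple]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # Collapse runs of whitespace but keep case, because literals such as
        # 'NY' are case sensitive. (This also collapses spaces inside quoted
        # literals, a simplification a real cache would have to avoid.)
        return " ".join(sql.split())

    def query(self, sql: str) -> List[tuple]:
        key = self._key(sql)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = self._execute(sql)
        self._entries[key] = result
        return result

    def on_incremental_load(self) -> None:
        # New data arrived, so previously cached results may be stale.
        self._entries.clear()
```

Clearing everything on each load is the simplest correct policy; a finer-grained design could invalidate only the entries that touch the updated tables.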
Jethro: Enabling Unlimited Interactive BI for Big Data
Interactive BI requires both speed and concurrency
(Chart: speed, from slow to interactive, against concurrency. MPP engines: speed = more resources, low concurrency. Jethro: high speed, low resources, high concurrency.)
Faster:
• Indexes
• High-performance execution
• Results cache
• Auto cubes
Indexes – boosting filtered queries
• Indexes everything – every column, every value
• Filtering (WHERE-clause expressions) is done against the indexes
• The more you filter, the faster you get results – execution time depends on the size of the scanned data set
• Resources required per query are an order of magnitude lower, which enables high concurrency
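A common way to index every value of every column is one bitmap per (column, value) pair, so a WHERE clause of equality predicates becomes a bitwise AND. The sketch below assumes that design (the slides do not say which index structure Jethro uses), uses Python integers as bitmaps, and its function names are hypothetical. It also shows why more filtering means less work: each extra predicate can only clear bits, shrinking the set of rows that must be fetched.

```python
from collections import defaultdict
from typing import Dict, List


def build_bitmap_index(rows: List[dict]) -> Dict[str, Dict[object, int]]:
    """Index every column and value: bit i of index[col][val] is set if row i has col == val."""
    index: Dict[str, Dict[object, int]] = defaultdict(lambda: defaultdict(int))
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            index[column][value] |= 1 << row_id
    return index


def filter_rows(index, predicates: Dict[str, object], num_rows: int) -> List[int]:
    """Evaluate `col1 = v1 AND col2 = v2 ...` purely against the index."""
    selected = (1 << num_rows) - 1  # start with every row selected
    for column, value in predicates.items():
        # Each filter can only clear bits, never set them.
        selected &= index.get(column, {}).get(value, 0)
    return [i for i in range(num_rows) if (selected >> i) & 1]
```

Only the row ids that survive the AND need to be read from the column files; rows that fail any predicate are never touched.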
What to do when indexes are not enough?
• The challenge: how to provide interactive response times (seconds) for use cases that include wide queries with little or no filtering
• Our approach: add CUBES technology, which is complementary to INDEXES
• Jethro rule: make this absolutely seamless to the user
Traditional OLAP Cubes – The Short Story
Cube: `Select City, Item, Year, sum(sold_price) from Sales group by City, Item, Year`
Queries that can use the cube:
• `Select City, Item, sum(sold_price) from Sales where Year=2016 group by City, Item`
• `Select Item, sum(sold_price) from Sales group by Item`

| City | Item | Year | sum(sold_price) |
|---|---|---|---|
| NY | iPhone7 | 2016 | $50,000 |
| NY | Samsung7 | 2016 | $40,000 |
| NY | iPhone6 | 2015 | $42,500 |
| LA | iPhone7 | 2016 | $70,000 |
| LA | Samsung6 | 2015 | $35,000 |
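Using the cube cells from the table above, a short Python sketch shows how both example queries can be answered from the cube alone, by filtering cells and re-summing over fewer dimensions. `rollup` is a hypothetical helper for illustration, not a real OLAP API.

```python
from collections import defaultdict

# Cube cells from the slide: (City, Item, Year) -> sum(sold_price)
cube = {
    ("NY", "iPhone7", 2016): 50_000,
    ("NY", "Samsung7", 2016): 40_000,
    ("NY", "iPhone6", 2015): 42_500,
    ("LA", "iPhone7", 2016): 70_000,
    ("LA", "Samsung6", 2015): 35_000,
}
DIMS = ("City", "Item", "Year")


def rollup(cube, group_by, where=None):
    """Answer a SUM query from the cube: filter cells, then re-aggregate on fewer dimensions."""
    where = where or {}
    out = defaultdict(int)
    for cell, total in cube.items():
        coords = dict(zip(DIMS, cell))
        if all(coords[d] == v for d, v in where.items()):
            out[tuple(coords[d] for d in group_by)] += total
    return dict(out)
```

`rollup(cube, ("City", "Item"), {"Year": 2016})` answers the first query and `rollup(cube, ("Item",))` the second. This works because SUM is additive: partial sums can simply be added. Aggregates such as COUNT(DISTINCT) are not additive, which is one reason they strain traditional cubes.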
Traditional OLAP Cubes
• Performance: fast response time for queries that hit the cube
• Concurrency: low resource footprint per query, enabling high concurrency
• Use case: works great for static query environments; not suitable for dynamic environments that support self-service and complex dashboards
Traditional OLAP Cubes: Challenges
• Hard to implement: manually pre-defined; requires specialized tools and expertise
• Resource consuming: heavy processing during cube creation can affect overall system performance
• Operational overhead: keeping cubes up to date with the source data is time- and resource-consuming
• Use-case limitations: size and operational limitations make cubes practically impossible to use for many use cases, such as:
  – Large number of dimensions
  – High-cardinality dimensions
  – Count-distinct aggregators
  – Complex expressions
  – Many different queries
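Standard data-cube arithmetic (general background, not taken from these slides) shows why many dimensions and high cardinality are a problem. For $n$ dimensions with cardinalities $|D_1|, \dots, |D_n|$ over a fact table of $N$ rows:

$$
\text{cells in the base cube} \;\le\; \min\Big(N,\ \prod_{i=1}^{n} |D_i|\Big),
\qquad
\text{group-by combinations} \;=\; 2^{n},
\qquad
\text{cells across all combinations} \;\le\; \prod_{i=1}^{n} \big(|D_i| + 1\big)
$$

The $+1$ accounts for the "ALL" value when a dimension is aggregated away. With 20 dimensions, for example, there are already $2^{20} = 1{,}048{,}576$ possible group-by combinations to pre-compute.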
How to have your cake and eat it too
• Auto-generated cubes
  – Cubes are automatically generated in the background based on actual user interaction
  – No expertise, no specialized tools, no up-front design
  – Unlimited access to the data
• Micro cubes
  – Many micro cubes instead of a few gigantic cubes
  – Easily support many different queries
• Incremental
  – Auto cubes are incremental and automatically updated
  – Zero operational overhead
  – Stable performance, unaffected by ongoing streaming of new data
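A minimal sketch of what "incremental" can mean for a micro cube, assuming each micro cube covers one group-by pattern and stores SUM and COUNT per cell: a new batch of rows is folded into the existing cells instead of triggering a rebuild. `MicroCube` and its methods are hypothetical, not Jethro's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class MicroCube:
    """Hypothetical micro cube: SUM and COUNT of one measure for one group-by pattern."""

    def __init__(self, group_by: Tuple[str, ...], measure: str):
        self.group_by = group_by
        self.measure = measure
        self.sums: Dict[tuple, float] = defaultdict(float)
        self.counts: Dict[tuple, int] = defaultdict(int)

    def add_batch(self, rows: Iterable[dict]) -> None:
        # Fold only the newly arrived rows into the existing cells;
        # nothing already in the cube is recomputed.
        for row in rows:
            key = tuple(row[c] for c in self.group_by)
            self.sums[key] += row[self.measure]
            self.counts[key] += 1

    def average(self, key: tuple) -> float:
        # AVG stays incremental because it is derived from SUM and COUNT.
        return self.sums[key] / self.counts[key]
```

Loading two batches one after another yields the same cells as aggregating all rows at once, which is what allows the cube to follow new data without a full recomputation.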
How to have your cake and eat it too, cont.
• Complex query normalization
  – Rewrite complex queries to reuse simplified, common query blocks
  – Increase cube reusability
• Optimized for count distinct
  – Count distinct handled using value bitmaps
  – Count distinct without hitting cube-size limitations
• Complementary to indexes
  – Use indexes for a large number of filters or high-cardinality dimensions
  – Maintain stable interactive performance by combining indexes and cubes
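The slides mention handling count distinct with value bitmaps. One way this can work, sketched here under the assumption that each cube cell stores a bitmap of dictionary-encoded values: rolling cells up is a bitwise OR, and the distinct count is the number of set bits. The function names are hypothetical.

```python
from collections import defaultdict


def build_distinct_cells(rows, group_by, value_col):
    """Per cube cell, a bitmap of the distinct values of `value_col` seen in that cell."""
    value_ids = {}  # dictionary-encode each distinct value to a bit position
    cells = defaultdict(int)
    for row in rows:
        vid = value_ids.setdefault(row[value_col], len(value_ids))
        key = tuple(row[c] for c in group_by)
        cells[key] |= 1 << vid
    return dict(cells)


def count_distinct(cells, keep=lambda key: True):
    """COUNT(DISTINCT) over the selected cells: OR the bitmaps, then count set bits."""
    merged = 0
    for key, bitmap in cells.items():
        if keep(key):
            merged |= bitmap  # union of the distinct-value sets
    return bin(merged).count("1")
```

In the test data below, customer c1 buys in both NY and LA: adding the per-city distinct counts gives 4, while OR-ing the bitmaps gives the correct answer of 3.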
Jethro Query Processing Flow
1. Query arrives.
2. Query match? Yes: response from the results cache, reply.
3. No: cube match? Yes: response from cubes, reply.
4. No: process the query (indexes, columns, MT execution), cache the results, and reply.
5. Optimal for cube? Yes: generate cube.
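The flow above can be sketched as a small router. Assumptions: the results cache is an exact-match dictionary on the SQL text; `pattern_of` maps a query to its pattern (for example, the query with its filter literals stripped); and "optimal for cube" is approximated by a simple repetition threshold. All names are hypothetical; the slides do not specify Jethro's actual criteria.

```python
from collections import Counter


class QueryRouter:
    """Hypothetical router mirroring the slide: results cache -> cube -> full execution."""

    def __init__(self, execute, pattern_of, build_cube, cube_threshold=3):
        self.execute = execute          # index/column-based execution (stand-in)
        self.pattern_of = pattern_of    # maps SQL text to its query pattern (assumed)
        self.build_cube = build_cube    # returns a function that answers SQL from a cube
        self.cube_threshold = cube_threshold
        self.results_cache = {}         # exact SQL text -> result
        self.cubes = {}                 # pattern -> cube answer function
        self.pattern_counts = Counter()
        self.log = []                   # which path served each query

    def run(self, sql):
        # 1. Query match? Serve from the results cache.
        if sql in self.results_cache:
            self.log.append("results_cache")
            return self.results_cache[sql]
        # 2. Cube match? Serve from the cube for this pattern.
        pattern = self.pattern_of(sql)
        if pattern in self.cubes:
            self.log.append("cube")
            return self.cubes[pattern](sql)
        # 3. Otherwise process the query and cache the result.
        self.log.append("execute")
        result = self.execute(sql)
        self.results_cache[sql] = result
        # 4. Optimal for cube? Approximated here by a repetition threshold.
        self.pattern_counts[pattern] += 1
        if self.pattern_counts[pattern] >= self.cube_threshold:
            self.cubes[pattern] = self.build_cube(pattern)
        return result
```

Repeating an identical query hits the results cache; a new query with the same pattern is answered from the cube once that pattern has crossed the threshold.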
Jethro: Consistently Fast Queries
(Chart: query repeatability (low / mid / high) against filtering (high / mid / none), covered by three complementary paths: results cache, index-based query execution, and AutoCube (Jethro 2.0).)
DEMO
• TPC-DS data set
• Single table: 1.2 billion rows
• Multiple tables: 1.6-billion-row fact table
• 2 Jethro nodes (AWS r3.4xlarge) over EFS
• BI: Tableau