To Be or Not to Be a BI Appliance Embracer


Upload: saama-technologies-inc

Post on 06-Apr-2018


8/3/2019 To Be or Not to Be a BI Appliance Embracer

http://slidepdf.com/reader/full/to-be-or-not-to-be-a-bi-appliance-embracer 1/4


To Be or Not to Be a BI Appliance Embracer 

by Haranath Gnana

Originally published 23 September 2009


I was at a business intelligence (BI) presentation recently, and a professor from Berkeley characterized the current data explosion using the phrase “Industrial Revolution of data.” This resonated nicely as it highlighted a key contributing factor to the increase in data volumes we are challenged with, i.e., data produced by automated systems such as self-service teller machines, the Internet, cell phones, etc. Given the continued growth in all of these systems, the rate at which data is expected to grow is continuing to increase. Enterprises, like it or not, have to brace themselves for this data onslaught.

Most of the traditional databases such as Oracle, DB2 and Microsoft SQL Server have managed to deliver business intelligence (BI) value with data volumes up to 4 or 5TB, and this is possible only with expensive high-end iron. Successful data management beyond 5TB has been almost impossible for an average enterprise IT shop to even imagine running on these traditional database platforms.

The continued success of Teradata and the very successful initial public offering (IPO) of Netezza just a couple of years ago are a clear indication of the value of innovation in this space. Teradata’s success has primarily been with its target of Fortune 500 customers who have deep pockets to invest in its proprietary hardware/software/services solution. Netezza, on the other hand, attempted to expand the set of customers that could leverage such BI technologies by significantly reducing the entry-level pricing of its solution. But it still requires a proprietary hardware/software solution. Both these players pushed the limits of the “shared nothing” massively parallel processing (MPP) architectures to scale to many tens of terabytes. This approach beat the shared architectures of the traditional database players hands down. However, both these solutions are still relatively expensive, and the proprietary nature of their hardware platform has not been well received by many enterprises.

Google has proven that it is possible to leverage commodity hardware in an extremely effective manner and still deal with data volumes that are orders of magnitude greater than that of the largest enterprise datasets. The big upside of working with commodity hardware is that you can benefit from the billions of dollars that the hardware companies are pumping into their products, constantly lowering the price and improving performance. An architecture that leverages commodity hardware can gain the benefits of this ever-evolving platform.

This commodity hardware-based architecture has become the foundation for several BI appliance start-ups, attempting to bring to the enterprise “structured” data environments what Google has done for the unstructured world. Players like Greenplum, Dataupia, Kognitio and Aster Data have all pioneered this approach with some variations. These new players have also based their solutions on shared-nothing MPP architectures. As expected of start-ups, these players have been extremely aggressive in highlighting and proving their key value proposition, i.e., the price/performance ratio.
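To make the shared-nothing idea a little more concrete, here is a minimal sketch in Python. It is not any vendor's implementation; the function names, the three-node count and the sample rows are all hypothetical. Each row is assigned to exactly one node by hashing a distribution key, every node totals only its own slice, and the partial results are merged at the end.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, node_count: int) -> int:
    """Pick the node that owns a row by hashing its distribution key."""
    return crc32(key.encode("utf-8")) % node_count


def distribute(rows, node_count):
    """Split rows into per-node partitions; no partition is shared."""
    partitions = [[] for _ in range(node_count)]
    for key, amount in rows:
        partitions[node_for(key, node_count)].append((key, amount))
    return partitions


def local_totals(partition):
    """Work a single node does using only its own slice of the data."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def merge(partials):
    """Combine the per-node results into the final answer."""
    merged = {}
    for partial in partials:
        # A given key always hashes to the same node, so keys never overlap.
        merged.update(partial)
    return merged


# Hypothetical sample data: (region, sales amount).
rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
partitions = distribute(rows, node_count=3)
result = merge(local_totals(p) for p in partitions)
```

Because no node needs another node's data to finish its part of the work, adding nodes adds capacity without adding contention, which is the property behind the scale-out claims these vendors make.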

I’ve been involved in two BI appliance bake-offs over the last year, and in both cases, these new players have had a very significant upper hand on price/performance value. Also, their ability to scale out linearly, leveraging commodity hardware, has been a huge value proposition for enterprises.

Most of these players do offer the choice of either a “software-only” solution on recommended hardware platforms (restricted more from a support perspective) or a packaged solution which includes hardware and software, providing additional flexibility for IT organizations to choose the type of hardware they would like to get.

The BI appliance players make big claims of performance gains not just on the “querying” of data, but also on the loading process. In one of the proofs of concept (PoCs), I put this loading process to the test. There was a particular load process built with an established ETL company’s solution that was taking about 33 hours to complete. This 33-hour process included loading data from flat files to the staging area and then into a star schema and then building a set of aggregates. The process included data inserts, deletes and updates, so it exercised all of the load operations. Each PoC run had to start with the same set of flat files and at the end of the run have data in all of the final tables including the aggregates. We did a table-by-table differential at the end of each run to compare the results with the baseline tables to ensure that the run produced the same results.
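A table-by-table differential of this kind can be sketched as follows. This is an illustration only, not the tooling used in the PoC: it uses Python's built-in sqlite3 module as a stand-in for the baseline and appliance databases, and the table name sales_agg, its columns and the helper names are hypothetical.

```python
import sqlite3
from collections import Counter


def table_rows(conn, table):
    """Return the table's rows as a multiset so duplicate rows are counted."""
    # Table names cannot be bound as parameters; they come from a trusted list.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    return Counter(cursor.fetchall())


def diff_tables(baseline_conn, run_conn, tables):
    """Map each mismatching table to (rows missing from run, unexpected rows)."""
    mismatches = {}
    for table in tables:
        expected = table_rows(baseline_conn, table)
        actual = table_rows(run_conn, table)
        missing = expected - actual   # in baseline but not in the run
        extra = actual - expected     # in the run but not in baseline
        if missing or extra:
            mismatches[table] = (list(missing.elements()), list(extra.elements()))
    return mismatches


def make_db(sales_rows):
    """Build a throwaway in-memory database with one hypothetical aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_agg (region TEXT, total INTEGER)")
    conn.executemany("INSERT INTO sales_agg VALUES (?, ?)", sales_rows)
    return conn


baseline = make_db([("east", 17), ("west", 6)])
good_run = make_db([("west", 6), ("east", 17)])   # same rows, different order
bad_run = make_db([("east", 17), ("west", 7)])    # one aggregate is wrong

clean = diff_tables(baseline, good_run, ["sales_agg"])
dirty = diff_tables(baseline, bad_run, ["sales_agg"])
```

Comparing rows as multisets rather than ordered lists means a run passes when it produces the same data in a different physical order, which is what matters when the load engine changes.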

Even though these appliances claimed the ability to deliver significant performance gains without aggregates, we ensured that they built all of the aggregates. This was primarily for two reasons:

1. To ensure that we had a clear baseline for comparison on the load process performance.
2. These aggregates could not be eliminated as that would require the rewrite of a bunch of reporting and analytical applications that had been built on top of these aggregates.

It took each of the appliance players less than a week to build the scripts to mimic the 33-hour load process. Two of the appliance players that participated in this bake-off were able to prove performance gains that were mind-blowing, to say the least. Both players were able to bring down the load time from 33 hours to about 30 minutes. Just incredible! I do want to state that the process did not have very complex transformations, but still this performance gain was way too significant to ignore.
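For a sense of scale, the arithmetic behind those two figures works out as below. The variable names are illustrative, and "about 30 minutes" is treated as exactly 30 for the calculation.

```python
baseline_minutes = 33 * 60    # the original 33-hour load process
appliance_minutes = 30        # "about 30 minutes" on the appliance

# How many times faster the appliance load ran.
speedup = baseline_minutes / appliance_minutes

# Share of the original load window that was eliminated.
reduction_pct = 100 * (1 - appliance_minutes / baseline_minutes)

print(f"{speedup:.0f}x faster, {reduction_pct:.1f}% less load time")
```

That is roughly a 66-fold speedup, or a load window about 98.5 percent shorter than the original.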

These BI appliances showcased significant performance gains on the query side as well, but IT management was so impressed by the load performance gains that those alone were enough to make the business case.


I also included simultaneous load and query tests to see how effective the appliances were in minimizing the downtime of these BI environments. Both players had architected their systems to support querying and data loads happening simultaneously, eliminating the traditional bottlenecks and non-availability situations. So they were able to prove that there was no degradation in performance when dealing with mixed workloads as well.

Most projects in this customer’s environment required planning for different development, testing and preproduction environments during a project life cycle. Many times, creating these environments was in the project’s critical path, and each of these environment setups needed anywhere from 3-5 business days. With the new BI appliance platform, this task could be cut down to under an hour, which resulted in significantly lower project costs. This was one of the key selling points for the appliance business case.

To conclude, I would strongly encourage every enterprise dealing with growing data volumes, even as small as a terabyte, to explore the appliance options and leverage the huge value that they can provide. BI appliances are here to stay, and the sooner enterprises embrace them, the sooner they will be able to leverage the performance gains to deliver incredible value to their business users at a price point that does not require them to file for Chapter 11.

 

SOURCE: To Be or Not to Be a BI Appliance Embracer 

Haranath Gnana 

Haranath Gnana is a Senior Principal at Saama Technologies with more than 15 years of IT experience. He has spent more than 10 years focusing on enterprise business intelligence and data warehousing services. He has consulted for many clients in multiple industry verticals such as high tech, biotechnology and finance. His expertise ranges from helping define the BI road map and strategy for an enterprise, to its translation into an operational reality and, as such, has been instrumental in evangelizing BI at many of his engagements. He has led several cutting-edge initiatives involving BI appliances and BI software-as-a-service (SaaS) models for enterprises. He can be reached at [email protected].

 


Copyright 2004-2011. Powell Media, LLC. All rights reserved. BeyeNETWORK™ is a trademark of Powell Media, LLC.