tdwi.org

Sponsored by:

TDWI E-Book

Q&A: Elastic Data Warehousing: A Modern Solution for Data Analytics

Top 5 Reasons for Data Warehouse Modernization

The Elastic Data Warehouse: Data Warehousing Reinvented for the Cloud

About Snowflake

Demystifying Elastic Data Warehousing: Enabling High-Speed Analytics in the Cloud

APRIL 2016

ELASTIC DATA WAREHOUSING: A MODERN SOLUTION FOR DATA ANALYTICS

What do elastic data warehouses offer, what do they look like, and what technologies do they use? In this interview with Jon Bock, vice president of product and marketing at Snowflake Computing, we discuss the challenges, benefits, and use cases of elastic data warehouses, as well as how to evaluate the technology.

TDWI: What are the key factors and challenges driving organizations to look at new, more flexible approaches to data warehousing?

Jon Bock: The two biggest shifts that are driving the conversation are (1) the desire to give direct access to reports and analytics to a much larger number of users both inside and outside the organization and (2) the shift from a batch approach to delivering reports and analytics to a continuous, real-time flow of up-to-date insights.

Direct access to the data warehouse used to be limited to a small “inner sanctum” of people in order to avoid overtaxing the data warehouse. Giving more people access to the information in the data warehouse typically required creating new data marts or data extracts. However, the complexity of that approach (managing data movement, ensuring data consistency and quality, maintaining duplicate data and infrastructure) added significant burdens that limited how much access to data could be opened up, requiring people to look for new approaches.

Second, reports and analytics used to be datapoints we expected to be updated and used only periodically, much as newspapers used to be published only once a day, such that weekly or at best daily updates were satisfactory. Just as people expect to have access to instant news updates, data users now demand continuously updated insight: retailers want to recommend products to shoppers while they are still browsing online, marketers want to tailor and optimize advertising for audiences and individuals in seconds or less, and mobile devices and sensors are reporting back information thousands or millions of times per second. To support that, it’s necessary to deliver continuous updates to data and analytics, even embedding live analytics in applications.

What are the characteristics that define this new, more flexible data warehouse—what you call an “elastic data warehouse”?

Elastic data warehousing is defined by flexibility, scalability, and agility that separate it from previous approaches. More specifically, an elastic data warehouse can:

• Scale up and down at any time without significant effort or disruption. A traditional data warehouse requires significant planning and effort to scale up and is often difficult or impossible to scale down. That limitation was acceptable when data warehousing implementations involved significant up-front planning that decided on needs for multiple years, but it is inadequate for supporting exploratory, experimental, and ad hoc analytics. An elastic data warehouse’s scalability supports rapid iteration and evolution in data usage.

• Respond to increasing demands without performance degradation. An elastic data warehouse can rapidly adapt to growth in data volume, data rate, concurrency, and query intensity without degrading performance because it can provide dedicated resources for different workloads and can acquire and use additional resources on the fly when workloads need them.

• Adapt to evolution and changes in data without requiring significant redesign or rearchitecting. An elastic data warehouse supports agile approaches, particularly in relation to changing data. Data can change and evolve in structure and content, and an elastic data warehouse can quickly adapt to any of those changes without requiring data schema redesign, data migration, or new data repositories.

What technologies can help make elastic data warehousing possible?

Elastic data warehousing requires flexibility and adaptability in both resources and data processing.

• Cloud is a key enabling technology for elastic data warehousing. Cloud infrastructure offers a near-infinite, on-demand pool of resources available programmatically to applications such as a data warehouse.

• Schema and data model flexibility, such as schema-on-query and dynamic schemas, is necessary for an elastic data warehouse to be able to adapt quickly and easily to changes in data structure. Supporting multiple forms of data models (for example, both star and snowflake data models) also aids this flexibility.

• Adaptive optimization makes it possible for a data warehouse to take advantage of resources that might change at any time. It does this by dynamically determining optimal data distribution, parallelism, and query decomposition based on resources needed and available at the time of execution.

• Standard SQL processing is a key requirement because so many existing skills, tools, and processes in place today understand and rely on support for standard SQL.

What are some of the misconceptions about elastic data warehousing?

Among the key misconceptions, these four come to mind first:

• Any data warehouse deployed in the cloud is an elastic data warehouse. Elasticity requires an elastic infrastructure such as the cloud as well as software designed in a different way to take advantage of the elasticity of that infrastructure.

• Elastic data warehousing is about scale. Although it is easy to focus on the biggest volume of data, elastic data warehousing is primarily about adapting to any scale without added complexity or disruption.

• Elastic data warehousing is only valuable for the most demanding environments. Elastic data warehousing’s agility and flexibility can benefit a wide range of scenarios, not just the largest or most demanding applications.

• Elastic data warehousing is less secure. The association of elastic data warehousing with cloud infrastructure often leads to an assumption that elastic data warehousing requires compromises to data security. However, as demonstrated by the history of security breaches of on-premises systems, the location of data is not what ensures data security. The controls and policies put in place ensure security.

What challenges can an elastic data warehouse help to address?

Traditional approaches to data warehousing required significant up-front planning and investment as well as significant ongoing management overhead to deal with evolution in data and usage of data. Elastic data warehousing’s immediate scalability eliminates the challenges of finding a crystal ball that can predict future needs for capacity planning purposes. Its adaptability handles evolving data formats and ingest rates, and its dynamic optimization eliminates the manual tuning and tweaking that often delays projects and consumes significant resources.

What are some examples of the benefits of elastic data warehousing?

Elastic data warehousing makes it possible to do more with data, faster and more easily, at a fraction of the cost. The elimination of data silos makes possible new insights from correlating more granular data, the agility and flexibility deliver ongoing value even as data and the use of data evolve, and the elasticity makes it possible to reduce costs by paying for only the resources needed and only when they are needed.
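To make the pay-for-what-you-use point concrete, here is a minimal arithmetic sketch. Every figure in it (the hourly rate, the daily demand profile, and the helper names `fixed_cost` and `elastic_cost`) is a hypothetical assumption chosen for illustration; none of it comes from the interview or from any vendor's pricing.

```python
from typing import Sequence

# Hypothetical figures for illustration only.
HOURLY_RATE = 2.0   # assumed price of one compute unit for one hour
HOURS_IN_DAY = 24


def fixed_cost(peak_units: int, days: int) -> float:
    """Cost of keeping peak capacity provisioned around the clock."""
    return peak_units * HOURLY_RATE * HOURS_IN_DAY * days


def elastic_cost(hourly_demand: Sequence[int], days: int) -> float:
    """Cost of paying only for the units each hour actually needs."""
    return sum(hourly_demand) * HOURLY_RATE * days


# Assumed daily profile: 2 units for 8 quiet hours, 8 units for a
# 4-hour load window, and 4 units for the remaining 12 hours.
demand = [2] * 8 + [8] * 4 + [4] * 12

print(fixed_cost(max(demand), days=30))   # 11520.0
print(elastic_cost(demand, days=30))      # 5760.0
```

Under these assumed numbers the elastic bill is half the fixed one, because the fixed system pays for 8 units every hour while the workload averages 4. The size of the saving depends entirely on how far real demand sits below peak.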

What are examples of use cases that are a good fit for elastic data warehousing?

Data exploration is a perfect example of a use case ideally suited to elastic data warehousing. The resource needs are generally not known in advance precisely because exploration may lead in many directions, making elastic scalability extremely valuable.

Ad hoc analytics is another example well suited to elastic data warehousing—the flexibility and adaptability of an elastic data warehouse allow an analyst to query the data in multiple ways, and elastic scalability makes it possible to do that without slowing down other reporting and batch workloads.

Monitoring and event-driven analytic applications are a further example of a great use case for elastic data warehousing. These applications need to incorporate new data and then update reports and dashboards on a continuous basis. Much of this data is application-generated semistructured data, which requires a data warehouse that can flexibly adapt to handle it. The need to continuously ingest and process data also requires an elastic data warehouse in order to keep up with variations and spikes in data flow.

What considerations go into planning to deploy an elastic data warehouse?

As a first step, elastic data warehousing can be approached as a change in infrastructure, deployed without major changes to existing tools and processes. The biggest consideration when planning this step is determining how to route data to the cloud, where the elastic data warehouse typically resides. Data originating in external sources can often be readily redirected to a cloud target. However, data from internal sources may require additional planning in order to transfer it to the cloud.

Realizing the full benefits of elastic data warehousing requires going beyond this initial step by adapting and evolving processes and data flows to take advantage of the cloud. This includes changing batch data ingestion pipelines into near-continuous streams of data updates, adapting data transformation flows to leverage the flexibility of an elastic data warehouse by removing or simplifying data processing steps, and designing approaches for resource allocation that take advantage of the online scaling possible in an elastic data warehouse.

What are the key criteria in evaluating a solution for elastic data warehousing?

Choosing the right solution for an elastic data warehouse requires considering the following:

• Does it provide truly elastic scaling? Not all data warehouse offerings are able to scale both up and down to support changes in data, workloads, and concurrency, let alone do so online without disruption—all of which are key capabilities of an elastic data warehouse. Features such as separating compute from storage and providing a means to scale concurrency in a single system are examples of capabilities that support that.

• How flexible is it with regard to data? There are many ways traditional data warehousing can handle nontraditional data, most of which require trade-offs of performance against flexibility—typically data is either transformed into a fixed relational structure before loading, which sacrifices flexibility, or stored as a complex object in the database, which sacrifices performance. An elastic data warehouse can support diverse data by supporting schema flexibility but also provides optimizations that avoid performance penalties.

• Does it work with current tools? To avoid significant rework, it is important to assess any new solution’s support for tools currently in use. Support for standard SQL is a minimum criterion for reducing migration pains, but for many tools, deeper integrations are important to provide the optimizations and ease of use necessary to ensure successful implementation.

• How much management is required? To quickly and easily scale, adapt, and evolve, an elastic data warehouse should require minimal management, automating or eliminating time-consuming manual configuration, tuning, and monitoring.

TOP 5 REASONS FOR DATA WAREHOUSE MODERNIZATION

By Philip Russom

Many paths lead to the improvements users need for analytics, big data, real-time speed, productivity, and costs.

In recent surveys by TDWI Research, roughly half of respondents report that they will replace their primary data warehouse (DW) platform and/or analytic tools within three years. Ripping out and replacing a DW or analytics platform is expensive for IT budgets and intrusive for business users. This raises the question: What circumstances would lead so many people down such a dramatic path?

It’s because many organizations need a more modern DW platform to address a number of new and future business and technology requirements. In a nutshell, organizations that seek to modernize their data warehouse environment do so to improve advanced analytics, scale, speed, productivity, or economics. Each of the five reasons listed here has multiple meanings, they are all interrelated, and users sort the five into varying priority orders, based on their needs.

Even so, in general, the list constitutes the top five reasons for data warehouse modernization and they can provide some guidance for users facing modernization.

• Advanced analytics. Many organizations have invested heavily in reporting and OLAP, but now they need to invest in advanced forms of analytics to leverage big data, find new customer segments, and stay competitive.

• Speed. Organizations likewise need the data warehouse and related systems to operate faster because speed contributes to scale, supports agile development and discovery analytics, and brings analytics closer to real-time business operations.

• Scale. This continues to be an issue with big data and other burgeoning enterprise datasets, as well as with growing numbers of concurrent users, reports, analyses, and data structures.

• Productivity. Traditional requirements gathering, prototyping, and development take months, which is too long for a modern business. That’s why agile development methods are now the norm in DW/BI and analytics. Likewise, users are adopting agile tool types, including those for data exploration and discovery, data profiling, and data visualization.

• Costs. The good news is that modernization is not only a chance to increase speeds and feeds in your data warehouse environment, but it is also a golden opportunity to rethink DW overall costs, as users seek to save money in some areas (storage, CPUs, upgrades, admin) so they can invest in others (new data platforms, analytics tools, and developing new solutions).

Achieving the top five goals of DW modernization demands the acquisition of new data platforms and tool types, usually columnar databases, DW appliances, NoSQL databases, Hadoop, and so on. For example, according to a recent TDWI survey about Hadoop, only 16 percent of respondents report having the Hadoop Distributed File System (HDFS) in production today, while a whopping 67 percent expect to deploy HDFS within three years.[1]

Hence, the result of some DW modernizations is a multiplatform DW environment. The benefit is that users can choose the best platform for a given data workload or analytic goal, plus off-load certain workloads from the data warehouse. The challenge is to establish and maintain a broad data warehouse architecture that unifies data and its processing, despite being spread across multiple platforms.

For more information about trends in data warehouse modernization, read the 2014 TDWI Best Practices Report: Evolving Data Warehouse Architectures in the Age of Big Data, available for download at tdwi.org/bpreports.

Other articles of interest include the 2013 TDWI Checklist Report: The Modern Data Warehouse and the 2013 TDWI Checklist Report: Hadoop Best Practices for Data Warehousing, Data Integration, and Analytics.

[1] From the 2015 TDWI Best Practices Report: Hadoop for the Enterprise, online at www.tdwi.org/bpreports.


THE ELASTIC DATA WAREHOUSE: DATA WAREHOUSING REINVENTED FOR THE CLOUD


The elastic data warehouse is a fundamentally new approach to scaling, managing, and paying for a data warehouse system.

For enterprises looking to move from on-premises solutions to the cloud, the elastic data warehouse offers a new way to merge the power of the cloud with the needs of a 21st-century enterprise. An elastic data warehouse achieves an authentic translation of data warehouse architecture into the software-as-a-service (SaaS) cloud paradigm. An elastic data warehouse provides the same consistency and performance guarantees as an on-premises data warehouse system—even a highly scalable massively parallel processing (MPP) data warehouse platform.

On the other hand, it has advantages that an on-premises system doesn’t. The first and most obvious of these is in its name: elasticity. In the elastic data warehousing paradigm, an organization can scale capacity up or down as needed. Storage and/or compute capacity is added or subtracted on demand, with negligible impact on performance and operations. The second advantage is risk mitigation. In the elastic data warehouse paradigm, it’s fast and cheap to get started. Because there’s no huge up-front investment in hardware, it’s also much easier to stop should it become necessary to do so. A third advantage is time to value. “Implementing” the elastic data warehouse doesn’t require months of research and institutional maneuvering. It doesn’t entail sizing and spec’ing system configurations, securing approval and funding, or waiting for and taking delivery of hardware. Nor does it involve time-consuming system configuration and ongoing administration because with software-as-a-service, there’s nothing to implement. You just log in and go.


“It’s a new approach to data warehousing architecture that allows you to scale your storage and your compute elastically. That means up and down on demand. If it’s done well, you have the opportunity to bring in different kinds of data in addition to the traditional structured data,” says Kent Graziano, a senior technical evangelist with SaaS data warehousing specialist Snowflake Computing Inc.

In Snowflake’s vision of the elastic data warehouse, Graziano argues, there’s a fourth advantage: “The elastic data warehouse actually offers rich support for semistructured data. In Snowflake, we can bring things with flexible schema, such as JSON, into storage as well.”

Graziano’s just getting started. He cites a fifth advantage—you can use your in-house tools, methods, and skills to build out the elastic data warehouse—and a sixth: speed. The elastic data warehouse paradigm is simply faster, he says: faster and easier to provision and manage; faster and easier to scale up or scale down; and, arguably, a faster query-processing platform to boot.

The difference, so to speak, is in the bits: the bespoke bits of the elastic data warehouse are new and cloud-ready. Those of virtually every other cloud data warehouse service are not.

Designing for the Cloud

It is difficult to take an RDBMS that was designed for use in an on-premises environment and transplant it in the cloud. The problem arises not so much in getting said RDBMS to run (the OLTP RDBMS is, after all, a mainstay of the PaaS cloud) but, rather, in getting it to scale, particularly for analytical use cases, where high levels of concurrency are common. This is true even of the MPP data warehouse, which, by virtue of its distributed architecture, might seem like a good fit for the highly virtualized cloud.

The problem is that almost all PaaS data warehouse offerings are based on RDBMS code first developed for on-premises use. These RDBMSs presuppose a 1:1 mapping between an instance of the database and the physical hardware on which it’s running.

This 1:1 mapping does not exist in the cloud, where the abstraction of physical resources is a feature, not a bug. In the PaaS cloud, virtualized database instances share the same physical compute, storage, and network resources. This isn’t necessarily a huge problem for transaction-oriented database workloads. For analytical workloads, which tend to involve complex queries and/or computationally intensive joins, it’s a critical issue. There’s also the MPP issue.

In practice, on-premises MPP database configurations must be carefully tweaked and balanced. DBAs will take great care to tune performance—for example, by controlling for and minimizing the skew effect that’s a product of distributing a database system across multiple nodes. (In most cases, DBAs will focus on tweaking data distribution across an MPP cluster precisely in order to minimize data skew.) In an MPP cloud configuration using virtualized resources, skew could pose significant problems, chiefly because storage is virtualized and non-local. The upshot is that MPP database architecture, by itself, only partially addresses the challenge of scaling analytical workloads in the cloud.
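The skew effect described above can be illustrated with a small sketch. It assumes a simple hash-distribution scheme in which each row is placed on a node by hashing a distribution key; the function names `distribute` and `skew_ratio`, the four-node cluster, and the sample keys are all hypothetical and are not taken from any particular MPP product.

```python
import zlib
from collections import Counter


def distribute(keys, node_count):
    """Assign each row to a node by hashing its distribution key."""
    counts = Counter({node: 0 for node in range(node_count)})
    for key in keys:
        node = zlib.crc32(str(key).encode("utf-8")) % node_count
        counts[node] += 1
    return counts


def skew_ratio(counts):
    """Rows on the busiest node divided by the mean per node (1.0 is even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


# Distributing on a unique order ID spreads rows roughly evenly.
by_order_id = distribute(range(1000), node_count=4)

# Distributing on a customer key where one customer owns 90% of the
# rows forces at least 900 of the 1,000 rows onto a single node.
customer_keys = ["acme"] * 900 + [f"cust{i}" for i in range(100)]
by_customer = distribute(customer_keys, node_count=4)

print(skew_ratio(by_order_id))
print(skew_ratio(by_customer))
```

Because a parallel query finishes only when its slowest node finishes, the busiest node sets the pace. This is why the choice of distribution key gets so much attention from DBAs tuning an on-premises MPP system.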

The basic metaphor of the cloud model is elasticity—the ability to expand or contract as needed—but PaaS data warehouses are comparatively inelastic. Scaling them up or scaling them down isn’t a function of flicking a proverbial switch or turning a no less proverbial knob. In the case of one PaaS MPP service, in fact, it currently involves at least an intermediate step: migrating a smaller MPP configuration to a larger one.

In other words, Graziano stresses, a database doesn’t magically become elastic (or inherit elasticity) simply by moving it to a cloud or PaaS context. It must instead be designed to exploit the elastic characteristics of the cloud model. Everything from the database logic that controls for structure and placement optimization to the optimizer that tunes queries must be designed with the advantages—and constraints—of the cloud model in mind. “You need to have an advanced optimization engine for tuning the query so that you’re not manually tuning. If the data is going to be distributed across the cloud architecture, you want it distributed transparently,” Graziano argues.

“You also need a dynamic optimization engine, one that’s going to adjust to changes—in data, in workloads, in resources—automatically, unlike the traditional ones, where you pretty much have to know the details about your data to get the best performance out of any optimization engine.”

Minimal Disruption, Huge Benefits

From a data management perspective, the elastic data warehouse is both a disruptive and a non-disruptive paradigm. With respect to disruption, the elastic data warehouse eliminates all of the time-consuming tasks associated with sizing, balancing, and tuning an on-premises data warehouse system. “You no longer have to worry about things like ‘How am I going to stripe two petabytes of data across storage hardware?’ The cloud manages that for you,” Graziano points out.

An elastic data warehouse is also disruptive because it fundamentally changes your costing model. It eliminates up-front expenditures on hardware, software, and, in some cases, services. It eliminates annual support or maintenance costs. It eliminates the need for periodic system upgrades. Finally, it eliminates the costs of unplanned upgrades, as when the growth of or demands on a system outstrip its capacity. In this sense, the elastic data warehouse transforms the practice of capacity planning, too. Planning for “upgrades” can be eliminated.

In spite of these changes, almost all of which are beneficial, the elastic data warehouse paradigm is nondisruptive in several important ways. For one thing, it doesn’t require you to change how you go about using your data warehouse system. Because the elastic data warehouse is a translation of data warehouse architecture into the cloud, you can use the same concepts, tools, and methods to build out an elastic data warehouse as you would an on-premises system. These include data modeling and ETL tools, source-to-target mapping tools, prototyping tools, and even data warehouse automation (DWA) software.

“You should be able to take whatever you have in your on-premises system and put it in the cloud and it’s going to perform better. If you’re used to doing dimensional models, you can do those. Third-normal form? You can do that. Data vault modeling? Yes,” says Graziano, who is himself a certified data vault modeler and practitioner.

The most disruptive impact of elastic data warehousing is that it involves a change in thinking, Graziano argues. Because the elastic data warehouse eliminates the inflexibility of the on-premises data warehouse, data management practitioners must examine the basic assumptions and ingrained behaviors that used to determine how, why, and what they did in the former paradigm.

“It’s primarily a change in the way you think about what you’re doing, because you have options that you didn’t have before. If you have a regular process that needs to run faster on a particular day for some reason, you need to be mindful of the fact that you can scale that process up. More generally, you can turn a knob and apply more or less resources to the same [workloads] you’ve been running, which is something that you can’t do in a traditional system,” Graziano says.

“In traditional data warehousing, people build out a lot of data marts and star-schema dimensional models, and a lot of times they do this for performance. In elastic data warehousing, we can handle any kind of model, whether it’s relational, star schema, or big, wide, and flat: any kind of design you want. You don’t have to build an extra physical layer in order to get performance. The performance is more determined by the resources you apply than any specific physical modeling technique.”

Data Elasticity

The elastic data warehouse model is at least as tolerant of semistructured and poly-structured data as its on-premises counterparts. What’s more, because it not only “lives” in the cloud but was born in the cloud, the elastic data warehouse is ideally positioned as a platform for ingesting, managing, and analyzing semistructured and poly-structured data types, too.

One of the backbone data-exchange mechanisms of the cloud is JSON, or JavaScript Object Notation. True, most cloud data warehouse services provide some level of in-database support for JSON objects, as well as support for other semistructured data types, such as XML. Bear in mind, however, that most of these services are powered by relational database systems that were first developed—decades ago—for use in on-premises environments. These designs presupposed the existence of persistent, discernable schema, so support for the flexible schema of something like JSON was an add-on in later years.

An elastic data warehouse, by contrast, is designed from the ground up with the cloud and data from the cloud in mind. This means it supports a wide variety of data types, along with several different kinds of data connectivity and exchange mechanisms. Think of this as what might be called “data elasticity.”

Most extant PaaS data warehouse services fail this test, Graziano contends.

“Within Snowflake, we have a new data type called ‘VARIANT’ which is effectively doing what everybody says they’re doing: taking that JSON document and putting it into a column in a relational table. In our dynamic optimization engine, as we’re loading the data in and sharding it behind the scenes, we’re not going to shard it out into columns and dynamically build a table, but we’re going to record it so that the optimization engine, at the metadata layer, knows what the structure is. It knows the highs and lows, knows the midpoints, and treats it just as if it’s columnar data, just as it would with relational,” he explains. “If there are 25 elements packed inside that JSON object, with our engine, they’ll look like columns. If somebody’s using SQL queries, it will pull [the JSON object] apart and join it with relational data as if you’re joining two relational tables at a column level.”
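The idea in this quote (keep the JSON document whole, record what its elements look like at load time, and let queries treat those elements like columns) can be sketched in a few lines of Python. This is an illustrative toy only, not Snowflake's implementation: the helpers `path_stats` and `join_on`, the sample events, and the customer table are all hypothetical.

```python
import json


def path_stats(documents):
    """Record the low and high value of each top-level numeric element."""
    stats = {}
    for doc in documents:
        for key, value in doc.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number:
                low, high = stats.get(key, (value, value))
                stats[key] = (min(low, value), max(high, value))
    return stats


def join_on(documents, rows, doc_key, row_key):
    """Join JSON elements to relational rows as if both were plain columns."""
    index = {row[row_key]: row for row in rows}
    return [
        {**index[doc[doc_key]], **doc}
        for doc in documents
        if doc.get(doc_key) in index
    ]


# Hypothetical semistructured events; the third one lacks "device".
raw_events = [
    '{"customer_id": 1, "amount": 40.0, "device": "mobile"}',
    '{"customer_id": 2, "amount": 15.5, "device": "web"}',
    '{"customer_id": 1, "amount": 99.9}',
]
events = [json.loads(line) for line in raw_events]

# Hypothetical relational table.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]

stats = path_stats(events)      # metadata gathered once, at load time
joined = join_on(events, customers, "customer_id", "customer_id")

print(stats["amount"])          # (15.5, 99.9)
print(joined[1]["name"])        # Grace
```

In a real engine, metadata of this kind is what would let an optimizer skip data that cannot match a filter. Here it only shows that the document's structure can be learned without first declaring a schema.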


The ideal elastic data warehouse should permit schema-on-read flexibility for semistructured data—even as it enforces a predefined schema for traditional, strictly structured data types. In this way, such a system is “elastic” with respect to how it stores, structures, and manages data, too.

“If you’re able to natively ingest JSON documents as well as changes to that JSON document structure, you’ve eliminated that whole aspect of [data] preparation. You don’t need to come up with a schema and make changes to that schema before you can load the data, and your existing queries won’t break,” he says. “One of the nice things with the SQL extension [that Snowflake implements] is that anything going against the JSON data will continue to work, even if the structure changes. If new elements are added, that won’t break existing queries. Yes, you’ll have to discover those elements and add them into the query, but you’ll be able to do that without having to analyze [the JSON document], break it down, and create new storage structures first.”
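The claim that existing queries keep working when a JSON structure gains new elements can be shown with a small path-based projection. This is a sketch of the general schema-on-read idea, not the SQL extension Graziano refers to; the `select` helper, the dotted-path syntax, and the sample documents are hypothetical.

```python
import json


def select(documents, *paths):
    """Project dotted paths from each document; absent elements yield None."""

    def extract(doc, path):
        current = doc
        for part in path.split("."):
            if not isinstance(current, dict) or part not in current:
                return None
            current = current[part]
        return current

    return [tuple(extract(doc, path) for path in paths) for doc in documents]


# An older document and a newer one whose structure has grown.
old = json.loads('{"user": {"id": 7}, "action": "click"}')
new = json.loads('{"user": {"id": 8, "plan": "pro"}, "action": "view", "referrer": "ad"}')

# A query written before the change still runs against both documents.
existing_query = select([old, new], "user.id", "action")
print(existing_query)   # [(7, 'click'), (8, 'view')]

# Using a new element means extending the query, not reloading the data.
extended_query = select([old, new], "user.id", "user.plan")
print(extended_query)   # [(7, None), (8, 'pro')]
```

Because the query names only the elements it needs, extra elements are ignored and missing ones come back empty. That is the behavior the quote describes: discovery of new elements is still required, but restructuring storage is not.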

Getting Started

The signature strength of the cloud model is that it permits a subscriber to start small by spinning up an account and loading in some data. The elastic data warehouse (DWaaS) works along the same lines.

Because it’s a translation of data warehouse architecture to the cloud paradigm, you must first populate it with data and start building warehouse structures—e.g., dimensional models—in order to use it. For this reason, it makes sense to start with simple, clearly identified use cases.

“Come up with a solid use case of a problem that you have in your on-premises legacy data warehouse environment. Come up with a use case that you can get your arms around and do a proof of concept, do the research, find the right vendor to work with, but have a solid use case in place that is finite so that you can get comfortable with its options and advantages,” Graziano says.

“The elastic data warehouse is the kind of thing that sells itself. Like any project, you need to have goals that you’re going to try to achieve that you can measure, so you can provide a clear return on investment. What usually happens is people come in with a problem that’s giving them fits in their current environment. The process of getting it up and running [in the cloud] is so simple and the performance is so good that, yes, it really does sell itself.”

About Snowflake

snowflake.net

Snowflake Computing, the cloud data warehousing company, has reinvented the data warehouse for the cloud and today’s data. The Snowflake Elastic Data Warehouse is built from the cloud up with a patent-pending new architecture that delivers the power of data warehousing, the flexibility of big data platforms, and the elasticity of the cloud—at a fraction of the cost of traditional solutions. The company is backed by leading investors including Altimeter Capital, Redpoint Ventures, Sutter Hill Ventures and Wing Ventures. Snowflake is headquartered in Silicon Valley and can be found online at snowflake.net.

To help get you started using Snowflake’s Elastic Data Warehouse, we’ll give you 200 free credits. Sound interesting?

tdwi.org

TDWI is your source for in-depth education and research on all things data. For 20 years, TDWI has been helping data professionals get smarter so the companies they work for can innovate and grow faster.

TDWI provides individuals and teams with a comprehensive portfolio of business and technical education and research to acquire the knowledge and skills they need, when and where they need them. The in-depth, best-practices-based information TDWI offers can be quickly applied to develop world-class talent across your organization’s business and IT functions to enhance analytical, data-driven decision making and performance.

TDWI advances the art and science of realizing business value from data by providing an objective forum where industry experts, solution providers, and practitioners can explore and enhance data competencies, practices, and technologies.

TDWI offers five major conferences, topical seminars, onsite education, a worldwide membership program, business intelligence certification, live webinars, resourceful publications, industry news, an in-depth research program, and a comprehensive website: tdwi.org.

© 2016 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to [email protected].

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.