atscale technical overview · works directly on any data platform - cloud data lake, snowflake,...

18
AtScale Technical Overview Technical Overview 2020

Upload: others

Post on 31-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

AtScale Technical Overview

Technical Overview2020

Page 2: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

TABLE OF CONTENTS

Why AtScale? ___________________________________________________ 1

AtScale System Overview _________________________________________ 2

AtScale Cloud OLAP ___________________________________________ 2

AtScale Universal Semantic Layer™ _______________________________ 3

AtScale Autonomous Data Engineering™ ___________________________ 4

AtScale Intelligent Data Virtualization ______________________________ 5

Collaborative Model Management _____________________________ 5

Integrated Source Data Discovery ______________________________ 5

Virtual Cube Catalog _______________________________________ 5

Data Virtualization ________________________________________ 5

AtScale Design Center _________________________________________ 6

AtScale Data Platform SDK _____________________________________ 10

Deployment ___________________________________________________ 11

Integrations ___________________________________________________ 13

Frequently Asked Questions _______________________________________ 14

Page 3: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 1

WHY ATSCALE?AtScale offers a modern approach to business intelligence and analytics in the cloud. AtScale’s Cloud

OLAP enables analysts to perform sub-second, multidimensional analysis with popular BI tools.

Enterprises rely on AtScale to overcome data and analytics challenges including: accelerating data-

driven decisions at scale, creating one compliant view of business metrics and definitions, controlling

the complexity and costs of analytics and reducing the risk of analytics.

AtScale helps enterprises:

▲ Seamlessly migrate to the cloud. Enterprises are avoiding business disruption and port analytical

workloads without rewriting them.

▲ Simplify the analytics infrastructure. Enterprises can use the best tool and platform for the job

without moving data or adding new data stores.

▲ Modernize and future proof the analytics stack. Enterprises are taking advantage of data lakes

and cloud data warehouses while preparing for future platforms.

▲ Secure and govern data in one place. With a live connection to all data in a virtual cube,

enterprises can stop worrying about data traveling the world on user’s laptops.

▲ Turbocharge analytics and machine learning initiatives. Enterprises are Instantly integrating new

data sources because AtScale delivers a single, super-fast, data-as-a-service API for all data.

▲ See all data in a single, unified view, no matter where it is stored or how it is formatted.

▲ Conduct interactive and multidimensional analyses using business users’ preferred BI tools,

whether that is Excel, Power BI, Tableau, or something else.

▲ Get consistent answers across departments and business units via AtScale’s Universal Semantic

Layer™ that standardizes queries regardless of BI tool or query language.

Page 4: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 2

ATSCALE SYSTEM OVERVIEWAtScale provides a single, secured and governed workspace for distributed data. The combination of

AtScale’s Cloud OLAP, Universal Semantic Layer™, Autonomous Data Engineering™ and Intelligent

Data Virtualization powers business intelligence (BI), artificial intelligence (AI) and machine learning

(ML) initiatives resulting in faster, more accurate business decisions at scale.

AtScale Cloud OLAP

AtScale provides a Cloud OLAP (or COLAP) solution that involves hosting an OLAP server on a Cloud

platform. This lets you create a virtual OLAP cube on top of Snowflake, Google BigQuery, Amazon

Redshift, Azure SQL Server, Hadoop, Oracle, Teradata and more. With COLAP, you can keep the same

great SSAS MDX power with a platform that is built for today’s data types and scalability.

A T S C A L E C L O U D O L A P I N T W O S T E P S

Page 5: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 3

AtScale Cloud OLAP:

▲ Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift,

Teradata Cloud, Oracle Cloud, and more - without the need to extract data or build a physical cube

▲ Scales with your data platform rather than requiring you to provision a separate environment for

hosting your cubes

▲ Handles any amount of data because it doesn’t require that you pre-compute every possible

combination of dimensions and measures

▲ Doesn’t require ETL as it models cubes virtually without data engineering

▲ Handles data in different data stores and locations via Intelligent Data Virtualization

▲ Can be deployed in any Linux ecosystem

AtScale’s Universal Semantic Layer

The best way to get everyone on the same page is to have everyone speaking the same language. This

ensures that there won’t be conflicting answers to the same questions. A single, centralized workspace

for business metrics and definitions is key to offering one consistent, compliant view of data to both

business users and data scientists alike.

AtScale’s Universal Semantic Layer unifies semantic definitions and metrics for data and makes it

available in one location for BI, AI and ML applications. It works on data anywhere whether it’s on-

premises or in the cloud. With AtScale’s semantic layer, you can move data “as is” to a different data

platform or environment without disrupting your end users.

How AtScale’s Universal Semantic Layer Works

The AtScale Universal Semantic Layer models data as virtual cubes using dimensions, hierarchies,

measures, attributes and calculations. It creates a business-centric view of the data using an OLAP

modeling paradigm. Using familiar modeling constructs, users can quickly source and model any data,

creating virtual views regardless of where the data lives or how it is stored.

Once a model is created, it can be published and accessed via ODBC, JDBC, MDX or REST. The AtScale

Universal Semantic Layer allows any BI tool to connect to AtScale and access models without the need

for additional data engineering or custom client installed drivers. It virtualizes both the data platforms

and the data analytics tools through centralized management and governance.

Page 6: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 4

The AtScale query service behaves like a data warehouse for SQL-speaking clients and an OLAP cube

for MDX-speaking clients. The AtScale service intercepts client queries, translates the virtualized

queries into physical queries and passes those queries onto the underlying physical data warehouse

or data lake for execution. As end users interact with the data in the virtual cube, AtScale will

automatically create or modify aggregates through its acceleration structures technology. It creates the

aggregates on the source data platform and determines the optimal location to store those aggregates

in a federated query scenario. AtScale’s automated tuning functionality works consistently regardless of

the underlying data platform (data warehouse or data lake) or location (on-premises or cloud).

AtScale Autonomous Data Engineering

Gathering live data from multiple sources across the organization can be a long, manual process.

Data engineers should be creating new value for the business rather than simply preparing and

moving data for business reporting.

Business users want self service data access and they want it yesterday. AtScale’s Autonomous Data

Engineering™ technology identifies query patterns and creates and manages intelligent aggregates,

just like the data engineering team would do. The AI-driven optimizer learns from user behavior

and data relationships and takes care of data updates and changes, so business users can focus on

gathering insights from data and data engineers can focus on other projects.

With AtScale, the moment a model is published, data access is “live”. AtScale builds aggregates in

real-time in response to user activity and automatically tunes queries without additional manual

intervention.

Page 7: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 5

AtScale Intelligent Data Virtualization

Migrating data to the cloud can come with sticker shock. The ease of spinning up new servers (and

forgetting to shut them down) is the perfect recipe for ever-escalating costs. On the cloud data platform

side, a malformed query can cost thousands of dollars if you’re on a consumption-based pricing plan

and that same query can eat up all your available resources if you’re on a fixed pricing plan.

AtScale’s Intelligent Data VirtualizationTM automates the sourcing, curation and modeling of data on

premises or in the cloud. It blends live data from multiple data sources into virtual cubes. Virtualization

makes IT more agile with the ability to store data in the most suitable platforms while providing the

flexibility to adopt new platforms in the future without re-architecting their stack or disrupting their

downstream data consumers.

AtScale’s Intelligent Data Virtualization provides access to enterprise data by functioning as an

abstraction layer on top a variety of data platforms but without manually moving data. Features include:

▲ Collaborative Model Management - A browser-based, multi-user UI that provides a unified

modeling environment, built for sharing and re-use.

▲ Integrated Source Data Discovery - Automatic metadata discovery and synchronization with your

data platforms when working with source data.

▲ Virtual Cube Catalog - A browser-based UI that allows data consumers to search and discover

published virtual cubes and request access from the data owner.

▲ Data Virtualization - First generation data virtualization was not designed for the large

analytical workloads that are typical of today’s BI and AI use cases. AtScale’s deep expertise in

multidimensional analytics along with federated query processing provides unparalleled support

for BI and AI tools alike. While first generation data virtualization lacked automated performance

management, AtScale’s autonomous data engineering accelerates queries by orders of magnitude

without manual intervention and data engineering.

AtScale’s patented security capabilities respect native data warehousing security by supporting end-

to-end user delegation and impersonation. AtScale’s object-level security supports user and group

access rules while providing discoverability for a 360-degree feedback experience with virtual cube

designers. With integrations with enterprise data catalog and governance tools, AtScale can enforce

data governance rules using AtScale’s virtualized governance layer.

Page 8: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 6

AtScale Design Center

AtScale Design Center is a browser-based interface that model developers use to create and publish

AtScale virtual cubes. Design Center is organized around the following object model:

▲ Organizations: The ‘top-level’ structure contains sets of users, groups, roles, permissions, and

more. AtScale instances will have at least one Organization. Additional Organizations can be

created and they are mutually exclusive. Nothing is shared between Organizations.

▲ Projects: A Project is a collection of one or more virtual OLAP cubes. Each project contains a shared

Library with objects like Datasets, Dimensions and Calculations that can be shared with other cubes

in that project. There is no limit to how many Projects you can create or how many objects can be

shared.

▲ Cubes: A Cube is created within a Project and can use shared objects from that Project’s Library. A

Cube is a collection of Datasets, Dimensions, Measures, Hierarchies, and Calculations along with

their Relationships that form the basis of a virtual, multidimensional view of your source data.

Models in AtScale visually look like a ‘star’ or ‘snowflake’ schema. There is no dependency for any

particular physical data structure or layout: data can be normalized or denormalized or a little bit of both.

This image shows a customer dimension built from 2 base tables specifying a hierarchy with Country/Region at the top, following with State/Province, City, and Customers. Customers have two additional dimensions attached to the Customer ID level format. There is no limit to logical model creation regardless of what the physical model is on disk.

Page 9: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 7

Along with defining dimensions and measures, you can define table relationships for many-to-one and many-to-many table joins.

There are a number of measures coming from the ‘sales_log’ table, such as Order Quantity and Sales Amount. Items that are in green text are virtual calculated columns and don’t need to physically exist in the data. The series of orange lines connecting dimensions to the fact table denote where join relationships have been defined in the model.

Page 10: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 8

Once a model is published, the connection instructions are shown for the various connection types like JDBC, ODBC and XMLA.

Connection details can be automatically downloaded for tools such as a Tableau (.TDS file). Once published, the AtScale cube can be accessed via BI/AI tool of choice or in a custom application.

Page 11: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 9

This view of a Tableau report shows the dimensions and measures on the left that have been defined in an AtScale model and automatically appear in Tableau via an AtScale generated TDS.

Page 12: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 10

A view of the same AtScale cube in Excel. The same dimensions and measures appear here as they did in Tableau but through an XMLA interface instead.

AtScale’s Universal Semantic Layer provides the same logical semantic layer regardless of the BI and/

or AI/ML tool. Users can interact with data using the same dimensions, hierarchies, and measures

defined in Design Center. With AtScale, data is delivered as a service to all data consumers without

any restrictions to share and collaborate delivered at the speed of thought with AtScale’s Acceleration

Structures.

AtScale Data Platform Software Development Kit

AtScale’s Data Platform Software Development Kit (SDK) makes it easy to extend support to new data

sources. The SDK enables developers to add support for new data platforms and include platform

specific optimizations that enhance performance and add additional capabilities. The SDK is plug and

playable, making it easy for 3rd party developers to add support without requiring changes to the core

AtScale platform.

Page 13: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 11

DEPLOYMENTAtScale micro-services install on a Linux server or virtual machine within any environment, either

on-premise or in the cloud. The AtScale instance serves both as a query endpoint for BI/AI tools and

modeling endpoint for AtScale Design Center, a browser-based design environment for creating and

managing virtual cubes.

After creating a model in AtScale Design Center and publishing it, data is available for querying from

any BI/AI tool or custom application via SQL (ODBC/JDBC) or MDX (OLE DB/ XMLA). AtScale has a

zero client-side footprint: there’s no need to install AtScale-specific drivers or applications on client

machines to query the AtScale service. With the ODBC/JDBC protocol, AtScale can accept queries

using any Hive/Thrift driver. With the MDX/XMLA protocol, AtScale adheres to the Microsoft SQL Server

Analysis Services (SSAS) XMLA standard so tools like Excel will work “as is” using existing drivers.

Page 14: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 12

The AtScale platform consists of multiple services:1. Agent - Installed on a virtualization fabric node to communicate with the virtualization service.

2. Balancer - Internal load balancers for routing traffic for High Availability (HA).

3. Coordinator - Installed at least 3 nodes for managing High Availability (HA).

4. Database - AtScale internal database for storing application and configuration data.

5. Directory - Internal LDAP directory used if an external directory is not defined.

6. Engine - AtScale query engine service.

7. Health - Health check service.

8. Ingress - Bridge service for virtualization configuration services.

9. Modeler - AtScale Designer Center service.

10. Servicecontrol - AtScale services manager.

11. Virtualization_listener - Virtualization fabric listener service.

12. Virtualization_supervisor - Virtualization fabric supervisor service.

13. Virtualization_worker - Virtualization fabric worker.

AtScale requires the following software to operate:1. Active Directory, LDAP or cloud based Identity platform

2. External Load Balancer (for HA configurations)

Page 15: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 13

INTEGRATIONSAtScale support for data platforms is more than just SQL dialect translation. AtScale understands

the semantics of the queries as well as the native platforms underlying capabilities for security,

performance and agility. As a result, AtScale can holistically evaluate many different strategies for query

optimization and autonomous data engineering.

AtScale supports the following data platforms: 1. Data Lake SQL Engines

a. Hive

b. Impala

c. Spark

2. Traditional Data Warehouses

a. Teradata

b. Oracle

c. Microsoft SQL Server

d. Postgres

3. Cloud Data Warehouses

a. Snowflake

b. Google BigQuery

c. Amazon Redshift

Page 16: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 14

FREQUENTLY ASKED QUESTIONS

What do I need to deploy AtScale? ▲ You need to configure AtScale to point to a supported data platform as listed in the Integrations

section of this document. While not required, you will also want to configure AtScale to access

your directory service (AD/LDAP) and your external load balancer for High Availability (HA)

configurations. For AtScale installation, at least one Linux server or virtual machine is required with

some basic prerequisites to install the AtScale software. For client tool access, you may need the

appropriate JDBC/ODBC drivers if they aren’t already installed. No additional driver is necessary for

Excel or tools that use the XMLA (MDX) protocol.

Is there a trial and/or open-source version of AtScale? ▲ AtScale supports a proof-of-concept trial. Please contact us to discuss your use case and/or project

needs to determine if a proof-of-concept trial would be appropriate.

How does AtScale interact with my data platform? ▲ AtScale acts as a client to your data platform(s) and will generate optimized, platform specific SQL

based on the AtScale model defined in the AtScale Design Center.

▲ Once a cube is published, it is immediately available for BI and/or AI/ML activity. There is no pre-

processing or data movement required when publishing a model. Data consumers can connect to

the AtScale engine via ODBC/JDBC or MDX and begin querying the cube.

▲ AtScale receives these incoming queries from end users acting as a server to the BI/AI tools and

custom applications. AtScale rewrites queries for execution on a data platform and leverages any

available aggregates in the acceleration structures that would be beneficial to the user’s query.

▲ Simultaneously, AtScale’s machine learning algorithms are monitoring user activity and managing

its acceleration structures to automatically optimize query performance. The acceleration

structures are aggregate tables that are created and stored in a schema on your data platform(s).

What are the options for aggregate creation in the acceleration structures? ▲ Acceleration structures may be triggered in 3 different ways.

▲ ‘Demand-based Aggregates’ are generated heuristically based on user query behavior.

Page 17: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the

© 2020 AtScale Inc. All rights reserved. 15

▲ ‘Predictive Aggregates’ are generated proactively based on model design. For example, dimensional

aggregates may be generated to facilitate fast lookups for building reports.

▲ ‘User-Defined Aggregates’ are defined by the AtScale Design Center user and are stored with the

virtual cube model. Users can specify combinations of dimensions and measures to design an

aggregate manually.

▲ In addition to these types, settings are available for adjusting behavior and thresholds for creating

demand and prediction based aggregates.

How are the acceleration structures managed and kept current? ▲ There are three methods of controlling how and when the acceleration structures are refreshed.

▲ Acceleration structures may be refreshed on a time or calendar basis using AtScale’s built-in

scheduler.

▲ Acceleration structures may be refreshed on a file trigger basis by using AtScale’s file watcher

utility. This method is often used in conjunction with an ETL pipeline to trigger a refresh upon

completion of an ETL flow.

▲ Acceleration structures may be refreshed using AtScale’s REST API. As with the file trigger option,

this method is often used in conjunction with an ETL pipeline workflow..

NOTE: Acceleration structures can be updated either incrementally or in full refresh mode. Incremental

updates allow for the appending of new or changed data whereas a full refresh will rebuild the aggregates

from scratch.

ABOUT ATSCALE

The Global 2000 relies on AtScale – the intelligent data virtualization company – to provide a single, secured and governed

workspace for distributed data. The combination of the company’s Cloud OLAP, Autonomous Data EngineeringTM and Universal

Semantic LayerTM powers business intelligence resulting in faster, more accurate business decisions at scale. For more

information, visit www.atscale.com..atscale.com.

Page 18: AtScale Technical Overview · Works directly on any data platform - Cloud Data Lake, Snowflake, Google BigQuery, AWS Redshift, Teradata Cloud, Oracle Cloud, and more - without the