azure data warehouse as a service(dwaas)files.meetup.com/18594014/nova dwaas feb 2016 final.pdf ·...
TRANSCRIPT
AzureData Warehouse As A Service(DWaaS)SQL Server 2016
Northern Virginia SQL Users Group Meeting - February 2016
Azure DWaaS Agenda
• Microsoft’s Data Platform: The Big Picture
• DWaaS FAQs
• Key DWaaS Topics
• Business Continuity
• Code Tips and Tools
• Demo
• Future
Northern Virginia SQL Users Group Meeting - February 2016
The BIG PICTUREData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
Gartner’s Magic Quadrant
Northern Virginia SQL Users Group Meeting - February 2016
Northern Virginia SQL Users Group Meeting - February 2016
Traditional Data Warehouse
Northern Virginia SQL Users Group Meeting - February 2016
The Cortana Analytics Suite
Northern Virginia SQL Users Group Meeting - February 2016
Azure Data Factory
Data Factory works across on-premises and cloud data sources and SaaS to ingest, prepare, transform, analyze, and publish your data. Use Data Factory to compose services into managed data flow pipelines to transform your data.
Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Azure Machine Learning
Azure Machine Learning was designed for applied machine learning to your organization. Use best-in-class algorithms and a simple drag-and-drop interface—and go from idea to deployment in a matter of clicks.
Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Azure Stream Analytics
Azure Stream Analytics (ASA) is a fully managed, cost effective real-time event processing engine that helps unlock deep insights from data.
Stream Analytics makes it easy to set up real-time analytic computations on data streaming from devices, sensors, web sites, social media, applications, infrastructure systems, and more.
Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Azure Event Hubs
Azure Event Hubs is an event processing service that provides event and telemetry ingress to the cloud at massive scale, with low latency and high reliability.
Event Hubs provides a message stream handling capability and though an Event hub is an entity similar to queues and topics, it has characteristics that are very different from traditional enterprise messaging.
Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Are you using them?
File Storage
- 10 GB Files
Power BI
- Free to test
- Mobile App
- Gateways
64 bit Power
- Unlimited RAM
Big Data Ready
- DAX
- Power Query
Language (M)
Northern Virginia SQL Users Group Meeting - February 2016
Big Data “Proven” Solutions now!!
X-Velocity Differences
• Excel In-Memory Only- Integration of a modified SSAS
engine (Vertipaq/X-Velocity) into Excel- PowerPivot- PowerView- Sharepoint
- 10-15x Compression- Supports Tabular or
Multidimensional(Excel 2016)- Scan rates of 100s of millions of
rows per second
• SQL Server In-Memory & Columnstore
- Vertipaq/X-Velocity(SSAS)- In-Memory Columnstore/X-
Velocity(Memory Optimized Columnstore Engine)
- In-Memory OLTP(Operational Analytics)
- Scan rates of 100s of billions of rows per second.
Northern Virginia SQL Users Group Meeting - February 2016
Details to Plan Data Platform Strategy…
• Excel 2013/2016 Limits*• https://support.office.com/en-us/article/Excel-specifications-and-limits-CA36E2DC-1F09-4620-B726-67C00B05040F
• Access 2016 Limits*• https://support.office.com/en-us/article/Access-2016-specifications-0cf3c66f-9cf2-4e32-9568-98c1025bb47c
• SQL Server 2014 Limits*• https://msdn.microsoft.com/en-us/library/ms143432(v=sql.120).aspx
• SQL Azure Transact SQL Differences • https://azure.microsoft.com/en-us/documentation/articles/sql-database-transact-sql-information/
* Older version of the products have lower limits than current versions!
Microsoft Confidential 14
Micron/Intel Storage Breakthrough in 2015
Video short on Intel’s 3D XPOINT:
https://www.youtube.com/watch?v=IWsjbqbkqh8
Northern Virginia SQL Users Group Meeting - February 2016
Hardcore Big Data Workstations
HP Z840 Workstation
• 2 TB of RAM(DDR4)
• 12 GB Nvidia Quadro GPU
• 18 Core XEON Processor
• DreamColor Engine Displays
• Connected with the Verizon Quantum Fios (500/500mbps)
• $50-75,000 Price Tag
Microsoft Confidential 16
Portable Big Data Power
• Surface Book• 6th Generation Intel Core i7
• 16 GB of RAM
• 1TB SSD
• PixelSense Display
• 1GB Nvidia GeoForce GPU
• 2-n-1
• $899-3500 Price Tag
https://www.youtube.com/watch?v=XVfOe5mFbAE
Northern Virginia SQL Users Group Meeting - February 2016
How DWaaS works
Northern Virginia SQL Users Group Meeting - February 2016
DWaaS FAQsData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
Disclaimer
Microsoft’s Data Warehouse as a Service (DWaaS) is Preview!!
Service details may change!!
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: Where is the Service now?
Datacenters
• US• Central, East, East2, North Central, South Central, and West.
• Europe• North Europe and West Europe
• Asia• East Asia, Southeast Asia, Australia East, Australia Southeast, Japan East,
Japan West
• LATAM• Brazil South
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: What's a DWU?
• SQL DWAAS -> DWU• Data Warehouse Units
• Query Transactions per second.
• Scan/Aggregation
• Loading
• Create Table As Select(CTAS) (Speed to create tables)
• Q&A
• SQL Azure -> DTU• Database Transaction Units
• Database Transactions per second.
• SQL Azure -> eDTU
• Elastic Database Transaction Units
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: What is CTAS?
• Creates a new table that is populated with the results from a SELECT statement.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: How does the usage of DWaaS show up on my bill? • Your bill will be comprised of a predictable monthly fee for storage,
plus a variable fee for compute.
• Compute can vary by month, depending on quantity and duration of compute resources (DWUs) consumed.
• Storage and compute are broken out separately on your bill to make it easier to see the level of compute resources consumed.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: What if my data warehouse is active for less than an hour with DWUs? • DWUs are charged on an hourly basis as long as your database is
active (i.e., not paused).
• If DWUs are active for less than an hour, the bill will reflect a charge for a full hour.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: What if I am using different level of DWUs within the same hour? • DWUs will be charged on an hourly basis at the highest level of
performance that applied during the hour.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: What is the pause functionality?
• Unique to SQL Data Warehouse is the ability to pause and resume compute on demand.
• Pause and resume of your compute power can be done through the Azure Portal, via REST APIs or through Powershell.
• Pausing cancels all running or queued activities and when you return you can resume your compute resources in seconds.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ's: Can I use my regular storage account with DWAAS? • If you have a regular Azure Storage Account (either Page Blob or Block
Blob), you will need to import it to SQL DW using Polybase.
• Supported file formats include delimited text, Hive RCFile, or Hive ORC.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ’s: What are the limits of my Azure Account?• Each Azure account can have 100 storage containers.
• Each container can hold 50TB.
• If your doing Big Data, enlist EA Administrator and Microsoft to structure your subscription(s) correctly for your workloads.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
FAQ’s: What Global Security Certifications does Azure have?
Northern Virginia SQL Users Group Meeting - February 2016
DWaaS Key TopicsData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Concurrency Model
• Concurrency Queries• Concurrent queries equates to the
number of queries executing at the same time.
• Each query execution counts as a single query regardless of whether it is a serial query (single threaded) or parallel query (multi-threaded).
• 80-128 Concurrency Slots is max per DW
• The exact number of slots depends on three factors:• The DWU setting for the SQL Data
Warehouse• The resource class that the user
belongs to• Whether the query or operation is
governed by the concurrency slot model
• Some queries and operations do not consume concurrency slots at all.
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Concurrency Model Limits
Concurrency Slot Consumption
DW100 DW200 DW300 DW400 DW500 DW600 DW1000 DW1200 DW1500 DW2000
Max Concurrent Queries
32 32 32 32 32 32 32 32 32 32
Max Concurrency Slots
4 8 12 16 20 24 40 48 60 80
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Concurrency Model Resource Classes
Concurrency Slot Consumption
DW100 DW200 DW300 DW400 DW500 DW600 DW1000 DW1200 DW1500 DW2000
SmallRC(default)
1 1 1 1 1 1 1 1 1 1
MediumRC 1 2 2 4 4 4 8 8 8 16
LargeRC 2 4 4 8 8 8 16 16 16 32
XLargeRC 4 8 8 16 16 16 32 32 32 64
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Concurrency Model Memory Allocation
Memory Available(Per Dist)
DW100 DW200 DW300 DW400 DW500 DW600 DW1000 DW1200 DW1500 DW2000
SmallRC(default)
100MB 100MB 100MB 100MB 100MB 100MB 100MB 100MB 100MB 100MB
MediumRC 100MB 200MB 200MB 400MB 400MB 400MB 800MB 800MB 800MB 1600MB
LargeRC 200MB 400MB 400MB 800MB 800MB 800MB 1600MB 1600MB 1600MB 3200MB
XLargeRC 400MB 800MB 800MB 1600MB 1600MB 1600MB 3200MB 3200MB 3200MB 6400MB
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Workload Management
Northern Virginia SQL Users Group Meeting - February 2016
Workload Group Concurrency Slot Mapping Priority Mapping
SloDWGroupC00 1 Medium
SloDWGroupC01 2 Medium
SloDWGroupC02 4 Medium
SloDWGroupC03 8 Medium
SloDWGroupC04 16 High
SloDWGroupC05 32 High
SloDWGroupC06 64 High
SloDWGroupC07 128 High
DWAAS: Tables Design, datatypes supported
• bigint• binary
• bit• char• date
• datetime• datetime2• datetimeoffset
• decimal
• float• int
• money• nchar
• nvarchar• real• smalldatetime
• smallint• smallmoney• time
• tinyint
• varbinary• varchar
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Tables Design, datatype alternatives
• geometry, use a varbinary type• geography, use a varbinary type• hierarchyid, CLR type not native
• image, text, ntext when text based use varchar/nvarchar (smaller the better)
• nvarchar(max), use nvarchar(4000) or smaller for better performance
• numeric, use decimal• sql_variant, split column into
several strongly typed columns
• sysname, use nvarchar(128)
• table, convert to temporary tables
• timestamp, re-work code to use datetime2 and CURRENT_TIMESTAMP function.
• varchar(max), use varchar(8000) or smaller for better performance
• uniqueidentifier, use varbinary(8)
• user defined types, convert back to their native types where possible
• xml, use a varchar(8000) or smaller for better performance - split across columns if needed
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Hash Distribution
• Smart hash distribution decisions improve query performance!
• There are three factors:• Minimize Data Movement
• Avoid Data Skew
• Provide Balanced Execution
• In DWAAS data type consistency matters!
• Make sure that the existing schema is consistently using the same type for a column.
• If the distribution key data types are not synchronized and the tables are joined then needless data movement will occur. This could be costly
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Hash Distribution Table Recommendations• Is a static value since you cannot update the hash column.
• Is used in JOIN, GROUP BY, DISTINCT, or HAVING clauses in your queries.
• Is not used in WHERE clauses
• Has lots of different values, at least 1000.
• Does not have a disproportionately large number of rows that will hash to a small number of distributions.
• Is defined as NOT NULL. NULL rows will congregate in one distribution.
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Hash Distribution
• The hash function is deterministic. The same value is always assigned to the same distribution.
• One column is used as the distribution column. The hash function uses the nominated column to compute the row assignments to distributions.
• The hash function is based on the type of the column not on the values themselves
• Hash distributing a table can sometimes result in a skewed table• Hash distributed tables generally require less data movement when
resolving queries, and therefore improve query performance for large fact tables.
• Observe the recommendations for hash distributed column selection to enhance query throughput.
Northern Virginia SQL Users Group Meeting - February 2016
DWAAS: Statistics
• Creating and updating statistics is important in order to achieve the high query performance:• Create statistics as part of
database design
• Update statistics as part of database maintenance
• View statistics with system views and functions
Northern Virginia SQL Users Group Meeting - February 2016
• Q&A
DWAAS: Statistics Recommendations
• Ensure that each loaded table has at least one statistics object updated. This updates the tables size (row count and page count) information as part of the stats update.
• Focus on columns participating in JOIN, GROUP BY, ORDER BY and DISTINCT clauses
• Consider updating "ascending key" columns such as transaction dates more frequently as these values will not be included in the statistics histogram.
• Consider updating static distribution columns less frequently.
• Remember each statistic object is updated in series. Simply implementing UPDATE STATISTICS <TABLE_NAME> may not be ideal - especially for wide tables with lots of statistics objects.
Northern Virginia SQL Users Group Meeting - February 2016
DWaaS Business ContinuityData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
Business Continuity: Data Redundancy
• Since SQL Data Warehouse separates compute and storage, all your data is directly written to geo-redundant Azure Storage (RA-GRS).
• Fully managed solutions have “hidden concerns” we should talk about…
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Business Continuity: Data Redundancy
Datacenter Pairing
• US• |North Central | South Central US|
• |South Central | North Central US |
• | East US | West US |
• | West US | East US |
• |US East 2 | Central US |
• | Central US | US East 2 |
• Europe• | North Europe | West Europe |
• | West Europe | North Europe |
• Asia• | South East Asia| East Asia|
• | East Asia| South East Asia|
• | Japan East| Japan West|
• | Japan West| Japan East|
• | Australia East| Australia Southeast|
• | Australia Southeast| Australia East|
• LATAM• | Brazil South | South Central US|
Northern Virginia SQL Users Group Meeting - February 2016
Business Continuity: Db Restore
• Azure SQL Data Warehouse service protects all databases with automatic storage snapshots at least every 8 hours and retains them for 7 days to provide you with a discrete set of restore points.
• The automatic backup and restore features come with no additional charges and provide a zero-cost and zero-admin way to protect databases from accidental corruption or deletion.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
Business Continuity: Geo Restore
• Geo-Restore is designed to recover your database in case it becomes unavailable due to a disruptive event.
• You can contact support to restore a database from a geo-redundant backup to create a new database in any Azure region.
• Because the backup is geo-redundant it can be used to recover a database even if the database is inaccessible due to an outage.
• Geo-Restore feature comes with no additional charges.
• Q&A
Northern Virginia SQL Users Group Meeting - February 2016
DWaaS Code Tips and ToolsData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #1 : Identifying incompatible columns
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #2: Round-Robin Distribute
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #3: Hash Distribute
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #4: Set Boundary Points
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #5: Single Column Statistics
Create single-column statistics with default optionsCREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]);
Create single-column statistics by examining every row CREATE STATISTICS [statistics_name] ON [schema_name].[table_name]([column_name]) WITH FULLSCAN;
Create single-column statistics by specifying the sample size CREATE STATISTICS col1_stats ON dbo.table1 (col1) WITH SAMPLE = 50 PERCENT;
Create single-column statistics on only some of the rowsCREATE STATISTICS stats_col1 ON table1(col1) WHERE col1 > '2000101' AND col1 < '20001231';
Create single-column statistics with all the optionsCREATE STATISTICS stats_col1 ON table1 (col1) WHERE col1 > '2000101' AND col1 < '20001231' WITH SAMPLE = 50 PERCENT;
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #6: Multi-Column Statistics
• Create multi-column statisticsCREATE STATISTICS stats_2cols ON table1 (product_category, product_sub_category) WHERE product_category > '2000101' AND product_category < '20001231' WITH SAMPLE = 50 PERCENT;
• Create statistics on all the columns in a table
Northern Virginia SQL Users Group Meeting - February 2016
Code Tips #7: Updating Statistics
• Update one specific statistics objectUPDATE STATISTICS [schema_name].[table_name]([stat_name]);
• Update all statistics on a tableUPDATE STATISTICS [schema_name].[table_name];
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
The Data Warehouse Migration Utility migrates schema and data from SQL Server and Azure SQL DB to the new SQL Data Warehouse (SQL DW) service.
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
• During schema migration, this tool helps map the schema from the source to the destination.
• This tool gives user ability to design MPP database schema during migration by choosing distribution criteria and to easily apply that schema to a SQL DW instance.
• After schema translation and migration this tool also gives users the ability to create BCP scripts to run that will automatically migrate the data.
• In addition to migration services, the Data Warehouse Migration Utility also provides an option to generate compatibility report, and analyze the potential skew of data on hash distributed tables for better migration decisions
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
Northern Virginia SQL Users Group Meeting - February 2016
Data Warehouse Migration Utility Tool
Northern Virginia SQL Users Group Meeting - February 2016
DWaaS DemoData Warehouse as a Service
Northern Virginia SQL Users Group Meeting - February 2016
Future Session (Fall 2016)
• DWAAS Deep Dive*• We’ll explore the GA release of DWaaS• Present a global reference model that we’ll configure • Move various DWs:
• Oracle• SQL 2005, 2008, 2008R2, 2012, and 2014 • DB2
• Integration from different sources:• Polybase• Azure Factory• SSIS• 3rd Parties
* I’ll have an office with ExpressRoute in place by then
Northern Virginia SQL Users Group Meeting - February 2016
Prepare your Organizations –Analysts/Executives• Data Analysis Expressions
• https://msdn.microsoft.com/en-us/library/gg413422.aspx
• Power Query Mashup Language • https://msdn.microsoft.com/en-us/library/mt211003.aspx
• TypeScript• http://www.typescriptlang.org/
Northern Virginia SQL Users Group Meeting - February 2016
Prepare your Organizations – IT/Data Scientists• Microsoft Open R (Formerly RevoR, Multi-Threaded R)
• https://mran.microsoft.com/open/
• Multi-Dimensional Expressions(MDX) • https://msdn.microsoft.com/en-us/library/ms145506.aspx
• Data Mining Expressions(DMX) • https://msdn.microsoft.com/en-us/library/ms132058.aspx
Northern Virginia SQL Users Group Meeting - February 2016
Free Training Resources
Microsoft Virtual Academyhttps://mva.microsoft.com/
EDX(Audit to enter classes for free) https://www.edx.org/school/microsoft
Data Camp for Data Science
https://www.datacamp.com/
Join SQL PASS (Free)
http://www.sqlpass.org/
Northern Virginia SQL Users Group Meeting - February 2016
Thank You
Northern Virginia SQL Users Group Meeting - February 2016