smp mpp with pdw ** workload requirements usually drive the architecture decision

Microsoft SQL Server 2008 R2 Parallel Data Warehouse Deep Dive
Matt Peebles, Principal Architect, Microsoft Corporation
Email: [email protected]
SESSION CODE: BIE309



Page 2: Agenda

Concepts and Principles
What is PDW?
What is it doing under the hood?
MTP Feedback
Questions

Page 3: PDW Core Concepts

Shared Nothing Computing
  Resource and data independence are maintained within each DBMS instance
  Each instance reserves shared resources (CPU, Memory, Disk) for only its distribution of system data
  Simply add new resources to continually scale out

Massively Parallel Processing (MPP)
  Ability to leverage multiple concurrent resources to resolve SQL set operations against distributed data
  Each instance works in parallel on its own "distribution" of a single user query
  PDW supports up to 10 parallel instances of SQL Server DBMS per Data Rack (max of 40 nodes)
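
As a rough illustration of the idea that each instance works on its own distribution of a single user query, the sketch below aggregates three hypothetical distributions in parallel and then merges the partial results. The data, the function names, and the use of Python threads are assumptions made for illustration; nothing here reflects how PDW is actually implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the "distribution" of a sales table
# owned by one DBMS instance. Rows are (store_id, dollars_sold).
distributions = [
    [(1, 10.0), (2, 5.0)],
    [(1, 7.5), (3, 2.5)],
    [(2, 4.0), (3, 1.0)],
]

def local_sum(rows):
    """Work done by one instance: aggregate only its own distribution."""
    totals = {}
    for store_id, dollars in rows:
        totals[store_id] = totals.get(store_id, 0.0) + dollars
    return totals

def merge(partials):
    """Final step: combine the per-instance partial aggregates."""
    totals = {}
    for partial in partials:
        for store_id, dollars in partial.items():
            totals[store_id] = totals.get(store_id, 0.0) + dollars
    return totals

def parallel_sum_by_store(dists):
    """Run the local step on every distribution concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=len(dists)) as pool:
        partials = list(pool.map(local_sum, dists))
    return merge(partials)

result = parallel_sum_by_store(distributions)
```

Because no instance needs another instance's rows for the local step, adding a distribution adds a worker without changing the logic, which is the scale-out property the slide describes.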

Page 4: SMP vs MPP

SMP
  HW advancements increasing ability to scale up
  Scaling is limited
  High-end SMP very expensive
  Extremely high concurrency for some workloads
  Less than 1-2 TB of data: SMP will almost always be better. Usually <10 TB
  Full SQL Server functionality
  HA must be architected in

MPP with PDW
  HW advancements increasing ability to scale up and scale out
  Scaling to 1 PB+
  Scale out is relatively low cost
  Relatively high concurrency for complex workloads
  > 10 TB (typically) up to 1 PB
  Limited SQL Server functionality
  HA is built in

** Workload requirements usually drive the architecture decision

Page 5: Ultra Shared Nothing

An extension of traditional shared nothing design
Push shared nothing architecture into the SMP node
  IO and CPU affinity within SMP nodes
  Eliminate contention per user query
  Use full resources for each user query
  Predictable results
Multiple physical instances of tables
  Distribute large tables
  Replicate smaller tables
Redistribute rows "on-the-fly" when necessary
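
One way to picture on-the-fly redistribution: when rows are distributed on one column but a query needs them grouped or joined on another, each row is re-hashed on the new column and moved to the node that owns that hash value. The sketch below assumes four nodes and uses CRC32 as a stand-in hash; the slide does not describe PDW's actual hash function or data movement mechanics, and all names are hypothetical.

```python
import zlib

NUM_NODES = 4  # assumed node count, for illustration only

def node_for(key):
    """Map a key to a node with a deterministic stand-in hash (CRC32)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES

def distribute(rows, key_index):
    """Place each row on the node that owns the hash of the chosen column."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes

def redistribute(nodes, key_index):
    """Re-hash rows that already live on nodes, using a different column."""
    all_rows = (row for node_rows in nodes for row in node_rows)
    return distribute(all_rows, key_index)

# Hypothetical rows: (order_id, customer_id), first distributed on order_id.
orders = [(1, "c1"), (2, "c2"), (3, "c1"), (4, "c3"), (5, "c2"), (6, "c1")]
by_order = distribute(orders, key_index=0)

# A join on customer_id needs matching rows on the same node, so re-hash.
by_customer = redistribute(by_order, key_index=1)
```

After the second step every row for a given customer sits on exactly one node, so a join or aggregate on that column can again run locally.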

Page 6: Agenda

Concepts and Principles
What is PDW?
What is it doing under the hood?
MTP Feedback
Questions

Page 7: SQL Server 2008 R2 PDW is…

Converted DATAllegro architecture to Microsoft platform
  Win2008 SP2, SQL Server 2008 SP1 CU5
Appliance-like
  Software + commodity hardware solution
  Pre-tuned and optimized specifically for sequential IO
  Users see 1 "server"
Only runs on specific HW - Reference Architectures
  Launching with Dell, HP, IBM, EMC, and Bull in Europe

Page 8: SQL Server 2008 R2 PDW is…

Designed for DW workloads
  Large data (~10 TB to 500 TB)
  Normalized as well as star schemas
First-class integration with Microsoft BI
  Excel PowerPivot
  Analysis Services
  Reporting Services
  Integration Services
Will support 3rd-party BI solutions

Page 9: PDW Reference Architecture

[Diagram: PDW reference architecture, a Control Rack on the corporate network and a Data Rack on a private network]

Control Rack
  Control Nodes (Active / Passive): 1. PDW Engine 2. Admin Console 3. Metadata 4. Workspace
  Management Servers: 1. Active Directory/DNS 2. HPC 3. Setup/patching
  Landing Zone: 1. SSIS Instance 2. Loader Tool 3. File Staging
  Backup Node: 1. File store for backups 2. Std. SQL Backups 3. Full and differential

Data Rack
  Compute Nodes (SQL), plus a Passive Compute Node: 1. User Data 2. DMS
  Storage Nodes, attached over Dual Fiber Channel
  Dual Infiniband interconnect
  Expand by adding data rack(s) when capacity or performance requirements change

External connections
  Client Drivers: DataDirect Drivers (ADO.NET, OLE-DB, ODBC); Reporting Services, Analysis Services, Integration Services (R2)
  ETL Load Interface
  Corporate Backup Solution
  Data Center Access

Page 10: Compute Node Server Architecture

[Diagram: compute node with dual multi-core processors (CPU, CPU), RAM, an enterprise-class DBMS, and local TempDB/workspace storage]

Current Hardware Options

Vendor  Model     Form Factor  CPU            Total Cores              Memory  Local Storage (TempDB)
HP      DL360 G6  1U           Intel Nehalem  8 cores, hyper-threaded  72 GB   6 x 300 GB 10K SAS
DELL    R610      1U           Intel Nehalem  8 cores, hyper-threaded  96 GB   4 x 300 GB 10K SAS

Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Server models could change before RTM **

Page 11: Storage Node Architecture

[Diagram: storage array with dual 4 Gb FC storage processors, data and log drives (RAID 10), and a hot spare]

EMC AX4 (8 arrays/rack)

Drive Capacity  Spindle Speed  Bus   Rack Capacity (with 2.5X compression)
450 GB          15K            SAS   36 TB
1 TB            7.2K           SATA  80 TB

HP MSA (10 arrays/rack)

Drive Capacity  Spindle Speed  Bus   Rack Capacity (with 2.5X compression)
450 GB          15K            SAS   45 TB
1 TB            7.2K           SAS   100 TB

Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Storage models and drives could change before RTM **
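
The listed rack capacities are consistent with each array contributing four drives' worth of usable space before compression (for example, eight data drives mirrored in RAID 10). That per-array figure is an inference from the numbers on the slide, not something the slide states, so the sketch below should be read as a sanity check on the arithmetic rather than as a sizing rule.

```python
USABLE_DRIVES_PER_ARRAY = 4  # assumption inferred from the slide's numbers
COMPRESSION_RATIO = 2.5      # "With 2.5X Compression", as stated on the slide

def rack_capacity_tb(drive_gb, arrays_per_rack):
    """Compressed user-data capacity of one data rack, in TB (1 TB = 1000 GB)."""
    usable_gb = drive_gb * USABLE_DRIVES_PER_ARRAY * arrays_per_rack
    return usable_gb * COMPRESSION_RATIO / 1000

emc_450 = rack_capacity_tb(450, arrays_per_rack=8)    # EMC AX4, 450 GB drives
emc_1tb = rack_capacity_tb(1000, arrays_per_rack=8)   # EMC AX4, 1 TB drives
hp_450 = rack_capacity_tb(450, arrays_per_rack=10)    # HP MSA, 450 GB drives
hp_1tb = rack_capacity_tb(1000, arrays_per_rack=10)   # HP MSA, 1 TB drives
```

Under that assumption the four results reproduce the 36, 80, 45, and 100 TB figures in the tables.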

Page 12: Other Items

Manageability
  DMVs: PDW and SQL Server surfaced
  Web-based Admin Console
Monitoring
  DMVs provide status on HW and software components
  Alerts/warnings queried from DMVs by the customer's monitoring solution
  PDW Admin Console also visualizes warnings/status
  SCOM pack will be released after V1

Page 13: Fault Tolerance

Redundant components
  Disks, networks, power, storage processors
Windows Failover Clustering
  Each rack is a separate cluster
Management nodes use AD "failover" technology

Page 14: Agenda

Concepts and Principles
What is PDW?
What is it doing under the hood?
MTP Feedback
Questions

Page 15: Ultra Shared Nothing Example

[Diagram: a star schema and four SQL Server compute nodes]

Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold

Page 16: PDW Ultra Shared Nothing

[Diagram: the star schema from page 15 (Time Dim, Store Dim, Product Dim, Mktg Campaign Dim, Sales Facts) and four SQL Server compute nodes; each node holds its own copy of TD, PD, SD, and MD]

Smaller dimension tables are replicated on every compute node

Page 17: PDW Ultra Shared Nothing

[Diagram: the star schema from page 15 and four SQL Server compute nodes; each node holds the replicated dimensions (TD, PD, SD, MD) plus one slice of Sales Facts: SF-1, SF-2, SF-3, SF-4]

Larger fact table is hash distributed across all compute nodes
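
The layout on pages 16 and 17 can be sketched as follows: the fact table is split by hashing a distribution column, each node keeps a full copy of a small dimension, and a fact-to-dimension join therefore needs no data from other nodes. CRC32 is only a stand-in for whatever hash PDW uses, and the tables, column choices, and names are hypothetical.

```python
import zlib

NUM_NODES = 4  # four compute nodes, as drawn on the slides

def node_for(key):
    """Stand-in hash: map a distribution-column value to a node index."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES

# Hypothetical small dimension (replicated) and larger fact table (distributed).
store_dim = {1: "Downtown", 2: "Airport"}
sales_facts = [  # (store_dim_id, prod_dim_id, dollars_sold)
    (1, 101, 20.0), (2, 102, 15.0), (1, 103, 5.0), (2, 101, 10.0), (1, 104, 2.5),
]

# Every node gets its own copy of the dimension and an empty fact slice.
nodes = [{"store_dim": dict(store_dim), "sales": []} for _ in range(NUM_NODES)]

# Hash-distribute the fact rows on prod_dim_id (an arbitrary choice here).
for row in sales_facts:
    nodes[node_for(row[1])]["sales"].append(row)

def local_join_and_sum(node):
    """Join the node's fact slice to its local dimension copy and aggregate."""
    totals = {}
    for store_id, _prod_id, dollars in node["sales"]:
        name = node["store_dim"][store_id]
        totals[name] = totals.get(name, 0.0) + dollars
    return totals

# Merge the per-node partial results into the final answer.
dollars_by_store = {}
for node in nodes:
    for name, dollars in local_join_and_sum(node).items():
        dollars_by_store[name] = dollars_by_store.get(name, 0.0) + dollars
```

Replicating the dimension costs storage on every node, which is why the slides reserve it for the smaller tables.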

Page 18: PDW Ultra Shared Nothing (Node 1)

Parallelism within the node
CPU to IO affinity with SoftNUMA and table layout
Distributed table hashed within node into physical tables
Each distribution lands on a specific LUN
Replicated tables striped across LUNs

[Diagram: a compute node (SQL Server 2008 SP2, TempDB, user database) attached to a storage node with two storage processors, data LUNs, Tx logs, and a hot spare. The node's slice of the distributed table, SF-1, is split into eight physical tables, SF1-a through SF1-h, one per data LUN; the replicated tables (Repl) span the LUNs.]
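
A minimal sketch of the two-level placement shown for Node 1: a key is hashed to one of the distributions, and the distribution number determines both the compute node and the physical table (SF1-a through SF1-h on the slide). The hash function, the four-node count, and the one-LUN-per-distribution mapping are assumptions for illustration only.

```python
import string
import zlib

NUM_NODES = 4        # assumed, matching the four nodes drawn on earlier slides
DISTS_PER_NODE = 8   # SF1-a through SF1-h on the slide

def place(key):
    """Return (node number, distribution letter, LUN index) for a key."""
    h = zlib.crc32(str(key).encode("utf-8"))  # stand-in hash, not PDW's
    dist = h % (NUM_NODES * DISTS_PER_NODE)
    node, slot = divmod(dist, DISTS_PER_NODE)
    # Assumption: one data LUN per distribution, so the slot is the LUN index.
    return node + 1, string.ascii_lowercase[slot], slot

def physical_table(key):
    """Name of the physical table that would hold the row, e.g. 'SF1-a'."""
    node, letter, _lun = place(key)
    return f"SF{node}-{letter}"

# Which physical tables receive rows for 10,000 hypothetical keys.
tables = {physical_table(k) for k in range(10_000)}
```

If the eight-per-node layout held for every node, a data rack with 10 compute nodes would carry 80 distributions, each scanned through its own LUN.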

Page 19: Query Processing Flow

[Diagram: a query tool connects to the control node, which drives the compute nodes]

Control Node
  PDW Engine
    Parse SQL
    Validate & Authorize
    Build MPP Plan
    Execute Plan
    Return Data to Client
  SQL Server: DW Authentication, DW Configuration, DW Schema, TempDB
  Data Movement Service (DMS)

Compute Nodes
  SQL Server: User Data, TempDB
  Data Movement Service (DMS)
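
The five control-node steps can be sketched as a small pipeline. Everything below is a toy: the one-line query grammar, the schema and grant tables, and the in-memory "compute nodes" are hypothetical stand-ins, meant only to show how the steps hand off to each other.

```python
# Hypothetical metadata held on the control node.
SCHEMA = {"sales": {"store", "dollars"}}
GRANTS = {"analyst": {"sales"}}

# Hypothetical user data, one list of rows per compute node.
COMPUTE_NODES = [
    [{"store": "A", "dollars": 10}, {"store": "B", "dollars": 5}],
    [{"store": "A", "dollars": 7}],
]

def parse(sql):
    """Parse SQL (toy grammar: SUM <measure> FROM <table> BY <column>)."""
    w = sql.split()
    if len(w) != 6 or [w[0].upper(), w[2].upper(), w[4].upper()] != ["SUM", "FROM", "BY"]:
        raise ValueError("unsupported query")
    return {"measure": w[1], "table": w[3], "group_by": w[5]}

def validate_and_authorize(query, user):
    """Validate against the schema, then check the user's grants."""
    columns = SCHEMA.get(query["table"])
    if columns is None or not {query["measure"], query["group_by"]} <= columns:
        raise ValueError("unknown table or column")
    if query["table"] not in GRANTS.get(user, set()):
        raise PermissionError(f"{user} may not read {query['table']}")

def build_mpp_plan(query):
    """Build MPP plan: a per-node step plus a final merge step."""
    def node_step(rows):
        out = {}
        for r in rows:
            out[r[query["group_by"]]] = out.get(r[query["group_by"]], 0) + r[query["measure"]]
        return out

    def merge_step(partials):
        out = {}
        for p in partials:
            for k, v in p.items():
                out[k] = out.get(k, 0) + v
        return out

    return node_step, merge_step

def execute_plan(plan, nodes):
    """Execute plan: run the node step everywhere, then merge."""
    node_step, merge_step = plan
    return merge_step([node_step(rows) for rows in nodes])

def run_query(sql, user):
    """Full flow; the return value is the data sent back to the client."""
    query = parse(sql)
    validate_and_authorize(query, user)
    return execute_plan(build_mpp_plan(query), COMPUTE_NODES)

answer = run_query("SUM dollars FROM sales BY store", "analyst")
```

Keeping validation and authorization ahead of plan building means a rejected query never reaches the compute nodes.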

Page 20: PDW Distributed Table Load Example

[Diagram: load path from the Control Rack (Landing Zone and Control Node) over Infiniband to the Data Rack (Compute Nodes and Storage Nodes). Landing Zone: Load File/SSIS, Load Client, DMS, SSIS API. Control Node: DMS Server, PDW Engine, Load Manager, DMS Manager. Each compute node: SQL Server and a DMS with Converter, Sender, Receiver, and Writer.]

DWLoader invoked / SSIS
DMS reads load data and buffers records to send to compute nodes round-robin
Load Manager creates staging tables
Each row is converted for bulk insert and hashed on the distribution column
Hashed row is sent to the appropriate node's receiver for loading
Received row is pushed onto a writer thread
Row is bulk inserted into the staging table
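
The load path can be sketched in three stages: raw records are handed to the compute nodes round-robin, each node converts its records and hashes the distribution column, and every row is then routed to the node that owns its hash value and written to that node's staging table. The record format, node count, CRC32 hash, and function names are assumptions for illustration, not the DMS implementation.

```python
import zlib
from itertools import cycle

NUM_NODES = 3  # assumed node count, for illustration only

def owner(key):
    """Stand-in hash: which node owns this distribution-column value."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES

def load(lines, dist_col):
    """Simulate one load; returns (round-robin inboxes, staging tables)."""
    # Landing zone: buffer raw records and send them out round-robin, so the
    # conversion work is spread evenly regardless of the data values.
    inboxes = [[] for _ in range(NUM_NODES)]
    for node, line in zip(cycle(range(NUM_NODES)), lines):
        inboxes[node].append(line)

    # Compute nodes: the converter parses each record and hashes the
    # distribution column; the sender routes the row to the owning receiver.
    receivers = [[] for _ in range(NUM_NODES)]
    for inbox in inboxes:
        for line in inbox:
            row = tuple(line.split("|"))
            receivers[owner(row[dist_col])].append(row)

    # Writer: each node bulk inserts what it received into its staging table.
    staging = [list(batch) for batch in receivers]
    return inboxes, staging

# Hypothetical pipe-delimited load file: customer_id|amount
load_file = [f"cust{i % 5}|{i * 10}" for i in range(12)]
inboxes, staging = load(load_file, dist_col=0)
```

The round-robin stage balances parsing work, while the hash stage decides final placement; the two are independent, which is why a row may be converted on one node and stored on another.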

Page 21: Full Process View

[Diagram: the same pipeline drawn once per distribution on Node 1 (Dist A, Dist B, ... Dist H)]

Load Data
Bulk Insert into the partitioned staging table (CIDX): sort each batch in memory or TempDB
Insert-Select into the partitioned final table (CIDX): sort each partition in memory or TempDB
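
A toy model of the two-phase flow for a single distribution: each incoming batch is sorted on the clustered index key as it is bulk inserted into the staging table, and the later insert-select sorts each partition as it moves into the final table. Representing tables as dictionaries of lists, partitioning by month, and the function names are all assumptions for illustration.

```python
def partition_of(row):
    """Hypothetical partitioning scheme: one partition per month."""
    return row[0][:7]

def cidx_key(row):
    """Hypothetical clustered index key: the sale date."""
    return row[0]

def bulk_insert(staging, batch):
    """Sort one batch on the clustered index key, then append by partition."""
    for row in sorted(batch, key=cidx_key):
        staging.setdefault(partition_of(row), []).append(row)

def insert_select(staging, final):
    """Move staged rows into the final table, sorting each partition."""
    for partition, rows in staging.items():
        final[partition] = sorted(final.get(partition, []) + rows, key=cidx_key)
    staging.clear()

# Rows are (sale_date, amount); two batches arrive, then one insert-select.
staging, final = {}, {}
bulk_insert(staging, [("2010-02-03", 5), ("2010-01-20", 1)])
bulk_insert(staging, [("2010-01-05", 2), ("2010-02-01", 9)])
insert_select(staging, final)
```

Sorting per batch keeps each staging insert cheap; the per-partition sort is what leaves the final table in clustered index order.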

Page 22: Agenda

Concepts and Principles
What is PDW?
What is it doing under the hood?
MTP Feedback
Questions

Page 23: Customer Feedback

Credit card processing
  Better than linear performance under concurrency
  Virtually no degradation for concurrent loads and queries
EDW workload
  Current DB: 24-core SMP SQL Server
  Averaged 18X faster (2X to 166X faster)
  Multi-hour queries now run in seconds
  PDW queries were run with only 1 clustered index on each table
MTP and TAP programs in progress
  If you have large data issues now, please contact us to participate
Performance very competitive with other appliance offerings

Page 24: Summary

PDW is …
  Enterprise-class, MPP version of SQL Server 2008
  Ultra shared nothing architecture
  Hardware and software
  Currently scales to 500 terabytes
  Seamless integration with Microsoft BI tools
  FAST!!!

Page 25: Related Content

SQL Server Data Warehouse Station, located at TLC/Yellow/BIN
Fast Track Talk - BIE07-INT - Developing a Microsoft SQL Server 2008 Fast Track Data Warehouse (Wed @ 1:30pm)
SQL Server Web Site: http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx

Page 26: Agenda

Concepts and Principles
What is PDW?
What is it doing under the hood?
MTP Feedback
Questions?

Page 27: Resources

www.microsoft.com/teched: Sessions On-Demand & Community
www.microsoft.com/learning: Microsoft Certification & Training Resources
http://microsoft.com/technet: Resources for IT Professionals
http://microsoft.com/msdn: Resources for Developers


Page 30: Copyright

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
