SMP vs MPP with PDW ** Workload requirements usually drive the architecture decision
Microsoft SQL Server 2008 R2 Parallel Data Warehouse Deep Dive
Matt Peebles
Principal Architect
Microsoft Corporation
Email: [email protected]
SESSION CODE: BIE309
Agenda
  Concepts and Principles
  What is PDW?
  What is it doing under the hood?
  MTP Feedback
  Questions
PDW Core Concepts
Shared Nothing Computing
  Resource and data independence are maintained within each DBMS instance
  Each instance reserves shared resources (CPU, memory, disk) for only its distribution of system data
  Simply add new resources to continually scale out
Massively Parallel Processing (MPP)
  Ability to leverage multiple concurrent resources to resolve SQL set operations against distributed data
  Each instance works in parallel on its own “distribution” of a single user query
  PDW supports up to 10 parallel instances of SQL Server DBMS per data rack
  Max of 40 nodes
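As a rough illustration of the MPP idea above, the following Python sketch splits one aggregate query across several workers, each scanning only its own distribution, and then combines the partial results. The row layout, the modulo placement rule, and all function names are hypothetical; threads merely stand in for separate SQL Server instances, and nothing here reflects how PDW is implemented internally.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows: (store_id, dollars_sold).
ROWS = [(store_id, float(store_id % 7 + 1)) for store_id in range(1000)]

NUM_INSTANCES = 10  # the slides cite up to 10 instances per data rack


def distribute(rows, n):
    """Assign each row to one of n distributions (key modulo n, an assumption)."""
    distributions = [[] for _ in range(n)]
    for row in rows:
        distributions[row[0] % n].append(row)
    return distributions


def partial_sum(distribution):
    """Work done by a single instance: aggregate only its own rows."""
    return sum(dollars for _, dollars in distribution)


def parallel_total(rows, n=NUM_INSTANCES):
    """Run the partial aggregates concurrently, then combine them."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(partial_sum, distribute(rows, n)))
    return sum(partials)
```

Calling `parallel_total(ROWS)` gives the same answer as a single scan over all rows; the point is only that no worker ever touches another worker's distribution.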
SMP vs MPP
SMP
  HW advancements increasing ability to scale up
  Scaling is limited
  High-end SMP is very expensive
  Extremely high concurrency for some workloads
  Usually < 10 TB; below 1-2 TB of data, SMP will almost always be better
  Full SQL Server functionality
  HA must be architected in
MPP with PDW
  HW advancements increasing ability to scale up & scale out
  Scaling to 1 PB+
  Scale out is relatively low cost
  Relatively high concurrency for complex workloads
  > 10 TB (typically) up to 1 PB
  Limited SQL Server functionality
  HA is built in
** Workload requirements usually drive the architecture decision
Ultra Shared Nothing
An extension of traditional shared nothing design
Push shared nothing architecture into SMP node
  IO and CPU affinity within SMP nodes
  Eliminate contention per user query
  Use full resources for each user query
  Predictable results
Multiple physical instances of tables
  Distribute large tables
  Replicate smaller tables
  Redistribute rows “on-the-fly” when necessary
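To make "redistribute rows on-the-fly" concrete, here is a minimal Python sketch, assuming rows are stored distributed on one column (`store_id`) while a join needs them co-located on another (`prod_id`). The CRC32 hash, the four-node count, and the column names are assumptions chosen for the illustration, not PDW's actual hash function or schema.

```python
import zlib

NUM_NODES = 4  # hypothetical node count for the sketch


def node_for(key, num_nodes=NUM_NODES):
    """Stable hash of a key to a node number (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def redistribute(nodes, column):
    """Re-hash every row on `column` and move it to its new home node."""
    shuffled = [[] for _ in nodes]
    for rows in nodes:
        for row in rows:
            shuffled[node_for(row[column], len(nodes))].append(row)
    return shuffled


# Rows start out distributed on store_id ...
sales = [{"store_id": s, "prod_id": p, "qty": 1}
         for s in range(8) for p in range(5)]
by_store = [[] for _ in range(NUM_NODES)]
for row in sales:
    by_store[node_for(row["store_id"])].append(row)

# ... but a join on prod_id needs equal prod_id values on the same node.
by_product = redistribute(by_store, "prod_id")
```

After the shuffle, every row with a given `prod_id` sits on exactly one node, so each node can join its share locally.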
SQL Server 2008 R2 PDW is…
Converted DATAllegro architecture to Microsoft platform
  Win2008 SP2, SQL Server 2008 SP1 CU5
Appliance-like
  Software + commodity hardware solution
  Pre-tuned & optimized specifically for sequential IO
  Users see 1 “server”
Only runs on specific HW - Reference Architectures
  Launching with Dell, HP, IBM, EMC, and Bull in Europe
Designed for DW workloads
  Large data (~10TB -> 500TB)
  Normalized as well as star schemas
First-class integration with Microsoft BI
  Excel PowerPivot
  Analysis Services
  Reporting Services
  Integration Services
Will support 3rd party BI solutions
PDW Reference Architecture
[Diagram: a Control Rack and a Data Rack linked by dual InfiniBand, with dual Fibre Channel to the storage nodes. Labels recovered from the diagram:]
Control Rack
  Control Nodes (active / passive): 1. PDW Engine 2. Admin Console 3. Metadata 4. Workspace
  Management Servers: 1. Active Directory/DNS 2. HPC 3. Setup/patching
  Landing Zone: 1. SSIS Instance 2. Loader Tool 3. File Staging
  Backup Node: 1. File store for backups 2. Std. SQL backups 3. Full and differential
Data Rack
  Compute Nodes (SQL Server): 1. User Data 2. DMS
  Storage Nodes
  Passive Compute Node
  Expand by adding data rack(s) when capacity or performance requirements change
External connections
  Client Drivers: DataDirect drivers (ADO.NET, OLE-DB, ODBC); Reporting Services, Analysis Services, Integration Services (R2)
  ETL Load Interface
  Corporate Backup Solution
  Data Center Access
  Corporate Network / Private Network
Compute Node Server Architecture
[Diagram: server with two CPUs and RAM. Labels: enterprise-class DBMS, TempDB workspace, dual multi-core processors.]

Current Hardware Options
Vendor | Model | Form Factor | CPU | Total Cores | Memory | Local Storage (TempDB)
HP | DL360 G6 | 1U | Intel Nehalem | 8 cores, hyper-threaded | 72 GB | 6 x 300 GB 10K SAS
Dell | R610 | 1U | Intel Nehalem | 8 cores, hyper-threaded | 96 GB | 4 x 300 GB 10K SAS

Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Server models could change before RTM **
Storage Node Architecture
[Diagram: each array has two storage processors, each with dual 4Gb FC, plus data and log drives (RAID 10) and a hot spare.]

EMC AX4 (8 arrays/rack)
Drive Capacity | Spindle Speed | Bus | Rack Capacity with 2.5X Compression
450 GB | 15K | SAS | 36 TB
1 TB | 7.2K | SATA | 80 TB

HP MSA (10 arrays/rack)
Drive Capacity | Spindle Speed | Bus | Rack Capacity with 2.5X Compression
450 GB | 15K | SAS | 45 TB
1 TB | 7.2K | SAS | 100 TB

Models listed as of SQL Server 2008 R2 PDW MTP2 release
** Storage models and drives could change before RTM **
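The rack capacities above are quoted with 2.5X compression already applied. As a small worked calculation, the Python sketch below divides that factor back out to show the implied uncompressed capacity per rack and per array. The table values come from the slide; the function names are hypothetical, and the per-array figure is only simple division, not a statement about how the arrays are actually laid out.

```python
COMPRESSION = 2.5  # compression factor quoted on the slide

# (vendor, arrays per rack, drive type, rack capacity in TB with compression)
OPTIONS = [
    ("EMC AX4", 8, "450 GB 15K SAS", 36),
    ("EMC AX4", 8, "1 TB 7.2K SATA", 80),
    ("HP MSA", 10, "450 GB 15K SAS", 45),
    ("HP MSA", 10, "1 TB 7.2K SAS", 100),
]


def uncompressed_tb(compressed_tb, factor=COMPRESSION):
    """Back out the implied uncompressed capacity from a compressed figure."""
    return compressed_tb / factor


def summarize(options=OPTIONS):
    """Return (vendor, drive, uncompressed TB per rack, uncompressed TB per array)."""
    rows = []
    for vendor, arrays, drive, rack_tb in options:
        raw = uncompressed_tb(rack_tb)
        rows.append((vendor, drive, raw, raw / arrays))
    return rows


for vendor, drive, per_rack, per_array in summarize():
    print(f"{vendor:8} {drive:16} {per_rack:5.1f} TB/rack  {per_array:4.1f} TB/array")
```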
Other Items
Manageability
  DMVs – PDW and SQL Server surfaced
  Web-based Admin Console
Monitoring
  DMVs provide status on HW and software components
  Alerts/warnings queried from DMVs by customer monitoring solution
  PDW Admin Console also visualizes warnings/status
  SCOM pack will be released after V1
Fault Tolerance
  Redundant components: disks, networks, power, storage processors
  Windows Failover Clustering
  Each rack is a separate cluster
  Management nodes use AD “failover” technology
Ultra Shared Nothing Example
[Diagram: a star schema and four SQL Server compute nodes.]
  Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
  Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
  Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
  Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
  Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
PDW Ultra Shared Nothing
Smaller dimension tables are replicated on every compute node
[Diagram: the same star schema; each of the four SQL Server compute nodes holds its own copy of TD, PD, SD, and MD (the Time, Product, Store, and Mktg Campaign dimensions).]
PDW Ultra Shared Nothing
Larger fact table is hash distributed across all compute nodes
[Diagram: the same star schema; each of the four SQL Server compute nodes holds TD, PD, SD, and MD plus one distribution of Sales Facts: SF-1, SF-2, SF-3, SF-4.]
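The layout in this example (dimensions replicated everywhere, fact rows hash distributed) can be sketched in Python as follows. The point it tries to show is that a fact-to-dimension join needs no data movement, because every node already holds the whole dimension. The sample rows, the choice of `prod_dim_id` as distribution column, and the CRC32 hash are all assumptions made for the illustration.

```python
import zlib

NUM_NODES = 4  # the diagram shows four compute nodes


def home_node(key, num_nodes=NUM_NODES):
    """Stable hash of the distribution column value (CRC32 is an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


# Hypothetical sample data.
store_dim = {1: "Downtown", 2: "Airport", 3: "Mall"}
sales_facts = [
    {"store_dim_id": 1, "prod_dim_id": 10, "dollars_sold": 5.0},
    {"store_dim_id": 2, "prod_dim_id": 11, "dollars_sold": 7.5},
    {"store_dim_id": 1, "prod_dim_id": 12, "dollars_sold": 2.5},
    {"store_dim_id": 3, "prod_dim_id": 13, "dollars_sold": 4.0},
    {"store_dim_id": 2, "prod_dim_id": 14, "dollars_sold": 1.5},
]


def build_nodes(facts, dim, distribution_column):
    """Replicate the dimension to every node; hash-distribute the fact rows."""
    nodes = [{"store_dim": dict(dim), "sales_facts": []}
             for _ in range(NUM_NODES)]
    for row in facts:
        nodes[home_node(row[distribution_column])]["sales_facts"].append(row)
    return nodes


def local_join_total(node):
    """Dollars sold per store name, using only data stored on this node."""
    totals = {}
    for row in node["sales_facts"]:
        name = node["store_dim"][row["store_dim_id"]]
        totals[name] = totals.get(name, 0.0) + row["dollars_sold"]
    return totals


def query(nodes):
    """Combine the per-node join results into the final answer."""
    combined = {}
    for node in nodes:
        for name, dollars in local_join_total(node).items():
            combined[name] = combined.get(name, 0.0) + dollars
    return combined


nodes = build_nodes(sales_facts, store_dim, "prod_dim_id")
```

Here `query(nodes)` totals dollars sold per store name without any node reading another node's rows, which is the trade the design makes: extra storage for replicated dimensions in exchange for local joins.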
PDW Ultra Shared Nothing (Node 1)
Parallelism within the node
CPU to IO affinity with SoftNUMA and table layout
Distributed table hashed within node into physical tables
Each distribution lands on a specific LUN
Replicated tables striped across LUNs
[Diagram: a compute node (SQL Server 2008 SP2; TempDB, user database) attached to a storage node (two storage processors; data LUNs, Tx logs, hot spare). Distributed table SF-1 is split into physical tables SF1-a through SF1-h, one per data LUN; replicated tables (Repl) span the LUNs.]
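The two-level placement on this slide (hash to a node, then to one of eight physical tables inside the node, each on its own LUN) might be sketched like this in Python. The naming pattern `SF<node>-<letter>` follows the diagram; the CRC32 hash, the four-node count, and the LUN numbering are assumptions for the example.

```python
import string
import zlib

NUM_NODES = 4                # assumed, matching the earlier four-node diagrams
DISTRIBUTIONS_PER_NODE = 8   # SF1-a through SF1-h on the slide
LETTERS = string.ascii_lowercase[:DISTRIBUTIONS_PER_NODE]  # 'a'..'h'


def placement(key):
    """Return the physical table name, e.g. 'SF3-b', that would hold this key."""
    total = NUM_NODES * DISTRIBUTIONS_PER_NODE
    bucket = zlib.crc32(str(key).encode("utf-8")) % total
    node, dist = divmod(bucket, DISTRIBUTIONS_PER_NODE)
    return f"SF{node + 1}-{LETTERS[dist]}"


def lun_for(table_name):
    """One LUN per distribution: suffix 'a' -> LUN 0 ... 'h' -> LUN 7 (assumed numbering)."""
    return LETTERS.index(table_name[-1])


# Every possible physical table name in this sketch.
ALL_TABLES = {f"SF{n + 1}-{letter}" for n in range(NUM_NODES) for letter in LETTERS}
```

Because a distribution always maps to the same LUN, a scan of one physical table drives one LUN, which is one way to read the slide's "CPU to IO affinity" point.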
Query Processing Flow
[Diagram: a query tool connects to the control node in the Control Rack, which drives the compute nodes in the Data Rack.]
Control Node
  PDW Engine
    Parse SQL
    Validate & Authorize
    Build MPP Plan
    Execute Plan
    Return Data to Client
  SQL Server: DW Authentication, DW Configuration, DW Schema, TempDB
  Data Movement Service (DMS)
Compute Nodes
  SQL Server: User Data, TempDB
  Data Movement Service (DMS)
PDW Distributed Table Load Example
[Diagram: Landing Zone (load file/SSIS, Load Client, DMS), Control Node (PDW Engine, Load Manager, DMS Manager), and Compute Nodes (DMS with Converter, Sender, Receiver, and Writer; SQL Server) with their Storage Nodes, connected over InfiniBand.]
  DWLoader invoked / SSIS (SSIS API)
  DMS reads load data and buffers records to send to compute nodes round-robin
  Load Manager creates staging tables
  Each row is converted for bulk insert and hashed on the distribution column
  Hashed row is sent to the appropriate node receiver for loading
  Received row is pushed onto a writer thread
  Row is bulk inserted into the staging table
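The load steps above can be approximated in a few lines of Python: raw records are dealt round-robin to the nodes for conversion, each converted row is hashed on its distribution column, and the row is forwarded to the node that owns that hash value before being written to staging. The pipe-delimited record format, the CRC32 hash, and the function names are assumptions for this sketch; real DMS components run concurrently rather than in sequential loops.

```python
import zlib
from itertools import cycle

NUM_NODES = 4  # hypothetical node count


def target_node(value, num_nodes=NUM_NODES):
    """Owner node for a distribution column value (CRC32 is an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_nodes


def load(records, distribution_column, num_nodes=NUM_NODES):
    """Return one staging table (a list of rows) per compute node."""
    # Step 1: the landing zone deals raw records round-robin to the converters.
    inbox = [[] for _ in range(num_nodes)]
    for node, record in zip(cycle(range(num_nodes)), records):
        inbox[node].append(record)

    # Step 2: each converter parses its records and hashes the distribution
    # column; the sender forwards the row to the receiver on the owning node.
    receivers = [[] for _ in range(num_nodes)]
    for raw_records in inbox:
        for raw in raw_records:
            row = raw.split("|")
            owner = target_node(row[distribution_column], num_nodes)
            receivers[owner].append(row)

    # Step 3: each writer bulk-inserts what its receiver collected into staging.
    return [list(rows) for rows in receivers]


# Hypothetical load file: "sale_id|store|dollars", distributed on column 0.
records = [f"{i}|store{i % 3}|{i * 10}" for i in range(20)]
staging = load(records, distribution_column=0)
```

The round-robin step spreads the conversion work evenly regardless of data skew; only the second hop depends on the data values.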
Full Process View
[Diagram: the same pipeline repeated for each distribution on a node, shown for Node 1 Dist A, Dist B, ... Dist H.]
  Load Data -> Bulk Insert -> Partitioned Staging Table (CIDX): sort each BATCH in memory or TempDB
  Insert-Select -> Partitioned Final Table (CIDX): sort each partition in memory or TempDB
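One way to read the two stages in this view is sketched below in Python: batches are sorted on the clustered index key as they are bulk inserted into staging, and the later insert-select regroups the staged rows by partition and sorts each partition for the final table. The batch size, the date-based key, and the year-based partitioning are assumptions for the example, and the sketch ignores the in-memory versus TempDB distinction.

```python
from collections import defaultdict

BATCH_SIZE = 3  # hypothetical batch size


def bulk_insert_to_staging(rows, key, batch_size=BATCH_SIZE):
    """Sort each incoming batch on the clustered index key before storing it."""
    staging = []
    for start in range(0, len(rows), batch_size):
        staging.append(sorted(rows[start:start + batch_size], key=key))
    return staging


def insert_select_to_final(staging, key, partition_of):
    """Regroup staged rows by partition and sort each partition on the key."""
    partitions = defaultdict(list)
    for batch in staging:
        for row in batch:
            partitions[partition_of(row)].append(row)
    return {p: sorted(rows, key=key) for p, rows in partitions.items()}


def by_date(row):
    """Clustered index key: the date id (assumed yyyymmdd integer)."""
    return row[0]


def year_of(row):
    """Partition function: one partition per calendar year (assumed)."""
    return row[0] // 10000


# Hypothetical rows for one distribution: (date_dim_id, qty_sold).
ROWS = [(20100105, 3), (20090712, 1), (20100101, 2), (20090301, 5), (20100220, 4)]

staged = bulk_insert_to_staging(ROWS, by_date)
final = insert_select_to_final(staged, by_date, year_of)
```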
Customer Feedback
Credit card processing
  Better than linear performance under concurrency
  Virtually no degradation for concurrent loads & queries
EDW workload
  Current DB: 24-core SMP SQL Server
  Averaged 18X faster (range: 2X – 166X faster)
  Multi-hour queries now run in seconds
  PDW queries were run with only 1 clustered index on each table
MTP and TAP programs in progress
  If you have large data issues now, please contact us to participate
Performance very competitive with other appliance offerings
Summary
PDW is …
  Enterprise-class, MPP version of SQL Server 2008
  Ultra shared nothing architecture
  Hardware and software
  Currently scales to 500 terabytes
  Seamless integration with Microsoft BI tools
  FAST!!!
Related Content
SQL Server Data Warehouse Station – Located at TLC/Yellow/BIN
Fast Track Talk - BIE07-INT - Developing a Microsoft SQL Server 2008 Fast Track Data Warehouse (Wed @1:30pm)
SQL Server Web Site: http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx
Questions?
© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.