sql server 2012 (v2) parallel data warehouse insights server 2012 parallel data...•polybase...
TRANSCRIPT
SQL Server 2012 (V2)
Parallel Data Warehouse
Insights (Level 200)
Agenda • Introducing
• Why SQL PDW – PDW Hardware Architecture
– PDW New Database Features
• SQL Server PDW Live demo’sComparing SQL2012/SQL2014 (SMP) vs PDW (MPP)
Data loading
Query Performance - Data loading
Query Concurrency
Polybase / Hadoop integration
• Q&A
About Henk
• 10 years of Unisys-EMEA Performance Center• 2002- Largest SQL DWH in the world (SQL2000)• Project Real – (SQL 2005)• ETL WR - loading 1TB within 30 mins (SQL 2008)• Contributed to several SQL whitepapers
• Schuberg Philis-100% uptime for mission critical applications• Since april 1st, 2011 – Microsoft SQL PDW - Western Europe
1999 - How it all started with Unisys & High End SQL Server: the ES7000
5
• 32 Way / 64 GB RAM• 32bit PIII - 550 MHz• 64bit Itanium 800 MHz• SQL Server (64-bit) released in April 2002.
• SQL Server 2000 EE, SP1
Unisys ES7000 … Big Windows / Wintel Mainframe Architecture
0
500,000
1,000,000
1,500,000
2,000,000
2,500,000
3,000,000
3,500,000
4,000,000
0 50 100 150 200 250 300
Drives
KB
/sec
32K 64K 128K 256KUp to 32 CPU’s
64/128 GB RAM
96 PCI slots
Example : Sequential Read
ES7000/100
1 x OS: Windows 2000 Datacenter
42 x 1Gbit HBA’s
EMC Clariion’s with250 physical drives
PDW: Querying 1Petabyte of data in 1 second / 40 node
Querying 294 billion rows in seconds
Diminishing performance
Limitations: Performance and Scale today
Existing Tables (Partitions)
Rowstore
Diminishing Scale as
requirements grow
Non-optimal performance
for many DW queries
Scale UP
SQL Server 2012 Parallel Data Warehouse (PDW)Insights on any data of any size
Next-generation
Performance At ScaleBuilt For Big Data
What is Parallel Data Warehouse?
» Massively parallel processing (MPP)
» A “Control” server that accepts user queries, generates a plan, and distributes operations in parallel to compute nodes
» Multiple “Compute” servers running SQL Server
» A “Management” server for administering the system
» A “Data Movement Service” that facilitates parallel SQL operations
» Balanced and pre-configured software and industry standard hardware from Dell or HP
» Single Call Support
» Fastest Time to Market
» Scales from 2 to 56 Nodes
HP ExampleDell Example
Key Design Elements
•
» Windows Server 2012 Storage Spaces
» Windows Server 2012 Hyper-V
» SQL Server 2012 PDW xVelocity ColumnStore
HP Example
SQL Instance Storage
Scale up (SMP)» Build for specific requirement
» Build HA etc. additionally
» Maintain and Tune (Load/File Distribution)
» Unknown Future workloads
» Still a very good data mart solution in a Hub and Spoke architecture with SQL Server PDW
Scale out (MPP)» Resilient & Predictable
» Big data / DW Best Practices in a box
» Deploy Fast and Drive Value
» Built-in HA
» Scalable (start small/grow when needed)
SQL Instance #2
SQL Instance #1
SQL Instance #4
SQL Instance #3
SQL Instance #8
SQL Instance #....
Storage
Storage
Storage
Storage
Storage
Storage
Ultra Shared Nothing architecture: Distribution
Larger Fact Table is Hash Distributed Across
All Compute NodesTD
SD
PD
MD
SF
01-08Time Dim
Date Dim ID
Calendar Year
Calendar Qtr
Calendar Mo
Calendar Day
Store Dim
Store Dim ID
Store Name
Store Mgr
Store Size
Product Dim
Prod Dim ID
Prod Category
Prod Sub Cat
Prod Desc
Sales Facts
Date Dim ID
Store Dim ID
Prod Dim ID
Mktg Camp ID
Qty Sold
Dollars Sold
Mktg Campaign
Dim
Mktg Camp ID
Camp Name
Camp Mgr
Camp Start
Camp End
TD
SD
PD
MD
SF
09-16
TD
SD
PD
MD
SF
17-24
TD
SD
PD
MD
SF
25-32
TD
SD
PD
MD
SF
33-n
Ultra Shared Nothing architecture: Replication
Smaller “Dimension Tables” are Replicated
on each server/nodeTD
SD
PD
MD
SF
01-08Time Dim
Date Dim ID
Calendar Year
Calendar Qtr
Calendar Mo
Calendar Day
Store Dim
Store Dim ID
Store Name
Store Mgr
Store Size
Product Dim
Prod Dim ID
Prod Category
Prod Sub Cat
Prod Desc
Sales Facts
Date Dim ID
Store Dim ID
Prod Dim ID
Mktg Camp ID
Qty Sold
Dollars Sold
Mktg Campaign
Dim
Mktg Camp ID
Camp Name
Camp Mgr
Camp Start
Camp End
TD
SD
PD
MD
SF
09-16
TD
SD
PD
MD
SF
17-24
TD
SD
PD
MD
SF
25-32
TD
SD
PD
MD
SF
33-n
Result: Fact - Dimension
Joins can be performed
locally
= Co-location!
The Logical View on the base unit
Host 1
Host 0
Host 2
Host 3
Storage
SpacesIB &
Ethernet
Direct attached SAS
CTLFA
B
AD
VMM
Compute 1
Compute 2
• Window Server 2012
• DMS Core
• SQL Server 2012
• 8 Distributions spread across 32 Disk Spindles
(+Spare), per Node!
• Window Server
2012
• PDW engine
• DMS Manager
• SQL Server 2012
General Details» All hosts run Windows Server 2012
Standard
» All VMs run Windows Server 2012 Standard as a guest OS
» All Fabric and workload activity happens in Hyper-V virtual machines, with Fabric VM’s sharing 1 server
» Failover is handled by Hyper-V
» PDW Agent runs on all hosts and all VMs, collects appliance health data on fabric and workload
» Windows Storage Spaces handles mirroring and spares
PDW Workload Details» SQL Server 2012 Parallel Data Warehouse
is used on control node and compute nodes for PDW workload
Specs» 32 cores on 2 active compute nodes
» 512 GB of RAM on compute
» Up to 225TB of user data
MAD
01
Start small with a few TB and linearly Scale Out
Seamlessly add capacity
Smallest (0TB) To Largest (6PB)
• Start small with a few Terabyte warehouse• Scale Out: Incrementally add HW for Near
Linear Scale• Shared Nothing• Add capacity up to 6 Petabytes
0TB 6 PB
Add
Capacity
Add
Capacity
Largest Warehouse
PB
Start Small And Grow
Minimal Downtime
Lighting Fast Data Query Processing
Cu
stom
er
Sale
s
Co
un
try
Su
pp
lier
Pro
du
cts
Columnstore Provides Dramatic Performance
• Updateable and clustered xVelocity columnstore• Stores data in columnar format• Updateable to support bulk and/or trickle loading• Full DML (Insert / Update / Delete / Select) supported
directly on ColumnStore
• Memory-optimized for next-generation performance
Up to50X Faster
Up to 15x compression
Save Timeand Costs
Real-TimeDW
• CCI automatically includes all columns in the table
reorganize
Move inserts
rebuild
Reclaims space
Data loading SQL SMP 2012/2014/PDW
SQL 2012 SMP SQL 2014 SMP
SQL PDW
Single 75 GB / 600 Million row LineItem flat file
Comparing throughput
• Impression Bulk inserting a single flatfile on SMP SQL2012 vs PDW
Avg. 21 MB/sec vs 270 MB/sec
SMP SQL201x Bulk insert usesa single core per task
PDW Bulk insert is writinginto all distributions in parallel
Data loading
PDW loads single flatfile data 15-36x faster!
Single 75 GB / 600 Million row LineItem flat file
SSIS PDW V2 Destination Adapter
SSIS Dataload
2.5x Faster
Single 75 GB / 600 Million row LineItem flat file
Exporting Data
• Bulk Write data out:
– Remote Table Copy Into SMP (RTC)
– SQLCMD
• Polybase (Hadoop integration)
Remote table copy from PDW to SMP• Creating a remote table on an regular SQL SMP Server utilizing all available
cores
• Export data from all distributions in Parallel
33
CREATEREMOTETABLE
SMP_DB.dbo..LineItem_test AT
( 'Data Source = {Destination},1433 ;
…' )
AS SELECT* FROMSQLRally2013.dbo.lineitem
Parallel Data Transfers
SQL Server
SQL Server SQL Server
…
SQL Server
DN DN DN
DN DN DN
DN DN DN
DN DN DN
Hadoop Cluster
37
SMP
Sqoop
connector
This is PDW!
Familiar Tools To Analyze Structured/Unstructured Data
Familiar Tools Analyse Big Data
• Native Microsoft BI Integration to PDW
• Structured and unstructured data in same spreadsheet
• Widely adopted and familiar user tools
No IT Intervention
Analyze All Data Types
High Adoption Of Excel
PDW V2 - Improved Workload management
• 10 compute node under load / 200 concurrent users
• PDW V1 completes 400 complex queries in 10 minutes
PDW V2 default resource class• 2 node V2 Base unit under load / 200 concurrent users
• PDW V2 completes 1135 complex queries in 10 minutes
Base Unit~3xFaster thanV1
Additional Resources
» SQL Server Parallel Data Warehouse (PDW) Landing Page:www.microsoft.com/PDW
» New PDW Whitepaper:
http://msdn.microsoft.com/en-us/library/dn520808.aspx
» Introduction to Polybase:http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx
» Price/TB comparison:http://www.valueprism.com/resources/resources/Resources/PDW%20Compete%20Pricing%20FINAL.pdf
PDW is…
• A DWH appliance delivering a new type of value at lower cost
• All-In-One MPP Hub for Data Warehousing
• Deals easily with Complex Calcs, near realtime DW & unpredictable growth
• Steep learning curve
• ‘Out-of-the-box performance’ – MS Stamped’
• Easy to manage, integrate and highly available
• Complex query response from hours -> minutes/secs
THANK YOU!For attending this session and PASS SQLRally Nordic 2013, Stockholm
Titles are set to 34 pt, Arial
Click to edit Master title style• Level 1 text is 28 pt Arial
– Level 2 text is 24 pt Arial
• Level 3 text is 20 pt Arial
– Level 4 text is 20 pt Arial
• Level 5 text is 20 pt Arial