SQL Server Scaling on Big Iron (NUMA) Systems
TPC-H
Joe Chang
About Joe Chang
SQL Server Execution Plan Cost Model
True cost structure by system architecture
Decoding statblob (distribution statistics)
SQL Clone – statistics-only database
Tools
ExecStats – cross-reference index use by SQL-execution plan
Performance Monitoring, Profiler/Trace aggregation
TPC-H
DSS – 22 queries, geometric mean
60X range plan cost, comparable actual range
Power – single stream
Tests ability to scale parallel execution plans
Throughput – multiple streams
Scale Factor 1 – Line item data is 1GB
875MB with DATE instead of DATETIME
Only single column indexes allowed, Ad-hoc
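The metrics named on this slide combine as follows in the TPC-H specification: Power is derived from the geometric mean of the single-stream timings (the 22 queries plus two refresh functions), and the composite QphH is the geometric mean of Power and Throughput. The sketch below is a minimal illustration, not the benchmark kit; the function names are made up for this example, and the sample inputs are the 2-way Xeon 5355 figures from the SF100 results in this deck.

```python
import math

def geometric_mean(values):
    """Geometric mean via logs, which avoids overflow on long products."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def power_metric(scale_factor, timings_sec):
    """TPC-H Power: 3600 * SF / geometric mean of single-stream timings.

    timings_sec is expected to hold the 22 query times plus the
    2 refresh-function times, all in seconds.
    """
    return 3600.0 * scale_factor / geometric_mean(timings_sec)

def qphh(power, throughput):
    """Composite QphH: geometric mean of Power and Throughput."""
    return math.sqrt(power * throughput)

# 2-way Xeon 5355 at SF100: Power 23,378.0, Throughput 13,381.0
composite = qphh(23378.0, 13381.0)
print(round(composite, 1))
```

Because Power uses a geometric mean, halving the time of a 1-second query moves the metric as much as halving the time of a 60-second query, which is why small-query anomalies matter so much for the Power score.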
Observed Scaling Behaviors
Good scaling, leveling off at high DOP
Perfect Scaling ???
Super Scaling
Negative Scaling especially at high DOP
Execution Plan change
Completely different behavior
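One way to put these categories on a measured footing is to compare the speedup observed between two DOP settings against the ideal speedup for that step. The sketch below is only an illustration: the `tolerance` band, the "good scaling" cutoff, and the elapsed times are all hypothetical, not values from the slides.

```python
def classify_step(prev_dop, prev_time, dop, time, tolerance=0.05):
    """Label the change in elapsed time between two DOP settings.

    tolerance is a hypothetical band around ideal scaling, not a
    value taken from the slides.
    """
    ideal = dop / prev_dop          # e.g. 2.0 when DOP doubles
    actual = prev_time / time       # observed speedup for this step
    if actual < 1.0:
        return "negative scaling"
    if actual > ideal * (1 + tolerance):
        return "super scaling"
    if actual >= ideal * (1 - tolerance):
        return "perfect scaling"
    if actual >= 1 + (ideal - 1) / 2:
        return "good scaling"
    return "leveling off"

# Hypothetical elapsed times in seconds, keyed by DOP.
times = {1: 100.0, 2: 50.0, 4: 22.0, 8: 14.0, 16: 12.0, 32: 15.0}

dops = sorted(times)
labels = [
    classify_step(a, times[a], b, times[b])
    for a, b in zip(dops, dops[1:])
]
for (a, b), label in zip(zip(dops, dops[1:]), labels):
    print(f"DOP {a} -> {b}: {label}")
```

Timings alone cannot detect an execution plan change; that case still requires comparing the plans at each DOP.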
TPC-H Published Results
TPC-H SF 100GB
2-way Xeon 5355, 5570, 5680, Opt 6176
Among the 2-way Xeon 5570 results, all are close: HDD has the best throughput, SATA SSD has the best composite, and Fusion-IO has the best power.
Westmere and Magny-Cours, both with 192GB memory, are very close.
[Chart: Power, Throughput and QphH for Xeon 5355, 5570 HDD, 5570 SSD, 5570 Fusion, 5680 SSD, Opt 6176]
TPC-H SF 300GB
8x QC/6C & 4x12C Opt
6C Istanbul improved over 4C Shanghai by 45% Power, 73% Throughput, 59% overall.
4x12C 2.3GHz improved 17% over 8x6C 2.8GHz.
[Chart: Power, Throughput and QphH for Opt 8360 4C, Opt 8384 4C, Opt 8439 6C, Opt 6176 12C, X 7560 8C]
TPC-H SF 1000
[Chart: Power, Throughput and QphH for Opt 8439 SQL, Opt 8439 Sybase, Superdome, Superdome 2]
Oracle RAC, 64-nodes, 128 Xeon 5450 quad-core 3.0GHz processors
Power 782,608, 5.6X higher than Superdome 2 with 64-cores
TPC-H SF 3TB
X7460 & X7560
Nehalem-EX 64 cores better than 96 Core 2.
[Chart: Power, Throughput and QphH for 16 x X7460, 8 x 7560, POWER6, M9000]
TPC-H SF 100GB, 300GB & 3TB
SF100 2-way: Westmere and Magny-Cours are very close. Among the 2-way Xeon 5570 results, all are close: HDD has the best throughput, SATA SSD has the best composite, and Fusion-IO has the best power.
SF300 8x QC/6C & 4x12C: 6C Istanbul improved over 4C Shanghai by 45% Power, 73% Throughput, 59% overall. 4x12C 2.3GHz improved 17% over 8x6C 2.8GHz.
SF 3TB X7460 & X7560: Nehalem-EX 64 cores better than 96 Core 2.
[Charts: Power, Throughput and QphH for the SF100 systems (Xeon 5355, 5570 HDD, 5570 SSD, 5570 Fusion, 5680 SSD, Opt 6176), the SF300 systems (Opt 8360 4C, Opt 8384 4C, Opt 8439 6C, Opt 6176 12C, X 7560 8C) and the SF 3TB systems (16 x X7460, 8 x 7560, 32 x Pwr6)]
TPC-H Published Results
SQL Server excels in Power
Limited by geometric mean, anomalies
Trails in Throughput
Other DBMS get better throughput than power
SQL Server throughput below Power by wide margin
Speculation – SQL Server does not throttle back parallelism with load?
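The gap between Power and Throughput can be expressed as a simple ratio. The sketch below uses three rows from the 3TB results in this deck, with the system labels as they appear in the tables; the helper name is made up for illustration.

```python
def throughput_to_power(power, throughput):
    """Ratio below 1.0 means Throughput trails Power."""
    return throughput / power

# (label, Power, Throughput) taken from the TPC-H 3TB table in this deck.
results_3tb = [
    ("8 Xeon 7560 (SQL 8r2)", 185297.7, 142685.6),
    ("POWER6 (Sybase)", 142790.7, 171607.4),
    ("SPARC (O11R2)", 182350.7, 216967.7),
]

ratios = {
    label: throughput_to_power(power, throughput)
    for label, power, throughput in results_3tb
}
for label, ratio in ratios.items():
    print(f"{label}: throughput/power = {ratio:.2f}")
```

In this sample the SQL Server result falls below 1.0 while the Sybase and Oracle results sit above it, which is the pattern the slide describes.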
TPC-H SF100
Power     Throughput  QphH      Processors   Total Cores  SQL   GHz   Mem GB  SF
23,378.0  13,381.0    17,686.7  2 Xeon 5355  8            5sp2  2.66  64      100
67,712.9  38,019.1    50,738.4  2x5570 HDD   8            8sp1  2.93  144     100
70,048.5  37,749.1    51,422.4  2x5570 SSD   8            8sp1  2.93  144     100
72,110.5  36,190.8    51,085.6  5570 Fusion  8            8sp1  2.93  144     100
99,426.3  55,038.2    73,974.6  2 Xeon 5680  12           8r2   3.33  192     100
94,761.5  53,855.6    71,438.3  2 Opt 6176   24           8r2   2.3   192     100
TPC-H SF300
Power      Throughput  QphH       Processors   Total Cores  SQL   GHz   Mem GB  SF
25,206.4   13,283.8    18,298.5   4 Opt 8220   8            5rtm  2.8   128     300
67,287.4   41,526.4    52,860.2   8 Opt 8360   32           8rtm  2.5   256     300
75,161.2   44,271.9    57,684.7   8 Opt 8384   32           8rtm  2.7   256     300
109,067.1  76,869.0    91,558.2   8 Opt 8439   48           8sp1  2.8   256     300
129,198.3  89,547.7    107,561.2  4 Opt 6176   48           8r2   2.3   512     300
152,453.1  96,585.4    121,345.6  4 Xeon 7560  32           8r2   2.26  640     300
All of the above are HP results(?). Sun result: Opt 8384, sp1, Power 67,095.6, Throughput 45,343.5, QphH 55,157.5
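The generation-over-generation percentages quoted for SF300 can be checked against the table values. The sketch below assumes that "overall" refers to the composite QphH and that the 17% figure for 4x12C over 8x6C is also based on QphH; neither is stated on the slide.

```python
def pct_gain(new, old):
    """Percentage improvement of new over old."""
    return (new / old - 1.0) * 100.0

# (Power, Throughput, QphH) from the SF300 table in this deck.
opt_8384 = (75161.2, 44271.9, 57684.7)     # 8 x 4C Shanghai
opt_8439 = (109067.1, 76869.0, 91558.2)    # 8 x 6C Istanbul
opt_6176 = (129198.3, 89547.7, 107561.2)   # 4 x 12C Magny-Cours

istanbul_gains = [pct_gain(n, o) for n, o in zip(opt_8439, opt_8384)]
magny_cours_qphh_gain = pct_gain(opt_6176[2], opt_8439[2])

for name, gain in zip(("Power", "Throughput", "QphH"), istanbul_gains):
    print(f"Istanbul over Shanghai, {name}: {gain:.1f}%")
print(f"4x12C over 8x6C, QphH: {magny_cours_qphh_gain:.1f}%")
```

Under those assumptions the computed gains land within about one percentage point of the quoted 45%, 73%, 59% and 17%.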
TPC-H 1TB
Power      Throughput  QphH       Processors    Total Cores  SQL    GHz   Mem GB  SF
95,789.1   69,367.6    81,367.6   8 Opt 8439    48           8R2?   2.8   512     1000
108,436.8  96,652.7    102,375.3  8 Opt 8439    48           ASE    2.8   384     1000
111,557.0  128,259.1   123,323.1  Itanium 9140  64           O11g   1.6   384     1000
139,181.0  141,188.1   140,181.1  Itanium 9350  64           O11R2  1.73  512     1000
782,608.7  1,740,122   1,166,977  Xeon 5450     512          O RAC  3.0   2048    1000
TPC-H 3TB
Power      Throughput  QphH       Processors    Total Cores  SQL     GHz   Mem GB  SF
120,254.8  87,841.4    102,254.8  16 Xeon 7460  96           8r2     2.66  1024    3000
185,297.7  142,685.6   162,601.7  8 Xeon 7560   64           8r2     2.26  512     3000
142,790.7  171,607.4   156,537.3  POWER6        64           Sybase  5.0   512     3000
182,350.7  216,967.7   198,907.5  SPARC         128          O11R2   2.88  512     3000
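Normalizing the composite score by core count makes the "64 cores better than 96" comparison concrete. This is a rough sketch using the QphH and core counts from the 3TB table above; per-core QphH is not an official TPC metric, and the helper name is illustrative.

```python
def qphh_per_core(qphh, cores):
    """Composite score divided by total core count (not an official TPC metric)."""
    return qphh / cores

# (QphH, total cores) from the TPC-H 3TB table in this deck.
systems = {
    "16 Xeon 7460 (Core 2)": (102254.8, 96),
    "8 Xeon 7560 (Nehalem-EX)": (162601.7, 64),
}

per_core = {name: qphh_per_core(q, c) for name, (q, c) in systems.items()}
for name, value in per_core.items():
    print(f"{name}: {value:,.0f} QphH per core")
```

On these figures each Nehalem-EX core delivers more than twice the composite score of a Core 2 generation core.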
TPC-H Published Results
Power      Throughput  QphH       Processors   Total Cores  SQL   GHz   Mem GB  SF
23,378     13,381      17,686.7   2 Xeon 5355  8            5sp2  2.66  64      100
72,110.5   36,190.8    51,085.6   2 Xeon 5570  8            8sp1  2.93  144     100
99,426.3   55,038.2    73,974.6   2 Xeon 5680  12           8r2   3.33  192     100
94,761.5   53,855.6    71,438.3   2 Opt 6176   24           8r2   2.3   192     100
25,206.4   13,283.8    18,298.5   4 Opt 8220   8            5rtm  2.8   128     300
67,287.4   41,526.4    52,860.2   8 Opt 8360   32           8rtm  2.5   256     300
75,161.2   44,271.9    57,684.7   8 Opt 8384   32           8rtm  2.7   256     300
109,067.1  76,869.0    91,558.2   8 Opt 8439   48           8sp1  2.8   256     300
129,198.3  89,547.7    107,561.2  4 Opt 6176   48           8r2   2.3   512     300
185,297.7  142,685.6   162,601.7  8 Xeon 7560  64           8r2   2.26  512     3000
SF100 Big Queries (sec)
Xeon 5570 with SATA SSD poor on Q9, reason unknown
Both Xeon 5680 and Opteron 6176 big improvement over Xeon 5570
[Chart: query time in sec for Q1, Q9, Q13, Q18, Q21; series: 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
SF100 Middle Q
Xeon 5570-HDD and 5680-SSD poor on Q12, reason unknown
Opteron 6176 poor on Q11
[Chart: query time in sec for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q22; series: 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
SF100 Small Queries
Xeon 5680 and Opteron poor on Q20
Note limited scaling on Q2 & Q17
[Chart: query time in sec for Q2, Q4, Q6, Q14, Q15, Q17, Q19, Q20; series: 5570 HDD, 5570 SSD, 5570 FusionIO, 5680 SSD, 6176 SSD]
SF300 Big Queries
Opteron 6176 poor relative to 8439 on Q9 & Q13, same number of total cores
[Chart: query time in sec for Q1, Q9, Q13, Q18, Q21; series: 8 x 8360 QC 2M, 8 x 8384 QC 6M, 8 x 8439 6C, 4 x 6176 12C, 4 x 7560 8C]
SF300 Middle Q
Opteron 6176 much better than 8439 on Q11 & Q19
Worse on Q12
[Chart: query time in sec for Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q19, Q20, Q22; series: 8x8360 QC 2M, 8x8384 QC 6M, 8x8439 6C, 4x6176 12C, 4x7560 8C]
SF300 Small Q
Opteron 6176 much better on Q2, even with 8439 on others
[Chart: query time in sec for Q2, Q4, Q6, Q14, Q15, Q17; series: 8 x 8360 QC 2M, 8 x 8384 QC 6M, 8 x 8439 6C, 4 x 6176 12C, 4 x 7560 8C]
SF1000 Sybase vs. SQL Server
Query time, Sybase relative to SQL Server, both on DL785 48-core
[Chart: relative query time for Q1 through Q22]
SF1000 Large Queries
[Chart: Q1, Q9, Q13, Q18, Q21; series: SQL Server, Sybase]
SF1000 Middle Queries
[Chart: Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q17, Q19; series: SQL Server, Sybase]
SF1000 Small Queries
[Chart: Q2, Q4, Q6, Q14, Q15, Q16, Q20, Q22; series: SQL Server, Sybase]
SF1000 Itanium - Superdome
Query time, Superdome 2 versus Superdome, 16-way quad-core and 32-way dual-core
[Chart: relative query time for Q1 through Q22]
512-core C2 RAC vs. 64-core It2
Query time, Superdome 2 versus RAC, 16-way quad-core (64 cores) and 64-node 2-way quad-core (512 cores)
Oracle RAC 5.6X higher Power
[Chart: relative query time for Q1 through Q22]
SF 3TB – 8×7560 versus 16×7460
Broadly 50% faster overall, 5X+ on one, slower on 2, comparable on 3
[Chart: relative query time for Q1 through Q22; one bar carries the label 5.6X]
64 cores, PWR6 vs. Xeon 7560
Query time, POWER6 relative to X7560
Overall, Xeon 7560 is 30% faster on power, but wide variations on individual queries, some with Pwr6 faster
[Chart: relative query time for Q1 through Q22]
SF3000 Big Queries
[Chart: Q1, Q9, Q13, Q18, Q21; series: Uni 16x6, DL980 8x8, Pwr6, M9000]
SF3000 Middle and Small Q
[Chart: Q3, Q5, Q7, Q8, Q10, Q11, Q12, Q16, Q17, Q19; series: Uni 16x6, DL980 8x8, Pwr6, M9000]
[Chart: Q2, Q4, Q6, Q14, Q15, Q16, Q20, Q22; series: Uni 16x6, DL980 8x8, Pwr6, M9000]
TPC-H Summary
Scaling is impressive on some SQL
Limited ability (value) in scaling small Q
Anomalies, negative scaling
TPC-H Queries
Q1 Pricing Summary Report
Query 2 Minimum Cost Supplier
Wordy, but only touches the small tables, second lowest plan cost (Q15)
Q3
Q6 Forecasting Revenue Change
Q7 Volume Shipping
Q8 National Market Share
Q9 Product Type Profit Measure
Q11 Important Stock Identification
Non-Parallel Parallel
Q12 Random IO?
Q13
Why does Q13 have perfect scaling?
Q17 Small Quantity Order Revenue
Q18 Large Volume Customer
Non-Parallel
Parallel
Q19
Q20?
This query may get a poor execution plan
Date functions are usually written as [expression shown on slide] because Line Item date columns are "date" type
CAST helps the DOP 1 plan, but gets a bad plan for parallel
Q21 Suppliers Who Kept Orders Waiting
Note 3 references to Line Item
Q22