leveraging memory in sql server


Upload: chris-adkin

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Leveraging memory in sql server

About Me

Leveraging Memory With SQL Server

Level 400

Page 2: Leveraging memory in sql server

15+ years of database experience

Speaker at the last three SQLBits and at PASS events around Europe

Some of my material on spinlocks is referenced by SQLskills

About Me

Page 3: Leveraging memory in sql server

“Old World” Memory Wisdom

Page life expectancy

Understanding different forms of memory pressure through:

System health event

Pending memory grants

Memory grant mis-estimates

Plan cache bloat

Memory pressure in virtualized environments

Balloon drivers etc

Page 4: Leveraging memory in sql server

What memory is and isn’t: DRAM, NVRAM, Flash etc

How to encourage CPU cache-friendly hash join behavior

How large pages work

Synchronization primitives and memory

Locality, locality, locality !!!

However, We Are Going To Go “Off-Piste” !!!

Page 5: Leveraging memory in sql server

The Basics

Level 300

Page 6: Leveraging memory in sql server

Myth Busting: DRAM and NAND Flash Are The Same !

Flash latency 15~97 ms = 0.015 seconds (best case)

DRAM latency 100 ns = 0.000000100 seconds

Also NAND flash is not byte addressable
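To put the two latency figures above side by side, here is a quick back-of-the-envelope calculation. It is only a sketch that takes the slide's numbers at face value; the variable names are illustrative.

```python
# Latencies exactly as quoted on the slide, expressed in seconds.
flash_latency_s = 0.015        # 15 ms, the slide's best case for flash
dram_latency_s = 0.000000100   # 100 ns for DRAM

# How many DRAM accesses fit into the time taken by one flash access.
ratio = flash_latency_s / dram_latency_s

print(f"One flash access costs roughly {ratio:,.0f} DRAM accesses")
```

On these figures the gap is several orders of magnitude, which is the point of the myth-busting slide.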

Page 7: Leveraging memory in sql server

Modern Computer, A 200 Foot Overview

Page 8: Leveraging memory in sql server

Your CPU Has Its Own Memory Hierarchy

[Diagram: four cores; each core has an L0 UOP cache, a 32KB L1 instruction cache, a 32KB L1 data cache and a 256KB unified L2 cache. The cores connect to the L3 cache over a bi-directional ring bus, with the memory bus beyond it.]

Page 9: Leveraging memory in sql server

Introducing NUMA

[Diagram: two sockets, NUMA Node 0 and NUMA Node 1. Each has four cores with their own L1 and L2 caches and a shared L3 cache. Each node has local memory access to its own memory and remote memory access to the other node's memory.]

Page 10: Leveraging memory in sql server

Remote Memory Access – No Free Lunch Here !

An additional 20% overhead when accessing ‘foreign’ memory ! (from coreinfo)
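As a rough illustration of what a 20% remote access overhead means for a workload, the sketch below blends local and remote access costs. It assumes a simple linear model in which local access costs 1.0 and remote access costs 1.2; the function name and the chosen fractions are hypothetical.

```python
def average_access_cost(remote_fraction, remote_penalty=0.20):
    """Relative cost of an average memory access, where a local access = 1.0."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_cost = 1.0
    remote_cost = local_cost * (1.0 + remote_penalty)
    # Weighted blend of local and remote accesses.
    return (1.0 - remote_fraction) * local_cost + remote_fraction * remote_cost


for fraction in (0.0, 0.25, 0.5, 1.0):
    cost = average_access_cost(fraction)
    print(f"{fraction:.0%} remote accesses -> {cost:.2f}x the local cost")
```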

Page 12: Leveraging memory in sql server

Why Do We Have NUMA In The First Place ?

The “old world” of uniform memory access:

[Diagram: four CPUs attached to a single shared memory bus.]

Bus saturation !!!

Page 13: Leveraging memory in sql server

SQL Server High Level Memory Architecture

[Diagram: CPU 0 and CPU 1, each with a memory node and a memory clerk.]

sys.dm_os_nodes

sys.dm_os_memory_clerks

sys.dm_os_memory_nodes

Page 14: Leveraging memory in sql server

Knobs and Dials For Controlling NUMA and Memory

Min and max memory setting

Lock pages in memory privilege

CPU affinity mask

Trace flag 8048: upgrade memory partitioning to CPU level (SQL 2016 default)

Trace flag 8015: disable NUMA at SQL OS level

Trace flag 845: use lock pages in memory for SQL Server Standard Edition

Page 15: Leveraging memory in sql server

SQL Server Memory Myth Busting

Page 16: Leveraging memory in sql server

SQL Server Memory Myth Busting

Might help when not hitting a good PLE per node for an OLTP style application

For a data warehouse / OLAP style application, focus on being able to fit the largest partition in memory.

More memory may equal slower clocked memory

How you access memory matters

Where you access memory matters

Page 17: Leveraging memory in sql server

Main Memory Is Not As Fast As We Might Think !!!

[Chart: bar chart of access cost by memory level, axis 0 to 180; the unit is not preserved in this transcript.]

L1 cache, sequential access: 4

L1 cache, in-page random access: 4

L1 cache, full random access: 4

L2 cache, sequential access: 11

L2 cache, in-page random access: 11

L2 cache, full random access: 11

L3 cache, sequential access: 14

L3 cache, in-page random access: 18

L3 cache, full random access: 38

Main memory: 167
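The figures charted on this slide are easier to compare as multiples of the L1 figure. The sketch below does only that arithmetic; because it works in ratios, it makes no assumption about the chart's unit. The dictionary keys are illustrative labels.

```python
# Values as charted on the slide (unit not preserved in the transcript).
access_cost = {
    "L1 cache": 4,
    "L2 cache": 11,
    "L3 cache (sequential)": 14,
    "L3 cache (in-page random)": 18,
    "L3 cache (full random)": 38,
    "Main memory": 167,
}

baseline = access_cost["L1 cache"]

# Express every level as a multiple of the L1 cache figure.
relative = {level: cost / baseline for level, cost in access_cost.items()}

for level, multiple in relative.items():
    print(f"{level:<28} {multiple:6.2f}x L1")
```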

Page 18: Leveraging memory in sql server

The Database Engine Is Not Always CPU Cache Friendly

Take the loop join for example . . .

Page 19: Leveraging memory in sql server

Crawling A Tree In Memory

Page 20: Leveraging memory in sql server

What Happens When Memory Is Scarce ?

[Chart; axis label: Available hash memory (MB)]

Page 21: Leveraging memory in sql server

There Are Things We Can Do To Leverage The CPU Cache !!!

[Chart: Time (ms), 0 to 80,000, against Degree of Parallelism, 2 to 24. Series: non-sorted column store and sorted column store.]

Page 22: Leveraging memory in sql server

Advanced Topics

Level 400

Page 23: Leveraging memory in sql server

System on chip architecture

Multi level memory hierarchy

Integrated memory and PCI controllers

Utility services provisioned by ‘Un-core’ part of the die

The Modern Intel CPU . . . Again

[Diagram: cores, each with an L0 UOP cache, a 32KB L1 instruction cache, a 32KB L1 data cache and a 256KB unified L2 cache, joined to the L3 cache by a bi-directional ring bus. The un-core contains power and clock, QPI, the memory controller, PCI 2.0 and the TLB.]

Page 24: Leveraging memory in sql server

Cache Lines And CPU Cache Sets

[Diagram: three new OperationData() allocations laid out over 64B cache lines in the CPU cache.]

Level 1: 8 way associative

Level 2: 8 way associative

Level 3: 16 way associative
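To make associativity concrete, the sketch below works out how many sets the L1 data and L2 caches have, using the 64B line size and 8-way associativity from this slide and the 32KB and 256KB sizes from the earlier CPU diagram. The modulo mapping from address to set is a textbook simplification rather than a description of any particular CPU, and the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64


def number_of_sets(cache_bytes, ways, line_bytes=CACHE_LINE_BYTES):
    """A cache of `ways`-way associativity is split into this many sets."""
    return cache_bytes // (ways * line_bytes)


def set_index(address, cache_bytes, ways, line_bytes=CACHE_LINE_BYTES):
    """Simplified mapping: cache line number modulo the number of sets."""
    return (address // line_bytes) % number_of_sets(cache_bytes, ways, line_bytes)


l1_sets = number_of_sets(32 * 1024, ways=8)
l2_sets = number_of_sets(256 * 1024, ways=8)
print(f"L1 data cache: {l1_sets} sets, L2 cache: {l2_sets} sets")

# Addresses exactly l1_sets * 64 bytes apart compete for the same L1 set,
# and only 8 of them (one per way) can be resident at once.
stride = l1_sets * CACHE_LINE_BYTES
print(set_index(0, 32 * 1024, 8), set_index(stride, 32 * 1024, 8))
```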

Page 25: Leveraging memory in sql server

All Memory Access Is CPU Intensive

Takeaway point: we want to stay “On socket” !!!

Page 26: Leveraging memory in sql server

How We Access Memory Matters !!!

Page 27: Leveraging memory in sql server

Everything Should Go Via Main Memory, Right ?

[Diagram: two CPUs, each with cores and an L3 cache, labelled “The Old World” and “The New World With Data Direct IO”.]

Main memory is not as fast as you might think !!!

Page 28: Leveraging memory in sql server

What Data-Direct IO Gives Us

[Chart: Single Socket IO Performance. X-axis: 2, 4, 6 and 8 x 10 GbE; y-axis: Transactions/Sec (Mu), 0 to 90; series: Xeon 5600 and Xeon E5.]

Page 29: Leveraging memory in sql server

How Large Memory Pages Work

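[Diagram: the cores and L3 cache on the bi-directional ring bus, with the un-core (power and clock, PCI, QPI, TLB, memory controller). Translation goes through the DTLB (1st level) and STLB (2nd level) before falling back to the page translation table. Cost labels: ~10s of CPU cycles and 160+ CPU cycles.]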

Page 30: Leveraging memory in sql server

The Look Aside Buffer With Large Pages

[Diagram: the same CPU layout, with the TLB holding 32 x 2MB pages.]

128KB of logical to physical memory mapping coverage is increased to 64MB !!!

Fewer trips off the CPU to the page table
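The coverage figures on this slide follow directly from the number of TLB entries multiplied by the page size. The sketch below reproduces the arithmetic, assuming 32 entries in both cases and a standard 4KB small page; the names are illustrative.

```python
KB = 1024
MB = 1024 * KB


def tlb_coverage(entries, page_bytes):
    """Bytes of address space that can be translated without leaving the TLB."""
    return entries * page_bytes


small_page_coverage = tlb_coverage(32, 4 * KB)   # 32 entries of 4KB pages
large_page_coverage = tlb_coverage(32, 2 * MB)   # 32 entries of 2MB pages

print(f"4KB pages: {small_page_coverage // KB}KB covered")
print(f"2MB pages: {large_page_coverage // MB}MB covered")
print(f"Increase:  {large_page_coverage // small_page_coverage}x")
```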

Page 31: Leveraging memory in sql server

. . . And You Think I’m An Uber-Geek ?

“Dr Bandwidth” from the Intel Developer Zone

Page 32: Leveraging memory in sql server

The Difference Large Pages Make

Large pages: 29% increase in page lookups / s

Page 33: Leveraging memory in sql server

Advanced NUMA Topologies

[Diagrams: a four-socket topology (CPU 0 to CPU 3) and an eight-socket topology (CPU 0 to CPU 7), each with attached IO hubs.]

Information courtesy of Joe Chang

Page 34: Leveraging memory in sql server

SQLOS Checking For NUMA Locality Under The Covers

Page 35: Leveraging memory in sql server

Why Is Buffer Pool Pressure Bad ?

Page 36: Leveraging memory in sql server

Which Statement Has The Lowest Elapsed Time ?

17.41MB column store Vs. 51.7MB column store

[Two statements shown, each labelled “The fastest ?”]

Page 37: Leveraging memory in sql server

Hash agg lookup weight 65,329.87

Column Store scan weight 28,488.73

Using Non Pre-Sorted Data – Call Stack

Page 38: Leveraging memory in sql server

Control flow

Data flow

The call stack indicates that the bottleneck is right here

How Queries Are Executed

Page 39: Leveraging memory in sql server

Hash agg lookup weight: now 275.00 before 65,329.87

Column Store scan weight now 45,764.07 before 28,488.73

Hash probes resulting in sequential memory access = CPU savings > cost of scanning an enlarged column store

Using Pre-Sorted Data – Call Stack
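The trade-off described on this slide can be checked by adding up the before and after weights. The sketch below uses only the four weights quoted on the two call stack slides; the dictionary keys are illustrative names.

```python
# Sampled CPU weights as quoted on the two call stack slides.
before = {"hash_agg_lookup": 65329.87, "column_store_scan": 28488.73}
after = {"hash_agg_lookup": 275.00, "column_store_scan": 45764.07}

# Pre-sorting makes the scan more expensive but the hash lookups far cheaper.
extra_scan_cost = after["column_store_scan"] - before["column_store_scan"]
lookup_saving = before["hash_agg_lookup"] - after["hash_agg_lookup"]
net_saving = sum(before.values()) - sum(after.values())

print(f"Extra scan cost: {extra_scan_cost:,.2f}")
print(f"Lookup saving:   {lookup_saving:,.2f}")
print(f"Net saving:      {net_saving:,.2f}")
```

The lookup saving outweighs the extra scan cost, which is the inequality the slide states.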

Page 40: Leveraging memory in sql server

What It Boils Down To – CPU Stalls !

[Chart: LLC misses (0 to 6,000,000,000) against degree of parallelism (2 to 24) for non-sorted and sorted data. Annotations: “Last Level Cache saturation point” and “Dip because worker threads 13 and 14 have the LLC of CPU 1 to themselves”.]

Page 41: Leveraging memory in sql server

SQL Server Memory Myth Busting

Memory only matters for the major memory pools and query plan iterator memory grants

Page 42: Leveraging memory in sql server

Latches Versus Spinlocks

Page 43: Leveraging memory in sql server

A task will spin until it can acquire the spinlock it is after

For short-lived waits this uses fewer CPU cycles than yielding and then waiting for the task's thread to reach the head of the runnable queue.

How Spinlocks Work

Page 44: Leveraging memory in sql server

We have to yield the scheduler at some stage !

SQL 2008 R2 Introduced Exponential Back Off
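The general idea of spinning and then backing off for exponentially longer periods can be sketched in a few lines. This is a toy illustration of the technique only, not SQL Server's implementation: the class, its parameters and the use of threading.Lock as a stand-in for an atomic test-and-set word are all assumptions made for the example.

```python
import threading
import time


class BackoffSpinlock:
    """Toy spinlock: retry a bounded number of times, then sleep for
    exponentially longer periods before retrying again."""

    def __init__(self, spins_per_round=100, first_sleep_s=1e-6, max_sleep_s=1e-3):
        # threading.Lock stands in for the atomic test-and-set word.
        self._flag = threading.Lock()
        self._spins_per_round = spins_per_round
        self._first_sleep_s = first_sleep_s
        self._max_sleep_s = max_sleep_s

    def acquire(self):
        sleep_s = self._first_sleep_s
        while True:
            # Spin phase: retry immediately, without sleeping.
            for _ in range(self._spins_per_round):
                if self._flag.acquire(blocking=False):
                    return
            # Back-off phase: yield, doubling the sleep up to a cap.
            time.sleep(sleep_s)
            sleep_s = min(sleep_s * 2, self._max_sleep_s)

    def release(self):
        self._flag.release()


def run_demo(thread_count=4, increments=1000):
    """Increment a shared counter from several threads under the spinlock."""
    lock = BackoffSpinlock()
    counter = {"value": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                counter["value"] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


print(run_demo())
```

Capping the sleep keeps a waiter from backing off indefinitely, while the doubling reduces how often contending threads hammer the same lock.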

Page 45: Leveraging memory in sql server

Spinlocks and Memory

[Diagram: two CPUs, each with cores and an L3 cache; three “spin_acquire Int s” calls, with “Transfer cache entry” arrows between the CPUs.]

Page 46: Leveraging memory in sql server

Which CPU Socket Does The Insert Run The Fastest On ?

[Diagram: two CPU sockets, NUMA Node 0 and NUMA Node 1, each with cores 0 to 9 and an L3 cache. Faster here ? . . . Or faster here ?]

18 threads on one socket: 73 s

18 threads on the other socket: 125 s

Page 47: Leveraging memory in sql server

What Does Windows Performance Toolkit Have To Say ?

18 insert threads co-located on the same CPU socket as the log writer

Vs.

18 insert threads not co-located on the same socket as the log writer

84,697 ms Vs. 11,281,235 ms

Page 48: Leveraging memory in sql server

The CPU Cycle Cost Of Cache Line Transfers

[Diagram: “spin_acquire Int s” calls with “Transfer cache line” arrows.]

Core to core on the same socket: 34 CPU cycles

Core to core on different sockets: 100 CPU cycles
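A simple way to see why socket placement matters for a contended spinlock is to total the transfer cost over many hand-offs. The sketch below assumes the 34 and 100 CPU cycle figures from this slide and a hypothetical workload description (a number of hand-offs and the fraction that cross sockets); it ignores every other cost.

```python
SAME_SOCKET_CYCLES = 34     # core to core on the same socket
CROSS_SOCKET_CYCLES = 100   # core to core on different sockets


def handoff_cycles(handoffs, cross_socket_fraction):
    """Total cycles spent moving the cache line between cores."""
    cross = handoffs * cross_socket_fraction
    same = handoffs - cross
    return same * SAME_SOCKET_CYCLES + cross * CROSS_SOCKET_CYCLES


for fraction in (0.0, 0.5, 1.0):
    total = handoff_cycles(1000, fraction)
    print(f"{fraction:.0%} cross-socket: {total:,.0f} cycles per 1,000 hand-offs")
```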

Page 49: Leveraging memory in sql server

The In Memory OLTP Hash Indexes: Think Buckets

Smaller bucket counts = better cache line reuse + reduced TLB thrashing + reduced hash table cache out

Larger bucket counts = reduced cache line reuse + increased TLB thrashing + less hash bucket scanning for lookups

Lookup Table (Hash)
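The bucket count trade-off can be roughed out numerically: more buckets mean a larger bucket array to keep in cache, while fewer buckets mean longer chains to scan per lookup. The sketch below assumes 8 bytes per bucket and an evenly distributed key; the row count is a hypothetical value and the function names are illustrative.

```python
BUCKET_POINTER_BYTES = 8  # assumed size of one hash bucket (a pointer)
MB = 1024 * 1024


def bucket_array_bytes(bucket_count):
    """Memory taken by the bucket array itself, before any rows are stored."""
    return bucket_count * BUCKET_POINTER_BYTES


def average_chain_length(row_count, bucket_count):
    """Rows per bucket if the keys hash evenly across all buckets."""
    return row_count / bucket_count


row_count = 1_000_000  # hypothetical table size
for bucket_count in (524_288, 67_108_864):
    size_mb = bucket_array_bytes(bucket_count) / MB
    chain = average_chain_length(row_count, bucket_count)
    print(f"{bucket_count:>11,} buckets: {size_mb:6.0f}MB array, "
          f"{chain:.3f} rows per bucket on average")
```

The smaller array is far more likely to stay resident in the CPU caches and to be covered by the TLB, at the price of scanning slightly longer chains.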

Page 50: Leveraging memory in sql server

Is There A Hash Index Bucket Count Sweet Spot ?

[Chart: Insert Rate For 10 Threads / Bucket Count. X-axis: Bucket Count; y-axis: Insert Rate, 0 to 9,000,000.]

Bucket counts, in the order extracted: 16,777,216; 67,108,864; 33,554,432; 2,097,152; 1,048,576; 4,194,304; 524,288

Insert rates, in the order extracted: 531,575; 545,851; 530,447; 7,911,392; 8,064,516; 8,169,934; 8,445,945

Page 51: Leveraging memory in sql server

NUMA Locality and The In-Memory OLTP Engine

SQL Server 2016 RC3

Singleton inserts into a memory optimised table with a hash index

2 sockets, 10 cores per socket

Measuring the effect of moving the CPU affinity mask around

Page 52: Leveraging memory in sql server


Questions ?

Page 54: Leveraging memory in sql server


Addendum: Windows Performance Toolkit Basics

Page 55: Leveraging memory in sql server

Wait Time Is Well Understood, Service Time However…

Wait Time + Service Time

What is happening here ?

Page 56: Leveraging memory in sql server

Introducing Windows Performance Toolkit

Remember to turn the paging executive off !

Page 57: Leveraging memory in sql server

What Does Windows Performance Toolkit Allow Us To See ?

CPU analysis: where our CPU cycles are going

Wait analysis: what threads are waiting on

Deferred Procedure Call / Interrupt Service Routine analysis ?

Page 58: Leveraging memory in sql server

Collecting An Event Trace For Windows

1. xperf -on base -stackwalk profile

2. Run query

3. xperf -d stackwalk.etl

4. Open stackwalk.etl in WPA

Page 59: Leveraging memory in sql server

A call stack is a stack data structure that stores information about the active subroutines of a computer program. 

What Is A Call Stack ?

Page 60: Leveraging memory in sql server

What Available CPU Stats Are We Interested In ?

But for two DLLs, SQLOS would run on bare metal

You should only be interested in CPU and DPC/ISR analysis, unless significant waits on PREEMPTIVE_OS_ wait events are prevalent

Page 61: Leveraging memory in sql server

What Do The .DLLs In The Call Stack Represent ?

Database Engine

Language Processing and Optimisation: sqllang.dll

Runtime: sqlmin.dll, sqltst.dll, qds.dll, hekaton.dll, <in-memory-table.dll>, <natively-compiled-obj.dll>

SQLOS: sqldk.dll, sqlos.dll

Page 62: Leveraging memory in sql server

Where Is The CPU Burned In The Legacy Engine ?

sqlmin.dll: query execution, latching, spin locking, locking, log writing, lazy writing, IO

sqldk.dll sqltst.dll sqlos.dll qds.dll

Page 63: Leveraging memory in sql server

A debug symbol expresses which programming-language constructs generated a specific piece of machine code in a given executable module

If the debug symbols exist, they will be on the symbol server pointed to by WPA by default

What Is A Debug Symbol ?

Page 64: Leveraging memory in sql server

Investigating CPU Saturation

1. Load ETL file

2. Load symbols

3. Open graph explorer

4. Drag ‘Computation’ onto analysis canvas and select graph and table

Page 65: Leveraging memory in sql server

Computation Columns Of Interest

Weight (in view): the sampled CPU time in ms across all CPU cores

% Weight: sampled CPU time as a percentage of all CPU time available during the entire sampling period
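The relationship between the two columns can be sketched as a single division. This follows the definitions given on this slide, on the assumption that "all CPU time available" means the sampling period multiplied by the number of CPU cores; the function name and the example numbers are hypothetical.

```python
def percent_weight(weight_ms, sampling_period_ms, cpu_cores):
    """Sampled CPU time as a percentage of all CPU time available."""
    available_ms = sampling_period_ms * cpu_cores
    return 100.0 * weight_ms / available_ms


# Hypothetical trace: 10 seconds captured on a 20 core machine,
# with 50,000 ms of sampled CPU time attributed to one call stack.
share = percent_weight(weight_ms=50_000, sampling_period_ms=10_000, cpu_cores=20)
print(f"% Weight: {share:.1f}")
```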