sql server 2014: in in-memory oltp for database developers

Kevin LiuPrincipal Lead Program Manager

SQL Server 2014: In In-memory OLTP for Database Developers

Agenda

• Overview of SQL Server 2014 and In-memory improvements

• In-memory OLTP architecture• How to use• Memory usage and control• Data durability• Analysis and Migration Tools

SQL Server 2014 InvestmentsIn-Memory

TechnologiesEnhanced

High Availability New Hybrid Scenarios

In-Memory OLTP• 5-25X performance gain

for OLTP integrated into SQL Server

In-Memory DW• 5-25X performance gain

and high data compression

• Updatable and clustered

SSD Buffer Pool Extension• 4-10X of RAM and up to

3X performance gain transparently for apps

Always On Enhancements • Increased availability and

improved manageability of active secondaries

Online Database Operations• Increased availability for

index/partition maintenance

Backup to Azure• Easy to implement and

cost effective Disaster Recovery solution to Azure Storage

HA to Azure VM• Easy to implement and

cost effective high availability solution with Windows Azure VM

Deploy to Azure• Deployment wizard to

migrate database

Better together with Windows Server• WS2012 ReFS support• Online resizing VHDx• Hyper-V replica• Windows “Blue”

support

Extending Power View• Enable Power View on

existing analytic models and support new multi-dimensional models.

Other investments

In-memory TechnologiesIn-Memory

Technologies

In-Memory OLTP• 5-25X performance gain

for OLTP integrated into SQL Server

In-Memory DW• 5-30X performance gain

and high data compression

• Updatable and clustered

SSD Buffer Pool Extension• 4-10X of RAM and up to

3X performance gain transparently for apps

Applicable to

Transactional workloads: Concurrent data entry, processing and retrieval

Applicable to

Decision support workloads: Large scans and aggregates

Applicable to

Disk-based transactional workloads:Large working (data)set

Why In-Memory OLTP (Hekaton)

• Market need for higher throughput and predictable lower latency OLTP at a lower cost

• Hardware trends demand architectural changes on RDBMS

• In-Memory OLTP is: High performance,Memory-optimized OLTP engine, Integrated into SQL Server and Architected for modern hardware trends

1990

1991

1991

1992

1993

1993

1994

1994

1995

1995

1996

1997

1997

1998

1998

1999

2000

2000

2001

2002

2002

2003

2004

2006

2007

2008

2009

2009

2010

2011

1

10

100

1000

10000

100000

1000000

$ per GB of PC Class Memory

US$

/GB

Decreasing RAM cost

Moore’s Law on total CPU processing power holds but

in parallel processing…

CPU clock rate stalled…

Hardware trends

SQL Server Integration

• Same manageability, administration & development experience

• Integrated queries & transactions

• Integrated HA and backup/restore

Main-Memory Optimized

• Direct pointers to rows

• Indexes exist only in memory

• No buffer pool• No write-ahead

logging• Stream-based

storage

High Concurrency

• Multi-version optimistic concurrency control with full ACID support

• Lock-free data structures

• No locks, latches or spinlocks

• No I/O during transaction

T-SQL Compiled to Machine Code• T-SQL compiled to

machine code leveraging VC compiler

• Procedure and its queries, becomes a C function

• Aggressive optimizations @ compile-time

Steadily declining memory price,

NVRAM

Many-core processors

Stalling CPU clock rate

TCO

Hardware trends Business

In-Memory OLTP: architected to enable new real-time applications

Hybrid engine and integrated

experience

High performance

data operations

Frictionless scale-up

Efficient, business-logic

processingCu

sto

mer

Ben

efi

ts

Hekato

n T

ech

P

illa

rsD

rivers

Demo Business and User impact

Memory-optimized Table Filegroup Data Filegroup

SQL Server.exe

Hekaton Engine: Memory_optimized Tables &

Indexes

TDS Handler and Session Management

In-Memory OLTP: built into SQL Server 2014

Native-Compiled SPs and Schema

Buffer Pool

Execution Plan cache for ad-hoc T-

SQL and SPs

Application

Transaction Log

Query Interop

Non-durable Table T1 T3T2

T1 T3T2

T1 T3T2

T1 T3T2

TablesIndexes

T-SQL Interpreter

T1 T3T2

T1 T3T2

Access Methods

Parser, Catalog, Optimize

r

Hekaton Compiler Hekaton

Component

KeyExisting

SQL Compone

nt

Generated .dll

20-40x more efficient

Reduced log contention; Low

latency still critical for performance

Checkpoints are background sequential

IO

No V1 improvements in comm layers

Early customer performance gains

WL Derived from TPC-C

Legacy App

Ingest/Read heavy

Best Fit

0 5 10 15 20 25 30

2

5

10

25

Factor X Gains for Applications

X factor Gains

Despite 20 years of optimizing for existing OLTP benchmarks – we still get 2x on a workload derived from

TPC-C

Apps that take full advantage: e.g. web app session state

Apps with periodic bulk updates & heavy random reads

Existing apps typically see 4-7x improvement

Create Table DDLCREATE TABLE [Customer]( [CustomerID] INT NOT NULL

PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000), [Name] NVARCHAR(250) NOT NULL, [CustomerSince] DATETIME NULL

INDEX [ICustomerSince] NONCLUSTERED)WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

This table is memory

optimized

This table is durable

Non-durable tables:DURABILITY=SCHEMA_ON

LY

Indexes are specified inline

NONCLUSTERED indexes are supported

Hash IndexBUCKET_COUNT 1-2X nr of unique index key

values

Demo Table and Stored

Procedure Performance Gain

Memory-optimized Data Structures Rows

• New row format • Structure of the row is optimized for memory residency and access• No data page containers• Rows are versioned (payload never updated in place)

Indexes• Nonclustered Indexes only• Indexes point to rows, they do not duplicate them• <nonclustered hash> index for point lookups• <nonclustered> index for range (inequality) and ordered scans• Not logged and do not exist on disk – maintained online or recreated

during recovery

Memory-optimized Table: Row Format

Row header Payload (table columns)

Begin Ts End Ts StmtIdIdxLinkCou

nt

8 bytes 8 bytes 4 bytes 2 + 2 (padding) bytes

8 bytes * (IdxLinkCount)

Rows• New row format • Structure of the row is optimized for memory residency and access• Rows are versioned (payload never updated in place) and Begin/End timestamp determines row’s version validity and

visibility• No data pages; just rows• Row size limited to 8060 bytes (@table create time) to allow data to be moved to disk-based table• Not every SQL table schema is supported (for example LOB and sqlvariant)

Lightweight hash index data structure and MVCC

50, ∞ John Paris

Timestamps NameChain ptrs City

Hash index on Name

Transaction 100:

UPDATE City = ‘Prague’ where Name = ‘John’

No locks of any kind, no interference with transaction 99

100, ∞ John Prague

90, ∞ Susan Bogota

f(John)

100

Transaction 99: Running compiled querySELECT City WHERE Name = ‘John’Simple hash lookup returns direct pointer to ‘John’ row

Background operation will unlink and deallocate the old ‘John’ row after transaction 99 completes.

Hekaton Principle:

• Performance like a cache• Functionality like a RDMBS

Range index

10 20 28

5 8 10 11 15 18 21 24 27

PAGE

Page Mapping Table

0

1

2

3

14

15

PAGE

1 2 4 6 7 8 25 26 27

200, ∞ 1 50, 300 2

Root

Non-leaf pages

leaf pages

Data rows

PageID-0

PageID-3 PageID-2

PageID -14

Page size- up to 8K. Sized to the row Logical pointers

Indirect physical pointers through Page Mapping table

Page Mapping table grows (doubles) as table grows

Sibling pages linked one direction Require two indexes for ASC/DSC

No in-place updates on index pages Handled thru delta pages or building new pages

No covering columns (only the key is stored)

Key Key

LogicalPhysical

100,200 1

Memory-Optimized IndexesExist only in memoryRebuilt on database startup

Do not contain data rowsIndexes contain memory pointers to the data rowsNo duplication of data (logically all indexes are covering)

Hash Indexes

• Need to specify bucket_count – usually 1-2X the number of unique index keys

• Optimized for point-lookups• Cannot be used for search on subset of the

index key• Cannot be used for range lookups (search

on inequality predicate) or ordered scans (for ORDER BY)

Nonclustered Indexes

• No need to specify any options• Optimized for range lookups and ordered

scans• Support search on leading columns of the

index key• Support point-lookups, but hash indexes are

much faster

Create (Natively Compiled) Procedure DDLCREATE PROCEDURE [dbo].[InsertOrder] @id INT, @date DATETIME WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNERAS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')

-- insert T-SQL hereEND

This proc is natively compiled

Native procs must be schema-bound

Atomic blocks • Create a

transaction if there is none

• Otherwise, create a savepoint

Execution context is required

0

Session settings are fixed at create time

Natively Compiled Procs• Access only memory optimized

tables• Maximum performance• Limited T-SQL surface area

When to use• OLTP-style operations• Optimize performance critical

business logic

Accessing Memory Optimized TablesInterpreted T-SQL Access

• Access both memory- and disk-based tables

• Less performant• Virtually full T-SQL surface

When to use• Ad hoc queries• Reporting-style queries• Surface area not available in

native

Estimating Memory ConsumptionMemory Size = Table Size + SUM(Index Size)

Table Size = Row Size * Row Count • Row Size = SUM(Column Sizes) + Row Header Size• Row Header Size = 24 + 8 * Index Count• Column Size = Dependent on column type and associated

padding/overhead

Hash Index Size = Bucket_Count * 8 bytesNonclustered Index Size = Row Count * (Key Size + 8) bytes

Guidance on provision memory ~2 times Memory Size due to row versioning overhead

Hekaton data needs to fit in memory

Buffer Pool

Memory Optimized

Tables

Memory Internal

Structures

Available Memory

Max S

erv

er

Mem

ory

Buffer Pool

Memory Internal

Structures

Memory Optimized

Tables

Buffer Pool

Memory Internal

Structures

Memory Optimized

Tables

Buffer Pool

Memory Internal

Structures

Memory Optimized

Tables

Memory Management with Resource GovernorData resides in memory at all times• Must configure SQL Server with sufficient memory to store memory-optimized tables• Failure to allocate memory will fail transactional workload at run-time• Integrated with SQL Server memory manager and reacts to memory pressure for GC

(Garbage Collection)• Guidance for SQL Server 2014 is not to exceed 256GB of in-memory table user data

Integrated with Resource Governor• Recommend using dedicated resource pool to ensure performance stability for disk-

based table workloads• “Bind” a database to a resource pool• Memory-optimized tables in a database cannot exceed the limit of the resource pool• Hard top limit (function of the physical memory) to ensure system remains stable under

memory pressure

Demo Memory Usage

and Control

Durability : Data and Delta Files

Data File

Delta File

0 100

TS (ins)

RowId TableIdTS

(ins)RowId TableId

TS (ins)

RowId TableId

TS (ins)

RowIdTS

(del)TS (ins)

RowIdTS

(del)TS (ins)

RowIdTS

(del)

Check

poin

t Fi

le P

air

Row pay load

Row pay load

Row pay load

Transaction Timestamp Range

Data file contains rows inserted

within a given transaction range

Delta file contains deleted rows

within a given transaction range

Populating Data/Delta files via sequential IO only

Offline Checkpoint Thread

Memory-optimized Table Filegroup

Ran

ge 1

00

-2

00

Ran

ge 2

00

-3

00

Ran

ge 3

00

-4

00

Ran

ge 4

00

-5

00

Ran

ge 5

00

-

New InsertsDelete 450 TSDelete 250 TS

Delete 150 TS

Data file with rows generated in

timestamp range IDs of Deleted Rows (height indicates

% deleted)

Del Tran2(TS 450)

Del Tran3(TS 250)

Del Tran1(TS150)

Insert into Hekaton T1

Log in disk Table

Del Tran1(row TS150)

Del Tran2(row TS 450)

Del Tran3(row TS 250) Insert into T1SQL Transaction log

(from LogPool)

• Data file has pre-allocated size (128 MB or 16 MB on smaller systems)

• Engine switches to new data file when the current file is full

• Transaction does not span data files

• Once a data file is closed, it becomes read-only

• Row deletes are tracked in delta file

• Files are append only

Merge to Minimize Checkpoint File SizeWhat is a Merge Operation?• Merges one or more adjacent data/delta files pairs into 1 pair

Need for Merge• Deleting rows causes data files to have stale rows• DMV: sys.dm_xtp_checkpoint_files can be used to find inserted/deleted rows

and freespace

Benefits of Merge• Reduces storage (i.e. fewer data/delta files) required to store active data rows• Improves the recovery time as there will be fewer files to load

Merge is a (non-blocking) background operation• Merge does not block concurrent deletes in the affected file pairs

Memory-optimized data Filegroup

Ran

ge 2

00-

299

Ran

ge 3

00-

399

Ran

ge 2

00-

300

Ran

ge 3

00-

400

Merge Operation Internals

Memory-optimized data Filegroup

Files as of Time 600R

an

ge 1

00-

200

Ran

ge 2

00-

300

Ran

ge 3

00-

400

R

an

ge 4

00-

500

Data file with rows generated in timestamp

range

IDs of Deleted Rows (height indicates %

deleted)

Merge200-400

Deleted Files

Files Under Merge

Files as of Time 500

Ran

ge 1

00-

200

Ran

ge 4

00-

500

Ran

ge 5

00-

600

Ran

ge 2

00-

400

Efficient Logging for Memory-Optimized TablesUses SQL Server transaction log to store content• Each In-Memory OLTP log record contains a log record header followed by opaque

memory optimized-specific log content.

All logging for memory-optimized tables is logical• No log records for physical structure modifications.• No index-specific / index-maintenance log records.• No UNDO information is logged

Recovery Models • All three recovery models (Simple, Full, Bulk) are supported

In-Memory OLTP Recovery – Speed of IO

Delta map

Recovery Data Loader

Delta

File1

Memory Optimized Tables



Delta mapDelta map

Data

File1

Delta

File2

Data

File2

Delta

File3

Data

File3

filter filter filter

Memory Optimized Container - 1 Memory Optimized Container - 2

Impact on Recovery Time Objective (RTO)• Load speed (IO) of data & Size

of durable tables

• Hash index with heavy collision (bucket count too low) and large non-clustered (Range) index have additional recovery overhead.

Suitable scenarios for V1 (in SQL Server 2014)

Optimal for V1 Not Optimal for V1

Business logic In database (as SPs) In mid-tier

Latency and contention locations

Concentrated on a sub-set of tables/SPs

Spread across the database

Client server communication

Less frequent Chatty

T-SQL surface area Basic Complex

Log IO Not limiting factor Limiting factor

Data size Hot data fit in memory Unbounded hot data size

Key factor in performance benefits: processing closer to data, i.e. Ideal for applications with data processing in the DB layer

Demo Analysis and Migration

tools

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

MICROSOFT CONFIDENTIAL – INTERNAL ONLY

In-Memory OLTP Concurrency Control

Multi-version data store

Snapshot-based transaction isolation

No TempDB

Conflict detection to ensure isolation

No deadlocks

No locks, no latches, minimal context switches

No blocking

Multi-version

Optimistic


Supported Isolation LevelsSNAPSHOTReads are consistent as of start of the transactionWrites are always consistent

REPEATABLE READRead operations yield same row versions if repeated at commit time

SERIALIZABLETransaction is executed as if there are no concurrent transactions – all actions happen at a single serialization point (commit time)

Example: Write conflict

Time Transaction T1 (SNAPSHOT) Transaction T2 (SNAPSHOT)

1 BEGIN

2 BEGIN

3 UPDATE t SET c1=‘value2’ WHERE c2=123

4 UPDATE t SET c1=‘value1’ WHERE c2=123 (write conflict)

First writer wins


Cross-container transactions

Disk-based tables Memory-optimized tables

Tab1 Tabn Tab1 Tabm

Regular Tx context Mem-opt Tx context


Disk-based Memory optimized

Usage recommendations

READCOMMITTED SNAPSHOT • Baseline combination – most cases that use READCOMMITTED today

READCOMMITTED REPEATABLEREAD/SERIALIZABLE

• Data migration • Hekaton-only Interop

REPEATABLEREAD/SERIALIZABLE

SNAPSHOT • Memory-optimized table access is INSERT-only• Useful for data migration and if no concurrent

writes on memory-optimized tables (e.g., ETL)

Disk-based Memory optimized

SNAPSHOT Any isolation level



Supported isolation level combinations (V1)

Unsupported isolation level combinations (V1)


Limitations in SQL Server 2014TablesTriggers: no DDL/DML triggersData types: no LOBs, no XML and no CLR data typesConstraints: no FOREIGN KEY and no CHECK constraintsNo schema changes (ALTER TABLE) – need to drop/recreate tableNo add/remove index – need to drop/recreate table

Natively Compiled Stored ProceduresNo OUTER JOIN, no DISTINCT no OR, no subqueries, no CASELimited built-in functions [core math, date/time, and string functions are supported]

Your Feedback is Important

Fill out an evaluation of this session and help shape future events.

Scan the QR code to evaluate this session on your mobile device.

You’ll also be entered into a daily prize drawing!

© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

sql server 2014: in in-memory oltp for database developers

Documents

memory dw5

x performance gain

performance benefits

agendaoverview of sql

oltp workload

migration tools sql

new updates

x of ram