sql server 2014: in in-memory oltp for database developers
TRANSCRIPT
Agenda
• Overview of SQL Server 2014 and In-memory improvements
• In-memory OLTP architecture• How to use• Memory usage and control• Data durability• Analysis and Migration Tools
SQL Server 2014 InvestmentsIn-Memory
TechnologiesEnhanced
High Availability New Hybrid Scenarios
In-Memory OLTP• 5-25X performance gain
for OLTP integrated into SQL Server
In-Memory DW• 5-25X performance gain
and high data compression
• Updatable and clustered
SSD Buffer Pool Extension• 4-10X of RAM and up to
3X performance gain transparently for apps
Always On Enhancements • Increased availability and
improved manageability of active secondaries
Online Database Operations• Increased availability for
index/partition maintenance
Backup to Azure• Easy to implement and
cost effective Disaster Recovery solution to Azure Storage
HA to Azure VM• Easy to implement and
cost effective high availability solution with Windows Azure VM
Deploy to Azure• Deployment wizard to
migrate database
Better together with Windows Server• WS2012 ReFS support• Online resizing VHDx• Hyper-V replica• Windows “Blue”
support
Extending Power View• Enable Power View on
existing analytic models and support new multi-dimensional models.
Other investments
In-memory TechnologiesIn-Memory
Technologies
In-Memory OLTP• 5-25X performance gain
for OLTP integrated into SQL Server
In-Memory DW• 5-30X performance gain
and high data compression
• Updatable and clustered
SSD Buffer Pool Extension• 4-10X of RAM and up to
3X performance gain transparently for apps
Applicable to
Transactional workloads: Concurrent data entry, processing and retrieval
Applicable to
Decision support workloads: Large scans and aggregates
Applicable to
Disk-based transactional workloads:Large working (data)set
Why In-Memory OLTP (Hekaton)
• Market need for higher throughput and predictable lower latency OLTP at a lower cost
• Hardware trends demand architectural changes on RDBMS
• In-Memory OLTP is: High performance,Memory-optimized OLTP engine, Integrated into SQL Server and Architected for modern hardware trends
1990
1991
1991
1992
1993
1993
1994
1994
1995
1995
1996
1997
1997
1998
1998
1999
2000
2000
2001
2002
2002
2003
2004
2006
2007
2008
2009
2009
2010
2011
1
10
100
1000
10000
100000
1000000
$ per GB of PC Class Memory
US$
/GB
Decreasing RAM cost
Moore’s Law on total CPU processing power holds but
in parallel processing…
CPU clock rate stalled…
Hardware trends
SQL Server Integration
• Same manageability, administration & development experience
• Integrated queries & transactions
• Integrated HA and backup/restore
Main-Memory Optimized
• Direct pointers to rows
• Indexes exist only in memory
• No buffer pool• No write-ahead
logging• Stream-based
storage
High Concurrency
• Multi-version optimistic concurrency control with full ACID support
• Lock-free data structures
• No locks, latches or spinlocks
• No I/O during transaction
T-SQL Compiled to Machine Code• T-SQL compiled to
machine code leveraging VC compiler
• Procedure and its queries, becomes a C function
• Aggressive optimizations @ compile-time
Steadily declining memory price,
NVRAM
Many-core processors
Stalling CPU clock rate
TCO
Hardware trends Business
In-Memory OLTP: architected to enable new real-time applications
Hybrid engine and integrated
experience
High performance
data operations
Frictionless scale-up
Efficient, business-logic
processingCu
sto
mer
Ben
efi
ts
Hekato
n T
ech
P
illa
rsD
rivers
Memory-optimized Table Filegroup Data Filegroup
SQL Server.exe
Hekaton Engine: Memory_optimized Tables &
Indexes
TDS Handler and Session Management
In-Memory OLTP: built into SQL Server 2014
Native-Compiled SPs and Schema
Buffer Pool
Execution Plan cache for ad-hoc T-
SQL and SPs
Application
Transaction Log
Query Interop
Non-durable Table T1 T3T2
T1 T3T2
T1 T3T2
T1 T3T2
TablesIndexes
T-SQL Interpreter
T1 T3T2
T1 T3T2
Access Methods
Parser, Catalog, Optimize
r
Hekaton Compiler Hekaton
Component
KeyExisting
SQL Compone
nt
Generated .dll
20-40x more efficient
Reduced log contention; Low
latency still critical for performance
Checkpoints are background sequential
IO
No V1 improvements in comm layers
Early customer performance gains
WL Derived from TPC-C
Legacy App
Ingest/Read heavy
Best Fit
0 5 10 15 20 25 30
2
5
10
25
Factor X Gains for Applications
X factor Gains
Despite 20 years of optimizing for existing OLTP benchmarks – we still get 2x on a workload derived from
TPC-C
Apps that take full advantage: e.g. web app session state
Apps with periodic bulk updates & heavy random reads
Existing apps typically see 4-7x improvement
Create Table DDLCREATE TABLE [Customer]( [CustomerID] INT NOT NULL
PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000), [Name] NVARCHAR(250) NOT NULL, [CustomerSince] DATETIME NULL
INDEX [ICustomerSince] NONCLUSTERED)WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
This table is memory
optimized
This table is durable
Non-durable tables:DURABILITY=SCHEMA_ON
LY
Indexes are specified inline
NONCLUSTERED indexes are supported
Hash IndexBUCKET_COUNT 1-2X nr of unique index key
values
Memory-optimized Data Structures Rows
• New row format • Structure of the row is optimized for memory residency and access• No data page containers• Rows are versioned (payload never updated in place)
Indexes• Nonclustered Indexes only• Indexes point to rows, they do not duplicate them• <nonclustered hash> index for point lookups• <nonclustered> index for range (inequality) and ordered scans• Not logged and do not exist on disk – maintained online or recreated
during recovery
Memory-optimized Table: Row Format
Row header Payload (table columns)
Begin Ts End Ts StmtIdIdxLinkCou
nt
8 bytes 8 bytes 4 bytes 2 + 2 (padding) bytes
8 bytes * (IdxLinkCount)
Rows• New row format • Structure of the row is optimized for memory residency and access• Rows are versioned (payload never updated in place) and Begin/End timestamp determines row’s version validity and
visibility• No data pages; just rows• Row size limited to 8060 bytes (@table create time) to allow data to be moved to disk-based table• Not every SQL table schema is supported (for example LOB and sqlvariant)
Lightweight hash index data structure and MVCC
50, ∞ John Paris
Timestamps NameChain ptrs City
Hash index on Name
Transaction 100:
UPDATE City = ‘Prague’ where Name = ‘John’
No locks of any kind, no interference with transaction 99
100, ∞ John Prague
90, ∞ Susan Bogota
f(John)
100
Transaction 99: Running compiled querySELECT City WHERE Name = ‘John’Simple hash lookup returns direct pointer to ‘John’ row
Background operation will unlink and deallocate the old ‘John’ row after transaction 99 completes.
Hekaton Principle:
• Performance like a cache• Functionality like a RDMBS
Range index
10 20 28
5 8 10 11 15 18 21 24 27
PAGE
Page Mapping Table
0
1
2
3
14
15
PAGE
1 2 4 6 7 8 25 26 27
200, ∞ 1 50, 300 2
Root
Non-leaf pages
leaf pages
Data rows
PageID-0
PageID-3 PageID-2
PageID -14
Page size- up to 8K. Sized to the row Logical pointers
Indirect physical pointers through Page Mapping table
Page Mapping table grows (doubles) as table grows
Sibling pages linked one direction Require two indexes for ASC/DSC
No in-place updates on index pages Handled thru delta pages or building new pages
No covering columns (only the key is stored)
Key Key
LogicalPhysical
100,200 1
Memory-Optimized IndexesExist only in memoryRebuilt on database startup
Do not contain data rowsIndexes contain memory pointers to the data rowsNo duplication of data (logically all indexes are covering)
Hash Indexes
• Need to specify bucket_count – usually 1-2X the number of unique index keys
• Optimized for point-lookups• Cannot be used for search on subset of the
index key• Cannot be used for range lookups (search
on inequality predicate) or ordered scans (for ORDER BY)
Nonclustered Indexes
• No need to specify any options• Optimized for range lookups and ordered
scans• Support search on leading columns of the
index key• Support point-lookups, but hash indexes are
much faster
Create (Natively Compiled) Procedure DDLCREATE PROCEDURE [dbo].[InsertOrder] @id INT, @date DATETIME WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNERAS BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
-- insert T-SQL hereEND
This proc is natively compiled
Native procs must be schema-bound
Atomic blocks • Create a
transaction if there is none
• Otherwise, create a savepoint
Execution context is required
0
Session settings are fixed at create time
Natively Compiled Procs• Access only memory optimized
tables• Maximum performance• Limited T-SQL surface area
When to use• OLTP-style operations• Optimize performance critical
business logic
Accessing Memory Optimized TablesInterpreted T-SQL Access
• Access both memory- and disk-based tables
• Less performant• Virtually full T-SQL surface
When to use• Ad hoc queries• Reporting-style queries• Surface area not available in
native
Estimating Memory ConsumptionMemory Size = Table Size + SUM(Index Size)
Table Size = Row Size * Row Count • Row Size = SUM(Column Sizes) + Row Header Size• Row Header Size = 24 + 8 * Index Count• Column Size = Dependent on column type and associated
padding/overhead
Hash Index Size = Bucket_Count * 8 bytesNonclustered Index Size = Row Count * (Key Size + 8) bytes
Guidance on provision memory ~2 times Memory Size due to row versioning overhead
Hekaton data needs to fit in memory
Buffer Pool
Memory Optimized
Tables
Memory Internal
Structures
Available Memory
Max S
erv
er
Mem
ory
Buffer Pool
Memory Internal
Structures
Memory Optimized
Tables
Buffer Pool
Memory Internal
Structures
Memory Optimized
Tables
Buffer Pool
Memory Internal
Structures
Memory Optimized
Tables
Memory Management with Resource GovernorData resides in memory at all times• Must configure SQL Server with sufficient memory to store memory-optimized tables• Failure to allocate memory will fail transactional workload at run-time• Integrated with SQL Server memory manager and reacts to memory pressure for GC
(Garbage Collection)• Guidance for SQL Server 2014 is not to exceed 256GB of in-memory table user data
Integrated with Resource Governor• Recommend using dedicated resource pool to ensure performance stability for disk-
based table workloads• “Bind” a database to a resource pool• Memory-optimized tables in a database cannot exceed the limit of the resource pool• Hard top limit (function of the physical memory) to ensure system remains stable under
memory pressure
Durability : Data and Delta Files
Data File
Delta File
0 100
TS (ins)
RowId TableIdTS
(ins)RowId TableId
TS (ins)
RowId TableId
TS (ins)
RowIdTS
(del)TS (ins)
RowIdTS
(del)TS (ins)
RowIdTS
(del)
Check
poin
t Fi
le P
air
Row pay load
Row pay load
Row pay load
Transaction Timestamp Range
Data file contains rows inserted
within a given transaction range
Delta file contains deleted rows
within a given transaction range
Populating Data/Delta files via sequential IO only
Offline Checkpoint Thread
Memory-optimized Table Filegroup
Ran
ge 1
00
-2
00
Ran
ge 2
00
-3
00
Ran
ge 3
00
-4
00
Ran
ge 4
00
-5
00
Ran
ge 5
00
-
New InsertsDelete 450 TSDelete 250 TS
Delete 150 TS
Data file with rows generated in
timestamp range IDs of Deleted Rows (height indicates
% deleted)
Del Tran2(TS 450)
Del Tran3(TS 250)
Del Tran1(TS150)
Insert into Hekaton T1
Log in disk Table
Del Tran1(row TS150)
Del Tran2(row TS 450)
Del Tran3(row TS 250) Insert into T1SQL Transaction log
(from LogPool)
• Data file has pre-allocated size (128 MB or 16 MB on smaller systems)
• Engine switches to new data file when the current file is full
• Transaction does not span data files
• Once a data file is closed, it becomes read-only
• Row deletes are tracked in delta file
• Files are append only
Merge to Minimize Checkpoint File SizeWhat is a Merge Operation?• Merges one or more adjacent data/delta files pairs into 1 pair
Need for Merge• Deleting rows causes data files to have stale rows• DMV: sys.dm_xtp_checkpoint_files can be used to find inserted/deleted rows
and freespace
Benefits of Merge• Reduces storage (i.e. fewer data/delta files) required to store active data rows• Improves the recovery time as there will be fewer files to load
Merge is a (non-blocking) background operation• Merge does not block concurrent deletes in the affected file pairs
Memory-optimized data Filegroup
Ran
ge 2
00-
299
Ran
ge 3
00-
399
Ran
ge 2
00-
300
Ran
ge 3
00-
400
Merge Operation Internals
Memory-optimized data Filegroup
Files as of Time 600R
an
ge 1
00-
200
Ran
ge 2
00-
300
Ran
ge 3
00-
400
R
an
ge 4
00-
500
Data file with rows generated in timestamp
range
IDs of Deleted Rows (height indicates %
deleted)
Merge200-400
Deleted Files
Files Under Merge
Files as of Time 500
Ran
ge 1
00-
200
Ran
ge 4
00-
500
Ran
ge 5
00-
600
Ran
ge 2
00-
400
Efficient Logging for Memory-Optimized TablesUses SQL Server transaction log to store content• Each In-Memory OLTP log record contains a log record header followed by opaque
memory optimized-specific log content.
All logging for memory-optimized tables is logical• No log records for physical structure modifications.• No index-specific / index-maintenance log records.• No UNDO information is logged
Recovery Models • All three recovery models (Simple, Full, Bulk) are supported
In-Memory OLTP Recovery – Speed of IO
Delta map
Recovery Data Loader
Delta
File1
Memory Optimized Tables
Recovery Data Loader
Recovery Data Loader
Delta mapDelta map
Data
File1
Delta
File2
Data
File2
Delta
File3
Data
File3
filter filter filter
Memory Optimized Container - 1 Memory Optimized Container - 2
Impact on Recovery Time Objective (RTO)• Load speed (IO) of data & Size
of durable tables
• Hash index with heavy collision (bucket count too low) and large non-clustered (Range) index have additional recovery overhead.
Suitable scenarios for V1 (in SQL Server 2014)
Optimal for V1 Not Optimal for V1
Business logic In database (as SPs) In mid-tier
Latency and contention locations
Concentrated on a sub-set of tables/SPs
Spread across the database
Client server communication
Less frequent Chatty
T-SQL surface area Basic Complex
Log IO Not limiting factor Limiting factor
Data size Hot data fit in memory Unbounded hot data size
Key factor in performance benefits: processing closer to data, i.e. Ideal for applications with data processing in the DB layer
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
MICROSOFT CONFIDENTIAL – INTERNAL ONLY
In-Memory OLTP Concurrency Control
Multi-version data store
Snapshot-based transaction isolation
No TempDB
Conflict detection to ensure isolation
No deadlocks
No locks, no latches, minimal context switches
No blocking
Multi-version
Optimistic
MICROSOFT CONFIDENTIAL – INTERNAL ONLY
Supported Isolation LevelsSNAPSHOTReads are consistent as of start of the transactionWrites are always consistent
REPEATABLE READRead operations yield same row versions if repeated at commit time
SERIALIZABLETransaction is executed as if there are no concurrent transactions – all actions happen at a single serialization point (commit time)
Example: Write conflict
Time Transaction T1 (SNAPSHOT) Transaction T2 (SNAPSHOT)
1 BEGIN
2 BEGIN
3 UPDATE t SET c1=‘value2’ WHERE c2=123
4 UPDATE t SET c1=‘value1’ WHERE c2=123 (write conflict)
First writer wins
MICROSOFT CONFIDENTIAL – INTERNAL ONLY
Cross-container transactions
Disk-based tables Memory-optimized tables
Tab1 Tabn Tab1 Tabm
Regular Tx context Mem-opt Tx context
MICROSOFT CONFIDENTIAL – INTERNAL ONLY
Disk-based Memory optimized
Usage recommendations
READCOMMITTED SNAPSHOT • Baseline combination – most cases that use READCOMMITTED today
READCOMMITTED REPEATABLEREAD/SERIALIZABLE
• Data migration • Hekaton-only Interop
REPEATABLEREAD/SERIALIZABLE
SNAPSHOT • Memory-optimized table access is INSERT-only• Useful for data migration and if no concurrent
writes on memory-optimized tables (e.g., ETL)
Disk-based Memory optimized
SNAPSHOT Any isolation level
REPEATABLEREAD/SERIALIZABLE
REPEATABLEREAD/SERIALIZABLE
Supported isolation level combinations (V1)
Unsupported isolation level combinations (V1)
MICROSOFT CONFIDENTIAL – INTERNAL ONLY
Limitations in SQL Server 2014TablesTriggers: no DDL/DML triggersData types: no LOBs, no XML and no CLR data typesConstraints: no FOREIGN KEY and no CHECK constraintsNo schema changes (ALTER TABLE) – need to drop/recreate tableNo add/remove index – need to drop/recreate table
Natively Compiled Stored ProceduresNo OUTER JOIN, no DISTINCT no OR, no subqueries, no CASELimited built-in functions [core math, date/time, and string functions are supported]
Your Feedback is Important
Fill out an evaluation of this session and help shape future events.
Scan the QR code to evaluate this session on your mobile device.
You’ll also be entered into a daily prize drawing!
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.