databases for storage engineers

Thomas Kejser

[email protected]

http://blog.kejser.org

@thomaskejser

Databases for Storage People


• The Microsoft Database Stack

• Hard problems the database solves

• File layout and I/O pattern

• Data and Log Files

• Analysis Services Files

• TempDb and other system databases

• Installation of SQL

• Q&A

Agenda

The SQL Server Stack

• SQL Server (aka: Core Engine)
• SQL Server Analysis Services (SSAS)
  • Tabular
  • Multi Dimensional
• SQL Server Service Broker (SSB)
• SQL Server Integration Services (SSIS)
• SQL Server Reporting Services (SSRS)
• SQL Server Data Quality Tools
• SQL Server Master Data Services
• SQL Server Parallel Data Warehouse
• .NET stuff…
• Various Excel plug-ins

• A “full” stack!

Product Portfolio

What Type of Workload?

[Chart: Data Returned (Small to Big) against Data Touched (Small to Big), with four workload types placed on it: OLTP, BI/DW, Simulation, ETL]

A Template OLTP System

[Diagram: .NET × 4 in front of Core]

“App” tier: Web Server, Windows License

Database Tier: Web/Core Licensing, 2 or 4 sockets

A Template Data Warehouse

[Diagram: SSIS × 4, Core × 3, SSAS × 2, SSRS]

Integration Tier: Blades. CPU intensive, low IOPS

“Enterprise” Warehouse Tier: Large machines. VERY CPU greedy, VERY I/O greedy (GB/sec)

BI / Presentation / Cubes: Medium Servers. Can be IOPS greedy

Fast Track Data Warehouses

A Template MPP Warehouse

[Diagram: SSIS × 4, SSAS, Core]

Enterprise Warehouse Tier: Appliance (The “hub”)

Data Marts (The “spokes”)

Management Tools you Need to Know

Pre 2012                                     | 2012
Management Studio (AKA: Enterprise Manager)  | (Management Studio)
Project Data Dude                            | Data Tools
Configuration Manager                        | Configuration Manager
SQL Server Profiler                          | Xevent Tracing
Reporting Services Config Manager            | Reporting Services Config Manager
sp_configure                                 | sp_configure / ALTER SERVER CONFIGURATION

Hard problems databases help you solve

Query Plan Generation

Find all parts bought by Thomas Kejser

Express Problem, Auto get solutions

To do this well, we need Statistics

I did it

SQL Did it

THIS is not accurate and it will never be!

… and we Need Indexes

B+ Tree

95% of all database problems* are caused by:

A) Poor indexing
B) Wrong Statistics
C) Badly written queries
D) All of the above

* Low estimate, trying to be nice to humanity

And most of the time, there is nothing you can do about that*

… which is where storage comes into the picture

* AKA: “Craplications”, technical term

• The CPU bound
  • Have to help rewrite
  • Better storage does not help
  • But DBAs may still believe it is I/O
• The I/O bound
  • Can throw NAND at it
  • I will show you how to diagnose
• DBA people like to talk about this like…

Two types of bad Queries

[Diagram: CPU with L3, L2 and C blocks]

Response time = Service Time + Wait Time

Service Time: Algorithms and Data Structures

Wait Time: “Bottlenecks”

• We normally end up talking about bad join plans

• Joins come in three flavours

• Merge

• Hash

• Loop

When Speaking about Service Time

Merge Join

[Diagram: two Sorted inputs walked side by side, an m row result (1, 1, 2, 3) and an n row result (1, 2, 3, 4, 4, 43, 43)]

Complexity: O(m + n)
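The O(m + n) bound comes from each input being walked once, which only works because both are already sorted on the join key. A minimal Python sketch of that idea follows; it is an illustration, not SQL Server's implementation, and the `(key, payload)` row shape and the name `merge_join` are assumptions made for the example.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) rows that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1                      # left key too small, advance left cursor
        elif lk > rk:
            j += 1                      # right key too small, advance right cursor
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for k in range(j, j_end):
                    out.append((lk, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


m_rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
n_rows = [(1, "x"), (2, "y"), (3, "z"), (4, "w"), (4, "v"), (43, "u")]
joined = merge_join(m_rows, n_rows)
```

If either input is not sorted, a sort has to be paid for first, which is one reason a merge join is not always the cheapest choice.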

Hash Join

[Diagram: an n row join table is hashed into an n row hash table; each key of the m row result (1, 43, 13, 7, 3) is hashed, e.g. Hash(1), and looked up in it]

Complexity: O(m + 2n)
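A hash join can be sketched in a few lines of Python: build a hash table from the n row input, then probe it once per row of the m row input. This is a simplified illustration, not the engine's code; the `(key, payload)` row shape and the name `hash_join` are assumptions, and a Python `dict` stands in for the hash table.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows):
    """Join (key, payload) rows: build a hash table from build_rows, probe with probe_rows."""
    table = defaultdict(list)
    for key, payload in build_rows:          # build phase: touches all n rows
        table[key].append(payload)

    out = []
    for key, payload in probe_rows:          # probe phase: one lookup per m row
        for match in table.get(key, ()):     # .get avoids inserting empty buckets
            out.append((key, payload, match))
    return out


m_rows = [(1, "a"), (43, "b"), (13, "c"), (7, "d"), (3, "e")]
n_rows = [(1, "x"), (3, "y"), (43, "z"), (43, "w")]
joined = hash_join(m_rows, n_rows)
```

Neither input needs to be sorted, but the whole hash table has to fit in memory for the join to stay fast.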

Loop Join

[Diagram: each key of the m row result (1, 43, 13, 7, 3) is looked up in an n row B-tree, costing Log(n) reads per lookup]

Complexity: O(m * log(n))
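The loop join does one index seek per outer row. In the Python sketch below a sorted list searched with `bisect` stands in for the B-tree, which is an assumption made to keep the example short; the seek is still O(log n) per outer row, so the total matches the O(m * log(n)) on the slide. The name `loop_join` and the row shape are illustrative.

```python
from bisect import bisect_left


def loop_join(outer_rows, inner_sorted):
    """For each outer (key, payload) row, seek into inner rows sorted by key."""
    keys = [key for key, _ in inner_sorted]   # stand-in for the B-tree's key order
    out = []
    for key, payload in outer_rows:
        pos = bisect_left(keys, key)          # O(log n) seek to the first candidate
        while pos < len(keys) and keys[pos] == key:
            out.append((key, payload, inner_sorted[pos][1]))
            pos += 1
    return out


m_rows = [(1, "a"), (43, "b"), (13, "c"), (7, "d"), (3, "e")]
n_rows = [(1, "x"), (3, "y"), (43, "z"), (43, "w")]   # sorted by key
joined = loop_join(m_rows, n_rows)
```

With a small m this is very cheap; with a large m the seeks add up, and each seek can turn into random I/O on the storage underneath.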

When Hash Joins hurt you

[Chart: Runtime (seconds), 0 to 30, against Hash Memory (MB), 0 to 400, with a marked “Spill Zone!”]

Join Hints

B probed, lower table in join (second table in join statement)

A probed, upper table in join (first table in join statement)

Just the way it is …

Why is it so hard to get joins right?

[Chart: Time plotted against n and m for Loop Join, Merge Join and Hash Join]

No-one has been able to get joins consistently right!

P = NP ?
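To see why the choice flips with row counts, the three complexity formulas from the join slides can be compared directly. The Python sketch below is a toy cost model, not SQL Server's optimizer: the function names, the base-2 logarithm and the `inputs_sorted` switch are assumptions made for illustration.

```python
import math


def join_costs(m, n, inputs_sorted=True):
    """Rough cost per join type, using the complexity formulas from the slides."""
    costs = {
        "hash": m + 2 * n,
        "loop": m * math.log2(max(n, 2)),   # base 2 is an assumption
    }
    if inputs_sorted:                        # merge join needs sorted inputs
        costs["merge"] = m + n
    return costs


def cheapest_join(m, n, inputs_sorted=True):
    """Name of the join type with the lowest rough cost."""
    costs = join_costs(m, n, inputs_sorted)
    return min(costs, key=costs.get)


few_outer_rows = cheapest_join(10, 1_000_000)
many_sorted_rows = cheapest_join(1_000_000, 1_000_000)
many_unsorted_rows = cheapest_join(1_000_000, 1_000_000, inputs_sorted=False)
```

The inputs m and n are estimates taken from statistics, so when the statistics are wrong the "cheapest" join in a model like this can be the wrong one by orders of magnitude.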

Getting I/O right…

SQL-OS (Schedulers, Buffer Pool, Memory Management, Synchronization Primitives, …)
Query Optimization (Plan Generation, View Matching, Statistics, Costing)
Query Execution (Query Operators, Memory Grants, Parallelism)
Language Processing (Parse/Bind)
Statement/Batch Execution
Plan Cache Management
Storage Engine (Access Methods, Database Page Cache, Locking, Transactions, …)

The Storage Engine makes I/O Transparent!

[Diagram: RAM and Storage sit behind the Storage Engine]

Rest of engine only sees the API

Primitive           | SQL Server                                   | Analysis Services
Scheduling          | Voluntary Yield, User mode                   | Kernel mode, Preemptive
I/O Engine          | Dedicated I/O stack                          | Windows Buffered I/O
Waiting / Spinning  | SQLOS Primitives                             | Windows
Memory Management   | SQLOS / Storage Engine                       | Windows Paging
Serialisation       | TDS special purpose                          | XML
Network             | Fully optimizable, async, affinitized engine | Windows primitives, blocking

Two Different Philosophies

• Primitives are a different beast than Windows

• Scale issues are generally specific to the core, not Windows

• Exposes own “belly of the beast” profiling

• SQL Team build their own primitives, often better than Windows core

• Highest throughput app on Windows, drives all the scale stuff there

SQL Server is different

• Analysis Services relies fully on Windows primitives

• You can profile it by looking at how Windows behaves

• Upgrades to Windows are more likely to help it

• No TPC style benchmarks…

Analysis Services is “just another App”

A is for Atomic

[Diagram: tables LINEITEM (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and ORDER (ORDER_KEY, CUSTOMER_KEY), shown in three states]
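Atomicity means an order and its line items are written together or not at all. The Python sketch below shows this with the standard library's `sqlite3` module standing in for the database; SQLite, the table name `orders` (ORDER is a reserved word), the `NOT NULL` rule on quantity and the helper `place_order` are all assumptions made for the example.

```python
import sqlite3


def place_order(conn, order_key, customer_key, lines):
    """Insert one order and its line items as a single unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "INSERT INTO orders VALUES (?, ?)", (order_key, customer_key)
            )
            conn.executemany(
                "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
                [(order_key, part, date, qty) for part, date, qty in lines],
            )
        return True
    except sqlite3.Error:
        return False


def row_count(conn, table):
    """Number of rows in one of the two example tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_key INTEGER PRIMARY KEY, customer_key INTEGER)")
conn.execute(
    "CREATE TABLE lineitem (order_key INTEGER, part_key INTEGER, "
    "commitdate TEXT, quantity INTEGER NOT NULL)"
)

# The second line item is invalid (no quantity), so the whole order must vanish.
failed = place_order(conn, 42, 7, [(1, "2012-02-01", 5), (2, "2012-02-01", None)])
orders_after_failure = row_count(conn, "orders")
```

After the failed call neither table contains anything for order 42, even though the ORDER row and the first LINEITEM row had already been inserted inside the transaction.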


C is for Consistency

[Diagram: LINEITEM and ORDER. One panel contrasts ORDER_KEY = 42 with ORDER_KEY != 42; another shows a LINEITEM with COMMITDATE = 2012-02-30, a date that does not exist]
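The ORDER_KEY case can be sketched with a foreign key: a line item may only point at an order that exists. The Python example below uses the standard library's `sqlite3` module as a stand-in database; SQLite, the table name `orders` and the helper `try_insert_lineitem` are assumptions for illustration, and the invalid-date case is not covered here.

```python
import sqlite3


def try_insert_lineitem(conn, order_key):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO lineitem VALUES (?)", (order_key,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE orders (order_key INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE lineitem ("
    "order_key INTEGER NOT NULL REFERENCES orders(order_key))"
)
with conn:
    conn.execute("INSERT INTO orders VALUES (42)")

accepted = try_insert_lineitem(conn, 42)    # the parent order exists
orphan = try_insert_lineitem(conn, 99)      # no such order
```

The database refuses the orphan row on its own, so every application sharing the tables gets the same guarantee without re-implementing the check.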


I is for Isolation

-- Session 1

SELECT @LastTransaction_ID = LastTransaction_ID
FROM ATM
WHERE ATM_ID = 13

-- (@LastTransaction_ID = 42)

SET @ID = @LastTransaction_ID + 1

UPDATE ATM
SET LastTransaction_ID = @ID
WHERE ATM_ID = 13

-- Session 2, running at the same time

SELECT @LastTransaction_ID = LastTransaction_ID
FROM ATM
WHERE ATM_ID = 13

-- (@LastTransaction_ID = 42)

SET @ID = @LastTransaction_ID + 1

UPDATE ATM
SET LastTransaction_ID = @ID
WHERE ATM_ID = 13
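When both sessions read 42 before either one writes, one increment is lost. The Python sketch below replays that interleaving deterministically; the generator-based `session`, the `run` scheduler and the dictionary standing in for the ATM row are illustrative assumptions, not how a database schedules work.

```python
def session(atm):
    """One ATM session: read the last ID, pause, then write the ID plus one."""
    last = atm["LastTransaction_ID"]        # the SELECT
    yield                                   # another session may run here
    atm["LastTransaction_ID"] = last + 1    # the UPDATE


def run(schedule, start=42):
    """Run two sessions, stepping them in the order given by schedule."""
    atm = {"LastTransaction_ID": start}
    sessions = [session(atm), session(atm)]
    for index in schedule:
        next(sessions[index], None)         # advance that session by one step
    return atm["LastTransaction_ID"]


interleaved = run([0, 1, 0, 1])   # both read 42, both write 43
isolated = run([0, 0, 1, 1])      # session 1 finishes before session 2 starts
```

Two transactions ran but the interleaved schedule only advances the counter by one. Isolation is what makes the database behave like the second schedule even when the sessions really do overlap.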

D is for Durability

[Diagram: “Do Transactions” followed by “Ack”, repeated ten times]
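The Ack only means something if the work survives a power cut. A minimal Python sketch of that contract follows, assuming a simple append-only log file: the record is forced to stable storage before the caller gets its Ack. The function names and the one-record-per-line format are assumptions for illustration; a real transaction log is far more elaborate.

```python
import os


def commit(log_path, record):
    """Append a record to the log and only acknowledge once it is on stable storage."""
    with open(log_path, "ab") as log:
        log.write(record.encode("utf-8") + b"\n")
        log.flush()                 # push Python's buffer to the operating system
        os.fsync(log.fileno())      # ask the OS to push it to the device
    return "Ack"


def recover(log_path):
    """Read back every record that was acknowledged before a restart."""
    with open(log_path, "rb") as log:
        return [line.decode("utf-8").rstrip("\n") for line in log]
```

Every commit waits for that synchronous write, which is why the latency of the log device shows up directly in transaction response time.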

• Do complex operations in optimal time

• …at high parallelism

• Optimise I/O pattern

• Be ACID compliant

• Store stuff safely…

• noSQL/Big Data systems trade off one or more of these to get more of the others

Summary – Databases Help You

• Server won’t start without:
  • master
  • mssqlsystemresource
• System CAN start, but won’t work well
  • model
  • msdb
• System will start under special conditions
  • tempdb

System Databases

• Together, contain all system information
• mssqlsystemresource
  • Lives under: MSSQL\Binn
  • Contains all system code
  • Hidden by default
• Master
  • Lives under: MSSQL\DATA
• You should move these to a safe location

Master and mssqlsystemresource

• You lost:
  • All passwords and server logins
  • All system wide certificates (You may be unable to decrypt!)
  • All System procedures you created
• You are not 100% screwed, but you are in for a long night
  • Both can be rebuilt (empty) during server start
  • …Or restored from backup
    • if you remembered to take one
  • Need /f and /T3608 to get back up

Disaster: Master or mssqlsystemresource

• Every newly created database is cloned from this

• Loss is not catastrophic

• Copy from healthy machine

• Tempdb can’t boot without it

• Lives with master

Database: model

• Database “swap file”

• Does not survive restarts

• No Durability guarantees here

• Fast I/O helps

Database: tempdb

• Will rebuild itself after instance restart

• Configuration is stored in master

• Clones from model

• Nearly every installation must change defaults

• If tempdb cannot be created, server will only start from command line

Loss of Tempdb…is…Temporary

• A database consists of
  • At least one Transaction Log File
  • The PRIMARY filegroup
  • At least one data file in PRIMARY
• If any of these are lost, the database is dead
  • You can in some cases bring a database without a transaction log back alive
  • But typically with data loss…
• Lesson: carefully protect all of the above

User Databases and Failure

What is in the Files?

[Diagram]

PRIMARY filegroup, Primary File: Metadata (system objects), GAM / SGAM, PFS Map, User Data

Transaction Log: Headers, VLF, VLF, VLF

• Regular files in NTFS

• Secured

• Files can Auto Grow as needed

• Risky

• File Imbalance

Data Files

• ALTER or CREATE DATABASE
• Transaction log file always zeroed out
  • This looks super cool on Fusion-io by the way
• Data files MAY be zeroed out
  • Depends on privileges
  • May use instant file init

How are Database Files Created?

• Filegroups (one word) are containers of files

• Used to group similar data together

• Oracle people know this concept as tablespaces

• Files inside FG are accessed/allocated round-robin

Filegroups

[Diagram: filegroups PRIMARY and DATA containing User Data files]
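Round-robin allocation inside a filegroup can be pictured as dealing allocations out to the files in turn. The Python sketch below shows only that idea, under the assumption that all files are the same size with the same free space; SQL Server weights the choice by free space per file, which is not modelled here. The file names and the function `allocate_round_robin` are hypothetical.

```python
from itertools import cycle


def allocate_round_robin(files, allocation_count):
    """Deal numbered allocations out to the files of a filegroup in turn."""
    placement = {name: [] for name in files}
    targets = cycle(files)
    for allocation in range(1, allocation_count + 1):
        placement[next(targets)].append(allocation)
    return placement


# Hypothetical filegroup with one file per LUN.
layout = allocate_round_robin(["lun1.ndf", "lun2.ndf", "lun3.ndf", "lun4.ndf"], 8)
```

With evenly sized files the I/O spreads across all LUNs. If one file auto grows and the others do not, the spread becomes uneven, which is the file imbalance risk named on the Data Files slide.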

• DBCC SHRINKFILE

• REBUILD data

Reclaiming/Moving Space in Files

DBCC SHRINKFILE

[Diagram: blocks numbered 1 to 8 spread over LUN 1, LUN 2, LUN 3 and LUN 4]

How to reclaim space the right way…

[Diagram: LUN 1 to LUN 4 with blocks numbered 1 to 8]

New Filegroup

ALTER INDEX Foo ON <table> REBUILD WITH (SORT_IN_TEMPDB = ON)

[Diagram: blocks numbered 1 to 8 after the rebuild]

• Too few PFS maps can lead to latch contention

• Diagnosed in:

sys.dm_os_waiting_tasks

• Look for PAGELATCH_UP

PFS Contention

[Diagram: a File laid out as PFS Map, User Data (8000 pages), PFS Map, User Data (8000 pages)]

• DBAs typically diagnose issues with wait stats

• Issues they look for:

• WRITELOG/LOGBUFFER waits

• PAGEIOLATCH_<X> waits

• BACKUPIO waits

• IO_COMPLETION/ASYNC_IO_COMPLETION

I/O DBA people worry about

• Diagnosing resource waits:

• sys.dm_os_wait_stats

• Post 2008R2 – can use Xevents (harder)

• More detail in:

• sys.dm_io_virtual_file_stats(NULL, NULL)

• Confirm waits here!

• SQL Server errors in log file:

Places you need to know about
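The file stats DMV reports cumulative counters per file, so confirming a wait usually means taking two snapshots and dividing the extra stall time by the extra I/O count. The Python sketch below does that arithmetic on two snapshot rows; the column names `num_of_reads`, `io_stall_read_ms`, `num_of_writes` and `io_stall_write_ms` are the DMV's, while the function name and the sample numbers are made up for illustration.

```python
def latency_between(before, after):
    """Average read/write latency in ms between two snapshots of one file's counters."""
    reads = after["num_of_reads"] - before["num_of_reads"]
    writes = after["num_of_writes"] - before["num_of_writes"]
    read_stall = after["io_stall_read_ms"] - before["io_stall_read_ms"]
    write_stall = after["io_stall_write_ms"] - before["io_stall_write_ms"]
    return {
        "read_ms": read_stall / reads if reads else 0.0,
        "write_ms": write_stall / writes if writes else 0.0,
    }


# Hypothetical snapshots of one data file, taken some minutes apart.
snapshot_1 = {"num_of_reads": 1000, "io_stall_read_ms": 5000,
              "num_of_writes": 200, "io_stall_write_ms": 400}
snapshot_2 = {"num_of_reads": 1500, "io_stall_read_ms": 9000,
              "num_of_writes": 300, "io_stall_write_ms": 900}
latency = latency_between(snapshot_1, snapshot_2)
```

If the DBA reports I/O waits but the per-file latency computed this way is low, the problem is more likely a CPU bound query than the storage.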
