databases for storage engineers

Thomas Kejser [email protected] http://blog.kejser.org @thomaskejser Databases For storage People

Post on 03-Jul-2015


DESCRIPTION

A short introduction to SQL Serv

TRANSCRIPT

Page 1: Databases for Storage Engineers

Thomas Kejser

[email protected]

http://blog.kejser.org

@thomaskejser

Databases

For storage People

Page 2: Databases for Storage Engineers

Agenda

• The Microsoft Database Stack
• Hard problems the database solves
• File layout and I/O pattern
• Data and Log Files
• Analysis Services Files
• TempDb and other system databases
• Installation of SQL
• Q&A

Page 3: Databases for Storage Engineers

The SQL Server Stack

Page 4: Databases for Storage Engineers

Product Portfolio

• SQL Server (aka: Core Engine)
• SQL Server Analysis Services (SSAS)
  • Tabular
  • Multi Dimensional
• SQL Server Service Broker (SSB)
• SQL Server Integration Services (SSIS)
• SQL Server Reporting Services (SSRS)
• SQL Server Data Quality Tools
• SQL Server Master Data Services
• SQL Server Parallel Data Warehouse
• .NET stuff…
• Various Excel plug-ins
• A “full” stack!

Page 5: Databases for Storage Engineers

What Type of Workload?

[Figure: quadrant chart with axes Data Touched (Small to Big) and Data Returned (Small to Big); the four quadrants are labelled OLTP, BI/DW, Simulation and ETL.]

Page 6: Databases for Storage Engineers

A Template OLTP System

“App” tier: Web Server, Windows License

Database Tier: Web/Core Licensing, 2 or 4 sockets

[Diagram boxes: .NET ×4, Core.]

Page 7: Databases for Storage Engineers

A Template Data Warehouse

Integration Tier: Blades. CPU intensive, low IOPS

“Enterprise” Warehouse Tier: Large machines. VERY CPU greedy, VERY I/O greedy (GB/sec)

BI / Presentation / Cubes: Medium servers. Can be IOPS greedy

[Diagram boxes: SSIS ×4, Core ×3, SSAS ×2, SSRS.]

Page 8: Databases for Storage Engineers

Fast Track Data Warehouses

Page 9: Databases for Storage Engineers

A Template MPP Warehouse

Enterprise Warehouse Tier: Appliance (the “hub”)

Data Marts (the “spokes”)

[Diagram boxes: SSIS ×4, SSAS, Core.]

Page 10: Databases for Storage Engineers

Management Tools you Need to Know

Pre 2012                                      | 2012
Management Studio (AKA: Enterprise Manager)   | (Management Studio)
Project Data Dude                             | Data Tools
Configuration Manager                         | Configuration Manager
SQL Server Profiler                           | XEvent Tracing
Reporting Services Config Manager             | Reporting Services Config Manager
sp_configure                                  | sp_configure / ALTER SERVER

Page 11: Databases for Storage Engineers

Hard problems databases help you solve

Page 12: Databases for Storage Engineers

Query Plan Generation

Find all parts bought by Thomas Kejser

Page 13: Databases for Storage Engineers

Express the Problem, Automatically Get Solutions

Page 14: Databases for Storage Engineers

To do this well, we need Statistics

I did it

SQL Did it

THIS is not accurate and it will never be!

Page 15: Databases for Storage Engineers

… and we Need Indexes

B+ Tree

Page 16: Databases for Storage Engineers

95% of all database problems* are caused by:

A) Poor indexing

B) Wrong Statistics

C) Badly written queries

D) All of the above

* Low estimate, trying to be nice to humanity

Page 17: Databases for Storage Engineers

And most of the time, there is nothing you can do about that*

… which is where storage comes into the picture

* AKA: “Craplications”, technical term

Page 18: Databases for Storage Engineers

Two types of bad Queries

• The CPU bound
  • Have to help rewrite
  • Better storage does not help
  • But DBAs may still believe it is I/O
• The I/O bound
  • Can throw NAND at it
  • I will show you how to diagnose
• DBA people like to talk about this like…

[Diagram labels: CPU, L3, L2, L2, C, C.]

Page 19: Databases for Storage Engineers

Response time = Service Time + Wait Time

Service Time: Algorithms and Data Structures

Wait Time: “Bottlenecks”

Page 20: Databases for Storage Engineers

When Speaking about Service Time

• We normally end up talking about bad join plans
• Joins come in three flavours
  • Merge
  • Hash
  • Loop

Page 21: Databases for Storage Engineers

Merge Join

[Diagram: an m row result and an n row result, each marked “Sorted”, with sample key values between 1 and 43.]

Complexity: O(m + n)
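
The merge join on this slide can be sketched in a few lines of Python. This is only an illustration of the technique, not SQL Server's implementation: the function name, the `(key, payload)` row shape and the sample rows are all assumptions made for the example. Both inputs must already be sorted on the join key, which is what lets the two cursors advance in a single pass.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) rows, both sorted ascending by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1                      # left side is behind, advance it
        elif left_key > right_key:
            j += 1                      # right side is behind, advance it
        else:
            # Find the run of rows sharing this key on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, j_end):
                    result.append((left_key, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result


# Hypothetical sample rows, sorted by key.
m_rows = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
n_rows = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]
joined = merge_join(m_rows, n_rows)
```

Each input is scanned once, which is where the O(m + n) comes from; duplicate keys on both sides add the cost of emitting the extra output rows.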

Page 22: Databases for Storage Engineers

Hash Join

[Diagram: rows of the m row result (sample keys 1, 43, 13, 7, 3) are hashed, e.g. Hash(1), and probed into an n row hash table built from the n row join table.]

Complexity: O(m + 2n)
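
A minimal Python sketch of the hash join pictured here, assuming rows are `(key, payload)` tuples; the names and sample data are made up for illustration and this is not how the engine itself is written. One possible reading of the slide's 2n is that each build row is read once and written into the hash table once, but that interpretation is an assumption.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows):
    """Join (key, payload) rows: build a hash table on one side, probe with the other."""
    # Build phase: one pass over the n row input.
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Probe phase: one pass over the m row input, one hash lookup per row.
    result = []
    for key, payload in probe_rows:
        for match in table.get(key, []):
            result.append((key, payload, match))
    return result


# Hypothetical sample rows; neither side needs to be sorted.
m_rows = [(1, "a"), (43, "b"), (13, "c"), (7, "d"), (3, "e")]
n_rows = [(3, "x"), (1, "y"), (99, "z")]
joined = hash_join(m_rows, n_rows)
```

The whole hash table has to be held somewhere while probing, which is why the memory available to it matters so much.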

Page 23: Databases for Storage Engineers

Loop Join

[Diagram: rows of the m row result (sample keys 1, 43, 13, 7, 3) each seek into an n row B-tree, costing log(n) reads per seek.]

Complexity: O(m * log(n))
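
The loop join can be sketched the same way. In this hedged example a sorted Python list searched with `bisect` stands in for the n row B-tree (an assumption made to keep the sketch short; a real B-tree is a paged structure), and the function name and sample rows are hypothetical.

```python
from bisect import bisect_left


def loop_join(outer_rows, inner_sorted):
    """For each outer (key, payload) row, seek into an inner side sorted by key."""
    inner_keys = [key for key, _ in inner_sorted]
    result = []
    for key, payload in outer_rows:
        # The binary search plays the role of the B-tree seek: about log(n) steps.
        pos = bisect_left(inner_keys, key)
        while pos < len(inner_keys) and inner_keys[pos] == key:
            result.append((key, payload, inner_sorted[pos][1]))
            pos += 1
    return result


# Hypothetical sample rows; only the inner side has to be ordered.
m_rows = [(1, "a"), (43, "b"), (13, "c"), (7, "d"), (3, "e")]
n_rows = [(1, "x"), (3, "y"), (7, "z"), (50, "w")]
joined = loop_join(m_rows, n_rows)
```

With m outer rows and one seek per row, the cost grows as m * log(n), so this shape is attractive when m is small and painful when m is large.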

Page 24: Databases for Storage Engineers

When Hash Joins hurt you

[Chart: Runtime (seconds) plotted against Hash Memory (MB), with a region marked “Spill Zone!”.]

Page 25: Databases for Storage Engineers

Join Hints

B probed, lower table in join (second table in join statement)

A probed, upper table in join (first table in join statement)

Just the way it is …

Page 26: Databases for Storage Engineers

Why is it so hard to get joins right?

[Chart with axes n, m and Time, comparing Loop Join, Merge Join and Hash Join.]
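
To see why the choice is hard, the three complexity formulas from the join slides can be plugged into a toy cost comparison. This is a rough sketch under stated assumptions: the formulas are used as if they were exact costs, merge join is only offered when both inputs are already sorted, and `join_costs` and `cheapest_join` are hypothetical names, not anything the optimizer exposes.

```python
import math


def join_costs(m, n, inputs_sorted):
    """Toy costs taken straight from the slide formulas (not a real cost model)."""
    costs = {
        "hash": m + 2 * n,
        "loop": m * math.log2(max(n, 2)),
    }
    if inputs_sorted:
        # Merge join is only a candidate when no extra sort is needed.
        costs["merge"] = m + n
    return costs


def cheapest_join(m, n, inputs_sorted=False):
    costs = join_costs(m, n, inputs_sorted)
    return min(costs, key=costs.get)


small_outer = cheapest_join(10, 1_000_000)
large_outer = cheapest_join(1_000_000, 1_000_000)
large_sorted = cheapest_join(1_000_000, 1_000_000, inputs_sorted=True)
```

The winner flips as m and n change, and the optimizer only has estimates of m and n from statistics, so a wrong estimate can pick the wrong join.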

Page 27: Databases for Storage Engineers

No-one has been able to get joins consistently right!

P = NP ?

Page 28: Databases for Storage Engineers

Getting I/O right…

[Architecture diagram, components:]

• SQL-OS (Schedulers, Buffer Pool, Memory Management, Synchronization Primitives, …)
• Query Optimization (Plan Generation, View Matching, Statistics, Costing)
• Query Execution (Query Operators, Memory Grants, Parallelism)
• Language Processing (Parse/Bind)
• Statement/Batch Execution
• Plan Cache Management
• Storage Engine (Access Methods, Database Page Cache, Locking, Transactions, …)

Page 29: Databases for Storage Engineers

The Storage Engine makes I/O Transparent!

[Diagram labels: RAM, Storage, Storage Engine.]

Rest of engine only sees the API

Page 30: Databases for Storage Engineers

Two Different Philosophies

Primitive           | SQL Server                                  | Analysis Services
Scheduling          | Voluntary Yield, User mode                  | Kernel mode, Preemptive
I/O Engine          | Dedicated I/O stack                         | Windows Buffered I/O
Waiting / Spinning  | SQLOS Primitives                            | Windows
Memory Management   | SQLOS / Storage Engine                      | Windows Paging
Serialisation       | TDS special purpose                         | XML
Network             | Fully optimizable, async, affinitized engine | Windows primitives, blocking

Page 31: Databases for Storage Engineers

SQL Server is different

• Primitives are a different beast than Windows
• Scale issues are generally specific to the core, not Windows
• Exposes own “belly of the beast” profiling
• SQL Team builds its own primitives, often better than Windows core
• Highest throughput app on Windows, drives all the scale stuff there

Page 32: Databases for Storage Engineers

Analysis Services is “just another App”

• Analysis Services relies fully on Windows primitives
• You can profile it by looking at how Windows behaves
• Upgrades to Windows are more likely to help it
• No TPC style benchmarks…

Page 33: Databases for Storage Engineers

A is for Atomic

[Diagram, three panels, each showing the tables LINEITEM (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and ORDER (ORDER_KEY, CUSTOMER_KEY).]

Page 34: Databases for Storage Engineers

C is for Consistency

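[Diagram, three panels of the LINEITEM and ORDER tables: one contrasts ORDER_KEY = 42 with ORDER_KEY != 42 across the two tables, one shows COMMITDATE = 2012-02-30, and one shows the full column lists LINEITEM (ORDER_KEY, PART_KEY, COMMITDATE, QUANTITY) and ORDER (ORDER_KEY, CUSTOMER_KEY).]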

Page 35: Databases for Storage Engineers

I is for Isolation

Two sessions run the same statements at the same time:

SELECT @LastTransaction_ID = LastTransaction_ID
FROM ATM
WHERE ATM_ID = 13

SET @ID = @LastTransaction_ID + 1

UPDATE ATM
SET LastTransaction_ID = @ID
WHERE ATM_ID = 13

Both sessions read the same value (@LastTransaction_ID = 42), so both write 43 and the same transaction ID is handed out twice.
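
The race on this slide can be replayed outside the database. The Python sketch below is an analogy only: the `Atm` class, the function names and the use of a `threading.Lock` are assumptions standing in for the ATM table and for whatever isolation mechanism the database applies. The first function forces the bad interleaving by hand; the second makes read, increment and write one indivisible step.

```python
import threading


class Atm:
    """Stand-in for the ATM row with ATM_ID = 13 (hypothetical)."""

    def __init__(self, last_transaction_id):
        self.last_transaction_id = last_transaction_id
        self.lock = threading.Lock()


def lost_update(atm):
    """Both sessions SELECT before either one UPDATEs."""
    read_1 = atm.last_transaction_id          # session 1 reads
    read_2 = atm.last_transaction_id          # session 2 reads the same value
    atm.last_transaction_id = read_1 + 1      # session 1 writes
    atm.last_transaction_id = read_2 + 1      # session 2 overwrites with the same ID
    return [read_1 + 1, read_2 + 1]


def next_id_isolated(atm, issued):
    """Read, increment and write while holding the lock."""
    with atm.lock:
        new_id = atm.last_transaction_id + 1
        atm.last_transaction_id = new_id
        issued.append(new_id)


def run_isolated(atm, sessions=2):
    issued = []
    threads = [
        threading.Thread(target=next_id_isolated, args=(atm, issued))
        for _ in range(sessions)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(issued)


unsafe_atm = Atm(42)
unsafe_ids = lost_update(unsafe_atm)

safe_atm = Atm(42)
safe_ids = run_isolated(safe_atm)
```

Without isolation both sessions are handed the same ID; with it, each session gets its own.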

Page 36: Databases for Storage Engineers

D is for Durability

[Diagram: “Do Transactions” followed by “Ack”, repeated ten times.]

Page 37: Databases for Storage Engineers

Summary – Databases Help You

• Do complex operations in optimal time
• …at high parallelism
• Optimise I/O pattern
• Be ACID compliant
• Store stuff safely…
• noSQL/Big Data systems trade off one or more of these to get more of the others

Page 38: Databases for Storage Engineers

System Databases

• Server won’t start without:
  • master
  • mssqlsystemresource
• System CAN start, but won’t work well
  • model
  • msdb
• System will start under special conditions
  • tempdb

Page 39: Databases for Storage Engineers

Master and mssqlsystemresource

• Together, contain all system information
• mssqlsystemresource
  • Lives under: MSSQL\Binn
  • Contains all system code
  • Hidden by default
• master
  • Lives under: MSSQL\DATA
• You should move these to a safe location

Page 40: Databases for Storage Engineers

Disaster: master or mssqlsystemresource

• You lost:
  • All passwords and server logins
  • All system wide certificates (You may be unable to decrypt!)
  • All System procedures you created
• You are not 100% screwed, but you are in for a long night
  • Both can be rebuilt (empty) during server start
  • …Or restored from backup
    • if you remembered to take one
  • Need /f and /T3608 to get back up

Page 41: Databases for Storage Engineers

Database: model

• Every newly created database is cloned from this
• Loss is not catastrophic
  • Copy from healthy machine
• Tempdb can’t boot without it
• Lives with master

Page 42: Databases for Storage Engineers

Database: tempdb

• Database “swap file”
• Does not survive restarts
• No Durability guarantees here
• Fast I/O helps

Page 43: Databases for Storage Engineers

Loss of Tempdb…is…Temporary

• Will rebuild itself after instance restart
• Configuration is stored in master
• Clones from model
• Nearly every installation must change defaults
• If tempdb cannot be created, server will only start from command line

Page 44: Databases for Storage Engineers

User Databases and Failure

• A database consists of
  • At least one Transaction Log File
  • The PRIMARY filegroup
  • At least one data file in PRIMARY
• If any of these are lost, the database is dead
  • You can in some cases bring a database without a transaction log back alive
  • But typically with data loss…
• Lesson: carefully protect all of the above

Page 45: Databases for Storage Engineers

What is in the Files?

[Diagram: the PRIMARY filegroup holds the Primary File, containing Metadata (system objects), GAM / SGAM, PFS Map and User Data. The Transaction Log contains Headers followed by VLFs.]

Page 46: Databases for Storage Engineers

Data Files

• Regular files in NTFS
  • Secured
• Files can Auto Grow as needed
  • Risky
  • File Imbalance

Page 47: Databases for Storage Engineers

How are Database Files Created?

• ALTER or CREATE DATABASE
• Transaction log file always zeroed out
  • This looks super cool on FusionIo by the way
• Data files MAY be zeroed out
  • Depends on privileges
  • May use instant file init

Page 48: Databases for Storage Engineers

Filegroups

• Filegroups (one word) are containers of files
• Used to group similar data together
• Oracle people know this concept as tablespaces
• Files inside FG are accessed/allocated round-robin

[Diagram labels: PRIMARY, DATA, User Data ×5.]
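
The round-robin allocation across the files of a filegroup can be pictured with a short Python sketch. It is deliberately simplified and everything in it is an assumption for illustration: the function name, the file names and the idea of numbering allocations 1 to 8. The real engine also weighs the free space in each file (proportional fill), which this sketch ignores.

```python
from itertools import cycle


def allocate_round_robin(file_names, allocation_count):
    """Hand out numbered allocations to the files of one filegroup in turn."""
    placement = {name: [] for name in file_names}
    next_file = cycle(file_names)
    for allocation in range(1, allocation_count + 1):
        placement[next(next_file)].append(allocation)
    return placement


# Hypothetical filegroup with four equally sized files.
layout = allocate_round_robin(["file1", "file2", "file3", "file4"], 8)
```

Consecutive allocations land on different files, so a scan or a load spreads its I/O over every file in the filegroup as long as the files stay balanced.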

Page 49: Databases for Storage Engineers

Reclaiming/Moving Space in Files

• DBCC SHRINKFILE
• REBUILD data

Page 50: Databases for Storage Engineers

DBCC SHRINKFILE

[Diagram: numbered blocks 1 to 8 spread across LUN 1, LUN 2, LUN 3 and LUN 4.]

Page 51: Databases for Storage Engineers

How to reclaim space the right way…

[Diagram: numbered blocks 1 to 8 shown before and after the rebuild, with LUN 1 to LUN 4 and a New Filegroup.]

ALTER INDEX Foo ON <table> REBUILD WITH (SORT_IN_TEMPDB = ON)

Page 52: Databases for Storage Engineers

PFS Contention

• Too few PFS maps can lead to latch contention
• Diagnosed in: sys.dm_os_waiting_tasks
• Look for PAGELATCH_UP

[Diagram: a File laid out as PFS Map, User Data (8000 pages), PFS Map, User Data (8000 pages).]

Page 53: Databases for Storage Engineers

I/O DBA people worry about

• DBAs typically diagnose issues with wait stats
• Issues they look for:
  • WRITELOG/LOGBUFFER waits
  • PAGEIOLATCH_<X> waits
  • BACKUPIO waits
  • IO_COMPLETION/ASYNC_IO_COMPLETION

Page 54: Databases for Storage Engineers

Places you need to know about

• Diagnosing resource waits:
  • sys.dm_os_wait_stats
  • Post 2008R2 – can use Xevents (harder)
• More detail in:
  • sys.dm_io_virtual_file_stats(NULL, NULL)
  • Confirm waits here!
• SQL Server errors in log file: