indexes: the second pillar of database wisdom

57
Indexes The second pillar of database wisdom

Upload: gisborne

Post on 30-Jun-2015

151 views

Category:

Software


2 download

DESCRIPTION

The most important things that most developers don't know about database index design. If you think designing the indexes in your database is a simple matter, you have a lot to learn!

TRANSCRIPT

Page 1: Indexes: The Second Pillar of Database Wisdom

IndexesThe second pillar of database wisdom

Page 2: Indexes: The Second Pillar of Database Wisdom

Guyren Howe(“Guy”)

[email protected]

Page 3: Indexes: The Second Pillar of Database Wisdom

“MySQL doesn’t support this”

MySQL

Page 4: Indexes: The Second Pillar of Database Wisdom

What everyone knows about indexes

• Trade insert/update/delete time for query time

• Trade storage space for query time

Page 5: Indexes: The Second Pillar of Database Wisdom

Other things everyone knows about indexes

• Index all join columns

• Index for every field you search

• Indexes should be rebuilt regularly

• Most selective first

• Dynamic SQL is slow

• Faster computers execute queries faster

Nonsense some folks believe

Page 6: Indexes: The Second Pillar of Database Wisdom

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Page 7: Indexes: The Second Pillar of Database Wisdom

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Page 8: Indexes: The Second Pillar of Database Wisdom

Howe’s LawBusinesses can’t afford the

servers they really need

Page 9: Indexes: The Second Pillar of Database Wisdom

Corollary to Howe’s Law

Your database is never fast enough

Page 10: Indexes: The Second Pillar of Database Wisdom

Query• not a program

• balances:

• random vs sequential I/O

• CPU use

• memory use

• optimization is np hard

Page 11: Indexes: The Second Pillar of Database Wisdom

“Or” and complex queries

• AND queries, particularly in applications are the most common type of query

• A query involving OR clauses can usually be considered as combining the results of two or more AND queries

• Complex to give general advice

• Can be reason to stick to small, simple indexes

Page 12: Indexes: The Second Pillar of Database Wisdom
Page 13: Indexes: The Second Pillar of Database Wisdom

How to use an index

Page 14: Indexes: The Second Pillar of Database Wisdom

Phone Book Theory of Indexing

Page 15: Indexes: The Second Pillar of Database Wisdom

WhoCo

Employee Register

Page 16: Indexes: The Second Pillar of Database Wisdom

Smith Andrew Houston 555-413-6182

Adams Maggie Houston 555-111-8943

Medville Anne Los Angeles 555-413-1183

Samuels Fred New York 555-112-8943

p72

Page 17: Indexes: The Second Pillar of Database Wisdom

Jacob, a programmer

Page 18: Indexes: The Second Pillar of Database Wisdom

Family Name Pages

Adams 17/3, 44/1, 44/5 72/4

Andrews 1/3, 19/5, 44/5

Ambers 11/2

Amberson19/1, 32/1, 32/2, 32/3, 32/4,

99/2

Family name index

Page 19: Indexes: The Second Pillar of Database Wisdom
Page 20: Indexes: The Second Pillar of Database Wisdom

Given Name Pages

Adam 13/3, 14/3, 124/4

Ainsley 12/3

Aston 134/2, 135/2

Atley 19/5, 32/1, 32/3, 32/4, 99/1

Given name index

Page 21: Indexes: The Second Pillar of Database Wisdom

Office Pages

Austin, TX

3/1, 4/1, 5/2, 6/5, 7/4, 11/4, 15/5, 18/3, 19/3, 20/2, 21/3, 22/1, 23/1, 25/4, 26/5, 27/2, 28/4, 29/1, 30/1 31, 33, 34…

Bakersville, MD1/1, 2/2, 3/3, 4/3, 7/2, 11/3, 15/5, 19/4, 20/2, 21/2, 23/4,

44/1, …

Boston, MA …

Branch index

Page 22: Indexes: The Second Pillar of Database Wisdom

Gwen, a programmer

Page 23: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Branch Family Name Given Name Pages

Austin Jeffries Leslie 11/2

Austin Kew Nelson 124/2

Austin Samuelson Steven 13/4

Austin Simpson Zaphod 98/1

Page 24: Indexes: The Second Pillar of Database Wisdom
Page 25: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

Page 26: Indexes: The Second Pillar of Database Wisdom
Page 27: Indexes: The Second Pillar of Database Wisdom

1. Prefer compound indexes

Page 28: Indexes: The Second Pillar of Database Wisdom

Full-time index

Full-time Pages

Y 1/7, 1/8 2/1, 2/2, 2/3, 2/4, 2/5…

N 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/6, …

Page 29: Indexes: The Second Pillar of Database Wisdom

2. Prefer selective predicates

Page 30: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

Page 31: Indexes: The Second Pillar of Database Wisdom

3. Consider a covering index

Page 32: Indexes: The Second Pillar of Database Wisdom

Database index

1979-05-27

Tampa

1979-06-07

Grover

1979-07-08

Austin

1979-02-01 Austin

1979-02-12 Houston

1979-02-17

Bakersville

1979-03-08 Tampa

1979-05-27 Tampa

1979-05-29 Newtown

1979-05-29 Seattle

1979-06-04 Seattle

1979-06-07 Grover

1979-06-14

South End

1979-06-15 Newtown

1979-06-16 London

Page 33: Indexes: The Second Pillar of Database Wisdom

Database index

Austin 1990-01-01

Bakersville

1981-06-01

Bakersville

1985-06-04

Austin1980-01-

01

Austin1981-06-

01

Austin1983-01-

12

Austin1987-11-

31

Austin1990-01-

01

Austin1991-06-

01

Bakersville

1979-08-03

Bakersville

1979-09-07

Bakersville

1981-06-01

Bakersville

1982-12-13

Bakersville

1983-03-28

Bakersville

1985-01-01

Page 34: Indexes: The Second Pillar of Database Wisdom

4. Be subtle with compound index

predicates• Put covering fields you don’t search on at the

end

• Put a field to do range searches on next

• Put other fields you search on at the front in order of use and selectivity

• Unless you’re dealing with different searches, in which case compromise, or use multiple compound indexes

Page 35: Indexes: The Second Pillar of Database Wisdom

WhoCo

Employee Register

Page 36: Indexes: The Second Pillar of Database Wisdom

Senior Personnel Directory

DivisionFamily Name

Given Name Branch Pages

Accounting Adams Leslie Bakersville 44/2

Accounting Adams Nelson Boston 17/1

R&D Yu Stephen Austin 72/5

Page 37: Indexes: The Second Pillar of Database Wisdom

CREATE INDEXpersonnel_main_office_idx

ONpersonnel (

division,family_name,given_name)

WHEREdivision IS NOT NULL;

Partial index

SELECT*

FROMpersonnel

WHEREdivision IS NOT NULL

ANDdivision = ‘R&D’;

MySQL

Page 38: Indexes: The Second Pillar of Database Wisdom

CREATE INDEXpersonnel_name_lc

ONpersonnel (

lower(family_name),lower(given_name));

Functional index

SELECT*

FROMpersonnel

WHERElower(family_name) =

‘smith’ ANDlower(given_name) =

‘mary’;

MySQL

Page 39: Indexes: The Second Pillar of Database Wisdom

Comprehensive index

Family Name Given NameStreet

Address Pages

Smith Leslie 13 Main St 44/2

Smith Nelson 11a Rails St 17/2

Smith Steven 19 Main St 44/1

Smithers Zaphod 12 Rails St 72/3

SELECT*

FROMpersonnel

WHERElower(family_name) like ‘smi

%’;

SELECT*

FROMpersonnel

WHERElower(street_addr) like ‘%

st’;

Page 40: Indexes: The Second Pillar of Database Wisdom

CREATE INDEXpersonnel_addr_rev

ONpersonnel (

reverse(lower(street_addr)));

Functional index tricks

SELECT*

FROMpersonnel

WHEREreverse(lower(street_addr)) like ‘ts

%’;

MySQL

Page 41: Indexes: The Second Pillar of Database Wisdom

Sorting• Need same predicates in WHERE and ORDER

BY

• Index can be used in either direction

• But not both…

• not SELECT… ORDER BY family_name DESC, branch ASC

• Use Postgres’ CREATE INDEX… NULLS LAST (or FIRST)

Page 42: Indexes: The Second Pillar of Database Wisdom

Partial Results• LIMIT n or FETCH FIRST n

• Need pipelined ORDER BY

• Otherwise we have to sort the whole result set

• Database dependent

• Postgres window functions aren’t pipelined

Page 43: Indexes: The Second Pillar of Database Wisdom

Paginating“SELECT

*FROM

peopleORDER BY

hire_dateLIMIT

50OFFSET

1000"

Page 44: Indexes: The Second Pillar of Database Wisdom

Paginating: much better

People.where “

(hire_date, creation_date) >(:hire_date, :creation_date)”,

{hire_date: @last_person.hire_date,creation_date:

@last_person.creation_date}.order(:hire_date, :creation_date).limit: 50

.where “(hire_date > :hire_date)OR(hire_date = :hire_date ANDcreation_date > :creation_date)”

Otherdatabases

Possiblyor hire_date IS NULL

Postgres

Page 45: Indexes: The Second Pillar of Database Wisdom

Clustering

• Based on existing index

• Physically reorganize the heap in index order

• Reduce seek time for blocks of records

• One-off procedure; new records not clustered

• Locks table for duration

Page 46: Indexes: The Second Pillar of Database Wisdom

Use case: partitioning

• use Postgres table inheritance

• eg log data, very fast, last 30 days searchable

• split table into numerous identical subtables

• cluster a table once we’re done writing to it

Page 47: Indexes: The Second Pillar of Database Wisdom

Joins

Page 48: Indexes: The Second Pillar of Database Wisdom

Three Join Methods• Nested Loop

• Query 1 table; loop and query the other

• Hash Join

• Avoid nested loop cost by making hash of one side

• Merge Join

• Query each table, sort and compare

MySQL

MySQL

Page 49: Indexes: The Second Pillar of Database Wisdom

Nested Loop Join

• Index for each side separately

Page 50: Indexes: The Second Pillar of Database Wisdom

Hash Join

• Not for range joins; only equality

• No point indexing join fields

• Index independent predicates

• Reduce memory footprint of hash table by minimizing number of fields or rows

MySQL

Page 51: Indexes: The Second Pillar of Database Wisdom

Merge Join• Less often used because sort is expensive

• Mainly for outer joins

• Or when an index on both sides speeds sorting

• same index for where and order by

• No point indexing join fields, only independent predicates

• Reduce memory footprint by minimizing number of fields or rows

MySQL

Page 52: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

Page 53: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

• B-Tree

• “Normal” index type

• equality and range searches

• Hash

• Faster for equality

• Test

Page 54: Indexes: The Second Pillar of Database Wisdom

Index Data Structures

• GIST

• Variety of data types with complex operators eg spatial contains/near/etc

• hstore

• SP-GIST

• Similar to GIST but better for some data types

• GIN

• more-or-less GIST

• (3x) more space and (3x) slower insert gives faster (3x) search

• Some data types only support one

MySQL

Page 55: Indexes: The Second Pillar of Database Wisdom

Dynamic SQL• Prepared statement or bind queries lets

server cache query plan

• SQL Server or Oracle

• Save query plan time

• Usually saves time, but doesn’t consider values

• Uneven distributions suffer

MySQL

Postgres

Page 56: Indexes: The Second Pillar of Database Wisdom

Statistics

ALTER TABLEtable

ALTER COLUMNcolumn

SETSTATISTICS 1000