indexes: the second pillar of database wisdom

Post on 30-Jun-2015

151 Views

Category:

Software

2 Downloads

Preview:

Click to see full reader

DESCRIPTION

The most important things that most developers don't know about database index design. If you think designing the indexes in your database is a simple matter, you have a lot to learn!

TRANSCRIPT

IndexesThe second pillar of database wisdom

Guyren Howe(“Guy”)

guyren@relevantlogic.com

“MySQL doesn’t support this”

MySQL

What everyone knows about indexes

• Trade insert/update/delete time for query time

• Trade storage space for query time

Other things everyone knows about indexes

• Index all join columns

• Index for every field you search

• Indexes should be rebuilt regularly

• Most selective first

• Dynamic SQL is slow

• Faster computers execute queries faster

Nonsense some folks believe

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Howe’s LawBusinesses can’t afford the

servers they really need

Corollary to Howe’s Law

Your database is never fast enough

Query• not a program

• balances:

• random vs sequential I/O

• CPU use

• memory use

• optimization is np hard

“Or” and complex queries

• AND queries, particularly in applications are the most common type of query

• A query involving OR clauses can usually be considered as combining the results of two or more AND queries

• Complex to give general advice

• Can be reason to stick to small, simple indexes

How to use an index

Phone Book Theory of Indexing

WhoCo

Employee Register

Smith Andrew Houston 555-413-6182

Adams Maggie Houston 555-111-8943

Medville Anne Los Angeles 555-413-1183

Samuels Fred New York 555-112-8943

p72

Jacob, a programmer

Family Name Pages

Adams 17/3, 44/1, 44/5 72/4

Andrews 1/3, 19/5, 44/5

Ambers 11/2

Amberson19/1, 32/1, 32/2, 32/3, 32/4,

99/2

Family name index

Given Name Pages

Adam 13/3, 14/3, 124/4

Ainsley 12/3

Aston 134/2, 135/2

Atley 19/5, 32/1, 32/3, 32/4, 99/1

Given name index

Office Pages

Austin, TX

3/1, 4/1, 5/2, 6/5, 7/4, 11/4, 15/5, 18/3, 19/3, 20/2, 21/3, 22/1, 23/1, 25/4, 26/5, 27/2, 28/4, 29/1, 30/1 31, 33, 34…

Bakersville, MD1/1, 2/2, 3/3, 4/3, 7/2, 11/3, 15/5, 19/4, 20/2, 21/2, 23/4,

44/1, …

Boston, MA …

Branch index

Gwen, a programmer

Comprehensive index

Branch Family Name Given Name Pages

Austin Jeffries Leslie 11/2

Austin Kew Nelson 124/2

Austin Samuelson Steven 13/4

Austin Simpson Zaphod 98/1

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

1. Prefer compound indexes

Full-time index

Full-time Pages

Y 1/7, 1/8 2/1, 2/2, 2/3, 2/4, 2/5…

N 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/6, …

2. Prefer selective predicates

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

3. Consider a covering index

Database index

1979-05-27

Tampa

1979-06-07

Grover

1979-07-08

Austin

1979-02-01 Austin

1979-02-12 Houston

1979-02-17

Bakersville

1979-03-08 Tampa

1979-05-27 Tampa

1979-05-29 Newtown

1979-05-29 Seattle

1979-06-04 Seattle

1979-06-07 Grover

1979-06-14

South End

1979-06-15 Newtown

1979-06-16 London

Database index

Austin 1990-01-01

Bakersville

1981-06-01

Bakersville

1985-06-04

Austin1980-01-

01

Austin1981-06-

01

Austin1983-01-

12

Austin1987-11-

31

Austin1990-01-

01

Austin1991-06-

01

Bakersville

1979-08-03

Bakersville

1979-09-07

Bakersville

1981-06-01

Bakersville

1982-12-13

Bakersville

1983-03-28

Bakersville

1985-01-01

4. Be subtle with compound index

predicates• Put covering fields you don’t search on at the

end

• Put a field to do range searches on next

• Put other fields you search on at the front in order of use and selectivity

• Unless you’re dealing with different searches, in which case compromise, or use multiple compound indexes

WhoCo

Employee Register

Senior Personnel Directory

DivisionFamily Name

Given Name Branch Pages

Accounting Adams Leslie Bakersville 44/2

Accounting Adams Nelson Boston 17/1

R&D Yu Stephen Austin 72/5

CREATE INDEXpersonnel_main_office_idx

ONpersonnel (

division,family_name,given_name)

WHEREdivision IS NOT NULL;

Partial index

SELECT*

FROMpersonnel

WHEREdivision IS NOT NULL

ANDdivision = ‘R&D’;

MySQL

CREATE INDEXpersonnel_name_lc

ONpersonnel (

lower(family_name),lower(given_name));

Functional index

SELECT*

FROMpersonnel

WHERElower(family_name) =

‘smith’ ANDlower(given_name) =

‘mary’;

MySQL

Comprehensive index

Family Name Given NameStreet

Address Pages

Smith Leslie 13 Main St 44/2

Smith Nelson 11a Rails St 17/2

Smith Steven 19 Main St 44/1

Smithers Zaphod 12 Rails St 72/3

SELECT*

FROMpersonnel

WHERElower(family_name) like ‘smi

%’;

SELECT*

FROMpersonnel

WHERElower(street_addr) like ‘%

st’;

CREATE INDEXpersonnel_addr_rev

ONpersonnel (

reverse(lower(street_addr)));

Functional index tricks

SELECT*

FROMpersonnel

WHEREreverse(lower(street_addr)) like ‘ts

%’;

MySQL

Sorting• Need same predicates in WHERE and ORDER

BY

• Index can be used in either direction

• But not both…

• not SELECT… ORDER BY family_name DESC, branch ASC

• Use Postgres’ CREATE INDEX… NULLS LAST (or FIRST)

Partial Results• LIMIT n or FETCH FIRST n

• Need pipelined ORDER BY

• Otherwise we have to sort the whole result set

• Database dependent

• Postgres window functions aren’t pipelined

Paginating“SELECT

*FROM

peopleORDER BY

hire_dateLIMIT

50OFFSET

1000"

Paginating: much better

People.where “

(hire_date, creation_date) >(:hire_date, :creation_date)”,

{hire_date: @last_person.hire_date,creation_date:

@last_person.creation_date}.order(:hire_date, :creation_date).limit: 50

.where “(hire_date > :hire_date)OR(hire_date = :hire_date ANDcreation_date > :creation_date)”

Otherdatabases

Possiblyor hire_date IS NULL

Postgres

Clustering

• Based on existing index

• Physically reorganize the heap in index order

• Reduce seek time for blocks of records

• One-off procedure; new records not clustered

• Locks table for duration

Use case: partitioning

• use Postgres table inheritance

• eg log data, very fast, last 30 days searchable

• split table into numerous identical subtables

• cluster a table once we’re done writing to it

Joins

Three Join Methods• Nested Loop

• Query 1 table; loop and query the other

• Hash Join

• Avoid nested loop cost by making hash of one side

• Merge Join

• Query each table, sort and compare

MySQL

MySQL

Nested Loop Join

• Index for each side separately

Hash Join

• Not for range joins; only equality

• No point indexing join fields

• Index independent predicates

• Reduce memory footprint of hash table by minimizing number of fields or rows

MySQL

Merge Join• Less often used because sort is expensive

• Mainly for outer joins

• Or when an index on both sides speeds sorting

• same index for where and order by

• No point indexing join fields, only independent predicates

• Reduce memory footprint by minimizing number of fields or rows

MySQL

Index Data Structures

Index Data Structures

• B-Tree

• “Normal” index type

• equality and range searches

• Hash

• Faster for equality

• Test

Index Data Structures

• GIST

• Variety of data types with complex operators eg spatial contains/near/etc

• hstore

• SP-GIST

• Similar to GIST but better for some data types

• GIN

• more-or-less GIST

• (3x) more space and (3x) slower insert gives faster (3x) search

• Some data types only support one

MySQL

Dynamic SQL• Prepared statement or bind queries lets

server cache query plan

• SQL Server or Oracle

• Save query plan time

• Usually saves time, but doesn’t consider values

• Uneven distributions suffer

MySQL

Postgres

Statistics

ALTER TABLEtable

ALTER COLUMNcolumn

SETSTATISTICS 1000

Finguyren@relevantlogic.com

top related