indexes: the second pillar of database wisdom

IndexesThe second pillar of database wisdom

Guyren Howe(“Guy”)

guyren@relevantlogic.com

“MySQL doesn’t support this”

What everyone knows about indexes

• Trade insert/update/delete time for query time

• Trade storage space for query time

Other things everyone knows about indexes

• Index all join columns

• Index for every field you search

• Indexes should be rebuilt regularly

• Most selective first

• Dynamic SQL is slow

• Faster computers execute queries faster

Nonsense some folks believe

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Howe’s LawBusinesses can’t afford the

servers they really need

Corollary to Howe’s Law

Your database is never fast enough

Query• not a program

• balances:

• random vs sequential I/O

• CPU use

• memory use

• optimization is np hard

“Or” and complex queries

• AND queries, particularly in applications are the most common type of query

• A query involving OR clauses can usually be considered as combining the results of two or more AND queries

• Complex to give general advice

• Can be reason to stick to small, simple indexes

How to use an index

Phone Book Theory of Indexing

Employee Register

Smith Andrew Houston 555-413-6182

Adams Maggie Houston 555-111-8943

Medville Anne Los Angeles 555-413-1183

Samuels Fred New York 555-112-8943

Jacob, a programmer

Family Name Pages

Adams 17/3, 44/1, 44/5 72/4

Andrews 1/3, 19/5, 44/5

Ambers 11/2

Amberson19/1, 32/1, 32/2, 32/3, 32/4,

Family name index

Given Name Pages

Adam 13/3, 14/3, 124/4

Ainsley 12/3

Aston 134/2, 135/2

Atley 19/5, 32/1, 32/3, 32/4, 99/1

Given name index

Office Pages

Austin, TX

3/1, 4/1, 5/2, 6/5, 7/4, 11/4, 15/5, 18/3, 19/3, 20/2, 21/3, 22/1, 23/1, 25/4, 26/5, 27/2, 28/4, 29/1, 30/1 31, 33, 34…

Bakersville, MD1/1, 2/2, 3/3, 4/3, 7/2, 11/3, 15/5, 19/4, 20/2, 21/2, 23/4,

44/1, …

Boston, MA …

Branch index

Gwen, a programmer

Comprehensive index

Branch Family Name Given Name Pages

Austin Jeffries Leslie 11/2

Austin Kew Nelson 124/2

Austin Samuelson Steven 13/4

Austin Simpson Zaphod 98/1

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

1. Prefer compound indexes

Full-time index

Full-time Pages

Y 1/7, 1/8 2/1, 2/2, 2/3, 2/4, 2/5…

N 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 2/6, …

2. Prefer selective predicates

Comprehensive index

Family Name Given Name Branch Pages

Adams Leslie Bakersville 44/2

Adams Nelson Boston 17/2

Adams Steven Bakersville 44/1

Adams Zaphod Austin 72/3

3. Consider a covering index

Database index

1979-05-27

1979-06-07

Grover

1979-07-08

Austin

1979-02-01 Austin

1979-02-12 Houston

1979-02-17

Bakersville

1979-03-08 Tampa

1979-05-27 Tampa

1979-05-29 Newtown

1979-05-29 Seattle

1979-06-04 Seattle

1979-06-07 Grover

1979-06-14

South End

1979-06-15 Newtown

1979-06-16 London

Database index

Austin 1990-01-01

Bakersville

1981-06-01

Bakersville

1985-06-04

Austin1980-01-

Austin1981-06-

Austin1983-01-

Austin1987-11-

Austin1990-01-

Austin1991-06-

Bakersville

1979-08-03

Bakersville

1979-09-07

Bakersville

1981-06-01

Bakersville

1982-12-13

Bakersville

1983-03-28

Bakersville

1985-01-01

4. Be subtle with compound index

predicates• Put covering fields you don’t search on at the

• Put a field to do range searches on next

• Put other fields you search on at the front in order of use and selectivity

• Unless you’re dealing with different searches, in which case compromise, or use multiple compound indexes

Employee Register

Senior Personnel Directory

DivisionFamily Name

Given Name Branch Pages

Accounting Adams Leslie Bakersville 44/2

Accounting Adams Nelson Boston 17/1

R&D Yu Stephen Austin 72/5

CREATE INDEXpersonnel_main_office_idx

ONpersonnel (

division,family_name,given_name)

WHEREdivision IS NOT NULL;

Partial index

SELECT*

FROMpersonnel

WHEREdivision IS NOT NULL

ANDdivision = ‘R&D’;

CREATE INDEXpersonnel_name_lc

ONpersonnel (

lower(family_name),lower(given_name));

Functional index

SELECT*

FROMpersonnel

WHERElower(family_name) =

‘smith’ ANDlower(given_name) =

‘mary’;

Comprehensive index

Family Name Given NameStreet

Address Pages

Smith Leslie 13 Main St 44/2

Smith Nelson 11a Rails St 17/2

Smith Steven 19 Main St 44/1

Smithers Zaphod 12 Rails St 72/3

SELECT*

FROMpersonnel

WHERElower(family_name) like ‘smi

SELECT*

FROMpersonnel

WHERElower(street_addr) like ‘%

st’;

CREATE INDEXpersonnel_addr_rev

ONpersonnel (

reverse(lower(street_addr)));

Functional index tricks

SELECT*

FROMpersonnel

WHEREreverse(lower(street_addr)) like ‘ts

Sorting• Need same predicates in WHERE and ORDER

• Index can be used in either direction

• But not both…

• not SELECT… ORDER BY family_name DESC, branch ASC

• Use Postgres’ CREATE INDEX… NULLS LAST (or FIRST)

Partial Results• LIMIT n or FETCH FIRST n

• Need pipelined ORDER BY

• Otherwise we have to sort the whole result set

• Database dependent

• Postgres window functions aren’t pipelined

Paginating“SELECT

peopleORDER BY

hire_dateLIMIT

50OFFSET

Paginating: much better

People.where “

(hire_date, creation_date) >(:hire_date, :creation_date)”,

{hire_date: @last_person.hire_date,creation_date:

@last_person.creation_date}.order(:hire_date, :creation_date).limit: 50

.where “(hire_date > :hire_date)OR(hire_date = :hire_date ANDcreation_date > :creation_date)”

Otherdatabases

Possiblyor hire_date IS NULL

Postgres

Clustering

• Based on existing index

• Physically reorganize the heap in index order

• Reduce seek time for blocks of records

• One-off procedure; new records not clustered

• Locks table for duration

Use case: partitioning

• use Postgres table inheritance

• eg log data, very fast, last 30 days searchable

• split table into numerous identical subtables

• cluster a table once we’re done writing to it

Three Join Methods• Nested Loop

• Query 1 table; loop and query the other

• Hash Join

• Avoid nested loop cost by making hash of one side

• Merge Join

• Query each table, sort and compare

Nested Loop Join

• Index for each side separately

Hash Join

• Not for range joins; only equality

• No point indexing join fields

• Index independent predicates

• Reduce memory footprint of hash table by minimizing number of fields or rows

Merge Join• Less often used because sort is expensive

• Mainly for outer joins

• Or when an index on both sides speeds sorting

• same index for where and order by

• No point indexing join fields, only independent predicates

• Reduce memory footprint by minimizing number of fields or rows

Index Data Structures

• B-Tree

• “Normal” index type

• equality and range searches

• Hash

• Faster for equality

• Test

Index Data Structures

• GIST

• Variety of data types with complex operators eg spatial contains/near/etc

• hstore

• SP-GIST

• Similar to GIST but better for some data types

• GIN

• more-or-less GIST

• (3x) more space and (3x) slower insert gives faster (3x) search

• Some data types only support one

Dynamic SQL• Prepared statement or bind queries lets

server cache query plan

• SQL Server or Oracle

• Save query plan time

• Usually saves time, but doesn’t consider values

• Uneven distributions suffer

Postgres

Statistics

ALTER TABLEtable

ALTER COLUMNcolumn

SETSTATISTICS 1000

Finguyren@relevantlogic.com

indexes: the second pillar of database wisdom

Software

files indexes

unodc opioid strategy - oecd · regard, unodc a five-pillar...

wisdom commentary:...

bridges gp capitalisation report (v14) · pillar...

msci hedged indexes msci daily hedged indexes msci...

wisdom teeth removal...wisdom teeth removal receive $150 off...

1 chapter 5 index and clustering. 2 overview of indexes and...

web data management indexes. in this lecture indexes –xset...

wisdom of crowds business intelligence market study tm...

variant indexes

function-based indexes partitioned indexes

declarative indexes

cost indexes

mongo indexes

indexes last

chapter 14-1 chapter outline types of single-level ordered...

indexes, dividends

indexes 11. 1 cse2132 database systems week 11 lecture...

cogi indexes

dow jones sustainability indexes · 2 key facts – sam...