database indexes

Upload: emoosx

Post on 07-Jan-2016

TRANSCRIPT

  • Grokking Engineering

  • The Absolute Minimum Every Software Developer Needs To Know About Database Indexes

  • >whoami

    Cedric Chin (aka Eli James) - @ejames_c

    Project Manager, Floating Cube Studios (D7)

    Programmer, not a database expert

    But a heavy database user

    Most of us are heavy database users

  • Relational Database Management Systems

  • This Talk

    You will walk away with a basic understanding of database indexes.

    In general! Not specific to a particular DB

    A basic understanding is usually good enough

    Developers need to know how to make queries fast

    The rest of the DB can be a black box.

  • What are indexes?

  • Indexes Make Queries Fast

    Think: how do you look up words in a dictionary?

    You use an alphabetical index

  • Indexes Make Queries Fast

    Indexes are conceptually similar.

    But unlike a dictionary, databases are constantly updated

    This means that indexes have to be constantly updated.

  • Indexes Make Queries Fast

    In an RDBMS, indexes are like a map or guide that lets you quickly find the data that you're looking for.

    But let's start from the basics. Pretend we don't care or know about how DBs work.

  • What does using an index feel like?

    A day in the life of Joe Random Dev

  • Creating an Index

    You can create an index on any field.

    Say you have a new, large table called clients:

    pk  name        age  gender
    1   Binh Dang   23   Male
    2   Connor Tan  34   Male

    Tables get one index on create: the primary key index (assuming a primary key is defined)

  • Creating an Index

    ALTER TABLE `clients` ADD INDEX (`age`);

    the database does some magic

  • Creating an Index

    Now: SELECT * FROM clients WHERE age = 34 is fast!

    pk  name        age  gender
    1   Binh Dang   23   Male
    2   Connor Tan  34   Male
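    A minimal sketch of the same flow, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere). SQLite spells the statement CREATE INDEX rather than ALTER TABLE ... ADD INDEX, and the index name idx_clients_age is made up for illustration.

```python
import sqlite3

# In-memory SQLite database, standing in for the MySQL table on the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients (pk INTEGER PRIMARY KEY, name TEXT, age INTEGER, gender TEXT)"
)
conn.executemany(
    "INSERT INTO clients (pk, name, age, gender) VALUES (?, ?, ?, ?)",
    [(1, "Binh Dang", 23, "Male"), (2, "Connor Tan", 34, "Male")],
)

# SQLite's equivalent of ALTER TABLE `clients` ADD INDEX (`age`).
# The index name is hypothetical.
conn.execute("CREATE INDEX idx_clients_age ON clients (age)")

# The index now exists alongside the table...
index_names = [row[1] for row in conn.execute("PRAGMA index_list('clients')")]

# ...and the query from the slide returns the matching row.
rows = conn.execute("SELECT * FROM clients WHERE age = 34").fetchall()
print(index_names, rows)
```

    On a two-row table the speed difference is invisible; the point is only that creating the index is a single statement and the query itself does not change.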

  • So how do indexes work?

    Data Structures Ahead

  • Two Data Structures

    Most database indexes are b-tree indexes, which consist of two data structures:

    A doubly linked list

    A balanced search tree

    Note: this is just basic stuff! Real implementations have modifications.

  • Table Data

    Database data is not organised sequentially on disk like books in a library.

    (Or pages in a dictionary.)

    Instead it is stored in blocks all across the disk.

  • Table Data

    Table data is stored in a table block, in a heap structure

    No relationships between table blocks or rows

    Not sorted.

    col1    col2  col3
    Binh    23    A
    Connor  45    X
    Vivian  12    X
    Bob     98    A

    Visual example of table block

  • Ordered Table Data

    Table blocks:

    col1    col2  col3
    Binh    67    A
    Connor  45    X
    Vivian  12    X
    Bob     98    A

    col1   col2  col3
    Sean   23    A
    Loki   18    X
    Thor   33    X
    Trung  29    A

    Index Leaf Nodes:

    col2  rowID
    12    3A 2F
    13    2F AE
    18    2C 50
    23    5B 78

    col2  rowID
    26    65 2F
    29    2F 0E
    33    3D A0
    33    5B F9

  • Ordered Table Data

    A database uses a sorted doubly linked list to keep track of order

    Doubly linked list means that the DB can traverse back and forth.

    O(n) traversal
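    The leaf node chain can be sketched as a sorted doubly linked list. This is a toy model, not how any particular database lays out its pages; the class and function names are invented for illustration, and the keys and row IDs are the ones from the index leaf nodes on the slide.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class LeafNode:
    """One index leaf node: sorted (key, row_id) entries plus sibling links."""
    entries: List[Tuple[int, str]]
    prev: Optional["LeafNode"] = None
    next: Optional["LeafNode"] = None


def link(nodes: List[LeafNode]) -> None:
    """Chain leaf nodes into a doubly linked list, in the order given."""
    for left, right in zip(nodes, nodes[1:]):
        left.next, right.prev = right, left


def walk_forward(node: Optional[LeafNode]) -> Iterator[int]:
    """Yield keys in ascending order by following `next` links: O(n)."""
    while node is not None:
        for key, _row_id in node.entries:
            yield key
        node = node.next


def walk_backward(node: Optional[LeafNode]) -> Iterator[int]:
    """Yield keys in descending order by following `prev` links: O(n)."""
    while node is not None:
        for key, _row_id in reversed(node.entries):
            yield key
        node = node.prev


# Leaf nodes with the col2 values and row IDs from the slide.
first = LeafNode([(12, "3A 2F"), (13, "2F AE"), (18, "2C 50"), (23, "5B 78")])
second = LeafNode([(26, "65 2F"), (29, "2F 0E"), (33, "3D A0"), (33, "5B F9")])
link([first, second])

ascending = list(walk_forward(first))
descending = list(walk_backward(second))
print(ascending)
print(descending)
```

    Because the links go both ways, the same chain serves ascending and descending scans without re-sorting anything.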

  • Balanced Search Tree

    The index leaf nodes are connected by a balanced search tree

    The b-tree has equal depth at every point

    Searching the b-tree is O(log n)

    B-tree depth also grows as O(log n)

  • Balanced Search Tree

    In practice: a b-tree of depth 4 or 5 holds millions of records.

    6 layers and up is rarely seen.

    Takeaway: b-trees are fast.
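    A rough back-of-the-envelope check of why the tree stays so shallow. The fanout of 100 entries per node is an assumed number for illustration only; real fanout depends on key size and block size.

```python
def btree_depth(num_records: int, fanout: int) -> int:
    """Smallest depth d such that fanout ** d >= num_records."""
    depth, capacity = 1, fanout
    while capacity < num_records:
        capacity *= fanout
        depth += 1
    return depth


# Hypothetical fanout of 100 entries per node.
for n in (1_000_000, 100_000_000, 10_000_000_000):
    print(n, btree_depth(n, fanout=100))
```

    Multiplying the record count by 100 adds only one level, which is what logarithmic growth means in practice.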

  • Balanced Search Tree

    Root node

    Branch nodes

  • Balanced Search Tree

    Look for 54

  • Ordered Table Data

    Table blocks:

    col1    col2  col3
    Binh    67    A
    Connor  45    X
    Vivian  12    X
    Bob     98    A

    col1   col2  col3
    Sean   23    A
    Loki   18    X
    Thor   54    X
    Trung  29    A

    Index Leaf Nodes:

    col2  rowID
    12    3A 2F
    13    2F AE
    18    2C 50
    23    5B 78

    col2  rowID
    26    65 2F
    26    2F 0E
    ..    ..
    54    5B F9
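    The "look for 54" walk can be sketched with a simplified search tree. This is an illustration under assumptions, not a real b-tree implementation: each branch node here stores the largest key of each child, the tree has a single branch level, the class names are invented, and only the leaf entries that are legible on the slide are included.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    entries: List[Tuple[int, str]]  # sorted (key, row_id) pairs


@dataclass
class Branch:
    max_keys: List[int]  # max_keys[i] is the largest key under children[i]
    children: List[Union["Branch", Leaf]]


def search(node: Union[Branch, Leaf], target: int) -> Tuple[List[str], int]:
    """Descend from the root to a leaf; return matching row IDs and levels descended."""
    steps = 0
    while isinstance(node, Branch):
        # Pick the first child whose largest key is >= target.
        i = bisect_left(node.max_keys, target)
        if i == len(node.children):
            return [], steps  # target is larger than every key in the index
        node = node.children[i]
        steps += 1
    # Simplification: duplicates spilling into the next leaf are not followed.
    return [row_id for key, row_id in node.entries if key == target], steps


leaf_a = Leaf([(12, "3A 2F"), (13, "2F AE"), (18, "2C 50"), (23, "5B 78")])
leaf_b = Leaf([(26, "65 2F"), (26, "2F 0E"), (54, "5B F9")])
root = Branch(max_keys=[23, 54], children=[leaf_a, leaf_b])

row_ids, steps = search(root, 54)
print(row_ids, steps)
```

    The number of comparisons depends on the depth of the tree, not on how many rows the table holds, which is where the O(log n) lookup comes from.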

  • Database Query

    There are up to 3 steps to an indexed database query

    1. Tree traversal - O(log n), fast

    2. Leaf node chain traversal - O(n), slow

    3. Table data retrieval - not stored physically in same location, slow
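    The three steps above can be sketched in a few lines. This is a toy model: a sorted Python list stands in for the index, a binary search stands in for the tree traversal, and a dict stands in for the heap of table blocks. Names and col2 values come from the slides; the row IDs and the bounds 20 and 30 are arbitrary values chosen for illustration.

```python
from bisect import bisect_left

# Heap "table": row_id -> row, in no particular order.
# Names and values come from the slides; the row IDs are made up.
table = {
    "r7": ("Vivian", 12),
    "r2": ("Loki", 18),
    "r9": ("Sean", 23),
    "r4": ("Trung", 29),
    "r1": ("Thor", 33),
    "r5": ("Connor", 45),
}

# Index on the value column: (value, row_id) pairs kept sorted,
# like the leaf node chain.
index = sorted((value, row_id) for row_id, (_name, value) in table.items())


def range_query(low, high):
    # Step 1: tree traversal, stood in for by a binary search, O(log n).
    i = bisect_left(index, (low,))
    # Step 2: leaf node chain traversal, O(k) for k matching entries.
    row_ids = []
    while i < len(index) and index[i][0] <= high:
        row_ids.append(index[i][1])
        i += 1
    # Step 3: table data retrieval, one lookup per matching row.
    return [table[row_id] for row_id in row_ids]


result = range_query(20, 30)
print(result)
```

    Steps 2 and 3 both scale with the number of matching rows, so a wide range can cost far more than the tree traversal that starts it.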

  • Database Queries

    Assuming we have indexed `age`

    Tree-traversal (fast)

    Table data retrieval (1 row only, fast).

    SELECT * FROM clients WHERE age = 34

  • Database Queries

    Assuming we have indexed `age`

    Tree-traversal to get age = 20 (fast)

    Leaf node chain traversal = ? (if many, slow)

    Table data retrieval (if thousands of rows, slow).

    SELECT * FROM clients WHERE age >= 20 AND age

  • The Full Table Scan

    The DB simply reads every single block in the table.

    Can sometimes be more efficient, if you're returning a large % of the data.

    Why? The DB executes multi-block reads, so it needs fewer read operations than an index scan

    SELECT * FROM clients

  • EXPLAIN

    How do you actually know how the DB is executing your queries?

    Use the EXPLAIN statement.

    Just add it in front of the query; most RDBMSs have some version of this.

    EXPLAIN SELECT * FROM clients WHERE age = 34
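    To try this without a MySQL server, here is a sketch using SQLite through Python's sqlite3 module. SQLite's version of the statement is EXPLAIN QUERY PLAN, and its output format differs from MySQL's. The index name and the plan helper are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (pk INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_clients_age ON clients (age)")  # hypothetical name


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


indexed = plan("SELECT * FROM clients WHERE age = 34")
full_scan = plan("SELECT * FROM clients")
print(indexed)
print(full_scan)
```

    In SQLite's wording, SEARCH ... USING INDEX means the index is used, while SCAN means the whole table is read.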

  • EXPLAIN

    select_type  table    type   possible_keys  key  rows     extra
    SIMPLE       clients  range  age            age  1342561  Using where

    MySQL example; different for others

    EXPLAIN SELECT * FROM clients WHERE age >= 20 AND age

  • EXPLAIN

    select_type  table    type   possible_keys  key  rows  extra
    SIMPLE       clients  range  age            age  1341  Using where

    MySQL example; different for others

    EXPLAIN SELECT * FROM clients WHERE age >= 20 AND age

  • MySQL types

    Some of the more important types:

    eq_ref - tree traversal only, unique index

    ref/range - tree traversal, then leaf node traversal

    index - the entire index is scanned (leaf node traversal)

    ALL - full table scan, everything is read

  • EXPLAIN

    EXPLAIN SELECT * FROM clients WHERE age = 34

    select_type  table    type  possible_keys  key  rows  extra
    SIMPLE       clients  ref   age            age  1

    MySQL example; different for others

  • EXPLAIN

    EXPLAIN SELECT * FROM clients WHERE id = 2

    select_type  table    type   possible_keys  key      rows  extra
    SIMPLE       clients  const  PRIMARY        PRIMARY  1

    MySQL example; different for others

  • Database Query

    Takeaway: understand your RDBMS's equivalent to:

    1. Tree traversal - O(log n), fast

    2. Full leaf node chain traversal - O(n), slow

    3. Range leaf node chain traversal - O(k), ok la

    4. Full table scan

  • What drawbacks?

    INSERT, UPDATE, DELETE

  • INSERT

    Adding an index means INSERT operations now have more work to do.

    Find a table block to store the new data

    Update the index (e.g. balance the tree) for each index on the table!

    SQL Performance Explained, Page 160

  • INSERT

    In practice, not that bad.

    Speed is affected by size of table and number of indexes

    MySQL documentation: size of table slows down index insert by log(N)

    Point: don't add redundant or unnecessary indexes.
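    A toy model of why every extra index adds work to each INSERT. The Table class is invented for this sketch, and a sorted Python list stands in for each b-tree, so the per-index cost here is not the log(N) a real b-tree gives; the sketch only counts how many index structures each insert has to touch.

```python
from bisect import insort


class Table:
    """Toy table: a heap of rows plus one sorted list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []  # heap: rows in arrival order
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0

    def insert(self, row):
        # 1. Find a place to store the new row (here: append to the heap).
        row_id = len(self.rows)
        self.rows.append(row)
        # 2. Update every index on the table.
        for col, entries in self.indexes.items():
            insort(entries, (row[col], row_id))
            self.index_writes += 1


lean = Table(["age"])
heavy = Table(["age", "name", "gender"])
for t in (lean, heavy):
    t.insert({"name": "Binh Dang", "age": 23, "gender": "Male"})
    t.insert({"name": "Connor Tan", "age": 34, "gender": "Male"})

print(lean.index_writes, heavy.index_writes)
```

    Two inserts cost two index writes with one index and six with three, which is the arithmetic behind dropping indexes nobody queries.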

  • DELETE

    DELETE benefits from the WHERE clause.

    It is like a SELECT, but with the extra step of deleting the row and rebalancing the index, for each index on the table

  • UPDATE

    UPDATE performance = DELETE + INSERT

  • More complex indexes

    Stuff you should read up on.

  • Concatenated Indexes

    A concatenated index is an index over multiple columns

    Consider the following table:

    pk  first_name  last_name  age
    1   Binh        Nguyen     23
    2   Connor      Tan        34

    We query first_name and last_name a lot

  • Concatenated Indexes

    We want to index the first_name and last_name

    Are the following two indexes the same?

    ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`);

    ALTER TABLE `clients` ADD INDEX (`last_name`, `first_name`);

  • Concatenated Indexes

    ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`);

    This query will benefit from the index:

    SELECT * FROM clients WHERE first_name = 'Binh' AND last_name = 'Nguyen'

    This query will not benefit from the index:

    SELECT * FROM clients WHERE last_name = 'Nguyen'

  • Concatenated Indexes

    (`first_name`, `last_name`)

    first_name  last_name
    Binh        Dang
    Binh        Nguyen
    Binh        Pham
    Connor      Chan
    Connor      Tan

    (`last_name`, `first_name`)

    last_name  first_name
    Nguyen     Huy
    Nguyen     Trung
    Nguyen     Tuan
    Tan        Bob
    Tan        Jonathan

  • Concatenated Indexes

    Index (`first_name`, `last_name`):

    SELECT * FROM clients WHERE first_name = 'Binh' AND last_name = 'Nguyen'

  • Concatenated Indexes

    Index (`first_name`, `last_name`):

    SELECT * FROM clients WHERE last_name = 'Nguyen'

    ??

  • Concatenated Indexes

    Index (`last_name`, `first_name`):

    SELECT * FROM clients WHERE last_name = 'Nguyen'
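    The column-order effect can be checked with a small experiment. This sketch uses SQLite via Python's sqlite3 module rather than MySQL, so the plan text is SQLite's; the index name idx_first_last and the plan helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients "
    "(pk INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, age INTEGER)"
)
# Concatenated index with first_name as the left-most column.
conn.execute("CREATE INDEX idx_first_last ON clients (first_name, last_name)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


both = plan("SELECT * FROM clients WHERE first_name = 'Binh' AND last_name = 'Nguyen'")
first_only = plan("SELECT * FROM clients WHERE first_name = 'Binh'")
last_only = plan("SELECT * FROM clients WHERE last_name = 'Nguyen'")
print(both)
print(first_only)
print(last_only)
```

    The first two queries start with the left-most column and can search the index; the last one cannot, because the entries are not sorted by last_name alone.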

  • How is this useful?

    Useful when you have associations

    e.g. client has many groups

    Principle: index in a way such that the left-most column of the index is always used.

    pk client_id group_id

    1 1 3

    2 1 4

    (client_id, group_id)

  • Functions & Indexes

    Some databases have functions.

    e.g.: UPPER, LOWER

    If you have an index on `name` and do this:

    SELECT * FROM clients WHERE UPPER(name) = 'HUY NGUYEN'

    will it use the index?

    Answer: NO!

  • Functions & Indexes

    The index is unable to look at the result of functions

    But in some DBs you can create a function-based index:

    CREATE INDEX up_name ON clients (UPPER(name))

    MySQL does not have function-based indexing before 8.0.13 (5.7 can index a generated column instead). LOL
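    A sketch of the before and after, run against SQLite through Python's sqlite3 module. This assumes the bundled SQLite is version 3.9 or newer, which is when indexes on expressions were added. The up_name index is the one from the slide; idx_name, the age column, and the plan helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (pk INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
# A plain index on name (hypothetical index name).
conn.execute("CREATE INDEX idx_name ON clients (name)")

query = "SELECT * FROM clients WHERE UPPER(name) = 'HUY NGUYEN'"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# With only the plain index, the function call hides the column from the index.
before = plan(query)

# Function-based index, as on the slide.
conn.execute("CREATE INDEX up_name ON clients (UPPER(name))")
after = plan(query)

print(before)
print(after)
```

    The query text is identical in both runs; only the presence of an index built on the same expression changes the plan.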

  • Conclusion (Yay)

  • What Did You Learn?

    What is an index?

    How do indexes work?

    What does an indexed query consist of?

    The EXPLAIN statement

    Drawbacks of indexes

    Concatenated indexes

    Indexing database functions

  • More Stuff

    http://use-the-index-luke.com/

    There is quite a bit more to indexes than this talk

    But the basics are now covered

    Enjoy speedier queries =)