explain explain

Explaining EXPLAIN: Queries and what happens when you execute them.

Kristian Köhntopp

[email protected]

What is an index?

Library Example

How to find data… in a library

• Assume the following:
  • We have one million books in no particular order.
  • Task #1: Find all books written by Larry Wall.
  • Task #2: Do we have any books by Larry Wall at all?

• How many books do you need to touch to answer?

How to find data… fast.

Logarithmic complexity

• $2^x = n$, for n = 1 000, 1 000 000, 1 000 000 000 books
  • $2^{10} = 1024 \approx 1\,000$ (Kilo)
  • $2^{20} \approx 1\,000\,000$ (Mega)
  • $2^{30} \approx 1\,000\,000\,000$ (Giga)

• $x = \log_2(n) = \ln(n)/\ln(2)$ lookups to a sorted author file.
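The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `lookups_sorted` is made up for illustration, and it assumes a plain binary search over the sorted author file, with the result rounded up to whole lookups.

```python
import math


def lookups_sorted(n: int) -> int:
    """Approximate number of probes for a binary search over n sorted entries.

    This is x = log2(n), rounded up to a whole number of lookups.
    """
    return math.ceil(math.log2(n))


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} books: about {lookups_sorted(n)} lookups")
```

Each factor of 1 000 in the number of books adds only about ten more lookups, which is the whole point of sorting.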

The Index…

[Figure: a .MYD file holding the records, some of them marked deleted, and a .MYI file holding Index 1 and Index 2.]

Logarithmic complexity

• One Index Block = Fanout > 2
  • size(block) / size(index record)
  • e.g. 16K / 32 = 512

• Fanout 50-500

How many lookups to find anything in a database?

• Bad case: $x = \ln(1\,000\,000\,000)/\ln(50) \approx 5.3$

• Good case: $x = \ln(1\,000\,000)/\ln(500) \approx 2.2$

~4 is not a bad estimate in many cases.

The number of disk accesses to find anything through an index (no caching):

3-4
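The estimate follows from the fanout: an index with fanout f over n rows is about log_f(n) levels deep. The sketch below is an illustration only, not code from the talk. The function name `index_lookups` is an assumption, and the model ignores caching and partially filled blocks.

```python
import math


def index_lookups(rows: int, fanout: int) -> float:
    """Estimated index depth: logarithm of the row count to the base `fanout`."""
    return math.log(rows) / math.log(fanout)


# The two cases from the slides: many rows with a low fanout,
# and fewer rows with a high fanout.
bad = index_lookups(1_000_000_000, 50)
good = index_lookups(1_000_000, 500)

print(f"bad case:  {bad:.1f} lookups")
print(f"good case: {good:.1f} lookups")
```

Even across a factor of 1 000 in table size and a factor of 10 in fanout, the depth only moves between roughly two and five levels.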

Going to the disk

Access times

• Disk
  • Linear read/write: 100-200 MB/s
    • 200k-400k records/s, 500 byte records
  • Random read/write: 100-200 IOPS (5-10 ms)
    • Yes, that’s about a million times slower than RAM.

Access times

• SSD
  • Linear read/write: 200-400 MB/s
    • 400k-800k records/s, 500 byte records
  • Random read/write: 10k-20k IOPS
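The records-per-second and latency figures are simple divisions. A small sketch, assuming 1 MB = 10^6 bytes and the 500 byte records from the slides; the function names are invented for this example.

```python
RECORD_SIZE = 500  # bytes per record, as assumed on the slides


def records_per_second(mb_per_s: float, record_size: int = RECORD_SIZE) -> float:
    """Records streamed per second at a given linear throughput (1 MB = 10**6 bytes)."""
    return mb_per_s * 1_000_000 / record_size


def random_access_ms(iops: float) -> float:
    """Average time per random access in milliseconds for a given IOPS rate."""
    return 1000.0 / iops


# Throughput and IOPS ranges taken from the slides.
devices = {
    "disk": {"mb_per_s": (100, 200), "iops": (100, 200)},
    "ssd": {"mb_per_s": (200, 400), "iops": (10_000, 20_000)},
}

for name, spec in devices.items():
    lo, hi = spec["mb_per_s"]
    slow, fast = spec["iops"]
    print(
        f"{name}: {records_per_second(lo):,.0f}-{records_per_second(hi):,.0f} records/s linear, "
        f"{random_access_ms(fast):.2f}-{random_access_ms(slow):.2f} ms per random access"
    )
```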

Fast queries are about not going to the disk.

Missing indexes?

Am I using enough indexes?

• show global status like 'handler_%';

• Index: Handler_read_first + Handler_read_key + Handler_read_next + Handler_read_prev

• Scan: Handler_read_rnd + Handler_read_rnd_next
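The two sums above, which the slides label as index reads and scan reads, can be compared directly once the counters have been fetched. The following sketch assumes the status values are already available as a dictionary. The function name `index_vs_scan` and the sample numbers are made up for illustration and are not measurements from a real server.

```python
from typing import Mapping, Tuple

INDEX_COUNTERS = (
    "Handler_read_first",
    "Handler_read_key",
    "Handler_read_next",
    "Handler_read_prev",
)
SCAN_COUNTERS = ("Handler_read_rnd", "Handler_read_rnd_next")


def index_vs_scan(status: Mapping[str, int]) -> Tuple[int, int]:
    """Return (index reads, scan reads) summed from the handler counters."""
    index_reads = sum(status.get(name, 0) for name in INDEX_COUNTERS)
    scan_reads = sum(status.get(name, 0) for name in SCAN_COUNTERS)
    return index_reads, scan_reads


# Hypothetical sample values, only to exercise the function.
sample = {
    "Handler_read_first": 10,
    "Handler_read_key": 9_000,
    "Handler_read_next": 900,
    "Handler_read_prev": 90,
    "Handler_read_rnd": 500,
    "Handler_read_rnd_next": 1_500,
}

index_reads, scan_reads = index_vs_scan(sample)
print(f"index reads: {index_reads}, scan reads: {scan_reads}")
```

The counters are cumulative since server start, so in practice it is the change between two samples that shows what the server is doing right now.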

An index is missing…

• … and data cannot be kept in memory:
  • Plenty of disk read activity
  • “Slow queries”

• … and data can be kept in memory:
  • High CPU, maybe not even slow query log entries

sys Schema

PERFORMANCE_SCHEMA (“P_S”)

• P_S appeared in 5.5, useful since 5.6
• Wait-Free, Allocation-Free Interface
  • “rather be inaccurate or incomplete than slow down the server”
• Hard to read units, hard to decode references
• Query Interface (“select * from …”)

sys

• http://github.com/MarkLeith/mysql-sys
• started out as ps_helper, renamed “sys”
• Part of a standard install since 5.7.7
• Series of views on p_s, installed in sys.*
• additional stored procedures to handle p_s config

sys

• All views as name and x$name
  • name for human consumption
    • useful units and formatting
  • x$name for programs
    • same data, no units and formatting

sys

• Am I using enough indexes?
  • schema_tables_with_full_table_scans
  • statements_with_full_table_scans

Running Queries…

SELECT * FROM a JOIN b ON a.aid = b.aid WHERE <acond> AND <bcond>

• MySQL always uses “nested loop join”
  • for each row in a matching <acond>
    • look up aid in b, using b.aid index,
    • filter more, using <bcond>, …

• If you can use indices:
  • Fast and memory efficient, streamable

• Otherwise: Horrible.
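The loop described above can be sketched in Python. This is a toy model under stated assumptions and not how the server is implemented: tables are lists of dictionaries, a dictionary stands in for the b.aid index, and `acond` and `bcond` are ordinary predicate functions. All names are invented for the example.

```python
from collections import defaultdict


def nested_loop_join(a_rows, b_rows, acond, bcond):
    """Yield joined rows one at a time, as a nested loop join would."""
    # Stand-in for the index on b.aid: maps aid -> list of rows in b.
    b_index = defaultdict(list)
    for b in b_rows:
        b_index[b["aid"]].append(b)

    for a in a_rows:  # outer loop over table a
        if not acond(a):  # keep only rows matching <acond>
            continue
        for b in b_index.get(a["aid"], []):  # look up aid in b via the index
            if bcond(b):  # filter more, using <bcond>
                yield {**a, **b}  # rows are streamed, nothing is materialized


a_rows = [{"aid": 1, "name": "x"}, {"aid": 2, "name": "y"}, {"aid": 3, "name": "z"}]
b_rows = [{"aid": 1, "v": 10}, {"aid": 1, "v": 20}, {"aid": 2, "v": 30}]

result = list(
    nested_loop_join(a_rows, b_rows, lambda a: a["aid"] < 3, lambda b: b["v"] >= 20)
)
print(result)
```

Because the function is a generator, each joined row can be handed on as soon as it is found, which is what makes the indexed case memory efficient and streamable. Without the index, the inner step would have to scan all of b for every row of a.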

EXPLAIN output

• Fixed plan: “Nested Loop Join”
• Variables:
  • Table order
  • Lookup methods

EXPLAIN this…

• EXPLAIN SELECT …
  FROM B_Rate_Room_Directory brrd
  JOIN B_CP_PolicyGroupType pgt ON ( pg.pg_type_id = pgt.pg_type_id )
  JOIN B_CP_PolicyGroup pg ON ( pg.object_id = brrd.default_policy_group_id and pg.eff_status = 'C' )
  JOIN Translation t ON ( t.item_id = pg.pg_type_id and t.language = 'en' and t.groupname = 'policy_group_name' )
  WHERE brrd.active <> 0
    and brrd.room_id = '25725602'
    and brrd.default_policy_group_id IS NOT NULL
    and brrd.default_policy_group_id <> 0

[Figure: the EXPLAIN output for this query, annotated one part at a time.]

• All simple joins, no unions or subselects.
• Join order, may be different from the order of tables as written in the SELECT.
• Connection between tables.
• Keys that could be used to resolve the query.
• Actual key chosen and the length of the prefix in bytes.
• Badness (2 x 1 x 1314 x 1): number of lookup operations.
• Additional messages about query resolution.

Join Types

• The type can be:

• system, const - table has at most one matching row. The optimizer replaces the table with the value.

• eq_ref - compare using '=' operator, using a 1:1 relationship.

• ref, ref_or_null - using an index, but not on a 1:1 basis.

• fulltext - using a MyISAM FULLTEXT index.

• index_merge - using two (or more) indexes, intersecting the two partial results.

• range - using a part of the index to select values.
  SELECT … FROM t WHERE a < …
  SELECT … FROM t WHERE id IN (…, …, …)
  SELECT … FROM t WHERE v BETWEEN … AND …

• index - A full scan on an index. possible_keys is NULL, key is not.

• ALL - A full table scan.

An unusual example

• select translation_id, translation, status
  from B_CP_Translation
  where language_code = 'fr' and project_id = '0'

No compound index, two individual indices.

• Explain Result:

  id: 1
  select_type: SIMPLE
  table: B_CP_Translation
  type: index_merge
  possible_keys: language_code,project_id
  key: language_code,project_id
  key_len: 6,4
  ref: NULL
  rows: 734
  Extra: Using intersect(language_code,project_id); Using where

Extra: Temporary tables

• Things that are not so good:
  • using filesort - extra sorting is necessary, rows are not retrieved in the requested order. This can force an extra lookup, or a table materialization (using temporary is then also shown).
  • using temporary - the result is pushed into a temp table at some point, and then processed further. The temp table can be in memory or go to disk.

Extra: Temporary Tables

10:1 or better

• sys query:
  select *
  from sys.statements_with_temp_tables
  where db not in ('sys', 'mysql', 'information_schema', 'performance_schema');

Other extra notices

• Not so bad news:

• Impossible WHERE noticed after reading const tables - there is no work left to do.

• Select tables optimized away - the query contained aggregates that could be resolved at the optimizer stage already. There is no work left to do.

• Using index - Covering index, data need not be read.

• Distinct - implementing 'distinct' operator using a loose index scan.

• Using where - a where clause is being used to filter down a result from an index even more.

When and how to index

When to index

• Things to remember:
  • The point of an index: Fewer disk operations.
  • Rows are on disk in pages, data is always read in pages.
  • If a wanted row is in a page, all of the page is read.

When to index

• Selectivity:
  • The percentage of rows selected by a condition.
  • SELECT id, name FROM Person WHERE sex = 'f'
  • Cardinality: 2 (or 3, when nullable)
  • Assumption: Equidistribution. Is it valid for our data?
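Whether the equidistribution assumption holds can be checked by comparing the estimate with a count over the actual values. The sketch below uses a made-up, deliberately skewed sample column. The function names are hypothetical, and taking 1/cardinality as the estimate is a simplification of the assumption named above, not a description of the optimizer's exact arithmetic.

```python
from collections import Counter


def estimated_selectivity(cardinality: int) -> float:
    """Selectivity of an equality condition if all values are equally frequent."""
    return 1.0 / cardinality


def actual_selectivity(values, wanted) -> float:
    """Fraction of rows that really match `wanted`."""
    counts = Counter(values)
    return counts[wanted] / len(values)


# Hypothetical, deliberately skewed column: 90 rows 'f', 10 rows 'm'.
sex = ["f"] * 90 + ["m"] * 10

estimate = estimated_selectivity(len(set(sex)))
actual = actual_selectivity(sex, "f")
print(f"estimated: {estimate:.0%}, actual: {actual:.0%}")
```

With a condition that matches most of the table, nearly every page has to be read anyway, so an index on such a column saves little.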

INDEX(a,b,c)

• Compound index on three columns.
• Used left to right, for a conjunction (AND) of any number of equalities (eq_ref/ref) and one trailing relation (range).
  • One point or subtree in the index.

INDEX(a,b,c)

• WHERE a = 10 AND b = 20 AND c = 30
  • uses (a,b,c)

• WHERE a = 10 AND b = 20 ORDER BY c
  • uses (a,b,c)

• WHERE a = 10 AND b < 20 ORDER BY c
  • uses (a,b)

• WHERE a < 10 AND b = 10
  • uses (a)

• WHERE a < 10 AND b < 10
  • uses (a)

• WHERE (a-10) < 0
  • no index used
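The left-to-right rule behind these examples can be written down as a small function. This is a simplified model for illustration only: it covers WHERE conditions on bare columns, classified as "eq" or "range", and does not model ORDER BY or anything else the optimizer considers. The name `usable_prefix` is made up.

```python
def usable_prefix(index_cols, conditions):
    """Return the leading index columns that a WHERE clause can use.

    `conditions` maps a column name to "eq" (equality) or "range"
    (<, >, BETWEEN, ...). Columns wrapped in an expression, such as
    (a-10) < 0, are not usable and are simply left out of the mapping.
    """
    used = []
    for col in index_cols:
        kind = conditions.get(col)
        if kind == "eq":
            used.append(col)  # equality: keep going to the next column
        elif kind == "range":
            used.append(col)  # one trailing range is allowed, then stop
            break
        else:
            break  # gap in the prefix: the rest of the index is unusable
    return tuple(used)


index = ("a", "b", "c")
print(usable_prefix(index, {"a": "eq", "b": "eq", "c": "eq"}))
print(usable_prefix(index, {"a": "eq", "b": "range"}))
print(usable_prefix(index, {"a": "range", "b": "eq"}))
print(usable_prefix(index, {}))
```

The last call stands for WHERE (a-10) < 0: the condition is on an expression rather than on the column itself, so no part of the index applies.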

Are we done, yet?

Is my query optimized enough?

Remember this badness calculation? It is supposed to be the number of lookups.

How many lookups are minimal?

2 x 1 x 1314 x 1 ~ 2600 lookups

Raw (before grouping, limit) result set size?

2600 lookups for 2600 result rows are OK. 2600 lookups for 26 result rows probably have potential for optimization.
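The comparison can be made mechanical by multiplying the row estimates from the EXPLAIN output and dividing by the raw result set size. A minimal sketch with invented function names follows. The row estimates are the ones from the example, and the two result sizes are the ones named in the text.

```python
import math


def estimated_lookups(rows_per_table):
    """Product of the per-table row estimates from the EXPLAIN output."""
    return math.prod(rows_per_table)


def lookups_per_result_row(rows_per_table, result_rows):
    """How many lookups the plan spends per row of the raw result set."""
    return estimated_lookups(rows_per_table) / result_rows


rows = [2, 1, 1314, 1]  # row estimates from the example
print(estimated_lookups(rows))
for result_rows in (2600, 26):
    ratio = lookups_per_result_row(rows, result_rows)
    print(f"{result_rows} result rows: about {ratio:.0f} lookups per row")
```

A ratio near 1 means the plan does little wasted work; a ratio near 100 suggests that most lookups fetch rows that are filtered out later.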
