explain explain

Explaining EXPLAIN: Queries and what happens when you execute them. Kristian Köhntopp [email protected]

Upload: isotopp

Post on 18-Jul-2015


TRANSCRIPT

Page 1: Explain explain

Explaining EXPLAIN: Queries and what happens when you execute them.

Kristian Köhntopp

[email protected]

Page 2: Explain explain

What is an index?

Page 3: Explain explain

Library Example

Page 4: Explain explain

How to find data… in a library

• Assume the following:
  • We have one million books in no particular order.
  • Task #1: Find all books written by Larry Wall.
  • Task #2: Do we have any books by Larry Wall at all?

• How many books do you need to touch to answer?

Page 5: Explain explain

How to find data… fast.

Page 6: Explain explain

Logarithmic complexity

• 2^x = n, for n = 1 000, 1 000 000, 1 000 000 000 books
  • 2^10 = 1024 ~ 1 000 (Kilo)
  • 2^20 ~ 1 000 000 (Mega)
  • 2^30 ~ 1 000 000 000 (Giga)

• x = log2(n) = ln(n)/ln(2) lookups to a sorted author file.
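The halving argument can be checked with a few lines of Python. This is only a sketch: the function name `lookups_needed` is made up for illustration, and it assumes a plain binary search over a sorted list of authors.

```python
import math

def lookups_needed(n: int) -> int:
    """Worst-case number of halving steps to find one entry among n sorted entries."""
    return math.ceil(math.log2(n))

for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} books -> {lookups_needed(n)} lookups")
```

Each factor of 1 000 in the collection size adds only about 10 lookups.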

Page 7: Explain explain

The Index…

[Figure: a .MYD file holding records, some of them deleted, and a .MYI file holding Index 1 and Index 2]

Page 8: Explain explain

Logarithmic complexity

• One Index Block = Fanout > 2
  • size(block)/size(index record)
  • e.g. 16K/32 = 512

• Fanout 50-500
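The fanout figure is a plain division. A minimal sketch, assuming every index record has the same fixed size and ignoring any per-block overhead (the function name `fanout` is illustrative):

```python
def fanout(block_size_bytes: int, index_record_bytes: int) -> int:
    """How many index records fit into one index block."""
    return block_size_bytes // index_record_bytes

# The example from the slide: 16K blocks, 32 byte index records.
print(fanout(16 * 1024, 32))
```

Longer keys mean fewer records per block and therefore a lower fanout, which is why the slide quotes a range rather than a single number.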

Page 9: Explain explain

How many lookups to find anything in a database?

• Bad case: x = ln(1 000 000 000)/ln(50) ≈ 5.3
• Good case: x = ln(1 000 000)/ln(500) ≈ 2.2

Page 10: Explain explain


~4 lookups is not a bad estimate in many cases
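The same estimate, sketched in Python. With a fanout of f, each index level narrows the search by a factor of f, so the depth is ln(rows)/ln(f). The helper name `index_depth` is an assumption made for this example.

```python
import math

def index_depth(rows: int, fanout: int) -> float:
    """Estimated number of index levels (block reads) needed to reach one row."""
    return math.log(rows) / math.log(fanout)

bad = index_depth(1_000_000_000, 50)   # many rows, low fanout
good = index_depth(1_000_000, 500)     # fewer rows, high fanout

print(f"bad case:  {bad:.1f} levels")
print(f"good case: {good:.1f} levels")
```

Real trees have a whole number of levels, so these values round up to somewhere between 3 and 6 block reads.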

Page 11: Explain explain

The number of disk accesses to find anything through an index (no caching)

3-4

Page 12: Explain explain

Going to the disk

Page 13: Explain explain

Access times

Page 14: Explain explain

Access times

Page 15: Explain explain

Access times

Page 16: Explain explain

Access times

• Disk
  • Linear read/write: 100-200 MB/s
    • 200k - 400k records/s, 500 Byte records
  • Random read/write: 100-200 IOPS (5-10ms)
    • Yes, that’s about a million times slower than RAM.

Page 17: Explain explain

Access times

• SSD
  • Linear read/write: 200-400 MB/s
    • 400k - 800k records/s, 500 Byte records
  • Random read/write: 10k-20k IOPS
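The figures on these two slides follow from simple arithmetic. A rough sketch, assuming 1 MB = 1 000 000 bytes and using made-up helper names:

```python
def records_per_second(mb_per_s: float, record_bytes: int = 500) -> float:
    """Records streamed per second by a linear read at the given throughput."""
    return mb_per_s * 1_000_000 / record_bytes

def ms_per_random_io(iops: float) -> float:
    """Average time for one random access, in milliseconds."""
    return 1000.0 / iops

print(records_per_second(100))   # disk, low end of linear throughput
print(ms_per_random_io(200))     # disk, high end of random IOPS
print(ms_per_random_io(10_000))  # SSD, low end of random IOPS

# An uncached index lookup with about 4 random reads on a 100 IOPS disk:
print(4 * ms_per_random_io(100), "ms")
```

On a rotating disk, one uncached index lookup costs tens of milliseconds, which is why caching the upper index levels matters so much.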

Page 18: Explain explain

Fast queries are about not going to the disk.

Page 19: Explain explain

Missing indexes?

Page 20: Explain explain

Am I using enough indexes?

Page 21: Explain explain

Am I using enough indexes?

Bad

Page 22: Explain explain

Am I using enough indexes?

• show global status like 'handler_%';

Page 23: Explain explain

Am I using enough indexes?

• show global status like 'handler_%';

Index

Scan

Page 24: Explain explain

Am I using enough indexes?

• Index: Handler_read_first + Handler_read_key + Handler_read_next + Handler_read_prev

• Scan: Handler_read_rnd + Handler_read_rnd_next
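One way to turn the two counter groups into a single number is the share of row reads that came from scans. This is a sketch only: the `sample` values are invented, and in practice the dictionary would be filled from the output of `show global status like 'handler_%'`.

```python
INDEX_COUNTERS = (
    "Handler_read_first",
    "Handler_read_key",
    "Handler_read_next",
    "Handler_read_prev",
)
SCAN_COUNTERS = ("Handler_read_rnd", "Handler_read_rnd_next")

def scan_fraction(status: dict) -> float:
    """Share of handler reads that were scan reads (0.0 = all reads went through an index)."""
    index_reads = sum(status.get(name, 0) for name in INDEX_COUNTERS)
    scan_reads = sum(status.get(name, 0) for name in SCAN_COUNTERS)
    total = index_reads + scan_reads
    return scan_reads / total if total else 0.0

# Invented counter values, for illustration only.
sample = {
    "Handler_read_first": 10,
    "Handler_read_key": 700,
    "Handler_read_next": 200,
    "Handler_read_prev": 90,
    "Handler_read_rnd": 100,
    "Handler_read_rnd_next": 900,
}
print(f"{scan_fraction(sample):.0%} of row reads came from scans")
```

The counters are cumulative since server start, so comparing two snapshots taken some time apart gives a better picture of the current workload.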

Page 25: Explain explain

An index is missing…

• … and data cannot be kept in memory:
  • Plenty of disk read activity
  • “Slow queries”

• … and data can be kept in memory:
  • High CPU, maybe not even slow query log entries

Page 26: Explain explain

sys Schema

Page 27: Explain explain

PERFORMANCE_SCHEMA (“P_S”)

• P_S appeared in 5.5, useful since 5.6
• Wait-Free, Allocation-Free Interface
  • “rather be inaccurate or incomplete than slow down the server”
• Hard to read units, hard to decode references
• Query Interface (“select * from …”)

Page 28: Explain explain

sys

• http://github.com/MarkLeith/mysql-sys
  • started out as ps_helper, renamed “sys”
  • Part of a standard install since 5.7.7

• Series of views on p_s, installed in sys.*
  • additional stored procedures to handle p_s config

Page 29: Explain explain

sys

• All views as name and x$name
  • name for human consumption
    • useful units and formatting
  • x$name for programs
    • same data, no units and formatting

Page 30: Explain explain

sys

• Am I using enough indexes?
  • schema_tables_with_full_table_scans
  • statements_with_full_table_scans

Page 31: Explain explain

Running Queries…

Page 32: Explain explain

SELECT * FROM a JOIN b ON a.aid = b.aid WHERE <acond> AND <bcond>

• MySQL always uses “nested loop join”
  • for each row in a matching <acond>
    • look up aid in b, using b.aid index,
    • filter more, using <bcond>, …

• If you can use indices:
  • Fast and memory efficient, streamable
• Otherwise: Horrible.
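The loop structure described above can be modelled in a few lines of Python. This is a toy model, not MySQL's implementation: tables are lists of dicts, the index on b.aid is a dictionary, and `acond` and `bcond` are arbitrary predicate functions standing in for the two conditions.

```python
from collections import defaultdict

def nested_loop_join(a_rows, b_rows, acond, bcond):
    """Yield (a, b) pairs joined on aid, one at a time (streamable)."""
    # Stand-in for the index on b.aid: aid -> list of matching rows in b.
    b_index = defaultdict(list)
    for b in b_rows:
        b_index[b["aid"]].append(b)

    for a in a_rows:                           # outer loop over table a
        if not acond(a):
            continue
        for b in b_index.get(a["aid"], []):    # index lookup instead of a scan of b
            if bcond(b):                       # filter more, using <bcond>
                yield (a, b)

a_rows = [{"aid": 1, "x": 5}, {"aid": 2, "x": 50}]
b_rows = [{"aid": 1, "y": "p"}, {"aid": 1, "y": "q"}, {"aid": 2, "y": "p"}]

result = list(nested_loop_join(a_rows, b_rows,
                               acond=lambda a: a["x"] < 10,
                               bcond=lambda b: b["y"] == "p"))
print(result)
```

Without the dictionary, the inner loop would have to scan all of b for every row of a, which is the "horrible" case.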

Page 33: Explain explain

EXPLAIN output

• Fixed plan: “Nested Loop Join”
• Variables:
  • Table order
  • Lookup methods

Page 34: Explain explain

EXPLAIN this…

EXPLAIN SELECT …
FROM B_Rate_Room_Directory brrd
JOIN B_CP_PolicyGroup pg ON ( pg.object_id = brrd.default_policy_group_id and pg.eff_status = 'C' )
JOIN B_CP_PolicyGroupType pgt ON ( pg.pg_type_id = pgt.pg_type_id )
JOIN Translation t ON ( t.item_id = pg.pg_type_id and t.language = 'en' and t.groupname = 'policy_group_name' )
WHERE brrd.active <> 0
  and brrd.room_id = '25725602'
  and brrd.default_policy_group_id IS NOT NULL
  and brrd.default_policy_group_id <> 0

Page 35: Explain explain

EXPLAIN this…

Page 36: Explain explain

EXPLAIN this…

all simple joins, no unions or subselects.

Page 37: Explain explain

EXPLAIN this…

Join order, may be different from order of tables as written in the SELECT

Page 38: Explain explain

EXPLAIN this…

Connection between tables

Page 39: Explain explain

EXPLAIN this…

Keys that could be used to resolve the query

Page 40: Explain explain

EXPLAIN this…

Actual key chosen and the length of the prefix in bytes

Page 41: Explain explain

EXPLAIN this…

Badness (2 x 1 x 1314 x 1): number of lookup operations

Page 42: Explain explain

EXPLAIN this…

Additional messages about query resolution

Page 43: Explain explain

EXPLAIN this…

Page 44: Explain explain

Join Types

• The type can be:
  • system, const - table has at most one matching row. The optimizer replaces the table with the value.
  • eq_ref - compare using '=' operator, using a 1:1 relationship.
  • ref, ref_or_null - using an index, but not on a 1:1 basis.
  • fulltext - using a MyISAM FULLTEXT index.
  • index_merge - using two (or more) indexes, intersecting the two partial results.

Page 45: Explain explain

Join Types

• The type can be:
  • range - using a part of the index to select values.
      SELECT … FROM t WHERE a < …
      SELECT … FROM t WHERE id IN (…, …, …)
      SELECT … FROM t WHERE v BETWEEN … AND …
  • index - A full scan on an index. possible_keys is NULL, key is not.
  • ALL - A full table scan.

Page 46: Explain explain

An unusual example

• select translation_id, translation, status
  from B_CP_Translation
  where language_code = 'fr' and project_id = '0'

No compound index, two individual indices

Page 47: Explain explain

An unusual example

• Explain Result:

             id: 1
    select_type: SIMPLE
          table: B_CP_Translation
           type: index_merge
  possible_keys: language_code,project_id
            key: language_code,project_id
        key_len: 6,4
            ref: NULL
           rows: 734
          Extra: Using intersect(language_code,project_id); Using where
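Conceptually, "Using intersect" means that each index delivers a set of row ids and only the rows present in both sets are fetched. A toy sketch, assuming each single-column index is a dictionary from value to row ids (the data below is invented):

```python
def index_merge_intersect(index_a, key_a, index_b, key_b):
    """Row ids that match key_a in the first index AND key_b in the second."""
    ids_a = set(index_a.get(key_a, ()))
    ids_b = set(index_b.get(key_b, ()))
    return sorted(ids_a & ids_b)

# Invented index contents: value -> row ids.
language_code_index = {"fr": [1, 2, 5, 8], "en": [3, 4]}
project_id_index = {"0": [2, 3, 8, 9], "1": [1, 4, 5]}

rows = index_merge_intersect(language_code_index, "fr", project_id_index, "0")
print(rows)
```

A compound index on (language_code, project_id) would deliver the same rows from one index lookup, without building and intersecting two partial results.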

Page 48: Explain explain

Extra: Temporary tables

• Things that are not so good:
  • using filesort - extra sorting is necessary, rows not retrieved in requested order. This can force an extra lookup, or a table materialization (using temporary also shown).
  • using temporary - the result is pushed into a temp table at some point, and then processed further. The temp table can be in memory or go to disk.

Page 49: Explain explain

Extra: Temporary Tables

10:1 or better

Page 50: Explain explain

Extra: Temporary Tables

• sys query:
    select *
    from sys.statements_with_temp_tables
    where db not in ('sys', 'mysql', 'information_schema', 'performance_schema');

Page 51: Explain explain

Other extra notices

• Not so bad news:
  • Impossible WHERE noticed after reading const tables - there is no work left to do.
  • Select tables optimized away - the query contained aggregates that could be resolved at the optimizer stage already. There is no work left to do.
  • Using index - Covering index, data need not be read.
  • Distinct - implementing 'distinct' operator using a loose index scan.
  • Using where - a where clause is being used to filter down a result from an index even more.

Page 52: Explain explain

When and how to index

Page 53: Explain explain

When to index

• Things to remember:
  • The point of an index: Fewer disk operations.
  • Rows are on disk in pages, data is always read in pages.
  • If a wanted row is in a page, all of the page is read.

Page 54: Explain explain

When to index

• Selectivity:
  • The percentage of rows selected by a condition.
• SELECT id, name FROM Person WHERE sex = 'f'
  • Cardinality: 2 (or 3, when nullable)
  • Assumption: Equidistribution. Is it valid for our data?
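The equidistribution assumption is easy to test against real data. A small sketch with an invented, deliberately skewed column; both function names are made up for this example:

```python
def estimated_selectivity(cardinality: int) -> float:
    """Selectivity of an equality condition if all values are equally frequent."""
    return 1.0 / cardinality

def actual_selectivity(values, wanted) -> float:
    """Fraction of rows that really match the wanted value."""
    return sum(1 for v in values if v == wanted) / len(values)

# Invented sample: 9 of 10 rows have sex = 'f'.
sex_column = ["f"] * 9 + ["m"]

print(estimated_selectivity(2))             # what equidistribution predicts
print(actual_selectivity(sex_column, "f"))  # what the data says
```

When a condition selects most of the table, nearly every page has to be read anyway, so an index on that column alone saves little.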

Page 55: Explain explain

INDEX(a,b,c)

• Compound index on three columns.
• Used left to right, for a conjunction (AND) of any number of equalities (eq_ref/ref) and one trailing relation (range).
  • One point or subtree in the index.

Page 56: Explain explain

INDEX(a,b,c)

• WHERE a = 10 AND b = 20 AND c = 30
  • uses (a,b,c)
• WHERE a = 10 AND b = 20 ORDER BY c
  • uses (a,b,c)
• WHERE a = 10 AND b < 20 ORDER BY c
  • uses (a,b)

Page 57: Explain explain

INDEX(a,b,c)

• WHERE a<10 AND b = 10
  • uses (a)
• WHERE a<10 AND b<10
  • uses (a)
• WHERE (a-10)<0
  • no index used
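The left-to-right rule can be written down as a small function that reproduces the six cases on these two slides. It is a simplified model, not the optimizer's actual logic: predicates are given as a dictionary from column to "eq" or "range", and a condition on an expression such as (a-10)<0 is modelled as no predicate on the bare column.

```python
def usable_prefix(index_columns, predicates, order_by=None):
    """Return the leading index columns a query can use.

    Equalities extend the prefix, the first range condition ends it,
    and ORDER BY helps only if every earlier column is fixed by an equality.
    """
    used = []
    for col in index_columns:
        kind = predicates.get(col)
        if kind == "eq":
            used.append(col)
        elif kind == "range":
            used.append(col)
            break                     # nothing after a range can be used
        else:
            if col == order_by:
                used.append(col)      # index order delivers the sort
            break
    return tuple(used)

idx = ("a", "b", "c")
print(usable_prefix(idx, {"a": "eq", "b": "eq", "c": "eq"}))
print(usable_prefix(idx, {"a": "eq", "b": "eq"}, order_by="c"))
print(usable_prefix(idx, {"a": "eq", "b": "range"}, order_by="c"))
print(usable_prefix(idx, {"a": "range", "b": "eq"}))
print(usable_prefix(idx, {"a": "range", "b": "range"}))
print(usable_prefix(idx, {}))  # WHERE (a-10)<0: no predicate on the bare column
```

For the last case, rewriting the condition so that the column stands alone on one side (a < 10) is what makes the index usable again.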

Page 58: Explain explain

Are we done, yet?

Page 59: Explain explain

Is my query optimized enough?

Remember this badness calculation? It is supposed to be the number of lookups.

How many lookups are minimal?

Page 60: Explain explain

Is my query optimized enough?

2 x 1 x 1314 x 1 ~ 2600 Lookups

Raw (before grouping, limit) result set size?

2600 lookups for 2600 result rows are ok. 2600 lookups for 26 result rows probably have potential for optimization.
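The badness figure is the product of the per-table row estimates from the EXPLAIN output, and the check is to compare it with the raw result set size. A minimal sketch; the function names are illustrative and the result sizes are the ones used in the example above.

```python
import math

def badness(rows_per_table):
    """Estimated number of lookups: product of the 'rows' column over all joined tables."""
    return math.prod(rows_per_table)

def lookups_per_result_row(lookups: int, result_rows: int) -> float:
    """How many lookups the server performs for each row it finally returns."""
    return lookups / result_rows

lookups = badness([2, 1, 1314, 1])
print(lookups)                               # roughly 2600
print(lookups_per_result_row(lookups, 2600)) # close to 1: fine
print(lookups_per_result_row(lookups, 26))   # about 100: worth a closer look
```

A ratio near 1 means the plan touches little more than what it returns; a ratio of 100 suggests that a better index or join order could discard rows earlier.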