Slide 1

Slide 1 text

MySQL Index

Slide 2

Slide 2 text

MySQL Server Overview Clients Connection/Thread Handler Query Cache Parser Optimizer Storage Engine (InnoDB)

Slide 3

Slide 3 text

What is a Database Index? ● a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure (Wikipedia)

Slide 4

Slide 4 text

How Table Data Is Stored 19 17 15 13 0C 06 Field Start Offsets /* First Row */ 00 00 78 0D 02 BF Extra Bytes 00 00 00 00 04 21 System Column #1 00 00 00 00 09 2A System Column #2 80 00 00 00 2D 00 84 System Column #3 50 50 Field1 'PP' 50 50 Field2 'PP' 50 50 Field3 'PP' 16 15 14 13 0C 06 Field Start Offsets /* Second Row */ 00 00 80 0D 02 E1 Extra Bytes 00 00 00 00 04 22 System Column #1 00 00 00 00 09 2B 80 System Column #2 00 00 00 2D 00 84 System Column #3 51 Field1 'Q' 51 Field2 'Q' 51 Field3 'Q' 94 94 14 13 0C 06 Field Start Offsets /* Third Row */ 00 00 88 0D 00 74 Extra Bytes 00 00 00 00 04 23 System Column #1 00 00 00 00 09 2C System Column #2 80 00 00 00 2D 00 84 System Column #3 52 Field1 'R' CREATE TABLE T (FIELD1 VARCHAR(3), FIELD2 VARCHAR(3), FIELD3 VARCHAR(3)) Type=InnoDB; INSERT INTO T VALUES ('PP', 'PP', 'PP'); INSERT INTO T VALUES ('Q', 'Q', 'Q'); INSERT INTO T VALUES ('R', NULL, NULL);

Slide 5

Slide 5 text

B+tree Node ● N-1 keys ● N references to child node ● All keys are sorted Leaf node ● points to data in the disk ● same distance from the root B+ Tree with N = 100 with tree depth 5 can support 10,000,000,000 records

Slide 6

Slide 6 text

B+tree (Read) ● Read 10 ● Read 5 ● Read 10 ~ 15

Slide 7

Slide 7 text

B+tree (Insert) ● Insert 6 ● Insert 30 6 30 30

Slide 8

Slide 8 text

Benefits of Indexes ● Reduce the amount of scan on the disk ● Avoid sorting ● Random IO -> Sequential IO

Slide 9

Slide 9 text

Why Not Index Every Column(Index shotgun!) ● Some are never used ● Takes space on the disk and memory buffer ● Overhead during insert/update/delete ● Overhead from query optimizer

Slide 10

Slide 10 text

Types of Indexes ● Clustered (primary key) ○ Leaf node holds physical page of the row ● Secondary ○ Leaf node holds primary key value and selected column ● Composite (multicolumn) ○ index(a,b) ● Covering ○ all data selected in query are ‘covered’ by indexes ● Fulltext Index (CHAR, VARCHAR, TEXT) ○ Inverted index (not b+tree)

Slide 11

Slide 11 text

Composite Index CREATE TABLE test ( id INT NOT NULL, last_name CHAR(30) NOT NULL, first_name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (last_name,first_name) ); Good SELECT * FROM test WHERE last_name='Widenius' AND first_name='Michael'; SELECT * FROM test WHERE last_name='Widenius' AND (first_name='Michael' OR first_name='Monty'); Bad SELECT * FROM test WHERE first_name='Michael'; SELECT * FROM test WHERE last_name='Widenius' OR first_name='Michael';

Slide 12

Slide 12 text

Covering Index ● all data selected in query are ‘covered’ by indexes CREATE TABLE test2 ( id INT NOT NULL, Continent CHAR(30) NOT NULL, Population CHAR(30) NOT NULL, Name CHAR(30) NOT NULL, PRIMARY KEY (id), INDEX name (Contient ,Population, Name) ); Good SELECT Name FROM test2 WHERE Continent=’Asia’' AND Population > 100000; Bad SELECT * FROM test2 WHERE Continent='Asia';

Slide 13

Slide 13 text

Fulltext Index (Inverted Index)

Slide 14

Slide 14 text

Selecting Primary Key ● Best Choice (in my opinion): unsigned int with auto_increment ● “Natural” keys such as social security number ● UUID/GUID/MD5 ● No PK

Slide 15

Slide 15 text

Index Design ● Design index around queries ● Profile slow queries ● Dive deep with analysis tool

Slide 16

Slide 16 text

Questions?

Slide 17

Slide 17 text

Cost of Sparse Column ● A column that mostly holds NULL and rarely holds real data