Slide 1

Slide 1 text

Test Slide sample code font {:;} Grokking Engineering

Slide 2

Slide 2 text

The Absolute Minimum Every Software Developer Needs To Know About Database Indexes

Slide 3

Slide 3 text

>whoami • Cedric Chin (aka Eli James) - @ejames_c • Project Manager, Floating Cube Studios (D7) • Programmer, not a database expert • But a heavy database user • Most of us are heavy database users

Slide 4

Slide 4 text

Relational Database Management Systems

Slide 5

Slide 5 text

This Talk • You will walk away with a basic understanding of database indexes. • In general! Not specific to a particular DB • A basic understanding is usually good enough • Developers need to know how to make queries fast • The rest of the DB can be a black box.

Slide 6

Slide 6 text

What are indexes?

Slide 7

Slide 7 text

Indexes Make Queries Fast • Think: how do you look up words in a dictionary? • You use an alphabetical index

Slide 8

Slide 8 text

Indexes Make Queries Fast • Indexes are conceptually similar. • But unlike a dictionary, databases are constantly updated • This means that indexes have to be constantly updated.

Slide 9

Slide 9 text

Indexes Make Queries Fast • In a RDBMs, indexes are like a map or guide that allows you to quickly find the data that you’re looking for. • But let’s start from the basics. Pretend we don’t care or know about how DBs work.

Slide 10

Slide 10 text

How does using an index feel like? A day in the life of Joe Random Dev

Slide 11

Slide 11 text

Creating an Index • You can create an index on any field. • Say you have a new, large table called clients: pk name age gender 1 Binh Dang 23 Male 2 Connor Tan 34 Male … … … … • All tables have an index on create: the primary key index

Slide 12

Slide 12 text

Creating an Index ALTER TABLE `clients` ADD INDEX (`age`); the database does some magic …

Slide 13

Slide 13 text

Creating an Index Now:
 SELECT * FROM clients WHERE age = 34
 is fast! pk name age gender 1 Binh Dang 23 Male 2 Connor Tan 34 Male … … … …

Slide 14

Slide 14 text

So how do indexes work? Data Structures Ahead

Slide 15

Slide 15 text

Two Data Structures • All database indexes consist of two data structures: • A doubly linked list • A balanced search tree • Note: this is just basic stuff! Real implementation has modifications.

Slide 16

Slide 16 text

Table Data • Database data is not organised sequentially on disk like books in a library. • (Or pages in a dictionary.) • Instead it is stored in blocks all across the disk.

Slide 17

Slide 17 text

Table Data • Table data is stored in a table block, in a heap structure • No relationships between table blocks or rows • Is not sorted. col1 col2 col3 Binh 23 A Connor 45 X Vivian 12 X Bob 98 A Visual example of table block

Slide 18

Slide 18 text

Ordered Table Data col1 col2 col3 Binh 67 A Connor 45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9 Index Leaf Nodes

Slide 19

Slide 19 text

Ordered Table Data col1 col2 col3 Binh 67 A Connor 45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9 Index Leaf Nodes

Slide 20

Slide 20 text

Ordered Table Data col1 col2 col3 Binh 67 A Connor 45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A Index Leaf Nodes col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9

Slide 21

Slide 21 text

Ordered Table Data • A database uses a sorted doubly linked list to keep track of order • Doubly linked list means that the DB can traverse back and forth. • O(n) transversal

Slide 22

Slide 22 text

Balanced Search Tree • The index leaf nodes are connected by a balanced search tree • The b-tree has equal depth at every point • Searching the b-tree is O(logn) • B-tree growth is also O(logn)

Slide 23

Slide 23 text

Balanced Search Tree • In practice: 4, 5 depth of the b-tree is millions of records. • 6 layers and up is rarely seen. • Takeaway: b-trees are fast.

Slide 24

Slide 24 text

Balanced Search Tree Root node Branch nodes

Slide 25

Slide 25 text

Balanced Search Tree Look for 54

Slide 26

Slide 26 text

Balanced Search Tree Look for 54

Slide 27

Slide 27 text

Balanced Search Tree Look for 54

Slide 28

Slide 28 text

Balanced Search Tree Look for 54

Slide 29

Slide 29 text

Ordered Table Data col1 col2 col3 Binh 67 A Connor 45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 54 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 26 2F 0E … .. .. 54 5B F9 Index Leaf Nodes

Slide 30

Slide 30 text

Database Query • There are 3 steps to a database query 1. Tree traversal - O(logn), fast 2. Leaf node chain traversal - O(n), slow 3. Table data retrieval - not stored physically in same location, slow

Slide 31

Slide 31 text

Database Query • There are 3 steps to a database query 1. Tree traversal - O(logn), fast 2. Leaf node chain traversal - O(n), slow 3. Table data retrieval - not stored physically in same location, slow

Slide 32

Slide 32 text

Database Queries • Assuming we have indexed `age` • Tree-traversal (fast) • Table data retrieve (1 row only, fast). SELECT * FROM clients WHERE age = 34


Slide 33

Slide 33 text

Database Queries • Assuming we have indexed `age` • Tree-traversal to get age = 20 (fast) • Leaf node chain traversal = ? (if many, slow) • Table data retrieve (if thousands of rows, slow). SELECT * FROM clients WHERE age >= 20 AND age <= 50

Slide 34

Slide 34 text

The Full Table Scan • The DB simply returns every single block in the table. • Can sometimes be more efficient, if you’re returning a large % of the data. • Why? DB executes multi-block reads, optimises for fewer read operations compared to index scan SELECT * FROM clients

Slide 35

Slide 35 text

EXPLAIN • How do you actually know how the DB is executing your queries? • Use the EXPLAIN statement. • Just add in front of the query; all RDBMs have some version of this. EXPLAIN SELECT * FROM clients WHERE age = 34


Slide 36

Slide 36 text

EXPLAIN select_type table type possible_keys key rows extra SIMPLE clients range age age 1342561 Using where MySQL example; different for others EXPLAIN SELECT * FROM clients WHERE age >= 20 AND age <= 50

Slide 37

Slide 37 text

EXPLAIN select_type table type possible_keys key rows extra SIMPLE clients range age age 1341 Using where MySQL example; different for others EXPLAIN SELECT * FROM clients WHERE age >= 20 AND age <= 50

Slide 38

Slide 38 text

MySQL types • Some of the more important types: • eq_ref - tree traversal only, unique index • ref/range - tree traversal, then leaf node traversal • index - the entire index is scanned (leaf node traversal) • full - full table scan, everything is read

Slide 39

Slide 39 text

EXPLAIN EXPLAIN SELECT * FROM clients WHERE age = 34
 select_type table type possible_keys key rows extra SIMPLE clients ref age age 1 MySQL example; different for others

Slide 40

Slide 40 text

EXPLAIN EXPLAIN SELECT * FROM clients WHERE id = 2
 select_type table type possible_keys key rows extra SIMPLE clients const PRIMARY PRIMARY 1 MySQL example; different for others

Slide 41

Slide 41 text

Database Query 1. Tree traversal - O(logn), fast 2. Full leaf node chain traversal - O(n), slow 3. Range leaf node chain traversal - O(k), ok la 4. Full table scan Takeaway: understand your RDBM’s equivalent to:

Slide 42

Slide 42 text

What drawbacks? INSERT, UPDATE, DELETE

Slide 43

Slide 43 text

INSERT • Adding an index means INSERT operations now have more work to do. • Find a table block to store the new data • Update the index (e.g. balance the tree) for each index on the table! SQL Performance Explained, Page 160

Slide 44

Slide 44 text

INSERT • In practice, not that bad. • Speed is affected by size of table and number of indexes • MySQL documentation: size of table slows down index insert by log(N) • Point: don’t add redundant or unnecessary indexes.

Slide 45

Slide 45 text

DELETE • DELETE benefits from the WHERE clause. • Is like a SELECT, but with the extra step of deleting row and rebalancing the index, for each index on the table

Slide 46

Slide 46 text

UPDATE • UPDATE performance = DELETE + INSERT

Slide 47

Slide 47 text

More complex indexes Stuff you should read up on.

Slide 48

Slide 48 text

Concatenated Indexes • A concatenated index is an index over multiple columns • Consider the following table: • We query first_name and last_name a lot pk first_name last_name age 1 Binh Nguyen 23 2 Connor Tan 34 … … …

Slide 49

Slide 49 text

Concatenated Indexes • We want to index the first_name and last_name • Are the following two indexes the same? ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`); ALTER TABLE `clients` ADD INDEX (`last_name`, `first_name`);

Slide 50

Slide 50 text

Concatenated Indexes • This query will benefit from the index: • SELECT * FROM clients WHERE first_name = `Binh` AND last_name = `Nguyen` • This query will not benefit from the index: • SELECT * FROM clients WHERE last_name = `Nguyen` ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`);

Slide 51

Slide 51 text

Concatenated Indexes first_name last_name Binh Dang Binh Nguyen Binh Pham Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (`last_name`, `first_name`)

Slide 52

Slide 52 text

Concatenated Indexes first_name last_name Binh Nguyen Binh Dang Binh Pham Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (` SELECT * FROM clients WHERE first_name = `Binh` AND last_name = `Nguyen`

Slide 53

Slide 53 text

Concatenated Indexes first_name last_name Binh Dang Binh Nguyen Binh Pham Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (` SELECT * FROM clients WHERE last_name = `Nguyen` ??

Slide 54

Slide 54 text

Concatenated Indexes first_name last_name Binh Dang Binh Nguyen Binh Pham Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (` (`last_name`, `first_name`) SELECT * FROM clients WHERE last_name = `Nguyen`

Slide 55

Slide 55 text

How is this useful? • Useful when you have associations • e.g. client has many groups • Principle: index in a way such that the left-most index is always used. pk client_id group_id 1 1 3 2 1 4 (client_id, group_id)

Slide 56

Slide 56 text

Functions & Indexes • Some databases have functions. • e.g.: UPPER, LOWER • If you have an index of `name` and do this: • SELECT * FROM client WHERE UPPER(name) = `HUY NGUYEN` • will it use the index? • Answer: NO!

Slide 57

Slide 57 text

Functions & Indexes • The index is unable to look at the result of functions • But in some DBs you can create a function-based index: • CREATE INDEX up_name ON clients (UPPER(name)) • MySQL < 5.6 does not have function-based indexing. LOL

Slide 58

Slide 58 text

Conclusion (Yay)

Slide 59

Slide 59 text

What Did You Learn? • What is an index? • How do indexes work? • What does an indexed query consist of? • The EXPLAIN statement • Drawbacks of indexes • Concatenated indexes • Indexing database functions

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

More Stuff • http://use-the-index-luke.com/ • There is quite a bit more to indexes than this talk • But the basics are now covered • Enjoy speedier queries =)