understanding of database indexes. • In general! Not specific to a particular DB • A basic understanding is usually good enough • Developers need to know how to make queries fast • The rest of the DB can be a black box.
like a map or guide that allows you to quickly find the data that you’re looking for. • But let’s start from the basics. Pretend we don’t care or know about how DBs work.
any field. • Say you have a new, large table called clients: pk name age gender 1 Binh Dang 23 Male 2 Connor Tan 34 Male … … … … • All tables have an index on create: the primary key index
block, in a heap structure • No relationships between table blocks or rows • Is not sorted. col1 col2 col3 Binh 23 A Connor 45 X Vivian 12 X Bob 98 A Visual example of table block
45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9 Index Leaf Nodes
45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9 Index Leaf Nodes
45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 33 X Trung 29 A Index Leaf Nodes col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 29 2F 0E 33 3D A0 33 5B F9
45 X Vivian 12 X Bob 98 A col1 col2 col3 Sean 23 A Loki 18 X Thor 54 X Trung 29 A col2 rowID 12 3A 2F 13 2F AE 18 2C 50 23 5B 78 col2 rowID 26 65 2F 26 2F 0E … .. .. 54 5B F9 Index Leaf Nodes
query 1. Tree traversal - O(logn), fast 2. Leaf node chain traversal - O(n), slow 3. Table data retrieval - not stored physically in same location, slow
query 1. Tree traversal - O(logn), fast 2. Leaf node chain traversal - O(n), slow 3. Table data retrieval - not stored physically in same location, slow
to get age = 20 (fast) • Leaf node chain traversal = ? (if many, slow) • Table data retrieve (if thousands of rows, slow). SELECT * FROM clients WHERE age >= 20 AND age <= 50
single block in the table. • Can sometimes be more efficient, if you’re returning a large % of the data. • Why? DB executes multi-block reads, optimises for fewer read operations compared to index scan SELECT * FROM clients
is executing your queries? • Use the EXPLAIN statement. • Just add in front of the query; all RDBMs have some version of this. EXPLAIN SELECT * FROM clients WHERE age = 34
eq_ref - tree traversal only, unique index • ref/range - tree traversal, then leaf node traversal • index - the entire index is scanned (leaf node traversal) • full - full table scan, everything is read
leaf node chain traversal - O(n), slow 3. Range leaf node chain traversal - O(k), ok la 4. Full table scan Takeaway: understand your RDBM’s equivalent to:
more work to do. • Find a table block to store the new data • Update the index (e.g. balance the tree) for each index on the table! SQL Performance Explained, Page 160
affected by size of table and number of indexes • MySQL documentation: size of table slows down index insert by log(N) • Point: don’t add redundant or unnecessary indexes.
multiple columns • Consider the following table: • We query first_name and last_name a lot pk first_name last_name age 1 Binh Nguyen 23 2 Connor Tan 34 … … …
last_name • Are the following two indexes the same? ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`); ALTER TABLE `clients` ADD INDEX (`last_name`, `first_name`);
• SELECT * FROM clients WHERE first_name = `Binh` AND last_name = `Nguyen` • This query will not benefit from the index: • SELECT * FROM clients WHERE last_name = `Nguyen` ALTER TABLE `clients` ADD INDEX (`first_name`, `last_name`);
Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (`last_name`, `first_name`)
Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (` SELECT * FROM clients WHERE first_name = `Binh` AND last_name = `Nguyen`
Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (`first_name`, `last_name`) (` SELECT * FROM clients WHERE last_name = `Nguyen` ??
Connor Chan Connor Tan last_name first_name Nguyen Huy Nguyen Trung Nguyen Tuan Tan Jonathan Tan Bob (` (`last_name`, `first_name`) SELECT * FROM clients WHERE last_name = `Nguyen`
• e.g. client has many groups • Principle: index in a way such that the left-most index is always used. pk client_id group_id 1 1 3 2 1 4 (client_id, group_id)
UPPER, LOWER • If you have an index of `name` and do this: • SELECT * FROM client WHERE UPPER(name) = `HUY NGUYEN` • will it use the index? • Answer: NO!
at the result of functions • But in some DBs you can create a function-based index: • CREATE INDEX up_name ON clients (UPPER(name)) • MySQL < 5.6 does not have function-based indexing. LOL
How do indexes work? • What does an indexed query consist of? • The EXPLAIN statement • Drawbacks of indexes • Concatenated indexes • Indexing database functions