Upgrade to Pro — share decks privately, control downloads, hide ads and more …

The Absolute Minimum Every Software Developer Needs To Know About Database Indexes

The Absolute Minimum Every Software Developer Needs To Know About Database Indexes

Eli James (Cedric Chin)

December 06, 2014
Tweet

Other Decks in Technology

Transcript

  1. Test Slide
    sample code font
    {:;}
    Grokking Engineering

    View full-size slide

  2. The Absolute Minimum Every
    Software Developer Needs To
    Know About Database Indexes

    View full-size slide

  3. >whoami
    • Cedric Chin (aka Eli James) - @ejames_c
    • Project Manager, Floating Cube Studios (D7)
    • Programmer, not a database expert
    • But a heavy database user
    • Most of us are heavy database users

    View full-size slide

  4. Relational Database
    Management Systems

    View full-size slide

  5. This Talk
    • You will walk away with a basic understanding of
    database indexes.
    • In general! Not specific to a particular DB
    • A basic understanding is usually good enough
    • Developers need to know how to make queries fast
    • The rest of the DB can be a black box.

    View full-size slide

  6. What are indexes?

    View full-size slide

  7. Indexes Make Queries Fast
    • Think: how do you look up
    words in a dictionary?
    • You use an alphabetical
    index

    View full-size slide

  8. Indexes Make Queries Fast
    • Indexes are conceptually similar.
    • But unlike a dictionary, databases are constantly
    updated
    • This means that indexes have to be constantly
    updated.

    View full-size slide

  9. Indexes Make Queries Fast
    • In a RDBMs, indexes are like a
    map or guide that allows you
    to quickly find the data that
    you’re looking for.
    • But let’s start from the basics.
    Pretend we don’t care or know
    about how DBs work.

    View full-size slide

  10. How does using an
    index feel like?
    A day in the life of Joe Random Dev

    View full-size slide

  11. Creating an Index
    • You can create an index on any field.
    • Say you have a new, large table called clients:
    pk name age gender
    1 Binh Dang 23 Male
    2 Connor Tan 34 Male
    … … … …
    • All tables have an index on create: the primary key
    index

    View full-size slide

  12. Creating an Index
    ALTER TABLE `clients` ADD INDEX (`age`);
    the database does some magic …

    View full-size slide

  13. Creating an Index
    Now:

    SELECT * FROM clients WHERE age = 34

    is fast!
    pk name age gender
    1 Binh Dang 23 Male
    2 Connor Tan 34 Male
    … … … …

    View full-size slide

  14. So how do indexes
    work?
    Data Structures Ahead

    View full-size slide

  15. Two Data Structures
    • All database indexes consist of two data structures:
    • A doubly linked list
    • A balanced search tree
    • Note: this is just basic stuff! Real implementation
    has modifications.

    View full-size slide

  16. Table Data
    • Database data is not
    organised sequentially on
    disk like books in a library.
    • (Or pages in a dictionary.)
    • Instead it is stored in blocks
    all across the disk.

    View full-size slide

  17. Table Data
    • Table data is stored in a
    table block, in a heap
    structure
    • No relationships
    between table blocks
    or rows
    • Is not sorted.
    col1 col2 col3
    Binh 23 A
    Connor 45 X
    Vivian 12 X
    Bob 98 A
    Visual example of table block

    View full-size slide

  18. Ordered Table Data
    col1 col2 col3
    Binh 67 A
    Connor 45 X
    Vivian 12 X
    Bob 98 A
    col1 col2 col3
    Sean 23 A
    Loki 18 X
    Thor 33 X
    Trung 29 A
    col2 rowID
    12 3A 2F
    13 2F AE
    18 2C 50
    23 5B 78
    col2 rowID
    26 65 2F
    29 2F 0E
    33 3D A0
    33 5B F9
    Index Leaf Nodes

    View full-size slide

  19. Ordered Table Data
    col1 col2 col3
    Binh 67 A
    Connor 45 X
    Vivian 12 X
    Bob 98 A
    col1 col2 col3
    Sean 23 A
    Loki 18 X
    Thor 33 X
    Trung 29 A
    col2 rowID
    12 3A 2F
    13 2F AE
    18 2C 50
    23 5B 78
    col2 rowID
    26 65 2F
    29 2F 0E
    33 3D A0
    33 5B F9
    Index Leaf Nodes

    View full-size slide

  20. Ordered Table Data
    col1 col2 col3
    Binh 67 A
    Connor 45 X
    Vivian 12 X
    Bob 98 A
    col1 col2 col3
    Sean 23 A
    Loki 18 X
    Thor 33 X
    Trung 29 A
    Index Leaf Nodes
    col2 rowID
    12 3A 2F
    13 2F AE
    18 2C 50
    23 5B 78
    col2 rowID
    26 65 2F
    29 2F 0E
    33 3D A0
    33 5B F9

    View full-size slide

  21. Ordered Table Data
    • A database uses a sorted
    doubly linked list to keep
    track of order
    • Doubly linked list means
    that the DB can traverse
    back and forth.
    • O(n) transversal

    View full-size slide

  22. Balanced Search Tree
    • The index leaf nodes
    are connected by a
    balanced search tree
    • The b-tree has equal
    depth at every point
    • Searching the b-tree is
    O(logn)
    • B-tree growth is also
    O(logn)

    View full-size slide

  23. Balanced Search Tree
    • In practice: 4, 5
    depth of the b-tree is
    millions of records.
    • 6 layers and up is
    rarely seen.
    • Takeaway: b-trees
    are fast.

    View full-size slide

  24. Balanced Search Tree
    Root node
    Branch nodes

    View full-size slide

  25. Balanced Search Tree
    Look for 54

    View full-size slide

  26. Balanced Search Tree
    Look for 54

    View full-size slide

  27. Balanced Search Tree
    Look for 54

    View full-size slide

  28. Balanced Search Tree
    Look for 54

    View full-size slide

  29. Ordered Table Data
    col1 col2 col3
    Binh 67 A
    Connor 45 X
    Vivian 12 X
    Bob 98 A
    col1 col2 col3
    Sean 23 A
    Loki 18 X
    Thor 54 X
    Trung 29 A
    col2 rowID
    12 3A 2F
    13 2F AE
    18 2C 50
    23 5B 78
    col2 rowID
    26 65 2F
    26 2F 0E
    … .. ..
    54 5B F9
    Index Leaf Nodes

    View full-size slide

  30. Database Query
    • There are 3 steps to a database query
    1. Tree traversal - O(logn), fast
    2. Leaf node chain traversal - O(n), slow
    3. Table data retrieval - not stored physically in
    same location, slow

    View full-size slide

  31. Database Query
    • There are 3 steps to a database query
    1. Tree traversal - O(logn), fast
    2. Leaf node chain traversal - O(n), slow
    3. Table data retrieval - not stored physically in
    same location, slow

    View full-size slide

  32. Database Queries
    • Assuming we have indexed `age`
    • Tree-traversal (fast)
    • Table data retrieve (1 row only, fast).
    SELECT * FROM clients WHERE age = 34


    View full-size slide

  33. Database Queries
    • Assuming we have indexed `age`
    • Tree-traversal to get age = 20 (fast)
    • Leaf node chain traversal = ? (if many, slow)
    • Table data retrieve (if thousands of rows, slow).
    SELECT * FROM clients WHERE age >= 20
    AND age <= 50

    View full-size slide

  34. The Full Table Scan
    • The DB simply returns every single block in the
    table.
    • Can sometimes be more efficient, if you’re returning
    a large % of the data.
    • Why? DB executes multi-block reads, optimises for
    fewer read operations compared to index scan
    SELECT * FROM clients

    View full-size slide

  35. EXPLAIN
    • How do you actually know how the DB is executing
    your queries?
    • Use the EXPLAIN statement.
    • Just add in front of the query; all RDBMs have
    some version of this.
    EXPLAIN SELECT * FROM clients WHERE age = 34


    View full-size slide

  36. EXPLAIN
    select_type table type possible_keys key rows extra
    SIMPLE clients range age age 1342561
    Using
    where
    MySQL example; different for others
    EXPLAIN SELECT * FROM clients WHERE age
    >= 20 AND age <= 50

    View full-size slide

  37. EXPLAIN
    select_type table type possible_keys key rows extra
    SIMPLE clients range age age 1341
    Using
    where
    MySQL example; different for others
    EXPLAIN SELECT * FROM clients WHERE age
    >= 20 AND age <= 50

    View full-size slide

  38. MySQL types
    • Some of the more important types:
    • eq_ref - tree traversal only, unique index
    • ref/range - tree traversal, then leaf node traversal
    • index - the entire index is scanned (leaf node
    traversal)
    • full - full table scan, everything is read

    View full-size slide

  39. EXPLAIN
    EXPLAIN SELECT * FROM clients WHERE age = 34

    select_type table type possible_keys key rows extra
    SIMPLE clients ref age age 1
    MySQL example; different for others

    View full-size slide

  40. EXPLAIN
    EXPLAIN SELECT * FROM clients WHERE id = 2

    select_type table type possible_keys key rows extra
    SIMPLE clients const PRIMARY PRIMARY 1
    MySQL example; different for others

    View full-size slide

  41. Database Query
    1. Tree traversal - O(logn), fast
    2. Full leaf node chain traversal - O(n), slow
    3. Range leaf node chain traversal - O(k), ok la
    4. Full table scan
    Takeaway: understand your RDBM’s equivalent to:

    View full-size slide

  42. What drawbacks?
    INSERT, UPDATE, DELETE

    View full-size slide

  43. INSERT
    • Adding an index means INSERT operations now
    have more work to do.
    • Find a table block to store the new data
    • Update the index (e.g. balance the tree) for
    each index on the table!
    SQL Performance Explained,
    Page 160

    View full-size slide

  44. INSERT
    • In practice, not that bad.
    • Speed is affected by size of table and number of
    indexes
    • MySQL documentation: size of table slows
    down index insert by log(N)
    • Point: don’t add redundant or unnecessary
    indexes.

    View full-size slide

  45. DELETE
    • DELETE benefits from the WHERE clause.
    • Is like a SELECT, but with the extra step of deleting
    row and rebalancing the index, for each index on
    the table

    View full-size slide

  46. UPDATE
    • UPDATE performance = DELETE + INSERT

    View full-size slide

  47. More complex indexes
    Stuff you should read up on.

    View full-size slide

  48. Concatenated Indexes
    • A concatenated index is an index over multiple
    columns
    • Consider the following table:
    • We query first_name and last_name a lot
    pk first_name last_name age
    1 Binh Nguyen 23
    2 Connor Tan 34
    … … …

    View full-size slide

  49. Concatenated Indexes
    • We want to index the first_name and last_name
    • Are the following two indexes the same?
    ALTER TABLE `clients` ADD INDEX (`first_name`,
    `last_name`);
    ALTER TABLE `clients` ADD INDEX (`last_name`,
    `first_name`);

    View full-size slide

  50. Concatenated Indexes
    • This query will benefit from the index:
    • SELECT * FROM clients WHERE first_name =
    `Binh` AND last_name = `Nguyen`
    • This query will not benefit from the index:
    • SELECT * FROM clients WHERE last_name =
    `Nguyen`
    ALTER TABLE `clients` ADD INDEX (`first_name`,
    `last_name`);

    View full-size slide

  51. Concatenated Indexes
    first_name last_name
    Binh Dang
    Binh Nguyen
    Binh Pham
    Connor Chan
    Connor Tan
    last_name first_name
    Nguyen Huy
    Nguyen Trung
    Nguyen Tuan
    Tan Jonathan
    Tan Bob
    (`first_name`, `last_name`) (`last_name`, `first_name`)

    View full-size slide

  52. Concatenated Indexes
    first_name last_name
    Binh Nguyen
    Binh Dang
    Binh Pham
    Connor Chan
    Connor Tan
    last_name first_name
    Nguyen Huy
    Nguyen Trung
    Nguyen Tuan
    Tan Jonathan
    Tan Bob
    (`first_name`, `last_name`) (`
    SELECT * FROM clients WHERE first_name = `Binh` AND
    last_name = `Nguyen`

    View full-size slide

  53. Concatenated Indexes
    first_name last_name
    Binh Dang
    Binh Nguyen
    Binh Pham
    Connor Chan
    Connor Tan
    last_name first_name
    Nguyen Huy
    Nguyen Trung
    Nguyen Tuan
    Tan Jonathan
    Tan Bob
    (`first_name`, `last_name`) (`
    SELECT * FROM clients WHERE last_name = `Nguyen`
    ??

    View full-size slide

  54. Concatenated Indexes
    first_name last_name
    Binh Dang
    Binh Nguyen
    Binh Pham
    Connor Chan
    Connor Tan
    last_name first_name
    Nguyen Huy
    Nguyen Trung
    Nguyen Tuan
    Tan Jonathan
    Tan Bob
    (` (`last_name`, `first_name`)
    SELECT * FROM clients WHERE last_name = `Nguyen`

    View full-size slide

  55. How is this useful?
    • Useful when you have associations
    • e.g. client has many groups
    • Principle: index in a way such that the left-most index
    is always used.
    pk client_id group_id
    1 1 3
    2 1 4
    (client_id, group_id)

    View full-size slide

  56. Functions & Indexes
    • Some databases have functions.
    • e.g.: UPPER, LOWER
    • If you have an index of `name` and do this:
    • SELECT * FROM client WHERE UPPER(name) =
    `HUY NGUYEN`
    • will it use the index?
    • Answer: NO!

    View full-size slide

  57. Functions & Indexes
    • The index is unable to look at the result of functions
    • But in some DBs you can create a function-based
    index:
    • CREATE INDEX up_name ON clients
    (UPPER(name))
    • MySQL < 5.6 does not have function-based
    indexing. LOL

    View full-size slide

  58. Conclusion
    (Yay)

    View full-size slide

  59. What Did You Learn?
    • What is an index?
    • How do indexes work?
    • What does an indexed query consist of?
    • The EXPLAIN statement
    • Drawbacks of indexes
    • Concatenated indexes
    • Indexing database functions

    View full-size slide

  60. More Stuff
    • http://use-the-index-luke.com/
    • There is quite a bit more to indexes than this talk
    • But the basics are now covered
    • Enjoy speedier queries =)

    View full-size slide