Slide 1

Slide 1 text

How people build software ! " The Secret Life of SQL How to Optimize Database Performance @bryanaknight

Slide 2

Slide 2 text

How people build software ! What’s this talk about? Some “best practices” aren’t always “best practices” after all 2 !

Slide 3

Slide 3 text

How people build software ! 3 ! WHAT ARE INDEXES COMMON RULES OF INDEXING, QUERYING, DATA MODELING • Let’s visualize them • How do they work • What are they • When/Why do they break down • Tools you can use • How to move forward Overview ! !

Slide 4

Slide 4 text

How people build software ! 4 ! What even is an index? • An index is a copy of selected columns of data from a table that can be searched quickly and efficiently • Index is sorted • Has as many rows as the table (1 row in the index for 1 row in the table itself) • Indexes and tables are really the same thing - but you can get to the data you need in an index faster and then lookup whatever else you need in the main table knowing exactly where to look

Slide 5

Slide 5 text

How people build software ! 5 ! All about indexes.

Slide 6

Slide 6 text

How people build software ! 6 ! All about indexes. • Filter data • Sort data • Retrieve records • Get to the smallest amount of data the soonest

Slide 7

Slide 7 text

How people build software ! 7 ! All about indexes. • SELECT * from repositories WHERE owner_id > 500 • without any index, full table scan Repositories

Slide 8

Slide 8 text

How people build software ! 8 ! All about indexes. • SELECT * from repositories WHERE owner_id > 500 Index over owner_id Repositories

Slide 9

Slide 9 text

How people build software ! 9 ! All about indexes. Repositories Index over owner_id

Slide 10

Slide 10 text

How people build software ! 10 ! All about indexes. • SELECT name from repositories WHERE owner_id > 500 Repositories Index over owner_id_and_name

Slide 11

Slide 11 text

How people build software ! 11 ! All about indexes. • Index contents are used left to right • Example • ‘SELECT * FROM repositories where name = ‘rails’ ‘ would NOT use the index Index over owner_id_and_name

Slide 12

Slide 12 text

How people build software ! 12 ! • What are they • When/Why do they break down • Tools you can use • How to move forward COMMON RULES OF INDEXING, QUERYING, AND DATA MODELING

Slide 13

Slide 13 text

How people build software ! Rule #1 Any columns involved in queries should be covered by an index 13 !

Slide 14

Slide 14 text

How people build software ! Rule #1: Any columns in queries should be indexed 14 ! `index_labels_on_repository_id`

Slide 15

Slide 15 text

How people build software ! Rule #1: Any columns in queries should be indexed 15 ! `index_labels_on_repository_id_and_name` `index_labels_on_repository_id`

Slide 16

Slide 16 text

How people build software ! 16 ! Redundant indexes `index_labels_on_repository_id` `index_labels_on_repository_id_and_name`

Slide 17

Slide 17 text

How people build software ! 17 ! When to break the rule. • We’re not really breaking it, we’re adding a caveat of “unless there is already a covering index.” Rule #1: Any columns involved in queries should be covered by an index

Slide 18

Slide 18 text

How people build software ! 18 ! Why does it matter? • So we have a redundant index. So what? • Takes up precious space • Adding an index slows down UPDATEs, INSERTs and DELETEs, because those operations need to be performed on the table and the index

Slide 19

Slide 19 text

How people build software ! 19 ! Redundant indexes tools

Slide 20

Slide 20 text

How people build software ! Rule #1 Any columns involved in queries should be covered by an index* 20 ! *BUT avoid redundant or unused indexes *Sometimes it’s better to use an index prefix

Slide 21

Slide 21 text

How people build software ! 21 ! What is an index prefix? Index over repository name Index prefix over repository name

Slide 22

Slide 22 text

How people build software ! 22 ! Rule #1: Any columns in queries should be indexed “Give me the recent code reviewers for a certain code path”

Slide 23

Slide 23 text

How people build software ! 23 ! Rule #1: Any columns in queries should be indexed `index_pull_request_reviews_on_repository_id_and_path_and_created_at`

Slide 24

Slide 24 text

How people build software ! 24 ! Redundant indexes tools

Slide 25

Slide 25 text

How people build software ! 25 ! When to break the rule. • When should you use an index prefix instead of a regular index? • long datatype (text, blob, varbinary) • querying the data works well with a prefix (i.e. usernames starting with ‘a’) Rule #1: Any columns involved in queries should be covered by an index

Slide 26

Slide 26 text

How people build software ! 26 ! Sometimes it’s better to use an index prefix • How long should we make the prefix? • long enough to differentiate values within this field • base it off of real data if possible “I chose 255 bytes as the prefix because the longest path in github/github is 199 bytes”

Slide 27

Slide 27 text

How people build software ! 27 ! Sometimes it’s better to use an index prefix • What do we gain from using an index prefix? • less space required for the index • ability to index big data types • comparable performance improvement as a full index if length calculated thoughtfully or query tuned for index prefix

Slide 28

Slide 28 text

How people build software ! Rule #1 Any columns involved in queries should be covered by an index* 28 ! *BUT avoid redundant or unused indexes *Sometimes it’s better to use an index prefix

Slide 29

Slide 29 text

How people build software ! Rule #2 29 ! Use an OR to return records satisfying one or more of several conditions

Slide 30

Slide 30 text

How people build software ! Rule #2: Use an “OR” 30 ! “SELECT MAX(id) FROM users WHERE spammy=1 OR suspended_at IS NOT NULL”

Slide 31

Slide 31 text

How people build software ! EXPLAIN it to me, MySQL 31 ! •MySQL explains how it would process the statements •How are tables JOINed and in which order •Which indexes, if any, are used

Slide 32

Slide 32 text

How people build software ! EXPLAIN it to me, MySQL 32 !

Slide 33

Slide 33 text

How people build software ! What indexes are we working with? 33 ! • Indexes with columns in our query: • `index_users_on_spammy` (`spammy`) • `index_users_on_suspended_at` (`suspended_at`) • the `OR` operator is limited because MySQL can only use one index per table during a query, so it chooses to use neither

Slide 34

Slide 34 text

How people build software ! When to break the rule. 34 ! • The table being queried isn’t small, so a full table scan is not performant • The use of the ‘OR’ is preventing any index from being used Rule #2: Use an OR to return records satisfying one or more of several conditions

Slide 35

Slide 35 text

How people build software ! Rule #2: Use “OR” “UNION” 35 !

Slide 36

Slide 36 text

How people build software ! Rule #2: Use an “OR” 36 ! “SELECT MAX(id) FROM users WHERE spammy=1 OR suspended_at IS NOT NULL” PostgreSQL has an index merge feature! .01 sec…yay!

Slide 37

Slide 37 text

How people build software ! Rule #2 Use an OR to return records satisfying one of more of several conditions* 37 ! *except when the OR is preventing an index from being used and you have to do a full table scan

Slide 38

Slide 38 text

How people build software ! Rule #3 If there’s an index over all the fields in your query, you’re all set 38 !

Slide 39

Slide 39 text

How people build software ! 39 ! Rule #3: If there’s an index, you’re all set “Give me the commit comments for this repository, ordered by id”

Slide 40

Slide 40 text

How people build software ! 40 ! Rule #3: If there’s an index, you’re all set • EXPLAIN it to me, MySQL `index_commit_comments_on_repository_id_and_user_id`

Slide 41

Slide 41 text

How people build software ! 41 ! Rule #3: If there’s an index, you’re all set • MySQL thinks its better to scan the PK index just so that the rows are generated in the right order

Slide 42

Slide 42 text

How people build software ! When to break the rule. 42 ! • MySQL is choosing to use an index that isn’t performant, or no index at all when one is available Rule #3: If there’s an index over all the fields in your query, you’re all set

Slide 43

Slide 43 text

How people build software ! 43 ! Let’s help MySQL Out!

Slide 44

Slide 44 text

How people build software ! 44 ! Rule #3: If there’s an index, you’re all set • Other options • USE INDEX - more of a hint • IGNORE INDEX - explicitly state which index not to use, but let MySQL decide from the remaining indexes

Slide 45

Slide 45 text

How people build software ! 45 # ! FUTURE PROOFING POSTGRESQL DOESN’T HAVE THEM • Beware if you use one and the index is deleted • Sorry! Index Hint gotchas. #

Slide 46

Slide 46 text

How people build software ! Rule #3 If there’s an index over all the fields in your query, you’re all set 46 ! *except when the query planner doesn’t know to use the index, in which case, help it out

Slide 47

Slide 47 text

How people build software ! Rule #4 Avoid redundant data across tables 47 !

Slide 48

Slide 48 text

How people build software ! A normalized database 48 !

Slide 49

Slide 49 text

How people build software ! A normalized database 49 ! “Give me all these Pull Requests, excluding those opened by spammy user” “Give me this group of repositories, except those owned by a spammy user” “Give me all these users’ gists, unless the user is spammy”

Slide 50

Slide 50 text

How people build software ! When to break the rule. 50 ! • Additional reads (JOINs) are causing noticeable performance degradation •High ratio of reads to writes Rule #4: Avoid redundant data across tables

Slide 51

Slide 51 text

How people build software ! A denormalized database 51 !

Slide 52

Slide 52 text

How people build software ! Rule #4: normalize denormalize data 52 ! “Give me this issue’s comments from non-spammy users”

Slide 53

Slide 53 text

How people build software ! Results. 53 !

Slide 54

Slide 54 text

How people build software ! 54 # ! DATABASE CHANGES DATA QUALITY • Lots of migrations to add new column to tables • Data transitions to backfill data • The `user_hidden` column is updated in the tables via a background job after a user is marked as spammy • Nightly job to resolve any mismatches b/w the tables Gotchas. #

Slide 55

Slide 55 text

How people build software ! Rule #4 Avoid redundant data across tables* 55 ! *except when you are JOINing on another table for almost every request, and it’s getting costly, and you have a high volume of reads to writes for this data

Slide 56

Slide 56 text

How people build software ! 56 ! Recap • Rule #1: Any columns involved in queries should be covered by an index • *but avoid redundant and unused indexes • *sometimes an index prefix is needed and sufficient • Rule #2: Use an OR to return records satisfying one or more of several conditions • *but a UNION might be necessary to exploit the indexes • Rule #3: If there’s an index over all the fields in your query, you’re all set • *if MySQL is making the wrong choice, help it make the right one • Rule #4: Avoid redundant data across tables • *except when reads are slow, read/write ratio is high, you can denormalize

Slide 57

Slide 57 text

How people build software ! 57 ! Takeaways • Index, but don’t over index • Tune your query to exploit the indexes that give you the best results • There are tools to help you • explain queries • examine existing columns and indexes • You can “do everything right” and still have performance problems

Slide 58

Slide 58 text

How people build software ! " come work with us! bit.ly/platform-data Thanks!