Slide 1

Slide 1 text

MySQL Query Optimisation

Slide 2

Slide 2 text

Who? Adrian Hardy (@adrianhardy) Writing high performance business systems for ten years

Slide 3

Slide 3 text

The secret to query optimisation A large, slow dataset A basic understanding of how MySQL works Terminology Resources Time

Slide 4

Slide 4 text

What are we trying to achieve? Make stuff faster Shield calling applications from changes

Slide 5

Slide 5 text

Identify

Slide 6

Slide 6 text

Identify - Your workflow and tools From the command line s s h p h p n w 1 3 - { 1 , 2 , 3 , 4 , . . . } @ 1 0 . 0 . 0 . 2 (password = phpnw13) s u d o a t o p From inside a MySQL session [ p h p n w 1 3 - 1 @ s e r v e r ~ ] $ m y s q l - u p h p n w 1 3 - 1 - p (password = phpnw13) S H O W F U L L P R O C E S S L I S T \ G

Slide 7

Slide 7 text

Exercise 1 - your environment System Load [ p h p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e (when prompted, password = password) m y s q l > s h o w f u l l p r o c e s s l i s t \ G m y s q l > q u i t

Slide 8

Slide 8 text

Exercise 1a - Experience a slap m y s q l s l a p is a very simple MySQL stress testing tool. It runs a set of queries in parallel, hammering your server System Load [ p h p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e m y s q l > s h o w f u l l p r o c e s s l i s t \ G

Slide 9

Slide 9 text

Identify - Slow Query Log Feels a little like "Percona" product placement Drop in replacement for MySQL/MariaDB - no migration process Lots of cool diagnostic tools & performance improvements Broadly kept in line with upstream, available as packages For the risk averse - can just be used as a dev tool Some of improvments in Percona are available in MariaDB

Slide 10

Slide 10 text

Exercise 2 - slow query log [ p h p n w 1 3 @ s e r v e r ~ ] $ l e s s / v a r / l o g / m y s q l d / s l o w . l o g

Slide 11

Slide 11 text

Percona's Slow Query Log Statistics Available Query_time Lock_time Rows_sent Rows_examined * QC_Hit * Full_scan * Tmp_table (_on_disk) * Filesort (_on_disk) and many more!

Slide 12

Slide 12 text

Percona's pt-query-digest T i m e r a n g e : 2 0 1 3 - 0 7 - 2 1 0 9 : 2 3 : 0 2 t o 2 0 1 3 - 0 9 - 0 2 1 1 : 2 4 : 3 3 A t t r i b u t e p c t t o t a l m i n m a x a v g 9 5 % s t d d e v m e d i a n = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = C o u n t 6 2 2 5 E x e c t i m e 7 5 5 7 5 s 1 8 s 3 5 s 2 5 s 3 2 s 5 s 2 5 s L o c k t i m e 0 3 5 m s 1 0 3 u s 2 1 6 u s 1 5 7 u s 1 6 7 u s 1 1 u s 1 5 2 u s R o w s s e n t 0 8 9 . 4 2 k 3 1 3 5 3 5 4 0 6 . 9 4 5 1 1 . 4 5 7 3 . 3 3 4 2 0 . 7 7 R o w s e x a m i n e 9 1 . 7 3 G 5 . 8 8 M 1 0 . 6 2 M 7 . 8 6 M 9 . 7 6 M 1 . 4 7 M 8 . 0 3 M B y t e s s e n t 0 2 3 . 1 5 M 8 0 . 6 8 k 1 4 0 . 0 9 k 1 0 5 . 3 5 k 1 3 0 . 0 4 k 1 8 . 9 2 k 1 0 6 . 9 9 k T m p t a b l e s 0 2 2 5 1 1 1 1 0 1 T m p d i s k t b l 1 0 2 2 5 1 1 1 1 0 1 T m p t b l s i z e 4 2 1 2 . 9 6 M 7 4 5 . 8 7 k 1 . 2 5 M 9 6 9 . 2 2 k 1 . 2 0 M 1 8 5 . 7 3 k 1 0 0 9 . 3 3 k Q u e r y s i z e 1 2 4 2 . 4 4 k 1 . 0 7 k 1 . 1 1 k 1 . 0 8 k 1 . 0 9 k 1 9 . 7 1 1 . 0 4 k

Slide 13

Slide 13 text

Exercise 3 - pt-query-digest aka "cheating" Task 1 [ p h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t / v a r / l i b / m y s q l / s l o w . l o g | l e s s Task 2 [ p h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t - - r e v i e w \ u = p h p n w 1 3 - 1 , D = p h p n w 1 3 - 1 , t = r e v i e w _ m e , h = l o c a l h o s t \ / v a r / l i b / m y s q l / s l o w . l o g

Slide 14

Slide 14 text

Exercise 3a - pt-query-advisor aka "really cheating" Task 3 [ p h p n w 1 3 @ s e r v e r ~ ] $ . / p t - q u e r y - a d v i s o r - - r e v i e w u = r o o t , h = l o c a l h o s t , D = t e m p , t = r e v i e w _ m e Some hints A R G . 0 0 1 You've got a LIKE comparison with a % prefix C L A . 0 0 1 You've issued a SELECT without a WHERE clause C L A . 0 0 3 LIMIT,OFFSET used (bad news, unfortunately)

Slide 15

Slide 15 text

"Identify" - Review Is there a problem, and is it MySQL? Is it caught by the slow query log? If it's in the slow query log, it's a problem - no scope for "works on my machine" What is the scale of problem we're dealing with? Do we have the luxury of reproducing it locally? Do we care? Is it easier to ignore the problem?

Slide 16

Slide 16 text

Improve

Slide 17

Slide 17 text

Dive in to my.cnf Get started by visiting tools.percona.com/wizard Get started using mysqltuner.pl The manual is a fantastic resource with gems like: If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing. Any variable in the MySQL "Server System Variables" section of the manual marked as "Dynamic" can be changed live. You can capture "real" query load using g e n e r a l _ l o g and replay locally using m y s q l s l a p

Slide 18

Slide 18 text

Dive in to my.cnf This sounds suspiciously like "RTFM" [ p h p n w 1 3 @ s e r v e r ~ ] $ l e s s / e t c / m y . c n f What to tune in MySQL server after installation Section 5.1.6. Server Status Variables Section 5.1.4. Server System Variables

Slide 19

Slide 19 text

Dive in to my.cnf Sounds obvious but ... be scientific F L U S H S T A T U S ; Wait 10 minutes.... S H O W G L O B A L S T A T U S L I K E ' % m e r g e % ' ; S H O W G L O B A L S T A T U S L I K E ' % t m p % ' ; S E T s o r t _ b u f f e r _ s i z e = 8 M ; S E T t m p _ t a b l e _ s i z e = 3 2 M ; S E T m a x _ h e a p _ t a b l e _ s i z e = 3 2 M ; F L U S H S T A T U S ; Check again in 10 minutes...

Slide 20

Slide 20 text

How to use E X P L A I N You prefix your S E L E C T query with E X P L A I N (As of 5.6, you can now explain INSERTs and UPDATEs) That's it ... (Seriously. E X P L A I N ALL the queries!) Add an index E X P L A I N again Take the rest of the week off

Slide 21

Slide 21 text

Example output from EXPLAIN E X P L A I N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E p a y m e n t A L L N U L L N U L L N U L L N U L L 1 6 5 9 1 U s i n g W h e r e Let's take a look at MySQL's analysis: It found a WHERE clause? Cheers, Sherlock (split architecture) MySQL estimates that it'll have to scan through 16k rows Note that the estimate is approximate - 16,049 rows There are no suitable candidates for indexes (or keys) There was no key used, so it did a "Full Table Scan"

Slide 22

Slide 22 text

Predict bad queries Proof at last! Even though last_name is indexed, we can now prove that % s t r i n g % searches are bad E X P L A I N S E L E C T * F R O M c u s t o m e r W H E R E l a s t _ n a m e L I K E ' % o n e % ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 U s i n g W h e r e Again, this query resulted in a "Full Table Scan"

Slide 23

Slide 23 text

"Full Table Scan" Every row on the table is fetched during scanning Any filtering is done by "MySQL" rather than the storage engine Table might not be in memory - disk can be contended Things like "country lists" for are an exception

Slide 24

Slide 24 text

Exercise 4 - our first index Let's remember that the S E L E C T on p a y m e n t triggered a scan of 16,000 rows Add the carefully crafted index A L T E R T A B L E p a y m e n t A D D I N D E X d a t e _ o n l y ( p a y m e n t _ d a t e ) ; E X P L A I N again E X P L A I N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E e m p l o y e e s r e f d a t e _ o n l y d a t e _ o n l y 3 c o n s t 6 7 1 U s i n g W h e r e

Slide 25

Slide 25 text

Compound indexes Compound indexes comprise more than one field It's important to note that their definition is order sensitive You can use part of a compound index too - but the index must be used "contiguously"

Slide 26

Slide 26 text

Exercise 5: Brute-indexing Take the following query S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' , ' P G ' , ' P G - 1 3 ' ) A N D l e n g t h < 8 0 ; Add indexes and EXPLAIN A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g ( r a t i n g ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ l e n g t h ( l e n g t h ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g _ l e n g t h ( r a t i n g , l e n g t h ) ; - - E X P L A I N ! There's no shame in brute forcing combos

Slide 27

Slide 27 text

Exercise 5a: Brute-indexing So compound indexes are better, right? A L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g , D R O P I N D E X i d x _ l e n g t h ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h < 8 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) A N D l e n g t h < 8 0 "key_len" Can tell us when partial indexes are used Indexes must be used "left-to-right"

Slide 28

Slide 28 text

Exercise 5b: Multi-range scans S H O W I N D E X E S I N f i l m ; - - s h o u l d o n l y h a v e i d x _ r a t i n g _ l e n g t h E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; A L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g _ l e n g t h , C R E A T E I N D E X i d x _ l e n g t h _ r a t i n g ( l e n g t h , r a t i n g ) ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; Notice the reduced key_len - Multi-range scans!

Slide 29

Slide 29 text

Exercise 5c: Multi-range scans 2 A better example S E L E C T * F R O M r e n t a l W H E R E r e n t a l _ d a t e > ' 2 0 0 5 - 0 5 - 3 0 ' A N D r e t u r n _ d a t e < ' 2 0 0 5 - 0 6 - 0 6 ' ; Find all rentals which took place within a given week No matter how you index, MySQL will only use the first range scan

Slide 30

Slide 30 text

Exercise 5d: Cardinality E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 5 0 ; MySQL CBA using an index with the last query "If I'm going to have to scan > 30% of the index, don't use an index" S E L E C T * F R O M f i l m F O R C E I N D E X ( l e n g t h _ o n l y ) W H E R E l e n g t h > 1 5 0

Slide 31

Slide 31 text

Index "Cardinality" MySQL makes decisions on which indexes to use in a fraction of a second - cardinality Cardinality refers to the uniqueness of values contained in a particular column. A column with the lowest possible cardinality would have the same value for every row - Wikipedia S H O W I N D E X E S I N t a b l e _ n a m e A N A L Y Z E T A B L E t a b l e _ n a m e ; - - u p d a t e s / g u e s s e s c a r d i n a l i t y

Slide 32

Slide 32 text

Exercise 5e: Custom Index length Index an n-char prefix of a column This isn't really about speed, but only indexing what you need A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ( 6 ) ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' Look at the key_len column change

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Exercise 5: Indexing review Compound indexes allow us to index over more than one WHERE clause Index definitions (and to some degree, their usage) is order sensitve MySQL will allow you to use a sub-set of an index Which leads to "Indexing Strategy" Analyse your own queries and order index definition such that you hit indexes in the majority of cases Index all the things? Cardinality can give us frustrating outcomes Multi Range scans are a bitch

Slide 35

Slide 35 text

E X P L A I N works for JOINs too E X P L A I N S E L E C T * F R O M c u s t o m e r L E F T J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 N U L L S I M P L E a d d r e s s e q _ r e f P R I M A R Y P R I M A R Y 2 c u s t . a d d r e s s _ i d 1 N U L L Rough Interpretation MySQL is going to start with scanning ALL customers and for each one it attempts to join on to address. Fortunately, MySQL doesn't have to work hard to find the address for each customer. Nested-loop approach - aka don't make JOINs work hard Pro-tip: ORDER / GROUP BY over two tables = bad times

Slide 36

Slide 36 text

MySQL will change the order of your JOINs Not terribly useful, just interesting E X P L A I N S E L E C T * F R O M c u s t o m e r I N N E R J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E a d d r e s s A L L P R I M A R Y N U L L N U L L N U L L 6 2 8 N U L L S I M P L E c u s t o m e r r e f f k _ a d d r _ i d f k _ a d d r _ i d 2 a d d r . a d d r e s s _ i d 1 N U L L Rough Interpretation Because the INNER JOIN means that the address must exist for the customer, MySQL switches the order of JOINs

Slide 37

Slide 37 text

Trigger Index Optimisation A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ r e p u t a t i o n ( r e p u t a t i o n ) ; E X P L A I N S E L E C T * F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 E X P L A I N S E L E C T i d F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 Using Index = good stuff MySQL did not have to go to the table (which might not be loaded) to return the results to the client i d x _ r e p u t a t i o n is now a "covering index" Compound indexes also work as covering indexes, e.g. (reputation,display_name) But this is at odds with our "reduce index lengths" maxim

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Trigger Index Optimisation Can we avoid temp tables through indexing? If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created. (and four more conditions) No :(

Slide 40

Slide 40 text

Trigger Index Optimisations Avoiding "filesort" A quick reminder on filesorts... Sample Query S E L E C T * F R O M s a k i l a . r e n t a l W H E R E r e t u r n _ d a t e B E T W E E N ' 2 0 0 5 - 0 1 - 0 1 ' A N D ' 2 0 0 5 - 0 6 - 0 1 ' O R D E R B Y r e t u r n _ d a t e A S C We can avoid a file sort More at 8.3.1.11 ORDER BY Optimisations Documentation details very specific circumstances where an index will be used

Slide 41

Slide 41 text

Large datasets and L I M I T You have a large dataset, and you want to paginate Technically, without an ORDER BY, LIMIT is "non-deterministic", so we really should have an ORDER BY A L T E R T A B L E c o m m e n t A D D I N D E X i d x _ c r e a t e d ( c r e a t e d ) ; E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 , 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5

Slide 42

Slide 42 text

Large datasets and L I M I T Manual last row pagination (dark territory) E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 0 0 O R D E R B Y c r e a t e d L I M I T 2 5

Slide 43

Slide 43 text

Large datasets and L I M I T Delayed JOIN (very dark) S E L E C T * F R O M p o s t s I N N E R J O I N ( S E L E C T i d F R O M p o s t s O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5 ) A S t m p U S I N G ( i d ) The sub query is very fast ("using index") which makes the outer select's job easy

Slide 44

Slide 44 text

O R D E R B Y R A N D ( ) : discuss S E L E C T * F R O M f i l m O R D E R B Y R A N D ( ) L I M I T 1 How does this "Random Film" feature work Using what we now know about Full Table Scans, filesorts and limiting, why is the above query bad? RAND() forces all rows to have a function called ORDER BY forces a temporary table and filesort LIMIT ditches 99% of the temporary table!

Slide 45

Slide 45 text

O R D E R B Y R A N D ( ) : a fix ..?! Strategy 1 - maintain a list Use your app to find valid IDs randomly (i.e. 1 Query to identify valid ID, another to execute on PK) Maintain a list (somewhere) of valid PKs (i.e. a cron job, or a trigger)

Slide 46

Slide 46 text

O R D E R B Y R A N D ( ) : a fix ..?! Strategy 2 - maintain a different list Maintain an extra column containing a random number, which is indexed Use a similar approach to our cheap LIMIT hack S E L E C T * F R O M f i l m W H E R E i d > $ r a n d o m L I M I T 1

Slide 47

Slide 47 text

O R D E R B Y R A N D ( ) : a fix ..?! Strategy 3 - Level up S E L E C T * F R O M f i l m J O I N ( S E L E C T C E I L ( R A N D ( ) * ( S E L E C T M A X ( f i l m _ i d ) F R O M f i l m ) ) A S f i l m _ i d ) A S r 2 U S I N G ( f i l m _ i d ) ; O(1) method for selecting a single random record from the DB Courtesy of jan.kneschke.de/projects/mysql/order-by-rand/ No-one can write this from memory - this would've taken hours of research

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

Modify the query or data Clearly more disruptive to the application E.g. Find all rentals which took place on a Saturday or Sunday E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E r e n t a l A L L N U L L N U L L N U L L N U L L 1 6 4 8 6 U s i n g w h e r e Problem? MySQL must execute the DATE_FORMAT function on every row first, before cutting down the resultset Solutions? Pre-calculate the Saturdays and Sundays and use IN(...) Summarise the weekday in another column, and make sure that's indexed

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

Modify data - how? A L T E R T A B L E r e n t a l A D D C O L U M N w e e k d a y I N T ( 1 ) N O T N U L L D E F A U L T 0 ; A L T E R T A B L E r e n t a l A D D I N D E X i d x _ w e e k d a y ( w e e k d a y ) ; C R E A T E T R I G G E R t r g _ w e e k d a y _ u p B E F O R E U P D A T E O N r e n t a l F O R E A C H R O W S E T N E W . w e e k d a y = D A T E _ F O R M A T ( N E W . r e n t a l _ d a t e , ' % w ' ) ; E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ;

Slide 52

Slide 52 text

Modify the query/data (story time) The problem Imagine a large, hand-rolled ecommerce system (7 years ago) Thousands and thousands of products Each product is tagged, categorised, searchable on title/description Tags and categories used many-to-many link tables The solution Create a summary table - denormalise the data Maintain summary table with triggers or (lazy) cron jobs Abuse full text indexes in MySQL _ T A G _ 2 _ _ T A G _ 3 4 _ _ T A G _ 5 6 _ _ C A T E G O R Y _ 4 5 _ P r o d u c t n a m e a n d d e s c r i p t i o n . . .

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

Questions

Slide 55

Slide 55 text

Tools & Resources Resources Section 8.8.2. Explain Output Format High Performance MySQL (3rd Edition) mysqlperformanceblog.com Everything that Percona ever made (e.g p t - v i s u a l - e x p l a i n , p t - q u e r y - d i g e s t ) New in 5.6 - Optimizer Tracing (mindblown)