Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL Query Optimisation

adrianhardy
October 04, 2013

MySQL Query Optimisation

An overview of the working patterns, tools and approaches to getting the most out of your MySQL queries.

adrianhardy

October 04, 2013
Tweet

More Decks by adrianhardy

Other Decks in Technology

Transcript

  1. The secret to query optimisation A large, slow dataset A

    basic understanding of how MySQL works Terminology Resources Time
  2. Identify - Your workflow and tools From the command line

    s s h p h p n w 1 3 - { 1 , 2 , 3 , 4 , . . . } @ 1 0 . 0 . 0 . 2 (password = phpnw13) s u d o a t o p From inside a MySQL session [ p h p n w 1 3 - 1 @ s e r v e r ~ ] $ m y s q l - u p h p n w 1 3 - 1 - p (password = phpnw13) S H O W F U L L P R O C E S S L I S T \ G
  3. Exercise 1 - your environment System Load [ p h

    p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e (when prompted, password = password) m y s q l > s h o w f u l l p r o c e s s l i s t \ G m y s q l > q u i t
  4. Exercise 1a - Experience a slap m y s q

    l s l a p is a very simple MySQL stress testing tool. It runs a set of queries in parallel, hammering your server System Load [ p h p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e m y s q l > s h o w f u l l p r o c e s s l i s t \ G
  5. Identify - Slow Query Log Feels a little like "Percona"

    product placement Drop in replacement for MySQL/MariaDB - no migration process Lots of cool diagnostic tools & performance improvements Broadly kept in line with upstream, available as packages For the risk averse - can just be used as a dev tool Some of improvments in Percona are available in MariaDB
  6. Exercise 2 - slow query log [ p h p

    n w 1 3 @ s e r v e r ~ ] $ l e s s / v a r / l o g / m y s q l d / s l o w . l o g
  7. Percona's Slow Query Log Statistics Available Query_time Lock_time Rows_sent Rows_examined

    * QC_Hit * Full_scan * Tmp_table (_on_disk) * Filesort (_on_disk) and many more!
  8. Percona's pt-query-digest T i m e r a n g

    e : 2 0 1 3 - 0 7 - 2 1 0 9 : 2 3 : 0 2 t o 2 0 1 3 - 0 9 - 0 2 1 1 : 2 4 : 3 3 A t t r i b u t e p c t t o t a l m i n m a x a v g 9 5 % s t d d e v m e d i a n = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = C o u n t 6 2 2 5 E x e c t i m e 7 5 5 7 5 s 1 8 s 3 5 s 2 5 s 3 2 s 5 s 2 5 s L o c k t i m e 0 3 5 m s 1 0 3 u s 2 1 6 u s 1 5 7 u s 1 6 7 u s 1 1 u s 1 5 2 u s R o w s s e n t 0 8 9 . 4 2 k 3 1 3 5 3 5 4 0 6 . 9 4 5 1 1 . 4 5 7 3 . 3 3 4 2 0 . 7 7 R o w s e x a m i n e 9 1 . 7 3 G 5 . 8 8 M 1 0 . 6 2 M 7 . 8 6 M 9 . 7 6 M 1 . 4 7 M 8 . 0 3 M B y t e s s e n t 0 2 3 . 1 5 M 8 0 . 6 8 k 1 4 0 . 0 9 k 1 0 5 . 3 5 k 1 3 0 . 0 4 k 1 8 . 9 2 k 1 0 6 . 9 9 k T m p t a b l e s 0 2 2 5 1 1 1 1 0 1 T m p d i s k t b l 1 0 2 2 5 1 1 1 1 0 1 T m p t b l s i z e 4 2 1 2 . 9 6 M 7 4 5 . 8 7 k 1 . 2 5 M 9 6 9 . 2 2 k 1 . 2 0 M 1 8 5 . 7 3 k 1 0 0 9 . 3 3 k Q u e r y s i z e 1 2 4 2 . 4 4 k 1 . 0 7 k 1 . 1 1 k 1 . 0 8 k 1 . 0 9 k 1 9 . 7 1 1 . 0 4 k
  9. Exercise 3 - pt-query-digest aka "cheating" Task 1 [ p

    h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t / v a r / l i b / m y s q l / s l o w . l o g | l e s s Task 2 [ p h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t - - r e v i e w \ u = p h p n w 1 3 - 1 , D = p h p n w 1 3 - 1 , t = r e v i e w _ m e , h = l o c a l h o s t \ / v a r / l i b / m y s q l / s l o w . l o g
  10. Exercise 3a - pt-query-advisor aka "really cheating" Task 3 [

    p h p n w 1 3 @ s e r v e r ~ ] $ . / p t - q u e r y - a d v i s o r - - r e v i e w u = r o o t , h = l o c a l h o s t , D = t e m p , t = r e v i e w _ m e Some hints A R G . 0 0 1 You've got a LIKE comparison with a % prefix C L A . 0 0 1 You've issued a SELECT without a WHERE clause C L A . 0 0 3 LIMIT,OFFSET used (bad news, unfortunately)
  11. "Identify" - Review Is there a problem, and is it

    MySQL? Is it caught by the slow query log? If it's in the slow query log, it's a problem - no scope for "works on my machine" What is the scale of problem we're dealing with? Do we have the luxury of reproducing it locally? Do we care? Is it easier to ignore the problem?
  12. Dive in to my.cnf Get started by visiting tools.percona.com/wizard Get

    started using mysqltuner.pl The manual is a fantastic resource with gems like: If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing. Any variable in the MySQL "Server System Variables" section of the manual marked as "Dynamic" can be changed live. You can capture "real" query load using g e n e r a l _ l o g and replay locally using m y s q l s l a p
  13. Dive in to my.cnf This sounds suspiciously like "RTFM" [

    p h p n w 1 3 @ s e r v e r ~ ] $ l e s s / e t c / m y . c n f What to tune in MySQL server after installation Section 5.1.6. Server Status Variables Section 5.1.4. Server System Variables
  14. Dive in to my.cnf Sounds obvious but ... be scientific

    F L U S H S T A T U S ; Wait 10 minutes.... S H O W G L O B A L S T A T U S L I K E ' % m e r g e % ' ; S H O W G L O B A L S T A T U S L I K E ' % t m p % ' ; S E T s o r t _ b u f f e r _ s i z e = 8 M ; S E T t m p _ t a b l e _ s i z e = 3 2 M ; S E T m a x _ h e a p _ t a b l e _ s i z e = 3 2 M ; F L U S H S T A T U S ; Check again in 10 minutes...
  15. How to use E X P L A I N

    You prefix your S E L E C T query with E X P L A I N (As of 5.6, you can now explain INSERTs and UPDATEs) That's it ... (Seriously. E X P L A I N ALL the queries!) Add an index E X P L A I N again Take the rest of the week off
  16. Example output from EXPLAIN E X P L A I

    N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E p a y m e n t A L L N U L L N U L L N U L L N U L L 1 6 5 9 1 U s i n g W h e r e Let's take a look at MySQL's analysis: It found a WHERE clause? Cheers, Sherlock (split architecture) MySQL estimates that it'll have to scan through 16k rows Note that the estimate is approximate - 16,049 rows There are no suitable candidates for indexes (or keys) There was no key used, so it did a "Full Table Scan"
  17. Predict bad queries Proof at last! Even though last_name is

    indexed, we can now prove that % s t r i n g % searches are bad E X P L A I N S E L E C T * F R O M c u s t o m e r W H E R E l a s t _ n a m e L I K E ' % o n e % ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 U s i n g W h e r e Again, this query resulted in a "Full Table Scan"
  18. "Full Table Scan" Every row on the table is fetched

    during scanning Any filtering is done by "MySQL" rather than the storage engine Table might not be in memory - disk can be contended Things like "country lists" for <select> are an exception
  19. Exercise 4 - our first index Let's remember that the

    S E L E C T on p a y m e n t triggered a scan of 16,000 rows Add the carefully crafted index A L T E R T A B L E p a y m e n t A D D I N D E X d a t e _ o n l y ( p a y m e n t _ d a t e ) ; E X P L A I N again E X P L A I N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E e m p l o y e e s r e f d a t e _ o n l y d a t e _ o n l y 3 c o n s t 6 7 1 U s i n g W h e r e
  20. Compound indexes Compound indexes comprise more than one field It's

    important to note that their definition is order sensitive You can use part of a compound index too - but the index must be used "contiguously"
  21. Exercise 5: Brute-indexing Take the following query S E L

    E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' , ' P G ' , ' P G - 1 3 ' ) A N D l e n g t h < 8 0 ; Add indexes and EXPLAIN A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g ( r a t i n g ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ l e n g t h ( l e n g t h ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g _ l e n g t h ( r a t i n g , l e n g t h ) ; - - E X P L A I N ! There's no shame in brute forcing combos
  22. Exercise 5a: Brute-indexing So compound indexes are better, right? A

    L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g , D R O P I N D E X i d x _ l e n g t h ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h < 8 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) A N D l e n g t h < 8 0 "key_len" Can tell us when partial indexes are used Indexes must be used "left-to-right"
  23. Exercise 5b: Multi-range scans S H O W I N

    D E X E S I N f i l m ; - - s h o u l d o n l y h a v e i d x _ r a t i n g _ l e n g t h E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; A L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g _ l e n g t h , C R E A T E I N D E X i d x _ l e n g t h _ r a t i n g ( l e n g t h , r a t i n g ) ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; Notice the reduced key_len - Multi-range scans!
  24. Exercise 5c: Multi-range scans 2 A better example S E

    L E C T * F R O M r e n t a l W H E R E r e n t a l _ d a t e > ' 2 0 0 5 - 0 5 - 3 0 ' A N D r e t u r n _ d a t e < ' 2 0 0 5 - 0 6 - 0 6 ' ; Find all rentals which took place within a given week No matter how you index, MySQL will only use the first range scan
  25. Exercise 5d: Cardinality E X P L A I N

    S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 5 0 ; MySQL CBA using an index with the last query "If I'm going to have to scan > 30% of the index, don't use an index" S E L E C T * F R O M f i l m F O R C E I N D E X ( l e n g t h _ o n l y ) W H E R E l e n g t h > 1 5 0
  26. Index "Cardinality" MySQL makes decisions on which indexes to use

    in a fraction of a second - cardinality Cardinality refers to the uniqueness of values contained in a particular column. A column with the lowest possible cardinality would have the same value for every row - Wikipedia S H O W I N D E X E S I N t a b l e _ n a m e A N A L Y Z E T A B L E t a b l e _ n a m e ; - - u p d a t e s / g u e s s e s c a r d i n a l i t y
  27. Exercise 5e: Custom Index length Index an n-char prefix of

    a column This isn't really about speed, but only indexing what you need A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ( 6 ) ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' Look at the key_len column change
  28. Exercise 5: Indexing review Compound indexes allow us to index

    over more than one WHERE clause Index definitions (and to some degree, their usage) is order sensitve MySQL will allow you to use a sub-set of an index Which leads to "Indexing Strategy" Analyse your own queries and order index definition such that you hit indexes in the majority of cases Index all the things? Cardinality can give us frustrating outcomes Multi Range scans are a bitch
  29. E X P L A I N works for JOINs

    too E X P L A I N S E L E C T * F R O M c u s t o m e r L E F T J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 N U L L S I M P L E a d d r e s s e q _ r e f P R I M A R Y P R I M A R Y 2 c u s t . a d d r e s s _ i d 1 N U L L Rough Interpretation MySQL is going to start with scanning ALL customers and for each one it attempts to join on to address. Fortunately, MySQL doesn't have to work hard to find the address for each customer. Nested-loop approach - aka don't make JOINs work hard Pro-tip: ORDER / GROUP BY over two tables = bad times
  30. MySQL will change the order of your JOINs Not terribly

    useful, just interesting E X P L A I N S E L E C T * F R O M c u s t o m e r I N N E R J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E a d d r e s s A L L P R I M A R Y N U L L N U L L N U L L 6 2 8 N U L L S I M P L E c u s t o m e r r e f f k _ a d d r _ i d f k _ a d d r _ i d 2 a d d r . a d d r e s s _ i d 1 N U L L Rough Interpretation Because the INNER JOIN means that the address must exist for the customer, MySQL switches the order of JOINs
  31. Trigger Index Optimisation A L T E R T A

    B L E s o _ u s e r A D D I N D E X i d x _ r e p u t a t i o n ( r e p u t a t i o n ) ; E X P L A I N S E L E C T * F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 E X P L A I N S E L E C T i d F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 Using Index = good stuff MySQL did not have to go to the table (which might not be loaded) to return the results to the client i d x _ r e p u t a t i o n is now a "covering index" Compound indexes also work as covering indexes, e.g. (reputation,display_name) But this is at odds with our "reduce index lengths" maxim
  32. Trigger Index Optimisation Can we avoid temp tables through indexing?

    If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created. (and four more conditions) No :(
  33. Trigger Index Optimisations Avoiding "filesort" A quick reminder on filesorts...

    Sample Query S E L E C T * F R O M s a k i l a . r e n t a l W H E R E r e t u r n _ d a t e B E T W E E N ' 2 0 0 5 - 0 1 - 0 1 ' A N D ' 2 0 0 5 - 0 6 - 0 1 ' O R D E R B Y r e t u r n _ d a t e A S C We can avoid a file sort More at 8.3.1.11 ORDER BY Optimisations Documentation details very specific circumstances where an index will be used
  34. Large datasets and L I M I T You have

    a large dataset, and you want to paginate Technically, without an ORDER BY, LIMIT is "non-deterministic", so we really should have an ORDER BY A L T E R T A B L E c o m m e n t A D D I N D E X i d x _ c r e a t e d ( c r e a t e d ) ; E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 , 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5
  35. Large datasets and L I M I T Manual last

    row pagination (dark territory) E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 0 0 O R D E R B Y c r e a t e d L I M I T 2 5
  36. Large datasets and L I M I T Delayed JOIN

    (very dark) S E L E C T * F R O M p o s t s I N N E R J O I N ( S E L E C T i d F R O M p o s t s O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5 ) A S t m p U S I N G ( i d ) The sub query is very fast ("using index") which makes the outer select's job easy
  37. O R D E R B Y R A N

    D ( ) : discuss S E L E C T * F R O M f i l m O R D E R B Y R A N D ( ) L I M I T 1 How does this "Random Film" feature work Using what we now know about Full Table Scans, filesorts and limiting, why is the above query bad? RAND() forces all rows to have a function called ORDER BY forces a temporary table and filesort LIMIT ditches 99% of the temporary table!
  38. O R D E R B Y R A N

    D ( ) : a fix ..?! Strategy 1 - maintain a list Use your app to find valid IDs randomly (i.e. 1 Query to identify valid ID, another to execute on PK) Maintain a list (somewhere) of valid PKs (i.e. a cron job, or a trigger)
  39. O R D E R B Y R A N

    D ( ) : a fix ..?! Strategy 2 - maintain a different list Maintain an extra column containing a random number, which is indexed Use a similar approach to our cheap LIMIT hack S E L E C T * F R O M f i l m W H E R E i d > $ r a n d o m L I M I T 1
  40. O R D E R B Y R A N

    D ( ) : a fix ..?! Strategy 3 - Level up S E L E C T * F R O M f i l m J O I N ( S E L E C T C E I L ( R A N D ( ) * ( S E L E C T M A X ( f i l m _ i d ) F R O M f i l m ) ) A S f i l m _ i d ) A S r 2 U S I N G ( f i l m _ i d ) ; O(1) method for selecting a single random record from the DB Courtesy of jan.kneschke.de/projects/mysql/order-by-rand/ No-one can write this from memory - this would've taken hours of research
  41. Modify the query or data Clearly more disruptive to the

    application E.g. Find all rentals which took place on a Saturday or Sunday E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E r e n t a l A L L N U L L N U L L N U L L N U L L 1 6 4 8 6 U s i n g w h e r e Problem? MySQL must execute the DATE_FORMAT function on every row first, before cutting down the resultset Solutions? Pre-calculate the Saturdays and Sundays and use IN(...) Summarise the weekday in another column, and make sure that's indexed
  42. Modify data - how? A L T E R T

    A B L E r e n t a l A D D C O L U M N w e e k d a y I N T ( 1 ) N O T N U L L D E F A U L T 0 ; A L T E R T A B L E r e n t a l A D D I N D E X i d x _ w e e k d a y ( w e e k d a y ) ; C R E A T E T R I G G E R t r g _ w e e k d a y _ u p B E F O R E U P D A T E O N r e n t a l F O R E A C H R O W S E T N E W . w e e k d a y = D A T E _ F O R M A T ( N E W . r e n t a l _ d a t e , ' % w ' ) ; E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ;
  43. Modify the query/data (story time) The problem Imagine a large,

    hand-rolled ecommerce system (7 years ago) Thousands and thousands of products Each product is tagged, categorised, searchable on title/description Tags and categories used many-to-many link tables The solution Create a summary table - denormalise the data Maintain summary table with triggers or (lazy) cron jobs Abuse full text indexes in MySQL _ T A G _ 2 _ _ T A G _ 3 4 _ _ T A G _ 5 6 _ _ C A T E G O R Y _ 4 5 _ P r o d u c t n a m e a n d d e s c r i p t i o n . . .
  44. Tools & Resources Resources Section 8.8.2. Explain Output Format High

    Performance MySQL (3rd Edition) mysqlperformanceblog.com Everything that Percona ever made (e.g p t - v i s u a l - e x p l a i n , p t - q u e r y - d i g e s t ) New in 5.6 - Optimizer Tracing (mindblown)