s s h p h p n w 1 3 - { 1 , 2 , 3 , 4 , . . . } @ 1 0 . 0 . 0 . 2 (password = phpnw13) s u d o a t o p From inside a MySQL session [ p h p n w 1 3 - 1 @ s e r v e r ~ ] $ m y s q l - u p h p n w 1 3 - 1 - p (password = phpnw13) S H O W F U L L P R O C E S S L I S T \ G
p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e (when prompted, password = password) m y s q l > s h o w f u l l p r o c e s s l i s t \ G m y s q l > q u i t
l s l a p is a very simple MySQL stress testing tool. It runs a set of queries in parallel, hammering your server System Load [ p h p n w 1 3 @ s e r v e r ~ ] $ s u d o a t o p Press d MySQL [ p h p n w 1 3 @ s e r v e r ~ ] $ m y s q l - u $ u s e r n a m e m y s q l > s h o w f u l l p r o c e s s l i s t \ G
product placement Drop in replacement for MySQL/MariaDB - no migration process Lots of cool diagnostic tools & performance improvements Broadly kept in line with upstream, available as packages For the risk averse - can just be used as a dev tool Some of improvments in Percona are available in MariaDB
e : 2 0 1 3 - 0 7 - 2 1 0 9 : 2 3 : 0 2 t o 2 0 1 3 - 0 9 - 0 2 1 1 : 2 4 : 3 3 A t t r i b u t e p c t t o t a l m i n m a x a v g 9 5 % s t d d e v m e d i a n = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = C o u n t 6 2 2 5 E x e c t i m e 7 5 5 7 5 s 1 8 s 3 5 s 2 5 s 3 2 s 5 s 2 5 s L o c k t i m e 0 3 5 m s 1 0 3 u s 2 1 6 u s 1 5 7 u s 1 6 7 u s 1 1 u s 1 5 2 u s R o w s s e n t 0 8 9 . 4 2 k 3 1 3 5 3 5 4 0 6 . 9 4 5 1 1 . 4 5 7 3 . 3 3 4 2 0 . 7 7 R o w s e x a m i n e 9 1 . 7 3 G 5 . 8 8 M 1 0 . 6 2 M 7 . 8 6 M 9 . 7 6 M 1 . 4 7 M 8 . 0 3 M B y t e s s e n t 0 2 3 . 1 5 M 8 0 . 6 8 k 1 4 0 . 0 9 k 1 0 5 . 3 5 k 1 3 0 . 0 4 k 1 8 . 9 2 k 1 0 6 . 9 9 k T m p t a b l e s 0 2 2 5 1 1 1 1 0 1 T m p d i s k t b l 1 0 2 2 5 1 1 1 1 0 1 T m p t b l s i z e 4 2 1 2 . 9 6 M 7 4 5 . 8 7 k 1 . 2 5 M 9 6 9 . 2 2 k 1 . 2 0 M 1 8 5 . 7 3 k 1 0 0 9 . 3 3 k Q u e r y s i z e 1 2 4 2 . 4 4 k 1 . 0 7 k 1 . 1 1 k 1 . 0 8 k 1 . 0 9 k 1 9 . 7 1 1 . 0 4 k
h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t / v a r / l i b / m y s q l / s l o w . l o g | l e s s Task 2 [ p h p n w 1 3 @ s e r v e r ~ ] $ p t - q u e r y - d i g e s t - - r e v i e w \ u = p h p n w 1 3 - 1 , D = p h p n w 1 3 - 1 , t = r e v i e w _ m e , h = l o c a l h o s t \ / v a r / l i b / m y s q l / s l o w . l o g
p h p n w 1 3 @ s e r v e r ~ ] $ . / p t - q u e r y - a d v i s o r - - r e v i e w u = r o o t , h = l o c a l h o s t , D = t e m p , t = r e v i e w _ m e Some hints A R G . 0 0 1 You've got a LIKE comparison with a % prefix C L A . 0 0 1 You've issued a SELECT without a WHERE clause C L A . 0 0 3 LIMIT,OFFSET used (bad news, unfortunately)
MySQL? Is it caught by the slow query log? If it's in the slow query log, it's a problem - no scope for "works on my machine" What is the scale of problem we're dealing with? Do we have the luxury of reproducing it locally? Do we care? Is it easier to ignore the problem?
started using mysqltuner.pl The manual is a fantastic resource with gems like: If you see many Sort_merge_passes per second in SHOW GLOBAL STATUS output, you can consider increasing the sort_buffer_size value to speed up ORDER BY or GROUP BY operations that cannot be improved with query optimization or improved indexing. Any variable in the MySQL "Server System Variables" section of the manual marked as "Dynamic" can be changed live. You can capture "real" query load using g e n e r a l _ l o g and replay locally using m y s q l s l a p
p h p n w 1 3 @ s e r v e r ~ ] $ l e s s / e t c / m y . c n f What to tune in MySQL server after installation Section 5.1.6. Server Status Variables Section 5.1.4. Server System Variables
F L U S H S T A T U S ; Wait 10 minutes.... S H O W G L O B A L S T A T U S L I K E ' % m e r g e % ' ; S H O W G L O B A L S T A T U S L I K E ' % t m p % ' ; S E T s o r t _ b u f f e r _ s i z e = 8 M ; S E T t m p _ t a b l e _ s i z e = 3 2 M ; S E T m a x _ h e a p _ t a b l e _ s i z e = 3 2 M ; F L U S H S T A T U S ; Check again in 10 minutes...
You prefix your S E L E C T query with E X P L A I N (As of 5.6, you can now explain INSERTs and UPDATEs) That's it ... (Seriously. E X P L A I N ALL the queries!) Add an index E X P L A I N again Take the rest of the week off
N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E p a y m e n t A L L N U L L N U L L N U L L N U L L 1 6 5 9 1 U s i n g W h e r e Let's take a look at MySQL's analysis: It found a WHERE clause? Cheers, Sherlock (split architecture) MySQL estimates that it'll have to scan through 16k rows Note that the estimate is approximate - 16,049 rows There are no suitable candidates for indexes (or keys) There was no key used, so it did a "Full Table Scan"
indexed, we can now prove that % s t r i n g % searches are bad E X P L A I N S E L E C T * F R O M c u s t o m e r W H E R E l a s t _ n a m e L I K E ' % o n e % ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 U s i n g W h e r e Again, this query resulted in a "Full Table Scan"
during scanning Any filtering is done by "MySQL" rather than the storage engine Table might not be in memory - disk can be contended Things like "country lists" for <select> are an exception
S E L E C T on p a y m e n t triggered a scan of 16,000 rows Add the carefully crafted index A L T E R T A B L E p a y m e n t A D D I N D E X d a t e _ o n l y ( p a y m e n t _ d a t e ) ; E X P L A I N again E X P L A I N S E L E C T * F R O M p a y m e n t W H E R E p a y m e n t _ d a t e = ' 2 0 0 5 - 0 8 - 0 1 ' s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E e m p l o y e e s r e f d a t e _ o n l y d a t e _ o n l y 3 c o n s t 6 7 1 U s i n g W h e r e
E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' , ' P G ' , ' P G - 1 3 ' ) A N D l e n g t h < 8 0 ; Add indexes and EXPLAIN A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g ( r a t i n g ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ l e n g t h ( l e n g t h ) ; - - a n d n o w E X P L A I N A L T E R T A B L E f i l m A D D I N D E X i d x _ r a t i n g _ l e n g t h ( r a t i n g , l e n g t h ) ; - - E X P L A I N ! There's no shame in brute forcing combos
L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g , D R O P I N D E X i d x _ l e n g t h ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h < 8 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) E X P L A I N S E L E C T * F R O M f i l m W H E R E r a t i n g I N ( ' G ' ) A N D l e n g t h < 8 0 "key_len" Can tell us when partial indexes are used Indexes must be used "left-to-right"
D E X E S I N f i l m ; - - s h o u l d o n l y h a v e i d x _ r a t i n g _ l e n g t h E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; A L T E R T A B L E f i l m D R O P I N D E X i d x _ r a t i n g _ l e n g t h , C R E A T E I N D E X i d x _ l e n g t h _ r a t i n g ( l e n g t h , r a t i n g ) ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 A N D r a t i n g = ' G ' ; Notice the reduced key_len - Multi-range scans!
L E C T * F R O M r e n t a l W H E R E r e n t a l _ d a t e > ' 2 0 0 5 - 0 5 - 3 0 ' A N D r e t u r n _ d a t e < ' 2 0 0 5 - 0 6 - 0 6 ' ; Find all rentals which took place within a given week No matter how you index, MySQL will only use the first range scan
S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 7 0 ; E X P L A I N S E L E C T * F R O M f i l m W H E R E l e n g t h > 1 5 0 ; MySQL CBA using an index with the last query "If I'm going to have to scan > 30% of the index, don't use an index" S E L E C T * F R O M f i l m F O R C E I N D E X ( l e n g t h _ o n l y ) W H E R E l e n g t h > 1 5 0
in a fraction of a second - cardinality Cardinality refers to the uniqueness of values contained in a particular column. A column with the lowest possible cardinality would have the same value for every row - Wikipedia S H O W I N D E X E S I N t a b l e _ n a m e A N A L Y Z E T A B L E t a b l e _ n a m e ; - - u p d a t e s / g u e s s e s c a r d i n a l i t y
a column This isn't really about speed, but only indexing what you need A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' A L T E R T A B L E s o _ u s e r A D D I N D E X i d x _ n a m e ( d i s p l a y _ n a m e ( 6 ) ) ; S E L E C T * F R O M s o _ u s e r W H E R E d i s p l a y _ n a m e L I K E ' A d r i a n % ' Look at the key_len column change
over more than one WHERE clause Index definitions (and to some degree, their usage) is order sensitve MySQL will allow you to use a sub-set of an index Which leads to "Indexing Strategy" Analyse your own queries and order index definition such that you hit indexes in the majority of cases Index all the things? Cardinality can give us frustrating outcomes Multi Range scans are a bitch
too E X P L A I N S E L E C T * F R O M c u s t o m e r L E F T J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E c u s t o m e r A L L N U L L N U L L N U L L N U L L 6 6 1 N U L L S I M P L E a d d r e s s e q _ r e f P R I M A R Y P R I M A R Y 2 c u s t . a d d r e s s _ i d 1 N U L L Rough Interpretation MySQL is going to start with scanning ALL customers and for each one it attempts to join on to address. Fortunately, MySQL doesn't have to work hard to find the address for each customer. Nested-loop approach - aka don't make JOINs work hard Pro-tip: ORDER / GROUP BY over two tables = bad times
useful, just interesting E X P L A I N S E L E C T * F R O M c u s t o m e r I N N E R J O I N a d d r e s s U S I N G ( a d d r e s s _ i d ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E a d d r e s s A L L P R I M A R Y N U L L N U L L N U L L 6 2 8 N U L L S I M P L E c u s t o m e r r e f f k _ a d d r _ i d f k _ a d d r _ i d 2 a d d r . a d d r e s s _ i d 1 N U L L Rough Interpretation Because the INNER JOIN means that the address must exist for the customer, MySQL switches the order of JOINs
B L E s o _ u s e r A D D I N D E X i d x _ r e p u t a t i o n ( r e p u t a t i o n ) ; E X P L A I N S E L E C T * F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 E X P L A I N S E L E C T i d F R O M s o _ u s e r W H E R E r e p u t a t i o n > 2 5 0 0 Using Index = good stuff MySQL did not have to go to the table (which might not be loaded) to return the results to the client i d x _ r e p u t a t i o n is now a "covering index" Compound indexes also work as covering indexes, e.g. (reputation,display_name) But this is at odds with our "reduce index lengths" maxim
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created. (and four more conditions) No :(
Sample Query S E L E C T * F R O M s a k i l a . r e n t a l W H E R E r e t u r n _ d a t e B E T W E E N ' 2 0 0 5 - 0 1 - 0 1 ' A N D ' 2 0 0 5 - 0 6 - 0 1 ' O R D E R B Y r e t u r n _ d a t e A S C We can avoid a file sort More at 8.3.1.11 ORDER BY Optimisations Documentation details very specific circumstances where an index will be used
a large dataset, and you want to paginate Technically, without an ORDER BY, LIMIT is "non-deterministic", so we really should have an ORDER BY A L T E R T A B L E c o m m e n t A D D I N D E X i d x _ c r e a t e d ( c r e a t e d ) ; E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 , 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5
row pagination (dark territory) E X P L A I N S E L E C T * F R O M c o m m e n t O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 O R D E R B Y c r e a t e d L I M I T 2 5 E X P L A I N S E L E C T * F R O M c o m m e n t W H E R E i d > 2 0 0 0 0 O R D E R B Y c r e a t e d L I M I T 2 5
(very dark) S E L E C T * F R O M p o s t s I N N E R J O I N ( S E L E C T i d F R O M p o s t s O R D E R B Y c r e a t e d L I M I T 2 0 0 0 0 , 2 5 ) A S t m p U S I N G ( i d ) The sub query is very fast ("using index") which makes the outer select's job easy
D ( ) : discuss S E L E C T * F R O M f i l m O R D E R B Y R A N D ( ) L I M I T 1 How does this "Random Film" feature work Using what we now know about Full Table Scans, filesorts and limiting, why is the above query bad? RAND() forces all rows to have a function called ORDER BY forces a temporary table and filesort LIMIT ditches 99% of the temporary table!
D ( ) : a fix ..?! Strategy 1 - maintain a list Use your app to find valid IDs randomly (i.e. 1 Query to identify valid ID, another to execute on PK) Maintain a list (somewhere) of valid PKs (i.e. a cron job, or a trigger)
D ( ) : a fix ..?! Strategy 2 - maintain a different list Maintain an extra column containing a random number, which is indexed Use a similar approach to our cheap LIMIT hack S E L E C T * F R O M f i l m W H E R E i d > $ r a n d o m L I M I T 1
D ( ) : a fix ..?! Strategy 3 - Level up S E L E C T * F R O M f i l m J O I N ( S E L E C T C E I L ( R A N D ( ) * ( S E L E C T M A X ( f i l m _ i d ) F R O M f i l m ) ) A S f i l m _ i d ) A S r 2 U S I N G ( f i l m _ i d ) ; O(1) method for selecting a single random record from the DB Courtesy of jan.kneschke.de/projects/mysql/order-by-rand/ No-one can write this from memory - this would've taken hours of research
application E.g. Find all rentals which took place on a Saturday or Sunday E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ; s e l e c t _ t y p e t a b l e t y p e p o s s i b l e _ k e y s k e y k e y _ l e n r e f r o w s E x t r a S I M P L E r e n t a l A L L N U L L N U L L N U L L N U L L 1 6 4 8 6 U s i n g w h e r e Problem? MySQL must execute the DATE_FORMAT function on every row first, before cutting down the resultset Solutions? Pre-calculate the Saturdays and Sundays and use IN(...) Summarise the weekday in another column, and make sure that's indexed
A B L E r e n t a l A D D C O L U M N w e e k d a y I N T ( 1 ) N O T N U L L D E F A U L T 0 ; A L T E R T A B L E r e n t a l A D D I N D E X i d x _ w e e k d a y ( w e e k d a y ) ; C R E A T E T R I G G E R t r g _ w e e k d a y _ u p B E F O R E U P D A T E O N r e n t a l F O R E A C H R O W S E T N E W . w e e k d a y = D A T E _ F O R M A T ( N E W . r e n t a l _ d a t e , ' % w ' ) ; E X P L A I N S E L E C T * F R O M r e n t a l W H E R E D A T E _ F O R M A T ( r e n t a l _ d a t e , ' % w ' ) I N ( 0 , 6 ) ;
hand-rolled ecommerce system (7 years ago) Thousands and thousands of products Each product is tagged, categorised, searchable on title/description Tags and categories used many-to-many link tables The solution Create a summary table - denormalise the data Maintain summary table with triggers or (lazy) cron jobs Abuse full text indexes in MySQL _ T A G _ 2 _ _ T A G _ 3 4 _ _ T A G _ 5 6 _ _ C A T E G O R Y _ 4 5 _ P r o d u c t n a m e a n d d e s c r i p t i o n . . .
Performance MySQL (3rd Edition) mysqlperformanceblog.com Everything that Percona ever made (e.g p t - v i s u a l - e x p l a i n , p t - q u e r y - d i g e s t ) New in 5.6 - Optimizer Tracing (mindblown)