FooLab
http://foolab.ca
@foolabca
Speed up Your
Database
Pycon Canada - November 11, 2012
Slide 2
Slide 2 text
FooLab
Speed Matters
• What is slow?
• Indexes.
• Better queries.
• Checklist.
2
Indices*
Slide 3
Slide 3 text
FooLab
Anna Filina
3
User group
Non profit
conference
FooLab
Training
Project rescue
@afilina
Slide 4
Slide 4 text
FooLab
What is Slow?
Slide 5
Slide 5 text
FooLab
Multitude of Queries
• Loops:
• Reduce the number of queries.
• Check for lazy loading in your ORM.
• Query a set instead.
5
for id in pictures:
query_pic_details(id)
FooLab
Count Queries
• Manual: add custom code before queries.
• Auto: ORMs (like SQLAlchemy) have listeners.
7
Slide 8
Slide 8 text
FooLab
Slow Query Log
• Open my.cnf
• Queries taking 0.5s or longer will be logged.
• Open the log file and analyze the queries.
8
slow_query_log=ON
long_query_time=0.5
slow_query_log_file=/path/to/file
Slide 9
Slide 9 text
FooLab
Example Schema
9
200
users
100
albums each
100
pictures each
2,000,000 total pics
Slide 10
Slide 10 text
FooLab
Simple Query
• Get all pictures from user #1
• 38 seconds.
10
SELECT picture.id, picture.title
FROM picture
LEFT JOIN album ON picture.album.id = album.id
WHERE album.user_id = 1;
Slide 11
Slide 11 text
FooLab
Let me do a space jump
while it fetches my pics...
11
Slide 12
Slide 12 text
FooLab
Why is it Slow?
Slide 13
Slide 13 text
FooLab
Explain
13
• Prefix with EXPLAIN:
• Each row = table scanned:
EXPLAIN SELECT picture.id, picture.title
FROM picture
LEFT JOIN album ON picture.album.id = album.id
WHERE album.user_id = 1;
+----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+
| 1 | SIMPLE | album | ALL | NULL | NULL | NULL | NULL | 20000 | Using where |
| 1 | SIMPLE | picture | ALL | NULL | NULL | NULL | NULL | 2000000 | Using where; Using join buffer |
+----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+
Slide 14
Slide 14 text
FooLab
Explain
14
• Details:
• Because of the join,
we’re scanning all albums for each picture.
• Total 40 billion rows scanned.
table : album
key : NULL
rows : 20,000
table : picture
key : NULL
rows : 2,000,000
Slide 15
Slide 15 text
FooLab
Indexes
Slide 16
Slide 16 text
FooLab
Metaphor
16
Flip through all pages of a address book...
... or pull on the letter tab.
Slide 17
Slide 17 text
FooLab
Add Index
17
• album.id
• Same query down to 2.38 sec.
• We just saved 36 sec on every request!
ALTER TABLE album ADD PRIMARY KEY(id);
Slide 18
Slide 18 text
FooLab
Under the Hood
18
• Details:
• With the index,
we find our album right away.
• Total 2,000,000 rows scanned.
table : album
key : PRIMARY
rows : 1
table : picture
key : NULL
rows : 2,000,000
Slide 19
Slide 19 text
FooLab
Add Next Index
19
• picture.album_id
• Now down to 0.12 sec.
• 317 times faster than original.
ALTER TABLE picture ADD INDEX(album_id);
Slide 20
Slide 20 text
FooLab
Under the Hood
20
• Details:
• Total 200,000 rows scanned.
table : album
key : NULL
rows : 20,000
table : picture
key : album_id
rows : 100
Slide 21
Slide 21 text
FooLab
Add Last Index
21
• album.user_id
• Now down to 0.10 sec.
• Gained another 17% in speed.
ALTER TABLE album ADD INDEX(user_id);
Slide 22
Slide 22 text
FooLab
Under the Hood
22
• Details:
• With the second index,
we’re scanning 100 pics.
• Total 10,000 rows scanned.
table : album
key : user_id
rows : 100
table : picture
key : album_id
rows : 100
Slide 23
Slide 23 text
FooLab
Indexes Used
23
• Primary and foreign keys make for great indexes.
• Also whatever you use in JOINs and WHEREs.
Slide 24
Slide 24 text
FooLab
Do We Need to
Optimize Further?
We’re quite fast already.
Slide 25
Slide 25 text
FooLab
Airport Customs
25
• 5 customs officers, 1 min to process a traveler.
• 5 travelers / min = fluid.
• 20 travelers / min = queue slowly builds.
• 1000 travellers / min = 3h wait if you arrive on 2nd
minute.
Slide 26
Slide 26 text
FooLab
Microseconds Matter
26
Evergrowing queues Alone at full speed
Slide 27
Slide 27 text
FooLab
Smaller Indexes
27
• BIGINT as index is slower than TINYINT.
• Use UNSIGNED to double the maximum value.
Slide 28
Slide 28 text
FooLab
Tips for Better Queries
Slide 29
Slide 29 text
FooLab
Functions & Indexes
29
• Avoid functions on an indexed column:
• The index won’t be used here.
SELECT id, title FROM picture
WHERE YEAR(create_date) >= 2011;
Slide 30
Slide 30 text
FooLab
Limit
30
• Stop picking up firewood when your hands are full:
• Now down to 0.0009 sec.
• 100 times faster than the last query.
SELECT picture.id, picture.title
FROM picture
LEFT JOIN album ON picture.album.id = album.id
WHERE album.user_id = 1
LIMIT 25;
Slide 31
Slide 31 text
FooLab
Other Constraints
31
• Range:
• Only makes sense if that column is indexed:
SELECT picture.id, picture.title
FROM picture
LEFT JOIN album ON picture.album.id = album.id
WHERE album.user_id = 1
AND picture.id BETWEEN 26 AND 50; # 0.023 sec
ALTER TABLE picture ADD INDEX(id);
Slide 32
Slide 32 text
FooLab
Other Constraints
32
• IN clause:
SELECT picture.id, picture.title
FROM picture
LEFT JOIN album ON picture.album.id = album.id
WHERE album.user_id = 1
AND picture.id IN (15,26,29,32,45); # 0.048 sec
Slide 33
Slide 33 text
FooLab
Split Tables
33
• MySQL can partition your tables transparently.
• Creates multiple files on disk.
• Shows data as a single table.
Slide 34
Slide 34 text
FooLab
Archives
34
• Don’t underestimate archive tables.
• MySQL ARCHIVE storage engine:
• Doesn’t support UPDATE or DELETE
• Very fast for SELECT
• You can archive old transactions, news, etc.
Slide 35
Slide 35 text
FooLab
Split Databases
35
• Also called horizontal partitionning or sharding.
• Users 1-1000 and their data in database #1
• Users 1001-2000 and their data in database #2
Slide 36
Slide 36 text
FooLab
The Checklist
36
✓Count your queries.
✓Log the slow ones.
✓Use EXPLAIN and check
number of rows scanned.
✓Try different indexes,
remove the unused ones.
✓Use small indexes,
preferably numeric.
✓Don’t use functions on
indexed columns.
✓Limit number of results.
✓Split tables and
databases.