Slide 1

Slide 1 text

FooLab http://foolab.ca @foolabca Speed up Your Database Pycon Canada - November 11, 2012

Slide 2

Slide 2 text

FooLab Speed Matters • What is slow? • Indexes. • Better queries. • Checklist. 2 Indices*

Slide 3

Slide 3 text

FooLab Anna Filina 3 User group Non profit conference FooLab Training Project rescue @afilina

Slide 4

Slide 4 text

FooLab What is Slow?

Slide 5

Slide 5 text

FooLab Multitude of Queries • Loops: • Reduce the number of queries. • Check for lazy loading in your ORM. • Query a set instead. 5 for id in pictures: query_pic_details(id)

Slide 6

Slide 6 text

FooLab Multitude of Queries • Deeply nested code. • Try get_first_thumb instead. 6

Slide 7

Slide 7 text

FooLab Count Queries • Manual: add custom code before queries. • Auto: ORMs (like SQLAlchemy) have listeners. 7

Slide 8

Slide 8 text

FooLab Slow Query Log • Open my.cnf • Queries taking 0.5s or longer will be logged. • Open the log file and analyze the queries. 8 slow_query_log=ON long_query_time=0.5 slow_query_log_file=/path/to/file

Slide 9

Slide 9 text

FooLab Example Schema 9 200 users 100 albums each 100 pictures each 2,000,000 total pics

Slide 10

Slide 10 text

FooLab Simple Query • Get all pictures from user #1 • 38 seconds. 10 SELECT picture.id, picture.title FROM picture LEFT JOIN album ON picture.album.id = album.id WHERE album.user_id = 1;

Slide 11

Slide 11 text

FooLab Let me do a space jump while it fetches my pics... 11

Slide 12

Slide 12 text

FooLab Why is it Slow?

Slide 13

Slide 13 text

FooLab Explain 13 • Prefix with EXPLAIN: • Each row = table scanned: EXPLAIN SELECT picture.id, picture.title FROM picture LEFT JOIN album ON picture.album.id = album.id WHERE album.user_id = 1; +----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+ | 1 | SIMPLE | album | ALL | NULL | NULL | NULL | NULL | 20000 | Using where | | 1 | SIMPLE | picture | ALL | NULL | NULL | NULL | NULL | 2000000 | Using where; Using join buffer | +----+-------------+---------+------+---------------+------+---------+------+---------+--------------------------------+

Slide 14

Slide 14 text

FooLab Explain 14 • Details: • Because of the join, we’re scanning all albums for each picture. • Total 40 billion rows scanned. table : album key : NULL rows : 20,000 table : picture key : NULL rows : 2,000,000

Slide 15

Slide 15 text

FooLab Indexes

Slide 16

Slide 16 text

FooLab Metaphor 16 Flip through all pages of a address book... ... or pull on the letter tab.

Slide 17

Slide 17 text

FooLab Add Index 17 • album.id • Same query down to 2.38 sec. • We just saved 36 sec on every request! ALTER TABLE album ADD PRIMARY KEY(id);

Slide 18

Slide 18 text

FooLab Under the Hood 18 • Details: • With the index, we find our album right away. • Total 2,000,000 rows scanned. table : album key : PRIMARY rows : 1 table : picture key : NULL rows : 2,000,000

Slide 19

Slide 19 text

FooLab Add Next Index 19 • picture.album_id • Now down to 0.12 sec. • 317 times faster than original. ALTER TABLE picture ADD INDEX(album_id);

Slide 20

Slide 20 text

FooLab Under the Hood 20 • Details: • Total 200,000 rows scanned. table : album key : NULL rows : 20,000 table : picture key : album_id rows : 100

Slide 21

Slide 21 text

FooLab Add Last Index 21 • album.user_id • Now down to 0.10 sec. • Gained another 17% in speed. ALTER TABLE album ADD INDEX(user_id);

Slide 22

Slide 22 text

FooLab Under the Hood 22 • Details: • With the second index, we’re scanning 100 pics. • Total 10,000 rows scanned. table : album key : user_id rows : 100 table : picture key : album_id rows : 100

Slide 23

Slide 23 text

FooLab Indexes Used 23 • Primary and foreign keys make for great indexes. • Also whatever you use in JOINs and WHEREs.

Slide 24

Slide 24 text

FooLab Do We Need to Optimize Further? We’re quite fast already.

Slide 25

Slide 25 text

FooLab Airport Customs 25 • 5 customs officers, 1 min to process a traveler. • 5 travelers / min = fluid. • 20 travelers / min = queue slowly builds. • 1000 travellers / min = 3h wait if you arrive on 2nd minute.

Slide 26

Slide 26 text

FooLab Microseconds Matter 26 Evergrowing queues Alone at full speed

Slide 27

Slide 27 text

FooLab Smaller Indexes 27 • BIGINT as index is slower than TINYINT. • Use UNSIGNED to double the maximum value.

Slide 28

Slide 28 text

FooLab Tips for Better Queries

Slide 29

Slide 29 text

FooLab Functions & Indexes 29 • Avoid functions on an indexed column: • The index won’t be used here. SELECT id, title FROM picture WHERE YEAR(create_date) >= 2011;

Slide 30

Slide 30 text

FooLab Limit 30 • Stop picking up firewood when your hands are full: • Now down to 0.0009 sec. • 100 times faster than the last query. SELECT picture.id, picture.title FROM picture LEFT JOIN album ON picture.album.id = album.id WHERE album.user_id = 1 LIMIT 25;

Slide 31

Slide 31 text

FooLab Other Constraints 31 • Range: • Only makes sense if that column is indexed: SELECT picture.id, picture.title FROM picture LEFT JOIN album ON picture.album.id = album.id WHERE album.user_id = 1 AND picture.id BETWEEN 26 AND 50; # 0.023 sec ALTER TABLE picture ADD INDEX(id);

Slide 32

Slide 32 text

FooLab Other Constraints 32 • IN clause: SELECT picture.id, picture.title FROM picture LEFT JOIN album ON picture.album.id = album.id WHERE album.user_id = 1 AND picture.id IN (15,26,29,32,45); # 0.048 sec

Slide 33

Slide 33 text

FooLab Split Tables 33 • MySQL can partition your tables transparently. • Creates multiple files on disk. • Shows data as a single table.

Slide 34

Slide 34 text

FooLab Archives 34 • Don’t underestimate archive tables. • MySQL ARCHIVE storage engine: • Doesn’t support UPDATE or DELETE • Very fast for SELECT • You can archive old transactions, news, etc.

Slide 35

Slide 35 text

FooLab Split Databases 35 • Also called horizontal partitionning or sharding. • Users 1-1000 and their data in database #1 • Users 1001-2000 and their data in database #2

Slide 36

Slide 36 text

FooLab The Checklist 36 ✓Count your queries. ✓Log the slow ones. ✓Use EXPLAIN and check number of rows scanned. ✓Try different indexes, remove the unused ones. ✓Use small indexes, preferably numeric. ✓Don’t use functions on indexed columns. ✓Limit number of results. ✓Split tables and databases.

Slide 37

Slide 37 text

FooLab Further Reading • SQLAlchemy profiling: http://www.sqlalchemy.org/trac/ wiki/UsageRecipes/Profiling • Storage Engines: http://www.linux.org/article/view/an- introduction-to-mysql-storage-engines • Sharding: http://www.jurriaanpersyn.com/archives/ 2009/02/12/database-sharding-at-netlog-with-mysql- and-php/ 37

Slide 38

Slide 38 text

FooLab Next Steps • I will tweet the slides: @afilina • E-mail your feedback: [email protected] 38