Upgrade to Pro — share decks privately, control downloads, hide ads and more …

MySQL Query Optimisation

adrianhardy
October 04, 2013

MySQL Query Optimisation

An overview of the working patterns, tools and approaches to getting the most out of your MySQL queries.

adrianhardy

October 04, 2013
Tweet

More Decks by adrianhardy

Other Decks in Technology

Transcript

  1. MySQL Query
    Optimisation

    View Slide

  2. Who?
    Adrian Hardy (@adrianhardy)
    Writing high performance business systems for
    ten years

    View Slide

  3. The secret to query
    optimisation
    A large, slow dataset
    A basic understanding of how MySQL works
    Terminology
    Resources
    Time

    View Slide

  4. What are we trying to achieve?
    Make stuff faster
    Shield calling applications from changes

    View Slide

  5. Identify

    View Slide

  6. Identify - Your workflow and
    tools
    From the command line
    s
    s
    h p
    h
    p
    n
    w
    1
    3
    -
    {
    1
    ,
    2
    ,
    3
    ,
    4
    ,
    .
    .
    .
    }
    @
    1
    0
    .
    0
    .
    0
    .
    2
    (password = phpnw13)
    s
    u
    d
    o a
    t
    o
    p
    From inside a MySQL session
    [
    p
    h
    p
    n
    w
    1
    3
    -
    1
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ m
    y
    s
    q
    l -
    u p
    h
    p
    n
    w
    1
    3
    -
    1 -
    p
    (password = phpnw13)
    S
    H
    O
    W F
    U
    L
    L P
    R
    O
    C
    E
    S
    S
    L
    I
    S
    T \
    G

    View Slide

  7. Exercise 1 - your environment
    System Load
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ s
    u
    d
    o a
    t
    o
    p
    Press d
    MySQL
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ m
    y
    s
    q
    l -
    u $
    u
    s
    e
    r
    n
    a
    m
    e
    (when prompted, password = password)
    m
    y
    s
    q
    l > s
    h
    o
    w f
    u
    l
    l p
    r
    o
    c
    e
    s
    s
    l
    i
    s
    t \
    G
    m
    y
    s
    q
    l > q
    u
    i
    t

    View Slide

  8. Exercise 1a - Experience a slap
    m
    y
    s
    q
    l
    s
    l
    a
    p is a very simple MySQL stress testing tool. It runs a set of
    queries in parallel, hammering your server
    System Load
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ s
    u
    d
    o a
    t
    o
    p
    Press d
    MySQL
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ m
    y
    s
    q
    l -
    u $
    u
    s
    e
    r
    n
    a
    m
    e
    m
    y
    s
    q
    l > s
    h
    o
    w f
    u
    l
    l p
    r
    o
    c
    e
    s
    s
    l
    i
    s
    t \
    G

    View Slide

  9. Identify - Slow Query Log
    Feels a little like "Percona" product placement
    Drop in replacement for MySQL/MariaDB - no migration process
    Lots of cool diagnostic tools & performance improvements
    Broadly kept in line with upstream, available as packages
    For the risk averse - can just be used as a dev tool
    Some of improvments in Percona are available in MariaDB

    View Slide

  10. Exercise 2 - slow query log
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ l
    e
    s
    s /
    v
    a
    r
    /
    l
    o
    g
    /
    m
    y
    s
    q
    l
    d
    /
    s
    l
    o
    w
    .
    l
    o
    g

    View Slide

  11. Percona's Slow Query Log
    Statistics Available
    Query_time
    Lock_time
    Rows_sent
    Rows_examined
    * QC_Hit
    * Full_scan
    * Tmp_table (_on_disk)
    * Filesort (_on_disk)
    and many more!

    View Slide

  12. Percona's pt-query-digest
    T
    i
    m
    e r
    a
    n
    g
    e
    : 2
    0
    1
    3
    -
    0
    7
    -
    2
    1 0
    9
    :
    2
    3
    :
    0
    2 t
    o 2
    0
    1
    3
    -
    0
    9
    -
    0
    2 1
    1
    :
    2
    4
    :
    3
    3
    A
    t
    t
    r
    i
    b
    u
    t
    e p
    c
    t t
    o
    t
    a
    l m
    i
    n m
    a
    x a
    v
    g 9
    5
    % s
    t
    d
    d
    e
    v m
    e
    d
    i
    a
    n
    =
    =
    =
    =
    =
    =
    =
    =
    =
    =
    =
    = =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    = =
    =
    =
    =
    =
    =
    =
    C
    o
    u
    n
    t 6 2
    2
    5
    E
    x
    e
    c t
    i
    m
    e 7 5
    5
    7
    5
    s 1
    8
    s 3
    5
    s 2
    5
    s 3
    2
    s 5
    s 2
    5
    s
    L
    o
    c
    k t
    i
    m
    e 0 3
    5
    m
    s 1
    0
    3
    u
    s 2
    1
    6
    u
    s 1
    5
    7
    u
    s 1
    6
    7
    u
    s 1
    1
    u
    s 1
    5
    2
    u
    s
    R
    o
    w
    s s
    e
    n
    t 0 8
    9
    .
    4
    2
    k 3
    1
    3 5
    3
    5 4
    0
    6
    .
    9
    4 5
    1
    1
    .
    4
    5 7
    3
    .
    3
    3 4
    2
    0
    .
    7
    7
    R
    o
    w
    s e
    x
    a
    m
    i
    n
    e 9 1
    .
    7
    3
    G 5
    .
    8
    8
    M 1
    0
    .
    6
    2
    M 7
    .
    8
    6
    M 9
    .
    7
    6
    M 1
    .
    4
    7
    M 8
    .
    0
    3
    M
    B
    y
    t
    e
    s s
    e
    n
    t 0 2
    3
    .
    1
    5
    M 8
    0
    .
    6
    8
    k 1
    4
    0
    .
    0
    9
    k 1
    0
    5
    .
    3
    5
    k 1
    3
    0
    .
    0
    4
    k 1
    8
    .
    9
    2
    k 1
    0
    6
    .
    9
    9
    k
    T
    m
    p t
    a
    b
    l
    e
    s 0 2
    2
    5 1 1 1 1 0 1
    T
    m
    p d
    i
    s
    k t
    b
    l 1
    0 2
    2
    5 1 1 1 1 0 1
    T
    m
    p t
    b
    l s
    i
    z
    e 4 2
    1
    2
    .
    9
    6
    M 7
    4
    5
    .
    8
    7
    k 1
    .
    2
    5
    M 9
    6
    9
    .
    2
    2
    k 1
    .
    2
    0
    M 1
    8
    5
    .
    7
    3
    k 1
    0
    0
    9
    .
    3
    3
    k
    Q
    u
    e
    r
    y s
    i
    z
    e 1 2
    4
    2
    .
    4
    4
    k 1
    .
    0
    7
    k 1
    .
    1
    1
    k 1
    .
    0
    8
    k 1
    .
    0
    9
    k 1
    9
    .
    7
    1 1
    .
    0
    4
    k

    View Slide

  13. Exercise 3 - pt-query-digest
    aka "cheating"
    Task 1
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ p
    t
    -
    q
    u
    e
    r
    y
    -
    d
    i
    g
    e
    s
    t
    /
    v
    a
    r
    /
    l
    i
    b
    /
    m
    y
    s
    q
    l
    /
    s
    l
    o
    w
    .
    l
    o
    g | l
    e
    s
    s
    Task 2
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ p
    t
    -
    q
    u
    e
    r
    y
    -
    d
    i
    g
    e
    s
    t -
    -
    r
    e
    v
    i
    e
    w \
    u
    =
    p
    h
    p
    n
    w
    1
    3
    -
    1
    ,
    D
    =
    p
    h
    p
    n
    w
    1
    3
    -
    1
    ,
    t
    =
    r
    e
    v
    i
    e
    w
    _
    m
    e
    ,
    h
    =
    l
    o
    c
    a
    l
    h
    o
    s
    t \
    /
    v
    a
    r
    /
    l
    i
    b
    /
    m
    y
    s
    q
    l
    /
    s
    l
    o
    w
    .
    l
    o
    g

    View Slide

  14. Exercise 3a - pt-query-advisor
    aka "really cheating"
    Task 3
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ .
    /
    p
    t
    -
    q
    u
    e
    r
    y
    -
    a
    d
    v
    i
    s
    o
    r -
    -
    r
    e
    v
    i
    e
    w
    u
    =
    r
    o
    o
    t
    ,
    h
    =
    l
    o
    c
    a
    l
    h
    o
    s
    t
    ,
    D
    =
    t
    e
    m
    p
    ,
    t
    =
    r
    e
    v
    i
    e
    w
    _
    m
    e
    Some hints
    A
    R
    G
    .
    0
    0
    1 You've got a LIKE comparison with a % prefix
    C
    L
    A
    .
    0
    0
    1 You've issued a SELECT without a WHERE clause
    C
    L
    A
    .
    0
    0
    3 LIMIT,OFFSET used (bad news, unfortunately)

    View Slide

  15. "Identify" - Review
    Is there a problem, and is it MySQL?
    Is it caught by the slow query log?
    If it's in the slow query log, it's a problem - no scope for "works on my
    machine"
    What is the scale of problem we're dealing with?
    Do we have the luxury of reproducing it locally?
    Do we care?
    Is it easier to ignore the problem?

    View Slide

  16. Improve

    View Slide

  17. Dive in to my.cnf
    Get started by visiting tools.percona.com/wizard
    Get started using mysqltuner.pl
    The manual is a fantastic resource with gems like:
    If you see many Sort_merge_passes per second in SHOW GLOBAL
    STATUS output, you can consider increasing the sort_buffer_size
    value to speed up ORDER BY or GROUP BY operations that cannot
    be improved with query optimization or improved indexing.
    Any variable in the MySQL "Server System Variables" section of the manual
    marked as "Dynamic" can be changed live.
    You can capture "real" query load using g
    e
    n
    e
    r
    a
    l
    _
    l
    o
    g and replay locally
    using m
    y
    s
    q
    l
    s
    l
    a
    p

    View Slide

  18. Dive in to my.cnf
    This sounds suspiciously like "RTFM"
    [
    p
    h
    p
    n
    w
    1
    3
    @
    s
    e
    r
    v
    e
    r ~
    ]
    $ l
    e
    s
    s /
    e
    t
    c
    /
    m
    y
    .
    c
    n
    f
    What to tune in MySQL server after installation
    Section 5.1.6. Server Status Variables
    Section 5.1.4. Server System Variables

    View Slide

  19. Dive in to my.cnf
    Sounds obvious but ... be scientific
    F
    L
    U
    S
    H S
    T
    A
    T
    U
    S
    ;
    Wait 10 minutes....
    S
    H
    O
    W G
    L
    O
    B
    A
    L S
    T
    A
    T
    U
    S L
    I
    K
    E '
    %
    m
    e
    r
    g
    e
    %
    '
    ;
    S
    H
    O
    W G
    L
    O
    B
    A
    L S
    T
    A
    T
    U
    S L
    I
    K
    E '
    %
    t
    m
    p
    %
    '
    ;
    S
    E
    T s
    o
    r
    t
    _
    b
    u
    f
    f
    e
    r
    _
    s
    i
    z
    e = 8
    M
    ;
    S
    E
    T t
    m
    p
    _
    t
    a
    b
    l
    e
    _
    s
    i
    z
    e = 3
    2
    M
    ;
    S
    E
    T m
    a
    x
    _
    h
    e
    a
    p
    _
    t
    a
    b
    l
    e
    _
    s
    i
    z
    e = 3
    2
    M
    ;
    F
    L
    U
    S
    H S
    T
    A
    T
    U
    S
    ;
    Check again in 10 minutes...

    View Slide

  20. How to use E
    X
    P
    L
    A
    I
    N
    You prefix your S
    E
    L
    E
    C
    T query with E
    X
    P
    L
    A
    I
    N
    (As of 5.6, you can now explain INSERTs and UPDATEs)
    That's it ... (Seriously. E
    X
    P
    L
    A
    I
    N ALL the queries!)
    Add an index
    E
    X
    P
    L
    A
    I
    N again
    Take the rest of the week off

    View Slide

  21. Example output from EXPLAIN
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M p
    a
    y
    m
    e
    n
    t W
    H
    E
    R
    E p
    a
    y
    m
    e
    n
    t
    _
    d
    a
    t
    e = '
    2
    0
    0
    5
    -
    0
    8
    -
    0
    1
    '
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E p
    a
    y
    m
    e
    n
    t A
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L 1
    6
    5
    9
    1 U
    s
    i
    n
    g W
    h
    e
    r
    e
    Let's take a look at MySQL's analysis:
    It found a WHERE clause? Cheers, Sherlock (split architecture)
    MySQL estimates that it'll have to scan through 16k rows
    Note that the estimate is approximate - 16,049 rows
    There are no suitable candidates for indexes (or keys)
    There was no key used, so it did a "Full Table Scan"

    View Slide

  22. Predict bad queries
    Proof at last!
    Even though last_name is indexed, we can now prove that %
    s
    t
    r
    i
    n
    g
    %
    searches are bad
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    u
    s
    t
    o
    m
    e
    r W
    H
    E
    R
    E l
    a
    s
    t
    _
    n
    a
    m
    e L
    I
    K
    E '
    %
    o
    n
    e
    %
    '
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E c
    u
    s
    t
    o
    m
    e
    r A
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L 6
    6
    1 U
    s
    i
    n
    g W
    h
    e
    r
    e
    Again, this query resulted in a "Full Table Scan"

    View Slide

  23. "Full Table Scan"
    Every row on the table is fetched during scanning
    Any filtering is done by "MySQL" rather than the
    storage engine
    Table might not be in memory - disk can be
    contended
    Things like "country lists" for are an
    exception

    View Slide

  24. Exercise 4 - our first index
    Let's remember that the S
    E
    L
    E
    C
    T on p
    a
    y
    m
    e
    n
    t triggered a scan of 16,000
    rows
    Add the carefully crafted index
    A
    L
    T
    E
    R T
    A
    B
    L
    E p
    a
    y
    m
    e
    n
    t A
    D
    D I
    N
    D
    E
    X d
    a
    t
    e
    _
    o
    n
    l
    y (
    p
    a
    y
    m
    e
    n
    t
    _
    d
    a
    t
    e
    )
    ;
    E
    X
    P
    L
    A
    I
    N again
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M p
    a
    y
    m
    e
    n
    t W
    H
    E
    R
    E p
    a
    y
    m
    e
    n
    t
    _
    d
    a
    t
    e = '
    2
    0
    0
    5
    -
    0
    8
    -
    0
    1
    '
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E e
    m
    p
    l
    o
    y
    e
    e
    s r
    e
    f d
    a
    t
    e
    _
    o
    n
    l
    y d
    a
    t
    e
    _
    o
    n
    l
    y 3 c
    o
    n
    s
    t 6
    7
    1 U
    s
    i
    n
    g W
    h
    e
    r
    e

    View Slide

  25. Compound indexes
    Compound indexes comprise more than one field
    It's important to note that their definition is order
    sensitive
    You can use part of a compound index too - but the
    index must be used "contiguously"

    View Slide

  26. Exercise 5: Brute-indexing
    Take the following query
    S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E r
    a
    t
    i
    n
    g I
    N (
    '
    G
    '
    , '
    P
    G
    '
    ,
    '
    P
    G
    -
    1
    3
    '
    ) A
    N
    D l
    e
    n
    g
    t
    h < 8
    0
    ;
    Add indexes and EXPLAIN
    A
    L
    T
    E
    R T
    A
    B
    L
    E f
    i
    l
    m A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    r
    a
    t
    i
    n
    g (
    r
    a
    t
    i
    n
    g
    )
    ; -
    - a
    n
    d n
    o
    w E
    X
    P
    L
    A
    I
    N
    A
    L
    T
    E
    R T
    A
    B
    L
    E f
    i
    l
    m A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    l
    e
    n
    g
    t
    h (
    l
    e
    n
    g
    t
    h
    )
    ; -
    - a
    n
    d n
    o
    w E
    X
    P
    L
    A
    I
    N
    A
    L
    T
    E
    R T
    A
    B
    L
    E f
    i
    l
    m A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    r
    a
    t
    i
    n
    g
    _
    l
    e
    n
    g
    t
    h (
    r
    a
    t
    i
    n
    g
    , l
    e
    n
    g
    t
    h
    )
    ; -
    - E
    X
    P
    L
    A
    I
    N
    !
    There's no shame in brute forcing combos

    View Slide

  27. Exercise 5a: Brute-indexing
    So compound indexes are better, right?
    A
    L
    T
    E
    R T
    A
    B
    L
    E f
    i
    l
    m D
    R
    O
    P I
    N
    D
    E
    X i
    d
    x
    _
    r
    a
    t
    i
    n
    g
    , D
    R
    O
    P I
    N
    D
    E
    X i
    d
    x
    _
    l
    e
    n
    g
    t
    h
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E l
    e
    n
    g
    t
    h < 8
    0
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E r
    a
    t
    i
    n
    g I
    N (
    '
    G
    '
    )
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E r
    a
    t
    i
    n
    g I
    N (
    '
    G
    '
    ) A
    N
    D l
    e
    n
    g
    t
    h < 8
    0
    "key_len" Can tell us when partial indexes are used
    Indexes must be used "left-to-right"

    View Slide

  28. Exercise 5b: Multi-range scans
    S
    H
    O
    W I
    N
    D
    E
    X
    E
    S I
    N f
    i
    l
    m
    ; -
    - s
    h
    o
    u
    l
    d o
    n
    l
    y h
    a
    v
    e i
    d
    x
    _
    r
    a
    t
    i
    n
    g
    _
    l
    e
    n
    g
    t
    h
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E l
    e
    n
    g
    t
    h > 1
    7
    0 A
    N
    D r
    a
    t
    i
    n
    g = '
    G
    '
    ;
    A
    L
    T
    E
    R T
    A
    B
    L
    E f
    i
    l
    m D
    R
    O
    P I
    N
    D
    E
    X i
    d
    x
    _
    r
    a
    t
    i
    n
    g
    _
    l
    e
    n
    g
    t
    h
    , C
    R
    E
    A
    T
    E I
    N
    D
    E
    X i
    d
    x
    _
    l
    e
    n
    g
    t
    h
    _
    r
    a
    t
    i
    n
    g (
    l
    e
    n
    g
    t
    h
    ,
    r
    a
    t
    i
    n
    g
    )
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E l
    e
    n
    g
    t
    h > 1
    7
    0 A
    N
    D r
    a
    t
    i
    n
    g = '
    G
    '
    ;
    Notice the reduced key_len - Multi-range scans!

    View Slide

  29. Exercise 5c: Multi-range scans 2
    A better example
    S
    E
    L
    E
    C
    T * F
    R
    O
    M r
    e
    n
    t
    a
    l W
    H
    E
    R
    E r
    e
    n
    t
    a
    l
    _
    d
    a
    t
    e > '
    2
    0
    0
    5
    -
    0
    5
    -
    3
    0
    ' A
    N
    D r
    e
    t
    u
    r
    n
    _
    d
    a
    t
    e < '
    2
    0
    0
    5
    -
    0
    6
    -
    0
    6
    '
    ;
    Find all rentals which took place within a given week
    No matter how you index, MySQL will only use the first range scan

    View Slide

  30. Exercise 5d: Cardinality
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E l
    e
    n
    g
    t
    h > 1
    7
    0
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E l
    e
    n
    g
    t
    h > 1
    5
    0
    ;
    MySQL CBA using an index with the last query
    "If I'm going to have to scan > 30% of the index, don't use an index"
    S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m F
    O
    R
    C
    E I
    N
    D
    E
    X
    (
    l
    e
    n
    g
    t
    h
    _
    o
    n
    l
    y
    ) W
    H
    E
    R
    E l
    e
    n
    g
    t
    h > 1
    5
    0

    View Slide

  31. Index "Cardinality"
    MySQL makes decisions on which indexes to use in
    a fraction of a second - cardinality
    Cardinality refers to the uniqueness of values
    contained in a particular column. A column with the
    lowest possible cardinality would have the same value
    for every row - Wikipedia
    S
    H
    O
    W I
    N
    D
    E
    X
    E
    S I
    N t
    a
    b
    l
    e
    _
    n
    a
    m
    e
    A
    N
    A
    L
    Y
    Z
    E T
    A
    B
    L
    E t
    a
    b
    l
    e
    _
    n
    a
    m
    e
    ; -
    - u
    p
    d
    a
    t
    e
    s
    /
    g
    u
    e
    s
    s
    e
    s c
    a
    r
    d
    i
    n
    a
    l
    i
    t
    y

    View Slide

  32. Exercise 5e: Custom Index
    length
    Index an n-char prefix of a column
    This isn't really about speed, but only indexing what you need
    A
    L
    T
    E
    R T
    A
    B
    L
    E s
    o
    _
    u
    s
    e
    r A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    n
    a
    m
    e (
    d
    i
    s
    p
    l
    a
    y
    _
    n
    a
    m
    e
    )
    ;
    S
    E
    L
    E
    C
    T * F
    R
    O
    M s
    o
    _
    u
    s
    e
    r W
    H
    E
    R
    E d
    i
    s
    p
    l
    a
    y
    _
    n
    a
    m
    e L
    I
    K
    E '
    A
    d
    r
    i
    a
    n
    %
    '
    A
    L
    T
    E
    R T
    A
    B
    L
    E s
    o
    _
    u
    s
    e
    r A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    n
    a
    m
    e (
    d
    i
    s
    p
    l
    a
    y
    _
    n
    a
    m
    e
    (
    6
    )
    )
    ;
    S
    E
    L
    E
    C
    T * F
    R
    O
    M s
    o
    _
    u
    s
    e
    r W
    H
    E
    R
    E d
    i
    s
    p
    l
    a
    y
    _
    n
    a
    m
    e L
    I
    K
    E '
    A
    d
    r
    i
    a
    n
    %
    '
    Look at the key_len column change

    View Slide

  33. View Slide

  34. Exercise 5: Indexing review
    Compound indexes allow us to index over more than one WHERE
    clause
    Index definitions (and to some degree, their usage) is order sensitve
    MySQL will allow you to use a sub-set of an index
    Which leads to "Indexing Strategy"
    Analyse your own queries and order index definition such that you
    hit indexes in the majority of cases
    Index all the things?
    Cardinality can give us frustrating outcomes
    Multi Range scans are a bitch

    View Slide

  35. E
    X
    P
    L
    A
    I
    N works for JOINs too
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    u
    s
    t
    o
    m
    e
    r L
    E
    F
    T J
    O
    I
    N a
    d
    d
    r
    e
    s
    s U
    S
    I
    N
    G (
    a
    d
    d
    r
    e
    s
    s
    _
    i
    d
    )
    ;
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E c
    u
    s
    t
    o
    m
    e
    r A
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L 6
    6
    1 N
    U
    L
    L
    S
    I
    M
    P
    L
    E a
    d
    d
    r
    e
    s
    s e
    q
    _
    r
    e
    f P
    R
    I
    M
    A
    R
    Y P
    R
    I
    M
    A
    R
    Y 2 c
    u
    s
    t
    .
    a
    d
    d
    r
    e
    s
    s
    _
    i
    d 1 N
    U
    L
    L
    Rough Interpretation
    MySQL is going to start with scanning ALL customers and for each one it
    attempts to join on to address. Fortunately, MySQL doesn't have to work
    hard to find the address for each customer.
    Nested-loop approach - aka don't make JOINs
    work hard
    Pro-tip: ORDER / GROUP BY over two tables = bad times

    View Slide

  36. MySQL will change the order of
    your JOINs
    Not terribly useful, just interesting
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    u
    s
    t
    o
    m
    e
    r I
    N
    N
    E
    R J
    O
    I
    N a
    d
    d
    r
    e
    s
    s U
    S
    I
    N
    G (
    a
    d
    d
    r
    e
    s
    s
    _
    i
    d
    )
    ;
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E a
    d
    d
    r
    e
    s
    s A
    L
    L P
    R
    I
    M
    A
    R
    Y N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L 6
    2
    8 N
    U
    L
    L
    S
    I
    M
    P
    L
    E c
    u
    s
    t
    o
    m
    e
    r r
    e
    f f
    k
    _
    a
    d
    d
    r
    _
    i
    d f
    k
    _
    a
    d
    d
    r
    _
    i
    d 2 a
    d
    d
    r
    .
    a
    d
    d
    r
    e
    s
    s
    _
    i
    d 1 N
    U
    L
    L
    Rough Interpretation
    Because the INNER JOIN means that the address must exist for the
    customer, MySQL switches the order of JOINs

    View Slide

  37. Trigger Index Optimisation
    A
    L
    T
    E
    R T
    A
    B
    L
    E s
    o
    _
    u
    s
    e
    r A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    r
    e
    p
    u
    t
    a
    t
    i
    o
    n (
    r
    e
    p
    u
    t
    a
    t
    i
    o
    n
    )
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M s
    o
    _
    u
    s
    e
    r W
    H
    E
    R
    E r
    e
    p
    u
    t
    a
    t
    i
    o
    n > 2
    5
    0
    0
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T i
    d F
    R
    O
    M s
    o
    _
    u
    s
    e
    r W
    H
    E
    R
    E r
    e
    p
    u
    t
    a
    t
    i
    o
    n > 2
    5
    0
    0
    Using Index = good stuff
    MySQL did not have to go to the table (which might not be loaded) to
    return the results to the client
    i
    d
    x
    _
    r
    e
    p
    u
    t
    a
    t
    i
    o
    n is now a "covering index"
    Compound indexes also work as covering indexes, e.g.
    (reputation,display_name)
    But this is at odds with our "reduce index lengths" maxim

    View Slide

  38. View Slide

  39. Trigger Index Optimisation
    Can we avoid temp tables through indexing?
    If there is an ORDER BY clause and a different GROUP BY clause, or
    if the ORDER BY or GROUP BY contains columns from tables other
    than the first table in the join queue, a temporary table is created.
    (and four more conditions)
    No :(

    View Slide

  40. Trigger Index Optimisations
    Avoiding "filesort"
    A quick reminder on filesorts...
    Sample Query
    S
    E
    L
    E
    C
    T * F
    R
    O
    M s
    a
    k
    i
    l
    a
    .
    r
    e
    n
    t
    a
    l W
    H
    E
    R
    E r
    e
    t
    u
    r
    n
    _
    d
    a
    t
    e B
    E
    T
    W
    E
    E
    N '
    2
    0
    0
    5
    -
    0
    1
    -
    0
    1
    ' A
    N
    D '
    2
    0
    0
    5
    -
    0
    6
    -
    0
    1
    ' O
    R
    D
    E
    R
    B
    Y r
    e
    t
    u
    r
    n
    _
    d
    a
    t
    e A
    S
    C
    We can avoid a file sort
    More at 8.3.1.11 ORDER BY Optimisations
    Documentation details very specific circumstances where an index will be
    used

    View Slide

  41. Large datasets and L
    I
    M
    I
    T
    You have a large dataset, and you want to paginate
    Technically, without an ORDER BY, LIMIT is "non-deterministic", so we really
    should have an ORDER BY
    A
    L
    T
    E
    R T
    A
    B
    L
    E c
    o
    m
    m
    e
    n
    t A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    c
    r
    e
    a
    t
    e
    d (
    c
    r
    e
    a
    t
    e
    d
    )
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    5
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    0
    0
    ,
    2
    5
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    0
    0
    0
    0
    ,
    2
    5

    View Slide

  42. Large datasets and L
    I
    M
    I
    T
    Manual last row pagination (dark territory)
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    5
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t W
    H
    E
    R
    E i
    d > 2
    0
    0 O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    5
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M c
    o
    m
    m
    e
    n
    t W
    H
    E
    R
    E i
    d > 2
    0
    0
    0
    0 O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    5

    View Slide

  43. Large datasets and L
    I
    M
    I
    T
    Delayed JOIN (very dark)
    S
    E
    L
    E
    C
    T * F
    R
    O
    M p
    o
    s
    t
    s I
    N
    N
    E
    R J
    O
    I
    N (
    S
    E
    L
    E
    C
    T i
    d F
    R
    O
    M p
    o
    s
    t
    s
    O
    R
    D
    E
    R B
    Y c
    r
    e
    a
    t
    e
    d L
    I
    M
    I
    T 2
    0
    0
    0
    0
    ,
    2
    5
    ) A
    S t
    m
    p U
    S
    I
    N
    G (
    i
    d
    )
    The sub query is very fast ("using index") which
    makes the outer select's job easy

    View Slide

  44. O
    R
    D
    E
    R B
    Y R
    A
    N
    D
    (
    ) : discuss
    S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m O
    R
    D
    E
    R B
    Y R
    A
    N
    D
    (
    ) L
    I
    M
    I
    T 1
    How does this "Random Film" feature work
    Using what we now know about Full Table Scans, filesorts and limiting, why
    is the above query bad?
    RAND() forces all rows to have a function called
    ORDER BY forces a temporary table and filesort
    LIMIT ditches 99% of the temporary table!

    View Slide

  45. O
    R
    D
    E
    R B
    Y R
    A
    N
    D
    (
    ) : a fix ..?!
    Strategy 1 - maintain a list
    Use your app to find valid IDs randomly (i.e. 1 Query to identify valid ID,
    another to execute on PK)
    Maintain a list (somewhere) of valid PKs (i.e. a cron job, or a trigger)

    View Slide

  46. O
    R
    D
    E
    R B
    Y R
    A
    N
    D
    (
    ) : a fix ..?!
    Strategy 2 - maintain a different list
    Maintain an extra column containing a random number, which is
    indexed
    Use a similar approach to our cheap LIMIT hack
    S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m W
    H
    E
    R
    E i
    d > $
    r
    a
    n
    d
    o
    m L
    I
    M
    I
    T 1

    View Slide

  47. O
    R
    D
    E
    R B
    Y R
    A
    N
    D
    (
    ) : a fix ..?!
    Strategy 3 - Level up
    S
    E
    L
    E
    C
    T * F
    R
    O
    M f
    i
    l
    m J
    O
    I
    N
    (
    S
    E
    L
    E
    C
    T C
    E
    I
    L
    (
    R
    A
    N
    D
    (
    ) *
    (
    S
    E
    L
    E
    C
    T M
    A
    X
    (
    f
    i
    l
    m
    _
    i
    d
    )
    F
    R
    O
    M f
    i
    l
    m
    )
    ) A
    S f
    i
    l
    m
    _
    i
    d
    ) A
    S r
    2
    U
    S
    I
    N
    G (
    f
    i
    l
    m
    _
    i
    d
    )
    ;
    O(1) method for selecting a single random record
    from the DB
    Courtesy of jan.kneschke.de/projects/mysql/order-by-rand/
    No-one can write this from memory - this would've
    taken hours of research

    View Slide

  48. View Slide

  49. Modify the query or data
    Clearly more disruptive to the application
    E.g. Find all rentals which took place on a Saturday or Sunday
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M r
    e
    n
    t
    a
    l W
    H
    E
    R
    E D
    A
    T
    E
    _
    F
    O
    R
    M
    A
    T
    (
    r
    e
    n
    t
    a
    l
    _
    d
    a
    t
    e
    , '
    %
    w
    '
    ) I
    N (
    0
    ,
    6
    )
    ;
    s
    e
    l
    e
    c
    t
    _
    t
    y
    p
    e t
    a
    b
    l
    e t
    y
    p
    e p
    o
    s
    s
    i
    b
    l
    e
    _
    k
    e
    y
    s k
    e
    y k
    e
    y
    _
    l
    e
    n r
    e
    f r
    o
    w
    s E
    x
    t
    r
    a
    S
    I
    M
    P
    L
    E r
    e
    n
    t
    a
    l A
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L N
    U
    L
    L 1
    6
    4
    8
    6 U
    s
    i
    n
    g w
    h
    e
    r
    e
    Problem?
    MySQL must execute the DATE_FORMAT function on every row first,
    before cutting down the resultset
    Solutions?
    Pre-calculate the Saturdays and Sundays and use IN(...)
    Summarise the weekday in another column, and make sure that's
    indexed

    View Slide

  50. View Slide

  51. Modify data - how?
    A
    L
    T
    E
    R T
    A
    B
    L
    E r
    e
    n
    t
    a
    l A
    D
    D C
    O
    L
    U
    M
    N w
    e
    e
    k
    d
    a
    y I
    N
    T
    (
    1
    ) N
    O
    T N
    U
    L
    L D
    E
    F
    A
    U
    L
    T 0
    ;
    A
    L
    T
    E
    R T
    A
    B
    L
    E r
    e
    n
    t
    a
    l A
    D
    D I
    N
    D
    E
    X i
    d
    x
    _
    w
    e
    e
    k
    d
    a
    y (
    w
    e
    e
    k
    d
    a
    y
    )
    ;
    C
    R
    E
    A
    T
    E T
    R
    I
    G
    G
    E
    R t
    r
    g
    _
    w
    e
    e
    k
    d
    a
    y
    _
    u
    p B
    E
    F
    O
    R
    E U
    P
    D
    A
    T
    E O
    N r
    e
    n
    t
    a
    l
    F
    O
    R E
    A
    C
    H R
    O
    W S
    E
    T N
    E
    W
    .
    w
    e
    e
    k
    d
    a
    y = D
    A
    T
    E
    _
    F
    O
    R
    M
    A
    T
    (
    N
    E
    W
    .
    r
    e
    n
    t
    a
    l
    _
    d
    a
    t
    e
    , '
    %
    w
    '
    )
    ;
    E
    X
    P
    L
    A
    I
    N S
    E
    L
    E
    C
    T * F
    R
    O
    M r
    e
    n
    t
    a
    l W
    H
    E
    R
    E D
    A
    T
    E
    _
    F
    O
    R
    M
    A
    T
    (
    r
    e
    n
    t
    a
    l
    _
    d
    a
    t
    e
    , '
    %
    w
    '
    ) I
    N (
    0
    ,
    6
    )
    ;

    View Slide

  52. Modify the query/data (story
    time)
    The problem
    Imagine a large, hand-rolled ecommerce system (7 years ago)
    Thousands and thousands of products
    Each product is tagged, categorised, searchable on title/description
    Tags and categories used many-to-many link tables
    The solution
    Create a summary table - denormalise the data
    Maintain summary table with triggers or (lazy) cron jobs
    Abuse full text indexes in MySQL
    _
    T
    A
    G
    _
    2
    _ _
    T
    A
    G
    _
    3
    4
    _ _
    T
    A
    G
    _
    5
    6
    _ _
    C
    A
    T
    E
    G
    O
    R
    Y
    _
    4
    5
    _ P
    r
    o
    d
    u
    c
    t n
    a
    m
    e a
    n
    d d
    e
    s
    c
    r
    i
    p
    t
    i
    o
    n
    .
    .
    .

    View Slide

  53. View Slide

  54. Questions

    View Slide

  55. Tools & Resources
    Resources
    Section 8.8.2. Explain Output Format
    High Performance MySQL (3rd Edition)
    mysqlperformanceblog.com
    Everything that Percona ever made (e.g p
    t
    -
    v
    i
    s
    u
    a
    l
    -
    e
    x
    p
    l
    a
    i
    n
    , p
    t
    -
    q
    u
    e
    r
    y
    -
    d
    i
    g
    e
    s
    t
    )
    New in 5.6 - Optimizer Tracing (mindblown)

    View Slide