Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rewriting binary logs because your database broke

Rewriting binary logs because your database broke

Chris Sinjakli

June 18, 2020
Tweet

More Decks by Chris Sinjakli

Other Decks in Programming

Transcript

  1. Rewriting binary logs
    because your
    database
    broke
    @ChrisSinjo

    View full-size slide

  2. Site Reliability
    Engineer

    View full-size slide

  3. Site Reliability
    Engineer
    (a Software Engineer who cares more about
    systems…ish)

    View full-size slide

  4. POST /cash/monies HTTP/1.1
    { amount: 100 }

    View full-size slide

  5. High per-request

    View full-size slide

  6. One part of a
    production incident

    View full-size slide

  7. https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-
    october/

    View full-size slide

  8. Postgres
    Postgres
    Replication
    Application

    View full-size slide

  9. Postgres
    Postgres
    Replication
    Application

    View full-size slide

  10. Postgres
    Postgres
    Application

    View full-size slide

  11. Postgres
    Postgres
    Application

    View full-size slide

  12. Many parts to that
    investigation

    View full-size slide

  13. Too many for one
    talk

    View full-size slide

  14. Focus:
    Debugging parts of the system
    you don’t normally touch

    View full-size slide

  15. Focus:
    Reproduce what happened
    with Postgres’ binary logs
    during that incident

    View full-size slide

  16. What is a
    binary log?

    View full-size slide

  17. [2020-06-14 23:02:37Z] GET /
    [2020-06-14 23:02:49Z] GET /favicon.ico
    [2020-06-14 23:03:53Z] GET /posts/binary-logs—talk
    What we normally mean by logs

    View full-size slide

  18. Databases have
    a different kind
    of log

    View full-size slide

  19. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');
    Some extremely boring SQL

    View full-size slide

  20. Warning:
    simplifying lie ahead

    View full-size slide

  21. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    A different kind of logs
    (if they were textual)

    View full-size slide

  22. But why bother
    doing that?

    View full-size slide

  23. Crash safety

    View full-size slide

  24. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    id
    1
    2

    View full-size slide

  25. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2

    View full-size slide

  26. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    .

    View full-size slide

  27. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    ???

    View full-size slide

  28. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    We can replay this operation

    View full-size slide

  29. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    ???

    View full-size slide

  30. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    3

    View full-size slide

  31. Also:
    replication

    View full-size slide

  32. Postgres
    Postgres
    Replication
    Application

    View full-size slide

  33. Postgres
    Postgres
    Replication
    Application
    (binary logs)

    View full-size slide

  34. So why the interest
    in rewriting the
    binary logs?

    View full-size slide

  35. 2020-06-14 17:23:01 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-14 17:23:02 GMT LOG: invalid record length
    at 0/3000180
    A suspicious error message

    View full-size slide

  36. Issue restoring binary log

    Cause of failure to
    promote replica?

    View full-size slide

  37. Goal: do this
    repeatably in a non-
    production cluster

    View full-size slide

  38. 2020-06-14 17:23:01 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-14 17:23:02 GMT LOG: invalid record length
    at 0/3000180
    Goal: produce this error message

    View full-size slide

  39. We can cheat:
    Postgres is
    open source

    View full-size slide

  40. These techniques
    also work on closed
    source software

    View full-size slide

  41. We just call that
    reverse engineering

    View full-size slide

  42. $ git checkout REL9_4_26 # we were running 9.4
    $ git grep -n "invalid record length"
    src/backend/access/transam/xlogreader.c:295: [...]
    src/backend/access/transam/xlogreader.c:604: [...]
    src/backend/access/transam/xlogreader.c:678: [...]
    Let's find the error

    View full-size slide

  43. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    Let's find the error

    View full-size slide

  44. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    Let's find the error

    View full-size slide

  45. src/include/access/xlog.h:58:
    #define SizeOfXLogRecord MAXALIGN(sizeof(XLogRecord))
    Let's find the error

    View full-size slide

  46. Wouldn't it be convenient
    if we could make
    total_len == 0?

    View full-size slide

  47. src/backend/access/transam/xlogreader.c:272-273:
    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
    total_len = record->xl_tot_len;
    Let's find the error

    View full-size slide

  48. src/include/access/xlog.h:41:
    typedef struct XLogRecord
    {
    uint32 xl_tot_len; /* total len of entire record */
    TransactionId xl_xid; /* xact id */
    uint32 xl_len; /* total len of rmgr data */
    uint8 xl_info; /* flag bits, see below */
    RmgrId xl_rmid; /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    XLogRecPtr xl_prev; /* ptr to previous record in log */
    pg_crc32 xl_crc; /* CRC for this record */
    /* If MAXALIGN==8, there are 4 wasted bytes here */
    /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
    } XLogRecord;
    Let's find the error

    View full-size slide

  49. src/include/access/xlog.h:41-56:
    typedef struct XLogRecord
    {
    uint32 xl_tot_len; /* total len of entire record */
    TransactionId xl_xid; /* xact id */
    uint32 xl_len; /* total len of rmgr data */
    uint8 xl_info; /* flag bits, see below */
    RmgrId xl_rmid; /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    XLogRecPtr xl_prev; /* ptr to previous record in log */
    pg_crc32 xl_crc; /* CRC for this record */
    /* If MAXALIGN==8, there are 4 wasted bytes here */
    /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
    } XLogRecord;
    Let's find the error

    View full-size slide

  50. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    What was that check doing?

    View full-size slide

  51. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    What was that check doing?
    Size the record says it is
    Smallest possible size it can be

    View full-size slide

  52. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    A different kind of logs
    (if they were textual)

    View full-size slide

  53. Let's see what they
    look like in
    practice

    View full-size slide

  54. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');
    Some extremely boring SQL

    View full-size slide

  55. Grab the binary
    log file, and...

    View full-size slide

  56. A barely comprehensible
    wall of data

    View full-size slide

  57. A barely comprehensible
    wall of data
    Hex ASCII

    View full-size slide

  58. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View full-size slide

  59. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View full-size slide

  60. A barely comprehensible
    wall of data
    Hex ASCII

    View full-size slide

  61. A barely comprehensible
    wall of data
    We can see our talks!!

    View full-size slide

  62. How can we find
    xl_tot_len?

    View full-size slide

  63. INSERT INTO repro VALUES ('A');
    INSERT INTO repro VALUES ('AB');
    INSERT INTO repro VALUES ('ABC');
    INSERT INTO repro VALUES ('ABCD');
    INSERT INTO repro VALUES ('ABCDE');
    ...
    Some even more boring SQL

    View full-size slide

  64. Look for a field
    increasing
    by 1

    View full-size slide

  65. Guesswork incoming!

    View full-size slide

  66. Guesswork incoming!
    The data we inserted

    View full-size slide

  67. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View full-size slide

  68. Notice anything?
    The data we inserted

    View full-size slide

  69. Notice anything?
    The data we inserted
    Familiar characters

    View full-size slide

  70. Notice anything?
    Decimal Hexadecimal Character
    63 3F ?
    64 40 @
    65 41 A
    The data we inserted
    Familiar characters

    View full-size slide

  71. Notice anything?
    The data we inserted
    Familiar characters

    View full-size slide

  72. Notice anything?
    The data we inserted
    Familiar characters

    View full-size slide

  73. Notice anything?
    The data we inserted
    Familiar characters
    Familiar characters (hex)
    The data we inserted (hex)

    View full-size slide

  74. Wouldn't it be convenient
    if we could make
    total_len == 0?

    View full-size slide

  75. We could import the
    Postgres structs and do
    this properly...

    View full-size slide

  76. ...or we could write a
    regex

    View full-size slide

  77. Let's write a regex

    View full-size slide

  78. Let's break this one

    View full-size slide

  79. binlog_file_name = ARGV[0]
    puts binlog_file_name
    binlog_contents = IO.read(binlog_file_name, encoding: "BINARY")
    hex = binlog_contents.unpack("H*").first
    replaced = hex.gsub(/3f(000000.+41424300)/, "00\\1")
    bindata = [replaced].pack("H*")
    File.write(binlog_file_name + ".broken", bindata)
    break_binlog.rb

    View full-size slide

  80. binlog_file_name = ARGV[0]
    puts binlog_file_name
    binlog_contents = IO.read(binlog_file_name, encoding: "BINARY")
    hex = binlog_contents.unpack("H*").first
    replaced = hex.gsub(/3f(000000.+41424300)/, "00\\1") # Replaces 'ABC' size
    bindata = [replaced].pack("H*")
    File.write(binlog_file_name + ".broken", bindata)
    break_binlog.rb

    View full-size slide

  81. Let's break this one

    View full-size slide

  82. And if we give it to
    a Postgres
    replica?

    View full-size slide

  83. 2020-06-17 19:24:11 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-17 19:24:11 GMT LOG: invalid record length
    at 0/3000148
    We reproduced the error!

    View full-size slide

  84. Success, with a
    caveat...

    View full-size slide

  85. Postgres
    Postgres
    Application

    View full-size slide

  86. https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-
    october/

    View full-size slide

  87. Hopefully you
    learned some cool
    stuff about databases

    View full-size slide

  88. Debugging
    unfamiliar layers
    of a system

    View full-size slide

  89. No part
    of software
    is magic

    View full-size slide

  90. Thank you
    ✌❤
    @ChrisSinjo
    @GoCardlessEng

    View full-size slide

  91. Image credits
    • Hard Disk Guts - CC-BY - https://www.flickr.com/photos/mattandkim/97533589/
    • Corsair ForceGT 180GB - CC-BY - https://www.flickr.com/photos/ruocaled/8173124575/
    • Rope - CC-BY - https://www.flickr.com/photos/49140926@N07/6798304070/

    View full-size slide

  92. Questions?
    ✌❤
    @ChrisSinjo
    @GoCardlessEng

    View full-size slide