Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Rewriting binary logs because your database broke

Rewriting binary logs because your database broke

Chris Sinjakli

June 18, 2020
Tweet

More Decks by Chris Sinjakli

Other Decks in Programming

Transcript

  1. Hi

    View Slide

  2. @ChrisSinjo

    View Slide

  3. @ChrisSinjo

    View Slide

  4. @ChrisSinjo

    View Slide

  5. Rewriting binary logs
    because your
    database
    broke
    @ChrisSinjo

    View Slide

  6. Site Reliability
    Engineer

    View Slide

  7. Site Reliability
    Engineer
    (a Software Engineer who cares more about
    systems…ish)

    View Slide

  8. View Slide

  9. POST /cash/monies HTTP/1.1
    { amount: 100 }

    View Slide

  10. High per-request

    View Slide

  11. Uptime is

    View Slide

  12. One part of a
    production incident

    View Slide

  13. View Slide

  14. https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-
    october/

    View Slide

  15. Postgres
    Postgres
    Replication
    Application

    View Slide

  16. Postgres
    Postgres
    Replication
    Application

    View Slide

  17. Postgres
    Postgres
    Application

    View Slide

  18. Postgres
    Postgres
    Application

    View Slide

  19. Many parts to that
    investigation

    View Slide

  20. Too many for one
    talk

    View Slide

  21. Focus:
    Debugging parts of the system
    you don’t normally touch

    View Slide

  22. Focus:
    Reproduce what happened
    with Postgres’ binary logs
    during that incident

    View Slide

  23. What is a
    binary log?

    View Slide

  24. [2020-06-14 23:02:37Z] GET /
    [2020-06-14 23:02:49Z] GET /favicon.ico
    [2020-06-14 23:03:53Z] GET /posts/binary-logs—talk
    What we normally mean by logs

    View Slide

  25. Databases have
    a different kind
    of log

    View Slide

  26. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');
    Some extremely boring SQL

    View Slide

  27. View Slide

  28. View Slide

  29. Warning:
    simplifying lie ahead

    View Slide

  30. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    A different kind of logs
    (if they were textual)

    View Slide

  31. But why bother
    doing that?

    View Slide

  32. Crash safety

    View Slide

  33. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    id
    1
    2

    View Slide

  34. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2

    View Slide

  35. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    .

    View Slide

  36. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    ???

    View Slide

  37. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    We can replay this operation

    View Slide

  38. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    ???

    View Slide

  39. Index Table
    id talk_name
    1 binary-logs
    2 niche-git-features
    3 fp-karaoke
    id
    1
    2
    3

    View Slide

  40. Also:
    replication

    View Slide

  41. Postgres
    Postgres
    Replication
    Application

    View Slide

  42. Postgres
    Postgres
    Replication
    Application
    (binary logs)

    View Slide

  43. So why the interest
    in rewriting the
    binary logs?

    View Slide

  44. 2020-06-14 17:23:01 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-14 17:23:02 GMT LOG: invalid record length
    at 0/3000180
    A suspicious error message

    View Slide

  45. Issue restoring binary log

    Cause of failure to
    promote replica?

    View Slide

  46. Goal: do this
    repeatably in a non-
    production cluster

    View Slide

  47. 2020-06-14 17:23:01 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-14 17:23:02 GMT LOG: invalid record length
    at 0/3000180
    Goal: produce this error message

    View Slide

  48. We can cheat:
    Postgres is
    open source

    View Slide

  49. But!

    View Slide

  50. These techniques
    also work on closed
    source software

    View Slide

  51. We just call that
    reverse engineering

    View Slide

  52. $ git checkout REL9_4_26 # we were running 9.4
    $ git grep -n "invalid record length"
    src/backend/access/transam/xlogreader.c:295: [...]
    src/backend/access/transam/xlogreader.c:604: [...]
    src/backend/access/transam/xlogreader.c:678: [...]
    Let's find the error

    View Slide

  53. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    Let's find the error

    View Slide

  54. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    Let's find the error

    View Slide

  55. src/include/access/xlog.h:58:
    #define SizeOfXLogRecord MAXALIGN(sizeof(XLogRecord))
    Let's find the error

    View Slide

  56. Wouldn't it be convenient
    if we could make
    total_len == 0?

    View Slide

  57. src/backend/access/transam/xlogreader.c:272-273:
    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
    total_len = record->xl_tot_len;
    Let's find the error

    View Slide

  58. src/include/access/xlog.h:41:
    typedef struct XLogRecord
    {
    uint32 xl_tot_len; /* total len of entire record */
    TransactionId xl_xid; /* xact id */
    uint32 xl_len; /* total len of rmgr data */
    uint8 xl_info; /* flag bits, see below */
    RmgrId xl_rmid; /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    XLogRecPtr xl_prev; /* ptr to previous record in log */
    pg_crc32 xl_crc; /* CRC for this record */
    /* If MAXALIGN==8, there are 4 wasted bytes here */
    /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
    } XLogRecord;
    Let's find the error

    View Slide

  59. src/include/access/xlog.h:41-56:
    typedef struct XLogRecord
    {
    uint32 xl_tot_len; /* total len of entire record */
    TransactionId xl_xid; /* xact id */
    uint32 xl_len; /* total len of rmgr data */
    uint8 xl_info; /* flag bits, see below */
    RmgrId xl_rmid; /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    XLogRecPtr xl_prev; /* ptr to previous record in log */
    pg_crc32 xl_crc; /* CRC for this record */
    /* If MAXALIGN==8, there are 4 wasted bytes here */
    /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
    } XLogRecord;
    Let's find the error

    View Slide

  60. View Slide

  61. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    What was that check doing?

    View Slide

  62. src/backend/access/transam/xlogreader.c:291-300:
    {
    /* XXX: more validation should be done here */
    if (total_len < SizeOfXLogRecord)
    {
    report_invalid_record(state, "invalid record length at %X/%X",
    (uint32) (RecPtr >> 32), (uint32) RecPtr);
    goto err;
    }
    gotheader = false;
    }
    What was that check doing?
    Size the record says it is
    Smallest possible size it can be

    View Slide

  63. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');

    Wrote 'binary-logs' into table 'talks'
    Wrote 'niche-git-features' into table 'talks'
    Wrote 'fp-karaoke' into table 'talks'
    A different kind of logs
    (if they were textual)

    View Slide

  64. Let's see what they
    look like in
    practice

    View Slide

  65. INSERT INTO talks VALUES ('binary-logs');
    INSERT INTO talks VALUES ('niche-git-features');
    INSERT INTO talks VALUES ('fp-karaoke');
    Some extremely boring SQL

    View Slide

  66. Grab the binary
    log file, and...

    View Slide

  67. A barely comprehensible
    wall of data

    View Slide

  68. A barely comprehensible
    wall of data
    Hex ASCII

    View Slide

  69. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View Slide

  70. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View Slide

  71. A barely comprehensible
    wall of data
    Hex ASCII

    View Slide

  72. A barely comprehensible
    wall of data
    We can see our talks!!

    View Slide

  73. How can we find
    xl_tot_len?

    View Slide

  74. INSERT INTO repro VALUES ('A');
    INSERT INTO repro VALUES ('AB');
    INSERT INTO repro VALUES ('ABC');
    INSERT INTO repro VALUES ('ABCD');
    INSERT INTO repro VALUES ('ABCDE');
    ...
    Some even more boring SQL

    View Slide

  75. Look for a field
    increasing
    by 1

    View Slide

  76. Guesswork incoming!

    View Slide

  77. Guesswork incoming!
    The data we inserted

    View Slide

  78. A little help: ASCII codes
    Decimal Hexadecimal Character
    62 3E >
    63 3F ?
    64 40 @
    65 41 A
    66 42 B

    View Slide

  79. Notice anything?
    The data we inserted

    View Slide

  80. Notice anything?
    The data we inserted
    Familiar characters

    View Slide

  81. Notice anything?
    Decimal Hexadecimal Character
    63 3F ?
    64 40 @
    65 41 A
    The data we inserted
    Familiar characters

    View Slide

  82. Notice anything?
    The data we inserted
    Familiar characters

    View Slide

  83. Notice anything?
    The data we inserted
    Familiar characters

    View Slide

  84. Notice anything?
    The data we inserted
    Familiar characters
    Familiar characters (hex)
    The data we inserted (hex)

    View Slide

  85. Wouldn't it be convenient
    if we could make
    total_len == 0?

    View Slide

  86. We could import the
    Postgres structs and do
    this properly...

    View Slide

  87. ...or we could write a
    regex

    View Slide

  88. Let's write a regex

    View Slide

  89. View Slide

  90. Let's break this one

    View Slide

  91. binlog_file_name = ARGV[0]
    puts binlog_file_name
    binlog_contents = IO.read(binlog_file_name, encoding: "BINARY")
    hex = binlog_contents.unpack("H*").first
    replaced = hex.gsub(/3f(000000.+41424300)/, "00\\1")
    bindata = [replaced].pack("H*")
    File.write(binlog_file_name + ".broken", bindata)
    break_binlog.rb

    View Slide

  92. binlog_file_name = ARGV[0]
    puts binlog_file_name
    binlog_contents = IO.read(binlog_file_name, encoding: "BINARY")
    hex = binlog_contents.unpack("H*").first
    replaced = hex.gsub(/3f(000000.+41424300)/, "00\\1") # Replaces 'ABC' size
    bindata = [replaced].pack("H*")
    File.write(binlog_file_name + ".broken", bindata)
    break_binlog.rb

    View Slide

  93. Let's break this one

    View Slide

  94. Broken!!

    View Slide

  95. And if we give it to
    a Postgres
    replica?

    View Slide

  96. 2020-06-17 19:24:11 GMT LOG: restored log file
    "000000020000000000000003" from archive
    2020-06-17 19:24:11 GMT LOG: invalid record length
    at 0/3000148
    We reproduced the error!

    View Slide

  97. Success

    View Slide

  98. Success, with a
    caveat...

    View Slide

  99. Postgres
    Postgres
    Application

    View Slide

  100. https://gocardless.com/blog/incident-review-api-and-dashboard-outage-on-10th-
    october/

    View Slide

  101. Lessons

    View Slide

  102. Hopefully you
    learned some cool
    stuff about databases

    View Slide

  103. Debugging
    unfamiliar layers
    of a system

    View Slide

  104. No part
    of software
    is magic

    View Slide

  105. Thank you
    ✌❤
    @ChrisSinjo
    @GoCardlessEng

    View Slide

  106. Image credits
    • Hard Disk Guts - CC-BY - https://www.flickr.com/photos/mattandkim/97533589/
    • Corsair ForceGT 180GB - CC-BY - https://www.flickr.com/photos/ruocaled/8173124575/
    • Rope - CC-BY - https://www.flickr.com/photos/[email protected]/6798304070/

    View Slide

  107. Questions?
    ✌❤
    @ChrisSinjo
    @GoCardlessEng

    View Slide