Slide 1

Slide 1 text

MAKING YOUR PATCH MORE COMMITTABLE Melanie Plageman Microsoft

Slide 2

Slide 2 text

Communication and collaboration Exercising your code Splitting up commits

Slide 3

Slide 3 text

Terminology ■ patch ó commit ■ patch set ó commits (branch) ■ mailing list (pgsql- hackers)

Slide 4

Slide 4 text

THREE YEARS AGO… August 2020 https://www.postgresql.org/message-id/CAAKRu_aLMRHX6_y%3DK5i5wBMTMQvoPMO8DT3eyCziTHjsY11cVA%40mail.gmail.com

Slide 5

Slide 5 text

https://www.postgresql.org/message-id/1fc1f116-b235-f016-a1f8-f124a9840a5c%40enterprisedb.com

Slide 6

Slide 6 text

COMMUNICATION & COLLABORATION

Slide 7

Slide 7 text

Structure of an initial email ■ High level summary ■ Why do we want this? ■ Why this way? ■ Call to action

Slide 8

Slide 8 text

HIGH LEVEL SUMMARY https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

Slide 9

Slide 9 text

Why do we want this? ■ Problem fixed ■ Benefit added ■ Users impacted ■ Why now https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com

Slide 10

Slide 10 text

Why this way? ■ How is this different from previous attempts? ■ Explain the design ■ Why not any obvious alternative designs? ■ Why isn’t a part implemented? https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com

Slide 11

Slide 11 text

Call to action ■ TODOs awaiting validation – No documentation yet – Haven’t implemented it for this scenario yet ■ Open questions – Is this part of the design good? ■ What else needs to be done? – Do I need to implement it for partition tables? ■ Issues – Why is this test failing? https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

Slide 12

Slide 12 text

Other tips for initial email ■ Don't make everything equally important in commit message and email ■ Try to loop in invested users if user facing or devs if API – CC people who have worked on this problem in the past ■ Chime in on/review threads in the same subject area

Slide 13

Slide 13 text

How to respond ■ Answer questions that reviewers ask you ■ Address review inline point-by-point when sending a new version ■ Be flexible with stylistic changes ■ Address all instances of a given type of feedback ■ Don’t be afraid to fork new mailing list discussions

Slide 14

Slide 14 text

EXERCISING YOUR CODE

Slide 15

Slide 15 text

Opening Arguments With my patch applied, vacuum is 10% faster on average. A trivial example: CREATE TABLE foo(a INT) WITH (autovacuum_enabled=false); INSERT INTO foo SELECT i FROM generate_series(1, 1000) AS i; \timing on VACUUM (VERBOSE) foo; [Paste results here]

Slide 16

Slide 16 text

SIMPLE USAGE EXAMPLE https://www.postgresql.org/message-id/flat/CA%2BTgmoYOYZfMCyOXFyC-P%2B-mdrZqm5pP2N7S-r0z3_402h9rsA%40mail.gmail.com

Slide 17

Slide 17 text

BENCHMARKS

Slide 18

Slide 18 text

Why Benchmark? ■ Performance improvement ■ Lack of performance regression

Slide 19

Slide 19 text

Less is More in Benchmarks ■ \timing on is okay! ■ Use pgbench – Other supporting benchmarks are okay ■ Avoid bash scripts and attachments ■ Minimize distractions ■ Eliminate extraneous conditions – Use development GUCs, table options when helpful ■ Minimize runtime

Slide 20

Slide 20 text

Benchmark the right thing ■ Use pgbench “custom scripts” pgbench –c 1 –M prepared –P 1 –t 50 \ –f <(echo "COPY table(…) FROM 'copysource'") ■ Ensure reproducibility

Slide 21

Slide 21 text

Explain How to Run Your Benchmark ■ Postgres prep steps: prewarm, restart, initdb ■ Postgres settings (GUCs, table options, etc) ■ Schema and data ■ Machine specs or settings (e.g. readahead)

Slide 22

Slide 22 text

- DDL and data generation - steps required - command to run https://www.postgresql.org/message-id/flat/CA%2BTgmoa8nddwpMA4Emn7sVoNrQ883mn8ZJBiqXa8dm3puKn%3DqQ%40mail.gmail.com#4251b35792be16116af1c6cf87c2eb5d

Slide 23

Slide 23 text

- machine under test - brief results summary https://www.postgresql.org/message-id/flat/0e9c1643-c620-47e7-b30e-6840b0abbe2e%40enterprisedb.com#a9ec204b960afffe4bbf405a93b603aa

Slide 24

Slide 24 text

Provide the realistic use case ■ If example is contrived, explain realistic scenario – e.g. not tpcb-like pgbench with random data access distribution ■ Describe expected impact and how much of user base it would affect ■ Outline worst and best cases ■ Run additional benchmarks with longer runtimes that prove lack of negative interaction with subsystems like checkpointer

Slide 25

Slide 25 text

Presenting Performance Test Results ■ Don't provide more benchmark results than is useful ■ TPS (Transactions per second) isn’t the only metric – Plan diffs – perf output summary – P99 latency – Memory utilization – Per process IO consumption – …

Slide 26

Slide 26 text

https://www.postgresql.org/message-id/flat/cf85f46f-b02f-05b2-5248-5000b894ebab%40enterprisedb.com

Slide 27

Slide 27 text

https://www.postgresql.org/message-id/flat/20230630002952.34zj7conccw42yiv%40awork3.anarazel.de#2016c72ed21d7d8b09d830dd84876ae9

Slide 28

Slide 28 text

TESTS

Slide 29

Slide 29 text

Regression Tests ■ Minimal ■ Fast ■ Reliable ■ Core suite (SQL) ■ “Isolation, recovery, subscription” suite (Perl) ■ …

Slide 30

Slide 30 text

Reproduction Scripts https://www.postgresql.org/message-id/flat/8d526f29-c269-dc3d-38ac-10fa3a0a7fb3%40gmail.com#481e217bc7896fd4542f928d656dc46b

Slide 31

Slide 31 text

■ When adding a test is difficult ■ Instills confidence in correctness of work ■ Protects future developers Debug-only Validation

Slide 32

Slide 32 text

SPLITTING COMMITS UP

Slide 33

Slide 33 text

in July… ■ reduces vacuum WAL volume by > 50%! ■ passes tests! ■ and the code looks good too!

Slide 34

Slide 34 text

in July… ■ 21 files changed, 140 insertions, 1348 deletions L ■ unreviewable L ■ no commit message L

Slide 35

Slide 35 text

the journey to committability 1. Every commit works 2. No omnibus commits 3. Each line of a commit is purposeful and clearly outlined in commit message 4. Refactoring should be motivated 5. Functional changes should be obvious 6. Comments and commit messages must clearly communicate risk and action required

Slide 36

Slide 36 text

initial splitting attempt

Slide 37

Slide 37 text

minor improvement may not merit churn

Slide 38

Slide 38 text

- turn the refactor into a feature - independently valuable commit justifying churn add error detail for messages originating in on-access pruning

Slide 39

Slide 39 text

Commit message doesn’t communicate lack of risk

Slide 40

Slide 40 text

Better commit message

Slide 41

Slide 41 text

- separate form and function - single issue commits - don’t distract reviewer

Slide 42

Slide 42 text

separated into two commits

Slide 43

Slide 43 text

rearrange then relocate

Slide 44

Slide 44 text

single issue commits - separate commit for new APIs

Slide 45

Slide 45 text

SEPARATE COMMITS OR SEPARATE SUB- PROJECTS?

Slide 46

Slide 46 text

2O24.PGCONF.DEV May 28-31 Vancouver, Canada CFP through Jan 15, 2024

Slide 47

Slide 47 text

QUESTIONS