NilGuard: Minimising Patches for NPEs with Incorrectness Separation Logic

NilGuard: Minimising Patches for Null Pointer Errors with Incorrectness Separation Logic
Raghav Roy, Vorashil Farzaliyev, Earl T. Barr, Quang Loc Le
University College London

Your Codebase
(Figure: a large project drawn as columns: auth, parser, crypto, network, storage, config.)
Each column bears load. A crack in one can bring down the whole structure.


Software Over Time
(Figure: day 1, no bugs; a few months later, in production, many bugs.)
As code grows, unchecked pointers accumulate. How do we repair them at scale?

The State of NPE Repair
(Figure: your program with 5 cracks (NPEs): two alias chains and a single NPE. SOTA (OX) repair: false positives, bloat. NilGuard (UX) repair: precise, minimal, with aliased cracks merged. Other label: missed (UX).)

CWE-476: Not a Solved Problem
MITRE CWE Top 25 (2025): NULL Pointer Dereference is #13, up 8 places from #21 in 2024.
(Figure: rank trajectory from 2020 to 2025, with labelled points #11, #21 and #13.)
Recent CVEs (2025):
• Windows Server 2025, CVE-2025-49694: privilege escalation via an NPE in Brokering FS (CVSS 7.8)
• SAP NetWeaver, CVE-2025-42902: remote DoS via a corrupted SAML ticket
• Windows 10 CSC Service, CVE-2025-62466: local privilege escalation
NPEs are privilege-escalation and DoS vectors.

NPEs Are Structural to Systems Programming
Even in memory-safe languages, systems-level code requires escape hatches.
Rust or future languages:
• unsafe blocks permit dereferencing raw pointers
• Real systems code needs unsafe: kernel modules, device drivers, performance-critical paths
• First Rust kernel CVE already in 2025
The fundamental issue: any language that permits direct memory manipulation will have code paths where null dereferences are possible. This is an inherent systems problem.

The World Runs on C/C++
• Linux kernel: 34M+ lines. C isn't going anywhere (first Rust kernel CVE already in 2025).
• Embedded: ∼70% C, ∼23% C++, ∼5% Rust. Automotive, medical devices, industrial control: all C/C++.
• Critical infrastructure: all C (OpenSSL, glibc, musl). 2025 CVEs in glibc, OpenSSL, linux-pam, wolfSSL, ...
There are billions of lines of deployed C/C++ that won't be rewritten. We need tools for the code that exists today.

Current Repair Tools Tend to Over-Approximate
State-of-the-art NPE repair tools for C rely on over-approximate reasoning. This can lead to patching non-bugs or producing invalid patches.

Patching non-existent bugs:
    char *end;
    end = NULL;
    value = strtoul(..., &end, 10);
    /* Already guarded! */
    if (!end || *end != '\0')
        return false;
  + if (end == 0) { return; }        // redundant patch

Producing broken patches:
  + if (ptr != 0) { free(ptr); }     // ptr is not declared yet!
    int *ptr = NULL;
    switch (argc) {
    case 2:
        ptr = malloc(sizeof(int));
        *ptr = 20;                   // NPE
        break;
    ...
    }
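For contrast, a minimal sketch of where a locally safe guard could sit in the second example: after `ptr` is declared and assigned, wrapping only the faulting write. The wrapper name `set_value`, its return convention and the `default` case are hypothetical; the slide elides the remaining cases.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around the slide's switch example.
   Returns the stored value, or -1 when nothing was stored. */
int set_value(int argc)
{
    int *ptr = NULL;           /* declared before any guard refers to it */
    int result = -1;

    switch (argc) {
    case 2:
        ptr = malloc(sizeof(int));
        if (ptr != NULL) {     /* guard only the faulting dereference */
            *ptr = 20;
            result = *ptr;
        }
        break;
    default:                   /* hypothetical stand-in for the elided cases */
        break;
    }

    free(ptr);                 /* free(NULL) is a no-op, so no guard is needed */
    return result;
}
```

The original `free` guard is unnecessary as well: the C standard specifies that `free(NULL)` does nothing.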

Our Contributions
NilGuard builds on Pulse-X¹ to perform automated NPE repair with under-approximate reasoning.
1. Local Safety: a formal safety criterion for NPE patches.
2. Two canonical repair schemas, Skip and Replace, that minimally repair a faulty program path.
3. A cost model for evaluating patch quality: two potentially conflicting measures, decisions (D) and writes (W), that quantify patch bloat.
4. A patch generation algorithm that simultaneously minimises D and W to find minimal repairs across aliased NPEs, while remaining locally safe.
¹ Finding Real Bugs in Big Programs with Incorrectness Logic, OOPSLA 2022.

Incorrectness Separation Logic
ISL triple (under-approximate): $[P]\ C\ [\epsilon : Q]$, read as "every state in Q is reachable from some state in P", with $\epsilon \in \{ok, er\}$.
ISL proves the presence of bugs with $[P]\ C\ [er : Q]$:
• Manifest errors: $P \equiv \mathsf{emp} \wedge \mathit{true}$, so the bug is context-independent. It is guaranteed to occur regardless of how the function is called.
• Latent errors: the bug depends on the calling context and is propagated compositionally.
Why this matters for repair:
• No false positives (under the model) ⇒ no repair of non-existent bugs.
NilGuard takes Pulse-X's manifest and latent error reports as input.
Pulse-X: Le et al., OOPSLA 2022. It found 15 confirmed bugs in OpenSSL.

Local Safety
What does it mean for a patch to be "safe"? APR tools don't have a formal answer.
Local Safety: a transformation $C'$ of $C$ is locally safe if, for every error triple $[P]\ C\ [er : Q]$, there exists $Q'$ such that $[P]\ C'\ [ok : Q']$.
In words: every input that used to crash now terminates normally.
Locality: local safety is defined per error triple, i.e. relative to a program path with an NPE. It does not guarantee global correctness of the whole program.
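Local safety quantifies over error triples, not over a test suite. As a rough, testing-style approximation (not a proof, and not NilGuard's method), one can take a finite set of inputs, keep those on which the original C reaches `er`, and check that the patched C′ reaches `ok` on each. Everything below is hypothetical: the `outcome` enum, the toy programs, and `locally_safe_on`; the toy programs report `er` instead of actually crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Outcome of a simulated run, mirroring the ISL exit conditions ok / er. */
typedef enum { OUTCOME_OK, OUTCOME_ER } outcome;
typedef outcome (*program)(const int *p);

/* Toy C: reads *p unchecked; reports er instead of crashing on NULL. */
static outcome run_original(const int *p)
{
    if (p == NULL)
        return OUTCOME_ER;
    (void)*p;
    return OUTCOME_OK;
}

/* Toy C' (Skip-style patch): the read is skipped when p is NULL. */
static outcome run_patched(const int *p)
{
    if (p != NULL)
        (void)*p;
    return OUTCOME_OK;
}

/* Toy broken patch: the guard comes after the read, so er stays reachable. */
static outcome run_late_guard(const int *p)
{
    outcome o = run_original(p);   /* unguarded read runs before the guard */
    if (p == NULL)
        return o;                  /* too late: o is already er */
    return OUTCOME_OK;
}

/* Finite approximation of local safety: every input on which c reaches er
   must make c_prime reach ok. A test over chosen inputs, not a proof. */
bool locally_safe_on(program c, program c_prime,
                     const int *const inputs[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (c(inputs[i]) == OUTCOME_ER && c_prime(inputs[i]) != OUTCOME_OK)
            return false;
    return true;
}
```

With inputs `{ &x, NULL }`, the Skip-style patch passes and the late guard fails, because NULL still drives it to `er`.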

Why Local? Compositionality and Scalability
Local reasoning means NilGuard analyses and repairs each function independently, using only its ISL specification (pre/postconditions).
What local safety gives us:
• Compositionality: repairs are derived from per-function IL triples. Each function is analysed in isolation; the repair of a callee propagates safety to its callers via the call graph.
• Scalability: NilGuard operates on compact ISL summaries, not the whole program.
What local safety does not guarantee:
• Global safety: a locally safe patch ensures the patched path no longer crashes, but it may not repair the entire error trace. Bridging local safety to global safety is future work.

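Repair Schema: Skip
Before (buggy):
    node *y = x;                       // alias!
    y->next = malloc(sizeof(node));
    y->next->val = 0;                  // NPE
After (Skip-Fault):
    node *y = x;
    y->next = malloc(sizeof(node));
    if (y->next != NULL) {
        y->next->val = 0;
    }
What happened: sf(C, y->next) wraps the NPE in a null guard.
ISL triple before:
$$[Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \wedge L = \mathit{null}]\ \ \texttt{y->next->val = 0;}\ \ [er : Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \wedge L = \mathit{null}]$$
After transformation:
$$[Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \wedge L = \mathit{null}]\ \ (\texttt{assume}(y\text{->}\mathit{next} \neq \mathit{null});\ \texttt{y->next->val = 0;}) + \texttt{assume}(y\text{->}\mathit{next} = \mathit{null})\ \ [ok : Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \wedge L = \mathit{null}]$$
The guard $\texttt{assume}(y\text{->}\mathit{next} \neq \mathit{null})$ makes the error precondition unsatisfiable on the dereferencing branch, so the path now exits with ok.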
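A runnable sketch of the Skip-Fault shape, so both branches of the guard can be exercised. The `node` definition is an assumption (the slide does not show it), and `alloc_or_fail` / `fail_next_alloc` are hypothetical test hooks that force the null case the error triple describes ($L = \mathit{null}$). The return value is instrumentation only.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node type; the slide does not show its definition. */
typedef struct node {
    struct node *next;
    int val;
} node;

/* Hypothetical allocator hook: when set, the next allocation fails. */
static int fail_next_alloc = 0;

static void *alloc_or_fail(size_t size)
{
    if (fail_next_alloc) {
        fail_next_alloc = 0;
        return NULL;
    }
    return malloc(size);
}

/* Skip-Fault applied: the faulting write is wrapped in a null guard.
   Returns 1 if the write happened, 0 if it was skipped. */
int init_next(node *x)
{
    node *y = x;                              /* alias of x */
    y->next = alloc_or_fail(sizeof(node));
    if (y->next != NULL) {                    /* guard inserted by sf(C, y->next) */
        y->next->val = 0;
        return 1;
    }
    return 0;                                 /* skipped: exits ok instead of er */
}
```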

Measuring Patch Quality
Existing APR tools have no notion of patch quality: a patch that guards 100 statements "costs" the same as one that guards 1.
Our cost model:
$$J_{APR}(C \rightsquigarrow C') = (D, W)$$
• D (decisions): the number of guards (non-deterministic choices) inserted. More guards mean more control-flow disruption.
• W (writes): the number of assignments added or conditioned on a new guard. More writes mean a larger semantic footprint.
D and W are incomparable: reducing one can increase the other, making them competing objectives.²
Patch bloat: $J_{APR}(\delta) - J_{APR}(\delta^*)$, where $\delta^*$ is the minimal patch.
² The model also admits read-only operations, but their impact is non-functional.
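A minimal sketch of how $(D, W)$ could be tallied for a patch, assuming each patch is summarised as a list of edit kinds. The `edit_kind` classification, `japr` and `bloat` are hypothetical names; reading bloat as a componentwise difference is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what a patch does to the program. */
typedef enum {
    EDIT_GUARD,          /* new guard: one non-deterministic choice (D) */
    EDIT_WRITE_ADDED,    /* new assignment (W) */
    EDIT_WRITE_GUARDED,  /* existing assignment now under a new guard (W) */
    EDIT_READ            /* read-only: not counted (non-functional impact) */
} edit_kind;

typedef struct {
    int decisions;       /* D */
    int writes;          /* W */
} cost;

/* J_APR(C ~> C') = (D, W), tallied from the patch's edits. */
cost japr(const edit_kind edits[], size_t n)
{
    cost c = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        switch (edits[i]) {
        case EDIT_GUARD:         c.decisions++; break;
        case EDIT_WRITE_ADDED:
        case EDIT_WRITE_GUARDED: c.writes++;    break;
        case EDIT_READ:                         break;
        }
    }
    return c;
}

/* Patch bloat, read componentwise: J_APR(delta) - J_APR(delta*). */
cost bloat(cost patch, cost minimal)
{
    cost b = { patch.decisions - minimal.decisions,
               patch.writes - minimal.writes };
    return b;
}
```

For example, one guard capturing four writes (plus a read) tallies to (1, 4). Measured against a hypothetical minimal patch of cost (1, 2), its bloat is (0, 2).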

The D vs. W Trade-Off
Minimising D and W simultaneously is not always possible: the two costs can conflict.
Two separate guards (low W, high D):
    if (x != NULL) {
        x->val = a;           // NPE1
    }
    y = compute(a, b);
    z = transform(y);
    if (x != NULL) {
        x->next = z;          // NPE2
    }
Cost: (D=2, W=2). Each guard covers only its dereference.
One merged guard (low D, high W):
    if (x != NULL) {
        x->val = a;           // NPE1
        y = compute(a, b);    // captured
        z = transform(y);     // captured
        x->next = z;          // NPE2
    }
Cost: (D=1, W=4). Merging reduces guards but captures innocent writes.
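To make "captures innocent writes" concrete, here is a runnable sketch of both patches. The `item` type and the bodies of `compute` and `transform` are hypothetical stand-ins. Returning `z` exposes the extra semantic footprint: when `x` is NULL, the merged guard also suppresses the computation of `y` and `z`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the slide's types and helpers. */
typedef struct item { int val; int next; } item;

static int compute(int a, int b) { return a + b; }
static int transform(int y)      { return 2 * y; }

/* Two separate guards, (D=2, W=2): y and z are always computed. */
int separate_guards(item *x, int a, int b)
{
    int y, z;
    if (x != NULL) {
        x->val = a;              /* NPE1, guarded */
    }
    y = compute(a, b);
    z = transform(y);
    if (x != NULL) {
        x->next = z;             /* NPE2, guarded */
    }
    return z;
}

/* One merged guard, (D=1, W=4): y and z are captured by the guard.
   The zero-initialisation exists only so this sketch can return z. */
int merged_guard(item *x, int a, int b)
{
    int y = 0, z = 0;
    if (x != NULL) {
        x->val = a;
        y = compute(a, b);       /* captured */
        z = transform(y);        /* captured */
        x->next = z;
    }
    return z;
}
```

Both versions avoid the crash when `x` is NULL. Only the separate-guards version still produces `z = 6` for `(NULL, 1, 2)`; the merged one returns 0.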

Why Minimising D Matters: The Cost of a Guard
A null guard is a conditional branch, which carries its own non-functional cost independent of how many reads or writes fall inside it.
Pipeline pressure:
• Each guard replaces 1 instruction with 3–4 (compare, branch), adding instructions that can exceed pipeline length and cause more flushes.
• More guards also mean more predictions, increasing mispredictions and potentially exceeding the CPU's concurrent speculative execution streams.
Guard-heavy approaches have been reported to slow programs by 20–60%.
Jim et al. Cyclone: A Safe Dialect of C. USENIX ATC 2002.
Necula et al. CCured: Type-Safe Retrofitting of Legacy Code. POPL 2002 (20–60% slowdown on SPECINT).

Patch Compaction Algorithm
Given the cost model with incomparable objectives (D, W), we want patches on the Pareto front: no other candidate is at least as good on both D and W and strictly better on one.
Three-stage algorithm:
1. Partition: group NPEs by alias chain.
2. Generate & Rank: apply enabled schemas, then rank by cost.
3. Filter & Merge: remove nested guards and merge adjacent guards.
Filter removes inner guards that are redundant given an outer guard on an aliased pointer. Merge combines adjacent guards in the same basic block. Both operations reduce D (fewer guards) while preserving local safety (Lemma 2). The implementation uses a heuristic that favours lower D.
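A minimal sketch of Pareto-front selection over candidate costs, followed by a pick that favours lower D. The slides do not spell out NilGuard's heuristic, so the lexicographic tie-break (lower D first, then lower W) and every name below are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int d; int w; } cost;   /* (D, W) of a candidate patch */

/* a dominates b: no worse on D and W, strictly better on at least one. */
static bool dominates(cost a, cost b)
{
    return a.d <= b.d && a.w <= b.w && (a.d < b.d || a.w < b.w);
}

/* Copies the non-dominated candidates (the Pareto front) into front[],
   preserving input order; returns how many were kept. */
size_t pareto_front(const cost cands[], size_t n, cost front[])
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool dominated = false;
        for (size_t j = 0; j < n && !dominated; j++)
            dominated = (j != i) && dominates(cands[j], cands[i]);
        if (!dominated)
            front[kept++] = cands[i];
    }
    return kept;
}

/* Assumed heuristic: lowest D, ties broken by lowest W. Requires n >= 1. */
size_t pick_lowest_d(const cost front[], size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (front[i].d < front[best].d ||
            (front[i].d == front[best].d && front[i].w < front[best].w))
            best = i;
    }
    return best;
}
```

For candidates (2, 2), (1, 4), (2, 3) and (3, 5), the front is {(2, 2), (1, 4)}, and the D-first pick returns (1, 4).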

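Compaction Rules
The compaction rules enable the algorithm to simultaneously minimise D and W while remaining locally safe.
$$
\textsc{Filter}\quad
\frac{
  [P]\ \texttt{assume}(x \neq \mathit{null});\ C_1\ [ok : Q_1 \wedge x = y]
  \quad
  [Q_1 \wedge x = y]\ \texttt{assume}(y \neq \mathit{null});\ C_2\ [\epsilon : Q_2]
  \quad
  [Q_1 \wedge x = y]\ \texttt{assume}(y = \mathit{null});\ C_3\ [\epsilon : Q_2']
}{
  [P]\ \texttt{assume}(x \neq \mathit{null});\ C_1;\ C_2\ [\epsilon : Q_2]
}
\quad \mathit{Modify}(C_1) \cap \{x\} = \emptyset
$$
$$
\textsc{Merge}\quad
\frac{
  [P]\ \texttt{assume}(x \neq \mathit{null});\ C_1\ [ok : Q_1]
  \quad
  [Q_1]\ C_2\ [ok : Q_2 \wedge x = y]
  \quad
  [Q_2 \wedge x = y]\ \texttt{assume}(y \neq \mathit{null});\ C_3\ [\epsilon : Q_3]
  \quad
  [Q_2 \wedge x = y]\ \texttt{assume}(y = \mathit{null});\ C_4\ [\epsilon : Q_3']
}{
  [P]\ \texttt{assume}(x \neq \mathit{null});\ C_1;\ C_2;\ C_3\ [\epsilon : Q_3]
}
\quad \mathit{Modify}(C_1; C_2) \cap \{x\} = \emptyset
$$
Both rules are proven sound (Lemma 2); the full proofs are in the paper.
Theorem 1: $\forall P \in \mathit{SynthesizePatch}(C, T)$, $P$ is locally safe.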
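Both rules share two side conditions: the alias fact $x = y$ must appear in the intermediate postcondition, and the commands the outer guard absorbs must not modify $x$. A minimal sketch of that check, assuming each command is summarised by the variables it may assign. The `command` struct and `can_extend_guard` are hypothetical. A real implementation would take Modify sets and alias facts from the analysis; here they are supplied by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of a command: the program variables it may assign. */
typedef struct {
    const char *const *modifies;
    size_t n;
} command;

static bool assigns(const command *c, const char *var)
{
    for (size_t i = 0; i < c->n; i++)
        if (strcmp(c->modifies[i], var) == 0)
            return true;
    return false;
}

/* Side conditions for letting the guard assume(x != null) cover cmds[0..n-1]:
   x = y must hold, and no command may modify x.
   For Merge pass C1 and C2; for Filter pass only C1. */
bool can_extend_guard(const command cmds[], size_t n,
                      const char *x, bool x_aliases_y)
{
    if (!x_aliases_y)
        return false;
    for (size_t i = 0; i < n; i++)
        if (assigns(&cmds[i], x))
            return false;
    return true;
}
```

A sequence that assigns only `t` and `u` passes. One that reassigns `x` fails, because the outer guard would then test a stale value.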

Experimental Setup
Three benchmark suites:
• Small Programs (54 NPEs): Meta's regression tests for the Infer framework.
• LLM-Generated (20 NPEs): AI-generated C programs covering diverse NPE patterns.
• Large Projects (317 NPEs): 8 open-source C projects, 1.5M+ LoC.
Baseline & methodology:
• Compared against PnF-fse³ (ProveNFix), the current SOTA APR tool for C.
• Manual inspection of all patches, classified as C (correct/safe), δ (partial), I (incorrect), or N (no patch).
• Intel i5, 32 GB RAM, Ubuntu 20.04.
³ ProveNFix: ACM SIGSOFT Distinguished Paper Award, FSE 2024.

Results at a Glance
• 28.7% more safe patches⁴
• 3.75× less patch bloat
• 3× faster
⁴ 31.9% vs. 3.2% safe patch rate, a difference of 28.7 percentage points and a ∼10× improvement.

Large Projects: Patch Quality Results
Project      kLoC    PnF-fse (Safe / P̄ / I)    NilGuard (Safe / P̄ / I)
flex         23.9    1 / 9 / 1                 9 / 8 / 0
x264         64.6    0 / 0 / 0                 0 / 0 / 1
lxc          62.4    0 / 31 / 0                5 / 4 / 0
p11-kit      76.2    0 / 2 / 0                 0 / 0 / 0
recutils     81.9    1 / 77 / 11               30 / 4 / 1
OpenSSL-1   336.0    0 / 6 / 0                 8 / 6 / 1
Snort       378.0    0 / 18 / 0                24 / 50 / 0
OpenSSL-3   556.4    0 / 16 / 0                0 / 2 / 0
Total      1579.4    2 / 159 / 12              76 / 74 / 3
P̄ = patches on false positives, I = incorrect patches.
PnF-fse: 92% of patches are on false positives. NilGuard: 48%.
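The slide does not state the denominator behind 92% and 48%. Assuming it is every patch a tool produced (Safe + P̄ + I), the Total row reproduces both figures: 159/173 ≈ 0.919 and 74/153 ≈ 0.484. A tiny sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Share of a tool's patches that target false positives, assuming the
   denominator is all patches it produced: fp / (safe + fp + incorrect). */
double fp_share(int safe, int fp, int incorrect)
{
    return (double)fp / (double)(safe + fp + incorrect);
}
```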

Patch Bloat Results
Overall cost distribution for NilGuard (125 bugs in total, 114 compacted patches). Each cell is a (D, W) pair.
Schema    #P    min       Q1         med        Q3          max
Skip      81    (1, 1)    (1, 1)     (1, 1)     (1, 3)      (1, 6)
Evade      4    (1, 2)    (1, 24)    (1, 46)    (1, 106)    (1, 106)
Replace   29    (1, 1)    (1, 1)     (1, 1)     (1, 1)      (1, 1)
All      114    (1, 1)    (1, 1)     (1, 1)     (1, 2)      (1, 106)
Median cost is (1, 1) across all projects. PnF-fse: median cost (1, 3.75).

NilGuard: Minimising Patches for NPEs with Incorrectness Separation Logic
Raghav Roy, Vorashil Farzaliyev, Earl T. Barr, Quang Loc Le (UCL)
• CWE-476 trajectory: #21 (2024) to #13 (2025)
• Precise patches: 1. Local Safety 2. Skip + Replace schemas 3. Cost model (D, W) 4. Patch compaction
• Evaluation: 28.7% more safe patches, 3.75× less patch bloat, 3× faster
Contact: raghavroy145 at gmail dot com

Replace Schema: Rewiring the Data Flow
Before (buggy):
    int *p = NULL;
    int **pp = &p;
    int ***ppp = &pp;
    ***ppp = 5;           // NPE
After (Assign-Fresh):
    int *p = NULL;
    int **pp = &p;
    int ***ppp = &pp;
    int tmp = 0;          // fresh cell; must still be alive at the write below
    if (*pp == NULL) {
        *pp = &tmp;
    }
    ***ppp = 5;
What happened: af(C, **ppp) binds the null pointer to a fresh, safe memory cell.
Key: NilGuard extracted the alias chain {p, *pp, **ppp} from the ISL postcondition and traced the null back to its source. OX-based tools could not resolve this example: multi-level aliasing requires precise alias-chain extraction from the postcondition.
Cost: (1, 2), i.e. one guard and two writes (declaration + assignment).
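A runnable sketch of the Assign-Fresh pattern that exercises both the null and non-null paths. The function name `write_through_chain` and the `used_fresh` flag are hypothetical instrumentation. The guard tests `*pp`, which is the pointer that is actually null (`pp` itself is `&p`). The fresh cell is declared outside the guard so it is still alive when `***ppp = 5` runs. Declaring it inside the `if` block would leave `p` dangling at that write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Writes 5 through a three-level alias chain rooted at target.
   If the chain ends in NULL, it is first rebound to a fresh local cell.
   Returns the value read back through the chain. */
int write_through_chain(int *target, bool *used_fresh)
{
    int *p = target;
    int **pp = &p;
    int ***ppp = &pp;
    int fresh = 0;             /* fresh cell, alive until the function returns */

    *used_fresh = false;
    if (*pp == NULL) {         /* one guard on the alias chain {p, *pp, **ppp} */
        *pp = &fresh;          /* rebind the null pointer */
        *used_fresh = true;
    }
    ***ppp = 5;                /* previously an NPE when target was NULL */
    return **pp;
}
```

With a real target, the write lands in the caller's cell. With NULL, it lands in `fresh`, and the function still returns 5.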
