
Raghav Roy
February 24, 2026

NilGuard: Minimising Patches for NPEs with Incorrectness Separation Logic

Automated Program Repair (APR) for Null Pointer Exceptions (NPEs) is often hindered by patches that either target false positives or are bloated with unnecessary changes. Applying one of these problematic patches to a program can make it incorrect, even unsafe. This paper introduces NilGuard, a novel APR tool built on the under-approximate foundation of Incorrectness Separation Logic to combat both problems. For patch bloat, it presents a measure that approximates the impact of a patch on the rest of the program. We formalise APR for NPEs as an alias-aware transformation problem, solved via two canonical repair schemas, Skip and Replace, and a novel three-stage compaction algorithm that finds Pareto-optimal patches for our patch bloat metric. Our experiments on over 1.5 MLOC show that NilGuard outperforms state-of-the-art APR tools: it generates 28.7 percentage points more safe patches (31.9% against 3.2%) and fewer patches for false positives, and its patches are less bloated, with a median of 1 unnecessary statement against 3.75 for the state of the art.

Transcript

  1. NilGuard: Minimising Patches for Null Pointer Errors with Incorrectness Separation Logic

    Raghav Roy, Vorashil Farzaliyev, Earl T. Barr, Quang Loc Le
    University College London
  2. Your Codebase

    [Diagram: a large project drawn as load-bearing columns labelled auth, parser, crypto, network, storage, config.]
    Each column bears load. A crack in one can bring down the whole structure.
  4. Software Over Time

    [Timeline: Day 1 (no bugs), few months, production (many bugs).]
    As code grows, unchecked pointers accumulate. How do we repair them at scale?
  5. The State of NPE Repair

    [Diagram: your program with 5 cracks (NPEs), made up of two alias chains and a single NPE.]
    • SOTA over-approximate (OX) repair: false positives, bloat.
    • NilGuard under-approximate (UX) repair: precise, minimal. Aliased cracks are merged; some cracks are missed (UX).
  6. CWE-476: Not a Solved Problem

    MITRE CWE Top 25 (2025): NULL Pointer Dereference is #13, up 8 places from #21 in 2024.
    [Chart: CWE-476 rank by year, 2020 to 2025, with labelled points #11, #21 and #13.]
    Recent CVEs (2025):
    • Windows Server 2025, CVE-2025-49694: privilege escalation via NPE in Brokering FS (CVSS 7.8)
    • SAP NetWeaver, CVE-2025-42902: remote DoS via corrupted SAML ticket
    • Windows 10 CSC Service, CVE-2025-62466: local privilege escalation
    NPEs are privilege escalation and DoS vectors.
  7. NPEs Are Structural to Systems Programming

    Even in memory-safe languages, systems-level code requires escape hatches.
    Rust or future languages:
    • unsafe blocks permit raw pointers
    • Real systems code needs unsafe: kernel modules, device drivers, performance-critical paths
    • First Rust kernel CVE already in 2025
    The fundamental issue: any language that permits direct memory manipulation will have code paths where null dereferences are possible. This is an inherent systems problem.
  8. The World Runs on C/C++

    • Linux kernel: 34M+ lines. C isn’t going anywhere. (First Rust kernel CVE already: 2025)
    • Embedded: ∼70% C, C++: ∼23%, Rust: ∼5%. Automotive, medical devices, industrial control: all C/C++.
    • Critical infra: all C (OpenSSL, glibc, musl). 2025 CVEs in glibc, OpenSSL, linux-pam, wolfSSL . . .
    There are billions of lines of deployed C/C++ that won’t be rewritten. We need tools for the code that exists today.
  9. Current Repair Tools Tend to Over-Approximate

    State-of-the-art NPE repair tools for C rely on over-approximate reasoning. This can lead to patching non-bugs or producing invalid patches.

    Patching non-existent bugs:

        char *end;
        end = NULL;
        value = strtoul(..., &end, 10);
        /* Already guarded! */
        if (!end || *end != '\0')
            return false;
      + if (end==0){ return; }  // redundant patch

    Producing broken patches:

      + if (ptr!=0){free(ptr);}  // not declared yet!
        int *ptr = NULL;
        switch (argc) {
        case 2:
            ptr = malloc(sizeof(int));
            *ptr = 20;  // NPE
            break;
        ...
        }
  10. Our Contributions

    NilGuard builds on Pulse-X¹ to perform automated NPE repair with under-approximate reasoning.
    1. Local Safety: a formal safety criterion for NPE patches.
    2. Two canonical repair schemas, Skip and Replace, that minimally repair a faulty program path.
    3. A cost model for evaluating patch quality: two potentially conflicting measures, decisions (D) and writes (W), to quantify patch bloat.
    4. Patch generation algorithm: simultaneously minimises D and W to find minimal repairs across aliased NPEs, while being locally safe.
    ¹ Finding Real Bugs in Big Programs with Incorrectness Logic, OOPSLA 2022
  11. Local Safety

    What does it mean for a patch to be “safe”? APR tools don’t have a formal answer.
    Local Safety: a transformation C′ of C is locally safe if, for every error triple [P] C [er: Q], there exists Q′ such that [P] C′ [ok: Q′].
    In words: every input that used to crash now terminates normally.
    Locality: local safety is defined per error triple, i.e. relative to a program path with an NPE. It does not guarantee global correctness of the whole program.
  12. Why Local? Compositionality and Scalability

    Local reasoning means NilGuard analyses and repairs each function independently, using only its ISL specification (pre/postconditions).
    What local safety gives us:
    • Compositionality: repairs are derived from per-function ISL triples. Each function is analysed in isolation; the repair of a callee propagates safety to its callers via the call graph.
    • Scalability: NilGuard operates on compact ISL summaries, not the whole program.
    What local safety does not guarantee:
    • Global safety: a locally safe patch ensures the patched path no longer crashes, but it may not repair the entire error trace. Bridging local safety to global safety is future work.
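  13. Repair Schema: Skip

    Before (buggy):

        node *y = x;  // alias!
        y->next = (int*)malloc(sizeof(int));
        y->next->val = 0;  // NPE

    After (Skip-Fault):

        node *y = x;
        y->next = (int*)malloc(sizeof(int));
        if (y->next != NULL) {
            y->next->val = 0;
        }

    What happened: sf(C, y->next) wraps the NPE in a null guard.
    ISL triple before:
    \[
    [\,Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \land L = \mathit{null}\,]
    \;\; \mathtt{y\text{->}next\text{->}val = 0;} \;\;
    [\,\mathit{er}: Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \land L = \mathit{null}\,]
    \]
    After transformation:
    \[
    [\,Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \land L = \mathit{null}\,]
    \;\; (\mathtt{assume}(\mathtt{y\text{->}next} \neq \mathit{null});\ \mathtt{y\text{->}next\text{->}val = 0;}) + (\mathtt{assume}(\mathtt{y\text{->}next} = \mathit{null})) \;\;
    [\,\mathit{ok}: Q * y \mapsto Y * Y \mapsto (\mathit{next}, L) \land L = \mathit{null}\,]
    \]
    The assume(y->next ≠ null) makes the error precondition unsatisfiable.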
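
The Skip transformation on this slide can be tried out in a small C sketch. The `node` layout, the function name `set_next_val_skip`, and modelling a failed allocation by passing a node whose `next` is already `NULL` are assumptions made for illustration; none of them is NilGuard output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; the slides do not show its definition. */
typedef struct node {
    int val;
    struct node *next;
} node;

/* Patched path in the style of the Skip schema: the faulting dereference is
 * wrapped in a null guard. Returns 1 if the write happened, 0 if skipped. */
int set_next_val_skip(node *x)
{
    assert(x != NULL);          /* the guard below only covers y->next */
    node *y = x;                /* alias of x, as on the slide */
    if (y->next != NULL) {      /* guard inserted by the patch */
        y->next->val = 0;
        return 1;
    }
    return 0;                   /* null case: terminate normally */
}
```

Calling the function on a node with a non-null `next` performs the write, while a node with `next == NULL` returns normally instead of crashing, which is the behaviour the `ok` triple describes.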
  14. Measuring Patch Quality

    Existing APR tools have no notion of patch quality. A patch that guards 100 statements “costs” the same as one guarding 1.
    Our cost model:
    \[ J_{\mathrm{APR}}(C \rightsquigarrow C') = (\underbrace{D}_{\text{decisions}},\ \underbrace{W}_{\text{writes}}) \]
    • D: number of guards (non-deterministic choices) inserted. More guards = more control-flow disruption.
    • W: number of assignments added or conditioned on a new guard. More writes = larger semantic footprint.
    \[ \text{Patch bloat} = J_{\mathrm{APR}}(\delta) - J_{\mathrm{APR}}(\delta^{*}) \]
    where δ∗ is the minimal patch.
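
A minimal sketch of the bloat computation, assuming the subtraction of two (D, W) costs is taken component-wise (the slide does not spell this out). The type `patch_cost` and the function `patch_bloat` are hypothetical names.

```c
#include <assert.h>

/* Cost of a patch: D guards inserted, W writes added or newly guarded. */
typedef struct {
    int decisions; /* D */
    int writes;    /* W */
} patch_cost;

/* Bloat of `patch` relative to `minimal`, computed per component.
 * Assumption: subtracting two cost pairs means subtracting D and W separately. */
patch_cost patch_bloat(patch_cost patch, patch_cost minimal)
{
    assert(patch.decisions >= 0 && patch.writes >= 0);
    assert(minimal.decisions >= 0 && minimal.writes >= 0);
    patch_cost bloat = { patch.decisions - minimal.decisions,
                         patch.writes - minimal.writes };
    return bloat;
}
```

Under this reading, a single guard around 100 statements costs (1, 100); against a minimal patch of (1, 1) its bloat is (0, 99), which is the difference a guard-counting view alone cannot see.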
  15. The D vs. W Trade-Off

    Minimising D and W simultaneously is not always possible: the two costs can conflict.

    Two separate guards (low W, high D):

        if (x != NULL) {
            x->val = a;  // NPE1
        }
        y = compute(a, b);
        z = transform(y);
        if (x != NULL) {
            x->next = z;  // NPE2
        }

    Cost: (D=2, W=2). Each guard covers only its dereference.

    One merged guard (low D, high W):

        if (x != NULL) {
            x->val = a;         // NPE1
            y = compute(a, b);  // captured
            z = transform(y);   // captured
            x->next = z;        // NPE2
        }

    Cost: (D=1, W=4). Merging reduces guards but captures innocent writes.
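
The conflict can be checked with a standard Pareto-dominance test. This is a sketch: `patch_cost` and `dominates` are hypothetical names, and the dominance definition is the usual textbook one rather than something quoted from the paper.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int decisions; /* D */
    int writes;    /* W */
} patch_cost;

/* a dominates b if a is no worse in both components and strictly better in
 * at least one. */
bool dominates(patch_cost a, patch_cost b)
{
    assert(a.decisions >= 0 && a.writes >= 0);
    assert(b.decisions >= 0 && b.writes >= 0);
    bool no_worse = a.decisions <= b.decisions && a.writes <= b.writes;
    bool better = a.decisions < b.decisions || a.writes < b.writes;
    return no_worse && better;
}
```

For the two patches on this slide, neither (2, 2) nor (1, 4) dominates the other, so both sit on the Pareto front and the choice between them is a genuine trade-off.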
  16. Patch Compaction Algorithm

    Given the cost model with potentially conflicting objectives (D, W), we want patches that simultaneously minimise both costs.
    Three-stage algorithm:
    1. Partition: group NPEs by alias chain.
    2. Generate & Rank: apply enabled schemas, then rank by cost.
    3. Filter & Merge: remove nested guards, merge adjacent guards.
    Filter removes inner guards that are redundant given an outer guard on an aliased pointer. Merge combines adjacent guards in the same basic block. Both operations reduce D (fewer guards) while preserving local safety (Lemma 2).
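
The Filter and Merge stage can be illustrated with a simplified sketch. Everything here is an assumption for illustration: guards are modelled as statement ranges within one basic block, tagged with an alias-chain id; the input is assumed sorted by start position; each guard is only compared with the most recently kept one; and the side condition that the guarded pointer is not modified in between is not modelled. It is not NilGuard's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard record: a null check covering statements
 * [first, last] of one basic block, on a pointer in alias chain `chain`. */
typedef struct {
    int chain;
    int first;
    int last;
} guard;

/* Compacts `g` in place and returns the number of guards left (the D cost).
 * Precondition: guards are sorted by `first`. */
size_t compact_guards(guard *g, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        assert(g[i].first <= g[i].last);
        int absorbed = 0;
        if (kept > 0 && g[kept - 1].chain == g[i].chain) {
            guard *prev = &g[kept - 1];
            if (g[i].first >= prev->first && g[i].last <= prev->last) {
                absorbed = 1;              /* Filter: nested in prev */
            } else if (g[i].first == prev->last + 1) {
                prev->last = g[i].last;    /* Merge: directly adjacent */
                absorbed = 1;
            }
        }
        if (!absorbed)
            g[kept++] = g[i];
    }
    return kept;
}
```

With four input guards, where the second is nested in the first, the third is adjacent to it, and the fourth is on a different alias chain, the sketch leaves two guards, so D drops from 4 to 2.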
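  17. Compaction Rules

    The compaction rules that enable the algorithm to simultaneously minimise D and W, while being locally safe.

    Filter:
    \[
    \frac{
      \begin{array}{l}
      [P]\ \mathtt{assume}(x \neq \mathit{null});\ C_1\ [\mathit{ok}: Q_1 \land x = y] \\
      [Q_1 \land x = y]\ \mathtt{assume}(y \neq \mathit{null});\ C_2\ [\epsilon: Q_2] \\
      [Q_1 \land x = y]\ \mathtt{assume}(y = \mathit{null});\ C_3\ [\epsilon: Q_2']
      \end{array}
    }{
      [P]\ \mathtt{assume}(x \neq \mathit{null});\ C_1;\ C_2\ [\epsilon: Q_2]
    }
    \quad \mathrm{Modify}(C_1) \cap \{x\} = \emptyset
    \]

    Merge:
    \[
    \frac{
      \begin{array}{l}
      [P]\ \mathtt{assume}(x \neq \mathit{null});\ C_1\ [\mathit{ok}: Q_1] \\
      [Q_1]\ C_2\ [\mathit{ok}: Q_2 \land x = y] \\
      [Q_2 \land x = y]\ \mathtt{assume}(y \neq \mathit{null});\ C_3\ [\epsilon: Q_3] \\
      [Q_2 \land x = y]\ \mathtt{assume}(y = \mathit{null});\ C_4\ [\epsilon: Q_3']
      \end{array}
    }{
      [P]\ \mathtt{assume}(x \neq \mathit{null});\ C_1;\ C_2;\ C_3\ [\epsilon: Q_3]
    }
    \quad \mathrm{Modify}(C_1; C_2) \cap \{x\} = \emptyset
    \]

    Both rules are proven sound (Lemma 2). The full proofs are in the paper.
    Theorem 1: ∀ P ∈ SynthesizePatch(C, T), P is locally safe.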
  18. Experimental Setup

    Three benchmark suites:
    • Small Programs (54 NPEs): Meta’s regression tests for the Infer framework.
    • LLM-Generated (20 NPEs): AI-generated C programs covering diverse NPE patterns.
    • Large Projects (317 NPEs): 8 open-source C projects, 1.5M+ LoC.
    Baseline & methodology:
    • Compared against PnF-fse² (ProveNFix), the current SOTA APR tool for C.
    • Manual inspection of all patches, classified as: C (correct/safe), δ (partial), I (incorrect), N (no patch).
    • Intel i5, 32 GB RAM, Ubuntu 20.04.
    ² ProveNFix: ACM SIGSOFT Distinguished Paper Award, FSE 2024.
  19. Results at a Glance

    • 28.7% more safe patches³
    • 3.75× less patch bloat
    • 3× faster
    ³ 31.9% vs. 3.2% safe patch rate, a ∼10× improvement.
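
The two headline figures for safe patches follow directly from the two safe patch rates quoted in the footnote; the first is a difference in percentage points and the second a ratio:

```latex
\[
31.9\% - 3.2\% = 28.7 \text{ percentage points},
\qquad
\frac{31.9}{3.2} \approx 9.97 \approx 10 .
\]
```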
  20. Large Projects: Patch Quality Results

                            PnF-fse          NilGuard
    Project      kLoC   Safe    P̄    I   Safe    P̄    I
    flex         23.9      1    9    1      9    8    0
    x264         64.6      0    0    0      0    0    1
    lxc          62.4      0   31    0      5    4    0
    p11-kit      76.2      0    2    0      0    0    0
    recutils     81.9      1   77   11     30    4    1
    OpenSSL-1   336.0      0    6    0      8    6    1
    Snort       378.0      0   18    0     24   50    0
    OpenSSL-3   556.4      0   16    0      0    2    0
    Total      1579.4      2  159   12     76   74    3

    P̄ = patches on false positives, I = incorrect patches.
    PnF-fse: 92% of patches are on false positives. NilGuard: 48%.
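
The 92% and 48% figures can be reproduced from the Total row, assuming the denominator is every patch a tool emitted (Safe + P̄ + I). That denominator is an inference from the numbers, not something the slide states; `patch_totals` and `false_positive_share` are hypothetical names.

```c
#include <assert.h>

/* Per-tool totals taken from the "Total" row of the table. */
typedef struct {
    int safe;           /* Safe */
    int false_positive; /* P-bar: patches on false positives */
    int incorrect;      /* I */
} patch_totals;

/* Percentage of a tool's emitted patches that target false positives.
 * Assumption: emitted patches = safe + false_positive + incorrect. */
double false_positive_share(patch_totals t)
{
    int emitted = t.safe + t.false_positive + t.incorrect;
    assert(emitted > 0);
    return 100.0 * t.false_positive / emitted;
}
```

With the totals (2, 159, 12) for PnF-fse and (76, 74, 3) for NilGuard this gives 159/173 and 74/153, which round to the 92% and 48% on the slide.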
  21. Patch Bloat Results

    Overall cost distribution for NilGuard (125 total bugs, 114 compacted patches). Each cell is a (D, W) cost:

    Schema    #P   min      Q1        med       Q3         max
    Skip      81   (1, 1)   (1, 1)    (1, 1)    (1, 3)     (1, 6)
    Evade      4   (1, 2)   (1, 24)   (1, 46)   (1, 106)   (1, 106)
    Replace   29   (1, 1)   (1, 1)    (1, 1)    (1, 1)     (1, 1)
    All      114   (1, 1)   (1, 1)    (1, 1)    (1, 2)     (1, 106)

    NilGuard: median cost (1, 1) across all projects. PnF-fse: median cost (1, 3.75).
  22. NilGuard: Minimising Patches for NPEs with Incorrectness Separation Logic

    Raghav Roy, Vorashil Farzaliyev, Earl T. Barr, Quang Loc Le (UCL)
    • CWE-476 trajectory: #21 to #13
    • Precise patches: 1. Local Safety 2. Skip + Replace schemas 3. Cost model (D, W) 4. Patch compaction
    • Evaluation: 28.7% more safe patches, 3.75× less patch bloat, 3× faster
    Contact: raghavroy145 at gmail dot com
  23. Incorrectness Separation Logic

    Standard Hoare triple (over-approximate): {P} C {Q}, “if P holds before C, then Q holds after.” Proves absence of bugs.
    ISL triple (under-approximate): [P] C [ϵ: Q], “every state in Q is reachable from some state in P.” Proves presence of bugs.
    Why this matters for repair:
    • If the analysis says there’s a bug, there is one. No false positives (under the model).
    • Manifest errors: P ≡ emp ∧ true. The bug is context-independent: it is guaranteed to occur regardless of how the function is called.
    • Latent errors: the bug depends on calling context. Propagated compositionally.
    NilGuard takes Pulse-X’s manifest and latent error reports as input.
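
In the standard formulation of incorrectness logic, the two kinds of triple differ only in the direction of one inclusion. The notation post(C)(P), for the set of states reachable by running C from some state in P, is assumed here and does not appear on the slides:

```latex
\[
\{P\}\ C\ \{Q\} \iff \mathrm{post}(C)(P) \subseteq Q
\qquad\qquad
[P]\ C\ [\epsilon: Q] \iff \mathrm{post}_{\epsilon}(C)(P) \supseteq Q
\]
```

The Hoare inclusion lets Q contain unreachable states, which is where false positives come from; the incorrectness inclusion forbids them, so every error state in Q is a real one.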
  24. Replace Schema: Rewiring the Data Flow

    Before (buggy):

        int *p = NULL;
        int **pp = &p;
        int ***ppp = &pp;
        ***ppp = 5;  // NPE

    After (Assign-Fresh):

        int *p = NULL;
        int **pp = &p;
        int ***ppp = &pp;
        if (*pp == NULL) {
            int tmp = 0;
            *pp = &tmp;
        }
        ***ppp = 5;

    What happened: af(C, **ppp) binds the null pointer to a fresh, safe memory cell.
    Key: NilGuard extracted the alias chain {p, *pp, **ppp} from the ISL postcondition and traced the null back to its source.
    OX-based tools could not resolve this example: multi-level aliasing requires precise alias chain extraction from the postcondition.
    Cost: (1, 2), i.e. one guard and two writes (declaration + assignment).
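
A runnable sketch of the same idea, under stated assumptions: the function name `store_through_chain` is hypothetical, and the fresh cell is supplied by the caller so that it is still alive when the final store happens. That lifetime choice is an assumption of this sketch and differs from the slide, where the cell is declared inside the guard.

```c
#include <assert.h>
#include <stddef.h>

/* Stores `value` through a three-level pointer chain. If the innermost
 * pointer is null, it is first rebound to `fresh` (Replace-style repair).
 * Returns 1 if the pointer was rebound, 0 otherwise. */
int store_through_chain(int ***ppp, int *fresh, int value)
{
    assert(ppp != NULL && *ppp != NULL && fresh != NULL);
    int rebound = 0;
    if (**ppp == NULL) {   /* guard on the source of the null */
        **ppp = fresh;     /* bind the null pointer to a fresh cell */
        rebound = 1;
    }
    ***ppp = value;        /* the original store, now safe */
    return rebound;
}
```

When the innermost pointer is null the store lands in the fresh cell; when it already points somewhere, the guard is not taken and the original target is written as before.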