

Context Management for Everyday Software Engineering

Context management — curating what an AI agent sees, when, in what form, and what it doesn't — is becoming a first-class software engineering skill. This talk covers the foundations (tokens, effective context, the lost-in-the-middle effect), how framing positions the model in latent space before it answers, when prose loses fidelity to ASTs, hierarchies, and graphs, eight techniques for controlling a session thread (fork, compact, scratchpads, decision logs, sub-agents, Yegge's beads), and the failure modes that break sessions — distractors, stale recall, hallucination lock-in, correction-induced confusion. Five takeaways: curate don't dump, frame deliberately, match format to problem shape, externalize state, and expertise amplifies.

Presented at GIDS 2026 on April 22.


Ragunath Jawahar

April 23, 2026


Transcript

  1. CTX-MGMT // GIDS 2026 01 / 71 Ragunath Jawahar · legacycode.com Context Management for everyday software engineering Ragunath Jawahar • Founder, Legacy Code Labs • https://legacycode.com April 22, 2026 · GIDS 2026 · Bengaluru
  2. “Good morning, Lucy.”
  3. R O A D M A P Agenda 01 Foundations tokens, context windows, what fills them up 02 Latent Space Activation how framing positions the model 03 Input Modality voice, show-and-tell 04 Where Prose Loses Fidelity structured formats for structured problems 05 Thread Control eight ways to continue a session 06 Failure Modes what breaks, and how to recover 07 Agent-side Management what the tools do for you 08 Bonus + Key Takeaways
  4. O P E N I N G Context Management as a Software Engineering Skill promoting context curation from folk practice to named principle
  5. D E F I N I T I O N What is context management? Context management is the process of curating what information an AI agent sees, when it sees it, in what form, and what it doesn’t see.
  6. P R O M O T I O N Why this is a first-class SWE skill now 01 We manage agents, not just write code. 02 Model capability is rarely the bottleneck. Failures are increasingly about inputs, not intelligence. 03 ~17% of PRs contain high-severity bugs (Qodo benchmark) — many trace back to missing context. 04 Token budgets. 05 Seniority is shifting toward those who frame problems well and keep sessions focused.
  7. S E C T I O N 0 1 / 0 8 Foundations tokens, context, and memory
  8. F O U N D A T I O N S What’s a token? Words are to human language what tokens are to language models — the unit in which meaning is carried. A token is a subword unit from the model’s learned vocabulary — not a character, not a word. Heuristics (English): ~4 characters per token, ~0.75 words per token. Code and non-English text run higher. Every limit, every bill, every latency number is denominated in tokens.
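The heuristics above can be sketched as a rough estimator (an approximation only; the exact count comes from the model's tokenizer, e.g. tiktoken for OpenAI models):

```python
# Rough token estimator using the slide's English heuristics:
# ~4 characters per token and ~0.75 words per token.
# These are approximations only; real counts come from the tokenizer.

def estimate_tokens(text: str) -> int:
    """Average the two heuristics for a rough token estimate."""
    by_chars = len(text) / 4              # ~4 chars per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Hello, world!"))  # close to, but not exactly, the real count of 4
```

Useful for budgeting before a request; the divergence on short strings is exactly why the slides call these heuristics.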
  9. F O U N D A T I O N S / 0 1 Tokens vs. characters: English Hello, world! 13 c h a r a c t e r s 4 t o k e n s Right on the ~4-chars-per-token heuristic. Canonical English baseline.
  10. F O U N D A T I O N S / 0 2 Tokens vs. characters: Hindi हैलो वर्ल्ड! 12 c h a r a c t e r s 7 t o k e n s Fewer characters than English, but ~2× the tokens. Devanagari falls back to byte-level. Non-English users pay more — invisibly.
  11. F O U N D A T I O N S Models are stateless Zero memory between calls. The agent re-injects the whole history on every call. Most of that window gets assembled by the agent — system prompt, tool calls, tool outputs, prior turns. Your lever is narrower: it’s what you say. Everything else is downstream of that.
  12. F O U N D A T I O N S What is a context window? The set of tokens the model sees in a single inference call. Input + output combined.
  13. F O U N D A T I O N S An inference call R E Q U E S T { "messages": [ { "role": "system", "content": "You are a helpful geography tutor. Answer in as few words as possible." }, { "role": "user", "content": "Capital of India?" } ] } R E S P O N S E "New Delhi." System + user + reply all live in the same envelope. Output counts against the same budget.
  14. F O U N D A T I O N S Next call, the agent rebuilds it R E Q U E S T { "messages": [ { "role":"system", "content":"You are a helpful geography tutor..." }, // re-sent by agent { "role":"user", "content":"Capital of India?" }, // re-sent by agent { "role":"assistant", "content":"New Delhi." }, // re-sent by agent { "role":"user", "content":"And Brazil?" } // new ] } R E S P O N S E "Brasília."
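The rebuild step can be sketched in a few lines; `call_model` below is a hypothetical stand-in for a real chat-completions API:

```python
# Minimal sketch of why "models are stateless": the client (agent)
# keeps the history and re-sends all of it on every call.
# `call_model` is a hypothetical stub standing in for a real API.

def call_model(messages: list[dict]) -> str:
    """A real API receives the full message list on every call."""
    last = messages[-1]["content"]
    return f"(answer to: {last})"

history = [{"role": "system", "content": "You are a helpful geography tutor."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    reply = call_model(history)          # the ENTIRE history goes over the wire
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Capital of India?")
ask("And Brazil?")
print(len(history))  # 5 messages: system + 2 user turns + 2 assistant turns
```

Every turn grows `history`, and every call pays for all of it again; that growth is the whole subject of the slides that follow.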
  15. F O U N D A T I O N S Already in the window before you type Every session opens with tens of thousands of tokens already spent. You haven’t said a word yet. F R O M T H E A G E N T System prompt thousands of tokens of agent instructions Built-in tool definitions JSON schemas for every tool the agent can call Environment metadata cwd, OS, model, time, permissions F R O M Y O U R S E T U P AGENTS.md / CLAUDE.md / .cursorrules project instructions MCP servers each contributes tools, resources, prompts Installed skills names and descriptions in the registry Saved memory / preferences
  16. F O U N D A T I O N S / 1 o f 3 What fills up: from you, live This is where the window actually starts to explode. Your messages every turn Files you attach or drag in Tight prompts → tight sessions.
  17. F O U N D A T I O N S / 2 o f 3 From the agent’s decisions Typically dominates — and where most curation acts. Tools it calls and their results Files it reads via Read, Grep, Glob Shell commands and their output RAG chunks it retrieves Skills it invokes the full SKILL.md body, not just the description Sub-agents their output comes back into context
  18. F O U N D A T I O N S / 3 o f 3 From the model’s output Every assistant turn Visible. Predictable. Counted. Thinking tokens if enabled Silent. 10–50K+ per turn. Off-screen, on-bill.
  19. F O U N D A T I O N S A turn’s anatomy Composition of one typical coding-agent turn (~156K tokens) System prompt ~3K 2% Tool definitions ~8K 5% Prior conversation ~25K 16% File reads + tool outputs ~120K 77% Current user message ~0.5K 0.3% Tool outputs and file reads dominate. Your actual question is the smallest thing in the window.
  20. F O U N D A T I O N S Same turn, different windows How that same ~156K payload fits Claude Haiku 4.5 200K window ~78% full ~44K headroom GPT-5.3 Codex 400K window ~39% full ~244K headroom Claude Sonnet 4.6 / Opus 4.7 1000K window ~16% full ~844K headroom Same work, different budgets. Model choice is a context-management lever.
  21. F O U N D A T I O N S Tokens aren’t free Model Input ($/M) Output ($/M) Ratio Claude Opus 4.7 $5 $25 5× Claude Sonnet 4.6 $3 $15 5× OpenAI GPT-5.3 Codex $1.75 $14 8× 5–8× o u t p u t : i n p u t typical asymmetry Latency: bigger context → slower first token, slower generation. Prompt caching: changes the economics for repeated prefixes — pay full price once, cache the rest. Pricing per million tokens · April 2026
  22. F O U N D A T I O N S Advertised ≠ effective Effective context = the length at which the model still reasons reliably. 60–70% of advertised window — practitioner heuristic for where reasoning still holds. Needle-in-a-haystack ≠ reasoning-over-haystack. Retrieval is easy. Synthesis degrades.
  23. F O U N D A T I O N S Attention favors the edges Lost-in-the-middle (Liu et al., 2023) arxiv.org/abs/2307.03172 Info at the start and end of the window is used more reliably than the middle. This is why effective context sits below advertised — the middle is the first casualty.
  24. F O U N D A T I O N S The edge is your tool The U-curve has a constructive flip side. The start is the preserved edge. Plant what matters there. system prompt AGENTS.md role framings
  25. E X C E P T I O N S When it works Three patterns break the rule — filling the window is fine, sometimes helpful. 01 Batch application one task, many targets 02 Iterative refinement many tries, one target 03 Progressive expansion many steps, one target
  26. E X C E P T I O N S / 1 o f 3 Batch application: one task, many targets W H A T T H E P A T T E R N I S The procedure is fixed; the target varies each turn. Prior turns establish the template implicitly — once the agent has produced the first few, subsequent ones come out consistent in style, verbosity, and conventions. W H E N I T W O R K S — Adding OpenAPI specifications to every function in a controller — Logging exceptions for every catch block in a package — Generating CRUD scaffolding for a list of models
  27. E X C E P T I O N S / 2 o f 3 Iterative refinement: many tries, one target W H A T T H E P A T T E R N I S The target is fixed; each turn is a new attempt that builds on what prior attempts revealed. Prior turns record the search trajectory — what was tried, what failed, which direction to lean. W H E N I T W O R K S — Refining a SQL query until it returns the right output — Working on an email draft — Troubleshooting a failing test
  28. E X C E P T I O N S / 3 o f 3 Progressive expansion: many steps, one target W H A T T H E P A T T E R N I S The target grows incrementally; each turn adds to the running state of the artifact. Attention handles the start and the end reliably. The middle rots, and you don’t care — you’re building forward, not looking back. W H E N I T W O R K S — Building up a feature across successive turns (auth → roles → permissions) — Growing a test suite one test at a time — Writing a document section by section
  29. S E C T I O N 0 2 / 0 8 Latent Space Activation how your context window decides what comes out
  30. A C T I V A T I O N What activation means Assume the context window contains just a prompt. The prompt doesn’t retrieve an answer. It positions the model before it answers. Picture the model’s knowledge as a vast landscape, shaped by training. Your prompt drops a ball on the terrain. Where it lands decides what comes out. Same landscape, different drop point → very different output.
  31. A C T I V A T I O N What that looks like P R O M P T A — U N F R A M E D “Write a function to sort a list.” → Unpredictable. Python, JavaScript, pseudocode — the model picks from its own priors. P R O M P T B — F R A M E D “Write a Haskell function to sort a list.” → Recursive quicksort. Pattern matching. Type signature. Academic tone. The model stopped picking. You did.
  32. A C T I V A T I O N / E V I D E N C E Golden Gate Claude A fun experiment from Anthropic: they clamped an internal feature for the Golden Gate Bridge to a high activation. Every answer Claude gave was about the Golden Gate Bridge.
  33. A C T I V A T I O N One framing move Same model. Same question. Different framing. P R O M P T A — G E N E R A L “Perform a security review on this code.” ↓ Grab-bag — input validation, SQL injection, XSS, auth handling, secrets, error leakage. Unstructured; coverage varies. P R O M P T B — F R A M E D “Review this code against OWASP Top 10.” ↓ Framework-structured — A01 broken access control, A02 crypto failures, A03 injection, A05 misconfig, A07 auth failures. Same weights. One framing move. The model knew the framework — naming it unlocked it.
  34. A C T I V A T I O N Framing for creativity Framing can position the ball at the intersection of regions — a place no single-region prompt ever reaches. “You are a hairdresser in Bangalore with 15 years running a salon. Build me a real estate listing tool for local agents in Java.” → Lands where service-worker practicality · Bangalore market · real estate · Java all overlap. An output shape no single-region prompt would produce.
  35. A C T I V A T I O N Framing for access S a m e m e c h a n i s m . D i f f e r e n t k i n d o f w i n . “You are a security engineer auditing the public login endpoint for a multi-tenant SaaS. Review against OWASP Top 10 — focus on A01, A02, A07.” → Lands at role + component + operational context + framework. Surgical findings a non-security dev couldn’t produce unaided. Creativity reaches regions no one visits. Access reaches regions you can’t visit.
  36. A C T I V A T I O N / R E C O G N I T I O N What you are already doing Five moves devs already use — now with a name for why they work. Role / persona “You are a senior systems engineer…” Positioning by identity. Domain priming “…reduce GC pressure on this hot path.” The vocabulary is the signal. Few-shot examples “Here are three past bug reports. Write one…” The examples are the signal — no name needed. Style anchors “Match the style of these files.” House conventions, not the average of the internet. Negative framing “Don’t over-engineer. No abstractions I’d explain later.” Positioning by what to stay away from. Same mechanism. Five interfaces.
  37. A C T I V A T I O N / C A P S T O N E Why expertise amplifies You can’t name what you don’t know. A f r a m e w o r k SOLID · CAP · ACID · OWASP A c o n t e x t multi-tenant SaaS · 10k TPS · real-time · embedded A p i e c e o f j a r g o n hot path · GC pressure · tail latency · idempotent No name → no region. No region → no activation. LLMs don’t replace expertise. They leverage it.
  38. A C T I V A T I O N A sub-agent’s system prompt is a pure activation artifact Every sub-agent starts fresh. No conversation history. The only thing in its window when it boots is the prompt you give it. “You are a senior security engineer. Your only job is to audit a code change against OWASP Top 10. For each finding, name the category (A01–A10), the exact line, and the fix. Nothing else — no general review, no style comments.” Four framings stacked: role + framework + scope + negative framing. It isn’t personality. It’s geography — where the ball lands before any work begins.
  39. S E C T I O N 0 3 / 0 8 Input Modality beyond typing
  40. M O D A L I T Y / V O I C E Voice beats typing Your input is the ceiling. Voice raises it two ways. B A N D W I D T H ~150 wpm speaking vs ~40 wpm typing — same 60s, ~4× the content C O M P L E T E N E S S more context you’ll say constraints, edge cases, prior context you’d skip if typing wisprflow.ai Same model. More context. Better output.
  41. M O D A L I T Y / V I S I O N Show and tell Foundation models are natively multi-modal. Screenshots, diagrams, and UI captures go in as first-class input. B A N D W I D T H One image, paragraphs of context Layout, spacing, colors, surrounding context, real data — none of which you’d transcribe. F I D E L I T Y Prose is the lossy channel For UI bugs, design comps, charts, whiteboard sketches, error dialogs, the image isn’t. “The card is misaligned” is five words. The screenshot is the misalignment.
  42. S E C T I O N 0 4 / 0 8 Where prose loses fidelity structured formats for structured problems
  43. F I D E L I T Y / 1 o f 4 AST for code Building or debugging a parser, codemod, syntactic refactor. The LLM keeps editing the wrong node or missing cases. Instead of describing the tree, show it. Program ├── FunctionDeclaration name="handleClick" │ └── BlockStatement │ ├── VariableDeclaration │ └── CallExpression callee="setState" └── ExportDefaultDeclaration The model no longer has to parse your description of the tree. The tree is already there.
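For Python code, the tree can be produced mechanically with the stdlib `ast` module rather than typed by hand (the `handle_click` source below is illustrative):

```python
# Generate the AST from real source instead of describing it in prose.
# `ast.dump(..., indent=...)` yields a structured dump ready to paste
# into context (requires Python 3.9+ for the indent parameter).
import ast

source = """
def handle_click():
    count = 1
    set_state(count)
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # the tree itself, not a description of it
```

The dump names every node and field, so "the call inside the function body" becomes an exact node the model can point at.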
  44. F I D E L I T Y / 2 o f 4 Hierarchy for UI “The second card inside the sidebar, above the footer” — the model edits the wrong node. Hand over the hierarchy; pseudo-structured indentation is enough. Layout: Sidebar: CardList: - Card (id=promo) - Card (id=stats) ← this one - Card (id=links) Main: Header Content Now “this card” points at something concrete.
  45. F I D E L I T Y / 3 o f 4 DOT for graphs A cyclic dependency the linter flagged. You describe it in English and the model keeps suggesting the wrong edge to cut. Hand over the graph. digraph { Auth -> User User -> Billing Billing -> Auth // cycle Session -> User Session -> Auth } The model sees the shape — cycles, hubs, orphans — without reconstructing it from prose.
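The same graph can also be handed over as an adjacency map, where a short DFS makes the cycle explicit (a minimal sketch; node names match the DOT snippet):

```python
# The dependency graph from the DOT snippet, as an adjacency map.
# A short DFS finds the cycle mechanically -- the structure the model
# would otherwise have to reconstruct from prose.

GRAPH = {
    "Auth": ["User"],
    "User": ["Billing"],
    "Billing": ["Auth"],        # closes the cycle
    "Session": ["User", "Auth"],
}

def find_cycle(graph: dict) -> list:
    """Return one cycle as a list of nodes, or [] if the graph is acyclic."""
    def dfs(node, path):
        if node in path:                      # revisited a node on this path
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            found = dfs(nxt, path + [node])
            if found:
                return found
        return []
    for start in graph:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return []

print(find_cycle(GRAPH))  # → ['Auth', 'User', 'Billing', 'Auth']
```

Either form works in context; the point is that the edge list is data, not narration.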
  46. F I D E L I T Y / 4 o f 4 Projections Take one precise form. Project it into another. The agent handles the translation. W O R K E D E X A M P L E — T E S T M A T R I X → T E S T C O D E qty price discount expected notes 0 10 0 0 empty cart 1 10 0 10 baseline 2 10 0.1 18 percent discount 1 -10 0 error negative price 1e6 0.01 0 9999.99 float precision T H E S A M E M O V E OpenAPI spec → client SDK State diagram → state machine code ERD → ORM models Type definitions → runtime validators Prose is the default, not the requirement. Match the format to the shape of the problem.
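A sketch of the projection for the first few matrix rows, with a hypothetical `cart_total` as the code under test (the float-precision row is omitted here, since its expected value depends on rounding policy):

```python
# Matrix rows projected into executable checks. `cart_total` is a
# hypothetical implementation, written inline only so the sketch
# is self-contained.

def cart_total(qty: float, price: float, discount: float) -> float:
    if price < 0:
        raise ValueError("negative price")
    return qty * price * (1 - discount)

# (qty, price, discount, expected) -- straight from the matrix
CASES = [
    (0, 10, 0,   0),    # empty cart
    (1, 10, 0,   10),   # baseline
    (2, 10, 0.1, 18),   # percent discount
]

for qty, price, discount, expected in CASES:
    assert cart_total(qty, price, discount) == expected

# the "negative price -> error" row
try:
    cart_total(1, -10, 0)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

The matrix stays the source of truth; the agent's job is only the mechanical projection into whichever test framework the project uses.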
  47. S E C T I O N 0 5 / 0 8 Thread Control eight ways to continue a session
  48. T H R E A D C O N T R O L / 1 o f 8 Summarize and restart When the session is getting noisy, step out clean. Ask the model for a structured handoff: “Write a summary of what we’ve done, what we decided, where we’re stuck, and the next three steps. Treat the reader as a fresh instance of yourself.” Paste the output into a new chat. Fresh window, sharp framing, no accumulated noise. T R A D E Lossy — details fall out. W I N Attention is sharp again, token budget resets, activation is clean.
  49. T H R E A D C O N T R O L / 2 o f 8 User-initiated compaction You run a command. The agent compresses the conversation in place. Same session, lighter window. Claude Code: /compact. Cursor: “Summarize thread.” The agent reads the existing conversation and replaces it with a condensed summary you stay inside. Use when the thread is on-track and you just need more room. T R A D E Lossy — and the agent picks what to keep. W I N Zero friction, you keep working.
  50. T H R E A D C O N T R O L / 3 o f 8 Fork and continue When the agent goes off track, don’t correct — rewind. Edit the prompt that produced the bad response. The incorrect response never enters the window; the model sees only your improved prompt. Two flavors: tweak (add a constraint, sharpen the ask) or redirect (throw it out, try a different angle). Correction is context pollution. Rewind instead. T R A D E You lose any work after the fork point. W I N No correction turn, no prior wrong answer for the model to work around.
  51. T H R E A D C O N T R O L / 4 o f 8 Delegate to a sub-agent The next chunk of work doesn’t have to happen in this window. Hand it off to a sub-agent with its own fresh context. The main thread sees only the sub-agent’s return value — typically a few hundred tokens instead of thousands. Pure activation artifact — the sub-agent’s whole behavior is its system prompt. T R A D E The sub-agent can’t see your full conversation. W I N Main window stays clean, no cost to the work it’s doing.
  52. T H R E A D C O N T R O L / 5 o f 8 Append to a scratchpad Offload state to a file. Let the session read from disk, not from memory. PROGRESS.md, plan.md, todo.md — anything the agent can read between turns and write to as it works. As context fills, the scratchpad holds the truth. The window holds only what’s needed now. T R A D E Mild discipline — someone (you or the agent) has to keep it current. W I N Survives compaction, survives restart, survives you closing the laptop.
  53. T H R E A D C O N T R O L / 6 o f 8 Log decisions Scratchpads track where you are. Decision logs track why you’re there. DECISIONS.md — append-only record of choices made and the reasoning behind them. “Chose X over Y because Z.” “Skipped approach A — brittle under constraint B.” A new session that reads the log stops re-litigating settled questions. T R A D E Mild discipline — write the why, not just the what. W I N The reasoning survives the compression.
  54. T H R E A D C O N T R O L / 7 o f 8 Read the last N commits Your git history is context. Point the agent at it. “Look at the last 10 commits on this branch before you start — what’s been done, what’s incomplete, and what I was working on last.” The agent reads real diffs, commit messages, and file changes. No handoff doc. No scratchpad maintenance. T R A D E Commit discipline matters — thin or junk commits give thin context. W I N Free externalization — the artifact exists as a byproduct of normal work.
  55. T H R E A D C O N T R O L / 8 o f 8 Yegge’s beads Turn a monolithic requirements doc into a graph of small, dependency-aware issues — stored in the repo, under version control. Break a large PRD once. Add subtasks. Add dependencies. Commit. Each issue seeds its own thread. A session reads one issue, not the whole PRD. Want parallelism? Kick off N agents, one per issue. Credit: Steve Yegge · github.com/gastownhall/beads T R A D E Upfront decomposition — you have to actually break the PRD down. W I N Externalized, queryable context. The agent reads a small unit, not two large docs.
  56. S E C T I O N 0 6 / 0 8 Failure Modes what breaks, and how to recover
  57. F A I L U R E / 1 o f 6 Distractors S C E N A R I O You paste three files so the agent “has everything” — two are actually unrelated. Quality drops on the one that matters. M E C H A N I S M Even within a tight window, irrelevant-but-plausible sentences measurably drop accuracy. The model treats nearby text as context to condition on — relevant or not. (Shi et al. 2023, ICML.) R E C O V E R Y Curate, don’t dump. One file, one function, one error — only what the turn actually needs. When in doubt, trim.
  58. F A I L U R E / 2 o f 6 Stale API hallucination S C E N A R I O You ask for a TanStack Query v5 example; you get v4 syntax. Or a Next.js App Router question, answered in Pages Router. The code compiles. The API doesn’t exist. M E C H A N I S M The model was trained before the API changed. Confidently invents the old signature, removed flag, deprecated hook. A specific slice of the general hallucination problem (Ji et al. 2023) where the cause is training cutoff, not reasoning error. R E C O V E R Y Point the agent at the current source of truth. MCP server connected to vendor docs is the cleanest fix. Lighter alternatives: paste version, read package.json, fetch the relevant doc.
  59. F A I L U R E / 3 o f 6 Stale recall S C E N A R I O You deleted the function. The agent, having worked with the file all session, writes it back. M E C H A N I S M Induction heads (Olsson et al. 2022) — the circuit that lets the model repeat patterns from context is also the circuit that resurfaces content you wanted gone. The older the state in the window, the more confidently the agent treats it as current. R E C O V E R Y Clear and restart. A fresh conversation with the current file state is faster than fighting ghost content.
  60. F A I L U R E / 4 o f 6 Hallucination lock-in S C E N A R I O Wrong answer in turn 8. Corrected in turn 10. From turn 11 onward, the model quietly kept building on the original wrong answer. M E C H A N I S M The wrong answer is now its answer, sitting in the window. The model tends to extend or defend its own prior output rather than contradict it. R E C O V E R Y Fork, don’t correct (Slide 50). Edit the original prompt before it produced the wrong answer. The wrong answer never enters the window; nothing to defend.
  61. F A I L U R E / 5 o f 6 Correction-induced confusion S C E N A R I O Turn 8: wrong. Turn 9: correction. Turn 14: a follow-up — and the model produces a third answer that blends both, or flips between them unpredictably. M E C H A N I S M Distinct from lock-in. Here, the wrong answer and the correction are both in the window, and the model oscillates between them — looks like reasoning, is actually unresolved context. R E C O V E R Y Fork-and-continue (Slide 50) if caught in flight. Otherwise summarize-and-restart (Slide 48) — a fresh session is cleaner than patching.
  62. F A I L U R E / 6 o f 6 Format non-compliance S C E N A R I O Ask for camelCase; get mixed-case on turn 30. Ask for 80-column wrap; get 120 next generation. Ask for strict JSON; get prose with a JSON-ish middle. M E C H A N I S M LLMs consume structured input well. They don’t reliably produce it. This isn’t really an LLM problem — it’s an output-validation problem. R E C O V E R Y Run a formatter (Prettier, Black, gofmt, ruff) and a linter post-generation. For must-be-valid output, use constrained decoding (Outlines) or native structured-output / function-calling APIs.
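A minimal stdlib sketch of the lighter end of that recovery, assuming the common failure shape of prose wrapping a single JSON object (`extract_json` is illustrative, not a library API):

```python
# Post-generation validation sketch: when the model wraps JSON in prose,
# extract the JSON-ish middle and parse it strictly; reject on failure
# instead of trusting the raw text.
import json

def extract_json(raw: str) -> dict:
    """Find the outermost {...} span and parse it; raise if invalid."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    return json.loads(raw[start : end + 1])  # strict parse, fails loudly

raw_output = (
    'Sure! Here is the config you asked for: '
    '{"retries": 3, "verbose": false} Hope that helps.'
)
print(extract_json(raw_output))  # → {'retries': 3, 'verbose': False}
```

For anything beyond this shape (nested prose, multiple objects, schemas), constrained decoding or native structured-output APIs are the sturdier fix, as the slide says.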
  63. S E C T I O N 0 7 / 0 8 Agent-side context management what the tools do for you
  64. A G E N T - S I D E / 1 o f 3 Auto-compaction The agent compresses its own history mid-session — no command from you. Triggered as the window fills: long tool outputs, multi-file refactors, deep debug runs. Older turns get replaced with a summary the agent writes itself. What stays is its call, not yours. W A T C H F O R The thread feeling “lighter” — the agent re-asks something you already answered, or slips on a convention you set early.
  65. A G E N T - S I D E / 2 o f 3 Sub-agents The agent spawns a fresh-context worker for a subtask. Main thread sees only the return value — hundreds of tokens instead of thousands. Typical sweet spots: searching across files, verification passes, side questions. Claude Code’s Task tool, Cursor’s composer, Aider’s plan → execute split — same move. W A T C H F O R A sub-agent is only as good as its delegation prompt. Weak prompt, weak return.
  66. A G E N T - S I D E / 3 o f 3 Parallel agents Many agents at once, each with its own context. Distinct from sub-agents: delegation is serial; parallelism is simultaneous. In practice: Claude Code’s parallel Task tool, git worktree parallelism, beads-driven work with one issue per agent. The context win is shape, not size — N narrow contexts beat one wide one, because attention stays sharp. W A T C H F O R Coordination cost. Shared files or cross-dependencies eat the parallelism gain.
  67. S E C T I O N 0 8 / 0 8 Bonus two extras worth carrying home
  68. B O N U S / 1 o f 2 Triangulation When the codebase overflows any single window, ask from two angles — same agent, different entry points, or two different agents — and see where they agree. Agreement is a lead. Disagreement flags ambiguity. The real answer underneath: know your codebase well. Triangulation is a crutch for when you don’t.
  69. B O N U S / 2 o f 2 Caveman github.com/juliusbrussee/caveman Token compression for the inputs you send to the agent — rewrites prose into a terse “caveman” style that reduces token count while preserving technical substance. For when the task is clear and the bottleneck is cost, not correctness. W A T C H F O R Compress the narration, not the artifacts. Don’t shrink specs, API contracts, or exact error messages.
  70. C L O S E Key takeaways 01 Curate, don’t dump. Tool outputs and file reads dominate the window — your prompt is the smallest thing in it. Less context, better curated. 02 Frame deliberately. You’re positioning the model, not informing it. Name the framework, role, or domain — that’s where activation comes from. 03 Match format to problem shape. Prose is the default, not the requirement. Hand over the AST, hierarchy, graph, or matrix. 04 Externalize state. Scratchpads, decision logs, and git history survive compaction and restart. The window isn’t the only place state can live. 05 Expertise amplifies. You can’t name what you don’t know. Framing is the lever; vocabulary is how you see the levers.
  71. Thank you. Ragunath Jawahar · Founder, Legacy Code Labs [email protected] Q & A