

Context Management for Everyday Software Engineering

Context management — curating what an AI agent sees, when, in what form, and what it doesn't — is becoming a first-class software engineering skill. This talk covers the foundations (tokens, effective context, the lost-in-the-middle effect), how framing positions the model in latent space before it answers, when prose loses fidelity to ASTs, hierarchies, and graphs, eight techniques for controlling a session thread (fork, compact, scratchpads, decision logs, sub-agents, Yegge's beads), and the failure modes that break sessions — distractors, stale recall, hallucination lock-in, correction-induced confusion. Five takeaways: curate don't dump, frame deliberately, match format to problem shape, externalize state, and expertise amplifies.

Presented at GIDS 2026 on April 22.


Ragunath Jawahar

April 23, 2026


Transcript

  1. CTX-MGMT // GIDS 2026 01 / 71 Ragunath Jawahar · legacycode.com Context Management for everyday software engineering Ragunath Jawahar • Founder, Legacy Code Labs • https://legacycode.com April 22, 2026 · GIDS 2026 · Bengaluru
  2. “Good morning, Lucy.”
  3. R O A D M A P Agenda 01 Foundations tokens, context windows, what fills them up 02 Latent Space Activation how framing positions the model 03 Input Modality voice, show-and-tell 04 Where Prose Loses Fidelity structured formats for structured problems 05 Thread Control eight ways to continue a session 06 Failure Modes what breaks, and how to recover 07 Agent-side Management what the tools do for you 08 Bonus + Key Takeaways
  4. O P E N I N G Context Management as a Software Engineering Skill promoting context curation from folk practice to named principle
  5. D E F I N I T I O N What is context management? Context management is the process of curating what information an AI agent sees, when it sees it, in what form, and what it doesn’t see.
  6. P R O M O T I O N Why this is a first-class SWE skill now 01 We manage agents, not just write code. 02 Model capability is rarely the bottleneck. Failures are increasingly about inputs, not intelligence. 03 ~17% of PRs contain high-severity bugs (Qodo benchmark) — many trace back to missing context. 04 Token budgets. 05 Seniority is shifting toward those who frame problems well and keep sessions focused.
  7. S E C T I O N 0 1 / 0 8 Foundations tokens, context, and memory
  8. F O U N D A T I O N S What’s a token? Words are to human language what tokens are to language models — the unit in which meaning is carried. A token is a subword unit from the model’s learned vocabulary — not a character, not a word. Heuristics (English): ~4 characters per token, ~0.75 words per token. Code and non-English text run higher. Every limit, every bill, every latency number is denominated in tokens.
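The heuristics above can be sketched as a rough estimator (an approximation only; the exact count comes from the model's tokenizer, e.g. tiktoken for OpenAI models):

```python
# Rough token estimator using the slide's English heuristics:
# ~4 characters per token and ~0.75 words per token.
# These are approximations only; real counts come from the tokenizer.

def estimate_tokens(text: str) -> int:
    """Average the two heuristics for a rough token estimate."""
    by_chars = len(text) / 4              # ~4 chars per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Hello, world!"))  # close to, but not exactly, the real count of 4
```

Useful for budgeting before a request; the divergence on short strings is exactly why the slides call these heuristics.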
  9. F O U N D A T I O N S / 0 1 Tokens vs. characters: English Hello, world! 13 c h a r a c t e r s 4 t o k e n s Right on the ~4-chars-per-token heuristic. Canonical English baseline.
  10. F O U N D A T I O N S / 0 2 Tokens vs. characters: Hindi हैलो वर्ल्ड! 12 c h a r a c t e r s 7 t o k e n s Fewer characters than English, but ~2× the tokens. Devanagari falls back to byte-level. Non-English users pay more — invisibly.
  11. F O U N D A T I O N S Models are stateless Zero memory between calls. The agent re-injects the whole history on every call. Most of that window gets assembled by the agent — system prompt, tool calls, tool outputs, prior turns. Your lever is narrower: it’s what you say. Everything else is downstream of that.
  12. F O U N D A T I O N S What is a context window? The set of tokens the model sees in a single inference call. Input + output combined.
  13. F O U N D A T I O N S An inference call R E Q U E S T { "messages": [ { "role": "system", "content": "You are a helpful geography tutor. Answer in as few words as possible." }, { "role": "user", "content": "Capital of India?" } ] } R E S P O N S E "New Delhi." System + user + reply all live in the same envelope. Output counts against the same budget.
  14. F O U N D A T I O N S Next call, the agent rebuilds it R E Q U E S T { "messages": [ { "role":"system", "content":"You are a helpful geography tutor..." }, // re-sent by agent { "role":"user", "content":"Capital of India?" }, // re-sent by agent { "role":"assistant", "content":"New Delhi." }, // re-sent by agent { "role":"user", "content":"And Brazil?" } // new ] } R E S P O N S E "Brasília."
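The rebuild step can be sketched in a few lines; `call_model` below is a hypothetical stand-in for a real chat-completions API:

```python
# Minimal sketch of why "models are stateless": the client (agent)
# keeps the history and re-sends all of it on every call.
# `call_model` is a hypothetical stub standing in for a real API.

def call_model(messages: list[dict]) -> str:
    """A real API receives the full message list on every call."""
    last = messages[-1]["content"]
    return f"(answer to: {last})"

history = [{"role": "system", "content": "You are a helpful geography tutor."}]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    reply = call_model(history)          # the ENTIRE history goes over the wire
    history.append({"role": "assistant", "content": reply})
    return reply

ask("Capital of India?")
ask("And Brazil?")
print(len(history))  # 5 messages: system + 2 user turns + 2 assistant turns
```

Every turn grows `history`, and every call pays for all of it again; that growth is the whole subject of the slides that follow.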
  15. F O U N D A T I O N S Already in the window before you type Every session opens with tens of thousands of tokens already spent. You haven’t said a word yet. F R O M T H E A G E N T System prompt thousands of tokens of agent instructions Built-in tool definitions JSON schemas for every tool the agent can call Environment metadata cwd, OS, model, time, permissions F R O M Y O U R S E T U P AGENTS.md / CLAUDE.md / .cursorrules project instructions MCP servers each contributes tools, resources, prompts Installed skills names and descriptions in the registry Saved memory / preferences
  16. F O U N D A T I O N S / 1 o f 3 What fills up: from you, live This is where the window actually starts to explode. Your messages every turn Files you attach or drag in Tight prompts → tight sessions.
  17. F O U N D A T I O N S / 2 o f 3 From the agent’s decisions Typically dominates — and where most curation acts. Tools it calls and their results Files it reads via Read, Grep, Glob Shell commands and their output RAG chunks it retrieves Skills it invokes the full SKILL.md body, not just the description Sub-agents their output comes back into context
  18. F O U N D A T I O N S / 3 o f 3 From the model’s output Every assistant turn Visible. Predictable. Counted. Thinking tokens if enabled Silent. 10–50K+ per turn. Off-screen, on-bill.
  19. F O U N D A T I O N S A turn’s anatomy Composition of one typical coding-agent turn (~156K tokens) System prompt ~3K 2% Tool definitions ~8K 5% Prior conversation ~25K 16% File reads + tool outputs ~120K 77% Current user message ~0.5K 0.3% Tool outputs and file reads dominate. Your actual question is the smallest thing in the window.
  20. F O U N D A T I O N S Same turn, different windows How that same ~156K payload fits Claude Haiku 4.5 200K window ~78% full ~44K headroom GPT-5.3 Codex 400K window ~39% full ~244K headroom Claude Sonnet 4.6 / Opus 4.7 1000K window ~16% full ~844K headroom Same work, different budgets. Model choice is a context-management lever.
  21. F O U N D A T I O N S Tokens aren’t free Model Input ($/M) Output ($/M) Ratio Claude Opus 4.7 $5 $25 5× Claude Sonnet 4.6 $3 $15 5× OpenAI GPT-5.3 Codex $1.75 $14 8× 5–8× o u t p u t : i n p u t typical asymmetry Latency: bigger context → slower first token, slower generation. Prompt caching: changes the economics for repeated prefixes — pay full price once, cache the rest. Pricing per million tokens · April 2026
  22. F O U N D A T I O N S Advertised ≠ effective Effective context = the length at which the model still reasons reliably. 60–70% of advertised window — practitioner heuristic for where reasoning still holds. Needle-in-a-haystack ≠ reasoning-over-haystack. Retrieval is easy. Synthesis degrades.
  23. F O U N D A T I O N S Attention favors the edges Lost-in-the-middle (Liu et al., 2023) arxiv.org/abs/2307.03172 Info at the start and end of the window is used more reliably than the middle. This is why effective context sits below advertised — the middle is the first casualty.
  24. F O U N D A T I O N S The edge is your tool The U-curve has a constructive flip side. The start is the preserved edge. Plant what matters there. system prompt AGENTS.md role framings
  25. E X C E P T I O N S When it works Three patterns break the rule — filling the window is fine, sometimes helpful. 01 Batch application one task, many targets 02 Iterative refinement many tries, one target 03 Progressive expansion many steps, one target
  26. E X C E P T I O N S / 1 o f 3 Batch application: one task, many targets W H A T T H E P A T T E R N I S The procedure is fixed; the target varies each turn. Prior turns establish the template implicitly — once the agent has produced the first few, subsequent ones come out consistent in style, verbosity, and conventions. W H E N I T W O R K S — Adding OpenAPI specifications to every function in a controller — Logging exceptions for every catch block in a package — Generating CRUD scaffolding for a list of models
  27. E X C E P T I O N S / 2 o f 3 Iterative refinement: many tries, one target W H A T T H E P A T T E R N I S The target is fixed; each turn is a new attempt that builds on what prior attempts revealed. Prior turns record the search trajectory — what was tried, what failed, which direction to lean. W H E N I T W O R K S — Refining a SQL query until it returns the right output — Working on an email draft — Troubleshooting a failing test
  28. E X C E P T I O N S / 3 o f 3 Progressive expansion: many steps, one target W H A T T H E P A T T E R N I S The target grows incrementally; each turn adds to the running state of the artifact. Attention handles the start and the end reliably. The middle rots, and you don’t care — you’re building forward, not looking back. W H E N I T W O R K S — Building up a feature across successive turns (auth → roles → permissions) — Growing a test suite one test at a time — Writing a document section by section
  29. S E C T I O N 0 2 / 0 8 Latent Space Activation how your context window decides what comes out
  30. A C T I V A T I O N What activation means Assume the context window contains just a prompt. The prompt doesn’t retrieve an answer. It positions the model before it answers. Picture the model’s knowledge as a vast landscape, shaped by training. Your prompt drops a ball on the terrain. Where it lands decides what comes out. Same landscape, different drop point → very different output.
  31. A C T I V A T I O N What that looks like P R O M P T A — U N F R A M E D “Write a function to sort a list.” → Unpredictable. Python, JavaScript, pseudocode — the model picks from its own priors. P R O M P T B — F R A M E D “Write a Haskell function to sort a list.” → Recursive quicksort. Pattern matching. Type signature. Academic tone. The model stopped picking. You did.
  32. A C T I V A T I O N / E V I D E N C E Golden Gate Claude A fun experiment from Anthropic: they clamped an internal feature for the Golden Gate Bridge to a high activation. Every answer Claude gave was about the Golden Gate Bridge.
  33. A C T I V A T I O N One framing move Same model. Same question. Different framing. P R O M P T A — G E N E R A L “Perform a security review on this code.” ↓ Grab-bag — input validation, SQL injection, XSS, auth handling, secrets, error leakage. Unstructured; coverage varies. P R O M P T B — F R A M E D “Review this code against OWASP Top 10.” ↓ Framework-structured — A01 broken access control, A02 crypto failures, A03 injection, A05 misconfig, A07 auth failures. Same weights. One framing move. The model knew the framework — naming it unlocked it.
  34. A C T I V A T I O N Framing for creativity Framing can position the ball at the intersection of regions — a place no single-region prompt ever reaches. “You are a hairdresser in Bangalore with 15 years running a salon. Build me a real estate listing tool for local agents in Java.” → Lands where service-worker practicality · Bangalore market · real estate · Java all overlap. An output shape no single-region prompt would produce.
  35. A C T I V A T I O N Framing for access S a m e m e c h a n i s m . D i f f e r e n t k i n d o f w i n . “You are a security engineer auditing the public login endpoint for a multi-tenant SaaS. Review against OWASP Top 10 — focus on A01, A02, A07.” → Lands at role + component + operational context + framework. Surgical findings a non-security dev couldn’t produce unaided. Creativity reaches regions no one visits. Access reaches regions you can’t visit.
  36. A C T I V A T I O N / R E C O G N I T I O N What you are already doing Five moves devs already use — now with a name for why they work. Role / persona “You are a senior systems engineer…” Positioning by identity. Domain priming “…reduce GC pressure on this hot path.” The vocabulary is the signal. Few-shot examples “Here are three past bug reports. Write one…” The examples are the signal — no name needed. Style anchors “Match the style of these files.” House conventions, not the average of the internet. Negative framing “Don’t over-engineer. No abstractions I’d explain later.” Positioning by what to stay away from. Same mechanism. Five interfaces.
  37. A C T I V A T I O N / C A P S T O N E Why expertise amplifies You can’t name what you don’t know. A f r a m e w o r k SOLID · CAP · ACID · OWASP A c o n t e x t multi-tenant SaaS · 10k TPS · real-time · embedded A p i e c e o f j a r g o n hot path · GC pressure · tail latency · idempotent No name → no region. No region → no activation. LLMs don’t replace expertise. They leverage it.
  38. A C T I V A T I O N A sub-agent’s system prompt is a pure activation artifact Every sub-agent starts fresh. No conversation history. The only thing in its window when it boots is the prompt you give it. “You are a senior security engineer. Your only job is to audit a code change against OWASP Top 10. For each finding, name the category (A01–A10), the exact line, and the fix. Nothing else — no general review, no style comments.” Four framings stacked: role + framework + scope + negative framing. It isn’t personality. It’s geography — where the ball lands before any work begins.
  39. S E C T I O N 0 3 / 0 8 Input Modality beyond typing
  40. M O D A L I T Y / V O I C E Voice beats typing Your input is the ceiling. Voice raises it two ways. B A N D W I D T H ~150 wpm speaking vs ~40 wpm typing — same 60s, ~4× the content C O M P L E T E N E S S more context you’ll say constraints, edge cases, prior context you’d skip if typing wisprflow.ai Same model. More context. Better output.
  41. M O D A L I T Y / V I S I O N Show and tell Foundation models are natively multi-modal. Screenshots, diagrams, and UI captures go in as first-class input. B A N D W I D T H One image, paragraphs of context Layout, spacing, colors, surrounding context, real data — none of which you’d transcribe. F I D E L I T Y Prose is the lossy channel For UI bugs, design comps, charts, whiteboard sketches, error dialogs, the image isn’t. “The card is misaligned” is five words. The screenshot is the misalignment.
  42. S E C T I O N 0 4 / 0 8 Where prose loses fidelity structured formats for structured problems
  43. F I D E L I T Y / 1 o f 4 AST for code Building or debugging a parser, codemod, syntactic refactor. The LLM keeps editing the wrong node or missing cases. Instead of describing the tree, show it. Program ├── FunctionDeclaration name="handleClick" │ └── BlockStatement │ ├── VariableDeclaration │ └── CallExpression callee="setState" └── ExportDefaultDeclaration The model no longer has to parse your description of the tree. The tree is already there.
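For Python code, the tree can be produced mechanically with the stdlib `ast` module rather than typed by hand (the `handle_click` source below is illustrative):

```python
# Generate the AST from real source instead of describing it in prose.
# `ast.dump(..., indent=...)` yields a structured dump ready to paste
# into context (requires Python 3.9+ for the indent parameter).
import ast

source = """
def handle_click():
    count = 1
    set_state(count)
"""

tree = ast.parse(source)
print(ast.dump(tree, indent=2))  # the tree itself, not a description of it
```

The dump names every node and field, so "the call inside the function body" becomes an exact node the model can point at.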
  44. F I D E L I T Y / 2 o f 4 Hierarchy for UI “The second card inside the sidebar, above the footer” — the model edits the wrong node. Hand over the hierarchy; pseudo-structured indentation is enough. Layout: Sidebar: CardList: - Card (id=promo) - Card (id=stats) ← this one - Card (id=links) Main: Header Content Now “this card” points at something concrete.
  45. F I D E L I T Y / 3 o f 4 DOT for graphs A cyclic dependency the linter flagged. You describe it in English and the model keeps suggesting the wrong edge to cut. Hand over the graph. digraph { Auth -> User User -> Billing Billing -> Auth // cycle Session -> User Session -> Auth } The model sees the shape — cycles, hubs, orphans — without reconstructing it from prose.
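The same graph can also be handed over as an adjacency map, where a short DFS makes the cycle explicit (a minimal sketch; node names match the DOT snippet):

```python
# The dependency graph from the DOT snippet, as an adjacency map.
# A short DFS finds the cycle mechanically -- the structure the model
# would otherwise have to reconstruct from prose.

GRAPH = {
    "Auth": ["User"],
    "User": ["Billing"],
    "Billing": ["Auth"],        # closes the cycle
    "Session": ["User", "Auth"],
}

def find_cycle(graph: dict) -> list:
    """Return one cycle as a list of nodes, or [] if the graph is acyclic."""
    def dfs(node, path):
        if node in path:                      # revisited a node on this path
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            found = dfs(nxt, path + [node])
            if found:
                return found
        return []
    for start in graph:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return []

print(find_cycle(GRAPH))  # → ['Auth', 'User', 'Billing', 'Auth']
```

Either form works in context; the point is that the edge list is data, not narration.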
  46. F I D E L I T Y / 4 o f 4 Projections Take one precise form. Project it into another. The agent handles the translation. W O R K E D E X A M P L E — T E S T M A T R I X → T E S T C O D E qty price discount expected notes 0 10 0 0 empty cart 1 10 0 10 baseline 2 10 0.1 18 percent discount 1 -10 0 error negative price 1e6 0.01 0 9999.99 float precision T H E S A M E M O V E OpenAPI spec → client SDK State diagram → state machine code ERD → ORM models Type definitions → runtime validators Prose is the default, not the requirement. Match the format to the shape of the problem.
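A sketch of the projection for the first few matrix rows, with a hypothetical `cart_total` as the code under test (the float-precision row is omitted here, since its expected value depends on rounding policy):

```python
# Matrix rows projected into executable checks. `cart_total` is a
# hypothetical implementation, written inline only so the sketch
# is self-contained.

def cart_total(qty: float, price: float, discount: float) -> float:
    if price < 0:
        raise ValueError("negative price")
    return qty * price * (1 - discount)

# (qty, price, discount, expected) -- straight from the matrix
CASES = [
    (0, 10, 0,   0),    # empty cart
    (1, 10, 0,   10),   # baseline
    (2, 10, 0.1, 18),   # percent discount
]

for qty, price, discount, expected in CASES:
    assert cart_total(qty, price, discount) == expected

# the "negative price -> error" row
try:
    cart_total(1, -10, 0)
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```

The matrix stays the source of truth; the agent's job is only the mechanical projection into whichever test framework the project uses.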
  47. S E C T I O N 0 5 / 0 8 Thread Control eight ways to continue a session
  48. T H R E A D C O N T R O L / 1 o f 8 Summarize and restart When the session is getting noisy, step out clean. Ask the model for a structured handoff: “Write a summary of what we’ve done, what we decided, where we’re stuck, and the next three steps. Treat the reader as a fresh instance of yourself.” Paste the output into a new chat. Fresh window, sharp framing, no accumulated noise. T R A D E Lossy — details fall out. W I N Attention is sharp again, token budget resets, activation is clean.
  49. T H R E A D C O N T R O L / 2 o f 8 User-initiated compaction You run a command. The agent compresses the conversation in place. Same session, lighter window. Claude Code: /compact. Cursor: “Summarize thread.” The agent reads the existing conversation and replaces it with a condensed summary you stay inside. Use when the thread is on-track and you just need more room. T R A D E Lossy — and the agent picks what to keep. W I N Zero friction, you keep working.
  50. T H R E A D C O N T R O L / 3 o f 8 Fork and continue When the agent goes off track, don’t correct — rewind. Edit the prompt that produced the bad response. The incorrect response never enters the window; the model sees only your improved prompt. Two flavors: tweak (add a constraint, sharpen the ask) or redirect (throw it out, try a different angle). Correction is context pollution. Rewind instead. T R A D E You lose any work after the fork point. W I N No correction turn, no prior wrong answer for the model to work around.
  51. T H R E A D C O N T R O L / 4 o f 8 Delegate to a sub-agent The next chunk of work doesn’t have to happen in this window. Hand it off to a sub-agent with its own fresh context. The main thread sees only the sub-agent’s return value — typically a few hundred tokens instead of thousands. Pure activation artifact — the sub-agent’s whole behavior is its system prompt. T R A D E The sub-agent can’t see your full conversation. W I N Main window stays clean, no cost to the work it’s doing.
  52. T H R E A D C O N T R O L / 5 o f 8 Append to a scratchpad Offload state to a file. Let the session read from disk, not from memory. PROGRESS.md, plan.md, todo.md — anything the agent can read between turns and write to as it works. As context fills, the scratchpad holds the truth. The window holds only what’s needed now. T R A D E Mild discipline — someone (you or the agent) has to keep it current. W I N Survives compaction, survives restart, survives you closing the laptop.
  53. T H R E A D C O N T R O L / 6 o f 8 Log decisions Scratchpads track where you are. Decision logs track why you’re there. DECISIONS.md — append-only record of choices made and the reasoning behind them. “Chose X over Y because Z.” “Skipped approach A — brittle under constraint B.” A new session that reads the log stops re-litigating settled questions. T R A D E Mild discipline — write the why, not just the what. W I N The reasoning survives the compression.
  54. T H R E A D C O N T R O L / 7 o f 8 Read the last N commits Your git history is context. Point the agent at it. “Look at the last 10 commits on this branch before you start — what’s been done, what’s incomplete, and what I was working on last.” The agent reads real diffs, commit messages, and file changes. No handoff doc. No scratchpad maintenance. T R A D E Commit discipline matters — thin or junk commits give thin context. W I N Free externalization — the artifact exists as a byproduct of normal work.
  55. T H R E A D C O N T R O L / 8 o f 8 Yegge’s beads Turn a monolithic requirements doc into a graph of small, dependency-aware issues — stored in the repo, under version control. Break a large PRD once. Add subtasks. Add dependencies. Commit. Each issue seeds its own thread. A session reads one issue, not the whole PRD. Want parallelism? Kick off N agents, one per issue. Credit: Steve Yegge · github.com/gastownhall/beads T R A D E Upfront decomposition — you have to actually break the PRD down. W I N Externalized, queryable context. The agent reads a small unit, not two large docs.
  56. S E C T I O N 0 6 / 0 8 Failure Modes what breaks, and how to recover
  57. F A I L U R E / 1 o f 6 Distractors S C E N A R I O You paste three files so the agent “has everything” — two are actually unrelated. Quality drops on the one that matters. M E C H A N I S M Even within a tight window, irrelevant-but-plausible sentences measurably drop accuracy. The model treats nearby text as context to condition on — relevant or not. (Shi et al. 2023, ICML.) R E C O V E R Y Curate, don’t dump. One file, one function, one error — only what the turn actually needs. When in doubt, trim.
  58. F A I L U R E / 2 o f 6 Stale API hallucination S C E N A R I O You ask for a TanStack Query v5 example; you get v4 syntax. Or a Next.js App Router question, answered in Pages Router. The code compiles. The API doesn’t exist. M E C H A N I S M The model was trained before the API changed. Confidently invents the old signature, removed flag, deprecated hook. A specific slice of the general hallucination problem (Ji et al. 2023) where the cause is training cutoff, not reasoning error. R E C O V E R Y Point the agent at the current source of truth. MCP server connected to vendor docs is the cleanest fix. Lighter alternatives: paste version, read package.json, fetch the relevant doc.
  59. F A I L U R E / 3 o f 6 Stale recall S C E N A R I O You deleted the function. The agent, having worked with the file all session, writes it back. M E C H A N I S M Induction heads (Olsson et al. 2022) — the circuit that lets the model repeat patterns from context is also the circuit that resurfaces content you wanted gone. The older the state in the window, the more confidently the agent treats it as current. R E C O V E R Y Clear and restart. A fresh conversation with the current file state is faster than fighting ghost content.
  60. F A I L U R E / 4 o f 6 Hallucination lock-in S C E N A R I O Wrong answer in turn 8. Corrected in turn 10. From turn 11 onward, the model quietly kept building on the original wrong answer. M E C H A N I S M The wrong answer is now its answer, sitting in the window. The model tends to extend or defend its own prior output rather than contradict it. R E C O V E R Y Fork, don’t correct (Slide 50). Edit the original prompt before it produced the wrong answer. The wrong answer never enters the window; nothing to defend.
  61. F A I L U R E / 5 o f 6 Correction-induced confusion S C E N A R I O Turn 8: wrong. Turn 9: correction. Turn 14: a follow-up — and the model produces a third answer that blends both, or flips between them unpredictably. M E C H A N I S M Distinct from lock-in. Here, the wrong answer and the correction are both in the window, and the model oscillates between them — looks like reasoning, is actually unresolved context. R E C O V E R Y Fork-and-continue (Slide 50) if caught in flight. Otherwise summarize-and-restart (Slide 48) — a fresh session is cleaner than patching.
  62. F A I L U R E / 6 o f 6 Format non-compliance S C E N A R I O Ask for camelCase; get mixed-case on turn 30. Ask for 80-column wrap; get 120 next generation. Ask for strict JSON; get prose with a JSON-ish middle. M E C H A N I S M LLMs consume structured input well. They don’t reliably produce it. This isn’t really an LLM problem — it’s an output-validation problem. R E C O V E R Y Run a formatter (Prettier, Black, gofmt, ruff) and a linter post-generation. For must-be-valid output, use constrained decoding (Outlines) or native structured-output / function-calling APIs.
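A minimal stdlib sketch of the lighter end of that recovery, assuming the common failure shape of prose wrapping a single JSON object (`extract_json` is illustrative, not a library API):

```python
# Post-generation validation sketch: when the model wraps JSON in prose,
# extract the JSON-ish middle and parse it strictly; reject on failure
# instead of trusting the raw text.
import json

def extract_json(raw: str) -> dict:
    """Find the outermost {...} span and parse it; raise if invalid."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    return json.loads(raw[start : end + 1])  # strict parse, fails loudly

raw_output = (
    'Sure! Here is the config you asked for: '
    '{"retries": 3, "verbose": false} Hope that helps.'
)
print(extract_json(raw_output))  # → {'retries': 3, 'verbose': False}
```

For anything beyond this shape (nested prose, multiple objects, schemas), constrained decoding or native structured-output APIs are the sturdier fix, as the slide says.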
  63. S E C T I O N 0 7 / 0 8 Agent-side context management what the tools do for you
  64. A G E N T - S I D E / 1 o f 3 Auto-compaction The agent compresses its own history mid-session — no command from you. Triggered as the window fills: long tool outputs, multi-file refactors, deep debug runs. Older turns get replaced with a summary the agent writes itself. What stays is its call, not yours. W A T C H F O R The thread feeling “lighter” — the agent re-asks something you already answered, or slips on a convention you set early.
  65. A G E N T - S I D E / 2 o f 3 Sub-agents The agent spawns a fresh-context worker for a subtask. Main thread sees only the return value — hundreds of tokens instead of thousands. Typical sweet spots: searching across files, verification passes, side questions. Claude Code’s Task tool, Cursor’s composer, Aider’s plan → execute split — same move. W A T C H F O R A sub-agent is only as good as its delegation prompt. Weak prompt, weak return.
  66. A G E N T - S I D E / 3 o f 3 Parallel agents Many agents at once, each with its own context. Distinct from sub-agents: delegation is serial; parallelism is simultaneous. In practice: Claude Code’s parallel Task tool, git worktree parallelism, beads-driven work with one issue per agent. The context win is shape, not size — N narrow contexts beat one wide one, because attention stays sharp. W A T C H F O R Coordination cost. Shared files or cross-dependencies eat the parallelism gain.
  67. S E C T I O N 0 8 / 0 8 Bonus two extras worth carrying home
  68. B O N U S / 1 o f 2 Triangulation When the codebase overflows any single window, ask from two angles — same agent, different entry points, or two different agents — and see where they agree. Agreement is a lead. Disagreement flags ambiguity. The real answer underneath: know your codebase well. Triangulation is a crutch for when you don’t.
  69. B O N U S / 2 o f 2 Caveman github.com/juliusbrussee/caveman Token compression for the inputs you send to the agent — rewrites prose into a terse “caveman” style that reduces token count while preserving technical substance. For when the task is clear and the bottleneck is cost, not correctness. W A T C H F O R Compress the narration, not the artifacts. Don’t shrink specs, API contracts, or exact error messages.
  70. C L O S E Key takeaways 01 Curate, don’t dump. Tool outputs and file reads dominate the window — your prompt is the smallest thing in it. Less context, better curated. 02 Frame deliberately. You’re positioning the model, not informing it. Name the framework, role, or domain — that’s where activation comes from. 03 Match format to problem shape. Prose is the default, not the requirement. Hand over the AST, hierarchy, graph, or matrix. 04 Externalize state. Scratchpads, decision logs, and git history survive compaction and restart. The window isn’t the only place state can live. 05 Expertise amplifies. You can’t name what you don’t know. Framing is the lever; vocabulary is how you see the levers.
  71. Thank you. Ragunath Jawahar · Founder, Legacy Code Labs [email protected] Q & A