Token Optimization Folklore

Slide 1. Token optimization folklore
How to survive with tokens you have or get more
Maaret Pyhäjärvi, June 2026
© 2026 CGI Inc.

Slide 2. This lottery runs on tokens: focus on optimizing quality

AI autocomplete: Basic use in editor. Light on tokens.
Prompting agentic: Asking initiates doing. One-shot tasks. Individual.
Reactive agentic: Agent/skills for specific repeatable task. Shared for team.
Proactive agentic: Automate tasks on triggers.
Dark factories: Work initiated to go away for something else, for full flows of value.

Agent gambling (lack of context engineering) is not sustainable with scale.

AI ASSISTED ENGINEER: Works with one agent, mostly sync.
AI ENGINEER: Orchestrator of multiple async agents.

Slide 3. Experimentation vs. Building with intent

We do the best work within known boundaries we can renegotiate on value.

1. Learning
2. Building solo
3. Building scaled

Default boundaries
• GitHub Copilot: 3000 + 5000 AI credits
• ChatGPT+Codex: 3000 credits, 700 per week

Model sufficiency is experimentation.

Slide 4. 5 things you can start doing today
SUMMARY

1. Choose right model for the right task.
2. Clear guidance on your intent with stop signals in your prompt.
3. Research – Plan – Implement.
4. Provide deterministic controls (tests, linters, audits).
5. Maintain concise, human-worthy instructions hierarchy. Improve as agent-miss log and trim outputs.

Slide 5. Agentic frameworks expose what was always true
ARCHITECTURE-AWARE AI

Structured context is the key to AI success. Without it, agents fail.

Anatomy of an AI agent
• Language model: Models reason over context.
• Instructions: Goals, constraints, what 'good' looks like.
• Memory: Short-term and long-term state across turns.
• Tools: MCP servers, APIs, CMDs, files, stores, other agents.
• Oversight and harness: Humans-in-the-loop, policy gates, harness.

THE LESSON
Every agent that 'just works' is running on curated context. Agents don't usually fail because the model is weak. They fail because context is stale, thin or contradictory. Context is not documentation for the model; it is the environment the model runs in.

Slide 6. Signal beats volume. More tokens is not more context.
CONTEXT CURATION: LESS INFORMATION, MORE MEANING

Curate, don't dump
Give the agent the 3 things it needs, not the 300 things that exist. Surgical relevance is a design choice. Filter before you feed.

Structure beats prose
Schemas, specs, tables and examples outperform long paragraphs. Structure makes meaning machine-readable. If a human has to skim it, the agent will miss it.

Freshness matters
Stale content silently poisons the output. Treat context like a running system, not a one-time artifact. Refresh context intentionally.

Context is a living asset, not a file drop.
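"Filter before you feed" can be sketched in a few lines. This is a minimal illustration, not a method from the deck: the `Snippet` type, the word-overlap `relevance` score, the 4-characters-per-token estimate and the default limits are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    name: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def relevance(task: str, snippet: Snippet) -> int:
    # Hypothetical score: number of distinct task words that appear in the snippet.
    task_words = set(task.lower().split())
    snippet_words = set(snippet.text.lower().split())
    return len(task_words & snippet_words)


def curate(task: str, snippets: list[Snippet], max_items: int = 3,
           token_budget: int = 200) -> list[Snippet]:
    # Rank by relevance, stop at the first irrelevant snippet or at max_items,
    # and skip any snippet that would push the selection over the token budget.
    ranked = sorted(snippets, key=lambda s: relevance(task, s), reverse=True)
    chosen: list[Snippet] = []
    used = 0
    for snippet in ranked:
        if relevance(task, snippet) == 0 or len(chosen) == max_items:
            break
        cost = estimate_tokens(snippet.text)
        if used + cost > token_budget:
            continue
        chosen.append(snippet)
        used += cost
    return chosen


candidates = [
    Snippet("auth", "login token refresh flow for the api"),
    Snippet("style", "use tabs not spaces"),
    Snippet("api", "api error codes and retry policy"),
    Snippet("lunch", "menu for friday"),
]
selected = curate("fix api retry on token refresh", candidates)
print([s.name for s in selected])
```

A real setup would likely replace the word-overlap score with search or embeddings; the point is only that selection happens before anything reaches the model.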

Slide 7. Context window and tokens
LLM-SPECIFIC CONTEXT LIMIT

AGENTS WORKING WITH LLMS
[Diagram: agent loops. 1st loop: system prompt & tools, prompt and file go in as input tokens; the response comes back as output tokens. 2nd loop: system prompt & tools, prompt, file and the earlier response go in as input tokens; a new response comes back as output tokens. Input tokens may come from cache (not guaranteed).]

Context rot
1. Lost in the middle (<50%): Models favor the beginning and end of context.
2. Recency bias (50%): Models favor the end of context.

New context window when you switch tasks. Plans drive switching tasks.
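The loop picture above implies that input tokens grow with every loop. Here is a worked sketch of that arithmetic, assuming each loop resends the full history and ignoring caching; the function name and all token counts are made up for illustration.

```python
def loop_input_tokens(system_and_tools: int, prompt: int, files: int,
                      responses: list[int]) -> list[int]:
    """Input tokens sent on each loop when every loop resends the whole history."""
    per_loop = []
    context = system_and_tools + prompt + files
    for response in responses:
        per_loop.append(context)
        # The response becomes part of the next loop's input.
        context += response
    return per_loop


# Hypothetical sizes: 2000 for system prompt & tools, 200 prompt, 3000 file,
# and three responses of 500, 800 and 300 output tokens.
inputs = loop_input_tokens(2000, 200, 3000, [500, 800, 300])
print(inputs)       # [5200, 5700, 6500]
print(sum(inputs))  # 17400 input tokens for 1600 output tokens
```

Under these assumptions the unchanged system prompt, prompt and file are paid for three times, which is one reason to open a new context window when switching tasks.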

Slide 8. Configs as mechanism to manage context
TIPS

Persistent instructions: agent.md, copilot-instructions.md
• Non-negotiables of your project
• Log reoccurring agent misses
• Statement to trim output: "Be concise"
• Iterate, maintain, recreate.
• Consider writing these yourself.

Add conditional capabilities: Skills
Design workflows and behaviors: Custom agents

CONFIGS
• Persistent instructions: copilot-instructions.md
• Custom agents: .github/agents/*.agent.md
• Skills: .github/skills/*/SKILL.md
• MCP
• Subagents
• Scoped instructions: .github/instructions/*.instructions.md
• Prompt files: .github/prompts/*.prompt.md
• Copilot memory
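To keep these configs concise, it helps to see how large they are. The sketch below globs the paths listed on the slide and estimates a token count per file. The `.github/` location for copilot-instructions.md, the function name and the 4-characters-per-token estimate are assumptions for the example.

```python
from pathlib import Path

# Patterns follow the slide; whether a given repository uses them is an assumption.
CONFIG_PATTERNS = [
    ".github/copilot-instructions.md",
    ".github/agents/*.agent.md",
    ".github/skills/*/SKILL.md",
    ".github/instructions/*.instructions.md",
    ".github/prompts/*.prompt.md",
]


def config_footprint(repo_root: Path) -> dict[str, int]:
    """Map each config file (path relative to the repo) to a rough token estimate."""
    footprint: dict[str, int] = {}
    for pattern in CONFIG_PATTERNS:
        for path in sorted(repo_root.glob(pattern)):
            text = path.read_text(encoding="utf-8")
            # Crude assumption: roughly 4 characters per token.
            footprint[path.relative_to(repo_root).as_posix()] = max(1, len(text) // 4)
    return footprint


if __name__ == "__main__":
    for name, tokens in config_footprint(Path(".")).items():
        print(f"{tokens:>8}  {name}")
```

Run from a repository root, it prints one line per config file, which makes files that have grown too large easy to spot.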

Slide 9. Optimizing token usage
TIPS

1. Right model for task. GitHub Copilot automode tries to automate this for you since 05/26.
2. Thinking intensity. Control over the effort means control over cost. Plan is most intense.
3. Scripts and tools. Your assets to support AI use.
4. Solve AI with adjacent technologies. Preprocessing, deterministic controls. 3rd party memory, context optimization. Tests, audits.
5. Context engineering. Agents/skills and files. Minimize. Guide choices. Task-specific context.
6. Research – Plan – Fleet. Enable separation of tasks. Gemini 2.5 – Opus 4.7 – GPT 5.4

Exchanging developer time to tokens is a layered problem we are solving.

Slide 10. Avoid these
TIPS

• Replacing all traditional IDE features of refactoring with AI requests.
• Giving tasks that should be a deterministic script to AI.
• Just prompting, so that your team context does not improve.
• Running everything on Claude Sonnet / Opus if on GitHub Copilot.
• Using MCPs (verbose) when CLI (terse) is available.
• Just-in-case context that is not required or necessary.
• Using low-power models for tasks that require you to try again on a higher-power model.

Slide 11. Power user guidance
TIPS

Think in code: prefer creating scripts to analyze code.

Consider CLI vs. MCP: scenario specific, but famously known for Playwright.

Optimize with 3rd party components
• Shell outputs can be very long, e.g. https://github.com/rtk.ai/rtk
• Context can be locally optimized before LLM, e.g. https://github.com/chopratejas/headroom
• Team repeats same things, e.g. https://github.com/ruvnet/ruvector https://github.com/RBKunnela/ALMA-memory https://github.com/qdrant/qdrant https://github.com/gastownhall/beads https://github.com/mempalace/mempalace
• Tool calls are verbose and can be collapsed, e.g. https://github.com/jsturtevenat/copilot-codeact-plugin
• Token profiling locally: https://github.com/getagentseal/codeburn
• Opinionated spec-driven context engineering: https://github.com/open-gsd/gsd-core https://github.com/bmad-code-org/bmad-method https://github.com/github/spec-kit https://github.com/Fission-AI/OpenSpec/

GitHub Copilot CLI "/chronicle tips" to learn what to change in your configs based on past

Model specific context optimization: models behave differently and can be tweaked.
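The point about long shell outputs can be illustrated without any third-party component. The sketch below keeps the head and tail of a command's output and drops the middle. It is a generic illustration, not how any of the linked tools work; `trim_output` and its defaults are hypothetical.

```python
def trim_output(output: str, head: int = 5, tail: int = 10) -> str:
    """Keep the first `head` and last `tail` lines and note how many were dropped."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so that tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


# Simulated long build log: 100 lines shrink to 16.
log = "\n".join(f"line {i}" for i in range(100))
trimmed = trim_output(log)
print(len(trimmed.splitlines()))
```

Errors tend to appear at the end of build and test logs, which is why the default keeps more of the tail than the head. That default is a judgment call, not a rule.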

Slide 12. Using AI better
FIVE MOVES. NO NEW TOOLS. MEASURABLY BETTER RESULTS.

1. Audit your prompts. Pick three prompts you use often. Where is the context leaking? What is the model having to guess?
2. Write intent brief. For each recurring task, state audience, output format, constraints and what good looks like. Reuse it.
3. Measure before / after. Track rework cycles and time-to-good-draft. Context improvements show up fast in these two numbers.
4. Make context portable. Put reusable context into files you can hand to any agent tool.
5. Curate, don't dump. Swap long docs for structured notes. Give the agent the 3 things it needs, not the 300 it might want.
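The two numbers named in "Measure before / after" can be tracked with very little code. This is a minimal sketch: the `TaskRun` record, the function names and the sample values are hypothetical, and how you count a rework cycle is left to you.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    rework_cycles: int            # times the output had to be sent back
    minutes_to_good_draft: float  # wall-clock time until a usable draft


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    return {
        "avg_rework_cycles": mean(r.rework_cycles for r in runs),
        "avg_minutes_to_good_draft": mean(r.minutes_to_good_draft for r in runs),
    }


def compare(before: list[TaskRun], after: list[TaskRun]) -> dict[str, float]:
    # Negative deltas mean the context change reduced rework or time.
    b, a = summarize(before), summarize(after)
    return {key: a[key] - b[key] for key in b}


# Made-up sample data for illustration only.
before = [TaskRun(3, 40.0), TaskRun(5, 60.0)]
after = [TaskRun(1, 20.0), TaskRun(2, 30.0)]
delta = compare(before, after)
print(delta)
```

Comparing the same recurring tasks before and after a context change keeps the comparison fair; mixing task types would blur both numbers.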

Slide 13. Insights you can act on

Founded in 1976, CGI is among the largest IT and business consulting services firms in the world. We are insights-driven and outcomes-focused to help accelerate returns on your investments. Across hundreds of locations worldwide, we provide comprehensive, scalable and sustainable IT and business consulting services that are informed globally and delivered locally.

cgi.com
