Disciplined Vibes: Scaling AI-Assisted Engineering

My talk on vibe coding and harness engineering at Code District, Lahore.

Sheharyar Naseer

June 16, 2026

Transcript

  1. Background
     ✦ Principal Software Architect at Infra One
     ✦ Worked with: Apple, Slab, TheScore, Superlist, etc.
     ✦ 16+ years of polyglot experience, focus on Web & Cloud
     ✦ Stack Overflow: 75,000+ score (Top 5 in Pakistan)
     ✦ Author / Contributor of multiple famous libraries & tools
     ✦ Featured on popular developer communities
  2. Outline
     PART 1: The Problem
     PART 2: It's Not the Model
     PART 3: Harness Engineering
     PART 4: Live Workshop
     OUTRO: What's Next?
  3. Struggling with AI
     ✦ 16+ years of experience, still humbled by a chatbot
     ✦ Struggled a lot with AI-assisted coding
     ✦ Code quality was extremely poor
     ✦ Often had to spend time fixing it
     ✦ Or throw it away and do it manually
  4. AI-Assisted Problems
     ✦ Hallucinated APIs, function calls, and packages
     ✦ Insecure code
     ✦ Architectural drift
     ✦ Ignored edge cases
     ✦ Incorrect or missing error handling
     ✦ Performance issues
     ✦ So many more...
  5. The Data Agrees
     METR: METR's randomized controlled trial found experienced developers were 19% slower with early-2025 AI.
     DORA: Google's DORA 2024 research found AI adoption reduced delivery stability, continuing into 2025 despite higher adoption & throughput.
  6. “Seniors often get worse results than juniors from same tools until they learn deliberate prompting. But once they do they have a massive advantage.”
     Sabrina Goldfarb, SWE at GitHub Copilot
  7. 02 It's Not The Model
     Exploring the root causes and developing the right thinking model.
  8. It's a You Problem
     ✦ Don't understand how LLMs work
     ✦ Goldfish memory & context management
     ✦ Incomplete specs
     ✦ Basic prompts
     ✦ Missing documentation & examples
     ✦ Unreliable guardrails
     ✦ No systems or quality checks
     ✦ Agents don't receive feedback about what's wrong
  9. Mental Models
     AI Search: Shallow use of modern LLMs as a Google replacement
     Vibe Coding: Fully delegating code to AI without reviewing output
     Vibe Engineering: Accelerating professional software engineering with AI
     YOU ARE HERE
  10. Vibe Engineering
     ✦ Does not mean better prompts
     ✦ Foundation/architecture/system where the agent can "succeed"
     ✦ Feedback loops
     ✦ Also called Evaluation Driven Development (EDD)
  11. “You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.”
     Peter Steinberger, Creator of OpenClaw, Technical Staff at OpenAI
  12. What's a Harness?
     ✦ LangChain research team describes it as: Agent = Model + Harness
     ✦ "Everything other than the model"
     ✦ Prompt, Evals, Tool Calls, Docs, Context, etc.
     ✦ Even the GUI/CLI "agent" tool you use
     “Agent = Model + Harness”
     Vivek Trivedi (Researcher, LangChain)
  13. The Model Doesn't Matter
     ✦ SWE-bench score improvements
     ✦ 42% → 78%, 46% → 80%, 23% → 45%
     ✦ ~22 point swings vs ~1 point swings
     ✦ Using frontier models
     SAME HARNESS, different model: ~1 point swings
     SAME MODEL, scaffold changes: ~22 point swings
  14. Anatomy of a Harness
     ✦ Inner Harness (System)
     ✦ Built into your coding agent (CLI/GUI tool)
     ✦ System prompt, Tool calls, Orchestration
     ✦ Outer Harness (User)
     ✦ Controls put in place by users
     ✦ User prompt, Agent rules, Output validation
     ✦ Our focus today
     Diagram labels: MODEL, INNER HARNESS, OUTER HARNESS
  15. HUMAN ✦ AGENT
     Feedforward Guides (INITIAL GENERATION): PROMPTS, AGENTS.MD, SPECS, PRD & ADR, STYLEGUIDES, REFERENCE DOCS, RULES, SCRIPTS / CLI TOOLS, CODEMODS, LANGUAGE SERVERS, ...
     Feedback Sensors (SELF-CORRECTING): UNIT TESTS, E2E TESTS, STATIC ANALYSIS, REVIEW AGENTS, LOGS, BROWSER, LINTERS, SBOM VALIDATION, SECURITY SCANNERS, ...
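The guide/sensor split on this slide can be read as a loop: guides shape the initial generation, sensors report problems, and the report goes back to the agent for another attempt. Below is a minimal Python sketch of that loop. Everything named here is an assumption made for illustration and does not come from the talk: `run_loop`, the `fake_agent` stand-in for a real coding agent, and the toy `no_print_sensor`.

```python
from typing import Callable, List, Tuple

# A sensor inspects generated code and returns a list of problems (empty = clean).
Sensor = Callable[[str], List[str]]


def run_loop(
    generate: Callable[[str, List[str]], str],
    guides: str,
    sensors: List[Sensor],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Generate, run all sensors, and feed problems back until clean or out of rounds."""
    feedback: List[str] = []
    code = ""
    for _ in range(max_rounds):
        code = generate(guides, feedback)  # feedforward: guides; feedback: sensor output
        feedback = [problem for sensor in sensors for problem in sensor(code)]
        if not feedback:
            break
    return code, feedback


def no_print_sensor(code: str) -> List[str]:
    # Hypothetical deterministic check standing in for a linter rule.
    return ["remove print() calls"] if "print(" in code else []


def fake_agent(guides: str, feedback: List[str]) -> str:
    # Hypothetical agent: the first attempt breaks the rule, the retry fixes it.
    if feedback:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    print(a, b)\n    return a + b\n"


final_code, remaining = run_loop(fake_agent, "Use pure functions.", [no_print_sensor])
```

In this toy run the first attempt trips the sensor and the second comes back clean, so `remaining` ends up empty. A real harness would replace `fake_agent` with a call to the coding agent and the sensor with actual tooling.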
  16. Implementing Guides
     ✦ Write actual documentation
     ✦ Guides, rules, conventions; plus examples
     ✦ Current architecture overview
     ✦ Long-term specs, PRDs, and ADRs
     ✦ Add helpful tooling
     ✦ Code generation scripts, tools, helpers
     ✦ Language servers
     ✦ Entrypoint is the "router"

     my_app
     ├── AGENTS.md
     ├── docs/
     │   ├── rules/
     │   ├── guides/
     │   ├── adrs/
     │   └── specs/
     ├── . . .
     └── . . .
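If AGENTS.md acts as the "router" into `docs/`, a stale link sends the agent nowhere. Here is a small sketch of a check that catches this, assuming the entrypoint uses ordinary Markdown links of the form `[text](relative/path)`. The function name `broken_links` and the file names in the demo (`style.md`, `index.md`) are hypothetical and not taken from the talk.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Captures the path part of a Markdown link, ignoring any #anchor or title text.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")


def broken_links(root: Path, entrypoint: str = "AGENTS.md") -> List[str]:
    """Return relative link targets in the entrypoint that do not exist under root."""
    text = (root / entrypoint).read_text(encoding="utf-8")
    missing = []
    for target in LINK_RE.findall(text):
        if "://" in target:  # skip external URLs
            continue
        if not (root / target).exists():
            missing.append(target)
    return missing


# Demo on a throwaway project layout (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "rules").mkdir(parents=True)
    (root / "docs" / "rules" / "style.md").write_text("# Style\n", encoding="utf-8")
    (root / "AGENTS.md").write_text(
        "- [Style rules](docs/rules/style.md)\n"
        "- [ADR index](docs/adrs/index.md)\n",
        encoding="utf-8",
    )
    missing = broken_links(root)
```

Here `missing` holds only the ADR link, since that file was never created. Run as a CI step, a check like this keeps the router in step with the documents it points to.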
  17. Implementing Sensors
     ✦ More important than Guides
     ✦ For maintainability and architectural quality
     ✦ Focus on Deterministic controls first
     ✦ Fast, reliable, cheap
     ✦ Implementation Layers
     ✦ Fastest & accurate feedback early
     ✦ Goal: Push agents' reliable coverage as far up as possible
     IMPLEMENTATION LAYERS
     1. LINTING & STATIC CHECKS
     2. UNIT TESTS
     3. INTEGRATION / E2E
     4. AI REVIEWS
     5. MANUAL QA
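One way to read the layering on this slide is: run the deterministic checks before the inferential ones, and stop at the first failure so the agent gets the cheapest useful signal. The sketch below assumes each layer is a callable that returns `True` on success. The `Layer` type and `first_failure` function are illustrative names, and the lambdas stand in for real linters, test runners, and review agents.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Layer:
    name: str
    deterministic: bool
    check: Callable[[], bool]  # True means the layer passed


def first_failure(layers: List[Layer]) -> Optional[str]:
    """Run deterministic layers first (keeping their given order) and stop at the first failure."""
    # sorted() is stable, so layers keep their relative order within each group.
    ordered = sorted(layers, key=lambda layer: not layer.deterministic)
    for layer in ordered:
        if not layer.check():
            return layer.name
    return None


ran: List[str] = []


def record(name: str, passed: bool) -> Callable[[], bool]:
    # Helper for the demo: remembers which layers actually executed.
    def check() -> bool:
        ran.append(name)
        return passed
    return check


layers = [
    Layer("ai review", deterministic=False, check=record("ai review", False)),
    Layer("lint", deterministic=True, check=record("lint", True)),
    Layer("unit tests", deterministic=True, check=record("unit tests", False)),
]
failed = first_failure(layers)
```

In the demo the unit tests fail, so the slower AI review never runs: `failed` is `"unit tests"` and `ran` contains only the two deterministic layers.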
  18. Recommendations
     ✦ Establish Discipline
     ✦ Capture standard conventions, security mandates, architecture patterns
     ✦ Keep AI out of writing tests, preserve double-bookkeeping
     ✦ Build Reusable Harnesses
     ✦ CI templates with common deterministic checks
     ✦ Inferential review agents for security, architecture, gap analysis, even PR reviews
     ✦ Scale via Service Templates
     ✦ Service-level AGENTS.md
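The "keep AI out of writing tests" recommendation can itself be enforced with a deterministic check. The sketch below flags test files in the list of paths an agent changed. It assumes tests live in a `tests` directory or follow the `test_*.py` naming pattern, a convention chosen for this example rather than one stated in the talk. The function name `touched_tests` is likewise hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def touched_tests(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that look like test files under the assumed convention."""
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Assumed convention: anything under a tests/ directory or named test_*.
        if "tests" in path.parts or path.name.startswith("test_"):
            flagged.append(raw)
    return flagged


flagged = touched_tests(
    ["src/app/billing.py", "tests/test_billing.py", "src/app/test_utils.py"]
)
```

A CI template could fail an agent-authored change whenever `flagged` is non-empty, leaving test edits to humans.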
  19. Service Templates
     ✦ Enterprises & agencies have pre-defined service templates
     ✦ Internal team guides
     ✦ Codemods & internal tools
     ✦ Boilerplate projects
     ✦ Embed harnesses directly in them
     ✦ Scaffold not just code, but AI knowledge and conventions from day one
     ✦ Inter-organization review agents
  20. Advanced Workflows
     ✦ Custom skills and slash commands
     ✦ Subagents for sub-tasks for context optimization
     ✦ Agent Councils & Consensus
     ✦ Adversarial reviews with multiple agents deciding on next steps
     ✦ Parallel agent execution with git worktrees
     ✦ Multiply output using same harness
     ✦ Independently running agent loops
     ✦ Spec → Code → PR → Review → Address Feedback → Merge
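The "Agent Councils & Consensus" idea can be sketched as a vote across review agents. The version below assumes each reviewer returns a single verdict string and that a strict majority is required. Both the verdict labels and the `council_decision` function are assumptions made for illustration, not a scheme described in the talk.

```python
from collections import Counter
from typing import Dict, Optional


def council_decision(votes: Dict[str, str], quorum: float = 0.5) -> Optional[str]:
    """Return the verdict held by more than `quorum` of reviewers, or None if there is none."""
    if not votes:
        return None
    verdict, count = Counter(votes.values()).most_common(1)[0]
    return verdict if count / len(votes) > quorum else None


# Hypothetical reviewer names and verdicts.
decision = council_decision(
    {
        "security": "request_changes",
        "architecture": "approve",
        "gap_analysis": "request_changes",
    }
)
tie = council_decision({"security": "approve", "architecture": "request_changes"})
```

Two of the three reviewers agree in the first call, so `decision` is `"request_changes"`. The second call is a tie and yields `None`, which a loop could treat as a signal to escalate to a human.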