Code smarter, not harder | CodeBuzz 2026

Daniel Sogl

June 01, 2026

Transcript

  1. Code smarter, not harder
    How AI coding tools boost productivity — and where they don't.
    Daniel Sogl · @sogldaniel · Consultant @ Thinktecture
  2. About me
    Daniel Sogl · Consultant @ Thinktecture AG
    MVP — Developer & Web Technologies
    Focus: Developer Productivity & Generative AI
    Socials: linktr.ee/daniel_sogl
  3. 5 Acts in 35 Minutes
    Act 1 — The Productivity Question: Everyone claims a boost. Can you even measure it?
    Act 2 — Smart along the SDLC: Where AI really helps — and where it quietly hurts
    Act 3 — From Vibes to Specs: The biggest leverage upgrade of 2026
    Act 4 — From Local to Cloud: The PR is the new interface
    Act 5 — Roles are changing — all of them: Engineer · QA · PM · Designer · SRE
  4. Act 1: The Productivity Question — can you measure it?
  5. Adoption Is Solved
    We’re past the inflection point
    84% of developers use or plan to use AI tools (Stack Overflow Developer Survey 2025 · N=49,000)
    90% use AI at work (DORA 2025 State of AI-assisted Dev · N≈5,000)
    80% of new devs use Copilot in week one (GitHub Octoverse 2025 · 180M+ developers)
    22% already use coding agents (JetBrains AI Pulse · Jan 2026 wave · N=11,000)
    Four independent surveys. The adoption question is closed.
    The real question: did it make us more productive — and how would you even know?
  6. THE TRUST GAP
    84% use or plan to use AI
    33% trust their accuracy — down from 43% in 2024
    We use it. We don't trust it. We use it anyway.
    "Almost-right" AI code is the #1 frustration — cited by 66% of developers.
    SOURCE — STACK OVERFLOW DEVELOPER SURVEY 2025 · 49,000 RESPONDENTS
  7. YOU CAN'T TRUST THE FEELING · METR RCT JULY 2025
    −19%: Experienced OSS devs slower with AI. They felt 20% faster.
    16 devs · 246 tasks · mature repos
    → FEBRUARY 2026 RETEST: Signal too noisy to call
    The retest even hinted at a speedup (−18% / −4%) — but every CI crosses zero.
    METR: "an unreliable signal" — design being reworked.
    57 devs · 143 repos · 800+ tasks
    Developers overestimated AI's time savings by ~40 percentage points — Stanford found self-rated productivity is "almost as good as flipping a coin."
    If you can't feel it, you have to measure it.
    SOURCES — METR.ORG · 2025-07-10, 2026-02-24 & SELF-REPORT SURVEY 2026-05 · ARXIV:2507.09089 · STANFORD SEP 2025
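The phrase "every CI crosses zero" can be made concrete with a small sketch. It computes a percentile bootstrap confidence interval for the relative change in mean task time and checks whether the interval contains zero. The task durations are hypothetical placeholders (not METR data), and the bootstrap is an assumption chosen for illustration, not necessarily the method METR used.

```python
import random
import statistics

def bootstrap_ci(with_ai, without_ai, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the relative change in mean task time."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes.
        a = [rng.choice(with_ai) for _ in with_ai]
        b = [rng.choice(without_ai) for _ in without_ai]
        changes.append(statistics.mean(a) / statistics.mean(b) - 1.0)
    changes.sort()
    lo = changes[int(alpha / 2 * n_resamples)]
    hi = changes[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical task durations in minutes (illustrative only).
with_ai = [42, 95, 30, 120, 61, 75, 48, 150]
without_ai = [50, 80, 35, 110, 70, 66, 55, 140]

lo, hi = bootstrap_ci(with_ai, without_ai)
crosses_zero = lo < 0.0 < hi
print(f"95% CI for change in task time: {lo:+.0%} to {hi:+.0%}")
print(f"crosses zero: {crosses_zero}")
```

When the interval spans zero, the data cannot distinguish a slowdown from a speedup, which is the situation the retest describes.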
  8. HOW DO YOU MEASURE IT? · STOP COUNTING THE WRONG THINGS
    THE CEILING NOBODY MENTIONS
    20–25% of a developer's time is actually spent writing code. So even doubling typing speed caps the gain in total output at ~15–25%. Counting code volume measures the wrong 20%.
    VANITY METRICS — WHAT NOT TO TRACK
    Lines of code · Commits · PR count · AI-acceptance-rate
    "Typing speed has never been the bottleneck." — Gergely Orosz · The Pragmatic Engineer
    DX dropped acceptance-rate from its framework — "such a tiny part of the story." — Laura Tacho · CTO, DX
    Sources — Pragmatic Engineer · DX (getdx.com) · AWS / Bain developer time-allocation studies
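The ceiling argument is an instance of Amdahl's law: only the share of time spent writing code gets faster. The sketch below assumes that simple model, with function and variable names that are illustrative. Under it, doubling coding speed yields roughly 11 to 14 percent more total output, and even infinitely fast typing tops out at 25 to 33 percent, the same order of magnitude as the ~15–25% quoted on the slide.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl's law: only the coding share of the work is accelerated."""
    return 1.0 / ((1.0 - coding_share) + coding_share / coding_speedup)

for share in (0.20, 0.25):
    doubled = overall_speedup(share, 2.0)
    # Limit as coding_speedup grows without bound.
    ceiling = 1.0 / (1.0 - share)
    print(
        f"coding share {share:.0%}: "
        f"2x coding speed gives +{doubled - 1:.1%}, "
        f"hard ceiling +{ceiling - 1:.1%}"
    )
```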
  9. MEASURE IN TENSIONS, NOT IN LINES · DX CORE 4
    Speed: Diffs per engineer (from DORA)
    Quality: Change-failure rate (from DORA)
    Effectiveness: Developer Experience Index (from DevEx)
    Impact: % time on new value (from SPACE)
    Four dimensions in tension — by design. You can't game speed without quality, effectiveness or impact dropping.
    DORA's AI Capabilities back it up: small batches, clean version control, a quality internal platform.
    +16% throughput · Booking.com, 3,500 engineers
    +41% AI-driven time savings · Intercom
    10 min/week/eng per +1 DXI point · measured, not guessed
    Sources — DX Core 4 (Tacho & Noda, Dec 2024) · DX AI Measurement Framework (Jul 2025) · DORA AI Capabilities Model 2025
  10. DORA’s One-Sentence Diagnosis
    "AI's primary role in software development is that of an amplifier. It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones."
    Strong teams get stronger. Struggling teams get worse — faster.
    ↑ Throughput + product performance · verified · DORA 2025
    ↓ Stability · the "instability tax" · verified · DORA 2025
    ≈8 mo payback · DORA ROI model — not measured
    SOURCES — DORA 2025 STATE OF AI-ASSISTED SOFTWARE DEVELOPMENT · DORA 2026 ROI REPORT (ILLUSTRATIVE MODEL)
  11. THE AI PRODUCTIVITY PARADOX · FAROS AI · 10,000+ DEVS
    LOOKS LIKE A HUGE WIN
    +98% pull requests · +21% tasks completed
    This is what a PR-count dashboard shows your VP.
    WHAT THE SAME DATA SHOWS DOWNSTREAM
    +91% review time · +154% PR size · +9% bugs / PR · flat org-level DORA
    Twice the PRs. Same delivery. More bugs.
    The work didn't disappear — it moved downstream: to review, to QA, to prod.
    So the real question isn't "more AI" — it's WHERE.
    Source — Faros AI engineering telemetry · Jul 2025 (10,000+ devs · 1,255 teams) · Dec 2025 update (22,000 devs) · correlational
  12. Act 2: Smart along the SDLC
  13. Eight Phases. Eight Verdicts.
    1 · Discovery: Solid first drafts — but it hallucinates fast without a domain glossary.
    2 · Design: Great sounding board — weak on your own decision history.
    3 · Implementation: 35–40% on greenfield, ≤10% on legacy. UP CLOSE ↓
    4 · Test: 99% coverage — and zero bugs found. UP CLOSE ↓
    5 · Review: 2× a classic linter — but 1.7× more to review. UP CLOSE ↓
    6 · Docs: Easy to adopt — staying correct on real code is the hard part.
    7 · Maintenance: Translates old debt straight into the new stack. UP CLOSE ↓
    8 · Ops: Faster incident triage — unless your alerting is already chaos.
    Four phases up close. The other four — one verdict each.
    Sources — Implementation: Stanford SEP (AIEWF 2025) · Test: eferro "Mutation Testing" (Nov 2025) · Review: CodeRabbit Martian Bench (Mar 2026) · Maintenance: GitClear 2025. Discovery/Design/Docs/Ops = field experience.
  14. PHASE 3 — IMPLEMENTATION · THE PRODUCTIVITY GAP
    GREENFIELD · SIMPLE TASKS: 35–40% productivity gain · new projects · clean slate · isolated tasks
    COMPLEX LEGACY CODE: ≤10% gain — or negative · existing systems · the 90% of real work
    AI is up to 4× more productive on greenfield than on code you maintain. Most of us live on the right side of this slide.
    Smart move: spec-driven + small batches. Big legacy refactors with AI are an anti-pattern. We come back to this in Act 3.
    Sources — Stanford SEP (in DORA 2026) · METR 2025 RCT
  15. PHASE 4 — TEST · AI MATCHES CODE, NOT REQUIREMENTS
    HERO NUMBER: 99% line coverage — looks done
    coverage ≠ correctness · tests match the code that already exists
    FIELD NOTE
    "We hit 99% coverage with AI" — then mutation testing showed the tests just pinned the existing behaviour. The AI tested what the code does, not what it should do.
    Smart move: BDD-first. Tests assert the requirement, not the implementation. AI proves business intent. You review the technical "how".
    Sources — eferro "Mutation Testing" Nov 2025 · field experience
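A minimal sketch of why full line coverage can still miss a bug, and how a mutant exposes it. Everything here is hypothetical: the `shipping_fee` function, the assumed requirement (orders of 50 or more ship free), and the hand-written mutant standing in for what a mutation-testing tool would generate.

```python
from typing import Callable

Fee = Callable[[float], float]

def shipping_fee(total: float) -> float:
    # Implementation under test. Assumed requirement: orders of 50 or more ship free.
    # The comparison is wrong at the boundary (> instead of >=).
    return 0.0 if total > 50 else 4.99

def mutant(total: float) -> float:
    # What a mutation tool might produce by flipping > to >=.
    return 0.0 if total >= 50 else 4.99

def pinned_tests(fee: Fee) -> bool:
    # Tests derived from the existing code: every line runs, the boundary is never checked.
    return fee(10) == 4.99 and fee(100) == 0.0

def requirement_tests(fee: Fee) -> bool:
    # Tests derived from the requirement: the boundary case is stated explicitly.
    return pinned_tests(fee) and fee(50) == 0.0

print("pinned tests pass on original:", pinned_tests(shipping_fee))
print("pinned tests pass on mutant:", pinned_tests(mutant))
print("requirement tests pass on original:", requirement_tests(shipping_fee))
```

The pinned tests pass on both the original and the mutant, so the mutant survives and the suite says nothing about the boundary. The requirement-driven test fails on the original, which is the behaviour-first approach the slide recommends.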
  16. PHASE 5 — CODE REVIEW · AI AS REVIEWER
    DETECTION · MARTIAN CODE REVIEW BENCH
    CodeRabbit: 49% precision · 53% recall · F1 51% · #1 by F1 score · ~300k PRs · Mar 2026
    FIELD NOTE
    AI as pre-reviewer: catches the obvious, frees humans for taste & architecture.
    The catch: AI-generated code produces 1.7× more issues than human code (CodeRabbit, N=470 OSS PRs). → More AI authoring ⇒ more AI reviewing.
    Smart move: AI catches first pass, you decide what ships.
    Sources — CodeRabbit Martian Code Review Bench (Mar 2026) · CodeRabbit "State of AI vs Human Code" Dec 2025
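For readers who want to check how the three benchmark numbers relate: F1 is the harmonic mean of precision and recall. A short sketch using the figures quoted on the slide (the function name is illustrative):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as quoted on the slide.
f1 = f1_score(0.49, 0.53)
print(f"F1 = {f1:.1%}")  # prints "F1 = 50.9%", which rounds to the quoted 51%
```

Read plainly, a precision of 49% means about half of the flagged issues are real, and a recall of 53% means about half of the real issues get flagged, which is why the slide frames the tool as a first pass rather than a gate.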
  17. PHASE 7 — MAINTENANCE · THE COMPOUND INTEREST PROBLEM
    ×8 code-clone blocks YoY · ≥ 5-line duplicates
    25 → <10% refactored-code share of all changes · ≈60% drop since 2021
    3.1 → 5.7% code churn within 2 weeks · 2020 → 2024 · revised right after commit
    FIELD NOTE · WPF → WEB MIGRATIONS
    Run the agent blind on a legacy migration and it translates the old debt into the new architecture — same anti-patterns, new stack.
    Smart move: name the weaknesses first. Anti-patterns in instructions · domain rules in skills · self-healing hooks to catch regressions.
    Proven across multiple client engagements — WPF desktop → modern web apps.
    Source — GitClear AI Copilot Code Quality Report 2025 · 211M changed lines · field experience
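The percentages on this slide mix absolute shares with relative changes, so a quick check helps. The sketch below recomputes the relative changes from the quoted shares, treating "<10%" as exactly 10%, which is an assumption that makes the computed drop a lower bound.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from before to after, e.g. -0.6 for a 60% drop."""
    return (after - before) / before

# Shares quoted on the slide (GitClear 2025), in percent.
refactor_change = relative_change(25.0, 10.0)  # refactored-code share, using 10 for "<10%"
churn_change = relative_change(3.1, 5.7)       # code revised within two weeks of commit

print(f"refactored-code share: {refactor_change:+.0%}")  # prints -60%
print(f"two-week churn: {churn_change:+.0%}")            # prints +84%
```

The first figure matches the ≈60% drop stated on the slide. The churn increase of roughly 84% is derived here from the two quoted shares and is not a number the slide states.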
  18. Where AI Actually Helps — A Cheat Sheet
    HIGH LEVERAGE
    Greenfield implementation · Code review (pre-pass) · Incident triage & correlation · User-story / doc drafting · Test scaffolding (then mutate)
    LOW / NEGATIVE LEVERAGE
    Complex legacy refactors · Cross-decision architecture · Domain-heavy requirements (no glossary) · "Vibe maintenance" · AIOps on top of broken alerting
    Smart isn't "more AI". Smart is AI in the right place — and knowing when to keep it out.
  19. Act 3: From Vibes to Specs
  20. The Four Pillars of AI Coding
    Red Hat’s framework — and where most of us are stuck
    Vibes: Intuitive, conversational. Fast — until it isn't.
    Specs: Explicit intent. Repeatable. Team-shareable.
    Skills: Reusable agent capabilities. Composable.
    Agents: Plan, execute, iterate. Autonomously.
    Most teams ship from Vibes. The wins are in Specs.
    Source — Red Hat Developer · "Vibes, specs, skills, and agents" · March 2026
  21. SPEC-DRIVEN DEVELOPMENT — WHERE YOUR VALUE ACTUALLY LIVES
    Software exists for a reason. A what. A why.
    The agent handles the syntax. The agent knows the API. It doesn't know your customer, your domain, or the problem you're actually solving. That part is on you — and it's where your value as an engineer now lives.
    WHAT STILL MATTERS: Understanding the customer's problem · Defining behaviour & constraints
    WHAT DOESN'T: Memorising framework APIs · Knowing the syntax cold
    The speedup comes from alignment — not from faster typing.
    Source — Microsoft Developer Blog · GitHub Spec Kit · Sept 2025
  22. FROM PROBLEM TO PR — WHERE YOUR BRAIN GOES FIRST
    STEP ZERO — BEFORE ANY CODE
    Who is this for, and what problem does it solve? If you can't answer that in one sentence, no agent — local or cloud — will save the work. A spec is just this answer, written down in a form the agent can act on.
    Spec Kit, Kiro & Co. automate steps 1–4. They cannot do step zero.
    THEN — AND ONLY THEN — THE PIPELINE
    01 Specify · the what
    02 Plan · the how
    03 Tasks · the steps
    04 Implement · agent → PR
    Source — github/spec-kit · Kiro Specs · Tessl
  23. SDD IN PRACTICE — THREE LEVELS OF RIGOR
    LEVEL 1 · Spec-first
    Persistent context for every session. No automation. Where most teams start.
    AGENTS.md · CLAUDE.md · .cursorrules
    LEVEL 2 — SWEET SPOT · Spec-anchored
    Spec evolves with code. Slash commands, checkpoints, cross-artefact consistency.
    GitHub Spec Kit (~107k★ · ~30 agents supported) · Kiro (AWS) (Agentic IDE · EARS notation)
    LEVEL 3 · Spec-as-source
    Humans only edit specs — never generated code. Generated files marked DO NOT EDIT.
    Tessl (Private beta · spec is the source)
    Most teams in May 2026 sit between Level 1 and Level 2.
    If you take one thing from this talk: write an AGENTS.md tonight.
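As a starting point for the Level 1 advice, here is a small sketch that renders a starter AGENTS.md as plain Markdown. The project name, section headings, and entries are all hypothetical placeholders. The talk does not prescribe a structure, so treat this as one possible shape rather than a required format.

```python
def render_agents_md(project: str, sections: dict[str, list[str]]) -> str:
    """Render a starter AGENTS.md as plain Markdown text."""
    lines = ["# AGENTS.md", "", f"Guidance for coding agents working on {project}.", ""]
    for heading, bullets in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {bullet}" for bullet in bullets)
        lines.append("")
    return "\n".join(lines)

# All section names and entries below are hypothetical placeholders.
starter = render_agents_md(
    "example-shop",
    {
        "Build and test": ["Run the unit tests before opening a PR."],
        "Domain glossary": ["Order: a confirmed purchase, not a cart."],
        "Anti-patterns to avoid": ["Do not copy duplicated helpers; extract one."],
    },
)
print(starter)
```

The glossary and anti-pattern sections mirror two points made elsewhere in the deck: agents hallucinate without a domain glossary, and weaknesses should be named before a migration starts.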
  24. Two Rules Worth Stealing
    Simon Willison · creator of Datasette · co-creator of Django: "I won't commit code I couldn't explain to someone else."
    → Forces understanding. Kills hallucinated dependencies. Catches silent bugs.
    Addy Osmani · Google: Beware "house of cards code".
    → Fragile AI output that collapses under scrutiny. Specs in workflows prevent it.
  25. Act 4: From Local to Cloud
  26. THE 4-YEAR SHIFT — WHERE AI CODE ACTUALLY COMES FROM
    2022 · Autocomplete: tab in the IDE
    2023 · Chat: side panel
    2024 · Agents: inside the IDE
    2025 / 26 · Async Cloud Agents: in your PR queue
    THE INTERFACE TO AI IS NO LONGER the cursor → it's the pull request
  27. The Cloud Agent Landscape · May 2026
    Devin · Cognition: Production at Goldman Sachs, Citi, Nubank, Dell. Valuation $10.2B → $26B (raise closed May 2026).
    OpenAI Codex Cloud: 4M+ weekly developers (Apr 2026 · OpenAI announcement). 10× growth since Aug 2025.
    GitHub Copilot Coding Agent: Assign an issue → get a PR. 1M+ PRs in 5 months (Octoverse 2025). CODEOWNERS, branch protection apply.
    Cursor 3 ("Glass"): April 2026. Parallel agents in worktrees + cloud sandboxes. 35% of Cursor's own merged PRs from background agents.
    Also in the field: Claude Code (via GitHub Action) · Google Jules · Sourcegraph Amp · Tembo.
    Sources — Vendor announcements · OpenAI 21.4.2026 · TechCrunch 27.5.2026 (Devin raise) · InfoQ 2.4.2026
  28. DEVIN · COGNITION — FROM DEMO TO PRODUCTION
    VALUATION · 14 MONTHS
    $4B (Mar 2025) → $10.2B (Sep 2025) → $26B (May 2026, closed)
    NUBANK — PRODUCTION CASE
    6M-line monolith · ~100K data classes
    Parallel Devin sessions migrated the ETL stack. 8–12× efficiency, 20× cost saved. 18-month plan → shipped in weeks. 40 min → 10 min per subtask.
    THE HONEST CAVEAT
    Async is powerful. It's not autopilot.
    narrow + well-specified → ships reliably
    ambiguous + cross-cutting → senior engineer reviewing every step
    Cloud agents reward Act 3. Bad specs = expensive garbage.
    Sources — devin.ai/customers/nubank · TechCrunch May 2026 · SiliconANGLE
  29. Act 5: Roles are changing — all of them
  30. The Companies Already Moved
    Product Engineers, not "developers"
    Linear: No traditional PMs — ~100 people, 2 PMs. $1.25B valuation, 15,000+ customers.
    PostHog: Same playbook. Published the playbook.
    Vercel: "Code-last" philosophy. Outcomes > commits.
    Stripe: Early pioneer. High-ownership engineering.
    Shopify: Product engineers shipping product, not features.
    incident.io: The job description reads "outcomes & impact > exact implementation".
    "In an AI-first era, product engineering is more important than ever. Dare I say — it's basically the only thing left." — Lee Robinson · then VP Product, Vercel (now Cursor)
    Sources — Linear Series C, linear.app, Jun 2025 ($1.25B, 15,000+ customers) · "~100 people, 2 PMs" — Aakash Gupta, "How Linear Grows," Nov 2025 (Series-C-era headcount)
  31. THE JUNIOR CRISIS — STANFORD "CANARIES IN THE COAL MINE" · NOV 2025
    AGE 22–25: −20% software-developer employment since late-2022 peak
    OLDER / EXPERIENCED DEVS: stable · employment held steady — even grew
    AI rewards existing expertise. It punishes the "junior implementer". Routine implementation evaporates first.
    The question this opens: If the junior pipeline collapses, who's the senior in 5 years?
    Source — Brynjolfsson, Chandar, Chen · Stanford Digital Economy Lab · Nov 2025 · ADP payroll (25M+ US workers covered)
  32. The Team Shifts Too — Not Just Engineers
    QA → Quality Owner
    World Quality Report 2025: GenAI is now the #1 quality-engineering skill (63%). Test-author → quality strategist.
    PM → Builder
    Linear: no traditional PMs. Increasingly, PMs ship a working prototype to stakeholder review in minutes using Claude Code & v0.
    Designer → Frontend Author
    v0, Galileo V3, paper.design — Figma → production-ready frontend code. Designer's deliverable becomes a PR.
    SRE → Platform Multiplier
    DORA: 90% of orgs have a platform · 76% a dedicated team. Platform quality decides whether AI helps or hurts.
    Every role on the team is moving up the abstraction stack. The deliverable changes — but the seat in the room stays human.
    Sources — World Quality Report 2025 · Lenny on Linear · Productside · DORA 2025
  33. WHAT STAYS HUMAN
    AI replaces tasks. Not people.
    The realistic risk isn't being replaced by AI. It's being out-competed by someone on your team who uses it smarter.
  34. So what do you do on Monday?
  35. Three Concrete Things — Starting Monday
    1 · Write your first real spec
    A CLAUDE.md, AGENTS.md, or .cursorrules for your most active repo. Treat it like onboarding for a new hire.
    2 · Pick one SDLC phase to optimise
    Not "use more AI everywhere". Pick test, or review, or incident triage — and measure the change for 2 weeks.
    3 · Run a cloud agent on one real backlog item
    Pick a Copilot / Codex / Devin / Jules task. Let it open the PR. Review like a senior would. Notice what you'd actually ship.
  36. Smart isn't "more AI". Smart is AI in the right place, at the right time, and knowing when not to use it at all.
  37. Sources & Further Reading
    STATE & NUMBERS
    Stack Overflow Dev Survey 2025: survey.stackoverflow.co/2025
    DORA 2025 · State of AI-assisted Dev: dora.dev
    GitHub Octoverse 2025: github.blog
    JetBrains AI Pulse 2026: jetbrains.com
    METR · AI productivity RCT: metr.org · arXiv:2507.09089
    Stanford Digital Economy Lab: digitaleconomy.stanford.edu
    MEASURING & CODE QUALITY
    DX Core 4 · AI Measurement: getdx.com
    Faros AI · engineering telemetry: faros.ai
    GitClear · Code Quality 2025: gitclear.com
    CodeRabbit · review benchmarks: coderabbit.ai
    The Pragmatic Engineer: pragmaticengineer.com
    eferro · mutation testing: eferro.net
    SPECS · TOOLS · ROLES
    Red Hat · Vibes/Specs/Skills/Agents: developers.redhat.com
    GitHub Spec Kit: github.com/github/spec-kit
    Kiro (AWS) · Tessl: kiro.dev · tessl.io
    Cognition Devin · Nubank case: devin.ai/customers/nubank
    Cursor 3 · OpenAI Codex: cursor.com · openai.com
    World Quality Report 2025: capgemini.com
    Voices — Simon Willison: simonwillison.net · Addy Osmani: addyosmani.com · Lee Robinson: leerob.com
    All links: linktr.ee/daniel_sogl