Developer & Web Technologies Focus: Developer Productivity & Generative AI Socials: linktr.ee/daniel_sogl 2 Code smarter, not harder How AI coding tools boost productivity — and where they don't
Question
Everyone claims a boost. Can you even measure it?
Act 2: Smart along the SDLC
Where AI really helps, and where it quietly hurts
Act 3: From Vibes to Specs
The biggest leverage upgrade of 2026
Act 4: From Local to Cloud
The PR is the new interface
Act 5: Roles are changing, all of them
Engineer · QA · PM · Designer · SRE
developers use or plan to use AI tools (Stack Overflow Developer Survey 2025 · N=49,000)
90% use AI at work (DORA 2025 State of AI-assisted Dev · N≈5,000)
80% of new devs use Copilot in week one (GitHub Octoverse 2025 · 180M+ developers)
22% already use coding agents (JetBrains AI Pulse · Jan 2026 wave · N=11,000)
Four independent surveys. The adoption question is closed.
The real question: did it make us more productive, and how would you even know?
33% trust their accuracy, down from 43% in 2024
We use it. We don't trust it. We use it anyway.
"Almost-right" AI code is the #1 frustration, cited by 66% of developers.
Source: Stack Overflow Developer Survey 2025 · 49,000 respondents
−19%: experienced OSS devs were slower with AI. They felt 20% faster.
16 devs · 246 tasks · mature repos
→ February 2026 retest: signal too noisy to call
The retest even hinted at a speedup (−18% / −4%), but every confidence interval (CI) crosses zero. METR: "an unreliable signal"; design being reworked.
57 devs · 143 repos · 800+ tasks
Developers overestimated AI's time savings by ~40 percentage points. Stanford found self-rated productivity is "almost as good as flipping a coin."
If you can't feel it, you have to measure it.
Sources: metr.org · 2025-07-10, 2026-02-24 & self-report survey 2026-05 · arXiv:2507.09089 · Stanford SEP 2025
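The gap between felt and measured speed can be written down explicitly. The following is a minimal sketch, assuming both slide figures are read as speedup percentages (felt +20, measured −19). The helper names and the example interval are hypothetical and are not METR data.

```python
def perception_gap(felt_speedup_pct: float, measured_speedup_pct: float) -> float:
    """Gap in percentage points between perceived and measured speedup."""
    return felt_speedup_pct - measured_speedup_pct


def ci_crosses_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval includes zero, i.e. no clear effect."""
    return lower <= 0.0 <= upper


# Slide figures: developers felt 20% faster but were measured 19% slower.
gap = perception_gap(felt_speedup_pct=20.0, measured_speedup_pct=-19.0)
print(f"perception gap: {gap:.0f} percentage points")  # 39, i.e. roughly 40

# Hypothetical interval for illustration only (not taken from the retest).
print(ci_crosses_zero(lower=-30.0, upper=5.0))  # True: too noisy to call
```

An interval that spans zero is compatible with both a slowdown and a speedup, which is why the retest is described as too noisy to call.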
THINGS
THE CEILING NOBODY MENTIONS
Only 20–25% of a developer's time is actually spent writing code. So even doubling typing speed caps the total output gain at ~15–25%. Counting code volume measures the wrong 20%.
VANITY METRICS: WHAT NOT TO TRACK
Lines of code · Commits · PR count · AI-acceptance-rate
"Typing speed has never been the bottleneck." (Gergely Orosz · The Pragmatic Engineer)
DX dropped acceptance-rate from its framework: "such a tiny part of the story." (Laura Tacho · CTO, DX)
Sources: Pragmatic Engineer · DX (getdx.com) · AWS / Bain developer time-allocation studies
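The ceiling can be checked with a few lines of arithmetic. This is a sketch under an Amdahl's-law assumption that is not stated on the slide: only the coding share of the work gets faster and everything else stays the same. The function name is illustrative.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Amdahl-style bound: only the coding share of the work gets faster."""
    remaining_time = (1.0 - coding_share) + coding_share / coding_speedup
    return 1.0 / remaining_time


for share in (0.20, 0.25):
    doubled = overall_speedup(share, 2.0)
    limit = 1.0 / (1.0 - share)  # coding time shrinks to zero
    print(f"coding share {share:.0%}: 2x typing -> {doubled - 1:.1%} more output, "
          f"upper limit {limit - 1:.1%}")
```

Under this model, doubling coding speed yields roughly 11–14% more output, and even infinitely fast coding tops out at 25–33%. The slide's ~15–25% sits between those two bounds.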
Speed: diffs per engineer (from DORA)
Quality: change-failure rate (from DORA)
Effectiveness: Developer Experience Index (from DevEx)
Impact: % time on new value (from SPACE)
Four dimensions in tension, by design. You can't game speed without quality, effectiveness or impact dropping.
DORA's AI Capabilities back it up: small batches, clean version control, a quality internal platform.
+16% throughput: Booking.com (3,500 engineers)
+41% AI-driven time savings: Intercom
10 min/week/engineer per +1 DXI point: measured, not guessed
Sources: DX Core 4 (Tacho & Noda, Dec 2024) · DX AI Measurement Framework (Jul 2025) · DORA AI Capabilities Model 2025
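The "in tension" idea can be sketched as a guardrail check: a speed gain only counts if no other dimension regressed. The class, field names, and numbers below are made up for illustration and are not DX's schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core4Snapshot:
    """One measurement period. Field names are illustrative, not DX's schema."""
    diffs_per_engineer: float      # Speed
    change_failure_rate: float     # Quality (lower is better)
    dxi: float                     # Effectiveness
    pct_time_on_new_value: float   # Impact


def regressions(before: Core4Snapshot, after: Core4Snapshot) -> list[str]:
    """Name every dimension that got worse between two snapshots."""
    worse = []
    if after.diffs_per_engineer < before.diffs_per_engineer:
        worse.append("speed")
    if after.change_failure_rate > before.change_failure_rate:
        worse.append("quality")
    if after.dxi < before.dxi:
        worse.append("effectiveness")
    if after.pct_time_on_new_value < before.pct_time_on_new_value:
        worse.append("impact")
    return worse


# Made-up numbers: speed is up, but quality and impact paid for it.
before = Core4Snapshot(4.0, 0.10, 70.0, 55.0)
after = Core4Snapshot(5.5, 0.14, 70.0, 50.0)
print(regressions(before, after))  # ['quality', 'impact']
```

A dashboard showing only the first field would report a win here. The full snapshot shows the cost.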
that of an amplifier. It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones."
Strong teams get stronger. Struggling teams get worse, faster.
↑ Throughput + product performance (verified · DORA 2025)
↓ Stability, the "instability tax" (verified · DORA 2025)
≈8 mo payback (DORA ROI model, not measured)
Sources: DORA 2025 State of AI-assisted Software Development · DORA 2026 ROI Report (illustrative model)
LOOKS LIKE A HUGE WIN
+98% pull requests · +21% tasks completed
This is what a PR-count dashboard shows your VP.
WHAT THE SAME DATA SHOWS DOWNSTREAM
+91% review time · +154% PR size · +9% bugs / PR · flat org-level DORA
Twice the PRs. Same delivery. More bugs.
The work didn't disappear. It moved downstream: to review, to QA, to prod.
So the real question isn't "more AI". It's WHERE.
Source: Faros AI engineering telemetry · Jul 2025 (10,000+ devs · 1,255 teams) · Dec 2025 update (22,000 devs) · correlational
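To see how the per-PR numbers add up downstream, the relative changes can be compounded. This is a rough sketch that assumes the slide's percentages are independent and multiply cleanly. The source data is correlational, so treat the result as an illustration, not a finding.

```python
# Relative changes from the slide (Faros AI telemetry, correlational).
pr_count_change = 0.98      # +98% pull requests
bugs_per_pr_change = 0.09   # +9% bugs per PR
pr_size_change = 1.54       # +154% PR size


def compound(*changes: float) -> float:
    """Combine relative changes multiplicatively and return the net change."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change
    return factor - 1.0


total_bugs = compound(pr_count_change, bugs_per_pr_change)
code_to_review = compound(pr_count_change, pr_size_change)
print(f"total bugs: {total_bugs:+.0%}")
print(f"code volume to review: {code_to_review:+.0%}")
```

Under these assumptions, a modest +9% bugs per PR becomes roughly +116% bugs overall once PR count nearly doubles, and the volume of code waiting for review grows about fivefold.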
but it hallucinates fast without a domain glossary.
2 · Design: Great sounding board, weak on your own decision history.
3 · Implementation: 35–40% on greenfield, ≤10% on legacy. (UP CLOSE ↓)
4 · Test: 99% coverage, and zero bugs found. (UP CLOSE ↓)
5 · Review: 2× a classic linter, but 1.7× more to review. (UP CLOSE ↓)
6 · Docs: Easy to adopt. Staying correct on real code is the hard part.
7 · Maintenance: Translates old debt straight into the new stack. (UP CLOSE ↓)
8 · Ops: Faster incident triage, unless your alerting is already chaos.
Four phases up close. The other four: one verdict each.
Sources: Implementation = Stanford SEP (AIEWF 2025) · Test = eferro "Mutation Testing" (Nov 2025) · Review = CodeRabbit Martian Bench (Mar 2026) · Maintenance = GitClear 2025 · Discovery/Design/Docs/Ops = field experience.
SIMPLE TASKS: 35–40% productivity gain
new projects · clean slate · isolated tasks
COMPLEX LEGACY CODE: ≤10% gain, or negative
existing systems · the 90% of real work
The AI gain is up to 4× larger on greenfield than on code you maintain. Most of us live on the right side of this slide.
Smart move: spec-driven + small batches. Big legacy refactors with AI are an anti-pattern. We come back to this in Act 3.
Sources: Stanford SEP (in DORA 2026) · METR 2025 RCT
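What the split means for a whole team can be estimated with a time-weighted blend. This is a sketch, assuming 90% of time goes to legacy work at the slide's +10% ceiling and 10% to greenfield at +40%. The split and the function name are assumptions for illustration.

```python
def blended_gain(segments: list[tuple[float, float]]) -> float:
    """Overall productivity gain for (share_of_time, gain) segments.

    A gain g means the same work takes 1 / (1 + g) of the original time.
    Shares are expected to sum to 1.
    """
    new_time = sum(share / (1.0 + gain) for share, gain in segments)
    return 1.0 / new_time - 1.0


# Assumed split: 90% legacy work at +10%, 10% greenfield work at +40%.
gain = blended_gain([(0.90, 0.10), (0.10, 0.40)])
print(f"blended gain: {gain:.1%}")
```

Even with the most optimistic figure on each side, the blended result lands near 12%, far closer to the legacy number than to the greenfield headline.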
HERO NUMBER
99% line coverage: looks done
coverage ≠ correctness · tests match the code that already exists
FIELD NOTE
"We hit 99% coverage with AI", then mutation testing showed the tests just pinned the existing behaviour. The AI tested what the code does, not what it should do.
Smart move: BDD-first. Tests assert the requirement, not the implementation. AI proves business intent. You review the technical "how".
Sources: eferro "Mutation Testing" Nov 2025 · field experience
· MARTIAN CODE REVIEW BENCH
CodeRabbit: 49% precision · 53% recall · F1 51%
#1 by F1 score · ~300k PRs · Mar 2026
FIELD NOTE
AI as pre-reviewer: catches the obvious, frees humans for taste & architecture.
The catch: AI-generated code produces 1.7× more issues than human code (CodeRabbit, N=470 OSS PRs). → More AI authoring ⇒ more AI reviewing.
Smart move: AI catches first pass, you decide what ships.
Sources: CodeRabbit Martian Code Review Bench (Mar 2026) · CodeRabbit "State of AI vs Human Code" Dec 2025
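The F1 figure follows directly from the other two numbers. This short check uses the standard definition of F1 as the harmonic mean of precision and recall. Only the function name is an addition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(precision=0.49, recall=0.53)
print(f"F1 = {f1:.0%}")  # F1 = 51%
```

In practical terms, about half of the flagged issues are real and about half of the real issues are flagged, which fits the pre-reviewer role described here.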
code-clone blocks YoY (≥ 5-line duplicates)
25 → <10%: refactored-code share of all changes (≈60% drop since 2021)
3.1 → 5.7%: code churn within 2 weeks (2020 → 2024 · revised right after commit)
FIELD NOTE · WPF → WEB MIGRATIONS
Run the agent blind on a legacy migration and it translates the old debt into the new architecture: same anti-patterns, new stack.
Smart move: name the weaknesses first. Anti-patterns in instructions · domain rules in skills · self-healing hooks to catch regressions.
Proven across multiple client engagements (WPF desktop → modern web apps).
Source: GitClear AI Copilot Code Quality Report 2025 · 211M changed lines · field experience
Greenfield implementation · Code review (pre-pass) · Incident triage & correlation · User-story / doc drafting · Test scaffolding (then mutate)
LOW / NEGATIVE LEVERAGE
Complex legacy refactors · Cross-decision architecture · Domain-heavy requirements (no glossary) · "Vibe maintenance" · AIOps on top of broken alerting
Smart isn't "more AI". Smart is AI in the right place, and knowing when to keep it out.
and where most of us are stuck
Vibes: Intuitive, conversational. Fast, until it isn't.
Specs: Explicit intent. Repeatable. Team-shareable.
Skills: Reusable agent capabilities. Composable.
Agents: Plan, execute, iterate. Autonomously.
Most teams ship from Vibes. The wins are in Specs.
Source: Red Hat Developer · "Vibes, specs, skills, and agents" · March 2026
for a reason. A what. A why.
The agent handles the syntax. The agent knows the API. It doesn't know your customer, your domain, or the problem you're actually solving. That part is on you, and it's where your value as an engineer now lives.
WHAT STILL MATTERS: Understanding the customer's problem · Defining behaviour & constraints
WHAT DOESN'T: Memorising framework APIs · Knowing the syntax cold
The speedup comes from alignment, not from faster typing.
Source: Microsoft Developer Blog · GitHub Spec Kit · Sept 2025
STEP ZERO: BEFORE ANY CODE
Who is this for, and what problem does it solve?
If you can't answer that in one sentence, no agent, local or cloud, will save the work. A spec is just this answer, written down in a form the agent can act on.
Spec Kit, Kiro & Co. automate steps 1–4. They cannot do step zero.
THEN, AND ONLY THEN, THE PIPELINE
01 Specify: the what
02 Plan: the how
03 Tasks: the steps
04 Implement: agent → PR
Source: github/spec-kit · Kiro Specs · Tessl
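The ordering can be expressed as a simple gate: the four pipeline stages are unavailable until step zero has an answer. This is a hypothetical sketch and not how Spec Kit, Kiro, or Tessl work internally. The class and function names are invented.

```python
from dataclasses import dataclass

PIPELINE = ("specify", "plan", "tasks", "implement")  # steps 01-04 on the slide


@dataclass
class StepZero:
    """Hypothetical container for the step-zero answer."""
    audience: str = ""
    problem: str = ""

    def is_answered(self) -> bool:
        return bool(self.audience.strip()) and bool(self.problem.strip())


def runnable_stages(step_zero: StepZero) -> tuple[str, ...]:
    """Return the pipeline stages that may run; none until step zero is answered."""
    return PIPELINE if step_zero.is_answered() else ()


print(runnable_stages(StepZero()))
print(runnable_stages(StepZero("on-call engineers", "alerts lack ownership info")))
```

The gate only checks that an answer exists. Whether the answer is any good remains a human judgment, which is the point of the slide.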
Spec-first
Persistent context for every session. No automation. Where most teams start.
AGENTS.md · CLAUDE.md · .cursorrules
LEVEL 2: SWEET SPOT
Spec-anchored
Spec evolves with code. Slash commands, checkpoints, cross-artefact consistency.
GitHub Spec Kit: ~107k★ · ~30 agents supported
Kiro (AWS): Agentic IDE · EARS notation
LEVEL 3
Spec-as-source
Humans only edit specs, never generated code. Generated files marked DO NOT EDIT.
Tessl: Private beta · spec is the source
Most teams in May 2026 sit between Level 1 and Level 2.
If you take one thing from this talk: write an AGENTS.md tonight.
· co-creator of Django: "I won't commit code I couldn't explain to someone else."
→ Forces understanding. Kills hallucinated dependencies. Catches silent bugs.
Addy Osmani · Google: Beware "house of cards code".
→ Fragile AI output that collapses under scrutiny. Specs in workflows prevent it.
THE INTERFACE TO AI IS NO LONGER the cursor → it's the pull request
2022 · Autocomplete tab in the IDE
2023 · Chat side panel
2024 · Agents inside the IDE
2025 / 26 · Async Cloud Agents in your PR queue
at Goldman Sachs, Citi, Nubank, Dell. Valuation $10.2B → $26B (raise closed May 2026).
OpenAI Codex Cloud: 4M+ weekly developers (Apr 2026 · OpenAI announcement). 10× growth since Aug 2025.
GitHub Copilot Coding Agent: Assign an issue → get a PR. 1M+ PRs in 5 months (Octoverse 2025). CODEOWNERS, branch protection apply.
Cursor 3 ("Glass"): April 2026. Parallel agents in worktrees + cloud sandboxes. 35% of Cursor's own merged PRs from background agents.
Also in the field: Claude Code (via GitHub Action) · Google Jules · Sourcegraph Amp · Tembo.
Sources: Vendor announcements · OpenAI, 21 Apr 2026 · TechCrunch, 27 May 2026 (Devin raise) · InfoQ, 2 Apr 2026
14 MONTHS: $4B → $10.2B → $26B
Mar 2025 · Sep 2025 · May 2026 (closed)
NUBANK: PRODUCTION CASE
6M-line monolith · ~100K data classes
Parallel Devin sessions migrated the ETL stack. 8–12× efficiency, 20× cost savings. 18-month plan → shipped in weeks. 40 min → 10 min per subtask.
THE HONEST CAVEAT
Async is powerful. It's not autopilot.
narrow + well-specified → ships reliably
ambiguous + cross-cutting → senior engineer reviewing every step
Cloud agents reward Act 3. Bad specs = expensive garbage.
Sources: devin.ai/customers/nubank · TechCrunch May 2026 · SiliconANGLE
traditional PMs: ~100 people, 2 PMs. $1.25B valuation, 15,000+ customers.
PostHog: Same playbook. Published the playbook.
Vercel: "Code-last" philosophy. Outcomes > commits.
Stripe: Early pioneer. High-ownership engineering.
Shopify: Product engineers shipping product, not features.
incident.io: the job description reads "outcomes & impact > exact implementation"
"In an AI-first era, product engineering is more important than ever. Dare I say, it's basically the only thing left." (Lee Robinson · then VP Product, Vercel, now Cursor)
Sources: Linear Series C, linear.app, Jun 2025 ($1.25B, 15,000+ customers) · "~100 people, 2 PMs": Aakash Gupta, "How Linear Grows," Nov 2025 (Series-C-era headcount)
· NOV 2025
AGE 22–25: −20% software-developer employment since late-2022 peak
OLDER / EXPERIENCED DEVS: stable. Employment held steady, even grew.
AI rewards existing expertise. It punishes the "junior implementer". Routine implementation evaporates first.
The question this opens: if the junior pipeline collapses, who's the senior in 5 years?
Source: Brynjolfsson, Chandar, Chen · Stanford Digital Economy Lab · Nov 2025 · ADP payroll (25M+ US workers covered)
Quality Owner
World Quality Report 2025: GenAI is now the #1 quality-engineering skill (63%). Test-author → quality strategist.
PM → Builder
Linear: 2 PMs for ~100 people. Increasingly, PMs ship a working prototype to stakeholder review in minutes using Claude Code & v0.
Designer → Frontend Author
v0, Galileo V3, paper.design: Figma → production-ready frontend code. Designer's deliverable becomes a PR.
SRE → Platform Multiplier
DORA: 90% of orgs have a platform · 76% a dedicated team. Platform quality decides whether AI helps or hurts.
Every role on the team is moving up the abstraction stack. The deliverable changes, but the seat in the room stays human.
Sources: World Quality Report 2025 · Lenny on Linear · Productside · DORA 2025
risk isn't being replaced by AI. It's being out-competed by someone on your team who uses it smarter.
real spec
A CLAUDE.md, AGENTS.md, or .cursorrules for your most active repo. Treat it like onboarding for a new hire.
2 · Pick one SDLC phase to optimise
Not "use more AI everywhere". Pick test, or review, or incident triage, and measure the change for 2 weeks.
3 · Run a cloud agent on one real backlog item
Pick a Copilot / Codex / Devin / Jules task. Let it open the PR. Review like a senior would. Notice what you'd actually ship.
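For the two-week measurement, a before/after comparison of one metric is enough to start. This is a minimal sketch using made-up review turnaround times. The metric, the numbers, and the function name are assumptions, and a median over a handful of data points is only a first signal.

```python
from statistics import median


def relative_change(before: list[float], after: list[float]) -> float:
    """Relative change of the median, e.g. -0.25 means 25% lower."""
    base = median(before)
    return (median(after) - base) / base


# Made-up review turnaround times in hours, two weeks before and after.
before_hours = [20.0, 26.0, 30.0, 18.0, 24.0]
after_hours = [16.0, 19.0, 22.0, 15.0, 18.0]
print(f"median review turnaround: {relative_change(before_hours, after_hours):+.0%}")
```

The median is used here because a single stuck PR would otherwise dominate a two-week average.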
place, at the right time, and knowing when not to use it at all.