Can AI tools really deliver 10× productivity? A backend team's 10-month record: 9×+ productivity built by redefining environment, verification, and work definition

You adopted AI tools — so why isn't productivity rising as much as you expected?
The same tools are available to everyone, yet results differ wildly from team to team. So is "10x productivity with AI" a real number? Instead of guessing, a LINE Home backend team measured it over about 10 months.
Tracking 384 Jira tickets, 862 Git commits, and 460 PRs, we found that per-person hourly output rose more than 9× (×9.24). We got there with zero production incidents and at most one hotfix per month, so quality was not traded away, while the range of problems the team could handle expanded from routine backend work to Multi-LLM, MCP, building the CMS UI directly, and E2E test scenarios.
The key is that this result came from redesigning how we work, not from picking an AI tool. The answer lay in three conditions that worked together. First, we built the context (CLAUDE.md): a one-page description of our team and project, shared automatically in every session. On top of it we added a verification infrastructure (Harness) with four axes (Hooks, verification agents, Skills, and workflow docs) that makes autonomous delegation safe. Finally, we refined the work definition itself, raising Definition-of-Done items from a median of one to three. Together, these let us confidently delegate even large domains.
In this session we trace, with data, which changes produced the 9×, and show concretely how we changed the way we work, including the moment an autonomous agent modified files on its own and how we solved it by turning verification into infrastructure. Then we distill it all into a replication plan any team can follow in just 4 weeks.

Transcript

  1. Can AI tools really deliver 10× productivity?

    2026.06.29 · LY Corporation
    Lyn Heo | LINE Plus · Hongjoong Shin | LINE Plus · Jinook Hong | LINE Plus
  2. Who Are We / What We're Building

    Home Contents Intelligence Experience Dev, LINE Plus
    • Hongjoong Shin, Backend Server Developer
    • Lyn Heo, Backend Server Developer
    • Jinook Hong, Backend Server Developer

    Contents Spectrum Project: Home Content Intelligence Platform using AI
    ① Content search
    ② Query by self-classified metadata
    ③ Production/consumption metric analytics
  3. Agenda

    • Act 1 (10 mins): What We Measured
    • Act 2 (8 mins): How the 3 Conditions Were Built
    • Act 3 (6 mins): Here's how your team can replicate this in 4 weeks
    • Appendix: Glossary, SP -> WWP, WWP Formula, etc.
  4. Glossary: 5 core terms repeated throughout

    • SP (Story Point): work size, team-agreed score (not time)
    • WWP (Weighted Work Point): SP × 4 weight adjustments (cross-stage unit)

    Weighted Work Point = SP × (Structural Impact + Code Lines + Spec Depth + Verification)
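The WWP formula on this slide can be sketched in a few lines. This is a minimal illustration, not the team's implementation: the `Weights` class, the function name, and the example weight values are all hypothetical, since the talk names the four adjustment factors but this transcript does not give their values or scales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """The four adjustment factors named on the slide (values are hypothetical)."""
    structural_impact: float
    code_lines: float
    spec_depth: float
    verification: float


def weighted_work_point(story_points: float, w: Weights) -> float:
    """WWP = SP x (Structural Impact + Code Lines + Spec Depth + Verification)."""
    factor = w.structural_impact + w.code_lines + w.spec_depth + w.verification
    return story_points * factor


# Hypothetical weights for a 3 SP ticket; the talk does not give real values here.
example = Weights(structural_impact=0.5, code_lines=0.25, spec_depth=0.25, verification=0.5)
print(weighted_work_point(3, example))  # 4.5
```

With these made-up weights the factor sums to 1.5, so a 3 SP ticket counts as 4.5 WWP. Two tickets with the same SP but different factor sums end up with different WWP, which is the cross-stage adjustment the slide describes.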
  5. Glossary: 5 core terms repeated throughout

    • SP (Story Point): work size, team-agreed score (not time)
    • WWP (Weighted Work Point): SP × 4 weight adjustments (cross-stage unit)
    • dev done: Dev ticket Started → Code Merged
    • Context Engineering: environment that explains us to AI (Breakthrough 1)
    • Harness Engineering: infrastructure to verify and control AI output (Breakthrough 2)
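As a rough sketch of how "dev done" could be measured, the function below takes a ticket's status history and returns the elapsed days between the first move to In Progress and the first move to MERGED (the status names listed in the glossary appendix). The input shape, the function names, and the use of calendar days rather than working days are assumptions for illustration; the talk does not say how the team extracted this from Jira.

```python
from datetime import datetime


def first_time(transitions, status):
    """Timestamp of the first transition into `status`, or None if it never happened."""
    for name, stamp in transitions:
        if name == status:
            return datetime.fromisoformat(stamp)
    return None


def dev_done_days(transitions):
    """Elapsed calendar days from 'In Progress' to 'MERGED' (assumed definition)."""
    started = first_time(transitions, "In Progress")
    merged = first_time(transitions, "MERGED")
    if started is None or merged is None:
        return None  # ticket not started or not merged yet
    return (merged - started).total_seconds() / 86400


# Hypothetical status history for one ticket.
history = [
    ("TODO", "2025-11-03T09:00:00"),
    ("In Progress", "2025-11-03T10:00:00"),
    ("In Review", "2025-11-05T10:00:00"),
    ("MERGED", "2025-11-06T22:00:00"),
]
print(dev_done_days(history))  # 3.5
```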
  6. Measurement Scope: the scope of what we measured

    JIRA 384 Tickets · GIT 862 Commits · PR 460 Merged

    • Baseline: 25.07 ~ 08
    • Prompt: 25.09 ~ 10
    • Context: 25.11 ~ 26.01
    • Agentic: 26.02 ~ 03
    • Harness: 26.04 ~
  7. Working-Style Evolution: each stage includes all prior stages and adds new capability

    • 25.07-08 Baseline: manual coding
    • 25.09-10 Prompt: help via single prompts
    • 25.11-26.01 Context: delegate with context (★ Breakthrough 1)
    • 26.02-03 Agentic: autonomous agent
    • 26.04- Harness: delegation + verification infra (★ Breakthrough 2)

    Cumulative evolution: each stage keeps all prior abilities.
  8. We experienced a ×9.24 productivity boost

    Baseline → Harness · per-person hourly output approached 10× over 10 months

    • Same-weight task dev (time for 1 unit of work): 4.3d → 1.5d, ×2.79 faster
    • Per-person throughput (output per month): 14.1 → 46.7, ×3.31 more output
    • Per-person hourly output (efficiency: throughput × speed): 0.41 → 3.78, ×9.24 total boost

    Speed 2.79 × Throughput 3.31 = ×9.24
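The headline figure is the product of two separate gains, which a short calculation makes explicit. The sketch below uses only the rounded numbers printed on the slide, and the variable names are mine. Both results land within a few hundredths of ×9.24; the small gap presumably comes from rounding in the published inputs, which is an assumption, since the unrounded values are not in the talk.

```python
speed_gain = 2.79        # same-weight task dev time, Baseline vs Harness
throughput_gain = 3.31   # per-person monthly output, Baseline vs Harness

# Total boost as the slide defines it: speed x throughput.
combined = speed_gain * throughput_gain

# Cross-check against the per-person hourly output figures (Harness / Baseline).
hourly_ratio = 3.78 / 0.41

print(f"speed x throughput  = {combined:.2f}")
print(f"hourly output ratio = {hourly_ratio:.2f}")
```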
  9. What Changed

    The real meaning is that the size of problems the team can handle has changed.

    BEFORE
    • Backend routine work only
    • No frontend · partial tests
    • DoD vague: median 1

    AFTER
    • All AI output auto-verified
    • Large domains delegated
    • DoD refined: median 1 → 3

    Not only speed: the work definition has been refined together.
    DoD = Definition of Done
  10. The 3 Conditions That Made ×10 Possible

    • Context Engineering: the environment that explains our team and project to AI
    • Harness Engineering: the infrastructure that verifies and controls AI output
    • Work Definition Refinement: clarifying requirements and Definition of Done via AI
  11. How We Worked Changed at Two Breakthroughs

    Baseline 25.07 ~ 08 → Prompt 25.09 ~ 10 → Context 25.11 ~ 26.01 → Agentic 26.02 ~ 03 → Harness 26.04 ~

    ★ Breakthrough 1: Context Explosion (25.11)
  12. Breakthrough 1 - Context Engineering: Project Understanding & Structure/Work-History Documentation

    CLAUDE.md (committed 2025.11)

    # Project Overview
    Contents-spectrum is a Home & VOOM Contents Intelligence Search Platform built with Kotlin/Spring Boot, multi-module Gradle. Provides search, CMS, data processing.

    # Architecture (7 modules)
    • spectrum-share: Common types/utils
    • spectrum-storage-api: MySQL/ES/IU storage
    • spectrum-search-api, cms-api, consumer, batch, iu-api...
    gRPC (internal) + REST (external)

    # Code Conventions
    camelCase / PascalCase, 4-space indent
    @Configuration for APIs/gRPC/Interceptors
    @RestControllerAdvice global exceptions
    kotlinx.serialization (docs only) / Jackson
    @GrpcService + proto Request/Response
    Kotest framework for testing

    # Commit Message Format
    type(scope): [TICKET-123] subject
    types: feat|fix|docs|style|refactor|perf|test|build|ci|chore

    # Module Communication
    Internal: gRPC + HMAC authentication
    External: REST APIs
    Async: Kafka (event-driven)
    Storage: MySQL per module + shared connection utils

    # Skill commands (Work History)
    Slash commands auto-track work history:
    /dev-start : ticket/branch start. Write dev plan document.
    /pr-suggest : PR body auto-generate
    → AI knows every session's stage
    → Work history stays as code + docs
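A convention like the commit message format in this CLAUDE.md is easy to check mechanically. The sketch below is one possible validator, not something shown in the talk: the regular expression, the function name, and the assumptions that a ticket key looks like a Jira key (uppercase letters, a hyphen, digits) and that a scope contains no spaces or parentheses are all mine.

```python
import re

# Types listed in the CLAUDE.md excerpt.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor",
                "perf", "test", "build", "ci", "chore")

# type(scope): [TICKET-123] subject
COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[^()\s]+)\): "                 # assumed: scope has no spaces or parens
    r"\[(?P<ticket>[A-Z][A-Z0-9]*-\d+)\] "       # assumed: Jira-style key such as TICKET-123
    r"(?P<subject>\S.*)$"
)


def parse_commit_subject(line: str):
    """Return the parsed fields as a dict, or None if the line breaks the format."""
    match = COMMIT_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical commit subjects.
print(parse_commit_subject("feat(search-api): [TICKET-123] add metadata filter"))
print(parse_commit_subject("update stuff"))  # None
```

A check like this could run in a commit hook or in CI, so the format is enforced by tooling rather than by memory.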
  13. Context Effect

    Comparison: Prompt-stage average (2025.09~10) → Context-stage average (2025.11 ~ 2026.01)

    • Monthly tickets done (higher is better): 25 → 32, +7 (27% up)
    • 3SP dev done, days (lower is better): 4.6 → 4.3, -0.3 (7% down)
    • Monthly SP done (higher is better): 81 → 86, +5 (6% up)
    • Monthly lines of code, k (neutral): 32.9 → 20.2, -12.7 (39% down)

    Less code −39%, more tickets +27%: right code, not more code.
  14. Why Verification Infrastructure Was Needed: The 60-file Case

    Discovery + Impact
    • ~60 files, out-of-scope
    • Caught in PR review
    • Production impact 0

    Recovery
    • Revert -> remerge
    • Diff + test check
    • 1 person-day

    Lessons
    • CLAUDE.md not enough
    • Verify in infrastructure
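One way to read "verify in infrastructure" for the 60-file case is a scope check that runs automatically instead of relying on a reviewer to notice. The sketch below is an illustration under that reading, not the team's actual hook: the function name, the idea of declaring allowed directories per ticket, and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files, allowed_dirs):
    """Return the changed files that are not inside any allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    offenders = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if an allowed directory is one of its ancestors.
        if not any(a in path.parents for a in allowed):
            offenders.append(name)
    return offenders


# Hypothetical change set for a ticket scoped to one module.
changed = [
    "spectrum-search-api/src/main/kotlin/SearchService.kt",
    "spectrum-batch/src/main/kotlin/ReindexJob.kt",
]
print(out_of_scope(changed, ["spectrum-search-api"]))
# ['spectrum-batch/src/main/kotlin/ReindexJob.kt']
```

Comparing path ancestors rather than string prefixes avoids treating a sibling directory with a similar name as in scope. A non-empty result could block the change or flag it for a human before it reaches PR review.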
  15. How We Worked Changed at Two Breakthroughs

    Baseline 25.07 ~ 08 → Prompt 25.09 ~ 10 → Context 25.11 ~ 26.01 → Agentic 26.02 ~ 03 → Harness 26.04 ~

    ★ Breakthrough 1: Context Explosion (25.11)
    ★ Breakthrough 2: Verification Infra Explosion (26.03)
  16. Harness Engineering - 4-Axis at Spectrum: Verification Infrastructure Applied

    ① Workflow Tracking: 7-stage tracking (Ideation - Planning - Development - Verification - Delivery - Review - Release)
    ② Verification Pipeline: multi-agent review
    ③ Skills / Commands: repetition -> commands
    ④ Automatic Hooks: Pre · Post · Stop
  17. Harness Engineering Result

    Comparison: Agentic-stage average (2026.02~03) → Harness-stage average (2026.04 ~)

    • Monthly tickets done (higher is better): 33.5 → 78, +44.5 (133% up)
    • Monthly commits (higher is better): 110 → 222, +112 (102% up)
    • Monthly SP done (higher is better): 129 → 268, +139 (108% up)
    • Monthly lines of code, k (neutral): 44.6 → 105.7, +61.1 (137% up)
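The deltas and percentages on this slide can be recomputed from the before and after values alone. The sketch below does that; the helper name and the dictionary layout are mine. From the stated 44.6 and 105.7, the lines-of-code row works out to +61.1 and 137% up.

```python
def change(before: float, after: float) -> tuple[float, int]:
    """Return (absolute delta rounded to 0.1, percent change rounded to a whole number)."""
    delta = after - before
    return round(delta, 1), round(100 * delta / before)


# (Agentic-stage average, Harness-stage average) as printed on the slide.
ROWS = {
    "monthly tickets done": (33.5, 78),
    "monthly commits": (110, 222),
    "monthly SP done": (129, 268),
    "monthly lines of code (k)": (44.6, 105.7),
}

for metric, (before, after) in ROWS.items():
    delta, percent = change(before, after)
    print(f"{metric}: {before} -> {after}, {delta:+} ({percent}% up)")
```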
  18. Large-Task Processing Time Reduced ×3: Harness Engineering Gain

    ① Pre-spec agreement (committee)
    ② Verification automation (8 agents)
    ③ Skills / Commands standardization
    ④ Hooks guard (safety net)

    Result: ⚡ ×3 faster
    8 SP large-task dev completed: Baseline 22.1 days → Harness 7.3 days · Large tasks see the biggest reduction
  19. Week 1 · Project Understanding

    Risk: very low · No code changes, just teach AI about your team
    Est. investment: 2 ~ 4 hours
    Why safe: You only teach AI about your team. Zero code changes, fully read-only.

    Examples
    • Write one CLAUDE.md: briefly define team conventions (10~15 lines)
    • Contextual Q&A: ask AI "What does this project do?" or "Show me this function's call path"
    • Structural Alignment: organize module structure and established conventions with AI
  20. Week 2-3 · Safe Areas + Skill Adoption

    Risk: low · Apply AI to text/metadata work and codify repetition
    Est. investment: 10 ~ 20 hours
    Why safe: Output is text/metadata only. Zero impact on production code.

    Examples
    • Auto-generate docs and READMEs
    • Auto-generate commit messages
    • Jira comments and status changes
    • Auto-generate PR descriptions
    • Codify 1~2 repetitive tasks as /skill
  21. Week 4 · Review → Approve Workflow

    Risk: medium · AI plans and codes, humans approve
    Est. investment: 20 ~ 40 hours
    Why safe: Two human gates + small PR scope. Start with 1~2 small code PRs.

    The Flow
    • Requirements gathered (human + AI)
    • AI drafts the plan → human approval
    • AI develops → human approval
    • Code review → merge

    Human Approval Always Required!
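The Week 4 flow can be modelled as a small state machine in which the two human gates cannot be skipped. This is a sketch of the idea only: the stage names, the `advance` function, and the choice of exceptions are hypothetical, and the talk does not describe how the team enforces the gates in practice.

```python
STAGES = ["requirements", "plan", "development", "code_review", "merged"]
HUMAN_GATES = {"plan", "development"}  # leaving these stages needs a human sign-off


def advance(stage: str, human_approved: bool = False) -> str:
    """Return the next stage, refusing to pass a human gate without approval."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage == STAGES[-1]:
        raise ValueError("already merged")
    if stage in HUMAN_GATES and not human_approved:
        raise PermissionError(f"human approval required to leave '{stage}'")
    return STAGES[STAGES.index(stage) + 1]


stage = advance("requirements")              # -> "plan"
stage = advance(stage, human_approved=True)  # plan approved -> "development"
stage = advance(stage, human_approved=True)  # code approved -> "code_review"
print(stage)  # code_review
```

Calling `advance("plan")` without approval raises `PermissionError`, which mirrors the slide's rule that an AI-drafted plan and AI-written code each wait for a human before moving on.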
  22. Adoption Conditions: Recommended / Our team / Hard cases

    • Team Size
      Recommended: 2~10
      Our team: 3 -> 5 -> 7
      Hard cases: 1 or +50?
    • Module Structure
      Recommended: multi-module / clear boundaries
      Our team: 7 modules
      Hard cases: single monolith / unclear boundaries
    • VCS / Tickets
      Recommended: Git PR, Jira/Linear, etc.
      Our team: Git, Jira
      Hard cases: no PR / ticket workflow
    • AI Tool Access
      Recommended: frontier LLM access
      Our team: Claude (sonnet, opus), Codex (5.4, 5.3-codex)
      Hard cases: no enterprise contract / frontier model use restricted
    • Review Culture
      Recommended: PR review established
      Our team: Human Code Review, AI Code Review
      Hard cases: direct push
  23. AI 10x is not from adopting AI tools

    It is the result of redesigning how we work.
  24. Glossary - Non-developer Analogies: developer terms from the talk mapped to non-developer analogies

    • CLAUDE.md
      In-talk definition: One-page document explaining the project to AI
      Non-developer analogy: Company manual / new-hire onboarding doc
      Everyday form: "Here's what our company is and the rules"
    • AI coding tool
      In-talk definition: LLM agent that helps write/modify/verify code (Claude Code, Codex, etc.)
      Non-developer analogy: Fast, precise junior hire
      Everyday form: Fast and accurate at assigned tasks; humans provide context and judgment
    • Skill / Slash command
      In-talk definition: Bundle of commands standardizing repetitive work
      Non-developer analogy: Work macro / checklist
      Everyday form: Team standard like a "meeting-notes template"
    • SP (Story Point)
      In-talk definition: Team-agreed score (1·2·3·5·8·13…) for the scope/size of a piece of work. Not a time unit, but a relative size against team capacity.
      Non-developer analogy: Score for the volume of work
      Everyday form: "This is an 8-point task": not time, but size the team can handle in one go
    • WWP (Weighted Work Point)
      In-talk definition: Self-defined unit: SP × 4 weight adjustments (for cross-stage comparison)
      Non-developer analogy: Estimate adjusted by difficulty
      Everyday form: Distinguishes deep vs shallow work even at the same SP
    • dev done
      In-talk definition: Time from ticket In Progress → MERGED (our team's process is TODO -> In Progress -> In Review -> MERGED -> In QA -> DONE)
      Non-developer analogy: Start-to-finish time of a task
      Everyday form: "From the start of approval flow through final sign-off"
    • Harness 4-axis
      In-talk definition: Hooks · Verification · Skills · Docs: infrastructure that lets AI work safely
      Non-developer analogy: Approval flow + standard forms + manual + auto-notifications
      Everyday form: "Draft → review → approve → deploy" workflow
  25. Story Point → Weighted Work Point

    The same SP has different weight across stages: defining the adjusted unit (WWP)