Can AI tools really deliver 10× productivity? - A backend team's 10-month record — 9×+ productivity built by redefining environment · verification · work definition

2026.06.29
LY Corporation
Lyn Heo | LINE Plus
Hongjoong Shin | LINE Plus
Jinook Hong | LINE Plus

Who Are We? Why 10x? 10x Faster

Who Are We?
Home Contents Intelligence Experience Dev, LINE Plus
• Hongjoong Shin, Backend Server Developer
• Lyn Heo, Backend Server Developer
• Jinook Hong, Backend Server Developer

Who Are We / What We're Building
Home Contents Intelligence Experience Dev / LINE Plus
Contents Spectrum Project: Home Content Intelligence Platform using AI
① Content search
② Query by self-classified metadata
③ Production/consumption metric analytics

Agenda
• Act 1 (10 mins): What We Measured
• Act 2 (8 mins): How the 3 Conditions Were Built
• Act 3 (6 mins): Here's how your team can replicate this in 4 weeks
• Appendix: Glossary, SP -> WWP, WWP Formula, etc.

Act 1 What We Measured

Glossary: 5 core terms repeated throughout
• SP (Story Point): work size as a team-agreed score (not time)
• WWP (Weighted Work Point): SP × 4 weight adjustments (cross-stage unit)
• dev done: Dev ticket Started → Code Merged
• Context Engineering: the environment that explains us to AI (Breakthrough 1)
• Harness Engineering: the infrastructure to verify and control AI output (Breakthrough 2)

Weighted Work Point = SP × (Structural Impact + Code Lines + Spec Depth + Verification)
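As a minimal sketch of the WWP formula, the function below multiplies SP by the sum of the four adjustments. The talk does not give the numeric scale of the weights, so the class name, the parameter names, and the example values are hypothetical and only illustrate how the same SP can yield a different WWP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """The four adjustments named in the WWP formula (scales are assumed)."""
    structural_impact: float
    code_lines: float
    spec_depth: float
    verification: float

    def total(self) -> float:
        return (self.structural_impact + self.code_lines
                + self.spec_depth + self.verification)


def weighted_work_point(sp: float, weights: Weights) -> float:
    """WWP = SP x (Structural Impact + Code Lines + Spec Depth + Verification)."""
    if sp < 0:
        raise ValueError("SP must be non-negative")
    return sp * weights.total()


# Hypothetical weights: the same 3 SP ticket scored as shallow vs. deep work.
shallow = weighted_work_point(3, Weights(0.25, 0.25, 0.25, 0.25))
deep = weighted_work_point(3, Weights(0.5, 0.25, 0.5, 0.25))
print(shallow, deep)
```

With these made-up weights the deeper ticket scores higher than the shallow one even though both are 3 SP, which is the cross-stage comparison the unit is meant to support.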

Measurement Scope: the scope of what we measured
• JIRA: 384 tickets
• GIT: 862 commits
• PR: 460 merged
Stages: Baseline 25.07 ~ 08 · Prompt 25.09 ~ 10 · Context 25.11 ~ 26.01 · Agentic 26.02 ~ 03 · Harness 26.04 ~

Working-Style Evolution
Each stage includes all prior stages and adds new capability.
• 25.07-08 Baseline: manual coding
• 25.09-10 Prompt: help via single prompts
• 25.11-26.01 Context: delegate with context ★ Breakthrough 1
• 26.02-03 Agentic: autonomous agent
• 26.04- Harness: delegation + verification infra ★ Breakthrough 2
Cumulative evolution: each stage keeps all prior abilities.

We experienced a ×9.24 productivity boost
Baseline → Harness · per-person hourly output approached 10× over 10 months
• SAME-WEIGHT TASK DEV (time for 1 unit of work): 4.3d → 1.5d, ×2.79 faster
• PER-PERSON THROUGHPUT (output per month): 14.1 → 46.7, ×3.31 more output
• PER-PERSON HOURLY OUTPUT (efficiency: throughput × speed): 0.41 → 3.78, ×9.24 total boost
Speed 2.79 × Throughput 3.31 = ×9.24
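The composition of the headline figure can be checked with a few lines of arithmetic. This is only a sketch using the rounded numbers printed on the slide. Ratios recomputed from rounded values can land slightly off the printed multipliers, presumably because the slide was computed from unrounded data.

```python
# Rounded values as printed on the slide.
speed_gain = 2.79        # same-weight task dev time, 4.3d -> 1.5d
throughput_gain = 3.31   # per-person monthly output, 14.1 -> 46.7

# The slide defines the total boost as speed x throughput.
total_boost = speed_gain * throughput_gain

# Cross-check against the per-person hourly output figures (0.41 -> 3.78).
hourly_ratio = 3.78 / 0.41

print(f"speed x throughput  = {total_boost:.2f}")
print(f"hourly output ratio = {hourly_ratio:.2f}")
```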

Monthly Work-Efficiency Trends: all indicators move in the same direction

What Changed
The real meaning is that the size of problems the team can handle has changed.
BEFORE
• Backend routine work only
• No frontend · partial tests
• DoD vague: median 1
AFTER
• All AI output auto-verified
• Large domains delegated
• DoD refined: median 1 → 3
Not only speed: the work definition was refined along with it.
DoD = Definition of Done

Act 2 How the 3 Conditions Were Built

The 3 Conditions That Made ×10 Possible
Context Engineering · Harness Engineering · Work Definition Refinement
• Context Engineering: the environment that explains our team and project to AI
• Harness Engineering: the infrastructure that verifies and controls AI output
• Work Definition Refinement: clarifying requirements and Definition of Done via AI

How We Worked Changed at Two Breakthroughs
Baseline 25.07 ~ 08 · Prompt 25.09 ~ 10 · Context 25.11 ~ 26.01 · Agentic 26.02 ~ 03 · Harness 26.04 ~
★ Breakthrough 1: Context Explosion (25.11)

Breakthrough 1 - Context Engineering
Project Understanding & Structure/Work-History Documentation
CLAUDE.md (committed 2025.11)

# Project Overview
Contents-spectrum is a Home & VOOM Contents Intelligence Search Platform built with Kotlin/Spring Boot, multi-module Gradle. Provides search, CMS, data processing.

# Architecture (7 modules)
• spectrum-share: Common types/utils
• spectrum-storage-api: MySQL/ES/IU storage
• spectrum-search-api, cms-api, consumer, batch, iu-api...
gRPC (internal) + REST (external)

# Code Conventions
camelCase / PascalCase, 4-space indent
@Configuration for APIs/gRPC/Interceptors
@RestControllerAdvice global exceptions
kotlinx.serialization (docs only) / Jackson
@GrpcService + proto Request/Response
Kotest framework for testing

# Commit Message Format
type(scope): [TICKET-123] subject
types: feat|fix|docs|style|refactor|perf|test|build|ci|chore

# Module Communication
Internal: gRPC + HMAC authentication
External: REST APIs
Async: Kafka (event-driven)
Storage: MySQL per module + shared connection utils

# Skill commands (Work History)
Slash commands auto-track work history:
/dev-start : ticket/branch start. Write dev plan document.
/pr-suggest : PR body auto-generate
→ AI knows every session's stage
→ Work history stays as code + docs
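The commit message format listed in the CLAUDE.md is regular enough to check mechanically. The following is a sketch of one way to do that in Python. The regular expression, the function name, and the assumed shapes of the scope (lowercase letters, digits, hyphens) and ticket key (uppercase project key, hyphen, digits) are illustrative and not taken from the team's tooling.

```python
import re

# Commit types listed in the CLAUDE.md excerpt.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor",
                "perf", "test", "build", "ci", "chore")

# Target shape: type(scope): [TICKET-123] subject
# Assumption: scope is lowercase letters, digits, or hyphens; the ticket key
# is an uppercase project key followed by a hyphen and a number.
COMMIT_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"\((?P<scope>[a-z0-9-]+)\): "
    r"\[(?P<ticket>[A-Z][A-Z0-9]*-\d+)\] "
    r"(?P<subject>\S.*)$"
)


def is_valid_commit_message(first_line: str) -> bool:
    """Return True if the line follows type(scope): [TICKET-123] subject."""
    return COMMIT_PATTERN.match(first_line) is not None


print(is_valid_commit_message("feat(search-api): [TICKET-123] add metadata filter"))
print(is_valid_commit_message("update stuff"))
```

A check like this could run from a commit hook or CI step, so the convention is enforced rather than only documented.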

Context Effect
Comparison: Prompt-stage average (2025.09 ~ 10) → Context-stage average (2025.11 ~ 2026.01)
• MONTHLY TICKETS DONE (higher is better): 25 → 32, +7 (27% up)
• 3SP DEV DONE, DAYS (lower is better): 4.6 → 4.3, -0.3 (7% down)
• MONTHLY SP DONE (higher is better): 81 → 86, +5 (6% up)
• MONTHLY LINES OF CODE, k (neutral): 32.9 → 20.2, -12.7 (39% down)
Less code (−39%), more tickets (+27%): right code, not more code.

Why Verification Infrastructure Was Needed: The 60-file Case
Discovery + Impact
• ~60 files, out-of-scope
• Caught in PR review
• Production impact: 0
Recovery
• Revert -> remerge
• Diff + test check
• 1 person-day
Lessons
• CLAUDE.md not enough
• Verify in infrastructure
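One way to move this kind of check from a document into infrastructure is a small guard that compares the files a change touches against the scope agreed for the ticket. The sketch below is hypothetical: the function name, the file paths, and the idea of declaring scope as path prefixes are assumptions for illustration, not the team's actual hook.

```python
from typing import Iterable, List


def out_of_scope_files(changed_files: Iterable[str],
                       allowed_prefixes: Iterable[str]) -> List[str]:
    """Return the changed files that fall outside every allowed path prefix."""
    prefixes = tuple(allowed_prefixes)
    return [path for path in changed_files if not path.startswith(prefixes)]


# Hypothetical ticket scope and change set.
allowed = ["spectrum-search-api/", "spectrum-share/"]
changed = [
    "spectrum-search-api/src/main/kotlin/SearchService.kt",
    "spectrum-share/src/main/kotlin/Types.kt",
    "cms-api/src/main/kotlin/CmsController.kt",
]

violations = out_of_scope_files(changed, allowed)
if violations:
    print("Out-of-scope changes, needs human review:")
    for path in violations:
        print("  " + path)
```

Run before a PR is opened, a guard like this would flag a wide out-of-scope change automatically instead of relying on a reviewer to notice it.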

How We Worked Changed at Two Breakthroughs
Baseline 25.07 ~ 08 · Prompt 25.09 ~ 10 · Context 25.11 ~ 26.01 · Agentic 26.02 ~ 03 · Harness 26.04 ~
★ Breakthrough 1: Context Explosion (25.11)
★ Breakthrough 2: Verification Infra Explosion (26.03)

Harness Engineering - 4-Axis at Spectrum
Verification Infrastructure Applied
① Workflow Tracking: 7-stage tracking (Ideation - Planning - Development - Verification - Delivery - Review - Release)
② Verification Pipeline: multi-agent review
③ Skills / Commands: repetition -> commands
④ Automatic Hooks: Pre · Post · Stop
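The 7-stage workflow tracking can be pictured as an ordered enumeration in which a ticket moves forward one stage at a time. This is an illustrative sketch only. The class and function names are hypothetical, and the talk does not say whether stages may be skipped or revisited, so the no-skipping rule here is an assumption.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The seven stages named on the slide, in order."""
    IDEATION = 1
    PLANNING = 2
    DEVELOPMENT = 3
    VERIFICATION = 4
    DELIVERY = 5
    REVIEW = 6
    RELEASE = 7


def advance(current: Stage) -> Stage:
    """Move to the next stage (assumed rule: no skipping, RELEASE is final)."""
    if current is Stage.RELEASE:
        raise ValueError("RELEASE is the final stage")
    return Stage(current + 1)


# Walk a ticket through every stage and record the path it took.
stage = Stage.IDEATION
history = [stage]
while stage is not Stage.RELEASE:
    stage = advance(stage)
    history.append(stage)

print(" - ".join(s.name.title() for s in history))
```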

Harness Engineering Result
Comparison: Agentic-stage average (2026.02 ~ 03) → Harness-stage average (2026.04 ~)
• MONTHLY TICKETS DONE (higher is better): 33.5 → 78, +44.5 (133% up)
• MONTHLY COMMITS (higher is better): 110 → 222, +112 (102% up)
• MONTHLY SP DONE (higher is better): 129 → 268, +139 (108% up)
• MONTHLY LINES OF CODE, k (neutral): 44.6 → 105.7, +61.1 (137% up)
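The deltas and percentages for these four metrics can be recomputed from the before/after averages. A minimal sketch of that arithmetic, using only the before/after values printed on the slide (the helper name is hypothetical):

```python
def change(before: float, after: float) -> tuple:
    """Return (absolute delta, percent change rounded to a whole number)."""
    delta = after - before
    percent = round(delta / before * 100)
    return round(delta, 1), percent


# Agentic-stage average -> Harness-stage average, as printed on the slide.
metrics = {
    "monthly tickets done": (33.5, 78),
    "monthly commits": (110, 222),
    "monthly SP done": (129, 268),
    "monthly lines of code (k)": (44.6, 105.7),
}

for name, (before, after) in metrics.items():
    delta, percent = change(before, after)
    print(f"{name}: {before} -> {after}, {delta:+} ({percent}% up)")
```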

Large-Task Processing Time Reduced ×3
Harness Engineering Gain
① Pre-spec agreement (committee)
② Verification automation (8 agents)
③ Skills / Commands standardization
④ Hooks guard (safety net)
⚡ ×3 faster
Result: 8 SP large-task dev completed, Baseline 22.1 days → Harness 7.3 days · Large tasks see the biggest reduction

Act 3 Here's how your team can replicate this in 4 weeks

“I can see the value, but how should our team get started?”

Week 1 · Project Understanding
Risk: very low · No code changes: just teach AI about your team
Est. investment: 2 ~ 4 hours
Why safe: You only teach AI about your team. Zero code changes, fully read-only.
Examples
• Write one CLAUDE.md: briefly define team conventions (10~15 lines)
• Contextual Q&A: ask AI "What does this project do?" or "Show me this function's call path"
• Structural Alignment: organize module structure and established conventions with AI
No code changes

Week 2-3 · Safe Areas + Skill Adoption
Risk: low · Apply AI to text/metadata work and codify repetition
Est. investment: 10 ~ 20 hours
Why safe: Output is text/metadata only. Zero impact on production code.
Examples
• Auto-generate docs and READMEs
• Auto-generate commit messages
• Jira comments and status changes
• Auto-generate PR descriptions
• Codify 1~2 repetitive tasks as /skill

Week 4 · Review → Approve Workflow
Risk: medium · AI plans and codes, humans approve
Est. investment: 20 ~ 40 hours
Why safe: Two human gates + small PR scope. Start with 1~2 small code PRs.
The Flow
① Requirements gathered (human + AI)
② AI drafts the plan → human approval
③ AI develops → human approval
④ Code review → merge
Human Approval Always Required!

Adoption Conditions: Recommended / Our team / Hard cases
DIMENSION | Recommended | Our Team | Hard Cases
Team Size | 2~10 | 3 -> 5 -> 7 | 1 or 50+?
Module Structure | Multi-module / clear boundaries | 7 modules | Single monolith / unclear boundaries
VCS / Tickets | Git PR, Jira/Linear, etc. | Git, Jira | No PR / ticket workflow
AI Tool Access | Frontier LLM access | Claude (sonnet, opus), Codex (5.4, 5.3-codex) | No enterprise contract / frontier model use restricted
Review Culture | PR review established | Human Code Review, AI Code Review | Direct push

AI 10x does not come from adopting AI tools. It is the result of redesigning how we work.

Who Are We? 10x Faster

AI is not just a tool. It is a team member.

Thank you

Appendix

Glossary: Non-developer Analogies
Developer terms from the talk mapped to non-developer analogies

CLAUDE.md
• In-talk definition: One-page document explaining the project to AI
• Non-developer analogy: Company manual / new-hire onboarding doc
• Everyday form: "Here's what our company is and the rules"

AI coding tool
• In-talk definition: LLM agent that helps write/modify/verify code (Claude Code, Codex, etc.)
• Non-developer analogy: Fast, precise junior hire
• Everyday form: Fast and accurate at assigned tasks; humans provide context and judgment

Skill / Slash command
• In-talk definition: Bundle of commands standardizing repetitive work
• Non-developer analogy: Work macro / checklist
• Everyday form: Team standard like a "meeting-notes template"

SP (Story Point)
• In-talk definition: Team-agreed score (1·2·3·5·8·13…) for the scope/size of a piece of work. Not a time unit, but a relative size against team capacity.
• Non-developer analogy: Score for the volume of work
• Everyday form: "This is an 8-point task": not time, but size the team can handle in one go

WWP (Weighted Work Point)
• In-talk definition: Self-defined unit: SP × 4 weight adjustments (for cross-stage comparison)
• Non-developer analogy: Estimate adjusted by difficulty
• Everyday form: Distinguishes deep vs shallow work even at the same SP

dev done
• In-talk definition: Time from ticket In Progress → MERGED (our team's process is TODO -> In Progress -> In Review -> MERGED -> In QA -> DONE)
• Non-developer analogy: Start-to-finish time of a task
• Everyday form: "From the start of approval flow through final sign-off"

Harness 4-axis
• In-talk definition: Hooks · Verification · Skills · Docs: infrastructure that lets AI work safely
• Non-developer analogy: Approval flow + standard forms + manual + auto-notifications
• Everyday form: "Draft → review → approve → deploy" workflow

Story Point → Weighted Work Point
The same SP has different weight across stages, so we define the adjusted unit (WWP).

WWP Formula: how the result moves when weights change

ROI of the 4-Week Plan
5-person team · 6-month horizon · conservative assumptions
