Environment Design for Coding Agents -- Protecting Quality Through Mechanisms --

2026/04/12, AkarengaLT vol.44, Y. Nakamura

Agenda
1. What is harness engineering?
2. The OpenAI case study
3. Practical proposals
4. Summary

1. What is harness engineering?
▼ The idea of harness engineering
> "I've grown to calling this 'harness engineering.' It is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again."
Mitchell Hashimoto (my-ai-adoption-journey)
Every time an agent makes a mistake, design a mechanism so that the same mistake never happens again.
In other words, a way of thinking about designing the environment in which AI agents run (*1).
*1: The term is used with a range of meanings; this talk proceeds with this definition.
Reference: My AI Adoption Journey

1. What is harness engineering? First of all, what is a harness?

1. What is harness engineering?
▼ What is a harness?
Something like this: horse tack (reins, saddle, bit)
• Example:
◦ A horse can run faster than a human
◦ But without tack, it cannot be controlled
◦ AI agents are the same
◦ They can write code quickly, but without mechanisms they cannot maintain quality

1. What is harness engineering?
▼ AI agents
We want the agent to advance toward the goal of its task.
But it can end up moving in ways we did not intend.
Harness engineering: a mechanism that makes the AI apply its power in the right direction (toward the goal).
Similar ideas already exist:

1. What is harness engineering?
▼ Existing development practices
• Linters (automatic detection of code-rule violations)
• Pre-commit hooks (automatic checks before each commit)
• CI/CD (continuous integration / continuous delivery)
• Branch protection rules
• etc.
All of these share the idea of guaranteeing quality through mechanisms.
→ The underlying idea is similar to that of existing software engineering.

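2. The OpenAI case study
▼ Building an internal beta product with zero lines of hand-written code
> "Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code."
• The article as a whole is about redesigning the engineering process
◦ The team imposed on itself the constraint that humans write no code at all
◦ All code is written by Codex
◦ The human role is designing the environment, making intent explicit, and building feedback loops
Reference: Harness engineering: leveraging Codex in an agent-first world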

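2. The OpenAI case study
Item | Value | Original wording
Hand-written code | 0 lines | "0 lines of manually-written code"
Generated code | about 1 million lines | "on the order of a million lines"
Merged PRs | about 1,500 | "roughly 1,500 pull requests"
PRs per engineer per day | 3.5 | "3.5 PRs per engineer per day"
Time compared with the conventional approach | about 1/10 | "about 1/10th the time"
Reference: Harness engineering: leveraging Codex in an agent-first world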

2. The OpenAI case study
▼ Contents of the OpenAI article (11 sections; the three examined in depth in this talk are marked)
1. We started with an empty git repository
2. Redefining the role of the engineer
3. Increasing application legibility [Proposal 1]
4. We made repository knowledge the system of record
5. Agent legibility is the goal
6. Enforcing architecture and taste [Proposal 3]
7. Throughput changes the merge philosophy
8. What "agent-generated" actually means
9. Increasing levels of autonomy
10. Entropy and garbage collection [Proposal 2]
11. What we're still learning

Agenda
1. What is harness engineering?
2. The OpenAI case study
3. Practical proposals (a deep dive purely from my own perspective)
4. Summary

3. Practical proposals
▼ Proposal 1: Self-verification through browser automation
From the OpenAI article ("Increasing application legibility"):
> "We wired the Chrome DevTools Protocol into the agent runtime and created skills for working with DOM snapshots, screenshots, and navigation. This enabled Codex to reproduce bugs, validate fixes, and reason about UI behavior directly."
• Browser operation, screenshots, and DOM inspection via Chrome DevTools
• The agent can reproduce bugs on its own and verify its own fixes

3. Practical proposals
▼ Proposal 1: Self-verification through browser automation
Example: connecting Chrome DevTools MCP lets you do the same thing.
Reference: MCP Client configuration

3. Practical proposals
▼ Supplement: performance guardrails
Chrome DevTools MCP also has features that can be used for performance tuning.

3. Practical proposals
• performance_start_trace / performance_stop_trace
With these you can obtain:
1. Core Web Vitals measurements
2. The causes of any degradation in each metric
Reference: web.dev > Web Vitals

3. Practical proposals
Measure and investigate performance → identify the bottleneck → fix → measure again (repeat)
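As a sketch of what a performance guardrail could check at the end of each measure → fix cycle, the function below rates Core Web Vitals values against the "good" and "poor" boundaries published on web.dev (LCP 2.5 s / 4 s, INP 200 ms / 500 ms, CLS 0.1 / 0.25). The function names and the idea of flagging every metric that is not "good" are assumptions for illustration. Extracting the values from performance_start_trace / performance_stop_trace output is not shown, because that output format is not covered in this talk.

```python
# Hypothetical performance guardrail (illustration only).
# web.dev Core Web Vitals boundaries: (upper bound of "good", upper bound of "needs improvement").
THRESHOLDS: dict[str, tuple[float, float]] = {
    "LCP": (2500.0, 4000.0),  # Largest Contentful Paint, milliseconds
    "INP": (200.0, 500.0),    # Interaction to Next Paint, milliseconds
    "CLS": (0.1, 0.25),       # Cumulative Layout Shift, unitless score
}


def rate(metric: str, value: float) -> str:
    """Classify one measured value as good / needs improvement / poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= poor:
        return "needs improvement"
    return "poor"


def failing_metrics(measured: dict[str, float]) -> list[str]:
    """Return the metrics that are not 'good', i.e. what the agent should investigate next."""
    return [name for name, value in measured.items() if rate(name, value) != "good"]


if __name__ == "__main__":
    # Hypothetical values typed in by hand, not parsed from a real trace
    measured = {"LCP": 3100.0, "INP": 150.0, "CLS": 0.05}
    print(failing_metrics(measured))  # ['LCP']
```

An agent could be told to keep iterating until this list is empty, which turns "make it faster" into a concrete stop condition.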

3. Practical proposals
▼ Proposal 2: Garbage collection for technical debt
From the OpenAI article ("Entropy and garbage collection"):
> "Codex replicates patterns that already exist in the repository—even uneven or suboptimal ones. Over time, this inevitably leads to drift."
The agent replicates existing patterns, including suboptimal ones. Left unchecked, the code gradually drifts away from what it should be.
Reference: https://openai.com/index/harness-engineering/

3. Practical proposals
▼ Proposal 2: Garbage collection for technical debt
From the OpenAI article ("Entropy and garbage collection"):
> "This functions like garbage collection. Technical debt is like a high-interest loan: it's almost always better to pay it down continuously in small increments than to let it compound."
Technical debt is like a high-interest loan: repaying it continuously in small amounts is better than letting it pile up and paying it all back at once.
Reference: https://openai.com/index/harness-engineering/

3. Practical proposals
▼ Proposal 2: Garbage collection for technical debt
From the OpenAI article ("Entropy and garbage collection"):
> "On a regular cadence, we have a set of background Codex tasks that scan for deviations, update quality grades, and open targeted refactoring pull requests."
The team introduced scheduled tasks that detect deviations and automatically open refactoring PRs.
Reference: https://openai.com/index/harness-engineering/

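3. Practical proposals
Example: the agent reimplements logic each time instead of using the shared utility
• The team has agreed that "date handling goes through `myapp/utils/date.py`" (shared across the team), yet the AI writes similar functions of its own
• Functions with the same role end up scattered across multiple places
• When the spec changes, every one of those places has to be fixed (the logic is not managed in one place)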

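3. Practical proposals
▼ Example: periodic GC that detects deviations and opens fix PRs
Define rules describing what should be checked → scheduled scan to detect deviations (e.g. daily at 9:00) → automatically generate a refactoring PR → review & merge
• Detect scattered helpers → generate a PR that consolidates them into the shared utility
• Instead of cleaning up by hand every week, clean up continuously through a mechanism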
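A minimal sketch of the "scan for deviations" step for the date-utility example: it walks Python sources with the standard `ast` module and flags direct date parsing/formatting calls made outside `myapp/utils/date.py`. Which calls count as a deviation (`DATE_CALLS`), the function name, and the sample sources are hypothetical. A deterministic check like this is only one possible complement to the rule file the scheduled agent reads; it does not replace the agent's judgment when writing the consolidation PR.

```python
import ast

# Module the team agreed to use for all date handling (from the slide)
SHARED_DATE_MODULE = "myapp/utils/date.py"
# Hypothetical rule: these method calls should only appear inside the shared module
DATE_CALLS = {"strptime", "strftime", "fromisoformat"}


def find_date_deviations(sources: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line, call) for date calls made outside the shared utility module."""
    findings = []
    for path, code in sources.items():
        if path == SHARED_DATE_MODULE:
            continue  # the shared module itself is allowed to make these calls
        for node in ast.walk(ast.parse(code, filename=path)):
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in DATE_CALLS
            ):
                findings.append((path, node.lineno, node.func.attr))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical repository contents, keyed by path
    sources = {
        "myapp/utils/date.py": (
            "from datetime import datetime\n"
            "def parse_date(s):\n"
            "    return datetime.strptime(s, '%Y-%m-%d')\n"
        ),
        "myapp/orders/service.py": (
            "from datetime import datetime\n"
            "def order_date(s):\n"
            "    return datetime.strptime(s, '%Y-%m-%d')\n"  # re-implemented helper
        ),
        "myapp/users/service.py": (
            "from myapp.utils.date import parse_date\n"
            "def signup_date(s):\n"
            "    return parse_date(s)\n"
        ),
    }
    print(find_date_deviations(sources))  # [('myapp/orders/service.py', 3, 'strptime')]
```

In a real repository the `sources` mapping would be built by reading files from disk; the findings could then be handed to the scheduled agent as the starting point for a consolidation PR.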

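3. Practical proposals
▼ Example: using Claude Code's `/schedule` command
• Claude Code starts automatically on a schedule you specify (cron)
• After the initial setup, it runs automatically in the cloud (no need to keep a local machine running)
• Usage examples:
◦ Scan the repository every morning at 9:00 and open refactoring PRs
◦ Check documentation freshness every Monday
Reference: Schedule tasks on the web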
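The slide describes `/schedule` as running on a cron schedule. Purely to show how the two usage examples read as standard five-field cron expressions (minute, hour, day of month, month, day of week), here is a tiny matcher. It is not part of Claude Code and says nothing about how `/schedule` is configured; it supports only `*` and single numbers, and the 9:00 time for the Monday job is an assumption, since the slide gives only the day.

```python
from datetime import datetime

# Five-field cron: minute hour day-of-month month day-of-week (0 = Sunday)
EVERY_MORNING_9AM = "0 9 * * *"
EVERY_MONDAY_9AM = "0 9 * * 1"  # assumption: the slide only says "every Monday"


def _field_matches(field: str, value: int) -> bool:
    """Illustrative subset: a field is either '*' or a single integer."""
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """True if `when` falls on a minute the expression would fire.

    Note: real cron ORs day-of-month and day-of-week when both are restricted;
    this subset ANDs every field, which is equivalent for the examples above.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0..Sunday=6; cron: Sunday=0..Saturday=6
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, cron_dow)
    )


if __name__ == "__main__":
    # 2026-04-13 is a Monday
    print(cron_matches(EVERY_MONDAY_9AM, datetime(2026, 4, 13, 9, 0)))  # True
    print(cron_matches(EVERY_MONDAY_9AM, datetime(2026, 4, 14, 9, 0)))  # False
```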

3. Practical proposals
Run refactoring tasks periodically with Claude Code's `/schedule`
• A check runs daily
• A PR is opened if needed

3. Practical proposals
Codex: for scheduled runs, the Codex App > Automations feature looks usable (it is set up with a schedule, a project, and a prompt)

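3. Practical proposals
▼ Proposal 3: Rule enforcement with linters
From the OpenAI article ("Enforcing architecture and taste"):
> "we built the application around a rigid architectural model... These constraints are enforced mechanically via custom linters and structural tests."
• Built a rigid architectural model
• Mechanically enforced dependency directions with linters and structural tests
• Instead of "asking nicely" in documentation, stop violations with tools
Reference: https://openai.com/index/harness-engineering/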

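3. Practical proposals
▼ Proposal 3: Rule enforcement with linters
• What we want to do, for example:
◦ Protect architectural boundaries with a linter
• Library: import-linter
◦ Parses Python import statements
◦ Checks that no import violates the dependency direction between the layers you define
◦ Reference: https://import-linter.readthedocs.io/en/stable/
• Example dependency rule: only dependencies from higher layers to lower layers are allowed; the reverse direction fails immediately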

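3. Practical proposals
▼ Proposal 3: Rule enforcement with linters
toml configuration: the directory that serves as the starting point of the analysis, followed by the dependency layers
┗ listed from top to bottom as higher layer → lower layer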

3. Practical proposals
▼ Example: a simple layered architecture
• Dependency direction: presentation → application → domain → infrastructure (one way only)

3. Practical proposals
▼ NG example: a reverse dependency
Importing the presentation layer from the infrastructure layer → running `lint-imports` fails immediately

3. Practical proposals
▼ OK example: the correct direction
Importing the domain layer from the application layer (correct direction) → the check passes
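To make the NG and OK examples concrete, here is a deliberately simplified stand-in for what `lint-imports` checks: it parses one module's source with `ast` and reports direct imports that point to a higher layer. This is not import-linter's implementation (import-linter builds a full import graph and also catches indirect chains), and all module names are hypothetical.

```python
from __future__ import annotations

import ast

# Highest layer first, matching the dependency direction on the slide
LAYERS = ["myapp.presentation", "myapp.application", "myapp.domain", "myapp.infrastructure"]


def layer_index(module: str) -> int | None:
    """Position of the layer a module belongs to, or None if it is outside the layers."""
    for index, layer in enumerate(LAYERS):
        if module == layer or module.startswith(layer + "."):
            return index
    return None


def imported_modules(code: str) -> list[str]:
    """Absolute module names imported by the given source code."""
    names = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.append(node.module)  # relative imports are skipped in this sketch
    return names


def layer_violations(importer: str, code: str) -> list[str]:
    """Imports in `importer` that point to a higher layer (the reverse direction)."""
    source_layer = layer_index(importer)
    violations = []
    for target in imported_modules(code):
        target_layer = layer_index(target)
        if source_layer is not None and target_layer is not None and target_layer < source_layer:
            violations.append(target)
    return violations


if __name__ == "__main__":
    # NG: infrastructure -> presentation
    print(layer_violations("myapp.infrastructure.repository", "from myapp.presentation.views import render\n"))
    # OK: application -> domain
    print(layer_violations("myapp.application.services", "from myapp.domain.models import Order\n"))
```

The first call prints `['myapp.presentation.views']` and the second prints `[]`; imports within the same layer are allowed here, which is one design choice among several.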

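4. Summary
• Harness engineering:
◦ A way of thinking: design mechanisms so that an agent's mistakes never happen again
◦ The underlying idea is similar to existing software engineering
• The three proposals explored in depth:
1. Self-verification through browser automation (Chrome DevTools MCP)
2. Garbage collection for technical debt (periodic runs with /schedule)
3. Making the agent follow "rules" (linters / CI)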

Appendix

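Appendix
▼ What is garbage collection?
• GC = a mechanism that automatically detects and frees memory that is no longer needed
◦ Programs allocate and use memory while they run
◦ If memory that is no longer used is left alone, a memory leak occurs (unneeded memory is never freed and stays allocated)
Reference: MDN Web Docs Memory management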
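In Python (the code language used elsewhere in this deck; the MDN reference covers JavaScript), most objects are freed by reference counting, and the cyclic garbage collector in the standard `gc` module reclaims objects that only reference each other. The sketch below uses a `weakref` to observe that an unreachable cycle is freed once `gc.collect()` runs; the `Node` class is only an illustration.

```python
import gc
import weakref


class Node:
    """Illustrative object that can point at another Node."""

    def __init__(self) -> None:
        self.other: "Node | None" = None


def cycle_is_collected() -> bool:
    a, b = Node(), Node()
    a.other, b.other = b, a  # reference cycle: reference counts never reach zero on their own
    probe = weakref.ref(a)   # a weak reference does not keep `a` alive
    del a, b                 # no strong references from outside the cycle remain
    gc.collect()             # the cyclic GC finds the unreachable cycle and frees it
    return probe() is None


if __name__ == "__main__":
    print(cycle_is_collected())  # True
```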

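Appendix
▼ Supplement: setting a hard limit on cyclomatic complexity
What is cyclomatic complexity?
• A metric that quantifies how many branches the code has (how complex it is)
◦ The more branches from if statements, for loops, switch statements, and so on, the higher the number
• It can be configured as a limit, just like a linter rule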

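Appendix
• The cyclomatic complexity of this function is 5 (roughly, the number of test cases needed to cover every independent path)
• Reference: when this rule is enabled in ruff, the default limit is 10 (see max-complexity)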
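The function from the slide is not recoverable from this transcript, so here is a hypothetical function that also has a cyclomatic complexity of 5 under the usual McCabe counting that ruff's C901 follows: 1 for the function itself, plus 1 each for the `if`, the `elif`, the second `if`, and the `for` loop. Running something like `ruff check --select C901` on it would report nothing under the default limit of 10.

```python
def shipping_fee(weight_kg: float, express: bool, coupons: list[int]) -> int:
    """Hypothetical fee calculation with cyclomatic complexity 5."""
    fee = 500                  # 1: the function's single entry path
    if weight_kg > 10:         # +1 -> 2
        fee += 1000
    elif weight_kg > 5:        # +1 -> 3
        fee += 500
    if express:                # +1 -> 4
        fee += 800
    for discount in coupons:   # +1 -> 5
        fee -= discount
    return max(fee, 0)         # never charge a negative fee
```

Five tests, one per independent path, is a reasonable starting point: for example a heavy express parcel (`shipping_fee(12, True, [])` gives 2300), a medium parcel with a coupon (`shipping_fee(6, False, [100])` gives 900), and a light parcel whose coupon exceeds the fee (`shipping_fee(1, False, [1000])` gives 0).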

Appendix
▼ What cyclomatic complexity can and cannot measure
Measurable (meta-level items) | Not measurable (domain-level items)
Method-level control-flow complexity | Appropriateness of the architecture (class design, separation of responsibilities)
Number of conditional branches and loops | Validity of the business logic (consistency with requirements)
A rough indicator of testability | Clarity of naming
 | Performance
• Cyclomatic complexity alone does not solve every quality problem
• Use it in combination with other checks (import-linter, mypy, etc.)

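Appendix
▼ Example cyclomatic complexity configuration (ruff)
Annotated items: number of lines in a function, number of conditional branches in a function, the setting itself, enabling the rule
Reference: ruff complex-structure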

References
1. My AI Adoption Journey: https://mitchellh.com/writing/my-ai-adoption-journey
2. Harness engineering: leveraging Codex in an agent-first world: https://openai.com/ja-JP/index/harness-engineering/
3. Claude Code Docs, Schedule tasks on the web: https://code.claude.com/docs/en/web-scheduled-tasks
4. web.dev, Web Vitals: https://web.dev/articles/vitals
5. chrome-devtools-mcp, performance_start_trace: https://github.com/ChromeDevTools/chrome-devtools-mcp/blob/main/docs/tool-reference.md#performance_start_trace
6. Import-Linter: https://import-linter.readthedocs.io/en/stable/
7. Ruff, max-complexity: https://docs.astral.sh/ruff/settings/#lint_mccabe_max-complexity
8. Ruff, complex-structure (C901): https://docs.astral.sh/ruff/rules/complex-structure/
9. MDN Web Docs, Memory management: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Memory_management#garbage_collection
