Process Based on Human-AI Responsibility Separation
BSDD: Boundary Spec Driven Development
What should humans define to entrust autonomous execution to AI?
how to manage the knowledge and information referenced by LLMs.
• This presentation takes a different perspective: when incorporating generative AI into software development, it examines what humans should define and where responsibility should be delegated to AI.
• For production use of LLMs, it is important not only to manage models and data but also to explicitly define the scope entrusted to AI as part of the development process.
About Me
• Expertise: AX Enablement / AI Product Development
• Research theme: Human-AI collaboration and responsibility separation
• Received the DICOMO Outstanding Paper Award last year. This year, I propose BSDD, a development process that delegates autonomous execution to AI.
• AI has made it possible to implement software from natural-language instructions (e.g., vibe coding).
• However, when implementation proceeds without specifications, AI implicitly fills in even design decisions, making it difficult to validate the soundness of the design.
[Figure: natural-language instructions lead AI to complete the scope implicitly ("Why did it choose these fields and this screen?"); the scope is decided by AI]
• In SDD, humans organize specifications such as requirements and design policies, and AI then performs the implementation.
• Making design intent explicit helps AI design and implement in line with human intention.
[Figure: what to build and how to build are organized as specifications (requirements, design policies, implementation policies); the remaining scope is decided by AI]
• To make AI design and implement as expected, people tend to keep adding detailed decisions to specifications, on top of implementation policies and test perspectives.
• As a result, the volume and scope of specifications expand, making the method difficult to embed in a continuous development process.
• Updating specifications, reviewing them, and confirming design validity all become costly.
[Figure: specification → AI implementation not as expected → humans add more details; even a small change can reach 1,000+ lines of specification]
A) Vibe Coding: without specifications, AI may fill in even decisions that humans should make.
B) SDD (Specification-Driven Development): specifications can be structured, but their scope tends to expand as we try to make AI implement exactly as expected.
The core issue is not whether specifications exist, but whether the scope that humans must finalize as specifications is clear.
We propose Boundary Spec Driven Development (BSDD), a development process that clarifies the scope humans should define as specifications and the scope that should be delegated to AI.
• Conventional SDD structures specifications and connects them to AI implementation.
• BSDD (Boundary Spec Driven Development) focuses on the boundary between what humans must define and what AI can be entrusted to do.
Example: screen fields and API contracts are decided by humans; design and implementation are delegated to AI.
[Figure: human-defined scope vs. AI-delegated scope]
• This study targets microservices-based development, in which multiple services and development roles collaborate, starting from a shared product objective.
• Examples: front-end, back-end, design, infrastructure, and other teams or development units collaborating across service boundaries.
• In such development, specifications that require agreement across services are especially important.
• The BSDD process is divided into two layers, starting from the product objective.
• Boundary Definition Layer (Human): defines specifications that require agreement across services as boundary specifications.
• Execution Layer (AI): concretizes design and implementation policies for each service based on the boundary specifications.
1. Identify requirements from product objectives.
2. With AI support, extract and organize specifications that require agreement across services.
3. Humans finalize them as boundary specifications and use them as input to the execution layer.
Before the execution layer, humans take responsibility for identifying the specifications that require agreement.
• Items are included in boundary specifications based on whether agreement across services is required.
• This is not a fixed rule; what is included varies with the impact of agreement in each project.
Examples included as boundary specifications: background / objective, use cases, screen flow / transitions, API input/output, external integration data.
These boundary specifications are the “system skeleton”: the specifications that must be agreed across services.
Examples delegated to the Execution Layer: internal design, implementation policies, test cases, task lists, internal model structure.
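The inclusion criterion above can be made concrete with a small sketch. It assumes each candidate item is tagged with the stakeholders (services, roles, or teams) that must agree on it; `SpecItem`, `is_boundary_spec`, and `split_scope` are hypothetical names. Because the slide stresses that the criterion is not a fixed rule, a real project would still review the resulting split by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecItem:
    name: str
    # Services, roles, or teams that must agree on this item (hypothetical field).
    stakeholders: frozenset[str]


def is_boundary_spec(item: SpecItem) -> bool:
    """Stated criterion: an item that needs agreement between more than
    one stakeholder belongs in the boundary specification."""
    return len(item.stakeholders) > 1


def split_scope(items: list[SpecItem]) -> tuple[list[str], list[str]]:
    """Split item names into (boundary specification, Execution Layer)."""
    boundary = [i.name for i in items if is_boundary_spec(i)]
    execution = [i.name for i in items if not is_boundary_spec(i)]
    return boundary, execution


items = [
    SpecItem("API input/output", frozenset({"frontend", "backend"})),
    SpecItem("Screen flow / transitions", frozenset({"design", "frontend"})),
    SpecItem("Internal model structure", frozenset({"backend"})),
    SpecItem("Task lists", frozenset({"backend"})),
]
boundary, execution = split_scope(items)
print(boundary)   # ['API input/output', 'Screen flow / transitions']
print(execution)  # ['Internal model structure', 'Task lists']
```

Counting stakeholders rather than services keeps the same rule usable in single-service work, where an item may still need agreement with a product owner or another role.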
1. User Story Map: agree on the value to deliver.
2. Use Case: agree on behavior across services.
• Structuring key boundary specifications visually reduces cognitive load and enables early agreement.
These are boundary specifications because they affect multiple services:
3. Wireframe (Screen): agree on screen composition and input fields.
4. API Interface (API Contract): agree on inputs and outputs between services.
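An API contract works as a boundary specification because each service can check itself against it independently. The sketch below is a minimal stand-in, assuming the contract is a plain mapping from field name to Python type; in practice the contract would more likely be written in OpenAPI and checked with a dedicated validator. `CREATE_ORDER_RESPONSE` and `contract_violations` are hypothetical, and treating extra fields as violations is a deliberately strict assumption.

```python
# Hypothetical boundary spec for one endpoint's response: field -> expected type.
CREATE_ORDER_RESPONSE: dict[str, type] = {
    "order_id": str,
    "status": str,
    "total": int,
}


def contract_violations(contract: dict[str, type], payload: dict) -> list[str]:
    """Return human-readable differences between a payload and the contract."""
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}, got {actual}"
            )
    # Strict mode (an assumption): extra fields also change the agreed contract.
    for field in sorted(payload.keys() - contract.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


good = {"order_id": "o-1", "status": "CREATED", "total": 1200}
bad = {"order_id": "o-1", "total": "1200", "note": "gift"}
print(contract_violations(CREATE_ORDER_RESPONSE, good))  # []
print(contract_violations(CREATE_ORDER_RESPONSE, bad))
```

For `bad`, the check reports the missing `status`, the string-typed `total`, and the extra `note` field: each is a change that other services would need to agree on.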
• In the execution layer, implementation proceeds for each service, starting from boundary specifications.
• For each service, AI proposes an execution plan, which humans verify.
• After approval, AI proceeds in parallel from code generation through PR creation according to the plan.
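The approval gate in the execution layer might look like the following sketch. It assumes one plan per service and a hypothetical `execute` callable that stands in for the AI agent and returns a PR identifier; `run_execution_layer` and `fake_agent` are illustrative names. Only services whose plan a human has approved are run, in parallel threads, and the rest are reported as waiting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_execution_layer(
    plans: dict[str, list[str]],
    approved: set[str],
    execute: Callable[[str, list[str]], str],
) -> tuple[dict[str, str], list[str]]:
    """Run each approved service plan in parallel.

    plans:    service name -> plan steps proposed by AI
    approved: services whose plan a human has reviewed and approved
    execute:  hypothetical agent call that carries out one plan, returns a PR id
    """
    ready = {svc: steps for svc, steps in plans.items() if svc in approved}
    waiting = sorted(set(plans) - approved)  # still needs human review
    with ThreadPoolExecutor() as pool:
        futures = {svc: pool.submit(execute, svc, steps) for svc, steps in ready.items()}
        results = {svc: fut.result() for svc, fut in futures.items()}
    return results, waiting


def fake_agent(service: str, steps: list[str]) -> str:
    # Stand-in for the AI agent: pretend each plan ends in one PR.
    return f"PR({service}, {len(steps)} steps)"


plans = {"frontend": ["screen"], "backend": ["api contract", "data layer", "api impl"]}
results, waiting = run_execution_layer(plans, {"backend"}, fake_agent)
print(results)  # {'backend': 'PR(backend, 3 steps)'}
print(waiting)  # ['frontend']
```

Letting approved services proceed without waiting for the rest is a design choice of this sketch; a team could just as well require every plan to be approved before any work starts.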
• In BSDD, AI does not implement boundary specifications directly. Instead, it organizes them into an execution plan made of “implementation units that AI can handle easily.” For an API change, for example, the plan defines the implementation policy, work sequence, and validation method for each layer.
• After approval, AI uses this plan as an “implementation map,” proceeding step by step from implementation to PR creation.
Execution plan example (API change):
1. API Contract. Policy: follow boundary spec; Skill: OpenAPI; Verify: OpenAPI lint
2. Data Layer. Policy: follow existing structure; Skill: migration; Verify: migration test
3. API Implementation. Policy: follow existing layers; Skill: usecase; Verify: unit test / lint
→ PR
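The execution-plan example can be read as an ordered list of steps, each with a policy, a skill, and a verification method. This sketch assumes verification is a callable returning True or False (standing in for OpenAPI lint, a migration test, or unit tests) and that a failed check stops the run before any PR is created. `PlanStep`, `run_plan`, and the stop-on-failure behavior are assumptions, not BSDD's prescribed mechanics.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PlanStep:
    layer: str                  # e.g. "API Contract"
    policy: str                 # e.g. "follow boundary spec"
    skill: str                  # e.g. "OpenAPI"
    verify: Callable[[], bool]  # stand-in for "OpenAPI lint", "migration test", ...


def run_plan(steps: list[PlanStep], implement: Callable[[PlanStep], None]) -> list[str]:
    """Implement steps in order; stop at the first failed verification."""
    log = []
    for step in steps:
        implement(step)  # hypothetical: the agent edits code for this layer
        if not step.verify():
            log.append(f"{step.layer}: verification failed, stop before PR")
            return log
        log.append(f"{step.layer}: ok")
    log.append("create PR")
    return log


plan = [
    PlanStep("API Contract", "follow boundary spec", "OpenAPI", lambda: True),
    PlanStep("Data Layer", "follow existing structure", "migration", lambda: True),
    PlanStep("API Implementation", "follow existing layers", "usecase", lambda: True),
]
print(run_plan(plan, implement=lambda step: None))
# ['API Contract: ok', 'Data Layer: ok', 'API Implementation: ok', 'create PR']
```

Ordering the steps contract first, then data, then implementation means a contract mismatch is caught before any downstream code is written.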
• The harness is a foundational technology that has attracted increasing attention for supporting autonomous AI execution.
• In BSDD, the execution layer provides AI with product- and service-specific knowledge, constraints, and verification methods.
• This enables AI to implement and validate steadily in accordance with the execution plan.
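Of the three things the harness provides, constraints are the easiest to show in code. The sketch below assumes boundary specifications live under a path such as `specs/boundary/` (a hypothetical layout) and routes any change that touches them back to humans, echoing the rule that a deviation affecting an agreed matter returns to the Boundary Definition Layer. The `Harness` class, its fields, and the example paths and commands are illustrative only.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Harness:
    # All three fields are hypothetical examples of what a harness might hold.
    knowledge: list[str] = field(default_factory=list)  # docs the agent reads first
    protected: list[str] = field(default_factory=list)  # glob patterns the agent must not edit
    checks: list[str] = field(default_factory=list)     # verification commands to run

    def touched_boundary(self, changed_files: list[str]) -> list[str]:
        """Changed files that match a protected (boundary spec) pattern."""
        return [f for f in changed_files if any(fnmatch(f, p) for p in self.protected)]

    def route(self, changed_files: list[str]) -> str:
        """Decide where a proposed change goes next."""
        if self.touched_boundary(changed_files):
            return "Boundary Definition Layer (human confirmation)"
        return "Execution Layer (continue)"


harness = Harness(
    knowledge=["docs/architecture.md"],
    protected=["specs/boundary/*"],
    checks=["pytest", "ruff check ."],
)
print(harness.route(["src/orders/usecase.py"]))
print(harness.route(["src/orders/usecase.py", "specs/boundary/openapi.yaml"]))
```

The first call stays in the Execution Layer; the second is routed to humans because it edits a boundary specification.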
• In BSDD, the steps that humans must review are consolidated into boundary specifications, execution plans, and PRs.
• As AI handles the remaining elaboration, humans can focus on critical decisions.
• Conventional SDD: to have AI implement as expected, specifications tend to become increasingly detailed.
• BSDD: the scope written by humans is consolidated into boundary specifications, while AI concretizes the rest.
[Figure: In conventional SDD, the scope humans tend to write covers requirements, basic design, detailed design, implementation policy, test perspectives, and internal design. In BSDD, the human writing scope is the boundary specification, and the AI concretization scope covers detailed design, implementation policy, test perspectives, and internal design.]
• By consolidating the scope of specification writing, we reduced both specification volume and specification effort.
• For implementations based on boundary specifications, we observed no critical incidents or rollbacks within the observed scope.
• These are preliminary observations, but they indicate feasibility for production use.
| Item | Target | Conventional SDD (median) | BSDD (median) | Reduction |
|---|---|---|---|---|
| Specification volume | 20 cases | 1,213 lines | 228 lines | 84.5% |
| Specification planning effort | 8 cases | 4 h | 1 h | 75% |
| Severe incidents / rollbacks | 8 cases | n/a | None | n/a |
*Initial confirmation of applicability; not a strict causal inference.
[Figure: bar chart of specification planning effort (h, median): SDD 4 h, BSDD 1 h]
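The Reduction column can be read as the relative decrease from conventional SDD to BSDD. As a worked check, assuming reduction is defined as one minus the ratio of the two values:

```latex
\[
  R = 1 - \frac{x_{\mathrm{BSDD}}}{x_{\mathrm{SDD}}}
\]
\[
  R_{\text{effort}} = 1 - \frac{1\,\mathrm{h}}{4\,\mathrm{h}} = 0.75,
  \qquad
  R_{\text{volume}} = 1 - \frac{228}{1213} \approx 0.812
\]
```

The effort figure matches the table. Applying the same formula to the two volume medians gives about 81.2% rather than 84.5%; the reported value was presumably computed per case (for example, as the median of per-case reductions), which in general differs from the reduction between medians. The slides do not state the formula, so this reading is an assumption.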
• In BSDD, humans define the boundary specifications that require agreement and delegate elaboration to the AI Execution Layer.
• As a result, human review steps and the scope of specification writing are consolidated, making the development process more efficient while preserving design intent.
• The significance of this research lies in demonstrating a development process in which, by defining the responsibility boundary between humans and AI, humans concentrate on critical decisions and AI handles elaboration.
• Future work will make the evaluation methods more rigorous and expand the application scope to test reproducibility and effectiveness.
What humans should define is the “boundary specification” that forms the system's skeleton.
Beyond that, we must establish mechanisms that enable AI to execute autonomously (e.g., Harness, Loop Engineering).
This is the separation of responsibilities between humans and AI that supports AI autonomy.
In this evaluation, BSDD was applied under the following policy to maintain responsibility separation between humans and AI.
1. Humans focus on defining boundary specifications that require agreement across services.
2. Concretization completed within a service is delegated to the AI execution layer.
To maintain this separation, humans do not keep stepping in to make decisions on AI's behalf. Instead, the harness (knowledge, constraints, and validation) is continuously improved so that AI can execute reliably.
Appendix: Q&A
Q. … the harness?
• It includes harness preparation, but that is not the main purpose. BSDD clarifies what humans decide and what AI is entrusted with, and positions the harness as the mechanism that supports execution.
Q. Does BSDD lower quality?
• It is not intended to reduce quality. BSDD retains the emphasis on reviewing specifications, execution plans, and PRs; its aim is to clarify the points that humans must review.
Q. How does it differ from AI development support tools offered by companies?
• Such tools support AI execution. BSDD is a framework that comes before them, organizing what humans decide and where they delegate to AI.
Q. Wouldn't specifications be unnecessary if AI summarizes the code after implementation?
• Code summaries help explain implementation results. However, they can hardly capture what was agreed beforehand and why decisions were made, so BSDD organizes boundary specifications before implementation.
Q. Isn't deciding boundary specifications subjective?
• The criterion is whether agreement across services is required. Screens, APIs, roles, data, and business rules that affect multiple stakeholders are treated as boundary specifications.
Q. What types of development can it apply to?
• It primarily targets web development in which multiple services interact through APIs and similar interfaces. Applicability to monoliths and greenfield development is future work.
Q. What if AI does not implement as expected?
• Rather than having people implement the work by hand, revise the execution foundations such as rules, knowledge, and skills. If a deviation affects an agreed matter, return to the Boundary Definition Layer for confirmation.
Q. Do data models belong in boundary specifications?
• It depends. Business concepts, states, and API responses that affect multiple services belong there. Table definitions and detailed internal model structures are, in principle, elaborated by AI in the Execution Layer, with their validity checked in the execution plan.
Q. Can it be applied to single-service development?
• Yes. Even in a single service, items that require agreement with others, such as product objectives, behavior, or acceptance criteria, are treated as boundary specifications. Design and implementation details completed inside the service are, in principle, delegated to the AI Execution Layer.
Q. Is the quality evaluation sufficient?
• At present, it is an initial evaluation: we verify that external behavior conforms to the boundary specifications and that no critical incidents or rollbacks have occurred. Internal quality, maintainability, review effort, and defect density remain areas for future evaluation.
Q. How will the evaluation be expanded?
• We plan to examine defect density, number of review comments, review effort, complexity, changeability, and post-release incidents, assessing effects on quality, maintainability, and operational quality from multiple perspectives.
Q. Why use agreement across services as the boundary?
• Because items that affect multiple roles or services must be agreed upon under human accountability. This is why agreement across services is the criterion for boundary specifications.
Q. What if boundary specifications are incomplete?
• If external behavior or stakeholder agreement is affected, return to the Boundary Definition Layer for confirmation. Do not force the adjustment solely within the Execution Layer.
Q. Isn't the execution plan a detailed design?
• No. It is an implementation plan used to verify the units in which AI will proceed. Humans check whether the implementation approach and work split are appropriate.
Q. Can it be used before the harness is mature?
• Yes, but its effectiveness will be limited. Improving the harness remains important so that AI can execute reliably.
Q. What are the prerequisites for the evaluation?
• Specification volume compares conventional SDD documents with BSDD boundary specifications across the same 20 cases. Effort and quality were assessed in 8 real cases, each adding a single API. The results are not a rigorous causal estimate but an initial evaluation to confirm production feasibility.