Spec-Driven Development with AI Agents (Workshop, May 2026)

From Spec to Screen: Spec-Driven Development with AI Agents (@antonarhipov)

Context

Expectations management
* Application: CSV -> DB -> Web -> UI
* Use: AI agents
* Learn: Spec-Driven Development
* Side effect: explore IntelliJ IDEA
github.com/antonarhipov/sdd-workshop
gist.github.com/antonarhipov

Spec-Driven Development (SDD)
* Distill requirements
* Generate plan
* Break down into tasks
* Implement
What people generally mean by "SDD"

Tools and methods: prompts and commands, agent skills, subagents, MCP, AGENTS.md
Context window
SDD? Toolkits: BMAD, spec-kit, Kiro, openspec, agent-os, intent-integrity-chain

The intuition for SDD: get the thinking right, and any tool becomes your instrument.

Implement batch job with Spring Batch and JDBC to import temperature data from CSV files into MySQL database.
How many clarifying questions can you ask?

proposal.md
Implement batch job with Spring Batch and JDBC to import temperature data from CSV files into MySQL database:
1. Extract "name", "datetime", and "temp" columns from the csv file, ignore other columns.
2. The "name" and "datetime" columns make a unique pair.
3. The duplicate entries should be reported and ignored.
4. Print the summary, how many records were inserted to the database, and how many duplicates were detected.
5. Use Testcontainers for integration testing (must not use H2).
6. Use Java 21 compatible features. Use Java records instead of POJOs for data.
Is this detailed enough now?
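To make points 2 to 4 of the proposal concrete, here is a minimal sketch in plain Java 21, without Spring Batch or JDBC. The class, record, and method names are hypothetical, the sample rows are made up, and the in-memory list only stands in for the database table. It is one possible reading of the proposal (the first occurrence of a key wins), not the workshop's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TemperatureImportSketch {

    // One CSV row reduced to the three columns the proposal asks for.
    record Reading(String name, String datetime, double temp) {}

    // The ("name", "datetime") pair that must be unique.
    record ReadingKey(String name, String datetime) {}

    // The summary requested in point 4 of the proposal.
    record Summary(int inserted, int duplicates) {}

    // Keeps the first reading per key; later readings with the same key
    // are reported and ignored. "target" stands in for the database table.
    static Summary importReadings(List<Reading> rows, List<Reading> target) {
        Set<ReadingKey> seen = new HashSet<>();
        int inserted = 0;
        int duplicates = 0;
        for (Reading row : rows) {
            ReadingKey key = new ReadingKey(row.name(), row.datetime());
            if (seen.add(key)) {
                target.add(row); // placeholder for the JDBC insert
                inserted++;
            } else {
                duplicates++;
                System.out.println("Duplicate ignored: " + key);
            }
        }
        return new Summary(inserted, duplicates);
    }

    public static void main(String[] args) {
        // Made-up sample rows; the third repeats the key of the first.
        List<Reading> rows = List.of(
                new Reading("Tallinn", "2024-01-01T00:00", -3.5),
                new Reading("Tallinn", "2024-01-01T01:00", -4.0),
                new Reading("Tallinn", "2024-01-01T00:00", -3.5));
        List<Reading> table = new ArrayList<>();
        Summary summary = importReadings(rows, table);
        System.out.println("Inserted: " + summary.inserted()
                + ", duplicates: " + summary.duplicates());
    }
}
```

Even this small sketch has to decide things the proposal leaves open, such as which of two duplicate rows is kept. Those are the gaps a clarifying question should surface.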

Part 1: The naive process

proposal.md
Model: GPT-5
Model: Sonnet 4.5
Different LLMs, different results: this is expected

plan.md
Why not just ... "Make no mistakes!"

plan.md
It is not possible to track the "plan". Let's create an ordered task list that we can track.

plan.md + prompt, with GPT-5 and with Sonnet 4.5
Again, the result depends on the selected LLM.

plan.md, AGENTS.md
The prompts are getting more specific in order to become LLM-agnostic.

Naturally tracking the progress during the execution (screenshot labels: Done, Next task)
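A trackable task list can be as simple as a Markdown checklist. The sketch below assumes that tasks are written as `- [x]` (done) and `- [ ]` (open) lines. That format, the task titles, and all names in the code are assumptions for illustration, since the slides do not show the actual file.

```java
import java.util.List;
import java.util.Optional;

final class TaskListSketch {

    record Task(String title, boolean done) {}

    // Parses lines such as "- [x] Create schema" or "- [ ] Write reader".
    // Any other line (headings, blank lines) is skipped.
    static List<Task> parse(String markdown) {
        return markdown.lines()
                .map(String::strip)
                .filter(line -> line.startsWith("- [ ] ") || line.startsWith("- [x] "))
                .map(line -> new Task(line.substring(6), line.charAt(3) == 'x'))
                .toList();
    }

    // The first open task in file order is the next one to work on.
    static Optional<Task> nextTask(List<Task> tasks) {
        return tasks.stream().filter(task -> !task.done()).findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical task list content.
        String tasksMd = """
                # Tasks
                - [x] Create the database schema
                - [x] Map CSV columns to a record
                - [ ] Configure the batch job
                - [ ] Add the Testcontainers integration test
                """;
        List<Task> tasks = parse(tasksMd);
        long done = tasks.stream().filter(Task::done).count();
        System.out.println("Done: " + done + " of " + tasks.size());
        nextTask(tasks).ifPresent(task -> System.out.println("Next task: " + task.title()));
    }
}
```

Because the list is ordered, "done" and "next task" fall out of the file itself, which is what a free-form plan cannot offer.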

proposal.md: a vague description of the application (or a feature) that we want to build

proposal.md -> prompt: "Make the implementation plan" -> plan.md -> REVIEW!
plan.md -> prompt: "Make a trackable task list" -> tasks.md -> REVIEW!

It looks like the naive process could work pretty well!

github.com/antonarhipov/eshop
Let's try doing something very conventional, a CRUD app with Spring ... and let the teammates review the result

github.com/antonarhipov/eshop
* Terrible layered architecture, it's easier to rewrite than to refactor
* Mixing SSR and REST API
* REST API endpoints secured but not configured
* Exception handling code duplicated in controllers even though there is a GlobalExceptionHandler
* Should use @Transactional(readOnly = true) for read-only operations
* String concatenation instead of multi-line strings
* Inefficient use of Spring Data JPA features
* Testing with H2; prefer Testcontainers for integration testing
* Test data created in test code, Flyway not applied
* Application-specific settings in properties file instead of the database?
* Mapping of entities to DTO objects should not happen in the service layer
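One of the review points, string concatenation instead of multi-line strings, is easy to illustrate in isolation. The sketch below compares a concatenated query with a Java text block. The query and the table name are hypothetical and are not taken from the eshop repository.

```java
final class TextBlockSketch {

    // Concatenation: every fragment needs its own quotes, "+" and line break.
    static String concatenated() {
        return "SELECT name, datetime, temp\n"
                + "FROM temperature\n"
                + "WHERE name = ?\n";
    }

    // Text block: the query reads as written. The position of the closing
    // delimiter determines how much leading indentation is stripped.
    static String textBlock() {
        return """
                SELECT name, datetime, temp
                FROM temperature
                WHERE name = ?
                """;
    }

    public static void main(String[] args) {
        System.out.println(concatenated().equals(textBlock()));
    }
}
```

Both methods build the same string. The text block is simply harder to get wrong, which is the kind of preference an agent will not follow unless it is told.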

github.com/antonarhipov/eshop
Lesson learned: if I spent more time figuring out what exactly the application (or feature) should do and how it should be built, I'd get a better result

Part 2: Custom SDD process

Demo project: github.com/antonarhipov/sdd-workshop
proposal.md: the same six-point temperature import proposal shown earlier.
Let's try building an application, but spend more time planning: refine the proposal into more detailed requirements, acceptance criteria, constraints, etc.

github.com/antonarhipov/sdd-workshop
* specification artefacts
* Implementation of the initial spec: CSV file to DB import
* UI implementation using superpowers plugin
* initial state

Put the initial (vague) idea into the proposal.md

See the commands and skills in .junie and .claude folders

Process: proposal.md -> requirements.md -> acceptance_criteria.md -> constraints.md -> plan/tasks (a prompt drives each step)
This is just a custom process that demonstrates the idea of refining the vague proposal into a detailed specification and implementation plan.

/clarify

# Role
You are a Senior Business Analyst preparing requirements for implementation by an AI coding agent.

# Task
Analyze the following feature request and identify:
1. AMBIGUITIES - unclear or vague statements that need clarification
2. MISSING INFORMATION - what's not specified but needed for implementation
3. IMPLICIT ASSUMPTIONS - things that seem assumed but should be explicit
4. EDGE CASES - scenarios not addressed in the description
5. CLARIFYING QUESTIONS - questions to ask the stakeholder

**Important** Use the AskUserTool to clarify the questions with the user. Ask the questions sequentially, one question at a time, one by one.

# Feature Request
See @file:proposal.md

# Output Format
Provide your analysis in structured sections. For each clarifying question, explain WHY this information matters for implementation.

# Output File
Write the results into spec/requirements.md file

Notes on the prompt:
* Make the LLM find what's missing from the initial proposal
* Force the LLM to ask clarifying questions

The LLM starts asking questions to clarify the task

For the demo: assume the LLM can choose the best
option (don't try this at home!)

/clarify -> requirements.md: this is the first artefact for the spec, describing what we are going to build

/analyze

# Role
You are a QA Architect writing formal acceptance criteria for an AI coding agent.

# Context
See @file:proposal.md file

# Task
Analyze the list of requirements in @file:requirements.md
Write acceptance criteria using WHEN-THEN-SHALL format.

## Format Rules
• WHEN: describes the precondition or trigger
• THEN: describes the action or input
• SHALL: describes the expected observable outcome
• Each criterion must be independently testable
• Focus on BEHAVIOR, not implementation
• Include happy path, edge cases, and error scenarios
• Group criteria by category

Output File: Write the results to `spec/acceptance_criteria.md`

Looks like a good input for generating the tests
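As an illustration of why such criteria are good input for tests, the sketch below parses one criterion and derives a test method name from its SHALL clause. It assumes each criterion sits on a single line of the form `WHEN ... THEN ... SHALL ...`. The actual layout of acceptance_criteria.md depends on the model's output, and the sample criterion and all names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AcceptanceCriterionSketch {

    record Criterion(String when, String then, String shall) {}

    // Assumed single-line layout: WHEN <trigger> THEN <action> SHALL <outcome>
    private static final Pattern FORMAT =
            Pattern.compile("WHEN (.+?) THEN (.+?) SHALL (.+)");

    static Criterion parse(String line) {
        Matcher matcher = FORMAT.matcher(line.strip());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not in WHEN-THEN-SHALL format: " + line);
        }
        return new Criterion(matcher.group(1), matcher.group(2), matcher.group(3));
    }

    // Turns the expected outcome into a camelCase test method name.
    static String testName(Criterion criterion) {
        StringBuilder name = new StringBuilder("shall");
        for (String word : criterion.shall().split("[^A-Za-z0-9]+")) {
            if (word.isEmpty()) {
                continue;
            }
            name.append(Character.toUpperCase(word.charAt(0)))
                .append(word.substring(1).toLowerCase());
        }
        return name.toString();
    }

    public static void main(String[] args) {
        // Made-up criterion in the assumed format.
        Criterion criterion = parse(
                "WHEN the CSV file contains two rows with the same name and datetime "
                + "THEN the import job runs "
                + "SHALL report one duplicate");
        System.out.println(testName(criterion));
    }
}
```

Each criterion that is independently testable maps to at least one such test, which is also what the later validation checklist asks for.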

/analyze -> acceptance_criteria.md

/constrain

# Role
You are a Software Architect defining technical constraints for an AI coding agent.

# Project context
See @file:proposal.md

# Task
Analyze the @file:requirements.md and @file:acceptance_criteria.md and define technical constraints covering:
1. Project structure (packages, modules)
2. Component design (classes, interfaces, patterns)
3. Technology decisions (specific libraries, configurations)
4. Code style (naming, patterns to follow, anti-patterns to avoid)
5. Testing strategy (what to test, how to test)

# Output Format
Use clear, imperative statements. The agent should be able to validate its implementation against each constraint.

# Constraint Categories
For each category, specify:
• MUST: mandatory requirements
• SHOULD: strong preferences
• MUST NOT: explicit prohibitions

# Output file
Write the results to `spec/constraints.md` file and link to the relevant specs

Is this all part of the specification, or could these be agent skills?

/constrain -> constraints.md

/validate

What if we make the LLM review the specs?

# Role
You are a Specification Reviewer ensuring completeness before implementation.

# Task
Review the specification package (Requirements + Acceptance Criteria + Technical Spec) and identify any gaps, contradictions, or ambiguities that could cause implementation issues.

# Checklist

## Completeness
• [ ] Every acceptance criterion has a clear test strategy
• [ ] All error scenarios have defined behavior
• [ ] Edge cases are explicitly addressed
• [ ] Performance requirements are measurable

## Consistency
• [ ] No contradictions between acceptance criteria and technical spec
• [ ] Package structure supports all specified components
• [ ] Data types are consistent throughout

## Implementability
• [ ] Technical constraints are specific enough to be validated
• [ ] No circular dependencies in component design
• [ ] All external dependencies are identified

## Testability
• [ ] Each acceptance criterion maps to at least one test case
• [ ] Test data requirements are clear
• [ ] Success/failure conditions are unambiguous

# Output
List any issues found with severity (BLOCKER / MAJOR / MINOR) and suggested resolution.

"Apply more reasoning effort"

Review and fix the blockers/majors. (I asked the model to
fix the specs according to the suggested resolution)

The agent decided to rewrite the initial specs

/validate + prompt: the validation step will find inconsistencies and update the specs

Ready to generate the task list! /generate-plan

/generate-plan

Finally, generate the task list. The output doesn't have to be in *.md.

# Role
You are an AI coding agent creating an implementation plan from a specification.

# Input
• Specification document: @file:proposal.md
• Requirements: @file:requirements.md
• Acceptance criteria: @file:acceptance_criteria.md
• Technical constraints: @file:constraints.md

# Task
Analyze the specification and produce an EXECUTION PLAN with:
1. Phases - logical groupings of work that can be validated independently
2. Tasks - atomic units of work within each phase
3. Dependencies - what must complete before each task can start
4. Validation criteria - how to verify each task is complete
5. Checkpoints - points where human review is recommended

# Plan Requirements

## Task Granularity
• Each task should be completable in a single focused effort
• Tasks should produce a verifiable artifact (file, test passing, etc.)
• Tasks should be small enough to rollback if wrong

## Dependency Rules
• No circular dependencies
• Minimize cross-phase dependencies
• Infrastructure before business logic
• Interfaces before implementations

## Checkpoint Placement
Place checkpoints after:
• Project structure creation
• Core domain model completion
• Each major component integration
• Test suite completion
• Final integration

# Output Format
```yaml
plan:
  name: "{feature name}"
  phases:
    - id: phase-1
      name: "{phase name}"
      description: "{what this phase accomplishes}"
      tasks:
        - id: task-1.1
          name: "{task name}"
          description: "{what to do}"
          artifact: "{file path or outcome}"
          depends_on: []
          validation: "{how to verify completion}"
        - id: task-1.2
          ...
      checkpoint:
        description: "{what to review}"
        criteria: ["{criterion 1}", "{criterion 2}"]
```

# Constraints
• Maximum 5 phases
• Maximum 7 tasks per phase
• Every task must have a validation criterion
• Every phase must end with a checkpoint
• Report if the result does not fit the constraints

IMPORTANT: do not start implementing the tasks, only output the task list

# Output File
Write the result to spec/plan.yaml file

The LLM decided to check that the plan is a valid YAML file

The tasks are grouped into phases
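The constraints in the /generate-plan prompt are mechanical enough to check in code once the plan is loaded. The sketch below assumes the YAML has already been parsed into plain records, so no YAML library is used. All class and method names are hypothetical, and the cycle check is an ordinary depth-first search rather than anything the workshop prescribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PlanCheckSketch {

    record Task(String id, List<String> dependsOn, String validation) {}

    record Phase(String id, List<Task> tasks, boolean hasCheckpoint) {}

    // Returns a list of violated plan constraints; empty means the plan fits.
    static List<String> check(List<Phase> phases) {
        List<String> problems = new ArrayList<>();
        if (phases.size() > 5) {
            problems.add("more than 5 phases");
        }
        Map<String, List<String>> deps = new HashMap<>();
        for (Phase phase : phases) {
            if (phase.tasks().size() > 7) {
                problems.add(phase.id() + ": more than 7 tasks");
            }
            if (!phase.hasCheckpoint()) {
                problems.add(phase.id() + ": no checkpoint");
            }
            for (Task task : phase.tasks()) {
                if (task.validation() == null || task.validation().isBlank()) {
                    problems.add(task.id() + ": no validation criterion");
                }
                deps.put(task.id(), task.dependsOn());
            }
        }
        Set<String> finished = new HashSet<>();
        for (String id : deps.keySet()) {
            if (hasCycle(id, deps, new HashSet<>(), finished)) {
                problems.add("circular dependency involving " + id);
                break; // one report is enough for this sketch
            }
        }
        return problems;
    }

    // Depth-first search: meeting a task that is still on the current path
    // means the dependencies loop back on themselves.
    private static boolean hasCycle(String id, Map<String, List<String>> deps,
                                    Set<String> path, Set<String> finished) {
        if (finished.contains(id)) {
            return false;
        }
        if (!path.add(id)) {
            return true;
        }
        for (String dep : deps.getOrDefault(id, List.of())) {
            if (hasCycle(dep, deps, path, finished)) {
                return true;
            }
        }
        path.remove(id);
        finished.add(id);
        return false;
    }

    public static void main(String[] args) {
        // Made-up plan: task-1.2 has no validation criterion.
        List<Phase> plan = List.of(new Phase("phase-1", List.of(
                new Task("task-1.1", List.of(), "project compiles"),
                new Task("task-1.2", List.of("task-1.1"), "")), true));
        check(plan).forEach(System.out::println);
    }
}
```

A structured format such as YAML is what makes this kind of check possible, which is one argument for not keeping the task list in free-form Markdown.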

/generate-plan -> plan/tasks: now we have the task list; we can proceed with the implementation

AGENTS.md contains the instructions on how to work with the spec and the task list

Let's ask the agent to implement the tasks in batches, by phases. This lets us scope each session and keep the context window under control. We also stay in the loop by reviewing the results after each phase is complete. No vibes!

Progress tracking in a separate file, as instructed by AGENTS.md

Breaking changes in Spring Batch 5.x -> 6.x

The agent needed additional information on how to deal with the Spring Batch 5.x -> 6.x changes. I created a skill for this purpose.

The agent reported that it's in trouble with Spring Batch
Reading the agent skill
Correct new imports for Spring Batch 6.x

Phase by phase, eventually, the implementation is complete

See the commits in the 'implementation' branch

Process: proposal.md -> requirements.md -> acceptance_criteria.md -> constraints.md -> plan/tasks
Commands: /clarify /analyze /constrain /validate /generate-plan
1. Refine the initial high-level idea
2. Review and verify the spec
3. Execute the tasks. Use AGENTS.md to steer the agent
IMPORTANT: the process and format are not set in stone, you can adjust everything!

DISCUSSION POINT: do you need all these artefacts, or is it overkill?
* For instance, the technical constraints could be replaced with a reusable agent skill that enforces the style and the technology decisions of your organization.
* Or maybe it's fine to create a flat list of tasks in a *.md file?
* And maybe the clarifying prompt could be made better to clarify even more details?

Distill requirements * Generate plan * Break down into tasks * Implement
Spend more time clarifying the requirements. Clarify details, find missing information, ambiguities, contradictions, etc. This will lead to a better result.

Learn from SDD principles (better planning, structured specifications, architectural thinking) but don't treat it as dogma.

speakerdeck.com/antonarhipov @antonarhipov github.com/antonarhipov learn.deeplearning.ai/courses/spec-driven-development-with-coding-agents
