1. Open 5 browser tabs (or ask your favorite agent)
2. Copy the CVE into chat
3. Ask in Slack: “false positive?”
4. Context-switch … then code quality fixes and better tests
Late feedback is expensive feedback.
while you still have context.
Late feedback (pain)
• CI discovers issues after merge
• Dev loses context
• Fixes are rushed and risky
• Compliance feels like a tax
Early feedback (flow)
• IDE detects issues pre-emptively
• Fix in the same mental model
• Automation increases resolution speed
• Compliance becomes the default path
their time on?
• Writing Code: About 8h a day, right?!
• CI/CD: Builds, deployments, pipeline maintenance
• Meetings: 2-4h a day in standups, reviews, syncs?
• User Requests: Bug reports, support escalations, ad hoc asks
• Juggling Deadlines: Vulnerabilities, upgrades, audits, tech debt
Or less — actual feature development Meetings & Requests Communication and coordination overhead What Drains the Rest • Handling security vulnerabilities and dependency upgrades • Compliance audits and policy enforcement • Quality metrics: coverage gaps, tech debt, security findings • Manual refactoring and modernization work
and AI is now a first-class participant.
The “em-dash” was intentionally added.
• Tools Evolving to Help: Both open-source and commercial solutions targeting developer productivity
• AI Agents Can Help: AI coding agents can automate compliance, testing, and modernization tasks
• One Unified Goal: Productivity + Compliance + Quality, without the manual grind
CHAPTER 1
Agents comply best when requirements are measurable.
• Setup Commands: Exact build, test, and lint commands the agent can run
• Definition of Done: Tests + lint + format + security checks must all pass
• Quality Gates: "Must satisfy Sonar rules / 0 new Criticals / Qodana pass"
• PR Expectations: Tests updated, changelog noted, suppressions justified
Practical Strategy
Write a solid root AGENTS.md, then copy the top 10–20% most critical rules into each tool's native file: .github/copilot-instructions.md, CLAUDE.md, and .cursorrules. Every agent gets the same non-negotiables. In monorepos, place a stricter AGENTS.md inside a subdirectory (e.g., /backend) to override the root for that subtree; the nearest file wins.
accidentally broke standards.
What devs see
• “Looks fine to me.”
• Small change. Clean diff. Merged fast…
What CI sees
• Style violations
• New static analysis findings
• Coverage regression
• No shared rules: Every tool behaves differently
• No 'Definition of Done': Not measurable
• Vague instructions: Hard to verify
• Late feedback: Find issues in CI
Rework + context switch
here, then iterate.
# AGENTS.md (root)
## Setup
- Use Java 21 (or org standard)
- Build: ./mvnw -q -DskipTests=false test
- Lint/format: ./mvnw -q spotless:check
## Definition of Done (non-negotiable)
- All tests pass (unit + integration where applicable)
- No new Blocker/Critical issues in CI
- Coverage does not decrease for touched code
- No new high-severity vulnerabilities
## PR expectations
- Update/add tests for behavior changes
- Explain any suppression (false positive + narrow scope)
Tip: keep this under ~60 lines. Agents follow short docs best.
success.
Vague
• “Improve code quality.”
• “Fix security issues.”
• “Add more tests.”
Testable
• “0 new Critical Findings in CI.”
• “No coverage decrease.”
• “Add 3 tests: happy / failure / edge.”
If you can’t test it, the agent can’t reliably do it.
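To make the contrast concrete, the two testable criteria above ("0 new Critical Findings" and "No coverage decrease") can be written as a check that returns a yes/no answer. This is a minimal sketch, not any tool's real API: the `GateInput` record, its field names, and the `QualityGate` class are hypothetical, and a real pipeline would read these numbers from its own scanner and coverage reports.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of one CI run; field names are illustrative only
// and do not match any specific tool's report format.
record GateInput(int newCriticalFindings, double baselineCoverage, double currentCoverage) {}

final class QualityGate {
    private QualityGate() {}

    // Returns one message per violated criterion; an empty list means the gate passes.
    static List<String> violations(GateInput input) {
        List<String> messages = new ArrayList<>();
        if (input.newCriticalFindings() > 0) {
            messages.add("new critical findings: " + input.newCriticalFindings());
        }
        if (input.currentCoverage() < input.baselineCoverage()) {
            messages.add("coverage decreased: " + input.baselineCoverage()
                    + " -> " + input.currentCoverage());
        }
        return messages;
    }
}
```

"Improve code quality" has no equivalent function, which is why an agent cannot verify it. Each testable statement maps to one `if` that either fires or does not.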
copy the top 10–20% into each tool’s native file.
AGENTS.md (root)
Adapters (tool-specific)
• .github/copilot-instructions.md
• CLAUDE.md
• .cursorrules
• …others as needed
Intent: Every agent sees the same non-negotiables.
like a junior dev: small steps + verification.
1. Agent proposes plan (no code yet)
2. Agent makes a small change set
3. Agent runs build + tests + linters
4. Agent fixes findings (no config edits)
5. Human reviews + merges
Guardrails reduce rework and make agents safe in enterprise.
path is fast.
AI agents become reliable development partners when teams provide:
• clear rules
• measurable quality gates
• a shared Definition of Done
blocked by a vulnerability.
What the dev says
• “This is a tiny feature.”
• “Why is security blocking us?”
• “Can we ignore it just this once?”
What the org needs
• Known risk reduced
• Audit trail of remediation
• Consistent enforcement
Tools help when they provide a clear, fast fix path.
Good defaults
• Block new high-severity issues
• Fix what you touch
• Timebox backlog reduction
• Prefer upgrades over suppressions
Suppressions allowed only if…
• Confirmed false positive
• Narrowest possible scope
• Comment explains why
• Ticket/link for follow-up
(non-negotiable):
- Do NOT change qodana.yaml / sonar config / CI workflows.
- Fix root cause; do NOT suppress unless confirmed false positive.
- Run: ./mvnw test (and include output summary).
- Add/adjust tests for the behavior change.
Task:
- Remediate the finding in FooService (path shown in report).
- Explain the fix in 3 bullet points.
Prompt pattern
Using an Agent to Remediate Safely
Give the agent guardrails: what to fix, what NOT to touch.
Must-pass quality gates
1. Qodana: PASS with 0 new issues.
2. Tests: green before PR.
## Non-negotiable rules
- Treat findings as merge blockers.
- Respect qodana.yaml; do not modify.
- Fix root cause; do NOT suppress.
- Only add @SuppressWarnings if:
  - confirmed false positive AND
  - narrowest possible scope AND
  - comment explains why.
## Run locally
qodana scan --fail-threshold 0
# or: ./gradlew qodanaScan
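As an illustration of the @SuppressWarnings conditions above, here is a minimal sketch of a suppression that sits on a single local variable and carries its justification. The `LegacyAdapter` class, the scenario, and the `TICKET-000` reference are hypothetical placeholders, not taken from a real codebase.

```java
import java.util.List;

final class LegacyAdapter {
    private LegacyAdapter() {}

    // Hypothetical scenario: an older API hands back an untyped payload that
    // this module itself produced as a List<String>.
    static List<String> names(Object legacyPayload) {
        // Confirmed false positive: the payload is always a List<String> created
        // by this module, so the unchecked cast cannot fail at the element level.
        // Scope: this one declaration only, not the method or the class.
        // Follow-up: TICKET-000 (placeholder) to give the legacy API a typed signature.
        @SuppressWarnings("unchecked")
        List<String> names = (List<String>) legacyPayload;
        return names;
    }
}
```

Placing the annotation on the declaration rather than on the class means any new unchecked warning elsewhere in the file still shows up in the scan.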
unambiguous.
## Definition of Done
- Pass SonarQube Quality Gate in CI.
- Zero new Blocker/Critical/Major issues.
- Fix Sonar issues in any code you touch.
- No //NOSONAR without justification.
## Sonar-friendly coding rules
- Small methods; low cognitive complexity.
- Always use try-with-resources.
- Never swallow exceptions.
- No eager log string construction.
- Keep tests updated with behavior.
## PR notes
"SonarQube: PASS (no new issues)"
“Policy as code” beats tribal knowledge.
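Three of the coding rules above (try-with-resources, never swallowing exceptions, no eager log string construction) fit into one small method. The sketch below uses only the JDK; the `FirstLineReader` class and its behavior are made up for illustration, and `java.util.logging` is an assumption standing in for whatever logging library a project actually uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.logging.Logger;

final class FirstLineReader {
    private static final Logger LOG = Logger.getLogger(FirstLineReader.class.getName());

    private FirstLineReader() {}

    // Returns the first line of the source, or an empty string if there is none.
    static String firstLine(Reader source) {
        // try-with-resources closes the reader even when readLine() throws.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            // Supplier overload: the message is only built if FINE logging is enabled.
            LOG.fine(() -> "first line length: " + (line == null ? 0 : line.length()));
            return line == null ? "" : line;
        } catch (IOException e) {
            // Not swallowed: rethrown with context, original cause preserved.
            throw new UncheckedIOException("could not read first line", e);
        }
    }
}
```

For example, `FirstLineReader.firstLine(new java.io.StringReader("hello\nworld"))` returns `"hello"`.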
path is fast.
• Separate signals (SCA vs SAST vs secrets)
• Shift-left: IDE + PR feedback, CI enforcement
• Automate dependency PRs; humans review risk
• Use agents for remediation, with guardrails
Scaffolded by Copilot with proper Arrange/Act/Assert structure.
Diffblue Cover: AI-generated Java tests. Speeds up coverage but often produces skeleton tests, so always review.
Pitest: Mutation testing validates test quality, not just coverage. Slower runs, best in CI.
High Coverage ≠ High Quality
If your tests never fail, do they really test? Aim for meaningful assertions, not line counts. Diffblue tests are built to pass, so verify they actually catch regressions.
AGENTS.md testing rule: Every public method needs at minimum 3 tests: one happy path, one failure path (assertThrows), and one boundary/edge case. Name tests as method_whenCondition_thenResult.
CHAPTER 3
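The testing rule above could look like this for a single method. To keep the sketch dependency-free it uses plain Java checks instead of a test framework; in a JUnit 5 project each method would carry `@Test` and the failure path would use `assertThrows`. `PriceCalculator` and `applyDiscount` are hypothetical names chosen for the example.

```java
final class PriceCalculator {
    private PriceCalculator() {}

    // Hypothetical method under test: applies a percentage discount to a price in cents.
    static long applyDiscount(long priceCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be 0..100, got " + percent);
        }
        return priceCents - (priceCents * percent) / 100;
    }
}

final class PriceCalculatorTest {
    private PriceCalculatorTest() {}

    // Happy path.
    static void applyDiscount_whenPercentIs25_thenReturnsThreeQuarters() {
        check(PriceCalculator.applyDiscount(2000, 25) == 1500);
    }

    // Failure path (assertThrows in JUnit 5).
    static void applyDiscount_whenPercentIsNegative_thenThrows() {
        try {
            PriceCalculator.applyDiscount(2000, -1);
        } catch (IllegalArgumentException expected) {
            return;
        }
        throw new AssertionError("expected IllegalArgumentException");
    }

    // Boundary case: the largest allowed percentage.
    static void applyDiscount_whenPercentIs100_thenReturnsZero() {
        check(PriceCalculator.applyDiscount(2000, 100) == 0);
    }

    // Runs all three tests; throws AssertionError on the first failure.
    static void runAll() {
        applyDiscount_whenPercentIs25_thenReturnsThreeQuarters();
        applyDiscount_whenPercentIsNegative_thenThrows();
        applyDiscount_whenPercentIs100_thenReturnsZero();
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new AssertionError("assertion failed");
        }
    }
}
```

The method names follow the method_whenCondition_thenResult pattern, so a failing test name already says which behavior broke.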
don’t have proof.
What happens
• Coverage dips by a tiny amount
• CI blocks merge
• Team scrambles for tests
Root cause
• Tests weren’t part of the flow
• Low confidence in behavior
• Quality gate becomes a fight
do they really test?
Coverage tells you…
• What lines ran
• What didn’t crash
• A rough signal
Quality tells you…
• Behavior is asserted
• Regressions get caught
• Edge cases are covered
Goal: reduce uncertainty, not chase numbers.
a few high-value integration tests.
Unit (many): Pure functions • Business rules • Fast feedback
Integration (some): DB + HTTP clients • Testcontainers where useful • Real wiring
Contract/E2E (few): Critical paths only • Run in CI/nightly • High signal
Mutation (many): Unit test quality • Run locally/CI • Fast feedback
and assertions.
1. Agent generates a test skeleton (Arrange/Act/Assert)
2. You add meaningful assertions and edge cases
3. Agent refines and runs tests until green
4. You review for readability + maintainability
tool changes your code by introducing mutants. Good tests should fail when the code is altered.
What it catches
• Missing assertions
• Over-mocked tests
• Logic that isn’t validated
How to use it
• Run on key modules
• In CI/nightly (it’s slower)
• Track killed vs survived mutants
If mutants survive, your tests might be lying.
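The killed-versus-survived idea can be shown by hand without running a mutation tool. In this sketch the mutant is written manually and stands in for the kind of boundary change a tool might generate. All names (`MutationDemo`, `ORIGINAL`, `MUTANT`) are hypothetical, and each "test" is modeled as a method that returns whether it passes.

```java
import java.util.function.IntPredicate;

final class MutationDemo {
    private MutationDemo() {}

    // Code under test: an age check with a boundary at 18.
    static final IntPredicate ORIGINAL = age -> age >= 18;

    // Hand-written mutant: ">=" changed to ">".
    static final IntPredicate MUTANT = age -> age > 18;

    // Weak test: only checks values far from the boundary.
    static boolean weakTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    // Strong test: additionally asserts both sides of the boundary.
    static boolean strongTestPasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5)
                && isAdult.test(18) && !isAdult.test(17);
    }
}
```

The weak test passes for both `ORIGINAL` and `MUTANT`, so the mutant survives even though every line of the age check ran. The strong test passes for `ORIGINAL` and fails for `MUTANT`, so the mutant is killed.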
gates stop working.
Common causes
• Time / randomness
• Shared state
• Order dependencies
• External services
Good practices
• Deterministic tests
• Isolated fixtures
• Use containers for dependencies
• Quarantine & fix quickly
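For the "Time / randomness" cause, one common way to get deterministic tests is to inject the clock instead of reading the system time directly. The sketch below assumes a hypothetical `TokenValidator` class; `java.time.Clock` and `Clock.fixed` are standard JDK APIs.

```java
import java.time.Clock;
import java.time.Instant;

final class TokenValidator {
    private final Clock clock;

    // Production code would pass Clock.systemUTC(); tests pass a fixed clock.
    TokenValidator(Clock clock) {
        this.clock = clock;
    }

    // A token counts as expired once "now" has reached its expiry instant.
    boolean isExpired(Instant expiresAt) {
        return !Instant.now(clock).isBefore(expiresAt);
    }
}
```

In a test, `new TokenValidator(Clock.fixed(Instant.parse("2024-01-01T00:00:00Z"), java.time.ZoneOffset.UTC))` gives the same result on every run, regardless of when or where CI executes it.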
not a tax at the end.
• Coverage gates are fine, but pair them with easy test scaffolding
• Prefer high-signal assertions over line-count heroics
• Use mutation testing on critical modules
• Fix flakiness fast or devs stop trusting CI
small. Prove value. Then scale.
30 days
• Pick 1 service/module
• Add AGENTS.md (minimal)
• Turn on 1–2 gates
• Automate dependency PRs
60 days
• Expand to team repos
• Add test scaffolding flow
• Triage policy + suppressions
• Track rework reduction
90 days
• Bake into CI templates
• Run mutation tests nightly
• Modernization recipes
• Make compliance boring
a workflow.
IDE / local: Fast linting • quick scans • agent-assisted edits
PR checks: Actionable comments • dependency PRs • lightweight gates
CI gates: Hard enforcement • evidence • consistent policy
AGENTS.md (contract)
• Definition of Done
• Commands to run
• Quality + security rules
• Test expectations
• Suppression policy