Advancing with Java

Beyond Coding: Taming AI Agents, Static Analysis, and Automated Testing to Reclaim Developer Productivity
RODRIGO GRACIANO & CHANDRA GUNTUR

A Friday Horror Story
It’s Friday. 4:55 PM. You hit merge.

The Pipeline Turns Red
Not a 'one test failed' red. A full Christmas-tree red.
PR → CI → Deploy

You Discover You Hit the Trifecta
The PR can fail for three different reasons.
Security: Dependency CVE
• High risk
• Blocked merge
Quality Gate: Static analysis findings
• New critical findings
• Suppressions
Coverage: 0.7% below threshold
• Gate says “no”
• Write tests now

What Happens Next?
This is where velocity goes to die.
1. Open 5 browser tabs (or ask your favorite agent)
2. Copy the CVE into chat
3. Ask in Slack: “false positive?”
4. Context-switch … then code quality fixes and better tests
Late feedback is expensive feedback.

An Alternate Path
What if:
• … this could be handled differently with early detection
• … we could then “shift-left” the handling of such problems

Same PR. Different Mindset.
Move signals earlier so fixes happen while you still have context.
Late feedback (pain)
• CI discovers issues after merge
• Dev loses context
• Fixes are rushed and risky
• Compliance feels like a tax
Early feedback (flow)
• IDE detects issues pre-emptively
• Fix in the same mental model
• Automation increases resolution speed
• Compliance becomes the default path

Roadmap
We’ll revisit that Friday merge in each chapter.
Chapter 1
• AI agents
• AGENTS.md guardrails
• Definition of Done
Chapter 2
• Static analysis
• Security & SCA
• Dependency automation
Chapter 3
• Tests & coverage
• Mutation testing
• Quality > numbers

Meet the Speakers
Rodrigo Graciano
JUG Leader @NYJavaSIG & @GardenStateJUG
• Blog: graciano.dev
• Code: github.com/rodrigolgraciano
• X: @RodrigoGraciano
Chandra Guntur
Java Champion · JUG Leader @GardenStateJUG & @NYJavaSIG
• Blog: cguntur.me
• Code: github.com/c-guntur
• X: @CGuntur

A Typical Developer's Day
What does a developer actually spend their time on?
• Writing Code: About 8h a day, right?!
• CI/CD: Builds, deployments, pipeline maintenance
• Meetings: 2-4h a day in standups, reviews, syncs?
• User Requests: Bug reports, support escalations, ad hoc asks
• Juggling Deadlines: Vulnerabilities, upgrades, audits, tech debt

The Developer’s Time
Where Time Actually Goes
• ~20% Coding: or less, actual feature development
• ~30% Meetings & Requests: communication and coordination overhead
What Drains the Rest
• Handling security vulnerabilities and dependency upgrades
• Compliance audits and policy enforcement
• Quality metrics: coverage gaps, tech debt, security findings
• Manual refactoring and modernization work

The Good News
The tooling ecosystem is evolving fast — and AI is now a first-class participant. The “em-dash” was intentionally added.
• Tools Evolving to Help: Both open-source and commercial solutions targeting developer productivity
• AI Agents Can Help: AI coding agents can automate compliance, testing, and modernization tasks
• One Unified Goal: Productivity + Compliance + Quality, without the manual grind

CHAPTER 1
Taming AI Agents with AGENTS.md
• Agents produce code
• Add thin adapters
• Write AGENTS.md

What Belongs in AGENTS.md
Keep it short, concrete, and testable. Agents comply best when requirements are measurable.
• Setup Commands: Exact build, test, and lint commands the agent can run
• Definition of Done: Tests + lint + format + security checks must all pass
• Quality Gates: "Must satisfy Sonar rules / 0 new Criticals / Qodana pass"
• PR Expectations: Tests updated, changelog noted, suppressions justified
Practical Strategy
Write a solid root AGENTS.md, then copy the top 10–20% most critical rules into each tool's native file: .github/copilot-instructions.md, CLAUDE.md, and .cursorrules. Every agent gets the same non-negotiables. In monorepos, place a stricter AGENTS.md inside a subdirectory (e.g., /backend) to override the root for that subtree: the nearest file wins.
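
The "nearest file wins" rule can be pictured as a walk from the edited file up to the repo root. The following is a minimal sketch under stated assumptions: `AgentsFileResolver` and its `nearest` method are hypothetical names for illustration, and real agents implement their own lookup, which may differ in detail.

```java
import java.nio.file.Path;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: illustrates "nearest file wins", not any agent's actual lookup.
final class AgentsFileResolver {

    static final String FILE_NAME = "AGENTS.md";

    // Walks from the edited file's directory up to the repo root and returns
    // the first AGENTS.md found, so a subtree file shadows the root one.
    static Optional<Path> nearest(Path editedFile, Path repoRoot, Predicate<Path> exists) {
        Path dir = editedFile.getParent();
        while (dir != null && dir.startsWith(repoRoot)) {
            Path candidate = dir.resolve(FILE_NAME);
            if (exists.test(candidate)) {
                return Optional.of(candidate);
            }
            dir = dir.getParent();
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path root = Path.of("repo");
        // Simulated repo contents: a root file plus a stricter /backend file.
        Set<Path> present = Set.of(root.resolve("AGENTS.md"), root.resolve("backend/AGENTS.md"));
        System.out.println(nearest(root.resolve("backend/src/Foo.java"), root, present::contains));
        System.out.println(nearest(root.resolve("frontend/App.ts"), root, present::contains));
    }
}
```

The `exists` predicate stands in for a file-system check (for example `Files::exists`), which keeps the sketch testable without touching disk.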

Scenario: The Agent’s PR
The agent fixed a bug… and accidentally broke standards.
What Devs see
• “Looks fine to me.”
• Small change. Clean diff.
• Merged fast…
What CI sees
• Style violations
• New static analysis findings
• Coverage regression

Why Agents Go Off-Rails
Not malicious. Just missing your constraints.
• No shared rules: Every tool behaves differently
• No 'Definition of Done': Not measurable
• Vague instructions: Hard to verify
• Late feedback: Find issues in CI
Rework + context switch

A Minimal AGENTS.md Template
Short • concrete • testable. Start here, then iterate.
# AGENTS.md (root)
## Setup
- Use Java 21 (or org standard)
- Build: ./mvnw -q -DskipTests=false test
- Lint/format: ./mvnw -q spotless:check
## Definition of Done (non-negotiable)
- All tests pass (unit + integration where applicable)
- No new Blocker/Critical issues in CI
- Coverage does not decrease for touched code
- No new high-severity vulnerabilities
## PR expectations
- Update/add tests for behavior changes
- Explain any suppression (false positive + narrow scope)
Tip: keep this under ~60 lines. Agents follow short docs best.

Make It Measurable
Agents comply best when they can verify success.
Vague
• “Improve code quality.”
• “Fix security issues.”
• “Add more tests.”
Testable
• “0 new Critical Findings in CI.”
• “No coverage decrease.”
• “Add 3 tests: happy / failure / edge.”
If you can’t test it, the agent can’t reliably do it.

One Source of Truth + Thin Adapters
Root AGENTS.md → copy the top 10–20% into each tool’s native file.
AGENTS.md (root)
Adapters (tool-specific)
• .github/copilot-instructions.md
• CLAUDE.md
• .cursorrules
• …others as needed
Intent: Every agent sees the same non-negotiables.

An Agent Workflow That Doesn’t Scare You
Treat the agent like a junior dev: small steps + verification.
1. Agent proposes plan (no code yet)
2. Agent makes a small change set
3. Agent runs build + tests + linters
4. Agent fixes findings (no config edits)
5. Human reviews + merges
Guardrails reduce rework and make agents safe in enterprise.

Chapter 1: Takeaway
Security and quality scale when the fix path is fast.
AI agents become reliable development partners when teams provide:
• clear rules
• measurable quality gates
• a shared Definition of Done

CHAPTER 2
Static Analysis & Security
• Checkstyle: Style and convention enforcement, highly configurable
• Dependabot: Automated dependency upgrades via GitHub PRs
• Snyk: OSS vulnerability scanning with actionable fix guidance
• Qodana (JetBrains): Deep Java/Kotlin analysis, CI-ready quality gates
Common findings: SQL injection, path traversal, hardcoded credentials, IaC misconfigs, sensitive data exposure via entities, and OSS CVEs.
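
To make one of these findings concrete, here is a minimal sketch of a path traversal guard. `SafePaths` and `resolveInside` are hypothetical names, the base directory is an assumption, and the check does not follow symlinks, so treat it as an illustration of the pattern rather than a complete defense.

```java
import java.nio.file.Path;

// Hypothetical helper showing the usual fix shape for a path traversal finding.
final class SafePaths {

    // Resolves a user-supplied file name under baseDir and rejects anything
    // that escapes it, such as "../secrets.txt" or an absolute path.
    static Path resolveInside(Path baseDir, String userSupplied) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check.
        Path resolved = base.resolve(userSupplied).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userSupplied);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = Path.of("/srv/uploads"); // assumed upload root
        System.out.println(resolveInside(base, "reports/q1.txt"));
        try {
            resolveInside(base, "../secrets.txt");
        } catch (IllegalArgumentException rejected) {
            System.out.println("Rejected: " + rejected.getMessage());
        }
    }
}
```

`Path.startsWith` compares whole path components, so a sibling directory such as `/srv/uploads-evil` does not pass the check.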

Scenario: “We Didn’t Touch That Dependency”
Yet the build is blocked by a vulnerability.
What dev says
• “This is a tiny feature.”
• “Why is security blocking us?”
• “Can we ignore it just this once?”
What the org needs
• Known risk reduced
• Audit trail of remediation
• Consistent enforcement
Tools help when they provide a clear, fast fix path.

Security Signals: What’s What?
Different tools, different problems. Don’t treat them as one bucket.
SCA (Software Composition Analysis)
• Dependencies / CVEs
• SBOM-friendly
• Fast wins
SAST (Static Application Security Testing)
• Code patterns
• Injection, traversal
• False positives exist
Secrets
• Keys/tokens in code
• Pre-commit/PR
• Rotate + revoke
IaC / Config
• Terraform/K8s policies
• Misconfig detection
• Policy-as-code
Runtime (bonus)
• Firewalls, Runtime Application Self-Protection (RASP), monitoring
• Not a substitute for SAST/SCA

Shift-Left Placement
Put the cheapest checks closest to the developer.
• IDE: Fast hints; quick feedback from lint/basic scans
• PR: Actionable review; security and quality comments with fix guidance
• CI: Enforced gates; no negotiation
• Release: Audit trail; release report as evidence for auditors

Dependency Upgrades Without Pain
Automate the tedious parts. Humans review the risky parts.
Automation
• Scheduled upgrade PRs
• Grouped updates (by ecosystem)
• Auto-merge for low-risk patches
• Changelogs + CVE context
Human judgment
• Major upgrades
• Behavior changes
• Breaking transitive updates
• Rollout strategy
Rule of thumb: automate PR creation, not risk decisions.
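
One way to picture the split between automation and human judgment is a small classifier over version bumps. This is a sketch of a hypothetical policy, not a feature of Dependabot or any other tool: `UpgradePolicy` and its rule (only a strict `x.y.z` patch increase is an auto-merge candidate) are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical policy: only plain patch bumps are candidates for auto-merge;
// everything else, including versions that cannot be parsed, goes to a human.
final class UpgradePolicy {

    enum Decision { AUTO_MERGE_CANDIDATE, HUMAN_REVIEW }

    // Strict major.minor.patch; qualifiers such as "-RC1" deliberately do not match.
    private static final Pattern SEMVER = Pattern.compile("(\\d{1,9})\\.(\\d{1,9})\\.(\\d{1,9})");

    static Decision classify(String from, String to) {
        Matcher current = SEMVER.matcher(from);
        Matcher target = SEMVER.matcher(to);
        if (!current.matches() || !target.matches()) {
            return Decision.HUMAN_REVIEW;
        }
        boolean sameMajorMinor = current.group(1).equals(target.group(1))
                && current.group(2).equals(target.group(2));
        boolean patchIncreased =
                Integer.parseInt(target.group(3)) > Integer.parseInt(current.group(3));
        return sameMajorMinor && patchIncreased
                ? Decision.AUTO_MERGE_CANDIDATE
                : Decision.HUMAN_REVIEW;
    }

    public static void main(String[] args) {
        System.out.println(classify("1.2.3", "1.2.4"));
        System.out.println(classify("1.2.3", "2.0.0"));
    }
}
```

Even an auto-merge candidate would still need the usual CI gates to pass; the classifier only decides who looks at the PR first.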

Triage Without Chaos
A small, consistent policy beats ad-hoc debates.
Good defaults
• Block new high-severity issues
• Fix what you touch
• Timebox backlog reduction
• Prefer upgrades over suppressions
Suppressions allowed only if…
• Confirmed false positive
• Narrowest possible scope
• Comment explains why
• Ticket/link for follow-up
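
As an illustration of "narrowest possible scope" and "comment explains why", the sketch below places `@SuppressWarnings` on a single local variable instead of the method or class. The class name, the legacy-API scenario, and the placeholder follow-up link are all hypothetical.

```java
import java.util.List;

// Hypothetical example of a narrowly scoped, justified suppression.
final class NarrowSuppression {

    static List<String> names(Object legacyResult) {
        // Justification (assumed scenario): the legacy API returns Object, but
        // its contract guarantees a list of Strings, so the unchecked warning is
        // reviewed noise here. Follow-up: <ticket link goes here>.
        @SuppressWarnings("unchecked") // scoped to this one declaration only
        List<String> names = (List<String>) legacyResult;
        return names;
    }

    public static void main(String[] args) {
        System.out.println(names(List.of("alice", "bob")));
    }
}
```

Annotating the declaration rather than the enclosing method means any new warning elsewhere in `names` still surfaces in the next scan.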

Using an Agent to Remediate Safely
Give the agent guardrails: what to fix, what NOT to touch.
Prompt pattern
You are fixing a security finding in this repo.
Constraints (non-negotiable):
- Do NOT change qodana.yaml / sonar config / CI workflows.
- Fix root cause; do NOT suppress unless confirmed false positive.
- Run: ./mvnw test (and include output summary).
- Add/adjust tests for the behavior change.
Task:
- Remediate the finding in FooService (path shown in report).
- Explain the fix in 3 bullet points.

Example Gate: Qodana (in AGENTS.md)
Short, explicit, and merge-blocking.
## Must-pass quality gates
1. Qodana: PASS with 0 new issues.
2. Tests: green before PR.
## Non-negotiable rules
- Treat findings as merge blockers.
- Respect qodana.yaml; do not modify it.
- Fix root cause; do NOT suppress.
- Only add @SuppressWarnings if:
  - confirmed false positive AND
  - narrowest possible scope AND
  - comment explains why.
## Run locally
qodana scan --fail-threshold 0
# or: ./gradlew qodanaScan

Example Gate: SonarQube (in AGENTS.md)
Make the definition of done unambiguous.
## Definition of Done
- Pass SonarQube Quality Gate in CI.
- Zero new Blocker/Critical/Major issues.
- Fix Sonar issues in any code you touch.
- No //NOSONAR without justification.
## Sonar-friendly coding rules
- Small methods; low cognitive complexity.
- Always use try-with-resources.
- Never swallow exceptions.
- No eager log string construction.
- Keep tests updated with behavior.
## PR notes
"SonarQube: PASS (no new issues)"
“Policy as code” beats tribal knowledge.
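
Three of the coding rules above (try-with-resources, never swallow exceptions, no eager log string construction) fit in one small method. This is a minimal sketch using only the JDK; `ConfigReader` and `firstLine` are hypothetical names, and `java.util.logging` stands in for whatever logging library the project actually uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.logging.Logger;

// Hypothetical class illustrating three of the rules in one place.
final class ConfigReader {

    private static final Logger LOG = Logger.getLogger(ConfigReader.class.getName());

    static String firstLine(Reader source) {
        // try-with-resources: the reader is closed on every exit path.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            // Supplier overload: the message is only built if FINE is enabled.
            LOG.fine(() -> "First line length: " + (line == null ? 0 : line.length()));
            return line == null ? "" : line;
        } catch (IOException e) {
            // Not swallowed: rethrown with context and the original cause attached.
            throw new UncheckedIOException("Could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine(new java.io.StringReader("timeout=30\nretries=3")));
    }
}
```

The catch block rethrows without also logging, on the assumption that a caller higher up logs the failure once.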

Chapter 2: Takeaway
Security and quality scale when the fix path is fast.
• Separate signals (SCA vs SAST vs secrets)
• Shift-left: IDE + PR feedback, CI enforcement
• Automate dependency PRs; humans review risk
• Use agents for remediation, with guardrails

CHAPTER 3
Unit Testing, Coverage & Quality
• JUnit 5: The de facto standard. Scaffolded by Copilot with proper Arrange/Act/Assert structure.
• Diffblue Cover: AI-generated Java tests. Speeds up coverage but often produces skeleton tests; always review.
• PIT (Pitest): Mutation testing validates test quality, not just coverage. Slower runs, best in CI.
High Coverage ≠ High Quality
If your tests never fail, do they really test? Aim for meaningful assertions, not line counts. Diffblue tests are built to pass; verify they actually catch regressions.
AGENTS.md testing rule: Every public method needs at least 3 tests: one happy path, one failure path (assertThrows), and one boundary/edge case. Name tests as method_whenCondition_thenResult.
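
The three-test rule and the naming pattern might look like the sketch below. To stay runnable without a test framework on the classpath, it uses plain Java with a hand-rolled `check` helper; in a real project these would be JUnit 5 `@Test` methods using `assertEquals` and `assertThrows`. `Discount` is a hypothetical class under test.

```java
// Hypothetical class under test: percentage discount on a price in cents.
final class Discount {
    static long apply(long priceCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return priceCents - (priceCents * percent) / 100;
    }
}

// Plain-Java stand-in for a JUnit 5 test class, named method_whenCondition_thenResult.
final class DiscountTest {

    // Happy path
    static void apply_whenTenPercent_thenReducesPrice() {
        check(Discount.apply(1_000, 10) == 900, "10% off 1000 should be 900");
    }

    // Failure path (with JUnit 5 this would be assertThrows)
    static void apply_whenPercentNegative_thenThrows() {
        try {
            Discount.apply(1_000, -1);
        } catch (IllegalArgumentException expected) {
            return;
        }
        throw new AssertionError("expected IllegalArgumentException");
    }

    // Boundary case
    static void apply_whenHundredPercent_thenPriceIsZero() {
        check(Discount.apply(1_000, 100) == 0, "100% off should be 0");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        apply_whenTenPercent_thenReducesPrice();
        apply_whenPercentNegative_thenThrows();
        apply_whenHundredPercent_thenPriceIsZero();
        System.out.println("All three tests passed");
    }
}
```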

Scenario: Coverage Gate Fails
The PR is correct… but you don’t have proof.
What happens
• Coverage dips by a tiny amount
• CI blocks merge
• Team scrambles for tests
Root cause
• Tests weren’t part of the flow
• Low confidence in behavior
• Quality gate becomes a fight

High Coverage ≠ High Quality
If your tests never fail, do they really test?
Coverage tells you…
• What lines ran
• What didn’t crash
• A rough signal
Quality tells you…
• Behavior is asserted
• Regressions get caught
• Edge cases are covered
Goal: reduce uncertainty, not chase numbers.

A Practical Test Strategy (Services)
Mix fast unit tests with a few high-value integration tests.
Unit (many): Pure functions • Business rules • Fast feedback
Integration (some): DB + HTTP clients • Testcontainers where useful • Real wiring
Contract/E2E (few): Critical paths only • Run in CI/nightly • High signal
Mutation (many): Unit test quality • Run locally/CI • Fast feedback

AI-Assisted Testing (Human-in-the-Loop)
Let the agent scaffold; you provide intent and assertions.
1. Agent generates a test skeleton (Arrange/Act/Assert)
2. You add meaningful assertions and edge cases
3. Agent refines and runs tests until green
4. You review for readability + maintainability

Mutation Testing: Does Your Test Suite Fight Back?
A mutation tool changes your code by introducing small faults (mutants). Good tests should fail when the code is altered.
What it catches
• Missing assertions
• Over-mocked tests
• Logic that isn’t validated
How to use it
• Run on key modules
• In CI/nightly (it’s slower)
• Track killed vs survived mutants
If mutants survive, your tests might be lying.
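
A hand-written mutant shows what "survived" and "killed" mean. The sketch assumes a boundary mutation (`>=` replaced by `>`), which is the kind of change a conditional-boundary mutator makes; `AgeRule` and `MutationDemo` are hypothetical, and a real tool such as PIT generates and runs mutants automatically rather than by hand.

```java
import java.util.function.IntPredicate;

// Hypothetical rule plus a hand-made mutant of it.
final class AgeRule {
    static boolean isAdult(int age) {
        return age >= 18;
    }

    // Mutant: the boundary operator ">=" has been changed to ">".
    static boolean isAdultMutant(int age) {
        return age > 18;
    }
}

final class MutationDemo {

    // Weak test: covers the line but never probes the boundary.
    static boolean weakTestPasses(IntPredicate rule) {
        return rule.test(30) && !rule.test(5);
    }

    // Strong test: asserts behavior on both sides of the boundary.
    static boolean strongTestPasses(IntPredicate rule) {
        return rule.test(18) && !rule.test(17) && rule.test(30);
    }

    public static void main(String[] args) {
        // A test that still passes against the mutant means the mutant survived.
        System.out.println("Weak test vs mutant passes: " + weakTestPasses(AgeRule::isAdultMutant));
        System.out.println("Strong test vs mutant passes: " + strongTestPasses(AgeRule::isAdultMutant));
    }
}
```

Both tests give the same line coverage of `isAdult`; only the strong one fails when the boundary changes, which is the signal mutation testing adds.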

Flaky Tests Kill Trust
Once Devs stop trusting CI, quality gates stop working.
Common causes
• Time / randomness
• Shared state
• Order dependencies
• External services
Good practices
• Deterministic tests
• Isolated fixtures
• Use containers for dependencies
• Quarantine & fix quickly
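
For the "time" cause, a common way to get deterministic tests is to inject a `java.time.Clock` instead of reading the system time directly. A minimal sketch, assuming a hypothetical `TrialPeriod` class:

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical class whose behavior depends on "today".
final class TrialPeriod {

    private final Clock clock;

    // Production code would pass Clock.systemUTC(); tests pass a fixed clock.
    TrialPeriod(Clock clock) {
        this.clock = clock;
    }

    boolean isExpired(LocalDate trialEnd) {
        return LocalDate.now(clock).isAfter(trialEnd);
    }

    public static void main(String[] args) {
        // Fixed clock: the result no longer depends on when the test runs.
        Clock fixed = Clock.fixed(Instant.parse("2024-03-01T00:00:00Z"), ZoneOffset.UTC);
        TrialPeriod trial = new TrialPeriod(fixed);
        System.out.println(trial.isExpired(LocalDate.of(2024, 2, 29)));
        System.out.println(trial.isExpired(LocalDate.of(2024, 3, 1)));
    }
}
```

The same idea applies to randomness: passing in a seeded `java.util.Random` keeps a test reproducible.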

Chapter 3: Takeaway
Make tests part of the flow, not a tax at the end.
• Coverage gates are fine, but pair them with easy test scaffolding
• Prefer high-signal assertions over line-count heroics
• Use mutation testing on critical modules
• Fix flakiness fast or Devs stop trusting CI

A Monday Adoption Plan (30 / 60 / 90)
Start small. Prove value. Then scale.
30 days
• Pick 1 service/module
• Add AGENTS.md (minimal)
• Turn on 1–2 gates
• Automate dependency PRs
60 days
• Expand to team repos
• Add test scaffolding flow
• Triage policy + suppressions
• Track rework reduction
90 days
• Bake into CI templates
• Run mutation tests nightly
• Modernization recipes
• Make compliance boring

Toolchain Map (Where Each Fits)
The point isn’t tools. It’s a workflow.
IDE / local: Fast linting • quick scans • agent-assisted edits
PR checks: Actionable comments • dependency PRs • lightweight gates
CI gates: Hard enforcement • evidence • consistent policy
AGENTS.md (contract)
• Definition of Done
• Commands to run
• Quality + security rules
• Test expectations
• Suppression policy

Final Thought
We’re not optimizing for tools. We’re optimizing for flow.
Make compliance the default path, not an interrupt.
Thank you. Q&A next.

Q&A
What’s your worst ‘pipeline turned red’ story?
