Building LLM Apps in Java with LangChain4j

Building LLM Apps in Java with LangChain4j Microsoft JDConf 2026
What Happens After the Demo Works A N M Bazlur Rahman Java Champion | Sr. Staff Software Engineer Hammerspace

Things That Break After The Demo 1 Hallucination & Bad
Retrieval LLM guesses; wrong chunks make it worse 2 Tool Misuse Wrong parameters, unauthorized actions 3 No Observability Can’t trace costs, can’t measure, can’t debug 4 No Guardrails No input validation, no output validation, no fallback Target Architecture User Message ↓ Input Guardrails ↓ RAG Retrieval + Context ↓ LLM (Primary Model) ↓ fallback? LLM (Fallback Model) ↓ Output Guardrails ↓ AssistantResponse (typed) So the goal is to develop this system gradually, and each time we’ll identify where the issue lies and fix it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 2

ACT I The Data Accuracy Arc Ground it. Measure it.
Tune it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 3

How We Make RAG Trustworthy The Pipeline Documents → chunk
+ index ▼ User question → retrieve relevant chunks ▼ Retrieved context → prompt ▼ LLM answer grounded in business data How We Verify It Evaluate retrieval separately from generation Inspect: query chunks scores sources Without involving the model, did the search find the right evidence? Use hybrid retrieval: Dense for meaning Lexical for exact matches Metadata filters for scope If retrieval is wrong, the LLM never had a chance. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 4

How We Evaluate Retrieval The Flow Golden dataset ▼ Retrieval
only ▼ Score the results ▼ Fail on regression 1 2 3 4 Metrics precision@k are the top results relevant? recall@k did we find all expected supporting docs? supported-answer rate did retrieval give the assistant enough grounding? content-match does the retrieved text contain the needed facts? latency is retrieval fast enough for chat? We don’t guess if retrieval works. We measure it. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 5

ACT II The Tool-Calling Centerpiece Give it hands. 4/5/2026 Microsoft
JDConf 2026 | bazlur.com 6

Tool Calling Flow User Question ▼ LLM decides which tool
to call ▼ LLM outputs JSON: { "tool": "getOrderStatus", "args": { "orderId": 100045 } } ▼ LangChain4j validates & calls your @Tool method ▼ Your Spring bean executes: orderService.getStatus(100045) ▼ Result returns to LLM ▼ LLM formulates natural language response The LLM is an API consumer. @Tool is your @GetMapping. @P is your @RequestParam. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 7

@Tool — Your LLM API Contract @Tool("Get the current status
of an order by orderId") public OrderStatusView getOrderStatus( @P("Order identifier, numeric") long orderId) { return orderService.getStatus(orderId); } @Tool("Check live inventory for a SKU in a region") public StockView getInventory( @P("Product SKU, e.g. HDPH-100") String sku, @P("Region code: CA, UK, US") String region) { return inventoryService.getStock(sku, region); } These are Spring beans. • Inject services Same DI you already use • Transactions Can integrate with @Transactional • Security Can be integrated with Spring Security Check @PreAuthorize • Validation Tool descriptions acts like an API docs "The LLM is just choosing which API to call. You still own correctness and safety." 4/5/2026 Microsoft JDConf 2026 | bazlur.com 8

ACT III The Observability Reveal See it. Measure it. Operate
it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 9

Distributed Tracing with Grafana Tempo Key Insights End-to-end request traces
visible across all services Span timing reveals retrieval vs. inference latency Errors surface immediately with full context Trace every LLM call from user request to final response Grafana Tempo — Distributed trace waterfall view 4/6/2026 Microsoft JDConf 2026 | bazlur.com 10

Correlated Logs with Grafana Loki Key Insights Logs auto-correlated with
trace IDs for instant drill-down Structured JSON logs capture LLM prompts and responses Log volume spikes highlight failure patterns in real-time Click any trace span to jump to its correlated log entries Grafana Loki — Trace-to-log correlation view 4/6/2026 Microsoft JDConf 2026 | bazlur.com 11

ACT IV The Reliability Payoff Defend it. Test it. Ship
it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 13

Guardrails User Message ▼ InputGuardrail: PromptInjectionGuardrail → blocks injection ▼
InputGuardrail: WriteIntentGuardrail → gates write actions ▼ Retrieval → Context Injection ▼ LLM (primary model) ▼ LLM (fallback model) ← on failure ▼ Safe refusal ← both fail ▼ OutputGuardrail: JsonOutputGuardrail → validates structure ▼ AssistantResponse (typed record) Patterns ✗ Injection blocked 422 — Bean Validation for prompts Confirmation gate Safe failure is a feature ✓ Typed response DTOs with output validation ⇄ Fallback routing Resilience4j thinking for LLMs 4/5/2026 Microsoft JDConf 2026 | bazlur.com 14

Structured Outputs = DTOs record AssistantResponse( String requestId, String answer,
List<String> citations, boolean requiresConfirmation, String errorCode, String route) {} // Output guardrail validates: // - valid JSON structure // - no false claims // - reprompts up to 2x if malformed Example response: { "requestId": "abc-123", "answer": "UK express...", "citations": ["uk/shipping.md"], "requiresConfirmation": false, "errorCode": null, "route": "primary" } "Every guardrail is a unit-testable component. They’re deterministic Java code." 4/5/2026 Microsoft JDConf 2026 | bazlur.com 15

The Pattern Mapping LLM Pattern Java Pattern You Already Know
Structured outputs DTOs / records Input guardrails Bean Validation Output guardrails Response validation Observability Micrometer + OpenTelemetry Fallback routing Resilience4j / circuit breakers Tool calling @GetMapping + service layer Eval runner Integration tests in CI Confirmation gating @PreAuthorize 4/5/2026 Microsoft JDConf 2026 | bazlur.com 16

The LLM is just another external dependency. An unreliable one.
And Java developers are the best in the world at building reliable systems on top of unreliable dependencies. Everything I showed you is in this repo. Six branches, one per stage. Clone it Monday morning. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 17

Beyond This Talk Further Improvements We Did Not Cover Query
Transformation Rewrite user queries for better retrieval precision and recall Rerankers Re-score retrieved chunks with cross- encoders before feeding to the LLM Persistent Memory Maintain conversation context across sessions with durable storage PII Masking & Security Redact sensitive data in prompts and enforce access control policies Semantic Caching Cache similar queries to cut LLM costs and reduce latency by 10× Production Rollout Blue/green deploys, A/B testing, and canary releases for LLM apps Each of these deserves its own deep-dive — stay tuned. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 18

Resources & Contact Get the Code https://github.com/rokon12/jdconf2026 Get My Book
Modern Concurrency in Java Virtual Threads, Structured Concurrency, and Beyond O'Reilly Media 4/5/2026 Microsoft JDConf 2026 | bazlur.com 21 A N M Bazlur Rahman . @bazlur_rahman · bazlur.com · bazlur.substack.com

Building LLM Apps in Java withLangChain4j

Building LLM Apps in Java with LangChain4j

A N M Bazlur Rahman

More Decks by A N M Bazlur Rahman

Featured

Transcript

Building LLM Apps in Java with LangChain4j Microsoft JDConf 2026

Things That Break After The Demo 1 Hallucination & Bad

ACT I The Data Accuracy Arc Ground it. Measure it.

How We Make RAG Trustworthy The Pipeline Documents → chunk

How We Evaluate Retrieval The Flow Golden dataset ▼ Retrieval

ACT II The Tool-Calling Centerpiece Give it hands. 4/5/2026 Microsoft

Tool Calling Flow User Question ▼ LLM decides which tool

@Tool — Your LLM API Contract @Tool("Get the current status

ACT III The Observability Reveal See it. Measure it. Operate

Distributed Tracing with Grafana Tempo Key Insights End-to-end request traces

Correlated Logs with Grafana Loki Key Insights Logs auto-correlated with

ACT IV The Reliability Payoff Defend it. Test it. Ship

Guardrails User Message ▼ InputGuardrail: PromptInjectionGuardrail → blocks injection ▼

Structured Outputs = DTOs record AssistantResponse( String requestId, String answer,

The Pattern Mapping LLM Pattern Java Pattern You Already Know

The LLM is just another external dependency. An unreliable one.

Beyond This Talk Further Improvements We Did Not Cover Query

Resources & Contact Get the Code https://github.com/rokon12/jdconf2026 Get My Book