Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building LLM Apps in Java with LangChain4j

Avatar for A N M Bazlur Rahman A N M Bazlur Rahman
April 08, 2026
5

Building LLM Apps in Java with LangChain4j

AI is revolutionizing the software landscape. However, for many Java developers, integrating these powerful AI tools into existing enterprise applications or a new one can feel daunting. This hands-on session will demystify the process and show you how to build LLM-powered features directly into your Java codebase. Through a live coding demo, we'll walk you through constructing an AI-powered online store backend and provide practical insights into the architecture and code.

Avatar for A N M Bazlur Rahman

A N M Bazlur Rahman

April 08, 2026

More Decks by A N M Bazlur Rahman

Transcript

  1. Building LLM Apps in Java with LangChain4j Microsoft JDConf 2026

    What Happens After the Demo Works A N M Bazlur Rahman Java Champion | Sr. Staff Software Engineer Hammerspace
  2. Things That Break After The Demo 1 Hallucination & Bad

    Retrieval LLM guesses; wrong chunks make it worse 2 Tool Misuse Wrong parameters, unauthorized actions 3 No Observability Can’t trace costs, can’t measure, can’t debug 4 No Guardrails No input validation, no output validation, no fallback Target Architecture User Message ↓ Input Guardrails ↓ RAG Retrieval + Context ↓ LLM (Primary Model) ↓ fallback? LLM (Fallback Model) ↓ Output Guardrails ↓ AssistantResponse (typed) So the goal is to develop this system gradually, and each time we’ll identify where the issue lies and fix it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 2
  3. ACT I The Data Accuracy Arc Ground it. Measure it.

    Tune it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 3
  4. How We Make RAG Trustworthy The Pipeline Documents → chunk

    + index ▼ User question → retrieve relevant chunks ▼ Retrieved context → prompt ▼ LLM answer grounded in business data How We Verify It Evaluate retrieval separately from generation Inspect: query chunks scores sources Without involving the model, did the search find the right evidence? Use hybrid retrieval: Dense for meaning Lexical for exact matches Metadata filters for scope If retrieval is wrong, the LLM never had a chance. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 4
  5. How We Evaluate Retrieval The Flow Golden dataset ▼ Retrieval

    only ▼ Score the results ▼ Fail on regression 1 2 3 4 Metrics precision@k are the top results relevant? recall@k did we find all expected supporting docs? supported-answer rate did retrieval give the assistant enough grounding? content-match does the retrieved text contain the needed facts? latency is retrieval fast enough for chat? We don’t guess if retrieval works. We measure it. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 5
  6. Tool Calling Flow User Question ▼ LLM decides which tool

    to call ▼ LLM outputs JSON: { "tool": "getOrderStatus", "args": { "orderId": 100045 } } ▼ LangChain4j validates & calls your @Tool method ▼ Your Spring bean executes: orderService.getStatus(100045) ▼ Result returns to LLM ▼ LLM formulates natural language response The LLM is an API consumer. @Tool is your @GetMapping. @P is your @RequestParam. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 7
  7. @Tool — Your LLM API Contract @Tool("Get the current status

    of an order by orderId") public OrderStatusView getOrderStatus( @P("Order identifier, numeric") long orderId) { return orderService.getStatus(orderId); } @Tool("Check live inventory for a SKU in a region") public StockView getInventory( @P("Product SKU, e.g. HDPH-100") String sku, @P("Region code: CA, UK, US") String region) { return inventoryService.getStock(sku, region); } These are Spring beans. • Inject services Same DI you already use • Transactions Can integrate with @Transactional • Security Can be integrated with Spring Security Check @PreAuthorize • Validation Tool descriptions acts like an API docs "The LLM is just choosing which API to call. You still own correctness and safety." 4/5/2026 Microsoft JDConf 2026 | bazlur.com 8
  8. ACT III The Observability Reveal See it. Measure it. Operate

    it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 9
  9. Distributed Tracing with Grafana Tempo Key Insights End-to-end request traces

    visible across all services Span timing reveals retrieval vs. inference latency Errors surface immediately with full context Trace every LLM call from user request to final response Grafana Tempo — Distributed trace waterfall view 4/6/2026 Microsoft JDConf 2026 | bazlur.com 10
  10. Correlated Logs with Grafana Loki Key Insights Logs auto-correlated with

    trace IDs for instant drill-down Structured JSON logs capture LLM prompts and responses Log volume spikes highlight failure patterns in real-time Click any trace span to jump to its correlated log entries Grafana Loki — Trace-to-log correlation view 4/6/2026 Microsoft JDConf 2026 | bazlur.com 11
  11. ACT IV The Reliability Payoff Defend it. Test it. Ship

    it. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 13
  12. Guardrails User Message ▼ InputGuardrail: PromptInjectionGuardrail → blocks injection ▼

    InputGuardrail: WriteIntentGuardrail → gates write actions ▼ Retrieval → Context Injection ▼ LLM (primary model) ▼ LLM (fallback model) ← on failure ▼ Safe refusal ← both fail ▼ OutputGuardrail: JsonOutputGuardrail → validates structure ▼ AssistantResponse (typed record) Patterns ✗ Injection blocked 422 — Bean Validation for prompts Confirmation gate Safe failure is a feature ✓ Typed response DTOs with output validation ⇄ Fallback routing Resilience4j thinking for LLMs 4/5/2026 Microsoft JDConf 2026 | bazlur.com 14
  13. Structured Outputs = DTOs record AssistantResponse( String requestId, String answer,

    List<String> citations, boolean requiresConfirmation, String errorCode, String route) {} // Output guardrail validates: // - valid JSON structure // - no false claims // - reprompts up to 2x if malformed Example response: { "requestId": "abc-123", "answer": "UK express...", "citations": ["uk/shipping.md"], "requiresConfirmation": false, "errorCode": null, "route": "primary" } "Every guardrail is a unit-testable component. They’re deterministic Java code." 4/5/2026 Microsoft JDConf 2026 | bazlur.com 15
  14. The Pattern Mapping LLM Pattern Java Pattern You Already Know

    Structured outputs DTOs / records Input guardrails Bean Validation Output guardrails Response validation Observability Micrometer + OpenTelemetry Fallback routing Resilience4j / circuit breakers Tool calling @GetMapping + service layer Eval runner Integration tests in CI Confirmation gating @PreAuthorize 4/5/2026 Microsoft JDConf 2026 | bazlur.com 16
  15. The LLM is just another external dependency. An unreliable one.

    And Java developers are the best in the world at building reliable systems on top of unreliable dependencies. Everything I showed you is in this repo. Six branches, one per stage. Clone it Monday morning. 4/5/2026 Microsoft JDConf 2026 | bazlur.com 17
  16. Beyond This Talk Further Improvements We Did Not Cover Query

    Transformation Rewrite user queries for better retrieval precision and recall Rerankers Re-score retrieved chunks with cross- encoders before feeding to the LLM Persistent Memory Maintain conversation context across sessions with durable storage PII Masking & Security Redact sensitive data in prompts and enforce access control policies Semantic Caching Cache similar queries to cut LLM costs and reduce latency by 10× Production Rollout Blue/green deploys, A/B testing, and canary releases for LLM apps Each of these deserves its own deep-dive — stay tuned. 4/6/2026 Microsoft JDConf 2026 | bazlur.com 18
  17. Resources & Contact Get the Code https://github.com/rokon12/jdconf2026 Get My Book

    Modern Concurrency in Java Virtual Threads, Structured Concurrency, and Beyond O'Reilly Media 4/5/2026 Microsoft JDConf 2026 | bazlur.com 21 A N M Bazlur Rahman . @bazlur_rahman · bazlur.com · bazlur.substack.com