
1/16/25 - Central Iowa Java Users Group - Java Meets AI: Build LLM-Powered Apps with LangChain4j

Join CIJUG virtually for our first meetup of 2025! We are excited to announce Eric Deandrea from Red Hat as our guest speaker for this event, presenting "Java meets AI: Build LLM-Powered Apps with LangChain4j."

Join us for a guided tour through the possibilities of the LangChain4j framework! Chat with virtually any LLM provider? Generate AI images straight from your Java application with Dall-E and Gemini? Have LLMs return POJOs? Interact with local models on your machine? LangChain4j makes it a piece of cake! We will explain the fundamental building blocks of LLM-powered applications and show you how to chain them together into AI Services.

Speaker bio: Eric Deandrea is a Java Champion & Senior Principal Developer Advocate at Red Hat, focusing on application development technologies. Eric has over 25 years of experience designing and building Java-based solutions and developer training programs. He is a contributor to various OSS projects, including Quarkus, Spring, LangChain4j, WireMock, and Microcks, as well as a speaker at many public events and user groups around the world.

Thursday, January 16th
5:30pm-7pm CST
https://www.meetup.com/central-iowa-java-users-group/events/305348139/

Eric Deandrea

January 16, 2025

Transcript

  1. @edeandrea 2 • Java Champion • 25+ years software development

    experience • ~11 years DevOps Architect • Contributor to Open Source projects Quarkus Spring Boot, Spring Framework, Spring Security LangChain4j (& Quarkus LangChain4j) WireMock Microcks • Boston Java Users ACM Chapter Board Member • Published Author About Me
  2. @edeandrea • Showcase & explain Quarkus, how it enables modern

    Java development & the Kubernetes-native experience • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus • Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices 4 https://red.ht/quarkus-spring-devs
  3. @edeandrea What are we going to see? How to build

    AI-Infused applications in Java - Some examples - Main concepts - Chat Models - AI Services - Memory management - RAG - Function calling - Guardrails - Image models - The almost-all-in-one demo - Plain LangChain4j & Quarkus - Remote model (OpenAI) & Local models (Ollama, Podman AI Studio) Example Code Slides https://github.com/cescoffier/langchain4j-deep-dive https://speakerdeck.com/edeandrea/25-central-iowa-java-users-group-java-meets-ai-build-llm-powered-apps-with-langchain4j
  4. @edeandrea What are Large Language Models (LLMs)? Neural Networks •

    Transformer based • Recognize, Predict, and Generate text • Trained on VERY large corpora of text • Deduce the statistical relationships between tokens • Can be fine-tuned An LLM predicts the next token based on its training data and statistical deduction
  5. @edeandrea The L of LLM means Large Llama 3.3: -

    70B parameters - Trained on >15T tokens - 128K token window - 43 GB on disk Granite: - 34B parameters - Trained on 3500B tokens - 3.8 GB of RAM, 4.8 GB on disk More on: An idea of the size
  6. @edeandrea Model and Model Serving Model Model Serving - Run

    the model - CPU / GPU - Expose an API - REST - gRPC - May support multiple models
  7. @edeandrea Prompt and Prompt Engineering Model Input (Prompt) Output Input:

    - Prompt (text) - Instructions to give to the model - Taming a model is hard Output: - Depends on the modality of the model
  8. @edeandrea Application Model AI-infused application |ˌeɪˌaɪ ˈɪnˌfjuːzd ˌæplɪˈkeɪʃən| noun (Plural

    AI-Infused applications) A software program enhanced with artificial intelligence capabilities, utilizing AI models to implement intelligent features and functionalities.
  9. @edeandrea Using models to build apps on top Dev Ops

    Release Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML APIs
  10. @edeandrea Using models to build apps on top Dev Ops

    Release Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML Need some clients and toolkits
  11. @edeandrea LangChain4j https://github.com/langchain4j/langchain4j • Toolkit to build AI-Infused Java applications

    ◦ Provides integration with many LLM/SML providers ◦ Provides building blocks for the most common patterns (RAG, Function calling…) ◦ Abstractions to manipulate prompts, messages, memory, tokens… ◦ Integrate a large variety of vector stores and document loaders
  12. @edeandrea LangChain4j https://github.com/langchain4j/langchain4j AI Service Loaders Splitters Vector Store Embedding

    Models Language Models Image Models Prompt Function calling Memory Output Parsers Building blocks RAG
  13. @edeandrea Quarkus LangChain4j https://docs.quarkiverse.io/quarkus-langchain4j LangChain4j Quarkus LangChain4j Application LLMs Vector

    stores Embedding Models - Declarative clients - CDI integration - Observability (Otel, Prometheus) - Auditing - Resilience - RAG building blocks - Tool support - Mockable
  14. @edeandrea Bootstrapping LangChain4j <dependency> <groupId>dev.langchain4j</ groupId> <artifactId>langchain4j</ artifactId> </dependency> <dependency>

    <groupId>dev.langchain4j</ groupId> <artifactId>langchain4j-open-ai</ artifactId> </dependency> <dependency> <groupId>io.quarkiverse.langchain4j</ groupId> <artifactId>quarkus-langchain4j-openai</ artifactId> </dependency> Quarkus LangChain4j
  15. @edeandrea Chat Models • Text to Text ◦ Text in

    -> Text out ◦ NLP • Prompt ◦ Set of instructions explaining what the model must generate ◦ Use plain English (or other language) ◦ There are advanced prompting techniques ▪ Prompt depends on the model ▪ Prompt engineering is an art ChatLanguageModel modelA = OpenAiChatModel.builder() .apiKey(System.getenv("...")).build(); String answerA = modelA.generate("Say Hello World"); @Inject ChatLanguageModel model; String answer = model.generate("Say Hello"); LangChain4j Quarkus LangChain4j - Chat Model Quarkus LangChain4j - AI Service @RegisterAiService interface PromptA { String ask(String prompt); } @Inject PromptA prompt; String answer = prompt.ask("Say Hello");
  16. @edeandrea var system = new SystemMessage( "You are Georgios, all

    your answers should be using the Java language using greek letters "); var user = new UserMessage("Say Hello World" ); var response = model.generate(system, user); // Pass a list of messages System.out.println( "Answer: " + response.content().text()); Messages Context or Memory
  17. @edeandrea Manual Memory List<ChatMessage> memory = new ArrayList<>(); memory.addAll(List.of( new

    SystemMessage( "You are a useful AI assistant." ), new UserMessage("Hello, my name is Clement." ), new UserMessage("What is my name?" ) )); var response = model.generate( memory); System.out.println( "Answer 1: " + response.content().text()); memory.add(response.content()); memory.add(new UserMessage("What's my name again?" )); response = model.generate( memory); System.out.println( "Answer 2: " + response.content().text()); var m = new UserMessage("What's my name again?" ); response = model.generate(m); // No memory System.out.println( "Answer 3: " + response.content().text());
  18. @edeandrea Messages and Memory Model Context Output Message Models are

    stateless - Pass a set of messages named context - These messages are stored in a memory - Context size is limited (eviction strategy) Context = (Stored input messages + Output messages) + New input
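
    The memory mechanics above can be sketched in plain Java. This is a simplified stand-in for LangChain4j's MessageWindowChatMemory (which additionally pins the system message): the application stores messages, resends them all on each call, and evicts the oldest when the window is full.

    ```java
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;

    // Minimal sketch of a message-window memory. The model is stateless, so the
    // application stores prior messages and resends them as context; eviction
    // here simply drops the oldest message once the window is full.
    class WindowMemory {
        private final int maxMessages;
        private final Deque<String> messages = new ArrayDeque<>();

        WindowMemory(int maxMessages) { this.maxMessages = maxMessages; }

        void add(String message) {
            if (messages.size() == maxMessages) {
                messages.removeFirst(); // evict oldest
            }
            messages.addLast(message);
        }

        // The context passed to the model: all stored messages, oldest first
        List<String> context() {
            return List.copyOf(messages);
        }
    }
    ```

    Note that with maxMessages(3), as on the slide, the system message would be evicted after a few turns — one reason the real implementation treats the system message specially.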
  19. @edeandrea Chat Memory var memory = MessageWindowChatMemory .builder() .id("user-id") .maxMessages(

    3) // Only 3 messages will be stored .build(); memory.add(new SystemMessage( "You are a useful AI assistant." )); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France" )); memory.add(new UserMessage("What is my name?" )); var response = model.generate(memory.messages()); System.out.println("Answer: " + response.content().text());
  20. @edeandrea Context Limit & Pricing Number of tokens - Depends

    on the model and model serving (provider) - Tokens are not words Context size is not in terms of messages, but in number of tokens This_talk_is_really_ boring._Hopefully,_it_will _be_over_soon. [2500, 838, 2082, 15224, 3067, 2146, 1535, 7443, 2697, 127345, 46431, 278, 3567, 492, 40729, 34788, 62, 84908, 13] https://platform.openai.com/tokenizer
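
    Since context limits and pricing are counted in tokens, it helps to budget them in code. The sketch below uses the common rule of thumb of roughly four characters per token for English text — an estimation heuristic only, not a real tokenizer; use the provider's tokenizer (such as the OpenAI tokenizer linked above) for exact counts.

    ```java
    // Rough token estimate using the ~4-characters-per-token rule of thumb for
    // English text. Only useful for budgeting context windows; real token counts
    // depend on the model's tokenizer.
    class TokenEstimator {
        static int estimateTokens(String text) {
            if (text == null || text.isEmpty()) return 0;
            return Math.max(1, (int) Math.ceil(text.length() / 4.0));
        }
    }
    ```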
  21. @edeandrea Token Usage var memory = MessageWindowChatMemory .builder() .id("user-id") .maxMessages(

    3) // Only 3 messages will be stored .build(); memory.add(new SystemMessage( "You are a useful AI assistant." )); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France" )); memory.add(new UserMessage("What is my name?" )); var response = model.generate(memory.messages()); System.out.println("Answer 1: " + response.content().text()); System.out.println("Input token: " + response.tokenUsage().inputTokenCount()); System.out.println("Output token: " + response.tokenUsage().outputTokenCount()); System.out.println("Total token: " + response.tokenUsage().totalTokenCount());
  22. @edeandrea LangChain4j AI Services Map LLM interaction to Java interfaces

    - Declarative model - You define the API the rest of the code uses - Mapping of the output - Parameterized prompt - Abstract/Integrate some of the concepts we have seen public void run() { Assistant assistant = AiServices.create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); } // Represent the interaction with the LLM interface Assistant { String answer(String question); }
  23. @edeandrea LangChain4j AI Services - System Message - @SystemMessage annotation

    - Or System message provider public void run() { var assistant = AiServices .create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); } interface Assistant { @SystemMessage("You are a Shakespeare, all your response must be in iambic pentameter.") String answer(String question); } var rapper = AiServices.builder(Friend.class) .chatLanguageModel( model) .systemMessageProvider( chatMemoryId -> "You’re a west coast rapper, all your response must be in rhymes." ) .build();
  24. @edeandrea LangChain4j AI Services - User Message and Parameters public

    void run() { Poet poet = AiServices.create(Poet.class, model); System.out.println(poet.answer("Devoxx")); } interface Poet { @SystemMessage ("You are Shakespeare, all your response must be in iambic pentameter." ) @UserMessage("Write a poem about {{topic}}. It should not be more than 5 lines long." ) String answer(@V("topic") String topic); }
  25. @edeandrea LangChain4j AI Services - Structured Output AI Service methods

    are not limited to returning String - Primitive types - Enum - JSON Mapping TriageService triageService = … System.out.println(triageService.triage( "It was a great experience!" )); System.out.println(triageService.triage( "It was a terrible experience!" )); // … enum Sentiment { POSITIVE, NEGATIVE } record Feedback(Sentiment sentiment, String summary) {} interface TriageService { @SystemMessage("You are an AI that needs to triage user feedback." ) @UserMessage(""" Analyze the given feedback, and determine if it is positive, or negative. Then, provide a summary of the feedback: {{fb}} """) Feedback triage(@V("fb") String fb); }
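
    Under the hood, the AI Service has to turn the model's raw text into the declared return type. A minimal sketch of the enum case, assuming the model answers with the constant name possibly wrapped in quotes or whitespace (the class and method names here are illustrative, not LangChain4j API):

    ```java
    // Sketch of mapping raw model output onto a Java enum. LangChain4j
    // essentially matches the model's text against the constant names; this
    // simplified version strips non-letter noise and ignores case.
    class OutputParser {
        enum Sentiment { POSITIVE, NEGATIVE }

        static Sentiment parseSentiment(String modelOutput) {
            String cleaned = modelOutput.trim()
                    .replaceAll("[^A-Za-z]", "") // drop quotes, punctuation, whitespace
                    .toUpperCase();
            return Sentiment.valueOf(cleaned);
        }
    }
    ```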
  26. @edeandrea LangChain4j AI Services - Chat Memory - You can

    plug a ChatMemory to an AI service to automatically add and evict messages var memory = MessageWindowChatMemory .builder() .id( "user-id") .maxMessages( 3) .build(); var assistant = AiServices.builder(Assistant.class) .chatLanguageModel( model) .chatMemory( memory) .build();
  27. @edeandrea What’s the difference between these? Application Database Application Service

    CRUD application Microservice Application Model AI-Infused application
  28. @edeandrea What’s the difference between these? Application Database Application Service

    CRUD application Microservice Application Model AI-Infused application Integration Points
  29. @edeandrea What’s the difference between these? Application Database Application Service

    CRUD application Microservice Application Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault-Tolerance (timeout, circuit-breaker, non-blocking, fallbacks…)
  30. @edeandrea Quarkus AI Services Application Component AI Service - Define

    the API (Interface) - Configure the prompt for each method - Configure the tools, memory… Chat Model Tools Memory Retriever Audit Moderation Model (RAG) (Observability) (Agent) Inject and invoke (Manage the context using CDI scopes)
  31. @edeandrea Quarkus AI Services Map LLM interaction to Java interfaces

    - Based on LangChain4j AI Service - Made CDI aware - Injectable - Scope - Dev UI, Templating… - Metrics, Audit, Tracing… @Inject Assistant assistant; @ActivateRequestContext public int run() { println(assistant.answer("My name is Clement, can you say \"Hello World\" in Greek?")); println(assistant.answer( "What's my name?")); return 0; } @RegisterAiService interface Assistant { String answer(String question); } Injectable bean, Request scope by default
  32. @edeandrea Quarkus AI Services - Scopes and memory Request scope

    by default - Overridable - Keep messages for the duration of the scope - Request - the request only - Application - the lifetime of the application - Because it’s risky, you need a memory id - Session - the lifetime of the websocket session @RegisterAiService @RequestScoped interface ShortMemoryAssistant { String answer(String question); } @RegisterAiService @ApplicationScoped interface LongMemoryAssistant { String answer(@MemoryId int id, @UserMessage String question); } @RegisterAiService @SessionScoped interface ConversationalMemoryAssistant { String answer(String question); }
  33. @edeandrea Quarkus AI Services - Custom Memory Memory Provider -

    You can implement a custom memory provider - Can implement persistence - Conversation represented by MemoryId - For session - it’s the WS session ID. @ApplicationScoped public class MyMemoryStore implements ChatMemoryStore { public List<ChatMessage> getMessages( Object memoryId) { // … } public void updateMessages(Object memoryId, List<ChatMessage> messages) // … } public void deleteMessages( Object memoryId){ // … } }
  34. @edeandrea Quarkus AI Services - Parameter and Structured Output Prompt

    can be parameterized - Use Qute template engine - Can contain logic Structured output - Based on Jackson @UserMessage(""" What are the {number}th last teams in which {player} played? Only return the team names. """) List<String> ask(int number, String player); @UserMessage(""" What are the last team in which {question.player} played? Return the team and the last season. """) Entry ask(Question question); record Question(String player) {} record Entry(String team, String years) {} Single {}
  35. @edeandrea Quarkus AI Services - Complex templating @SystemMessage(""" Given the

    following conversation and a follow-up question, rephrase the follow-up question to be a standalone question. Context: {#for m in chatMessages} {#if m.type.name() == "USER"} User: {m.text()} {/if} {#if m.type.name() == "AI"} Assistant: {m.text()} {/if} {/for} """) String rephrase(List<ChatMessage> chatMessages, @UserMessage String question);
  36. @edeandrea Quarkus AI Services Application Component AI Service Quarkus Extended

    with Quarkus capabilities (REST client, Metrics, Tracing…)
  37. @edeandrea Quarkus AI Services - Observability Collect metrics - Exposed

    as Prometheus OpenTelemetry Tracing - Trace interactions with the LLM <dependency> <groupId>io.quarkus</groupId> <artifactId> quarkus-opentelemetry </artifactId> </dependency> <dependency> <groupId> io.quarkiverse.micrometer.registry </groupId> <artifactId> quarkus-micrometer-registry-otlp </artifactId> </dependency>
  38. @edeandrea Quarkus AI Services - Auditing Audit Service - Allow

    keeping track of interactions with the LLM - Can be persisted - Implemented by the application code @Override public void initialMessages( Optional<SystemMessage> systemMessage, UserMessage userMessage ) { } @Override public void addLLMToApplicationMessage ( Response<AiMessage> response) {} @Override public void onFailure(Exception e) {} @Override public void onCompletion(Object result) {} Deprecated - to be re-written!!
  39. @edeandrea Quarkus AI Services - Fault Tolerance Retry / Timeout

    / Fallback / Circuit Breaker / Rate Limiting… - Protect against errors - Graceful recovery There are other resilience patterns (guardrails) @UserMessage("…") @Retry(maxRetries = 2) @Timeout(value = 1, unit = MINUTES) @Fallback(fallbackMethod = "fallback") Entry ask(Question question); default Entry fallback(Question question) { return new Entry("Unknown", "Unknown"); } <dependency> <groupId>io.quarkus</groupId> <artifactId> quarkus-smallrye-fault-tolerance </artifactId> </dependency>
  40. @edeandrea Retrieval Augmented Generation (RAG) Enhance LLM knowledge by providing

    relevant information in real-time from other sources – Dynamic data that changes frequently Fine-tuning is expensive! 2 stages Indexing / Ingestion Retrieval / Augmentation
  41. @edeandrea Indexing / Ingestion What do I need to think

    about? What is the representation of the data? How do I want to split? Per document? Chapter? Sentence? How many tokens do I want to end up with?
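
    The splitting question above can be illustrated with a simplified splitter. Real splitters such as DocumentSplitters.recursive count tokens and try to respect paragraph and sentence boundaries; this sketch splits on raw characters just to show how the overlap window slides.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Sketch of segment splitting with overlap: each segment starts
    // (segmentSize - overlap) characters after the previous one, so consecutive
    // segments share `overlap` characters of context.
    class OverlapSplitter {
        static List<String> split(String text, int segmentSize, int overlap) {
            if (overlap >= segmentSize) {
                throw new IllegalArgumentException("overlap must be smaller than segmentSize");
            }
            List<String> segments = new ArrayList<>();
            int step = segmentSize - overlap; // how far the window advances each time
            for (int start = 0; start < text.length(); start += step) {
                int end = Math.min(start + segmentSize, text.length());
                segments.add(text.substring(start, end));
                if (end == text.length()) break;
            }
            return segments;
        }
    }
    ```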
  42. @edeandrea Indexing / Ingestion Compute an embedding (numerical vector) representing

    semantic meaning of each segment. Requires an embedding model In-process/Onnx, Amazon Bedrock, Azure OpenAI, Cohere, DashScope, Google Vertex AI, Hugging Face, Jina, Jlama, LocalAI, Mistral, Nomic, Ollama, OpenAI, OVHcloud, Voyage AI, Cloudflare Workers AI, Zhipu AI
  43. @edeandrea Store embedding alone or together with segment. Requires a

    vector store In-memory, Chroma, Elasticsearch, Milvus, Neo4j, OpenSearch, Pinecone, PGVector, Redis, Vespa, Weaviate, Qdrant Indexing / Ingestion
  44. @edeandrea Indexing / Ingestion var ingestor = EmbeddingStoreIngestor.builder() .embeddingModel(embeddingModel) .embeddingStore(embeddingStore)

    // Add userId metadata entry to each Document to be able to filter by it later .documentTransformer(document -> { document.metadata().put("userId", "12345"); return document; }) // Split each Document into TextSegments of 1000 tokens each with a 200-token overlap .documentSplitter(DocumentSplitters.recursive(1000, 200)) // Add the name of the Document to each TextSegment to improve the quality of search .textSegmentTransformer(textSegment -> TextSegment.from( textSegment.metadata().getString("file_name") + "\n" + textSegment.text(), textSegment.metadata() ) ) .build(); // Get the path of where the documents are and load them recursively Path path = Path.of(...); List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(path); // Ingest the documents into the embedding store ingestor.ingest(documents);
  45. @edeandrea Retrieval / Augmentation Compute an embedding (numerical vector) representing

    semantic meaning of the query. Requires an embedding model.
  46. @edeandrea Retrieval / Augmentation Retrieve & rank relevant content based

    on cosine similarity or other similarity/distance measures.
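
    Cosine similarity itself is a short computation over the two embedding vectors — 1.0 means the query and segment point in the same semantic direction, 0.0 means they are unrelated:

    ```java
    // Cosine similarity between two embedding vectors, as used in the retrieval
    // step to rank stored segments against the query embedding.
    class Similarity {
        static double cosine(double[] a, double[] b) {
            double dot = 0, normA = 0, normB = 0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                normA += a[i] * a[i];
                normB += b[i] * b[i];
            }
            return dot / (Math.sqrt(normA) * Math.sqrt(normB));
        }
    }
    ```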
  47. @edeandrea Retrieval / Augmentation Augment input to the LLM with

    related content. What do I need to think about? Will I exceed the max number of tokens? How much chat memory is available?
  48. @edeandrea Retrieval / Augmentation public class RagRetriever { @Produces @ApplicationScoped

    public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) { var contentRetriever = EmbeddingStoreContentRetriever. builder() .embeddingModel(model) .embeddingStore(store) .maxResults( 3) .minScore( 0.75) .filter( metadataKey("userId").isEqualTo("12345")) .build(); return DefaultRetrievalAugmentor. builder() .contentRetriever(contentRetriever) .build(); } }
  49. @edeandrea public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore

    store, EmbeddingModel model) { var embeddingStoreRetriever = EmbeddingStoreContentRetriever.builder() .embeddingModel(model) .embeddingStore(store) .maxResults(3) .minScore(0.75) .filter(metadataKey("userId").isEqualTo("12345")) .build(); var googleSearchEngine = GoogleCustomWebSearchEngine.builder() .apiKey(System.getenv("GOOGLE_API_KEY")) .csi(System.getenv("GOOGLE_SEARCH_ENGINE_ID")) .build(); var webSearchRetriever = WebSearchContentRetriever.builder() .webSearchEngine(googleSearchEngine) .maxResults(3) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(new DefaultQueryRouter(embeddingStoreRetriever, webSearchRetriever)) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  50. @edeandrea public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore

    store, EmbeddingModel model, ChatLanguageModel chatModel) { var embeddingStoreRetriever = ... var webSearchRetriever = ... var queryRouter = LanguageModelQueryRouter.builder() .chatLanguageModel(chatModel) .fallbackStrategy(FallbackStrategy.ROUTE_TO_ALL) .retrieverToDescription( Map.of( embeddingStoreRetriever, "Local Documents", webSearchRetriever, "Web Search" ) ) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(queryRouter) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  51. @edeandrea application.properties quarkus.langchain4j.easy-rag.path=path/to/files quarkus.langchain4j.easy-rag.max-segment-size=1000 quarkus.langchain4j.easy-rag.max-overlap-size=200 quarkus.langchain4j.easy-rag.max-results=3 quarkus.langchain4j.easy-rag.ingestion-strategy=on|off quarkus.langchain4j.easy-rag.reuse-embeddings=true|false pom.xml <dependency>

    <groupId>io.quarkiverse.langchain4j</groupId> <artifactId>quarkus-langchain4j-easy-rag</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> <!-- Need an extension providing an embedding model --> <dependency> <groupId>io.quarkiverse.langchain4j</groupId> <artifactId>quarkus-langchain4j-openai</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> <!-- Also need an extension providing a vector store --> <!-- Otherwise an in-memory store is provided automatically --> <dependency> <groupId>io.quarkiverse.langchain4j</groupId> <artifactId>quarkus-langchain4j-pgvector</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> Easy RAG!
  52. @edeandrea Agent and Tools A tool is a function that

    the model can call: - Tools are parts of CDI beans - Tools are defined and described using @Tool Prompt (Context) Extend the context with tool descriptions Invoke the model The model asks for a tool invocation (name + parameters) The tool is invoked (on the caller) and the result sent to the model The model computes the response using the tool result Response
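
    The round trip above can be sketched as a dispatch table: the model returns a tool name and arguments, the application (not the model) executes the matching function, and the result is sent back for the final answer. The ToolDispatcher name and string-based signature are illustrative, not LangChain4j API:

    ```java
    import java.util.Map;
    import java.util.function.Function;

    // Sketch of the tool-invocation round trip: look up the requested tool by
    // name, invoke it on the caller's side, and return the result, which the
    // application would then feed back to the model.
    class ToolDispatcher {
        private final Map<String, Function<String, String>> tools;

        ToolDispatcher(Map<String, Function<String, String>> tools) {
            this.tools = tools;
        }

        // Invoked when the model's response is a tool-invocation request
        String invoke(String toolName, String argument) {
            Function<String, String> tool = tools.get(toolName);
            if (tool == null) {
                return "Error: unknown tool " + toolName; // reported back to the model
            }
            return tool.apply(argument);
        }
    }
    ```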
  53. @edeandrea <~~ My prompt <~~ Tool invocation request <~~ Tool

    invocation response <~~ Model Response
  54. @edeandrea Tools - A tool is just a method -

    It can access databases, or invoke a remote service - It can also use another LLM Tools require memory Application
  55. @edeandrea Using tools with LangChain4j Assistant assistant = AiServices.builder(Assistant.class) .chatLanguageModel(

    model) .tools(new Calculator()) .chatMemory( MessageWindowChatMemory .withMaxMessages(10)) .build(); static class Calculator { @Tool("Calculates the length of a string") int stringLength(String s) { return s.length(); } @Tool("Calculates the square root of a number" ) double sqrt(int x) { System.out.println("Called sqrt() with x=" + x); return Math.sqrt(x); } } Objects to use as tools Declare a tool method (description optional)
  56. @edeandrea Using tools with Quarkus LangChain4j @RegisterAiService interface Assistant {

    @ToolBox(Calculator.class) String chat(String userMessage ); } @ApplicationScoped static class Calculator { @Tool("Calculates the length of a string" ) int stringLength(String s) { return s.length(); } } Class of the bean declaring tools Declare a tool method (description optional) Must be a bean (singleton and dependent supported) Tools can be listed in the `tools` attribute
  57. @edeandrea Giving access to database (Quarkus Panache) @ApplicationScoped public class

    BookingRepository implements PanacheRepository<Booking> { @Tool("Cancel a booking" ) @Transactional public void cancelBooking(long bookingId, String customerFirstName , String customerLastName ) { var booking = getBookingDetails( bookingId, customerFirstName, customerLastName); delete(booking); } @Tool("List booking for a customer" ) public List<Booking> listBookingsForCustomer (String customerName , String customerSurname ) { var found = Customer.find("firstName = ?1 and lastName = ?2", customerName, customerSurname).singleResultOptional(); return list("customer", found.get()); } }
  58. @edeandrea Web Search Tools (Tavily) @UserMessage(""" Search for information about

    the user query: {query}, and answer the question. """) @ToolBox(WebSearchTool.class) String chat(String query); Provided by quarkus-langchain4j-tavily Can also be used with RAG
  59. @edeandrea Risks • Things can go wrong quickly • Risk

    of prompt injection ◦ Access can be protected in Quarkus • Audit is very important to check the parameters • Distinction between read and write beans Application
  60. @edeandrea Guardrails - Functions used to validate the input and

    output of the model - Detect invalid input - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure Quarkus LangChain4j only (for now)
  61. @edeandrea Retry and Reprompt Output guardrails can have 4 different

    outcomes: - Success - the response is passed to the caller or next guardrail - Fatal - we stop and throw an exception - Retry - we call the model again with the same context (we never know ;-) - Reprompt - we call the model again with another message in the memory indicating how to fix the response
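
    A minimal sketch of the sequential chain described above — guardrails run in order and the chain stops at the first non-success outcome. The enum here only models the four outcomes; the real Quarkus LangChain4j result types (OutputGuardrailResult and friends) carry more detail:

    ```java
    import java.util.List;
    import java.util.function.Function;

    // Sketch of a sequential guardrail chain: validators run in declaration
    // order, and the first non-success outcome short-circuits the chain.
    class GuardrailChain {
        enum Outcome { SUCCESS, FATAL, RETRY, REPROMPT }

        static Outcome validate(String response,
                                List<Function<String, Outcome>> guardrails) {
            for (Function<String, Outcome> guardrail : guardrails) {
                Outcome outcome = guardrail.apply(response);
                if (outcome != Outcome.SUCCESS) {
                    return outcome; // stop at first failure
                }
            }
            return Outcome.SUCCESS;
        }
    }
    ```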
  62. @edeandrea Implement an input guardrail @ApplicationScoped public class UppercaseInputGuardrail implements

    InputGuardrail { @Override public InputGuardrailResult validate(UserMessage userMessage ) { var message = userMessage.singleText(); var isAllUppercase = message.chars().filter(Character::isLetter) .allMatch( Character::isUpperCase); return isAllUppercase ? success() : failure( "The input must be in uppercase." ); } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Failure
  63. @edeandrea Implement an output guardrail @ApplicationScoped public class UppercaseOutputGuardrail implements

    OutputGuardrail { @Override public OutputGuardrailResult validate(OutputGuardrailParams params ) { System.out.println("response is: " + params.responseFromLLM().text() + " / " + params.responseFromLLM().text().toUpperCase()); var message = params.responseFromLLM().text(); var isAllUppercase = message.chars().filter(Character::isLetter).allMatch(Character::isUpperCase); return isAllUppercase ? success() : reprompt( "The output must be in uppercase." , "Please provide the output in uppercase." ); } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Reprompt
  64. @edeandrea Declaring guardrails @RegisterAiService public interface Assistant { @InputGuardrails(UppercaseInputGuardrail .class)

    @OutputGuardrails(UppercaseOutputGuardrail .class) String chat(String userMessage ); } Both can receive multiple values
  65. @edeandrea Process or Generate images Image Model - Image Models

    are specialized for … Images - Can generate images from text - Can process images from input (like the OCR demo) - Chat Model: GPT-4o | Image Model: DALL·E - Important: Not every model serving provider provides image support (as it needs specialized models)
  66. @edeandrea Processing picture from AI Services @RegisterAiService @ApplicationScoped public interface

    ImageDescriber { @UserMessage(""" Describe the given image. """) String describe(@ImageUrl Image image); } Indicate to the model to use the image Can be String, URL, URI, or Image
  67. @edeandrea Using Image Model to generate pictures @Inject ImageModel model;

    @Override public void run(String... args) throws IOException { var prompt = "Generate a picture of a rabbit software developers coming to Devoxx" ; var response = model.generate(prompt); System.out.println(response.content().url()); } Image Model (can also be created with a builder) Response<Image> quarkus.langchain4j.openai.timeout =1m quarkus.langchain4j.openai.image-model.size =1024x1024 quarkus.langchain4j.openai.image-model.quality =standard quarkus.langchain4j.openai.image-model.style =vivid quarkus.langchain4j.openai.image-model.persist =true Print the persisted image
  68. @edeandrea Generating images from AI Services @RegisterAiService @ApplicationScoped public interface

    ImageGenerator { Image generate(String userMessage ); } Indicate to use the image model to generate the picture var prompt = "Generate a picture of a rabbit going to Devoxx. The rabbit should be wearing a Quarkus tee-shirt."; var response = generator.generate(prompt); var file = Paths.get("rabbit-at-devoxx.jpg"); Files.copy(response.url().toURL().openStream(), file, StandardCopyOption.REPLACE_EXISTING);
  69. @edeandrea The almost-all-in-one demo - React - Quarkus WebSockets.NEXT -

    Quarkus Quinoa - Ollama - Guardrails - RAG - Ingest data from filesystem - Tools - Update database - Send email - Observability - OpenTelemetry
  70. @edeandrea What did we see? How to Build AI-Infused applications

    in Java https://docs.quarkiverse.io/ quarkus-langchain4j/dev https://docs.langchain4j.dev Code Slides Langchain4J Quarkus Chat Models RAG PROMPT MESSAGES AI SERVICE MEMORY CONTEXT TOOLS FUNCTION CALLING GUARDRAILS IMAGE MODELS OBSERVABILITY audit TRACING agent https://github.com/cescoffier/langchain4j-deep-dive https://speakerdeck.com/edeandrea/25-central-iowa-java-users-group-java-meets-ai-build-llm-powered-apps-with-langchain4j