Devoxx BE 24 - Deep Dive LangChain4J and Quarkus LangChain4J

Clement Escoffier

October 07, 2024
Transcript

  1. LangChain4j Deep Dive Georgios Andrianakis, Principal Software Engineer, Red Hat

    Eric Deandrea, Java Champion & Dev Advocate, Red Hat Clement Escoffier, Java Champion & Distinguished Engineer, Red Hat
  2. What are we going to see? How to build AI-Infused

    applications in Java - Some examples - Main concepts - Chat Models - AI Services - Memory management - RAG - Function calling - Guardrails - Image models - The almost-all-in-one demo - Plain LangChain4j & Quarkus - Remote model (OpenAI) & Local models (Ollama, Podman AI Studio)
  3. What are Large Language Models (LLMs)? Neural Networks • Transformer

    based • Recognize, Predict, and Generate text • Trained on a VERY large corpora of text • Deduce the statistical relationships between tokens • Can be fine-tuned An LLM predicts the next token based on its training data and statistical deduction
  4. The L of LLM means Large LLama 3: - 70B

    parameters - Trained on 15000B of tokens - 4.7Gb on disk Granite: - 34B parameters - Trained on 3500B of tokens - 3.8 Gb of RAM, 4.8Gb on disk More on: An idea of the size
  5. Model and Model Serving Model Model Serving - Run the

    model - CPU / GPU - Expose an API - REST - gRPC - May support multiple models
  6. Prompt and Prompt Engineering Model Input (Prompt) Output Input: -

    Prompt (text) - Instructions to give to the model - Taming a model is hard Output: - Depends on the modality of the model
  7. Application Model AI-infused application |ˌeɪˌaɪ ˈɪnˌfjuːzd ˌæplɪˈkeɪʃən| noun (Plural AI-Infused

    applications) A software program enhanced with artificial intelligence capabilities, utilizing AI models to implement intelligent features and functionalities.
  8. Using models to build apps on top Dev Ops Release

    Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML APIs
  9. Using models to build apps on top Dev Ops Release

    Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML Need some clients and toolkits
  10. LangChain4j https://github.com/langchain4j/langchain4j • Toolkit to build AI-Infused Java applications ◦

    Provides integration with many LLM/SML providers ◦ Provides building blocks for the most common patterns (RAG, Function calling…) ◦ Abstractions to manipulate prompts, messages, memory, tokens… ◦ Integrate a large variety of vector stores and document loaders
  11. LangChain4j https://github.com/langchain4j/langchain4j AI Service Loaders Splitters Vector Store Embedding Models

    Language Models Image Models Prompt Function calling Memory Output Parsers Building blocks RAG
  12. Quarkus LangChain4j https://docs.quarkiverse.io/quarkus-langchain4j LangChain4j Quarkus LangChain4j Application LLMs Vector stores

    Embedding Models - Declarative clients - CDI integration - Observability (Otel, Prometheus) - Auditing - Resilience - RAG building blocks - Tool support - Mockable
  13. Bootstrapping LangChain4j

    LangChain4j:
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j</artifactId>
    </dependency>
    <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-open-ai</artifactId>
    </dependency>

    Quarkus LangChain4j:
    <dependency>
      <groupId>io.quarkiverse.langchain4j</groupId>
      <artifactId>quarkus-langchain4j-openai</artifactId>
    </dependency>
  14. Chat Models • Text to Text ◦ Text in ->

    Text out ◦ NLP • Prompt ◦ Set of instructions explaining what the model must generate ◦ Use plain English (or other language) ◦ There are advanced prompting techniques ▪ Prompt depends on the model ▪ Prompt engineering is an art ChatLanguageModel modelA = OpenAiChatModel.withApiKey(System.getenv("...")); String answerA = modelA.generate("Say Hello World"); @Inject ChatLanguageModel model; String answer = model.generate("Say Hello"); LangChain4j Quarkus LangChain4j - Chat Model Quarkus LangChain4j - AI Service @RegisterAiService interface PromptA { String ask(String prompt); } @Inject PromptA prompt; String answer = prompt.ask("Say Hello");
  15. var system = new SystemMessage( "You are Georgios, all your

    answers should be using the Java language using greek letters "); var user = new UserMessage("Say Hello World" ); var response = model.generate(system, user); // Pass a list of messages System.out.println( "Answer: " + response.content().text()); Messages Context or Memory
  16. Manual Memory List<ChatMessage> memory = new ArrayList<>(); memory.addAll(List.of( new SystemMessage(

    "You are a useful AI assistant." ), new UserMessage("Hello, my name is Clement." ), new UserMessage("What is my name?" ) )); var response = model.generate( memory); System.out.println( "Answer 1: " + response.content().text()); memory.add(response.content()); memory.add(new UserMessage("What's my name again?" )); response = model.generate( memory); System.out.println( "Answer 2: " + response.content().text()); var m = new UserMessage("What's my name again?" ); response = model.generate(m); // No memory System.out.println( "Answer 3: " + response.content().text());
  17. Messages and Memory Model Context Output Message Models are stateless

    - Pass a set of messages named context - These messages are stored in a memory - Context size is limited (eviction strategy) Context = (Stored input messages + Output messages) + New input
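The eviction strategy mentioned on this slide can be sketched as a simple sliding window over the message list. This is a hypothetical illustration in plain Java, not the LangChain4j API; the real `MessageWindowChatMemory` additionally keeps the system message when evicting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Minimal sketch of a message-window memory: keeps at most `maxMessages`
// entries and evicts the oldest first. (Hypothetical stand-in for
// LangChain4j's MessageWindowChatMemory, which also preserves the
// system message.)
class WindowMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    WindowMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst(); // evict the oldest message
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }
}
```

With `maxMessages = 3`, adding a fourth message silently drops the first one, which is exactly why early facts ("my name is Clement") can fall out of the context.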
  18. Chat Memory var memory = MessageWindowChatMemory .builder() .id("user-id") .maxMessages( 3)

    // Only 3 messages will be stored .build(); memory.add(new SystemMessage( "You are a useful AI assistant." )); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France" )); memory.add(new UserMessage("What is my name?" )); var response = model.generate(memory.messages()); System.out.println("Answer: " + response.content().text());
  19. Context Limit & Pricing Number of tokens - Depends on

    the model and model serving (provider) - Tokens are not words Context size is not in terms of messages, but in number of tokens This_talk_is_really_ boring._Hopefully,_it_will _be_over_soon. [2500, 838, 2082, 15224, 3067, 2146, 1535, 7443, 2697, 127345, 46431, 278, 3567, 492, 40729, 34788, 62, 84908, 13] https://platform.openai.com/tokenizer
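Actual token counts come from the provider's tokenizer (such as the OpenAI tokenizer linked on this slide). For rough context-size budgeting, a common heuristic for English text is about four characters per token — a hypothetical approximation only, not any tokenizer's real algorithm:

```java
// Rough token-count estimate for English text (~4 characters per token).
// Heuristic only, for budgeting context size; the real count comes from
// the model's tokenizer and differs per model.
class TokenEstimate {
    static int estimateTokens(String text) {
        return Math.max(1, (int) Math.ceil(text.length() / 4.0));
    }
}
```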
  20. Token Usage var memory = MessageWindowChatMemory .builder() .id("user-id") .maxMessages( 3)

    // Only 3 messages will be stored .build(); memory.add(new SystemMessage( "You are a useful AI assistant." )); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France" )); memory.add(new UserMessage("What is my name?" )); var response = model.generate(memory.messages()); System.out.println("Answer 1: " + response.content().text()); System.out.println("Input token: " + response.tokenUsage().inputTokenCount()); System.out.println("Output token: " + response.tokenUsage().outputTokenCount()); System.out.println("Total token: " + response.tokenUsage().totalTokenCount());
  21. LangChain4j AI Services Map LLM interaction to Java interfaces -

    Declarative model - You define the API the rest of the code uses - Mapping of the output - Parameterized prompt - Abstract/Integrate some of the concepts we have seen public int run() { Assistant assistant = AiServices.create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); return 0; } // Represent the interaction with the LLM interface Assistant { String answer(String question); }
  22. LangChain4j AI Services - System Message - @SystemMessage annotation -

    Or System message provider public int run() { Assistant assistant = AiServices.create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); return 0; } interface Assistant { @SystemMessage("You are Shakespeare, all your responses must be in iambic pentameter.") String answer(String question); } var rapper = AiServices.builder(Friend.class) .chatLanguageModel( model) .systemMessageProvider( chatMemoryId -> "You're a west coast rapper, all your responses must be in rhymes." ) .build();
  23. LangChain4j AI Services - User Message and Parameters public int

    run() { Poet poet = AiServices.create(Poet.class, model); System.out.println(poet.answer("Devoxx")); return 0; } interface Poet { @SystemMessage("You are Shakespeare, all your responses must be in iambic pentameter." ) @UserMessage("Write a poem about {{topic}}. It should not be more than 5 lines long." ) String answer(@V("topic") String topic); }
  24. LangChain4j AI Services - Structured Output AI Service methods are

    not limited to returning String - Primitive types - Enum - JSON Mapping TriageService triageService = … System.out.println(triageService.triage( "It was a great experience!" )); System.out.println(triageService.triage( "It was a terrible experience!" )); // … enum Sentiment { POSITIVE, NEGATIVE } record Feedback(Sentiment sentiment, String summary) {} interface TriageService { @SystemMessage("You are an AI that needs to triage user feedback." ) @UserMessage(""" Analyze the given feedback, and determine if it is positive, or negative. Then, provide a summary of the feedback: {{feedback}} """) Feedback triage(@V("feedback") String feedback); }
  25. LangChain4j AI Services - Chat Memory - You can plug

    a ChatMemory to an AI service to automatically add and evict messages var memory = MessageWindowChatMemory .builder() .id( "user-id") .maxMessages( 3) .build(); var assistant = AiServices.builder(Assistant.class) .chatLanguageModel( model) .chatMemory( memory) .build();
  26. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application
  27. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application Integration Points
  28. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault-Tolerance (timeout, circuit-breaker, non-blocking, fallbacks…)
  29. Quarkus AI Services Application Component AI Service - Define the

    API (Interface) - Configure the prompt for each method - Configure the tools, memory… Chat Model Tools Memory Retriever Audit Moderation Model (RAG) (Observability) (Agent) Inject and invoke (Manage the context using CDI scopes)
  30. Quarkus AI Services Map LLM interaction to Java interfaces -

    Based on LangChain4j AI Service - Made CDI aware - Injectable - Scope - Dev UI, Templating… - Metrics, Audit, Tracing… @Inject Assistant assistant; @ActivateRequestContext public int run() { println(assistant.answer("My name is Clement, can you say \"Hello World\" in Greek?")); println(assistant.answer( "What's my name?")); return 0; } @RegisterAiService interface Assistant { String answer(String question); } Injectable bean, Request scope by default
  31. Quarkus AI Services - Scopes and memory Request scope by

    default - Overridable - Keep messages for the duration of the scope - Request - the request only - Application - the lifetime of the application - Because it’s risky, you need a memory id - Session - the lifetime of the websocket session @RegisterAiService @RequestScoped interface ShortMemoryAssistant { String answer(String question); } @RegisterAiService @ApplicationScoped interface LongMemoryAssistant { String answer(@MemoryId int id, @UserMessage String question); } @RegisterAiService @SessionScoped interface ConversationalMemoryAssistant { String answer(String question); }
  32. Quarkus AI Services - Custom Memory Memory Provider - You

    can implement a custom memory provider - Can implement persistence - Conversation represented by MemoryId - For session - it’s the WS session ID. @ApplicationScoped public class MyMemoryStore implements ChatMemoryStore { public List<ChatMessage> getMessages( Object memoryId) { // … } public void updateMessages(Object memoryId, List<ChatMessage> messages) // … } public void deleteMessages( Object memoryId){ // … } }
  33. Quarkus AI Services - Parameter and Structured Output Prompt can

    be parameterized - Use Qute template engine - Can contain logic Structured output - Based on Jackson @UserMessage(""" What are the {number} last teams in which {player} played? Only return the team names. """) List<String> ask(int number, String player); @UserMessage(""" What is the last team in which {question.player} played? Return the team and the last season. """) Entry ask(MyHttpEndpoint.Question question); record Entry(String team, String years) {} Single {}
  34. Quarkus AI Services - Complex templating @SystemMessage(""" Given the following

    conversation and a follow-up question, rephrase the follow-up question to be a standalone question. Context: {#for m in chatMessages} {#if m.type.name() == "USER"} User: {m.text()} {/if} {#if m.type.name() == "AI"} Assistant: {m.text()} {/if} {/for} """) public String rephrase(List<ChatMessage> chatMessages, @UserMessage String question);
  35. Quarkus AI Services Application Component AI Service Quarkus Extended with

    Quarkus capabilities (REST client, Metrics, Tracing…)
  36. Quarkus AI Services - Observability Collect metrics - Exposed as

    Prometheus OpenTelemetry Tracing - Trace interactions with the LLM <dependency> <groupId>io.quarkus</groupId> <artifactId> quarkus-opentelemetry </artifactId> </dependency> <dependency> <groupId>io.quarkus</groupId> <artifactId> quarkus-micrometer-registry-prometheus </artifactId> </dependency>
  37. Quarkus AI Services - Auditing Audit Service - Allow keeping

    track of interactions with the LLM - Can be persisted - Implemented by the application code @Override public void initialMessages( Optional<SystemMessage> systemMessage, UserMessage userMessage ) { } @Override public void addLLMToApplicationMessage ( Response<AiMessage> response) {} @Override public void onFailure(Exception e) {} @Override public void onCompletion(Object result) {}
  38. Quarkus AI Services - Fault Tolerance Retry / Timeout /

    Fallback / Circuit Breaker / Rate Limiting… - Protect against error - Graceful recovery There are other resilience patterns (guardrails) @UserMessage("…") @Retry(maxRetries = 2) @Timeout(value = 1, unit = MINUTES) @Fallback(fallbackMethod = "fallback") Entry ask(Question question); default Entry fallback(Question question) { return new Entry("Unknown", "Unknown"); } <dependency> <groupId>io.quarkus</groupId> <artifactId> quarkus-smallrye-fault-tolerance </artifactId> </dependency>
  39. RAG

  40. Retrieval Augmented Generation (RAG) Enhance LLM knowledge by providing relevant

    information in real-time from other sources – Dynamic data that changes frequently Fine-tuning is expensive! 2 stages Indexing / Ingestion Retrieval / Augmentation
  41. Indexing / Ingestion What do I need to think about?

    What is the representation of the data? How do I want to split? Per document? Chapter? Sentence? How many tokens do I want to end up with?
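The fixed-size-with-overlap strategy (what `DocumentSplitters.recursive(1000, 200)` does at token granularity, shown on a later slide) can be sketched in plain Java. This simplified, hypothetical splitter counts characters instead of tokens:

```java
import java.util.ArrayList;
import java.util.List;

// Naive fixed-size splitter with overlap, counting characters instead of
// tokens for simplicity. Each segment overlaps the previous one by
// `overlap` characters, so content cut at a boundary still appears in
// full somewhere.
class Splitter {
    static List<String> split(String text, int size, int overlap) {
        if (size <= overlap) {
            throw new IllegalArgumentException("size must exceed overlap");
        }
        List<String> segments = new ArrayList<>();
        int step = size - overlap; // advance by size minus the overlap
        for (int start = 0; start < text.length(); start += step) {
            segments.add(text.substring(start, Math.min(start + size, text.length())));
            if (start + size >= text.length()) {
                break; // the last segment reached the end of the text
            }
        }
        return segments;
    }
}
```

The real `DocumentSplitters.recursive` is smarter: it tries paragraph, sentence, and word boundaries before cutting mid-token, which is why it is the usual default.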
  42. Indexing / Ingestion Compute an embedding (numerical vector) representing semantic

    meaning of each segment. Requires an embedding model In-process/ONNX, Amazon Bedrock, Azure OpenAI, Cohere, DashScope, Google Vertex AI, Hugging Face, Jina, Jlama, LocalAI, Mistral, Nomic, Ollama, OpenAI, OVHcloud, Voyage AI, Cloudflare Workers AI, Zhipu AI
  43. Store embedding alone or together with segment. Requires a vector

    store In-memory, Chroma, Elasticsearch, Milvus, Neo4j, OpenSearch, Pinecone, PGVector, Redis, Vespa, Weaviate, Qdrant Indexing / Ingestion
  44. Indexing / Ingestion var ingestor = EmbeddingStoreIngestor.builder() .embeddingModel(embeddingModel) .embeddingStore(embeddingStore) //

    Add userId metadata entry to each Document to be able to filter by it later .documentTransformer(document -> { document.metadata().put("userId", "12345"); return document; }) // Split each Document into TextSegments of 1000 tokens each with a 200-token overlap .documentSplitter(DocumentSplitters.recursive(1000, 200)) // Add the name of the Document to each TextSegment to improve the quality of search .textSegmentTransformer(textSegment -> TextSegment.from( textSegment.metadata().getString("file_name") + "\n" + textSegment.text(), textSegment.metadata() ) ) .build(); // Get the path of where the documents are and load them recursively Path path = Path.of(...); List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(path); // Ingest the documents into the embedding store ingestor.ingest(documents);
  45. Retrieval / Augmentation Retrieve & rank relevant content based on

    cosine similarity or other similarity/distance measures.
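Cosine similarity between two embedding vectors — the measure mentioned on this slide — is the dot product divided by the product of the vectors' magnitudes:

```java
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
// 1.0 means same direction (semantically close), 0.0 means orthogonal.
class Similarity {
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The `minScore(0.75)` filter on a later slide is a threshold on exactly this score: segments below it are considered not relevant enough to include in the prompt.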
  46. Retrieval / Augmentation Augment input to the LLM with related

    content. What do I need to think about? Will I exceed the max number of tokens? How much chat memory is available?
  47. Retrieval / Augmentation public class RagRetriever { @Produces @ApplicationScoped public

    RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) { var contentRetriever = EmbeddingStoreContentRetriever. builder() .embeddingModel(model) .embeddingStore(store) .maxResults( 3) .minScore( 0.75) .filter( metadataKey("userId").isEqualTo("12345")) .build(); return DefaultRetrievalAugmentor. builder() .contentRetriever(contentRetriever) .build(); } }
  48. public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore store,

    EmbeddingModel model) { var embeddingStoreRetriever = EmbeddingStoreContentRetriever.builder() .embeddingModel(model) .embeddingStore(store) .maxResults(3) .minScore(0.75) .filter(metadataKey("userId").isEqualTo("12345")) .build(); var googleSearchEngine = GoogleCustomWebSearchEngine.builder() .apiKey(System.getenv("GOOGLE_API_KEY")) .csi(System.getenv("GOOGLE_SEARCH_ENGINE_ID")) .build(); var webSearchRetriever = WebSearchContentRetriever.builder() .webSearchEngine(googleSearchEngine) .maxResults(3) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(new DefaultQueryRouter(embeddingStoreRetriever, webSearchRetriever)) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  49. public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore store,

    EmbeddingModel model, ChatLanguageModel chatModel) { var embeddingStoreRetriever = ... var webSearchRetriever = ... var queryRouter = LanguageModelQueryRouter.builder() .chatLanguageModel(chatModel) .fallbackStrategy(FallbackStrategy.ROUTE_TO_ALL) .retrieverToDescription( Map.of( embeddingStoreRetriever, "Local Documents", webSearchRetriever, "Web Search" ) ) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(queryRouter) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  50. application.properties quarkus.langchain4j.easy-rag.path=path/to/files quarkus.langchain4j.easy-rag.max-segment-size=1000 quarkus.langchain4j.easy-rag.max-overlap-size=200 quarkus.langchain4j.easy-rag.max-results=3 quarkus.langchain4j.easy-rag.ingestion-strategy=on|off quarkus.langchain4j.easy-rag.reuse-embeddings=true|false pom.xml <dependency> <groupId>io.quarkiverse.langchain4j</groupId>

    <artifactId>quarkus-langchain4j-easy-rag</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> <!-- Need an extension providing an embedding model --> <dependency> <groupId>io.quarkiverse.langchain4j</groupId> <artifactId>quarkus-langchain4j-openai</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> <!-- Also need an extension providing a vector store --> <!-- Otherwise an in-memory store is provided automatically --> <dependency> <groupId>io.quarkiverse.langchain4j</groupId> <artifactId>quarkus-langchain4j-pgvector</artifactId> <version>${quarkus-langchain4j.version}</version> </dependency> Easy RAG!
  51. Agent and Tools A tool is a function that the

    model can call: - Tools are part of CDI beans - Tools are defined and described using the @Tool annotation Prompt (Context) Extend the context with tool descriptions Invoke the model The model asks for a tool invocation (name + parameters) The tool is invoked (on the caller) and the result sent to the model The model computes the response using the tool result Response
  52. Tools - A tool is just a method - It

    can access databases, or invoke a remote service - It can also use another LLM Tools require memory Application
  53. Using tools with LangChain4j Assistant assistant = AiServices.builder(Assistant.class) .chatLanguageModel( model)

    .tools(new Calculator()) .chatMemory( MessageWindowChatMemory .withMaxMessages(10)) .build(); static class Calculator { @Tool("Calculates the length of a string") int stringLength(String s) { return s.length(); } @Tool("Calculates the square root of a number" ) double sqrt(int x) { System.out.println("Called sqrt() with x=" + x); return Math.sqrt(x); } } Objects to use as tools Declare a tool method (description optional)
  54. Using tools with Quarkus LangChain4j @RegisterAiService interface Assistant { @ToolBox(Calculator.class)

    String chat(String userMessage ); } @ApplicationScoped static class Calculator { @Tool("Calculates the length of a string" ) int stringLength(String s) { return s.length(); } } Class of the bean declaring tools Declare a tool method (description optional) Must be a bean (singleton and dependent supported) Tools can be listed in the `tools` attribute
  55. Giving access to database (Quarkus Panache) @ApplicationScoped public class BookingRepository

    implements PanacheRepository<Booking> { @Tool("Cancel a booking" ) @Transactional public void cancelBooking(long bookingId, String customerFirstName , String customerLastName ) { var booking = getBookingDetails( bookingId, customerFirstName, customerLastName); delete(booking); } @Tool("List booking for a customer" ) public List<Booking> listBookingsForCustomer (String customerName , String customerSurname ) { var found = Customer.find("firstName = ?1 and lastName = ?2", customerName, customerSurname).singleResultOptional(); return list("customer", found.get()); } }
  56. Web Search Tools (Tavily) @UserMessage(""" Search for information about the

    user query: {query}, and answer the question. """) @ToolBox(WebSearchTool.class) String chat(String query); Provided by quarkus-langchain4j-tavily Can also be used with RAG
  57. Risks • Things can go wrong quickly • Risk of

    prompt injection ◦ Access can be protected in Quarkus • Audit is very important to check the parameters • Distinction between read and write beans Application
  58. Guardrails - Functions used to validate the input and output

    of the model - Detect invalid input - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure Quarkus LangChain4j only (for now)
  59. Retry and Reprompt Output guardrails can have 4 different outcomes:

    - Success - the response is passed to the caller or next guardrail - Fatal - we stop and throw an exception - Retry - we call the model again with the same context (we never know ;-) - Reprompt - we call the model again with another message in the memory indicating how to fix the response
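The "sequential, stop at first failure" chaining described on the previous slide can be sketched with plain Java predicates standing in for guardrails. This is a hypothetical illustration, not the Quarkus LangChain4j API, which also supports the fatal/retry/reprompt outcomes listed above:

```java
import java.util.List;
import java.util.function.Predicate;

// Minimal sketch of a sequential guardrail chain: each guardrail validates
// the text in order, and the chain stops at the first failure.
// (Hypothetical stand-in; real guardrails return success/fatal/retry/reprompt
// results rather than booleans.)
class GuardrailChain {
    static boolean validate(String text, List<Predicate<String>> guardrails) {
        for (Predicate<String> guardrail : guardrails) {
            if (!guardrail.test(text)) {
                return false; // stop at the first failing guardrail
            }
        }
        return true; // every guardrail passed
    }
}
```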
  60. Implement an input guardrail @ApplicationScoped public class UppercaseInputGuardrail implements InputGuardrail

    { @Override public InputGuardrailResult validate(UserMessage userMessage ) { var message = userMessage.singleText(); var isAllUppercase = message.chars().filter(Character::isLetter) .allMatch( Character::isUpperCase); if (isAllUppercase) { return success(); } else { return failure("The input must be in uppercase." ); } } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Failure
  61. Implement an output guardrail @ApplicationScoped public class UppercaseOutputGuardrail implements OutputGuardrail

    { @Override public OutputGuardrailResult validate(OutputGuardrailParams params ) { System.out.println("response is: " + params.responseFromLLM().text() + " / " + params.responseFromLLM().text().toUpperCase()); var message = params.responseFromLLM().text(); var isAllUppercase = message.chars().filter(Character::isLetter).allMatch(Character::isUpperCase); if (isAllUppercase) { return success(); } else { return reprompt("The output must be in uppercase." , "Please provide the output in uppercase." ); } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Reprompt
  62. Process or Generate images Image Model - Image Models are

    specialized for … Images - Can generate images from text - Can process images from input (like the OCR demo) - Chat Model: GPT-4o | Image Model: DALL-E - Important: Not every model serving provider provides image support (as it needs specialized models)
  63. Using Image Model to generate pictures @Inject ImageModel model; @Override

    public int run(String... args) throws IOException { var prompt = "Generate a picture of a rabbit software developers coming to Devoxx" ; var response = model.generate(prompt); System.out.println(response.content().url()); return 0; } Image Model (can also be created with a builder) Response<Image> quarkus.langchain4j.openai.timeout =1m quarkus.langchain4j.openai.image-model.size =1024x1024 quarkus.langchain4j.openai.image-model.quality =standard quarkus.langchain4j.openai.image-model.style =vivid quarkus.langchain4j.openai.image-model.persist =true Print the persisted image
  64. Generating images from AI Services @RegisterAiService @ApplicationScoped public interface ImageGenerator

    { Image generate(String userMessage ); } Indicate to use the image model to generate the picture var prompt = "Generate a picture of a rabbit going to Devoxx. The rabbit should be wearing a Quarkus tee-shirt." ; var response = generator.generate(prompt); var file = new File("rabbit-at-devoxx.jpg" ); Files.copy(response.url().toURL().openStream(), file.toPath(), StandardCopyOption.REPLACE_EXISTING);
  65. Processing picture from AI Services @RegisterAiService @ApplicationScoped public interface ImageDescriber

    { @UserMessage(""" Describe the given message. """) String describe(Image image); } Indicate to the model to use the image
  66. The almost-all-in-one demo - React - Quarkus WebSockets.NEXT - Quarkus

    Quinoa - Ollama - RAG - Ingest data from filesystem - Tools - Update database - Send email - Observability - OpenTelemetry
  67. What did we see? How to Build AI-Infused applications in

    Java https://docs.quarkiverse.io/ quarkus-langchain4j https://docs.langchain4j.dev Code Slides Langchain4J Quarkus Chat Models RAG PROMPT MESSAGES AI SERVICE MEMORY CONTEXT TOOLS FUNCTION CALLING GUARDRAILS IMAGE MODELS OBSERVABILITY audit TRACING agent
  68. Other Quarkus & LangChain4j sessions • Developing Cloud-Native Java AI

    applications with DJL and LangChain4j - Monday, 09:30, Room 9 • Project Leyden & Quarkus - Monday, 12:35, Room 6 • Squeezing Performance out of Quarkus - Monday, 16:50, BOF 2 • jbang - Unleash the power of Java - Monday, 18:20, Room 8 • Java meets AI: Build LLM-Powered Apps with LangChain4j - Tuesday, 09:30, Room 9 • Create AI-Infused Apps with LangChain4j: Insights from the Quarkus Developers - Tuesday, 13:30, BOF 1 • Crafting intelligent GitHub Bots - Wednesday, 12:00, Room 4 • Pushing LLMs over the Edge: Exploring the Limits of the Possible - Wednesday, 16:40, Room 6 • Quarkus Community BOF - Wednesday, 19:00, BOF 2 • Panel Discussion: LangChain4j, a year later. - Thursday, 11:50, Room 10 • Crafting Intelligent Applications with Quarkus/LangChain4j - Thursday, 12:50, Room 5 • Introduction to Quarkus Security - Thursday, 15:00, Room 9 • Zero Waste, Radical Magic, and Italian Graft – Quarkus Efficiency Secrets - Thursday, 17:40, Room 6