Slide 1

Slide 1 text

@edeandrea Eric Deandrea, Red Hat Java Champion | Senior Principal Developer Advocate LangChain4j Deep Dive

Slide 2

Slide 2 text

@edeandrea 2 ● Java Champion ● 25+ years software development experience ● ~11 years DevOps Architect ● Contributor to Open Source projects: Quarkus, Spring Boot, Spring Framework, Spring Security, LangChain4j (& Quarkus LangChain4j), Wiremock, Microcks ● Boston Java Users ACM Chapter Board Member ● Published Author About Me

Slide 3

Slide 3 text

@edeandrea https://www.meetup.com/virtualjug/events/305649189 https://devnexus.com/presentations/test-driven-development-it-s-easier-than-you-think

Slide 4

Slide 4 text

@edeandrea ● Showcase & explain Quarkus, how it enables modern Java development & the Kubernetes-native experience ● Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus ● Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices 4 https://red.ht/quarkus-spring-devs

Slide 5

Slide 5 text

@edeandrea

Slide 6

Slide 6 text

@edeandrea

Slide 7

Slide 7 text

@edeandrea What are we going to see? How to build AI-Infused applications in Java - Some examples - Main concepts - Chat Models - AI Services - Memory management - RAG - Function calling - Guardrails - Image models - The almost-all-in-one demo - Plain LangChain4j & Quarkus - Remote model (OpenAI) & Local models (Ollama, Podman AI Studio) Example Code Slides https://github.com/cescoffier/langchain4j-deep-dive https://speakerdeck.com/edeandrea/java-meets-ai-build-llm-powered-apps-with-langchain4j

Slide 8

Slide 8 text

@edeandrea Some examples of AI-Infused applications

Slide 9

Slide 9 text

@edeandrea Some examples Summarizer Chatbot Text Extraction from Image https://github.com/cescoffier/langchain4j-deep-dive/tree/main/0-examples

Slide 10

Slide 10 text

@edeandrea AI-Infused applications

Slide 11

Slide 11 text

@edeandrea What are Large Language Models (LLMs)? Neural Networks ● Recognize, Predict, and Generate text ● Trained on VERY large corpora of text ● Deduce the statistical relationships between tokens ● Can be fine-tuned An LLM predicts the next token based on its training data and statistical deduction
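The "predict the next token from statistics of the training text" idea can be illustrated with a toy bigram model in plain Java. This is purely pedagogical — real LLMs are transformer networks with billions of parameters, not bigram tables — but the core mechanic is the same:

```java
import java.util.*;

// Toy illustration only: a bigram "model" that predicts the most frequent
// follower of the previous token, based on counts from the training text.
public class BigramDemo {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // "Training": count which token follows which.
    public void train(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            counts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                  .merge(tokens[i + 1], 1, Integer::sum);
        }
    }

    // "Inference": return the statistically most likely next token, or null if unseen.
    public String predictNext(String token) {
        Map<String, Integer> followers = counts.get(token.toLowerCase());
        if (followers == null) return null;
        return Collections.max(followers.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        BigramDemo model = new BigramDemo();
        model.train("the cat sat on the mat the cat ate the fish");
        System.out.println(model.predictNext("the")); // "cat" follows "the" most often
    }
}
```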

Slide 12

Slide 12 text

@edeandrea The L of LLM means Large Llama 3.3: - 70B parameters - Trained on > 15T tokens - 128K token window - 43 GB on disk Granite: - 34B parameters - Trained on 3500B tokens - 3.8 GB of RAM, 4.8 GB on disk More on: An idea of the size

Slide 13

Slide 13 text

@edeandrea More parameters means more capabilities https://research.google/blog/pathways-language-model-palm-scaling-to-540-billion-parameters-for-breakthrough-performance/

Slide 14

Slide 14 text

@edeandrea Model and Model Serving Model Model Serving - Run the model - CPU / GPU - Expose an API - REST - gRPC - May support multiple models

Slide 15

Slide 15 text

@edeandrea Prompt and Prompt Engineering Model Input (Prompt) Output Input: - Prompt (text) - Instructions to give to the model - Taming a model is hard Output: - Depends on the modality of the model

Slide 16

Slide 16 text

@edeandrea Application Model AI-infused application |ˌeɪˌaɪ ˈɪnˌfjuːzd ˌæplɪˈkeɪʃən| noun (Plural AI-Infused applications) A software program enhanced with artificial intelligence capabilities, utilizing AI models to implement intelligent features and functionalities.

Slide 17

Slide 17 text

@edeandrea Using models to build apps on top Dev Ops Release Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML Need some clients and toolkits

Slide 18

Slide 18 text

@edeandrea LangChain (Python, JS) LangChain Chains Agents Prompts Vector Stores Models Document Loaders

Slide 19

Slide 19 text

@edeandrea LangChain4j LangChain4j Chains Agents Prompts Vector Stores Models Document Loaders

Slide 20

Slide 20 text

@edeandrea LangChain4j https://github.com/langchain4j/langchain4j ● Toolkit to build AI-Infused Java applications ○ Provides integration with many LLM/SLM providers ○ Provides building blocks for the most common patterns (RAG, Function calling…) ○ Abstractions to manipulate prompts, messages, memory, tokens… ○ Integrates with a large variety of vector stores and document loaders

Slide 21

Slide 21 text

@edeandrea LangChain / LangChain4j / Quarkus LangChain4j LangChain LangChain4j Quarkus LangChain4j Inspired By Uses and extends

Slide 22

Slide 22 text

@edeandrea LangChain4j https://github.com/langchain4j/langchain4j AI Service Loaders Splitters Vector Store Embedding Models Language Models Image Models Prompt Function calling Memory Output Parsers Building blocks RAG

Slide 23

Slide 23 text

@edeandrea Quarkus LangChain4j https://docs.quarkiverse.io/quarkus-langchain4j LangChain4j Quarkus LangChain4j Application LLMs Vector stores Embedding Models - Declarative clients - CDI integration - Observability (Otel, Prometheus) - Auditing - Resilience - RAG building blocks - Tool support - Mockable

Slide 24

Slide 24 text

@edeandrea Bootstrapping LangChain4j: dev.langchain4j:langchain4j, dev.langchain4j:langchain4j-open-ai Quarkus LangChain4j: io.quarkiverse.langchain4j:quarkus-langchain4j-openai

Slide 25

Slide 25 text

@edeandrea The basics - Chat Models

Slide 26

Slide 26 text

@edeandrea Chat Models ● Text to Text ○ Text in -> Text out ○ NLP ● Prompt ○ Set of instructions explaining what the model must generate ○ Use plain English (or another language) ○ There are advanced prompting techniques ■ Prompts depend on the model ■ Prompt engineering is an art ChatLanguageModel modelA = OpenAiChatModel.builder() .apiKey(System.getenv("...")).build(); String answerA = modelA.chat("Say Hello World"); @Inject ChatLanguageModel model; String answer = model.chat("Say Hello"); LangChain4j Quarkus LangChain4j - Chat Model Quarkus LangChain4j - AI Service @RegisterAiService interface PromptA { String ask(String prompt); } @Inject PromptA prompt; String answer = prompt.ask("Say Hello");

Slide 27

Slide 27 text

@edeandrea Messages Application Role=User (prompt) Role=Assistant (response) LLM

Slide 28

Slide 28 text

@edeandrea Messages Application Role=User Role=Assistant (response) Role=System LLM Define the Context and scope (higher priority)

Slide 29

Slide 29 text

@edeandrea var system = new SystemMessage("You are Georgios, all your answers should be using the Java language using greek letters"); var user = new UserMessage("Say Hello World"); var response = model.chat(system, user); // Pass a list of messages System.out.println("Answer: " + response.aiMessage().text()); Messages Context or Memory

Slide 30

Slide 30 text

@edeandrea Memory, well, the absence of memory Application LLM (stateless)

Slide 31

Slide 31 text

@edeandrea Manual Memory List<ChatMessage> memory = new ArrayList<>(); memory.addAll(List.of( new SystemMessage("You are a useful AI assistant."), new UserMessage("Hello, my name is Clement."), new UserMessage("What is my name?") )); var response = model.chat(memory); System.out.println("Answer 1: " + response.aiMessage().text()); memory.add(response.aiMessage()); memory.add(new UserMessage("What's my name again?")); response = model.chat(memory); System.out.println("Answer 2: " + response.aiMessage().text()); var m = new UserMessage("What's my name again?"); response = model.chat(m); // No memory System.out.println("Answer 3: " + response.aiMessage().text());

Slide 32

Slide 32 text

@edeandrea Messages and Memory Application LLM (stateless) Size limit

Slide 33

Slide 33 text

@edeandrea Messages and Memory Model Context Output Message Models are stateless - Pass a set of messages named context - These messages are stored in a memory - Context size is limited (eviction strategy) Context = (Stored input messages + Output messages) + New input
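The window-with-eviction idea can be sketched in plain Java. This is illustrative only — the real building block is LangChain4j's MessageWindowChatMemory, which additionally has rules such as preserving the system message — but it shows the core eviction mechanic:

```java
import java.util.*;

// Minimal sketch of a message-window memory: keep at most N messages,
// evicting the oldest once the window is full.
public class WindowMemoryDemo {
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxMessages;

    public WindowMemoryDemo(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Add a message; evict the oldest when the limit is exceeded.
    public void add(String message) {
        messages.addLast(message);
        if (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    // The context to send to the (stateless) model on the next call.
    public List<String> messages() {
        return new ArrayList<>(messages);
    }
}
```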

Slide 34

Slide 34 text

@edeandrea Chat Memory var memory = MessageWindowChatMemory .builder() .id("user-id") .maxMessages( 3) // Only 3 messages will be stored .build(); memory.add(new SystemMessage( "You are a useful AI assistant." )); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France" )); memory.add(new UserMessage("What is my name?" )); var response = model.chat(memory.messages()); System.out.println("Answer: " + response.aiMessage().text());

Slide 35

Slide 35 text

@edeandrea Context Limit & Pricing Number of tokens - Depends on the model and model serving (provider) - Tokens are not words Context size is not in terms of messages, but in number of tokens This_talk_is_really_ boring._Hopefully,_it_will _be_over_soon. [2500, 838, 2082, 15224, 3067, 2146, 1535, 7443, 2697, 127345, 46431, 278, 3567, 492, 40729, 34788, 62, 84908, 13] https://platform.openai.com/tokenizer
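Since tokens are not words, a common rule of thumb for English text on OpenAI-style tokenizers is roughly 4 characters per token. A crude estimator based on that heuristic (illustrative only — use the provider's real tokenizer for anything that affects billing or context limits):

```java
// Rough heuristic: ~4 characters per token for English text.
// Real token counts depend on the model's tokenizer.
public class TokenEstimateDemo {
    public static int estimateTokens(String text) {
        return Math.max(1, (int) Math.ceil(text.length() / 4.0));
    }

    public static void main(String[] args) {
        String s = "This talk is really boring. Hopefully, it will be over soon.";
        // The OpenAI tokenizer reports 19 tokens for this sentence (see slide);
        // the heuristic lands in the same ballpark.
        System.out.println(estimateTokens(s));
    }
}
```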

Slide 36

Slide 36 text

@edeandrea Token Usage var memory = MessageWindowChatMemory.builder() .id("user-id") .maxMessages(3) // Only 3 messages will be stored .build(); memory.add(new SystemMessage("You are a useful AI assistant.")); memory.add(new UserMessage("Hello, my name is Clement and I live in Valence, France")); memory.add(new UserMessage("What is my name?")); var response = model.chat(memory.messages()); System.out.println("Answer 1: " + response.aiMessage().text()); System.out.println("Input token: " + response.tokenUsage().inputTokenCount()); System.out.println("Output token: " + response.tokenUsage().outputTokenCount()); System.out.println("Total token: " + response.tokenUsage().totalTokenCount());

Slide 37

Slide 37 text

@edeandrea AI Services

Slide 38

Slide 38 text

@edeandrea LangChain4j AI Services Map LLM interaction to Java interfaces - Declarative model - You define the API the rest of the code uses - Mapping of the output - Parameterized prompt - Abstract/Integrate some of the concepts we have seen public void run() { Assistant assistant = AiServices.create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); } // Represent the interaction with the LLM interface Assistant { String answer(String question); }

Slide 39

Slide 39 text

@edeandrea LangChain4j AI Services - System Message - @SystemMessage annotation - Or System message provider public void run() { var assistant = AiServices.create(Assistant.class, model); System.out.println( assistant.answer("Say Hello World") ); } interface Assistant { @SystemMessage("You are Shakespeare, all your responses must be in iambic pentameter.") String answer(String question); } var rapper = AiServices.builder(Friend.class) .chatLanguageModel(model) .systemMessageProvider(chatMemoryId -> "You're a west coast rapper, all your responses must be in rhymes.") .build();

Slide 40

Slide 40 text

@edeandrea LangChain4j AI Services - User Message and Parameters public void run() { Poet poet = AiServices.create(Poet.class, model); System.out.println(poet.answer("Devoxx")); } interface Poet { @SystemMessage ("You are Shakespeare, all your response must be in iambic pentameter." ) @UserMessage("Write a poem about {{topic}}. It should not be more than 5 lines long." ) String answer(@V("topic") String topic); }

Slide 41

Slide 41 text

@edeandrea LangChain4j AI Services - Structured Output AI Service methods are not limited to returning String - Primitive types - Enum - JSON Mapping TriageService triageService = … System.out.println(triageService.triage("It was a great experience!")); System.out.println(triageService.triage("It was a terrible experience!")); // … enum Sentiment { POSITIVE, NEGATIVE } record Feedback(Sentiment sentiment, String summary) {} interface TriageService { @SystemMessage("You are an AI that needs to triage user feedback.") @UserMessage(""" Analyze the given feedback, and determine if it is positive, or negative. Then, provide a summary of the feedback: {{fb}} """) Feedback triage(@V("fb") String fb); }

Slide 42

Slide 42 text

@edeandrea LangChain4j AI Services - Chat Memory - You can plug a ChatMemory to an AI service to automatically add and evict messages var memory = MessageWindowChatMemory .builder() .id( "user-id") .maxMessages( 3) .build(); var assistant = AiServices.builder(Assistant.class) .chatLanguageModel( model) .chatMemory( memory) .build();

Slide 43

Slide 43 text

@edeandrea Quarkus AI Services

Slide 44

Slide 44 text

@edeandrea What’s the difference between these? Application Database Application Service CRUD application Microservice Application Model AI-Infused application

Slide 45

Slide 45 text

@edeandrea What’s the difference between these? Application Database Application Service CRUD application Microservice Application Model AI-Infused application Integration Points

Slide 46

Slide 46 text

@edeandrea What’s the difference between these? Application Database Application Service CRUD application Microservice Application Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault Tolerance (timeout, circuit-breaker, non-blocking, rate limiting, fallbacks …)

Slide 47

Slide 47 text

@edeandrea Quarkus AI Services Application Component AI Service - Define the API (Interface) - Configure the prompt for each method - Configure the tools, memory… Chat Model Tools Memory Retrieval Audit Moderation Model (RAG) (Observability) (Agent) Inject and invoke (Manage the context using CDI scopes)

Slide 48

Slide 48 text

@edeandrea Quarkus AI Services Map LLM interaction to Java interfaces - Based on LangChain4j AI Service - Made CDI aware - Injectable - Scope - Dev UI, Templating… - Metrics, Audit, Tracing… @Inject Assistant assistant; @ActivateRequestContext public int run() { println(assistant.answer("My name is Clement, can you say \"Hello World\" in Greek?")); println(assistant.answer( "What's my name?")); return 0; } @RegisterAiService interface Assistant { String answer(String question); } Injectable bean, Request scope by default

Slide 49

Slide 49 text

@edeandrea Quarkus AI Services - Scopes and memory Request scope by default - Overridable - Keep messages for the duration of the scope - Request - the request only - Application - the lifetime of the application - Because it’s risky, you need a memory id - Session - the lifetime of the websocket session @RegisterAiService @RequestScoped interface ShortMemoryAssistant { String answer(String question); } @RegisterAiService @ApplicationScoped interface LongMemoryAssistant { String answer(@MemoryId int id, @UserMessage String question); } @RegisterAiService @SessionScoped interface ConversationalMemoryAssistant { String answer(String question); }

Slide 50

Slide 50 text

@edeandrea Quarkus AI Services - Custom Memory Memory Provider - You can implement a custom memory provider - Can implement persistence - Conversation represented by MemoryId - For session - it's the WS session ID. @ApplicationScoped public class MyMemoryStore implements ChatMemoryStore { public List<ChatMessage> getMessages(Object memoryId) { // … } public void updateMessages(Object memoryId, List<ChatMessage> messages) { // … } public void deleteMessages(Object memoryId) { // … } }

Slide 51

Slide 51 text

@edeandrea Quarkus AI Services - Parameter and Structured Output Prompt can be parameterized - Use Qute template engine - Can contain logic Structured output - Based on Jackson @UserMessage(""" What are the {number}th last teams in which {player} played? Only return the team names. """) List<String> ask(int number, String player); @UserMessage(""" What is the last team in which {question.player} played? Return the team and the last season. """) Entry ask(Question question); record Question(String player) {} record Entry(String team, String years) {} Single {}

Slide 52

Slide 52 text

@edeandrea Quarkus AI Services - Complex templating @SystemMessage(""" Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question. Context: {#for m in chatMessages} {#if m.type.name() == "USER"} User: {m.text()} {/if} {#if m.type.name() == "AI"} Assistant: {m.text()} {/if} {/for} """) String rephrase(List<ChatMessage> chatMessages, @UserMessage String question);

Slide 53

Slide 53 text

@edeandrea Quarkus AI Services Application Component AI Service Quarkus Extended with Quarkus capabilities (REST client, Metrics, Tracing…)

Slide 54

Slide 54 text

@edeandrea Quarkus AI Services - Observability Collect metrics - Exposed as Prometheus OpenTelemetry Tracing - Trace interactions with the LLM io.quarkus:quarkus-opentelemetry io.quarkiverse.micrometer.registry:quarkus-micrometer-registry-otlp

Slide 55

Slide 55 text

@edeandrea Quarkus AI Services - Tracing

Slide 56

Slide 56 text

@edeandrea Quarkus AI Services - Tracing

Slide 57

Slide 57 text

@edeandrea Quarkus AI Services - Auditing - Allow keeping track of interactions with the LLM - Can be persisted - Implemented by application code by observing CDI events - Each event type captures information about the source of the event @ApplicationScoped public class AuditingListener { public void initialMessagesCreated( @Observes InitialMessagesCreatedEvent e) {} public void llmInteractionComplete( @Observes LLMInteractionCompleteEvent e) {} public void llmInteractionFailed( @Observes LLMInteractionFailureEvent e) {} public void responseFromLLMReceived( @Observes ResponseFromLLMReceivedEvent e) {} public void toolExecuted( @Observes ToolExecutedEvent e) {} } https://docs.quarkiverse.io/quarkus-langchain4j/dev/ai-services.html#_auditing

Slide 58

Slide 58 text

@edeandrea Quarkus AI Services - Fault Tolerance Retry / Timeout / Fallback / Circuit Breaker / Rate Limiting… - Protects against errors - Graceful recovery There are other resilience patterns (guardrails) @UserMessage("…") @Retry(maxRetries = 2) @Timeout(value = 1, unit = MINUTES) @RateLimit(value = 50, window = 1, windowUnit = MINUTES) @Fallback(fallbackMethod = "fallback") Entry ask(Question question); default Entry fallback(Question question) { return new Entry("Unknown", "Unknown"); } io.quarkus:quarkus-smallrye-fault-tolerance
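What @Retry and @Fallback express declaratively can be sketched imperatively, to make the control flow explicit. This is illustrative only — in real code, prefer the SmallRye Fault Tolerance annotations shown above:

```java
import java.util.function.Supplier;

// Hand-rolled retry + fallback, mirroring what @Retry / @Fallback do for you.
public class RetryDemo {
    public static <T> T withRetry(Supplier<T> call, int maxRetries, Supplier<T> fallback) {
        // One initial attempt plus maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // Swallow and retry; a real implementation would log and back off.
            }
        }
        return fallback.get(); // graceful recovery once retries are exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated flaky LLM call: fails twice, then succeeds.
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) throw new RuntimeException("LLM unavailable");
            return "answer";
        }, 2, () -> "Unknown");
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```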

Slide 59

Slide 59 text

@edeandrea RAG

Slide 60

Slide 60 text

@edeandrea Retrieval Augmented Generation (RAG) Enhance LLM knowledge by providing relevant information in real-time from other sources – Dynamic data that changes frequently Fine-tuning is expensive! 2 stages Indexing / Ingestion Retrieval / Augmentation

Slide 61

Slide 61 text

@edeandrea Indexing / Ingestion

Slide 62

Slide 62 text

@edeandrea Indexing / Ingestion FileSystemDocumentLoader ClassPathDocumentLoader UrlDocumentLoader AmazonS3DocumentLoader AzureBlobStorageDocumentLoader GitHubDocumentLoader TencentCosDocumentLoader

Slide 63

Slide 63 text

@edeandrea Indexing / Ingestion TextDocumentParser ApachePdfBoxDocumentParser ApachePoiDocumentParser ApacheTikaDocumentParser

Slide 64

Slide 64 text

@edeandrea Indexing / Ingestion What do I need to think about? What is the representation of the data? How do I want to split? Per document? Chapter? Sentence? How many tokens do I want to end up with?

Slide 65

Slide 65 text

@edeandrea Indexing / Ingestion DocumentByParagraphSplitter DocumentByLineSplitter DocumentBySentenceSplitter DocumentByWordSplitter DocumentByCharacterSplitter DocumentByRegexSplitter DocumentSplitters.recursive()
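The common idea behind these splitters — a sliding window with overlap so context is not lost at segment boundaries — can be sketched naively on characters (the real splitters work on semantic boundaries such as paragraphs or sentences, and count tokens rather than characters):

```java
import java.util.*;

// Naive character-based splitter with overlap between consecutive segments.
public class SplitterDemo {
    public static List<String> split(String text, int maxChars, int overlap) {
        if (overlap >= maxChars) {
            throw new IllegalArgumentException("overlap must be smaller than maxChars");
        }
        List<String> segments = new ArrayList<>();
        int step = maxChars - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            segments.add(text.substring(start, Math.min(start + maxChars, text.length())));
            if (start + maxChars >= text.length()) break; // last window reached the end
        }
        return segments;
    }
}
```

For example, `split("abcdefghij", 4, 2)` produces four segments where each consecutive pair shares two characters.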

Slide 66

Slide 66 text

@edeandrea Indexing / Ingestion Compute an embedding (numerical vector) representing semantic meaning of each segment. Requires an embedding model In-process/Onnx, Amazon Bedrock, Azure OpenAI, Cohere, DashScope, Google Vertex AI, Hugging Face, Jina, Jlama, LocalAI, Mistral, Nomic, Ollama, OpenAI, OVHcloud, Voyage AI, Cloudflare Workers AI, Zhipu AI

Slide 67

Slide 67 text

@edeandrea Store embedding alone or together with segment. Requires a vector store In-memory, Chroma, Elasticsearch, Milvus, Neo4j, OpenSearch, Pinecone, PGVector, Redis, Vespa, Weaviate, Qdrant Indexing / Ingestion

Slide 68

Slide 68 text

@edeandrea

Slide 69

Slide 69 text

@edeandrea Indexing / Ingestion var ingestor = EmbeddingStoreIngestor.builder() .embeddingModel(embeddingModel) .embeddingStore(embeddingStore) // Add userId metadata entry to each Document to be able to filter by it later .documentTransformer(document -> { document.metadata().put("userId", "12345"); return document; }) // Split each Document into TextSegments of 1000 tokens each with a 200-token overlap .documentSplitter(DocumentSplitters.recursive(1000, 200)) // Add the name of the Document to each TextSegment to improve the quality of search .textSegmentTransformer(textSegment -> TextSegment.from( textSegment.metadata().getString("file_name") + "\n" + textSegment.text(), textSegment.metadata() ) ) .build(); // Get the path of where the documents are and load them recursively Path path = Path.of(...); List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(path); // Ingest the documents into the embedding store ingestor.ingest(documents);

Slide 70

Slide 70 text

@edeandrea

Slide 71

Slide 71 text

@edeandrea Retrieval / Augmentation

Slide 72

Slide 72 text

@edeandrea Retrieval / Augmentation Compute an embedding (numerical vector) representing semantic meaning of the query. Requires an embedding model.

Slide 73

Slide 73 text

@edeandrea Retrieval / Augmentation Retrieve & rank relevant content based on cosine similarity or other similarity/distance measures.
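Cosine similarity itself is a short computation. A sketch of scoring candidate embeddings against the query embedding and picking the best match (the vector store does this at scale, typically with approximate nearest-neighbor indexes):

```java
// Cosine similarity over embedding vectors, the measure most vector stores
// use to rank stored segments against the query embedding.
public class CosineDemo {
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the candidate most similar to the query.
    public static int mostSimilar(double[] query, double[][] candidates) {
        int best = 0;
        for (int i = 1; i < candidates.length; i++) {
            if (cosine(query, candidates[i]) > cosine(query, candidates[best])) best = i;
        }
        return best;
    }
}
```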

Slide 74

Slide 74 text

@edeandrea Retrieval / Augmentation Augment input to the LLM with related content. What do I need to think about? Will I exceed the max number of tokens? How much chat memory is available?
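The token-budget question can be sketched as: append ranked segments to the user question until the budget is spent. Token counts here use a crude characters-per-token heuristic; a real implementation would use the model's tokenizer and reserve room for chat memory and the response:

```java
import java.util.*;

// Sketch: augment the prompt with retrieved segments under a token budget.
public class AugmentDemo {
    // Crude heuristic (~4 chars per token); use the model's tokenizer in practice.
    static int estimateTokens(String s) {
        return Math.max(1, s.length() / 4);
    }

    // Append ranked segments to the question until adding one would exceed the budget.
    public static String augment(String question, List<String> rankedSegments, int budget) {
        StringBuilder prompt = new StringBuilder(question);
        int used = estimateTokens(question);
        for (String segment : rankedSegments) {
            int cost = estimateTokens(segment);
            if (used + cost > budget) break; // would exceed the context limit
            prompt.append("\n\n").append(segment);
            used += cost;
        }
        return prompt.toString();
    }
}
```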

Slide 75

Slide 75 text

@edeandrea Retrieval / Augmentation public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) { var contentRetriever = EmbeddingStoreContentRetriever.builder() .embeddingModel(model) .embeddingStore(store) .maxResults(3) .minScore(0.75) .filter(metadataKey("userId").isEqualTo("12345")) .build(); return DefaultRetrievalAugmentor.builder() .contentRetriever(contentRetriever) .build(); } }

Slide 76

Slide 76 text

@edeandrea Advanced RAG

Slide 77

Slide 77 text

@edeandrea public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) { var embeddingStoreRetriever = EmbeddingStoreContentRetriever.builder() .embeddingModel(model) .embeddingStore(store) .maxResults(3) .minScore(0.75) .filter(metadataKey("userId").isEqualTo("12345")) .build(); var googleSearchEngine = GoogleCustomWebSearchEngine.builder() .apiKey(System.getenv("GOOGLE_API_KEY")) .csi(System.getenv("GOOGLE_SEARCH_ENGINE_ID")) .build(); var webSearchRetriever = WebSearchContentRetriever.builder() .webSearchEngine(googleSearchEngine) .maxResults(3) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(new DefaultQueryRouter(embeddingStoreRetriever, webSearchRetriever)) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java

Slide 78

Slide 78 text

@edeandrea public class RagRetriever { @Produces @ApplicationScoped public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model, ChatLanguageModel chatModel) { var embeddingStoreRetriever = ... var webSearchRetriever = ... var queryRouter = LanguageModelQueryRouter.builder() .chatLanguageModel(chatModel) .fallbackStrategy(FallbackStrategy.ROUTE_TO_ALL) .retrieverToDescription( Map.of( embeddingStoreRetriever, "Local Documents", webSearchRetriever, "Web Search" ) ) .build(); return DefaultRetrievalAugmentor.builder() .queryRouter(queryRouter) .build(); } } Advanced RAG https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java

Slide 79

Slide 79 text

@edeandrea

Slide 80

Slide 80 text

@edeandrea application.properties quarkus.langchain4j.easy-rag.path=path/to/files quarkus.langchain4j.easy-rag.max-segment-size=1000 quarkus.langchain4j.easy-rag.max-overlap-size=200 quarkus.langchain4j.easy-rag.max-results=3 quarkus.langchain4j.easy-rag.ingestion-strategy=on|off quarkus.langchain4j.easy-rag.reuse-embeddings=true|false pom.xml io.quarkiverse.langchain4j:quarkus-langchain4j-easy-rag, io.quarkiverse.langchain4j:quarkus-langchain4j-openai, io.quarkiverse.langchain4j:quarkus-langchain4j-pgvector (all at ${quarkus-langchain4j.version}) Easy RAG!

Slide 81

Slide 81 text

@edeandrea Function Calling, Agents and Tools

Slide 82

Slide 82 text

@edeandrea Agent and Tools A tool is a function that the model can call: - Tools are parts of CDI beans - Tools are defined and described using @Tool Prompt (Context) Extend the context with tool descriptions Invoke the model The model asks for a tool invocation (name + parameters) The tool is invoked (on the caller) and the result sent to the model The model computes the response using the tool result Response
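The invocation loop above can be sketched with a toy dispatcher. The `CALL:name:arg` wire format here is made up for illustration — real providers exchange structured tool-call messages, and LangChain4j runs this loop for you — but it shows the key point: the tool executes on the caller, and its result is fed back to the model:

```java
import java.util.*;
import java.util.function.Function;

// Toy tool dispatcher: maps a (simulated) model tool-call request to a Java method.
public class ToolLoopDemo {
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    public void register(String name, Function<String, String> tool) {
        tools.put(name, tool);
    }

    // Handle a simulated model message of the form "CALL:toolName:argument".
    public String handle(String modelMessage) {
        if (modelMessage.startsWith("CALL:")) {
            String[] parts = modelMessage.split(":", 3);
            Function<String, String> tool = tools.get(parts[1]);
            if (tool == null) return "error: unknown tool " + parts[1];
            return tool.apply(parts[2]); // this result goes back into the model's context
        }
        return modelMessage; // plain response, no tool requested
    }
}
```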

Slide 83

Slide 83 text

@edeandrea <~~ My prompt <~~ Tool invocation request <~~ Tool invocation response <~~ Model Response

Slide 84

Slide 84 text

@edeandrea Tools - A tool is just a method - It can access databases, or invoke a remote service - It can also use another LLM Tools require memory Application

Slide 85

Slide 85 text

@edeandrea Using tools with LangChain4j Assistant assistant = AiServices.builder(Assistant.class) .chatLanguageModel(model) .tools(new Calculator()) .chatMemory(MessageWindowChatMemory.withMaxMessages(10)) .build(); static class Calculator { @Tool("Calculates the length of a string") int stringLength(String s) { return s.length(); } @Tool("Calculates the square root of a number") double sqrt(int x) { System.out.println("Called sqrt() with x=" + x); return Math.sqrt(x); } } Objects to use as tools Declare a tool method (description optional)

Slide 86

Slide 86 text

@edeandrea Using tools with Quarkus LangChain4j @RegisterAiService interface Assistant { @ToolBox(Calculator.class) String chat(String userMessage); } @ApplicationScoped static class Calculator { @Tool("Calculates the length of a string") int stringLength(String s) { return s.length(); } } Class of the bean declaring tools Declare a tool method (description optional) Must be a bean (singleton and dependent supported) Tools can be listed in the `tools` attribute

Slide 87

Slide 87 text

@edeandrea Giving access to database (Quarkus Panache) @ApplicationScoped public class BookingRepository implements PanacheRepository<Booking> { @Tool("Cancel a booking") @Transactional public void cancelBooking(long bookingId, String customerFirstName, String customerLastName) { var booking = getBookingDetails(bookingId, customerFirstName, customerLastName); delete(booking); } @Tool("List booking for a customer") public List<Booking> listBookingsForCustomer(String customerName, String customerSurname) { var found = Customer.find("firstName = ?1 and lastName = ?2", customerName, customerSurname).singleResultOptional(); return list("customer", found.get()); } }

Slide 88

Slide 88 text

@edeandrea Function Calling - Tracing

Slide 89

Slide 89 text

@edeandrea Function Calling - Tracing

Slide 90

Slide 90 text

@edeandrea Web Search Tools (Tavily) @UserMessage(""" Search for information about the user query: {query}, and answer the question. """) @ToolBox(WebSearchTool.class) String chat(String query); Provided by quarkus-langchain4j-tavily Can also be used with RAG

Slide 91

Slide 91 text

@edeandrea Risks ● Things can go wrong quickly ● Risk of prompt injection ○ Access can be protected in Quarkus ● Audit is very important to check the parameters ● Distinction between read and write beans Application

Slide 92

Slide 92 text

@edeandrea Guardrails

Slide 93

Slide 93 text

@edeandrea https://www.upworthy.com/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-1-rp3 https://www.cbsnews.com/news/aircanada-chatbot-discount-customer https://www.bbc.com/news/technology-35902104 https://www.spiceworks.com/tech/artificial-intelligence/news/meta-blender-bot-3-controversy https://www.linkedin.com/posts/stephanjanssen_princoming-activity-7285987635628507136-9Ubw

Slide 94

Slide 94 text

@edeandrea Guardrails - Functions used to validate the input and output of the model - Detect invalid input - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure Quarkus LangChain4j only (for now) https://github.com/langchain4j/langchain4j/issues/2549

Slide 95

Slide 95 text

@edeandrea Retry and Reprompt Output guardrails can have 4 different outcomes: - Success - the response is passed to the caller or next guardrail - Fatal - we stop and throw an exception - Retry - we call the model again with the same context (we never know ;-) - Reprompt - we call the model again with an additional message indicating how to fix the response
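A sequential chain with these four outcomes, stopping at the first non-success, can be sketched as follows. The types are hypothetical — Quarkus LangChain4j provides the real OutputGuardrail API with its own result type — but the control flow is the same:

```java
import java.util.*;
import java.util.function.Function;

// Sketch of a sequential output-guardrail chain with the four outcomes
// described on the slide.
public class GuardrailChainDemo {
    public enum Outcome { SUCCESS, FATAL, RETRY, REPROMPT }

    // Run guardrails in order; stop at the first non-success outcome.
    public static Outcome validate(String response, List<Function<String, Outcome>> guardrails) {
        for (Function<String, Outcome> guardrail : guardrails) {
            Outcome outcome = guardrail.apply(response);
            if (outcome != Outcome.SUCCESS) return outcome;
        }
        return Outcome.SUCCESS;
    }
}
```

For example, a chain of "response must not be empty" (fatal on failure) followed by "response must be uppercase" (reprompt on failure) short-circuits on whichever check fails first.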

Slide 96

Slide 96 text

@edeandrea Implement an input guardrail @ApplicationScoped public class UppercaseInputGuardrail implements InputGuardrail { @Override public InputGuardrailResult validate(UserMessage userMessage ) { var message = userMessage.singleText(); var isAllUppercase = message.chars().filter(Character::isLetter) .allMatch( Character::isUpperCase); return isAllUppercase ? success() : failure( "The input must be in uppercase." ); } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Failure

Slide 97

Slide 97 text

@edeandrea Implement an output guardrail @ApplicationScoped public class UppercaseOutputGuardrail implements OutputGuardrail { @Override public OutputGuardrailResult validate(OutputGuardrailParams params ) { System.out.println("response is: " + params.responseFromLLM().text() + " / " + params.responseFromLLM().text().toUpperCase()); var message = params.responseFromLLM().text(); var isAllUppercase = message.chars().filter(Character::isLetter).allMatch(Character::isUpperCase); return isAllUppercase ? success() : reprompt( "The output must be in uppercase." , "Please provide the output in uppercase." ); } } CDI beans Interface to implement Can also access the chat memory and the augmentation results OK Reprompt

Slide 98

Slide 98 text

@edeandrea Declaring guardrails @RegisterAiService public interface Assistant { @InputGuardrails(UppercaseInputGuardrail .class) @OutputGuardrails(UppercaseOutputGuardrail .class) String chat(String userMessage ); } Both can receive multiple values

Slide 99

Slide 99 text

@edeandrea Testing guardrails @QuarkusTest class UppercaseOutputGuardrailTests { @Inject UppercaseOutputGuardrail uppercaseOutputGuardrail; @Test void success() { var params = OutputGuardrailParams.from(AiMessage.from("THIS IS ALL UPPERCASE")); GuardrailAssertions.assertThat(uppercaseOutputGuardrail.validate(params)) .isSuccessful(); } @ParameterizedTest @ValueSource(strings = { "EVERYTHING IS UPPERCASE EXCEPT FOR oNE CHARACTER", "this is all lowercase" }) void guardrailReprompt(String output) { var params = OutputGuardrailParams.from(AiMessage.from(output)); GuardrailAssertions.assertThat(uppercaseOutputGuardrail.validate(params)) .hasResult(Result.FATAL) .hasSingleFailureWithMessageAndReprompt( "The output must be in uppercase.", "Please provide the output in uppercase." ); } } https://docs.quarkiverse.io/quarkus-langchain4j/dev/guardrails.html#_unit_testing

Slide 100

Slide 100 text

@edeandrea Images

Slide 101

Slide 101 text

@edeandrea Process or Generate images Image Model - Image Models are specialized for … Images - Can generate images from text - Can process images from input (like the OCR demo) - Chat Model: GPT-4o | Image Model: DALL-E - Important: Not every model serving provider provides image support (as it needs specialized models)

Slide 102

Slide 102 text

@edeandrea Processing picture from AI Services @RegisterAiService @ApplicationScoped public interface ImageDescriber { @UserMessage(""" Describe the given image. """) String describe(@ImageUrl Image image); } Indicate to the model to use the image Can be String, URL, URI, or Image

Slide 103

Slide 103 text

@edeandrea Using Image Model to generate pictures @Inject ImageModel model; @Override public void run(String... args) throws IOException { var prompt = "Generate a picture of a rabbit software developers coming to Devoxx" ; var response = model.generate(prompt); System.out.println(response.content().url()); } Image Model (can also be created with a builder) Response quarkus.langchain4j.openai.timeout =1m quarkus.langchain4j.openai.image-model.size =1024x1024 quarkus.langchain4j.openai.image-model.quality =standard quarkus.langchain4j.openai.image-model.style =vivid quarkus.langchain4j.openai.image-model.persist =true Print the persisted image

Slide 104

Slide 104 text

@edeandrea Generating images from AI Services @RegisterAiService @ApplicationScoped public interface ImageGenerator { Image generate(String userMessage ); } Indicate to use the image model to generate the picture var prompt = "Generate a picture of a rabbit going to Devoxx. The rabbit should be wearing a Quarkus tee-shirt."; var response = generator.generate(prompt); var file = Paths.get("rabbit-at-devoxx.jpg"); Files.copy(response.url().toURL().openStream(), file, StandardCopyOption.REPLACE_EXISTING);

Slide 105

Slide 105 text

@edeandrea The almost-all-in-one demo

Slide 106

Slide 106 text

@edeandrea The almost-all-in-one demo - React - Quarkus WebSockets.NEXT - Quarkus Quinoa - Guardrails - RAG - Ingest data from filesystem - Tools - Update database - Send email - Observability - OpenTelemetry - Auditing

Slide 107

Slide 107 text

@edeandrea The almost-all-in-one demo Chat Bot Web Socket Claim AI Assistant Claim Status Notification Tool invocation Generate Email AI Assistant Output Guardrails Politeness AI Assistant AI replacing humans AI replacing software Code I write Is this code? Legend RAG Retrieval Input Guardrails https://github.com/edeandrea/non-deterministic-no-problem

Slide 108

Slide 108 text

@edeandrea Conclusion

Slide 109

Slide 109 text

@edeandrea What did we see? How to Build AI-Infused applications in Java https://docs.quarkiverse.io/ quarkus-langchain4j/dev https://docs.langchain4j.dev Code Slides Langchain4J Quarkus Chat Models RAG PROMPT MESSAGES AI SERVICE MEMORY CONTEXT TOOLS FUNCTION CALLING GUARDRAILS IMAGE MODELS OBSERVABILITY audit TRACING agent https://github.com/cescoffier/langchain4j-deep-dive https://speakerdeck.com/edeandrea/java-meets-ai-build-llm-powered-apps-with-langchain4j

Slide 110

Slide 110 text

@edeandrea

Slide 111

Slide 111 text

@edeandrea @edeandrea Thank you!