
Devoxx BE - Quarkus Langchain4J

Clement Escoffier

October 10, 2024

Transcript

  1. Crafting Intelligent Applications with Quarkus Georgios Andrianakis, Principal Software Engineer,

    Red Hat Clement Escoffier, Distinguished Engineer, Red Hat @geoand86 @edeandrea @clementplop
  2. What are Large Language Models (LLMs)? Neural Networks • Transformer

    based • Recognize, Predict, and Generate text/image/sound/video content • Trained on VERY large corpora of media • Deduce the statistical relationships between tokens • Can be fine-tuned An LLM predicts the next token based on its training data and statistical deduction
  3. The L of LLM means Large Llama 3: - 70B

    parameters - Trained on 15,000B (15T) tokens - 4.7 GB on disk Granite: - 34B parameters - Trained on 3,500B (3.5T) tokens - 3.8 GB of RAM, 4.8 GB on disk More on: An idea of the size: SLM, LLM, Very LLM Local or on-prem serving: vLLM, Ollama, Jllama, Podman, InstructLab, ...
  4. Model and Model Serving Model serving: - Runs the

    model - CPU / GPU - Exposes an API - REST - gRPC
  5. Prompt and Prompt Engineering Model Input (Prompt) Output Input: -

    Prompt (text) - Instructions to give to the model - Taming a model is hard Output: - Depends on the modality of the model
  6. Application Model AI-infused application |ˌeɪˌaɪ ˈɪnˌfjuːzd ˌæplɪˈkeɪʃən| noun (Plural AI-Infused

    applications) A software program enhanced with artificial intelligence capabilities, utilizing AI models to implement intelligent features and functionalities.
  7. Using models to build apps on top [Diagram: the ML lifecycle (Collect, Curate,

    Analyze Data, Train, Evaluate, Deploy) alongside the DevOps loop (Plan, Code, Build, Test, Release, Deploy, Operate, Monitor)] APIs and abstractions
  8. Using models to build apps on top [Same diagram as slide 7] Need

    some clients and toolkits
  9. The API war Model serving APIs: - KServe / NVidia (Tensor In,

    Tensor Out): unusable by application developers - Hugging Face Inference Endpoint - OpenAI HTTP API - Llama HTTP API (Chat, Image models)
  10. The need for abstractions: Tokens Number of tokens - Depends

    on the model and model serving (provider) - Tokens are not words Context size is not in terms of messages, but in number of tokens This_talk_is_really_ boring._Hopefully,_it_will _be_over_soon. [2500, 838, 2082, 15224, 3067, 2146, 1535, 7443, 2697, 127345, 46431, 278, 3567, 492, 40729, 34788, 62, 84908, 13] https://platform.openai.com/tokenizer
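
To make the token/word distinction concrete, LangChain4j ships tokenizer helpers that estimate token counts for a given model. A minimal sketch, assuming a LangChain4j 0.x-era API (class and method names, such as OpenAiTokenizer and estimateTokenCountInText, moved or were renamed in later releases; the exact count depends on the model):

```java
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.model.openai.OpenAiTokenizer;

public class TokenCountDemo {

    public static void main(String[] args) {
        // Tokenizer bound to a specific OpenAI model: counts depend on that model's vocabulary
        Tokenizer tokenizer = new OpenAiTokenizer("gpt-3.5-turbo");

        String text = "This talk is really boring. Hopefully, it will be over soon.";

        // Tokens are not words: these 11 words become roughly 15-20 tokens
        int tokenCount = tokenizer.estimateTokenCountInText(text);
        System.out.println(text.split("\\s+").length + " words -> ~" + tokenCount + " tokens");
    }
}
```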
  11. LangChain4j https://github.com/langchain4j/langchain4j • Toolkit to build AI-Infused Java applications ◦

    Provides integration with many LLM/SLM providers ◦ Provides building blocks for the most common patterns (RAG, Function calling…) ◦ Abstractions to manipulate prompts, messages, memory, tokens, images… ◦ Integrates a large variety of vector stores and document loaders
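
To give an idea of what those abstractions look like, here is a minimal sketch of plain LangChain4j usage against the OpenAI provider. The model name and API key handling are illustrative, and the chat method is named generate in 0.x versions (later releases renamed it), so treat this as a sketch rather than a definitive API reference:

```java
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class PlainLangChain4j {

    public static void main(String[] args) {
        // Build a chat model for one provider; other providers expose the same abstraction
        ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY")) // illustrative key handling
                .modelName("gpt-4o-mini")                // illustrative model name
                .build();

        // One-shot prompt: the toolkit hides the provider-specific HTTP API
        String answer = model.generate("Explain Retrieval-Augmented Generation in one sentence.");
        System.out.println(answer);
    }
}
```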
  12. LangChain4j https://github.com/langchain4j/langchain4j AI Service Loaders Splitters Vector Store Embedding Models

    Language Models Image Models Prompt Function calling Memory Output Parsers Building blocks RAG
  13. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application
  14. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application Integration Points
  15. What’s the difference between these? Application Database Application Service CRUD

    application Microservice Application Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault-Tolerance (timeout, circuit-breaker, non-blocking, fallbacks…)
  16. Quarkus LangChain4j https://docs.quarkiverse.io/quarkus-langchain4j LangChain4j Quarkus LangChain4j Application LLMs Vector stores

    Embedding Models - Declarative clients - CDI integration - Observability (Otel, Prometheus) - Auditing - Resilience - RAG building blocks - Agentic architecture - Testability (mocks) - Developer Joy
  17. AI-infused application with Quarkus noun (Plural AI-Infused applications with Quarkus)

    A Quarkus application enhanced with artificial intelligence capabilities, utilizing AI models to implement intelligent features and functionalities. The interaction between the application and the model implements the stability patterns and is designed for production.
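
As an illustration of the declarative style Quarkus LangChain4j promotes, here is a minimal AI service sketch. The interface name and the prompt text are made up for the example; @RegisterAiService comes from the Quarkus extension, while @SystemMessage and @UserMessage come from LangChain4j:

```java
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Quarkus LangChain4j generates the implementation and exposes it as a CDI bean
@RegisterAiService
public interface ReviewTriageAssistant {

    @SystemMessage("You are an assistant of a car rental company.")
    @UserMessage("Classify the sentiment of the following customer review: {review}")
    String triage(String review);
}
```

The generated implementation is then injected and invoked like any other CDI bean, and the stability and observability concerns mentioned above apply to each invocation.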
  18. What is Retrieval-Augmented Generation? RAG is a technique to

    augment the context with your own data: 1) Ingestion - Split documents into segments - Compute a vectorized representation (embedding) of these segments - Store them into a vector database (alongside the segment) 2) Query - Find the relevant text segments from the database - Extend the prompt with these segments - Invoke the model
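
A sketch of the ingestion step with plain LangChain4j building blocks follows. The splitter sizes, the in-memory store, and the local all-MiniLM embedding model are illustrative choices (package names have moved between LangChain4j releases); in production the store would be a real vector database:

```java
import java.nio.file.Path;
import java.util.List;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class RagIngestion {

    public static void main(String[] args) {
        // 1) Ingestion: load documents, split them into segments, embed and store them
        List<Document> documents = FileSystemDocumentLoader.loadDocuments(Path.of("docs"));

        var embeddingModel = new AllMiniLmL6V2EmbeddingModel();   // small local embedding model
        var store = new InMemoryEmbeddingStore<TextSegment>();    // swap for a real vector DB in production

        EmbeddingStoreIngestor.builder()
                .documentSplitter(DocumentSplitters.recursive(500, 50)) // segment size / overlap
                .embeddingModel(embeddingModel)
                .embeddingStore(store)
                .build()
                .ingest(documents);

        // 2) Query time: embed the question, retrieve the closest segments from the store,
        //    and prepend them to the prompt before invoking the chat model.
    }
}
```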
  19. What is Retrieval-Augmented Generation? > Tell me more about

    your retirement account. Answer using the following information: The "Retirement Money Market" saving account offers individual retirement account holders tax advantages and diversification. There is no monthly maintenance fee. The minimum deposit to open an account is $100, or a $25 automatic monthly deposit. With the "Elite Money Market Account", you will enjoy all the benefits of a traditional checking account, but with tiered interest rates that pay more for higher balances than a standard savings account. Plus you can access your funds at any time unlike with certificates of deposit that require your money to be untouched for a set timeframe. <~~ My prompt <~~ Augmentation <~~ Text segments
  20. The limits of the RAG pattern How to split the

    documents? - Keep the hierarchy? - What size? Overmatching - Too many documents are relevant - Limiting the number of documents may lead to partial knowledge - => incomplete or inaccurate answers Undermatching - Not enough relevant data - => Hallucination guaranteed
  21. Agent vs. RAG vs. Fine-Tuning Agents: - Invoke local functions,

    which can retrieve data - Can use fresh and processed data RAG: - Query and attach segments from your documents - Can use freshly ingested data Fine-Tuning: - Add the knowledge into your own model - May be out of date, but can be more accurate It’s not an OR. It’s an AND.
  22. Agent and Tools A tool is a function that the

    model can call: - Tools are part of a CDI bean - Tools are defined and described using the @Tool annotation Flow: the prompt (context) is extended with the tool descriptions; the model is invoked; the model asks for a tool invocation (name + parameters); the tool is invoked and the result is sent back to the model; the model computes the response using the tool result; the response is returned
  23. Using tools with Quarkus LangChain4j @RegisterAiService interface Assistant { @ToolBox(Calculator.class)

    String chat(String userMessage); } @ApplicationScoped static class Calculator { @Tool("Calculates the length of a string") int stringLength(String s) { return s.length(); } } Class of the bean declaring tools Declare a tool method (description optional) Must be a bean (singleton and dependent scopes supported) Tools can also be listed in the `tools` attribute
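
For completeness, a sketch of how such an assistant is typically consumed, here from a hypothetical JAX-RS resource (the resource class, path, and query parameter are illustrative and not part of the slide):

```java
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@Path("/assistant")
public class AssistantResource {

    @Inject
    Assistant assistant; // the AI service declared above, injected like any CDI bean

    @GET
    public String chat(@QueryParam("q") String question) {
        // While answering, the model may ask Quarkus to invoke Calculator.stringLength(...)
        return assistant.chat(question);
    }
}
```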
  24. Giving access to a database (Quarkus Panache) @ApplicationScoped public class BookingRepository

    implements PanacheRepository<Booking> { @Tool("Cancel a booking") @Transactional public void cancelBooking(long bookingId, String customerFirstName, String customerLastName) { var booking = getBookingDetails(bookingId, customerFirstName, customerLastName); delete(booking); } @Tool("List bookings for a customer") public List<Booking> listBookingsForCustomer(String customerName, String customerSurname) { var found = Customer.find("firstName = ?1 and lastName = ?2", customerName, customerSurname).singleResultOptional(); return list("customer", found.get()); } }
  25. Risks • Things can go wrong quickly • Risk of

    prompt injection ◦ Access can be protected in Quarkus • Audit is very important to check the parameters • Distinction between read and write beans
  26. Guardrails - Functions used to validate the input and output

    of the model - Detect invalid input - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure Quarkus LangChain4j only (for now)
  27. Retry and Reprompt Output guardrails can have 4 different outcomes:

    - Success - the response is passed to the caller or the next guardrail - Fatal - we stop and throw an exception - Retry - we call the model again with the same context (we never know ;-) - Reprompt - we call the model again with an extra message indicating how to fix the response
  28. Implement an input guardrail @ApplicationScoped public class UppercaseInputGuardrail implements InputGuardrail

    { @Override public InputGuardrailResult validate(UserMessage userMessage) { var message = userMessage.singleText(); var isAllUppercase = message.chars().filter(Character::isLetter).allMatch(Character::isUpperCase); if (isAllUppercase) { return success(); } else { return failure("The input must be in uppercase."); } } } CDI bean Interface to implement Can also access the chat memory and the augmentation results OK Failure
  29. Implement an output guardrail @ApplicationScoped public class UppercaseOutputGuardrail implements OutputGuardrail

    { @Override public OutputGuardrailResult validate(OutputGuardrailParams params) { System.out.println("response is: " + params.responseFromLLM().text() + " / " + params.responseFromLLM().text().toUpperCase()); var message = params.responseFromLLM().text(); var isAllUppercase = message.chars().filter(Character::isLetter).allMatch(Character::isUpperCase); if (isAllUppercase) { return success(); } else { return reprompt("The output must be in uppercase.", "Please provide the output in uppercase."); } } } CDI bean Interface to implement Can also access the chat memory and the augmentation results OK Reprompt
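
To attach these guardrails to an AI service, Quarkus LangChain4j lets you declare them on the service method. A short sketch, assuming the @InputGuardrails/@OutputGuardrails annotations from the extension's guardrails package (the interface name is illustrative):

```java
import io.quarkiverse.langchain4j.RegisterAiService;
import io.quarkiverse.langchain4j.guardrails.InputGuardrails;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrails;

@RegisterAiService
public interface ShoutingAssistant {

    // Guardrails are chained in the declared order and stop at the first failure
    @InputGuardrails(UppercaseInputGuardrail.class)
    @OutputGuardrails(UppercaseOutputGuardrail.class)
    String chat(String userMessage);
}
```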
  30. Quarkus Langchain4J Application Component AI Service - Define the API

    (Interface) - Configure the prompt for each method - Configure the tools, memory… Chat Model Tools Memory Retriever Audit Moderation Model (RAG) (Observability) (Agent) Inject and invoke (Manage the context using CDI scopes)
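
Putting several of these pieces together, a sketch of an AI service that combines a per-method prompt, tools, and a per-conversation memory id. The interface name and prompt text are illustrative; BookingRepository refers to the Panache tool bean shown on slide 24, and @MemoryId/@UserMessage come from LangChain4j:

```java
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// One interface combining prompt, tools, and memory: Quarkus manages the
// conversation context (chat memory) per memory id, using CDI scopes
@RegisterAiService(tools = BookingRepository.class)
public interface BookingAssistant {

    @SystemMessage("You are the booking assistant of a car rental company.")
    String answer(@MemoryId String conversationId, @UserMessage String question);
}
```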
  31. What did we see? How to build AI-infused applications with

    Quarkus https://docs.quarkiverse.io/quarkus-langchain4j Code / Slides [Word cloud: LangChain4j, Quarkus, chat models, RAG, prompt, messages, AI service, memory, context, tools, function calling, guardrails, image models, observability, audit, tracing, agent]
  32. Other Quarkus & LangChain4j sessions • Developing Cloud-Native Java AI applications with DJL and LangChain4j

    - Monday, 09:30, Room 9 • Project Leyden & Quarkus - Monday, 12:35, Room 6 • Squeezing Performance out of Quarkus - Monday, 16:50, BOF 2 • jbang - Unleash the power of Java - Monday, 18:20, Room 8 • Java meets AI: Build LLM-Powered Apps with LangChain4j - Tuesday, 09:30, Room 9 • Create AI-Infused Apps with LangChain4j: Insights from the Quarkus Developers - Tuesday, 13:30, BOF 1 • Crafting intelligent GitHub Bots - Wednesday, 12:00, Room 4 • Pushing LLMs over the Edge: Exploring the Limits of the Possible - Wednesday, 16:40, Room 6 • Quarkus Community BOF - Wednesday, 19:00, BOF 2 • Welcome to the AI Jungle! Now what? - Thursday, 11:50, Room 3 • Panel Discussion: LangChain4j, a year later. - Thursday, 11:50, Room 10 • Crafting Intelligent Applications with Quarkus/LangChain4j - Thursday, 12:50, Room 5 • Introduction to Quarkus Security - Thursday, 15:00, Room 9 • Zero Waste, Radical Magic, and Italian Graft – Quarkus Efficiency Secrets - Thursday, 17:40, Room 6
  33. Other Quarkus & LangChain4j sessions • Developing Cloud-Native Java AI

    applications with DJL and LangChain4j - Monday, 09:30, Room 9 • Project Leyden & Quarkus - Monday, 12:35, Room 6 • Squeezing Performance out of Quarkus - Monday, 16:50, BOF 2 • jbang - Unleash the power of Java - Monday, 18:20, Room 8 • Java meets AI: Build LLM-Powered Apps with LangChain4j - Tuesday, 09:30, Room 9 • Create AI-Infused Apps with LangChain4j: Insights from the Quarkus Developers - Tuesday, 13:30, BOF 1 • Crafting intelligent GitHub Bots - Wednesday, 12:00, Room 4 • Pushing LLMs over the Edge: Exploring the Limits of the Possible - Wednesday, 16:40, Room 6 • Quarkus Community BOF - Wednesday, 19:00, BOF 2 • Panel Discussion: LangChain4j, a year later. - Thursday, 11:50, Room 10 • Crafting Intelligent Applications with Quarkus/LangChain4j - Thursday, 12:50, Room 5 • Introduction to Quarkus Security - Thursday, 15:00, Room 9 • Zero Waste, Radical Magic, and Italian Graft – Quarkus Efficiency Secrets - Thursday, 17:40, Room 6