Spring AI vs LangChain4J (meetup in Paris on 05/12/2024)

Slide 1

Slide 1 text

LangChain4j vs Spring AI Julien Dubois / Michael Isvy

Slide 2

Slide 2 text

Who we are ● Julien Dubois ○ DevRel manager at Microsoft, JHipster creator ○ Second contributor to the LangChain4j project ● Michael Isvy ○ VP of Engineering at cynapse.ai (Computer Vision) in Singapore ○ 17 years of Spring, 2 years of AI

Slide 3

Slide 3 text

Agenda ● The AI Space ● AI in Java: myth or reality? ● Prompting with LangChain4J and Spring AI ● RAG and Vector Databases

Slide 4

Slide 4 text

The AI space M

Slide 5

Slide 5 text

Artificial Intelligence: which Way?
Conventional AI
● Based on custom-trained models
● Programming: requires AI engineers
Generative AI
● LLM, ChatGPT, Claude, Gemini, Mistral, Ollama, Llama…
● Programming: mostly API calls

Slide 6

Slide 6 text

Conventional AI example: License Plate Recognition
● Find a base model online
  Typically on GitHub or Hugging Face
● Evaluate the model
  Identify gaps (example: doesn’t work with Singapore truck license plates)
● Prepare a fine-tuning dataset
● Spend 3-4 days training the model

Slide 7

Slide 7 text

Generative AI: usually an API call!
● Example: API call to ChatGPT (Linux/OSX)
    #!/bin/bash
    API_KEY="sk-proj-7devtvnBIsYXVHJuHBQAT3BlbkFJNBB4uz8Iog5F2y"
    curl https://api.openai.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $API_KEY" \
      -d '{
        "model": "gpt-4o",
        "messages": [ {"role": "user", "content": "Tell me a joke." } ]
      }'
In GenAI, models are much more complex. But most of the time you don’t need to build them
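The same call can be sketched in plain Java with the JDK's java.net.http package. This is a minimal sketch, not part of the original slides: the class and method names are hypothetical, the JSON body is concatenated by hand for brevity, and it assumes the key is supplied through an OPENAI_API_KEY environment variable. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) the chat-completion request from the curl example.
final class OpenAiRequestSketch {

    // Endpoint taken from the curl example.
    static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";

    // Hand-built JSON body; a real application would use a JSON library
    // so that special characters in the prompt are escaped properly.
    static String body(String model, String userMessage) {
        return "{\"model\": \"" + model + "\", \"messages\": ["
                + "{\"role\": \"user\", \"content\": \"" + userMessage + "\"}]}";
    }

    // Builds a POST request carrying the key in the Authorization header.
    static HttpRequest buildRequest(String apiKey, String userMessage) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body("gpt-4o", userMessage)))
                .build();
    }

    public static void main(String[] args) {
        // Assumption: the key comes from the OPENAI_API_KEY environment variable.
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null) {
            System.err.println("OPENAI_API_KEY is not set");
            return;
        }
        HttpRequest request = buildRequest(apiKey, "Tell me a joke.");
        System.out.println(request.method() + " " + request.uri());
        // To send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping request construction separate from sending makes the sketch easy to inspect without spending API credits.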

Slide 8

Slide 8 text

Quiz
● In the example below, there is something you should never do. What is it? (Linux/OSX)
    #!/bin/bash
    API_KEY="sk-proj-7devtvnBIsYXVHJuHBQAT3BlbkFJNBB4uz8Iog5F2y"
    curl https://api.openai.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $API_KEY" \
      -d '{
        "model": "gpt-4o",
        "messages": [ {"role": "user", "content": "Tell me a joke." } ]
      }'

Slide 9

Slide 9 text

AI in Java: Myth or reality?

Slide 10

Slide 10 text

Conventional AI with Java? ● Java has a tiny ecosystem in Conventional AI ● NVIDIA rules the game ○ Native NVIDIA libraries: CUDA, TensorRT, DeepStream… ○ Work natively with Python and/or C++ only

Slide 11

Slide 11 text

Generative AI with Java ● Java is gaining a lot of momentum for GenAI ○ GenAI work is mostly about calling APIs and using (vector) databases ○ Java is good at calling APIs and databases! ● A shift similar to the one in database access ○ Java developers aren’t great at building databases ○ We’re good at accessing them (JDBC) ○ We’re exceptional at working with advanced tools to use them effectively (Hibernate) J

Slide 12

Slide 12 text

TL;DR (Too Long; Didn’t Read)
                            LangChain4J                 Spring AI
Sponsor                     Community + Red Hat @IBM    Spring team @Broadcom
Well-designed API?          Good                        Great
Documentation               Good                        Great
Community Support           Great                       Great
Support for models          All models supported        no Google Gemini yet
RAG and Vector db support   Great (lots of options)     Good

Slide 13

Slide 13 text

The Generative AI landscape J
In the Cloud
● OpenAI: ChatGPT
● Mistral AI: Mistral
● Anthropic: Claude (spin-off from OpenAI)
● Google: Gemini
On premise / on your laptop
● Ollama: choose your local model (Llama3.1, Mixtral, LLaVA, tinyllama, …)

Slide 14

Slide 14 text

Spring AI

Slide 15

Slide 15 text

The Spring AI ecosystem ● Current version: Spring AI 1.0.0-M4 ○ Not in final release version yet! ● Based on Spring and Spring Boot

Slide 16

Slide 16 text

Generating a Spring AI project ● Use start.spring.io

Slide 17

Slide 17 text

● Best practice: store the API key in an environment variable
application.properties (IntelliJ Community Edition)
    spring.application.name=demo-spring-ai
    spring.ai.openai.api-key=${OPENAI_API_KEY}
    spring.ai.openai.chat.options.model=gpt-4o

Slide 18

Slide 18 text

Making a call
● Use ChatClient’s fluent API
  Generic (does not depend on the LLM implementation)
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;

    @Service
    public class MusicService {
        private final ChatClient chatClient;

        // Spring Boot injects a pre-configured ChatClient.Builder
        public MusicService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
        }

        public String findBestSongs() {
            return this.chatClient.prompt()
                .user("which were the best songs in 1993?")
                .call()
                .content();
        }
    }
Demo

Slide 19

Slide 19 text

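Calling from a JUnit test
● Run configuration should include the environment variable
    import org.junit.jupiter.api.Test;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;

    @SpringBootTest
    class MusicServiceTest {
        @Autowired
        private MusicService musicService;
        // Logger named after the test class itself
        private static final Logger logger = LoggerFactory.getLogger(MusicServiceTest.class);

        @Test
        void shouldFindBestSongs() {
            String response = this.musicService.findBestSongs();
            logger.info(response);
        }
    }
Demo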

Slide 20

Slide 20 text

Mapping a prompt to an entity
● Works with Java records
    import java.util.List;
    import org.springframework.ai.chat.client.ChatClient;
    import org.springframework.stereotype.Service;

    @Service
    public class ChatService {
        private final ChatClient chatClient;

        public ChatService(ChatClient.Builder builder) {
            this.chatClient = builder.build();
        }

        public ActorFilms generateResponse() {
            return this.chatClient.prompt()
                .user("Generate the 10 most popular movies starring Bruce Willis")
                .call()
                .entity(ActorFilms.class); // maps the response onto the record
        }
    }

    public record ActorFilms(String actor, List<String> movies) {}
Demo

Slide 21

Slide 21 text

JSON schema under the hood
    public ActorFilms generateResponse() {
        return this.chatClient.prompt()
            .user("Generate the 10 most popular movies starring Bruce Willis")
            .call()
            .entity(ActorFilms.class);
    }

    public record ActorFilms(String actor, List<String> movies) {}
Instructions appended to the prompt (excerpt):
    Do not include any explanations, only provide an RFC8259 compliant JSON response …
    {
      \"$schema\" : \"https://json-schema.org/draft/2020-12/schema\",
      \"type\" : \"object\",
      \"properties\" : {
        \"actor\" : { \"type\" : \"string\" },
        \"movies\" : { \"type\" : \"array\", \"items\" : { \"type\" : \"string\" } }
      }

Slide 22

Slide 22 text

Using a prompt with Parameters
    public String recommendMovie(String topic) {
        return this.chatClient.prompt()
            .user(userSpec -> userSpec
                .text("Can you recommend a movie about {topic}")
                .param("topic", topic))
            .call()
            .content();
    }
Calling it:
    var response = this.movieService.recommendMovie("computers");
    this.logger.info(response);
Response:
    Certainly! One highly regarded film that delves into the world of computers is "The Imitation Game" (2014). This biographical drama stars Benedict Cumberbatch as Alan Turing, a pioneering computer scientist and mathematician.
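To illustrate what the {topic} placeholder amounts to, here is a minimal plain-Java sketch of named-placeholder substitution. It is an illustration of the idea only, not Spring AI's implementation; the class and method names are hypothetical.

```java
import java.util.Map;

// Sketch only: naive {name} placeholder substitution, for illustration.
final class PromptTemplateSketch {

    // Replaces every {key} in the template with the matching value from params.
    static String render(String template, Map<String, String> params) {
        String result = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            result = result.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String prompt = render("Can you recommend a movie about {topic}",
                Map.of("topic", "computers"));
        System.out.println(prompt);
    }
}
```

Keeping the template fixed and passing user input only as a parameter also makes it easier to review and test the prompt wording separately from the data.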

Slide 23

Slide 23 text

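Working with images
● Import image file as a Resource
● Optical Character Recognition
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.core.io.Resource;
    import org.springframework.stereotype.Service;
    import org.springframework.util.MimeTypeUtils;

    @Service
    class ImageService {
        @Value("classpath:images/scientist.jpg")
        private Resource imageResourceScientist;
        //…

        public String describeScientist() {
            return this.chatClient.prompt()
                .user(userSpec -> userSpec
                    .text("can you describe this person? And what is written on top of his head?")
                    // the resource is a .jpg file, so the MIME type is JPEG
                    .media(MimeTypeUtils.IMAGE_JPEG, this.imageResourceScientist))
                .call()
                .content();
        }
    }
Demo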

Slide 24

Slide 24 text

Spring AI - Fluent API calls
● Simple call (fluent API call)
    return this.chatClient
        .prompt()
        .user("which were the best songs in 1993?")
        .call()
        .content();
General trend in the Spring team: Client over Template
See: RestClient over RestTemplate, JdbcClient over JdbcTemplate etc.

Slide 25

Slide 25 text

Spring AI - unified API
● Simple call
    return this.chatClient
        .prompt()
        .user("which were the best songs in 1993?")
        .call()
        .content();
● Image
    return this.chatClient
        .prompt()
        .user(userSpec -> userSpec
            .text("what’s the weather like?")
            .media(MimeTypeUtils.IMAGE_PNG, this.image))
        .call()
        .content();
● Using a Vector DB and RAG
    return this.chatClient
        .prompt()
        .user("which were the best songs in 1993?")
        .call()
        .content();
  Vector DB: requires specific configuration

Slide 26

Slide 26 text

LangChain4j

Slide 27

Slide 27 text

Introduction ● Current version: 0.36.2 ○ Not in final release version yet! ● No need for a framework to use it ● Great integration with Quarkus, strong relationship with Red Hat ● Spring Boot starter integration

Slide 28

Slide 28 text

Full demo available All examples (and more!) are available on GitHub at: https://github.com/jdubois/jdubois-langchain4j-demo

Slide 29

Slide 29 text

Making a call

Slide 30

Slide 30 text

Configuration using a Spring Bean This is a manual configuration, not using the Spring Boot starter

Slide 31

Slide 31 text

Similar code for creating an image

Slide 32

Slide 32 text

Using local models with Ollama M

Slide 33

Slide 33 text

What is Ollama ● Desktop software (Linux, Windows, OSX) ● Lets you run LLMs locally ● Exposes an API so models can be queried ● Ollama is different from Llama ○ Ollama is a platform that runs local models ○ Llama is a series of open source models from Meta

Slide 34

Slide 34 text

Ollama is inspired by Docker ● Ollama was founded by two former Docker employees ● Aims to run models the way Docker runs containers ● Uses a Modelfile (similar to a Dockerfile) ● Also written in Go

Slide 35

Slide 35 text

Ollama installation ● Download from https://ollama.com/download ● Once installed, Ollama listens on port 11434 (http://localhost:11434)

Slide 36

Slide 36 text

Ollama basic commands
● List models
    michael@macbook-air % ollama list
    NAME                ID              SIZE      MODIFIED
    tinyllama:latest    2644915ede      0.6 GB    5 days ago
    llama3.1:latest     f66fc8dc39ea    4.7 GB    5 days ago
    mistral:latest      f974a74358d6    4.1 GB    2 weeks ago
    llama3.1:8b         91ab477bec9d    4.7 GB    3 weeks ago
    michaelisvy2@macbook-air documentation %
● Install a model
    michael@macbook-air % ollama pull tinyllama
    pulling manifest
    pulling 2af3b81862c6... 100% ▕█████████████████████████████▏ 637 MB
    verifying sha256 digest
    writing manifest
    success

Slide 37

Slide 37 text

Ollama - running models
● Language model
    michael@macbook-air % ollama run tinyllama
    >>> tell me a joke
    Sure, here's a joke for you:
    Q: What does an AI robot have in common with a smartphone?
    A: It's always looking at its screen.
● Vision model
    michael@macbook-air % ollama run llava
    >>> what is the weather like in Singapore? /Users/michael//spring-ai/lab-files/lab-02/singapore-weather.png
    Added image '/Users/michael/singapore-weather.png'
    The image displays a weather forecast for Singapore on a particular day. According to the forecast, it will be 34 degrees Celsius (93.2 degrees Fahrenheit) with humidity at 76%.
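Since Ollama exposes an HTTP API on port 11434, the same prompt can also be prepared from Java. The sketch below is not from the slides: the class and method names are hypothetical, it assumes a default local install and Ollama's /api/generate endpoint with streaming turned off, and it only builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a request to a local Ollama server.
final class OllamaRequestSketch {

    // Assumption: Ollama runs locally on its default port.
    static final String BASE_URL = "http://localhost:11434";

    // Hand-built JSON for brevity; a JSON library would handle escaping properly.
    static HttpRequest generateRequest(String model, String prompt) {
        String json = "{\"model\": \"" + model + "\", \"prompt\": \"" + prompt
                + "\", \"stream\": false}";
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = generateRequest("tinyllama", "tell me a joke");
        System.out.println(request.method() + " " + request.uri());
        // To send it:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

No API key is involved because the model runs on the local machine.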

Slide 38

Slide 38 text

Retrieval-Augmented Generation M

Slide 39

Slide 39 text

Quiz
● Using Google Gemini’s context window, how many Harry Potter books can I fit in a single prompt at most? (7 books, 600 pages on average)
○ 10% of a book
○ 2 books
○ All of them
Cost for providing such a large context: between $0.20 and $1 depending on the LLM provider
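The quiz can be worked through as back-of-the-envelope arithmetic. In the sketch below, only the 7 books and 600 pages come from the slide. The words per page, tokens per word and the 2,000,000-token context window are rough assumptions chosen for illustration, and the class name is hypothetical.

```java
// Sketch only: rough token estimate for the Harry Potter quiz.
final class ContextWindowEstimate {

    // From the quiz: 7 books, 600 pages on average.
    static final long BOOKS = 7;
    static final long PAGES_PER_BOOK = 600;

    // Assumptions (not from the slides): typical novel page and tokenizer ratios.
    static final long WORDS_PER_PAGE = 250;
    static final long TOKENS_PER_100_WORDS = 133;

    // Assumption: a context window of roughly two million tokens.
    static final long CONTEXT_WINDOW_TOKENS = 2_000_000;

    // Estimated tokens in one average book.
    static long tokensPerBook() {
        return PAGES_PER_BOOK * WORDS_PER_PAGE * TOKENS_PER_100_WORDS / 100;
    }

    // Number of whole books that fit, capped at the size of the series.
    static long booksThatFit(long contextWindowTokens) {
        return Math.min(BOOKS, contextWindowTokens / tokensPerBook());
    }

    public static void main(String[] args) {
        System.out.println("Tokens per book: " + tokensPerBook());
        System.out.println("Tokens for the series: " + BOOKS * tokensPerBook());
        System.out.println("Books that fit: " + booksThatFit(CONTEXT_WINDOW_TOKENS));
    }
}
```

Changing CONTEXT_WINDOW_TOKENS to a smaller value, such as 128,000, shows how quickly the answer changes for models with shorter context windows.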

Slide 40

Slide 40 text

RAG / Vector Databases ● Retrieval-Augmented Generation ○ Queries to the LLM should have context ● Vector Database ○ One way to implement RAG ○ Context is broken into multiple chunks and stored in a Vector database

Slide 41

Slide 41 text

Vector databases

Slide 42

Slide 42 text

Solution: Vector databases
● Split your data into chunks, and encode each chunk into numbers that the ML model can understand
    “My house is black”  → AI Model → {0.345, 0.465, 0.856, …, 0.1543}
    “My garden is big”   → AI Model → {0.545, 0.665, 0.056, …, 0.3543}
    “My dog is playful”  → AI Model → {0.645, 0.765, 0.156, …, 0.4543}
Each Vector is an array of 1,536 numbers

Slide 43

Slide 43 text

Definition of a Vector
● A Vector is just a type of data
○ Typically an array of 1,536 decimal numbers
○ Value between -1 and 1
Example with pgvector:
    CREATE TABLE paragraph (
        id SERIAL PRIMARY KEY,
        paragraph_text TEXT,
        vector VECTOR(1536)
    );
    {-0.345, 0.465, 0.856, …, 0.1543}
“Vector” and “Embedding” are similar concepts. For simplicity, we use the word “Vector” whenever possible in this course

Slide 44

Slide 44 text

How are Vectors created?
● Vectors typically represent the output generated by Machine Learning models
    “My house is black”  → AI Model → {0.345, 0.465, 0.856, 0.1543}
    “My garden is big”   → AI Model → {0.545, 0.665, 0.056, 0.3543}
    “My dog is playful”  → AI Model → {0.645, 0.765, 0.156, 0.4543}
The above example is simplified and uses random numbers. It is based on text AI models. Vectors may also be used with Computer Vision AI models or audio-based AI models

Slide 45

Slide 45 text

How to search vectors: Similarity Search
○ Selects the closest Vector(s)
○ OpenAI recommends using cosine similarity
Vectors shown on the slide:
    {0.345, -0.465, 0.856, 0.1543}
    {-0.436, 0.578, 0.935, 0.2193}
    {-0.445, 0.565, 0.956, 0.2543}
    {0.545, 0.665, 0.056, 0.3543}
    {0.645, 0.765, 0.156, -0.4543}
    {0.745, 0.865, 0.256, 0.5543}
    {0.845, 0.965, -0.356, 0.6543}
Sample SQL query with pgvector:
    SELECT id, name, vector <=> '[0.436, 0.578, 0.935, 0.2193]' AS distance
    FROM items
    ORDER BY distance
    LIMIT 10;
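In pgvector, the <=> operator returns the cosine distance, which is 1 minus the cosine similarity. What the query does can be sketched in memory with plain Java. This is an illustration only: the class and method names are hypothetical, and a real vector database uses indexes instead of scanning and sorting every vector.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: brute-force similarity search using cosine distance.
final class CosineSearchSketch {

    // Cosine similarity: dot product divided by the product of the vector lengths.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Cosine distance: 0 for vectors pointing the same way, 1 for orthogonal ones.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }

    // Equivalent of ORDER BY distance LIMIT n, done in memory.
    static List<double[]> nearest(double[] query, List<double[]> vectors, int limit) {
        return vectors.stream()
                .sorted(Comparator.comparingDouble((double[] v) -> cosineDistance(query, v)))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] query = {0.436, 0.578, 0.935, 0.2193};
        List<double[]> stored = List.of(
                new double[] {0.345, -0.465, 0.856, 0.1543},
                new double[] {-0.445, 0.565, 0.956, 0.2543},
                new double[] {0.545, 0.665, 0.056, 0.3543});
        for (double[] v : nearest(query, stored, 2)) {
            System.out.println(java.util.Arrays.toString(v) + " distance=" + cosineDistance(query, v));
        }
    }
}
```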

Slide 46

Slide 46 text

RAG Demo J

Slide 47

Slide 47 text

RAG pattern: ingestion
Testing Document splitters: https://langchain-text-splitter.streamlit.app
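To make the splitting step of ingestion concrete, here is a minimal fixed-size character splitter with overlap in plain Java. It is a sketch under simple assumptions, not LangChain4j's or Spring AI's splitter: the class and method names are hypothetical, and real splitters usually cut on paragraph or sentence boundaries and count tokens rather than characters.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits text into fixed-size chunks that overlap slightly.
final class FixedSizeSplitterSketch {

    // chunkSize and overlap are measured in characters; overlap must be smaller than chunkSize.
    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Need chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "My house is black. My garden is big. My dog is playful.";
        for (String chunk : split(text, 20, 5)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

The overlap keeps a sentence that straddles a chunk boundary partly visible in both chunks, which is one reason to test several splitter settings during ingestion.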

Slide 48

Slide 48 text

RAG pattern: retrieval

Slide 49

Slide 49 text

Putting it all together ● LangChain4j’s EasyRAG makes it easy to use the RAG pattern ○ Sensible defaults ○ Tooling ○ Easy to extend ● Best practices ○ Ingestion: clean up the text, test with different models and splitters ○ Retrieval: limit the prompt size ($$$), improve the prompt, test different models
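The retrieval advice to limit the prompt size can be sketched as follows: chunks are added in ranked order until a budget is reached, then combined with the question. This is a plain-Java illustration, not EasyRAG's behaviour. The class name, the prompt wording and the use of a character budget as a stand-in for a token budget are all assumptions.

```java
import java.util.List;

// Sketch only: assembles a RAG prompt from ranked chunks under a size budget.
final class RagPromptSketch {

    // rankedChunks is expected to be ordered from most to least relevant.
    static String buildPrompt(String question, List<String> rankedChunks, int maxContextChars) {
        StringBuilder context = new StringBuilder();
        for (String chunk : rankedChunks) {
            // Stop as soon as the next chunk (plus its newline) would exceed the budget.
            if (context.length() + chunk.length() + 1 > maxContextChars) {
                break;
            }
            context.append(chunk).append('\n');
        }
        return "Answer the question using only the context below.\n\n"
                + "Context:\n" + context
                + "\nQuestion: " + question;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("My dog is playful", "My garden is big", "My house is black");
        System.out.println(buildPrompt("What is my dog like?", chunks, 40));
    }
}
```

A smaller budget lowers the cost per call but risks dropping the chunk that contains the answer, which is why the prompt and the ranking are worth testing together.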

Slide 50

Slide 50 text

Conclusion

Slide 51

Slide 51 text

Cost of 3 months learning Spring AI / LangChain4J ● OpenAI ○ $1.36 ● Ollama ○ $0.00 https://platform.openai.com/usage M

Slide 52

Slide 52 text

Our favorite Resources online
● https://www.youtube.com/@DanVega (Dan Vega)
● https://www.youtube.com/@springinaction (Craig Walls)
● RAG from dumb implementation to serious results (Guillaume Laforge)
● https://github.com/ThomasVitale/llm-apps-java-spring-ai/ (best Spring AI samples online!)
● https://course.fast.ai/ Practical deep learning
● Our demos:
○ https://github.com/michaelisvy/demo-spring-ai
○ https://github.com/michaelisvy/demo-langchain4j
○ https://github.com/jdubois/jdubois-langchain4j-demo

Slide 53

Slide 53 text

Thank you!