Slide 1

Slide 1 text

@shelajev @edeandrea Eric Deandrea, Java Champion Oleg Šelajev, Java Champion Non-deterministic? No problem! You can test it!

Slide 2

Slide 2 text

@shelajev @edeandrea ● Java Champion ● 25+ years software development experience ● Contributor to Open Source projects Quarkus Spring Boot, Spring Framework, Spring Security LangChain4j (& Quarkus LangChain4j) Wiremock Microcks ● Boston Java Users ACM Chapter Board Member ● Published Author ● Cat lover ● Black belt in martial arts About Us

Slide 3

Slide 3 text

@shelajev @edeandrea ● Showcase & explain Quarkus, how it enables modern Java development & the Kubernetes-native experience ● Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus ● Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices 3 https://red.ht/quarkus-spring-devs

Slide 4

Slide 4 text

@shelajev @edeandrea ● Surprisingly, also, a Java Champion ● 18+ years software development experience ● ~11 years Developer Advocate ● Loves to stare at the code of Open Source projects Quarkus Spring Boot LangChain4j Microcks Testcontainers (sometimes contributes bugs too!) About Us

Slide 5

Slide 5 text

@shelajev @edeandrea What are you hoping to learn here? What are you hoping to learn here? What are you going to leave with?

Slide 6

Slide 6 text

@shelajev @edeandrea What is AI right now? Neural Networks ● Recognize, Predict, and Generate text ● Trained on a VERY large corpuses of text ● Deduce the statistical relationships between tokens ● Can be fine-tuned ● Different models have varying capabilities An LLM predicts the next token based on its training data and statistical deduction

Slide 7

Slide 7 text

@shelajev @edeandrea The L of LLM == Large LLama 3.3: - 70B parameters - Trained on > 15T publicly-available & > 25M synthetically-generated tokens - 128K token window - 43 Gb on disk DeepSeek R1: - 671B parameters - Trained on > 14.8T tokens - 32K token window - 404Gb on disk https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJwbB-ooYvzhCHaHcNUiA0_hY/edit?usp=sharing

Slide 8

Slide 8 text

@shelajev @edeandrea The rise of AI developer AI developers

Slide 9

Slide 9 text

@shelajev @edeandrea

Slide 10

Slide 10 text

@shelajev @edeandrea AI replacing humans

Slide 11

Slide 11 text

@shelajev @edeandrea AI replacing software

Slide 12

Slide 12 text

@shelajev @edeandrea https://www.youtube.com/watch?v=y57wwucbXR8

Slide 13

Slide 13 text

@shelajev @edeandrea https://github.com/edeandrea/non-deterministic-no-problem non-deterministic-no-problem

Slide 14

Slide 14 text

@shelajev @edeandrea Chat Bot Web Socket Claim AI Assistant Claim Status Notification Tool invocation Generate Email AI Assistant Output Guardrails Politeness AI Assistant AI replacing humans AI replacing software https://github.com/edeandrea/non-deterministic-no-problem non-deterministic-no-problem Code I write Is this code? Legend RAG Retrieval Input Guardrails

Slide 15

Slide 15 text

@shelajev @edeandrea How does your DevOps evolve when you infuse your applications with AI?

Slide 16

Slide 16 text

@shelajev @edeandrea

Slide 17

Slide 17 text

@shelajev @edeandrea DevOps Evolution Dev Ops Release Deploy Operate Monitor Plan Code Build Test Train Evaluate Deploy Collect Evaluate Curate Analyze Data ML

Slide 18

Slide 18 text

@shelajev @edeandrea

Slide 19

Slide 19 text

@shelajev @edeandrea @shelajev @edeandrea Vanilla AI

Slide 20

Slide 20 text

@shelajev @edeandrea Application Database Application Service CRUD application Microservice Application Model AI-Infused application What’s the difference between these?

Slide 21

Slide 21 text

@shelajev @edeandrea Application Database Application Service CRUD application Microservice Application Model AI-Infused application Integration Points What’s the difference between these?

Slide 22

Slide 22 text

@shelajev @edeandrea Testing AI Replacing Humans

Slide 23

Slide 23 text

@shelajev @edeandrea Testing AI Replacing Humans 23

Slide 24

Slide 24 text

@shelajev @edeandrea Rethink your approach

Slide 25

Slide 25 text

@shelajev @edeandrea Signal from tests: ❌ - stuff needs fixing ✅ - confident to release

Slide 26

Slide 26 text

@shelajev @edeandrea Signal from tests: - stuff needs fixing - confident to release Purpose of tests: ❌ - prevent breaking prod ✅ - continuously improve your app

Slide 27

Slide 27 text

@shelajev @edeandrea https://www.upworthy.com/prankster-tricks-a-gm-dealership-chatbot-to-sell-him-a-76000-chevy-tahoe-for-1-rp3 https://www.cbsnews.com/news/aircanada-chatbot-discount-customer https://www.bbc.com/news/technology-35902104 https://www.spiceworks.com/tech/artificial-intelligence/news/meta-blender-bot-3-controversy https://www.linkedin.com/posts/stephanjanssen_princoming-activity-7285987635628507136-9Ubw

Slide 28

Slide 28 text

@shelajev @edeandrea Testing Strategies

Slide 29

Slide 29 text

@shelajev @edeandrea Why don’t normal tests work? What do we need to do differently?

Slide 30

Slide 30 text

@shelajev @edeandrea This isn’t the answer!

Slide 31

Slide 31 text

@shelajev @edeandrea Application Database Application Service CRUD application Microservice Application Model AI-Infused application Integration Points Observability (metrics, tracing, auditing) Fault Tolerance (timeout, circuit-breaker, non-blocking, rate limiting, fallbacks, …) What’s the difference between these?

Slide 32

Slide 32 text

@shelajev @edeandrea https://library.wiremock.org/catalog/api/o/openai.com/openai-com https://mockgpt.wiremock.io https://docs.quarkiverse.io/quarkus-wiremock/dev

Slide 33

Slide 33 text

@shelajev @edeandrea https://www.trtworld.com/europe/swedish-recycling-so-successful-it-is-importing-rubbish-24491

Slide 34

Slide 34 text

@shelajev @edeandrea What happens when we do this?

Slide 35

Slide 35 text

@shelajev @edeandrea What happens when we do this?

Slide 36

Slide 36 text

@shelajev @edeandrea

Slide 37

Slide 37 text

@shelajev @edeandrea

Slide 38

Slide 38 text

@shelajev @edeandrea Guardrails

Slide 39

Slide 39 text

@shelajev @edeandrea Guardrails Prompt: Please return a JSON document in the following format: { “name: “String”, “countryOfOrigin”: “String”} Response: Here is your JSON: ```json { “name”: “Eric”, “countryOfOrigin”: “USA” } ``` 👿 😱 Just give me the JSON!! 😭

Slide 40

Slide 40 text

@shelajev @edeandrea Guardrails - Functions used to validate the input and output of the model - Detect invalid input or output - Detect prompt injection - Detect hallucination - Chain of guardrails - Sequential - Stop at first failure

Slide 41

Slide 41 text

@shelajev @edeandrea Retry and Reprompt Output guardrails can have 4 different outcomes: - Success - Response is passed to the caller or next guardrail - Fatal - Stop and throw an exception - Retry - Call the model again with the same context we never know ;-) - Reprompt - Call the model again with another message in the model indicating how to fix the response

Slide 42

Slide 42 text

@shelajev @edeandrea

Slide 43

Slide 43 text

@shelajev @edeandrea Observability

Slide 44

Slide 44 text

@shelajev @edeandrea Observability Collect metrics - Exposed as Prometheus - Track token usage & cost OpenTelemetry Tracing - Trace interactions with the LLM Auditing - Track of interactions with the LLM - Can be persisted - Implemented by the application code

Slide 45

Slide 45 text

@shelajev @edeandrea Practices

Slide 46

Slide 46 text

@shelajev @edeandrea

Slide 47

Slide 47 text

@shelajev @edeandrea GitHub Actions

Slide 48

Slide 48 text

@shelajev @edeandrea Testcontainers Module

Slide 49

Slide 49 text

@shelajev @edeandrea Quarkus DevService

Slide 50

Slide 50 text

@shelajev @edeandrea AI and CI name: build-and-test on: push: pull_request: jobs: jvm-build-test : runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Setup Java uses: actions/setup-java@v4 with: java-version: 21 distribution: temurin cache: maven - name: Build and test run: ./mvnw clean verify [INFO] ------------------------------------------------------- [INFO] T E S T S [INFO] ------------------------------------------------------- [INFO] Running org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests INFO [org.tes.DockerClientFactory] (build-6) Testcontainers version: 1.20.4 INFO [org.tes.ima.PullPolicy] (build-6) Image pull policy will be performed by: DefaultPullPolicy() INFO [tc.ollama/ollama:latest] (build-28) Pulling docker image: ollama/ollama:latest. Please be patient; this may take some time but only needs to be done once. INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pulling image layers: 1 pending, 3 downloaded, 3 extracted, (1 GB/? MB) INFO [tc.ollama/ollama:latest] (docker-java-stream--32075139) Pull complete. 4 layers, pulled in 27s (downloaded 1 GB at 55 MB/s) INFO [tc.ollama/ollama:latest] (build-28) Image ollama/ollama:latest pull took PT28.552217137S INFO [tc.ollama/ollama:latest] (build-28) Creating container for image: ollama/ollama:latest INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest is starting: f2e61ad1b3490bec2f69db44ee0bd946d543c703fd3f30a0c507ac0b9c5db9a1 INFO [tc.ollama/ollama:latest] (build-28) Container ollama/ollama:latest started in PT0.681272661S INFO [io.qua.lan.oll.dep.dev.OllamaDevServicesProcessor] (build-28) Dev Services for Ollama started. INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model llama3.2 INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 0.01% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 27.39% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 60.84% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 94.33% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-2-Worker-0) Downloading llama3.2 - Progress: 99.43% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-4-Worker-0) Verifying and cleaning up INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-6) Pulling model snowflake-arctic-embed INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 1.90% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Downloading snowflake-arctic-embed - Progress: 94.85% INFO [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (HttpClient-5-Worker-0) Verifying and cleaning up INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingesting documents from path: src/main/resources/policies, path matcher = glob:**, recursive = true INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Ingested 1 files as 2 documents INFO [io.qua.lan.eas.run.EasyRagIngestor] (main) Writing embeddings to /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json INFO [io.quarkus] (main) non-deterministic-no-problem 1.0 on JVM (powered by Quarkus 3.17.7) started in 68.656s. Listening on: http://0.0.0.0:8081 INFO [io.quarkus] (main) Profiles test,ollama activated. INFO [io.quarkus] (main) Installed features: [agroal, awt, cdi, config-yaml, hibernate-orm, hibernate-orm-panache, jdbc-h2, langchain4j, langchain4j-easy-rag, langchain4j-ollama, langchain4j-ollama-dev-service, langchain4j-openai, langchain4j-websockets-next, mailer, mailpit, micrometer, narayana-jta, opentelemetry, playwright, poi, quinoa, qute, rest, rest-client, rest-client-jackson, rest-jackson, smallrye-context-propagation, smallrye-health, vertx, websockets-next] ... [INFO] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.79 s -- in org.ericoleg.ndnp.ai.guardrail.CompositeOutputGuardrailTests [INFO] Running org.ericoleg.ndnp.resources.ClaimWebsocketChatBotTests ... INFO [io.qua.lan.eas.run.EasyRagRecorder] (main) Reading embeddings from /home/runner/work/non-deterministic-no-problem/non-deterministic-no-problem/easy-rag-embeddings.json ...

Slide 51

Slide 51 text

@shelajev @edeandrea Use production setup

Slide 52

Slide 52 text

@shelajev @edeandrea @shelajev @edeandrea Prompt Engineering

Slide 53

Slide 53 text

@shelajev @edeandrea Prompts: Configuration, Code or Data?

Slide 54

Slide 54 text

@shelajev @edeandrea Selection decisions are not application based

Slide 55

Slide 55 text

@shelajev @edeandrea Prompt Engineering and team topologies I said JSON!

Slide 56

Slide 56 text

@shelajev @edeandrea ● Like static analysis ○ Are we getting better or worse over time? ● Need to be able to monitor/track Systematic Eval: are you getting better or worse? https://docs.quarkiverse.io/quarkus-langchain4j/dev/testing.html

Slide 57

Slide 57 text

@shelajev @edeandrea @shelajev @edeandrea Takeaways

Slide 58

Slide 58 text

@shelajev @edeandrea ● Quarkus is awesome! Get the simple problems out of the way first. ● Don’t forget your craft: DevOps process is there to help, write tests, expect change and failure, deploy often. ● Local models are fun, but unless you’re an expert going with expensive, but powerful models is a good default rule of thumb. Eval later into using dumber models. ● Models are just like all other software, package into containers, run like everything else. Actual takeaways

Slide 59

Slide 59 text

@shelajev @edeandrea https://quarkus.io @quarkusio https://quarkusio.zulipchat.com @quarkus.io

Slide 60

Slide 60 text

@shelajev @edeandrea How do I develop with and use containers? How do I find and share container images? How do I build compliant container images? How do I make my image builds faster? How can I run my resource-heavy services? Streamline your development practice A suite of solutions supporting great developer experiences with enterprise control docker.com

Slide 61

Slide 61 text

@shelajev @edeandrea @shelajev @edeandrea Thank You! https://www.jfokus.se/rate/2070 Please rate the talk!