
Did you really get better?

From https://www.jfokus.se/talks.html?showid=2701

Testing is hard, which is why developers tend to avoid it. Testing non-deterministic things is even harder, which is unfortunate, since we're all writing AI-infused applications, and AI models are notoriously non-deterministic. What happens when the applications start using advanced features, such as RAG, tools, and agents? How do you test these applications? There must be some tools, technologies, and practices out there that can help, while not costing your organization lots of money!

Join Java Champions Oleg & Eric in this session as they revisit a topic they debuted at JFokus last year. The AI landscape changes at a breathtaking pace, so what new capabilities and strategies have come along in the last year?

Hopefully, by the end of the presentation you will be able to answer the question "If I change my model/prompt/application, did I get better or worse?"

Eric Deandrea

February 02, 2026

Transcript

  1. @shelajev @edeandrea About Us
     • Java Champion
     • 27+ years software development experience
     • Works on Open Source projects: Quarkus, LangChain4j (& Quarkus LangChain4j), Docking Java (project lead), Spring Boot, Spring Framework, Spring Security, Wiremock, Testcontainers
     • Boston Java Users ACM Chapter Vice Chair
     • Published author
     • Black belt in martial arts
     • Cat lover
  2. @shelajev @edeandrea
     • Showcase & explain Quarkus, how it enables modern Java development & the Kubernetes-native experience
     • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus
     • Equivalent code examples between Quarkus and Spring, as well as emphasis on testing patterns & practices
     https://red.ht/quarkus-spring-devs
  3. @shelajev @edeandrea About Us
     • Also a Java Champion
     • Java developer
     • ~12 years as a Developer Advocate
     • Loves to stare at Open Source projects
     • Allergic to cats
  4. @shelajev @edeandrea What are you hoping to learn here? What are you going to leave with?
  5. @shelajev @edeandrea Did we get better or worse with this release? (& can we figure it out before we release?)
  6. @shelajev @edeandrea What’s changed in the last year?
     • Standardization
       ◦ Or lack thereof (lots of competing standards)?
     • Distributed
     • Orchestrated
     • Agentic
     • Agents
     • Agentic Agents
     • Autonomous Agents
     • Autonomous Agentic Agents
     Smells like microservices?
  7. @shelajev @edeandrea DevOps Evolution (diagram): the DevOps loop (Plan, Code, Build, Test, Release, Deploy, Operate, Monitor) extended with ML/data stages (Collect, Curate, Analyze Data, Train, Evaluate, Deploy).
  8. @shelajev @edeandrea Demo architecture (diagram): a chat bot communicates over a web socket with a Claim AI Assistant that uses RAG retrieval and input guardrails; tool invocation feeds a claim status notification and a Generate Email AI Assistant protected by a politeness output guardrail. A legend separates “Code I write” from “Voodoo magic”, AI replacing humans from AI replacing software, and flags which pieces could be an agent. https://github.com/edeandrea/non-deterministic-no-problem
  9. @shelajev @edeandrea What’s the difference between these? (diagram)
     • CRUD application: Application → Database
     • Microservice: Application → Service
     • AI-Infused application: Application → Model
  10. @shelajev @edeandrea What’s the difference between these? (diagram)
      • CRUD application: Application → Database
      • Microservice: Application → Service
      • AI-Infused application: Application → Model
      The arrows are the integration points.
  11. @shelajev @edeandrea
      Signal from tests:
      - stuff needs fixing
      - confident to release
      Purpose of tests:
      ❌ prevent breaking prod
      ✅ continuously improve your app
  12. @shelajev @edeandrea What’s the difference between these? (diagram)
      • CRUD application: Application → Database
      • Microservice: Application → Service
      • AI-Infused application: Application → Model
      Each integration point needs Observability (metrics, tracing, logs, auditing) and Fault Tolerance (timeout, bulkhead, circuit breaker, rate limiting, fallbacks, …).
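
A minimal sketch of what the fault-tolerance side can look like around an LLM call in Quarkus, using the standard MicroProfile Fault Tolerance annotations. The ClaimSummaryService class, method names, and the specific timeout/retry values are assumptions for illustration, not from the talk.

```java
import java.time.temporal.ChronoUnit;
import jakarta.enterprise.context.ApplicationScoped;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.eclipse.microprofile.faulttolerance.Retry;
import org.eclipse.microprofile.faulttolerance.Timeout;

@ApplicationScoped
public class ClaimSummaryService {

    @Timeout(value = 10, unit = ChronoUnit.SECONDS) // don't wait forever on the model
    @Retry(maxRetries = 2)                          // transient failures happen
    @Fallback(fallbackMethod = "summaryUnavailable") // degrade gracefully
    public String summarize(String claimText) {
        return callModel(claimText); // the actual LLM call, e.g. an AI service
    }

    String summaryUnavailable(String claimText) {
        return "Summary is currently unavailable. Please try again later.";
    }

    private String callModel(String claimText) {
        // Placeholder: wire up your AI service / model client here
        throw new UnsupportedOperationException("model invocation goes here");
    }
}
```
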
  13. @shelajev @edeandrea Test pyramid (diagram), from low effort to high realism: unit tests, integration tests, tests with the application server, tests of REST endpoints, tests using AI, end-to-end tests.
  14. @shelajev @edeandrea Stupidity
      Prompt: Please return a JSON document in the following format: { "name": "String", "countryOfOrigin": "String" }
      Response: Sure, I’d love to give you some JSON! Here it is: ```json { "name": "Eric", "countryOfOrigin": "USA" } ```
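
One low-tech way to cope with this kind of chatty, fence-wrapped response is to strip the Markdown fence and surrounding prose before parsing. The helper below is a hedged sketch using Jackson; the class name and heuristics are made up for illustration and are not from the talk or any library.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class LenientJson {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    private LenientJson() {
    }

    public static JsonNode parse(String llmResponse) throws Exception {
        String text = llmResponse.trim();
        // Strip a leading ```json (or ```) fence if present
        if (text.startsWith("```")) {
            int firstNewline = text.indexOf('\n');
            text = firstNewline >= 0 ? text.substring(firstNewline + 1) : "";
        }
        // Strip a trailing ``` fence if present
        if (text.endsWith("```")) {
            text = text.substring(0, text.length() - 3);
        }
        // If the model added chatty prose, fall back to the first '{' ... last '}'
        int start = text.indexOf('{');
        int end = text.lastIndexOf('}');
        if (start >= 0 && end > start) {
            text = text.substring(start, end + 1);
        }
        return MAPPER.readTree(text);
    }
}
```
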
  15. @shelajev @edeandrea Guardrails
      - Out of the box in LangChain4j & Quarkus!
      - Functions used to validate the input and output of the model
      - Detect invalid input or output
      - Detect prompt injection
      - Detect hallucination
      - Chain of guardrails
        - Sequential
        - Stop at first failure
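
As a sketch of the input side, here is a naive prompt-injection guardrail. The package and method names follow the quarkus-langchain4j guardrail API as documented at the time of writing; the guardrail API has been moving upstream into LangChain4j, so verify the exact types against your version. The keyword check is a toy heuristic, not a real detector.

```java
import dev.langchain4j.data.message.UserMessage;
import io.quarkiverse.langchain4j.guardrails.InputGuardrail;
import io.quarkiverse.langchain4j.guardrails.InputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class PromptInjectionGuardrail implements InputGuardrail {

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        String text = userMessage.singleText().toLowerCase();
        // Naive heuristic for illustration only; real detection typically uses
        // a classifier or a dedicated model.
        if (text.contains("ignore all previous instructions")) {
            return failure("Possible prompt injection detected");
        }
        return success();
    }
}
```

The guardrail is then attached to the AI service method, typically via an annotation such as @InputGuardrails(PromptInjectionGuardrail.class) (again, check your version's docs for the exact annotation).
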
  16. @shelajev @edeandrea Retry and Reprompt
      Output guardrails can have 4 different outcomes:
      - Success - Response is passed to the caller or next guardrail
      - Fatal - Stop and throw an exception
      - Retry - Call the model again with the same context (we never know ;-))
      - Reprompt - Call the model again with an additional message indicating how to fix the response
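
A hedged sketch of an output guardrail using the reprompt outcome, following the same quarkus-langchain4j API as above (type and helper-method names may differ in your version). The politeness check itself is deliberately trivial.

```java
import dev.langchain4j.data.message.AiMessage;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrail;
import io.quarkiverse.langchain4j.guardrails.OutputGuardrailResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class PolitenessGuardrail implements OutputGuardrail {

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        String text = responseFromLLM.text();
        if (text != null && text.toLowerCase().contains("stupid")) {
            // Reprompt: call the model again and tell it how to fix its answer
            return reprompt("Response was impolite",
                    "Please rephrase the answer politely.");
        }
        // Success: pass the response to the caller or to the next guardrail
        return success();
    }
}
```
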
  17. @shelajev @edeandrea Observability
      Collect metrics
      - Exposed as Prometheus
      - Track token usage & cost
      OpenTelemetry Tracing
      - Trace interactions with the LLM
      Auditing
      - Keep track of interactions with the LLM
      - Ability to replay & re-score interactions
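
To make the token-usage point concrete, here is a hedged sketch (not the built-in metrics support) that records token counts as Micrometer counters, which a Prometheus endpoint can then scrape. TokenUsage is LangChain4j's class; how you obtain it (a model listener, response metadata, ...) depends on your LangChain4j/Quarkus version, and the metric names are assumptions.

```java
import dev.langchain4j.model.output.TokenUsage;
import io.micrometer.core.instrument.MeterRegistry;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class TokenUsageMetrics {

    @Inject
    MeterRegistry registry;

    public void record(String model, TokenUsage usage) {
        if (usage == null) {
            return;
        }
        if (usage.inputTokenCount() != null) {
            registry.counter("llm.tokens", "model", model, "type", "input")
                    .increment(usage.inputTokenCount());
        }
        if (usage.outputTokenCount() != null) {
            registry.counter("llm.tokens", "model", model, "type", "output")
                    .increment(usage.outputTokenCount());
        }
    }
}
```
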
  18. @shelajev @edeandrea Rescoring - Evaluation
      https://docs.quarkiverse.io/quarkus-langchain4j/dev/testing.html#_evaluation
      1. Sample
         ◦ The test case containing input parameters & expected output.
      2. Function under test
         ◦ The function being evaluated. Receives input parameters & produces an actual output.
      3. Evaluation Strategy
         ◦ Logic that determines if the actual output is acceptable based on the expected output.
      4. Evaluation Result
         ◦ Outcome (pass/fail), score, explanation, and metadata from the evaluation.
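
To illustrate how the four concepts fit together, here is a hand-rolled sketch. It is not the quarkus-langchain4j evaluation API (see the linked docs for the real classes); all names below are made up, and the "contains" strategy is a stand-in for semantic-similarity or LLM-as-judge strategies.

```java
import java.util.List;
import java.util.function.Function;

public class MiniEvaluation {

    // 1. Sample: input plus the output we expect
    record Sample(String input, String expectedOutput) {}

    // 4. Evaluation Result: outcome, score, and an explanation
    record EvaluationResult(boolean passed, double score, String explanation) {}

    // 3. Evaluation Strategy: decides whether the actual output is acceptable
    interface EvaluationStrategy {
        EvaluationResult evaluate(Sample sample, String actualOutput);
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample("What is the capital of Sweden?", "Stockholm"));

        // 2. Function under test: here a stand-in for a call to the AI service
        Function<String, String> functionUnderTest =
                question -> "The capital of Sweden is Stockholm.";

        // A trivial "contains" strategy, for illustration only
        EvaluationStrategy strategy = (sample, actual) -> {
            boolean ok = actual.toLowerCase().contains(sample.expectedOutput().toLowerCase());
            return new EvaluationResult(ok, ok ? 1.0 : 0.0,
                    ok ? "Expected answer found" : "Expected answer missing");
        };

        for (Sample sample : samples) {
            String actual = functionUnderTest.apply(sample.input());
            EvaluationResult result = strategy.evaluate(sample, actual);
            System.out.printf("%s -> passed=%b score=%.1f (%s)%n",
                    sample.input(), result.passed(), result.score(), result.explanation());
        }
    }
}
```
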
  21. @shelajev @edeandrea RAG Evaluation: Two Surfaces
      • RAG introduces new surfaces that need evaluation
      • Evaluate the retrieval of relevant context documents
      • Evaluate the generation based on the retrieved context
      • Semantic similarity checks are crucial for RAG evaluation
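
A hedged sketch of a semantic-similarity check on the generation side, using LangChain4j's EmbeddingModel and CosineSimilarity. The class name, the choice of embedding model you plug in (for example a local all-MiniLM-L6-v2 from the langchain4j-embeddings artifacts), and the threshold value are assumptions for illustration.

```java
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.CosineSimilarity;

public class SemanticSimilarityCheck {

    private final EmbeddingModel embeddingModel;
    private final double threshold;

    public SemanticSimilarityCheck(EmbeddingModel embeddingModel, double threshold) {
        this.embeddingModel = embeddingModel;
        this.threshold = threshold;
    }

    /** Returns true if the actual answer is semantically close enough to the expected one. */
    public boolean isAcceptable(String expectedAnswer, String actualAnswer) {
        Embedding expected = embeddingModel.embed(expectedAnswer).content();
        Embedding actual = embeddingModel.embed(actualAnswer).content();
        double similarity = CosineSimilarity.between(expected, actual);
        return similarity >= threshold;
    }
}
```

The same idea applies to the retrieval surface: compare the retrieved chunks against the reference context for a sample and score how much of it was actually recalled.
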
  22. @shelajev @edeandrea Actual takeaways
      • Naming things is still the hardest thing in computer science
      • LangChain4j & Quarkus are awesome! They provide foundational building blocks!
      • Don’t build observability into your apps - build it around your apps
      • Don’t forget your craft: the DevOps process is there to help
      • Write tests, expect change and failure, deploy often
      • AI is just an API call