Slide 1

Slide 1 text

Testing OpenAI Applications https://github.com/elastic/testing-genai-applications Or scan the QR code for the same!

Slide 2

Slide 2 text

Introductions! I’m Adrian from the Elastic Observability team. I mostly work on the GenAI ecosystem, including OpenTelemetry. github.com/codefromthecrypt x.com/adrianfcole

Slide 3

Slide 3 text

Objectives ● Understand the basics of testing GenAI applications ● Learn to use OpenAI CLI and SDK ● Explore traffic inspection and observability ● Write and test Python scripts for GenAI ● Learn advanced testing techniques like HTTP replay and LLM eval

Slide 4

Slide 4 text

Agenda ● Example application ● Prerequisites Setup ● Exercises ● Q&A https://github.com/elastic/testing-genai-applications Or scan the QR code for the same!

Slide 5

Slide 5 text

All exercises use the same example application. We use the OpenAI CLI or Python SDK to ask a question via its API.
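A minimal sketch of such an application, assuming the `openai` package is installed and `OPENAI_API_KEY` is set; the model name and the question are placeholders, and the SDK import is deferred so the message-building helper stays importable without the SDK:

```python
def build_messages(question: str) -> list[dict]:
    """Build the chat messages for a single user question."""
    return [{"role": "user", "content": question}]


def ask(question: str, model: str = "gpt-4o-mini") -> str:
    """Send one question to an OpenAI-compatible chat API and return the text."""
    # Deferred import: requires `pip install openai` and OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()  # also honors OPENAI_BASE_URL for local servers
    response = client.chat.completions.create(
        model=model, messages=build_messages(question)
    )
    return response.choices[0].message.content
```

The same `ask` shape works against any OpenAI-compatible endpoint, which is what makes the later local-model exercises possible.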

Slide 6

Slide 6 text

Prerequisites Setup
Docker is generally required, but you can also run the exercises directly in Python.
● If you use Python, there is less to download, especially if you share one .venv
● Docker is the easiest option, and we use docker compose frequently
An OpenAI-compatible inference platform for running LLMs.
● We have instructions for OpenAI, Ollama and Ramalama.
OpenTelemetry is used to demonstrate observability.
● We have instructions for the console, the Elastic Stack and otel-tui.
mitmproxy is used to demonstrate HTTP traffic interception.

Slide 7

Slide 7 text

Exercises ● Use the OpenAI CLI ● Inspect CLI traffic with mitmproxy ● Trace CLI traffic with OpenTelemetry ● Write an OpenAI application ● Integration test your application ● Unit test your application with recorded HTTP responses ● Evaluation test your application using an LLM as a Judge

Slide 8

Slide 8 text

1: Use the OpenAI CLI ● Learn to query LLMs with the OpenAI CLI ● Run a simple question and get a response ● Expect "Atlantic Ocean" as the answer

Slide 9

Slide 9 text

2: Inspect OpenAI traffic with mitmproxy ● Run mitmweb to start the proxy ● Run the OpenAI CLI with proxy configuration ● Inspect the captured traffic in the web interface
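The OpenAI CLI and SDK use httpx underneath, which honors standard proxy and CA environment variables, so traffic can be routed through mitmproxy without code changes. A sketch, assuming mitmweb's defaults (port 8080, CA certificate under ~/.mitmproxy):

```python
import os

# mitmweb listens on localhost:8080 by default; httpx (used by the OpenAI
# CLI and SDK) picks this proxy up from the environment.
os.environ["HTTPS_PROXY"] = "http://localhost:8080"

# Trust mitmproxy's generated CA so TLS interception succeeds. This path is
# mitmproxy's default location; adjust if yours differs.
os.environ["SSL_CERT_FILE"] = os.path.expanduser(
    "~/.mitmproxy/mitmproxy-ca-cert.pem"
)
```

With these set in the shell instead, the unmodified CLI sends every request through the proxy, and the web interface shows full request and response bodies.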

Slide 10

Slide 10 text

3: Trace OpenAI traffic with OpenTelemetry ● Instrumentation without code changes ● GenAI signals for latency, prompt and usage ● Choose your own APM with portable export
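Zero-code instrumentation typically means launching the app under the `opentelemetry-instrument` wrapper; the programmatic equivalent is a one-line instrumentor call. A sketch — the package name `opentelemetry-instrumentation-openai-v2` is an assumption (EDOT bundles comparable instrumentation), and the import is deferred so the helper loads without it:

```python
def enable_openai_tracing() -> None:
    """Patch the OpenAI client so each request emits a GenAI span
    (model, latency, token usage, and optionally prompt content)."""
    # Deferred import: requires the instrumentation package, e.g.
    # `pip install opentelemetry-instrumentation-openai-v2` (an assumption).
    from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

    OpenAIInstrumentor().instrument()
```

Because the spans go through the standard OpenTelemetry exporter pipeline, the same instrumented app can send to the console, the Elastic Stack, or otel-tui just by changing exporter configuration.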

Slide 11

Slide 11 text

4: Write an OpenAI Application ● Create Python script using OpenAI SDK ● Manage configurations with environment variables ● Enable observability via OpenTelemetry
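A sketch of environment-driven configuration. `OPENAI_BASE_URL` and `OPENAI_API_KEY` are variables the OpenAI SDK itself reads; `CHAT_MODEL` and the defaults are hypothetical app-specific choices:

```python
import os


def load_config() -> dict:
    """Read settings from the environment, with defaults friendly to local models."""
    return {
        "base_url": os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        # Local servers such as Ollama accept any key, so a dummy default is fine.
        "api_key": os.environ.get("OPENAI_API_KEY", "unused"),
        "model": os.environ.get("CHAT_MODEL", "gpt-4o-mini"),  # hypothetical variable
    }
```

Keeping all three in the environment is what lets the same script run against OpenAI in CI and against Ollama or Ramalama locally.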

Slide 12

Slide 12 text

5: Integration test your application

Slide 13

Slide 13 text

6: Offline Unit Testing with VCR

Slide 14

Slide 14 text

7: Evaluate your application using an LLM as a Judge ● Assess relevancy and hallucinations with DeepEval metrics. ● Trace evaluation via OpenTelemetry.

Slide 15

Slide 15 text

Takeaways and Thanks!
Testing OpenAI applications requires your best and most creative skills.
● Unit tests should record real HTTP requests in whatever way is best for your language. If using Python, use pytest-vcr.
● Integration tests should use OpenAI, but allow local model usage as well. Ollama is a very good option for local model hosting, and Qwen 2.5 is a great model.
● Tests should be strict in unit tests and flexible in integration tests. LLM responses are not entirely predictable and can sometimes miss; be aware of this.
● Observability and evaluation: use Elastic Distribution of OpenTelemetry (EDOT) SDKs to enable observability. Try the Elastic Stack and eval platforms like Arize Phoenix and Langtrace.
github.com/codefromthecrypt x.com/adrianfcole www.linkedin.com/in/adrianfcole
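The strict-versus-flexible point above in miniature: replayed unit-test responses are deterministic and can be matched exactly, while live integration-test responses should be checked loosely (both sample strings are illustrative):

```python
# Deterministic: replayed from a VCR cassette, so exact match is safe.
recorded = "Atlantic Ocean."
assert recorded == "Atlantic Ocean."

# Live model output varies per run; check the key fact, not the phrasing.
live = "The Falkland Islands lie in the South Atlantic Ocean."
assert "atlantic" in live.lower()
```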