Slide 40
Why serve models with Java?
Fast development/prototyping → No need to install, configure, or interact with any external server.
Security → Embedding the model inference in the same JVM instance as the application using it eliminates
the need to interact with the LLM through REST calls, thus preventing leaks of private data (see the Jlama sketch after this list).
Legacy support → Legacy users still running monolithic applications on EAP can include LLM-based
capabilities in those applications without changing their architecture or platform.
Monitoring and Observability → Statistics on the reliability and speed of the LLM responses can be
gathered using the same tools already provided by EAP or Quarkus (a Micrometer sketch follows the list).
Developer Experience → Debugging is simplified: Java developers can also navigate and
debug the Jlama code itself if necessary.
Distribution → The model itself can be included in the same fat jar as the application using it (even
though this is probably advisable only in very specific circumstances).
Edge friendliness → A self-contained LLM-capable Java application is also a better fit
than a client/server architecture for edge environments.
Embedding of auxiliary LLMs → Apps using different LLMs, for instance a smaller one to validate the
responses of the main, bigger one, can use a hybrid approach, embedding the auxiliary LLMs (a hybrid sketch follows the list).
Similar lifecycle between model and app → Since prompts are very dependent on the model, when it gets
updated, even through fine-tuning, the prompt may need to be replaced and the app updated accordingly.
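
What in-JVM inference looks like in practice: the sketch below embeds a quantized model directly in the application process with Jlama, so no external server is involved. This is only a sketch; the names (SafeTensorSupport.maybeDownloadModel, ModelSupport.loadModel, PromptContext, Generator.Response) follow Jlama's README at the time of writing and may differ across versions, and the model name is just an example.

    import java.io.File;
    import java.util.UUID;
    import com.github.tjake.jlama.model.AbstractModel;
    import com.github.tjake.jlama.model.ModelSupport;
    import com.github.tjake.jlama.model.functions.Generator;
    import com.github.tjake.jlama.safetensors.DType;
    import com.github.tjake.jlama.safetensors.SafeTensorSupport;
    import com.github.tjake.jlama.safetensors.prompt.PromptContext;

    public class EmbeddedInference {
        public static void main(String[] args) throws Exception {
            // Download the model once into a local working directory
            File localModelPath = SafeTensorSupport.maybeDownloadModel(
                    "./models", "tjake/Llama-3.2-1B-Instruct-JQ4");

            // Load it into this JVM: no external server, no REST call
            AbstractModel model = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

            String prompt = "What is the best season to plant avocados?";
            PromptContext ctx = model.promptSupport().isPresent()
                    ? model.promptSupport().get().builder().addUserMessage(prompt).build()
                    : PromptContext.of(prompt);

            // Generate in-process; private data never leaves the JVM
            Generator.Response response =
                    model.generate(UUID.randomUUID(), ctx, 0.0f, 256, (token, time) -> {});
            System.out.println(response.responseText);
        }
    }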
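
The monitoring point can be made concrete with Micrometer, the metrics facade Quarkus builds on. Again a sketch: the Llm interface is a hypothetical stand-in for whatever in-process generation call the application makes, and "llm.response.time" is an example metric name.

    import io.micrometer.core.instrument.MeterRegistry;
    import io.micrometer.core.instrument.Timer;
    import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

    public class LlmMetrics {
        // Hypothetical stand-in for the embedded model's generation call
        interface Llm { String generate(String prompt); }

        private final Llm llm;
        private final MeterRegistry registry = new SimpleMeterRegistry();
        private final Timer llmTimer = Timer.builder("llm.response.time")
                .description("Latency of in-process LLM generations")
                .register(registry);

        LlmMetrics(Llm llm) { this.llm = llm; }

        public String timedGenerate(String prompt) {
            // Record the wall-clock time of every generation; the same
            // timer can be exported through the Quarkus/EAP metrics tooling
            return llmTimer.record(() -> llm.generate(prompt));
        }
    }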
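
And a sketch of the hybrid wiring for auxiliary LLMs: MainModelClient and EmbeddedValidator are hypothetical wrappers around a remote REST client and a small in-JVM model respectively.

    public class HybridAssistant {
        interface MainModelClient { String complete(String prompt); }     // remote, bigger LLM
        interface EmbeddedValidator { boolean looksValid(String answer); } // small in-JVM LLM

        private final MainModelClient main;
        private final EmbeddedValidator validator;

        HybridAssistant(MainModelClient main, EmbeddedValidator validator) {
            this.main = main;
            this.validator = validator;
        }

        public String answer(String prompt) {
            String candidate = main.complete(prompt);
            // The embedded model double-checks the big model's response
            // without an extra network round trip
            return validator.looksValid(candidate)
                    ? candidate
                    : "The model could not produce a validated answer.";
        }
    }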