
Java-Powered AI on Kubernetes: From Development to Deployment with Ease #JavaCro25

Are you a Java developer eager to explore AI but unsure where to begin? In this session, we’ll guide you through building and deploying AI applications in Java, leveraging familiar tools and running them effortlessly on Kubernetes. We’ll start by showcasing how Quarkus and LangChain4J simplify the development of AI-driven applications, making advanced use cases more accessible than ever.
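To make this concrete, here is a minimal sketch of what a declarative AI service can look like with the quarkus-langchain4j extension; the interface name, prompt text, and REST endpoint are illustrative assumptions, not code from the talk.

// Minimal sketch (assumptions: interface name, prompt, endpoint path).
// The quarkus-langchain4j extension generates the implementation at build time.
import io.quarkiverse.langchain4j.RegisterAiService;
import dev.langchain4j.service.SystemMessage;
import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.QueryParam;

@RegisterAiService
interface BeerSommelier {
    @SystemMessage("You are a friendly beer sommelier. Answer briefly.")
    String chat(String question);
}

@Path("/chat")
public class ChatResource {

    @Inject
    BeerSommelier sommelier;

    @GET
    public String chat(@QueryParam("q") String question) {
        return sommelier.chat(question);   // the extension wires in the configured model
    }
}

The actual model behind the service (an OpenAI endpoint or a local Ollama instance) is then selected via configuration properties rather than code.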

Next, we'll address the challenge of running AI models for inference at scale using the Ollama operator, a lightweight yet robust solution for managing model serving on Kubernetes. In addition, we’ll dive into why vector databases are critical for many AI workloads and how these can serve as a high-performance storage and retrieval system for your data.
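As a rough illustration of how these pieces are reached from Java, the following sketch uses the plain LangChain4J APIs; the in-cluster service URL, the model names, and the in-memory store (standing in for a dedicated vector database such as pgvector or Qdrant) are assumptions for the example.

// Rough sketch (assumptions: Ollama service URL, model names; the in-memory
// store stands in for a real vector database).
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

public class RagSketch {

    public static void main(String[] args) {
        String ollamaUrl = "http://ollama:11434";   // assumed in-cluster service name

        // Chat and embedding models served by Ollama inside the cluster
        OllamaChatModel chat = OllamaChatModel.builder()
                .baseUrl(ollamaUrl).modelName("llama3.1").build();
        OllamaEmbeddingModel embeddings = OllamaEmbeddingModel.builder()
                .baseUrl(ollamaUrl).modelName("nomic-embed-text").build();

        // Index one document (a real setup would ingest many and persist them)
        InMemoryEmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
        TextSegment doc = TextSegment.from("Pale ales pair well with spicy food.");
        store.add(embeddings.embed(doc).content(), doc);

        // Retrieve the best match and feed it into the prompt (the RAG pattern)
        Embedding query = embeddings.embed("beer for a curry").content();
        String context = store.search(EmbeddingSearchRequest.builder()
                        .queryEmbedding(query).maxResults(1).build())
                .matches().get(0).embedded().text();

        // chat(...) in current LangChain4J releases; older versions use generate(...)
        System.out.println(chat.chat("Context: " + context
                + "\nQuestion: Which beer goes with a curry?"));
    }
}

In the deck itself, Easy RAG and dedicated chat services take over most of this plumbing; the sketch only shows the underlying embed, store, retrieve, and prompt cycle.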

Join us for a live demo where we’ll walk through building and running a real-world AI use case on Kubernetes, demonstrating the tools and best practices you need to succeed. Whether you’re just starting with AI or looking to optimise your current workflows, this talk will equip you with practical insights to accelerate your AI journey in Java.

M.-Leander Reimer

October 13, 2025

Transcript

  1. qaware.de Java-Powered AI on Kubernetes: From Development to Deployment with Ease. Mario-Leander Reimer, [email protected], @LeanderReimer, @qaware, #CloudNativeNerd #gerneperdude
  2. qaware.de ... and we have the perfect surfboard! The logical continuation: (a) from applications to microservices to AI agents, (b) from on-prem to cloud platforms to AI platforms.
  3. Micro-Agent: GenAI usage, prompts, flow control, tools (MCP); the response contains calls to the OpenAI API. Characteristics: clear responsibility, vertical in terms of expertise, manageably large, potentially reusable. AI agents will be implemented according to the microservice architecture paradigm. (Diagram labels: Micro-Agent, A2A, Tool Server, Business Logic, LLM/LAM/SLM, domain-specific foundation models, MCP; see the tool sketch after the transcript.)
  4. "According to Gartner, 80% of AI PoCs fail on their

    way into productive use." https://www.qaware.de/ki-vom-proof-of-concept-poc-zur-entwicklung/
  5. The 80% Fallacy of AI projects. Source: Juan Pablo Bottaro, LinkedIn Engineering Blog.
  6. The 60% Fallacy of production-ready AI projects: important quality attributes and architectural drivers are either postponed or neglected.
  7. Chatbots and AI assistants: the more specific the use case, the more complex it becomes. The spectrum ranges from ChatGPT (or comparable) with world knowledge, via ChatGPT with organisational context knowledge, to a specialised AI assistant built with Retrieval Augmented Generation, transfer learning, specially trained models, or hyper automation. The simpler variants are easy to realise and relatively cost-efficient but require data protection and compliance guidelines; complexity and benefit grow together.
  8. Key challenges: technology, models and tools, scaling. Source: https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-ais-breakout-year ▪ Different challenges are seen depending on the maturity of the group ▪ AI newcomers often underestimate the complexity of technologies, models and tools ▪ Production and scaling challenges often hinder production readiness ▪ High cognitive load and lack of expertise are also drivers for failing projects.
  9. vs

  10. Conceptual demo showcase architecture (diagram labels): Chatbot Web UI (WebSockets), OpenAI Chat Service, Ollama Chat Service, OpenAI Proxy, Beer Service (REST/gRPC), Easy RAG, Ollama model Llama 3.1, ADK Time Agent; components are connected via REST.
  11. The Kubernetes cluster topology requires precise planning, otherwise the costs will go through the roof! ▪ There are different GPU machine types ▪ Not all types are available in all regions ▪ Prices vary drastically, accurate research is recommended ▪ Additional local SSDs are recommended ▪ To be decided: all nodes with GPU, or different nodes optimised for normal as well as GPU workloads. https://cloud.google.com/compute/gpus-pricing?hl=de#other-gpu-models
  12. AI platform planes (diagram labels): Platform Plane (Observability, Operability), Resource Plane (Compute, Data), Integration & Delivery Plane (Integration, Security, Delivery, FinOps), Quality Plane, Data Plane, Model Plane, Compliance Plane, Service Plane, User Serving Plane, Access Plane / APIs, Orchestration Plane, Data Modelling Plane.
  13. The planes in detail (diagram labels): Compliance Plane, Integration & Delivery Plane (Integration, Security, Delivery, FinOps), Service Plane, Platform Plane (Operability), Resource Plane (Compute, Data: local SSD), Quality Plane, Data Plane, Model Plane, User Serving Plane, Access Plane, Data Modelling Plane.
  14. QAware GmbH | Aschauer Straße 30 | 81549 München | Managing Directors: Dr. Josef Adersberger, Michael Stehnken, Michael Rohleder, Mario-Leander Reimer. Offices in Munich, Mainz, Rosenheim, Darmstadt | +49 89 232315-0 | [email protected]. Thank you! The next step? Let's talk. Mario-Leander Reimer, Managing Director, CTO, [email protected], +49 151 61314748
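
To round off the micro-agent idea from slide 3 (business logic exposed as tools the agent can call), here is a small LangChain4J sketch; the class, method, and tool description are invented for illustration and loosely inspired by the Time Agent in the demo architecture.

// Illustrative sketch (assumed names): a piece of business logic exposed as a
// tool; the model decides when to call it based on the description.
import dev.langchain4j.agent.tool.Tool;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimeTool {

    @Tool("Returns the current date and time for the given IANA time zone id")
    public String currentTime(String zoneId) {
        return ZonedDateTime.now(ZoneId.of(zoneId)).toString();
    }
}

With quarkus-langchain4j such a class is typically attached to an AI service via the tools attribute of @RegisterAiService, or, following slide 3, published through an MCP tool server so that other agents can reach it as well.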