
JCON - Local Development in the AI Era



Kevin Dubois

April 22, 2026


Transcript

  1. kevindubois Local Development in the AI Era. Kevin Dubois, Sr. Principal Developer Advocate, IBM. (* same prompt, but generated locally with ComfyUI :D)
  2. Why would I, and how can I, run models locally? Can I use local models with code assistants? How do I choose the right model? How do I use local models in my AI app development? 🤔
  3. Kevin Dubois ★ Sr. Principal Developer Advocate at IBM ★ Java Champion ★ Technical Lead, CNCF DevEx TAG ★ From Belgium / lives in Switzerland ★ 🗣 English, Dutch, French, Italian ★ youtube.com/@thekevindubois linkedin.com/in/kevindubois github.com/kdubois @kevindubois.com
  4. Why run a model locally?
     For developers:
     ▸ Convenience & Simplicity: familiarity with the development environment; developers keep their "local developer experience", in particular for testing and debugging.
     ▸ Direct Access to Hardware.
     ▸ Ease of Integration: simpler to integrate the model with existing systems and applications that are already running locally.
     For organizations:
     ▸ Data Privacy and Security: data is the fuel for AI, and a differentiating factor (quality, quantity, qualification). Keeping data on-premises ensures sensitive information doesn't leave the local environment, which is crucial for privacy-sensitive applications.
     ▸ Cost Control: while there is an initial investment in hardware and setup, running locally can reduce the ongoing costs of cloud computing services and alleviate vendor lock-in with Amazon, Microsoft, and Google.
     ▸ Regulatory Compliance: some industries have strict regulations about where and how data is processed.
     ▸ Customization & Control: easily train or fine-tune your own model, from the convenience of the developer's local machine.
  5. Tool #1: Ollama (https://ollama.com)
     ▸ Simple CLI: "Docker"-style tool for running LLMs locally, offline, and privately
     ▸ Extensible: basic model customization (Modelfile) and importing of fine-tuned LLMs
     ▸ Cons: loads ALL accelerators/libraries, no clear path to production
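As a quick illustration of the Modelfile customization mentioned above, a minimal sketch. The base model name, parameter values, and system prompt are illustrative placeholders, not from the talk:

```
# Minimal Ollama Modelfile sketch: derive a custom model from a
# locally pulled base model (placeholder name).
FROM granite3.3:8b

# Sampling parameters (illustrative values)
PARAMETER temperature 0.2
PARAMETER num_ctx 4096

# A system prompt baked into the custom model
SYSTEM "You are a concise assistant for Java developers."
```

You would then build and run it with something like `ollama create my-granite -f Modelfile` followed by `ollama run my-granite`.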
  6. Tool #2: LM Studio (https://lmstudio.ai/)
     • User friendly
     • Easy way to find and serve models
     • Debug mode: see what's happening in the background
     • Ability to customize the runtime for best performance
     • NOT open source ☹
  7. Tool #3: Podman AI Lab (https://podman-desktop.io/docs/ai-lab)
     ▸ For app builders: choose from various recipes like RAG, agentic, summarizers
     ▸ Curated models: easily access Apache 2.0 open-source options
     ▸ Container native: easy app integration and movement from local to production
     ▸ Interactive playgrounds: test & optimize models with your custom prompts and data
  8. Tool #4: RamaLama (https://ramalama.ai/)
     ▸ AI in containers: run models with Podman/Docker with no config needed
     ▸ Registry agnostic: freedom to pull models from Hugging Face, Ollama, or OCI registries
     ▸ GPU optimized: auto-detects & accelerates performance
     ▸ Flexible: supports llama.cpp, vLLM, whisper.cpp & more
     ▸ Path to production
  9. RamaLama RAG demo (uses https://www.docling.ai/ for document conversion):
     ramalama rag ~/Documents/*.pptx.pdf pdf-docs
     podman image list | grep pdf
     ramalama run --rag localhost/pdf-docs granite
  10. So, which local model should you select?
     ▸ It depends on the use case you want to tackle & how "open source" it should be.
     ▸ DeepSeek, gpt-oss, Gemma4 models excel in reasoning tasks and complex problem-solving.
     ▸ Qwen, GLM, Devstral, Minimax are good coding-assistant models.
     ▸ IBM's Granite models are great for general tasks using minimal resources, RAG with docling, and fine-tuning.
  11. Not all models are the same!
     ▸ Unimodal (text OR image): text-to-text, text-to-image, image-to-text, image-to-image, text-to-code. ✓ Single data input ✓ Fewer resources ✓ Single modality ✓ Limited depth and accuracy
     ▸ Multimodal (text, image, audio, video): any-to-any. ✓ Multiple data inputs ✓ More resources ✓ Multiple modalities ✓ Better understanding and accuracy
  12. Also! There's a naming convention, kind of like how our apps are compiled for various architectures!
     ▸ ibm-granite/granite-4.0-8b-base: family name / model architecture and version / number of parameters / model fine-tuned to be a baseline
     ▸ Mixtral-8x7B-Instruct-v0.1: family name / architecture type / number of parameters / model fine-tuned for instructive tasks / model version
  13. How to deploy a larger model? Let's say you want the best benchmarks with a frontier model.
  14. Quantization: a technique to compress LLMs by reducing numerical precision.
     ▸ Converts high-precision weights (FP32) into lower-bit formats (FP16, INT8, INT4).
     ▸ Reduces size and memory footprint, making models easier to deploy.
     ▸ It's a way to compress models; think of it like a .zip or .tar.
     ▸ Most models for local usage are quantized!
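To make the size reduction concrete, a back-of-the-envelope sketch (the parameter count is illustrative, and real quantized files also store scales and metadata on top of the raw weights):

```python
# Rough weights-only memory footprint of an 8B-parameter model at
# different numerical precisions. KV cache and activations add more
# memory at runtime, so treat these as lower bounds.
PARAMS = 8e9  # illustrative 8B-parameter model

BYTES_PER_WEIGHT = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for fmt, nbytes in BYTES_PER_WEIGHT.items():
    gib = PARAMS * nbytes / 1024**3
    print(f"{fmt}: ~{gib:.1f} GiB")
```

So a model that needs roughly 30 GiB of weights in FP32 fits in about 4 GiB at INT4, which is why most models published for local use are quantized.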
  15. How to use local, disconnected (?) code assistants. Fortunately, many tools exist for this too! Code assistance: use a local model as a pair programmer, to generate and explain your codebase. Tools: Continue, Roo Code, Cline, OpenCode, Devoxx Genie …
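As one hedged example of wiring such an assistant to a local model: Continue can point at a local Ollama endpoint through its configuration file. The title and model id below are placeholders, and Continue's config format has been migrating from JSON to YAML, so check its current docs:

```json
{
  "models": [
    {
      "title": "Granite (local)",
      "provider": "ollama",
      "model": "granite3.3:8b"
    }
  ]
}
```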
  16. Tips for local models. Local models have less capacity than large cloud models, so a few adjustments help:
     • Be specific. Instead of "fix the bug," point to the file and describe the issue.
     • Work in small steps. Ask for one change at a time rather than large refactors across multiple files.
     • Use capable models. For complex code tasks, 26B+ parameter models perform significantly better than smaller ones (if your system can handle it).
     • Keep context focused. Smaller context windows mean your assistant may struggle with very large files. Break big files into smaller modules when possible.
  17. Local models:
     ➔ Free!
     ➔ Need a lot of compute / GPUs
     ➔ Good for small tasks
     ➔ Can be slow
     ➔ Need carefully crafted prompts
     ➔ Not the best for more complex work
     ➔ Able to work disconnected
     Premium:
     ➔ Works on pretty much any hardware
     ➔ Faster and typically much better at complex work
     ➔ No/unstable network or out of tokens? Game over
     ➔ Can become expensive
     ➔ Support
  18. Wrapping it up
     ▸ There are many options for serving and using models locally
     ▸ Pick the right model for the right use case
     ▸ Make sure the model comes from a reputable source (!)
     ▸ Local code assistants work… ish
     ▸ You might need to ask for hardware upgrades 😅
     ▸ Developing local agentic AI apps with Java is definitely possible
     ▸ Quarkus has a lot of AI productivity tricks up its sleeve
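For instance, with the quarkus-langchain4j Ollama extension, pointing a Quarkus app at a locally served model is mostly configuration. A minimal sketch, assuming Ollama on its default port; the model id is a placeholder, and property names should be verified against the docs.quarkiverse.io/quarkus-langchain4j reference:

```
# application.properties: point quarkus-langchain4j at a local Ollama server
# (requires the quarkus-langchain4j-ollama extension on the classpath)
quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=granite3.3:8b
quarkus.langchain4j.ollama.chat-model.temperature=0.2
```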
  19. Thank you!
     Slides: speakerdeck.com/kdubois
     podman-desktop.io docs.quarkiverse.io/quarkus-langchain4j github.com/kdubois/netatmo-java-mcp www.ibm.com/granite opencode.ai ramalama.ai ollama.com huggingface.co ibm.com/products/bob
     youtube.com/@thekevindubois linkedin.com/in/kevindubois github.com/kdubois @kevindubois.com @[email protected]