history includes
Observability: OpenZipkin, OpenTelemetry, OpenInference (GenAI obs)
Usability: wazero (Go for Wasm), func-e (easy Envoy)
GenAI: Envoy AI Gateway, Goose, ACP, LlamaStack
FOCUSED ON THE GENAI DEV -> PROD TRANSITION
text, image, audio
MCP: Client+server protocol for tools (GitHub, Kiwi, Postman, etc.)
Agent: LLM loop that auto-completes actions (with tools), not just text
title: Flight Search
description: Search flights via AI gateway.
prompt: |
  Use search-flight to find flights from New York to Los Angeles on {{flight_date}}.
  Return the first 3 results.
extensions:
  - name: mcp_gateway
    type: streamable_http
    uri: http://127.0.0.1:1975/mcp
parameters:
  - key: flight_date
    input_type: string
    requirement: required
    default: "31/12/2026"
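The recipe's `{{flight_date}}` placeholder is filled from the `parameters` list before the prompt reaches the agent. A minimal Python sketch of that substitution, assuming a plain dictionary that mirrors the recipe fields; `render_prompt` is a hypothetical helper for illustration, not Goose's actual implementation:

```python
import re

# Mirrors the prompt and parameters of the Flight Search recipe.
RECIPE = {
    "prompt": (
        "Use search-flight to find flights from New York to Los Angeles "
        "on {{flight_date}}. Return the first 3 results."
    ),
    "parameters": [
        {
            "key": "flight_date",
            "input_type": "string",
            "requirement": "required",
            "default": "31/12/2026",
        }
    ],
}


def render_prompt(recipe, values=None):
    """Fill {{name}} placeholders from supplied values or parameter defaults."""
    resolved = dict(values or {})
    for param in recipe["parameters"]:
        key = param["key"]
        if key in resolved:
            continue
        if "default" in param:
            resolved[key] = param["default"]
        elif param.get("requirement") == "required":
            raise ValueError(f"missing required parameter: {key}")

    def substitute(match):
        name = match.group(1)
        if name not in resolved:
            raise KeyError(f"no value for placeholder: {name}")
        return str(resolved[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, recipe["prompt"])


if __name__ == "__main__":
    print(render_prompt(RECIPE))
    print(render_prompt(RECIPE, {"flight_date": "01/02/2027"}))
```

Calling it without arguments falls back to the default date; passing a value overrides it.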
Source: CNCF-backed, Apache 2.0
AI-aware: 20+ LLM providers + MCP server routing
TARS: First public SaaS, recommended Goose provider
Production users:
A CLUSTERED, FAULT-TOLERANT PROXY FOR AI AGENT TRAFFIC
[Diagram: AI Gateway in front of LLM, Backup LLM, Calendar MCP, and Search MCP; One Token]
ONE PROXY FOR ROUTING, AUTH, RATE LIMITS, AND OBSERVABILITY.
THE AGENT DOESN’T CARE WHICH BACKEND IS BEHIND THE GATEWAY.
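The "LLM" and "Backup LLM" boxes suggest that the gateway can fail over between backends without the agent noticing. A minimal Python sketch of that idea, assuming each backend is a plain callable; `BackendError`, `route_with_fallback`, and the two fake backends are hypothetical names for illustration, not Envoy AI Gateway's API:

```python
class BackendError(Exception):
    """Raised when a backend cannot serve a request (hypothetical)."""


def route_with_fallback(request, backends):
    """Try each (name, callable) backend in order; return the first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(request)
        except BackendError as exc:
            errors.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(errors))


def primary_llm(request):
    # Simulates an outage of the primary provider.
    raise BackendError("503 from provider")


def backup_llm(request):
    return f"echo: {request}"


if __name__ == "__main__":
    backends = [("llm", primary_llm), ("backup-llm", backup_llm)]
    print(route_with_fallback("Book flight", backends))
```

The caller sees one endpoint and one response shape; which backend answered is only visible in the returned name, which a gateway would record in its logs.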
auth: API keys for LLM, OAuth for MCP, all in one place
Token rate limiting: Per-tenant budgets; prevent runaway agents
Observability: Access logs, Prometheus metrics, OpenInference traces
Session correlation: Agent session IDs auto-tag all logs and traces
MCP multiplexing: Aggregate tools from multiple backends
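Token rate limiting with per-tenant budgets can be pictured as a counter per tenant that rejects requests once the budget is spent. A minimal in-memory Python sketch under that assumption; `TokenBudget` and `try_consume` are hypothetical names, and a real gateway would also reset budgets per time window and share state across replicas:

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Tracks token spend per tenant against a fixed limit (hypothetical)."""

    limit: int
    used: dict = field(default_factory=dict)

    def try_consume(self, tenant, tokens):
        """Record the spend and return True, or return False if over budget."""
        spent = self.used.get(tenant, 0)
        if spent + tokens > self.limit:
            return False
        self.used[tenant] = spent + tokens
        return True


if __name__ == "__main__":
    budget = TokenBudget(limit=1000)
    print(budget.try_consume("team-a", 600))  # within budget
    print(budget.try_consume("team-a", 500))  # would exceed the limit
    print(budget.try_consume("team-b", 500))  # separate tenant, separate budget
```

A runaway agent loop in one tenant exhausts only that tenant's budget; other tenants keep working.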
MCP
1. Book flight
2. user message
3. tool_call: search_flights
4. tools/call: search_flights
5. flight options
6. tool_result
7. "Found AF123 for €299"
8. response
BOTH LLM AND MCP CALLS THROUGH THE SAME PROXY. ONE ORIGIN, ONE SET OF LOGS.
headers
Gateway auto-tags every access log entry
Filter by session in your observability tool
No agent-side instrumentation needed
{"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions","upstream":"openai","tokens_in":342,"tokens_out":89,"latency_ms":1240}
{"session_id":"goose-abc123","method":"POST","path":"/mcp","upstream":"kiwi","tool":"search-flight","latency_ms":820}
{"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions","upstream":"openai","tokens_in":1205,"tokens_out":156,"latency_ms":2100}
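Because every entry carries the same `session_id`, one filter recovers the whole agent session across LLM and MCP calls. A minimal Python sketch, assuming the access log is available as JSON lines with the field names shown on the slide; `summarize_session` is a hypothetical helper:

```python
import json

# The three access log entries from the slide.
LOG_LINES = [
    '{"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions",'
    '"upstream":"openai","tokens_in":342,"tokens_out":89,"latency_ms":1240}',
    '{"session_id":"goose-abc123","method":"POST","path":"/mcp",'
    '"upstream":"kiwi","tool":"search-flight","latency_ms":820}',
    '{"session_id":"goose-abc123","method":"POST","path":"/v1/chat/completions",'
    '"upstream":"openai","tokens_in":1205,"tokens_out":156,"latency_ms":2100}',
]


def summarize_session(lines, session_id):
    """Total requests, tokens, and latency for one agent session."""
    summary = {"requests": 0, "tokens_in": 0, "tokens_out": 0, "latency_ms": 0}
    for line in lines:
        entry = json.loads(line)
        if entry.get("session_id") != session_id:
            continue
        summary["requests"] += 1
        # MCP entries carry no token counts, so default those to zero.
        summary["tokens_in"] += entry.get("tokens_in", 0)
        summary["tokens_out"] += entry.get("tokens_out", 0)
        summary["latency_ms"] += entry.get("latency_ms", 0)
    return summary


if __name__ == "__main__":
    print(summarize_session(LOG_LINES, "goose-abc123"))
```

For the entries above, the session totals 3 requests, 1547 input tokens, 245 output tokens, and 4160 ms of upstream latency.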
top plans are $200/mo
Dozens of agents: Claude, Codex, Cursor, Goose, OpenClaw, StakPak
Each tied to a specific CLI or SDK
"Love the agent but need JetBrains? Love the UI but the agent is too expensive?"
Goose's Journey
Direct APIs
Dec 2024: Rust rewrite, Desktop app
Oct 2025: ACP server; Zed, JetBrains connect
Feb 2026: ACP providers; Claude, Codex, Gemini as engine
Started as monolith: own UI, own LLM calls, own tools
ACP separated frontend from engine
Now: editors use Goose (ACP server) AND Goose uses other agents (ACP client)
Claude/Codex via Zed-built adapters; Gemini CLI natively speaks ACP
[Diagram: gateway between editor and Goose over ACP; labels: Rewrite MCP servers, Set LLM per user, Centralize auth]
Gateway sits between editor and agent: controls what the agent sees
Editor connects to agent via ACP, with the gateway in between
Rewrite MCP servers per user, set LLM policy, centralize auth
Agent sees only what the gateway allows
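"Rewrite MCP servers per user" can be read as the gateway editing the session setup request on its way from editor to agent. A minimal Python sketch under loud assumptions: the request is modeled as a dictionary with an `mcpServers` list, and `POLICY`, `rewrite_session_request`, the `model` field, and the user and model names are all hypothetical, not part of ACP or of any gateway's API. The gateway URL is the one from the recipe slide.

```python
# Hypothetical per-user policy: which MCP servers a user may see, and which LLM.
POLICY = {
    "alice": {"allowed_mcp": {"search"}, "model": "model-a"},
    "bob": {"allowed_mcp": {"search", "calendar"}, "model": "model-b"},
}

GATEWAY_MCP_URL = "http://127.0.0.1:1975/mcp"


def rewrite_session_request(user, request):
    """Drop disallowed MCP servers, point the rest at the gateway, set the LLM."""
    policy = POLICY[user]
    servers = [
        {**server, "url": GATEWAY_MCP_URL}
        for server in request.get("mcpServers", [])
        if server["name"] in policy["allowed_mcp"]
    ]
    # Return a new dict so the editor's original request is left untouched.
    return {**request, "mcpServers": servers, "model": policy["model"]}


if __name__ == "__main__":
    editor_request = {
        "cwd": "/work/project",
        "mcpServers": [
            {"name": "search", "url": "https://search.example/mcp"},
            {"name": "calendar", "url": "https://calendar.example/mcp"},
        ],
    }
    print(rewrite_session_request("alice", editor_request))
```

With this policy, the agent started for `alice` is offered only the search server, reached through the gateway, whatever the editor originally asked for.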
[Diagram: Goose connects through the gateway over ACP to Claude, Codex, and Gemini; label: + inject auth]
Goose delegates to any engine via ACP: gateway routes and authenticates
Goose uses ACP providers: Claude, Codex, Gemini as engines
Gateway routes and authenticates between Goose and each engine
Keep Goose’s recipes and extensions, use any subscription
One proxy: routing, auth, throttling, observability. No per-developer drift.
2. Goose recipes
Reusable multi-step workflows tested end-to-end through the gateway. Session IDs propagate for free observability.
3. ACP changes the landscape
44 agents, 42 clients. Break up with your agent without breaking your workflow. Gateways extend from LLM+MCP to the full agent communication stack.