Skip to content

autosre

autosre is an open-source platform that orchestrates local Large Language Model inference, evaluation, and SRE automation on k3s clusters. It runs vLLM in k3s behind a native Anthropic Messages API proxy, enabling tools like Claude Code and OpenAI Codex CLI to interact with local models without cloud API keys or LiteLLM dependencies 1. The system provides a modular CLI for managing the stack, including local MCP servers for web research, a self-hosted HTTPS file dropbox, and agent swarm templates, all while maintaining strict credential isolation and XDG-compliant state management.

The architecture relies on statically pinned k3s ClusterIPs to route traffic between CLI clients, API proxies, and the vLLM inference backend. The autosre CLI acts as the control plane, handling deployment via Helm, model management, and environment injection for local AI agents. The backend layer ensures that all inference requests are routed through the proxy, which translates Anthropic Messages API calls into OpenAI Chat Completions for the vLLM pod, ensuring that every token stays on the local hardware 2 3.

diagram
Subsystem Description
CLI (autosre) The primary control plane for deploying the stack, managing models, and launching AI agents 1.
vLLM Backend The inference engine running as a k3s Deployment, serving models via a pinned ClusterIP 2.
Anthropic Proxy A k3s pod that translates Anthropic Messages API calls to OpenAI Chat Completions for Claude Code 1 2.
Codex Proxy A k3s pod that translates OpenAI Responses API calls for the Codex CLI 1 2.
MCP Servers Local web search and fetch tools using curl_cffi and DuckDuckGo, requiring no API keys 1.
Dropbox A self-hosted HTTPS file dropbox with stealth TLS and HMAC-signed cookie authentication.
Helm Charts The deployment manifest source of truth, pinning ClusterIPs and model configurations 3.