autosre
autosre is an open-source platform that orchestrates local Large Language Model inference, evaluation, and SRE automation on k3s clusters. It runs vLLM in k3s behind a native Anthropic Messages API proxy, enabling tools like Claude Code and OpenAI Codex CLI to interact with local models without cloud API keys or LiteLLM dependencies 1. The system provides a modular CLI for managing the stack, including local MCP servers for web research, a self-hosted HTTPS file dropbox, and agent swarm templates, all while maintaining strict credential isolation and XDG-compliant state management.
The architecture relies on statically pinned k3s ClusterIPs to route traffic between CLI clients, API proxies, and the vLLM inference backend. The autosre CLI acts as the control plane, handling deployment via Helm, model management, and environment injection for local AI agents. The backend layer ensures that all inference requests are routed through the proxy, which translates Anthropic Messages API calls into OpenAI Chat Completions for the vLLM pod, ensuring that every token stays on the local hardware 2 3.
| Subsystem | Description |
|---|---|
CLI (autosre) |
The primary control plane for deploying the stack, managing models, and launching AI agents 1. |
| vLLM Backend | The inference engine running as a k3s Deployment, serving models via a pinned ClusterIP 2. |
| Anthropic Proxy | A k3s pod that translates Anthropic Messages API calls to OpenAI Chat Completions for Claude Code 1 2. |
| Codex Proxy | A k3s pod that translates OpenAI Responses API calls for the Codex CLI 1 2. |
| MCP Servers | Local web search and fetch tools using curl_cffi and DuckDuckGo, requiring no API keys 1. |
| Dropbox | A self-hosted HTTPS file dropbox with stealth TLS and HMAC-signed cookie authentication. |
| Helm Charts | The deployment manifest source of truth, pinning ClusterIPs and model configurations 3. |
# Auto-SRE
> **Disclaimer: unofficial and unsupported.** Provided for testing and
> evaluation only, on an "AS IS" basis, with no warranty and no support. Not
> affiliated with or endorsed by Dell. See [DISCLAIMER.md](DISCLAIMER.md).
Wiki: https://sddcinfo.github.io/autosre/
Local LLM server management for Claude Code. Runs vLLM in k3s behind a
native Anthropic Messages API proxy (also in k3s), no LiteLLM needed, no
cloud API keys required.
Ships with local MCP servers for web search/fetch and an optional stealth
file-sharing dropbox for pushing artefacts to the box over HTTPS. Hooks and
plan review are delegated to the
[claude-code-recipes](https://github.com/bradlay/claude-code-recipes)
marketplace plugins for bare `claude` sessions; `autosre claude` (offline
mode) runs a clean session with no plugins or hooks.
## Requirements
- **Python 3.14+**
- **k3s** with the autosre stack deployed (vLLM + proxy + codex-proxy +
browser as pods, on an NVIDIA GB10). Deploy with `autosre k3s up`.
## Quick start
```bash
git clone https://github.com/sddcinfo/autosre.git
cd autosre
pip install -e '.[dev]'
autosre k3s up # deploy the stack into k3s (helm)
autosre start # scale vLLM up + warm it
autosre test
autosre claude # launches Claude Code against the local stack
autosre codex # launches Codex CLI against the local stack
```
`autosre claude` injects the environment and settings that make Claude
"""vLLM backend: a thin client for the vLLM pod running in k3s.
The stack is k3s-only. vLLM runs as the ``autosre-vllm-local`` Deployment and
is reached at the pinned ClusterIP (``config.get_vllm_url()``); Claude Code talks
to the ``autosre-proxy`` pod (``config.get_proxy_url()``), which translates the
Anthropic Messages API to vLLM's OpenAI Chat Completions. This class only
resolves URLs, checks health, and discovers the proxy's model label - it does
not start or stop anything (deploy with ``autosre k3s up``, scale with
``autosre start`` / ``autosre stop``).
"""
from __future__ import annotations
import logging
import os
from typing import Any, ClassVar
import httpx
from autosre.config import get_codex_url, get_proxy_url, get_vllm_url
from . import vllm_serve
from .base import Backend
logger = logging.getLogger(__name__)
class VllmBackend(Backend):
"""vLLM-in-k3s client (URL + health + model-label discovery)."""
name: ClassVar[str] = "vllm"
description: ClassVar[str] = "vLLM on GB10 (k3s pod, NVFP4)"
api_port: ClassVar[int] = 8010
proxy_port: ClassVar[int] = 8011 # Anthropic API proxy port (for Claude Code)
codex_proxy_port: ClassVar[int] = 8012 # OpenAI Responses-API proxy port (for Codex CLI)
# Single canonical recipe served by the k3s pod (see helm/autosre/values.yaml,
# the runtime source of truth). The multimodal recipe is selected by editing
# helm values + `autosre k3s up`, not a live model swap.
models: ClassVar[dict[str, str]] = {
"""autosre runtime URL configuration: statically pinned k3s ClusterIPs.
In host/docker mode all URLs default to localhost. In k3s mode the getters
return the K3S_* constants below: the ClusterIPs are pinned in the helm
charts (helm/autosre/values.yaml here, the meeting-scribe chart for the
cross-namespace backends), so there is no kubectl resolution, no cache, and
nothing to go stale across helm reinstall / redeploy.
Static IP registry (appliance-wide contract, mirrored by the charts):
meeting-scribe-pyannote 10.96.0.21:8001
meeting-scribe-tts 10.96.0.22:8002
meeting-scribe-vllm-asr 10.96.0.23:8003
meeting-scribe-tts-trial 10.96.0.29:8012
autosre-vllm-local 10.96.0.30:8010
autosre-proxy 10.96.0.31:8011
autosre-codex-proxy 10.96.0.32:8012
autosre-browser 10.96.0.33:3010
autosre-dropbox 10.96.0.34:8443
Why static pins are safe: the k3s service-cidr is 10.96.0.0/12 and these pins
sit in the KEP-3070 lower band (10.96.0.0/24), which the allocator prefers
NOT to use for dynamic assignment (it allocates upper-band-first). Collision
risk with dynamically allocated ClusterIPs is therefore low, though not a
hard guarantee. 10.96.0.1 is the kubernetes apiserver VIP and 10.96.0.10 is
kube-dns; the registry avoids both.
``config_dir() / "k3s.yaml"`` carries only ``managed_by: k3s``. Stale URL
keys left behind by older releases are simply ignored.
Usage::
from autosre.config import get_vllm_url, get_proxy_url, get_codex_url
vllm_url = get_vllm_url() # "http://10.96.0.30:8010" in k3s, "http://localhost:8010" otherwise
proxy_url = get_proxy_url() # "http://10.96.0.31:8011" in k3s, "http://localhost:8011" otherwise
codex_url = get_codex_url() # "http://10.96.0.32:8012" in k3s, "http://localhost:8012" otherwise
"""
from __future__ import annotations