CLI & Command Registry
The autosre CLI is built on the Click framework, with the main entrypoint defined in autosre/cli.py 1. The cli group serves as the root command, supporting an interactive dashboard mode when invoked without subcommands. Command registration is handled primarily through Click decorators within this file, with additional command groups imported from the autosre/commands/ subpackage to keep the main module size manageable 2.
Main Entrypoint and Interactive Mode
The cli function is the primary entrypoint, decorated with @click.group(invoke_without_command=True) so that its body runs even when no subcommand is given 1. It adds a version option via @click.version_option and receives the Click context through @click.pass_context. If no subcommand is provided (ctx.invoked_subcommand is None), the CLI launches the interactive TUI dashboard by importing and calling main() from autosre.tui.
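The fallback described above can be sketched in isolation. This is a minimal illustration, not the project's code: launch_dashboard is a hypothetical stand-in for autosre.tui.main, the status subcommand exists only to show that the fallback is skipped, and the version string is a placeholder.

```python
import click
from click.testing import CliRunner


def launch_dashboard() -> None:
    """Hypothetical stand-in for autosre.tui.main()."""
    click.echo("dashboard started")


@click.group(invoke_without_command=True)
@click.version_option(version="0.0.0")  # placeholder, not the real version
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Root group: open the dashboard when no subcommand is given."""
    # invoked_subcommand is None only when the user typed no subcommand.
    if ctx.invoked_subcommand is None:
        launch_dashboard()


@cli.command()
def status() -> None:
    """Hypothetical subcommand; the group body still runs, the fallback does not."""
    click.echo("status ok")


def run(args: list[str]) -> str:
    """Invoke the sketch in-process and return what it printed."""
    return CliRunner().invoke(cli, args).output
```

Calling run([]) exercises the dashboard path, while run(["status"]) reaches the subcommand without triggering it. Without invoke_without_command=True, Click would print the group's help text instead of running the group body.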
Core Management Subcommands
Several subcommands are defined directly in autosre/cli.py to manage the vLLM backend and system status:
- setup: Checks and installs requirements for the vLLM stack. It uses get_backend(BackendType.VLLM) to verify the environment and exits with an error if requirements are missing.
- stop: Scales the autosre-vllm-local deployment to 0 to free the GPU, while leaving proxy and browser pods running. It uses k3s_lifecycle to manage the pod termination.
- status: Displays pod health, pinned endpoint status, and vLLM model/URL information using kubectl and backend status methods.
- test: Sends a test prompt to the vLLM server to verify connectivity and response correctness, displaying token usage and latency 3.
- watch: Provides live introspection of the vLLM instance and host metrics (GPU, CPU, RAM) using a Rich live view, combining /metrics, docker logs, and nv-monitor data 4.
- backends: Lists available backends, currently showing only vllm as the detected platform.
- bench: Benchmarks vLLM models for throughput, concurrency, and GPU memory usage, with options to list models, view history, or specify models by name or index.
- precommit: Scans the working tree for sensitive data using the vendored precommit_scanner module 3.
- ui: An alias for the interactive TUI dashboard, calling main() from autosre.tui.
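The "by name or index" selection that bench offers can be sketched as a standalone function. This is an illustration under assumptions: ModelSpec and the two catalogue entries are hypothetical, the isdigit() check for telling an index from a name is a guess at how selectors are classified, and the sketch raises ValueError where the CLI prints an error and exits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical, trimmed-down model recipe."""
    name: str
    model_id: str


# Hypothetical catalogue; the real list lives in the benchmark module.
MODELS = [
    ModelSpec("alpha-small", "example-org/alpha-7b"),
    ModelSpec("beta-large", "example-org/beta-70b"),
]


def select_models(selectors: list[str]) -> list[ModelSpec]:
    """Resolve selectors to specs; an empty selection means every model."""
    if not selectors:
        return list(MODELS)
    specs: list[ModelSpec] = []
    for m in selectors:
        if m.isdigit():  # assumption: all-digit selectors are indices
            idx = int(m)
            if not 0 <= idx < len(MODELS):
                raise ValueError(f"Invalid index: {m} (0-{len(MODELS) - 1})")
            specs.append(MODELS[idx])
        else:
            # Case-insensitive substring match on either the name or the id.
            matched = [
                s
                for s in MODELS
                if m.lower() in s.name.lower() or m.lower() in s.model_id.lower()
            ]
            if not matched:
                raise ValueError(f"No model matching: {m}")
            specs.extend(matched)
    return specs
```

Because name matching is by substring, a broad selector such as "example-org" selects every model whose id contains it, which is convenient for benchmarking a whole family in one run.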
Model Management
The models group manages models for the current backend:
- models list: Lists configured model recipes and the currently deployed model 5.
- models pull: Informs the user that model pulling is handled by the k3s chart, not the CLI.
Swarm Management
The swarm group manages agent swarms:
- swarm launch: Launches an agent swarm with optional task templates, supporting both local vLLM and Anthropic providers 6.
- swarm templates: Lists available task templates with agent counts and roles.
Dedicated Mode Management
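The dedicated group flips the GB10 between SHARED, DEDICATED-coding, and IMAGEGEN modes:
- dedicated status: Shows the current dedicated-mode latch 2.
- dedicated down: Restores shared mode (stops autoswe.service, restores the base vLLM profile, and scales meeting-scribe back up).
- dedicated imagegen-up: Vacates the GPU for image generation by scaling meeting-scribe and the vLLM pod to 0; requires the shared baseline.
- dedicated reconcile: Applies the durable latch’s desired mode, used by the boot service.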
External Command Groups
Additional command groups are imported from autosre/commands/ and registered with cli.add_command():
- perf from autosre.commands.perf
- cluster from autosre.commands.cluster
- configure from autosre.commands.configure
- demo from autosre.commands.demo
- dropbox from autosre.commands.dropbox
- images from autosre.commands.images
- k3s from autosre.commands.k3s
- keys from autosre.commands.keys
- mcp from autosre.commands.mcp
- provision from autosre.commands.provision
- ssh_group from autosre.commands.ssh_cmds
- swarm_demo from autosre.commands.swarm_demo
- workflow from autosre.commands.workflow
- claude from autosre.commands.claude
- codex from autosre.commands.codex_cmd
- eval_group from autosre.commands.eval_cmds
- metrics from autosre.commands.metrics
- start from autosre.commands.start
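Registration through cli.add_command() can be sketched as below. The perf group and start command here are hypothetical single-file stand-ins for objects that would normally be imported from separate modules; only the add_command call mirrors what the page describes.

```python
import click
from click.testing import CliRunner


@click.group()
def cli() -> None:
    """Root group."""


# In a real layout each of these would live in its own module
# and be imported into the module that defines the root group.
@click.group()
def perf() -> None:
    """Hypothetical command group."""


@perf.command("run")
def perf_run() -> None:
    """Hypothetical subcommand of the group."""
    click.echo("perf run")


@click.command("start")
def start() -> None:
    """Hypothetical standalone command."""
    click.echo("starting")


# Attach externally defined commands and groups to the root group.
for command in (perf, start):
    cli.add_command(command)


def run(args: list[str]) -> str:
    """Invoke the sketch in-process and return what it printed."""
    return CliRunner().invoke(cli, args).output
```

After registration, cli.commands maps each command name to its object, so run(["perf", "run"]) and run(["start"]) both resolve through the root group. Keeping registration in one place makes the full command surface visible at a glance while the implementations stay in their own modules.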
"""CLI entry point for autosre."""

import json
import sys
import time
from pathlib import Path

import click
import httpx

from . import __version__

# detect_platform / get_backend are re-exported explicitly (the `as` form) so the
# autosre/commands/* modules can resolve the test-patched `autosre.cli.<name>`
# attribute at call time and still type-check under strict mypy. k3s-only: vLLM
# is the single backend, so there is no host active-state to load.
from .backends import BackendType
from .backends import (
    detect_platform as detect_platform,  # noqa: PLC0414 (explicit re-export for _cli.*)
)
from .backends import get_backend as get_backend  # noqa: PLC0414
from .models import OPUS_1M


def _resolve_backend(backend: str) -> BackendType:
    """Resolve backend string to BackendType (k3s-only: always vLLM)."""
    return BackendType.VLLM if backend in ("auto", "vllm") else BackendType(backend)


@click.group(invoke_without_command=True)
@click.version_option(version=__version__)
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Auto-SRE: Local LLM server management for Claude Code.

    Run with no arguments to open the interactive dashboard.
    """
    if ctx.invoked_subcommand is None:
        from .tui import main

@dedicated_group.command("down")
@click.option("--yes", is_flag=True, help="Skip the confirmation prompt.")
@click.option("--dry-run", is_flag=True, help="Print the plan without mutating the cluster.")
def dedicated_down(yes: bool, dry_run: bool) -> None:
    """Restore shared mode (stops autoswe.service, restores base vLLM + scales meeting-scribe back)."""
    from autosre import dedicated as _d

    if not dry_run and not yes:
        click.confirm(
            "DEDICATED down: stop autoswe.service, restore the shared vLLM profile, and scale "
            "meeting-scribe back up. Proceed?",
            abort=True,
        )
    res = _d.down(_d.LiveRunner(), dry_run=dry_run)
    click.echo(json.dumps(res, indent=2))
    if not res.get("ok"):
        raise click.exceptions.Exit(1)


@dedicated_group.command("imagegen-up")
@click.option("--yes", is_flag=True, help="Skip the confirmation prompt.")
@click.option("--dry-run", is_flag=True, help="Print the plan without mutating the cluster.")
def dedicated_imagegen_up(yes: bool, dry_run: bool) -> None:
    """Vacate the GPU for image generation (scales meeting-scribe AND the vLLM pod to 0).

    Requires the shared baseline; run `dedicated down` first if in coding mode. This
    only frees the GPU; bringing up the SwarmUI pod is owned by `sddc imagegen up`.
    """
    from autosre import dedicated as _d

    if not dry_run and not yes:
        click.confirm(
            "IMAGEGEN up: scale meeting-scribe to 0 AND scale the vLLM pod to 0 (frees "
            "the whole GPU; no coding/scribe inference until `dedicated down`). Proceed?",
            abort=True,
        )

    is_flag=True,
    default=False,
    help="Treat all hits as warnings (exit 0 even on block-level findings).",
)
@click.option(
    "--include-all-tracked",
    is_flag=True,
    default=False,
    help="Scan every tracked file (deep scan) instead of only the working tree.",
)
@click.pass_context
def precommit(
    ctx: click.Context,
    verbose: bool,
    warn_only: bool,
    include_all_tracked: bool,
) -> None:
    """Scan autosre working tree for sensitive data before commit.

    Uses the vendored ``precommit_scanner`` module - no external tool
    required. Flags credentials, private keys, MAC/IP leaks, and other
    block-level patterns.
    """
    from autosre import precommit_scanner

    repo_root = Path(__file__).resolve().parents[1]
    ctx.invoke(
        precommit_scanner.precommit,
        repo=repo_root,
        include_staged=True,
        include_all_tracked=include_all_tracked,
        verbose=verbose,
        warn_only=warn_only,
    )

@cli.command()
@click.option(
    "--prompt", "-p", default="Say 'hello world' and nothing else.", help="Test prompt to send"
)

@cli.command()
@click.option(
    "--refresh",
    type=float,
    default=1.0,
    show_default=True,
    help="Refresh interval in seconds.",
)
@click.option(
    "--port",
    type=int,
    default=8010,
    show_default=True,
    help="vLLM API port (also used for the TCP-client snapshot).",
)
@click.option(
    "--nv-url",
    type=str,
    default="http://localhost:9100/metrics",
    show_default=True,
    help="nv-monitor Prometheus URL. Default matches `autosre start`'s "
    "headless launch (`nv-monitor -n -p 9100`).",
)
def watch(refresh: float, port: int, nv_url: str) -> None:
    """Live introspection of the local vLLM instance + GB10 host.

    Combines vLLM /metrics, docker-logs streaming, TCP client snapshot,
    nv-monitor host metrics (GPU util/temp/power, CPU, RAM, swap), and
    the recent-request JSONL log into one Rich live view. Press 'q' or
    Ctrl-C to exit.

    Unlike `autosre metrics --follow`, this command tails the vLLM
    container log so per-request HTTP events appear the moment vLLM
    handles them - not only when the proxy / scribe finalizes its JSONL
    entry after the response completes. nv-monitor surfaces GB10
    unified-memory and Grace big.LITTLE signals that vLLM's /metrics
    can't see.
    """

                if 0 <= idx < len(MODELS):
                    specs.append(MODELS[idx])
                else:
                    click.secho(f"Invalid index: {m} (0-{len(MODELS) - 1})", fg="red")
                    sys.exit(1)
            else:
                matched = [
                    s
                    for s in MODELS
                    if m.lower() in s.name.lower() or m.lower() in s.model_id.lower()
                ]
                if matched:
                    specs.extend(matched)
                else:
                    click.secho(f"No model matching: {m}", fg="red")
                    sys.exit(1)
    else:
        specs = list(MODELS)

    click.secho("Auto-SRE Model Benchmark", bold=True)
    click.echo(f"Models: {len(specs)} | Concurrent: {concurrent}x")
    click.echo()

    results = []
    for spec in specs:
        click.secho(f"━━━ {spec.name} ({spec.weight_size_gb:.0f}GB weights) ━━━", bold=True)
        result = run_single_benchmark(spec, concurrent_n=concurrent)
        print_result(result)
        results.append(result)
        click.echo()

    click.secho("Summary", bold=True)
    print_summary_table(results)
    path = save_results(results)
    click.echo(f"\nResults saved to: {path}")

@cli.group()
def models() -> None:

    template: str | None,
    model: str | None,
    provider: str,
    anthropic_model: str,
) -> None:
    """Launch an agent swarm with optional task template.

    The team size comes from the template (see ``swarm templates``);
    there is no per-launch agent-count override.

    \b
    Examples:
        autosre swarm launch  # Local basic swarm
        autosre swarm launch -t code-review  # Local, code review template
        autosre swarm launch -t incident-response -m nemotron-super
        autosre swarm launch --provider=anthropic -t code-review
        autosre swarm launch --provider=anthropic --anthropic-model=claude-opus-4-8
    """
    from autosre.swarm.launcher import SwarmLauncher
    from autosre.swarm.templates import TASK_TEMPLATES

    task_template = TASK_TEMPLATES.get(template) if template else None

    if provider == "anthropic":
        # Online mode uses Claude Code's native auth. The launcher still wants a
        # backend instance, but its env/model args are ignored once the provider
        # is anthropic, so the k3s vLLM client serves as the stub.
        b = get_backend(BackendType.VLLM)
        launcher = SwarmLauncher(
            b,
            template=task_template,
            provider="anthropic",
            anthropic_model=anthropic_model,
        )
        try:
            launcher.launch()
        except RuntimeError as e:
            click.secho(f"ERROR: {e}", fg="red")
            sys.exit(1)
        return