CLI & Command Registry

The autosre CLI is built on the Click framework, with the main entrypoint defined in autosre/cli.py ¹. The cli group is the root command and opens an interactive dashboard when invoked without a subcommand. Most commands are registered through Click decorators in this file; further command groups are imported from the autosre/commands/ subpackage to keep the main module manageable ².

Main Entrypoint and Interactive Mode

The cli function is the primary entrypoint, decorated with @click.group(invoke_without_command=True) so that its body also runs when no subcommand is given ¹. It adds a --version option through @click.version_option and receives the Click context through @click.pass_context. When no subcommand is provided (ctx.invoked_subcommand is None), the CLI launches the interactive TUI dashboard by importing and calling main() from autosre.tui.
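
The fallback described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: launch_dashboard is a hypothetical stand-in for autosre.tui.main(), the status subcommand exists only to show normal dispatch, and the version string is a placeholder.

```python
import click
from click.testing import CliRunner


def launch_dashboard() -> None:
    """Hypothetical stand-in for autosre.tui.main()."""
    click.echo("dashboard started")


@click.group(invoke_without_command=True)
@click.version_option(version="0.0.0")  # placeholder, not the real version
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Root group: fall back to the dashboard when no subcommand is given."""
    # invoke_without_command=True makes Click call this body even when the
    # user passes no subcommand; invoked_subcommand is None in that case.
    if ctx.invoked_subcommand is None:
        launch_dashboard()


@cli.command()
def status() -> None:
    """Hypothetical subcommand used only to show normal dispatch."""
    click.echo("status ok")


# Exercise both paths in-process with Click's test runner.
runner = CliRunner()
no_args = runner.invoke(cli, [])
with_sub = runner.invoke(cli, ["status"])
```

Without invoke_without_command=True, Click would print the group's help text instead of calling the group body, so the dashboard fallback depends on that flag.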

Core Management Subcommands

Several subcommands are defined directly in autosre/cli.py to manage the vLLM backend and system status:

setup: Checks and installs requirements for the vLLM stack. It uses get_backend(BackendType.VLLM) to verify the environment and exits with an error if requirements are missing.
stop: Scales the autosre-vllm-local deployment to 0 to free the GPU, while leaving proxy and browser pods running. It uses k3s_lifecycle to manage the pod termination.
status: Displays pod health, pinned endpoint status, and vLLM model/URL information using kubectl and backend status methods.
test: Sends a test prompt to the vLLM server to verify connectivity and response correctness, displaying token usage and latency ³.
watch: Provides live introspection of the local vLLM instance and the GB10 host (GPU, CPU, RAM, swap) in a single Rich live view that combines vLLM /metrics, streamed docker logs, a TCP client snapshot, nv-monitor data, and the recent-request JSONL log ⁴.
backends: Lists available backends, currently showing only vllm as the detected platform.
bench: Benchmarks vLLM models for throughput, concurrency, and GPU memory usage, with options to list models, view history, or specify models by name or index.
precommit: Scans the working tree for sensitive data using the vendored precommit_scanner module ³.
ui: An alias for the interactive TUI dashboard, calling main() from autosre.tui.
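
The bench subcommand's selection of models by name or index can be sketched without Click. This is an illustration under stated assumptions: ModelSpec and the two sample entries are hypothetical, select_models is not a function from the repository, and treating an all-digit selector as a 0-based index is a guess based on the range shown in the excerpt's error message. The sketch raises ValueError where the CLI prints an error and exits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical stand-in for the entries of MODELS."""

    name: str
    model_id: str


# Sample data for illustration only; not real model recipes.
MODELS = [
    ModelSpec("alpha-small", "example/alpha-small"),
    ModelSpec("beta-large", "example/beta-large"),
]


def select_models(selectors: list[str], models: list[ModelSpec]) -> list[ModelSpec]:
    """Resolve selectors to specs; an empty selector list selects every model."""
    if not selectors:
        return list(models)
    specs: list[ModelSpec] = []
    for m in selectors:
        if m.isdigit():  # assumption: all-digit selectors are 0-based indices
            idx = int(m)
            if not 0 <= idx < len(models):
                raise ValueError(f"Invalid index: {m} (0-{len(models) - 1})")
            specs.append(models[idx])
        else:
            # Case-insensitive substring match against name or model id.
            needle = m.lower()
            matched = [
                s
                for s in models
                if needle in s.name.lower() or needle in s.model_id.lower()
            ]
            if not matched:
                raise ValueError(f"No model matching: {m}")
            specs.extend(matched)
    return specs
```

Because the name match is a substring test, one selector can expand to several models; a selector such as "example" would pick both sample entries here.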

Model Management

The models group manages models for the current backend:

models list: Lists configured model recipes and the currently deployed model ⁵.
models pull: Informs the user that model pulling is handled by the k3s chart, not the CLI.

Swarm Management

The swarm group manages agent swarms:

swarm launch: Launches an agent swarm with optional task templates, supporting both local vLLM and Anthropic providers ⁶.
swarm templates: Lists available task templates with agent counts and roles.

Dedicated Mode Management

The dedicated group flips the GB10 between SHARED, DEDICATED-coding, and IMAGEGEN modes:

dedicated status: Shows the current dedicated-mode latch ².
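dedicated down: Restores shared mode by stopping autoswe.service, restoring the base vLLM profile, and scaling meeting-scribe back up. It prompts for confirmation unless --yes or --dry-run is passed.
dedicated imagegen-up: Vacates the GPU for image generation by scaling meeting-scribe and the vLLM pod to 0. It requires the shared baseline and accepts the same --yes and --dry-run flags.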
dedicated reconcile: Applies the durable latch’s desired mode, used by the boot service.
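
The dedicated subcommands that mutate the cluster share a confirm-unless-dry-run pattern. The sketch below mirrors that control flow under stated assumptions: restore_shared is a hypothetical stand-in for the call into autosre.dedicated, and the result dictionary's shape beyond the "ok" key is invented for illustration.

```python
import json

import click
from click.testing import CliRunner


def restore_shared(dry_run: bool) -> dict:
    """Hypothetical stand-in for the real restore call; returns a result dict."""
    return {"ok": True, "dry_run": dry_run}


@click.command("down")
@click.option("--yes", is_flag=True, help="Skip the confirmation prompt.")
@click.option("--dry-run", is_flag=True, help="Print the plan without mutating anything.")
def down(yes: bool, dry_run: bool) -> None:
    """Restore shared mode (sketch)."""
    # Prompt only when the command would mutate state and --yes was not given.
    if not dry_run and not yes:
        click.confirm("Restore shared mode. Proceed?", abort=True)
    res = restore_shared(dry_run)
    click.echo(json.dumps(res, indent=2))
    # A falsy "ok" becomes a non-zero exit code for scripts and services.
    if not res.get("ok"):
        raise click.exceptions.Exit(1)


runner = CliRunner()
dry = runner.invoke(down, ["--dry-run"])        # no prompt, prints the plan
declined = runner.invoke(down, [], input="n\n")  # user answers "n", command aborts
confirmed = runner.invoke(down, ["--yes"])       # no prompt, runs for real
```

With abort=True, a negative answer raises click.Abort, so the command stops before any mutation and exits non-zero. A dry run skips the prompt entirely because nothing is changed.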

External Command Groups

Additional command groups are imported from autosre/commands/ and registered with cli.add_command():

perf from autosre.commands.perf
cluster from autosre.commands.cluster
configure from autosre.commands.configure
demo from autosre.commands.demo
dropbox from autosre.commands.dropbox
images from autosre.commands.images
k3s from autosre.commands.k3s
keys from autosre.commands.keys
mcp from autosre.commands.mcp
provision from autosre.commands.provision
ssh_group from autosre.commands.ssh_cmds
swarm_demo from autosre.commands.swarm_demo
workflow from autosre.commands.workflow
claude from autosre.commands.claude
codex from autosre.commands.codex_cmd
eval_group from autosre.commands.eval_cmds
metrics from autosre.commands.metrics
start from autosre.commands.start
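
Registration through cli.add_command() can be sketched as follows. The groups are defined inline so the example runs on its own; in the layout described above each would be imported from a module under autosre/commands/. The "run" subcommand and the CLI-facing name "ssh" are hypothetical and only illustrate that a Python identifier such as ssh_group can differ from the command name users type.

```python
import click
from click.testing import CliRunner


@click.group()
def cli() -> None:
    """Root group (stand-in for the one in autosre/cli.py)."""


@click.group()
def perf() -> None:
    """Command group that would normally live in its own module."""


@perf.command("run")
def perf_run() -> None:
    """Hypothetical subcommand, present only to make the group callable."""
    click.echo("perf run")


@click.group("ssh")  # hypothetical CLI name; the Python identifier differs
def ssh_group() -> None:
    """Group whose Python name is not its command name."""


# Attach each imported group to the root; Click registers it under the
# group's own name attribute, not the Python variable name.
for group in (perf, ssh_group):
    cli.add_command(group)

result = CliRunner().invoke(cli, ["perf", "run"])
```

Registering groups this way keeps each command family in its own module while the root group remains the single entry point.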

autosre/cli.py L1-120 (showing 40 of 120)

"""CLI entry point for autosre."""

import json
import sys
import time
from pathlib import Path

import click
import httpx

from . import __version__

# detect_platform / get_backend are re-exported explicitly (the `as` form) so the
# autosre/commands/* modules can resolve the test-patched `autosre.cli.<name>`
# attribute at call time and still type-check under strict mypy. k3s-only: vLLM
# is the single backend, so there is no host active-state to load.
from .backends import BackendType
from .backends import (
    detect_platform as detect_platform,  # noqa: PLC0414  (explicit re-export for _cli.*)
)
from .backends import get_backend as get_backend  # noqa: PLC0414
from .models import OPUS_1M


def _resolve_backend(backend: str) -> BackendType:
    """Resolve backend string to BackendType (k3s-only: always vLLM)."""
    return BackendType.VLLM if backend in ("auto", "vllm") else BackendType(backend)


@click.group(invoke_without_command=True)
@click.version_option(version=__version__)
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Auto-SRE: Local LLM server management for Claude Code.

    Run with no arguments to open the interactive dashboard.
    """
    if ctx.invoked_subcommand is None:
        from .tui import main

autosre/cli.py L601-707 (showing 40 of 107)



@dedicated_group.command("down")
@click.option("--yes", is_flag=True, help="Skip the confirmation prompt.")
@click.option("--dry-run", is_flag=True, help="Print the plan without mutating the cluster.")
def dedicated_down(yes: bool, dry_run: bool) -> None:
    """Restore shared mode (stops autoswe.service, restores base vLLM + scales meeting-scribe back)."""

    from autosre import dedicated as _d

    if not dry_run and not yes:
        click.confirm(
            "DEDICATED down: stop autoswe.service, restore the shared vLLM profile, and scale "
            "meeting-scribe back up. Proceed?",
            abort=True,
        )
    res = _d.down(_d.LiveRunner(), dry_run=dry_run)
    click.echo(json.dumps(res, indent=2))
    if not res.get("ok"):
        raise click.exceptions.Exit(1)


@dedicated_group.command("imagegen-up")
@click.option("--yes", is_flag=True, help="Skip the confirmation prompt.")
@click.option("--dry-run", is_flag=True, help="Print the plan without mutating the cluster.")
def dedicated_imagegen_up(yes: bool, dry_run: bool) -> None:
    """Vacate the GPU for image generation (scales meeting-scribe AND the vLLM pod to 0).

    Requires the shared baseline; run `dedicated down` first if in coding mode. This
    only frees the GPU; bringing up the SwarmUI pod is owned by `sddc imagegen up`.
    """

    from autosre import dedicated as _d

    if not dry_run and not yes:
        click.confirm(
            "IMAGEGEN up: scale meeting-scribe to 0 AND scale the vLLM pod to 0 (frees "
            "the whole GPU; no coding/scribe inference until `dedicated down`). Proceed?",
            abort=True,
        )

autosre/cli.py L121-240 (showing 40 of 120)

    is_flag=True,
    default=False,
    help="Treat all hits as warnings (exit 0 even on block-level findings).",
)
@click.option(
    "--include-all-tracked",
    is_flag=True,
    default=False,
    help="Scan every tracked file (deep scan) instead of only the working tree.",
)
@click.pass_context
def precommit(
    ctx: click.Context,
    verbose: bool,
    warn_only: bool,
    include_all_tracked: bool,
) -> None:
    """Scan autosre working tree for sensitive data before commit.

    Uses the vendored ``precommit_scanner`` module - no external tool
    required. Flags credentials, private keys, MAC/IP leaks, and other
    block-level patterns.
    """
    from autosre import precommit_scanner

    repo_root = Path(__file__).resolve().parents[1]
    ctx.invoke(
        precommit_scanner.precommit,
        repo=repo_root,
        include_staged=True,
        include_all_tracked=include_all_tracked,
        verbose=verbose,
        warn_only=warn_only,
    )


@cli.command()
@click.option(
    "--prompt", "-p", default="Say 'hello world' and nothing else.", help="Test prompt to send"
)

autosre/cli.py L241-360 (showing 40 of 120)



@cli.command()
@click.option(
    "--refresh",
    type=float,
    default=1.0,
    show_default=True,
    help="Refresh interval in seconds.",
)
@click.option(
    "--port",
    type=int,
    default=8010,
    show_default=True,
    help="vLLM API port (also used for the TCP-client snapshot).",
)
@click.option(
    "--nv-url",
    type=str,
    default="http://localhost:9100/metrics",
    show_default=True,
    help="nv-monitor Prometheus URL. Default matches `autosre start`'s "
    "headless launch (`nv-monitor -n -p 9100`).",
)
def watch(refresh: float, port: int, nv_url: str) -> None:
    """Live introspection of the local vLLM instance + GB10 host.

    Combines vLLM /metrics, docker-logs streaming, TCP client snapshot,
    nv-monitor host metrics (GPU util/temp/power, CPU, RAM, swap), and
    the recent-request JSONL log into one Rich live view. Press 'q' or
    Ctrl-C to exit.

    Unlike `autosre metrics --follow`, this command tails the vLLM
    container log so per-request HTTP events appear the moment vLLM
    handles them - not only when the proxy / scribe finalizes its JSONL
    entry after the response completes. nv-monitor surfaces GB10
    unified-memory and Grace big.LITTLE signals that vLLM's /metrics
    can't see.
    """

autosre/cli.py L361-480 (showing 40 of 120)

                if 0 <= idx < len(MODELS):
                    specs.append(MODELS[idx])
                else:
                    click.secho(f"Invalid index: {m} (0-{len(MODELS) - 1})", fg="red")
                    sys.exit(1)
            else:
                matched = [
                    s
                    for s in MODELS
                    if m.lower() in s.name.lower() or m.lower() in s.model_id.lower()
                ]
                if matched:
                    specs.extend(matched)
                else:
                    click.secho(f"No model matching: {m}", fg="red")
                    sys.exit(1)
    else:
        specs = list(MODELS)

    click.secho("Auto-SRE Model Benchmark", bold=True)
    click.echo(f"Models: {len(specs)}  |  Concurrent: {concurrent}x")
    click.echo()

    results = []
    for spec in specs:
        click.secho(f"━━━ {spec.name} ({spec.weight_size_gb:.0f}GB weights) ━━━", bold=True)
        result = run_single_benchmark(spec, concurrent_n=concurrent)
        print_result(result)
        results.append(result)

    click.echo()
    click.secho("Summary", bold=True)
    print_summary_table(results)

    path = save_results(results)
    click.echo(f"\nResults saved to: {path}")


@cli.group()
def models() -> None:

autosre/cli.py L481-600 (showing 40 of 120)

    template: str | None,
    model: str | None,
    provider: str,
    anthropic_model: str,
) -> None:
    """Launch an agent swarm with optional task template.

    The team size comes from the template (see ``swarm templates``);
    there is no per-launch agent-count override.

    \b
    Examples:
      autosre swarm launch                                   # Local basic swarm
      autosre swarm launch -t code-review                    # Local, code review template
      autosre swarm launch -t incident-response -m nemotron-super
      autosre swarm launch --provider=anthropic -t code-review
      autosre swarm launch --provider=anthropic --anthropic-model=claude-opus-4-8
    """
    from autosre.swarm.launcher import SwarmLauncher
    from autosre.swarm.templates import TASK_TEMPLATES

    task_template = TASK_TEMPLATES.get(template) if template else None

    if provider == "anthropic":
        # Online mode uses Claude Code's native auth. The launcher still wants a
        # backend instance, but its env/model args are ignored once the provider
        # is anthropic, so the k3s vLLM client serves as the stub.
        b = get_backend(BackendType.VLLM)
        launcher = SwarmLauncher(
            b,
            template=task_template,
            provider="anthropic",
            anthropic_model=anthropic_model,
        )
        try:
            launcher.launch()
        except RuntimeError as e:
            click.secho(f"ERROR: {e}", fg="red")
            sys.exit(1)
        return