Journey Synthesis

The Journey Synthesis pipeline transforms raw runtime artifacts - transcripts, state files, and system logs - into a static documentation site. It begins by loading and caching these sources, then correlates them into discrete iterations using the iter_correlator. The pipeline renders narrative and tabular content through per-document renderers, redacts every write to prevent data leaks, and writes the output as a Content Collection for an Astro static site. Finally, it checks the published figures against the raw sources and triggers a rebuild of the static site so that the published documentation reflects the current run’s state.

Source Ingestion and Caching

The pipeline loads several key data structures using lookup_or_compute, which checks the cache before invoking the corresponding sources reader. These sources include:

state: Read from runtime/state.json to determine the current run’s start time and epoch.
history: Read from runtime/state.json.
wedges: Read from runtime/wedge-events.jsonl and runtime/wedge-lessons.jsonl.
restarts: Read from runtime/liveness-restarts.jsonl.
refinements: Read from runtime/rule-refinements.jsonl.
bypass: Read from runtime/orchestrator-bypass-attempts.jsonl.

Transcripts are listed from the transcripts_dir and scoped to the current run’s start time using _run_start_unix, which filters out abandoned pre-reset runs.
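The cache-first loading and run-scoped transcript filtering described above can be sketched as follows. This is a minimal illustration, not the project's code: lookup_or_compute_sketch, read_jsonl, and transcripts_for_run are hypothetical names, the cache is modeled as a plain dict, and filtering by a per-transcript timestamp is an assumption about how the run start time is applied.

```python
import json
from pathlib import Path
from typing import Any, Callable


def lookup_or_compute_sketch(cache: dict[str, Any], key: str,
                             compute: Callable[[], Any]) -> Any:
    """Return the cached value for key, or compute, store, and return it."""
    if key in cache:
        return cache[key]
    value = compute()
    cache[key] = value
    return value


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line; a missing file yields []."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def transcripts_for_run(transcripts: list[tuple[str, float]],
                        run_start_unix: float) -> list[str]:
    """Keep transcripts stamped at or after the run start, oldest first.

    Each entry is (name, unix_timestamp); the timestamp source is an assumption.
    """
    ordered = sorted(transcripts, key=lambda item: item[1])
    return [name for name, stamp in ordered if stamp >= run_start_unix]
```

With a sketch like this, a reader runs at most once per key, and transcripts from a run abandoned before the reset never reach the correlator.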

Iteration Correlation

The core of the synthesis is the correlate function from journey_synth.iter_correlator. It takes the loaded history, transcripts, wedges, restarts, refinements, commits, and an artifacts_reader lambda as inputs. This function re-walks the transcripts to correlate events into discrete iterations (iters) [src: scripts/journey_synth/cli.py:L257]. The resulting iters list is the primary data structure used for all subsequent rendering and site generation [src: scripts/journey_synth/cli.py:L267].
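To make the idea of correlating events into iterations concrete, here is a minimal sketch that buckets timestamped events by iteration start time. It is not the real correlate function, whose signature and matching rules are not shown in this section: the Iter dataclass, the "iter", "started_at", and "ts" fields, and the time-window rule are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Iter:
    number: int
    started_at: float
    events: list[dict] = field(default_factory=list)


def correlate_sketch(history: list[dict], events: list[dict]) -> list[Iter]:
    """Attach each event to the latest iteration that started at or before it."""
    ordered = sorted(history, key=lambda h: h["started_at"])
    iters = [Iter(h["iter"], h["started_at"]) for h in ordered]
    starts = [it.started_at for it in iters]
    for event in sorted(events, key=lambda e: e["ts"]):
        # bisect_right counts the starts <= ts; subtract 1 for the index.
        index = bisect_right(starts, event["ts"]) - 1
        if index >= 0:  # events before the first iteration are dropped
            iters[index].events.append(event)
    return iters
```

Sorting the starts once and using a binary search keeps the assignment at O(log n) per event, which matters when wedges, restarts, and refinements are all merged into one event stream.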

Rendering and Redaction

Once iterations are correlated, the pipeline renders content using the renderers in journey_synth.renderers. The _write helper function handles the actual file writing.

Narrative Docs: SUMMARY.md and lessons.md are processed through codex_cleanup.clean_markdown if the --codex-cleanup flag is enabled. This step strips AI-speak phrasing and em-dashes. Mostly-tabular docs skip it because codex sometimes damages table alignment.
Per-Iter Docs: Each iteration is rendered using per_iter_r.render(it) [src: scripts/journey_synth/cli.py:L282].
Aggregates: lessons.md uses lessons_r.render, cost.md uses cost_r.render, and timeline.md uses timeline_r.render.

All content is passed through write_with_redaction, which uses redact_obj and redact to ensure no sensitive data leaks. In --public mode, any detected leak causes the write to fail and the pipeline to exit with an error.
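The fail-closed behavior in --public mode can be illustrated with a short sketch. The real redact and write_with_redaction signatures are not shown here, so every name below (redact_sketch, write_with_redaction_sketch, LeakError) and both regular expressions are hypothetical stand-ins for whatever patterns the redactor actually uses.

```python
import re
from pathlib import Path

# Hypothetical leak patterns: an API-key-like token and a home directory path.
_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{8,}"),
    re.compile(r"/home/[^/\s]+"),
]


class LeakError(RuntimeError):
    """Raised when public output would contain sensitive data."""


def redact_sketch(text: str) -> tuple[str, int]:
    """Replace every pattern match and report how many matches were found."""
    hits = 0
    for pattern in _PATTERNS:
        text, count = pattern.subn("[REDACTED]", text)
        hits += count
    return text, hits


def write_with_redaction_sketch(path: Path, text: str, public: bool) -> int:
    """Write redacted text; in public mode refuse to write if anything leaked."""
    cleaned, hits = redact_sketch(text)
    if public and hits:
        raise LeakError(f"{hits} leak(s) detected in {path.name}; not written")
    path.write_text(cleaned, encoding="utf-8")
    return hits
```

Raising before the write, rather than writing the redacted text and warning, means a public run cannot leave a partially trusted file on disk.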

Static Site Generation (Astro)

The pipeline generates the Astro static site's source of truth in two parts: site-data.json and a Content Collection of per-iteration markdown files. This is handled by _write_site_content.

Site Data: site_data_r.render aggregates metrics from iters, state, wedges, restarts, refinements, bypass, commits, parity, system_state, meta_judge, and codex_by_iter.
Per-Iter Content: For each iteration, a YAML frontmatter block is generated using site_data_r._iter_row, and the body is rendered using per_iter_r.render. This content is written to src/content/iters/ under the JOURNEY_WEB directory.
Leak Scanning: The site data and per-iter content are redacted, and any leaks are tracked.
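A per-iteration Content Collection entry is a markdown file with a YAML frontmatter block on top. The sketch below shows one way to assemble such a file without a YAML library, relying on the fact that JSON scalars are valid YAML. The function names and the row keys are hypothetical; the actual fields produced by site_data_r._iter_row are not listed in this section.

```python
import json


def frontmatter_sketch(row: dict) -> str:
    """Render a flat dict as YAML frontmatter, JSON-encoding each value."""
    lines = ["---"]
    for key, value in row.items():
        # json.dumps quotes strings, so colons inside values stay safe.
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def iter_document(row: dict, body: str) -> str:
    """Combine frontmatter and a markdown body into one file's content."""
    return frontmatter_sketch(row) + "\n" + body.rstrip("\n") + "\n"
```

For example, iter_document({"iter": 3, "title": "Fix: parser"}, "Body text") yields a frontmatter block with iter: 3 and title: "Fix: parser", followed by a blank line and the body.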

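After writing the content, the pipeline optionally rebuilds the Astro site via _rebuild_site. If node_modules is missing, this function first installs dependencies, using npm ci when package-lock.json exists and npm install otherwise. It then runs npm run build in the JOURNEY_WEB directory, followed by npm run scan, which performs a final leak scan on the built dist/ directory. The scan is fail-closed: a failure in any of these steps makes _rebuild_site return 1.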

Accuracy and Hygiene Checks

Before finalizing, the pipeline runs an accuracy check using journey_synth.accuracy_check. It loads the written site-data.json and recomputes headlines from raw sources to ensure they match. If any discrepancies are found, the pipeline fails.
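The recompute-and-compare idea behind the accuracy check can be sketched as below. The headline names (iterations, wedges, total_cost_usd) and the cost_usd field are assumptions chosen for illustration; the real accuracy_check module may compare different figures.

```python
def recompute_headlines(iters: list[dict], wedges: list[dict]) -> dict:
    """Derive headline numbers directly from the raw, loaded sources."""
    return {
        "iterations": len(iters),
        "wedges": len(wedges),
        "total_cost_usd": round(sum(it.get("cost_usd", 0.0) for it in iters), 2),
    }


def accuracy_mismatches(published: dict, recomputed: dict) -> list[str]:
    """List every headline where the published value differs from the sources."""
    problems = []
    for key, expected in recomputed.items():
        actual = published.get(key)
        if actual != expected:
            problems.append(
                f"{key}: site-data has {actual!r}, sources give {expected!r}"
            )
    return problems
```

A caller would load site-data.json, pass its headline section as published, and exit non-zero whenever the returned list is non-empty.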

Additionally, the pipeline runs tools/check_prose_hygiene.py with the --fix flag to normalize dashes and emoji in the generated content, ensuring it passes the repo’s prose gate.

scripts/journey_synth/cli.py L1-120 (showing 40 of 120)

"""journey_synth.cli - argparse entry point.

Invoke via `scripts/synthesize-journey.sh` (NOT bare `python3 -m journey_synth.cli`
from the repo root). The wrapper sets PYTHONPATH=scripts.
"""
from __future__ import annotations

import argparse
import json
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

from journey_synth import sources
from journey_synth.cache import Cache, lookup_or_compute, load, save
from journey_synth.codex_cleanup import clean_markdown
from journey_synth.iter_correlator import correlate
from journey_synth.parser import PARSER_VERSION
from journey_synth.redactor import (
    is_synthesizer_file,
    redact,
    redact_obj,
    write_with_redaction,
)
from journey_synth.renderers import (
    cost as cost_r,
    lessons as lessons_r,
    per_iter as per_iter_r,
    site_data as site_data_r,
    summary as summary_r,
    timeline as timeline_r,
)

# Narrative-heavy docs go through codex cleanup; mostly-tabular docs do not
# (codex sometimes damages table alignment).
_CLEANUP_NARRATIVE_DOCS = {"SUMMARY.md", "lessons.md"}

# Repo-rooted via the shared, monorepo-free contract in sources (AUTOSWE_REPO_ROOT

scripts/journey_synth/cli.py L121-240 (showing 40 of 120)

    env = dict(os.environ)
    env["PATH"] = f"{Path.home()}/.local/share/mise/shims:" + env.get("PATH", "")

    def _npm(args: list[str], timeout: int) -> subprocess.CompletedProcess | None:
        try:
            return subprocess.run(["npm", *args], cwd=JOURNEY_WEB, env=env,
                                  capture_output=True, text=True, timeout=timeout)
        except Exception as e:  # noqa: BLE001
            print(f"npm {' '.join(args)} failed: {type(e).__name__}: {e}", file=sys.stderr)
            return None

    if not (JOURNEY_WEB / "node_modules").is_dir():
        inst = "ci" if (JOURNEY_WEB / "package-lock.json").exists() else "install"
        ri = _npm([inst], 900)
        if ri is None or ri.returncode != 0:
            tail = ri.stderr[-2000:] if ri else ""
            print(f"npm {inst} failed:\n{tail}", file=sys.stderr)
            return 1

    r = _npm(["run", "build"], 600)
    if r is None or r.returncode != 0:
        tail = r.stderr[-2000:] if r else ""
        print(f"site rebuild failed:\n{tail}", file=sys.stderr)
        return 1

    # Build-time leak gate over the rendered dist/ (fail-closed): the published
    # bytes are what actually ship, so this is the final publishability backstop.
    scan = _npm(["run", "scan"], 120)
    if scan is None or scan.returncode != 0:
        tail = ((scan.stdout or "") + (scan.stderr or ""))[-2000:] if scan else ""
        print(f"site rebuild blocked: leak scan failed:\n{tail}", file=sys.stderr)
        return 1
    print("site rebuilt + leak-scanned -> journey-web/dist")
    return 0


def cmd_synthesize(args: argparse.Namespace) -> int:
    """Run the full synthesis. Writes 5 documents to docs/journey/."""
    runtime = Path(args.runtime) if args.runtime else sources.RUNTIME
    sddc = Path(args.sddc) if args.sddc else sources.SDDC