Journey Synthesis
The Journey Synthesis pipeline transforms raw runtime artifacts (transcripts, state files, and system logs) into a static documentation site. It loads and caches these sources, then correlates them into discrete iterations with journey_synth.iter_correlator. It renders narrative and tabular content through dedicated renderers, applies strict redaction to prevent data leaks, and writes the output as a Content Collection for an Astro static site. Finally, it checks the published figures against the raw sources and optionally rebuilds the static site so the published documentation reflects the current run’s state.
Source Ingestion and Caching
The pipeline loads several key data structures using lookup_or_compute, which checks the cache before invoking the corresponding sources reader. These sources include:
- state: Read from runtime/state.json to determine the current run’s start time and epoch.
- history: Read from runtime/state.json.
- wedges: Read from runtime/wedge-events.jsonl and runtime/wedge-lessons.jsonl.
- restarts: Read from runtime/liveness-restarts.jsonl.
- refinements: Read from runtime/rule-refinements.jsonl.
- bypass: Read from runtime/orchestrator-bypass-attempts.jsonl.
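The cache-then-compute step can be pictured with the minimal sketch below. It assumes an in-memory cache keyed by a source name plus a file fingerprint (modification time and size); SimpleCache, fingerprint, and this lookup_or_compute signature are hypothetical stand-ins, since the real journey_synth.cache API is not shown here.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def fingerprint(path: Path) -> tuple[int, int]:
    """Hypothetical cache-key component: (mtime_ns, size) of a source file."""
    st = path.stat()
    return (st.st_mtime_ns, st.st_size)


class SimpleCache:
    """Hypothetical in-memory stand-in for journey_synth.cache.Cache."""

    def __init__(self) -> None:
        self.entries: dict[tuple[str, Any], Any] = {}
        self.misses = 0


def lookup_or_compute(cache: SimpleCache, name: str, path: Path,
                      compute: Callable[[Path], Any]) -> Any:
    """Return the cached value for (name, fingerprint), or compute and store it."""
    key = (name, fingerprint(path))
    if key in cache.entries:
        return cache.entries[key]  # cache hit: the source reader is skipped
    cache.misses += 1
    value = compute(path)  # cache miss: invoke the source reader
    cache.entries[key] = value
    return value
```

Keying on a fingerprint means an edited source file misses the cache and is re-read, while unchanged files are served from memory.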
Transcripts are listed from transcripts_dir and scoped to the current run’s start time via _run_start_unix, which filters out abandoned pre-reset runs.
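A sketch of run-start scoping, assuming the run's start time is stored as an ISO-8601 string under a hypothetical started_at key, and that a transcript belongs to the current run when its modification time is at or after that start. run_start_unix and scope_transcripts are illustrative names, not the pipeline's _run_start_unix.

```python
from __future__ import annotations

from datetime import datetime, timezone


def run_start_unix(state: dict) -> float | None:
    """Parse a hypothetical ISO-8601 'started_at' field into a Unix timestamp."""
    raw = state.get("started_at")
    if not raw:
        return None
    # fromisoformat() rejects a trailing 'Z' before Python 3.11, so normalize it.
    dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive times are UTC
    return dt.timestamp()


def scope_transcripts(transcripts: list[tuple[str, float]],
                      start: float | None) -> list[str]:
    """Keep transcripts whose mtime is at or after the run start.

    With no known start time, nothing is filtered out.
    """
    if start is None:
        return [name for name, _ in transcripts]
    return [name for name, mtime in transcripts if mtime >= start]
```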
Iteration Correlation
The core of the synthesis is the correlate function from journey_synth.iter_correlator. It takes the loaded history, transcripts, wedges, restarts, refinements, commits, and an artifacts_reader lambda as inputs, then re-walks the transcripts to correlate events into discrete iterations (iters) [src: scripts/journey_synth/cli.py:L257]. The resulting iters list is the primary data structure for all subsequent rendering and site generation [src: scripts/journey_synth/cli.py:L267].
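One plausible way to correlate timestamped events with iterations is to bisect each event's timestamp into the sorted list of iteration start times, as sketched below. The record shapes (iter, started, ts) are assumptions; the real correlate function also consumes transcripts, commits, and an artifacts_reader, which this sketch omits.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Any


def correlate_events(history: list[dict[str, Any]],
                     events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Bucket timestamped events into the iteration running at that time.

    history: hypothetical iteration records with 'iter' and 'started' (Unix seconds).
    events:  hypothetical records with a 'ts' (Unix seconds) plus any payload.
    Events earlier than the first iteration start are dropped.
    """
    ordered = sorted(history, key=lambda h: h["started"])
    starts = [h["started"] for h in ordered]
    iters = [{"iter": h["iter"], "events": []} for h in ordered]
    for ev in sorted(events, key=lambda e: e["ts"]):
        # Index of the last iteration that started at or before this event.
        idx = bisect_right(starts, ev["ts"]) - 1
        if idx >= 0:
            iters[idx]["events"].append(ev)
    return iters
```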
Rendering and Redaction
Once iterations are correlated, the pipeline renders content using specific renderers from journey_synth.renderers 1. The _write helper function handles the actual file writing.
- Narrative docs: SUMMARY.md and lessons.md are processed through codex_cleanup.clean_markdown if the --codex-cleanup flag is enabled. This step strips AI-speak phrasing and em-dashes. Mostly tabular docs skip it, because cleanup can damage table alignment.
- Per-iter docs: Each iteration is rendered using per_iter_r.render(it) [src: scripts/journey_synth/cli.py:L282].
- Aggregates: lessons.md uses lessons_r.render, cost.md uses cost_r.render, and timeline.md uses timeline_r.render.
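The routing rule for cleanup can be sketched as follows. _CLEANUP_NARRATIVE_DOCS matches the set defined in cli.py; stand_in_clean is a hypothetical placeholder for codex_cleanup.clean_markdown that only rewrites em-dashes, and finalize_docs is an illustrative name.

```python
from __future__ import annotations

from typing import Callable

# Same set as in cli.py: only narrative docs are cleaned.
_CLEANUP_NARRATIVE_DOCS = {"SUMMARY.md", "lessons.md"}


def stand_in_clean(text: str) -> str:
    """Hypothetical placeholder for codex_cleanup.clean_markdown (em-dash only)."""
    return text.replace("\u2014", ", ")


def finalize_docs(docs: dict[str, str], codex_cleanup: bool,
                  clean: Callable[[str], str] = stand_in_clean) -> dict[str, str]:
    """Apply cleanup only to narrative docs, and only when the flag is set."""
    out = {}
    for name, text in docs.items():
        if codex_cleanup and name in _CLEANUP_NARRATIVE_DOCS:
            text = clean(text)
        out[name] = text  # tabular docs pass through untouched
    return out
```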
All content passes through write_with_redaction, which uses redact_obj and redact to keep sensitive data from leaking. In --public mode, any detected leak causes the write to fail and the pipeline to exit with an error.
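A minimal sketch of fail-closed redaction, assuming two regex leak patterns (email addresses and /home/<user> paths) chosen purely for illustration. redact_text, write_redacted, and LeakError are hypothetical and not the journey_synth.redactor API.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical leak patterns; the real redactor's rules are not shown here.
_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "home_path": re.compile(r"/home/[^/\s]+"),
}


class LeakError(RuntimeError):
    """Raised when a leak is detected in public mode."""


def redact_text(text: str) -> tuple[str, list[str]]:
    """Replace each pattern match with a placeholder; report which kinds matched."""
    leaks = []
    for kind, pat in _PATTERNS.items():
        text, n = pat.subn(f"[REDACTED:{kind}]", text)
        if n:
            leaks.append(kind)
    return text, leaks


def write_redacted(path: Path, text: str, public: bool) -> list[str]:
    """Redact, then write; in public mode any detected leak aborts before writing."""
    clean, leaks = redact_text(text)
    if public and leaks:
        raise LeakError(f"{path.name}: leak(s) detected: {', '.join(leaks)}")
    path.write_text(clean)
    return leaks
```

Raising before the write means a public run never writes a file in which a leak was detected.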
Static Site Generation (Astro)
The pipeline generates a single source of truth for the Astro static site: site-data.json plus a Content Collection of per-iteration markdown files. _write_site_content handles this step.
- Site data: site_data_r.render aggregates metrics from iters, state, wedges, restarts, refinements, bypass, commits, parity, system_state, meta_judge, and codex_by_iter.
- Per-iter content: For each iteration, a YAML frontmatter block is generated with site_data_r._iter_row, and the body is rendered with per_iter_r.render. This content is written to src/content/iters/ under the JOURNEY_WEB directory.
- Leak scanning: The site data and per-iter content are redacted, and any leaks are tracked.
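The per-iteration Content Collection layout can be sketched as a YAML frontmatter block followed by a markdown body, one file per iteration under src/content/iters/. The row fields (iter, title, cost_usd, summary) and the iter-NNNN.md file naming are assumptions standing in for site_data_r._iter_row and per_iter_r.render.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def frontmatter(row: dict[str, Any]) -> str:
    """Render a flat dict as a YAML frontmatter block.

    json.dumps yields valid YAML scalars (quoted strings, numbers,
    true/false/null), so no YAML library is needed for flat rows.
    """
    lines = ["---"]
    for key, value in row.items():
        lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


def write_iter_pages(web_root: Path, iters: list[dict[str, Any]]) -> list[Path]:
    """Write one markdown file per iteration under src/content/iters/."""
    out_dir = web_root / "src" / "content" / "iters"
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for it in iters:
        # Hypothetical row fields standing in for site_data_r._iter_row.
        row = {"iter": it["iter"], "title": it["title"], "cost_usd": it["cost_usd"]}
        body = f"# Iteration {it['iter']}\n\n{it['summary']}\n"
        path = out_dir / f"iter-{it['iter']:04d}.md"
        path.write_text(frontmatter(row) + "\n" + body)
        written.append(path)
    return written
```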
After writing the content, the pipeline optionally rebuilds the Astro site via _rebuild_site. In the JOURNEY_WEB directory, this function runs npm ci (or npm install when there is no package-lock.json) if node_modules/ is missing, then npm run build 2. It then runs npm run scan as a final, fail-closed leak scan over the built dist/ directory; if the scan fails, the rebuild is blocked.
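The install/build/scan sequence in _rebuild_site reduces to a small decision, restated here as a pure function so it can be checked without running npm. npm_plan is a hypothetical helper; the npm subcommands and the node_modules/package-lock.json conditions come from the source.

```python
from __future__ import annotations

from pathlib import Path


def npm_plan(web_root: Path) -> list[list[str]]:
    """Return the npm argument lists a rebuild would run, in order.

    Install only when node_modules/ is missing, preferring `npm ci`
    when a package-lock.json is present.
    """
    plan: list[list[str]] = []
    if not (web_root / "node_modules").is_dir():
        inst = "ci" if (web_root / "package-lock.json").exists() else "install"
        plan.append([inst])
    plan.append(["run", "build"])
    plan.append(["run", "scan"])  # leak scan over dist/; a failure blocks publishing
    return plan
```

Because each step in the real function returns 1 on failure, a caller can execute this plan in order and stop at the first non-zero exit code.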
Accuracy and Hygiene Checks
Before finalizing, the pipeline runs an accuracy check using journey_synth.accuracy_check. It loads the written site-data.json and recomputes the headline figures from raw sources to confirm they match. If any discrepancy is found, the pipeline fails.
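A sketch of the accuracy comparison, assuming site-data.json stores its figures under a headlines key and that the headlines are an iteration count, a wedge count, and a total cost. All three metrics and both function names are hypothetical; the real journey_synth.accuracy_check may compare different figures.

```python
from __future__ import annotations

import math
from typing import Any


def recompute_headlines(iters: list[dict[str, Any]],
                        wedges: list[dict[str, Any]]) -> dict[str, float]:
    """Hypothetical headline metrics derived directly from raw sources."""
    return {
        "iter_count": len(iters),
        "wedge_count": len(wedges),
        "total_cost_usd": round(sum(it.get("cost_usd", 0.0) for it in iters), 2),
    }


def accuracy_discrepancies(site_data: dict[str, Any],
                           expected: dict[str, float]) -> list[str]:
    """List every headline whose published value disagrees with the recomputation."""
    published = site_data.get("headlines", {})
    problems = []
    for key, want in expected.items():
        got = published.get(key)
        if got is None or not math.isclose(got, want, rel_tol=0, abs_tol=1e-9):
            problems.append(f"{key}: site-data={got!r} recomputed={want!r}")
    return problems
```

A caller would treat a non-empty list as failure and exit non-zero.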
Additionally, the pipeline runs tools/check_prose_hygiene.py with the --fix flag to normalize dashes and emoji in the generated content, ensuring it passes the repo’s prose gate.
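For illustration, a dash-and-emoji normalizer might look like the sketch below. The replacement choices (a spaced hyphen for en and em dashes, deletion for characters in two coarse emoji ranges) are assumptions; the actual rules in tools/check_prose_hygiene.py are not shown in this excerpt.

```python
from __future__ import annotations

import re

# Hypothetical rules: en dash (U+2013) and em dash (U+2014), with surrounding whitespace.
_DASHES = re.compile(r"\s*[\u2013\u2014]\s*")
# Coarse emoji ranges: U+2600-U+27BF (Misc Symbols, Dingbats) and
# U+1F300-U+1FAFF (pictograph blocks), plus an optional variation selector.
_EMOJI = re.compile(r"[\u2600-\u27BF\U0001F300-\U0001FAFF]\uFE0F?")


def normalize_prose(text: str) -> str:
    """Replace en/em dashes with a spaced hyphen and drop common emoji."""
    text = _DASHES.sub(" - ", text)
    text = _EMOJI.sub("", text)
    return text
```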
"""journey_synth.cli - argparse entry point.
Invoke via `scripts/synthesize-journey.sh` (NOT bare `python3 -m journey_synth.cli`
from the repo root). The wrapper sets PYTHONPATH=scripts.
"""
from __future__ import annotations
import argparse
import json
import os
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path
from journey_synth import sources
from journey_synth.cache import Cache, lookup_or_compute, load, save
from journey_synth.codex_cleanup import clean_markdown
from journey_synth.iter_correlator import correlate
from journey_synth.parser import PARSER_VERSION
from journey_synth.redactor import (
    is_synthesizer_file,
    redact,
    redact_obj,
    write_with_redaction,
)
from journey_synth.renderers import (
    cost as cost_r,
    lessons as lessons_r,
    per_iter as per_iter_r,
    site_data as site_data_r,
    summary as summary_r,
    timeline as timeline_r,
)
# Narrative-heavy docs go through codex cleanup; mostly-tabular docs do not
# (codex sometimes damages table alignment).
_CLEANUP_NARRATIVE_DOCS = {"SUMMARY.md", "lessons.md"}
# Repo-rooted via the shared, monorepo-free contract in sources (AUTOSWE_REPO_ROOT
def _rebuild_site() -> int:
    """Rebuild the Astro site in JOURNEY_WEB, then leak-scan dist/. Returns 0 on success, 1 on failure."""
    # Put mise-managed shims (node/npm) ahead of the inherited PATH.
    env = dict(os.environ)
    env["PATH"] = f"{Path.home()}/.local/share/mise/shims:" + env.get("PATH", "")

    def _npm(args: list[str], timeout: int) -> subprocess.CompletedProcess | None:
        try:
            return subprocess.run(["npm", *args], cwd=JOURNEY_WEB, env=env,
                                  capture_output=True, text=True, timeout=timeout)
        except Exception as e:  # noqa: BLE001
            print(f"npm {' '.join(args)} failed: {type(e).__name__}: {e}", file=sys.stderr)
            return None

    # Install dependencies only when node_modules/ is missing; prefer the lockfile.
    if not (JOURNEY_WEB / "node_modules").is_dir():
        inst = "ci" if (JOURNEY_WEB / "package-lock.json").exists() else "install"
        ri = _npm([inst], 900)
        if ri is None or ri.returncode != 0:
            tail = ri.stderr[-2000:] if ri else ""
            print(f"npm {inst} failed:\n{tail}", file=sys.stderr)
            return 1
    r = _npm(["run", "build"], 600)
    if r is None or r.returncode != 0:
        tail = r.stderr[-2000:] if r else ""
        print(f"site rebuild failed:\n{tail}", file=sys.stderr)
        return 1
    # Build-time leak gate over the rendered dist/ (fail-closed): the published
    # bytes are what actually ship, so this is the final publishability backstop.
    scan = _npm(["run", "scan"], 120)
    if scan is None or scan.returncode != 0:
        tail = ((scan.stdout or "") + (scan.stderr or ""))[-2000:] if scan else ""
        print(f"site rebuild blocked: leak scan failed:\n{tail}", file=sys.stderr)
        return 1
    print("site rebuilt + leak-scanned -> journey-web/dist")
    return 0
def cmd_synthesize(args: argparse.Namespace) -> int:
    """Run the full synthesis. Writes 5 documents to docs/journey/."""
    runtime = Path(args.runtime) if args.runtime else sources.RUNTIME
    sddc = Path(args.sddc) if args.sddc else sources.SDDC