
Journey Synthesis

The Journey Synthesis pipeline turns raw runtime artifacts (transcripts, state files, and system logs) into a static documentation site. It first loads and caches these sources, then uses iter_correlator to group them into discrete iterations. Dedicated renderers produce the narrative and tabular content, and strict redaction is applied to everything written so that no sensitive data leaks. The output is written as a Content Collection for an Astro static site. Finally, the pipeline checks the output's accuracy against the raw sources and rebuilds the static site, so the published documentation reflects the current run’s state.

The pipeline loads each source with lookup_or_compute, which checks the cache before invoking the corresponding sources reader. The sources are:

  • state: Read from runtime/state.json to determine the current run’s start time and epoch.
  • history: Also read from runtime/state.json.
  • wedges: Read from runtime/wedge-events.jsonl and runtime/wedge-lessons.jsonl.
  • restarts: Read from runtime/liveness-restarts.jsonl.
  • refinements: Read from runtime/rule-refinements.jsonl.
  • bypass: Read from runtime/orchestrator-bypass-attempts.jsonl.
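The cache-then-read pattern can be sketched as follows. This is a minimal, hypothetical version and not the project's lookup_or_compute. It assumes an in-memory cache keyed by source name and file modification time, and a JSONL reader that treats a missing file as empty. The names `_CACHE` and `read_jsonl` are illustrative.

```python
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical in-memory cache keyed by (source name, file mtime).
# The real lookup_or_compute may persist or key its cache differently.
_CACHE: dict[tuple[str, float], Any] = {}


def read_jsonl(path: Path) -> list[dict]:
    """Read one JSON object per non-blank line; a missing file yields []."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def lookup_or_compute(name: str, path: Path, reader: Callable[[Path], Any]) -> Any:
    """Return the cached value for `name`, re-reading only if `path` changed."""
    mtime = path.stat().st_mtime if path.exists() else -1.0
    key = (name, mtime)
    if key not in _CACHE:
        _CACHE[key] = reader(path)  # cache miss: invoke the source reader
    return _CACHE[key]
```

For example, `lookup_or_compute("restarts", runtime / "liveness-restarts.jsonl", read_jsonl)` reads the file once and serves later calls from memory until the file's mtime changes.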

Transcripts are listed from transcripts_dir and scoped to the current run: _run_start_unix supplies the run’s start time, and transcripts from abandoned pre-reset runs are filtered out.
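Run scoping might look like the sketch below. Several details are assumptions: the start time is stored as an ISO-8601 string under a hypothetical `run_started_at` key, transcripts are `*.jsonl` files, and a transcript belongs to the current run if it was modified at or after the run start. The helper names are also illustrative.

```python
from datetime import datetime
from pathlib import Path


def run_start_unix(state: dict) -> float:
    """Convert the run's ISO-8601 start time (hypothetical "run_started_at"
    key) into a Unix timestamp."""
    return datetime.fromisoformat(state["run_started_at"]).timestamp()


def transcripts_for_current_run(transcripts_dir: Path, start_unix: float) -> list[Path]:
    """Keep transcripts modified at or after the run start, so files left
    behind by abandoned pre-reset runs are ignored."""
    return sorted(
        p for p in transcripts_dir.glob("*.jsonl")
        if p.stat().st_mtime >= start_unix
    )
```

Filtering by mtime is only one option. If transcripts record their own start timestamps, comparing those would be more robust against files being touched after the fact.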


The core of the synthesis is the correlate function from journey_synth.iter_correlator. It takes the loaded history, transcripts, wedges, restarts, refinements, and commits, plus an artifacts_reader lambda. It re-walks the transcripts to group events into discrete iterations (iters) [src: scripts/journey_synth/cli.py:L257]. The resulting iters list is the primary data structure for all subsequent rendering and site generation [src: scripts/journey_synth/cli.py:L267].
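A deliberately simplified way to picture the correlation step is interval bucketing: each event is assigned to the iteration whose time window contains it. This is not the real correlate, which takes more inputs and re-walks transcripts. The sketch assumes each history entry carries hypothetical `iter`, `started`, and `ended` Unix timestamps and each event has a `ts` field. `Iter` and `bucket_events_by_iter` are illustrative names.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Iter:
    number: int
    started: float
    ended: float
    events: list[dict] = field(default_factory=list)


def bucket_events_by_iter(history: list[dict], *event_streams: list[dict]) -> list[Iter]:
    """Assign each timestamped event to the iteration window [started, ended)
    that contains it; events outside every window are dropped."""
    iters = sorted(
        (Iter(h["iter"], h["started"], h["ended"]) for h in history),
        key=lambda it: it.started,
    )
    starts = [it.started for it in iters]
    for stream in event_streams:
        for ev in stream:
            # Index of the last iteration that started at or before the event.
            i = bisect.bisect_right(starts, ev["ts"]) - 1
            if i >= 0 and ev["ts"] < iters[i].ended:
                iters[i].events.append(ev)
    return iters
```

Binary search over sorted start times keeps the lookup at O(log n) per event. Wedges, restarts, and refinements can then all be passed in as separate streams.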

Once iterations are correlated, the pipeline renders content with the renderers in journey_synth.renderers 1. The _write helper handles the actual file writing.

  1. Narrative Docs: when the --codex-cleanup flag is enabled, SUMMARY.md and lessons.md are passed through codex_cleanup.clean_markdown, which strips AI-speak phrasing and em-dashes.
  2. Per-Iter Docs: Each iteration is rendered using per_iter_r.render(it) [src: scripts/journey_synth/cli.py:L282].
  3. Aggregates: lessons.md uses lessons_r.render, cost.md uses cost_r.render, and timeline.md uses timeline_r.render.
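The render-and-write flow above can be sketched as a loop over a filename-to-renderer mapping, followed by one file per iteration. This is a hypothetical sketch. `render_docs`, the `iters/iter-NNN.md` layout, and the `number` key are assumptions, and the `cleanup` callable only stands in for codex_cleanup.clean_markdown.

```python
from pathlib import Path
from typing import Callable, Optional

EM_DASH = "\u2014"


def _write(path: Path, text: str) -> Path:
    """Write text as UTF-8, creating parent directories as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def render_docs(
    out_dir: Path,
    iters: list[dict],
    renderers: dict[str, Callable[[list[dict]], str]],
    per_iter: Callable[[dict], str],
    cleanup: Optional[Callable[[str], str]] = None,
) -> list[Path]:
    """Render aggregate docs, then one markdown file per iteration."""
    written = []
    for name, render in renderers.items():
        text = render(iters)
        # Only the narrative docs go through the optional cleanup pass.
        if cleanup is not None and name in ("SUMMARY.md", "lessons.md"):
            text = cleanup(text)
        written.append(_write(out_dir / name, text))
    for it in iters:
        path = out_dir / "iters" / f"iter-{it['number']:03d}.md"
        written.append(_write(path, per_iter(it)))
    return written
```

Keeping renderers as plain callables in a mapping lets new aggregate pages be added without touching the write loop.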

All content is written through write_with_redaction, which applies redact_obj and redact so that no sensitive data leaks. In --public mode, any detected leak makes the write fail and the pipeline exit with an error.
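One way to get "redact, then fail on leaks in public mode" is to redact with generic patterns and then re-scan the output for known-sensitive strings. This is a hypothetical sketch. The regex patterns, the `denylist` parameter, `redact_text`, and `LeakError` are illustrative and do not reproduce the real redact/redact_obj rules.

```python
import re
from pathlib import Path
from typing import Iterable

# Hypothetical redaction patterns; the real rules are not documented here.
_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # API-key-like tokens
    re.compile(r"/home/[^/\s]+"),        # absolute home-directory paths
]


class LeakError(RuntimeError):
    """Raised in public mode when sensitive text survives redaction."""


def redact_text(text: str) -> str:
    """Replace every pattern match with a fixed placeholder."""
    for pattern in _PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def write_with_redaction(path: Path, text: str, public: bool,
                         denylist: Iterable[str] = ()) -> list[str]:
    """Redact, then scan for known-sensitive strings that slipped through.

    In public mode any remaining leak aborts the write; otherwise the
    redacted text is written and the leaks are returned for tracking.
    """
    clean = redact_text(text)
    leaks = [s for s in denylist if s in clean]
    if leaks and public:
        raise LeakError(f"{path}: {len(leaks)} leak(s) survived redaction")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(clean, encoding="utf-8")
    return leaks
```

Raising before the file is opened means a failed public write leaves nothing partial on disk.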

The pipeline then generates the inputs for the Astro static site: site-data.json, the single source of truth for the site’s data, and a Content Collection of per-iteration markdown files. Both are written by _write_site_content.

  1. Site Data: site_data_r.render aggregates metrics from iters, state, wedges, restarts, refinements, bypass, commits, parity, system_state, meta_judge, and codex_by_iter.
  2. Per-Iter Content: For each iteration, site_data_r._iter_row generates a YAML frontmatter block and per_iter_r.render renders the body. The result is written to src/content/iters/ under the JOURNEY_WEB directory.
  3. Leak Scanning: The site data and per-iter content are redacted, and any leaks are tracked.
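A Content Collection entry is a markdown file with a YAML frontmatter block between `---` fences. The sketch below serializes each value with `json.dumps`, because JSON scalars are valid YAML and this avoids hand-written quoting. The row fields and the `iter-NNN.md` naming are hypothetical, since the real _iter_row columns are not documented here.

```python
import json
from pathlib import Path


def frontmatter(row: dict) -> str:
    """Emit a YAML frontmatter block; json.dumps produces valid YAML scalars,
    which sidesteps quoting problems with colons or special characters."""
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in row.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"


def write_iter_entry(web_dir: Path, row: dict, body: str) -> Path:
    """Write one Content Collection entry under src/content/iters/."""
    path = web_dir / "src" / "content" / "iters" / f"iter-{row['iter']:03d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(frontmatter(row) + "\n" + body, encoding="utf-8")
    return path
```

For instance, a title such as `Fix: wedge` is emitted as `"Fix: wedge"`, which a naive `key: value` writer would have turned into broken YAML.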

After writing the content, the pipeline can optionally rebuild the Astro site via _rebuild_site. This runs npm install (if needed) and npm run build in the JOURNEY_WEB directory 2, then npm run scan to perform a final leak scan on the built dist/ directory.
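The rebuild sequence maps naturally onto `subprocess.run` with `check=True`, so any failing npm step raises `CalledProcessError` and stops the pipeline. In this sketch, "if needed" is assumed to mean that `node_modules` is missing. The injectable `run` parameter and the `rebuild_site` name are illustrative.

```python
import subprocess
from pathlib import Path
from typing import Callable


def rebuild_site(web_dir: Path, run: Callable = subprocess.run) -> list[list[str]]:
    """Install dependencies if node_modules is absent, then build and scan.

    `run` is injectable so tests can avoid invoking npm; with the default
    subprocess.run, check=True raises on the first failing step.
    """
    steps = []
    if not (web_dir / "node_modules").is_dir():
        steps.append(["npm", "install"])
    steps.append(["npm", "run", "build"])
    steps.append(["npm", "run", "scan"])  # leak scan over the built dist/
    for cmd in steps:
        run(cmd, cwd=web_dir, check=True)
    return steps
```

Running the scan after the build checks the files that will actually be published, not just the intermediate content.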


Before finalizing, the pipeline runs an accuracy check using journey_synth.accuracy_check. The check loads the written site-data.json and recomputes its headline figures from the raw sources. If any recomputed value differs from the written one, the pipeline fails.
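The comparison at the heart of such a check is a key-by-key diff between the written headlines and freshly recomputed ones. This sketch assumes the headlines sit under a hypothetical `headlines` key in site-data.json. `check_headlines` is an illustrative name, not the accuracy_check API.

```python
import json
from pathlib import Path
from typing import Any


def check_headlines(site_data_path: Path, recomputed: dict[str, Any]) -> list[str]:
    """Compare headline values in the written site-data.json with values
    recomputed from raw sources; return one message per mismatch."""
    written = json.loads(site_data_path.read_text(encoding="utf-8"))["headlines"]
    problems = []
    for key, expected in recomputed.items():
        actual = written.get(key)  # a missing headline shows up as None
        if actual != expected:
            problems.append(f"{key}: site-data.json has {actual!r}, raw sources give {expected!r}")
    return problems
```

A caller would exit non-zero whenever the returned list is non-empty. Floating-point headlines may need a tolerance instead of exact equality.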

The pipeline also runs tools/check_prose_hygiene.py with the --fix flag, which normalizes dashes and emoji in the generated content so that it passes the repo’s prose gate.
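The following is a hypothetical sketch of the kind of normalization a `--fix` pass might perform. The actual rules in tools/check_prose_hygiene.py are not documented here. The sketch assumes em-dashes become a spaced hyphen, en-dashes become plain hyphens, and characters from a few common emoji ranges are removed together with one trailing space. The ranges are illustrative and do not cover every emoji.

```python
import re

# Hypothetical rules; the real prose-gate script may normalize differently.
_EM_DASH = re.compile(" ?\u2014 ?")  # em-dash plus any single surrounding spaces
_EN_DASH = "\u2013"
# A few common emoji/symbol blocks plus the emoji variation selector.
_EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]+ ?")


def fix_prose(text: str) -> str:
    """Replace em/en dashes with ASCII hyphens and strip emoji characters."""
    text = _EM_DASH.sub(" - ", text)
    text = text.replace(_EN_DASH, "-")
    return _EMOJI.sub("", text)
```

The sketch consumes adjacent spaces inside the patterns instead of collapsing whitespace globally. That leaves markdown indentation and trailing-space line breaks intact.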