Data Models & Storage

The core data architecture of meetingscribe relies on Pydantic models to enforce strict schema validation for meeting metadata, transcript events, and room configurations. These models serve as the single source of truth for in-memory state and are serialized into durable JSON artifacts on disk. The system distinguishes between ephemeral meetings, which exist only in scratch space, and durable meetings, which persist full audio and transcript journals.

Core Transcript and Meeting Models

The TranscriptEvent is the atomic unit of the system, keyed by a unique segment_id. Each segment can have multiple revisions as the Automatic Speech Recognition (ASR) engine refines its output; the UI renders by segment_id and displays only the highest revision. The model includes fields for timing (start_ms, end_ms), language detection, and raw text. It also supports optional HTML-annotated text via furigana_html for Japanese text and a list of SpeakerAttribution objects to handle speaker identity and overlap. Translation progress is tracked in a nested TranslationState model, which holds a TranslationStatus (PENDING, IN_PROGRESS, DONE, FAILED, or SKIPPED), the translated text, the target language, and a structured failure reason used by retry logic ¹.
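The "highest revision wins" rule can be illustrated with a small sketch. It uses a simplified dataclass rather than the real TranscriptEvent; the field name revision and the helper latest_revisions are assumptions made for illustration and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for TranscriptEvent (only the fields used here)."""

    segment_id: str
    revision: int  # assumed field name for the revision counter
    start_ms: int
    text: str


def latest_revisions(events: list[Event]) -> list[Event]:
    """Keep the highest revision per segment_id, ordered by start time."""
    latest: dict[str, Event] = {}
    for ev in events:
        current = latest.get(ev.segment_id)
        if current is None or ev.revision > current.revision:
            latest[ev.segment_id] = ev
    return sorted(latest.values(), key=lambda ev: ev.start_ms)


# Events arrive interleaved as the ASR engine refines earlier segments.
events = [
    Event("seg-a", 0, 0, "helo"),
    Event("seg-b", 0, 1500, "how are"),
    Event("seg-a", 1, 0, "hello"),
    Event("seg-b", 1, 1500, "how are you"),
]
visible = latest_revisions(events)
```

Here visible holds one event per segment, the refined text for seg-a followed by the refined text for seg-b, while the superseded revisions remain available in the input list.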

Meeting metadata is encapsulated in the MeetingMeta model, which is persisted as meta.json using atomic write-rename operations. The MeetingState enum tracks the meeting lifecycle: CREATED, RECORDING, FINALIZING, and then either COMPLETE or INTERRUPTED. REPROCESSING is a transient state entered from COMPLETE while a reprocessing task runs and cleared on success; if the server is interrupted mid-task, the system recovers on the next startup and flips the state back to COMPLETE. The metadata also anchors the audio recording with recording_started_epoch_ms, the Unix epoch time in milliseconds of the first audio sample, so adding a transcript's relative start_ms to this anchor yields the absolute UTC time at which a segment was spoken. It includes a language_pair field validated to contain either one language code (monolingual) or two distinct codes (bilingual), and an ephemeral flag that determines whether the meeting’s data is stored in scratch space and wiped on stop ².
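Two of the MeetingMeta rules above lend themselves to a short sketch: the time anchor and the language_pair shape check. The byte-offset formula follows the comment in models.py; the helper names and the 16-bit mono assumption behind the factor of 2 are illustrative. The real validator relies on is_valid_languages, whose behavior is not shown here, so check_language_pair only covers the length and distinctness rules.

```python
from __future__ import annotations

from datetime import datetime, timezone

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM, matching the "* 2" in the source comment


def byte_offset_to_ms(byte_offset: int, sample_rate: int = 16000) -> float:
    """Milliseconds of audio that precede byte_offset in the PCM file."""
    return byte_offset / (sample_rate * BYTES_PER_SAMPLE / 1000)


def segment_utc(recording_started_epoch_ms: int, start_ms: int) -> datetime:
    """Absolute UTC time of a segment whose start_ms is relative to the recording."""
    epoch_ms = recording_started_epoch_ms + start_ms
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def check_language_pair(codes: list[str]) -> list[str]:
    """Hypothetical shape check: one code, or two distinct codes."""
    if len(codes) not in (1, 2):
        raise ValueError("language_pair must contain one or two codes")
    if len(set(codes)) != len(codes):
        raise ValueError("language_pair codes must be distinct")
    return codes


# One second of 16 kHz, 16-bit mono audio occupies 32000 bytes.
one_second_ms = byte_offset_to_ms(32000)
spoken_at = segment_utc(recording_started_epoch_ms=0, start_ms=1500)
```

With a real anchor value in place of 0, segment_utc returns the wall-clock moment at which the segment began.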

Room Layout and Speaker Identity

Room configuration is managed through RoomLayout, TableObject, and SeatPosition models. These models allow for configurable table positions, sizes, and shapes using percentage-based coordinates, as well as seat positions that can be linked to enrolled speakers via enrollment_id. The RoomLayout is persisted as room.json within each meeting directory and supports presets like “boardroom” or “classroom” that can be edited post-setup ³.
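Because coordinates are percentages, a layout is independent of the size at which it is drawn. The sketch below shows one way a renderer might map a seat to pixel coordinates; Seat is a simplified stand-in for SeatPosition, and to_pixels is a hypothetical helper, not a function from the codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Seat:
    """Hypothetical stand-in for SeatPosition; x and y are percentages (0-100)."""

    x: float = 50.0
    y: float = 50.0
    enrollment_id: str | None = None  # link to an enrolled speaker, if any


def to_pixels(seat: Seat, width_px: int, height_px: int) -> tuple[float, float]:
    """Scale percentage coordinates to a canvas of the given pixel size."""
    return (seat.x / 100 * width_px, seat.y / 100 * height_px)


seat = Seat(x=25.0, y=50.0, enrollment_id="enrolled-1")
small = to_pixels(seat, 800, 600)
large = to_pixels(seat, 1600, 1200)
```

The same seat lands at the same relative position on both canvases, which is what makes presets reusable across screen sizes.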

Speaker identity is tracked using the DetectedSpeaker model, which captures per-meeting state for speakers discovered during the session. This model records the speaker’s display name, match confidence against enrolled speakers, segment counts, and the first and last seen timestamps. It is stored in detected_speakers.json per meeting, separate from the enrolled reference speakers.
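As a rough illustration of how these per-meeting counters could be maintained, the sketch below updates a simplified speaker record each time a segment is attributed to it. The update rule (earliest start, latest end) and the helper record_segment are assumptions; the source does not show how the fields are actually updated.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    """Hypothetical stand-in for DetectedSpeaker (subset of its fields)."""

    display_name: str = ""
    segment_count: int = 0
    first_seen_ms: int = 0
    last_seen_ms: int = 0


def record_segment(speaker: Speaker, start_ms: int, end_ms: int) -> None:
    """Fold one attributed segment into the speaker's running statistics."""
    if speaker.segment_count == 0:
        speaker.first_seen_ms = start_ms
    else:
        # Segments may be attributed out of order, so keep the earliest start.
        speaker.first_seen_ms = min(speaker.first_seen_ms, start_ms)
    speaker.last_seen_ms = max(speaker.last_seen_ms, end_ms)
    speaker.segment_count += 1


speaker = Speaker(display_name="Speaker 1")
record_segment(speaker, 4000, 5200)
record_segment(speaker, 1000, 1800)
```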

Durable Storage and Ephemeral Modes

The system employs distinct storage mechanisms based on the ephemeral flag in MeetingMeta. When ephemeral is True, the meeting’s on-disk tree resides in scratch space, and neither the raw audio nor the transcript journal is written to disk. The flag is captured when the meeting starts, so toggling the mode mid-meeting cannot produce a partial save. The process-wide ephemeral registry serves as the live source of truth for path resolution in these cases ².
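The idea of a process-wide registry that is consulted for path resolution can be sketched as follows. Every name here (the root paths, start_meeting, meeting_dir, stop_meeting) is hypothetical, and the sketch omits the actual wipe of scratch data on stop; it only shows the mode being captured once and then driving where a meeting's directory resolves.

```python
from pathlib import Path

DURABLE_ROOT = Path("meetings")  # hypothetical durable location
SCRATCH_ROOT = Path("scratch") / "meetings"  # hypothetical scratch location

# Process-wide registry of meetings that were started in ephemeral mode.
_ephemeral_ids: set[str] = set()


def start_meeting(meeting_id: str, ephemeral: bool) -> None:
    """Capture the mode once at start; later toggles do not affect this meeting."""
    if ephemeral:
        _ephemeral_ids.add(meeting_id)


def meeting_dir(meeting_id: str) -> Path:
    """Resolve a meeting's directory from the registry, not from current settings."""
    root = SCRATCH_ROOT if meeting_id in _ephemeral_ids else DURABLE_ROOT
    return root / meeting_id


def stop_meeting(meeting_id: str) -> None:
    """Forget the meeting; a real implementation would also wipe its scratch tree."""
    _ephemeral_ids.discard(meeting_id)


start_meeting("m1", ephemeral=True)
start_meeting("m2", ephemeral=False)
```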

For durable meetings, artifacts are persisted as JSON files. meta.json stores the meeting metadata, room.json stores the room layout, and detected_speakers.json stores speaker detection results. The transcript journal is stored in journal.ndjson, preserving the full revision history of all TranscriptEvent objects. The MeetingMeta model also includes an is_favorite flag, which is surfaced in the meetings list to help users easily identify meetings marked as useful for demo or reference purposes.
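MeetingMeta is documented as being persisted with an atomic write-rename. A common way to do that in Python is to write to a temporary file in the same directory and then swap it into place with os.replace. The helper below is a minimal sketch of that pattern, not the project's own implementation; write_json_atomic is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # swaps the new file in, replacing any old one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def demo() -> dict:
    """Write meta.json twice in a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "meeting-1" / "meta.json"
        write_json_atomic(target, {"state": "recording", "is_favorite": False})
        write_json_atomic(target, {"state": "complete", "is_favorite": True})
        return json.loads(target.read_text(encoding="utf-8"))
```

If the process dies before os.replace runs, the previous meta.json is left untouched, which is the property that makes state recovery on the next startup possible.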

src/meeting_scribe/models.py L1-120 (showing 40 of 120)

"""Core data models for meeting transcription events.

TranscriptEvent is the fundamental unit - every ASR result, revision,
and translation flows through this model. UI renders by segment_id,
showing only the highest revision.
"""

from __future__ import annotations

import uuid
from enum import StrEnum

from pydantic import BaseModel, Field, field_validator

from meeting_scribe.languages import is_valid_languages


class TranslationStatus(StrEnum):
    """Translation lifecycle states."""

    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
    SKIPPED = "skipped"


class MeetingState(StrEnum):
    """Meeting lifecycle state machine.

    created -> recording -> finalizing -> complete | interrupted
    complete -> reprocessing -> complete   (transient, cleared on success)
    """

    CREATED = "created"
    RECORDING = "recording"
    FINALIZING = "finalizing"
    COMPLETE = "complete"
    INTERRUPTED = "interrupted"
    # Set by reprocess_meeting() at step 0 and cleared at step 7. If the

src/meeting_scribe/models.py L121-240 (showing 40 of 120)

        """
        return self.model_copy(
            update={
                "translation": TranslationState(
                    status=status,
                    text=text,
                    target_language=target_language or "",
                    reason=reason,
                ),
            },
        )


class MeetingMeta(BaseModel):
    """Meeting metadata - persisted as meta.json with atomic write-rename."""

    meeting_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    state: MeetingState = MeetingState.CREATED
    created_at: str = ""  # ISO 8601 - when the meeting record was created
    # Wall-clock anchor for the recording. Set to the unix epoch (ms)
    # of the FIRST audio sample in recording.pcm. Together with
    # audio_sample_rate this makes the PCM file an absolute-time record:
    # wall_clock_ms = recording_started_epoch_ms + byte_offset / (sample_rate * 2 / 1000)
    # The transcript `start_ms` fields are byte-offset-relative to the
    # audio file, so combining them with this anchor yields the real
    # UTC timestamp each segment was spoken.
    recording_started_epoch_ms: int = 0
    organizer_token_hash: str = ""
    invite_code_hash: str = ""
    max_attendees: int = 10
    audio_sample_rate: int = 16000
    # Languages spoken in the meeting. Length 1 = monolingual (no translation
    # work is scheduled, UI collapses the second pane); length 2 = bilingual
    # pair with distinct codes. The field is named ``language_pair`` for
    # historical reasons (it predates monolingual support); kept as-is because
    # it is baked into persisted meta.json, journal.ndjson, and exported
    # artifacts - all of which are internal-only and can still evolve freely.
    #
    # The Pydantic validator below is the authoritative shape check; every
    # code path that constructs a ``MeetingMeta`` (API, reload, fixtures)

src/meeting_scribe/models.py L241-280

    seat_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    x: float = Field(ge=0, le=100, default=50.0)
    y: float = Field(ge=0, le=100, default=50.0)
    enrollment_id: str | None = None
    speaker_name: str = ""


class RoomLayout(BaseModel):
    """Room configuration - tables + seats, all freely positionable.

    Persisted as room.json in each meeting directory.
    During setup, exists as a draft in server memory.
    Presets (boardroom, classroom, etc.) set initial positions but
    everything is editable after.
    """

    preset: str = "rectangle"  # which preset was last applied
    tables: list[TableObject] = Field(default_factory=list)
    seats: list[SeatPosition] = Field(default_factory=list)


# ── Speaker Identity ─────────────────────────────────────────


class DetectedSpeaker(BaseModel):
    """A speaker discovered during a meeting.

    Separate from enrolled reference speakers - per-meeting state only.
    May be matched to an enrolled speaker or remain as "Speaker N".
    Stored in detected_speakers.json per meeting.
    """

    speaker_id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    display_name: str = ""
    matched_enrollment_id: str | None = None
    match_confidence: float = 0.0
    segment_count: int = 0
    first_seen_ms: int = 0
    last_seen_ms: int = 0