
Data Models & Storage

The core data architecture of meetingscribe relies on Pydantic models to enforce strict schema validation for meeting metadata, transcript events, and room configurations. These models serve as the single source of truth for in-memory state and are serialized to durable JSON artifacts on disk. The system distinguishes between ephemeral meetings, which exist only in scratch space, and durable meetings, which persist full audio and the transcript journal.

The TranscriptEvent is the atomic unit of the system and is identified by its segment_id. A segment can go through multiple revisions as the Automatic Speech Recognition (ASR) engine refines its output; the UI displays only the highest revision. The model includes timing fields (start_ms, end_ms), the detected language, and the raw text. It also carries optional HTML-annotated text in furigana_html, which annotates Japanese text with furigana readings, and a list of SpeakerAttribution objects that record speaker identity and overlapping speech. Translation status is tracked by a nested TranslationState model, which covers lifecycle states such as PENDING, IN_PROGRESS, DONE, FAILED, and SKIPPED and records structured failure reasons that drive retry logic 1.
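A minimal sketch of how revisions might collapse to what the UI shows. It uses standard-library dataclasses instead of Pydantic so it runs without dependencies, and it omits SpeakerAttribution. The revision field name, the failure_reason field, the enum string values, and the latest_revisions helper are assumptions, not meetingscribe's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, Optional


class TranslationStatus(str, Enum):
    # Lifecycle states named in the text; the string values are assumed.
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"
    SKIPPED = "skipped"


@dataclass
class TranslationState:
    status: TranslationStatus = TranslationStatus.PENDING
    failure_reason: Optional[str] = None  # hypothetical: structured reason consulted by retry logic


@dataclass
class TranscriptEvent:
    segment_id: str
    revision: int  # hypothetical name for the ASR revision counter
    start_ms: int
    end_ms: int
    language: str
    text: str
    furigana_html: Optional[str] = None
    translation: TranslationState = field(default_factory=TranslationState)


def latest_revisions(events: Iterable[TranscriptEvent]) -> list[TranscriptEvent]:
    """Keep only the highest revision of each segment, ordered by start time."""
    best: dict[str, TranscriptEvent] = {}
    for ev in events:
        current = best.get(ev.segment_id)
        if current is None or ev.revision > current.revision:
            best[ev.segment_id] = ev
    return sorted(best.values(), key=lambda ev: ev.start_ms)


# Usage: two revisions of segment s1 collapse to the newer one.
events = [
    TranscriptEvent("s1", 0, 0, 900, "en", "helo"),
    TranscriptEvent("s1", 1, 0, 950, "en", "hello"),
    TranscriptEvent("s2", 0, 1000, 1800, "en", "world"),
]
print([ev.text for ev in latest_revisions(events)])  # ['hello', 'world']
```

Keying the map by segment_id means a lower revision that arrives late can never overwrite a newer one, regardless of arrival order.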

Meeting metadata is encapsulated in the MeetingMeta model, which is persisted as meta.json using an atomic write-then-rename. This model tracks the meeting lifecycle through the MeetingState enum, with states such as CREATED, RECORDING, FINALIZING, COMPLETE, INTERRUPTED, and REPROCESSING. REPROCESSING exists to survive server interruptions: if the server goes down while a meeting is being reprocessed, the next startup finds the meeting still in REPROCESSING and flips it back to COMPLETE. The metadata also anchors the audio recording with recording_started_epoch_ms, so that relative transcript timestamps can be converted to absolute UTC times. It includes a language_pair field, validated to contain one or two distinct language codes, and an ephemeral flag that determines whether the meeting’s data is stored in scratch space and wiped on stop 2.
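The validation, timestamp, and recovery rules above can be sketched as follows. This is a stdlib dataclass stand-in for the Pydantic model; meeting_id, the enum string values, the field defaults, and the recover_on_startup helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class MeetingState(str, Enum):
    # State names from the text; string values are assumed.
    CREATED = "created"
    RECORDING = "recording"
    FINALIZING = "finalizing"
    COMPLETE = "complete"
    INTERRUPTED = "interrupted"
    REPROCESSING = "reprocessing"


@dataclass
class MeetingMeta:
    meeting_id: str  # hypothetical identifier field
    state: MeetingState
    language_pair: list[str]
    recording_started_epoch_ms: Optional[int] = None
    ephemeral: bool = False  # default is an assumption
    is_favorite: bool = False  # default is an assumption

    def __post_init__(self) -> None:
        # Mirrors the stated rule: one or two distinct language codes.
        if not 1 <= len(self.language_pair) <= 2:
            raise ValueError("language_pair must contain one or two codes")
        if len(set(self.language_pair)) != len(self.language_pair):
            raise ValueError("language_pair codes must be distinct")

    def absolute_utc(self, offset_ms: int) -> datetime:
        """Convert a recording-relative offset (e.g. start_ms) to an absolute UTC datetime."""
        if self.recording_started_epoch_ms is None:
            raise ValueError("recording has not started")
        # Integer millisecond arithmetic avoids float rounding of the epoch value.
        return _EPOCH + timedelta(milliseconds=self.recording_started_epoch_ms + offset_ms)


def recover_on_startup(meta: MeetingMeta) -> MeetingMeta:
    """Flip a meeting left in REPROCESSING by a server interruption back to COMPLETE."""
    if meta.state is MeetingState.REPROCESSING:
        meta.state = MeetingState.COMPLETE
    return meta
```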


Room configuration is managed through the RoomLayout, TableObject, and SeatPosition models. Tables have configurable positions, sizes, and shapes expressed in percentage-based coordinates, and seat positions can be linked to enrolled speakers via enrollment_id. The RoomLayout is persisted as room.json in each meeting directory and supports presets such as “boardroom” or “classroom” that can still be edited after setup 3.
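A sketch of the room models, assuming percentages are expressed on a 0 to 100 scale and using hypothetical field names (x_pct, y_pct, width_pct, height_pct, shape). The to_json and from_json pair illustrates what a room.json round trip could look like; it is not the project's serializer.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


def _check_pct(name: str, value: float) -> None:
    # Assumed convention: coordinates are percentages of the room canvas (0-100).
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"{name}={value} is outside 0-100%")


@dataclass
class TableObject:
    x_pct: float
    y_pct: float
    width_pct: float
    height_pct: float
    shape: str = "rect"  # hypothetical shape vocabulary

    def __post_init__(self) -> None:
        for name in ("x_pct", "y_pct", "width_pct", "height_pct"):
            _check_pct(name, getattr(self, name))


@dataclass
class SeatPosition:
    x_pct: float
    y_pct: float
    enrollment_id: Optional[str] = None  # links the seat to an enrolled speaker, if any

    def __post_init__(self) -> None:
        _check_pct("x_pct", self.x_pct)
        _check_pct("y_pct", self.y_pct)


@dataclass
class RoomLayout:
    preset: str  # e.g. "boardroom" or "classroom"
    tables: list[TableObject] = field(default_factory=list)
    seats: list[SeatPosition] = field(default_factory=list)

    def to_json(self) -> str:
        # Nested dataclasses become plain dicts via asdict().
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "RoomLayout":
        data = json.loads(raw)
        return cls(
            preset=data["preset"],
            tables=[TableObject(**t) for t in data["tables"]],
            seats=[SeatPosition(**s) for s in data["seats"]],
        )
```

Percentage coordinates keep a layout independent of the pixel size of whatever view renders it.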

Speaker identity is tracked using the DetectedSpeaker model, which captures per-meeting state for speakers discovered during the session. This model records the speaker’s display name, match confidence against enrolled speakers, segment count, and first- and last-seen timestamps. It is stored per meeting in detected_speakers.json, separately from the enrolled reference speakers.
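A minimal sketch of how the per-meeting counters could be updated as attributed segments arrive. The field names, the record_segment method, and the dump_detected helper are hypothetical, and timestamps are assumed to be meeting-relative milliseconds like start_ms and end_ms.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DetectedSpeaker:
    # Hypothetical field names for the per-meeting state described in the text.
    speaker_key: str
    display_name: str
    match_confidence: Optional[float] = None  # similarity to best enrolled speaker; None if unmatched
    segment_count: int = 0
    first_seen_ms: Optional[int] = None
    last_seen_ms: Optional[int] = None

    def record_segment(self, start_ms: int, end_ms: int) -> None:
        """Fold one attributed segment into the running per-meeting stats."""
        self.segment_count += 1
        # Segments may arrive out of order, so compare instead of overwriting.
        if self.first_seen_ms is None or start_ms < self.first_seen_ms:
            self.first_seen_ms = start_ms
        if self.last_seen_ms is None or end_ms > self.last_seen_ms:
            self.last_seen_ms = end_ms


def dump_detected(speakers: list[DetectedSpeaker]) -> str:
    """Serialize the per-meeting speaker list, e.g. for detected_speakers.json."""
    return json.dumps([asdict(s) for s in speakers], indent=2)
```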

The system uses different storage mechanisms depending on the ephemeral flag in MeetingMeta. When ephemeral is True, the meeting’s on-disk tree resides in scratch space, and neither the raw audio nor the transcript journal is ever written to disk. The flag is captured when the meeting starts, so toggling the mode mid-meeting cannot leave a partially saved meeting. For these meetings, the process-wide ephemeral registry is the live source of truth for path resolution 2.
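The mode capture and path resolution described above might look like this. EphemeralRegistry, start_meeting, meeting_dir, should_persist_audio_and_journal, stop_meeting, and the directory layout are all hypothetical names; the sketch only illustrates the rule that the registry, not the current global setting, decides where a meeting's tree lives.

```python
import shutil
import threading
from pathlib import Path


class EphemeralRegistry:
    """Hypothetical process-wide set of meetings that started in ephemeral mode."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids: set[str] = set()

    def register(self, meeting_id: str) -> None:
        with self._lock:
            self._ids.add(meeting_id)

    def discard(self, meeting_id: str) -> None:
        with self._lock:
            self._ids.discard(meeting_id)

    def __contains__(self, meeting_id: str) -> bool:
        with self._lock:
            return meeting_id in self._ids


REGISTRY = EphemeralRegistry()


def start_meeting(meeting_id: str, ephemeral_setting: bool) -> bool:
    # Capture the mode once; later toggles of the global setting do not affect this meeting.
    if ephemeral_setting:
        REGISTRY.register(meeting_id)
    return ephemeral_setting


def meeting_dir(meeting_id: str, durable_root: Path, scratch_root: Path) -> Path:
    # Resolve through the registry rather than re-reading the current setting.
    root = scratch_root if meeting_id in REGISTRY else durable_root
    return root / meeting_id


def should_persist_audio_and_journal(meeting_id: str) -> bool:
    return meeting_id not in REGISTRY


def stop_meeting(meeting_id: str, scratch_root: Path) -> None:
    # Ephemeral meetings are wiped on stop; durable ones are left untouched.
    if meeting_id in REGISTRY:
        shutil.rmtree(scratch_root / meeting_id, ignore_errors=True)
        REGISTRY.discard(meeting_id)
```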

For durable meetings, artifacts are persisted as JSON files: meta.json stores the meeting metadata, room.json the room layout, and detected_speakers.json the speaker detection results. The transcript journal is stored in journal.ndjson (newline-delimited JSON) and preserves the full revision history of every TranscriptEvent. MeetingMeta also includes an is_favorite flag, surfaced in the meetings list so users can quickly find meetings they have marked as useful for demos or reference.
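A sketch of the two write patterns implied here, using only the standard library: an atomic write-then-rename for JSON documents such as meta.json, and append-only NDJSON for the journal. The text only states that meta.json uses write-rename; applying the same helper to room.json or detected_speakers.json is an assumption, as are all function names.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Iterator


def atomic_write_json(path: Path, payload: Any) -> None:
    """Write JSON to a temp file in the same directory, fsync it, then rename over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # Same directory => same filesystem, so the rename is atomic on POSIX.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def append_journal(path: Path, event: dict[str, Any]) -> None:
    """Append one transcript event (any revision) as a single NDJSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")


def read_journal(path: Path) -> Iterator[dict[str, Any]]:
    """Replay every journaled revision in write order, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Readers of meta.json therefore see either the old or the new document, never a half-written one, while the journal only ever grows, which is what lets it keep every revision.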