meetingscribe

meetingscribe is a real-time multilingual meeting transcription system designed for the Dell Pro Max with GB10. It orchestrates local hardware, external AI backends, and real-time WebSocket streams to transcribe speech, translate it, and synthesize interpretation audio between any pair of 20 supported languages. The system identifies speakers via diarization, streams interpretation audio to guests over a local WiFi hotspot, and records the full meeting with time-aligned audio and bilingual transcript views.
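The transcribe, translate, synthesize flow described above can be pictured as a three-stage chain applied to each utterance. The following is only a minimal sketch under stated assumptions: the `Segment` type, the function names, and the stub return values are hypothetical and are not taken from the meetingscribe codebase. A real implementation would call the ASR, translation, and TTS backends over the network rather than build placeholder strings.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Segment:
    """One utterance moving through the pipeline (hypothetical type)."""
    speaker: str          # label assigned by diarization
    source_lang: str      # language spoken
    target_lang: str      # language the listener wants
    text: str = ""        # filled in by the ASR stage
    translation: str = "" # filled in by the translation stage
    audio: bytes = b""    # filled in by the TTS stage


async def transcribe(pcm: bytes, seg: Segment) -> Segment:
    # Stub: stands in for a request to the ASR backend.
    seg.text = f"<{len(pcm)} bytes of {seg.source_lang} speech>"
    return seg


async def translate(seg: Segment) -> Segment:
    # Stub: stands in for a request to the translation backend.
    seg.translation = f"[{seg.source_lang}->{seg.target_lang}] {seg.text}"
    return seg


async def synthesize(seg: Segment) -> Segment:
    # Stub: stands in for a request to the TTS backend.
    seg.audio = seg.translation.encode("utf-8")
    return seg


async def process_utterance(pcm: bytes, seg: Segment) -> Segment:
    """Run the three stages in order for a single utterance."""
    seg = await transcribe(pcm, seg)
    seg = await translate(seg)
    return await synthesize(seg)


if __name__ == "__main__":
    done = asyncio.run(
        process_utterance(b"\x00" * 320, Segment("speaker_1", "ja", "en"))
    )
    print(done.speaker, done.translation)
```

Keeping each stage as its own coroutine means an orchestrator could, in principle, overlap stages across utterances or skip synthesis for listeners who only want text. Whether meetingscribe does either is not stated here.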

| Subsystem | Description |
| --- | --- |
| FastAPI Server | Central orchestrator managing the real-time transcription, translation, and TTS pipelines. |
| ASR Backend | Qwen3-ASR-1.7B model running on vLLM for real-time multilingual transcription. |
| Translation Backend | Qwen3.6-35B-A3B-FP8 model on vLLM for multilingual translation and name extraction. |
| TTS Backend | Qwen3-TTS-12Hz-0.6B-Base model using faster-qwen3-tts for synthesized interpretation audio. |
| Diarization Backend | pyannote.audio 4.0.4 for speaker identification with optional voice enrollment. |
| PipeWire Audio | Server-side mic capture and local playback routing for Poly room devices and headsets. |
| WiFi Hotspot | Local WiFi AP with captive portal for guest device access and interpretation audio streaming. |
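For readers who prefer to see the AI backends as data, the model-serving rows of the subsystem list can be written as a small registry. This is an illustrative sketch, not the project's actual configuration format: the `Backend` class, the `BACKENDS` list, and the `served_by` helper are hypothetical names, and only the model and runtime strings come from the descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Backend:
    """One AI backend from the subsystem list (hypothetical structure)."""
    role: str               # what the orchestrator uses it for
    model: str              # model or library named in the description
    runtime: Optional[str]  # serving runtime, if one is named


BACKENDS: List[Backend] = [
    Backend("asr", "Qwen3-ASR-1.7B", "vLLM"),
    Backend("translation", "Qwen3.6-35B-A3B-FP8", "vLLM"),
    Backend("tts", "Qwen3-TTS-12Hz-0.6B-Base", "faster-qwen3-tts"),
    # pyannote.audio is a library version, not a served model;
    # no separate runtime is named for it.
    Backend("diarization", "pyannote.audio 4.0.4", None),
]


def served_by(runtime: str) -> List[Backend]:
    """Return the backends that share a given serving runtime."""
    return [b for b in BACKENDS if b.runtime == runtime]


if __name__ == "__main__":
    for backend in served_by("vLLM"):
        print(f"{backend.role}: {backend.model}")
```

Grouping by runtime makes one point from the list explicit: the ASR and translation models are both served through vLLM, while TTS uses faster-qwen3-tts.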