Powered by faster-whisper and NVIDIA NeMo Sortformer (runs locally on this Space).
Leave blank for auto-detect.
Uses the NVIDIA Sortformer model (up to 4 speakers; downloads ~700 MB on first use).
Set to 0 for automatic detection, or specify 2-4 to consolidate the output to that many speakers.
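The speaker-count rule above (0 for automatic detection, otherwise 2-4) could be checked with a small helper along these lines. This is only a sketch: the function name `normalize_speaker_count` is hypothetical, and rejecting a value of 1 is an assumption, since the text only mentions 0 and 2-4.

```python
from typing import Optional

MAX_SPEAKERS = 4  # stated limit of the Sortformer model


def normalize_speaker_count(value: int) -> Optional[int]:
    """Map the UI value to a speaker count, or None for auto-detect.

    Hypothetical helper, not taken from the app's code.
    """
    if value == 0:
        # 0 means "let the diarizer decide"
        return None
    if 2 <= value <= MAX_SPEAKERS:
        return value
    # Assumption: anything else (including 1) is treated as invalid input
    raise ValueError(f"speaker count must be 0 or 2-{MAX_SPEAKERS}, got {value}")
```

Returning `None` for auto-detect keeps the "no constraint" case distinct from any real speaker count.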
large-v3
nvidia/diar_streaming_sortformer_4spk-v2
WHISPER_MODEL_SIZE
medium
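One plausible way the `WHISPER_MODEL_SIZE` setting and the model names above fit together is an environment-variable lookup with a fallback. The sketch below assumes that arrangement; the helper name `resolve_model_size` is hypothetical, and which of `large-v3` or `medium` acts as the fallback is not stated in the source, so the caller passes it explicitly.

```python
import os
from typing import Mapping


def resolve_model_size(environ: Mapping[str, str], fallback: str) -> str:
    """Return the Whisper model size from WHISPER_MODEL_SIZE, else the fallback.

    Hypothetical helper; the real app may resolve this differently.
    """
    value = environ.get("WHISPER_MODEL_SIZE", "").strip()
    # Treat an unset or blank variable as "not configured"
    return value or fallback


if __name__ == "__main__":
    # Assumption: "medium" is used here as the fallback purely for illustration
    print(resolve_model_size(os.environ, fallback="medium"))
```

Passing the environment as a mapping keeps the lookup easy to exercise without touching the real process environment.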