Voice-to-Voice Streaming

Whisper real-time transcription → VAD segmentation → sentence chunking → Kokoro TTS, with interruption handling.
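The sentence-chunking and interruption steps of this pipeline can be illustrated with a minimal Python sketch. It is not the demo's actual code. The names `SentenceChunker` and `SpeechQueue` are hypothetical, the boundary rule (split after `.`, `!`, or `?` followed by whitespace) is an assumption, and the Whisper, VAD, and Kokoro stages are left out. The sketch also assumes that `flush()` would be called when the VAD reports a pause, and `interrupt()` when the user starts speaking during playback.

```python
import re
from collections import deque
from typing import Deque, List, Optional

# Assumed boundary rule: '.', '!' or '?' followed by whitespace ends a sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceChunker:
    """Buffers streamed transcript text and releases complete sentences."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, text: str) -> List[str]:
        """Append new transcript text; return any sentences now complete."""
        self._buffer += text
        parts = _SENTENCE_END.split(self._buffer)
        # The last part may be an unfinished sentence, so it stays buffered.
        self._buffer = parts.pop()
        return [p.strip() for p in parts if p.strip()]

    def flush(self) -> List[str]:
        """Release whatever is buffered, e.g. when a pause is detected."""
        rest, self._buffer = self._buffer.strip(), ""
        return [rest] if rest else []


class SpeechQueue:
    """Sentences waiting for TTS, plus a record of what was already spoken."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()
        self.spoken: List[str] = []

    def enqueue(self, sentences: List[str]) -> None:
        self._pending.extend(sentences)

    def next_sentence(self) -> Optional[str]:
        """Hand the next sentence to the TTS stage, or None if the queue is empty."""
        if not self._pending:
            return None
        sentence = self._pending.popleft()
        self.spoken.append(sentence)
        return sentence

    def remaining(self) -> List[str]:
        return list(self._pending)

    def interrupt(self) -> List[str]:
        """Drop everything not yet spoken; return the dropped sentences."""
        dropped = list(self._pending)
        self._pending.clear()
        return dropped


chunker = SentenceChunker()
queue = SpeechQueue()
queue.enqueue(chunker.feed("Hello there. How are"))  # only the first sentence is complete
queue.enqueue(chunker.feed(" you today? I am"))      # completes the second sentence
queue.enqueue(chunker.flush())                       # pause: release the trailing fragment
first = queue.next_sentence()                        # sentence handed to TTS
dropped = queue.interrupt()                          # user speaks again: clear the rest
```

In this sketch, `spoken` and `remaining()` play roughly the role of the "Spoken (approx)" and "Remaining" panels, and an interruption simply empties the queue rather than trying to resume mid-sentence.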

Whisper

boot
Live transcript
Speak…

VAD

0.60
Idle

TTS

1.30x

Player

800ms

Spoken (approx)

Nothing spoken yet.

Remaining

Queue is empty.

Queue

Segments will appear here as you speak and pause.