Voice-to-Voice Streaming

Whisper real-time transcription → VAD segmentation → sentence chunking → Kokoro TTS, with interruption handling.
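
The sentence-chunking stage can be sketched as a splitter that emits complete sentences as soon as they end and holds back the unfinished tail until more text arrives. This is a minimal sketch under assumptions: a sentence is taken to end at `.`, `!` or `?` followed by whitespace, and the function name `chunk_sentences` is hypothetical. The demo's actual chunking rules are not documented on this page.

```python
import re

# Hypothetical rule: a sentence ends at one or more of . ! ? followed by
# whitespace, so decimals like "3.14" are not split.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s)")


def chunk_sentences(buffer: str) -> tuple[list[str], str]:
    """Split buffered text into complete sentences plus an unfinished tail.

    Complete sentences can go to TTS immediately; the tail is kept until
    more transcript text arrives or the segment is finalised.
    """
    sentences = []
    start = 0
    for match in _SENTENCE_END.finditer(buffer):
        sentence = buffer[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    return sentences, buffer[start:].lstrip()
```

For example, `chunk_sentences("Hello there. How are you? I am")` yields `["Hello there.", "How are you?"]` with `"I am"` held back. When a segment is finalised, the remaining tail would be flushed to TTS as-is.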

Whisper

Live transcript

VAD

Threshold 0.60
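
VAD segmentation with a threshold can be sketched as follows, assuming the VAD model outputs one speech probability in [0, 1] per audio frame. A segment opens on the first frame at or above the threshold (0.60 here) and closes once a run of frames falls below it. The function `segment_speech` and the `min_silence_frames` parameter are hypothetical; the frame length depends on the VAD model in use.

```python
def segment_speech(probs, threshold=0.60, min_silence_frames=10):
    """Group per-frame speech probabilities into (start, end) frame ranges.

    A segment opens on the first frame with prob >= threshold and closes
    once `min_silence_frames` consecutive frames fall below it. `end` is
    exclusive and points just past the last speech frame.
    """
    segments = []
    start = None        # first frame of the open segment, or None when idle
    last_speech = None  # most recent frame at or above the threshold
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech >= min_silence_frames:
            # Enough trailing silence: the speaker paused, close the segment.
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:  # speech still open when the stream ends
        segments.append((start, last_speech + 1))
    return segments
```

The silence run is what turns a pause into a segment boundary; short dips below the threshold inside a word or sentence do not split the segment.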

TTS

Voice

Speed 1.30x

Player

Autoplay

Crossover 800 ms
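
The page does not define "Crossover". If it denotes an overlap between consecutive TTS clips during autoplay, a linear crossfade of that length could look like the sketch below. Both this interpretation and the 24 kHz mono float sample format are assumptions, and `crossfade` is a hypothetical helper.

```python
def crossfade(a, b, sample_rate=24_000, crossover_ms=800):
    """Join two mono clips, overlapping the end of `a` with the start of `b`.

    Over the overlap, `a` fades out linearly while `b` fades in, so the
    result is len(a) + len(b) - overlap samples long.
    """
    # Overlap length in samples, capped so it never exceeds either clip.
    overlap = min(len(a), len(b), sample_rate * crossover_ms // 1000)
    if overlap == 0:
        return list(a) + list(b)
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # fade-in weight for b, strictly in (0, 1)
        mixed.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return list(a[:len(a) - overlap]) + mixed + list(b[overlap:])
```

At 24 kHz, 800 ms corresponds to 19,200 overlapping samples; shorter clips are overlapped only as far as their length allows.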

Spoken (approx)

Remaining

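Because TTS audio carries no word timings here, the "Spoken (approx)" and "Remaining" split can be estimated from playback progress. The sketch below assumes speech advances at a constant rate through a clip, so the fraction of audio played maps to the same fraction of characters, with the cut moved back to a word boundary. The function `split_spoken` is hypothetical.

```python
def split_spoken(text: str, elapsed_s: float, duration_s: float) -> tuple[str, str]:
    """Estimate which part of `text` has been spoken so far.

    Assumes a constant speaking rate, so the fraction of audio played maps
    to the same fraction of characters; the cut is moved back to the last
    word boundary.
    """
    if duration_s <= 0:
        return "", text
    fraction = min(max(elapsed_s / duration_s, 0.0), 1.0)
    cut = int(len(text) * fraction)
    if cut < len(text):
        # Don't split mid-word: back up to the last space at or before the cut.
        space = text.rfind(" ", 0, cut + 1)
        cut = space if space != -1 else 0
    return text[:cut].rstrip(), text[cut:].lstrip()
```

Using the clip's actual duration means the speed setting is already accounted for: a clip rendered at a higher speed is shorter, so the estimate advances faster.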

Queue

Segments will appear here as you speak and pause.
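
The queue and its interruption handling can be sketched as a deque of segments waiting for TTS playback. When the VAD detects new user speech during playback, the current segment is stopped and the pending ones are dropped. The page does not say whether interrupted segments are discarded or kept, so discarding everything is an assumption; the class `SpeechQueue` and its methods are hypothetical.

```python
from collections import deque


class SpeechQueue:
    """Segments waiting for TTS playback, with barge-in (interruption) support."""

    def __init__(self):
        self.pending = deque()  # segments not yet started
        self.current = None     # segment being played, if any

    def enqueue(self, segment: str) -> None:
        self.pending.append(segment)

    def start_next(self):
        """Begin the next pending segment (autoplay); None if the queue is empty."""
        self.current = self.pending.popleft() if self.pending else None
        return self.current

    def finish_current(self) -> None:
        self.current = None

    def interrupt(self) -> list:
        """Call when the VAD detects user speech during playback.

        Stops the current segment and drops everything queued, returning
        what was discarded so the caller can stop audio or log it.
        """
        dropped = ([self.current] if self.current is not None else []) + list(self.pending)
        self.current = None
        self.pending.clear()
        return dropped
```

For example, after enqueuing `"a"`, `"b"` and `"c"` and starting playback of `"a"`, calling `interrupt()` returns `["a", "b", "c"]` and leaves the queue empty, ready for the user's new segment.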