Select or upload a dialogue audio file to convert
Drag & drop your file here
or click to browse
MP3, MP4, M4A, WAV, WEBM · max 24 MB
Select the AI service to convert your audio
Whisper STT + GPT-4o TTS
Direct S2S or per-speaker TTS
Your key is used only for this session and is never stored.
Faster. Converts audio directly. One voice for all speakers.
Transcribes first. Assign a different voice to each speaker.
Find Voice IDs in your ElevenLabs dashboard.
Transcription typically takes 15–30 seconds…
Speakers were detected in your audio. Assign a voice to each one.
Transcript loaded in memory. Assign a voice to each detected speaker below.
The output will be in English, but spoken with the accent of a native speaker of the selected language.
Synthesis can take up to a minute for longer audio…
Your voice-converted audio is ready
voice_output.mp3