Clone voices, generate multilingual speech, and dictate into any app with an open-source, local-first AI voice studio that runs on your machine.
Voicebox is an open-source, local-first AI voice studio created by Jamie Pine. It is a free alternative to cloud-based services such as ElevenLabs and handles the entire voice input/output loop on the user's machine. Voice cloning, speech generation, and system-wide dictation all run locally, so users keep their voice data and models private.
Voicebox is a self-hosted desktop application for macOS, Windows, and Linux that runs AI models locally on the user's GPU through MLX, CUDA, or ROCm. Users import an audio file to clone a voice, then type text to generate speech. The output is an audio file that can be edited with effects or arranged in a timeline. All processing, models, and data remain on the user's machine. The software is free and open-source.
Voicebox is aimed at developers and content creators who need a private, highly customizable toolkit for voice I/O, for example to build voice-enabled AI agents, produce podcasts, or create game dialogue.
Performance depends on your local hardware, particularly the GPU. macOS and Windows have installers, but Linux users must build the application from source.