The gist

Voice-Pro is an open-source web application for speech processing, developed by the ABUS team. It integrates multiple AI models for multilingual content creation, combining video downloading, voice separation, speech recognition, translation, and text-to-speech in a single self-hosted tool. It serves as a free alternative to commercial dubbing and transcription services.

What it does

Transcribe audio and video using multiple Whisper-based models.
Translate speech or subtitle files into more than 100 languages.
Generate multilingual text-to-speech and perform zero-shot voice cloning.
Download videos from YouTube and extract the audio for processing.
Separate vocals from background music using Demucs.
Create and display word-level subtitles integrated with video playback.
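
To make the word-level subtitle feature concrete, here is a minimal sketch of how per-word timestamps can be turned into SRT cues, one cue per word. It is an illustration only and is not taken from Voice-Pro's code: the `Word` tuple layout and the function names `srt_timestamp` and `words_to_srt` are hypothetical, and the input is assumed to be word timings of the kind a speech recognizer can produce.

```python
from typing import List, Tuple

# Hypothetical input shape: (text, start_seconds, end_seconds) for each word.
Word = Tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word]) -> str:
    """Build SRT text with one numbered cue per word."""
    cues = []
    for index, (text, start, end) in enumerate(words, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(words_to_srt([("Hello", 0.0, 0.4), ("world", 0.4, 0.9)]))
```

A player that highlights words during playback needs cues this fine-grained; sentence-level subtitles would instead group several words into one cue.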

How it works

Voice-Pro is a self-hosted application with a Gradio-based web interface. Users clone the GitHub repository and run the installation scripts to set it up locally. It accepts local media files or YouTube URLs and outputs audio files, translated subtitle files, or dubbed videos. The software is free and open-source, and it works best on a Windows machine with an NVIDIA GPU.
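
Since the tool accepts either a local file or a YouTube URL, the first step in any such pipeline is deciding which kind of input it was given. The sketch below shows one way that check might look using only the Python standard library. It is an assumption for illustration, not Voice-Pro's actual logic: the function name `classify_source` and the host list are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical list of hostnames treated as YouTube for this sketch.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> str:
    """Return "youtube" for an http(s) YouTube URL, otherwise "local"."""
    parsed = urlparse(source)
    is_web_url = parsed.scheme in ("http", "https")
    if is_web_url and parsed.hostname in YOUTUBE_HOSTS:
        return "youtube"
    # Everything else (relative paths, absolute paths, other URLs)
    # is treated as a local file path in this simplified sketch.
    return "local"


if __name__ == "__main__":
    for example in ("https://youtu.be/abc123", "recordings/interview.mp4"):
        print(example, "->", classify_source(example))
```

A real application would also need to validate that the local path exists and handle other URL types; this sketch deliberately leaves those cases out.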

Best for

Voice-Pro is best for content creators, podcasters, and researchers who need a comprehensive, free tool for transcribing, translating, and dubbing audio and video content on their own machine, without cloud service costs.

Watch out for

The project is not currently being updated or developed because the team is focused on another project. Although the tool is listed as supporting Mac and Linux, operation on those systems has not been officially verified.