---
sidebar_position: 2
---
:::warning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
:::
# Integrating openedai-speech into Open WebUI using Docker

## What is openedai-speech?

:::info
openedai-speech is an OpenAI audio/speech API compatible text-to-speech server.

It serves the /v1/audio/speech endpoint and provides a free, private text-to-speech experience with custom voice cloning capabilities. This service is in no way affiliated with OpenAI and does not require an OpenAI API key.
:::
## Create a folder for the openedai-speech service

Create a new folder, for example, openedai-speech-service, to store the docker-compose.yml and speech.env files.
## Clone the openedai-speech repository from GitHub

git clone https://github.com/matatonic/openedai-speech.git
This will download the openedai-speech repository to your local machine, which includes the Docker Compose files (docker-compose.yml, docker-compose.min.yml, and docker-compose.rocm.yml) and other necessary files.
## Rename the sample.env file to speech.env (customize if needed)

In the openedai-speech repository folder, create a new file named speech.env with the following contents:
TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
#EXTRA_ARGS=--log-level DEBUG --unload-timer 300
#USE_ROCM=1
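As a minimal sketch of this step from the command line, the snippet below reuses sample.env when it is present in the current folder and otherwise writes the contents listed above. `ENV_FILE` is a hypothetical variable introduced for illustration; it is not part of openedai-speech.

```shell
#!/usr/bin/env bash
# Sketch only: create speech.env without overwriting an existing one.
# ENV_FILE is a hypothetical name used for illustration.
ENV_FILE="${ENV_FILE:-speech.env}"

if [ -f "$ENV_FILE" ]; then
  echo "$ENV_FILE already exists, leaving it untouched"
elif [ -f sample.env ]; then
  # Reuse the sample.env referred to in this step when it is available.
  cp sample.env "$ENV_FILE"
else
  # Fall back to the contents listed in this tutorial.
  cat > "$ENV_FILE" <<'EOF'
TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
#PRELOAD_MODEL=parler-tts/parler_tts_mini_v0.1
#EXTRA_ARGS=--log-level DEBUG --unload-timer 300
#USE_ROCM=1
EOF
fi

# Print the settings that are currently active (not commented out).
grep -v '^#' "$ENV_FILE"
```

Run it from the openedai-speech repository folder. To enable an optional setting, remove the leading `#` from the corresponding line in the file.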
You can use any of the following Docker Compose files:
- docker-compose.yml: uses the ghcr.io/matatonic/openedai-speech image and builds from Dockerfile.
- docker-compose.min.yml: uses the ghcr.io/matatonic/openedai-speech-min image and builds from Dockerfile.min. This image is a minimal version that only includes Piper support and does not require a GPU.
- docker-compose.rocm.yml: uses the ghcr.io/matatonic/openedai-speech-rocm image and builds from Dockerfile with ROCm support.

Before running the Docker Compose file, you need to build the Docker image:
Nvidia GPU (CUDA support):
docker build -t ghcr.io/matatonic/openedai-speech .
AMD GPU (ROCm support):
docker build -f Dockerfile --build-arg USE_ROCM=1 -t ghcr.io/matatonic/openedai-speech-rocm .
CPU only, No GPU (Piper only):
docker build -f Dockerfile.min -t ghcr.io/matatonic/openedai-speech-min .
## Run the correct docker compose up -d command

Nvidia GPU (CUDA support): Run the following command to start the openedai-speech service in detached mode:
docker compose up -d
AMD GPU (ROCm support): Run the following command to start the openedai-speech service in detached mode:
docker compose -f docker-compose.rocm.yml up -d
ARM64 (Apple M-series, Raspberry Pi): XTTS only has CPU support here and will be very slow. You can use the Nvidia image for XTTS with CPU (slow), or use the Piper only image (recommended):
docker compose -f docker-compose.min.yml up -d
CPU only, No GPU (Piper only): For a minimal docker image with only Piper support (< 1GB vs. 8GB):
docker compose -f docker-compose.min.yml up -d
This will start the openedai-speech service in detached mode.
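Once the container is up, a quick request against the /v1/audio/speech endpoint can confirm that the service responds. The sketch below rests on three assumptions: the service is published on localhost:8000 (the port used in the docker run examples in this tutorial), the request body follows the OpenAI speech API shape (`model`, `input`, `voice`), and the default response format is MP3. The function and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the service is reachable from the host on port 8000.
TTS_BASE_URL="${TTS_BASE_URL:-http://localhost:8000/v1}"

# Build the JSON request body (hypothetical helper).
# No JSON escaping is done, so keep the text free of quotes and backslashes.
tts_request_body() {
  local model="$1" voice="$2" text="$3"
  printf '{"model":"%s","input":"%s","voice":"%s"}' "$model" "$text" "$voice"
}

# Send one request and save the returned audio to a file (hypothetical helper).
tts_smoke_test() {
  local out="${1:-speech-test.mp3}"
  curl --fail --silent --show-error \
    -H 'Content-Type: application/json' \
    -d "$(tts_request_body tts-1 alloy 'Hello from openedai-speech')" \
    -o "$out" \
    "$TTS_BASE_URL/audio/speech" \
  && echo "Saved audio to $out"
}

# Usage, after the container has started:
#   tts_smoke_test hello.mp3
```

If the request fails, `docker compose logs` (or `docker logs openedai-speech` for the docker run variant) is a reasonable next place to look. The first request may take longer while voice models are downloaded.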
You can also use the following Docker run commands to start the openedai-speech service in detached mode:
Nvidia GPU (CUDA): Run the following commands to build and start the openedai-speech service:
docker build -t ghcr.io/matatonic/openedai-speech .
docker run -d --gpus=all -p 8000:8000 -v voices:/app/voices -v config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech
ROCm (AMD GPU): Run the following commands to build and start the openedai-speech service:
To enable ROCm support, uncomment the #USE_ROCM=1 line in the speech.env file.
docker build -f Dockerfile --build-arg USE_ROCM=1 -t ghcr.io/matatonic/openedai-speech-rocm .
docker run -d --privileged --init --name openedai-speech -p 8000:8000 -v voices:/app/voices -v config:/app/config ghcr.io/matatonic/openedai-speech-rocm
CPU only, No GPU (Piper only): Run the following commands to build and start the openedai-speech service:
docker build -f Dockerfile.min -t ghcr.io/matatonic/openedai-speech-min .
docker run -d -p 8000:8000 -v voices:/app/voices -v config:/app/config --name openedai-speech ghcr.io/matatonic/openedai-speech-min
## Configure Open WebUI to use openedai-speech for TTS

Open the Open WebUI settings and navigate to the TTS Settings under Admin Panel > Settings > Audio. Add the following configuration:
- API Base URL: http://host.docker.internal:8000/v1
- API Key: sk-111111111 (This is a dummy API key, because openedai-speech doesn't require one. You can enter any value, as long as the field is filled.)

Under TTS Voice in the same audio settings menu of the admin panel, set the TTS Model to one of the following choices that openedai-speech supports. The voices of these models are optimized for English.

- tts-1 or tts-1-hd: alloy, echo, echo-alt, fable, onyx, nova, and shimmer (tts-1-hd is configurable; uses OpenAI samples by default)

## Press Save to apply the changes and start enjoying natural-sounding voices

Press the Save button to apply the changes to your Open WebUI settings. Refresh the page for the change to take full effect, then use the openedai-speech integration within Open WebUI to read text responses aloud in a natural-sounding voice.
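The base URL configured above only works if the Open WebUI container can actually reach the speech service. The following sketch checks this from inside the container. It assumes the Open WebUI container is named `open-webui` and that `getent` and `curl` are available inside it; `WEBUI_CONTAINER`, `api_base_url`, and `check_from_webui` are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the Open WebUI container is named "open-webui".
WEBUI_CONTAINER="${WEBUI_CONTAINER:-open-webui}"

# Compose the API Base URL for a given host and port (hypothetical helper).
api_base_url() {
  local host="${1:-host.docker.internal}" port="${2:-8000}"
  printf 'http://%s:%s/v1\n' "$host" "$port"
}

# Check name resolution and HTTP reachability from inside the container.
check_from_webui() {
  local host="${1:-host.docker.internal}" port="${2:-8000}"

  # 1. Does the host name resolve inside the container?
  docker exec "$WEBUI_CONTAINER" getent hosts "$host" || {
    echo "$host does not resolve inside $WEBUI_CONTAINER" >&2
    return 1
  }

  # 2. Does the endpoint answer? Prints the HTTP status code.
  #    curl prints 000 when no connection could be made; any other
  #    code means the server was reached.
  docker exec "$WEBUI_CONTAINER" curl --silent --output /dev/null \
    --write-out '%{http_code}\n' "$(api_base_url "$host" "$port")/audio/speech"
}

# Usage:
#   check_from_webui                          # default host and port
#   check_from_webui host.docker.internal 8001
```

On Linux hosts, host.docker.internal is often not defined by default. Starting the Open WebUI container with `--add-host=host.docker.internal:host-gateway` is one common way to provide it.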
openedai-speech supports multiple text-to-speech models, each with its own strengths and requirements. The following models are available:
- Piper TTS: voices are configured through the voice_to_speaker.yaml configuration file. This model is great for applications that require low latency and high performance. Piper TTS also supports multilingual voices.

If you encounter any problems integrating openedai-speech with Open WebUI, follow these troubleshooting steps:
- Verify the openedai-speech service: Ensure that the openedai-speech service is running and the port you specified in the docker-compose.yml file is exposed.
- Check access to host.docker.internal: Ensure that host.docker.internal is resolvable from within the Open WebUI container. This is necessary because openedai-speech is exposed via localhost on your PC, but open-webui cannot normally access it from inside its container. You can add a volume to the docker-compose.yml file to mount a file from the host to the container, for example, to a directory that will be served by openedai-speech.
- Review the API key configuration: Any dummy value works, because openedai-speech doesn't require an API key.
- Check the voice configuration: Verify that the voice you are trying to use exists in your voice_to_speaker.yaml file and the corresponding files (e.g., voice XML files) are present in the correct directory.
- Verify voice model paths: Double-check that the paths in your voice_to_speaker.yaml file match the actual locations of your voice models.
- Verify that the docker-compose.yml file is correctly configured for your environment.
- If you are still experiencing issues, try restarting the openedai-speech service or the entire Docker environment.
- If the problem persists, consult the openedai-speech GitHub repository or seek help on a relevant community forum.

How can I control the emotional range of the generated audio?
There is no direct mechanism to control the emotional output of the generated audio. Certain factors such as capitalization or grammar may affect the output audio, but internal testing has yielded mixed results.
Where are the voice files stored? What about the configuration file?
Voice files are stored in the voices volume, which the Docker run commands in this tutorial mount at /app/voices. The configuration files, which define the available voices and their properties, are stored in the config volume. Specifically, the default voices are defined in voice_to_speaker.default.yaml.
For more information on configuring Open WebUI to use openedai-speech, including setting environment variables, see the Open WebUI documentation.
For more information about openedai-speech, please visit the GitHub repository.
How to add more voices to openedai-speech: Custom-Voices-HowTo
:::note
You can change the port number in the docker-compose.yml file to any open and usable port, but be sure to update the API Base URL in Open WebUI Admin Audio settings accordingly.
:::