OuteTTS is an experimental text-to-speech model that uses a pure language modeling approach to generate speech.
pip install outetts
Important: For GGUF support, you must manually install llama-cpp-python
first.
Visit https://github.com/abetlen/llama-cpp-python for specific installation instructions
from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF
# Initialize the interface with the Hugging Face model
interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")
# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("path/to/model.gguf")
# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
text="Hello, am I working?",
temperature=0.1,
repetition_penalty=1.1,
max_length=4096
)
# Play the generated audio
output.play()
# Save the generated audio to a file
output.save("output.wav")
# Create a custom speaker from an audio file
speaker = interface.create_speaker(
"path/to/reference.wav",
"reference text matching the audio"
)
# Save the speaker to a file
interface.save_speaker(speaker, "speaker.pkl")
# Load the speaker from a file
speaker = interface.load_speaker("speaker.pkl")
# Generate TTS with the custom voice
output = interface.generate(
text="This is a cloned voice speaking",
speaker=speaker,
temperature=0.1,
repetition_penalty=1.1,
max_length=4096
)
https://www.outeai.com/blog/OuteTTS-0.1-350M
- WavTokenizer: GitHub Repository
- Decoder and encoder folder files are from this repository
- CTC Forced Alignment: PyTorch Tutorial