edwko/OuteTTS
OuteTTS

OuteTTS is an experimental text-to-speech model that uses a pure language modeling approach to generate speech.

Installation

pip install outetts

Important: For GGUF support, you must manually install llama-cpp-python first.

Visit https://github.com/abetlen/llama-cpp-python for specific installation instructions.
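Because GGUF support depends on a separately installed package, it can help to check for it before choosing an interface. The following is a minimal sketch, not part of the OuteTTS API: `module_available` and `choose_backend` are hypothetical helper names, and the only external fact assumed is that llama-cpp-python is imported under the module name `llama_cpp`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def choose_backend() -> str:
    """Return "gguf" when llama-cpp-python is installed, otherwise "hf".

    Hypothetical helper: the returned string is only a label that the
    caller can use to decide between InterfaceGGUF and InterfaceHF.
    """
    return "gguf" if module_available("llama_cpp") else "hf"


if __name__ == "__main__":
    print("Selected backend:", choose_backend())
```

A caller could then construct `InterfaceGGUF` only when the label is `"gguf"` and fall back to `InterfaceHF` otherwise.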

Usage

Interface Usage

from outetts.v0_1.interface import InterfaceHF, InterfaceGGUF

# Initialize the interface with the Hugging Face model
interface = InterfaceHF("OuteAI/OuteTTS-0.1-350M")

# Or initialize the interface with a GGUF model
# interface = InterfaceGGUF("path/to/model.gguf")

# Generate TTS output
# Without a speaker reference, the model generates speech with random speaker characteristics
output = interface.generate(
    text="Hello, am I working?",
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)

# Play the generated audio
output.play()

# Save the generated audio to a file
output.save("output.wav")
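The generate-then-save pattern above can be wrapped in a small helper so that the sampling settings live in one place. This is only a sketch under stated assumptions: `synthesize_to_file` and `DEFAULT_PARAMS` are hypothetical names, and the helper relies solely on the `generate(...)` keyword arguments and the `save(...)` method shown in the example above.

```python
# Defaults copied from the usage example above; adjust as needed.
DEFAULT_PARAMS = {
    "temperature": 0.1,
    "repetition_penalty": 1.1,
    "max_length": 4096,
}


def synthesize_to_file(interface, text, path, **overrides):
    """Generate speech for `text` and save it to `path`.

    `interface` is expected to behave like InterfaceHF or InterfaceGGUF.
    Keyword overrides replace the matching defaults and are passed
    straight through to `interface.generate`.
    """
    params = {**DEFAULT_PARAMS, **overrides}
    output = interface.generate(text=text, **params)
    output.save(path)
    return output
```

For example, `synthesize_to_file(interface, "Hello, am I working?", "output.wav", temperature=0.3)` would override only the temperature and keep the other defaults.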

Voice Cloning

# Create a custom speaker from an audio file
speaker = interface.create_speaker(
    "path/to/reference.wav",
    "reference text matching the audio"
)

# Save the speaker to a file
interface.save_speaker(speaker, "speaker.pkl")

# Load the speaker from a file
speaker = interface.load_speaker("speaker.pkl")

# Generate TTS with the custom voice
output = interface.generate(
    text="This is a cloned voice speaking",
    speaker=speaker,
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096
)
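Since a speaker can be saved and reloaded, the reference audio only needs to be processed once. The sketch below caches the speaker on disk; `get_or_create_speaker` is a hypothetical helper, and it assumes only the `create_speaker`, `save_speaker`, and `load_speaker` methods shown above.

```python
import os


def get_or_create_speaker(interface, audio_path, transcript,
                          cache_path="speaker.pkl"):
    """Load a cached speaker if one exists, otherwise create and cache it.

    `interface` is expected to behave like InterfaceHF or InterfaceGGUF.
    """
    if os.path.exists(cache_path):
        # Reuse the previously saved speaker file.
        return interface.load_speaker(cache_path)

    # First run: build the speaker from the reference audio and transcript,
    # then persist it so later runs can skip this step.
    speaker = interface.create_speaker(audio_path, transcript)
    interface.save_speaker(speaker, cache_path)
    return speaker
```

The returned speaker can be passed to `interface.generate(..., speaker=speaker)` exactly as in the example above. Note that the cache is keyed only by file path, so delete the cached file if the reference audio changes.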

Technical Blog

https://www.outeai.com/blog/OuteTTS-0.1-350M

Credits