SillyTavern supports a wide range of text-to-speech (TTS) options. This page explains how to set them up and use them.
TTS is used to have a voice narrate parts of your chat.
This setting selects which TTS service you want to use.
- ElevenLabs - paid subscription required, highest quality voices available at present.
- Silero - free, runs on your PC, quality can vary widely
- System - uses your OS TTS engine, if one exists. Quality can vary widely depending on the OS.
- Edge - free, runs via Azure, generally quite fast. Voices feel natural but dry and emotionless, like listening to the evening news or a radio announcer. When running with "Plugin" selected as the provider, you also need to install this server plugin, otherwise the TTS won't work.
- Coqui-TTS - free, no API implementation at this time. Provides high-performance text-to-speech models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech) as well as Bark.
- Novel - requires a paid NovelAI subscription, generated by NovelAI's TTS engine
- RVC - free, voice cloning
- Enabled - turns TTS playback on/off
- Auto Generation - lets TTS start playing automatically when a new message enters the chat
- Only narrate "quotes" - limits TTS playback to text within `"quotation marks"`. This will *include "quotes" within asterisk lines* (internal variable name = `narrate_quoted_only`).
- Ignore *text, even "quotes", inside asterisks* - TTS will not play any text within `*asterisks*`, even "quotes" (internal variable name = `narrate_dialogues_only`).
- Having both the "Only narrate quotes" and "Ignore asterisks" checkboxes checked results in the TTS reading only "quotes" that are not inside asterisks, and ignoring everything else.
- Narrate only the translated text - this will make the TTS only narrate the translated text.
Given the example text: *Cohee approaches you with a faint "nya"* "Good evening, senpai", she says.
Here's a table showing how the text will be modified based on the boolean states of Ignore *text, even "quotes", inside asterisks* and Only narrate "quotes":
Ignore *text, even "quotes", inside asterisks* | Only narrate "quotes" | Output |
---|---|---|
Disabled | Disabled | Cohee approaches you with a faint "nya" "Good evening, senpai", she says. |
Disabled | Enabled | "nya"... "Good evening, senpai" |
Enabled | Disabled | "Good evening, senpai", she says. |
Enabled | Enabled | "Good evening, senpai" |
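
The four rows of the table can be reproduced with a short Python sketch. This is not SillyTavern's actual implementation. The function name `filter_narration` and its parameters are hypothetical, and the regular expressions are an assumption about how asterisk-wrapped and quoted spans are matched.

```python
import re

def filter_narration(text, ignore_asterisks=False, quotes_only=False):
    """Illustrative filter mirroring the table above (hypothetical helper)."""
    if ignore_asterisks:
        # Drop every *asterisk-wrapped* span, including any quotes inside it.
        text = re.sub(r"\*[^*]*\*", "", text)
    if quotes_only:
        # Keep only the "quoted" spans, joined with a pause marker.
        return "... ".join(re.findall(r'"[^"]*"', text))
    # Otherwise narrate everything that is left, minus the asterisk markers.
    return text.replace("*", "").strip()

example = '*Cohee approaches you with a faint "nya"* "Good evening, senpai", she says.'
for ignore in (False, True):
    for quotes in (False, True):
        print(ignore, quotes, filter_narration(example, ignore, quotes))
```

Running the loop prints one line per combination of the two checkboxes, in the same order as the table rows.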
These will change depending on the API you select.
(explanation coming soon)
- Apply - this must be clicked after setting a TTS API and after editing the voice map.
- Available voices - loads a popup with all voices available for your selected API, and lets you preview them with sample dialogues.
- Check the "Enabled" checkbox. Without it, TTS will not play at all.
- Check the "Auto Generation" checkbox if you want the TTS to start automatically every time a new message arrives in chat.
- Optionally, click the megaphone icon at the top right of any message to play it back on demand.
- Click the lower right "Stop" button (found inside the wand menu) to stop any playback.
You must provide a voice map for the TTS to use. Otherwise, it won't know which voice to use for each character.
The voice map must be in the exact format shown below:
CharacterName:TTSVoice,CharacterName2:TTSVoice2
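
To check a voice map before pasting it in, a small parser can help. The sketch below is only an illustration of the `Name:Voice,Name2:Voice2` format. `parse_voice_map` is a hypothetical helper, not part of SillyTavern, and trimming whitespace around entries is an assumption.

```python
def parse_voice_map(voice_map: str) -> dict[str, str]:
    """Split a 'Name:Voice,Name2:Voice2' string into a dict (hypothetical helper)."""
    mapping = {}
    for entry in voice_map.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the first colon only, so the voice part may contain colons.
        name, sep, voice = entry.partition(":")
        if not sep or not name.strip() or not voice.strip():
            raise ValueError(f"Malformed voice map entry: {entry!r}")
        mapping[name.strip()] = voice.strip()
    return mapping

print(parse_voice_map("CharacterName:TTSVoice,CharacterName2:TTSVoice2"))
```

An entry without a colon, or with an empty name or voice, raises `ValueError`, which makes typos in the map easy to spot.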
For Coqui-TTS the format needs to include the speaker and language from the WebGUI:
CharacterName:TTSVoice[speakerid][langid]
For example:
Aqua:tts_models--multilingual--multi-dataset--your_tts\model_file.pth[2][1]
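
The Coqui-TTS voice part can be pulled apart into model, speaker id, and language id with a regular expression. This is a minimal sketch that assumes the two bracketed ids always appear at the end of the voice string. `split_coqui_voice` is a hypothetical name, not a SillyTavern or Coqui function.

```python
import re

# Assumed shape: <model path>[<speaker id>][<language id>]
COQUI_VOICE = re.compile(r"^(?P<model>.+)\[(?P<speaker>[^\]]*)\]\[(?P<lang>[^\]]*)\]$")

def split_coqui_voice(voice: str) -> tuple[str, str, str]:
    """Return (model, speaker_id, lang_id) from a Coqui-style voice string."""
    match = COQUI_VOICE.match(voice)
    if match is None:
        raise ValueError(f"Expected model[speakerid][langid], got: {voice!r}")
    return match.group("model"), match.group("speaker"), match.group("lang")

voice = r"tts_models--multilingual--multi-dataset--your_tts\model_file.pth[2][1]"
print(split_coqui_voice(voice))
```

The ids are returned as strings, since the source does not say whether they are always numeric.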
If using Bark, you must create a voice folder containing a voice file to clone. Add voices to homedir\tts\bark_v0\speakers. On Windows this is probably C:\Users\USERACCOUNT\AppData\Local\tts\bark_v0\speakers\. To find it, type %appdata% in Windows Explorer, go up one directory, then open Local, and you should see tts.
The directory should look like this:
- homedir
  - tts
    - bark_v0
      - speakers
        - customvoice1
          - speaker.wav
          - speaker.npz
        - robinwilliams
          - speaker.mp3
        - me
          - speaker.mp3
On first load of this model and voice, Bark will clone the voice and create a .npz file, which is needed for faster TTS.
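
To see which voice folders already have a cloned .npz file, the speakers directory can be scanned with a few lines of Python. This is a hedged sketch: `scan_speakers` is a hypothetical helper, the accepted audio extensions (.wav, .mp3) are taken from the example layout above and may not be exhaustive, and the directory path must be supplied by you.

```python
from pathlib import Path

# Assumption: only the audio types shown in the example layout are checked.
AUDIO_SUFFIXES = {".wav", ".mp3"}

def scan_speakers(speakers_dir):
    """Report, per voice folder, whether it has a source clip and a .npz file."""
    report = {}
    for folder in sorted(Path(speakers_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the voice folders
        suffixes = {f.suffix.lower() for f in folder.iterdir() if f.is_file()}
        report[folder.name] = {
            "has_audio": bool(suffixes & AUDIO_SUFFIXES),
            "has_npz": ".npz" in suffixes,
        }
    return report

# Example call (replace the path with your own speakers directory):
# print(scan_speakers(Path.home() / "tts" / "bark_v0" / "speakers"))
```

A folder reporting `has_npz` as false has not been loaded with Bark yet, so its first use will be slower while the voice is cloned.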