TTS

SillyTavern supports a wide range of TTS (text-to-speech) options. This page explains how to set them up and use them.

What is it?

TTS is used to have a voice narrate parts of your chat.

Configuring TTS

TTS Provider Selectbox

Used to select which TTS service you want to use.

  • ElevenLabs - paid subscription required, highest quality voices available at present.
  • Silero - free, runs on your PC, quality can vary widely
  • System - uses your OS TTS engine, if one exists. Quality can vary widely depending on the OS.
  • Edge - free, runs via Azure, generally quite fast. Voices feel natural but dry and emotionless, like listening to the evening news or a radio announcer. When "Plugin" is selected as the provider, you also need to install this server plugin, otherwise the TTS won't work.
  • Coqui-TTS - free, no API implementation at this time. High-performance text-to-speech models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech) as well as Bark.
  • Novel - requires a paid NovelAI subscription, generated by NovelAI's TTS engine
  • RVC - free, voice cloning

Checkboxes

  • Enabled - turns TTS playback on/off
  • Auto Generation - lets TTS start playing automatically when a new message enters the chat
  • Only narrate "quotes" - Limits TTS playback to only include text within "quotation marks". This will *include "quotes" within asterisk lines* (internal variable name = narrate_quoted_only)
  • Ignore *text, even "quotes", inside asterisks* - TTS will not play any text within *asterisks*, even "quotes" (internal variable name = narrate_dialogues_only)
  • Checking both "Only narrate quotes" and "Ignore asterisks" makes the TTS read only "quotes" that are not inside asterisks and ignore everything else.
  • Narrate only the translated text - this will make the TTS only narrate the translated text.

Given the example text: *Cohee approaches you with a faint "nya"* "Good evening, senpai", she says. Here's a table showing how the text will be modified based on the boolean states of Ignore *text, even "quotes", inside asterisks* and Only narrate "quotes":

| Ignore *text, even "quotes", inside asterisks* | Only narrate "quotes" | Output |
| --- | --- | --- |
| Disabled | Disabled | Cohee approaches you with a faint "nya" "Good evening, senpai", she says. |
| Disabled | Enabled | "nya"... "Good evening, senpai" |
| Enabled | Disabled | "Good evening, senpai", she says. |
| Enabled | Enabled | "Good evening, senpai" |
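The way the two checkboxes combine can be sketched in a few lines of Python. This is only an illustration that reproduces the four table rows, not SillyTavern's actual implementation: the function name `filter_for_tts` is hypothetical, and only the two flag names are taken from the internal variable names listed above.

```python
import re

EXAMPLE = '*Cohee approaches you with a faint "nya"* "Good evening, senpai", she says.'


def filter_for_tts(text, narrate_quoted_only=False, narrate_dialogues_only=False):
    """Hypothetical sketch of the two narration filters (not SillyTavern's code)."""
    if narrate_dialogues_only:
        # Ignore *text, even "quotes", inside asterisks*: drop each asterisk span.
        text = re.sub(r"\*[^*]*\*", "", text)
    else:
        # Keep the text inside asterisks, but do not read the asterisks themselves.
        text = text.replace("*", "")

    if narrate_quoted_only:
        # Only narrate "quotes": keep the quoted spans, joined with a short pause.
        text = "... ".join(re.findall(r'"[^"]*"', text))

    return text.strip()


if __name__ == "__main__":
    for dialogues_only in (False, True):
        for quoted_only in (False, True):
            print(dialogues_only, quoted_only, "->",
                  filter_for_tts(EXAMPLE, quoted_only, dialogues_only))
```

Applying the asterisk filter before the quote filter is what makes the last row work: once the asterisk span is gone, the "nya" inside it can no longer be picked up as a quote.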

Sliders

These will change depending on the API you select.

(explanation coming soon)

Buttons

  • Apply - this must be clicked after setting a TTS API and after editing the voice map.
  • Available voices - loads a popup with all voices available for your selected API, and lets you preview them with sample dialogues.

Using TTS

  1. Check the "Enabled" checkbox. Without it, no TTS playback happens.
  2. Check the "Auto Generation" checkbox if you want the TTS to start automatically every time a new message arrives in chat.
  3. Optionally, click the megaphone icon at the top right of any message to play it back on demand.
  4. Click the lower-right "Stop" button (found inside the wand menu) to stop any playback.

Voice Map

You must provide a voice map. Otherwise, the TTS won't know which voice to use for each character.

The voice map must follow this exact format:

CharacterName:TTSVoice,CharacterName2:TTSVoice2
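To make the format concrete, here is a minimal Python sketch that splits such a string into character/voice pairs. The helper name `parse_voice_map` is hypothetical and this is not how SillyTavern itself reads the field. It only illustrates the comma-separated `Name:Voice` structure and the kind of mistakes (a missing colon, an empty name or voice) that would break it.

```python
def parse_voice_map(voice_map):
    """Hypothetical helper: turn 'Name:Voice,Name2:Voice2' into a dict."""
    mapping = {}
    for entry in voice_map.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the first colon only: the character name comes before it.
        name, sep, voice = entry.partition(":")
        name, voice = name.strip(), voice.strip()
        if not sep or not name or not voice:
            raise ValueError(f"Malformed voice map entry: {entry!r}")
        mapping[name] = voice
    return mapping


if __name__ == "__main__":
    print(parse_voice_map("CharacterName:TTSVoice,CharacterName2:TTSVoice2"))
```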

For Coqui-TTS the format needs to include the speaker and language from the WebGUI:

CharacterName:TTSVoice[speakerid][langid]

For example: Aqua:tts_models--multilingual--multi-dataset--your_tts\model_file.pth[2][1]
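The Coqui-TTS entry can be broken into its parts as in the Python sketch below. The function name `parse_coqui_entry` is hypothetical, and the sketch assumes the speaker and language IDs are plain integers, as in the Aqua example. It is meant only to show which part of the string is which.

```python
import re

# Assumption: the voice ends with two bracketed integer IDs, [speakerid][langid].
COQUI_VOICE = re.compile(r"^(?P<model>.+)\[(?P<speaker>\d+)\]\[(?P<lang>\d+)\]$")


def parse_coqui_entry(entry):
    """Hypothetical helper: split 'Name:Model[speakerid][langid]' into its parts."""
    name, sep, voice = entry.partition(":")
    if not sep:
        raise ValueError(f"Missing ':' in entry: {entry!r}")
    match = COQUI_VOICE.match(voice.strip())
    if match is None:
        raise ValueError(f"Missing [speakerid][langid] in entry: {entry!r}")
    return name.strip(), match["model"], int(match["speaker"]), int(match["lang"])


if __name__ == "__main__":
    entry = r"Aqua:tts_models--multilingual--multi-dataset--your_tts\model_file.pth[2][1]"
    print(parse_coqui_entry(entry))
```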

Bark ZeroShot Voice Cloning Speakers

If you are using Bark, you must create a voice folder containing a voice file to clone. Add voices to homedir\tts\bark_v0\speakers. On Windows this is probably C:\Users\USERACCOUNT\AppData\Local\tts\bark_v0\speakers\. To find it, type %appdata% in Windows Explorer, then go up one directory and open Local. You should see the tts folder there.

The directory should look like this:

  • homedir
    • tts
      • bark_v0
        • speakers
          • customvoice1
            • speaker.wav
            • speaker.npz
          • robinwilliams
            • speaker.mp3
          • me
            • speaker.mp3

On first load of this model and voice, Bark will clone the voice and create a .npz file. This file is needed for faster TTS.
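A quick way to check the layout is to list each speaker folder and see whether it holds an audio file and a cached .npz yet. The Python sketch below does that. The function name `scan_bark_speakers` is hypothetical, and the accepted audio extensions (.wav and .mp3) are an assumption taken from the directory listing above. Pass it whatever path your speakers folder actually lives at.

```python
from pathlib import Path

# Assumption: only the extensions shown in the listing above are treated as audio.
AUDIO_SUFFIXES = {".wav", ".mp3"}


def scan_bark_speakers(speakers_dir):
    """Hypothetical helper: report audio and .npz presence per speaker folder."""
    report = {}
    for folder in sorted(Path(speakers_dir).iterdir()):
        if not folder.is_dir():
            continue
        suffixes = {f.suffix.lower() for f in folder.iterdir() if f.is_file()}
        report[folder.name] = {
            "has_audio": bool(suffixes & AUDIO_SUFFIXES),
            # The .npz appears only after Bark has cloned the voice once.
            "has_npz": ".npz" in suffixes,
        }
    return report
```

A folder reporting `has_audio` but not `has_npz` has simply not been loaded yet, so the first generation with that voice will be slower.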