A speech-to-speech talking bot based on Vicuna & FastChat, running locally on your computer.
- Record an audio clip yourself containing the question you would like to ask Vicuna.
- Run the program and get back an audio file containing the answer.
- Load a synthetic demo to test the bot, so there is no need to record anything yourself.
- Write down your questions and generate the demo yourself. There is no limit on the number of questions!
- English is the only language supported currently.
- The speech-to-text model is not especially robust, so when you record audio yourself the recognized text may sometimes be a mess. That is exactly why the "demo generation" function is provided.
- Loading the models may take a long time on a CPU-only machine.
- You may need to download the Vicuna model weights in advance; check the tutorial.
Clone the repo and cd into the directory:
> git clone
> cd VicunaTalk
Check the instructions below first (otherwise it will probably fail to run), then run VicunaTalk.py to talk!
> python VicunaTalk.py
Use environment.yml to set up the conda virtual environment:
> conda env create -f environment.yml
> conda activate vicuna-talk
There are two kinds of configuration files (config.yaml): one for the global models and one for running demos.

- The global one, in the root directory
- The demo one, in the demo directory

In most cases there is no need to change the demo one unless you know what you are doing; see the demo section for details.
The global config.yaml contains the settings for the different models involved:
```yaml
speech-to-text:
  model-id: "jonatasgrosman/wav2vec2-large-xlsr-53-english"

text-to-speech:
  model-id: "microsoft/speecht5_tts"
  vocoder: "microsoft/speecht5_hifigan"
  voice_path: XXXX # voice json path, modify this.

language-model:
  model-id: XXXX # vicuna model path, modify this.
  device: XXXX
  num_gpus: XXXX
  max_gpu_memory: XXXX
  load_8bit: XXXX
  cpu_offloading: XXXX
  temperature: XXXX
```
As noted in the comments above, you must set voice_path and the Vicuna model-id, or the program will not run.
The configuration references two models from Hugging Face: speecht5_tts for text-to-speech and wav2vec2-large-xlsr-53-english for speech-to-text.
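If you want to see how these pieces fit together, the sketch below shows one way the config could be read and the two models instantiated with the transformers library. It is only an illustration under assumed names (cfg, asr, tts_model, etc.); the actual code in VicunaTalk.py may be organized differently.

```python
import yaml
from transformers import (
    pipeline,
    SpeechT5ForTextToSpeech,
    SpeechT5HifiGan,
    SpeechT5Processor,
)

# Read the global configuration (assumed to sit in the repo root).
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Speech-to-text: wav2vec2 exposed through the ASR pipeline.
asr = pipeline("automatic-speech-recognition",
               model=cfg["speech-to-text"]["model-id"])

# Text-to-speech: SpeechT5 plus its HiFi-GAN vocoder.
tts_processor = SpeechT5Processor.from_pretrained(cfg["text-to-speech"]["model-id"])
tts_model = SpeechT5ForTextToSpeech.from_pretrained(cfg["text-to-speech"]["model-id"])
vocoder = SpeechT5HifiGan.from_pretrained(cfg["text-to-speech"]["vocoder"])
```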
What is a "demo"?
A "demo" is a bunch of audio files including a series of questions, from recording or generating. When you are loading a demo, the questions are automatically sent to Vicuna in order, so you do not need to record every time.
Demos are stored in the demo folder, each named after its topic.
A demo consists of:
- a config.yaml specifying all of its information
- a series of .wav audio files, named in order, containing the questions you want to ask
Start talking to Vicuna and choose "D" to load a demo. Type in the demo name, such as intro_sorting, which is provided in the repo; the questions in the audio files will then be asked to Vicuna one by one.
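Conceptually, replaying a demo is just iterating over its audio files in order and transcribing each one before handing the text to Vicuna. The snippet below sketches that loop; ask_vicuna() is a placeholder for the project's own chat wrapper, not a real function in this repo.

```python
from pathlib import Path
from transformers import pipeline

# Same speech-to-text model as in config.yaml.
asr = pipeline("automatic-speech-recognition",
               model="jonatasgrosman/wav2vec2-large-xlsr-53-english")

demo_dir = Path("demo") / "intro_sorting"
for wav in sorted(demo_dir.glob("*.wav")):
    question = asr(str(wav))["text"]
    print("Q:", question)
    # answer = ask_vicuna(question)  # placeholder: VicunaTalk.py handles the LLM call
```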
Open demo_creator.py and you will see something like this:
```python
if __name__ == "__main__":
    # Put all of your questions here, one by one
    query = [
        "Tell me three sorting algorithms, just the names.",
        "Which one has the least time complexity? Just tell me the name.",
    ]
    # Put the name of the demo here
    d = Demo("intro_sorting", query)
    d()
```
After modifying the code, just run python demo_creator.py, and your demo will show up in the demo folder!
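For the curious, here is a hypothetical sketch of what a Demo-like class could do internally: synthesize one .wav per question with the SpeechT5 text-to-speech model and write the files into demo/<name>/. The class name DemoSketch, the file naming scheme, and the random speaker embedding are all assumptions for illustration; the real Demo class in demo_creator.py may work differently (in particular, it presumably loads a real speaker x-vector from the voice json referenced in config.yaml).

```python
from pathlib import Path

import soundfile as sf
import torch
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor


class DemoSketch:
    """Illustrative stand-in for the repo's Demo class."""

    def __init__(self, name, questions):
        self.dir = Path("demo") / name
        self.questions = questions
        self.processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
        self.model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
        self.vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
        # A real 512-dim speaker x-vector should come from the voice json in
        # config.yaml; a random tensor is used here only as a stand-in.
        self.speaker = torch.randn(1, 512)

    def __call__(self):
        self.dir.mkdir(parents=True, exist_ok=True)
        for i, question in enumerate(self.questions):
            inputs = self.processor(text=question, return_tensors="pt")
            speech = self.model.generate_speech(
                inputs["input_ids"], self.speaker, vocoder=self.vocoder
            )
            # SpeechT5 generates 16 kHz audio; write one file per question.
            sf.write(str(self.dir / f"{i:02d}.wav"), speech.numpy(), samplerate=16000)
```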