This is an implementation of Qwen2-Audio-7B-Instruct-Int4 for ComfyUI. It supports both text-based queries and audio queries to generate captions or responses.
- Text-based Query: Users can submit textual queries to request information or generate descriptions. For instance, a user might enter a question like "What is the meaning of life?"
- Audio Query: When a user uploads an audio file, the system can analyze the content and generate a detailed caption or a summary of the entire audio. For example, the accompanying prompt might be "Tell me what you hear in this audio clip."
- Install from ComfyUI Manager (search for `Qwen2`)
- Download or git clone this repository into the `ComfyUI\custom_nodes\` directory and run: `pip install -r requirements.txt`

All the models will be downloaded automatically when running the workflow if they are not found in the `ComfyUI\models\prompt_generator\` directory.
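
The download-if-missing behavior described above can be pictured with a minimal sketch. This is not the node's actual code: the function name `ensure_model`, the folder name `demo-model`, and the `download` callback are all hypothetical, and the real downloader (for example, a Hugging Face Hub client) is deliberately left out so the check itself is easy to follow.

```python
from pathlib import Path
from typing import Callable


def ensure_model(models_root: Path, model_name: str,
                 download: Callable[[Path], None]) -> Path:
    """Return the local model folder, calling `download` only when needed.

    `download` is a hypothetical callback that fills the target folder;
    it stands in for whatever the node really uses to fetch weights.
    """
    target = models_root / model_name
    # Treat a non-empty folder as "model already present".
    if target.is_dir() and any(target.iterdir()):
        return target
    target.mkdir(parents=True, exist_ok=True)
    download(target)
    return target


if __name__ == "__main__":
    import tempfile

    def fake_download(folder: Path) -> None:
        # Placeholder for a real download: writes a single dummy file.
        (folder / "config.json").write_text("{}")

    with tempfile.TemporaryDirectory() as tmp:
        # Assumed layout mirroring ComfyUI/models/prompt_generator/
        root = Path(tmp) / "models" / "prompt_generator"
        first = ensure_model(root, "demo-model", fake_download)
        second = ensure_model(root, "demo-model", fake_download)
        print(first == second)
```

On the second call the folder already contains files, so the callback is skipped. That matches the README's claim that models are fetched only when they are not found locally.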