Model description
MoAI is a new large language and vision model (LLVM) that uses auxiliary visual information and natural language for prediction.
It uses two modules: MoAI-Compressor and MoAI-Mixer. The Compressor condenses the verbalized outputs of external CV models into auxiliary visual information, and the Mixer blends three types of intelligence (visual features, auxiliary features from external CV models, and language features) into a cohesive whole.
MoAI-7B is reported to surpass both open-source and closed-source LLVMs on vision-language tasks.
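To make the data flow through the two modules concrete, here is a toy sketch in plain Python. It is not MoAI's implementation: `embed_text`, `compress`, and `mix` are hypothetical names, the character-code embedding stands in for a real text encoder, mean pooling stands in for whatever the Compressor actually does, and a softmax-weighted sum stands in for the Mixer's blending mechanism, which this description does not specify.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def embed_text(text, dim=4):
    """Hypothetical stand-in for a text encoder: folds character codes into `dim` slots."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def compress(verbalized_outputs, dim=4):
    """Toy compressor (assumption): embed each verbalized CV output, then mean-pool
    them into a single auxiliary feature vector."""
    embedded = [embed_text(text, dim) for text in verbalized_outputs]
    return [sum(column) / len(embedded) for column in zip(*embedded)]


def mix(visual, auxiliary, language, gate_logits=(0.0, 0.0, 0.0)):
    """Toy mixer (assumption): blend the three feature vectors with softmax gate weights."""
    w_vis, w_aux, w_lang = softmax(list(gate_logits))
    return [
        w_vis * v + w_aux * a + w_lang * l
        for v, a, l in zip(visual, auxiliary, language)
    ]


# Usage: two made-up verbalized outputs, e.g. from a detector and an OCR model.
aux = compress(["dog on a sofa", "text: EXIT"])
fused = mix([1.0, 0.0, 0.0, 0.0], aux, [0.0, 1.0, 0.0, 0.0])
```

With the default gate logits, all three sources contribute equally; raising one logit shifts the blend toward that source. A real port would replace each stand-in with the modules from the linked repository.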
Model repo: https://github.com/ByungKwanLee/MoAI
Open source status
- The model implementation is available
- The model weights are available
Provide useful links for the implementation
No response