With the recent rise in interest in Vision Language Models (VLMs), we decided to add an ImageField to Argilla! This means any open source developer can now work on better models for vision ML tasks too, and we would like to show you how.
To introduce this new feature, we've prepared a set of notebooks that cover some common image scenarios:

- Fine-tune a CLIP retrieval model with Sentence Transformers.
- Use ColPali + Qwen VL for RAG and log the results to Argilla.
- Image-generation preference: create multi-modal preference datasets for free using Hugging Face inference endpoints.