Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP)

This repo contains the source code of our ECCV 2022 paper MS-CLIP:

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
2022 European Conference on Computer Vision (ECCV 2022)
By Haoxuan You*, Luowei Zhou*, Bin Xiao*, Noel Codella*, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan.

Introduction

(Figure: MS-CLIP)

We investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we ask how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that vary the proportion of shared parameters along a spectrum. In the studied settings, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that lightweight modality-specific parallel modules further improve performance. A conceptual sketch of this sharing scheme follows.
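To make the idea concrete, here is a toy transformer block in which the attention and MLP weights serve both modalities, while only the LayerNorms and a small parallel adapter remain modality-specific. This is a conceptual PyTorch sketch, not the paper's actual architecture; the dimensions, adapter design, and module names are assumptions.

```python
# Conceptual sketch: one transformer block whose attention/MLP weights are
# shared across modalities, plus lightweight modality-specific adapters.
# Illustrative only; sizes, adapter design, and names are assumptions.
import torch
import torch.nn as nn

class SharedBlock(nn.Module):
    def __init__(self, dim=512, heads=8, adapter_dim=64):
        super().__init__()
        # Shared weights: used for both image tokens and text tokens.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        # Modality-specific pieces: separate LayerNorms plus a small
        # parallel bottleneck adapter per modality.
        self.norm1 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in ("image", "text")})
        self.norm2 = nn.ModuleDict({m: nn.LayerNorm(dim) for m in ("image", "text")})
        self.adapter = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(dim, adapter_dim), nn.GELU(), nn.Linear(adapter_dim, dim))
            for m in ("image", "text")
        })

    def forward(self, x, modality):
        h = self.norm1[modality](x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2[modality](x)
        x = x + self.mlp(h) + self.adapter[modality](h)  # parallel modality-specific branch
        return x

block = SharedBlock()
img_tokens = torch.randn(2, 50, 512)   # e.g. ViT patch tokens
txt_tokens = torch.randn(2, 77, 512)   # e.g. text token embeddings
print(block(img_tokens, "image").shape, block(txt_tokens, "text").shape)
```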

(Figure: MS-CLIP-S)

Update

  • [07/20/2022] Released pretrained models and zero-shot evaluation on ImageNet-1K.

Pre-trained Weights

| Model | Training Set | Top-1 on IN-1K | LP* on 24 datasets | Download |
|---|---|---|---|---|
| MS-CLIP-S (ViT-B/32) | YFCC-22M | 36.7 | 68.5 | ckpt / config |
| MS-CLIP-S (ViT-B/16) | YFCC-22M | 39.0 | 70.4 | ckpt / config |
| MS-CLIP-S (ViT-B/32) | LAION-20M | 40.2 | 73.3 | ckpt / config |

*LP: Linear Probing
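The LP columns come from fitting a linear classifier on features from the frozen visual encoder. Below is a minimal, generic linear-probe sketch with scikit-learn; the random arrays stand in for extracted embeddings, and the regularization value is an assumption rather than the paper's protocol.

```python
# Minimal linear-probe sketch: fit a linear classifier on frozen features.
# `train_feats`/`test_feats` stand in for image embeddings extracted with a
# frozen visual encoder; the data and the C value here are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
train_feats, train_labels = rng.normal(size=(1000, 512)), rng.integers(0, 10, 1000)
test_feats, test_labels = rng.normal(size=(200, 512)), rng.integers(0, 10, 200)

clf = LogisticRegression(max_iter=1000, C=3.16)  # C is typically tuned per dataset
clf.fit(train_feats, train_labels)
print("linear-probe accuracy:", clf.score(test_feats, test_labels))
```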

Getting Started

Installation

Please follow INSTALL.md for installation.

Data preparation

Please follow DATA.md for data preparation.

Pre-trained weights preparation

Download the checkpoints and configs from the links in the table above and put the weights under ./OUTPUT_MODEL/.
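As an optional sanity check, you can verify that a downloaded checkpoint is readable from that folder. The filename below is a placeholder; keep whatever name the download link provides.

```python
# Sanity check: confirm the downloaded weights sit where the configs expect them.
# The filename is a placeholder -- use the actual name of the downloaded file.
from pathlib import Path
import torch

ckpt = Path("OUTPUT_MODEL") / "b32-laion-msclips.pth"   # placeholder name
assert ckpt.exists(), f"put the downloaded checkpoint at {ckpt}"
state_dict = torch.load(ckpt, map_location="cpu")
print(f"loaded {len(state_dict)} entries from {ckpt}")
```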

Evaluation

To evaluate a pre-trained MS-CLIP-S on ImageNet Zero-shot Classification, run:

CUDA_VISIBLE_DEVICES=0 python tools/eval_zeroshot.py --model <config-file> 

where <config-file> is the config YAML under experiments/model/, e.g. experiments/model/b32-laion-msclips.yaml.
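Conceptually, zero-shot classification matches each image embedding against text embeddings of prompted class names and picks the closest one. The sketch below illustrates only that matching step; the encoders are random stand-ins, not the MS-CLIP model or the repo's evaluation script.

```python
# Conceptual zero-shot classification: one text embedding per class prompt,
# one embedding per image, nearest class by cosine similarity.
# The encoders below are random stand-ins, not the MS-CLIP model.
import torch
import torch.nn.functional as F

classnames = ["goldfish", "tabby cat", "golden retriever"]
prompts = [f"a photo of a {c}." for c in classnames]

def encode_text(prompts):                      # stand-in for the text encoder
    return torch.randn(len(prompts), 512)

def encode_image(images):                      # stand-in for the visual encoder
    return torch.randn(images.shape[0], 512)

text_emb = F.normalize(encode_text(prompts), dim=-1)
img_emb = F.normalize(encode_image(torch.randn(4, 3, 224, 224)), dim=-1)

logits = img_emb @ text_emb.t()                # cosine similarities
pred = logits.argmax(dim=-1)
print([classnames[i] for i in pred])
```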

Contact

If you have any questions, please contact Haoxuan You or Luowei Zhou.
