
Adding imagebind #30690

Open
wants to merge 176 commits into base: main

Conversation

Contributor

@EduardoPach EduardoPach commented May 7, 2024

What does this PR do?

This PR fixes #23240 by adding the ImageBind model.

This is based on #26310, which is currently stale; its author said they would not have time to work on it (though you're welcome to help, @dg845).

Taking into consideration the points raised by @dg845 in #26310 (comment), I'll focus on adding the text/image/audio portions and will try to contact the authors.

Who can review?

@amyeroberts (?)

…MU) and update config classes for text and image modalities.
…ImageBind follows Audio Spectrogram Transformer audio processing).
…s/image processors to ImageBind's __init__.py file.
Contributor

RUFFY-369 commented Oct 9, 2024

Hi @molbap, I have addressed all the final-pass checks as well and left one or two questions. Please have a look when you get the time.

Final pass I think, before passing it to @ArthurZucker (pinging so it's on his radar). Added some comments on the qkv biases, which are nonstandard, and a few other things; overall it looks really good! I did run the slow tests locally, left a comment on one, but all seems in line.

I think if you agree with the changes, then cc @ArthurZucker can take over with the review round.

Thank you

@RUFFY-369
Contributor

All tests are green

Contributor

@ylacombe ylacombe left a comment

Sorry for the delay, I left some comments on the feature extractor!

num_mel_bins=self.num_mel_bins,
)
else:
waveform = np.squeeze(waveform)
Contributor

(nit)
I think you also want to make sure you do it on the right dimension to avoid edge cases (e.g. empty audio), WDYT?

Contributor

Yeah, edge cases are important to handle in case the waveform of the raw speech clip is empty.
Done in the recent commits 👍
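For reference, a minimal sketch of an axis-specific squeeze (variable names here are illustrative, not the PR's exact code):

import numpy as np

# Hypothetical mono clip stored with a leading channel axis, e.g. shape (1, num_samples).
raw_speech = np.zeros((1, 16000), dtype=np.float32)

# Squeezing only the leading axis keeps the time dimension intact even for
# degenerate inputs (e.g. an empty or single-sample clip), unlike a bare np.squeeze.
waveform = np.squeeze(raw_speech, axis=0)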

feature_extractor.max_length,
)
self.assertEqual(input_values.shape, expected_shape)
self.assertTrue(torch.allclose(input_values[:, :, 0, 0, 0], expected_input, atol=1e-4))
Contributor

Let's do another test like this, but focusing on a different segment of the expected output

Contributor

Sure, made it more robust in the recent commits 👍
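As a rough sketch, a second check on a different slice could look like this (the shapes and expected values below are placeholders, not the PR's real reference data):

import torch

# Placeholders standing in for the feature extractor output and a hand-verified
# reference segment taken from a different region of the spectrogram.
input_values = torch.zeros(1, 3, 1, 128, 204)
expected_segment = torch.zeros(128)

# Checking a slice away from the origin also catches padding/truncation bugs
# that only show up in later frames.
assert torch.allclose(input_values[0, 0, 0, :, -1], expected_segment, atol=1e-4)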

Contributor

Great!

remove_dc_offset=True,
).T

fbank = torch.from_numpy(fbank)
Contributor

Since the model is using torch already, I don't mind keeping this dependent on torch

Contributor

But we could probably do the rest of the operations in numpy, right?

Contributor

@RUFFY-369 RUFFY-369 Oct 11, 2024

@ylacombe I tried doing the rest of the operations in numpy like this:

but keeping it in numpy during padding or truncation results in inconsistent shapes and fails the test.

Contributor

@RUFFY-369 RUFFY-369 Oct 11, 2024

Update: I had to overwrite the previous comment, as the changes were wrong because numpy operations were being performed on torch tensors.
(This comment can be ignored.)

Contributor

@RUFFY-369 RUFFY-369 Oct 11, 2024

But we could probably do the rest of the operations in numpy, right?

@ylacombe When is_speech_available() is True, ta_kaldi.fbank is used to perform the operation on the numpy array converted into a torch tensor, as you mentioned in the next review comment (the numpy-to-torch conversion should happen around ta_kaldi.fbank), so the output is a torch tensor. That's why, at the line 267 you referred to, a numpy-to-torch conversion also has to be done when is_speech_available is False: outside the if/else block, the padding operations are done with torch, because when is_speech_available is True we already have a torch tensor to proceed with.

return result


class ImageBindFeatureExtractor(SequenceFeatureExtractor):
Contributor

This looks good to me. However, there might be too many back-and-forths from numpy arrays to torch tensors.

IMO you should do the numpy->torch.tensor and torch.tensor->numpy conversions only once, i.e. when torchaudio is available and you use ta_kaldi.fbank.
Every other operation should be left as a numpy array IMO. That way, you benefit from the torchaudio fbank speedups while keeping the dependency on torch minimal.
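Concretely, the pattern being suggested would look roughly like this (function and argument names are illustrative; ta_kaldi is torchaudio.compliance.kaldi):

import numpy as np
import torch
import torchaudio.compliance.kaldi as ta_kaldi

def extract_fbank(waveform: np.ndarray, sampling_rate: int = 16000, num_mel_bins: int = 128) -> np.ndarray:
    # Convert to torch only for the fbank computation, then go straight back to
    # numpy so padding, truncation and normalization can all stay in numpy.
    fbank = ta_kaldi.fbank(
        torch.from_numpy(waveform).unsqueeze(0),
        sample_frequency=sampling_rate,
        num_mel_bins=num_mel_bins,
    )
    return fbank.numpy()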

Contributor

@ylacombe With reference to the reply above, the numpy->torch.tensor and torch.tensor->numpy conversion is only done to create the spectrogram at this one place in the code file, and it is necessary because the operations that follow are done on a torch tensor: when is_speech_available is True, ta_kaldi.fbank() outputs a torch tensor as well. So, to keep both paths consistent, the spectrogram created from the numpy array has to be converted into a torch tensor.

Collaborator

@ArthurZucker ArthurZucker left a comment

Huge work @RUFFY-369 and @EduardoPach congrats! 🤗
Some small nits here and there but overall good for me!

Comment on lines 36 to 125
def rename_encoder_layers(config, modality):
rename_keys = []
# fmt: off
for layer_idx in range(config.num_hidden_layers):
rename_keys.extend(
[
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.in_proj_weight",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.qkv_proj.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.in_proj_bias",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.qkv_proj.bias"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.out_proj.weight",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.out_proj.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.out_proj.bias",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.out_proj.bias"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.norm_1.weight",f"{modality}_model.encoder.layers.{layer_idx}.layernorm_before.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.norm_1.bias",f"{modality}_model.encoder.layers.{layer_idx}.layernorm_before.bias"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.mlp.fc1.weight",f"{modality}_model.encoder.layers.{layer_idx}.mlp.fc1.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.mlp.fc1.bias",f"{modality}_model.encoder.layers.{layer_idx}.mlp.fc1.bias"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.mlp.fc2.weight",f"{modality}_model.encoder.layers.{layer_idx}.mlp.fc2.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.mlp.fc2.bias",f"{modality}_model.encoder.layers.{layer_idx}.mlp.fc2.bias"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.norm_2.weight",f"{modality}_model.encoder.layers.{layer_idx}.layernorm_after.weight"),
(f"modality_trunks.{modality}.blocks.{layer_idx}.norm_2.bias",f"{modality}_model.encoder.layers.{layer_idx}.layernorm_after.bias"),
]
)
if config.add_kv_bias:
rename_keys.extend(
[
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.bias_k",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.k_bias",),
(f"modality_trunks.{modality}.blocks.{layer_idx}.attn.bias_v",f"{modality}_model.encoder.layers.{layer_idx}.self_attn.v_bias",),
]
)
# fmt: on

return rename_keys


# here we list all keys to be renamed (original name on the left, our name on the right)
def create_rename_keys(config):
vision_config = config.vision_config
text_config = config.text_config
audio_config = config.audio_config

rename_keys = []

# fmt: off

# Convert Vision
rename_keys.extend([
("modality_preprocessors.vision.cls_token", "vision_model.embeddings.cls_token"),
("modality_preprocessors.vision.rgbt_stem.proj.1.weight", "vision_model.embeddings.patch_embedding.projection.weight"),
("modality_preprocessors.vision.pos_embedding_helper.pos_embed", "vision_model.embeddings.position_embeddings"),
("modality_heads.vision.0.weight", "vision_model.layernorm.weight"),
("modality_heads.vision.0.bias", "vision_model.layernorm.bias"),
("modality_heads.vision.2.weight", "vision_projection.weight"),
("modality_trunks.vision.pre_transformer_layer.0.weight", "vision_model.pre_layernorm.weight"),
("modality_trunks.vision.pre_transformer_layer.0.bias", "vision_model.pre_layernorm.bias"),
])

rename_keys.extend(
rename_encoder_layers(vision_config, "vision")
)

# Convert Text
rename_keys.extend([
("modality_preprocessors.text.pos_embed", "text_model.embeddings.position_embedding.weight"),
("modality_preprocessors.text.token_embedding.weight", "text_model.embeddings.token_embedding.weight"),
("modality_heads.text.proj.0.weight", "text_model.layernorm.weight"),
("modality_heads.text.proj.0.bias", "text_model.layernorm.bias"),
("modality_heads.text.proj.1.weight", "text_projection.weight"),
("modality_postprocessors.text.1.log_logit_scale", "text_postprocessor.log_logit_scale"),
])

rename_keys.extend(
rename_encoder_layers(text_config, "text")
)

# Convert Audio
rename_keys.extend([
("modality_preprocessors.audio.cls_token", "audio_model.embeddings.cls_token"),
("modality_preprocessors.audio.rgbt_stem.proj.weight", "audio_model.embeddings.patch_embedding.projection.weight"),
("modality_preprocessors.audio.rgbt_stem.norm_layer.weight", "audio_model.embeddings.patch_embedding.layernorm.weight"),
("modality_preprocessors.audio.rgbt_stem.norm_layer.bias", "audio_model.embeddings.patch_embedding.layernorm.bias"),
("modality_preprocessors.audio.pos_embedding_helper.pos_embed", "audio_model.embeddings.position_embeddings"),
("modality_heads.audio.0.weight", "audio_model.layernorm.weight"),
("modality_heads.audio.0.bias", "audio_model.layernorm.bias"),
("modality_heads.audio.2.weight", "audio_projection.weight"),
])

rename_keys.extend(
rename_encoder_layers(audio_config, "audio")
)
# fmt: on

return rename_keys
Collaborator

I don't want to be a pain 😅 but a lot of this can be simplified with regexes! Would love to see something simple like what we have in mllama!

Contributor

Made the changes using regex (the re module) in the recent commits 👍. Please check and mention if any further changes are necessary.
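For reference, a regex-driven mapping in the spirit of mllama's conversion script might look like this (the patterns below are illustrative, not necessarily the exact ones used in the final script):

import re

# Each entry maps an original-checkpoint key pattern to its transformers-style name.
ORIGINAL_TO_CONVERTED_KEY_MAPPING = {
    r"modality_trunks\.(\w+)\.blocks\.(\d+)\.attn\.in_proj_(weight|bias)": r"\1_model.encoder.layers.\2.self_attn.qkv_proj.\3",
    r"modality_trunks\.(\w+)\.blocks\.(\d+)\.norm_1\.(weight|bias)": r"\1_model.encoder.layers.\2.layernorm_before.\3",
    r"modality_trunks\.(\w+)\.blocks\.(\d+)\.norm_2\.(weight|bias)": r"\1_model.encoder.layers.\2.layernorm_after.\3",
    r"modality_trunks\.(\w+)\.blocks\.(\d+)\.mlp\.fc(\d)\.(weight|bias)": r"\1_model.encoder.layers.\2.mlp.fc\3.\4",
}

def convert_key(old_key: str) -> str:
    # Apply the first pattern that matches; anything unmatched keeps its name.
    for pattern, replacement in ORIGINAL_TO_CONVERTED_KEY_MAPPING.items():
        new_key, num_subs = re.subn(pattern, replacement, old_key)
        if num_subs:
            return new_key
    return old_key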

Comment on lines 141 to 152
def prepare_input():
    ds = load_dataset("EduardoPacheco/imagebind-example-data", split="train")
    images = ds["image"]
    texts = ds["text"]
    audios = [
        torchaudio.functional.resample(
            torch.from_numpy(audio["array"]), orig_freq=audio["sampling_rate"], new_freq=16000
        ).numpy()
        for audio in ds["audio"]
    ]

    return images, texts, audios
Collaborator

Looks like this is unused, no?

Contributor

Yes, it was used for testing the converted model weights, but when @molbap reviewed the files he suggested moving the assertion tests. They have been moved, so this func can be omitted 👍

Status: Done in the recent commits

Comment on lines +713 to +722
if self.scale_logits:
self.logit_scale_init = config.logit_scale_init_value
self.max_logit_scale = max_logit_scale
self.learnable = config.learnable_logit_scale

log_logit_scale = torch.ones([]) * np.log(self.logit_scale_init)
if self.learnable:
self.log_logit_scale = nn.Parameter(log_logit_scale)
else:
self.register_buffer("log_logit_scale", log_logit_scale)
Collaborator

Same comment about code paths; that is something we try to avoid.
Is this used in released checkpoints?

Contributor

It is used here, and likewise in the original modeling file.
It's also used in the released checkpoints, e.g. this param: 'modality_postprocessors.audio.1.log_logit_scale'
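For context, the post-processor applies that scale roughly like this (a sketch modeled on the original ImageBind module; the exact forward in the PR may differ):

import numpy as np
import torch
from torch import nn

class LearnableLogitScaling(nn.Module):
    def __init__(self, logit_scale_init: float = 20.0, max_logit_scale: float = 100.0, learnable: bool = True):
        super().__init__()
        self.max_logit_scale = max_logit_scale
        log_logit_scale = torch.ones([]) * np.log(logit_scale_init)
        if learnable:
            self.log_logit_scale = nn.Parameter(log_logit_scale)
        else:
            self.register_buffer("log_logit_scale", log_logit_scale)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # Scale the normalized embeddings, capping the effective scale.
        return torch.clamp(self.log_logit_scale.exp(), max=self.max_logit_scale) * embeddings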

Comment on lines 1092 to 1104
def _build_attention_mask(self, attention_mask, batch_size, seq_len, dtype, device=None):
    # Build causal mask
    mask = torch.empty(batch_size, seq_len, seq_len, dtype=dtype, device=device)
    mask.fill_(torch.finfo(dtype).min)
    mask.triu_(1)
    mask = mask.unsqueeze(1)  # expand mask

    # If attention_mask update causal mask
    if attention_mask is not None:
        attention_mask = AttentionMaskConverter._expand_mask(attention_mask, dtype)
        return mask + attention_mask
    return mask

Collaborator

AttentionMaskConverter is there to hide the SDPA masking logic, but we have kind of deprecated it otherwise; specifically, it cannot expand into a static cache.
Using _update_causal_mask / a simplified version of it would be better!

Contributor

@RUFFY-369 RUFFY-369 Oct 17, 2024

Are you talking about this _update_causal_mask? Because AttentionMaskConverter is also used here?!
In any case, I have pushed a simplified version in the recent commits.
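For reference, a minimal stand-in for the _expand_mask call, so the converter dependency can be dropped (a sketch; names and shapes assumed):

import torch

def expand_padding_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Turn a (batch, seq_len) padding mask of 1s/0s into an additive
    # (batch, 1, 1, seq_len) mask of 0 / large-negative values,
    # ready to be added to the causal mask.
    inverted = 1.0 - attention_mask[:, None, None, :].to(dtype)
    return inverted * torch.finfo(dtype).min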

Comment on lines +43 to +49
class ImageBindProcessorAudioKwargs(AudioKwargs, total=False):
    do_normalize: Optional[bool]
    mean: Optional[float]
    std: Optional[float]
    do_chunk: Optional[bool]
    chunk_duration: Optional[float]
    num_chunks: Optional[int]
Collaborator

kinda wondering why we don't use them in the FeatureExtractor as well!

Contributor

We could; the main appeal of standardizing the processor was pipeline + API use, but there's no harm in doing it at all call levels!

Contributor

@molbap @ArthurZucker If this change is to be done, should we include it in this PR?

Collaborator

Yep we can!
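A rough sketch of what reusing the typed kwargs at the feature-extractor level could look like (class and method names here are assumptions, not the PR's final API):

from typing import Optional

from typing_extensions import TypedDict, Unpack

class ImageBindAudioCallKwargs(TypedDict, total=False):
    # Mirrors the processor-level audio kwargs so both entry points share one schema.
    do_normalize: Optional[bool]
    mean: Optional[float]
    std: Optional[float]
    do_chunk: Optional[bool]
    chunk_duration: Optional[float]
    num_chunks: Optional[int]

class ImageBindFeatureExtractorSketch:
    def __call__(self, raw_speech, **kwargs: Unpack[ImageBindAudioCallKwargs]):
        # The typed dict documents (and type-checks) the accepted audio options
        # directly on the feature extractor call, not only on the processor.
        do_chunk = kwargs.get("do_chunk", True)
        num_chunks = kwargs.get("num_chunks", 3)
        ...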

tests/models/imagebind/test_processor_imagebind.py (outdated, resolved)
@ArthurZucker ArthurZucker self-requested a review October 22, 2024 15:08
@ArthurZucker
Collaborator

Super sorry @RUFFY-369, we went on a company-wide offsite for a week, getting back to it now 🤗

Collaborator

@ArthurZucker ArthurZucker left a comment

Well done! It's a really big piece of work and lots of parts are involved, but you pushed through!

Just added some nits / updates that we made on main, but approving as it should be fairly easy to fix

Comment on lines +146 to +160
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

# get the text config dict if we are loading from ImageBindConfig
if config_dict.get("model_type") == "imagebind":
config_dict = config_dict["text_config"]

if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
logger.warning(
f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
)

return cls.from_dict(config_dict, **kwargs)
Collaborator

this is no longer required! 🔥

>>> # Accessing the model configuration
>>> configuration = model.config
```"""

Collaborator

Suggested change (addition):
    base_config_key = "text_config"
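In other words, with the newer sub-config loading mechanism the class body only needs something like this (a sketch; the model_type value is assumed to mirror the vision config shown below):

from transformers import PretrainedConfig

class ImageBindTextConfig(PretrainedConfig):
    model_type = "imagebind_text_model"
    # PretrainedConfig.from_pretrained uses this key to pick the nested dict out of a
    # composite ImageBindConfig, replacing the hand-written from_pretrained override.
    base_config_key = "text_config"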

Comment on lines +235 to +236
model_type = "imagebind_vision_model"

Collaborator

Suggested change:
    model_type = "imagebind_vision_model"
    base_config_key = "vision_config"

Comment on lines +282 to +296
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

# get the vision config dict if we are loading from ImageBindConfig
if config_dict.get("model_type") == "imagebind":
config_dict = config_dict["vision_config"]

if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
logger.warning(
f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
)

return cls.from_dict(config_dict, **kwargs)
Collaborator

Suggested change: remove this from_pretrained override entirely.

Comment on lines +419 to +434
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

# get the audio config dict if we are loading from ImageBindConfig
if config_dict.get("model_type") == "imagebind":
config_dict = config_dict["audio_config"]

if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
logger.warning(
f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
)

return cls.from_dict(config_dict, **kwargs)

Collaborator

Suggested change: remove this from_pretrained override entirely.

Comment on lines +979 to +989
def create_custom_forward(module):
    def custom_forward(*inputs):
        return module(*inputs, output_attentions)

    return custom_forward

layer_outputs = torch.utils.checkpoint.checkpoint(
    create_custom_forward(encoder_layer),
    hidden_states,
    attention_mask,
)
Collaborator

this can be a lot simpler!

Suggested change (replace the create_custom_forward wrapper with a direct call to the checkpointing hook):
    layer_outputs = self._gradient_checkpointing_func(
        encoder_layer.__call__,
        hidden_states,
        attention_mask,
    )

_gradient_checkpointing_func is defined in the superclass.
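A toy sketch of how that hook is used inside an encoder forward (here the checkpointing function is emulated directly with torch.utils.checkpoint; in transformers it is installed by PreTrainedModel):

import torch
from torch import nn

class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(2)])
        self.gradient_checkpointing = True
        # Stand-in for the hook PreTrainedModel installs on its subclasses.
        self._gradient_checkpointing_func = torch.utils.checkpoint.checkpoint

    def forward(self, hidden_states):
        for layer in self.layers:
            if self.gradient_checkpointing and self.training:
                # Checkpoint the layer call instead of wrapping it in a custom closure.
                hidden_states = self._gradient_checkpointing_func(layer.__call__, hidden_states, use_reentrant=False)
            else:
                hidden_states = layer(hidden_states)
        return hidden_states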

# Build causal mask
mask = torch.empty(batch_size, seq_len, seq_len, dtype=dtype, device=device)
mask.fill_(torch.finfo(dtype).min)
mask.triu_(1)
Collaborator

In-place operations are usually not so great for accelerators!
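An out-of-place equivalent of the fill_/triu_ pair above, for illustration (shapes chosen arbitrarily):

import torch

batch_size, seq_len, dtype = 2, 8, torch.float32

# Same upper-triangular block of large-negative values, built without mutating a tensor in place.
mask = torch.triu(
    torch.full((batch_size, seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype),
    diagonal=1,
)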

"""The text model from ImageBind without any head or projection on top.""",
IMAGEBIND_START_DOCSTRING,
)
class ImageBindTextModel(ImageBindPreTrainedModel):
Collaborator

I don't remember what we said about this class, but it's a pity to have it, as it just wraps around ImageBindTextTransformer; not super useful!

)


@add_start_docstrings(
Collaborator

same comment here!

)


@add_start_docstrings(
Collaborator

same here!


Successfully merging this pull request may close these issues.

[New model] ImageBind: One Embedding Space To Bind Them All
10 participants