Kosmos-2.5 implementation in transformers #30877
Comments
cc @ydshieh
I would like to assist with this implementation. If there are any guidelines on how to do it effectively, I would like to join the implementation process.
Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model
Hi @Natyren, sorry for the late reply. I am thinking of talking to the model authors to see if they are interested in porting this model into transformers.
Hi @tic-top Thank you for this message! This is great news. It's probably better for me to add you to one of our Slack channels. Let me check. (Just sent an email :-) now.)
I would love this! What's needed to move it forward?
Model description
Hello everyone,
Kosmos-2.5 is a multimodal literate model for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a Resampler, and a shared decoder module. To the best of my knowledge, the architecture is similar to that of Kosmos-2 but differs in several respects, so supporting this model in Transformers requires a standalone implementation.
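To make the encoder → resampler → decoder pipeline concrete, here is a minimal shape-flow sketch in plain Python. All names, patch sizes, and dimensions below are illustrative assumptions for exposition, not the actual Kosmos-2.5 configuration:

```python
# Hypothetical sketch of the Kosmos-2.5 data flow (shapes only).
# Patch size, hidden size, and latent count are assumed values,
# not taken from the real model config.

def vit_encoder_shape(image_hw, patch=16, hidden=1024):
    """ViT splits the image into patches; each patch becomes one embedding."""
    h, w = image_hw
    num_patches = (h // patch) * (w // patch)
    return (num_patches, hidden)

def resampler_shape(patch_seq, num_latents=64, hidden=1024):
    """The resampler pools a variable-length patch sequence into a
    fixed number of latent query embeddings (e.g. via cross-attention)."""
    return (num_latents, hidden)

def decoder_input_shape(image_tokens, text_len):
    """The shared decoder consumes image latents concatenated with text tokens."""
    n_img, d = image_tokens
    return (n_img + text_len, d)

patches = vit_encoder_shape((224, 224))      # (196, 1024)
latents = resampler_shape(patches)           # (64, 1024)
dec_in = decoder_input_shape(latents, 32)    # (96, 1024)
print(patches, latents, dec_in)
```

The point of the resampler step is that the decoder sees a fixed, small number of image tokens regardless of input resolution, which keeps the decoder sequence length bounded.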
Open source status
Provide useful links for the implementation
Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow