
Kosmos-2.5 implementation in transformers #30877

Open
2 tasks done
Natyren opened this issue May 17, 2024 · 8 comments


@Natyren
Contributor

Natyren commented May 17, 2024

Model description

Hello everyone,

Kosmos-2.5 is a multimodal literate model for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a Resampler, and a shared decoder module. To the best of my knowledge, its architecture is similar to Kosmos-2 but differs in some respects, so supporting it in Transformers requires a standalone implementation.

Open source status

  • The model implementation is available
  • The model weights are available

Provide useful links for the implementation

Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow

@amyeroberts
Collaborator

cc @ydshieh

@Natyren
Contributor Author

Natyren commented May 17, 2024

I would like to help with this implementation. If there are any guidelines on how to do this effectively, I would be glad to join the implementation process.

@amyeroberts
Collaborator

Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model

@ydshieh ydshieh self-assigned this May 21, 2024
@ydshieh
Collaborator

ydshieh commented May 29, 2024

Hi @Natyren, sorry for the late reply. I am planning to ask the model authors whether they are interested in porting this model into transformers. I will post updates here.

@tic-top

tic-top commented Jun 6, 2024

Hi @ydshieh, I'm an intern at MSRA, and my mentor @Dod-o wants to convert Kosmos-2.5 (ks25) to the HF format. Is there anything I can do to help?

@ydshieh
Collaborator

ydshieh commented Jun 6, 2024

Hi @tic-top, thank you for this message! This is great news. It's probably better for me to add you to one of our Slack channels. Let me check.

(I just sent you an email :-))

@EwoutH

EwoutH commented Jun 23, 2024

I would love this! What’s needed to move it forward?

@ydshieh
Collaborator

ydshieh commented Jun 24, 2024

@EwoutH We are collaborating with @tic-top to port this into transformers 🤗
