Kosmos-2.5 implementation in transformers #30877
Comments
cc @ydshieh
I would like to assist with this implementation. If there are any guidelines on how to do it effectively, I would like to join the implementation process.
Hi @Natyren, there's a guide in the documentation here: https://huggingface.co/docs/transformers/add_new_model
Hi @Natyren, sorry for the late reply. I am thinking of talking to the model authors to see if they are interested in porting this model into transformers.
Hi @tic-top Thank you for this message! This is great news. It's probably better for me to add you to one of our Slack channels. Let me check. (Just sent an email :-) now.)
I would love this! What's needed to move it forward?
Model description
Hello everyone,
Kosmos-2.5 is a multimodal literate model for tasks such as OCR and text-rich image comprehension. It consists of a ViT encoder, a Resampler, and a shared decoder module. To the best of my knowledge, the architecture is similar to that of Kosmos-2 but differs in several respects, so supporting this model in Transformers requires a standalone implementation.
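To make the encoder → resampler → decoder pipeline concrete, here is a minimal shape-flow sketch in plain Python. All names, patch sizes, and dimensions below are illustrative assumptions for exposition, not the actual Kosmos-2.5 configuration:

```python
# Hypothetical sketch of the Kosmos-2.5 data flow (shapes only).
# Patch size, hidden size, and latent count are assumed values,
# not taken from the real model config.

def vit_encoder_shape(image_hw, patch=16, hidden=1024):
    """ViT splits the image into patches; each patch becomes one embedding."""
    h, w = image_hw
    num_patches = (h // patch) * (w // patch)
    return (num_patches, hidden)

def resampler_shape(patch_seq, num_latents=64, hidden=1024):
    """The resampler pools a variable-length patch sequence into a
    fixed number of latent query embeddings (e.g. via cross-attention)."""
    return (num_latents, hidden)

def decoder_input_shape(image_tokens, text_len):
    """The shared decoder consumes image latents concatenated with text tokens."""
    n_img, d = image_tokens
    return (n_img + text_len, d)

patches = vit_encoder_shape((224, 224))      # (196, 1024)
latents = resampler_shape(patches)           # (64, 1024)
dec_in = decoder_input_shape(latents, 32)    # (96, 1024)
print(patches, latents, dec_in)
```

The point of the resampler step is that the decoder sees a fixed, small number of image tokens regardless of input resolution, which keeps the decoder sequence length bounded.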
Open source status
Provide useful links for the implementation
Paper: https://arxiv.org/pdf/2309.11419
Code: https://github.com/microsoft/unilm/tree/master/kosmos-2.5
Authors: @Dod-o @wolfshow