
Fix loading pretrained-mm-projector errors under Deepspeed Zero3 #1250

Open
lockon-n wants to merge 1 commit into haotian-liu:main from lockon-n:main

Conversation

lockon-n commented Mar 8, 2024

In the fine-tuning stage of LLaVA, if we apply DeepSpeed ZeRO-3, the model parameters are partitioned into placeholders instead of being initialized as real full tensors.

As a result, the naive load_state_dict call raises errors when the code tries to load the pretrained mm projector from a checkpoint such as mm_projector.bin.

This PR fixes the issue by detecting whether DeepSpeed ZeRO-3 is enabled via is_deepspeed_zero3_enabled() from transformers, and wrapping the loading code in a deepspeed.zero.GatheredParameters context so that the full parameters are gathered before the state dict is loaded.
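A minimal sketch of the idea (not the exact PR diff): it assumes the projector weights live in mm_projector.bin, the target module is model.mm_projector with a get_w key-filtering helper as in LLaVA's llava_arch.py, and model is already a ZeRO-3-initialized module on each rank.

```python
import torch
import deepspeed
# Older transformers versions export this from transformers.deepspeed instead.
from transformers.integrations import is_deepspeed_zero3_enabled

# Load the pretrained projector weights onto CPU.
mm_projector_weights = torch.load("mm_projector.bin", map_location="cpu")

def get_w(weights, keyword):
    # Strip the "mm_projector." prefix so keys match the bare module.
    return {k.split(keyword + ".")[1]: v for k, v in weights.items() if keyword in k}

state_dict = get_w(mm_projector_weights, "mm_projector")

if is_deepspeed_zero3_enabled():
    # Under ZeRO-3 the projector parameters are partitioned placeholders.
    # GatheredParameters temporarily reassembles the full tensors; with
    # modifier_rank=0, the values written on rank 0 are re-partitioned and
    # broadcast to all ranks when the context exits.
    with deepspeed.zero.GatheredParameters(
        list(model.mm_projector.parameters()), modifier_rank=0
    ):
        if torch.distributed.get_rank() == 0:
            model.mm_projector.load_state_dict(state_dict)
else:
    # Without ZeRO-3 the parameters are ordinary tensors, so the plain
    # load_state_dict path is unchanged.
    model.mm_projector.load_state_dict(state_dict)
```

With modifier_rank=0 only rank 0 needs to write the weights; DeepSpeed propagates the updated values to the other ranks on context exit, so the fix is a no-op outside of ZeRO-3.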
