🔴 Video processors as a separate class #35206
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This one is ready for review; I guess the first review will be from @qubvel. The failing tests are not related, just very flaky.
qubvel
left a comment
Great work, thanks for working on this feature! I did a first, not very thorough, pass; see the comments below.
Resolved review threads:
* src/transformers/models/instructblipvideo/processing_instructblipvideo.py (2)
* src/transformers/models/instructblipvideo/video_processing_instructblipvideo.py (1)
Co-authored-by: Pavel Iakubovskii <[email protected]>
ArthurZucker
left a comment
Very nice! I am just wondering whether, for now and for the models where it makes sense, we can't just inherit from the image processor, and from the base video processor when we can call the same preprocessing functions!
```python
elif is_remote_url(pretrained_model_name_or_path):
    video_processor_file = pretrained_model_name_or_path
    resolved_video_processor_file = download_url(pretrained_model_name_or_path)
else:
```
A whole bunch of this is reused from other processors; I really don't mind some form of inheritance, or just using the same function if we can!
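In the spirit of that suggestion, the repeated resolution logic could be factored into one shared helper. A minimal, hypothetical sketch, not the library's actual code: `is_remote_url` mirrors the check from the snippet above, and the real transformers branch would call `download_url(...)` where noted.

```python
# Hypothetical shared helper, not the actual transformers resolution code.
import os
from urllib.parse import urlparse


def is_remote_url(path: str) -> bool:
    """True for http(s) URLs, mirroring the check in the snippet above."""
    return urlparse(path).scheme in ("http", "https")


def resolve_processor_file(name_or_path: str, config_file: str) -> str:
    """Resolve a processor config from a URL, a local file, or a model directory."""
    if is_remote_url(name_or_path):
        # In transformers this branch would download the file via `download_url`;
        # returning the URL keeps the sketch self-contained.
        return name_or_path
    if os.path.isdir(name_or_path):
        return os.path.join(name_or_path, config_file)
    return name_or_path
```

The same helper could then serve image, video, and any future audio processors, which is the kind of reuse the comment asks for.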
TL;DR from internal comms:
* initial design
* update all video processors
* add tests
* need to add qwen2-vl (not tested yet)
* add qwen2-vl in auto map
* fix copies
* isort
* resolve confilicts kinda
* nit:
* qwen2-vl is happy now
* qwen2-5 happy
* other models are happy
* fix copies
* fix tests
* add docs
* CI green now?
* add more tests
* even more changes + tests
* doc builder fail
* nit
* Update src/transformers/models/auto/processing_auto.py (Co-authored-by: Pavel Iakubovskii <[email protected]>)
* small update
* imports correctly
* dump, otherwise this is getting unmanagebale T-T
* dump
* update
* another update
* update
* tests
* move
* modular
* docs
* test
* another update
* init
* remove flakiness in tests
* fixup
* clean up and remove commented lines
* docs
* skip this one!
* last fix after rebasing
* run fixup
* delete slow files
* remove unnecessary tests + clean up a bit
* small fixes
* fix tests
* more updates
* docs
* fix tests
* update
* style
* fix qwen2-5-vl
* fixup
* fixup
* unflatten batch when preparing
* dump, come back soon
* add docs and fix some tests
* how to guard this with new dummies?
* chat templates in qwen
* address some comments
* remove `Fast` suffix
* fixup
* oops should be imported from transforms
* typo in requires dummies
* new model added with video support
* fixup once more
* last fixup I hope
* revert image processor name + comments
* oh, this is why fetch test is failing
* fix tests
* fix more tests
* fixup
* add new models: internvl, smolvlm
* update docs
* imprt once
* fix failing tests
* do we need to guard it here again, why?
* new model was added, update it
* remove testcase from tester
* fix tests
* make style
* not related CI fail, lets' just fix here
* mark flaky for now, filas 15 out of 100
* style
* maybe we can do this way?
* don't download images in setup class

Co-authored-by: Pavel Iakubovskii <[email protected]>
What does this PR do?
This is quite a breaking PR, because from now on some processors will have a separate `self.video_processor`, and save/load priority is given to the new file names `image/video_preprocessor_config.json`. Save/load preserves BC and can load from any file name.

Reasons for PR:
Description - more verbose
Fixes #33504 and adds a separate base class for video processors, only for those that come from VLMs, because video-only models like ViViT don't need both an image and a video processor. We can add it to other video-only models later, if needed.
The idea is to keep video processors as children of the image processor mixin, but the processor can apply a transform over one video as a 4D array instead of looping over each frame. The only transform that still works on a per-frame basis is `resize`. It could maybe be rewritten in torch, but it's not worth the effort. In terms of speed, the new slow video processor performs the same as the old image processor and has some speedup with larger batch sizes. The advantage here is mostly readability, cleaner code, and a separate API for videos.

For the fast video processors, it is basically the same as for images, except for the extra dimension we have for time frames. Plus, I added `convert_rgb` in `np` and `torch`. Also, we use `VideosKwargs` instead of having to add a new TypedDict. See my comments below, in files changed 👇🏻
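A minimal illustration of the 4D-array idea (illustrative numpy code, not the actual transformers implementation): one vectorized transform over the whole video matches the old per-frame loop.

```python
# Illustrative sketch, not the transformers implementation: a transform
# applied to a whole video as one 4D (frames, height, width, channels)
# array gives the same result as the old per-frame loop.
import numpy as np


def rescale_video(video: np.ndarray, scale: float = 1 / 255) -> np.ndarray:
    """Rescale every frame at once; broadcasting covers the frame axis."""
    return video.astype(np.float32) * scale


video = np.full((8, 224, 224, 3), 255, dtype=np.uint8)  # 8 frames of 224x224 RGB
out = rescale_video(video)

# The equivalent per-frame loop, as per-image processors used to do it:
looped = np.stack([frame.astype(np.float32) / 255 for frame in video])
assert np.allclose(out, looped)
```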
What was done:
* `VideoProcessorBase` and `VideoProcessorBaseFast` were added; I recommend starting the review from there
* `AutoVideoProcessor`
* `image_preprocessor_config.json`: we keep BC by trying to load from the new `image_preprocessor_config.json` if it exists, otherwise falling back to `preprocessor_config.json` with a deprecation warning
* (`Qwen2VL` and `VideoLlava`)

Usage examples
How does it work when loading (the same goes for model-class loading)?
* If the model was not updated and has only `processor_config`: the processor will grab the old `preprocessor_config` and use it for the video and image processing classes with a shared set of args. Warning emitted!
* If the model was updated and has only `image_processor_config` and `video_processor_config`: the processor will grab the files and load each preprocessor with its own set of args
* If the model has all types of config files and didn't delete the old `processor_config`: priority will be given to `(image/video)_processor_config`, with a different set of args for each
image_processor_configandvideo_processor_configfilesBenchmark results on llava onevision
Benchmark on different batch sizes and numbers of frames sampled per video, also with different devices and input types: numpy, torch, or a list of PIL frames.
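For reference, a toy timing harness in the spirit of this measurement; the shapes and the rescale transform are illustrative, not the PR's actual benchmark code.

```python
# Toy benchmark: per-frame loop vs. one vectorized op over the whole batch.
import time

import numpy as np


def time_it(fn, *args, repeats: int = 3) -> float:
    """Average wall-clock time of `fn(*args)` over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


# A batch of 4 videos, 16 frames each, in (batch, frames, H, W, C) layout.
videos = np.random.randint(0, 256, (4, 16, 224, 224, 3), dtype=np.uint8)


def per_frame(batch):
    """Old-style processing: loop over every video and every frame."""
    return np.stack(
        [np.stack([f.astype(np.float32) / 255 for f in video]) for video in batch]
    )


def batched(batch):
    """New-style processing: one vectorized op over the whole 5D array."""
    return batch.astype(np.float32) / 255


assert np.allclose(per_frame(videos), batched(videos))
t_loop, t_batch = time_it(per_frame, videos), time_it(batched, videos)
print(f"per-frame: {t_loop:.4f}s  batched: {t_batch:.4f}s")
```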

Next steps:
* `test_processing_common.py` (there was a PR somewhere, but it probably got stale)
* `audio_preprocessor_config`, just for consistency and easier navigation in hub configs