Fix OOM in dynamic_preprocess for extreme aspect ratios by Mr-Neutr0n · Pull Request #1256 · OpenGVLab/InternVL

Mr-Neutr0n · 2026-02-09T18:41:25Z

Summary

Adds safety bounds to dynamic_preprocess to prevent out-of-memory (OOM) errors when processing images with extreme aspect ratios or when max_num is set to very high values.
Caps max_num at MAX_PATCHES_LIMIT (24) to prevent runaway patch counts that lead to excessive memory allocation.
Filters out target patch grid ratios with extreme aspect ratios (beyond MAX_ASPECT_RATIO_THRESHOLD of 200) to avoid degenerate patch grids like (12, 1) that produce large intermediate images without useful visual information.
Adds a safety fallback to a (1, 1) grid if all candidate ratios are filtered out.
Applies the fix consistently across all three copies of the function in internvl_chat, internvl_chat_gpt_oss, and streamlit_demo.

Details

When dynamic_preprocess receives an image with extreme proportions (e.g., a very wide panorama or a very tall screenshot), the generated target ratios can include highly elongated grids. Combined with large max_num values that can be configured via training metadata (max_dynamic_patch), this causes:

Large intermediate resize buffers (e.g., 5376x448 pixels for a (12, 1) grid with image_size=448)
Many patch tensors being created simultaneously
OOM crashes, especially in environments with limited GPU memory
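
The buffer size quoted above follows from simple arithmetic. The sketch below is illustrative only: `resize_buffer_size` and `patch_tensor_bytes` are hypothetical helper names, not functions from the repository, and the byte estimate assumes 3-channel float32 patches.

```python
def resize_buffer_size(grid, image_size=448):
    """Return (width, height) in pixels of the intermediate resized image.

    grid is (columns, rows); each cell is an image_size x image_size patch.
    """
    cols, rows = grid
    return cols * image_size, rows * image_size


def patch_tensor_bytes(grid, image_size=448, channels=3, bytes_per_value=4):
    """Estimate memory for all patches of one image (assumes float32 values)."""
    cols, rows = grid
    return cols * rows * channels * image_size * image_size * bytes_per_value


print(resize_buffer_size((12, 1)))   # (5376, 448)
print(patch_tensor_bytes((12, 1)))   # 28901376
```

The per-image cost grows linearly with the number of patches, so the total depends on both max_num and how many images are processed at once.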

The fix introduces two configurable module-level constants:

MAX_PATCHES_LIMIT = 24: Hard upper bound on patch count regardless of max_num parameter
MAX_ASPECT_RATIO_THRESHOLD = 200: Maximum allowed ratio between grid dimensions

These values are intentionally generous to avoid affecting normal usage while preventing pathological cases.
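
As a rough illustration of how the two constants could interact, here is a minimal sketch of bounded candidate-grid generation. The diff itself is not reproduced on this page, so this is an assumption about the shape of the change, not the PR's actual code: `bounded_target_ratios` is a hypothetical name, and measuring a grid's aspect ratio as larger dimension divided by smaller dimension is also assumed.

```python
MAX_PATCHES_LIMIT = 24            # hard upper bound on patch count
MAX_ASPECT_RATIO_THRESHOLD = 200  # maximum ratio between grid dimensions


def bounded_target_ratios(min_num=1, max_num=12):
    """Return candidate (columns, rows) grids, sorted by patch count.

    Hypothetical helper: applies the cap, the ratio filter, and the
    (1, 1) fallback described in this PR.
    """
    # Cap the requested patch count, whatever the caller passed in.
    max_num = min(max_num, MAX_PATCHES_LIMIT)

    # Every grid whose patch count lies in [min_num, max_num].
    ratios = {
        (i, j)
        for i in range(1, max_num + 1)
        for j in range(1, max_num + 1)
        if min_num <= i * j <= max_num
    }

    # Drop grids that are too elongated (assumed definition: long / short side).
    ratios = [r for r in ratios if max(r) / min(r) <= MAX_ASPECT_RATIO_THRESHOLD]

    # Safety fallback if nothing survives the filter.
    if not ratios:
        ratios = [(1, 1)]

    return sorted(ratios, key=lambda r: r[0] * r[1])


# A very large max_num is capped: no returned grid exceeds 24 patches.
grids = bounded_target_ratios(max_num=1000)
print(max(cols * rows for cols, rows in grids))  # 24
```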

Test plan

Verify normal images (standard aspect ratios like 4:3, 16:9) produce identical results before and after the change
Test with extreme aspect ratio images (e.g., 10000x100, 100x10000) to confirm no OOM
Test with max_num values > 24 to confirm capping works correctly
Verify the fallback to (1, 1) works when all ratios are filtered
Run existing eval scripts (e.g., MME, VQA) to confirm no regression
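
Some of these checks can be exercised without any images by comparing candidate grids directly. The sketch below is a hedged example, not part of the PR: `unbounded_grids` and `bounded_grids` are hypothetical stand-ins for the grid generation before and after the change, and the `limit` and `threshold` parameters exist only so the fallback can be triggered in a test.

```python
MAX_PATCHES_LIMIT = 24
MAX_ASPECT_RATIO_THRESHOLD = 200


def unbounded_grids(min_num, max_num):
    """Candidate (columns, rows) grids without any safety bounds."""
    return {
        (i, j)
        for i in range(1, max_num + 1)
        for j in range(1, max_num + 1)
        if min_num <= i * j <= max_num
    }


def bounded_grids(min_num, max_num, limit=MAX_PATCHES_LIMIT,
                  threshold=MAX_ASPECT_RATIO_THRESHOLD):
    """Same candidates with the cap, the ratio filter, and the fallback."""
    grids = unbounded_grids(min_num, min(max_num, limit))
    grids = {g for g in grids if max(g) / min(g) <= threshold}
    return grids or {(1, 1)}


def run_checks():
    # Normal settings: results should be identical before and after.
    for max_num in (1, 6, 12, 24):
        assert bounded_grids(1, max_num) == unbounded_grids(1, max_num)
    # max_num above the limit: no grid may exceed 24 patches.
    for max_num in (25, 40, 1000):
        assert max(i * j for i, j in bounded_grids(1, max_num)) == MAX_PATCHES_LIMIT
    # Fallback: a deliberately tight threshold filters out (1, 5) and (5, 1).
    assert bounded_grids(5, 5, threshold=2) == {(1, 1)}
    return True


run_checks()
```

The OOM and eval-regression items still need real images and the actual dynamic_preprocess function; this only covers the grid-selection logic.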

…spect ratios

Images with extreme aspect ratios (e.g., very wide panoramas or tall screenshots) can cause excessive memory allocation in dynamic_preprocess because the function generates patch grid ratios without any aspect ratio filtering. This can lead to OOM errors, especially when max_num is set to high values.

Changes:
- Cap max_num at MAX_PATCHES_LIMIT (24) to prevent runaway patch counts
- Filter out target ratios where either dimension ratio exceeds MAX_ASPECT_RATIO_THRESHOLD (200) to avoid degenerate patch grids
- Add a safety fallback to (1,1) if all ratios are filtered out
- Apply the fix consistently across all three copies of the function: internvl_chat, internvl_chat_gpt_oss, and streamlit_demo

Mr-Neutr0n · 2026-02-12T18:11:14Z

Friendly bump! Let me know if there's anything I should update or improve to help move this forward.

Mr-Neutr0n wants to merge 1 commit into OpenGVLab:main from Mr-Neutr0n:fix/dynamic-preprocess-oom-extreme-aspect-ratios
