Skip to content

Torchvision API - center crop operator#6266

Merged
mdabek-nvidia merged 4 commits intoNVIDIA:mainfrom
mdabek-nvidia:torchvision_center_crop
Mar 24, 2026
Merged

Torchvision API - center crop operator#6266
mdabek-nvidia merged 4 commits intoNVIDIA:mainfrom
mdabek-nvidia:torchvision_center_crop

Conversation

@mdabek-nvidia
Copy link
Collaborator

Category:

New feature

Description:

Implementation of Torchvision's center crop operators: object oriented and functional

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

Signed-off-by: Marek Dabek <[email protected]>
@mdabek-nvidia
Copy link
Collaborator Author

@greptileai - please review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR adds a CenterCrop operator and matching center_crop functional to the experimental torchvision-compatible DALI API. The OO operator correctly replicates torchvision's banker's-rounding crop-origin calculation using purely DALI graph arithmetic, and handles the padding case (crop > image) via out_of_bounds_policy="pad". Previous review feedback (disabled value-equality assertions, narrow type hint, empty-sequence validation, confusing loop variable) has all been addressed.

Key changes:

  • v2/centercrop.py — new CenterCrop operator with DALI-graph-safe banker's-rounding and symmetric padding
  • v2/functional/centercrop.py — functional wrapper; contains a layout-check bug (in "HW" substring test instead of equality/list membership, diverging from the functional/resize.py pattern)
  • test_tv_centercrop.py — full test suite with value equality assertions now enabled, covering normal crops, padding, square/rectangular sizes, batch inputs, and error paths

Confidence Score: 4/5

  • PR is nearly ready to merge; one targeted fix to the layout substring-check in the functional path is needed.
  • All prior review concerns are resolved and the core banker's-rounding logic is solid with full value-equality tests. One concrete bug remains: inpt.layout[-2:] in "HW" in functional/centercrop.py is a substring check rather than equality, diverging from the established pattern in functional/resize.py. It doesn't break standard layouts (CHW, NCHW) but is semantically wrong and easy to fix in one line.
  • dali/python/nvidia/dali/experimental/torchvision/v2/functional/centercrop.py — layout detection logic (line 44–51)

Important Files Changed

Filename Overview
dali/python/nvidia/dali/experimental/torchvision/v2/centercrop.py New OO CenterCrop operator using DALI graph nodes. The banker's-rounding-via-integer-arithmetic formula is well-documented and correct; the normalised crop-position computation handles all three cases (normal crop, exact-fit, padding) without Python-level conditionals on graph nodes.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/centercrop.py Functional wrapper for center_crop. Contains a layout-detection bug on line 46: inpt.layout[-2:] in "HW" performs a substring check rather than equality/list-membership, diverging from the pattern in functional/resize.py. Standard layouts (CHW, NCHW) happen to work, but the check is semantically incorrect and should be == "HW" or replaced with the list-membership form used by resize.py.
dali/test/python/torchvision/test_tv_centercrop.py Comprehensive test suite. All previously-commented-out value equality assertions have been re-enabled; the larger-than-tensor case validates both shape and pixel content; error path tests cover type and value errors. Good coverage.
dali/python/nvidia/dali/experimental/torchvision/init.py Adds CenterCrop to the public top-level import and all. No issues.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/init.py Exposes center_crop in the functional sub-package's public API. No issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant adjust_input as adjust_input (decorator)
    participant center_crop as center_crop / CenterCrop
    participant DALI as DALI fn.crop / ndd.crop

    Caller->>adjust_input: inpt (PIL/Tensor), output_size, device
    adjust_input->>adjust_input: transform_input → ndd.Tensor / ndd.Batch
    adjust_input->>center_crop: _input (ndd), output_size, device

    center_crop->>center_crop: CenterCrop.verify_args(size)
    center_crop->>center_crop: adjust_size(output_size) → (crop_h, crop_w)
    center_crop->>center_crop: derive in_h, in_w from layout
    center_crop->>center_crop: compute banker's-rounded crop_pos_y / crop_pos_x

    center_crop->>DALI: crop(tensor, crop=(crop_h,crop_w), crop_pos_x, crop_pos_y,\nout_of_bounds_policy="pad", fill_values=0)
    DALI-->>center_crop: cropped ndd output
    center_crop-->>adjust_input: output.evaluate()
    adjust_input->>adjust_input: adjust_output → PIL Image / torch.Tensor
    adjust_input-->>Caller: result
Loading

Comments Outside Diff (1)

  1. dali/python/nvidia/dali/experimental/torchvision/v2/functional/centercrop.py, line 44-51 (link)

    Layout check uses in string substring test instead of list membership

    inpt.layout[-2:] in "HW" is a Python substring membership check — it tests whether inpt.layout[-2:] is a substring of the string "HW", not whether it equals "HW". A single-character suffix like "H" or "W" would silently pass this guard and then incorrectly index inpt_shape[-2] / inpt_shape[-1] as H and W.

    Compare with the identical logic in functional/resize.py which correctly uses full-string list membership:

    if inpt.layout in ["HWC", "NHWC"]:
        ...
    elif inpt.layout in ["HW", "CHW", "NCHW"]:
        ...

    The centercrop implementation should adopt the same pattern:

Reviews (7): Last reviewed commit: "Apply suggestion from @stiepan" | Re-trigger Greptile

Signed-off-by: Marek Dabek <[email protected]>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_center_crop branch from cd93185 to 1b043c0 Compare March 19, 2026 16:18
@mdabek-nvidia
Copy link
Collaborator Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46532136]: BUILD STARTED

@mdabek-nvidia
Copy link
Collaborator Author

@greptileai - please re-review

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46532136]: BUILD FAILED

@mdabek-nvidia mdabek-nvidia marked this pull request as ready for review March 22, 2026 12:42
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46532136]: BUILD PASSED

@mdabek-nvidia mdabek-nvidia requested a review from szalpal March 23, 2026 12:34
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how important that is, but following up on https://github.com/NVIDIA/DALI/pull/6266/changes#r2976016315 I wonder if we should test cases with at least one of the extents, such that extent/2 is not representable exactly with float32 and whether we are consitent with torchvision there (one thing is how we compute the absolute offset, the other is how dali accepts the ratio as float32).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the casting. Although I think that idea of testing extent/2 is plausible is there a scenario where it would be used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just seen int32 with fp32 intermediate steps and wanted to raise the point that this may (or may not :D) result in more than off-by-one rounding errors. Are >> 2**24 extents plausible, I guess for some satelate images etc it could be the case. Having a test with extents not representable in fp32 (that could be very thin or narrow to keep it leightweight) would be a way to know if we're not diverging from torchvision too much. Anyway, I am just raising a concern, it could be overcautious, I am relying on your call here!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the fp32 casting. Tests are passing

Signed-off-by: Marek Dabek <[email protected]>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_center_crop branch from b1a9d82 to 3337aa9 Compare March 24, 2026 10:51
@mdabek-nvidia
Copy link
Collaborator Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46867909]: BUILD STARTED

Signed-off-by: Marek Dabek <[email protected]>
Co-authored-by: Kamil Tokarski <[email protected]>
@mdabek-nvidia mdabek-nvidia force-pushed the torchvision_center_crop branch from 3337aa9 to 58cdffb Compare March 24, 2026 11:59
@mdabek-nvidia
Copy link
Collaborator Author

@greptileai - please re-review

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46867909]: BUILD PASSED

@mdabek-nvidia mdabek-nvidia merged commit e3d0407 into NVIDIA:main Mar 24, 2026
6 checks passed
mdabek-nvidia added a commit to mdabek-nvidia/DALI that referenced this pull request Mar 24, 2026
Center crop operators implementation

Signed-off-by: Marek Dabek <[email protected]>
Co-authored-by: Kamil Tokarski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants