Skip to content

Torchvision padding#6276

Open
mdabek-nvidia wants to merge 27 commits intoNVIDIA:mainfrom
mdabek-nvidia:torchvision_pad
Open

Torchvision padding#6276
mdabek-nvidia wants to merge 27 commits intoNVIDIA:mainfrom
mdabek-nvidia:torchvision_pad

Conversation

@mdabek-nvidia
Copy link
Collaborator

Category:

New feature

Description:

Torchvision pad operator implementation

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

mdabek-nvidia and others added 21 commits March 19, 2026 13:30
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Co-authored-by: Kamil Tokarski <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
- switch form launchpad API to ppa as it sometimes can be blocked

Signed-off-by: Janusz Lisiecki <[email protected]>
…hreadPool per device. (NVIDIA#6254)

* Use NewThreadPool in dynamic mode.
* By default use only one instance of ThreadPool per device.
* Add thread_pool argument to EvalContext constructor.

Signed-off-by: Michal Zientkiewicz <[email protected]>
…IA#6268)

- Replace deprecated trusted=yes with proper GPG keyring for the deadsnakes
  PPA

Signed-off-by: Janusz Lisiecki <[email protected]>
- arg_helper_test.cc: pass local TensorListShape<0> variables via
  std::move() into SetupData() since they are not used after the call
  (CIDs 25217386, 25217389, 25217390, 26094597)

- arg_helper_test.cc: use `const auto &` instead of `auto` when binding
  the return value of arg.get() (which returns const TLV&) to avoid
  unnecessary copies (CIDs 25217387, 25217388, 25217391)

- roi_image_decoder.h: pass `shape` parameter via std::move() into
  RoiFromCropWindowGenerator() in all three GetRoi() overrides
  (WithCropAttr, WithSliceAttr, WithRandomCropAttr), since shape is
  not used after the call (CIDs 25217545, 25217546, 25217547)

- thread_pool_base.h: add explicit noexcept to ~ThreadPoolBase() and
  ~TaskBulkAdd() to document intent that exceptions cause termination
  (CIDs 25783968, 25783975)

- workspace_policy.h: add explicit noexcept to ~AOT_WS_Policy() to
  document intent that exceptions cause termination (CID 26095288)

- thread_pool_base.cc: fix lock order reversal in WaitOrRunTasks() and
  Run() - release mtx_ before acquiring sem_ to maintain consistent
  order (sem_ before mtx_) matching all other acquisition sites,
  eliminating potential for ORDER_REVERSAL deadlock (CID 25783972)

- new_thread_pool.cc: pass `name` parameter via std::move() into
  name_ member initializer since it is not used after the call
  (CID 26190478)

Signed-off-by: Janusz Lisiecki <[email protected]>
Signed-off-by: Michał Zientkiewicz <[email protected]>
Co-authored-by: Michał Zientkiewicz <[email protected]>
…NVIDIA#6270)

* Add quiet argument to RandomBBoxCrop to suppress crop failure warning

When using mosaic augmentation or other pipelines where failing to find
a valid crop window is expected, the repeated warning at bbox_crop.cc:794
is noisy and misleading. The new quiet=True argument suppresses it.

Fixes NVIDIA#5695.

* Add test for quiet=True suppressing crop failure warning
The fallback to the legacy CropMirrorNormalize implementation is the
correct and intended path for video (4D) tensors and other non-2D-image
inputs, since the optimized kernels only support 3D HWC/CHW tensors.
The warning incorrectly implied something was wrong. Fixes NVIDIA#6267.
Center crop operators implementation

Signed-off-by: Marek Dabek <[email protected]>
Co-authored-by: Kamil Tokarski <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
@mdabek-nvidia
Copy link
Collaborator Author

@greptileai please review

@mdabek-nvidia
Copy link
Collaborator Author

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46880686]: BUILD STARTED

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR adds Pad, GaussianBlur, ColorJitter, and Grayscale torchvision-compatible operators to DALI's experimental torchvision module, along with their functional counterparts. It also refactors shared layout-detection helpers (get_HWC_from_layout_pipeline, get_HWC_from_layout_dynamic) into operator.py and removes duplication in resize.py and centercrop.py.

  • Pad (_PadBase hierarchy) is the primary new feature. It correctly uses axis_names="WH" with (W, H)-ordered anchor/shape arguments in the pipeline mode, and explicit axes in the dynamic functional mode — the axis-ordering concern from prior review rounds is resolved.
  • Grayscale and ColorJitter look correct; the VerificationHue fix from prior rounds (uniform scalar→(-float(hue), float(hue)) conversion) is in place.
  • GaussianBlurVerifyKernel.verify still contains isinstance(kernel_size, int, Sequence) with three positional arguments at line 37. Python's isinstance() only accepts two arguments, so this raises TypeError for every call, making GaussianBlur construction fail for all valid kernel sizes. A prior review thread flagged this exact issue and received a "Fixed" reply, but the fix does not appear in the current code. This renders the GaussianBlur class non-functional and will cause several tests in test_tv_gaussian_blur.py to fail.
  • One minor cleanup: error messages in VerifyKernel contain the typo sequenece (should be sequence).

Confidence Score: 3/5

  • Not ready to merge — GaussianBlur is broken due to a TypeError in VerifyKernel that was declared fixed but remains in the code.
  • The core Pad implementation and the Grayscale/ColorJitter operators look correct, and prior axis-ordering and hue-validation concerns are resolved. However, VerifyKernel.verify still has the isinstance(kernel_size, int, Sequence) three-argument call that raises TypeError for every valid input, rendering GaussianBlur completely non-functional. This breaks the primary user path for that operator and will cause test failures in test_tv_gaussian_blur.py. A single targeted fix is needed before the GaussianBlur path is safe to merge.
  • dali/python/nvidia/dali/experimental/torchvision/v2/gaussian_blur.py — VerifyKernel.verify (line 37) requires attention.

Important Files Changed

Filename Overview
dali/python/nvidia/dali/experimental/torchvision/v2/pad.py New padding operator implementing constant, edge, reflect, and symmetric modes via DALI's fn.slice. The _PadBase._kernel correctly uses axis_names="WH" with (W, H)-ordered anchor/shape arguments, resolving the axis-ordering concern from prior review. Validation (non-negative padding, padding mode) looks correct.
dali/python/nvidia/dali/experimental/torchvision/v2/functional/pad.py Functional pad implementation correctly derives axes from the input type before calling the @adjust_input-decorated _pad helper. For HWC (PIL Image) axes=[-3, -2] and for CHW (Tensor) axes=[-2, -1] are passed to ndd.slice, making the anchor (-top, -left) in (H, W) order consistent with those axes.
dali/python/nvidia/dali/experimental/torchvision/v2/gaussian_blur.py VerifyKernel.verify still contains isinstance(kernel_size, int, Sequence) with three positional arguments (line 37) — Python's isinstance() only accepts two, so this raises TypeError for every input, making GaussianBlur construction fail for all valid kernel sizes. The prior review thread flagged this and marked it "Fixed", but the fix does not appear in the current code.
dali/python/nvidia/dali/experimental/torchvision/v2/color.py ColorJitter and Grayscale operators newly added. VerificationHue now consistently converts scalar hue to (-float(hue), float(hue)) for both int and float inputs, addressing the prior review concern. VerificationBCS and Grayscale logic look correct.
dali/test/python/torchvision/test_tv_pads.py Comprehensive tests covering all four padding modes, single-side and multi-side padding, constant fill value, PIL images, and both CPU/GPU variants. Correctness is verified by exact equality against torchvision reference outputs.
dali/test/python/torchvision/test_tv_gaussian_blur.py Tests for GaussianBlur will be affected by the broken VerifyKernel.verify (TypeError from isinstance with 3 args). In particular, the invalid-params test for ((3,3),-1.0) expects ValueError but would get TypeError, and valid-input tests for GaussianBlur construction would also fail with TypeError.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class Operator {
        +arg_rules: list
        +preprocess_data
        +device: str
        +__call__(data_input)
        +_kernel(data_input)*
        +verify_args(**kwargs)
    }

    class _PadBase {
        +border_type: str
        +pad_left: int
        +pad_top: int
        +pad_right: int
        +pad_bottom: int
        +fill
        +get_padding(padding)$
        +_kernel(data_input)
    }

    class PadConstant {
        +border_type = "pad"
    }
    class PadEdge {
        +border_type = "clamp"
    }
    class PadReflect {
        +border_type = "reflect_101"
    }
    class PadSymmetric {
        +border_type = "reflect_1001"
    }

    class Pad {
        +pad: _PadBase
        +__init__(padding, fill, padding_mode, device)
        +__call__(data_input)
    }

    class GaussianBlur {
        +kernel_size
        +sigma
        +_kernel(data_input)
    }

    class ColorJitter {
        +brightness
        +contrast
        +saturation
        +hue
        +_kernel(data_input)
    }

    class Grayscale {
        +num_output_channels: int
        +_kernel(data_input)
    }

    Operator <|-- _PadBase
    _PadBase <|-- PadConstant
    _PadBase <|-- PadEdge
    _PadBase <|-- PadReflect
    _PadBase <|-- PadSymmetric
    Pad o-- _PadBase : delegates to
    Operator <|-- GaussianBlur
    Operator <|-- ColorJitter
    Operator <|-- Grayscale
Loading

Reviews (4): Last reviewed commit: "Review fixes" | Re-trigger Greptile

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [46880686]: BUILD PASSED

Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
Signed-off-by: Marek Dabek <[email protected]>
@mdabek-nvidia
Copy link
Collaborator Author

@greptileai please re-review

@mdabek-nvidia mdabek-nvidia marked this pull request as ready for review March 25, 2026 13:10
Signed-off-by: Marek Dabek <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants