
broadcast_inputs triggers a tensor storage copy and spikes CUDA memory consumption #252

Opened by @hxu296

Description

Summary

It seems that the following line in def broadcast_inputs(x, y) triggers a tensor storage copy, which overflowed CUDA memory when I ran a small bundle adjustment dataset with 31843 pixel observations. Both reshape and contiguous can trigger a copy. If broadcast_inputs can avoid the copy, we can avoid overflowing CUDA memory at this step.

x = x.expand(shape+(x.shape[-1],)).reshape(-1,x.shape[-1]).contiguous()
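
For reference, a minimal sketch of where the copy happens (assuming a CUDA device; B, M, D are hypothetical sizes, not the actual dataset, and this is not the real pypose call site): expand only creates a stride-0 view, while the subsequent reshape/contiguous materializes the full broadcast tensor, which is what drives the memory peak.

import torch

B, M, D = 31843, 4, 6                               # hypothetical sizes for illustration
x = torch.randn(B, 1, D, device="cuda")             # stand-in for one broadcast input
shape = (B, M)                                      # broadcast batch shape, as in broadcast_inputs

torch.cuda.reset_peak_memory_stats()
view = x.expand(shape + (x.shape[-1],))             # stride-0 view: allocates no new storage
after_expand = torch.cuda.max_memory_allocated()

flat = view.reshape(-1, x.shape[-1]).contiguous()   # reshaping the expanded view must copy
after_copy = torch.cuda.max_memory_allocated()

print(f"peak after expand: {after_expand / 2**20:.1f} MiB")
print(f"peak after reshape+contiguous: {after_copy / 2**20:.1f} MiB")

The second peak should be roughly B*M*D*4 bytes larger than the first, i.e. the size of the fully materialized broadcast.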


Improvements

Refactor broadcast_inputs so it does not force a storage copy through reshape and contiguous (one possible direction is sketched below).
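
One possible direction, sketched below under the assumption that downstream ops are element-wise and can consume broadcasted views directly (broadcast_inputs_lazy is a hypothetical variant, not the current pypose signature): return expanded views instead of a flattened, contiguous copy, so no storage is duplicated until an op actually needs it.

import torch

def broadcast_inputs_lazy(x, y):
    """Hypothetical copy-free variant: broadcast the batch dims as views only."""
    shape = torch.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    x = x.expand(shape + (x.shape[-1],))   # views share storage with the originals
    y = y.expand(shape + (y.shape[-1],))
    return x, y, shape

x = torch.randn(1, 7)
y = torch.randn(31843, 3)
xb, yb, shape = broadcast_inputs_lazy(x, y)
assert xb.data_ptr() == x.data_ptr()       # no copy happened

If a flat 2-D layout is still required later, the flatten could be pushed to the call site that already materializes an output, so the broadcast step itself never duplicates storage.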

Risks

TBD

Involved components

Optional: Intended side effects

TBD

Optional: Missing test coverage

TBD
