
broadcast_inputs triggers a tensor storage copy and spikes CUDA memory consumption #252

Opened by @hxu296

Description

Summary

It seems that the following line in def broadcast_inputs(x, y) triggers a tensor storage copy, which overflowed CUDA memory when I ran a small bundle adjustment dataset with 31843 pixel observations. Both reshape and contiguous can trigger a copy. If broadcast_inputs can avoid the copy, we can avoid overflowing CUDA memory at this step.

x = x.expand(shape+(x.shape[-1],)).reshape(-1,x.shape[-1]).contiguous()
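
For reference, a minimal sketch of where the copy happens (assuming a CUDA device; B, M, D are hypothetical sizes, not the actual dataset, and this is not the real pypose call site): expand only creates a stride-0 view, while the subsequent reshape/contiguous materializes the full broadcast tensor, which is what drives the memory peak.

import torch

B, M, D = 31843, 4, 6                               # hypothetical sizes for illustration
x = torch.randn(B, 1, D, device="cuda")             # stand-in for one broadcast input
shape = (B, M)                                      # broadcast batch shape, as in broadcast_inputs

torch.cuda.reset_peak_memory_stats()
view = x.expand(shape + (x.shape[-1],))             # stride-0 view: allocates no new storage
after_expand = torch.cuda.max_memory_allocated()

flat = view.reshape(-1, x.shape[-1]).contiguous()   # reshaping the expanded view must copy
after_copy = torch.cuda.max_memory_allocated()

print(f"peak after expand: {after_expand / 2**20:.1f} MiB")
print(f"peak after reshape+contiguous: {after_copy / 2**20:.1f} MiB")

The second peak should be roughly B*M*D*4 bytes larger than the first, i.e. the size of the fully materialized broadcast.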


Improvements

Refactor broadcast_inputs so it does not force a storage copy through reshape and contiguous (one possible direction is sketched below).
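
One possible direction, sketched below under the assumption that downstream ops are element-wise and can consume broadcasted views directly (broadcast_inputs_lazy is a hypothetical variant, not the current pypose signature): return expanded views instead of a flattened, contiguous copy, so no storage is duplicated until an op actually needs it.

import torch

def broadcast_inputs_lazy(x, y):
    """Hypothetical copy-free variant: broadcast the batch dims as views only."""
    shape = torch.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    x = x.expand(shape + (x.shape[-1],))   # views share storage with the originals
    y = y.expand(shape + (y.shape[-1],))
    return x, y, shape

x = torch.randn(1, 7)
y = torch.randn(31843, 3)
xb, yb, shape = broadcast_inputs_lazy(x, y)
assert xb.data_ptr() == x.data_ptr()       # no copy happened

If a flat 2-D layout is still required later, the flatten could be pushed to the call site that already materializes an output, so the broadcast step itself never duplicates storage.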

Risks

TBD

Involved components

Optional: Intended side effects

TBD

Optional: Missing test coverage

TBD
