what's the expected output of concating two sharded tensors that share devices? even better is to add it as a failed test in test_multitensor.py. then we can discuss how to make it work as expected
I am making this writeup to show some of my findings and get some feedback.
What
The following is not supported by tinygrad:
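Something along these lines, a minimal sketch of the failing pattern (the original snippet isn't reproduced here, so the shapes and device names below are my assumptions):

```python
from tinygrad import Tensor

GPUS = ("GPU:0", "GPU:1")  # assumed device names; any two devices show the issue

# shard both tensors across the same devices along axis 0
a = Tensor.rand(2, 4).shard(GPUS, axis=0)
b = Tensor.rand(2, 4).shard(GPUS, axis=0)

# cat along the sharded axis -> hits the pad assert described below
c = a.cat(b, dim=0)
```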
This is due to `Tensor.cat` not working along the sharded axis (0). It tries to pad `a` and `b`, but fails an assert that blocks padding on the sharded axis.
Why
When performing inference with Stable Diffusion, you go through a process called Classifier-Free Guidance. The TLDR is that you need to run 2 separate samples through the model (for each denoising step): one conditioned on the text prompt and the other conditioned on an empty prompt.
To make this more efficient, most implementations will cat the two samples together and run them through the model in a single call, before chunking the output like so:
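Roughly like this (a sketch, not the actual SDXL code; `model`, `cond`, `uncond`, and `guidance_scale` are placeholder names):

```python
from tinygrad import Tensor

def cfg_step(model, x: Tensor, cond: Tensor, uncond: Tensor, guidance_scale: float) -> Tensor:
  # batch the conditioned and unconditioned samples into one model call
  latent = x.cat(x, dim=0)
  ctx = cond.cat(uncond, dim=0)
  out_cond, out_uncond = model(latent, ctx).chunk(2, dim=0)
  # classifier-free guidance: push the prediction away from the unconditioned output
  return out_uncond + guidance_scale * (out_cond - out_uncond)
```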
This is how I implemented the tinygrad version of SDXL, and it has worked fine until I went to run it on multiple GPUs. Since the `Tensor.cat` occurs along the batch dim, it runs into the issue above because that is also the shard axis.
I originally tried taking the lazy approach and just splitting it into 2 separate calls of the model. This does work, but George pushed me not to just hack things and to search for a proper solution.
How
Chenyu suggested an approach where the lazybuffers and devices are concatenated when one attempts to cat 2 `MultiLazyBuffer`s. This is doable, but ends up looking quite ugly with all of the checks needed.
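To make the intended semantics concrete, here is a toy model of the idea (this is not tinygrad code, just plain Python standing in for a `MultiLazyBuffer`'s shards/devices/axis):

```python
from dataclasses import dataclass

@dataclass
class ToyMulti:
  lbs: list      # per-device shards (stand-ins for the LazyBuffers)
  device: tuple  # device names, one per shard
  axis: int      # the shard axis

def toy_cat(a: ToyMulti, b: ToyMulti, dim: int) -> ToyMulti:
  # catting along the shard axis needs no data movement: the result is just
  # both shard lists (and both device tuples) joined end to end
  assert a.axis == b.axis == dim
  return ToyMulti(a.lbs + b.lbs, a.device + b.device, dim)

a = ToyMulti(["a0", "a1"], ("GPU:0", "GPU:1"), 0)
b = ToyMulti(["b0", "b1"], ("GPU:0", "GPU:1"), 0)
print(toy_cat(a, b, 0).device)  # ('GPU:0', 'GPU:1', 'GPU:0', 'GPU:1')
```

Note how the devices repeat in the result; that ties back to chenyu's question above about tensors that share devices, and is part of why the real version needs so many checks.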
This also only solves half of the problem, and the same needs to be implemented for `Tensor.shrink` for when one wants to chunk.
I took a look at using `MultiLazyBuffer.real`, but this just feels like a cheap hack that would cause more problems.
One could add a similar-looking `if` statement to the one above into `Tensor.shrink` that checks for a `MultiLazyBuffer` with devices along the bounds and returns a subset of the current devices and lazybuffers.
I also have not looked into the implications this might have on the backwards pass.
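The chunk/shrink half of the problem in the same toy model (again, not tinygrad code; it assumes the shrink bounds land exactly on shard boundaries, and reuses `ToyMulti`/`toy_cat` from the sketch above):

```python
def toy_shrink(t: ToyMulti, start_shard: int, end_shard: int) -> ToyMulti:
  # shrinking along the shard axis at shard boundaries just keeps a contiguous
  # subset of the shards and their matching devices
  return ToyMulti(t.lbs[start_shard:end_shard], t.device[start_shard:end_shard], t.axis)

both = toy_cat(a, b, 0)              # from the sketch above
out_cond = toy_shrink(both, 0, 2)    # first half -> back on ('GPU:0', 'GPU:1')
out_uncond = toy_shrink(both, 2, 4)  # second half
```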
Feedback