Add memory bandwidth utilization metric

A key indicator of whether the LLM inference server is performant is its memory bandwidth utilization. This is a function of the throughput and the total GPU/accelerator HBM bandwidth. The calculation is taken from this PyTorch blog post: https://pytorch.org/blog/accelerating-generative-ai-2/#step-2-alleviating-memory-bandwidth-bottleneck-through-int8-weight-only-quantization-1574-toks
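
To make the proposed metric concrete, here is a rough sketch of how it could be computed. It assumes the common approximation that every generated token reads all model weights from HBM once, so achieved bandwidth is roughly model size in bytes times tokens per second. It ignores KV-cache and activation traffic, and with batching the throughput would need to be counted per decode step rather than summed across requests. The function name, parameter names, and example numbers are all illustrative and not part of any existing API.

```python
def memory_bandwidth_utilization(
    param_count: int,
    bytes_per_param: float,
    tokens_per_second: float,
    peak_bandwidth_bytes_per_s: float,
) -> float:
    """Estimate memory bandwidth utilization as a fraction of peak.

    Assumption: each generated token requires reading every weight once,
    so achieved bandwidth ~= model size in bytes * tokens per second.
    """
    if peak_bandwidth_bytes_per_s <= 0:
        raise ValueError("peak bandwidth must be positive")
    model_bytes = param_count * bytes_per_param
    achieved_bytes_per_s = model_bytes * tokens_per_second
    return achieved_bytes_per_s / peak_bandwidth_bytes_per_s


# Hypothetical numbers for illustration only: a 7B-parameter model in
# 16-bit precision, 100 tokens/s, on hardware with 2 TB/s peak HBM bandwidth.
mbu = memory_bandwidth_utilization(
    param_count=7_000_000_000,
    bytes_per_param=2,
    tokens_per_second=100.0,
    peak_bandwidth_bytes_per_s=2e12,
)
print(f"Memory bandwidth utilization: {mbu:.1%}")
```

The peak bandwidth would have to come from the accelerator's published specification or a user-supplied setting, since the server cannot infer it from throughput alone.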


Add memory bandwidth utilization metric #31
