Grammar based sampling for inference #5205
shroominic started this conversation in Ideas · 0 replies
To support features like function calling or structured output, we need a way to constrain generation with a grammar. This process should run on the CPU alongside inference, while the GPU is generating tokens. For every new token, the grammar must be re-evaluated so the logit biases can be updated to filter out tokens the grammar disallows.
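The core per-token step described above can be sketched as follows. This is a minimal illustration of logit masking, assuming a hypothetical `allowed_ids` set produced by the grammar engine for the current parse state; the function names are illustrative, not an existing API:

```python
import math

def apply_grammar_mask(logits, allowed_ids):
    """Set the logit of every grammar-disallowed token to -inf.

    `logits` is one float per vocabulary token; `allowed_ids` is the
    set of token ids the grammar accepts in its current state.
    (Illustrative sketch, not a real library's signature.)
    """
    return [l if i in allowed_ids else -math.inf
            for i, l in enumerate(logits)]

def greedy_pick(logits):
    """Return the id of the highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Example: 5-token vocabulary; the grammar only allows tokens 1 and 3 next.
logits = [2.0, 0.5, 3.0, 1.0, -1.0]
masked = apply_grammar_mask(logits, {1, 3})
print(greedy_pick(masked))  # picks token 3, the best among allowed tokens
```

After sampling a token this way, the grammar state would be advanced by that token and a new `allowed_ids` set computed, which is the re-evaluation step that has to happen once per generated token.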