Grammar based sampling for inference #5205
shroominic started this conversation in Ideas · 0 replies
To support features like function calling or structured output, we need a way to constrain generation with a grammar. This process should run on the CPU alongside inference, while the GPU is generating tokens. For every new token, the grammar must be re-evaluated so the logit biases can be updated to filter out tokens the grammar disallows.
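The core per-token step described above can be sketched as follows. This is a minimal illustration of logit masking, assuming a hypothetical `allowed_ids` set produced by the grammar engine for the current parse state; the function names are illustrative, not an existing API:

```python
import math

def apply_grammar_mask(logits, allowed_ids):
    """Set the logit of every grammar-disallowed token to -inf.

    `logits` is one float per vocabulary token; `allowed_ids` is the
    set of token ids the grammar accepts in its current state.
    (Illustrative sketch, not a real library's signature.)
    """
    return [l if i in allowed_ids else -math.inf
            for i, l in enumerate(logits)]

def greedy_pick(logits):
    """Return the id of the highest-logit token."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Example: 5-token vocabulary; the grammar only allows tokens 1 and 3 next.
logits = [2.0, 0.5, 3.0, 1.0, -1.0]
masked = apply_grammar_mask(logits, {1, 3})
print(greedy_pick(masked))  # picks token 3, the best among allowed tokens
```

After sampling a token this way, the grammar state would be advanced by that token and a new `allowed_ids` set computed, which is the re-evaluation step that has to happen once per generated token.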