Confusing error message #34658
Indeed! cc @gante, @zucchini-nlp
Gentle ping @gante @zucchini-nlp
@jpiabrantes sorry for the late reply. Do you mean there are cases when you want to generate from an input that ends with an eos token? The warning is mostly advice for beginners who try to generate, since generating with right padding might result in gibberish or lower-quality text. So we point out the best practices to those who don't have much experience with generation. If you already have the padding set on the correct side, you can ignore the warning.
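The "right padding might result in gibberish" point can be illustrated with a minimal sketch using hand-built token id lists (not the transformers API; the pad id here is hypothetical):

```python
PAD = 0  # hypothetical pad token id
prompts = [[5, 6, 7], [8, 9]]
max_len = max(len(p) for p in prompts)

# Right padding puts PAD tokens after the prompt, so a decoder-only
# model would continue generating from a pad token.
right_padded = [p + [PAD] * (max_len - len(p)) for p in prompts]

# Left padding keeps the real prompt tokens adjacent to the tokens
# generated next.
left_padded = [[PAD] * (max_len - len(p)) + p for p in prompts]

print(right_padded)  # [[5, 6, 7], [8, 9, 0]]
print(left_padded)   # [[5, 6, 7], [0, 8, 9]]
```

With right padding, the model's "next token" prediction for the shorter prompt conditions on trailing pad tokens, which is why left padding is the recommended setting for decoder-only generation.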
When generating in batches, some generations will reach eos faster than others, but the generation carries on.
However, most importantly, I think the error message should reflect what is actually being checked, which is not the padding side.
@jpiabrantes yep, we could check the tokenizer's attribute directly, but since the tokenizer is not a compulsory kwarg when calling generate(), we opted to check the inputs. What I can think of now is to change the warning level in warnings.warn so that it is suppressible.
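One way the "suppressible warning" idea could look, as a minimal sketch (the warning category, function name, and message here are hypothetical, not the transformers implementation):

```python
import warnings

class GenerationPaddingWarning(UserWarning):
    """Hypothetical dedicated category so users can filter the warning."""

def maybe_warn(last_token_id, pad_token_id):
    # The heuristic used in place of checking tokenizer.padding_side:
    # warn when the batch's last token looks like a pad token.
    if last_token_id == pad_token_id:
        warnings.warn(
            "Right padding may have been detected with a decoder-only "
            "architecture; generation quality can suffer.",
            GenerationPaddingWarning,
        )

# Users who know their padding is already correct can silence it:
warnings.filterwarnings("ignore", category=GenerationPaddingWarning)
```

Emitting through a dedicated category (rather than a plain logger call) lets users opt out with the standard warnings filter machinery.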
True, but we don't try to generate from an EOS token, right? When we just pass inputs to the generate() method, the tokenizer doesn't add any EOS, so the text can be continued by an LLM.
if "tokenizer" in kwargs and kwargs["tokenizer"].padding_side == "right":
    logger.warning("Right padding side is being used with a decoder-only architecture")
Yes, that's an option in case users decide to pass the tokenizer. The initial idea was to allow the tokenizer as an arg in special cases, like when we have
transformers/src/transformers/generation/utils.py, line 2022 (commit a06a0d1)
The error message says:
But the code does not check whether tokenizer.padding_side == "left". Instead, the code checks whether the last token id is a padding token, which is often the case when people set tokenizer.pad_token_id = tokenizer.eos_token_id.
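The mismatch can be shown with a small sketch of the kind of check described (a hypothetical helper over plain lists, not the actual utils.py code):

```python
def looks_right_padded(input_ids, pad_token_id):
    # What the code checks: the last token id of each row,
    # not tokenizer.padding_side.
    return any(row[-1] == pad_token_id for row in input_ids)

# When pad_token_id = eos_token_id, a prompt that simply ends with EOS
# trips the check even though padding_side is "left".
eos = 2
print(looks_right_padded([[5, 6, eos]], pad_token_id=eos))  # True
print(looks_right_padded([[5, 6, 7]], pad_token_id=eos))    # False
```

This is why a message phrased in terms of padding_side is confusing: the condition that actually fires is about the trailing token id.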