-
Notifications
You must be signed in to change notification settings - Fork 38.4k
tinyformat: Add compile-time checking for literal format strings #31174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is needed in the next commit to add compile-time checking to strprintf
calls, because bitcoin-cli.cpp uses dynamic width in many format strings.
This change is easiest to review ignoring whitespace.
Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
Co-authored-by: Hodlinator <[email protected]>
Co-authored-by: l0rinc <[email protected]>
Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31174. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. This looks less scary than expected. Left a nit to add more compile time checks, but this looks good either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 1d16d6e -> e6086e0 (pr/tcheck.1 -> pr/tcheck.2, compare) addressing comments and making ConstEvalFormatString parsing stricter to reject incomplete specifiers.
Updated e6086e0 -> e53829d (pr/tcheck.2 -> pr/tcheck.3, compare) cleaning up whitespace and comments.
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK e53829d
Cleanest attempt at increased compile time validation of format so far. When reviewing #31149 I had the gnawing feeling that more complete format string support would have reduced the diff, but pushed it away for expediency (an earlier attempt at more complete support was attempted in #30999).
|
re: #31174 (comment)
In case you do want a test with different behavior, I think you can use %n specifier which is not supported by tinyformat |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you do want a test with different behavior, I think you can use %n specifier which is not supported by tinyformat
Could document non-parity like so (unless you prefer I do it as part of #30933):
// Non-parity
int n{};
BOOST_CHECK_EXCEPTION(tfm::format(std::string{"%n"}, n), tfm::format_error,
HasReason{"tinyformat: %n conversion spec not supported"});
ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%n");There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated e53829d -> ecc5cb9 (pr/tcheck.3 -> pr/tcheck.4, compare) with review suggestions.
re: #31174 (review)
Could document non-parity like so (unless you prefer I do it as part of #30933):
I think that change doesn't really fit into this PR, since this PR isn't checking type characters. But it does make sense as part of #30933, so would be good to add there and I'd be happy to review it.
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK ecc5cb9
Implemented my suggestions (except comment removal suggestion) + broke out parse_size() since my last review.
util_string_tests tests passed locally.
Left one comment, but nothing blocking.
maflcko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some nit ideas for more tests, but this is good either way.
review ACK ecc5cb9 🕯
Show signature
Signature:
untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: review ACK ecc5cb9a89c6b001df839675b23d8fc1f7ac69ba 🕯
+oFB4Q8dHdvzp6J/1Ir4akTLS5GbDLpGOeKvcRP31CsusrUqTTnwOMie2fGfDcGYiEyKkNN9JiriK4ne+GSICw==
l0rinc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, simple approach, like it a lot!
I think we can simplify the validator a bit more - let me know what you think.
Also, not exactly sure why %n parity wasn't added like in https://github.com/bitcoin/bitcoin/pull/30999/files#diff-71badc1cc71ba46244f7841a088251bb294265f4fe9e662c0ad6b15be540eee4R69 (is it controversial or unnecessary or not useful)?
| ++it; | ||
| add_arg(); | ||
| } else { | ||
| while ('0' <= *it && *it <= '9') ++it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we have two separate number "parsers" (one that keeps the result and one that throws it away), we might as well extract number parsing to a local lambda like you did with the other ones.
Diff
diff --git a/src/util/string.h b/src/util/string.h
--- a/src/util/string.h (revision ecc5cb9a89c6b001df839675b23d8fc1f7ac69ba)
+++ b/src/util/string.h (date 1731267170701)
@@ -45,14 +45,16 @@
continue;
}
+ auto parse_number = [&] {
+ unsigned num{0};
+ for (; '0' <= *it && *it <= '9'; ++it) {
+ num = num * 10 + (*it - '0');
+ }
+ return num;
+ };
+
auto add_arg = [&] {
- unsigned maybe_num{0};
- while ('0' <= *it && *it <= '9') {
- maybe_num *= 10;
- maybe_num += *it - '0';
- ++it;
- }
-
+ unsigned maybe_num = parse_number();
if (*it == '$') {
++it;
// Positional specifier, like %8$s
@@ -75,7 +77,7 @@
++it;
add_arg();
} else {
- while ('0' <= *it && *it <= '9') ++it;
+ parse_number();
}
};There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #31174 (comment)
we might as well extract number parsing to a local lambda like you did with the other ones.
This seems reasonable but I"m not sure it's clearer, and it does make the diff bigger replacing the maybe_num code that doesn't have to change currently. Happy to apply this suggestion if other reviews think it is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hodlinator, @maflcko, what do you think?
I can ack without this as well, but I'd prefer reducing duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think while ('0' <= *it && *it <= '9') ++it; is fine. It is pretty standard self-explanatory code. I don't think a one-line while loop needs to be de-duplicated. Also, I like that the diff is minimal as-is now.
| add_arg(); | ||
|
|
||
| // Consume flags. | ||
| while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C++23 this could be a simple .contains, but even in C++20 we should be able to group the flags to something like:
| while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it; | |
| while ("#0- +"sv.find(*it) != std::string_view::npos) ++it; |
(we could even extract the flag in which case we could get rid of the comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time to return to C89? ;)
| while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it; | |
| while (strchr("#0- +", *it)) ++it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of that, but not sure how to make it work, I'm getting:
note: non-constexpr function 'strchr' cannot be used in a constant expression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still rebooting 🧠 for this week, sorry for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #31174 (comment)
I can apply the "#0- +"sv.find change if others like it, but to me it seems less readable and only a little shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be more readable and a bit shorter if we extract the flags as a variable and delete the comment stating the same - what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to leave style-only nits to a follow-up, especially given that they will be temporary anyway until C++23. This pull request is basically ready for two weeks now, with more than 50 style-only or test-only comments. Unless there are any real issues or bugs with the code, and a foce-push needs to happen anyway, I don't really see the value of asking reviewers to go through more comments and code changes, some of which don't even compile.
| // Unlike ConstevalFormatString this supports std::string for runtime string | ||
| // formatting without compile time checks. | ||
| template <unsigned num_params> | ||
| struct FormatStringCheck { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked and failures seem to be validated successfully from command line, but - unlike the previous versions - doesn't seem to be shown in the IDE... Weird :/
ryanofsky
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated ecc5cb9 -> fe39acf (pr/tcheck.4 -> pr/tcheck.5, compare) with suggested changes. Thanks for the reviews and suggestions!
| add_arg(); | ||
|
|
||
| // Consume flags. | ||
| while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #31174 (comment)
I can apply the "#0- +"sv.find change if others like it, but to me it seems less readable and only a little shorter.
| ++it; | ||
| add_arg(); | ||
| } else { | ||
| while ('0' <= *it && *it <= '9') ++it; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #31174 (comment)
we might as well extract number parsing to a local lambda like you did with the other ones.
This seems reasonable but I"m not sure it's clearer, and it does make the diff bigger replacing the maybe_num code that doesn't have to change currently. Happy to apply this suggestion if other reviews think it is a good idea.
|
re: #31174 (review)
Thanks I hadn't seen #30999, and it seems like that would be a reasonable thing to check for, though I think there is a case for keeping the code as simple as possible and not trying to reproduce tinyformat quirks here. But the reason for not making that change here is I don't think it's related to this PR, and I think it's generally better to make separate changes n separate PRs so they can be evaluated correctly and discussed more clearly. |
|
re-ACK fe39acf 🕐 Show signatureSignature: |
|
Thanks for improving developer productivity with these small changes <3 ACK fe39acf |
| PassFmt<3>("'%- 0+*.*f'"); | ||
| PassFmt<3>("'%1$- 0+*3$.*2$f'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the extra surrounding ' deliberate? If so, what do they mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #31174 (comment)
I'm pretty sure they must be accidental. These cases came from #31174 (comment), and I just pasted them without noticing the single quotes. Can remove if the PR is updated again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ' are used to denote the begin and the end of the string, which would otherwise not be possible, because trailing spaces can normally not be seen when printing. They are not needed in this test and they are a leftover when I tested this against tinyformat at runtime.
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add compile-time checking for literal format strings passed to
strprintfandtfm::formatto make sure the right number of format arguments are passed.There is still no compile-time checking if non-literal
std::stringorbilingual_strformat strings are passed, but this is improved in other PRs:std::stringandbilingual_strstd::stringoverload forstrprintfto require compile-time checking