Skip to content

Conversation

@ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Oct 28, 2024

Add compile-time checking for literal format strings passed to strprintf and tfm::format to make sure the right number of format arguments are passed.

There is still no compile-time checking if non-literal std::string or bilingual_str format strings are passed, but this is improved in other PRs:

  • #31061 implements compile-time checking for translated strings
  • #31072 increases compile-time checking by using literal strings as format strings, instead of std::string and bilingual_str
  • #31149 may drop the std::string overload for strprintf to require compile-time checking

ryanofsky and others added 2 commits October 28, 2024 19:11
This is needed in the next commit to add compile-time checking to strprintf
calls, because bitcoin-cli.cpp uses dynamic width in many format strings.

This change is easiest to review ignoring whitespace.

Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
Co-authored-by: Hodlinator <[email protected]>
Co-authored-by: l0rinc <[email protected]>
@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 28, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31174.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko, l0rinc, hodlinator

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31061 (refactor: Check translatable format strings at compile-time by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. This looks less scary than expected. Left a nit to add more compile time checks, but this looks good either way.

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 1d16d6e -> e6086e0 (pr/tcheck.1 -> pr/tcheck.2, compare) addressing comments and making ConstEvalFormatString parsing stricter to reject incomplete specifiers.
Updated e6086e0 -> e53829d (pr/tcheck.2 -> pr/tcheck.3, compare) cleaning up whitespace and comments.

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK e53829d

Cleanest attempt at increased compile time validation of format so far. When reviewing #31149 I had the gnawing feeling that more complete format string support would have reduced the diff, but pushed it away for expediency (an earlier attempt at more complete support was attempted in #30999).

@hodlinator
Copy link
Contributor

hodlinator commented Oct 30, 2024

Rebased #30933 on top of this PR (as a separate branch for now) and the mismatches between our custom consteval checking and tinyformat are gone as far as our current tests go - rebased commit: 32d4cf3

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Oct 30, 2024

re: #31174 (comment)

mismatches between our custom consteval checking and tinyformat are gone as far as our current tests go

In case you do want a test with different behavior, I think you can use %n specifier which is not supported by tinyformat

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you do want a test with different behavior, I think you can use %n specifier which is not supported by tinyformat

Could document non-parity like so (unless you prefer I do it as part of #30933):

    // Non-parity
    int n{};
    BOOST_CHECK_EXCEPTION(tfm::format(std::string{"%n"}, n), tfm::format_error,
        HasReason{"tinyformat: %n conversion spec not supported"});
    ConstevalFormatString<1>::Detail_CheckNumFormatSpecifiers("%n");

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated e53829d -> ecc5cb9 (pr/tcheck.3 -> pr/tcheck.4, compare) with review suggestions.

re: #31174 (review)

Could document non-parity like so (unless you prefer I do it as part of #30933):

I think that change doesn't really fit into this PR, since this PR isn't checking type characters. But it does make sense as part of #30933, so would be good to add there and I'd be happy to review it.

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK ecc5cb9

Implemented my suggestions (except comment removal suggestion) + broke out parse_size() since my last review.

util_string_tests tests passed locally.

Left one comment, but nothing blocking.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some nit ideas for more tests, but this is good either way.

review ACK ecc5cb9 🕯

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: review ACK ecc5cb9a89c6b001df839675b23d8fc1f7ac69ba 🕯
+oFB4Q8dHdvzp6J/1Ir4akTLS5GbDLpGOeKvcRP31CsusrUqTTnwOMie2fGfDcGYiEyKkNN9JiriK4ne+GSICw==

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, simple approach, like it a lot!
I think we can simplify the validator a bit more - let me know what you think.

Also, not exactly sure why %n parity wasn't added like in https://github.com/bitcoin/bitcoin/pull/30999/files#diff-71badc1cc71ba46244f7841a088251bb294265f4fe9e662c0ad6b15be540eee4R69 (is it controversial or unnecessary or not useful)?

++it;
add_arg();
} else {
while ('0' <= *it && *it <= '9') ++it;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have two separate number "parsers" (one that keeps the result and one that throws it away), we might as well extract number parsing to a local lambda like you did with the other ones.

Diff
diff --git a/src/util/string.h b/src/util/string.h
--- a/src/util/string.h	(revision ecc5cb9a89c6b001df839675b23d8fc1f7ac69ba)
+++ b/src/util/string.h	(date 1731267170701)
@@ -45,14 +45,16 @@
                 continue;
             }
 
+            auto parse_number = [&] {
+                unsigned num{0};
+                for (; '0' <= *it && *it <= '9'; ++it) {
+                    num = num * 10 + (*it - '0');
+                }
+                return num;
+            };
+
             auto add_arg = [&] {
-                unsigned maybe_num{0};
-                while ('0' <= *it && *it <= '9') {
-                    maybe_num *= 10;
-                    maybe_num += *it - '0';
-                    ++it;
-                }
-
+                unsigned maybe_num = parse_number();
                 if (*it == '$') {
                     ++it;
                     // Positional specifier, like %8$s
@@ -75,7 +77,7 @@
                     ++it;
                     add_arg();
                 } else {
-                    while ('0' <= *it && *it <= '9') ++it;
+                    parse_number();
                 }
             };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31174 (comment)

we might as well extract number parsing to a local lambda like you did with the other ones.

This seems reasonable but I"m not sure it's clearer, and it does make the diff bigger replacing the maybe_num code that doesn't have to change currently. Happy to apply this suggestion if other reviews think it is a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hodlinator, @maflcko, what do you think?
I can ack without this as well, but I'd prefer reducing duplication.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think while ('0' <= *it && *it <= '9') ++it; is fine. It is pretty standard self-explanatory code. I don't think a one-line while loop needs to be de-duplicated. Also, I like that the diff is minimal as-is now.

add_arg();

// Consume flags.
while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++23 this could be a simple .contains, but even in C++20 we should be able to group the flags to something like:

Suggested change
while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it;
while ("#0- +"sv.find(*it) != std::string_view::npos) ++it;

(we could even extract the flag in which case we could get rid of the comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Time to return to C89? ;)

Suggested change
while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it;
while (strchr("#0- +", *it)) ++it;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that, but not sure how to make it work, I'm getting:

note: non-constexpr function 'strchr' cannot be used in a constant expression

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still rebooting 🧠 for this week, sorry for the noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31174 (comment)

I can apply the "#0- +"sv.find change if others like it, but to me it seems less readable and only a little shorter.

Copy link
Contributor

@l0rinc l0rinc Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be more readable and a bit shorter if we extract the flags as a variable and delete the comment stating the same - what do you think?

Copy link
Member

@maflcko maflcko Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to leave style-only nits to a follow-up, especially given that they will be temporary anyway until C++23. This pull request is basically ready for two weeks now, with more than 50 style-only or test-only comments. Unless there are any real issues or bugs with the code, and a foce-push needs to happen anyway, I don't really see the value of asking reviewers to go through more comments and code changes, some of which don't even compile.

// Unlike ConstevalFormatString this supports std::string for runtime string
// formatting without compile time checks.
template <unsigned num_params>
struct FormatStringCheck {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked and failures seem to be validated successfully from command line, but - unlike the previous versions - doesn't seem to be shown in the IDE... Weird :/

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated ecc5cb9 -> fe39acf (pr/tcheck.4 -> pr/tcheck.5, compare) with suggested changes. Thanks for the reviews and suggestions!

add_arg();

// Consume flags.
while (*it == '#' || *it == '0' || *it == '-' || *it == ' ' || *it == '+') ++it;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31174 (comment)

I can apply the "#0- +"sv.find change if others like it, but to me it seems less readable and only a little shorter.

++it;
add_arg();
} else {
while ('0' <= *it && *it <= '9') ++it;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31174 (comment)

we might as well extract number parsing to a local lambda like you did with the other ones.

This seems reasonable but I"m not sure it's clearer, and it does make the diff bigger replacing the maybe_num code that doesn't have to change currently. Happy to apply this suggestion if other reviews think it is a good idea.

@ryanofsky
Copy link
Contributor Author

re: #31174 (review)

Also, not exactly sure why %n parity wasn't added

Thanks I hadn't seen #30999, and it seems like that would be a reasonable thing to check for, though I think there is a case for keeping the code as simple as possible and not trying to reproduce tinyformat quirks here. But the reason for not making that change here is I don't think it's related to this PR, and I think it's generally better to make separate changes n separate PRs so they can be evaluated correctly and discussed more clearly.

@maflcko
Copy link
Member

maflcko commented Nov 13, 2024

re-ACK fe39acf 🕐

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK fe39acf88ff552bfc4a502c99774375b91824bb1 🕐
Gv99KVENxeKH2/4i5mCpvzXcxQ66+sVorM28bzE1QmucATBYm7QTTpcvRNaaGKsCRRtGOmk0QEigsfGUPldxCA==

@DrahtBot DrahtBot requested a review from hodlinator November 13, 2024 15:43
@l0rinc
Copy link
Contributor

l0rinc commented Nov 13, 2024

Thanks for improving developer productivity with these small changes <3

ACK fe39acf

Comment on lines +71 to +72
PassFmt<3>("'%- 0+*.*f'");
PassFmt<3>("'%1$- 0+*3$.*2$f'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the extra surrounding ' deliberate? If so, what do they mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31174 (comment)

I'm pretty sure they must be accidental. These cases came from #31174 (comment), and I just pasted them without noticing the single quotes. Can remove if the PR is updated again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ' are used to denote the begin and the end of the string, which would otherwise not be possible, because trailing spaces can normally not be seen when printing. They are not needed in this test and they are a leftover when I tested this against tinyformat at runtime.

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK fe39acf

git range-diff master ecc5cb9 fe39acf

  • Added more FailFmtWithError-tests (maflcko).
  • Terser skipping of %% (l0rinc).
  • Comment regarding format string components updated (me).

@fanquake fanquake merged commit f44e39c into bitcoin:master Nov 14, 2024
16 checks passed
@bitcoin bitcoin locked and limited conversation to collaborators Nov 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants