Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tinyformat: enforce compile-time checks for format string literals #31149

Conversation

stickies-v
Copy link
Contributor

@stickies-v stickies-v commented Oct 24, 2024

Carved out of #30928 - this PR doesn't include the controversial change of suppressing tinyformat::format_error throwing behaviour, and just adds enforcement of compile-time checks for format string literals.

Reduce unexpected run-time crashes from string formatting by enforcing compile-time format string checks for most* tfm::format and strprintf usage.

This PR should introduce no behaviour change.
The main changes are:

  • remove the const std::string& tfm::format overload since it's not necessary. Usage of this overload is removed by one of:
    • replacing string concatenation in the format string with just an extra parameter
    • using the bilingual_str overload
    • using the new tfm::format_raw functions (only for tinyformat implementation or tests)
  • rename the non-compile-time-validated tfm::format overloads to tfm::format_raw, so existing callsites by default use the safer util::ConstevalFormatString tfm::format overloads. Callsites that for some reason don't pass the compile-time checks (such as in bitcoin-cli.cpp) can use tfm::format_raw.

See #31149 (comment) for an example on how this PR can prevent a run-time error for a small format string mistake.

* See #31061 for bilingual_str format(const bilingual_str& fmt, const Args&... args) compile-time checks.

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 24, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK l0rinc
Stale ACK maflcko, TheCharlatan, hodlinator

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31174 (tinyformat: Add compile-time checking for literal format strings by ryanofsky)
  • #30930 (netinfo: add peer services column and outbound-only option by jonatack)
  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@maflcko
Copy link
Member

maflcko commented Oct 24, 2024

Only change since previous review (#30928 (review)) is dropping one commit.

re-ACK 4cf1221 🔡

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 4cf12216951e91550c81db807fe0ecfafe6834c9 🔡
+Wc7LAsLFEv1ml5wXBpRIhY/WbQ/TIxWPVXqF9OJkPsRA7bKmHlMBit5JhDzclgMU5hAfEPaV/gka44cL1CtBQ==

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 4cf1221

Only dropped commit I had concerns about. l0rinc too.

Does not entirely prevent throwing of exceptions from incorrect format strings (notably translated strings), but is a step in the right direction (validating more format strings at compile time).

@stickies-v stickies-v force-pushed the 2024-10/remove-string-format-overload branch from 4cf1221 to 23560b4 Compare October 25, 2024 11:31
Copy link
Contributor Author

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased to address merge-conflict from #29936. Also added commit 99b12b5 to minimize c_str() usage (an earlier review concern of @ryanofsky) and minimize code churn, as suggested by @maflcko.

src/test/fuzz/strprintf.cpp Outdated Show resolved Hide resolved
@maflcko
Copy link
Member

maflcko commented Oct 25, 2024

re-ACK 23560b4 🎛

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 23560b468c474bbc6a3abd6c0778da62766ac8f7 🎛
Ao1JTeGWaZ/P4YeyDzYeC0BXiuu/VqnjIy0SbF98ltv2ZirS7plmO4GDfWjCbWSTfMYGkSpCvhNTReEu8OBaAg==

@DrahtBot DrahtBot requested a review from hodlinator October 25, 2024 11:39
result += tfm::format_raw("<-> type net v mping ping send recv txn blk hb %*s%*s%*s ",
m_max_addr_processed_length, "addrp",
m_max_addr_rate_limited_length, "addrl",
m_max_age_length, "age");
Copy link
Member

@jonatack jonatack Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is all the indentation churn really necessary?

Does changing strprintf to tfm_format_raw here catch any potential issues that wouldn't be seen in manual testing?

Reduce unexpected run-time crashes from string formatting

Can you provide an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is all the indentation churn really necessary?

It's clang-format output which I believe is what we're trying to adhere to.

Does changing strprintf to tfm_format_raw here catch any potential issues that wouldn't be seen in manual testing?

The code does not compile with strprintf because this PR enforces compile-time checks, which fail because of the width specifiers in the format string (there's quite some discussion on these limitations in #30546). See also this excerpt from the PR description that addresses this:

  • rename the non-compile-time-validated tfm::format overloads to tfm::format_raw, so existing callsites by default use the safer util::ConstevalFormatString tfm::format overloads. Callsites that for some reason don't pass the compile-time checks (such as in bitcoin-cli.cpp) can use tfm::format_raw.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-format can be helpful for new code, but I don't think it's necessary to re-indent code for it that isn't otherwise touched -- that's just noisy churn.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide an example of how this code could crash currently without this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-format can be helpful for new code, but I don't think it's necessary to re-indent code for it that isn't otherwise touched -- that's just noisy churn.

I'm just going with our clang-format settings as defined in src/.clang-format - if you feel like those settings are suboptimal opening a separate pull/issue for that is probably more appropriate?

Please provide an example of how this code could crash currently without this change?

The PR states that there is no behaviour change, so (unless there are run-time bugs we haven't seen yet) we can't make this code crash without this change. The point of this PR is to make the code more robust against future changes, where a new/modified format string can cause a run-time error, e.g. if not all code-paths are manually tested by reviewers (which is not unusual).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was no bikeshedding about this before on #31149, it just started.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By manual testing do you mean automated testing of print statements?

No, manual testing by building and running the CLI commands.

There is no automated test coverage of -netinfo (the touched code above), nor, for instance, of -addrinfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By doing some of the validation at compile time, there is now some automated 'test coverage' on it - i.e. it will bite back if it's very wrong without any additional manual check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally unneeded changes can be avoided, but that was just a suggestion. Further historical context to get started:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful context, thanks!
We must however keep our kitchen clean, we can't just be cooking all the time.
This change by @stickies-v was a small, low-risk change, making the code slightly more maintainable (or at least slightly less unmaintainable).
We should encourage these.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, ACK 23560b4

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 23560b4

Changes since previous ACK:

stickies-v and others added 4 commits October 28, 2024 18:54
In future commits, the non-ConstevalFormatString tfm::format
overloads will be renamed or removed. To avoid all callsites
having to include util/string.h (or to have a circular dependency
between tinyformat.h and util/string.h), just move the
ConstevalFormatString overloads to tinyformat.h
In a future commit, the tfm::format std::string overload will
be removed to enforce usage of a compile-time checked overload.

To minimize code churn in future commits, add a test-only
std::string tfm::format helper function.

Co-authored-by: MarcoFalke <*~=`'#}+{/-|&$^[email protected]>
All (non-test) format strings are known at compile-time, so we
don't need the std::string overload. Remove it so users use the
safer util::ConstevalFormatString where possible, or the
bilingual_str overload where applicable.
Callsites such as bitcoin-cli that (incorrectly) fail compile-time
validation can still use tfm::format_raw.
@stickies-v stickies-v force-pushed the 2024-10/remove-string-format-overload branch from 23560b4 to b448d60 Compare October 28, 2024 19:00
@stickies-v
Copy link
Contributor Author

stickies-v commented Oct 28, 2024

Force-pushed to rebase to address merge conflict from #31042, no other changes.

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to #30928 I see the following changes:

  • clientversion.cpp: Merge conflict in clientversion at CopyrightHolders -> seems to be resolved correctly
  • logging.h: old LogPrintFormatInternal is retained
  • run-lint-format-strings.py: FALSE_POSITIVES not removed
  • util_string_tests.cpp: FALSE_POSITIVES not moved to tests
  • strprintf.cpp: fuzz_fmt added to avoid renames, try-catch added around fuzz_fmt calls (any reason for not try-catching inside fuzz_fmt instead?)
  • tinyformat.h: try-catch removed from vformat
  • wallet.h: new strprintf migrated

ACK b448d60

@ryanofsky
Copy link
Contributor

Sorry for causing unnecessary churn with my earlier comments. After reading Marco's comment in #31072 (comment) I finally understand what this PR is really trying to do, and why all the changes are being made here. Before reading that comment I thought this PR was trying not just to add compile-time checking for tinyformat format strings, but also to force calls that don't use compile-time checking to use different formatting functions like format_raw and fuzz_fmt. But now I see goal of this PR really is to enable compile-time checking, and the other changes here are basically just a side-effect of the approach taken to implement it.

That being said, I think the approach in this PR is not ideal. I think introduction of _raw() functions and uses of c_str() are ugly and unnecessary, and even though there are good changes here like code cleanup in various places, those changes should not be necessary just to have compile-time checking of literal format strings, and bundled in with less positive changes here.

I implemented an different approach in #31174 which only adds compile-time checking for strprintf/tfm::format string literals, and doesn't make other changes. If that PR looks ok, maybe we could use that approach, and then sort out which other changes here are actually improvements on their own terms? I would much prefer that to bundling good, bad, and ugly changes together in one PR.

@stickies-v
Copy link
Contributor Author

Closing this PR in favour of #31174, which I think achieves mostly the same goal but does so in a much more elegant way. Thanks for your review and suggestion @ryanofsky, and everyone else here for the persistent re-review as this work is evolving.

and the other changes here are basically just a side-effect of the approach taken to implement it.

I think there is merit in making the less safe (i.e. std::string overload) less convenient to use so that developers who are less familiar with this code don't accidentally use it when they don't have to, but I'll leave a comment on your PR so we can have the conversation there.

@stickies-v stickies-v closed this Oct 29, 2024
@ryanofsky
Copy link
Contributor

I think there is merit in making the less safe (i.e. std::string overload) less convenient to use so that developers who are less familiar with this code don't accidentally use it when they don't have to, but I'll leave a comment on your PR so we can have the conversation there.

I agree with this, and commit b6a39c8 in this PR shows places where code is doing things like strprintf("[" + comment + "]") that get around compile time checking. It'd be great to keep the changes in this PR which clean up that code, and add a more explicit syntax for passing unchecked strings. I think I would prefer a syntax more like strprintf(Unchecked(desc_fmt), ...) instead of tfm::format_raw(desc_fmt.c_str(), ...), but I mainly just think it is best to make these changes separately so PRs are easier to understand and don't touch so much code.

@maflcko
Copy link
Member

maflcko commented Nov 8, 2024

I agree that the changes here are still useful. It would be good to rebase on top of #31174, maybe after it is merged.

fanquake added a commit that referenced this pull request Nov 14, 2024
…t strings

fe39acf tinyformat: Add compile-time checking for literal format strings (Ryan Ofsky)
184f34f util: Support dynamic width & precision in ConstevalFormatString (Ryan Ofsky)

Pull request description:

  Add compile-time checking for literal format strings passed to `strprintf` and `tfm::format` to make sure the right number of format arguments are passed.

  There is still no compile-time checking if non-literal `std::string` or `bilingual_str` format strings are passed, but this is improved in other PRs:

  - [#31061](#31061) implements compile-time checking for translated strings
  - [#31072](#31072) increases compile-time checking by using literal strings as format strings, instead of `std::string` and `bilingual_str`
  - [#31149](#31149) may drop the `std::string`  overload for `strprintf` to [require](#31149 (comment)) compile-time checking

ACKs for top commit:
  maflcko:
    re-ACK fe39acf 🕐
  l0rinc:
    ACK fe39acf
  hodlinator:
    re-ACK fe39acf

Tree-SHA512: f1ddef0c96b9468c5ffe31b42dc19f1922583dd14f2e180b618d992c98614c5cc7db9f9cd917ef503f833bbc7dbec78e4489d0035416dce6840837e1d66d87cb
fanquake added a commit that referenced this pull request Dec 6, 2024
…ages

0184d33 scripted-diff: Replace strprintf(Untranslated) with Untranslated(strprintf) (Ryan Ofsky)
006e4d1 refactor: Use + instead of strformat to concatenate translated & untranslated strings (Ryan Ofsky)
831d2bf refactor: Don't embed translated string in untranslated string. (Ryan Ofsky)
0580219 refactor: Avoid concatenation of format strings (Ryan Ofsky)

Pull request description:

  This PR cleans up string formatting in the codebase so other PRs adding compile time checking can be simpler and easier to review (specifically #30928, #31061, #31074, and #31149).

  Currently these PRs are hard to review because in addition to changing formatting APIs, they have to update callers that are using the API's in unusual ways. Clean up these callers now so later PRs can be simpler. Specifically:

  - Use string literals instead of `std::string` format strings to enable more compile-time checking.
  - Avoid using untranslated bilingual strings as format strings. Use originals so they can by checked at compile time.
  - Favor `Untranslated(strprintf(...))` over `strprintf(Untranslated(...), ...)` for consistency and to prevent translated and untranslated strings from being unintentionally combined.

ACKs for top commit:
  maflcko:
    lgtm ACK 0184d33 🔹
  l0rinc:
    ACK 0184d33 - no overall difference because of the rebase

Tree-SHA512: 37eb771187d30977f5f054eddb82af6dd69878ace89cede72032bb389e57200898119f9fe486ce6903ebb00cb24648be215ab3e44842b3d206b35e26038da750
@maflcko
Copy link
Member

maflcko commented Dec 7, 2024

I think I would prefer a syntax more like strprintf(Unchecked(desc_fmt), ...) instead of tfm::format_raw(desc_fmt.c_str(), ...)

Due to inactivity here, I went ahead and implemented this syntax in #31061, with a different name.

So I think everything here has been picked up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants