Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tinyformat: refactor: increase compile-time checks and don't throw for tfm::format_error #30928

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

stickies-v
Copy link
Contributor

@stickies-v stickies-v commented Sep 19, 2024

Avoid unexpected run-time crashes from string formatting by enforcing compile-time format string checks for most* tfm::format and strprintf usage, and by returning an error string instead of throwing for a run-time tfm::format_error so callsites don't need to implement this error handling.

This PR should introduce no behaviour change.
The main changes are:

  • remove the const std::string& tfm::format overload since it's not necessary. Usage of this overload is removed by one of:
    • replacing string concatenation in the format string with just an extra parameter
    • using the bilingual_str overload
    • using the tfm::format_raw functions (only for tinyformat implementation or tests)
  • rename the non-compile-time-validated tfm::format overloads to tfm::format_raw, so existing callsites by default use the safer util::ConstevalFormatString tfm::format overloads. Callsites that for some reason don't pass the compile-time checks (such as in bitcoin-cli.cpp) can use tfm::format_raw.
  • make tfm::format_error internal to tinyformat and just return error strings instead, so callsites don't need to implement error handling to prevent crashing.

*The bilingual_str format(const bilingual_str& fmt, const Args&... args) is not yet compile-time checked, which means the lint-format-strings.py test can't be removed yet.

Follow-up on #30889 (comment), an alternative to #15926.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 19, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko, l0rinc
Concept ACK hodlinator

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31061 (refactor: Check translatable format strings at compile-time by maflcko)
  • #31042 (build: Rename PACKAGE_* variables to CLIENT_* by hebasto)
  • #30997 (build: Switch to Qt 6 by hebasto)
  • #30930 (netinfo: add peer services column and outbound-only option by jonatack)
  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
  • #29936 (fuzz: wallet: add target for CreateTransaction by brunoerg)
  • #29418 (rpc: provide per message stats for global traffic via new RPC 'getnetmsgstats' by vasild)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/30382700932

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

src/netbase.cpp Outdated Show resolved Hide resolved
src/util/string.h Outdated Show resolved Hide resolved
src/netbase.cpp Outdated Show resolved Hide resolved
src/test/util_string_tests.cpp Outdated Show resolved Hide resolved
src/util/string.h Outdated Show resolved Hide resolved
Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 6dc6e45

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/30701048188

Hints

Make sure to run all tests locally, according to the documentation.

The failure may happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@stickies-v stickies-v changed the title util: refactor: add and use run-time safe tinyformat::try_format tinyformat: refactor: increase compile-time checks and don't throw for tfm::format_error Sep 26, 2024
@stickies-v
Copy link
Contributor Author

Latest force-push increases the scope of this PR, see updated PR description for an overview. In a nutshell:

  • all non-bilingual_str format/strprintf usage is now compile-time checked. Format errors are returned as strings instead of thrown as error, but throwing behaviour can be maintained through tfm::format_raw.
  • the const std::string& format overload is removed because it's unnecessary

Thanks a lot @maflcko and @l0rinc for your review!

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Thanks for taking the feedback

src/tinyformat.h Outdated Show resolved Hide resolved
@stickies-v
Copy link
Contributor Author

Force-pushed to address @maflcko's comment about generally not letting tinyformat throw format errors, instead of just for the util::ConstevalFormatString overloads. Most notably, this changes behaviour for the bilingual_str format overload to also not throw.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review ACK bb12305 💅

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: review ACK bb12305737e3c22325930b5bdb04028f1a6f2ef6 💅
Fge/JMP/cEBpCNfrVWYhr7mIRY4zXGiZRkBzWtbVouQ4q6WPnKlif9AdGqIb4Jrmkp/sIcvaECTG9QyCy1/mBA==

src/wallet/test/fuzz/notifications.cpp Outdated Show resolved Hide resolved
@DrahtBot DrahtBot requested a review from l0rinc September 27, 2024 13:07
All (non-test) format strings are known at compile-time, so we
don't need the std::string overload. Remove it so users use the
safer util::ConstevalFormatString where possible, or the
bilingual_str overload where applicable.
Callsites such as bitcoin-cli that (incorrectly) fail compile-time
validation can still use tfm::format_raw.
@stickies-v
Copy link
Contributor Author

Force-pushed to rebase addressing a merge conflict, preserve tfm::format_error throwing behaviour for DEBUG builds to improve visibility into programming errors before they manifest, and improve an error string.


Thank you for your thoughtful review @ryanofsky. tl;dr I share your concerns, but I think I have a different opinion on the trade-offs made.

I think everyone here is in agreement that compile-time checks (ideally with std::format) are the proper way forward, and everything else is a temporary patch. I think there are differing views on what we do in the meanwhile. It appears, especially without std::format and having to deal with unverified translated strings, that implementing bulletproof compile-time checking on format strings is not really feasible or the best use of our time.

I don't think putting "error while formatting log message" in a std::string and then returning that std::string is a good way for a C++ API to return errors, especially errors that are almost certainly bugs in the code.

I agree that returning error strings is not a good way to handle errors in general, but I don't think throwing a std::runtime_error is a good way to deal with programming errors either. Typically, these would be handled with assert statements (which is also tinyformat's default), but crashing the node for malformed (usually) informational strings seems perhaps unnecessarily strict, in similar vein to our usage of CHECK_NONFATAL instead of assert?

Can you agree that the concept of not exposing tfm::format_error to callsites makes sense in our codebase, at least philosophically? Fomat errors should never occur at run-time, even if they still can at present. If you agree with that, then I think it all comes down to what we do before we have full compile-time checking? It appears you prefer keeping the increased visibility when errors occur, whereas in this PR I prioritize not crashing the software in case we do miss it or if it's part of an erroneously translated format string.

I've force-pushed to preserve tfm::format_error throwing behaviour in DEBUG builds to keep more of the visibility without reducing stability. If you still feel like that new behaviour is still not an improvement over master (which is a view I can absolutely understand), I will drop the last commit and rework the first ones a bit. Even though the intent of the original version of this PR was to fix a regression, I'd be happy to change course and keep just the first commits that force more compile-time checks, without changing run-time behaviour, which I think everyone here agrees is an improvement.

EDIT: I also think the new uses of .c_str() are gross, though I didn't look into why those are necessary and that is not my main objection to this PR.

The const std::string& overload was dropped so that the util::ConstevalFormatString overload could be forced for all string literals. Although I don't like the additional .c_str() usage this introduced, it is mostly limited to fuzz and bitcoin-cli, and I think is worth the trade-off?

I think tinyformat's original decision to throw exceptions when invalid format strings were specified was flexible and sound

Note: tinyformat natively asserts that there are no formatting errors, the tfm::format_error is Bitcoin Core-specific.

or an inconsistently enforced check that behaves differently based on build flags.

I don't think this is correct:

#define TINYFORMAT_USE_VARIADIC_TEMPLATES

(Also nit: the wording of that message doesn't make sense because tinyformat is used to format things other than log messages, and is also useful for low level string manipulation and things like formatting numbers or constructing urls)

Oops, good catch, thanks. This string was lifted from logging.h and I didn't properly update it - done now.

how would you feel about adding an Assume() statement that could fail Debug-builds when formatting fails?

Adding util/check.h to tinyformat.h introduced a couple of circular dependencies so instead I'm using #ifdef DEBUG, but the effect should be the same! (We could expand on this adding a TFM_THROW_FORMAT_ERROR flag (that defaults to set in DEBUG) so it can be added to more CI jobs without overhead if people would like that).

In almost all cases, invalid format strings should not lead to crashing
behaviour, so suppress run-time format errors by returning the error as
a string instead of throwing a tinyformat::format_error.

In DEBUG builds, tinyformat::format_error throwing behaviour is
preserved.

Note: most format string literals are partially validated at compile-time
through util::ConstevalFormatString. Notably, bilingual_str format strings
do not have this compile-time validation.
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a nit.

Only change is adding throw when DEBUG, which seems fine. Edit: However, I think any attempt to catch bugs at runtime in this context is fragile and inefficient. If bugs should be found, it would be better to catch them at (or before) compile-time with the compiler, or with a tool.

re-ACK 49f7307 🎖

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: re-ACK 49f73071a8ec4131595090a619d6198a5f8b7c3c 🎖
XSyui/vuov5yPzjlUZ4UeGQZHo/JZ6LsYy2ftwOqSStKhn/L2aHSd/uDBco6LvvTWd3tPsTlSY2y9myvd7H5Aw==

@@ -2003,7 +2003,7 @@ CWallet::ScanResult CWallet::ScanForWalletTransactions(const uint256& start_bloc
WalletLogPrintf("Scanning current mempool transactions.\n");
WITH_LOCK(cs_wallet, chain().requestMempoolTransactions(*this));
}
ShowProgress(strprintf("%s " + _("Rescanning…").translated, GetDisplayName()), 100); // hide progress dialog in GUI
ShowProgress(strprintf("%s %s", GetDisplayName(), _("Rescanning…").translated), 100); // hide progress dialog in GUI
Copy link
Member

@maflcko maflcko Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in 4ce60d8: Could split the changes to make the format string a bit more const, and never exclusively translated in real code into a separate commit? The benefit would be that the doc changes and (constroversial?) c_str changes are in a separate commit. It would also allow easier cherry-picking into other pulls.

Edit: Done in commit fa788db (and the previous commit). Feel free to rebase on them, or not, or otherwise take them, and modify them as you like.

@maflcko
Copy link
Member

maflcko commented Oct 9, 2024

EDIT: I also think the new uses of .c_str() are gross, though I didn't look into why those are necessary and that is not my main objection to this PR.

I removed tfm::format_raw(fmt.original.c_str(), ... in #31061, where it remains just tfm::format(fmt.original, .... Obviously the two pulls conflict a bit, but the conflicts are trivial to solve either way.

@hodlinator
Copy link
Contributor

Worth bringing back the try/catch in src/test/fuzz/strprintf.cpp for Debug builds?

₿ cmake --preset=libfuzzer -DCMAKE_BUILD_TYPE=Debug
...
₿ cmake --build build_fuzz -j<X>
...
₿ FUZZ=str_printf build_fuzz/src/test/fuzz/fuzz ../qa-assets/fuzz_seed_corpus/str_printf/
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 734467281
INFO: Loaded 1 modules   (1292656 inline 8-bit counters): 1292656 [0x5d5503ee60a0, 0x5d5504021a10), 
INFO: Loaded 1 PC tables (1292656 PCs): 1292656 [0x5d5504021a10,0x5d55053db110), 
INFO:     1133 files found in ../qa-assets/fuzz_seed_corpus/str_printf/
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 16446 bytes
terminate called after throwing an instance of 'tinyformat::format_error'
  what():  tinyformat: Not enough conversion specifiers in format string
==59713== ERROR: libFuzzer: deadly signal
    #0 0x5d54f8395fea  (/home/hodlinator/bitcoin/build_fuzz/src/test/fuzz/fuzz+0x5bd1fea)
...
    #30 0x5d54f8245174  (/home/hodlinator/bitcoin/build_fuzz/src/test/fuzz/fuzz+0x5a81174)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 0 ; base unit: 0000000000000000000000000000000000000000


artifact_prefix='./'; Test unit written to ./crash-da39a3ee5e6b4b0d3255bfef95601890afd80709
Base64: 

@l0rinc
Copy link
Contributor

l0rinc commented Oct 23, 2024

I'm not in love with the DEBUG separation, but I'm fine with it if others are.

ACK 49f7307

git range-diff 4f33e9a85c3a8f546deddc2f78cf74bceff30315~4..4f33e9a85c3a8f546deddc2f78cf74bceff30315 aba9ce0eb6e67945f2209a17676f356fe9748 7e6..928538e3782ccbb22f54f4a3b1152b1faba95471

Diff
1:  1f2d0ffd56 < -:  ---------- move-only: ConstevalFormatString tfm::format to tinyformat.h
2:  c937ebf1ce = 1:  efa6f3f372 tinyformat: remove std::string overload
3:  be7de807c8 = 2:  9329e334eb tinyformat: force compile-time checks for format string literals
4:  4f33e9a85c ! 3:  928538e378 tinyformat: don't throw format string errors
    @@ Commit message
         behaviour, so suppress run-time format errors by returning the error as
         a string instead of throwing a tinyformat::format_error.
     
    +    In DEBUG builds, tinyformat::format_error throwing behaviour is
    +    preserved.
    +
         Note: most format string literals are partially validated at compile-time
         through util::ConstevalFormatString. Notably, bilingual_str format strings
         do not have this compile-time validation.
    @@ src/logging.h: template <typename... Args>
     -        try {
     -            log_msg = tfm::format(fmt, args...);
     -        } catch (tinyformat::format_error& fmterr) {
    --            /* Original format string will have newline so don't add one here */
     -            log_msg = "Error \"" + std::string{fmterr.what()} + "\" while formatting log message: " + fmt.fmt;
     -        }
     +        const auto log_msg{tfm::format(fmt, args...)};
    @@ src/test/util_string_tests.cpp
      #include <util/string.h>
      
      #include <boost/test/unit_test.hpp>
    + #include <test/util/setup_common.h>
      
     +#include <sstream>
     +
    @@ src/test/util_string_tests.cpp: BOOST_AUTO_TEST_CASE(ConstevalFormatString_NumSp
      
     +BOOST_AUTO_TEST_CASE(tinyformat_ConstevalFormatString)
     +{
    -+    // Ensure invalid format strings don't throw at run-time.
    -+    BOOST_CHECK_EQUAL(tfm::format("%*c", "dummy"), R"(Error "tinyformat: Cannot convert from argument type to integer for use as variable width or precision" while formatting log message: %*c)");
    -+    BOOST_CHECK_EQUAL(tfm::format("%2$*3$d", "dummy", "value"), R"(Error "tinyformat: Positional argument out of range" while formatting log message: %2$*3$d)");
    ++    // Ensure invalid format strings don't throw at run-time, when not in DEBUG
    ++#ifndef DEBUG
    ++    BOOST_CHECK_EQUAL(tfm::format("%*c", "dummy"), R"(Error "tinyformat: Cannot convert from argument type to integer for use as variable width or precision" while formatting: %*c)");
    ++    BOOST_CHECK_EQUAL(tfm::format("%2$*3$d", "dummy", "value"), R"(Error "tinyformat: Positional argument out of range" while formatting: %2$*3$d)");
     +    std::ostringstream oss;
     +    tfm::format(oss, "%.*f", 5);
    -+    BOOST_CHECK_EQUAL(oss.str(), R"(Error "tinyformat: Too many conversion specifiers in format string" while formatting log message: %.*f)");
    ++    BOOST_CHECK_EQUAL(oss.str(), R"(Error "tinyformat: Too many conversion specifiers in format string" while formatting: %.*f)");
    ++#endif
     +}
     +
     +
    @@ src/tinyformat.h: TINYFORMAT_FOREACH_ARGNUM(TINYFORMAT_MAKE_MAKEFORMATLIST)
     +    try {
     +        detail::formatImpl(out, fmt, list.m_args, list.m_N);
     +    } catch (tinyformat::format_error& fmterr) {
    -+        out << "Error \"" + std::string{fmterr.what()} + "\" while formatting log message: " + fmt;
    ++        out << "Error \"" + std::string{fmterr.what()} + "\" while formatting: " + fmt;
    ++#ifdef DEBUG
    ++        throw;
    ++#endif
     +    }
      }

Worth bringing back the try/catch in src/test/fuzz/strprintf.cpp for Debug builds?

We should definitely be able to see easily what the invalid values were (and add unit tests for those cases)

// Modified for Bitcoin Core to not throw for formatting errors
try {
detail::formatImpl(out, fmt, list.m_args, list.m_N);
} catch (tinyformat::format_error& fmterr) {
Copy link
Contributor

@l0rinc l0rinc Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we're already inside namespace tinyformat:

Suggested change
} catch (tinyformat::format_error& fmterr) {
} catch (format_error& fmterr) {

(same for formatImpl)

BOOST_AUTO_TEST_CASE(tinyformat_ConstevalFormatString)
{
// Ensure invalid format strings don't throw at run-time, when not in DEBUG
#ifndef DEBUG
Copy link
Contributor

@l0rinc l0rinc Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might as well put the whole test inside, instead of having an empty passing test when DEBUG

@stickies-v
Copy link
Contributor Author

I've carved out the first 3 commits into #31149, since other PRs (e.g. #31061 and #31072) somewhat rely on this, and everyone seems in agreement that adding compile-time checks is good. Existing ACKs should be trivially transferable.

Converting this PR to draft until #31149 is merged to keep the controversial error suppression discussion until later.

@hodlinator
Copy link
Contributor

Worth bringing back the try/catch in src/test/fuzz/strprintf.cpp for Debug builds?

We should definitely be able to see easily what the invalid values were (and add unit tests for those cases)

(Not sure that would add much value, we know the fuzzing will produce mostly garbage format strings that will trigger errors).

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

fanquake added a commit that referenced this pull request Dec 6, 2024
…ages

0184d33 scripted-diff: Replace strprintf(Untranslated) with Untranslated(strprintf) (Ryan Ofsky)
006e4d1 refactor: Use + instead of strformat to concatenate translated & untranslated strings (Ryan Ofsky)
831d2bf refactor: Don't embed translated string in untranslated string. (Ryan Ofsky)
0580219 refactor: Avoid concatenation of format strings (Ryan Ofsky)

Pull request description:

  This PR cleans up string formatting in the codebase so other PRs adding compile time checking can be simpler and easier to review (specifically #30928, #31061, #31074, and #31149).

  Currently these PRs are hard to review because in addition to changing formatting APIs, they have to update callers that are using the API's in unusual ways. Clean up these callers now so later PRs can be simpler. Specifically:

  - Use string literals instead of `std::string` format strings to enable more compile-time checking.
  - Avoid using untranslated bilingual strings as format strings. Use originals so they can by checked at compile time.
  - Favor `Untranslated(strprintf(...))` over `strprintf(Untranslated(...), ...)` for consistency and to prevent translated and untranslated strings from being unintentionally combined.

ACKs for top commit:
  maflcko:
    lgtm ACK 0184d33 🔹
  l0rinc:
    ACK 0184d33 - no overall difference because of the rebase

Tree-SHA512: 37eb771187d30977f5f054eddb82af6dd69878ace89cede72032bb389e57200898119f9fe486ce6903ebb00cb24648be215ab3e44842b3d206b35e26038da750
@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants