Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Clean up messy strformat and bilingual_str usages #31072

Merged
merged 4 commits into from
Dec 6, 2024

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Oct 11, 2024

This PR cleans up string formatting in the codebase so other PRs adding compile time checking can be simpler and easier to review (specifically #30928, #31061, #31074, and #31149).

Currently these PRs are hard to review because in addition to changing formatting APIs, they have to update callers that are using the API's in unusual ways. Clean up these callers now so later PRs can be simpler. Specifically:

  • Use string literals instead of std::string format strings to enable more compile-time checking.
  • Avoid using untranslated bilingual strings as format strings. Use originals so they can by checked at compile time.
  • Favor Untranslated(strprintf(...)) over strprintf(Untranslated(...), ...) for consistency and to prevent translated and untranslated strings from being unintentionally combined.

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 11, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31072.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko, l0rinc
Concept ACK hebasto

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31250 (wallet: Disable creating and loading legacy wallets by achow101)
  • #31061 (refactor: Check translatable format strings at compile-time by maflcko)
  • #30988 (Split CConnman by vasild)
  • #29770 (index: Check all necessary block data is available before starting to sync by fjahr)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind the changes here, but I don't think the motivation is correct and it seems verbose and misleading.

I don't think it matters much whether this is merged or not, but you could just replace the motivation with:

"This replacement makes the implementation of some compile-time checks in the future easier. Also, it makes the code a bit more consistent in that Untranslated isn't allowed as a format-string anymore, only plain std::strings, or translated strings. The downside would be that developers may have to use operator+ more often, making it a bit harder to write code involving formatting of (un)translated strings. However, if it becomes too annoying, the changes here should be easy to revert."

src/init.cpp Outdated Show resolved Hide resolved
@ryanofsky ryanofsky changed the title scripted-diff: Replace strprintf(Untranslated(...)) with Untranslated(strprintf(...)) refactor: Clean up messy strformat and bilingual_str usages Oct 12, 2024
@ryanofsky
Copy link
Contributor Author

ryanofsky commented Oct 12, 2024

Updated 9a101d2 -> 022ffd1 (pr/bfmt.1 -> pr/bfmt.2, compare) to include more cleanups and help simplify implementation of compile-time translated string format checking in #31061 or #31074


re: #31072 (review)

I don't mind the changes here, but I don't think the motivation is correct and it seems verbose and misleading.

I don't know what was misleading, but I rewrote description and will paste old text in comment below in case it's useful. Sorry for verbosity, I like to think by writing, but it doesn't always result in the most helpful thing to read.

Original PR description

scripted-diff: Replace strprintf(Untranslated(...)) with Untranslated(strprintf(...))

This PR is just a scripted-diff replacing strprintf calls like:

strprintf(Untranslated("Error parsing command line arguments: %s"), error))

with:

Untranslated(strprintf("Error parsing command line arguments: %s", error))

Currently code is inconsistent and uses a mix of both styles, but second style is better because:

  1. It scans more cleanly. It is easier to read the strprintf call, and easier to think about because the whole string, not just one part, is passed to Untranslated() to convert from std::string to bilingual_str.
  2. It lets the compiler prevent janky UX where non-english fragments are inserted into english messages. One potential case where a bilingual_str was substituted into an english format string is fixed in the first commit of this PR.
  3. It doesn't waste work formatting the same string twice.
  4. It simplifies the translation API after #31061. Instead of Untranslated() function sometimes returning a bilingual_str, other times returning a Translatable<false> struct with a different set of fields, it can stay the way it is and simply convert std::string to bilingual_str.
  5. It simplifies the implementation of #31061, allowing two commits faf71be and fae6aca to be dropped, as well as other changes in other commits.

One potential downside of this change, pointed out #31061 (comment), is that after this PR if you want to change an untranslated format string to a translated one (or vice versa), you will have to to add _() around the format string and drop Untranslate() around the formatted string, instead of just literally replacing Untranslated with _. But I think this is good, because the change is still easy to make, and it lets you consider whether other parts of the strprintf call also should switch to translated strings to avoid jank of non-translated messages being embedded in translated strings (or vice versa).

An alternative to this PR could be a scripted-diff switching all code to use strprintf(Untranslated(...)) instead of Untranslated(strprintf(...)). I think this alternative would also be an improvement over the status quo, since it would also make the code more consistent, also simplify #31061, and also avoid needing to overload Untranslated() there to return objects with incompatible types. But I think this PR is better than the alternative for reasons 1-3 above, and better because it lets Untranslated remain a simple one-liner that converts a std::string to bilingual_str and is not involved in formatting.

Note: The scripted diff in this PR only removes current strprintf(Untranslated(...)) calls without adding enforcement to prevent new ones. But adding enforcement is easy with #31061 and simplifies that PR by dropping support for the strprintf(Untranslated(...)) syntax.

src/wallet/rpc/backup.cpp Outdated Show resolved Hide resolved
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Left some comments for context.

src/validation.cpp Show resolved Hide resolved
@@ -6091,7 +6091,7 @@ SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation()

auto rename_result = m_snapshot_chainstate->InvalidateCoinsDBOnDisk();
if (!rename_result) {
user_error = strprintf(Untranslated("%s\n%s"), user_error, util::ErrorString(rename_result));
user_error += Untranslated("\n") + util::ErrorString(rename_result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fdd1992: I am thinking that strprintf(Untranslated("something"), ...) is something that may be used normally. Right now, it may not be used extensively, or only used to concatenate two blobs. However, I can imagine that in the future someone may want to write strprintf(Untranslated("<a href="...">%s</a><br>%s<br>"), ...) to format some bilingual strings into html, which would be a bit more verbose by doing with plenty of +.

Not saying that this is something common, but I wanted to mention it.

Copy link
Contributor Author

@ryanofsky ryanofsky Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31072 (comment)

Thanks, we also talked about this offline and I agree that strprintf can often be nicer than concatenation for building strings, but I think this approach doesn't work very well for untranslated bilingual strings. For example in the HTML case, you would probably only want to format the translated strings as HTML, not both translated and original strings. I think the downsides of supporting untranslated format strings outweigh the upsides. (Downsides being having two inconsistent ways of formatting the same strings, having a more complicated interface, and potential for introducing mixed translation bugs.)

@hebasto
Copy link
Member

hebasto commented Oct 29, 2024

Concept ACK.

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased 022ffd1 -> 46c55ab (pr/bfmt.2 -> pr/bfmt.3, compare) due to conflict with #31042, also addressing review comments.

src/wallet/rpc/backup.cpp Outdated Show resolved Hide resolved
src/qt/walletcontroller.cpp Outdated Show resolved Hide resolved
src/validation.cpp Show resolved Hide resolved
@@ -6091,7 +6091,7 @@ SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation()

auto rename_result = m_snapshot_chainstate->InvalidateCoinsDBOnDisk();
if (!rename_result) {
user_error = strprintf(Untranslated("%s\n%s"), user_error, util::ErrorString(rename_result));
user_error += Untranslated("\n") + util::ErrorString(rename_result);
Copy link
Contributor Author

@ryanofsky ryanofsky Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31072 (comment)

Thanks, we also talked about this offline and I agree that strprintf can often be nicer than concatenation for building strings, but I think this approach doesn't work very well for untranslated bilingual strings. For example in the HTML case, you would probably only want to format the translated strings as HTML, not both translated and original strings. I think the downsides of supporting untranslated format strings outweigh the upsides. (Downsides being having two inconsistent ways of formatting the same strings, having a more complicated interface, and potential for introducing mixed translation bugs.)

fanquake added a commit that referenced this pull request Nov 14, 2024
…t strings

fe39acf tinyformat: Add compile-time checking for literal format strings (Ryan Ofsky)
184f34f util: Support dynamic width & precision in ConstevalFormatString (Ryan Ofsky)

Pull request description:

  Add compile-time checking for literal format strings passed to `strprintf` and `tfm::format` to make sure the right number of format arguments are passed.

  There is still no compile-time checking if non-literal `std::string` or `bilingual_str` format strings are passed, but this is improved in other PRs:

  - [#31061](#31061) implements compile-time checking for translated strings
  - [#31072](#31072) increases compile-time checking by using literal strings as format strings, instead of `std::string` and `bilingual_str`
  - [#31149](#31149) may drop the `std::string`  overload for `strprintf` to [require](#31149 (comment)) compile-time checking

ACKs for top commit:
  maflcko:
    re-ACK fe39acf 🕐
  l0rinc:
    ACK fe39acf
  hodlinator:
    re-ACK fe39acf

Tree-SHA512: f1ddef0c96b9468c5ffe31b42dc19f1922583dd14f2e180b618d992c98614c5cc7db9f9cd917ef503f833bbc7dbec78e4489d0035416dce6840837e1d66d87cb
@ryanofsky
Copy link
Contributor Author

Updated 46c55ab -> 1e9ea7d (pr/bfmt.3 -> pr/bfmt.4, compare) with no changes. Just updated commit messages to be more accurate now that #31174 is merged.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Nov 15, 2024

Rebased 1e9ea7d -> 85df2fb (pr/bfmt.4 -> pr/bfmt.5, compare) after #31287 was merged (just for clarity, there were no merge conflicts)

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 85df2fb

I have basically redone the change locally to understand it better and mostly left questions or nits, I'm fine with merging as is

src/node/interface_ui.cpp Show resolved Hide resolved
src/validation.cpp Show resolved Hide resolved
src/clientversion.cpp Outdated Show resolved Hide resolved
src/init.cpp Show resolved Hide resolved
src/init.cpp Show resolved Hide resolved
test/lint/run-lint-format-strings.py Outdated Show resolved Hide resolved
src/wallet/feebumper.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews!

Updated 85df2fb -> 3b24c24 (pr/bfmt.5 -> pr/bfmt.6, compare) fixing a commit description (no code changes)

src/validation.cpp Show resolved Hide resolved
test/lint/run-lint-format-strings.py Outdated Show resolved Hide resolved
src/node/interface_ui.cpp Show resolved Hide resolved
src/clientversion.cpp Outdated Show resolved Hide resolved
@l0rinc
Copy link
Contributor

l0rinc commented Nov 18, 2024

ACK 3b24c24

Instead just concatenate already formatted strings. This allows untranslated
format strings to be checked at compile time now, and translated format strings
to be checked at compile time in bitcoin#31061.
This could produce an english error message containing non-english string
fragments if PopulateAndValidateSnapshot started returning any translated
strings in the future. This change is also needed to make the next
scripted-diff commit work.
…anslated strings

This change manually removes two strprintf(Untranslated...) calls. All
remaining calls are removed in the next scripted-diff commit.

Removing these calls makes code more consistent and makes it easier to
implement compile-time checking enforcing that format strings contain valid
specifiers, by avoiding the need for the Untranslated() function to be involved
in formatting.

Additionally, using + and += instead of strprintf here makes code a little
shorter, and more type-safe because + unlike strprintf only works on strings of
the same type, making it less likely english strings and bilingual strings will
be unintentionally combined.
…rintf)

This makes code more consistent and makes it easier to add compile-time checking to
enforce that format strings contain the right specifiers, because it stops
using Untranslated() to create the format string, so the Untranslated()
function will not need to get involved in formatting.

-BEGIN VERIFY SCRIPT-
quote='"[^"]+"'
quotes="(?:$quote|\\s)*"
nonparens="[^()]*"
single_level_paren="\($nonparens\)"
double_level_paren="\($nonparens\($nonparens\)$nonparens\)"
exprs="(?:$double_level_paren|$single_level_paren|$nonparens)*"
git grep -l 'Untranslated' | xargs perl -0777 -i -pe "s/strprintf\((\\W*)Untranslated\(($quotes)\)($exprs)(\))/Untranslated(\1strprintf(\2\3))/gs"
-END VERIFY SCRIPT-
@ryanofsky
Copy link
Contributor Author

Rebased 3b24c24 -> 0184d33 (pr/bfmt.6 -> pr/bfmt.7, compare) with no changes to simplify after #31295

@maflcko
Copy link
Member

maflcko commented Dec 5, 2024

lgtm ACK 0184d33 🔹

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: lgtm ACK 0184d33b3d28fe78a2ee228417ebfd6f46fcaee5 🔹
z0E0h3Qb/aDEKJqchivwiOliKiEH2e+LZo6omZpyBzBgdPrlsBhmnxX+Q6JrVbQg5jglHOg6YU3NUgI7A4T/Aw==

@DrahtBot DrahtBot requested a review from l0rinc December 5, 2024 20:20
@l0rinc
Copy link
Contributor

l0rinc commented Dec 6, 2024

ACK 0184d33 - no overall difference because of the rebase

@fanquake fanquake merged commit 22723c8 into bitcoin:master Dec 6, 2024
18 checks passed
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge-ACK

Really cleaned the slate nicely for #31061.

Comment on lines -77 to -78
errors.push_back(strprintf(
Untranslated("New fee rate (%s) is lower than the minimum fee rate (%s) to get into the mempool -- "),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit curious about why the caliber of the scripted diff was so powerful, but I guess this multi-line case is it. Would also catch potential other cases like it which could have been merged between the parent commit of this PR and when it ended up being merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants