Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Check translatable format strings at compile-time #31061

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

maflcko
Copy link
Member

@maflcko maflcko commented Oct 9, 2024

All translatable format strings are fixed. This change surfaces errors in them at compile-time.

The implementation achieves this by allowing to delay the translation (or std::string construction) that previously happened in _() by returning a new type from this function. The new type can be converted to bilingual_str where needed.

This can be tested by adding a format string error in an original string literal and observing a new compile-time failure.

Fixes #30530

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 9, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31061.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK hodlinator, ryanofsky
Concept ACK stickies-v
Stale ACK l0rinc

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31375 (multiprocess: Add bitcoin wrapper executable by ryanofsky)
  • #31296 (wallet: Translate [default wallet] string in progress messages by ryanofsky)
  • #30343 (wallet, logging: Replace WalletLogPrintf() with LogInfo() by ryanofsky)
  • #30221 (wallet: Ensure best block matches wallet scan state by achow101)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)
  • #26022 (Add util::ResultPtr class by ryanofsky)
  • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
  • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title refactor: Check original (translatable) format string at compile-time refactor: Check original (translatable) format string at compile-time Oct 9, 2024
@maflcko maflcko changed the title refactor: Check original (translatable) format string at compile-time refactor: Check translatable format strings at compile-time Oct 9, 2024
@maflcko maflcko force-pushed the 2410-trans branch 2 times, most recently from fac7229 to fabb28f Compare October 9, 2024 12:46
@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 9, 2024

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/31292791221

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fabb28f. This looks great, and implementation is very clean. However, I think it could be simplified significantly if there were a scripted diff commit to replace strprintf(Untranslated("..."), ...) with Untranslated(strprintf("...", ...)), which would be a good change on its own for clarity and consistency. More details in comments below.

src/clientversion.cpp Outdated Show resolved Hide resolved
src/util/string.h Outdated Show resolved Hide resolved
src/test/util_string_tests.cpp Outdated Show resolved Hide resolved
@@ -107,7 +107,7 @@ std::optional<ConfigError> InitConfig(ArgsManager& args, SettingsAbortFn setting
}
}
} catch (const std::exception& e) {
return ConfigError{ConfigStatus::FAILED, Untranslated(e.what())};
return ConfigError{ConfigStatus::FAILED, Untranslated(std::string{e.what()})};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "refactor: Mark run-time Untranslated() c_str with std::string" (fa213aa)

I think this change is unfortunate and doesn't make sense. It seems like it could be a avoided with a small scripted diff to replace strprintf(Untranslated("..."), ...) with Untranslated(strprintf("...", ...)) and that replacement would make the code clearer and more consistent anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like it could be a avoided with a small scripted diff to replace strprintf(Untranslated("..."), ...) with Untranslated(strprintf("...", ...)) and that replacement would make the code clearer and more consistent anyway.

Are you sure? The two are not equivalent, because args can also be a bilingual_str, in which case the scripted-diff does not compile. In any case, it seems unrelated to this pull request and I don't see how it "simplifies" stuff and it touches different lines anyway, so maybe a separate issue or pull request would be better to explore this?

I think this change is unfortunate and doesn't make sense.

Can you explain which part doesn't make sense and how this relates to strprintf and tinyformat at all? I can't see any tfm use in the line you commented on.

Copy link
Contributor

@ryanofsky ryanofsky Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

Thanks, it's a good point that Untranslated(strprintf) is not equivalent to strprintf(Untranslated) in all cases. But I think those cases are basically broken because they are substituting non-english string fragments inside english strings.

I created a branch which removes these (4) cases and adds a scripted-diff to remove the remaining strprintf(Untranslated) uses. This allows a lot of simplifications to this PR:

  • Untranslated function is simpler. Has a single definition, and is not overloaded to return different things depending on whether it is passed a std:::string or a c string.
  • "Original" struct no longer requires a template parameter, it can be a simple struct.
  • This commit can be dropped, the clang tidy commit can be dropped, and there's no need to convert add manual conversions from Untranslated to bilingual_str next commit.

Branch is https://github.com/ryanofsky/bitcoin/commits/pr/tfmt, commits are:

  • ab4e567 refactor: Don't embed translated strings in untranslated strings.
  • d5a8e2b scripted-diff: Replace strprintf(Untranslated) with Untranslated(strprintf)
  • 5e0b4f9 refactor: Check wallet format strings at compile time
  • 6183767 refactor: Pick translated string after format
  • 71d3826 move-only: Detail_CheckNumFormatSpecifiers and G_TRANSLATION_FUN
  • 86ea584 refactor: Delay translation of Untranslated() or _() literals
  • 08bab04 refactor: Check translatable format strings at compile-time

Copy link
Member Author

@maflcko maflcko Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think those cases are basically broken because they are substituting non-english string fragments inside english strings.

I am not sure I agree. Untranslated("%s:%i ") doesn't look English to me. To me, it looks like Untranslated() is sometimes used to annotate format strings that are valid in any language and do not need to be translated.

And your fix doesn't prevent non-English fragments in English strings anyway, because you are using operator+ to concatenate the English/Untranslated string with a translated (_()) string.

So I am not sure your suggestion is easy for developers to follow (having to sometimes emulate formatting with operator+ and conversions, and other times not), but maybe I am missing something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To extend a bit: Previously (and after my pull right now), it was easy to toggle between translation and non-translation by replacing _ with Untranslated (or vice-versa):

-strprintf(Untranslated("Cannot create database file %s"), filename);
+strprintf(           _("Cannot create database file %s"), filename);

With your change, the function call order has to be changed as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

I don't see ability to swap _ and Untranslated as a real benefit here, because it is pretty easy to switch between translated and non translated format strings either way, and if you are doing that it probably makes sense to actually look at format string arguments and see if they should be translated or untranslated for consistency. The PR also already loses the ability to swap between _ and Untranslated in other places anyway since bilingual_str and Translatable structs are incompatible.

You are right that Untranslated("%s:%i ") is not an english string and the commit message in ab4e567 wasn't doing an a good job of describing changes there. I think most of the 54 cases changed in d5a8e2b are untranslated English strings though, and having a compile time check to make sure non-english strings aren't inserted accidentally inside them is a good thing.

If you really disagree with this and prefer strprintf(Untranslated(...)) over Untranslated(strprintf(...)) I think we could also go in that direction with the opposite scripted-diff. I just think trying to support both and making Untranslated function return different things when it is called with a char* instead of std::string, and having to add extra casts here to work around that in this commit is unfortunate.

It seems like implementation of this PR could be simpler, API can be simpler, and code can be more consistent by having Untranslated return one thing or the other instead of both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

Opened #31072 to implement suggested simplification. Marking this thread resolved since it it's not meant to block this PR in any case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Now that it is merged, I went ahead and pushed commit 08bab04 from #31061 (comment) into this pull (rebased)

src/test/result_tests.cpp Outdated Show resolved Hide resolved
src/util/translation.h Outdated Show resolved Hide resolved
src/util/translation.h Outdated Show resolved Hide resolved
src/util/translation.h Outdated Show resolved Hide resolved
Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, and this is a smaller diff than I'd expected to implement this, nice!

I need to think about it more and work out some code, but I feel like the new interface is not as intuitive yet as it could be. Will follow up with that in next review.

src/util/translation.h Outdated Show resolved Hide resolved
}
/** Type to denote whether an orginal string literal is translatable */
template <bool translatable = true>
struct Original {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in aaaa4fb: seems more straightforward to just extend ConstevalStringLiteral instead of creating a new util::Original? This would 1) minimize the diff and 2) I find ConstevalStringLiteral a much more helpful name than Original.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

in aaaa4fb: seems more straightforward to just extend ConstevalStringLiteral instead of creating a new util::Original? This would 1) minimize the diff and 2) I find ConstevalStringLiteral a much more helpful name than Original.

I think this is only a question of naming since Original class is replacing ConstevalStringLiteral this commit. In by branch I called this ConstEvalTranslated, since it's the type name for values returned from the underscore function, and is not just referring to a generic string literal. But I didn't put much thought into the name and something else may be better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in aaaa4fb: seems more straightforward to just extend ConstevalStringLiteral instead of creating a new util::Original? This would 1) minimize the diff and 2) I find ConstevalStringLiteral a much more helpful name than Original.

Are you sure it would minimize the diff? Original needs to deal with the concept of translations, which seems out-of-scope for ConstevalStringLiteral. Having two separate types to only deal with the things they care about seems easier. However, I may be missing something and if you find a smaller diff, I am happy to consider it.

I've renamed the type to Translatable<bool _>.

@@ -431,7 +431,7 @@ void RestoreWalletActivity::finish()
QMessageBox::warning(m_parent_widget, tr("Restore wallet warning"), QString::fromStdString(Join(m_warning_message, Untranslated("\n")).translated));
} else {
//: Title of message box which is displayed when the wallet is successfully restored.
QMessageBox::information(m_parent_widget, tr("Restore wallet message"), QString::fromStdString(Untranslated("Wallet restored successfully \n").translated));
QMessageBox::information(m_parent_widget, tr("Restore wallet message"), QString::fromStdString(Untranslated("Wallet restored successfully \n").translate()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
QMessageBox::information(m_parent_widget, tr("Restore wallet message"), QString::fromStdString(Untranslated("Wallet restored successfully \n").translate()));
QMessageBox::information(m_parent_widget, tr("Restore wallet message"), QString::fromStdString("Wallet restored successfully \n"));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

Would be reasonable to change this but no more change is actually required here with my suggested branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

Simplified this further in 6fb3965 from #31072. It might just be a bug that this string is not translated, and there also seems to be trailing whitespace in the string. Current commit just simplifies code and keeps existing behavior though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this suggestion. If you disagree, you can continue discussion in #31072 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hunk isn't needed anymore, so I dropped it.

src/util/translation.h Outdated Show resolved Hide resolved
return G_TRANSLATION_FUN ? G_TRANSLATION_FUN(lit) : lit;
}
/** Type to denote whether an orginal string literal is translatable */
template <bool translatable = true>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would prefer templating this on an enum {Translateable,Untranslateable} for readability over the non-obvious <{true,false}> boolean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would prefer templating this on an enum {Translateable,Untranslateable} for readability over the non-obvious <{true,false}> boolean

Note: with changes in suggested branch it should no longer be necessary for this to be templated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed the type to Translatable<bool _>, which should clear this up without adding an enum?

The type is only supposed to be used internally, so I can also wrap it into a detail:: if you think this is helpful?

@jarolrod
Copy link
Member

concept ack

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fae6aca. This is a nice improvement overall, and definitely better than the status quo due to extra type checking it provides. I do think there is extra complexity here and extra changes that could go away if we adopt #31072 and stop passing untranslated format strings to strprintf.

I also think it probably would be nicer if _() would continue to look up translations right away, instead of delaying them until conversion to bilingual str. I think it could return a bilingual_str with extra compile-time information instead of returning different type with .lit and .translate() fields. I'm not sure this is possible though, and it's likely there's something I'm overlooking.

EDIT: I am implemented an approach in #31074 that makes _() and Untranslated() both return bilingual_str with .original and .translated fields and no not delay the time of translation. Unfortunately, because C++ does not currently allow marking function parameters constexpr, it means strings needs to be passed as template parameters instead of function parameters, resulting in a messier syntax, even though the implementation and API are simpler. I closed the PR because it is not a clear win over this one.

@@ -107,7 +107,7 @@ std::optional<ConfigError> InitConfig(ArgsManager& args, SettingsAbortFn setting
}
}
} catch (const std::exception& e) {
return ConfigError{ConfigStatus::FAILED, Untranslated(e.what())};
return ConfigError{ConfigStatus::FAILED, Untranslated(std::string{e.what()})};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #31061 (comment)

I don't see ability to swap _ and Untranslated as a real benefit here, because it is pretty easy to switch between translated and non translated format strings either way, and if you are doing that it probably makes sense to actually look at format string arguments and see if they should be translated or untranslated for consistency. The PR also already loses the ability to swap between _ and Untranslated in other places anyway since bilingual_str and Translatable structs are incompatible.

You are right that Untranslated("%s:%i ") is not an english string and the commit message in ab4e567 wasn't doing an a good job of describing changes there. I think most of the 54 cases changed in d5a8e2b are untranslated English strings though, and having a compile time check to make sure non-english strings aren't inserted accidentally inside them is a good thing.

If you really disagree with this and prefer strprintf(Untranslated(...)) over Untranslated(strprintf(...)) I think we could also go in that direction with the opposite scripted-diff. I just think trying to support both and making Untranslated function return different things when it is called with a char* instead of std::string, and having to add extra casts here to work around that in this commit is unfortunate.

It seems like implementation of this PR could be simpler, API can be simpler, and code can be more consistent by having Untranslated return one thing or the other instead of both.

src/util/translation.h Outdated Show resolved Hide resolved
@DrahtBot DrahtBot requested a review from stickies-v October 11, 2024 13:53
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Oct 12, 2024
Noticed by stickies-v
bitcoin#31061 (comment)

This change makes it easier to update the Untranslated() definition in that PR.
@maflcko maflcko marked this pull request as draft October 24, 2024 17:36
@ryanofsky
Copy link
Contributor

Maybe I am missing something obvious, but if there is no bug in msvc, it would be good to know the exact bug in this code.

I think the following line from fad431e is buggy because it is saving the result of c_str() call from a string that about to be destroyed:

FormatStringCheck(RuntimeFormat run) : fmt{run.fmt.c_str()} {}

The run.fmt string object will be destroyed when the FormatStringCheck constructor finishes because run is a non-reference parameter, so it's a effectively a local variable within FormatStringCheck method.

@ryanofsky
Copy link
Contributor

ryanofsky commented Dec 10, 2024

I think the following line from fad431e is buggy because it is saving the result of c_str() call from a string that about to be destroyed:

Looking at this more I think the constructor is not buggy in c++17 as long as a temporary object is being passed as the constructor argument, because the mandatory copy elision described at the top of https://en.cppreference.com/w/cpp/language/copy_elision should apply, so the temporary should not be copied, and the temporary should have the lifetime of the full-expression. So maybe MSVC is at fault here, either for not applying mandatory copy-elision, or for not extending the lifetime of the constructor argument to outlive the constructor when it does apply copy-elision.

I do think the FormatStringCheck constructor in fad431e is inherently fragile though, because it will be copying the constructor argument if a non-rvalue is passed, and calling .c_str() on the temporary copy that is about to be destroyed. It seems better to just use references and LIFETIMEBOUND as originally suggested and avoid temporaries and copies.

@hodlinator
Copy link
Contributor

I'd prefer it if every TranslatedLiteral instance didn't have a function pointer riding on the side, but I do like the added testability.

One could make the test-implementation of G_TRANSLATION_FUN support hooking it for specific tests. It still comes with some added overhead when it comes to benchmarks and fuzzing over current master though.

Diff
diff --git a/src/test/translation_tests.cpp b/src/test/translation_tests.cpp
index 2ebde95259..bcbfe9db34 100644
--- a/src/test/translation_tests.cpp
+++ b/src/test/translation_tests.cpp
@@ -2,6 +2,7 @@
 // Distributed under the MIT software license, see the accompanying
 // file COPYING or http://www.opensource.org/licenses/mit-license.php.
 
+#include <test/util/setup_common.h>
 #include <tinyformat.h>
 #include <util/translation.h>
 
@@ -9,21 +10,24 @@
 
 BOOST_AUTO_TEST_SUITE(translation_tests)
 
-static TranslateFn translate{[](const char * str) {  return strprintf("t(%s)", str);  }};
-
 BOOST_AUTO_TEST_CASE(translation_namedparams)
 {
-    constexpr util::TranslatedLiteral format{"original [%s]", &translate};
+    assert(!G_TRANSLATION_FUN_HOOK);
+    G_TRANSLATION_FUN_HOOK = [](const char * str) { return strprintf("t(%s)", str); };
+
+    constexpr util::TranslatedLiteral format{"original [%s]"};
 
     bilingual_str arg{"original", "translated"};
     bilingual_str result{strprintf(format, arg)};
     BOOST_CHECK_EQUAL(result.original, "original [original]");
     BOOST_CHECK_EQUAL(result.translated, "t(original [translated])");
 
-    util::TranslatedLiteral arg2{"original", &translate};
+    util::TranslatedLiteral arg2{"original"};
     bilingual_str result2{strprintf(format, arg2)};
     BOOST_CHECK_EQUAL(result2.original, "original [original]");
     BOOST_CHECK_EQUAL(result2.translated, "t(original [t(original)])");
+
+    G_TRANSLATION_FUN_HOOK = {};
 }
 
 BOOST_AUTO_TEST_SUITE_END()
diff --git a/src/test/util/setup_common.cpp b/src/test/util/setup_common.cpp
index 12feba09a5..947f1e1ca7 100644
--- a/src/test/util/setup_common.cpp
+++ b/src/test/util/setup_common.cpp
@@ -72,7 +72,11 @@ using node::LoadChainstate;
 using node::RegenerateCommitments;
 using node::VerifyLoadedChainstate;
 
-const std::function<std::string(const char*)> G_TRANSLATION_FUN = nullptr;
+std::function<std::string(const char*)> G_TRANSLATION_FUN_HOOK;
+
+const std::function<std::string(const char*)> G_TRANSLATION_FUN{[] (const char* str) {
+    return G_TRANSLATION_FUN_HOOK ? G_TRANSLATION_FUN_HOOK(str) : str;
+}};
 
 constexpr inline auto TEST_DIR_PATH_ELEMENT{"test_common bitcoin"}; // Includes a space to catch possible path escape issues.
 /** Random context to get unique temp data dirs. Separate from m_rng, which can be seeded from a const env var */
diff --git a/src/test/util/setup_common.h b/src/test/util/setup_common.h
index b0f7bdade2..f033f20dc8 100644
--- a/src/test/util/setup_common.h
+++ b/src/test/util/setup_common.h
@@ -43,6 +43,8 @@ extern const std::function<std::vector<const char*>()> G_TEST_COMMAND_LINE_ARGUM
 /** Retrieve the unit test name. */
 extern const std::function<std::string()> G_TEST_GET_FULL_NAME;
 
+extern std::function<std::string(const char*)> G_TRANSLATION_FUN_HOOK;
+
 static constexpr CAmount CENT{1000000};
 
 /** Register common test args. Shared across binaries that rely on the test framework. */
diff --git a/src/util/translation.h b/src/util/translation.h
index 3a9316479d..d618b22ee2 100644
--- a/src/util/translation.h
+++ b/src/util/translation.h
@@ -13,8 +13,7 @@
 #include <string>
 
 /** Translate a message to the native language of the user. */
-using TranslateFn = std::function<std::string(const char*)>;
-const extern TranslateFn G_TRANSLATION_FUN;
+const extern std::function<std::string(const char*)> G_TRANSLATION_FUN;
 
 /**
  * Bilingual messages:
@@ -54,10 +53,9 @@ namespace util {
 //! Compile-time literal string that can be translated with an optional translation function.
 struct TranslatedLiteral {
     const char* const original;
-    const TranslateFn* translate_fn;
 
-    consteval TranslatedLiteral(const char* str, const TranslateFn* fn = &G_TRANSLATION_FUN) : original{str}, translate_fn{fn} { assert(original); }
-    operator std::string() const { return translate_fn && *translate_fn ? (*translate_fn)(original) : original; }
+    consteval TranslatedLiteral(const char* str) : original{str} { assert(original); }
+    operator std::string() const { return G_TRANSLATION_FUN ? G_TRANSLATION_FUN(original) : original; }
     operator bilingual_str() const { return {original, std::string{*this}}; }
 };

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fa7cb52. Looks good but did leave some suggestions which could be saved for a follow-up.


re: #31061 (comment)

I'd prefer it if every TranslatedLiteral instance didn't have a function pointer riding on the side, but I do like the added testability.

I don't think the TranslatedLiteral::translate_fn members should add overhead because TranslatedLiteral constructor is consteval, and in any case TranslatedLiteral objects are ephemeral, they exist only to call a translation function one time and then disappear. (The additional pointers also technically aren't function pointers, but pointers to a std::function object.)

The goal of the pointers isn't just improve testability but also to provide extensibility, so for example a libbitcoinkernel application could use define its own _ translation function and seamlessly use strprintf with its own translated strings. Having the pointers also avoids the need to mutate shared global state during tests so, for example, unlike with the suggested change, other tests could run in parallel with the translation test without being affected, and if translation test throws, it will not leave G_TRANSLATION_FUN pointing at the wrong function.

src/tinyformat.h Outdated Show resolved Hide resolved
BOOST_AUTO_TEST_CASE(translation_namedparams)
{
constexpr util::TranslatedLiteral format{"original [%s]", &translate};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "refactor: Check translatable format strings at compile-time" (fa94825)

Could consider making the test a little more realistic with:

--- a/src/test/translation_tests.cpp
+++ b/src/test/translation_tests.cpp
@@ -11,17 +11,22 @@ BOOST_AUTO_TEST_SUITE(translation_tests)
 
 static TranslateFn translate{[](const char * str) {  return strprintf("t(%s)", str);  }};
 
+// Custom translation function _t(), similar to _() but internal to this test.
+static inline consteval auto _t(util::TranslatedLiteral str)
+{
+    str.translate_fn = &translate;
+    return str;
+}
+
 BOOST_AUTO_TEST_CASE(translation_namedparams)
 {
-    constexpr util::TranslatedLiteral format{"original [%s]", &translate};
-
     bilingual_str arg{"original", "translated"};
-    bilingual_str result{strprintf(format, arg)};
+    bilingual_str result{strprintf(_t("original [%s]"), arg)};
     BOOST_CHECK_EQUAL(result.original, "original [original]");
     BOOST_CHECK_EQUAL(result.translated, "t(original [translated])");
 
     util::TranslatedLiteral arg2{"original", &translate};
-    bilingual_str result2{strprintf(format, arg2)};
+    bilingual_str result2{strprintf(_t("original [%s]"), arg2)};
     BOOST_CHECK_EQUAL(result2.original, "original [original]");
     BOOST_CHECK_EQUAL(result2.translated, "t(original [t(original)])");
 }

@bitcoin bitcoin deleted a comment from waurah Dec 11, 2024
@hodlinator
Copy link
Contributor

I don't think the TranslatedLiteral::translate_fn members should add overhead because TranslatedLiteral constructor is consteval

Good point, I glossed over TranslatedLiteral being consteval.

Having the pointers also avoids the need to mutate shared global state during tests

Current CTest executes tests in parallel through starting separate processes, but it might change at some point and keeping it less brittle is good.

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK fa7cb52

git range-diff master fa6b1d2 fa7cb52

Really like how _() is sufficient without .translated/.translate() thanks to TranslatedLiterals operator std::string.


Checked out fa37083.

Ran linter being removed in following commit (fa7cb52):

₿ test/lint/lint-format-strings.py 
src/test/fuzz/util/wallet.h: Expected 0 argument(s) after format string but found 2 argument(s): strprintf(tfm::RuntimeFormat{desc_fmt}, "[5aa9973a/66h/4h/2h]" + seed_insecure, int{internal})
src/test/translation_tests.cpp: Expected 0 argument(s) after format string but found 1 argument(s): strprintf(format, arg2)

Might want to do something about that if we want all commits to pass, not blocking for me though.

Comment on lines 8 to +9
#include <tinyformat.h>
#include <util/string.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Arguably tinyformat is more external than util/string.h, but our copy lives with our code so it's certainly not clear-cut.

Suggested change
#include <tinyformat.h>
#include <util/string.h>
#include <util/string.h>
#include <tinyformat.h>

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/34246270269

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@hodlinator
Copy link
Contributor

nit: Noticed a typo in the linter removal commit message:

"The linter has many >implmentation< bugs and missing features."

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fadd2e6. Just rebased and added lifetimebound and test suggestions since last review. Appreciate your patience with all the review comments and iterations of this PR. Will be really nice to see this merged with the new compiler checks being safer and more convenient than current runtime errors.

return arg;
}
}};
const auto translate_arg{[](const auto& arg) -> const auto& {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "refactor: Check translatable format strings at compile-time" (fa0b20a)

Not important, but probably would have made sense to call this translated_arg instead of translate_arg since this is not actually translating the argument or calling the translation function, but just returning a reference to translated or translatable version of the argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, done

src/tinyformat.h Outdated Show resolved Hide resolved
@DrahtBot DrahtBot requested a review from hodlinator December 11, 2024 15:33
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK fadd2e6

git range-diff master fa7cb52 fadd2e6

  • Mysteriously grinding out fa-prefixed hashes.
  • Changed order of linter removal so that it doesn't error on what is now the last commit.
  • Adapted unit tests further after feedback.
  • Reconciled silent merge conflict with test: Prove+document ConstevalFormatString/tinyformat parity #30933.
  • Added LIFETIMEBOUND to at least serve as developer documentation even though Clang isn't able to fully use it.

nit: Did not fix typo: #31061 (comment)

MarcoFalke and others added 4 commits December 16, 2024 12:52
This is required for a future commit that requires _() to be consteval
for format literals.

Co-Authored-By: Ryan Ofsky <[email protected]>
The linter has many implementation bugs and missing features.

Also, it is completely redundant with FormatStringCheck, which
constructs from ConstevalFormatString or a runtime format string.
This brings the format types closer to the standard library types:

* FormatStringCheck corresponds to std::basic_format_string, with
  compile-time checks done via ConstevalFormatString
* RuntimeFormat corresponds to std::runtime_format, with no compile-time
  checks done.

Also, it documents where no compile-time checks are done.
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK fa6d9a2

git range-diff master fadd2e6 fa6d9a2

  • Fixed typo in commit message
  • Changed translate -> translated in lambda, ++correctness

@DrahtBot DrahtBot requested a review from ryanofsky December 16, 2024 14:17
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK fa6d9a2. Just rebase, spelling fix and suggested rename since last review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nuke format linter
7 participants