@danielrainer danielrainer commented Oct 10, 2025

First part of introducing fluent localization. Refer to the commit messages for details.

Stacked on #12190, #12208

TODOs:

  • Replace gettext calls by new API, at least for a few instances so we can see it work in action.
  • Update docs for translators.
  • Implement ToFluentArg trait to reduce the need for type conversion when using the localize! macro.
  • Integrate FTL tooling, which is useful for static checks, developers, and translators. Requires projectfluent/fluent-rs#394 (feat: Make fluent_syntax::parser::core::Parser pub). See #12123 (Fluent FTL tools).
  • Decide on a naming convention for message IDs.
  • Coordinate the porting of messages. If people modify PO files while we port messages to Fluent, we would have to redo some of the porting manually. To avoid this, we should either coordinate with translators so that no major changes happen to PO files during the port, or add tooling which records translations and can then apply them to PO files, so that new changes are ported to Fluent automatically. In total, we have a bit over 500 messages in the Rust sources, so porting them all is not trivial, but with decent tooling it should not take that long either.
  • Investigate asan failures on the latest CI run.
  • Are the macros wrapping localize! (e.g. for printing to stdout/stderr) actually useful?
  • (optional) Move code needed for localization into a separate crate, to allow code outside of fish's main crate to localize messages.

@danielrainer
Author

The new commits run cargo run --package fish-fluent-check in check.sh and CI. I added it as separate jobs because running cargo in a test started from the test driver is not ideal: cargo can use multiple threads, which might result in more timeouts in CI, and cargo prints to stderr, which is awkward for checks, especially since the checks fail by panicking, which also prints to stderr. We could work around the latter issue by redirecting the output of the cargo command, checking cargo's exit status, printing the redirected output on error, and deleting the file we redirected to afterwards. (The last step might be handled by the test driver if we put the file into the test's temporary HOME.) Then we should also exit with cargo's exit status, although the test driver doesn't care about that at the moment.
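
For illustration, here is a rough sketch of that workaround in Rust. It swaps the temporary file for in-memory capture via `std::process::Command::output`, so there is nothing to delete afterwards. The function names `run_quietly` and `exit_code` are made up for this example and are not part of the PR.

```rust
use std::io::{self, Write};
use std::process::{Command, ExitStatus};

/// Runs `cmd` with stdout and stderr captured in memory (instead of being
/// redirected to a file). The captured output is replayed on our stderr
/// only if the command did not succeed.
pub fn run_quietly(cmd: &mut Command) -> io::Result<ExitStatus> {
    // `output` waits for the child to finish and collects both streams.
    let output = cmd.output()?;
    if !output.status.success() {
        let mut stderr = io::stderr().lock();
        stderr.write_all(&output.stdout)?;
        stderr.write_all(&output.stderr)?;
    }
    Ok(output.status)
}

/// Maps the child's status to the code the caller should exit with.
/// A child terminated by a signal has no exit code; 1 is used then.
pub fn exit_code(status: ExitStatus) -> i32 {
    status.code().unwrap_or(1)
}
```

A caller would build the command, e.g. `Command::new("cargo")` with the arguments from above, pass it to `run_quietly`, and finish with `std::process::exit(exit_code(status))`.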

@krobelus
Contributor

krobelus commented Oct 16, 2025 via email

@danielrainer
Author

> any reason why we can't use "cargo test"?

That's a good idea. I added a test which just runs main in crates/fluent-check/src/main.rs. Since main indicates errors via panics, no additional logic is needed. For check.sh, we can still pass in an env var indicating where to look for extracted Fluent IDs, so the test does not need to recompile fish. In CI, no such mechanism exists for now, so the test will recompile fish there. That also happened in the previous implementation; the difference is that before it only happened once, in a dedicated job, whereas now it happens in every job which runs the tests.

> I wonder if it's realistic to (long-term) move away from test_driver.py completely and do everything with cargo test

I think replacing test_driver.py with Rust code should be doable without too much effort, at least if it continues to be a dedicated program, instead of separate cargo tests for every script file. The harder part would be replacing littlecheck and pexpect. (I haven't looked into the latter at all.) From memory, I think the main challenges with having a cargo test per test script would be:

  • sharing compilation of the test helper
  • setting up parametrization which automatically creates a test case for every relevant script file

> we could also write an xtask as a common interface for running tests

I'm generally in favor of replacing (or at least wrapping) the various shell scripts we have with xtasks, to have a unified interface for running everything. Regarding the interface, I'm not sure if it's better to have several different cargo aliases (e.g. cargo test-fish), or use cargo xtask for everything and add subcommands to that as desired. I chose the latter approach for ensuring that the fish version env var is set correctly for every cargo invocation, but I think it makes sense in general. E.g., that would make it easy to have cargo xtask help, which would be difficult to implement with multiple cargo aliases. It also reduces the likelihood of choosing an alias which might clash with a future built-in cargo command.
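
To make the single-entry-point idea concrete, here is a minimal sketch of subcommand dispatch for a `cargo xtask` binary. The subcommand names (`check`, `test`, `help`) and the `Task` enum are purely illustrative and not what this PR implements.

```rust
/// Hypothetical subcommands; the names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum Task {
    Check,
    Test,
    Help,
}

pub const USAGE: &str = "usage: cargo xtask <check|test|help>";

/// Parses the arguments which follow `cargo xtask`.
/// No subcommand at all is treated like `help`.
pub fn parse_task<I, S>(args: I) -> Result<Task, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut args = args.into_iter();
    let Some(name) = args.next() else {
        return Ok(Task::Help);
    };
    match name.as_ref() {
        "check" => Ok(Task::Check),
        "test" => Ok(Task::Test),
        "help" | "--help" | "-h" => Ok(Task::Help),
        other => Err(format!("unknown subcommand '{other}'\n{USAGE}")),
    }
}
```

With everything behind one binary, `help` is just another match arm, and unknown names produce a uniform error instead of falling through to cargo's own command lookup.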

@danielrainer danielrainer force-pushed the fluent_localization branch 2 times, most recently from 11d1099 to 2e9b78a on October 16, 2025 21:34
@danielrainer
Author

I changed unic-langid to 0.9.5, since 0.9.6 requires Rust 1.82+. While it might not make much of a difference for the concrete features in this specific instance, not updating our MSRV makes it increasingly painful to manage dependencies. It limits our ability to update existing dependencies, and we miss out on many improvements made in more recent Rust versions. I'd really appreciate it if we didn't wait until we have a concrete, urgent need to update some dependency, which would then force us into a rushed MSRV update. Instead, we should finally come up with a sensible policy for updating our MSRV that's not just "we'll stick with 1.70 indefinitely". See #11679

@krobelus
Contributor

krobelus commented Oct 16, 2025 via email

@danielrainer
Author

> yeah let's update to Debian Stable's 1.85 (and see if someone complains).

Sounds good. That version also allows us to migrate to the 2024 Rust edition if we want that. I just checked some of the other distros, and it seems that Fedora keeps up with stable on all releases and Ubuntu has Rust 1.85 as the default in 25.10. Should I open a PR?

@krobelus
Contributor

krobelus commented Oct 16, 2025 via email

@danielrainer
Author

> both upgrading to 1.85 and 2024 edition sounds good

#11961

Changing the edition is not that straightforward and might require a fairly large commit, including manual updates, so for now I'll just address the things which have become available in Rust 2021 with the MSRV update.

@faho
Member

faho commented Oct 20, 2025

Parts of fluent are licensed apache2-only, which is incompatible with fish's GPLv2, so this is, as far as I can tell, legally unmergeable as-is.

cargo deny check licenses would catch that.

@danielrainer
Author

The code we use is from https://github.com/projectfluent/fluent-rs, which includes both an apache2 and an MIT license file. I'm no expert on the legal situation here, but it seems to me that using the software under the terms of the MIT License is allowed and that this license does not require adding copyright information to our binaries. AFAICT, we don't include copyright info for any of our dependencies, only for software where the fish repo itself contains code derived from that software.

@danielrainer
Author

Relevant issue: projectfluent/fluent-rs#31

@faho
Member

faho commented Oct 20, 2025

It's not about including copyright information, it's that some of the dependencies for fluent-rs are still apache2-only.

Apache2 is incompatible with GPLv2 because IIRC it includes a patent grant and the GPLv2 has a "no further restrictions" clause. So the combined product has a license that can't be followed.

@danielrainer
Author

If that's the case, why can fluent-rs be MIT-licensed, but we can't use it under that license?

@faho
Member

faho commented Oct 20, 2025

Because the MIT license and the Apache license don't conflict (neither of them is "viral" the way the GPL is). The GPLv2 and the Apache license do, though.

We can use the fluent-rs crate under the MIT license, but we can't use some of its dependencies.

Edit: The offending dependencies are:

@danielrainer
Author

danielrainer commented Oct 20, 2025

So MIT-licensed projects can use fluent-rs and its dependencies, including the apache-licensed ones, but we can't because fish is GPLv2 licensed?

Should we ask the two apache-licensed projects about making their projects available under a license that allows us to use their software in fish? fluent-langneg at least seems to be mostly written by people who also contribute to fluent-rs, so I would be surprised if they would object to dual-licensing. self_cell seems to be a fairly small project, both in terms of contributors and code size. If they are unwilling to use a compatible license, we could ask fluent-rs whether they would consider replacing the dependency.

krobelus added a commit to krobelus/fluent-langneg-rs that referenced this pull request Oct 21, 2025
Most of fluent-rs is already dual-licensed.  This crate is not,
which can make it harder for GPL-2-only projects to use fluent-rs.

Fix that by allowing use under MIT license. All (or almost
all?) nontrivial contributions seem to be from the same author,
so this should be easy?

Ref: projectfluent/fluent-rs#34
Ref: fish-shell/fish-shell#11928

Closes projectfluent#30
@emilio

emilio commented Oct 22, 2025

Isn't self_cell Apache OR GPLv2? If so wouldn't it be compatible?

@danielrainer
Author

Rebased on latest master and updated to remove the dependency on #12008. I haven't gotten around to addressing all the new review comments yet, so there aren't many interesting updates from the last push.

danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Nov 20, 2025
For paths embedded via `rust-embed`, we only need to rebuild on path
changes if the files are actually embedded. For debug builds, data is
read from the file system instead of being embedded, so we don't need to
rebuild then. This is apparently broken on Cygwin, so we always rebuild
there. See
012b507 (Workaround for embed-data debug builds on Cygwin, 2025-09-16)
for details.

To avoid having to remember and duplicate this logic for all embedded
paths, extract it into the build helper.

fish-shell#11928 (comment)
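
A minimal sketch of what such a build helper could look like, based only on the description in the commit message above. The function name and its boolean parameters are assumptions made for this example; only the `cargo:rerun-if-changed=` directive itself is real Cargo syntax.

```rust
/// Returns the `cargo:rerun-if-changed` directives a build script should
/// print for embedded paths.
///
/// Assumptions of this sketch: `embeds_data` is true when files are really
/// embedded into the binary (i.e. not a debug build reading from the file
/// system), and `is_cygwin` marks the target where we always rebuild.
pub fn rerun_directives(paths: &[&str], embeds_data: bool, is_cygwin: bool) -> Vec<String> {
    if !(embeds_data || is_cygwin) {
        // Data is read from the file system at runtime; no rebuild needed.
        return Vec::new();
    }
    paths
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={path}"))
        .collect()
}
```

A build script would print each returned line to stdout, once per embedded directory, instead of repeating the condition at every call site.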
danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Dec 1, 2025
Multiple gettext-extraction proc macro instances can run at the same
time due to Rust's compilation model. In the previous implementation,
where every instance appended to the same file, this has resulted in
corruption of the file. This was reported and discussed in
fish-shell#11928 (comment)
for the equivalent macro for Fluent message ID extraction. The
underlying problem is the same.

The best way we have found to avoid such a race condition is to write each
entry to a new file, and concatenate them together before using them.
It's not a beautiful approach, but it should be fairly robust and
portable.
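
The approach from this commit message can be sketched roughly as follows. This is not the code from the commit: the file naming scheme (process ID plus a per-process counter) and the function names are assumptions for illustration. The relevant part is `create_new`, which fails instead of reusing an existing file, so concurrent writers never share a file.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

/// Per-process counter used to give every entry its own file name.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Writes `entry` to a fresh file in `dir`. `create_new` returns an error
/// if the file already exists, so no file is ever written by two writers.
pub fn write_entry(dir: &Path, entry: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Hypothetical naming scheme: "<pid>-<counter>.entry".
    let name = format!("{}-{}.entry", std::process::id(), n);
    let mut file = fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(dir.join(name))?;
    file.write_all(entry.as_bytes())
}

/// Concatenates all entry files in `dir`, ordered by path so that the
/// result does not depend on directory iteration order.
pub fn collect_entries(dir: &Path) -> io::Result<String> {
    let mut paths: Vec<_> = fs::read_dir(dir)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    paths.sort();
    let mut all = String::new();
    for path in paths {
        all.push_str(&fs::read_to_string(path)?);
    }
    Ok(all)
}
```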
krobelus pushed a commit that referenced this pull request Dec 10, 2025
Multiple gettext-extraction proc macro instances can run at the same
time due to Rust's compilation model. In the previous implementation,
where every instance appended to the same file, this has resulted in
corruption of the file. This was reported and discussed in
#11928 (comment)
for the equivalent macro for Fluent message ID extraction. The
underlying problem is the same.

The best way we have found to avoid such a race condition is to write each
entry to a new file, and concatenate them together before using them.
It's not a beautiful approach, but it should be fairly robust and
portable.

Closes #12125
@danielrainer
Author

Rebased on latest master and reworked a bit. See PR description for outstanding work.

Asan does not seem to like the intentional leaks for creating &'static strs. A similar approach is already in use for gettext, and I remember getting leak warnings previously when refactoring the gettext code as well. Not sure how to best address these failures.
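
For readers unfamiliar with the pattern, this is the kind of intentional leak meant here, reduced to its core. The helper name `leak_str` is made up for this example; the actual code in the PR may be structured differently.

```rust
/// Turns an owned `String` into a `&'static str` by deliberately never
/// freeing its allocation. The memory lives until the process exits.
pub fn leak_str(s: String) -> &'static str {
    Box::leak(s.into_boxed_str())
}
```

Each call leaks one allocation, which is fine for a bounded set of localized messages but is exactly the sort of allocation a leak checker may report.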

@krobelus
Contributor

Maybe the first step is to resurrect docker/jammy-asan.Dockerfile (ideally making it work with moving ubuntu versions). Assuming this is more convenient than github actions.
If we're reasonably sure it's an asan bug, we can maybe suppress allocations from the relevant functions via build_tools/lsan_suppressions.txt

Daniel Rainer and others added 10 commits January 2, 2026 01:12
Extract the language selection code from the gettext crate, and to a
lesser extent from `src/localization/mod.rs` and put it into
`src/localization/settings.rs`. No functional changes are intended.

Aside from better separation of concerns, this refactoring makes it
feasible to reuse the language selection logic for Fluent later on.

Part of fish-shell#12190

Put the gettext-specific code into `localization/gettext`.

Part of fish-shell#12190

This replaces `initialize_gettext`. It is only defined when the
`localize-messages` feature is enabled, to avoid giving the impression
that it does anything useful when the feature is disabled.

With this change, Fluent will be initialized as well once it is added,
without requiring any additional code for initialization.

Closes fish-shell#12190

The extracted function takes the parts which are used by
gettext-extract, as well as the upcoming fluent-extract, and puts it
into its own crate. This will allow having simpler proc macros for both
localization systems, since it minimizes duplicated code.

Add an implementation which allows using Fluent for localization in Rust.

The goal of this code is to eventually replace gettext, at least for
localization in Rust, but possibly also in fish scripts.

To make use of the added code, the new `localize!` macro should be used.
It takes a `FluentID` as its first argument. This ID is
used in FTL files to identify messages. In Rust, it can be constructed
using the new `fluent_id!` macro, which takes a string literal, or by
directly passing a string literal to `localize!`.

If arguments should be passed to Fluent, this can be done via additional
arguments to `localize!`: either by passing `&FluentArgs` (a type from
the `fluent` crate), or by specifying the arguments as comma-separated
key-value pairs, where the key needs to be a string literal, and the
value needs to be a valid argument to `FluentArgs::set` (also from the
`fluent` crate). The following example demonstrates the syntax:
`localize!("some-id", string_arg = "a string", number = 42)`
The result will be a `String`, formatted according to the rules in the
relevant FTL file. On errors, this macro panics.
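
To illustrate the call shape without depending on the `fluent` crate, here is a toy stand-in. It is NOT the `localize!` macro from this commit: `localize_sketch!`, `localize_with`, and the hard-coded catalog are hypothetical, and the placeholder substitution is a crude imitation of Fluent's `{ $name }` placeables. It only mirrors the documented behavior of returning a `String` and panicking on errors.

```rust
/// Toy catalog standing in for a parsed FTL file (hypothetical content).
fn pattern_for(id: &str) -> Option<&'static str> {
    match id {
        "some-id" => Some("{ $string_arg } has { $number } items"),
        _ => None,
    }
}

/// Looks up `id` and substitutes every `{ $key }` placeholder.
/// Panics on unknown IDs, mirroring "on errors, this macro panics".
pub fn localize_with(id: &str, args: &[(&str, String)]) -> String {
    let pattern = pattern_for(id).unwrap_or_else(|| panic!("unknown message ID: {id}"));
    let mut out = pattern.to_string();
    for (key, value) in args {
        out = out.replace(&format!("{{ ${key} }}"), value);
    }
    out
}

/// Stand-in macro accepting an ID followed by `key = value` pairs.
macro_rules! localize_sketch {
    ($id:expr $(, $key:ident = $value:expr)*) => {
        localize_with($id, &[$((stringify!($key), $value.to_string())),*])
    };
}
```

Calling `localize_sketch!("some-id", string_arg = "a string", number = 42)` then yields `"a string has 42 items"` for the toy catalog above.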

For setting the language precedence, the mechanism used for gettext is
employed. One notable difference is that, unlike with gettext, with
Fluent we don't have access to default message values, so instead we
unconditionally fall back to the `en` catalog, which must be complete.
Completeness of this catalog is checked statically using newly
introduced tests. These tests need to extract message IDs from the Rust
sources, which is done by compiling with the `fluent-extract` feature
enabled. To avoid recompilation, `build_tools/check.sh` caches the
extracted IDs. In CI, no caching mechanism exists, so there the test
will invoke cargo.
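
The fallback rule described above can be sketched like this. The `Catalog` type and `lookup` function are assumptions for illustration, using plain hash maps in place of parsed Fluent bundles.

```rust
use std::collections::HashMap;

/// Message ID -> message text, standing in for one parsed FTL file.
pub type Catalog = HashMap<&'static str, &'static str>;

/// Tries the catalogs named in `precedence` in order, then unconditionally
/// falls back to `en`. Returns `None` only if `en` lacks the ID as well,
/// which the completeness check is meant to rule out.
pub fn lookup(
    catalogs: &HashMap<&str, Catalog>,
    precedence: &[&str],
    id: &str,
) -> Option<&'static str> {
    precedence
        .iter()
        .copied()
        .chain(std::iter::once("en"))
        .find_map(|lang| catalogs.get(lang)?.get(id).copied())
}
```

Because `en` is always appended, a language listed in the precedence only needs to provide the messages that have actually been translated.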

Translations are specified in the `localization/fluent` directory. For
each language, there is a file in Fluent's FTL format. The filename
indicates the language, and has the suffix `.ftl`, e.g. `en.ftl`.
At the moment, we only have `en.ftl`, since no translations have
been added yet.
`rust-embed` is used to make these files available to the binary at
runtime. Files will only be parsed if they are specified in the
precedence list. (`en.ftl` is always parsed.)

The `fish-fluent-check` crate contains a binary which can be used to
check our FTL files. It uses the `fish-fluent-extract` crate to extract
Fluent IDs from the Rust sources.
It panics if any of these properties are violated:
- `en.ftl` does not contain every ID contained in the sources.
- Any FTL file contains an ID not contained in the sources.
- Any FTL file is not sorted by ID.
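
The three properties can be expressed as a small function over sets of IDs. This is a sketch, not the code in `fish-fluent-check`: it returns error messages instead of panicking, takes already-extracted IDs as input, and assumes that "sorted" means plain byte-wise string order.

```rust
use std::collections::BTreeSet;

/// Checks one FTL file. `ftl_ids` lists the file's message IDs in file
/// order; `must_be_complete` is meant to be true only for `en.ftl`.
pub fn check_ids(
    file: &str,
    ftl_ids: &[&str],
    source_ids: &BTreeSet<&str>,
    must_be_complete: bool,
) -> Vec<String> {
    let mut errors = Vec::new();
    let present: BTreeSet<&str> = ftl_ids.iter().copied().collect();
    if must_be_complete {
        // Every ID used in the sources must have a message.
        for id in source_ids.difference(&present) {
            errors.push(format!("{file}: missing ID '{id}'"));
        }
    }
    // No file may contain an ID which the sources do not use.
    for id in present.difference(source_ids) {
        errors.push(format!("{file}: unused ID '{id}'"));
    }
    // IDs must appear in ascending order.
    if !ftl_ids.windows(2).all(|pair| pair[0] <= pair[1]) {
        errors.push(format!("{file}: IDs are not sorted"));
    }
    errors
}
```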
These checks still need to be integrated into `check.sh` and CI.

These macros first localize a string, and then output it in different
ways depending on the macro. Each macro comes with a version which adds
a newline at the end.

It is added separately using a message which also exists in a regular
test. The reason is that our message ID extraction does not work for doc
tests, so our tooling would complain about an unused ID in the FTL file
if we added it just for the doc test. By introducing it for a regular
test and reusing the ID in the doc test, we avoid this problem.

This migrates the fish version info message from gettext to Fluent. It
can be used to see Fluent-based localization in action.

Because this commit adds new FTL files, these languages show up in the
Fluent language precedence, requiring an update to the corresponding
tests.

Reword zh_CN as suggested in
fish-shell#11833 (comment)

	fish -c 'for LC_MESSAGES in fr_FR zh_CN zh_TW
	    argparse h-
	end'