Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#4968] Fix multiline translations - Take 2 #5339

Merged
merged 9 commits into from
Apr 15, 2020

Conversation

amercader
Copy link
Member

Fixes #4968

This builds on #5338 but it goes beyond with what I think is a better approach.

For reference, this is a string extracted from a Jinja2 multiline block (a msgid) in the pot file in current master:

#: ckan/templates/group/snippets/helper.html:8
msgid ""
" You can use CKAN Groups to create and manage collections of datasets. This "
"could be to catalogue datasets for a particular project or team, or on a "
"particular theme, or as a very simple way to help people find and search your"
" own published datasets. "
msgstr ""

As @smotornyuk correctly pointed in #5338, using the standard Jinja2 Babel extractor save us some custom code that we were using in our extractor. But as it is, #5338 extracts messages as:

#: ckan/templates/group/snippets/helper.html:8
msgid ""
"\n"
"        You can use CKAN Groups to create and manage collections of datasets."
"\n"
"        This could be to catalogue datasets for a particular project or team,"
"\n"
"        or on a particular theme, or as a very simple way to help people find"
"\n"
"        and search your own published datasets.\n"
"      "
msgstr ""

If we use the above msgid in the po files, the string gets correctly translated. But by essentially keeping all existing whitespace including line breaks in the msgid, it makes them much more fragile as any whitespace change in the templates will result in a new msgid, and the existing translation will be lost.

I propose we use the ext.i18n.trimmed policy both when extracting messages (da39d7e) and when parsing translated blocks (4cf6e0a) to avoid any whitespace differences.

This is how the string gets extracted with this policy:

#: ckan/templates/group/snippets/helper.html:8
msgid ""
"You can use CKAN Groups to create and manage collections of datasets. This "
"could be to catalogue datasets for a particular project or team, or on a "
"particular theme, or as a very simple way to help people find and search your"
" own published datasets."
msgstr ""

Now, this gives us more robust msgids going forward, but as @smotornyuk correctly pointed out in #4968 if we update them now in the PO files, the current existing translations (msgstr) will be lost. To avoid this I wrote a small command that updates the msgids in the PO files with the corresponding new msgid in the POT file. You can see in the PO files in the PR that only the multiline strings were updated.

If the suggested approach makes sense to all, I'd like to backport this to 2.8. Here's the intended process:

  • Backport 036972c, da39d7e and 4cf6e0a
  • Adapt f69d086 to paster CLI
  • Pull the latest translations from Transifex to get the latest PO files
  • Extract new messages (update POT file)
  • Run sync command (update PO files msgids)
  • Commit and push to Transifex

@smotornyuk @wardi Does this make sense?

@amercader amercader changed the title 4968 fix multiline translations [#4968] Fix multiline translations - Take 2 Apr 10, 2020
@wardi
Copy link
Contributor

wardi commented Apr 10, 2020

backporting might break theme plugin translation overrides that expect the current whitespace, but maybe it's fine to just mention this in the release notes?

@smotornyuk
Copy link
Member

+1 for backporting. Eventually, every portal will be upgraded and the maintainer will have no choice but to do something with custom overrides. So adding clear notice about translation update would be ok - there are not so many multiline strings, so, I hope, it won't take much effort to do and update inside your extension while applying patch releases.

@amercader
Copy link
Member Author

There are 12 strings that were broken because of this, but some of them are quite prominent like the Groups and Orgs helper text (which are the ones that people will have customized if they wanted to rename them).

It is true that we can break some existing translation but I lean towards backporting and documenting because:

  1. What we are fixing is more severe (users can't translate these core strings right now without messing with their own po files)
  2. Nothing will break immediately after update. It would only happen after users have extracted the strings again and compiled the po files

I'll add the notice in the changelog as part of this PR so you can check if it is informative enough

@amercader
Copy link
Member Author

@smotornyuk @wardi if any of you has some time can you review and merge if happy? Thanks!

@smotornyuk
Copy link
Member

✔️ reviewed. Need to fix conflicts and then it's ready to merge

@amercader amercader merged commit 342de61 into master Apr 15, 2020
@amercader
Copy link
Member Author

Thanks @smotornyuk!

@amercader amercader deleted the 4968-fix-multiline-translations branch April 15, 2020 20:21
@Chealer
Copy link

Chealer commented Apr 17, 2020

This was not backported in CKAN 2.8.4.

@amercader
Copy link
Member Author

I decided not to backport this to 2.8 at this stage. The Jinja2 version that 2.8 uses (jinja2==2.8) does not have support for policies, which were used to define the trimmed behaviour in da39d7e and 4cf6e0a. Perhaps there is a lower-level way of achieving the same effect but it feels risky to try to hack something around this in a patch release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

i18n: Multiline translation strings not translated
4 participants