Bug 490837 - Translation Memory reformats source strings incorrectly
Status: RESOLVED FIXED
Alias: None
Product: lokalize
Classification: Applications
Component: translation memory
Version: unspecified
Platform: Arch Linux Linux
Importance: NOR normal
Target Milestone: ---
Assignee: Simon Depiets
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-07-26 03:46 UTC by fin-w
Modified: 2024-07-30 22:20 UTC
CC List: 2 users

See Also:
Latest Commit:
Version Fixed In:
Sentry Crash Report:


Attachments
Lokalize with the mentioned translation open (324.51 KB, image/png)
2024-07-26 03:46 UTC, fin-w

Description fin-w 2024-07-26 03:46:04 UTC
Created attachment 171998
Lokalize with the mentioned translation open

SUMMARY
Compare the source string below with the Translation Memory source stored for the same string. Lokalize has mangled the HTML, shuffling tag brackets, words, and punctuation:

Source:
<p>Sometimes, if source text is changed, its translation becomes deprecated 
and is either marked as <emphasis>needing&nbsp;review</emphasis> (i.e. 
looses approval status), or (only in case of XLIFF file) moved to the 
<emphasis>alternate&nbsp;translations</emphasis> section accompanying the 
unit.</p><p>This toolview also shows the difference between current source 
string and the previous source string, so that you can easily see which 
changes should be applied to existing translation to make it reflect current 
source.</p><p>Double-clicking any word in this toolview inserts it into 
translation.</p><p>Drop translation file onto this toolview to use it as a 
source for additional alternate translations.</p>

Equivalent Translation Memory source:
<Sometimes>if, source text is changed its, translation becomes deprecated and is either marked as emphasis <needing>review&nbsp;i</emphasis> (e.looses. approval status or), only (in case of XLIFF file moved) to the emphasis <alternate>translations&nbsp;section</emphasis> accompanying the unit p.</p><This>toolview also shows the difference between current source string and the previous source string so, that you can easily see which changes should be applied to existing translation to make it reflect current source p.</p><Double>clicking-any word in this toolview inserts it into translation p.</p><Drop>translation file onto this toolview to use it as a source for additional alternate translations p.</

STEPS TO REPRODUCE
1. Open a translation file whose entries contain HTML tags and are added to the Translation Memory

OBSERVED RESULT
Translation Memory entries for that file have jumbled text and tags

ADDITIONAL INFORMATION
Lokalize version: git main
Comment 1 Albert Astals Cid 2024-07-30 22:19:35 UTC
Git commit 10d5bfcf89cf018429f69a45b8cbe168fa1ff678 by Albert Astals Cid, on behalf of Volker Krause.
Committed on 30/07/2024 at 22:19.
Pushed by aacid into branch 'master'.

Fix porting regression in diff computation

This relied on a non-standard matching behavior of 0x08 (backspace) in
QRegExp apparently, which PCRE/QRegularExpression doesn't have. So
manually replicate that by adding that to the pattern.

Add a unit test based on the original bug report, which now produces the
same result before and after porting.

Also needs backporting to 24.08.

cc @huftis

M  +3    -0    autotests/CMakeLists.txt
A  +27   -0    autotests/difftest.cpp     [License: LGPL(v2.1+)]
M  +1    -1    src/common/diff.cpp

https://invent.kde.org/sdk/lokalize/-/commit/10d5bfcf89cf018429f69a45b8cbe168fa1ff678
Comment 2 Albert Astals Cid 2024-07-30 22:20:05 UTC
Git commit 3015496e548d7eb36a2eece6bbff2bb11c180c74 by Albert Astals Cid, on behalf of Volker Krause.
Committed on 30/07/2024 at 22:19.
Pushed by aacid into branch 'release/24.08'.

Fix porting regression in diff computation

This relied on a non-standard matching behavior of 0x08 (backspace) in
QRegExp apparently, which PCRE/QRegularExpression doesn't have. So
manually replicate that by adding that to the pattern.

Add a unit test based on the original bug report, which now produces the
same result before and after porting.

Also needs backporting to 24.08.

cc @huftis
(cherry picked from commit 10d5bfcf89cf018429f69a45b8cbe168fa1ff678)

M  +3    -0    autotests/CMakeLists.txt
A  +27   -0    autotests/difftest.cpp     [License: LGPL(v2.1+)]
M  +1    -1    src/common/diff.cpp

https://invent.kde.org/sdk/lokalize/-/commit/3015496e548d7eb36a2eece6bbff2bb11c180c74
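
The commit messages above describe the fix as adding 0x08 (backspace) to the pattern explicitly, because the new regex engine does not match it the way the old one did. The general idea can be illustrated with a minimal Python sketch. It is not the Lokalize code: the patterns, the sample text, and the split_words helper are hypothetical, and the actual pattern in src/common/diff.cpp is not shown in this report. The sketch only assumes that backspace serves as a separator character and shows how word splitting changes once the pattern names it.

```python
import re

# Backspace (0x08), the character named in the commit message.
BACKSPACE = "\x08"

# Hypothetical patterns for illustration only; these are NOT the
# patterns used in Lokalize's src/common/diff.cpp.
implicit = re.compile(r"\s+")        # relies on the engine's idea of whitespace
explicit = re.compile(r"[\s\x08]+")  # names backspace as a separator explicitly


def split_words(pattern, text):
    """Split text on the separator pattern and drop empty pieces."""
    return [word for word in pattern.split(text) if word]


sample = "alpha" + BACKSPACE + "beta gamma"

# Python's \s does not match 0x08, so the first two words stay glued together.
implicit_words = split_words(implicit, sample)

# With 0x08 in the character class, all three words are separated.
explicit_words = split_words(explicit, sample)

print(implicit_words)
print(explicit_words)
```

When a word splitter silently stops treating a character as a separator, the lists of words and separators no longer line up. That is one plausible way for reassembled text to end up shuffled as in the report, although the report itself does not spell out the mechanism.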