Environment
- OS: Debian Linux
- Tesseract version: tesseract 4.00.00alpha
- Platform: Linux 4.15.0 SMP PREEMPT 2018 x86_64 GNU/Linux
Current Behavior:
When using the ron (Romanian) language option, the Romanian diacritics Ș ș Ț ț are mapped to the wrong Unicode code points, namely the cedilla variants:
Ș -> Ş=U+015E
ș -> ş=U+015F
Ț -> Ţ=U+0162
ț -> ţ=U+0163
Expected Behavior:
Ș -> Ș=U+0218
ș -> ș=U+0219
Ț -> Ț=U+021A
ț -> ț=U+021B
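Suggested Fix:
Edit the character mapping for ron so that these letters are output as the comma-below code points U+0218 to U+021B instead of the cedilla code points U+015E, U+015F, U+0162 and U+0163.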
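Until the mapping itself is fixed, one possible interim workaround is to post-process the OCR output and replace each cedilla code point with its comma-below counterpart, using exactly the pairs listed under Current and Expected Behavior. The sketch below assumes the OCR text is already available as a Python string; the function name fix_romanian_diacritics is hypothetical and this is not a change to Tesseract's own map. Note that it will also rewrite any cedilla letters that genuinely belong in the text (for example Turkish ş), so it is only appropriate for Romanian-only output.

```python
# Hypothetical post-processing workaround (not part of Tesseract):
# replace the cedilla code points Tesseract emits with the correct
# Romanian comma-below code points.
CEDILLA_TO_COMMA_BELOW = str.maketrans({
    "\u015E": "\u0218",  # Ş (S with cedilla)      -> Ș (S with comma below)
    "\u015F": "\u0219",  # ş (s with cedilla)      -> ș (s with comma below)
    "\u0162": "\u021A",  # Ţ (T with cedilla)      -> Ț (T with comma below)
    "\u0163": "\u021B",  # ţ (t with cedilla)      -> ț (t with comma below)
})


def fix_romanian_diacritics(text: str) -> str:
    """Return text with cedilla S/T letters replaced by comma-below S/T."""
    return text.translate(CEDILLA_TO_COMMA_BELOW)


if __name__ == "__main__":
    # "ştiinţă" as it would come out with the cedilla code points
    ocr_output = "\u015Ftiin\u0163\u0103"
    fixed = fix_romanian_diacritics(ocr_output)
    # Print the code points so the corrected characters are easy to verify
    print(" ".join(f"U+{ord(c):04X}" for c in fixed))
```

Because str.translate works on single code points, characters outside the four listed pairs pass through unchanged.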