Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix running tests on Windows and exclude corrupted characters from character-to-language mapping for Latvian #92

Merged
merged 5 commits into from
May 2, 2021

Conversation

janissl
Copy link
Contributor

@janissl janissl commented Apr 30, 2021

On Windows, file.encoding value depends on regional settings of a user and is never set to UTF-8 by default. Therefore, tests that include strings with non-ASCII characters fail on Windows. The solution is to add file.encoding as a parameter for JVM in gradle.properties.

When writing a text file, newline characters depend on OS - Windows uses a carriage return + a line feed (\r\n), however, Linux and Mac OS use just a line feed (\n). Newline characters used in a developer's source code depend on IDE and/or Git settings. Therefore, tests with multiline strings may fail if newline characters are not normalized.

A text corpus or multiple corpora used for building a language model for Latvian were not clean and contained corrupted characters (presumably, due to encoding errors). Weird characters were removed from character-to-language mappings and tests also fixed for words that contain such non-native characters.

@pemistahl pemistahl changed the base branch from master to v1.1.0-wip May 1, 2021 09:33
pemistahl added a commit that referenced this pull request May 2, 2021
@pemistahl pemistahl merged commit 3f77dac into pemistahl:v1.1.0-wip May 2, 2021
@pemistahl
Copy link
Owner

Hello Jānis,

thank you very much for this pull request. I develop on macOS, so I wasn't aware of the Windows problems you have mentioned. I have merged your changes and will release Lingua 1.1.0 soon which will include them. :)

@janissl janissl deleted the master branch May 3, 2021 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants