I am not very proficient at reading Kotlin code, so I cannot really pinpoint why this behaviour occurs:
I am limiting the languages to GERMAN, ENGLISH, SPANISH, FRENCH, ITALIAN, DUTCH, PORTUGUESE.
I am looking at the confidence values only.
The detection text "Hätte gerne das" leads to the confidence values GERMAN=1.0.
The detection text "Hätte gerne das Angebot" leads to the confidence values GERMAN=1.0, DUTCH=0.7305489529112811, FRENCH=0.6533180937401983, ITALIAN=0.5924134102645501, SPANISH=0.582455145441379, ENGLISH=0.5545393891315643, PORTUGUESE=0.5411208670641964.
The detection still returns the correct result, but I am wondering why, in the second case, the library even calculates confidence values for languages that do not contain the letter "ä" in their alphabets.
Is this a bug?
dl1ely changed the title from "Strangce behaviour in using unique character information for filtering languages" to "Strange behaviour in using unique character information for filtering languages" on Jan 8, 2021
Thanks for using my library and for discovering this strange behavior. This is indeed a bug in a calculation step of the rule-based language filter. The bug only occurs when the input has an odd number of words. I've just fixed it in the commit referenced above. A nice side effect is that accuracies now go up a little for certain languages.
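For readers who want some intuition for how a word-count-dependent inconsistency like this can arise, here is a minimal, self-contained sketch. It is not the library's actual code: the `Lang` enum, the `charsToLangs` table, the `filterByRules` function, and the "at least half of the words" threshold are all assumptions made purely for illustration. It only shows one plausible way that integer division in such a threshold treats odd and even word counts differently.

```kotlin
// Hypothetical sketch, NOT the library's implementation.
// All names and the threshold rule are assumptions for illustration.
enum class Lang { GERMAN, ENGLISH, SPANISH, FRENCH, ITALIAN, DUTCH, PORTUGUESE }

// Assumed, heavily simplified table of characters that point to specific languages.
val charsToLangs: Map<Char, Set<Lang>> = mapOf(
    'ä' to setOf(Lang.GERMAN),
    'ß' to setOf(Lang.GERMAN),
    'ñ' to setOf(Lang.SPANISH),
    'ã' to setOf(Lang.PORTUGUESE)
)

// Returns the languages that survive the rule-based filter.
// If no language reaches the threshold, all candidates are kept.
fun filterByRules(text: String, candidates: Set<Lang>, integerThreshold: Boolean): Set<Lang> {
    val words = text.lowercase().split(Regex("\\s+")).filter { it.isNotEmpty() }
    val hits = mutableMapOf<Lang, Int>()
    for (word in words) {
        // Count each language at most once per word.
        val langs = word.flatMap { charsToLangs[it] ?: emptySet() }.toSet()
        for (lang in langs) {
            if (lang in candidates) hits[lang] = (hits[lang] ?: 0) + 1
        }
    }
    // Integer division floors 3 / 2 to 1, so an odd word count gets a lower
    // bar than "half of the words"; the fractional variant uses 1.5 instead.
    val threshold: Double =
        if (integerThreshold) (words.size / 2).toDouble() else words.size * 0.5
    val kept = hits.filterValues { it >= threshold }.keys
    return if (kept.isEmpty()) candidates else kept
}

fun main() {
    val all = Lang.values().toSet()
    for (text in listOf("Hätte gerne das", "Hätte gerne das Angebot")) {
        println("$text | integer: ${filterByRules(text, all, true)}")
        println("$text | fractional: ${filterByRules(text, all, false)}")
    }
}
```

In this sketch, the integer-threshold variant narrows the three-word input to GERMAN but leaves the four-word input unfiltered, which mirrors the inconsistency reported in the question. With the fractional threshold, both inputs are treated the same way. Whether the actual fix takes this form is not stated in this thread.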