Wrong identification of text with mixed languages #76

Closed
nicolabertoldi opened this issue Oct 8, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@nicolabertoldi

I noticed that Lingua misidentifies text that contains a few foreign words.

Here is an example of a Korean text that includes the string "CA", which is not Korean (it probably represents a person's initials):
( 웃음 ) CA : 실패하는군요 . 안타깝네요 .

This text is identified as "Romanian".
This is a bit strange, since there are three Korean tokens and only one that is (probably) not Korean.

Arguably, the punctuation marks should count as "Korean" as well.
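
The token count above can be checked with a small script. This is only an illustrative sketch that classifies whitespace-separated tokens by Unicode script; it makes no claim about how Lingua works internally, and the helper names `token_script` and `count_scripts` are hypothetical.

```python
import unicodedata
from collections import Counter


def token_script(token: str) -> str:
    """Classify a token as 'HANGUL', 'LATIN', or 'OTHER' by its letters."""
    scripts = Counter()
    for ch in token:
        if not ch.isalpha():
            continue  # punctuation and digits carry no script information
        name = unicodedata.name(ch, "")
        if name.startswith("HANGUL"):
            scripts["HANGUL"] += 1
        elif name.startswith("LATIN"):
            scripts["LATIN"] += 1
        else:
            scripts["OTHER"] += 1
    if not scripts:
        return "OTHER"  # token consists only of non-letters, e.g. "(" or "."
    return scripts.most_common(1)[0][0]


def count_scripts(text: str) -> Counter:
    """Count whitespace-separated tokens per script."""
    return Counter(token_script(tok) for tok in text.split())


text = "( 웃음 ) CA : 실패하는군요 . 안타깝네요 ."
counts = count_scripts(text)
print(counts["HANGUL"], counts["LATIN"], counts["OTHER"])
```

For the example sentence this gives three Hangul tokens, one Latin token ("CA"), and five punctuation-only tokens, so a script-based majority vote would favor Korean.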

Any idea why this misidentification occurs?

@pemistahl
Owner

I can confirm that this is a bug. Annoying, I thought I had already fixed that. I'm going to deal with this one.

@pemistahl pemistahl added the bug Something isn't working label Oct 9, 2020
@pemistahl pemistahl added this to the Lingua 1.1.0 milestone Oct 9, 2020
@pemistahl pemistahl changed the title from "wrong identification on text with mixed languages" to "Wrong identification of text with mixed languages" Oct 9, 2020
@pemistahl pemistahl modified the milestones: Lingua 1.1.0, Lingua 1.0.3 Oct 9, 2020