[BUG] Auto-generated titles don't match app language #1531

Closed
crypdick opened this issue Sep 13, 2024 · 12 comments

Comments

@crypdick

Bug Description
Sometimes, auto-generated titles for English conversations are in Chinese.

Expected Results
According to the nameConversation prompt, the title should always be in English.

Screenshots
image

image
image
image

Desktop:

  • Operating System: Ubuntu 23.10
  • Application Version: 1.4.1

Additional Context
This issue makes me suspect that my conversations in the closed source app are not truly private and are being sent to a custom model.

@creesch

creesch commented Sep 18, 2024

I just noticed that the releases listed on GitHub stop at 1.3.10, but if you download from the website you are served 1.4.2. The readme does mention this:

This is the repository for the Chatbox Community Edition, open-sourced under the GPLv3 license. For most users, I recommend using the Chatbox Official Edition (closed-source).

But it does not clearly state that the download buttons below actually point to this closed source version, which I think is a bit of a dark pattern and not cool at all.

I tried decompiling it and looking at the source code, but because Terser is used during packaging, the code is obfuscated, making it really difficult to see whether anything shady is going on in this closed source version.

@creesch

creesch commented Sep 18, 2024

So I found this previous discussion which gives a bit of context: #803
Having read the discussion I find it less likely some malice is involved, although I can't of course rule it out entirely.

I still think that the below section of the readme should be clarified:

image

At the very least, it should say "Closed source download for ".

@creesch

creesch commented Sep 18, 2024

Alright, one last reply. I had a look with tcpview open and when you do open up chatbox there is some traffic visible. This is to be expected given the update check and all that.

The traffic goes to 170.106.175.29 which turns out to simply be chatboxai.app.

When I click “new chat” I see activity to that address as well. Oddly enough that seems to be the only UI element causing traffic, I suspect some sort of analytics is going on here. Mind you, at this point I have only clicked the button, not typed in any prompt.

When I actually type something in the chat and send it off towards the LLM, I do not see any activity towards chatboxai.app. The only other traffic I see is towards the LLM provider I use, which is what I would expect.

So it looks like no data about your chats is being sent while chatting. The traffic I see on application startup is also not enough to indicate that previous chats are being sent somewhere. The traffic when clicking new chat is still a bit odd to me.

Overall, it looks like your data is safe. The behavior with the generated titles might simply be due to a bug in the closed source version.

@crypdick
Author

Thank you for the detective work @creesch !

@Bin-Huang
Owner

Bin-Huang commented Oct 7, 2024

Don't worry, your data is safe—Chatbox really values your privacy. As for why the closed-source edition's code is obfuscated, it's because I need to protect it. Honestly, with Electron, there's almost no way to safeguard the source code besides code obfuscation. Thanks to @creesch for the review and confirmation!

Getting back to the original issue with title generation, I wouldn't expect that to happen. Which model are you using? Does your system prompt or context include any Chinese text? I'm really curious about this issue. If you could provide more details, that'd be great! @crypdick

@crypdick
Author

crypdick commented Oct 8, 2024

@Bin-Huang My system prompts and context are always written in English. I use a mix of OpenAI and Anthropic endpoints and I have seen this issue across both model providers.

@Bin-Huang changed the title from "[BUG] Autogenerated titles are sometimes in Chinese. Are my conversations actually private?" to "[BUG] Auto-generated titles don't match app language" on Oct 8, 2024

@Bin-Huang
Owner

@crypdick Thanks for the extra detail. Are the endpoints you mentioned official APIs from OpenAI and Anthropic? Also, which version of the Chatbox app are you using, and on what OS?

@crypdick
Author

That's right, nothing custom, official endpoints only.

Operating System: Ubuntu 23.10
Application Version: 1.4.1

@Bin-Huang
Owner

This is indeed a very interesting bug, thanks for bringing it to my attention. I think I've found the root cause. After multiple tests, I found that the title generation prompt Chatbox ultimately sends to the model is not the problem: it contains no hint to generate Chinese titles. However, the model (gpt-4o) itself has a tendency to generate Chinese titles. In my case, I asked gpt-4o to generate a title for a purely English conversation, and it suddenly produced a Chinese title. After detailed testing, I found that this happens with a probability of less than 10%. It's pretty clear this is a case of the model hallucinating.

For anyone interested in this issue, you can reproduce my findings with the following code:

import openai

openai.api_key = 'your-api-key'

content = "Name the conversation based on the chat records.\nPlease provide a concise name, within 10 characters and without quotation marks.\nPlease use the speak language in the conversation.\nYou only need to answer with the name.\nThe following is the conversation:\n\n```\nis there any npm packages that can help me make a auto-resized textarea\n\n---------\n\n\n```\n\nPlease provide a concise name, within 10 characters and without quotation marks.\nPlease use the speak language in the conversation.\nYou only need to answer with the name.\nThe conversation is named:"

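# Ask for a title 40 times and print every reply that contains a non-ASCII
# character. This is a rough proxy for "not English": it also flags accented
# Latin letters and emoji, not only Chinese characters.
for i in range(40):
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": content}
        ]
    )
    response_content = response.choices[0].message.content or ''
    if any(ord(char) > 127 for char in response_content):
        print(response_content)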
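
The `ord(char) > 127` test in the script above flags any non-ASCII character, so a title such as "Café Menu" would be counted too. A narrower check, as a minimal sketch: the helper names `contains_cjk` and `cjk_rate` and the choice of Unicode ranges are assumptions made for illustration, not part of Chatbox or the script above.

```python
from typing import Iterable

# Unicode blocks treated as "Chinese" here. This is an illustrative,
# non-exhaustive choice (assumption): the main CJK ideograph blocks only.
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
)


def contains_cjk(text: str) -> bool:
    """Return True if any character of text falls inside CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def cjk_rate(titles: Iterable[str]) -> float:
    """Return the fraction of titles that contain at least one CJK character."""
    titles = list(titles)
    if not titles:
        return 0.0
    return sum(contains_cjk(t) for t in titles) / len(titles)
```

Collecting the 40 replies into a list and passing it to `cjk_rate` would give an estimate of how often a Chinese title comes back, which can then be compared across prompt wordings.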

To fix this issue, I've tweaked the prompt for auto-generating titles, making sure it uses the language set in the app. This fix will be rolled out with the next update.

Thanks again for bringing this bug to my attention! It's hands down the most interesting bug I've fixed lately.

@crypdick
Author

This is an interesting bug. I think that this is caused by how the prompt is phrased. For example, the sentence "please use the speak language in the conversation" is not how a native speaker would write it; a more natural phrasing might be "please use the same language used in the conversation." This phrasing is a subtle signal to the model that the prompter is Chinese, which is why the summary sometimes includes Chinese characters, even though the prompt specifies to use the conversation's language.

@Bin-Huang
Owner

Thanks for your insights! I think you're right. This prompt was probably shared by someone else online, and I didn't really look at its tone or style. I've now tried writing a new prompt myself, which will fix those issues.

Based on the chat history, give this conversation a name.
Keep it short - 10 characters max, no quotes.
Use ${language}.
Just provide the name, nothing else.

Here's the conversation:
{history}

Name this conversation in 10 characters or less.
Use ${language}.
Only give the name, nothing else.

The name is:

@crypdick Could you take a look and let me know what you think?

@crypdick
Author

@Bin-Huang much better, although it is redundant. I would delete everything after Here's the conversation: {history}
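
For illustration, the shortened prompt suggested here could be rendered roughly as follows. This is a sketch only: the function name `build_title_prompt` and its parameters are hypothetical, and the thread does not show how Chatbox actually substitutes the language and history placeholders.

```python
def build_title_prompt(history: str, language: str = "English") -> str:
    """Render the shortened title prompt: instructions first, conversation last.

    Hypothetical helper; the wording is copied from the prompt quoted in the
    thread, with everything after the conversation removed.
    """
    return (
        "Based on the chat history, give this conversation a name.\n"
        "Keep it short - 10 characters max, no quotes.\n"
        f"Use {language}.\n"
        "Just provide the name, nothing else.\n"
        "\n"
        "Here's the conversation:\n"
        f"{history}"
    )
```

Calling `build_title_prompt("user: is there an npm package for an auto-resized textarea?")` yields a prompt that states the target language once and ends with the conversation itself.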

3 participants