
Derived templates #3090

Merged (21 commits into SillyTavern:staging, Nov 23, 2024)

Conversation

@kallewoof (Contributor) commented Nov 19, 2024

This PR adds a simple hash / substring based method for picking context and instruct templates based on the chat template, when one is provided by the backend.

Having to switch templates manually every time you switch to a different model can be confusing, so an automated way of doing it is helpful.

The koboldcpp backend variant relies on LostRuins/koboldcpp#1222 (merged, hopefully in the 1.79 release).
The llama.cpp backend variant works as-is.
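For illustration, a minimal sketch of the derivation idea, assuming a simple lookup keyed on the SHA-256 of the chat template (the hash placeholder and preset names below are hypothetical, not the PR's actual mapping):

// Map the SHA-256 of the backend-provided chat template to known preset names.
const derivedTemplates = {
    '<sha256-of-a-known-chat-template>': { context: 'Llama 3 Instruct', instruct: 'Llama 3 Instruct' },
};

function deriveTemplates(chatTemplateHash) {
    // Unknown hashes return null, leaving the user's current selection untouched.
    return derivedTemplates[chatTemplateHash] ?? null;
}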

Checklist:

  • I have read the Contributing guidelines.
  • Add UI/option to enable or disable the auto-toggling.
  • Optional: add more hash-to-template derivations.

Future work (post-merge):

  • This could be expanded to warn the user when the template and the derived template are not compatible (as opposed to the all-or-nothing auto-swap it is now).
  • Since the chat template is provided as-is, the mechanism could also be made more sophisticated, e.g. a parser that compares the current preset with the provided template and determines whether they are compatible.
  • Allow users to map a hash (model type) to whatever preset they prefer. Ideally, remember the preset a user picks for a given model / chat template and reuse it in the future. Users may want custom presets for a specific model that shares a chat template with other models, so it is not clear what the ideal approach would be here.

@kallewoof force-pushed the 202411-auto-templates branch 11 times, most recently from 2da8c5c to bdfb13d on November 19, 2024 08:26
@kallewoof force-pushed the 202411-auto-templates branch from bdfb13d to 0e2fdf3 on November 19, 2024 08:27
// https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
async function digestMessage(message) {
    const msgUint8 = new TextEncoder().encode(message); // encode as (utf-8) Uint8Array
    const hashBuffer = await window.crypto.subtle.digest('SHA-256', msgUint8); // hash the message
    const hashArray = Array.from(new Uint8Array(hashBuffer)); // convert buffer to byte array
    return hashArray.map((b) => b.toString(16).padStart(2, '0')).join(''); // convert bytes to hex string
}
Member:
window.crypto.subtle is only available in secure contexts (i.e. localhost or HTTPS).
But you can import any library from npm using Webpack, for example js-sha256
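For reference, a js-sha256 based sketch might look like this (not the PR's code; the function name is illustrative):

import { sha256 } from 'js-sha256'; // pure JS, works outside secure contexts

// Hex-encoded SHA-256 of a chat template string, computed in the browser.
function hashChatTemplate(chatTemplate) {
    return sha256(chatTemplate);
}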

Contributor Author:
Thanks for pointing that out. I put the hash into the response from the backend to avoid having to bundle anything. It looks reasonable to me, but if you prefer Webpack style I can do that.

Member:
Doing it on the backend is fine too. The crypto module is always available in Node.

Contributor Author:
Right, one less dependency. Switched to using that.
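A minimal sketch of the backend-side hashing with Node's built-in crypto module (the function name is illustrative, not the PR's actual code):

const { createHash } = require('node:crypto');

// Hex-encoded SHA-256 of the chat template, computed on the server.
function hashChatTemplate(chatTemplate) {
    return createHash('sha256').update(chatTemplate).digest('hex');
}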

@kallewoof (Contributor, Author)

It turns out that llama.cpp already supports fetching the chat template, so adding llama.cpp server support was trivial.

There is one tiny gotcha: despite returning JSON data, llama.cpp ends the chat_template string with a NUL terminator. That is clearly a bug, so once it is fixed the corresponding workaround code can be discarded.
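A rough sketch of what such a temporary workaround might look like (fetchChatTemplate and serverUrl are illustrative names; the actual code in the PR may differ):

// The /props response parses as JSON, but chat_template may carry a stray
// trailing NUL until the upstream llama.cpp fix lands, so strip it defensively.
async function fetchChatTemplate(serverUrl) {
    const props = await (await fetch(`${serverUrl}/props`)).json();
    return (props.chat_template ?? '').replace(/\u0000+$/, '');
}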

@kallewoof (Contributor, Author)

The koboldcpp pull request was updated to use the same endpoint as llama.cpp (/props), which means both backends can be treated the same way for chat template fetching.

I also added UI to enable/disable the derived template feature. It defaults to off, for now.

@kallewoof (Contributor, Author) commented Nov 20, 2024

A note on the last commit: there is a bug in llama.cpp which results in the chat template string being NUL-terminated one character too early. This has two consequences: the last character (which happens to be a \n in every case) is trimmed out, and in some cases a \u0000 is included in the resulting JSON string. The latter is already handled in feb1b91 (mentioned in #3090 (comment) above), and the former becomes a compatibility concern. I noted at the top of chat-templates.js that

// the hash can be obtained from command line e.g. via: MODEL=path_to_model; python -c "import json, hashlib, sys; print(hashlib.sha256(json.load(open('"$MODEL"/tokenizer_config.json'))['chat_template'].strip().encode()).hexdigest())"
// note that chat templates must be trimmed to match the llama.cpp metadata value

which was the case (due to this bug), but it is not the case anymore.

Approaches:

  1. Keep the string trimming. Pros: easier to 'get right', avoids issues with llama.cpp/koboldcpp before/after the patch above -- all versions will work fine. Cons: annoying indefinite remnants.
  2. Re-calculate all hashes without trimming, using the correct chat templates. Pros: makes things simpler. Cons: need to juggle broken chat template responses (for a while), but probably not indefinitely.

@kallewoof (Contributor, Author)

I went with approach 2. It seems like the cleaner approach, even though string trimming is fairly common. The llama.cpp NUL value is now replaced with the newline it originally overwrote, which aligns it with the koboldcpp endpoint.
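A sketch of the adopted behaviour, with illustrative names (not the PR's actual code): restore the newline the buggy response overwrote, then hash the untrimmed template.

const { createHash } = require('node:crypto');

function deriveTemplateHash(rawChatTemplate) {
    // Replace a trailing NUL (from the llama.cpp bug) with the newline it overwrote,
    // so the untrimmed template hashes the same as on fixed backends.
    const chatTemplate = rawChatTemplate.replace(/\u0000$/, '\n');
    return createHash('sha256').update(chatTemplate).digest('hex');
}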

I also renamed the SillyTavern backend endpoint to /props, since that is what it is forwarding right now (not just the chat template, at least for the llama.cpp version).

@kallewoof force-pushed the 202411-auto-templates branch from ba99fd1 to 3789381 on November 21, 2024 04:14
@Cohee1207 (Member)

How ready is this for a final review/test?

@kallewoof (Contributor, Author)

How ready is this for a final review/test?

It should be good to go!

public/index.html: outdated review comment (resolved)
@@ -218,6 +219,9 @@ router.post('/status', jsonParser, async function (request, response) {
        } catch (error) {
            console.error(`Failed to get TabbyAPI model info: ${error}`);
        }
    } else if (apiType == TEXTGEN_TYPES.KOBOLDCPP || apiType == TEXTGEN_TYPES.LLAMACPP) {
Member:
This custom header part was not needed, so I removed it.
The pattern was probably copied from tokenization-supported, which only exists because of LM Studio-specific hacks.

Contributor Author:
Got it. Works for me!

@Cohee1207 (Member) left a comment

Tested with the latest pull of llama.cpp. Works, thanks!

Now we need someone to keep the mapping in sync with the latest models in town, lol.

@Cohee1207 merged commit ba91845 into SillyTavern:staging on Nov 23, 2024
@kallewoof deleted the 202411-auto-templates branch on November 23, 2024 15:57
@kallewoof (Contributor, Author)

Tested with the latest pull of llama.cpp. Works, thanks!

Thanks for testing/merging.

Now we need someone to keep the mapping in sync with the latest models in town, lol.

Right. It doesn't seem like a big deal, especially since unknown hashes are logged to the console, but if it turns into a huge issue it may be time to consider a more sophisticated approach. I doubt it will, though.
