
Derived templates #3090

Merged (21 commits into SillyTavern:staging, Nov 23, 2024)

Conversation

@kallewoof (Contributor) commented Nov 19, 2024

This PR adds a simple hash / substring based method for picking context and instruct templates based on the chat template, when one is provided by the backend.

Having to switch templates manually every time you switch to a different model can be confusing, so an automated way of doing it is helpful.

The koboldcpp backend variant relies on LostRuins/koboldcpp#1222 (merged, hopefully in the 1.79 release).
The llama.cpp backend variant works as-is.
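For illustration, a minimal sketch of the derivation idea, assuming a simple lookup keyed on the SHA-256 of the chat template (the hash placeholder and preset names below are hypothetical, not the PR's actual mapping):

// Map the SHA-256 of the backend-provided chat template to known preset names.
const derivedTemplates = {
    '<sha256-of-a-known-chat-template>': { context: 'Llama 3 Instruct', instruct: 'Llama 3 Instruct' },
};

function deriveTemplates(chatTemplateHash) {
    // Unknown hashes return null, leaving the user's current selection untouched.
    return derivedTemplates[chatTemplateHash] ?? null;
}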

Checklist:

  • I have read the Contributing guidelines.
  • Add UI/option to enable or disable the auto-toggling.
  • Optional: add more hash-to-template derivations.

Future work (post-merge):

  • This could be expanded to warn the user when the template and the derived template are not compatible (as opposed to the all-or-nothing auto-swap it is now).
  • Since the chat template is provided as-is, the mechanism could also be made more sophisticated, e.g. a parser that compares the current preset with the provided template and determines whether they are compatible.
  • Allow users to map a hash (model type) to whatever preset they prefer. Ideally, remember the preset a user picks for a given model / chat template and reuse it in the future. Users may want custom presets for a specific model that shares a chat template with other models, so it is not clear what the ideal approach would be here.

@kallewoof force-pushed the 202411-auto-templates branch 11 times, most recently from 2da8c5c to bdfb13d on November 19, 2024 08:26
@kallewoof force-pushed the 202411-auto-templates branch from bdfb13d to 0e2fdf3 on November 19, 2024 08:27
// https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
async function digestMessage(message) {
    const msgUint8 = new TextEncoder().encode(message); // encode as (utf-8) Uint8Array
    const hashBuffer = await window.crypto.subtle.digest('SHA-256', msgUint8); // hash the message
    const hashArray = Array.from(new Uint8Array(hashBuffer)); // convert buffer to byte array
    return hashArray.map((b) => b.toString(16).padStart(2, '0')).join(''); // convert bytes to hex string
}
Member:
window.crypto.subtle is only available in secure contexts (i.e. localhost or HTTPS).
But you can import any library from npm using Webpack, for example js-sha256
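For reference, a js-sha256 based sketch might look like this (not the PR's code; the function name is illustrative):

import { sha256 } from 'js-sha256'; // pure JS, works outside secure contexts

// Hex-encoded SHA-256 of a chat template string, computed in the browser.
function hashChatTemplate(chatTemplate) {
    return sha256(chatTemplate);
}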

Contributor Author:
Thanks for pointing that out. I put the hash into the response from the backend to avoid having to bundle anything. It looks reasonable to me, but if you prefer Webpack style I can do that.

Member:
Doing it on the backend is fine too. The crypto module is always available in Node.

Contributor Author:
Right, one less dependency. Switched to using that.
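A minimal sketch of the backend-side hashing with Node's built-in crypto module (the function name is illustrative, not the PR's actual code):

const { createHash } = require('node:crypto');

// Hex-encoded SHA-256 of the chat template, computed on the server.
function hashChatTemplate(chatTemplate) {
    return createHash('sha256').update(chatTemplate).digest('hex');
}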

@kallewoof (Contributor, Author)

It turns out that llama.cpp already supports fetching the chat template, so adding llama.cpp server support was trivial.

There is one tiny gotcha: despite returning JSON data, llama.cpp ends the chat_template string with a NUL terminator. That is clearly a bug, so once it is fixed the corresponding workaround code can be discarded.
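A rough sketch of what such a temporary workaround might look like (fetchChatTemplate and serverUrl are illustrative names; the actual code in the PR may differ):

// The /props response parses as JSON, but chat_template may carry a stray
// trailing NUL until the upstream llama.cpp fix lands, so strip it defensively.
async function fetchChatTemplate(serverUrl) {
    const props = await (await fetch(`${serverUrl}/props`)).json();
    return (props.chat_template ?? '').replace(/\u0000+$/, '');
}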

@kallewoof (Contributor, Author)

The koboldcpp pull request was updated to use the same endpoint as llama.cpp (/props), which means both backends can be treated the same way for chat template fetching.

I also added UI to enable/disable the derived template feature. It defaults to off, for now.

@kallewoof (Contributor, Author) commented Nov 20, 2024

A note on the last commit: there is a bug in llama.cpp which results in the chat template string being NUL-terminated one character too early. This has two consequences: the last character (which happens to be a \n in every case) is trimmed out, and in some cases a \u0000 is included in the resulting JSON string. The latter is already handled in feb1b91 (mentioned in #3090 (comment) above), and the former becomes a compatibility concern. I noted at the top of chat-templates.js that

// the hash can be obtained from command line e.g. via: MODEL=path_to_model; python -c "import json, hashlib, sys; print(hashlib.sha256(json.load(open('"$MODEL"/tokenizer_config.json'))['chat_template'].strip().encode()).hexdigest())"
// note that chat templates must be trimmed to match the llama.cpp metadata value

which was the case (due to this bug), but it is not the case anymore.

Approaches:

  1. Keep the string trimming. Pros: easier to 'get right', avoids issues with llama.cpp/koboldcpp before/after the patch above -- all versions will work fine. Cons: annoying indefinite remnants.
  2. Re-calculate all hashes without trimming, using the correct chat templates. Pros: makes things simpler. Cons: need to juggle broken chat template responses (for a while), but probably not indefinitely.

@kallewoof (Contributor, Author)

I went with approach 2. It seems like the cleaner approach, even though string trimming is fairly common. The llama.cpp NUL value is now replaced with the newline it originally overwrote, which aligns it with the koboldcpp endpoint.
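A sketch of the adopted behaviour, with illustrative names (not the PR's actual code): restore the newline the buggy response overwrote, then hash the untrimmed template.

const { createHash } = require('node:crypto');

function deriveTemplateHash(rawChatTemplate) {
    // Replace a trailing NUL (from the llama.cpp bug) with the newline it overwrote,
    // so the untrimmed template hashes the same as on fixed backends.
    const chatTemplate = rawChatTemplate.replace(/\u0000$/, '\n');
    return createHash('sha256').update(chatTemplate).digest('hex');
}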

I also renamed the SillyTavern backend endpoint to /props, since that is what it is forwarding right now (not just the chat template, at least for the llama.cpp version).

@kallewoof force-pushed the 202411-auto-templates branch from ba99fd1 to 3789381 on November 21, 2024 04:14
@Cohee1207 (Member)

How ready is this for a final review/test?

@kallewoof (Contributor, Author)

How ready is this for a final review/test?

It should be good to go!

public/index.html: outdated review comment (resolved)
@@ -218,6 +219,9 @@ router.post('/status', jsonParser, async function (request, response) {
        } catch (error) {
            console.error(`Failed to get TabbyAPI model info: ${error}`);
        }
    } else if (apiType == TEXTGEN_TYPES.KOBOLDCPP || apiType == TEXTGEN_TYPES.LLAMACPP) {
Member:
This custom header part was not needed, so I removed it.
The pattern was probably copied from tokenization-supported, which only exists because of LM Studio-specific hacks.

Contributor Author:
Got it. Works for me!

@Cohee1207 (Member) left a comment

Tested with the latest pull of llama.cpp. Works, thanks!

Now we need someone to keep the mapping in sync with the latest models in town, lol.

@Cohee1207 merged commit ba91845 into SillyTavern:staging on Nov 23, 2024
@kallewoof deleted the 202411-auto-templates branch on November 23, 2024 15:57
@kallewoof (Contributor, Author)

Tested with the latest pull of llama.cpp. Works, thanks!

Thanks for testing/merging.

Now we need someone to keep the mapping in sync with the latest models in town, lol.

Right. It doesn't seem like a big deal, especially since unknown hashes are logged to the console, but if it turns into a huge issue it may be time to consider a more sophisticated approach. I doubt it will, though.
