Derived templates #3090
Conversation
Force-pushed from 2da8c5c to bdfb13d
This PR adds a simple hash-based method for picking context and instruct templates based on the chat template, when provided by the backend.
Force-pushed from bdfb13d to 0e2fdf3
public/scripts/chat-templates.js
Outdated
// https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
async function digestMessage(message) {
    const msgUint8 = new TextEncoder().encode(message); // encode as (utf-8) Uint8Array
    const hashBuffer = await window.crypto.subtle.digest('SHA-256', msgUint8); // hash the message
window.crypto.subtle is only available in secure contexts (i.e. localhost or HTTPS).
But you can import any library from npm using Webpack, for example js-sha256.
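For illustration, a minimal sketch of that suggestion, assuming js-sha256 is bundled via Webpack (the function name and the trimming are illustrative choices, not part of this PR):

import { sha256 } from 'js-sha256'; // pure-JS SHA-256, no secure context required

// hex digest of the (trimmed) chat template string, usable as a lookup key
function hashChatTemplate(chatTemplate) {
    return sha256(chatTemplate.trim());
}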
Thanks for pointing that out. I put the hash into the response from the backend to avoid having to bundle anything. It looks reasonable to me, but if you prefer Webpack style I can do that.
Doing it on a backend is fine too. crypto module is always available in Node.
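For reference, a minimal sketch of the server-side variant using Node's built-in crypto module (function and variable names are illustrative, not the PR's actual code):

const crypto = require('node:crypto');

// SHA-256 hex digest of the (trimmed) chat template string
function hashChatTemplate(chatTemplate) {
    return crypto.createHash('sha256').update(chatTemplate.trim()).digest('hex');
}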
Right, one less dependency. Switched to using that.
So, it turns out that llama.cpp already supports fetching the chat template, so adding llama.cpp server support was trivial. There's a tiny gotcha, where despite returning JSON data, the chat_template string from llama.cpp ends with a NUL terminator. Clearly a bug, so once that is fixed, the corresponding code can be discarded.
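A rough sketch of the fetch side (URL, port, and variable names here are illustrative, not the PR's code):

// the llama.cpp server exposes the model's chat template via its /props endpoint
const res = await fetch('http://127.0.0.1:8080/props');
const props = await res.json();
// note the gotcha above: until the llama.cpp bug is fixed, chat_template may carry a stray trailing NUL
const chatTemplate = props.chat_template ?? '';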
The koboldcpp pull request was updated to use the same endpoint as llama.cpp (/props), which means both can be treated the same way, in terms of chat template fetching. I also added UI to enable/disable the derived template feature. It defaults to off, for now.
A note on the last commit: there is a bug in llama.cpp which results in the chat template string being null-terminated one character too early. This has two consequences: it means the last character (which happens to be a newline) is cut off, and that chat templates must be trimmed to match the llama.cpp metadata value, which was the case (due to this bug), but it is not the case anymore. The relevant comments in the new code:

// the hash can be obtained from command line e.g. via: MODEL=path_to_model; python -c "import json, hashlib, sys; print(hashlib.sha256(json.load(open('"$MODEL"/tokenizer_config.json'))['chat_template'].strip().encode()).hexdigest())"
// note that chat templates must be trimmed to match the llama.cpp metadata value

Approaches:
1. Keep trimming the template strings before hashing (string trimming is fairly common anyway).
2. Replace the trailing NUL from llama.cpp with the newline it originally replaced, so no trimming is needed.
I went with approach 2. It seems like the cleaner approach, even though string trimming is fairly common. The llama.cpp NULL value is now replaced with a newline (which it originally replaced), which aligns it with the koboldcpp endpoint. I also renamed the SillyTavern backend endpoint to /props, since that's what it is forwarding right now (not just the chat template, at least for the llama.cpp version).
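In code, approach 2 amounts to something like this sketch (illustrative, not the PR's exact lines), applied to the llama.cpp response before hashing:

// restore the newline that the trailing NUL from llama.cpp originally replaced,
// so the string matches what the koboldcpp /props endpoint returns
chatTemplate = chatTemplate.replace(/\u0000$/, '\n');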
Force-pushed from ba99fd1 to 3789381
How ready is this for a final review/test?
It should be good to go!
@@ -218,6 +219,9 @@ router.post('/status', jsonParser, async function (request, response) {
    } catch (error) {
        console.error(`Failed to get TabbyAPI model info: ${error}`);
    }
} else if (apiType == TEXTGEN_TYPES.KOBOLDCPP || apiType == TEXTGEN_TYPES.LLAMACPP) { |
This custom header part was not needed, so I removed it.
The pattern was probably copied from tokenization-supported which only exists because of LM Studio-specific hacks.
Got it. Works for me!
Tested with the latest pull of llama.cpp. Works, thanks!
Now we need someone to keep the mapping in sync with the latest models in town, lol.
Thanks for testing/merging.
Right. Seems like not a big deal, especially since it console.logs the unknown hashes, but if it turns into a huge issue, it may be time to consider a more sophisticated approach. Doubt it though.
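For illustration, the lookup amounts to something like the sketch below; the table contents and names are placeholders, not the PR's actual mapping:

// maps sha256(chat_template) -> derived template name; real entries omitted here
const hashToTemplateName = {
    // '<sha256 of a known chat template>': '<context/instruct template name>',
};

function deriveTemplateName(templateHash) {
    const name = hashToTemplateName[templateHash];
    if (!name) {
        // unknown hashes are logged so new models can be added to the mapping later
        console.log(`Unknown chat template hash: ${templateHash}`);
        return null;
    }
    return name;
}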
This PR adds a simple hash- / sub-string-based method for picking context and instruct templates based on the chat template, when provided by the backend.
It can be confusing to have to switch templates manually when you change to a different model, so having an automated way can be helpful.
The koboldcpp backend variant relies on LostRuins/koboldcpp#1222 (merged, hopefully in 1.79 release).
The llama.cpp backend variant works as is.
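To illustrate the sub-string half of the method, a template can be matched on a distinctive marker when its hash is not recognized; the marker and returned name below are examples, not necessarily what the PR ships:

// fall back to sub-string matching when the exact hash is unknown
function deriveBySubstring(chatTemplate) {
    if (chatTemplate.includes('<|im_start|>')) {
        return 'ChatML'; // ChatML-style templates use <|im_start|> markers
    }
    return null; // unknown template; leave the user's current selection untouched
}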
Checklist:
Future work (post-merge):