Claude caching at depth #3085
OpenRouter support added as well. To clarify, the 2 in the code is there so that the second breakpoint lands on the cache immediately before the previous message (if available), regardless of what's going on with injections at depth (which should all presumably have the user role for Claude?). That is why depth is counted in strict role switches (unless I'm doing something absurdly stupid here, always possible).
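To make the "depth in role switches, plus 2" idea concrete, here is a minimal sketch and not the PR's actual code. The function name `markCacheBreakpoints` is hypothetical, messages are assumed to already carry `content` as an array of blocks, `-1` is assumed to mean "disabled", and prefill handling is left out entirely. The `cache_control: { type: 'ephemeral' }` marker is the one Anthropic's prompt caching uses.

```javascript
// Hypothetical helper for illustration only; not taken from the PR.
// Walks the chat from the newest message backwards and counts depth in
// role switches, so a run of same-role messages counts as one step.
function markCacheBreakpoints(messages, cachingAtDepth) {
    // Assumption: a negative depth means the feature is switched off.
    if (!Number.isInteger(cachingAtDepth) || cachingAtDepth < 0) {
        return messages;
    }

    let depth = 0;
    let previousRole = null;

    for (let i = messages.length - 1; i >= 0; i--) {
        const message = messages[i];

        // Same role as the message after it: still the same run, no new depth.
        if (message.role === previousRole) {
            continue;
        }

        // One breakpoint at the configured depth, one two role switches earlier.
        if (depth === cachingAtDepth || depth === cachingAtDepth + 2) {
            const blocks = message.content;
            blocks[blocks.length - 1].cache_control = { type: 'ephemeral' };
        }

        // Nothing left to mark after the second breakpoint.
        if (depth === cachingAtDepth + 2) {
            break;
        }

        depth += 1;
        previousRole = message.role;
    }

    return messages;
}
```

With `cachingAtDepth = 0` on a strictly alternating chat that ends on a user message, the markers land on the last user message and on the user message before it, so the next request can still read the cache written by the previous one.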
I doubt that's a requirement. It's on user messages in the docs because naturally the last message in a request would just be user (solo chat being the typical use case). User sends on turn 1. Then turn 2. The current and previous breakpoints happen to be on the user; you get the idea.
https://docs.anthropic.com/en/release-notes/api#october-8th-2024 Also, we are allowed up to 4 breakpoints, so we should use 4 (edit: if not
I very intentionally used only 2 breakpoints to avoid breaking the system prompt caching option (which uses up to 2 breakpoints). EDIT: Anyway, the current setup is entirely rational. It's customizable, and it hits good caches between swipes and between messages, assuming people aren't going back and editing the chat history. Extremely optimized for 90% of use cases other than swiping 50 times (but still optimized for that). EDIT 2: Exclusively caching on user messages might be a good heuristic, I'll sleep on it. Group chats were always a mess and will remain a mess, and I refuse to waste more than 5 neurons on them.
System prompt caching uses 2? My bad. But where and why?
One breakpoint on the sysprompt and another for the tools. It doesn't NEED both, but if I were using a third breakpoint I'd just place it on the prefill to optimize swipes all the way (with a third configuration option). EDIT: To clarify, the prefill breakpoint would be more of a "might as well" thing, because heuristically your prefill shouldn't be big enough for the caching to make a difference, and it's not worth breaking a feature that already exists.
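The budget discussed above (up to 2 breakpoints for the system prompt and tools, 2 for caching at depth, 4 allowed in total) can be checked mechanically. The following is a rough sketch under assumptions: `countCacheBreakpoints` and `MAX_CACHE_BREAKPOINTS` are hypothetical names, and the request body is assumed to use Anthropic's Messages shape with `system` as an array of blocks, an optional `tools` array, and `messages` whose `content` is an array of blocks.

```javascript
// Limit mentioned in the thread: at most 4 cache breakpoints per request.
const MAX_CACHE_BREAKPOINTS = 4;

// Hypothetical helper: counts every object carrying a cache_control marker
// in the system blocks, the tool definitions, and the message content blocks.
function countCacheBreakpoints(body) {
    const hasMarker = (item) => Boolean(item && typeof item === 'object' && item.cache_control);

    const systemBlocks = Array.isArray(body.system) ? body.system : [];
    const tools = Array.isArray(body.tools) ? body.tools : [];
    const messageBlocks = (Array.isArray(body.messages) ? body.messages : [])
        .flatMap((message) => (Array.isArray(message.content) ? message.content : []));

    return [...systemBlocks, ...tools, ...messageBlocks].filter(hasMarker).length;
}

// True when the body stays within the breakpoint budget.
function isWithinBreakpointBudget(body) {
    return countCacheBreakpoints(body) <= MAX_CACHE_BREAKPOINTS;
}
```

A check like this would show immediately why a third depth-related breakpoint (for example on the prefill) cannot coexist with both system prompt breakpoints and both depth breakpoints.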
    bodyParams['route'] = 'fallback';
}

let cachingAtDepth = getConfigValue('claude.cachingAtDepth', -1);
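Since the setting defaults to `-1`, the raw config value probably deserves a sanity check before it is used as a depth. A small sketch, assuming that `-1` means "disabled" and that only non-negative integers are meaningful; `normalizeCachingDepth` and `CACHING_DISABLED` are hypothetical names and are not part of the diff above.

```javascript
// Assumption: -1 is the "feature off" sentinel, matching the default in the diff.
const CACHING_DISABLED = -1;

// Hypothetical helper: turns whatever the config contains into either a
// non-negative integer depth or the disabled sentinel.
function normalizeCachingDepth(rawValue) {
    // Only numbers and non-empty numeric strings are accepted.
    if (typeof rawValue !== 'number' && typeof rawValue !== 'string') {
        return CACHING_DISABLED;
    }
    if (typeof rawValue === 'string' && rawValue.trim() === '') {
        return CACHING_DISABLED;
    }

    const depth = Number(rawValue);
    return Number.isInteger(depth) && depth >= 0 ? depth : CACHING_DISABLED;
}
```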
I'd maybe put that into a separate function (i.e. in the prompt-converters module) because it makes the endpoint harder to read.
Done. I also edited the Anthropic code so it doesn't rely on ST's current squashing behavior for messages. Anything else?
EDIT: By "squashing" I mean "flattening the content arrays and/or just intentionally always putting everything in a single content array".
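One way to avoid depending on squashing is to accept both content shapes and always mark the last block. This is a sketch under assumptions and not the PR's implementation: `markLastBlockForCaching` is a hypothetical name, and a message's `content` is assumed to be either a plain string or an array of content blocks.

```javascript
// Hypothetical helper: returns a copy of the message whose last content block
// carries a cache_control marker, whether or not the content was squashed.
function markLastBlockForCaching(message) {
    // A string is treated as a single text block.
    const blocks = typeof message.content === 'string'
        ? [{ type: 'text', text: message.content }]
        : message.content;

    // Nothing to mark on missing or empty content.
    if (!Array.isArray(blocks) || blocks.length === 0) {
        return message;
    }

    const lastIndex = blocks.length - 1;
    const markedBlocks = blocks.map((block, index) => (
        index === lastIndex ? { ...block, cache_control: { type: 'ephemeral' } } : block
    ));

    return { ...message, content: markedBlocks };
}
```

Returning a copy instead of mutating in place keeps the original chat untouched, which may or may not matter for the real endpoint.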
Cohee1207 left a comment
I poked at it (with N = 0) and it appears to be functional in ideal circumstances. Can't say any more than that.
Thx. Sorry about the eslint thing.

Very optional, very out of the way, very explicit warnings about how badly it can go, ample room for comfortable configuration ({{random}} on prefills should work out of the box, which is the most useful use case for {{random}}; PHIs and A/Ns take minor configuration).
Checklist: