
Conversation

@ghost ghost commented Nov 17, 2024

Very optional, very out of the way, very explicit warnings about how badly it can go, ample room for comfortable configuration ({{random}} on prefills should work out of the box, which is the most useful usecase for {{random}}, PHIs and A/Ns take minor configuration).

Checklist:

@ghost ghost (Author) commented Nov 17, 2024

OpenRouter support added as well. To clarify, the 2 in the code is supposed to be from the cache immediately before the previous message (if available) regardless of what's going on with injections at depth (which should all presumably have user role for Claude?), hence why depth is strict role switches (unless I'm doing something absurdly stupid here, always possible).
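For readers following along, the "strict role switches" idea can be sketched like this (a minimal illustration with hypothetical names, not the PR's actual code): walk backwards from the end of the chat, count each change of role as one level of depth, and attach the `cache_control` breakpoint to the last content block of the message where the count matches.

```js
// Illustrative sketch only. Counts "depth" as strict role switches from the
// end of the chat and attaches a cache breakpoint there. Anthropic expects
// cache_control on a content block, not on the message object itself.
function placeCacheControlAtDepth(messages, depth) {
    let switches = 0;
    for (let i = messages.length - 1; i >= 0; i--) {
        // A role switch is counted when this message's role differs from the next one's.
        if (i < messages.length - 1 && messages[i].role !== messages[i + 1].role) {
            switches++;
        }
        if (switches === depth) {
            const content = messages[i].content;
            if (Array.isArray(content)) {
                content[content.length - 1].cache_control = { type: 'ephemeral' };
            } else {
                // Promote a plain string to a content array so the block can carry the marker.
                messages[i].content = [{ type: 'text', text: content, cache_control: { type: 'ephemeral' } }];
            }
            return i; // index of the message that received the breakpoint
        }
    }
    return -1; // chat too short for the requested depth
}
```

With depth 0 this marks the last message; depth 1 marks the message just before the most recent role switch, regardless of how many same-role messages sit in between.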

@cloak1505 cloak1505 (Contributor) commented Nov 18, 2024

> (which should all presumably have user role for Claude?)

I doubt that's a requirement. It's on user messages in the docs because, naturally, the last message in a request is a user message (a solo chat being the typical use case). The user sends on turn 1, then on turn 2, and so on; the current and previous breakpoints just happen to land on user messages. You get the idea.

> hence why depth is strict role switches (unless I'm doing something absurdly stupid here, always possible)

https://docs.anthropic.com/en/release-notes/api#october-8th-2024
Anthropic recently loosened restrictions to allow consecutive same-role messages, but they say those messages will be combined into a single message, which makes me worry that consecutive assistant messages will break the cache. If they do, then we'll have to default to caching only the user messages to be safe. If not, then caching assistant messages will help group chats, where most messages come from various characters.

Also, we are allowed up to 4 breakpoints, so we should use all 4 (edit: 4 if enableSystemPromptCache is off, otherwise 3... uh, 2). This would allow users to edit up to 7 messages back (after the 4th-last user turn) and keep the cache.

#2693

My original idea was the system prompt + the last 3 user messages, which would allow you to restart the chat, assuming the system prompt is at least 1024 tokens.
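That budget arithmetic can be sketched roughly as follows (illustrative only; the function and field names are made up, and this assumes each breakpoint goes on the last content block of the chosen message):

```js
// Hypothetical sketch: spend one of Anthropic's 4 cache breakpoints on the
// system prompt and the remainder on the most recent user messages.
function markBreakpoints(systemBlocks, messages, budget = 4) {
    // One breakpoint on the final system block.
    systemBlocks[systemBlocks.length - 1].cache_control = { type: 'ephemeral' };
    let used = 1;
    // Walk backwards, marking user messages until the budget runs out.
    for (let i = messages.length - 1; i >= 0 && used < budget; i--) {
        if (messages[i].role !== 'user') continue;
        const blocks = messages[i].content;
        blocks[blocks.length - 1].cache_control = { type: 'ephemeral' };
        used++;
    }
    return used; // breakpoints actually spent
}
```

In a short chat the function simply spends fewer breakpoints, which matches how a fresh chat would still get the system-prompt breakpoint on its own.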

@ghost ghost (Author) commented Nov 18, 2024

I very intentionally only used 2 breakpoints to avoid breaking the system prompt caching option (which uses up to 2 breakpoints).

EDIT: Anyway, the current setup is entirely rational. It's customizable, and it hits good caches between swipes and between messages, assuming people aren't going back and editing the chat history. It's extremely well optimized for 90% of use cases other than swiping 50 times (and still optimized for that).

EDIT 2: Exclusively caching on user messages might be a good heuristic, I'll sleep on it. Group chats were always a mess and will remain a mess and I refuse to waste more than 5 neurons on them.

@cloak1505 cloak1505 (Contributor) commented
System prompt caching uses 2? My bad. But where and why?

@ghost ghost (Author) commented Nov 18, 2024

```js
convertedPrompt.systemPrompt[convertedPrompt.systemPrompt.length - 1]['cache_control'] = { type: 'ephemeral' };

requestBody.tools[requestBody.tools.length - 1]['cache_control'] = { type: 'ephemeral' };
```

One breakpoint on the system prompt and another on the tools. It doesn't NEED both, but if I were using a third breakpoint, I'd just place it on the prefill to optimize swipes all the way (with a third configuration option).

EDIT: To clarify, the prefill breakpoint would be more of a "might as well" thing, because heuristically your prefill shouldn't be big enough for caching to make a difference, and it's not worth breaking a feature that already exists.
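If that third breakpoint were ever added, the placement might look like this (purely illustrative; not code from this PR):

```js
// Hypothetical: a prefill is a trailing assistant message, so a swipe-friendly
// breakpoint would sit on its last content block.
function markPrefillBreakpoint(messages) {
    const last = messages[messages.length - 1];
    if (!last || last.role !== 'assistant') return false;
    const blocks = Array.isArray(last.content)
        ? last.content
        : (last.content = [{ type: 'text', text: last.content }]);
    blocks[blocks.length - 1].cache_control = { type: 'ephemeral' };
    return true;
}
```

Returning `false` when there is no prefill keeps the caller free to spend that breakpoint elsewhere.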

@ghost ghost (Author) commented Nov 18, 2024

[Image: a funny drawing to explain the intended cache hits.]

```js
    bodyParams['route'] = 'fallback';
}

let cachingAtDepth = getConfigValue('claude.cachingAtDepth', -1);
```
A Member commented on the diff above:

I'd maybe put that into a separate function (i.e. in the prompt-converters module) because it makes the endpoint harder to read.

@ghost ghost (Author) commented Nov 18, 2024

Done. I also edited the Anthropic code so it doesn't rely on ST's current squashing behavior for messages. Anything else?

EDIT: by "squashing" I mean "flattening the content arrays and/or just intentionally always putting everything in a single content array".
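The "squashing" described here can be illustrated roughly like so (a simplified sketch, not ST's actual implementation):

```js
// Simplified sketch: flatten a message's content array into a single string,
// which is what "squashing" refers to above. Non-text blocks are dropped
// in this toy version.
function squashContent(message) {
    if (Array.isArray(message.content)) {
        message.content = message.content
            .filter(block => block.type === 'text')
            .map(block => block.text)
            .join('\n');
    }
    return message;
}
```

Code that assumes every message arrives pre-squashed like this would misplace a `cache_control` marker on multi-block content, which is presumably why the PR stopped relying on it.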

@Cohee1207 Cohee1207 (Member) left a comment

I poked at it (with N = 0) and it appears to be functional in the ideal circumstances. Can't say any more than that.

@Cohee1207 Cohee1207 merged commit 54db498 into SillyTavern:staging Nov 18, 2024
@ghost ghost (Author) commented Nov 18, 2024

Thx. Sorry about the eslint thing.

@Wolfsblvt Wolfsblvt added 🟨 ⬤⬤⬤○○ [PR][🎯Auto-applied] [Medium]100-500 lines changed 🏭 Backend Changes [PR] Contains changes to the backend and/or API 🤖 API / Model [ISSUE][PR] Related to specific APIs or Models ⚙️ config.yaml [ISSUE][PR] Relates to changes to the config.yaml labels Mar 22, 2025