Description
Background
SlateDB is adding caching as described in #15 and #9. These caches would reduce read latency, and applications with a large state but a small set of hot keys would benefit the most. On recovery, however, the cache starts empty, so read latency would stay high until the cache is warm.
High-level proposal
Persisting metadata about the cache contents and proactively refilling the cache on recovery could improve read latency during warm-up.
SlateDB has immutable SSTs, and each SST consists of multiple blocks. Upon recovery, the set of SSTs that are part of the DB would be loaded into db_state, as described in the manifest design doc. (SST Id, Block) would likely be one of the indexes for the cache. If we persist [AppId, [CachedBlock(SST Id, Block)]] as metadata, new writers can optionally use that metadata to fill the cache asynchronously. AppId in this case is an opaque ID provided by the user. The same DB could have multiple readers, each with a different AppId.
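The metadata shape above can be sketched roughly as follows. This is only an illustration of the [AppId, [CachedBlock(SST Id, Block)]] idea, not SlateDB's actual types: `SstId`, `BlockIndex`, `CacheMetadata`, `record`, and `blocks_to_prefetch` are hypothetical names, and filtering out blocks whose SST is no longer in db_state is an assumption about how stale entries might be handled.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical identifier types; the real SlateDB types may differ.
type SstId = u64;
type BlockIndex = u32;

/// One cached block, identified by (SST Id, Block).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct CachedBlock {
    sst_id: SstId,
    block: BlockIndex,
}

/// Persisted cache metadata: opaque AppId -> blocks that were cached for it.
#[derive(Debug, Default)]
struct CacheMetadata {
    by_app: HashMap<String, Vec<CachedBlock>>,
}

impl CacheMetadata {
    /// Remember that `block` was cached for `app_id` (duplicates are ignored).
    fn record(&mut self, app_id: &str, block: CachedBlock) {
        let blocks = self.by_app.entry(app_id.to_string()).or_default();
        if !blocks.contains(&block) {
            blocks.push(block);
        }
    }

    /// Blocks worth prefetching on recovery: those recorded for `app_id`
    /// whose SST is still live. Assumption: entries pointing at SSTs that
    /// are no longer part of the DB are simply skipped.
    fn blocks_to_prefetch(&self, app_id: &str, live_ssts: &HashSet<SstId>) -> Vec<CachedBlock> {
        self.by_app
            .get(app_id)
            .map(|blocks| {
                blocks
                    .iter()
                    .copied()
                    .filter(|b| live_ssts.contains(&b.sst_id))
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default()
    }
}
```

On recovery, the list returned by `blocks_to_prefetch` could be handed to a background task that loads each block into the cache, so that foreground reads are not blocked while the cache warms up.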
We could extend the manifest file structure to support optional metadata, adding pointers to this metadata in the manifest (rather than the metadata itself) to keep the manifest file manageable. This could belong to Snapshot.
This issue should cover:
- Design, including the perf parameters (DB size, key count, hot key count, read latency with and without cache on recovery) and back of envelope calculations.
- Implementation.
- Updates to db_bench for the new benchmark.
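As a starting point for the back of envelope calculations, the prefetch cost could be estimated along these lines. This is a rough sketch under stated assumptions: all field names are hypothetical, hot keys are assumed to be densely packed into blocks (so the block count is a lower bound), and block fetches are assumed to run in fixed-size parallel rounds with a uniform latency. No measured numbers are implied.

```rust
/// Inputs for a warm-up estimate. All names are illustrative placeholders.
struct WarmupInputs {
    hot_key_count: u64,    // number of hot keys
    avg_entry_bytes: u64,  // average key + value size
    block_bytes: u64,      // SST block size
    fetch_latency_ms: f64, // latency of one block fetch from object storage
    parallel_fetches: u64, // concurrent block fetches during prefetch
}

/// Lower bound on the number of blocks holding the hot set,
/// assuming hot keys are densely packed into blocks.
fn hot_blocks(i: &WarmupInputs) -> u64 {
    let hot_bytes = i.hot_key_count * i.avg_entry_bytes;
    // Ceiling division: a partially filled block still has to be fetched.
    (hot_bytes + i.block_bytes - 1) / i.block_bytes
}

/// Estimated time to prefetch the hot set, assuming fetches proceed in
/// rounds of `parallel_fetches` blocks, each round taking one fetch latency.
fn prefetch_time_ms(i: &WarmupInputs) -> f64 {
    let rounds = (hot_blocks(i) + i.parallel_fetches - 1) / i.parallel_fetches;
    rounds as f64 * i.fetch_latency_ms
}
```

Comparing `prefetch_time_ms` against the time a cold cache would take to warm up through foreground read misses gives a first-order view of the benefit; the benchmark would then confirm or correct the estimate.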