Provide a mechanism to persist cache metadata, to improve read latency on recovery. #169

Open
@vigneshc

Description

Background

SlateDB is adding caching as described in #15 and #9. These caches would reduce read latency. Applications that have a large state and a much smaller set of hot keys would benefit most from the cache. However, the cache would be empty on recovery, so read latency would stay high until the cache is warm.

High level proposal

Persisting metadata about the cache contents and proactively refilling the cache on recovery could improve read latency.
SlateDB has immutable SSTs, and each SST has multiple blocks. Upon recovery, the set of SSTs that are part of the DB would be loaded into db_state, as described in the manifest design doc. (SST Id, Block) would likely be one of the indexes for the cache. If we persist [AppId, [CachedBlock(SST Id, Block)]] as metadata, new writers can optionally use that metadata to fill the cache asynchronously. AppId in this case is an opaque id provided by the user. The same DB could have multiple readers, each with a different AppId.
We could extend the manifest file structure to support optional metadata, and add pointers to this metadata in the manifest so that the manifest file itself stays manageable. This could belong to Snapshot.
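
To make the proposed shape concrete, here is a minimal sketch of the [AppId, [CachedBlock(SST Id, Block)]] metadata and of how a recovering process might derive a prefetch list from it. Everything here is an assumption for illustration: the names `CacheMetadata`, `record`, and `warm_plan`, the `u64`/`u32` id types, and the choice to skip blocks whose SST is no longer live (for example after compaction) are hypothetical and not part of SlateDB's API.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical SST identifier; SlateDB's real id type may differ.
type SstId = u64;

/// One cached block, identified by its SST and the block index within it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct CachedBlock {
    sst_id: SstId,
    block: u32,
}

/// Persisted cache metadata: opaque, user-provided AppId -> blocks that were cached.
#[derive(Debug, Default)]
struct CacheMetadata {
    by_app: BTreeMap<String, Vec<CachedBlock>>,
}

impl CacheMetadata {
    /// Replace the recorded block list for `app_id`.
    fn record(&mut self, app_id: &str, blocks: Vec<CachedBlock>) {
        self.by_app.insert(app_id.to_string(), blocks);
    }

    /// Blocks to prefetch for `app_id`, keeping only those whose SST is
    /// still part of the DB (assumption: stale entries are simply skipped).
    fn warm_plan(&self, app_id: &str, live_ssts: &HashSet<SstId>) -> Vec<CachedBlock> {
        self.by_app
            .get(app_id)
            .map(|blocks| {
                blocks
                    .iter()
                    .copied()
                    .filter(|b| live_ssts.contains(&b.sst_id))
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

On recovery, the result of `warm_plan` could be handed to a background task that fetches each block into the cache. An unknown AppId yields an empty plan, so using the metadata stays optional.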

This issue should cover

  1. Design, including the perf parameters (DB size, key count, hot key count, read latency with and without cache on recovery) and back-of-the-envelope calculations.
  2. Implementation.
  3. Updates to db_bench for the new benchmark.
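
As a starting point for the back-of-the-envelope part of the design, the sketch below estimates how many blocks cover a hot set, how large the persisted metadata would be, and how many bytes a full warm-up would fetch. The function name `estimate_warmup` and the struct are hypothetical, and every input (hot-set size, block size, bytes per metadata entry) is a placeholder to be replaced with measured values. None of these numbers come from SlateDB.

```rust
/// Rough warm-up cost estimate. All inputs are illustrative placeholders.
#[derive(Debug, PartialEq, Eq)]
struct WarmupEstimate {
    /// Number of blocks needed to cover the hot set.
    hot_blocks: u64,
    /// Size of the persisted (SST Id, Block) list for one AppId.
    metadata_bytes: u64,
    /// Bytes fetched from object storage to fully warm the cache.
    fetch_bytes: u64,
}

/// `block_size` must be non-zero. `entry_bytes` is the assumed encoded size
/// of one (SST Id, Block) pair, e.g. 8 + 4 bytes if ids are u64 and u32.
fn estimate_warmup(hot_bytes: u64, block_size: u64, entry_bytes: u64) -> WarmupEstimate {
    // Ceiling division: a partially used block still has to be fetched whole.
    let hot_blocks = (hot_bytes + block_size - 1) / block_size;
    WarmupEstimate {
        hot_blocks,
        metadata_bytes: hot_blocks * entry_bytes,
        fetch_bytes: hot_blocks * block_size,
    }
}
```

The estimate ignores compression, encoding overhead, and hot keys that share a block, so it gives only an order-of-magnitude figure for the metadata size, not a latency prediction.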

Labels: enhancement
