Create an in-memory block cache #15
Comments
I have some preliminary ideas regarding issues #9 and #15 that I would like to discuss. Since I am new to AWS pricing and instance specifications, I would appreciate some insight into the capabilities of the ephemeral storage available to VMs in AWS.
Thanks |
My opinion is that slatedb should be built to work well in both scenarios. If NVMe (or local SSD) is available, it should be leveraged to decrease latency (without sacrificing consistency) by caching SSTs locally on disk (#9). Similarly, slatedb should use memory to cache frequently accessed SST blocks. For SST disk caching, I think we'll want to expose a config option that defines how much space to allocate to the disk cache. This is similar to how RocksDB-Cloud works with its persistent secondary cache (PSC). The in-memory cache (this GH issue) should work the same way. Indeed, this is how RocksDB's own block cache works:
users determine the size. This lets users configure slatedb according to their environment. With respect to performance numbers, I think it's heavily dependent on the instances you select (EBS, NVMe, SSD, and so on). This is why I think exposing the knobs and letting users configure slatedb is a better approach than making assumptions about what the environment is capable of. |
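To make the knob discussion concrete, here is a rough sketch of what such options might look like. The `CacheOptions` struct, field names, and default values below are placeholders for illustration, not existing slatedb configuration:

```rust
/// Hypothetical cache configuration; actual slatedb option names may differ.
pub struct CacheOptions {
    /// Maximum size of the in-memory block cache, in bytes (this issue).
    pub block_cache_size_bytes: u64,
    /// Maximum space the on-disk SST cache may use, in bytes (#9).
    pub sst_disk_cache_size_bytes: u64,
}

impl Default for CacheOptions {
    fn default() -> Self {
        Self {
            block_cache_size_bytes: 64 * 1024 * 1024,           // 64 MiB, placeholder
            sst_disk_cache_size_bytes: 4 * 1024 * 1024 * 1024,  // 4 GiB, placeholder
        }
    }
}
```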
@gesalous I've assigned this one to you. Feel free to unassign yourself if you decide not to work on it. :) Thanks! |
Should we define the API of the block cache before delving into design details? A version-zero implementation could be as simple as an LRU cache built from a queue and a hash table. I could proceed with a version-zero API. What do you think? |
💯 This is what I was thinking. Basically just an LRU map from (sst id, block id) to block. IIRC, mini-lsm uses some third-party crate to get this data structure. EDIT: It's moka. |
Here's the line in mini-lsm where they declare the BlockCache using moka: https://github.com/skyzh/mini-lsm/blob/main/mini-lsm/src/lsm_storage.rs#L28 |
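For reference, a cache along those lines could be declared with moka roughly as below. The `SsTableId`, `BlockId`, and `Block` types are placeholders standing in for whatever slatedb's real types end up being, and moka's eviction policy is TinyLFU-based rather than strict LRU:

```rust
use std::sync::Arc;

use moka::sync::Cache;

// Placeholder types for slatedb's real SST id and block representations.
type SsTableId = u64;
type BlockId = u64;
struct Block {
    data: Vec<u8>,
}

// Map from (sst id, block id) to a shared, immutable block, mirroring
// mini-lsm's `moka::sync::Cache<(usize, usize), Arc<Block>>`.
type BlockCache = Cache<(SsTableId, BlockId), Arc<Block>>;

fn new_block_cache(capacity_bytes: u64) -> BlockCache {
    Cache::builder()
        // Weigh entries by payload size so the capacity is expressed in
        // bytes rather than in number of cached blocks.
        .weigher(|_key, block: &Arc<Block>| block.data.len() as u32)
        .max_capacity(capacity_bytes)
        .build()
}
```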
@gesalous checking in: you still interested in this one? |
Yes, I read the design manifest in detail today. I'll get back with a design sketch. |
Great, thanks for the update! 😄 |
I'm interested in this issue. Can you assign it to me if it hasn't been scheduled yet? I'm mostly available on the weekends. I'll unassign myself if I become unavailable. :) |
@flaneur2020 Thanks for checking in! @pragmaticanon has been talking about working on this one. We're discussing it on Discord if you want to join the conversation. @rodesai pointed out that it might make more sense to start with SST caching on disk (#9), since that one gives us the OS (in-memory) page cache for free. I'm inclined to agree with him. Perhaps you want to take #9 first? |
Taking up this issue as per the discussion on Discord. |
Frequently accessed SST blocks should be cached in memory. This should dramatically improve read performance. Let's start with a simple LRU cache. See mini-lsm for an example.
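As a sketch of how the read path might use such a cache: look up the block by (sst id, block id), and on a miss fetch it from object storage and insert it. The `BlockCache`, `Block`, and `load_block_from_object_store` names below are the hypothetical pieces from the sketch above, not slatedb's actual API:

```rust
use std::sync::Arc;

fn get_block(cache: &BlockCache, sst_id: SsTableId, block_id: BlockId) -> Arc<Block> {
    if let Some(block) = cache.get(&(sst_id, block_id)) {
        // Cache hit: serve the block without touching object storage.
        return block;
    }
    // Cache miss: fetch the block from object storage, then cache it.
    let block = Arc::new(load_block_from_object_store(sst_id, block_id));
    cache.insert((sst_id, block_id), Arc::clone(&block));
    block
}

// Placeholder for the actual object-store read.
fn load_block_from_object_store(_sst_id: SsTableId, _block_id: BlockId) -> Block {
    Block { data: Vec::new() }
}
```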