
Create an in-memory block cache #15

Closed
criccomini opened this issue Apr 23, 2024 · 13 comments · Fixed by #137
@criccomini
Collaborator

Frequently accessed SST blocks should be cached in memory. This should dramatically improve read performance. Let's start with a simple LRU cache. See mini-lsm for an example.
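
For illustration, here is a rough sketch of the read path this implies. The names below (`BlockCache`, `SsTableId`, `fetch_block_from_object_store`) are placeholders for this sketch, not SlateDB's actual API:

```rust
use std::sync::Arc;

// Illustrative types; the real SlateDB equivalents may differ.
type SsTableId = u64;
type BlockId = u64;

struct Block {
    data: Vec<u8>,
}

// Any LRU-style map keyed by (sst id, block id) works here.
trait BlockCache {
    fn get(&self, key: &(SsTableId, BlockId)) -> Option<Arc<Block>>;
    fn insert(&self, key: (SsTableId, BlockId), block: Arc<Block>);
}

// On a read, consult the cache first and only fall back to object storage on a miss.
fn read_block(cache: &dyn BlockCache, sst: SsTableId, block: BlockId) -> Arc<Block> {
    if let Some(hit) = cache.get(&(sst, block)) {
        return hit; // served from memory, no object-store round trip
    }
    let fetched = Arc::new(fetch_block_from_object_store(sst, block));
    cache.insert((sst, block), fetched.clone());
    fetched
}

// Placeholder for the (slow) object-store read.
fn fetch_block_from_object_store(_sst: SsTableId, _block: BlockId) -> Block {
    Block { data: Vec::new() }
}
```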

@criccomini criccomini added the enhancement New feature or request label Apr 23, 2024
@gesalous

I have some preliminary ideas regarding issues #9 and #15 that I would like to discuss.
Specifically, is SlateDB intended to target VM instances with local ephemeral NVMe storage that has high IOPS and capacity, or should it also work with VM instances whose ephemeral storage has low IOPS and capacity?

Since I am new to AWS pricing and instance specifications, I would appreciate some insight into the capabilities of ephemeral storage on AWS VM instances.
Could anyone share detailed information on:

  • The IOPS performance,
  • Throughput, and
  • Capacity of these NVMe SSDs?

Thanks

@criccomini
Collaborator Author

My opinion is that slatedb should be built to work well in both scenarios. If NVMe (or local SSD) is available, it should be leveraged to decrease latency (without sacrificing consistency) by caching SSTs locally on disk (#9). Similarly, slatedb should use memory to cache frequently accessed SST blocks.

For SST disk caching, I think we'll want to expose a config to define how much space to allocate to the disk cache. This is similar to how RocksDB-cloud works with their persistent secondary cache (PSC). The in-memory cache (this GH issue) should function similarly. Indeed, this is how RocksDB's own block cache works:

Block cache is where RocksDB caches data in memory for reads. User can pass in a Cache object to a RocksDB instance with a desired capacity (size).

Users determine the size.

This will allow users to configure slatedb according to their environment.

w.r.t. performance numbers, I think it's pretty dependent on the instances you select (EBS, NVMe, SSD, and so on). This is why I think exposing the knobs and allowing users to configure slatedb is a better approach than making assumptions about what the environment is capable of.
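
Just to make the knobs concrete, something like the sketch below is what I have in mind. The struct and field names are made up for illustration; they are not SlateDB's actual options:

```rust
/// Hypothetical cache-related options; real SlateDB options may be named differently.
pub struct CacheOptions {
    /// Maximum bytes of memory for the in-memory block cache (this issue).
    pub block_cache_capacity_bytes: u64,
    /// Maximum bytes of local disk for the SST cache (#9); None disables it.
    pub disk_cache_capacity_bytes: Option<u64>,
    /// Directory for the on-disk SST cache when enabled.
    pub disk_cache_path: Option<std::path::PathBuf>,
}

impl Default for CacheOptions {
    fn default() -> Self {
        Self {
            // Arbitrary defaults for the sketch; users tune these per environment.
            block_cache_capacity_bytes: 64 * 1024 * 1024,
            disk_cache_capacity_bytes: None,
            disk_cache_path: None,
        }
    }
}
```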

@criccomini
Collaborator Author

@gesalous I've tried to clarify things more here: #20

@criccomini
Collaborator Author

@gesalous I've assigned this one to you. Feel free to unassign yourself if you decide not to work on it. :) Thanks!

@gesalous

Should we define the API of the block cache before delving into design details? The zero version could be as simple as an LRU cache with a queue and a hash table. I could proceed with a zero version of the API. What do you think?
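
As a strawman for that zero version, a deliberately naive LRU (O(n) on access) built from a `VecDeque` and a `HashMap` could look like the sketch below; the key/value types are generic placeholders:

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Naive LRU: a queue tracks recency, a hash map holds the values.
/// Moving a key to the back on access is O(n); fine for a strawman, not production.
struct NaiveLru<K: Eq + Hash + Clone, V> {
    capacity: usize,
    order: VecDeque<K>,
    map: HashMap<K, V>,
}

impl<K: Eq + Hash + Clone, V> NaiveLru<K, V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, order: VecDeque::new(), map: HashMap::new() }
    }

    fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            // Mark as most recently used.
            if let Some(pos) = self.order.iter().position(|k| k == key) {
                let k = self.order.remove(pos).unwrap();
                self.order.push_back(k);
            }
            self.map.get(key)
        } else {
            None
        }
    }

    fn insert(&mut self, key: K, value: V) {
        if self.map.insert(key.clone(), value).is_none() {
            self.order.push_back(key);
            // Evict the least recently used entry if over capacity.
            if self.order.len() > self.capacity {
                if let Some(evicted) = self.order.pop_front() {
                    self.map.remove(&evicted);
                }
            }
        } else if let Some(pos) = self.order.iter().position(|k| k == &key) {
            let k = self.order.remove(pos).unwrap();
            self.order.push_back(k);
        }
    }
}
```

A real implementation would want O(1) recency updates (an intrusive list or an off-the-shelf crate), but this captures the shape of the API.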

@criccomini
Collaborator Author

criccomini commented Apr 29, 2024

💯 This is what I was thinking. Basically just an LRU map from (sst id, block id) to block. IIRC, mini-lsm uses some third-party crate for this data structure.

EDIT: It's moka.

@criccomini
Collaborator Author

Here's the line in mini-lsm where they declare the BlockCache using moka:

https://github.com/skyzh/mini-lsm/blob/main/mini-lsm/src/lsm_storage.rs#L28
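
Adapting that here would be roughly the sketch below. It assumes we pull in the `moka` crate; the `Block` type and the capacity are placeholders, not SlateDB's actual types or defaults:

```rust
use std::sync::Arc;

use moka::sync::Cache; // e.g. moka = { version = "0.12", features = ["sync"] }

// Placeholder block type; SlateDB's real block representation lives elsewhere.
pub struct Block {
    pub data: Vec<u8>,
}

// Same shape as mini-lsm: an LRU-ish cache keyed by (sst id, block id).
pub type BlockCache = Cache<(usize, usize), Arc<Block>>;

fn main() {
    // Capacity here is an entry count; a byte-based weigher can be configured
    // via Cache::builder() if we want to bound memory usage instead.
    let cache: BlockCache = Cache::new(1_024);

    let block = Arc::new(Block { data: vec![0u8; 4096] });
    cache.insert((1, 0), block);

    if let Some(hit) = cache.get(&(1, 0)) {
        assert_eq!(hit.data.len(), 4096);
    }
}
```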

@criccomini
Collaborator Author

@gesalous checking in: you still interested in this one?

@gesalous

gesalous commented May 8, 2024

Yes, I read the design manifest in detail today. I'll get back with a design sketch.

@criccomini
Collaborator Author

Great, thanks for the update! 😄

@flaneur2020
Contributor

flaneur2020 commented Aug 18, 2024

I'm interested in this issue. Can you assign it to me if it hasn't been scheduled yet?

I'm mostly available on weekends. I'll unassign myself if I become unavailable. :)

@criccomini
Collaborator Author

@flaneur2020 Thanks for checking in! @pragmaticanon has been talking about working on this one. We're discussing it on Discord if you want to join the conversation.

@rodesai pointed out that it might make more sense to start with SST caching on disk (#9), since that one gives us the OS (in-memory) page cache for free. I'm inclined to agree with him. Perhaps you want to take #9 first?

@pragmaticanon
Contributor

Taking up this issue as per the discussion on Discord.
