chore: merge development updates #26

Merged
32 commits merged on May 24, 2024
Changes from 1 commit
Commits
434d3bf
chore: remove redundant doc links
arindas Dec 24, 2023
e3061d5
doc: adds documentation for top-level modules under crate::storage
arindas Dec 24, 2023
99d7e22
doc: adds documentation for the commit-log module
arindas Dec 29, 2023
3cdae60
doc: adds documentation for the index module
arindas Jan 13, 2024
0d2f3fb
style: rewrites SegmentedLog::new to reduce Segment::new calls
arindas Feb 7, 2024
0af1217
chore: updates generational-cache dep version
arindas Feb 7, 2024
ed3faf2
chore: updates generational-cache dep version
arindas Feb 8, 2024
0084f4a
style: conforms common::split module structure to follow repository s…
arindas Feb 8, 2024
1d994b2
doc: adds documentation for Index struct
arindas Feb 8, 2024
b1de935
doc: updates README and Index documentation
arindas Feb 9, 2024
b70ba84
doc: documents segmented_log::MetaWithIdx
arindas Feb 26, 2024
f95fb55
chore: adds documentation for SegmentedLogError
arindas Feb 27, 2024
95732bd
style: remove redundant unit return from unit closure
arindas Feb 27, 2024
ff8b4d8
doc: adds documentation for the SegmentedLog struct
arindas Feb 27, 2024
78f038e
doc: adds an example for SegmentedLog
arindas Feb 27, 2024
52d2ee4
doc: documents SegmentedLog::new
arindas Feb 28, 2024
a9e2017
doc: adds documentation for convenience macros
arindas Feb 28, 2024
61b83ef
doc: adds documentation for various read() APIs
arindas Feb 28, 2024
0d452c2
chore: corrects typo
arindas Feb 28, 2024
47f92d6
style: refactor CacheOp and CacheOpKind to single sum type SegmentInd…
arindas Mar 2, 2024
9bfbea3
doc: adds documentation for stream_* and seq_read APIs
arindas Mar 2, 2024
c3d2e13
chore: fix typos
arindas Mar 6, 2024
3c4d0ba
doc: documents remaining methods for SegmentedLog
arindas Mar 6, 2024
a139f28
doc: adds documentation for IndexError and Index::* functions
arindas Mar 16, 2024
89e863c
doc: adds missing documenatation for Index::* functions
arindas May 1, 2024
ec7874f
doc: documents RecordHeader and Store in segmented_log::store
arindas May 24, 2024
09f67d2
doc: documents store::StoreError
arindas May 24, 2024
e4900cd
doc: documents segmented_log::segment::Segment
arindas May 24, 2024
c16dbe1
chore: update crate version
arindas May 24, 2024
d363e69
chore: update examples path in README
arindas May 24, 2024
32dadc3
Merge pull request #25 from arindas/docs/0.0.5
arindas May 24, 2024
828864f
Merge branch 'main' into develop
arindas May 24, 2024
doc: adds documentation for the SegmentedLog struct
arindas committed Feb 27, 2024
commit ff8b4d8a2a8155fd0075828acc1ccad0006973dd
85 changes: 81 additions & 4 deletions src/storage/commit_log/segmented_log/mod.rs
@@ -127,7 +127,7 @@ use std::{
time::Duration,
};

- /// Represents metadata for records in the [`SegmentedLog`].
+ /// Represents metadata for [`Record`] instances in the [`SegmentedLog`].
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub struct MetaWithIdx<M, Idx> {
/// Generic metadata for the record as necessary
@@ -142,7 +142,7 @@ where
Idx: Eq,
{
/// Returns a [`Some`]`(`[`MetaWithIdx`]`)` containing this instance's `metadata` and the
- /// provided `anchor_idx` if this indices match or this instance's `index` is `None`.
+ /// provided `anchor_idx` if the indices match or this instance's `index` is `None`.
///
/// Returns `None` if this instance contains an `index` and the indices mismatch.
pub fn anchored_with_index(self, anchor_idx: Idx) -> Option<Self> {
@@ -158,8 +158,7 @@ where
}
}
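
The body of `anchored_with_index` is collapsed in this view. Going only by its doc comment, the behaviour could be sketched as below. This is an assumption-laden stand-in, not the crate's code: the struct is redeclared locally, the field names `metadata` and `index` are taken from the doc comment, and the `index: Option<Idx>` field type is a guess.

```rust
/// Local stand-in for the PR's `MetaWithIdx` (assumed shape, see the note above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetaWithIdx<M, Idx> {
    pub metadata: M,
    pub index: Option<Idx>,
}

impl<M, Idx> MetaWithIdx<M, Idx>
where
    Idx: Eq,
{
    /// Returns `Some` carrying `anchor_idx` when this instance has no index or
    /// its index equals `anchor_idx`; returns `None` when the indices differ.
    pub fn anchored_with_index(self, anchor_idx: Idx) -> Option<Self> {
        match &self.index {
            // An index is present and disagrees with the anchor: reject.
            Some(idx) if *idx != anchor_idx => None,
            // No index, or a matching one: anchor the metadata to `anchor_idx`.
            _ => Some(Self {
                metadata: self.metadata,
                index: Some(anchor_idx),
            }),
        }
    }
}
```

Under these assumptions, metadata without an index adopts whatever index it is anchored to, while metadata carrying a conflicting index is rejected.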

- /// Record type alias for [`SegmentedLog`] using [`MetaWithIdx`] for the generic metadata
- /// parameter.
+ /// Record type alias for [`SegmentedLog`] using [`MetaWithIdx`] as the metadata.
pub type Record<M, Idx, T> = super::Record<MetaWithIdx<M, Idx>, T>;

/// Error type associated with [`SegmentedLog`] operations.
@@ -211,13 +210,91 @@ where
{
}

/// Configuration for [`SegmentedLog`].
///
/// Used to configure specific invariants of a segmented log.
#[derive(Default, Debug, Clone, Copy, Serialize, Deserialize)]
pub struct Config<Idx, Size> {
/// Number of [`Segment`] instances in the [`SegmentedLog`] to be _index-cached_.
///
/// _Index-cached_ [`Segment`] instances cache their inner [`Index`](index::Index) in memory.
/// This helps to avoid I/O for reading [`Record`] persistent metadata (such as position in
/// store file or checksum) every time the [`Record`] is read from the [`Segment`].
///
/// This configuration has the following effects depending on its value:
/// - [`None`]: Default, *all* [`Segment`] instances are _index-cached_
/// - [`Some`]`(0)`: *No* [`Segment`] instances are _index-cached_
/// - [`Some`]`(<non-zero-value>)`: A *maximum of the given number* of [`Segment`] instances are
/// _index-cached_ at any time.
///
/// >You may think of it this way: you can opt in to optional index-caching by specifying a
/// >[`Some`]. Or, you can keep using the default setting to index-cache all segments by
/// >specifying [`None`].
///
/// _Optional index-caching_ is beneficial in a [`SegmentedLog`] with a large number of
/// [`Segment`] instances, only a few of which are actively read from at any given point in
/// time. It helps when working with limited heap memory but a large amount of
/// storage.
///
/// <div></div>
pub num_index_cached_read_segments: Option<usize>,

/// [`Segment`] specific configuration to be used for all [`Segment`] instances in the
/// [`SegmentedLog`] in question.
///
/// <div></div>
pub segment_config: segment::Config<Size>,

/// Lowest possible record index in the [`SegmentedLog`] in question.
///
/// `( initial_index <= read_segments[0].base_index )`
pub initial_index: Idx,
}
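
The three documented cases for `num_index_cached_read_segments` can be restated as a small match. The enum `IndexCachePolicy` and the function `index_cache_policy` are hypothetical names introduced only for this illustration; they are not part of the crate.

```rust
/// Hypothetical summary of the documented caching behaviour (not a crate type).
#[derive(Debug, PartialEq, Eq)]
pub enum IndexCachePolicy {
    /// `None`: the default, every segment is index-cached.
    AllSegments,
    /// `Some(0)`: no segment is index-cached.
    NoSegments,
    /// `Some(n)` with `n > 0`: at most `n` segments are index-cached at a time.
    AtMost(usize),
}

/// Maps the config value to the behaviour described in the doc comment.
pub fn index_cache_policy(num_index_cached_read_segments: Option<usize>) -> IndexCachePolicy {
    match num_index_cached_read_segments {
        None => IndexCachePolicy::AllSegments,
        Some(0) => IndexCachePolicy::NoSegments,
        Some(n) => IndexCachePolicy::AtMost(n),
    }
}
```

Note that `None` and `Some(0)` sit at opposite extremes: omitting the value caches everything, while an explicit zero caches nothing.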

/// The [`SegmentedLog`] abstraction, implementing a [`CommitLog`] with a collection of _read_
/// [`Segment`]`s` and a single _write_ [`Segment`].
///
/// Uses a [`Vec`] to store _read_ [`Segment`] instances and an [`Option`] to store the _write_
/// [`Segment`]. The [`Option`] is used so that we can easily move out the _write_ [`Segment`] or
/// move in a new one when implementing some of the APIs. The
/// [`SegmentedLogError::WriteSegmentLost`] error is a result of this implementation decision.
///
/// [`SegmentedLog`] can also _index-cache_ only a subset of its [`Segment`] instances.
///
/// >_Index-cached_ [`Segment`] instances cache their inner [`Index`](index::Index) in memory.
/// >This helps to avoid I/O for reading [`Record`] persistent metadata (such as position in store
/// >file or checksum) every time the [`Record`] is read from the [`Segment`].
///
/// >_Optional index-caching_ is beneficial in a [`SegmentedLog`] with a large number of
/// >[`Segment`] instances, only a few of which are actively read from at any given point in
/// >time. It helps when working with limited heap memory but a large amount of
/// >storage.
///
/// [`SegmentedLog`] maintains a [`Cache`] to keep track of which [`Segment`] instances to
/// _index-cache_. The _index-caching_ behaviour depends on the [`Cache`] implementation used.
/// (For instance, an `LRUCache` would keep the most recently used [`Segment`] instances
/// _index-cached_ and evict the least recently used ones.) To enable such behaviour, we perform
/// lookups and inserts on this inner cache whenever a [`Segment`] is referred to in any operation.
///
/// The _write_ [`Segment`] is always _index-cached_.
///
/// Only the metadata associated with [`Record`] instances is serialized or deserialized. The
/// record content bytes are always written to and read from the [`Storage`] as-is.
///
/// ### Type parameters
/// - `S`: [`Storage`] implementation to be used for [`Segment`] instances
/// - `M`: Metadata to be used for [`Record`] instances
/// - `H`: [`Hasher`] to use for computing checksums of our [`Record`] contents
/// - `Idx`: Unsigned integer type to be used for representing record indices
/// - `Size`: Unsigned integer type to represent record and persistent storage sizes
/// - `SERP`: [`SerializationProvider`] used for serializing and deserializing metadata associated
/// with our records.
/// - `SSP`: [`SegmentStorageProvider`] used for obtaining backing storage for our [`Segment`]
/// instances
/// - `C`: [`Cache`] implementation to use for _index-caching_ behaviour. You may use
/// [`NoOpCache`](crate::common::cache::NoOpCache) when opting out of _optional index-caching_,
/// i.e. using [`None`] for [`Config::num_index_cached_read_segments`].
pub struct SegmentedLog<S, M, H, Idx, Size, SERP, SSP, C> {
write_segment: Option<Segment<S, M, H, Idx, Size, SERP>>,
read_segments: Vec<Segment<S, M, H, Idx, Size, SERP>>,
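
The doc comment explains that the _write_ segment lives in an `Option` so it can be moved out or replaced, and that `SegmentedLogError::WriteSegmentLost` follows from that choice. A minimal sketch of the pattern is shown below. Everything here is a simplified stand-in: `Log`, the non-generic `Segment`, the `u64` indices, and the `rotate` method are hypothetical and do not mirror the crate's actual API.

```rust
/// Simplified stand-in for the crate's error type; only one variant is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum SegmentedLogError {
    WriteSegmentLost,
}

/// Simplified, non-generic stand-in for a segment.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    pub base_index: u64,
}

/// Hypothetical log with the same two-field layout as in the diff above.
pub struct Log {
    pub write_segment: Option<Segment>,
    pub read_segments: Vec<Segment>,
}

impl Log {
    pub fn new(initial_index: u64) -> Self {
        Self {
            write_segment: Some(Segment { base_index: initial_index }),
            read_segments: Vec::new(),
        }
    }

    /// Moves the current write segment into the read segments and installs a
    /// fresh write segment starting at `next_base_index`.
    pub fn rotate(&mut self, next_base_index: u64) -> Result<(), SegmentedLogError> {
        // `take` moves the segment out and leaves `None` behind; an already
        // empty slot is reported as `WriteSegmentLost`.
        let old = self
            .write_segment
            .take()
            .ok_or(SegmentedLogError::WriteSegmentLost)?;
        self.read_segments.push(old);
        self.write_segment = Some(Segment { base_index: next_base_index });
        Ok(())
    }
}
```

The `Option` lets the segment be moved by value without cloning or placeholder values; the cost is that every access has to handle the empty case, which is where the dedicated error variant comes from.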