Skip to content

Commit 5cc4ed2

Browse files
committed
Write a manual
1 parent 5c91640 commit 5cc4ed2

18 files changed

Lines changed: 1398 additions & 115 deletions

book.toml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
[book]
2+
title = "A User’s Guide to the `bitvec` Project"
3+
authors = [
4+
"myrrlyn <[email protected]>",
5+
]
6+
description = "A more thorough manual of the `bitvec` project"
7+
src = "book"
8+
9+
[build]
10+
build-dir = "target/book"
11+
12+
[output.html]
13+
mathjax-support = true
14+
15+
[output.html.playpen]
16+
line-numbers = true

book/SUMMARY.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# Summary
2+
3+
[Dedication](dedication.md)
4+
5+
1. [Introduction](introduction.md)
6+
1. [Data Structures](data-structures.md)
7+
1. [`BitSlice`](data-structures/bitslice.md)
8+
1. [`BitArray`](data-structures/bitarray.md)
9+
1. [`BitVec` and `BitBox`](data-structures/bitvec.md)
10+
1. [Type Parameters](type-parameters.md)
11+
1. [`BitOrder`](type-parameters/bitorder.md)
12+
1. [`BitStore`](type-parameters/bitstore.md)
13+
1. [Practical Use](practical-use.md)
14+
1. [`bool` Collections](practical-use/collections.md)
15+
1. [C-Style Bitfields](practical-use/bitfields.md)
16+
1. [Memory Model](memory-model.md)
17+
1. [Pointer Encoding](pointer-encoding.md)

book/bit-ordering.md

Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
# Bit Ordering
2+
3+
`bitvec` is unique among its class of libraries in that it distinguishes
4+
indexing within a sequence versus selecting a bit within a memory element. Users
5+
are allowed to choose, or even create, a function that transforms an abstract
6+
index into an actual selection mask. To make matters even more complex, users
7+
are also allowed to choose which of the fundamental register types will be used
8+
as the base unit of memory, with no intermediate subdivision between that type
9+
and the individual bits.
10+
11+
This is expressed through the pair of type parameters
12+
`<O: BitOrder, T: BitStore>` on all `bitvec` data structures. The correct choice
13+
of parameters depends on your use case. This chapter explains how each
14+
combination translates to real memory.
15+
16+
## `BitOrder` Types
17+
18+
`bitvec` provides three ordering names in its prelude: `LocalBits`, `Lsb0`, and
19+
`Msb0`.
20+
21+
The `Lsb0` type uses the function `selector = 1 << index`, and is the typical
22+
behavior of other bit-sequence libraries and of C-style structural bitfields on
23+
most architectures. Notably, it is *not* the behavior of many I/O protocols,
24+
which is why `bitvec` allows substitution.
25+
26+
The `Msb0` type uses the function `selector = !(!0 >> 1) >> index`, and is the
27+
behavior used by I/O protocols such as TCP and IP, as well as C-style structural
28+
bitfields on targets with big-endian byte ordering, such as SPARCv8.
29+
30+
If you do not have an explicit need for memory layout, you should probably not
31+
use this ordering: its base constant is one of `0x80`, `0x80_00`,
32+
`0x8000_0000`, or `0x8000_0000__0000_0000`, while the `Lsb0` base constant is
33+
uniformly `1`. When used with register types wider than a byte, the base
34+
constant probably will not be inlined into the instruction as an immediate
35+
value, but will rather be computed at runtime as `(1 << reg_width) >> index`.
36+
37+
The `LocalBits` name aliases to either `Lsb0` or `Msb0`, depending on what C
38+
compilers targeting your architecture would probably select when performing
39+
structural bitfield layout. This is the default parameter used in `bitvec` types
40+
that do not explicitly specify an ordering.
41+
42+
> I am uncertain whether this is specified in the C Standard, or if it is an
43+
> implementation-defined practice. As bit-orderings within a memory element are
44+
> unrelated to the byte-orderings of multi-byte elements, I cannot explain
45+
> further why this change in behavior exists. Nevertheless, if you have written
46+
> cross-platform I/O protocol code using bitfields, as I have, you have probably
47+
> run into this behavior.
48+
49+
## `BitStore` Types
50+
51+
Unsigned integers that roughly correspond to ordinary (non-vector) CPU registers
52+
on your target architecture implement `BitStore`.
53+
54+
This is not *quite* true: the actual **requirement** is that integers
55+
implementing `BitStore` must always be aligned to their size. This is not the
56+
case for `u64` on 32-bit `x86`, architectures (see the Rust [layout docs]).
57+
58+
In addition, `BitStore` is implemented on `Cell` wrappers of the integers, and
59+
their atomic variants. This is used to provide alias-aware memory access, as
60+
most architectures do not support modifying a single bit within a memory address
61+
without locking the entire referent element.
62+
63+
`bitvec` is primarily developed and tested on `x86_64`, and may be incorrect on
64+
embedded architectures. Please file an issue if you encounter errors involving
65+
this trait.
66+
67+
## Memory Layout
68+
69+
The `BitOrder` and `BitStore` traits combine with your target architecture’s
70+
byte-ordering of register elements to create a matrix of memory traversals. This
71+
matrix informs what is the appropriate choice of parameters to use for your
72+
program whenever you are using `bitvec` for precise memory control rather than
73+
solely for a compact `usize`-to-`bool` data collection.
74+
75+
The tables below list bytes of a memory address space with addresses increasing
76+
to the right, and the bits *within* those bytes with numeric significance
77+
decreasing to the right. This is the ordering used in most debug-printing of
78+
memory, so hopefully the table contents should match up with your prior
79+
experience viewing memory bytes.
80+
81+
The `L` and `M` indicate the `Lsb0` and `Msb0` ordering parameters,
82+
respectively, and `XX` indicates that the row matches all register types.
83+
Within each row, traversal begins at zero and follows the arrows to each
84+
successive step. Boundaries between registers are marked with a column;
85+
boundaries between bytes within the same register are marked with a space.
86+
87+
### Little-Endian Byte-Ordered Machines
88+
89+
On little-endian machines, the least-significant *byte* of a register type is
90+
stored at the lowest memory address, and each byte higher is one step more
91+
numerically significant than the last.
92+
93+
```text
94+
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
95+
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
96+
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
97+
LXX ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
98+
─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────
99+
M8 ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
100+
M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D
101+
M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9
102+
M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1
103+
```
104+
105+
### Big-Endian Byte-Ordered Machines
106+
107+
On big-endian machines, the most-significant *byte* of a register type is stored
108+
at the lowest memory address, and each byte higher is one step less numerically
109+
signifcant than the last.
110+
111+
```text
112+
byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777
113+
bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210
114+
═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════
115+
L8 ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E
116+
L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C
117+
L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8
118+
L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0
119+
─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬────────
120+
MXX ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F
121+
```
122+
123+
[layout docs]: https://doc.rust-lang.org/reference/type-layout.html#primitive-data-layout

book/data-structures.md

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# Data Structures
2+
3+
`bitvec`’s primary exports are four data structures: [`BitSlice`], [`BitArray`],
4+
[`BitBox`], and [`BitVec`]. These correspond to the [`[bool]`][slice],
5+
[`[bool; N]`][array], [`Box<[bool]>`][boxed], and [`Vec<bool>`] types in the
6+
Rust standard libraries. Unlike the Rust types, the `bitvec` types are not
7+
composable, and cannot be mixed with any other types, including pointers in the
8+
standard library.
9+
10+
The `bitvec` types implement the APIs of their standard-library counterparts to
11+
the fullest extent possible. The only missing feature is currently
12+
`IndexMut<usize>`, preventing `collection[index] = bit;` assignment. This means
13+
that, for users looking for compact `usize => bool` collections, substituting
14+
types in your project codebase ought to be enough to make your project Just
15+
Work™️.
16+
17+
It is the fact that `BitSlice` acts exactly like `[bool]`, and can only be used
18+
through `&BitSlice` and `&mut BitSlice` references, that makes `bitvec` work. No
19+
other Rust library has this capability.
20+
21+
Before we explore the data structures in more detail, there are three caveats I
22+
must provide:
23+
24+
1. `bitvec` uses an opaque, custom, pointer representation for everything except
25+
`BitArray`. You may not inspect or modify this pointer. You may not use it as
26+
a type or value parameter in any other types or functions. You will break
27+
your program if you try.
28+
29+
`bitvec` ensures that this pointer encoding does not fail the compiler and
30+
language requirements for reference correctness. The rules used to do so are
31+
internal to the crate, and will not be present outside it. `bitvec` pointers
32+
are perfectly safe to use, as long as you treat them as completely opaque and
33+
use *only* the interfaces provided.
34+
1. These structures all have two type parameters, `<O: BitOrder, T: BitStore>`.
35+
These parameters are described in the next chapter. They govern the in-memory
36+
representation of data storage, but are not relevant to the general use of the
37+
handle types.
38+
1. `bitvec` trades an increased memory space efficiency for decreased
39+
instruction size and speed efficiency. The compiler *may* optimize some of
40+
the costs away, but `bitvec` structures are not free to use. The “zero-cost”
41+
of the `bitvec` abstraction is that you cannot write a better bitset, and
42+
*not* that it is equal to an ordinary `bool` sequence.
43+
44+
Now that the disclaimers are over, we can talk about how the types actually
45+
work.
46+
47+
[`BitArray`]: https://docs.rs/bitvec/latest/bitvec/array/struct.BitArray.html "BitArray API documentation"
48+
[`BitBox`]: https://docs.rs/bitvec/latest/bitvec/boxed/struct.BitBox.html "BitBox API documentation"
49+
[`BitSlice`]: https://docs.rs/bitvec/latest/bitvec/slice/struct.BitSlice.html "BitSlice API documentation"
50+
[`BitVec`]: https://docs.rs/bitvec/latest/bitvec/vec/struct.BitVec.html "BitVec API documentation"
51+
[`Vec<bool>`]: https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html "Vec API documentation"
52+
[`std::bitset<N>`]: https://en.cppreference.com/w/cpp/utility/bitset "C++ std::bitset documentation"
53+
[`std::vector<bool>`]: https://en.cppreference.com/w/cpp/container/vector_bool "C++ std::vector<bool> documentation"
54+
[array]: https://doc.rust-lang.org/stable/std/primitive.array.html "array API documentation"
55+
[boxed]: https://doc.rust-lang.org/stable/alloc/boxed/struct.Boxed.html "Box API documentation"
56+
[slice]: https://doc.rust-lang.org/stable/std/primitive.slice.html "slice API documentation"

book/data-structures/bitarray.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
# Arrays
2+
3+
While `BitSlice` describes a region of borrowed data, `BitArray` provides a
4+
container that can hold and manage such a region.
5+
6+
It is most comparable to the C++ type [`std::bitset<N>`]. Unfortunately, the
7+
Rust support for type-level integers is still experimental, so it is unable to
8+
take the length of the `BitSlice` it contains as a type parameter. Instead, it
9+
must take the entire region it contains as a type parameter. The full type
10+
declaration is
11+
12+
```rust,ignore
13+
# use bitvec::prelude::*;
14+
pub struct BitArray<
15+
O: BitOrder,
16+
V: BitView,
17+
> {
18+
_ord: PhantomData<O>,
19+
data: V,
20+
}
21+
```
22+
23+
As described in the [previous chapter], the `BitView` trait is implemented on
24+
the unsigned integers, and on arrays of them. Currently, array support is
25+
limited to arrays up to and including 32 elements long; as Rust type-level
26+
integers mature, this will grow to include all arrays.
27+
28+
> Once type-level computation stabilizes, `BitArray` will change to have the
29+
> type parameters `<O: BitOrder, T: BitStore, const N: usize>`, matching the
30+
> `std::bitset<N>` length parameter.
31+
32+
This array dereferences to a `BitSlice` region over its entire length. It does
33+
not currently permit shortening its `BitSlice` from either end. If this is a
34+
behavior you want, please file an issue.
35+
36+
## Using a `BitArray`
37+
38+
The `::zeroed` function constructs a new `BitArray` with its memory completely
39+
zeroed. The `::new` function wraps an existing element or array into a
40+
`BitArray`. In addition, the macro constructor `bitarr!` takes the exact same
41+
arguments as the `bits!` constructor, except that it returns an array directly
42+
rather than a reference to a `static` buffer.
43+
44+
In addition, `BitArray` structures and references can be constructed from
45+
`&BitSlice` references using the `TryFrom` trait, just as arrays can be
46+
constructed in the standard library.
47+
48+
Once constructed, `BitArray` offers the `.as_bitslice` and `.as_mut_bitslice`
49+
explicit methods, as well as all the standard traits, to borrow its data as a
50+
`BitSlice`. The array has no functionality of its own, and serves only to own a
51+
region used as a`BitSlice`.
52+
53+
Once you are done using `BitSlice` to manipulate the array, you can remove the
54+
array with `.unwrap` and regain the `V` memory within. This name is subject to
55+
change if users are sufficiently unhappy with it.
56+
57+
That’s everything that the array does! Like regular arrays, it is useful
58+
primarily for its ability to move memory through a program, and has essentially
59+
no behavior in its own right. It is useful for programs that do not have access
60+
to a dynamic allocator, and do not wish to use `static` buffers. However, if you
61+
do have access to an allocator, you will probably prefer to use `BitVec`
62+
instead.
63+
64+
[previous chapter]: ./bitslice.html "BitSlice region"
65+
[`std::bitset<N>`]: https://en.cppreference.com/w/cpp/utility/bitset "C++ std::bitset documentation"

0 commit comments

Comments
 (0)