|
| 1 | +# Bit Ordering |
| 2 | + |
| 3 | +`bitvec` is unique among its class of libraries in that it distinguishes |
| 4 | +indexing within a sequence versus selecting a bit within a memory element. Users |
| 5 | +are allowed to choose, or even create, a function that transforms an abstract |
| 6 | +index into an actual selection mask. To make matters even more complex, users |
| 7 | +are also allowed to choose which of the fundamental register types will be used |
| 8 | +as the base unit of memory, with no intermediate subdivision between that type |
| 9 | +and the individual bits. |
| 10 | + |
| 11 | +This is expressed through the pair of type parameters |
| 12 | +`<O: BitOrder, T: BitStore>` on all `bitvec` data structures. The correct choice |
| 13 | +of parameters depends on your use case. This chapter explains how each |
| 14 | +combination translates to real memory. |
| 15 | + |
| 16 | +## `BitOrder` Types |
| 17 | + |
| 18 | +`bitvec` provides three ordering names in its prelude: `LocalBits`, `Lsb0`, and |
| 19 | +`Msb0`. |
| 20 | + |
| 21 | +The `Lsb0` type uses the function `selector = 1 << index`, and is the typical |
| 22 | +behavior of other bit-sequence libraries and of C-style structural bitfields on |
| 23 | +most architectures. Notably, it is *not* the behavior of many I/O protocols, |
| 24 | +which is why `bitvec` allows substitution. |
| 25 | + |
| 26 | +The `Msb0` type uses the function `selector = !(!0 >> 1) >> index`, and is the |
| 27 | +behavior used by I/O protocols such as TCP and IP, as well as C-style structural |
| 28 | +bitfields on targets with big-endian byte ordering, such as SPARCv8. |
| 29 | + |
| 30 | +If you do not have an explicit need for memory layout, you should probably not |
| 31 | +use this ordering: its base constant is one of `0x80`, `0x80_00`, |
| 32 | +`0x8000_0000`, or `0x8000_0000__0000_0000`, while the `Lsb0` base constant is |
| 33 | +uniformly `1`. When used with register types wider than a byte, the base |
| 34 | +constant probably will not be inlined into the instruction as an immediate |
| 35 | +value, but will rather be computed at runtime as `(1 << reg_width) >> index`. |
| 36 | + |
| 37 | +The `LocalBits` name aliases to either `Lsb0` or `Msb0`, depending on what C |
| 38 | +compilers targeting your architecture would probably select when performing |
| 39 | +structural bitfield layout. This is the default parameter used in `bitvec` types |
| 40 | +that do not explicitly specify an ordering. |
| 41 | + |
| 42 | +> I am uncertain whether this is specified in the C Standard, or if it is an |
| 43 | +> implementation-defined practice. As bit-orderings within a memory element are |
| 44 | +> unrelated to the byte-orderings of multi-byte elements, I cannot explain |
| 45 | +> further why this change in behavior exists. Nevertheless, if you have written |
| 46 | +> cross-platform I/O protocol code using bitfields, as I have, you have probably |
| 47 | +> run into this behavior. |
| 48 | +
|
| 49 | +## `BitStore` Types |
| 50 | + |
| 51 | +Unsigned integers that roughly correspond to ordinary (non-vector) CPU registers |
| 52 | +on your target architecture implement `BitStore`. |
| 53 | + |
| 54 | +This is not *quite* true: the actual **requirement** is that integers |
| 55 | +implementing `BitStore` must always be aligned to their size. This is not the |
| 56 | +case for `u64` on 32-bit `x86`, architectures (see the Rust [layout docs]). |
| 57 | + |
| 58 | +In addition, `BitStore` is implemented on `Cell` wrappers of the integers, and |
| 59 | +their atomic variants. This is used to provide alias-aware memory access, as |
| 60 | +most architectures do not support modifying a single bit within a memory address |
| 61 | +without locking the entire referent element. |
| 62 | + |
| 63 | +`bitvec` is primarily developed and tested on `x86_64`, and may be incorrect on |
| 64 | +embedded architectures. Please file an issue if you encounter errors involving |
| 65 | +this trait. |
| 66 | + |
| 67 | +## Memory Layout |
| 68 | + |
| 69 | +The `BitOrder` and `BitStore` traits combine with your target architecture’s |
| 70 | +byte-ordering of register elements to create a matrix of memory traversals. This |
| 71 | +matrix informs what is the appropriate choice of parameters to use for your |
| 72 | +program whenever you are using `bitvec` for precise memory control rather than |
| 73 | +solely for a compact `usize`-to-`bool` data collection. |
| 74 | + |
| 75 | +The tables below list bytes of a memory address space with addresses increasing |
| 76 | +to the right, and the bits *within* those bytes with numeric significance |
| 77 | +decreasing to the right. This is the ordering used in most debug-printing of |
| 78 | +memory, so hopefully the table contents should match up with your prior |
| 79 | +experience viewing memory bytes. |
| 80 | + |
| 81 | +The `L` and `M` indicate the `Lsb0` and `Msb0` ordering parameters, |
| 82 | +respectively, and `XX` indicates that the row matches all register types. |
| 83 | +Within each row, traversal begins at zero and follows the arrows to each |
| 84 | +successive step. Boundaries between registers are marked with a column; |
| 85 | +boundaries between bytes within the same register are marked with a space. |
| 86 | + |
| 87 | +### Little-Endian Byte-Ordered Machines |
| 88 | + |
| 89 | +On little-endian machines, the least-significant *byte* of a register type is |
| 90 | +stored at the lowest memory address, and each byte higher is one step more |
| 91 | +numerically significant than the last. |
| 92 | + |
| 93 | +```text |
| 94 | +byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777 |
| 95 | +bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210 |
| 96 | +═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════ |
| 97 | + LXX ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E |
| 98 | +─────╫─────────┼────────┼────────┼────────┼────────┼────────┼────────┼──────── |
| 99 | + M8 ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F |
| 100 | + M16 ║ 2 ---> 3 0 ---> 1│6 ---> 7 4 ---> 5│A ---> B 8 ---> 9│E ---> F C ---> D |
| 101 | + M32 ║ 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1│E ---> F C ---> D A ---> B 8 ---> 9 |
| 102 | + M64 ║ E ---> F C ---> D A ---> B 8 ---> 9 6 ---> 7 4 ---> 5 2 ---> 3 0 ---> 1 |
| 103 | +``` |
| 104 | + |
| 105 | +### Big-Endian Byte-Ordered Machines |
| 106 | + |
| 107 | +On big-endian machines, the most-significant *byte* of a register type is stored |
| 108 | +at the lowest memory address, and each byte higher is one step less numerically |
| 109 | +signifcant than the last. |
| 110 | + |
| 111 | +```text |
| 112 | +byte ║ 00000000│11111111│22222222│33333333│44444444│55555555│66666666│77777777 |
| 113 | +bit ║ 76543210│76543210│76543210│76543210│76543210│76543210│76543210│76543210 |
| 114 | +═════╬═════════╪════════╪════════╪════════╪════════╪════════╪════════╪════════ |
| 115 | + L8 ║ 1 <--- 0│3 <--- 2│5 <--- 4│7 <--- 6│9 <--- 8│B <--- A│D <--- C│F <--- E |
| 116 | + L16 ║ 3 <--- 2 1 <--- 0│7 <--- 6 5 <--- 4│B <--- A 9 <--- 8│F <--- E D <--- C |
| 117 | + L32 ║ 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0│F <--- E D <--- C B <--- A 9 <--- 8 |
| 118 | + L64 ║ F <--- E D <--- C B <--- A 9 <--- 8 7 <--- 6 5 <--- 4 3 <--- 2 1 <--- 0 |
| 119 | +─────╫─────────┬────────┬────────┬────────┬────────┬────────┬────────┬──────── |
| 120 | + MXX ║ 0 ---> 1│2 ---> 3│4 ---> 5│6 ---> 7│8 ---> 9│A ---> B│C ---> D│E ---> F |
| 121 | +``` |
| 122 | + |
| 123 | +[layout docs]: https://doc.rust-lang.org/reference/type-layout.html#primitive-data-layout |
0 commit comments