1. 15
    1. 13

      Note that this is using undefined behavior, so it’s unsound and could break at any time.

      https://github.com/ogxd/gxhash/issues/82 (there’s a concurring comment from Ralf Jung, who would know what he’s talking about.)

      1. 5

        Yeah, and the author seems to confuse “works on my machine” with “doesn’t have UB”.

        FWIW, I think this is a fine use-case for inline assembly - if you want “what the machine does” semantics as opposed to Rust semantics, that’s how you can get that.

        1. 1

          It’s also a use case for compiler builtins that define or change language semantics to whatever it is we need them to be. Rust semantics are only useful if they get the compiler to generate the code we want. If that’s not happening, it’s time to leave them behind.

    2. 2

      In this:

      #[inline(always)]
      pub unsafe fn get_partial_unsafe(data: *const State, len: usize) -> State {
          let indices = 
              _mm_set_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
          // Create a mask by comparing the indices to the length
          let mask = _mm_cmpgt_epi8(_mm_set1_epi8(len as i8), indices);
          // Mask the bytes that don't belong to our stream
          return _mm_and_si128(_mm_loadu_si128(data), mask);
      }
      

      You can generate the mask by reading from a buffer at an offset. It will save you a couple of instructions:

      // Shorthand for sixteen -1, followed by sixteen 0s.
      static uint64_t mask_buffer[4] = { -1, -1, 0,  0}; 
      
      __m128i get_partial(__m128i* data, size_t len) {
          __m128i mask = _mm_loadu_si128(mask_buffer - len);
          __m128i raw_data = _mm_loadu_si128(data);
          return _mm_and_si128(raw_data, mask);
      }
      
      1. 2

        __m128i mask = _mm_loadu_si128(mask_buffer - len);

        I feel like this line is missing an addition of 16? And also probably a cast or two so that the pointer arithmetic works out correctly?

         __m128i mask = _mm_loadu_si128((__m128i *)((unsigned char *)mask_buffer + 16 - len));
        
        1. 2

          cast

          why trying to do this crap in c is always a mistake

          section .rodata
          mask: dq -1, -1, 0, 0
          section .text
          ; in rsi ptr rcx length
          ; out xmm0 loaded data
          movdqu xmm0, [rsi]
          neg rcx
          lea rax, [mask + 16] ;riprel limited addressing forms
          vpand xmm0, xmm0, [rax + rcx] ;avx unaligned ok
          

          e: ‘better’ (no overread):

          section .rodata
          blop: dq 0x0706050403020100, 0x0f0e0d0c0b0a0908, 0x8080808080808080, 0x8080808080808080
          section .text
          movdqu xmm0, [rsi + rcx - 16]
          neg rcx
          lea rax, [blop + 16]
          vpshufb xmm0, xmm0, [rax + rcx]
          

          of course overread is actually fine…

          e: of course this comes from somewhere - found this dumb proof of concept / piece of crap i made forever ago https://files.catbox.moe/pfs2qu.txt (point to avoid page faults; no other ‘bounds’ here..)

        2. 1

          Correct. I shouldn’t write comments late in the evening…

    3. 1

      The very first comment in the article condemns these impressive results due to undefined behavior and goes on to suggest that compilers are “allowed” to generate broken code. Looks like Rust is following C’s footsteps…

      If the language’s semantics are insufficient, they should give us a rich set of built in features that tweak the compiler’s expectations. Instead of “optimizing” in an adversarial, borderline malicious way, the compiler should require the programmer to define the undefined behavior. After the code is compiled, the compiler’s expectations are irrelevant, only the system’s expectations matter. Those are the ones we care about.

      The Linux kernel is compiled with behavior defining flags such as:

      -fno-strict-aliasing
      -fwrapv
      -fno-delete-null-pointer-checks
      

      The last item is especially noteworthy. Compilers “optimizing” by deleting null pointer checks because they “know” it can’t be null because some standard said so has turned bugs into exploitable vulnerabilities. Stuff like this shouldn’t be normalized.