Zola
2024-01-14T00:00:00+00:00
https://xoranth.net/atom.xml
Switch lowering in GCC
2024-01-14T00:00:00+00:00
2024-01-14T00:00:00+00:00
Unknown
https://xoranth.net/gcc-switch/
<h3 id="introduction">Introduction</h3>
<p>We have seen in a <a href="../verb-parse/">previous post</a> that switch statements are more complex than they seem.
The compiler is able to implement them as a decision tree (even though maybe it shouldn't).
Is that everything it can do?</p>
<p><em>In this post, I'll focus on <code>gcc</code>. All of the optimizations shown, except the bitset lowering, also apply to clang. I'll be using the <code>-O3</code> optimization flag unless indicated otherwise.</em></p>
<p><em>I've also slightly edited <code>gcc</code> code and removed certain assertions for brevity.</em> </p>
<h3 id="examples">Examples</h3>
<p>Let us start with a simple example:
a function that takes two integers, <code>x</code> and <code>val</code>. Depending on the value of <code>x</code>, it calls a corresponding function (<code>f0</code>, <code>f1</code>, and so forth) with <code>val</code> as an argument and returns the result.
Were we to implement this manually, we would use an array of function pointers, with the input serving as the index into the array.</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f0</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">compact</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f0</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">1</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">2</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">3</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">4</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">compact:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">4</span>
<span class="z-keyword z-control z-assembly">ja</span> .L2
<span class="z-keyword z-control z-assembly">jmp</span> <span class="z-source z-assembly">[</span><span class="z-support z-function z-directive z-assembly">QWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> .L4<span class="z-source z-assembly">[</span><span class="z-constant z-character z-decimal z-assembly">0</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">8</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">]</span>
.L4:
.quad .L8
.quad .L7
.quad .L6
.quad .L5
.quad .L3
.L5:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L3:
<span class="z-keyword z-control z-assembly">jmp</span> f4
.L8:
<span class="z-keyword z-control z-assembly">jmp</span> f0
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L6:
<span class="z-keyword z-control z-assembly">jmp</span> f2
.L2:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
</div>
</div>
<p>The compiler agrees with this approach, and transforms the switch statement into a jump table.</p>
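<p>For comparison, the hand-written version described earlier might look like the sketch below. The bodies of <code>f0</code> to <code>f4</code> are placeholders (the post only declares them), and the names <code>handler_fn</code>, <code>handlers</code> and <code>compact_manual</code> are made up for illustration.</p>

```c
#include <stddef.h>

/* Placeholder bodies: the post only declares f0..f4, so these are assumptions. */
static int f0(int val) { return val; }
static int f1(int val) { return val + 1; }
static int f2(int val) { return val + 2; }
static int f3(int val) { return val + 3; }
static int f4(int val) { return val + 4; }

typedef int (*handler_fn)(int);

/* One entry per case label, in label order: handlers[x] handles `case x`. */
static const handler_fn handlers[] = { f0, f1, f2, f3, f4 };

int compact_manual(int x, int val) {
    /* Converting x to unsigned turns negative values into very large ones,
       so a single comparison rejects both x < 0 and x > 4. */
    if ((unsigned)x >= sizeof handlers / sizeof handlers[0])
        return 0; /* the `default` case */
    return handlers[x](val);
}
```

<p>The single unsigned bounds check plays the same role as the <code>cmp eax, 4</code> / <code>ja .L2</code> pair in the generated assembly, and the array lookup corresponds to the indirect <code>jmp</code> through <code>.L4</code>.</p>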
<p>The compiler also recognizes when the case values are offset by a constant, and subtracts that constant before indexing the table:</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f0</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">compact</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">97</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f0</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">98</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">99</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">100</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">101</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">compact:
<span class="z-keyword z-control z-assembly">lea</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-decimal z-assembly">97</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">4</span>
<span class="z-keyword z-control z-assembly">ja</span> .L2
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">jmp</span> <span class="z-source z-assembly">[</span><span class="z-support z-function z-directive z-assembly">QWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> .L4<span class="z-source z-assembly">[</span><span class="z-constant z-character z-decimal z-assembly">0</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">8</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">]</span>
.L4:
.quad .L8
.quad .L7
.quad .L6
.quad .L5
.quad .L3
.L5:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L3:
<span class="z-keyword z-control z-assembly">jmp</span> f4
.L8:
<span class="z-keyword z-control z-assembly">jmp</span> f0
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L6:
<span class="z-keyword z-control z-assembly">jmp</span> f2
.L2:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
</div>
</div>
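<p>Read back into C, the <code>lea</code> / <code>cmp</code> / <code>ja</code> sequence above amounts to roughly the following. This is a hand-written approximation, not compiler output: the bodies of <code>f0</code> to <code>f4</code> are placeholders, and <code>compact_offset_manual</code> is a hypothetical name.</p>

```c
/* Placeholder bodies: the post only declares f0..f4, so these are assumptions. */
static int f0(int val) { return val; }
static int f1(int val) { return val + 1; }
static int f2(int val) { return val + 2; }
static int f3(int val) { return val + 3; }
static int f4(int val) { return val + 4; }

typedef int (*handler_fn)(int);

/* handlers[i] handles `case 97 + i`. */
static const handler_fn handlers[] = { f0, f1, f2, f3, f4 };

int compact_offset_manual(int x, int val) {
    /* Like `lea eax, [rdi-97]`: shift the smallest label down to index 0.
       Unsigned subtraction wraps around, so any x below 97 becomes a huge index. */
    unsigned idx = (unsigned)x - 97u;

    /* Like `cmp eax, 4` / `ja .L2`: one unsigned comparison covers both ends of the range. */
    if (idx > 4u)
        return 0; /* the `default` case */

    return handlers[idx](val);
}
```

<p>Because the subtraction is done in unsigned arithmetic, no separate check for <code>x &lt; 97</code> is needed.</p>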
<p>Intuitively, the sparser the case values, the less advantageous a jump table becomes, since the table needs an entry for every value between the smallest and largest label.</p>
<p>Through experimentation, we find that <code>gcc</code> considers the threshold to be around eight jump table entries per <code>case</code> label.</p>
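<p>That rule of thumb can be written down as a small check. The sketch below only models the observed behaviour; it is not taken from <code>gcc</code>'s sources, and the name <code>looks_dense_enough</code>, the constant <code>MAX_ENTRIES_PER_CASE</code>, and the exact boundary (at most eight entries per label) are assumptions.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold, taken from the experiment described in the text. */
enum { MAX_ENTRIES_PER_CASE = 8 };

/* `labels` must be sorted in ascending order.
   Returns true when a table covering [labels[0], labels[count - 1]]
   needs at most MAX_ENTRIES_PER_CASE slots per case label. */
bool looks_dense_enough(const int *labels, size_t count) {
    if (count == 0)
        return false;

    /* Number of table slots needed; computed in 64 bits so that
       widely separated labels cannot overflow. */
    unsigned long long span =
        (unsigned long long)((long long)labels[count - 1] - labels[0]) + 1;

    return span <= (unsigned long long)MAX_ENTRIES_PER_CASE * count;
}
```

<p>For the labels 0, 7, 14, 21 and 34, the table would span 35 entries for 5 labels, or 7 entries per label, which is under the assumed threshold.</p>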
<p>In other words, the following function compiles to a jump table:</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f0</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">compact_7</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f0</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">7</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">14</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">21</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">34</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">compact_7:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">34</span>
<span class="z-keyword z-control z-assembly">ja</span> .L2
<span class="z-keyword z-control z-assembly">jmp</span> <span class="z-source z-assembly">[</span><span class="z-support z-function z-directive z-assembly">QWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> .L4<span class="z-source z-assembly">[</span><span class="z-constant z-character z-decimal z-assembly">0</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">8</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">]</span>
.L4:
.quad .L8
.quad .L2
<span class="z-comment z-assembly">; ...</span>
.quad .L2
.quad .L7
.quad .L2
<span class="z-comment z-assembly">; ...</span>
.quad .L2
.quad .L6
.quad .L2
<span class="z-comment z-assembly">; ...</span>
.quad .L2
.quad .L5
.quad .L2
<span class="z-comment z-assembly">; ...</span>
.quad .L2
.quad .L3
.L2:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
.L8:
<span class="z-keyword z-control z-assembly">jmp</span> f0
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L6:
<span class="z-keyword z-control z-assembly">jmp</span> f2
.L5:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L3:
<span class="z-keyword z-control z-assembly">jmp</span> f4
</span></code></pre>
</div>
</div>
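<p>For comparison, here is a rough hand-written counterpart of that jump table. It is only a sketch: the bodies of <code>f0</code> to <code>f4</code> are hypothetical stand-ins (the post only declares them), <code>compact_7_manual</code> and <code>handler_fn</code> are names made up for illustration, and the table stores function pointers whereas the compiler's table stores code labels.</p>

```c
#include <stddef.h>

/* Hypothetical stand-ins for f0..f4, which the post only declares. */
static int f0(int val) { return 100 + val; }
static int f1(int val) { return 200 + val; }
static int f2(int val) { return 300 + val; }
static int f3(int val) { return 400 + val; }
static int f4(int val) { return 500 + val; }

typedef int (*handler_fn)(int);

/* 35 slots cover x = 0..34; slots without a case stay NULL. */
static const handler_fn table[35] = {
    [0] = f0, [7] = f1, [14] = f2, [21] = f3, [34] = f4,
};

int compact_7_manual(int x, int val) {
    /* A single unsigned comparison rejects negative and too-large x,
       much like `cmp eax, 34` followed by `ja`. */
    if ((unsigned)x > 34u)
        return 0;
    handler_fn fn = table[x];
    return fn ? fn(val) : 0; /* NULL slot: default case */
}
```

<p>Only five of the 35 slots do real work. The compiler's version fills the remaining entries with the default label instead of testing for <code>NULL</code>.</p>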
<p>But this other function is expanded as a tree of comparisons:</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f0</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">sparse_8</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f0</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">8</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">16</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">24</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">40</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">sparse_8:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">16</span>
<span class="z-keyword z-control z-assembly">je</span> .L2
<span class="z-keyword z-control z-assembly">jle</span> .L10
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">24</span>
<span class="z-keyword z-control z-assembly">je</span> .L7
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">40</span>
<span class="z-keyword z-control z-assembly">jne</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f4
.L10:
<span class="z-keyword z-control z-assembly">test</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">je</span> .L4
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">8</span>
<span class="z-keyword z-control z-assembly">jne</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L1:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
.L4:
<span class="z-keyword z-control z-assembly">jmp</span> f0
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L2:
<span class="z-keyword z-control z-assembly">jmp</span> f2
</span></code></pre>
</div>
</div>
<p>Here's the comparison tree visualized for clarity:</p>
<svg width="578pt" height="206pt"
viewBox="0.00 0.00 578.01 206.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 202)">
<title>G</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-202 574.01,-202 574.01,4 -4,4"/>
<!-- ret 0 -->
<g id="node1" class="node">
<title>ret 0</title>
<ellipse fill="#f6f6f6" stroke="black" cx="527.33" cy="-72" rx="31.01" ry="18"/>
<text text-anchor="middle" x="527.33" y="-68.3" font-family="Fira Mono" font-size="11.00" fill="#015493">ret 0</text>
</g>
<!-- call f0 -->
<g id="node2" class="node">
<title>call f0</title>
<polygon fill="#f6f6f6" stroke="black" points="419.55,-172.54 419.55,-187.46 394.55,-198 359.2,-198 334.2,-187.46 334.2,-172.54 359.2,-162 394.55,-162 419.55,-172.54"/>
<text text-anchor="middle" x="376.88" y="-176.3" font-family="Fira Mono" font-size="11.00" fill="#b60d13">call f0</text>
</g>
<!-- call f1 -->
<g id="node3" class="node">
<title>call f1</title>
<polygon fill="#f6f6f6" stroke="black" points="570.01,-118.54 570.01,-133.46 545.01,-144 509.65,-144 484.65,-133.46 484.65,-118.54 509.65,-108 545.01,-108 570.01,-118.54"/>
<text text-anchor="middle" x="527.33" y="-122.3" font-family="Fira Mono" font-size="11.00" fill="#b60d13">call f1</text>
</g>
<!-- call f2 -->
<g id="node4" class="node">
<title>call f2</title>
<polygon fill="#f6f6f6" stroke="black" points="260.25,-172.54 260.25,-187.46 235.25,-198 199.9,-198 174.9,-187.46 174.9,-172.54 199.9,-162 235.25,-162 260.25,-172.54"/>
<text text-anchor="middle" x="217.57" y="-176.3" font-family="Fira Mono" font-size="11.00" fill="#b60d13">call f2</text>
</g>
<!-- cmp eax, 24 -->
<g id="node9" class="node">
<title>cmp eax, 24</title>
<polygon fill="none" stroke="black" points="269.1,-90 166.05,-90 166.05,-54 269.1,-54 269.1,-90"/>
<text text-anchor="middle" x="217.57" y="-68.3" font-family="Fira Mono" font-size="11.00">cmp eax, 24</text>
</g>
<!-- call f2->cmp eax, 24 -->
<!-- call f3 -->
<g id="node5" class="node">
<title>call f3</title>
<polygon fill="#f6f6f6" stroke="black" points="419.55,-64.54 419.55,-79.46 394.55,-90 359.2,-90 334.2,-79.46 334.2,-64.54 359.2,-54 394.55,-54 419.55,-64.54"/>
<text text-anchor="middle" x="376.88" y="-68.3" font-family="Fira Mono" font-size="11.00" fill="#b60d13">call f3</text>
</g>
<!-- call f4 -->
<g id="node6" class="node">
<title>call f4</title>
<polygon fill="#f6f6f6" stroke="black" points="570.01,-10.54 570.01,-25.46 545.01,-36 509.65,-36 484.65,-25.46 484.65,-10.54 509.65,0 545.01,0 570.01,-10.54"/>
<text text-anchor="middle" x="527.33" y="-14.3" font-family="Fira Mono" font-size="11.00" fill="#b60d13">call f4</text>
</g>
<!-- cmp eax, 16 -->
<g id="node7" class="node">
<title>cmp eax, 16</title>
<polygon fill="none" stroke="black" points="103.05,-144 0,-144 0,-108 103.05,-108 103.05,-144"/>
<text text-anchor="middle" x="51.52" y="-122.3" font-family="Fira Mono" font-size="11.00">cmp eax, 16</text>
</g>
<!-- cmp eax, 16->call f2 -->
<g id="edge1" class="edge">
<title>cmp eax, 16->call f2</title>
<path fill="none" stroke="black" d="M103.53,-142.79C129.3,-151.27 159.87,-161.33 182.71,-168.85"/>
<text text-anchor="middle" x="131.17" y="-157.05" font-family="Fira Mono" font-size="11.00">je</text>
</g>
<!-- test eax, eax -->
<g id="node8" class="node">
<title>test eax, eax</title>
<polygon fill="none" stroke="black" points="275.85,-144 159.3,-144 159.3,-108 275.85,-108 275.85,-144"/>
<text text-anchor="middle" x="217.57" y="-122.3" font-family="Fira Mono" font-size="11.00">test eax, eax</text>
</g>
<!-- cmp eax, 16->test eax, eax -->
<g id="edge2" class="edge">
<title>cmp eax, 16->test eax, eax</title>
<path fill="none" stroke="black" d="M103.53,-126C121.12,-126 140.94,-126 159.02,-126"/>
<text text-anchor="middle" x="131.17" y="-129.05" font-family="Fira Mono" font-size="11.00">jle</text>
</g>
<!-- cmp eax, 16->cmp eax, 24 -->
<g id="edge3" class="edge">
<title>cmp eax, 16->cmp eax, 24</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M103.53,-109.21C123.28,-102.71 145.85,-95.28 165.6,-88.78"/>
</g>
<!-- test eax, eax->call f0 -->
<g id="edge8" class="edge">
<title>test eax, eax->call f0</title>
<path fill="none" stroke="black" d="M272.27,-144.43C295.59,-152.44 322.17,-161.56 342.57,-168.57"/>
<text text-anchor="middle" x="300.6" y="-159.05" font-family="Fira Mono" font-size="11.00">je</text>
</g>
<!-- cmp eax, 8 -->
<g id="node11" class="node">
<title>cmp eax, 8</title>
<polygon fill="none" stroke="black" points="425.02,-144 328.73,-144 328.73,-108 425.02,-108 425.02,-144"/>
<text text-anchor="middle" x="376.88" y="-122.3" font-family="Fira Mono" font-size="11.00">cmp eax, 8</text>
</g>
<!-- test eax, eax->cmp eax, 8 -->
<g id="edge9" class="edge">
<title>test eax, eax->cmp eax, 8</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M276.25,-126C293.35,-126 311.88,-126 328.26,-126"/>
</g>
<!-- cmp eax, 24->call f3 -->
<g id="edge4" class="edge">
<title>cmp eax, 24->call f3</title>
<path fill="none" stroke="black" d="M269.21,-72C290.05,-72 313.9,-72 333.76,-72"/>
<text text-anchor="middle" x="300.6" y="-75.05" font-family="Fira Mono" font-size="11.00">je</text>
</g>
<!-- cmp eax, 40 -->
<g id="node10" class="node">
<title>cmp eax, 40</title>
<polygon fill="none" stroke="black" points="428.4,-36 325.35,-36 325.35,0 428.4,0 428.4,-36"/>
<text text-anchor="middle" x="376.88" y="-14.3" font-family="Fira Mono" font-size="11.00">cmp eax, 40</text>
</g>
<!-- cmp eax, 24->cmp eax, 40 -->
<g id="edge5" class="edge">
<title>cmp eax, 24->cmp eax, 40</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M269.21,-54.62C287.08,-48.48 307.18,-41.58 325.06,-35.44"/>
</g>
<!-- cmp eax, 40->ret 0 -->
<g id="edge6" class="edge">
<title>cmp eax, 40->ret 0</title>
<path fill="none" stroke="black" d="M428.55,-36.43C452.74,-45.23 480.65,-55.39 500.45,-62.59"/>
<text text-anchor="middle" x="456.52" y="-53.05" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- cmp eax, 40->call f4 -->
<g id="edge7" class="edge">
<title>cmp eax, 40->call f4</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M428.55,-18C446.69,-18 466.93,-18 484.26,-18"/>
</g>
<!-- cmp eax, 8->ret 0 -->
<g id="edge10" class="edge">
<title>cmp eax, 8->ret 0</title>
<path fill="none" stroke="black" d="M425.25,-108.77C450.2,-99.69 479.83,-88.92 500.53,-81.38"/>
<text text-anchor="middle" x="456.52" y="-103.05" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- cmp eax, 8->call f1 -->
<g id="edge11" class="edge">
<title>cmp eax, 8->call f1</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M425.25,-126C444.26,-126 465.98,-126 484.41,-126"/>
</g>
</g>
</svg>
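<p>The same tree can also be written back in C as nested comparisons. This is a sketch that mirrors the assembly above, not compiler output: the bodies of <code>f0</code> to <code>f4</code> are hypothetical stand-ins and <code>sparse_8_tree</code> is a name made up for illustration.</p>

```c
/* Hypothetical stand-ins for f0..f4, which the post only declares. */
static int f0(int val) { return 100 + val; }
static int f1(int val) { return 200 + val; }
static int f2(int val) { return 300 + val; }
static int f3(int val) { return 400 + val; }
static int f4(int val) { return 500 + val; }

int sparse_8_tree(int x, int val) {
    if (x == 16)                 /* cmp eax, 16 ; je  .L2  */
        return f2(val);
    if (x < 16) {                /* jle .L10 (equality already handled) */
        if (x == 0)              /* test eax, eax ; je .L4 */
            return f0(val);
        if (x == 8)              /* cmp eax, 8 ; jne .L1   */
            return f1(val);
        return 0;
    }
    if (x == 24)                 /* cmp eax, 24 ; je .L7   */
        return f3(val);
    if (x == 40)                 /* cmp eax, 40 ; jne .L1  */
        return f4(val);
    return 0;                    /* .L1: default case      */
}
```

<p>The root compares against the median case value, so every path performs at most three comparisons.</p>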
<p>A couple of observations:</p>
<!-- TODO linear function case -->
<ol>
<li>We've only tried equally spaced values so far. Can the compiler identify transformations that are profitable on a subset of all cases?</li>
<li>So far each <code>case</code> led to a different result. What if we had different cases with the same result?</li>
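<li>The switch in <code>sparse_8</code> could be simplified by dividing <code>x</code> by eight before the jump (after checking that <code>x</code> is a multiple of eight), which would leave the dense case values 0, 1, 2, 3 and 5. Interestingly, <code>gcc</code> <em>doesn't</em> find this simplification.</li>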
</ol>
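<p>To make the third observation concrete, here is one way the divide-by-eight lowering could be written by hand. It is a sketch under assumptions: the bodies of <code>f0</code> to <code>f4</code> are hypothetical stand-ins, <code>sparse_8_scaled</code> and <code>handler_fn</code> are names made up for illustration, and the table holds function pointers rather than code labels.</p>

```c
#include <stddef.h>

/* Hypothetical stand-ins for f0..f4, which the post only declares. */
static int f0(int val) { return 100 + val; }
static int f1(int val) { return 200 + val; }
static int f2(int val) { return 300 + val; }
static int f3(int val) { return 400 + val; }
static int f4(int val) { return 500 + val; }

typedef int (*handler_fn)(int);

/* Indexed by x / 8. Slot 4 (x == 32) has no case, so it stays NULL. */
static const handler_fn table[6] = { f0, f1, f2, f3, NULL, f4 };

int sparse_8_scaled(int x, int val) {
    unsigned ux = (unsigned)x;
    if (ux & 7u)                 /* not a multiple of eight: default */
        return 0;
    unsigned idx = ux >> 3;      /* divide by eight */
    if (idx > 5u)                /* also rejects negative x */
        return 0;
    handler_fn fn = table[idx];
    return fn ? fn(val) : 0;     /* NULL slot: default case */
}
```

<p>Compared with a table indexed directly by <code>x</code>, this needs six slots instead of 41, at the cost of one extra mask and shift.</p>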
<p>Let's dig into points 1 and 2.</p>
<h4 id="outliers">Outliers</h4>
<p>Adding an outlier value to our first example shows that the compiler can recognize dense subsets of cases.</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f0</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f5</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">outlier</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">1</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f0</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">2</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">3</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">4</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">5</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">100</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f5</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">outlier:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">5</span>
<span class="z-keyword z-control z-assembly">jg</span> .L2
<span class="z-keyword z-control z-assembly">test</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">jle</span> .L1
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">5</span>
<span class="z-keyword z-control z-assembly">ja</span> .L4
<span class="z-keyword z-control z-assembly">jmp</span> <span class="z-source z-assembly">[</span><span class="z-support z-function z-directive z-assembly">QWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> .L6<span class="z-source z-assembly">[</span><span class="z-constant z-character z-decimal z-assembly">0</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">8</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">]</span>
.L6:
.quad .L4
.quad .L4
.quad .L9
.quad .L8
.quad .L7
.quad .L5
.L2:
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">100</span>
<span class="z-keyword z-control z-assembly">jne</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f5
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L5:
<span class="z-keyword z-control z-assembly">jmp</span> f4
.L9:
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L8:
<span class="z-keyword z-control z-assembly">jmp</span> f2
.L4:
<span class="z-keyword z-control z-assembly">jmp</span> f0
.L1:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
</div>
</div>
<p>The compiler handles the outlier, case 100, with a single comparison and branch (<code>.L2</code>), and still emits a jump table for cases 1 through 5.</p>
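<p>Written by hand, this mixed lowering might look like the sketch below. It assumes that case 1 dispatches to <code>f0</code>, as the jump table suggests. The name <code>outlier_manual</code>, the <code>dense_table</code> array, and the stub bodies of <code>f0</code> through <code>f5</code> are made up for illustration; this is not what gcc generates internally.</p>

```c
#include <stddef.h>

/* Stand-ins for the externally defined f0..f5 of the listing. The return
   values are arbitrary and only make the sketch testable. */
static int f0(int val) { return val; }
static int f1(int val) { return val + 1; }
static int f2(int val) { return val + 2; }
static int f3(int val) { return val + 3; }
static int f4(int val) { return val + 4; }
static int f5(int val) { return val + 5; }

typedef int (*handler_fn)(int);

/* Dense part of the switch: cases 1..5, indexed by x - 1. */
static const handler_fn dense_table[] = { f0, f1, f2, f3, f4 };

int outlier_manual(int x, int val) {
    /* The outlier gets its own comparison and stays out of the table. */
    if (x == 100)
        return f5(val);

    /* Subtracting in unsigned arithmetic makes any x below 1 wrap around
       to a huge index, so one comparison rejects both x < 1 and x > 5. */
    unsigned idx = (unsigned)x - 1u;
    if (idx < sizeof dense_table / sizeof dense_table[0])
        return dense_table[idx](val);

    return 0; /* default */
}
```

<p>Keeping 100 out of the table is what keeps the table small: a single table covering 1 through 100 would need 100 entries for six labels.</p>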
<h4 id="contiguous">Contiguous</h4>
<p>In our examples so far, each <code>case</code> label led to a different code block. Let's test what happens when multiple <code>case</code> labels invoke the same function.</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f3</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f4</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f5</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">blocks_2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">1</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">2</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">3</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">21</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-operator z-variadic z-c">...</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">24</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f2</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">42</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-operator z-variadic z-c">...</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">45</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">63</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-operator z-variadic z-c">...</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">66</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f4</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">84</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-operator z-variadic z-c">...</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">87</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f5</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">blocks_2:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">45</span>
<span class="z-keyword z-control z-assembly">jg</span> .L2
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">41</span>
<span class="z-keyword z-control z-assembly">jg</span> .L3
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">3</span>
<span class="z-keyword z-control z-assembly">jg</span> .L4
<span class="z-keyword z-control z-assembly">test</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">js</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f1
.L2:
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">66</span>
<span class="z-keyword z-control z-assembly">jg</span> .L8
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">62</span>
<span class="z-keyword z-control z-assembly">jle</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f4
.L4:
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">21</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">3</span>
<span class="z-keyword z-control z-assembly">ja</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f2
.L8:
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">84</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">3</span>
<span class="z-keyword z-control z-assembly">ja</span> .L1
<span class="z-keyword z-control z-assembly">jmp</span> f5
.L3:
<span class="z-keyword z-control z-assembly">jmp</span> f3
.L1:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
</div>
</div>
<p>We have 20 case labels, with values between 0 and 87, so the compiler could lower this <code>switch</code> to a jump table with 88 entries.
Our working hypothesis for the heuristic is that a jump table conversion is profitable when its size is smaller than <code>8 * num_case_labels</code>.
Since <code>88 < 20 * 8</code>, the conversion should be profitable in this case. However, <code>gcc</code> opted to lower the switch to a sequence of comparisons instead.</p>
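<p>One detail of the listing above is easy to miss: at <code>.L4</code> and <code>.L8</code>, a whole range is tested with a subtraction and a single unsigned comparison (<code>sub</code>, <code>cmp</code>, <code>ja</code>). A minimal C sketch of that idiom follows. The helper name <code>in_range</code> is made up for illustration.</p>

```c
#include <stdbool.h>

/* True when lo <= x <= hi (requires lo <= hi). Subtracting lo in unsigned
   arithmetic wraps any x below lo around to a huge value, so one unsigned
   comparison rejects both sides of the range. */
bool in_range(int x, int lo, int hi) {
    return (unsigned)x - (unsigned)lo <= (unsigned)hi - (unsigned)lo;
}
```

<p>With <code>lo = 21</code> and <code>hi = 24</code> this is the same test as <code>sub eax, 21</code> followed by <code>cmp eax, 3</code> and <code>ja</code>.</p>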
<p>To better understand why, let's examine a specific block of <code>case</code> labels.</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">blocks_2</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">42</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">43</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">44</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">45</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f3</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">val</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> <span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">blocks_2:
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">edi</span> <span class="z-comment z-assembly">; Move `x` to eax</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edi</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">esi</span> <span class="z-comment z-assembly">; Move `val` to edi, the first argument of the tail call</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">45</span>
<span class="z-comment z-assembly">; If x <= 45, do _not_ jump</span>
<span class="z-keyword z-control z-assembly">jg</span> .L2
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">41</span>
<span class="z-comment z-assembly">; If x >= 42, jump to L3</span>
<span class="z-keyword z-control z-assembly">jg</span> .L3
<span class="z-comment z-assembly">; ...</span>
.L2:
<span class="z-comment z-assembly">; ...</span>
.L3:
<span class="z-comment z-assembly">; We reach L3 only if 42 <= x <= 45</span>
<span class="z-keyword z-control z-assembly">jmp</span> f3 <span class="z-comment z-assembly">; Call f3(val)</span>
<span class="z-comment z-assembly">; ...</span>
</span></code></pre>
</div>
</div>
<p>The compiler realizes that the cases are contiguous and can be implemented with two comparisons instead of four.
If the input of the jump table heuristic is the number of <em>comparisons</em> instead of <em>case labels</em>, things work out:</p>
<p>We have five ranges of contiguous labels, each needing two comparisons, for a total of 10 comparisons. Since <code>10 * 8 < 88</code>, our revised heuristic suggests that a jump table transformation is unprofitable.</p>
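<p>The two versions of the hypothesis can be compared side by side in a short sketch. Everything here is an assumption made for illustration: the factor of 8 is the post's working guess, the cost of two comparisons per range follows the counting above, and none of the names come from gcc's source.</p>

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical factor taken from the working hypothesis in the text. */
enum { TABLE_SIZE_FACTOR = 8 };

/* One case label: its value and an id for the block it jumps to. */
struct case_label {
    int value;
    int target;
};

/* Count maximal runs of consecutive values that share a target.
   The labels must be sorted by value and free of duplicates. */
size_t count_ranges(const struct case_label *c, size_t n) {
    size_t ranges = 0;
    for (size_t i = 0; i < n; i++) {
        bool extends_previous = i > 0
            && c[i].value == c[i - 1].value + 1
            && c[i].target == c[i - 1].target;
        if (!extends_previous)
            ranges++;
    }
    return ranges;
}

/* Assumed cost model: a lower and an upper bound check per range. */
size_t count_comparisons(const struct case_label *c, size_t n) {
    return 2 * count_ranges(c, n);
}

/* Number of entries a single jump table over all labels would need. */
static unsigned long long table_size(const struct case_label *c, size_t n) {
    return (unsigned long long)((long long)c[n - 1].value - c[0].value) + 1;
}

/* First hypothesis: compare the table size against the label count. */
bool profitable_by_labels(const struct case_label *c, size_t n) {
    return n > 0 && table_size(c, n) < (unsigned long long)TABLE_SIZE_FACTOR * n;
}

/* Revised hypothesis: compare against the comparison count instead. */
bool profitable_by_comparisons(const struct case_label *c, size_t n) {
    return n > 0
        && table_size(c, n)
               < (unsigned long long)TABLE_SIZE_FACTOR * count_comparisons(c, n);
}
```

<p>For the 20 labels of <code>blocks_2</code>, the label-based check accepts the table (88 is below 160), while the comparison-based check rejects it (88 is not below 80), which matches the code <code>gcc</code> emitted.</p>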
<!-- TODO discuss binary tree, with graphics -->
<h4 id="non-contiguous">Non-contiguous</h4>
<p>Let us try a non-contiguous case.</p>
<div class="flex-container">
<div class="flex-1">
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">f1</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">bittest</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">x</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-type z-c">int</span> <span class="z-variable z-parameter z-c">val</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-keyword z-control z-c">switch</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>x<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>0<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>2<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>4<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>5<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>7<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>9<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>A<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>C<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span> <span class="z-keyword z-control z-c">case</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>E<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">f1</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">x</span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">default</span><span class="z-punctuation z-separator z-c">:</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-punctuation z-section z-block z-end z-c">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
</div>
<div class="flex-1">
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">bittest:
<span class="z-keyword z-control z-assembly">lea</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-decimal z-assembly">48</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">21</span>
<span class="z-keyword z-control z-assembly">ja</span> .L1
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span> <span class="z-constant z-character z-decimal z-assembly">2753205</span> <span class="z-comment z-assembly">; ???</span>
<span class="z-keyword z-control z-assembly">bt</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">rax</span>
<span class="z-keyword z-control z-assembly">jc</span> .L7
.L1:
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">ret</span>
.L7:
<span class="z-keyword z-control z-assembly">jmp</span> f1
</span></code></pre>
</div>
</div>
<p>The compiler emits non-intuitive machine code. Why is there neither a chain of comparisons nor a jump table? Where does the constant <code>2753205</code> come from?</p>
<p>A hint comes from the instruction <code>bt</code>. According to the <a href="https://www.felixcloutier.com/x86/bt">Intel docs</a>, <code>bt</code>:</p>
<blockquote>
<p>Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset (specified by the second operand) and stores the value of the bit in the CF flag.</p>
</blockquote>
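<p>If we examine the binary representation of <code>2753205</code>, we find that it is in fact a bitset. The bits that are set to one correspond to the ASCII values of the case labels, each value being reduced by 48 (the ASCII value of <code>'0'</code>). Bit 0 stands for <code>'0'</code> and bit 21 for <code>'E'</code>, which is why the preceding <code>cmp eax, 21</code> rejects every input outside that range before the bit test runs.</p>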
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 7.1.0 (0)
--><!-- Title: Simd ops Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0.00 0.00 651.00 181.00">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 177)">
<title>Simd ops</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-177 647,-177 647,4 -4,4"/>
<!-- chars -->
<g id="node1" class="node">
<title>chars</title>
<polygon fill="none" stroke="black" points="83,-7 83,-28 106,-28 106,-7 83,-7"/>
<text text-anchor="start" x="86" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">0 </text>
<polygon fill="none" stroke="black" points="106,-7 106,-28 129,-28 129,-7 106,-7"/>
<text text-anchor="start" x="109" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">1 </text>
<polygon fill="none" stroke="black" points="129,-7 129,-28 152,-28 152,-7 129,-7"/>
<text text-anchor="start" x="132" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">2 </text>
<polygon fill="none" stroke="black" points="152,-7 152,-28 175,-28 175,-7 152,-7"/>
<text text-anchor="start" x="155" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">3 </text>
<polygon fill="none" stroke="black" points="175,-7 175,-28 198,-28 198,-7 175,-7"/>
<text text-anchor="start" x="178" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">4 </text>
<polygon fill="none" stroke="black" points="198,-7 198,-28 221,-28 221,-7 198,-7"/>
<text text-anchor="start" x="201" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">5 </text>
<polygon fill="none" stroke="black" points="221,-7 221,-28 244,-28 244,-7 221,-7"/>
<text text-anchor="start" x="224" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">6 </text>
<polygon fill="none" stroke="black" points="244,-7 244,-28 267,-28 267,-7 244,-7"/>
<text text-anchor="start" x="247" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">7 </text>
<polygon fill="none" stroke="black" points="267,-7 267,-28 290,-28 290,-7 267,-7"/>
<text text-anchor="start" x="270" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">8 </text>
<polygon fill="none" stroke="black" points="290,-7 290,-28 313,-28 313,-7 290,-7"/>
<text text-anchor="start" x="293" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">9 </text>
<polygon fill="none" stroke="black" points="313,-7 313,-28 336,-28 336,-7 313,-7"/>
<text text-anchor="start" x="316" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">: </text>
<polygon fill="none" stroke="black" points="336,-7 336,-28 359,-28 359,-7 336,-7"/>
<text text-anchor="start" x="339" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">; </text>
<polygon fill="none" stroke="black" points="359,-7 359,-28 382,-28 382,-7 359,-7"/>
<text text-anchor="start" x="362" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">< </text>
<polygon fill="none" stroke="black" points="382,-7 382,-28 405,-28 405,-7 382,-7"/>
<text text-anchor="start" x="385" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">= </text>
<polygon fill="none" stroke="black" points="405,-7 405,-28 428,-28 428,-7 405,-7"/>
<text text-anchor="start" x="408" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">> </text>
<polygon fill="none" stroke="black" points="428,-7 428,-28 451,-28 451,-7 428,-7"/>
<text text-anchor="start" x="431" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">? </text>
<polygon fill="none" stroke="black" points="451,-7 451,-28 474,-28 474,-7 451,-7"/>
<text text-anchor="start" x="454" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">@ </text>
<polygon fill="none" stroke="black" points="474,-7 474,-28 497,-28 497,-7 474,-7"/>
<text text-anchor="start" x="477" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">A </text>
<polygon fill="none" stroke="black" points="497,-7 497,-28 520,-28 520,-7 497,-7"/>
<text text-anchor="start" x="500" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">B </text>
<polygon fill="none" stroke="black" points="520,-7 520,-28 543,-28 543,-7 520,-7"/>
<text text-anchor="start" x="523" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">C </text>
<polygon fill="none" stroke="black" points="543,-7 543,-28 566,-28 566,-7 543,-7"/>
<text text-anchor="start" x="546" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">D </text>
<polygon fill="none" stroke="black" points="566,-7 566,-28 589,-28 589,-7 566,-7"/>
<text text-anchor="start" x="569" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="#c10206">E </text>
<polygon fill="none" stroke="black" points="589,-7 589,-28 612,-28 612,-7 589,-7"/>
<text text-anchor="start" x="592" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">F </text>
<polygon fill="none" stroke="black" points="612,-7 612,-28 635,-28 635,-7 612,-7"/>
<text text-anchor="start" x="615" y="-13.8" font-family="Fira Mono" font-size="14.00" fill="gray">G </text>
<text text-anchor="middle" x="37.5" y="-39.8" font-family="Fira Mono" font-size="14.00">Cases ...</text>
</g>
<!-- edx -->
<g id="node2" class="node">
<title>edx</title>
<text text-anchor="start" x="160" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="173" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="186" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="199" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="212" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="225" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="238" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="251" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="264" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="277" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="290" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="303" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="316" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="329" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="342" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="355" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="368" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="381" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="394" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="407" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="420" y="-135.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="433" y="-135.8" font-family="Fira Mono" font-size="14.00">1</text>
<polygon fill="none" stroke="black" points="157,-129.5 157,-150.5 445,-150.5 445,-129.5 157,-129.5"/>
<text text-anchor="middle" x="82.5" y="-161.8" font-family="Fira Mono" font-size="14.00">mov edx, 2753205</text>
</g>
<!-- edx--chars -->
<g id="edge1" class="edge">
<title>edx:p0--chars:p0</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M164,-129C164,-74.75 94,-83.25 94,-29"/>
</g>
<!-- edx--chars -->
<g id="edge2" class="edge">
<title>edx:p2--chars:p2</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M190,-129C190,-79.31 140,-78.69 140,-29"/>
</g>
<!-- edx--chars -->
<g id="edge3" class="edge">
<title>edx:p4--chars:p4</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M216,-129C216,-82.6 186,-75.4 186,-29"/>
</g>
<!-- edx--chars -->
<g id="edge4" class="edge">
<title>edx:p5--chars:p5</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M229,-129C229,-83.68 209,-74.32 209,-29"/>
</g>
<!-- edx--chars -->
<g id="edge5" class="edge">
<title>edx:p7--chars:p7</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M255,-129C255,-84.56 255,-73.44 255,-29"/>
</g>
<!-- edx--chars -->
<g id="edge6" class="edge">
<title>edx:p9--chars:p9</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M281,-129C281,-83.68 301,-74.32 301,-29"/>
</g>
<!-- edx--chars -->
<g id="edge7" class="edge">
<title>edx:p17--chars:p17</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M386,-129C386,-66.15 486,-91.85 486,-29"/>
</g>
<!-- edx--chars -->
<g id="edge8" class="edge">
<title>edx:p19--chars:p19</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M412,-129C412,-59.58 532,-98.42 532,-29"/>
</g>
<!-- edx--chars -->
<g id="edge9" class="edge">
<title>edx:p21--chars:p21</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M438,-129C438,-52.53 578,-105.47 578,-29"/>
</g>
</g>
</svg>
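<p>To make the trick concrete, here is a minimal C sketch of what the assembly computes. This is my own reconstruction, not code from <code>gcc</code>: the names <code>labels</code>, <code>build_mask</code> and <code>in_case_set</code> are hypothetical, and the sketch assumes a 32-bit <code>int</code>.</p>

```c
#include <stdint.h>

/* Case labels from the switch above. */
static const char labels[] = { '0', '2', '4', '5', '7', '9', 'A', 'C', 'E' };

/* Build the bitset: bit (label - '0') is set for every case label. */
static uint32_t build_mask(void)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < sizeof labels / sizeof labels[0]; i++)
        mask |= UINT32_C(1) << (labels[i] - '0');
    return mask;
}

/* Mirrors lea/cmp/ja followed by bt/jc: returns 1 if x matches a label. */
static int in_case_set(int x)
{
    /* The unsigned subtraction folds "x < '0' || x > 'E'" into a single
       comparison: inputs below '0' wrap around to very large values. */
    uint32_t idx = (uint32_t)x - '0';
    if (idx > 21)
        return 0;
    return (int)((build_mask() >> idx) & 1);
}
```

<p>With these definitions, <code>build_mask()</code> evaluates to <code>2753205</code> (<code>0x2A02B5</code>), the constant loaded into <code>edx</code>.</p>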
<h3 id="implementation">Implementation</h3>
<p>We've encountered several distinct lowering strategies employed by gcc. For a more in-depth understanding, we can delve into the compiler's source code.</p>
<!-- TODO I vs We -->
<p>The compiler pass responsible for lowering <code>switch</code> statements is the (appropriately named) <code>pass_lower_switch</code>, which in turn calls <code>analyze_switch_statement</code>.</p>
<p>Since <code>gcc</code> source can be daunting to follow, I'll provide a step-by-step commented version below.
Following this, I will break down each section to explain its functionality.</p>
<!-- TODO comment about GC and clean-up -->
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">bool</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">switch_decision_tree<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">analyze_switch_statement</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-storage z-type z-c">unsigned</span> l <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">gimple_switch_num_labels</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-variable z-other z-readwrite z-member z-c++">m_switch</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
auto_vec<span class="z-keyword z-operator z-comparison z-c"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-keyword z-operator z-comparison z-c">></span> clusters<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">compute_cases_per_edge</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span></span></code></pre>
<p>Some preamble code. <code>m_switch</code> points to a GIMPLE statement; GIMPLE is one of the intermediate representations used by <code>gcc</code>.
From it, we extract the total number of case labels and, for each outgoing edge, how many case labels lead to it.</p>
<!-- TODO more on clusters -->
<!-- TODO more on `gcc` data structures -->
<p>We also allocate a vector that will record the lowering strategy chosen for each case label.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c"><</span> l<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
tree elt <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">gimple_switch_label</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-variable z-other z-readwrite z-member z-c++">m_switch</span><span class="z-punctuation z-separator z-c++">,</span> i</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> tree low <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">CASE_LOW</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">elt</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
tree high <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">CASE_HIGH</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">elt</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
profile_probability p
<span class="z-keyword z-operator z-assignment z-c">=</span> case_edge<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">probability</span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">apply_scale</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-support z-type z-stdint z-c">intptr_t</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>case_edge<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">aux</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
clusters<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-control z-c++">new</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">simple_cluster</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">low<span class="z-punctuation z-separator z-c++">,</span> high<span class="z-punctuation z-separator z-c++">,</span> elt<span class="z-punctuation z-separator z-c++">,</span> case_edge<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">dest</span><span class="z-punctuation z-separator z-c++">,</span> p</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> <span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<!-- TODO talk about SIMPLE_CASE -->
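<p>More setup code. The loop starts at <code>1</code> because label <code>0</code> is reserved for the <code>default</code> case. We also notice that GIMPLE has a concept of low and high value for each label. That is, contiguous <code>case</code> labels have already been joined together at this stage.</p>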
<!-- TODO talk more about probability -->
<p>For each case label, we allocate a <code>simple_cluster</code>. This cluster serves as a record, capturing the range of values associated with the label, the relevant output edge, and the likelihood of this cluster being chosen. As the <code>apply_scale</code> call shows, that likelihood is the probability of the outgoing edge divided by the number of case labels sharing the edge. The edge probability itself comes from profile-guided optimization. By default it is uniform among case labels.</p>
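<p>As an illustration of what "joined together" means, here is a small C sketch that collapses sorted case values into <code>[low, high]</code> ranges. It is not how <code>gcc</code> does it: <code>case_label</code>, <code>range_cluster</code> and <code>build_clusters</code> are hypothetical names, destinations are reduced to plain integers, and probabilities are left out.</p>

```c
#include <stddef.h>

/* Hypothetical input: one case value and the id of the block it jumps to. */
struct case_label {
    int value;
    int dest;
};

/* Hypothetical stand-in for a cluster: a value range and its destination. */
struct range_cluster {
    int low;
    int high;
    int dest;
};

/* Collapse labels (sorted by value, all distinct) into ranges. Two labels
   are merged only if they are adjacent and share a destination.
   Returns the number of clusters written to out, which is at most n. */
static size_t build_clusters(const struct case_label *labels, size_t n,
                             struct range_cluster *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        struct range_cluster *last = count ? &out[count - 1] : NULL;
        if (last && last->dest == labels[i].dest
            && labels[i].value - 1 == last->high) {
            /* Extends the previous range by one value. */
            last->high = labels[i].value;
        } else {
            /* Starts a new single-value range. */
            out[count].low = labels[i].value;
            out[count].high = labels[i].value;
            out[count].dest = labels[i].dest;
            count++;
        }
    }
    return count;
}
```

<p>Applied to the bitset example, <code>'4'</code> and <code>'5'</code> would end up in a single range while every other label stays a range of length one.</p>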
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> output <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++">bit_test_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-variable z-function z-c++">find_bit_tests</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></code></pre>
<p>We transform the vector of clusters we have created. The name <code>bit_test_cluster</code> hints at the bitset lowering strategy we saw before.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Find jump table clusters. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> output2<span class="z-punctuation z-terminator z-c++">;</span>
auto_vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> tmp<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c"><</span> output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">length</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
cluster <span class="z-keyword z-operator z-c">*</span>c <span class="z-keyword z-operator z-assignment z-c">=</span> output<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>c<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">get_type</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">!=</span> SIMPLE_CASE<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-arithmetic z-c">!</span>tmp<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">is_empty</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ... See below for what's here
</span> tmp<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">truncate</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
output2<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">c</span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-c++">else</span>
tmp<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">c</span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<p>This section allocates a vector for the transformed cases and a scratch vector.
Any case that wasn't altered during the bitset transformation process is gathered in this scratch vector. These cases are then processed and cleared whenever we come across an optimized cluster.</p>
<p>Essentially, in this section we are looking for sequences of case labels that couldn't be converted into bitset clusters.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> n <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++">jump_table_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-variable z-function z-c++">find_jump_tables</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">tmp</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
output2<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_splice</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">n</span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
n<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">release</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></code></pre>
<p>We attempt to convert these consecutive labels into a jump table and add the outcome to the output.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> We still can have a temporary vector to test. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-arithmetic z-c">!</span>tmp<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">is_empty</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
vec<span class="z-keyword z-operator z-comparison z-c"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-keyword z-operator z-comparison z-c">></span> n <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++">jump_table_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-variable z-function z-c++">find_jump_tables</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">tmp</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
output2<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_splice</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">n</span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
n<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">release</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<p>This is a coda for the loop above.</p>
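<p>The gather-and-flush pattern of the loop and its coda can be sketched in isolation. This is a minimal sketch, not GCC's code: <code>Cluster</code>, <code>lower_runs</code> and the toy <code>find_jump_tables</code> below are hypothetical stand-ins, and the toy version assumes that any run of two or more simple clusters becomes one jump table, which is far less selective than the real heuristic.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's cluster hierarchy.
struct Cluster {
    std::string kind;  // "simple", "bit_test" or "jump_table"
    int low, high;     // inclusive range of case values covered
};

// Toy replacement for jump_table_cluster::find_jump_tables.
// Assumption: a run of two or more simple clusters always becomes one table.
std::vector<Cluster> find_jump_tables(const std::vector<Cluster>& run) {
    if (run.size() < 2) return run;
    return {Cluster{"jump_table", run.front().low, run.back().high}};
}

std::vector<Cluster> lower_runs(const std::vector<Cluster>& clusters) {
    std::vector<Cluster> output;  // plays the role of output2
    std::vector<Cluster> tmp;     // scratch run of untouched simple clusters

    // Hand the pending run to the jump table search and clear it.
    auto flush = [&] {
        if (tmp.empty()) return;
        std::vector<Cluster> n = find_jump_tables(tmp);
        output.insert(output.end(), n.begin(), n.end());
        tmp.clear();
    };

    for (const Cluster& c : clusters) {
        if (c.kind != "simple") {
            flush();              // an optimized cluster ends the current run
            output.push_back(c);
        } else {
            tmp.push_back(c);
        }
    }
    flush();                      // the coda: the last run has no terminator
    return output;
}
```

<p>Without the final <code>flush</code>, a trailing run of simple clusters would be dropped silently, which is why the real code repeats the jump table search after the loop.</p>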
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-storage z-type z-c">bool</span> expanded <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">try_switch_expansion</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">output2</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">release_clusters</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">output2</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
return expanded<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span>
</span></code></pre>
<p>Whatever is left goes through another pass, which turns the switch statement into a decision tree.
The bitsets and jump tables become leaves of that decision tree.</p>
<h4 id="bitsets">Bitsets</h4>
<p>Let's dig further into <code>bit_test_cluster::find_bit_tests</code>. The high-level structure of the code is as follows.</p>
<p>Initially, we aim to find the optimal method to bundle all clusters into bitsets. Optimality, in this context, refers to minimizing the number of bitsets used. We do so using dynamic programming.</p>
<p>Subsequently, we iterate over the bitsets and apply a profitability heuristic. If the bitsets meet this heuristic, we emit a <code>bit_test_cluster</code>; otherwise, we keep the original <code>simple_cluster</code>s.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++">vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">bit_test_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">find_bit_tests</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-keyword z-operator z-c">&</span><span class="z-variable z-parameter z-c++">clusters</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-storage z-type z-c">unsigned</span> l <span class="z-keyword z-operator z-assignment z-c">=</span> clusters<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">length</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
auto_vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>min_cluster_item<span class="z-punctuation z-definition z-generic z-end z-c++">></span> min<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c"><=</span> l<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Set minimal # of clusters with i-th item to infinite. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> j <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> j <span class="z-keyword z-operator z-comparison z-c"><</span> i<span class="z-punctuation z-terminator z-c++">;</span> j<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span> <span class="z-keyword z-operator z-comparison z-c"><</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span>
<span class="z-keyword z-operator z-arithmetic z-c">&&</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">can_be_handled</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> j<span class="z-punctuation z-separator z-c++">,</span> i <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> j<span class="z-punctuation z-separator z-c++">,</span> INT_MAX</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">gcc_checking_assert</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-comparison z-c">!=</span> INT_MAX</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></span></span></code></pre>
<p>For the dynamic programming, we'll use a vector of <code>min_cluster_item</code> structures.
Each <code>min_cluster_item</code> has three components:</p>
<ul>
<li>The first records how many bitsets we need to represent the first <em>i</em> case labels, where <code>i</code> is the index in the vector.</li>
<li>The second records the <strong>start</strong> of the last bitset.</li>
<li>The third is unused.</li>
</ul>
<p>The dynamic programming is as follows. We want to find the minimum number of bitsets to represent the first <em>i</em> case labels.</p>
<p>The base case is straightforward: zero case labels need zero bitsets, and any single case label can be represented by a single bitset.</p>
<!-- TODO check interval boundaries. Is it [0, i] or [0, i) ? -->
<p>We define <script type="math/tex" >\mathcal{Opt}_i</script>
to be the minimum number of bitsets needed for the first <em>i</em> case labels.
Suppose we have already found <script type="math/tex" >\mathcal{Opt}_j</script>
for all <script type="math/tex" >j < i</script>
, and are now seeking <script type="math/tex" >\mathcal{Opt}_i</script>
.</p>
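<p>If the range <script type="math/tex" >\left[j, i \right)</script>
of case labels (counting from zero, so the last label included is <em>i</em> - 1) can be represented as a single bitset cluster, then appending that cluster to an optimal split of the first <em>j</em> labels shows that</p>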
<script type="math/tex" is_fleqn="true"is_display="true">%\newcommand\Opt[0]{\textnormal{Opt}}
%\newcommand\Opt[0]{\verb!Opt!}
\newcommand\Opt[0]{\mathcal{Opt}}
\Opt_i \le \Opt_j + 1</script>
<p>Conversely, assume we have the optimal assignment for the first <em>i</em> labels. If the last bitset starts at label <script type="math/tex" >j^*</script>
, this assignment is also optimal when restricted to <script type="math/tex" >[0, j^*)</script>
. Were this not the case, we could improve the assignment on <script type="math/tex" >[0, i)</script>
as well.</p>
<p>Therefore, it must be that:</p>
<script type="math/tex" is_fleqn="true"is_display="true">\begin{align*}
F &= \left\{ j < i \mid \verb!can_be_handled! \left( [j, i) \right) \right\} \\[3pt]
\Opt_i &= \min_{j \in F}{ \Opt_j } + 1
\end{align*}</script>
<p>These conditions are exactly what the <code>if</code> statement in the code above checks for.</p>
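<p>To see the recurrence at work outside of GCC, here is a small self-contained sketch. It is not the compiler's code: <code>Item</code>, <code>min_bitsets</code> and <code>fits_in_bitset</code> are hypothetical names, the labels are assumed to be sorted and distinct, and the feasibility test only checks that the span of the labels is below 64, ignoring outgoing edges.</p>

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Simplified stand-in for can_be_handled: labels[first..last] (inclusive)
// fit in one bitset when their span is below 64. Edges are ignored here.
bool fits_in_bitset(const std::vector<long>& labels, unsigned first, unsigned last) {
    return labels[last] - labels[first] + 1 < 64;
}

struct Item {
    int count;  // minimum number of bitsets for the first i labels
    int start;  // index of the first label in the last bitset
};

// opt[i] describes the best split of labels [0, i). Labels must be sorted.
std::vector<Item> min_bitsets(const std::vector<long>& labels) {
    unsigned l = labels.size();
    std::vector<Item> opt;
    opt.push_back({0, 0});  // base case: no labels, no bitsets
    for (unsigned i = 1; i <= l; i++) {
        opt.push_back({INT_MAX, INT_MAX});
        // Try every split point j: best split of [0, j) plus one bitset for [j, i).
        for (unsigned j = 0; j < i; j++)
            if (opt[j].count + 1 < opt[i].count && fits_in_bitset(labels, j, i - 1))
                opt[i] = {opt[j].count + 1, static_cast<int>(j)};
    }
    return opt;
}
```

<p>For the labels <code>{0, 1, 5, 63, 64, 200}</code>, this yields three bitsets. Following the <code>start</code> fields backwards from the last entry recovers the groups <code>{200}</code>, <code>{5, 63, 64}</code> and <code>{0, 1}</code>. Grouping greedily from the left would have put <code>5</code> in the first bitset, but the count is the same.</p>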
<p>The <code>can_be_handled</code> function decides whether a range of <code>case</code> labels can be turned into a bitset test. It tests two criteria:</p>
<ul>
<li>The <em>combined</em> range of all the case labels (highest label minus lowest, plus one) must be smaller than the width of a register. On a 64-bit target, that means a range below 64.</li>
<li>The case labels must have at most 3 <em>unique</em> outgoing edges.</li>
</ul>
<p>If there is more than one unique edge, we emit a cascade of bit tests, one per edge.</p>
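<p>As an illustration of what such a cascade computes, here is a hand-written equivalent for a made-up switch with two distinct targets. This is a sketch of the idea, not GCC's actual output: the functions <code>classify_switch</code> and <code>classify_bits</code>, the case values and the mask names are all invented for the example.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical switch: labels span 1..9 (range 9 < 64) and have two targets.
int classify_switch(unsigned x) {
    switch (x) {
    case 1: case 3: case 4: case 9: return 1;  // target A
    case 2: case 7:                 return 2;  // target B
    default:                        return 0;
    }
}

// The same function written as a cascade of bit tests.
int classify_bits(unsigned x) {
    const unsigned low = 1, high = 9;
    // One mask per distinct target; bit k stands for the label low + k.
    const std::uint64_t mask_a = (1ull << (1 - low)) | (1ull << (3 - low))
                               | (1ull << (4 - low)) | (1ull << (9 - low));
    const std::uint64_t mask_b = (1ull << (2 - low)) | (1ull << (7 - low));

    // A single unsigned comparison rejects both x < low and x > high,
    // because x - low wraps around to a huge value when x < low.
    if (x - low > high - low) return 0;

    const std::uint64_t bit = 1ull << (x - low);
    if (bit & mask_a) return 1;  // first test of the cascade
    if (bit & mask_b) return 2;  // second test of the cascade
    return 0;                    // in range, but not a case label
}
```

<p>Each additional target costs one more mask and one more test, which suggests why the number of unique edges is capped at a small constant.</p>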
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> output<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Find and build the clusters. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> end <span class="z-keyword z-operator z-assignment z-c">=</span> l<span class="z-punctuation z-terminator z-c++">;</span><span class="z-punctuation z-terminator z-c++">;</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-storage z-type z-c">int</span> start <span class="z-keyword z-operator z-assignment z-c">=</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>end<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_start</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">is_beneficial</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> start<span class="z-punctuation z-separator z-c++">,</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-storage z-type z-c">bool</span> entire <span class="z-keyword z-operator z-assignment z-c">=</span> start <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> end <span class="z-keyword z-operator z-comparison z-c">==</span> clusters<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">length</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-control z-c++">new</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">bit_test_cluster</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> start<span class="z-punctuation z-separator z-c++">,</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> entire</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-c++">else</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">int</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c">>=</span> start<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">--</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
end <span class="z-keyword z-operator z-assignment z-c">=</span> start<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>start <span class="z-keyword z-operator z-comparison z-c"><=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-break z-c++">break</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">reverse</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
return output<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span>
</span></code></pre>
<p>Now we convert into bitsets only the clusters that satisfy the <code>is_beneficial</code> test.
In the code, <code>count</code> represents the number of case labels handled by the cluster, while <code>uniq</code> denotes the number of distinct outgoing edges.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">bool</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">bit_test_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">is_beneficial</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">count</span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">uniq</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>uniq <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> count <span class="z-keyword z-operator z-comparison z-c">>=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-operator z-arithmetic z-c">||</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>uniq <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">2</span></span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> count <span class="z-keyword z-operator z-comparison z-c">>=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">5</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-operator z-arithmetic z-c">||</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>uniq <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> count <span class="z-keyword z-operator z-comparison z-c">>=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">6</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
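<p>To make the thresholds concrete, here is a minimal standalone sketch. It copies the test above into a free function and adds a hypothetical helper, <code>bit_test_is_beneficial</code>, that derives <code>count</code> and <code>uniq</code> from a list of (case value, target id) pairs. The helper and its input format are assumptions for illustration and are not part of <code>gcc</code>.</p>

```cpp
#include <set>
#include <utility>
#include <vector>

// Same thresholds as the gcc snippet, written as a free function.
bool is_beneficial(unsigned count, unsigned uniq) {
  return (uniq == 1 && count >= 3)
      || (uniq == 2 && count >= 5)
      || (uniq == 3 && count >= 6);
}

// Hypothetical helper: each pair is (case value, id of the target block).
// count is the number of labels, uniq the number of distinct targets.
bool bit_test_is_beneficial(const std::vector<std::pair<int, int>> &cases) {
  std::set<int> targets;
  for (const auto &c : cases)
    targets.insert(c.second);
  return is_beneficial(static_cast<unsigned>(cases.size()),
                       static_cast<unsigned>(targets.size()));
}
```

<p>For example, three labels that all jump to the same target pass the test, while four labels spread over two targets do not, because two targets need at least five labels.</p>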
<h4 id="jump-tables">Jump tables</h4>
<p>The logic for jump table conversion mirrors the one used for bit test conversion.
It leverages dynamic programming to identify "clusters" that can be profitably converted
to a jump table.</p>
<p>Let's delve into the code. </p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++">vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">jump_table_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">find_jump_tables</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-keyword z-operator z-c">&</span><span class="z-variable z-parameter z-c++">clusters</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-arithmetic z-c">!</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">is_enabled</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> clusters<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">copy</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-storage z-type z-c">unsigned</span> l <span class="z-keyword z-operator z-assignment z-c">=</span> clusters<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">length</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
auto_vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>min_cluster_item<span class="z-punctuation z-definition z-generic z-end z-c++">></span> min<span class="z-punctuation z-terminator z-c++">;</span>
min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">reserve</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">l <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span></span></code></pre>
<p>We start by preallocating a vector of <code>min_cluster_item</code>. We'll use this vector
to keep track of the best jump table assignments for consecutive <code>case</code> clusters.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></code></pre>
<p>We also push the base case for the dynamic programming: the empty prefix, which needs no jump tables and leaves no cases unhandled.
A split at <code>j == 0</code> builds on this entry and yields a single jump table.</p>
<p>Let's focus now on the main loop.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT max_ratio <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Used in the main heuristic. See next snippet. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c"><=</span> l<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): Base case
</span> min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Pre-calculate number of comparisons for the clusters. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
HOST_WIDE_INT comparison_count <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> k <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> k <span class="z-keyword z-operator z-comparison z-c"><=</span> i <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> k<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
simple_cluster <span class="z-keyword z-operator z-c">*</span>sc <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-word z-cast z-c++">static_cast</span><span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>simple_cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>k<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): get_comparison_count is 2 for contiguous case
</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> labels, and 1 otherwise
</span> comparison_count <span class="z-keyword z-operator z-assignment z-augmented z-c">+=</span> sc<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">get_comparison_count</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> j <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> j <span class="z-keyword z-operator z-comparison z-c"><</span> i<span class="z-punctuation z-terminator z-c++">;</span> j<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT s <span class="z-keyword z-operator z-assignment z-c">=</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_non_jt_cases</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>i <span class="z-keyword z-operator z-arithmetic z-c">-</span> j <span class="z-keyword z-operator z-comparison z-c"><</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">case_values_threshold</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
s <span class="z-keyword z-operator z-assignment z-augmented z-c">+=</span> i <span class="z-keyword z-operator z-arithmetic z-c">-</span> j<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Prefer clusters with smaller number of numbers covered. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Check if new split is better. See next snippet <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): Update best score.
</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> j<span class="z-punctuation z-separator z-c++">,</span>
s <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> `s` is the number of unhandled cases <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
simple_cluster <span class="z-keyword z-operator z-c">*</span>sc <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-word z-cast z-c++">static_cast</span><span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>simple_cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span>
comparison_count <span class="z-keyword z-operator z-assignment z-augmented z-c">-=</span> sc<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">get_comparison_count</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<p>The dynamic programming loop mirrors the one we used for bitsets.
We want to find the minimum number of jump tables needed to represent the first <code>i</code> case labels.</p>
<p>We iterate over splits at the <code>j</code>-th case label, for <code>j < i</code>. During each iteration we track the best scoring split.
The special case <code>j == 0</code> represents a single jump table covering all of the first <code>i</code> labels; for <code>i == l</code>, that is the entire <code>switch</code> statement.</p>
<p>Inside this inner loop we also keep track of:</p>
<ol>
<li>How many distinct jump tables we would need to lower the switch.</li>
<li>The <strong>start</strong> of the <em>last</em> jump table of the decomposition.</li>
<li>How many cases <em>cannot</em> be handled by a jump table. In the code, this is the accumulator <code>s</code>.</li>
</ol>
<p>Points 1 and 2 are analogous to the bitset lowering pass, while point 3 is new and specific to jump tables.</p>
<p>Dispatching through a jump table has some overhead, due to the indirect jump and the bounds check.
Therefore <code>gcc</code> requires a minimum number of <code>case</code> labels before it generates a jump table.</p>
<p>If a split would generate a jump table that is too small, the size of the jump table is added to an accumulator <code>s</code>, which is used as a tie-breaker between different split candidates.</p>
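<p>The three quantities fit in a small record per prefix. The sketch below is a simplified stand-in for <code>min_cluster_item</code>: <code>m_count</code> and <code>m_non_jt_cases</code> appear in the snippet above, while the name <code>m_start</code> and the helper <code>recover_clusters</code> are assumptions made for illustration. The helper shows why storing only the start of the last jump table is enough to recover the whole decomposition once the table is filled.</p>

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Simplified stand-in for gcc's min_cluster_item (m_start is an assumed name).
struct min_cluster_item {
  unsigned m_count;         // 1. jump tables needed for this prefix
  unsigned m_start;         // 2. index where the last jump table starts
  unsigned m_non_jt_cases;  // 3. cases left in groups too small for a table
};

// Hypothetical helper: walk the m_start links backwards from the last entry
// and return the chosen groups as half-open index ranges [start, end).
// Assumes a filled table where min[i].m_start < i for every i > 0.
std::vector<std::pair<unsigned, unsigned>>
recover_clusters(const std::vector<min_cluster_item> &min) {
  std::vector<std::pair<unsigned, unsigned>> groups;
  if (min.empty())
    return groups;
  unsigned end = static_cast<unsigned>(min.size()) - 1;  // table has l + 1 entries
  while (end > 0) {
    unsigned start = min[end].m_start;
    groups.emplace_back(start, end);
    end = start;
  }
  std::reverse(groups.begin(), groups.end());
  return groups;
}
```

<p>Walking from the back produces the groups last-to-first, hence the final reversal, much like the <code>output.reverse ()</code> call in the bitset pass.</p>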
<p>To compute the score, we need the comparison count between the <code>j</code>-th and the <code>i</code>-th case labels. We compute it by summing the number of comparisons for all labels up front, then subtracting each label's count as <code>j</code> advances in the inner loop.</p>
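<p>The running-total trick can be sketched in isolation. In the hypothetical helper below, <code>cmp[k]</code> stands for the result of <code>get_comparison_count</code> on cluster <code>k</code>; the function name and the plain <code>long</code> type are assumptions for illustration.</p>

```cpp
#include <vector>

// Hypothetical helper. For a fixed i (with i <= cmp.size()), returns a vector
// whose j-th entry is cmp[j] + ... + cmp[i - 1], for every j < i.
std::vector<long> window_comparisons(const std::vector<long> &cmp, unsigned i) {
  long total = 0;
  for (unsigned k = 0; k < i; k++)  // sum everything once, up front
    total += cmp[k];

  std::vector<long> window;
  window.reserve(i);
  for (unsigned j = 0; j < i; j++) {
    window.push_back(total);  // comparisons needed by clusters [j, i)
    total -= cmp[j];          // drop cluster j before trying the next split
  }
  return window;
}
```

<p>Each split reuses the previous total instead of re-summing the range, so the inner loop stays linear in <code>i</code>.</p>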
<p>We also need the number of unhandled <code>case</code> labels. We determine it by comparing the size of the jump table to the result of the <code>case_values_threshold</code> function. This function is architecture-dependent and takes into account the presence of switch-case-specific instructions, but on modern x64 and Arm64 machines, it defaults to 5.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">unsigned</span> <span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">default_case_values_threshold</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-storage z-type z-c++">void</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++"> </span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>targetm<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">have_casesi</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-ternary z-c">?</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">4</span></span> <span class="z-keyword z-operator z-ternary z-c">:</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">5</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
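<p>Putting the bookkeeping together, the loop can be sketched as a small standalone function. This is a simplified reconstruction, not the <code>gcc</code> implementation: the names <code>split_item</code> and <code>best_splits</code> are made up, the threshold is passed as a parameter, and the acceptance test for a group is replaced by a caller-supplied <code>can_group</code> predicate.</p>

```cpp
#include <climits>
#include <functional>
#include <vector>

// Per-prefix record: tables used, start of the last group, unhandled cases.
struct split_item {
  unsigned count, start, non_jt_cases;
};

// Fills min[i] for every prefix of l clusters. can_group(j, i) says whether
// clusters [j, i) may form one group; it is a placeholder for gcc's heuristic.
std::vector<split_item>
best_splits(unsigned l, unsigned threshold,
            const std::function<bool(unsigned, unsigned)> &can_group) {
  std::vector<split_item> min;
  min.reserve(l + 1);
  min.push_back({0, 0, 0});  // empty prefix: nothing to cover
  for (unsigned i = 1; i <= l; i++) {
    min.push_back({INT_MAX, INT_MAX, INT_MAX});  // "infinity" until a split is found
    for (unsigned j = 0; j < i; j++) {
      unsigned s = min[j].non_jt_cases;
      if (i - j < threshold)  // group too small to become a jump table
        s += i - j;
      // Fewer groups wins; on a tie, fewer unhandled cases wins.
      bool better = min[j].count + 1 < min[i].count
                    || (min[j].count + 1 == min[i].count
                        && s < min[i].non_jt_cases);
      if (better && can_group(j, i))
        min[i] = {min[j].count + 1, j, s};
    }
  }
  return min;
}
```

<p>When every grouping is allowed and there are at least <code>threshold</code> clusters, the best answer for the full range is a single group starting at 0 with no unhandled cases. A stricter predicate forces more, smaller groups, and the unhandled count grows accordingly.</p>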
<p>Let's now focus on the main heuristic, starting with how <code>max_ratio</code> is computed. I've reproduced below the main loop, omitting lines unrelated to the main heuristic.</p>
<p>We've seen in the examples section that gcc uses the ratio between jump table entries and comparisons as the metric for its lowering heuristic. The code below shows that distinct ratios are used when optimizing for speed (flags <code>-O[123|fast]</code>) or size (flag <code>-Os</code>). Specifically, a ratio of 8 is used when optimizing for speed, while a ratio of 2 is used when optimizing for size.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">max_ratio</span>
</span></span><span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">optimize_insn_for_size_p</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span>
<span class="z-keyword z-operator z-ternary z-c">?</span> param_jump_table_max_growth_ratio_for_size <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> This is `2`
</span> <span class="z-keyword z-operator z-ternary z-c">:</span> param_jump_table_max_growth_ratio_for_speed<span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> This is `8`
</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c"><=</span> l<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Set minimal # of clusters with i-th item to infinite. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
min<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">quick_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX<span class="z-punctuation z-separator z-c++">,</span> INT_MAX</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): Compute comparison count, see previous snippet
</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> j <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> j <span class="z-keyword z-operator z-comparison z-c"><</span> i<span class="z-punctuation z-terminator z-c++">;</span> j<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): Compute unhandled cases, see previous snippet
</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Prefer clusters with smaller number of numbers covered. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<mark> <span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span> <span class="z-keyword z-operator z-comparison z-c"><</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
</mark><mark> <span class="z-keyword z-operator z-arithmetic z-c">||</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span> <span class="z-keyword z-operator z-comparison z-c">==</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> s <span class="z-keyword z-operator z-comparison z-c"><</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_non_jt_cases</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
</mark><mark> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">can_be_handled</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> j<span class="z-punctuation z-separator z-c++">,</span> i <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> max_ratio<span class="z-punctuation z-separator z-c++">,</span> comparison_count</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
</mark> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min_cluster_item</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>j<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_count</span> <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> j<span class="z-punctuation z-separator z-c++">,</span>
s <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> `s` is the number of unhandled cases <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span> </span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): Update comparison count, see previous snippet
</span> <span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<p>Now onto the main loop. Given two splits, we prefer the one that results in fewer jump tables being emitted or,
if both would result in the same number of jump tables, the one that leaves fewer <code>case</code> labels unhandled (the value <code>s</code> in the snippet).
We also ignore splits that would result in unprofitable jump tables, where profitability is checked by the <code>can_be_handled</code> function.</p>
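<p>To make the recurrence easier to follow, here is a minimal standalone sketch of the same dynamic program. It is not the <code>gcc</code> code: <code>MinItem</code>, <code>can_share_table</code> and <code>partition</code> are names I made up, the profitability rule is a plain "range at most <code>max_ratio</code> times the number of cases" stand-in, and the way <code>s</code> is computed (a group smaller than <code>threshold</code> counts as unhandled) is an assumption on my part.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for gcc's min_cluster_item.
struct MinItem {
  unsigned count;         // minimal number of groups covering cases [0, i)
  unsigned start;         // index at which the last group begins
  unsigned non_jt_cases;  // cases that end up outside any jump table
};

// ASSUMPTION: a toy profitability rule, not gcc's can_be_handled.
// Cases first..last may share a table if the covered value range is at most
// max_ratio times the number of cases. A single case is always accepted.
static bool can_share_table(const std::vector<int64_t> &vals, unsigned first,
                            unsigned last, uint64_t max_ratio) {
  if (first == last)
    return true;
  uint64_t range = static_cast<uint64_t>(vals[last] - vals[first]) + 1;
  uint64_t cases = last - first + 1;
  return range <= max_ratio * cases;
}

// min[i] describes the best way to split the first i (sorted) case values.
std::vector<MinItem> partition(const std::vector<int64_t> &vals,
                               uint64_t max_ratio, unsigned threshold) {
  assert(std::is_sorted(vals.begin(), vals.end()));
  const unsigned l = static_cast<unsigned>(vals.size());
  std::vector<MinItem> min;
  min.push_back({0, 0, 0});  // the empty prefix needs no groups
  for (unsigned i = 1; i <= l; i++) {
    min.push_back({UINT_MAX, UINT_MAX, UINT_MAX});
    for (unsigned j = 0; j < i; j++) {
      // ASSUMPTION: groups smaller than `threshold` count as unhandled.
      unsigned s = min[j].non_jt_cases;
      if (i - j < threshold)
        s += i - j;
      bool fewer = min[j].count + 1 < min[i].count;
      bool tie = min[j].count + 1 == min[i].count && s < min[i].non_jt_cases;
      if ((fewer || tie) && can_share_table(vals, j, i - 1, max_ratio))
        min[i] = {min[j].count + 1, j, s};
    }
  }
  return min;
}
```

<p>Calling <code>partition({1, 2, 3, 4, 100}, 2, 3)</code> yields a last entry with two groups: the final group starts at index 4 (the lone <code>100</code>), and the entry before it covers indices 0 to 3 with a single group.</p>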
<p>The <code>can_be_handled</code> function checks for overflow and compares the ratio of case labels to jump table entries:</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">bool</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">jump_table_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">can_be_handled</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-storage z-modifier z-c++">const</span> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-keyword z-operator z-c">&</span><span class="z-variable z-parameter z-c++">clusters</span><span class="z-punctuation z-separator z-c++">,</span>
<span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">start</span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">end</span><span class="z-punctuation z-separator z-c++">,</span>
<span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT <span class="z-variable z-parameter z-c++">max_ratio</span><span class="z-punctuation z-separator z-c++">,</span>
<span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT <span class="z-variable z-parameter z-c++">comparison_count</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> If the switch is relatively small such that the cost of one
indirect jump on the target are higher than the cost of a
decision tree, go with the decision tree.
[...]
For algorithm correctness, jump table for a single case must return
true. We bail out in is_beneficial if it's called just for
a single case. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>start <span class="z-keyword z-operator z-comparison z-c">==</span> end<span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-constant z-language z-c">true</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-storage z-type z-c">unsigned</span> HOST_WIDE_INT range <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">get_range</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>start<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">get_low</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>end<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">get_high</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Check for overflow. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> lhs <span class="z-keyword z-operator z-comparison z-c"><=</span> max_ratio <span class="z-keyword z-operator z-c">*</span> comparison_count<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
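<p>As a rough illustration of what such a density test can look like, here is a standalone sketch. The function name <code>dense_enough</code> is mine, and I am assuming that <code>max_ratio</code> is expressed as a percentage, so the range is scaled by 100 before the comparison. The snippet above elides that part, so treat the scaling as my reading rather than a quote.</p>

```cpp
#include <cstdint>

// Sketch of a jump table density test.
//   range            - number of values spanned by the candidate table
//   comparison_count - comparisons a decision tree would need for the same cases
//   max_ratio        - allowed growth, ASSUMED to be a percentage (800 = 8x)
bool dense_enough(uint64_t range, uint64_t comparison_count,
                  uint64_t max_ratio) {
  // Refuse ranges for which 100 * range would wrap around.
  if (range > UINT64_MAX / 100)
    return false;
  uint64_t lhs = 100 * range;
  // max_ratio and comparison_count are small in practice, so this sketch
  // does not guard the right-hand side against overflow.
  return lhs <= max_ratio * comparison_count;
}
```

<p>With an illustrative ratio of 800, a table spanning 10 values for 5 comparisons passes, while one spanning 1000 values for the same 5 comparisons is rejected.</p>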
<p>The coda of the function also mirrors the one for bitset clusters.
It iterates over all jump tables in reverse order, and prunes the ones that fail the <code>is_beneficial</code> check.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> output<span class="z-punctuation z-terminator z-c++">;</span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">create</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">4</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Find and build the clusters. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> <span class="z-storage z-type z-c">int</span> end <span class="z-keyword z-operator z-assignment z-c">=</span> l<span class="z-punctuation z-terminator z-c++">;</span><span class="z-punctuation z-terminator z-c++">;</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-storage z-type z-c">int</span> start <span class="z-keyword z-operator z-assignment z-c">=</span> min<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>end<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-variable z-other z-readwrite z-member z-c++">m_start</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Do not allow clusters with small number of cases. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">is_beneficial</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> start<span class="z-punctuation z-separator z-c++">,</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-control z-c++">new</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">jump_table_cluster</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-punctuation z-separator z-c++">,</span> start<span class="z-punctuation z-separator z-c++">,</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">else</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">int</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-terminator z-c++">;</span> i <span class="z-keyword z-operator z-comparison z-c">>=</span> start<span class="z-punctuation z-terminator z-c++">;</span> i<span class="z-keyword z-operator z-arithmetic z-c">--</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">safe_push</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++">clusters<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
end <span class="z-keyword z-operator z-assignment z-c">=</span> start<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>start <span class="z-keyword z-operator z-comparison z-c"><=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-keyword z-control z-flow z-break z-c++">break</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
output<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">reverse</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
return output<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span>
</span></code></pre>
<p>The <code>is_beneficial</code> check for jump tables aims to prune small jump tables. It rejects a single case outright, and otherwise requires the number of case labels to be at least <code>case_values_threshold</code>.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">bool</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">jump_table_cluster<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">is_beneficial</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-storage z-modifier z-c++">const</span> vec<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>cluster <span class="z-keyword z-operator z-c">*</span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-keyword z-operator z-c">&</span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">start</span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-variable z-parameter z-c++">end</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++"> </span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>start <span class="z-keyword z-operator z-comparison z-c">==</span> end<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-constant z-language z-c">false</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> end <span class="z-keyword z-operator z-arithmetic z-c">-</span> start <span class="z-keyword z-operator z-arithmetic z-c">+</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span> <span class="z-keyword z-operator z-comparison z-c">>=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">case_values_threshold</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
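<p>Putting the coda and the <code>is_beneficial</code> check together, the backward walk can be sketched on its own as follows. This is a simplified model, not the <code>gcc</code> code: <code>Group</code> and <code>rebuild</code> are hypothetical names, <code>starts[end]</code> plays the role of <code>min[end].m_start</code>, and <code>threshold</code> stands in for whatever <code>case_values_threshold</code> returns on the target.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical output record: cases first..last (inclusive) and whether the
// group is large enough to be emitted as a jump table.
struct Group {
  unsigned first;
  unsigned last;
  bool jump_table;
};

// starts[end] is the index at which the best last group covering cases
// [0, end) begins. starts[0] is unused.
std::vector<Group> rebuild(const std::vector<unsigned> &starts,
                           unsigned threshold) {
  assert(!starts.empty());
  std::vector<Group> out;
  for (unsigned end = static_cast<unsigned>(starts.size()) - 1; end > 0;) {
    unsigned start = starts[end];
    assert(start < end);  // every group must contain at least one case
    unsigned n = end - start;
    if (n > 1 && n >= threshold) {
      // Large enough: keep the whole group as one jump table.
      out.push_back({start, end - 1, true});
    } else {
      // Too small: fall back to the individual cases, still in reverse order.
      for (unsigned i = end; i-- > start;)
        out.push_back({i, i, false});
    }
    end = start;
  }
  // Groups were collected back to front, so flip them.
  std::reverse(out.begin(), out.end());
  return out;
}
```

<p>For a hand-written table <code>{0, 0, 0, 0, 0, 4}</code> and a threshold of 3, the walk first visits the lone case at index 4, then the group covering indices 0 to 3, and after the final reversal returns the four-case jump table followed by the single case.</p>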
<h4 id="binary-tree">Binary tree</h4>
<p>If <code>gcc</code> cannot emit a bitset cluster or a jump table, it falls back to a binary
tree of comparisons.</p>
<p>In emitting the tree, it takes into account the probability of each case label
cluster. The probability can be obtained via profile-guided optimization, or with
the <code>__builtin_expect_with_probability</code> builtin.</p>
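<p>For reference, this is roughly how the builtin can be attached to a switch. The function <code>dispatch</code> and its arithmetic are made up for the example. The builtin takes the expression, the expected value, and a constant probability between 0 and 1. Whether the hint changes the generated tree depends on the compiler version and flags, so take this as a sketch rather than a guarantee.</p>

```cpp
// Hypothetical dispatcher: we claim that `op` equals 2 about 90% of the time.
int dispatch(int op, int val) {
  switch (__builtin_expect_with_probability(op, 2, 0.9)) {
    case 0:
      return val + 1;
    case 1:
      return val - 1;
    case 2:
      return val * 2;  // the case we told the compiler to expect
    case 3:
      return val / 2;
    default:
      return 0;
  }
}
```

<p>The hint does not change what the function returns. It only gives the compiler a probability to attach to the <code>case 2</code> label.</p>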
<p>The balancing code in <code>balance_case_nodes</code> is pretty straightforward.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-type z-c">void</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++">switch_decision_tree<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-entity z-name z-function z-c++">balance_case_nodes</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">case_tree_node <span class="z-keyword z-operator z-c">*</span><span class="z-keyword z-operator z-c">*</span><span class="z-variable z-parameter z-c++">head</span><span class="z-punctuation z-separator z-c++">,</span> case_tree_node <span class="z-keyword z-operator z-c">*</span><span class="z-variable z-parameter z-c++">parent</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
case_tree_node <span class="z-keyword z-operator z-c">*</span>np <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-c">*</span>head<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>np<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-storage z-type z-c">int</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span>
case_tree_node <span class="z-keyword z-operator z-c">*</span><span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-separator z-c++">,</span> <span class="z-keyword z-operator z-c">*</span>left<span class="z-punctuation z-terminator z-c++">;</span>
profile_probability prob <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++">profile_probability<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-variable z-function z-c++">never</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span></span></span></code></pre>
<p>Count the case labels and sum their probabilities.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Count the number of entries on branch. Also count the ranges. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
<span class="z-keyword z-control z-c++">while</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>np<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
i<span class="z-keyword z-operator z-arithmetic z-c">++</span><span class="z-punctuation z-terminator z-c++">;</span>
prob <span class="z-keyword z-operator z-assignment z-augmented z-c">+=</span> np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_prob</span><span class="z-punctuation z-terminator z-c++">;</span>
np <span class="z-keyword z-operator z-assignment z-c">=</span> np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></code></pre>
<p>Split the case label list so that both halves have roughly equal probability.</p>
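<p>Stripped of the linked list handling, the idea can be sketched like this. I am using plain <code>double</code> values instead of <code>profile_probability</code>, <code>pick_pivot</code> is a name I made up, and the stopping rule (stop once the probability remaining to the right drops below half of the total) is my reading of the loop, so treat it as an assumption.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the case that becomes the root of the subtree.
// probs[k] is the probability of the k-th case label, in sorted case order.
std::size_t pick_pivot(const std::vector<double> &probs) {
  assert(!probs.empty());
  double remaining = 0.0;
  for (double p : probs)
    remaining += p;  // total probability of the list
  const double half = remaining / 2;
  std::size_t k = 0;
  for (;; ++k) {
    // After the subtraction, `remaining` is the probability to the right of k.
    remaining -= probs[k];
    if (remaining < half || k + 1 == probs.size())
      break;
  }
  return k;
}
```

<p>With four equally likely cases the pivot lands on index 2. If one case dominates, as in <code>{0.1, 0.1, 0.7, 0.1}</code>, the pivot lands on the dominant case, so the hot label is tested first.</p>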
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>i <span class="z-keyword z-operator z-comparison z-c">></span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">2</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
npp <span class="z-keyword z-operator z-assignment z-c">=</span> head<span class="z-punctuation z-terminator z-c++">;</span>
left <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> NOTE(xoranth): this divides prob by 2.
</span> profile_probability pivot_prob <span class="z-keyword z-operator z-assignment z-c">=</span> prob<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">apply_scale</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">2</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">while</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span> Skip nodes while their probability does not reach that amount. <span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span>
prob <span class="z-keyword z-operator z-assignment z-augmented z-c">-=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_prob</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>prob<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">initialized_p</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-arithmetic z-c">&&</span> prob <span class="z-keyword z-operator z-comparison z-c"><</span> pivot_prob<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-keyword z-operator z-arithmetic z-c">||</span> <span class="z-keyword z-operator z-arithmetic z-c">!</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-break z-c++">break</span><span class="z-punctuation z-terminator z-c++">;</span>
npp <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-c">&</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
np <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-keyword z-operator z-c">*</span>npp<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-operator z-c">*</span>npp <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-operator z-c">*</span>head <span class="z-keyword z-operator z-assignment z-c">=</span> np<span class="z-punctuation z-terminator z-c++">;</span>
np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_parent</span> <span class="z-keyword z-operator z-assignment z-c">=</span> parent<span class="z-punctuation z-terminator z-c++">;</span>
np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_left</span> <span class="z-keyword z-operator z-assignment z-c">=</span> left <span class="z-keyword z-operator z-comparison z-c">==</span> np <span class="z-keyword z-operator z-ternary z-c">?</span> <span class="z-constant z-language z-c">NULL</span> <span class="z-keyword z-operator z-ternary z-c">:</span> left<span class="z-punctuation z-terminator z-c++">;</span>
</span></span></code></pre>
<p>Recurse in both halves to build the binary tree.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">balance_case_nodes</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-operator z-c">&</span>np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_left</span><span class="z-punctuation z-separator z-c++">,</span> np</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">balance_case_nodes</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-operator z-c">&</span>np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-separator z-c++">,</span> np</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_subtree_prob</span> <span class="z-keyword z-operator z-assignment z-c">=</span> np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_prob</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_left</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_subtree_prob</span> <span class="z-keyword z-operator z-assignment z-augmented z-c">+=</span> np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_left</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_subtree_prob</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_subtree_prob</span> <span class="z-keyword z-operator z-assignment z-augmented z-c">+=</span> np<span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_right</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_c</span><span class="z-punctuation z-accessor z-arrow z-c++">-></span><span class="z-variable z-other z-readwrite z-member z-c++">m_subtree_prob</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span> <span class="z-keyword z-control z-c++">else</span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> [...] Base case
</span> <span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span>
<span class="z-invalid z-illegal z-stray-bracket-end z-c++">}</span>
</span></code></pre>
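<p>To make the pivot selection easier to follow, here is a minimal sketch of the same idea on a plain array of probabilities. It is not <code>gcc</code> code: <code>pick_pivot</code> is a hypothetical helper, the probabilities are plain <code>double</code>s rather than <code>profile_probability</code> values, and the input is assumed to be non-empty and sorted by case value.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of gcc): returns the index of the case that
// would become the root of the subtree. Assumes `probs` is non-empty.
std::size_t pick_pivot(const std::vector<double>& probs) {
    // Total probability of the whole list, like `prob` after the first loop.
    double remaining = 0.0;
    for (double p : probs) remaining += p;
    const double half = remaining / 2.0;  // plays the role of `pivot_prob`

    std::size_t i = 0;
    while (true) {
        // Skip nodes until the probability left to the right of the
        // current node drops below half of the total.
        remaining -= probs[i];
        if (remaining < half || i + 1 == probs.size()) break;
        ++i;
    }
    return i;
}
```

<p>With a uniform distribution the pivot lands near the middle of the list, while a single very likely case near the front is picked as the root right away, so it is reached after fewer comparisons.</p>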
<h3 id="conclusion">Conclusion</h3>
<p>We've seen the heuristics and lowering strategies <code>gcc</code> employs when compiling switch statements, including:</p>
<ul>
<li>Compressing multiple <code>case</code> labels into a bitset.</li>
<li>Transforming the <code>switch</code> statement into a jump table.</li>
<li>Transforming the <code>switch</code> statement into a binary decision tree.</li>
</ul>
<p>For anyone interested in delving deeper, I'd suggest examining the <code>gcc</code> <a href="https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-switch-conversion.cc">source code</a>, and reading <a href="https://www.researchgate.net/publication/245584786_A_Superoptimizer_Analysis_of_Multiway_Branch_Code_Generation">this paper</a>.</p>
<p><em>Finally, I want to thank B.X.V., C.P., and V.P. for their feedback when writing this post.</em></p>
Optimizing the `pext` perfect hash function
2023-06-11T00:00:00+00:00
2023-06-11T00:00:00+00:00
Unknown
https://xoranth.net/verb-parse/
<p>This post is a followup to two posts by Wojciech Muła. <a href="http://0x80.pl/notesen/2022-01-29-http-verb-parse.html">One</a> on parsing HTTP verbs, and another on using <a href="http://0x80.pl/notesen/2023-04-30-lookup-in-strings.html"><code>pext</code> for perfect hashing</a>.</p>
<p>The first post analyzes alternative implementations of a function <code>string_to_verb</code> that translates an HTTP verb, represented by a <code>std::string_view</code>, into an enum.
It explores three different strategies:</p>
<ul>
<li>A hardcoded trie, taken from the <code>boost::beast</code> library.</li>
<li>A SWAR strategy that, instead of reading from the input string one character at a time, reads a 32 or 64 bit prefix and switches on that.</li>
<li>A perfect hash function generated by the <code>gperf</code> tool.</li>
</ul>
<p>The SWAR strategy turns out to be the <em>least</em> performant. </p>
<p>The three strategies are tested on two synthetic datasets:</p>
<ul>
<li>One with a uniform distribution of all HTTP verbs.</li>
<li>A more realistic one, which only contains the three most frequent verbs (GET/PUT/POST).</li>
</ul>
<p>The second post introduces a new strategy. This strategy first branches on the <em>length</em> of the input.
Then it uses the x86 <code>pext</code> instruction (part of the BMI2 extension) to extract a few discriminating bits from the input.
Finally, it interprets those bits as an index into a <em>per-length</em> lookup table.
We'll call this new strategy <code>pext_by_len</code>.</p>
<p>This second implementation allows each codepath to know at compile-time how much data it needs to load from memory. We will see that this is critical for performance.</p>
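<p>For readers unfamiliar with <code>pext</code>, the sketch below emulates it in portable C++. The name <code>pext_soft</code> is made up for illustration. Real code would use the hardware instruction, for example through the <code>_pext_u64</code> intrinsic from <code>&lt;immintrin.h&gt;</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for `pext` (illustrative only): gathers the bits of
// `src` selected by `mask` and packs them into the low bits of the result.
constexpr std::uint64_t pext_soft(std::uint64_t src, std::uint64_t mask) {
    std::uint64_t out = 0;
    std::uint64_t out_bit = 1;
    while (mask != 0) {
        // Isolate the lowest set bit of the mask.
        const std::uint64_t lowest = mask & (~mask + 1);
        if (src & lowest) out |= out_bit;
        out_bit <<= 1;
        // Clear the bit we just handled.
        mask &= mask - 1;
    }
    return out;
}
```

<p>As a toy example, bit 4 of the first character is enough to tell <code>GET</code> from <code>PUT</code>: <code>pext_soft('G', 0x10)</code> yields index 0 and <code>pext_soft('P', 0x10)</code> yields index 1. A real table would need a mask that separates every verb of a given length.</p>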
<!-- TODO double check this -->
<p>Wojciech Muła doesn't test this new strategy on the original problem, but he finds that it performs very well on a range of similar tasks.</p>
<p>In this post I will:</p>
<ul>
<li>Reproduce the results of the original post on my machine. I will add a further annotation on the number of cache misses.</li>
<li>Backport the new strategy to the original problem and quickly discuss its performance.</li>
<li>Analyze the SWAR strategy from the original post to see why it performs so badly. In particular, we will see that GCC implements the SWAR strategy as a trie, which leads to a substantial amount of cache misses.</li>
<li>Modify the SWAR strategy to use <code>pext</code> as well. This will use a <em>global</em> table instead of <em>per-length</em> ones as in <code>pext_by_len</code>. We'll see how some characteristics of <code>pext</code> synergize with the use of SWAR techniques and a global table.</li>
<li>Further optimize things by replacing <code>memcmp</code> with a more specialized implementation.</li>
</ul>
<h4 id="benchmarking-setup">Benchmarking setup</h4>
<p>All benchmarks were run on my machine, a ThinkPad laptop with an <code>Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz</code>.</p>
<p>Here’s what I did to set up the benchmark program:</p>
<ul>
<li>I excluded the second core of the machine (logical cores 1 and 5) from the scheduler using the <code>isolcpus</code> kernel boot parameter.</li>
<li>I disabled Turbo Boost and set the CPU frequency to 45% of the maximum value before running the benchmark.</li>
<li>I pinned the benchmark program to logical core 1 using <code>taskset</code>.</li>
</ul>
<p>To ensure accurate results and compensate for cache warming effects, I ran each benchmark approximately 256,000 times.
In the tables below, I report the average timing in both nanoseconds and cycles, along with the median absolute percent error across all runs.
Additionally, I report the cache miss rate obtained through hardware performance counters, as it will be useful in the analysis.</p>
<h3 id="reproducing-previous-results">Reproducing previous results</h3>
<p>As a start, I have reproduced the benchmarks on my machine, timing both in nanoseconds and cycles, and added an annotation for cache misses. </p>
<p>Of the strategies in the first post, we see that SWAR implementations lag behind the <code>boost::beast</code> implementation, while using perfect hashing yields the best performance by far.
The ranking of the results matches the results of the original post.</p>
<p>The backported <code>pext_by_len</code> strategy shines on the three verbs benchmark, and is still the second best on the full benchmark.</p>
<p>We see that <code>pext_by_len</code> has many more cache misses and takes more time on the full benchmark than on the three-verb one.
This is the first hint that the input size branch is difficult to predict, and that removing it might yield improvements.</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">all verbs</th></tr></thead><tbody>
<tr><td style="text-align: right">14.91</td><td style="text-align: right">1.6%</td><td style="text-align: right">29.22</td><td style="text-align: right">51.9%</td><td style="text-align: left"><code>noop</code></td></tr>
<tr><td style="text-align: right">63.68</td><td style="text-align: right">0.1%</td><td style="text-align: right">126.02</td><td style="text-align: right">21.7%</td><td style="text-align: left"><code>reference_impl</code></td></tr>
<tr><td style="text-align: right">72.16</td><td style="text-align: right">0.1%</td><td style="text-align: right">142.79</td><td style="text-align: right">10.6%</td><td style="text-align: left"><code>swar</code></td></tr>
<tr><td style="text-align: right">68.98</td><td style="text-align: right">0.2%</td><td style="text-align: right">136.47</td><td style="text-align: right">10.8%</td><td style="text-align: left"><code>swar32</code></td></tr>
<tr><td style="text-align: right">67.39</td><td style="text-align: right">0.2%</td><td style="text-align: right">133.39</td><td style="text-align: right">10.6%</td><td style="text-align: left"><code>swar32 v2</code></td></tr>
<tr><td style="text-align: right">29.22</td><td style="text-align: right">0.4%</td><td style="text-align: right">57.66</td><td style="text-align: right">0.6%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">44.21</td><td style="text-align: right">0.2%</td><td style="text-align: right">87.50</td><td style="text-align: right">14.9%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
</tbody></table>
<!-- | 39.74 | 0.2% | 78.70 | 0.4% | `pext` -->
<!-- | 29.75 | 1.4% | 58.71 | 0.5% | `pext v2` -->
<!-- | 26.25 | 0.7% | 51.71 | 0.0% | `pext v3` -->
<!-- | 22.04 | 1.1% | 43.40 | 0.0% | `pext v4` -->
<!-- | 26.01 | 0.6% | 51.36 | 1.0% | `pext v5` -->
<!-- | 25.53 | 0.7% | 50.52 | 1.0% | `pext v6` -->
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">GET/PUT/POST</th></tr></thead><tbody>
<tr><td style="text-align: right">14.77</td><td style="text-align: right">1.2%</td><td style="text-align: right">29.10</td><td style="text-align: right">46.5%</td><td style="text-align: left"><code>noop</code></td></tr>
<tr><td style="text-align: right">45.76</td><td style="text-align: right">0.7%</td><td style="text-align: right">90.62</td><td style="text-align: right">9.8%</td><td style="text-align: left"><code>reference_impl</code></td></tr>
<tr><td style="text-align: right">50.24</td><td style="text-align: right">0.1%</td><td style="text-align: right">99.48</td><td style="text-align: right">4.3%</td><td style="text-align: left"><code>swar</code></td></tr>
<tr><td style="text-align: right">49.45</td><td style="text-align: right">0.1%</td><td style="text-align: right">97.92</td><td style="text-align: right">3.6%</td><td style="text-align: left"><code>swar32</code></td></tr>
<tr><td style="text-align: right">47.80</td><td style="text-align: right">0.1%</td><td style="text-align: right">94.65</td><td style="text-align: right">4.9%</td><td style="text-align: left"><code>swar32 v2</code></td></tr>
<tr><td style="text-align: right">38.07</td><td style="text-align: right">0.9%</td><td style="text-align: right">75.15</td><td style="text-align: right">2.7%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">31.65</td><td style="text-align: right">0.2%</td><td style="text-align: right">62.65</td><td style="text-align: right">7.2%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
</tbody></table>
<!-- | 21.03 | 1.2% | 41.51 | 0.0% | `avx2` -->
<!-- | 43.15 | 0.2% | 85.44 | 1.8% | `pext` -->
<!-- | 38.67 | 0.4% | 76.47 | 2.5% | `pext v2` -->
<!-- | 26.06 | 0.2% | 51.51 | 0.0% | `pext v3` -->
<!-- | 21.98 | 1.0% | 43.28 | 0.0% | `pext v4` -->
<!-- | 32.61 | 1.0% | 64.37 | 4.2% | `pext v5` -->
<!-- | 33.42 | 3.0% | 66.16 | 4.3% | `pext v6` -->
<h3 id="profiling-the-swar-implementation">Profiling the SWAR implementation</h3>
<p>Let's analyze the SWAR 32 strategy. The code is roughly as follows:</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++">verb <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">string_to_verb</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view <span class="z-variable z-parameter z-c++">v</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c"><</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span> <span class="z-keyword z-operator z-word z-c++">or</span> v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">></span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">13</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> Type punning with `memcpy`
</span> <span class="z-meta z-union z-c++"><span class="z-keyword z-declaration z-union z-type z-c++">union</span> </span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++">
<span class="z-storage z-type z-c">char</span> buf<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">13</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-support z-type z-stdint z-c">uint32_t</span> value<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
value <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-meta z-function-call z-c++"><span class="z-support z-function z-C99 z-c">memcpy</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">buf<span class="z-punctuation z-separator z-c++">,</span> v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">data</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">switch</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>value<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> <span class="z-keyword z-control z-c++">case</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">STRING_CONST</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>M<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>K<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>A<span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span>C<span class="z-punctuation z-definition z-string z-end z-c">'</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">:</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> The STRING_CONST macro turns "MKAC" into a 32 bit integer.
</span> <span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">substr</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">4</span></span></span></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-comment z-block z-c"><span class="z-punctuation z-definition z-comment z-begin z-c">/*</span>MKAC<span class="z-punctuation z-definition z-comment z-end z-c">*/</span></span><span class="z-string z-quoted z-double z-c"><span class="z-punctuation z-definition z-string z-begin z-c">"</span>TIVITY<span class="z-punctuation z-definition z-string z-end z-c">"</span></span>_sv<span class="z-punctuation z-section z-group z-end z-c++">)</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>mkactivity<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-flow z-break z-c++">break</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> ...
</span> <span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
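<p>The definition of <code>STRING_CONST</code> is not shown above. A minimal sketch of what such a helper could look like follows, assuming a little-endian target. Both <code>string_const</code> and <code>load_prefix</code> are hypothetical names, and unlike the original code, <code>load_prefix</code> copies at most four bytes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical stand-in for STRING_CONST. Assumes a little-endian target,
// where the first character in memory is the least significant byte.
constexpr std::uint32_t string_const(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Loads up to four bytes of the input, zero-padding shorter strings, so the
// result can be compared directly against string_const(...) values.
inline std::uint32_t load_prefix(std::string_view v) {
    std::uint32_t value = 0;
    const std::size_t n = v.size() < 4 ? v.size() : 4;
    std::memcpy(&value, v.data(), n);
    return value;
}
```

<p>Because <code>string_const</code> is <code>constexpr</code>, its results can be used as <code>case</code> labels, so the whole four-byte prefix is matched with a single integer <code>switch</code>.</p>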
<p>The performance table shows a very high level of cache misses. Given that the switch statement has 33 cases, we suspect it might be the culprit. </p>
<p>To confirm this, we can use Linux's <code>perf</code> utility:</p>
<pre class="z-code"><code><span class="z-text z-plain">$ env NANOBENCH_ENDLESS=swar32 timeout 10 perf record -e branches,branch-misses taskset -c 1 ./benchmark_nb
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 3.583 MB perf.data (78219 samples) ]
</span></code></pre>
<p>An initial look at <code>perf</code>'s output doesn't show any particular hotspot.
To see why, let us look at how GCC compiles the <code>switch</code> statement:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly">Samples <span class="z-keyword z-control z-assembly">lea</span> <span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">,</span><span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x3</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xa</span>
↓ <span class="z-keyword z-control z-assembly">ja</span> <span class="z-constant z-character z-decimal z-assembly">510</span>
<span class="z-keyword z-control z-assembly">push</span> <span class="z-variable z-parameter z-register z-assembly">rbp</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbp</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rsi</span>
<span class="z-keyword z-control z-assembly">push</span> <span class="z-variable z-parameter z-register z-assembly">rbx</span>
<span class="z-constant z-character z-decimal z-assembly">1</span> <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x18</span> <span class="z-comment z-assembly">; reserve space for `buf`</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rsp</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-support z-function z-directive z-assembly">QWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x0</span> <span class="z-comment z-assembly">; zero out the first 64 bits of `buf`</span>
→ <span class="z-keyword z-control z-assembly">call</span> memcpy@plt <span class="z-comment z-assembly">; copy `v` to `buf`</span>
<span class="z-constant z-character z-decimal z-assembly">6</span> <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">DWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">]</span> <span class="z-comment z-assembly">; load the first 4 characters of `buf` into $eax</span>
<span class="z-comment z-assembly">; This is the body of the switch statement. We compare the first 4 characters of `buf`</span>
<span class="z-comment z-assembly">; lexicographically, to create an n-ary tree.</span>
<span class="z-comment z-assembly">; Due to endianness, verbs are ordered from the last to the first letter.</span>
<span class="z-constant z-character z-decimal z-assembly">31</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x49424e55</span> <span class="z-comment z-assembly">; buf[:4] == "UNBI"</span>
↓ <span class="z-keyword z-control z-assembly">je</span> 4f0 <span class="z-comment z-assembly">; Verb must be UNBIND or not a verb. Check the rest of buf</span>
↓ <span class="z-keyword z-control z-assembly">ja</span> d0
<span class="z-constant z-character z-decimal z-assembly">1</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x44414548</span> <span class="z-comment z-assembly">; buf[:4] == "HEAD"</span>
↓ <span class="z-keyword z-control z-assembly">je</span> 4e0
<span class="z-constant z-character z-decimal z-assembly">117</span> ↓ <span class="z-keyword z-control z-assembly">ja</span> <span class="z-constant z-character z-decimal z-assembly">90</span>
<span class="z-constant z-character z-decimal z-assembly">97</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x43414b4d</span> <span class="z-comment z-assembly">; buf[:4] == "MKAC"</span>
↓ <span class="z-keyword z-control z-assembly">je</span> <span class="z-constant z-character z-decimal z-assembly">488</span>
<span class="z-constant z-character z-decimal z-assembly">108</span> ↓ <span class="z-keyword z-control z-assembly">jbe</span> <span class="z-constant z-character z-decimal z-assembly">150</span>
<span class="z-constant z-character z-decimal z-assembly">28</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x43454843</span> <span class="z-comment z-assembly">; buf[:4] == "CHEC"</span>
↓ <span class="z-keyword z-control z-assembly">je</span> 3a0
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x43544150</span> <span class="z-comment z-assembly">; buf[:4] == "PATC"</span>
↓ <span class="z-keyword z-control z-assembly">jne</span> 2a0
<span class="z-constant z-character z-decimal z-assembly">9</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rbp</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x4</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x48</span>
<span class="z-constant z-character z-decimal z-assembly">47</span> <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x1c</span>
↓ <span class="z-keyword z-control z-assembly">jne</span> <span class="z-constant z-character z-decimal z-assembly">180</span>
↓ <span class="z-keyword z-control z-assembly">jmp</span> <span class="z-constant z-character z-decimal z-assembly">109</span>
<span class="z-keyword z-control z-assembly">nop</span>
<span class="z-comment z-assembly">; ...</span>
</span></code></pre>
<p>The code listing above shows the output of <code>perf</code>. On the left, under the <code>Samples</code> column, we see how many branch misses were recorded for each instruction.
On the right is the assembly code of the <code>string_to_verb</code> function.</p>
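<p>To read the constants in the <code>cmp</code> instructions, it helps to reproduce them by hand. The following is a minimal Python sketch (the helper name <code>prefix_as_u32</code> is made up for this illustration and is not part of the benchmark code) that packs the first four bytes of a verb the way a little-endian load from the zero-initialized <code>buf</code> does:</p>
<pre data-lang="python" class="language-python z-code"><code class="language-python" data-lang="python">def prefix_as_u32(verb: bytes) -> int:
    """First four bytes of `verb`, zero-padded, read as a little-endian u32."""
    return int.from_bytes(verb[:4].ljust(4, b"\0"), "little")


for verb in (b"UNBIND", b"HEAD", b"MKACTIVITY"):
    print(verb[:4].decode(), hex(prefix_as_u32(verb)))

# The fourth character lands in the most significant byte, so it dominates
# the integer comparison: "MKAC" sorts below "HEAD" because 'C' comes before 'D'.
print(prefix_as_u32(b"MKACTIVITY") < prefix_as_u32(b"HEAD"))
</code></pre>
<p>The three values printed by the loop are <code>0x49424e55</code>, <code>0x44414548</code> and <code>0x43414b4d</code>, the same immediates that appear in the listing.</p>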
<p>We see that the switch statement has been expanded into a series of comparisons.
Interestingly, we see not only jumps on equality (<code>je</code>), but also jumps on above (<code>ja</code>) and below-or-equal (<code>jbe</code>), the unsigned counterparts of greater and less-or-equal.
This suggests that GCC has turned the switch statement into a decision tree. With some post-processing, we can extract the structure of the decision tree and visualize it:</p>
<!-- Generated by graphviz version 5.0.0 (0)
--><!-- Title: Branch tree Pages: 1 --><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0.00 0.00 716.00 445.50">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 441.5)">
<title>Branch tree</title>
<polygon fill="white" stroke="transparent" points="-4,4 -4,-441.5 712,-441.5 712,4 -4,4"/>
<!-- Bx0 -->
<g id="node1" class="node">
<title>Bx0</title>
<polygon fill="none" stroke="black" points="8,-267 8,-288 48,-288 48,-267 8,-267"/>
<text text-anchor="start" x="11" y="-273.8" font-family="Fira Mono" font-size="14.00">UNBI</text>
<polygon fill="none" stroke="black" points="8,-246 8,-267 48,-267 48,-246 8,-246"/>
<text text-anchor="start" x="11" y="-252.8" font-family="Fira Mono" font-size="14.00">HEAD</text>
<polygon fill="none" stroke="black" points="8,-225 8,-246 48,-246 48,-225 8,-225"/>
<text text-anchor="start" x="11" y="-231.8" font-family="Fira Mono" font-size="14.00">MKAC</text>
<polygon fill="none" stroke="black" points="8,-204 8,-225 48,-225 48,-204 8,-204"/>
<text text-anchor="start" x="11" y="-210.8" font-family="Fira Mono" font-size="14.00">CHEC</text>
<polygon fill="none" stroke="black" points="8,-183 8,-204 48,-204 48,-183 8,-183"/>
<text text-anchor="start" x="11" y="-189.8" font-family="Fira Mono" font-size="14.00">PATC</text>
</g>
<!-- Bx90 -->
<g id="node2" class="node">
<title>Bx90</title>
<polygon fill="none" stroke="black" points="171,-201 171,-222 211,-222 211,-201 171,-201"/>
<text text-anchor="start" x="174" y="-207.8" font-family="Fira Mono" font-size="14.00">MOVE</text>
<polygon fill="none" stroke="black" points="171,-180 171,-201 211,-201 211,-180 171,-180"/>
<text text-anchor="start" x="174" y="-186.8" font-family="Fira Mono" font-size="14.00">PURG</text>
<polygon fill="none" stroke="black" points="171,-159 171,-180 211,-180 211,-159 171,-159"/>
<text text-anchor="start" x="174" y="-165.8" font-family="Fira Mono" font-size="14.00">REBI</text>
</g>
<!-- Bx0->Bx90 -->
<g id="edge2" class="edge">
<title>Bx0:f1->Bx90:w</title>
<path fill="none" stroke="black" d="M49,-257C104.58,-257 112.18,-214.77 164.77,-212.13"/>
<polygon fill="black" stroke="black" points="165.04,-213.87 170,-212 164.96,-210.37 165.04,-213.87"/>
<text text-anchor="middle" x="109.5" y="-244.2" font-family="Fira Mono" font-size="11.00">ja</text>
</g>
<!-- Bxd0 -->
<g id="node3" class="node">
<title>Bxd0</title>
<polygon fill="none" stroke="black" points="171,-329 171,-350 211,-350 211,-329 171,-329"/>
<text text-anchor="start" x="174" y="-335.8" font-family="Fira Mono" font-size="14.00">UNLO</text>
<polygon fill="none" stroke="black" points="171,-308 171,-329 211,-329 211,-308 171,-308"/>
<text text-anchor="start" x="174" y="-314.8" font-family="Fira Mono" font-size="14.00">SUBS</text>
<polygon fill="none" stroke="black" points="171,-287 171,-308 211,-308 211,-287 171,-287"/>
<text text-anchor="start" x="174" y="-293.8" font-family="Fira Mono" font-size="14.00">UNSU</text>
<polygon fill="none" stroke="black" points="171,-266 171,-287 211,-287 211,-266 171,-266"/>
<text text-anchor="start" x="174" y="-272.8" font-family="Fira Mono" font-size="14.00">COPY</text>
</g>
<!-- Bx0->Bxd0 -->
<g id="edge1" class="edge">
<title>Bx0:f0->Bxd0:w</title>
<path fill="none" stroke="black" d="M49,-278C107.66,-278 109.51,-336.42 164.84,-339.84"/>
<polygon fill="black" stroke="black" points="164.95,-341.6 170,-340 165.06,-338.1 164.95,-341.6"/>
<text text-anchor="middle" x="109.5" y="-321.2" font-family="Fira Mono" font-size="11.00">ja</text>
</g>
<!-- Bx150 -->
<g id="node5" class="node">
<title>Bx150</title>
<polygon fill="none" stroke="black" points="171,-79 171,-100 211,-100 211,-79 171,-79"/>
<text text-anchor="start" x="174" y="-85.8" font-family="Fira Mono" font-size="14.00">PUT </text>
<polygon fill="none" stroke="black" points="171,-58 171,-79 211,-79 211,-58 171,-58"/>
<text text-anchor="start" x="174" y="-64.8" font-family="Fira Mono" font-size="14.00">MKCA</text>
</g>
<!-- Bx0->Bx150 -->
<g id="edge3" class="edge">
<title>Bx0:f2->Bx150:w</title>
<path fill="none" stroke="black" d="M49,-235C131.13,-235 88.02,-96.16 164.77,-90.2"/>
<polygon fill="black" stroke="black" points="165.07,-91.94 170,-90 164.94,-88.44 165.07,-91.94"/>
<text text-anchor="middle" x="109.5" y="-199.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx2a0 -->
<g id="node13" class="node">
<title>Bx2a0</title>
<polygon fill="none" stroke="black" points="171,-7 171,-28 211,-28 211,-7 171,-7"/>
<text text-anchor="start" x="174" y="-13.8" font-family="Fira Mono" font-size="14.00">TRAC</text>
</g>
<!-- Bx0->Bx2a0 -->
<g id="edge4" class="edge">
<title>Bx0:f4->Bx2a0:w</title>
<path fill="none" stroke="black" d="M28,-182C28,-86.95 71.67,-19.41 164.95,-17.06"/>
<polygon fill="black" stroke="black" points="165.02,-18.81 170,-17 164.98,-15.31 165.02,-18.81"/>
<text text-anchor="middle" x="109.5" y="-36.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx270 -->
<g id="node12" class="node">
<title>Bx270</title>
<polygon fill="none" stroke="black" points="334,-201 334,-222 374,-222 374,-201 334,-201"/>
<text text-anchor="start" x="337" y="-207.8" font-family="Fira Mono" font-size="14.00">DELE</text>
<polygon fill="none" stroke="black" points="334,-180 334,-201 374,-201 374,-180 334,-180"/>
<text text-anchor="start" x="337" y="-186.8" font-family="Fira Mono" font-size="14.00">M-SE</text>
</g>
<!-- Bx90->Bx270 -->
<g id="edge5" class="edge">
<title>Bx90:f0->Bx270:w</title>
<path fill="none" stroke="black" d="M212,-212C263.99,-212 278.28,-212 327.77,-212"/>
<polygon fill="black" stroke="black" points="328,-213.75 333,-212 328,-210.25 328,-213.75"/>
<text text-anchor="middle" x="272.5" y="-215.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx2d5 -->
<g id="node15" class="node">
<title>Bx2d5</title>
<polygon fill="none" stroke="black" points="334,-129 334,-150 374,-150 374,-129 334,-129"/>
<text text-anchor="start" x="337" y="-135.8" font-family="Fira Mono" font-size="14.00">MERG</text>
</g>
<!-- Bx90->Bx2d5 -->
<g id="edge6" class="edge">
<title>Bx90:f2->Bx2d5:w</title>
<path fill="none" stroke="black" d="M212,-169C265.67,-169 276.91,-140.85 327.94,-139.09"/>
<polygon fill="black" stroke="black" points="328.03,-140.83 333,-139 327.97,-137.34 328.03,-140.83"/>
<text text-anchor="middle" x="272.5" y="-161.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx118 -->
<g id="node4" class="node">
<title>Bx118</title>
<polygon fill="none" stroke="black" points="334,-412 334,-433 374,-433 374,-412 334,-412"/>
<text text-anchor="start" x="337" y="-418.8" font-family="Fira Mono" font-size="14.00">LOCK</text>
<polygon fill="none" stroke="black" points="334,-391 334,-412 374,-412 374,-391 334,-391"/>
<text text-anchor="start" x="337" y="-397.8" font-family="Fira Mono" font-size="14.00">CONN</text>
<polygon fill="none" stroke="black" points="334,-370 334,-391 374,-391 374,-370 334,-370"/>
<text text-anchor="start" x="337" y="-376.8" font-family="Fira Mono" font-size="14.00">MKCO</text>
</g>
<!-- Bxd0->Bx118 -->
<g id="edge7" class="edge">
<title>Bxd0:f0->Bx118:w</title>
<path fill="none" stroke="black" d="M212,-340C275.43,-340 268.25,-418.52 327.81,-422.82"/>
<polygon fill="black" stroke="black" points="327.94,-424.57 333,-423 328.06,-421.07 327.94,-424.57"/>
<text text-anchor="middle" x="272.5" y="-399.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx200 -->
<g id="node9" class="node">
<title>Bx200</title>
<polygon fill="none" stroke="black" points="334,-323 334,-344 374,-344 374,-323 334,-323"/>
<text text-anchor="start" x="337" y="-329.8" font-family="Fira Mono" font-size="14.00">PROP</text>
<polygon fill="none" stroke="black" points="334,-302 334,-323 374,-323 374,-302 334,-302"/>
<text text-anchor="start" x="337" y="-308.8" font-family="Fira Mono" font-size="14.00">SEAR</text>
</g>
<!-- Bxd0->Bx200 -->
<g id="edge8" class="edge">
<title>Bxd0:f1->Bx200:w</title>
<path fill="none" stroke="black" d="M212,-319C264.39,-319 277.93,-333.02 327.74,-333.95"/>
<polygon fill="black" stroke="black" points="327.98,-335.7 333,-334 328.02,-332.2 327.98,-335.7"/>
<text text-anchor="middle" x="272.5" y="-331.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx2f8 -->
<g id="node16" class="node">
<title>Bx2f8</title>
<polygon fill="none" stroke="black" points="334,-251 334,-272 374,-272 374,-251 334,-251"/>
<text text-anchor="start" x="337" y="-257.8" font-family="Fira Mono" font-size="14.00">POST</text>
</g>
<!-- Bxd0->Bx2f8 -->
<g id="edge9" class="edge">
<title>Bxd0:f3->Bx2f8:w</title>
<path fill="none" stroke="black" d="M212,-276C264.39,-276 277.93,-261.98 327.74,-261.05"/>
<polygon fill="black" stroke="black" points="328.02,-262.8 333,-261 327.98,-259.3 328.02,-262.8"/>
<text text-anchor="middle" x="272.5" y="-273.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx190 -->
<g id="node6" class="node">
<title>Bx190</title>
<polygon fill="none" stroke="black" points="497,-412 497,-433 537,-433 537,-412 497,-412"/>
<text text-anchor="start" x="500" y="-418.8" font-family="Fira Mono" font-size="14.00">NOTI</text>
<polygon fill="none" stroke="black" points="497,-391 497,-412 537,-412 537,-391 497,-391"/>
<text text-anchor="start" x="500" y="-397.8" font-family="Fira Mono" font-size="14.00">OPTI</text>
</g>
<!-- Bx118->Bx190 -->
<g id="edge10" class="edge">
<title>Bx118:f0->Bx190:w</title>
<path fill="none" stroke="black" d="M375,-423C426.99,-423 441.28,-423 490.77,-423"/>
<polygon fill="black" stroke="black" points="491,-424.75 496,-423 491,-421.25 491,-424.75"/>
<text text-anchor="middle" x="435.5" y="-426.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx230 -->
<g id="node10" class="node">
<title>Bx230</title>
<polygon fill="none" stroke="black" points="497,-340 497,-361 537,-361 537,-340 497,-340"/>
<text text-anchor="start" x="500" y="-346.8" font-family="Fira Mono" font-size="14.00">LINK</text>
</g>
<!-- Bx118->Bx230 -->
<g id="edge11" class="edge">
<title>Bx118:f2->Bx230:w</title>
<path fill="none" stroke="black" d="M375,-380C428.67,-380 439.91,-351.85 490.94,-350.09"/>
<polygon fill="black" stroke="black" points="491.03,-351.83 496,-350 490.97,-348.34 491.03,-351.83"/>
<text text-anchor="middle" x="435.5" y="-372.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx1c0 -->
<g id="node7" class="node">
<title>Bx1c0</title>
<polygon fill="none" stroke="black" points="334,-79 334,-100 374,-100 374,-79 334,-79"/>
<text text-anchor="start" x="337" y="-85.8" font-family="Fira Mono" font-size="14.00">ACL </text>
<polygon fill="none" stroke="black" points="334,-58 334,-79 374,-79 374,-58 334,-58"/>
<text text-anchor="start" x="337" y="-64.8" font-family="Fira Mono" font-size="14.00">GET </text>
</g>
<!-- Bx150->Bx1c0 -->
<g id="edge12" class="edge">
<title>Bx150:f0->Bx1c0:w</title>
<path fill="none" stroke="black" d="M212,-90C263.99,-90 278.28,-90 327.77,-90"/>
<polygon fill="black" stroke="black" points="328,-91.75 333,-90 328,-88.25 328,-91.75"/>
<text text-anchor="middle" x="272.5" y="-93.2" font-family="Fira Mono" font-size="11.00">jbe</text>
</g>
<!-- Bx1e5 -->
<g id="node8" class="node">
<title>Bx1e5</title>
<polygon fill="none" stroke="black" points="660,-391 660,-412 700,-412 700,-391 660,-391"/>
<text text-anchor="start" x="663" y="-397.8" font-family="Fira Mono" font-size="14.00">UNLI</text>
</g>
<!-- Bx190->Bx1e5 -->
<g id="edge13" class="edge">
<title>Bx190:f1->Bx1e5:w</title>
<path fill="none" stroke="black" d="M538,-401C589.99,-401 604.28,-401 653.77,-401"/>
<polygon fill="black" stroke="black" points="654,-402.75 659,-401 654,-399.25 654,-402.75"/>
<text text-anchor="middle" x="598.5" y="-404.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx245 -->
<g id="node11" class="node">
<title>Bx245</title>
<polygon fill="none" stroke="black" points="497,-286 497,-307 537,-307 537,-286 497,-286"/>
<text text-anchor="start" x="500" y="-292.8" font-family="Fira Mono" font-size="14.00">REPO</text>
</g>
<!-- Bx200->Bx245 -->
<g id="edge14" class="edge">
<title>Bx200:f1->Bx245:w</title>
<path fill="none" stroke="black" d="M375,-312C427.44,-312 440.88,-297.04 490.73,-296.05"/>
<polygon fill="black" stroke="black" points="491.02,-297.8 496,-296 490.98,-294.3 491.02,-297.8"/>
<text text-anchor="middle" x="435.5" y="-309.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
<!-- Bx2c0 -->
<g id="node14" class="node">
<title>Bx2c0</title>
<polygon fill="none" stroke="black" points="497,-180 497,-201 537,-201 537,-180 497,-180"/>
<text text-anchor="start" x="500" y="-186.8" font-family="Fira Mono" font-size="14.00">BIND</text>
</g>
<!-- Bx270->Bx2c0 -->
<g id="edge15" class="edge">
<title>Bx270:f1->Bx2c0:w</title>
<path fill="none" stroke="black" d="M375,-190C426.99,-190 441.28,-190 490.77,-190"/>
<polygon fill="black" stroke="black" points="491,-191.75 496,-190 491,-188.25 491,-191.75"/>
<text text-anchor="middle" x="435.5" y="-193.2" font-family="Fira Mono" font-size="11.00">jne</text>
</g>
</g>
</svg>
<p>Since the switch statement has been lowered into multiple comparisons, we'll need to post-process the data and sum the branch-miss counts across multiple instructions.</p>
<p>We also see that not all misses are recorded on a jump instruction. For example:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"><span class="z-comment z-assembly">;Samples Instructions ; Comment</span>
<span class="z-comment z-assembly">; ...</span>
<span class="z-constant z-character z-decimal z-assembly">97</span> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x43414b4d</span> <span class="z-comment z-assembly">; buf[:4] == "MKAC"</span>
↓ <span class="z-keyword z-control z-assembly">je</span> <span class="z-constant z-character z-decimal z-assembly">488</span>
<span class="z-constant z-character z-decimal z-assembly">108</span> ↓ <span class="z-keyword z-control z-assembly">jbe</span> <span class="z-constant z-character z-decimal z-assembly">150</span>
<span class="z-comment z-assembly">; ...</span>
</span></code></pre>
<p>This phenomenon is known as skew (<code>perf</code>'s own documentation calls it <em>skid</em>). Since an out-of-order processor has many instructions in flight at once, the CPU cannot always attribute a branch miss to the exact instruction that caused it, and the recorded location can be off by an instruction or two.</p>
<!-- TODO Rewrite this! -->
<p>To account for this we can use the following heuristic to attribute branch misses to the switch statement:</p>
<ul>
<li>Assign each branch miss to the closest preceding <code>cmp</code> instruction in the assembly.</li>
<li>Aggregate by the first operand of the <code>cmp</code> instructions. Instructions of the form <code>cmp eax,0x[...]</code> correspond to our switch statement.</li>
</ul>
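<p>A minimal Python sketch of this heuristic follows. It assumes the annotated listing has already been parsed into <code>(samples, instruction)</code> pairs in program order. The function name, and the rule that only <code>cmp</code> instructions and conditional jumps are credited to the preceding <code>cmp</code> while everything else is grouped by mnemonic, are assumptions made for illustration. This is not the exact script used to produce the table.</p>
<pre data-lang="python" class="language-python z-code"><code class="language-python" data-lang="python">from collections import Counter

# (samples, instruction) pairs copied from the excerpt of the perf listing above.
rows = [
    (31, "cmp eax,0x49424e55"), (0, "je 4f0"), (0, "ja d0"),
    (1, "cmp eax,0x44414548"), (0, "je 4e0"), (117, "ja 90"),
    (97, "cmp eax,0x43414b4d"), (0, "je 488"), (108, "jbe 150"),
    (28, "cmp eax,0x43454843"), (0, "je 3a0"),
    (0, "cmp eax,0x43544150"), (0, "jne 2a0"),
    (9, "cmp BYTE PTR [rbp+0x4],0x48"), (47, "mov edx,0x1c"),
    (0, "jne 180"), (0, "jmp 109"),
]


def attribute_misses(rows):
    """Sum samples per bucket, crediting conditional jumps to the last cmp."""
    totals = Counter()
    last_cmp = None
    for samples, text in rows:
        mnemonic, _, operands = text.partition(" ")
        if mnemonic == "cmp":
            # Bucket by the first operand only, e.g. "cmp eax".
            last_cmp = "cmp " + operands.split(",")[0].strip()
            key = last_cmp
        elif mnemonic.startswith("j") and mnemonic != "jmp" and last_cmp is not None:
            key = last_cmp  # conditional jump: the miss belongs to its cmp
        else:
            key = mnemonic  # anything else keeps its own bucket
        totals[key] += samples
    return totals


for kind, count in attribute_misses(rows).most_common():
    print(f"{kind:28} {count}")
</code></pre>
<p>On this short excerpt alone, the <code>cmp eax</code> bucket already collects 382 of the 438 samples.</p>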
<p>When we do so, we see that almost half (47.7%) of the branch misses come from our switch statement:</p>
<!-- TODO compress table? -->
<table><thead><tr><th style="text-align: right">kind</th><th style="text-align: right">miss cnt</th><th style="text-align: left">miss %</th></tr></thead><tbody>
<tr><td style="text-align: right"><code>cmp eax</code></td><td style="text-align: right">1830</td><td style="text-align: left">47.7 %</td></tr>
<tr><td style="text-align: right"><code>mov</code></td><td style="text-align: right">718</td><td style="text-align: left">18.7 %</td></tr>
<tr><td style="text-align: right"><code>cmp DWORD PTR [rsp+0x4]</code></td><td style="text-align: right">642</td><td style="text-align: left">16.7 %</td></tr>
<tr><td style="text-align: right"><code>cmp DWORD PTR [rbp+0x4]</code></td><td style="text-align: right">193</td><td style="text-align: left">5.0 %</td></tr>
<tr><td style="text-align: right"><code>jmp</code></td><td style="text-align: right">149</td><td style="text-align: left">3.9 %</td></tr>
<tr><td style="text-align: right"><code>cmp BYTE PTR [rbp+0x4]</code></td><td style="text-align: right">128</td><td style="text-align: left">3.3 %</td></tr>
<tr><td style="text-align: right"><code>cmp rbx</code></td><td style="text-align: right">54</td><td style="text-align: left">1.4 %</td></tr>
<tr><td style="text-align: right"><code>lea</code></td><td style="text-align: right">49</td><td style="text-align: left">1.3 %</td></tr>
<tr><td style="text-align: right"><code>add</code></td><td style="text-align: right">23</td><td style="text-align: left">0.6 %</td></tr>
<tr><td style="text-align: right"><code>cmp WORD PTR [rbp+0x8]</code></td><td style="text-align: right">23</td><td style="text-align: left">0.6 %</td></tr>
<tr><td style="text-align: right"><code>xor</code></td><td style="text-align: right">19</td><td style="text-align: left">0.5 %</td></tr>
<tr><td style="text-align: right"><code>cmp rax</code></td><td style="text-align: right">3</td><td style="text-align: left">0.1 %</td></tr>
<tr><td style="text-align: right"><code>cmp BYTE PTR [rbp+0x8]</code></td><td style="text-align: right">3</td><td style="text-align: left">0.1 %</td></tr>
</tbody></table>
<h3 id="branchless-swar-using-pext">Branchless SWAR using <code>pext</code></h3>
<p>We want to eliminate the switch statement and replace it with a branchless solution, such as a jump table.
However, the compiler didn't do this for us automatically because the values we're comparing against are sparse, which would result in a huge jump table.
A naive table indexed by all 32 bits of the prefix would need 2<sup>32</sup> entries: 4 GiB even at one byte per entry, and 32 GiB for a table of 8-byte pointers.</p>
<p>But what if we ignored some of the bits in our comparison value? Most of our verbs consist of uppercase ASCII letters. By examining the binary representation of uppercase ASCII letters, we can see that the upper 3 bits of each letter are always the same.</p>
<table><thead><tr><th style="text-align: right">C</th><th style="text-align: left">Binary</th><th style="text-align: right">C</th><th style="text-align: left">Binary</th></tr></thead><tbody>
<tr><td style="text-align: right">A</td><td style="text-align: left">0b010<strong>00001</strong></td><td style="text-align: right">N</td><td style="text-align: left">0b010<strong>01110</strong></td></tr>
<tr><td style="text-align: right">B</td><td style="text-align: left">0b010<strong>00010</strong></td><td style="text-align: right">O</td><td style="text-align: left">0b010<strong>01111</strong></td></tr>
<tr><td style="text-align: right">C</td><td style="text-align: left">0b010<strong>00011</strong></td><td style="text-align: right">P</td><td style="text-align: left">0b010<strong>10000</strong></td></tr>
<tr><td style="text-align: right">D</td><td style="text-align: left">0b010<strong>00100</strong></td><td style="text-align: right">Q</td><td style="text-align: left">0b010<strong>10001</strong></td></tr>
<tr><td style="text-align: right">E</td><td style="text-align: left">0b010<strong>00101</strong></td><td style="text-align: right">R</td><td style="text-align: left">0b010<strong>10010</strong></td></tr>
<tr><td style="text-align: right">F</td><td style="text-align: left">0b010<strong>00110</strong></td><td style="text-align: right">S</td><td style="text-align: left">0b010<strong>10011</strong></td></tr>
<tr><td style="text-align: right">G</td><td style="text-align: left">0b010<strong>00111</strong></td><td style="text-align: right">T</td><td style="text-align: left">0b010<strong>10100</strong></td></tr>
<tr><td style="text-align: right">H</td><td style="text-align: left">0b010<strong>01000</strong></td><td style="text-align: right">U</td><td style="text-align: left">0b010<strong>10101</strong></td></tr>
<tr><td style="text-align: right">I</td><td style="text-align: left">0b010<strong>01001</strong></td><td style="text-align: right">V</td><td style="text-align: left">0b010<strong>10110</strong></td></tr>
<tr><td style="text-align: right">J</td><td style="text-align: left">0b010<strong>01010</strong></td><td style="text-align: right">W</td><td style="text-align: left">0b010<strong>10111</strong></td></tr>
<tr><td style="text-align: right">K</td><td style="text-align: left">0b010<strong>01011</strong></td><td style="text-align: right">X</td><td style="text-align: left">0b010<strong>11000</strong></td></tr>
<tr><td style="text-align: right">L</td><td style="text-align: left">0b010<strong>01100</strong></td><td style="text-align: right">Y</td><td style="text-align: left">0b010<strong>11001</strong></td></tr>
<tr><td style="text-align: right">M</td><td style="text-align: left">0b010<strong>01101</strong></td><td style="text-align: right">Z</td><td style="text-align: left">0b010<strong>11010</strong></td></tr>
</tbody></table>
<p>Discarding just those 3 bits from each of the 4 characters leaves 20 bits, which shrinks the table to 2<sup>20</sup> entries: 1 MiB at one byte per entry.</p>
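<p>The bit counting can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic; the constant names are made up for the example.</p>
<pre data-lang="python" class="language-python z-code"><code class="language-python" data-lang="python">import string

# Upper 3 bits of every uppercase ASCII letter: a single shared value.
upper_bits = {ord(c) >> 5 for c in string.ascii_uppercase}
print(upper_bits == {0b010})

BITS_PER_CHAR = 8 - 3   # what is left of each character after dropping 3 bits
PREFIX_CHARS = 4        # the switch looks at the first four characters
index_bits = BITS_PER_CHAR * PREFIX_CHARS

full_entries = 2 ** 32              # indexing by the whole 32-bit prefix
reduced_entries = 2 ** index_bits   # indexing by the remaining bits
print(index_bits, reduced_entries, full_entries // reduced_entries)
</code></pre>
<p>Note that the <code>-</code> in <code>M-SEARCH</code> and the zero padding of three-letter verbs do not share the <code>0b010</code> prefix, which is why the observation only holds for most characters.</p>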
<p>Ideally we would want to discard even more bits, to maximally compress our jump table.
To do this, we need:</p>
<ol>
<li>A fast way to extract the bits.</li>
<li>A fast way to search for a subset of bits that <em>uniquely</em> identifies a verb.</li>
</ol>
<!-- TODO comment about AMD, and pre-Skylake. Maybe illustrate `pext` -->
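<p>The first requirement is straightforward. <code>pext</code> (parallel bits extract, part of the x86 BMI2 extension) accomplishes this in a single instruction. It is fast on recent Intel CPUs, but microcoded and much slower on AMD CPUs before Zen 3.</p>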
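<p>For readers who haven't met <code>pext</code> before, here is a pure-Python model of what it computes. It is a sketch of the semantics only: the real instruction does all of this in hardware, and the mask <code>0x1f1f1f1f</code> (the low 5 bits of each character) is just an example, not the mask we end up using.</p>
<pre data-lang="python" class="language-python z-code"><code class="language-python" data-lang="python">def pext(value: int, mask: int) -> int:
    """Gather the bits of `value` selected by `mask` into the low bits of the result."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask       # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit  # copy that bit to the next output position
        out_bit += 1
        mask &= mask - 1            # clear the lowest set bit and continue
    return result


# Extract the upper nibble of a byte.
print(bin(pext(0b10110010, 0b11110000)))

# Keep the low 5 bits of each character of "HEAD" (0x44414548 as a u32).
index = pext(0x44414548, 0x1F1F1F1F)
print(index, index < 2 ** 20)
</code></pre>
<p>The second call squeezes the four characters into a 20-bit index, which is exactly the reduction described above.</p>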
<!-- TODO? Explain snoob? -->
<p>The second requirement is a bit more complex. Instead of performing a naive search over all possible 32-bit unsigned integers, we can use a technique called "snoob" (short for "same number of one bits") that, given an integer, produces the next larger integer with the same number of set bits. This lets us enumerate only the masks with a specific number of set bits.
For example, a brute-force search for a bitmask that extracts 7 bits needs up to four billion iterations. One that uses <code>snoob</code> needs only about 3.4 <em>million</em>, since there are C(32, 7) = 3,365,856 such masks.</p>
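<p>Here is a minimal Python sketch of the enumeration, using the classic <code>snoob</code> formulation from <em>Hacker's Delight</em>. The generator name <code>masks_with_popcount</code> is made up for this example, and the search we actually ran may differ in its details.</p>
<pre data-lang="python" class="language-python z-code"><code class="language-python" data-lang="python">import math


def snoob(x: int) -> int:
    """Next larger integer with the same number of set bits (x must be > 0)."""
    smallest = x & -x               # lowest set bit of x
    ripple = x + smallest           # the carry ripples through the lowest run of ones
    ones = x ^ ripple               # the bits that changed
    ones = (ones >> 2) // smallest  # right-justify the ones that dropped out
    return ripple | ones


def masks_with_popcount(k: int, width: int = 32):
    """Yield every `width`-bit integer with exactly `k` set bits, in increasing order."""
    if not 1 <= k <= width:
        raise ValueError("need 1 <= k <= width")
    mask = (1 << k) - 1             # smallest candidate: the k lowest bits set
    limit = 1 << width
    while mask < limit:
        yield mask
        mask = snoob(mask)


# Small sanity check: 3 bits out of 8 gives C(8, 3) masks.
print(sum(1 for _ in masks_with_popcount(3, 8)), math.comb(8, 3))

# Size of the search space for 7 bits out of 32.
print(math.comb(32, 7))
</code></pre>
<p>Each candidate mask can then be tested for uniqueness over the set of verbs, and the search stops at the first mask that passes.</p>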
<!-- TODO getting clever with suffixes doesn't work either... -->
<p>However, even with this optimization, we still face a challenge with the verbs PROPFIND and PROPPATCH, as they share a common 4-character prefix. While we could add a branch to handle this particular case, our preference is to maintain a completely branchless solution.
To distinguish them we exploit the fact that <code>PROPFIND</code> and <code>PROPPATCH</code> have a different length.</p>
<p>A quick Python session confirms that the first four characters of a verb, together with its length, uniquely identify it.</p>
<pre class="z-code"><code><span class="z-text z-plain">Python 3.11.3 (main, May 24 2023, 00:00:00) [GCC 12.3.1 20230508 (Red Hat 12.3.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> verbs = [
"ACL", "BIND", "CHECKOUT", "CONNECT", "COPY", "DELETE",
"GET", "HEAD", "LINK", "LOCK", "M-SEARCH", "MERGE",
"MKACTIVITY", "MKCALENDAR", "MKCOL", "MOVE", "NOTIFY", "OPTIONS",
"PATCH", "POST", "PROPFIND", "PROPPATCH", "PURGE", "PUT",
"REBIND", "REPORT", "SEARCH", "SUBSCRIBE", "TRACE", "UNBIND",
"UNLINK", "UNLOCK", "UNSUBSCRIBE",
]
>>> prefix_and_len = {(len(v), v[:4]) for v in verbs}
>>> len(verbs) == len(prefix_and_len)
True
</span></code></pre>
<p>The length has another nice property. Our function takes a <code>std::string_view</code> as a parameter.
A <code>string_view</code> is implemented as a struct holding a pointer to the data and a length.
Under the System V x86-64 calling convention used on Linux, such a small struct is passed in registers, so the length is already available in a register (<code>rdi</code> in this case). We can confirm this by examining the assembly code once more:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"><span class="z-comment z-assembly">; Here we implement the check</span>
<span class="z-comment z-assembly">; if (v.size() < 3 or v.size() > 13)</span>
<span class="z-comment z-assembly">; return verb::unknown;</span>
<span class="z-comment z-assembly">; $rdi is the length of `v`</span>
<span class="z-keyword z-control z-assembly">lea</span> <span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">,</span><span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x3</span><span class="z-source z-assembly">]</span> <span class="z-comment z-assembly">; $rax = v.size() - 3ul</span>
<span class="z-comment z-assembly">; Notice that the previous subtraction *underflows* if `v.size() < 3`</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xa</span> <span class="z-comment z-assembly">; check if `$rax > 10`. If `v.size() < 3`, `$rax` is a very big number and this check still succeeds</span>
<span class="z-keyword z-control z-assembly">ja</span> <span class="z-constant z-character z-decimal z-assembly">510</span>
<span class="z-support z-function z-directive z-assembly">[...]</span>
</span></code></pre>
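<p>The same trick can be written out by hand. The sketch below (function names are mine, not from the benchmark code) shows the single unsigned comparison the compiler generates next to the two-sided check from the source, and verifies that they agree.</p>

```cpp
#include <cassert>
#include <cstddef>

// The check as written in the source: two comparisons.
bool length_in_range_naive(std::size_t size) {
    return size >= 3 && size <= 13;
}

// The check as compiled: one subtraction and one unsigned comparison.
// For size < 3 the subtraction wraps around to a huge value, which is
// greater than 10, so the test fails exactly as it should.
bool length_in_range(std::size_t size) {
    return size - 3u <= 10u;
}

// Both versions agree for every length we try.
void check_length_equivalence() {
    for (std::size_t size = 0; size < 64; ++size) {
        assert(length_in_range(size) == length_in_range_naive(size));
    }
}
```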
<p>We can simply "mix in" the string length by xor-ing with the output of <code>pext</code>:</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">find_mask</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++"> </span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-support z-type z-stdint z-c">uint32_t</span> best_mask <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span><span class="z-constant z-numeric z-suffix z-c++">u</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-support z-type z-stdint z-c">uint32_t</span> mask <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-binary z-c++"><span class="z-constant z-numeric z-base z-c++">0b</span><span class="z-constant z-numeric z-value z-c++">1'111'111</span><span class="z-constant z-numeric z-suffix z-c++">u</span></span><span class="z-punctuation z-terminator z-c++">;</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">snoob</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">mask</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">></span> mask<span class="z-punctuation z-terminator z-c++">;</span> mask <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">snoob</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">mask</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> Our lookup function. Reads up to the first 4 bytes, does `pext` with mask and then XORs with the size
</span> <span class="z-storage z-type z-c">auto</span> lam <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-keyword z-operator z-assignment z-c">=</span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>string_view sv<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-support z-type z-stdint z-c">uint32_t</span> val <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span><span class="z-constant z-numeric z-suffix z-c++">u</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-meta z-function-call z-c++"><span class="z-support z-function z-C99 z-c">memcpy</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-operator z-c">&</span>val<span class="z-punctuation z-separator z-c++">,</span> sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">data</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">min</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-keyword z-operator z-word z-c++">sizeof</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span><span class="z-meta z-group z-c++">val</span><span 
class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">_pext_u32</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">val<span class="z-punctuation z-separator z-c++">,</span> mask</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-arithmetic z-c">^</span> sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> VERBS_MAP is a hash map of verb to number
</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> `is_uniq` checks that all keys in VERBS_MAP hash to different indices
</span> <span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">is_uniq</span><span class="z-punctuation z-definition z-generic z-begin z-c++"><</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">128</span><span class="z-constant z-numeric z-suffix z-c++">ul</span></span><span class="z-punctuation z-definition z-generic z-end z-c++">></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">VERBS_MAP<span class="z-punctuation z-separator z-c++">,</span> lam</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
best_mask <span class="z-keyword z-operator z-assignment z-c">=</span> mask<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-flow z-break z-c++">break</span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> best_mask<span class="z-punctuation z-terminator z-c++">;</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> Will be zero if no good mask is found
</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
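<p>The listing above relies on <code>is_uniq</code>, which is not shown. One plausible shape for it is sketched below, under the assumption that it only needs to walk the keys and check that no two of them land on the same slot. The name <code>is_uniq_sketch</code> and the choice of taking a plain container of keys (rather than a hash map) are mine.</p>

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Returns true if `hash` maps every key to a distinct index below N.
// Hypothetical stand-in for the `is_uniq` helper used in the search loop.
template <std::size_t N, typename Keys, typename Hash>
bool is_uniq_sketch(const Keys& keys, Hash hash) {
    std::bitset<N> seen;
    for (const auto& key : keys) {
        const std::size_t idx = hash(key);
        if (idx >= N || seen[idx]) {
            return false;  // out of range, or slot already taken
        }
        seen[idx] = true;
    }
    return true;
}

// Usage example with a toy hash that only looks at the length.
void is_uniq_sketch_example() {
    const std::vector<std::string_view> keys{"GET", "POST", "PATCH"};
    const auto by_length = [](std::string_view sv) { return sv.size(); };
    assert((is_uniq_sketch<16>(keys, by_length)));
}
```

<p>A <code>std::bitset</code> is enough here because we only care whether a slot is taken, not which verb took it.</p>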
<p>This runs in ~50ms and quickly finds an appropriate bitmask:</p>
<pre data-lang="shell" class="language-shell z-code"><code class="language-shell" data-lang="shell"><span class="z-text z-plain">$ ./find_mask
Found mask: 33687137
</span></code></pre>
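<p>The decimal value hides what the mask actually selects. Written in hex it is <code>0x02020661</code>, and since the verb is loaded with <code>memcpy</code> on a little-endian machine, the lowest byte of the mask applies to the first character. The small sketch below (helper names are mine) splits the mask per character: bits 0, 5 and 6 of the first character, bits 1 and 2 of the second, and bit 1 of the third and fourth, for a total of 7 bits.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits a 32-bit mask into one 8-bit mask per input character.
// Assumes a little-endian load, so index 0 is the first character.
std::array<std::uint8_t, 4> per_char_masks(std::uint32_t mask) {
    std::array<std::uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint8_t>(mask >> (8 * i));
    }
    return out;
}

// Number of set bits, i.e. how many bits `pext` will extract.
int popcount32(std::uint32_t x) {
    int count = 0;
    while (x != 0) {
        x &= x - 1;  // clear lowest set bit
        ++count;
    }
    return count;
}

// Usage example on the mask found above.
void decode_found_mask() {
    const std::array<std::uint8_t, 4> expected{0x61, 0x06, 0x02, 0x02};
    assert(per_char_masks(33687137u) == expected);
    assert(popcount32(33687137u) == 7);
}
```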
<p>The implementation is very straightforward. For simplicity, I generate the lookup table at startup.
An alternative would be to use code generation.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-storage z-modifier z-c++">static</span> <span class="z-storage z-modifier z-c++">const</span> <span class="z-support z-type z-stdint z-c">uint32_t</span> MASK <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">33687137</span><span class="z-constant z-numeric z-suffix z-c++">u</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">lut_idx</span> </span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view <span class="z-variable z-parameter z-c++">sv</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++"> </span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-support z-type z-stdint z-c">uint32_t</span> val <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span><span class="z-constant z-numeric z-suffix z-c++">u</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-meta z-function-call z-c++"><span class="z-support z-function z-C99 z-c">memcpy</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-keyword z-operator z-c">&</span>val<span class="z-punctuation z-separator z-c++">,</span> sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">data</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-function-call z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span><span class="z-variable z-function z-c++">min</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-separator z-c++">,</span> <span class="z-keyword z-operator z-word z-c++">sizeof</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin 
z-c++">(</span></span><span class="z-meta z-group z-c++">val</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">_pext_u32</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">val<span class="z-punctuation z-separator z-c++">,</span> MASK</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-arithmetic z-c">^</span> sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">using</span> <span class="z-entity z-name z-type z-using z-c++">lut_elem_t</span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-support z-type z-stdint z-c">uint8_t</span><span class="z-punctuation z-terminator z-c++">;</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span>verb;
</span><span class="z-storage z-modifier z-c++">static</span> <span class="z-storage z-modifier z-c++">const</span> std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>array<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>lut_elem_t<span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">128</span></span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> LOOKUP_TABLE <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-c++">using</span> <span class="z-keyword z-control z-c++">namespace</span> boost<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>beast<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>http<span class="z-punctuation z-terminator z-c++">;</span>
std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>array<span class="z-punctuation z-definition z-generic z-begin z-c++"><</span>lut_elem_t<span class="z-punctuation z-separator z-c++">,</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">128</span></span><span class="z-punctuation z-definition z-generic z-end z-c++">></span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">lookup_table</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">{</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>lut_elem_t<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">}</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">for</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-storage z-type z-c">unsigned</span> <span class="z-storage z-type z-c">int</span> as_int <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-terminator z-c++">;</span> as_int <span class="z-keyword z-operator z-comparison z-c"><</span> verb_count<span class="z-punctuation z-terminator z-c++">;</span> <span class="z-keyword z-operator z-arithmetic z-c">++</span>as_int<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view sv <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">as_string</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>verb<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> as_int</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
lookup_table<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">lut_idx</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">sv</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>lut_elem_t<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> as_int<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> lookup_table<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span><span class="z-punctuation z-terminator z-c++">;</span>
verb
<span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">string_to_verb</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view <span class="z-variable z-parameter z-c++">v</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c"><</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span> <span class="z-keyword z-operator z-word z-c++">or</span> v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">></span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">13</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
verb res <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>verb<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> LOOKUP_TABLE<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">lut_idx</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">v</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v <span class="z-keyword z-operator z-comparison z-c">==</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">as_string</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">res</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> res<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
<p>With this change, we see greatly improved results. Nearly all branch misses are gone (0.4%, down from 14.9% for <code>pext_by_len</code>).
On the full benchmark, this implementation (dubbed <code>pext</code>) is the second fastest, beaten only by the hash table generated by <code>gperf</code>.</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">all verbs</th></tr></thead><tbody>
<tr><td style="text-align: right">63.68</td><td style="text-align: right">0.1%</td><td style="text-align: right">126.02</td><td style="text-align: right">21.7%</td><td style="text-align: left"><code>reference_impl</code></td></tr>
<tr><td style="text-align: right">68.98</td><td style="text-align: right">0.2%</td><td style="text-align: right">136.47</td><td style="text-align: right">10.8%</td><td style="text-align: left"><code>swar32</code></td></tr>
<tr><td style="text-align: right">29.22</td><td style="text-align: right">0.4%</td><td style="text-align: right">57.66</td><td style="text-align: right">0.6%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">44.21</td><td style="text-align: right">0.2%</td><td style="text-align: right">87.50</td><td style="text-align: right">14.9%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">39.74</td><td style="text-align: right">0.2%</td><td style="text-align: right">78.70</td><td style="text-align: right">0.4%</td><td style="text-align: left"><code>pext</code></td></tr>
</tbody></table>
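<p>On the common verbs benchmark the results are less impressive: <code>pext</code> still beats <code>reference_impl</code> and <code>swar32</code>, and its branch-miss rate is the lowest in the table.</p>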
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">GET/PUT/POST</th></tr></thead><tbody>
<tr><td style="text-align: right">45.76</td><td style="text-align: right">0.7%</td><td style="text-align: right">90.62</td><td style="text-align: right">9.8%</td><td style="text-align: left"><code>reference_impl</code></td></tr>
<tr><td style="text-align: right">49.45</td><td style="text-align: right">0.1%</td><td style="text-align: right">97.92</td><td style="text-align: right">3.6%</td><td style="text-align: left"><code>swar32</code></td></tr>
<tr><td style="text-align: right">38.07</td><td style="text-align: right">0.9%</td><td style="text-align: right">75.15</td><td style="text-align: right">2.7%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">31.65</td><td style="text-align: right">0.2%</td><td style="text-align: right">62.65</td><td style="text-align: right">7.2%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">43.15</td><td style="text-align: right">0.2%</td><td style="text-align: right">85.44</td><td style="text-align: right">1.8%</td><td style="text-align: left"><code>pext</code></td></tr>
</tbody></table>
<p>Here our solution is beaten both by <code>gperf</code> and by the per-length <code>pext</code> solution.</p>
<h3 id="memcpy-woes">Memcpy woes</h3>
<p>Despite the improvements in the previous section, our solution is still dominated by <code>gperf</code>. Can we do better?</p>
<p>Let us profile again with <code>perf</code>, this time focusing on cycles spent on each instruction.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"><span class="z-comment z-assembly">; ...</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">05</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x4</span>
<span class="z-constant z-character z-decimal z-assembly">1</span>.<span class="z-constant z-character z-decimal z-assembly">51</span> │ <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rax</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">01</span> │ <span class="z-keyword z-control z-assembly">cmovbe</span> <span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span> <span class="z-comment z-assembly">; Compute min(sv.size(), 4);</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">04</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbp</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rsi</span>
<span class="z-constant z-character z-decimal z-assembly">1</span>.<span class="z-constant z-character z-decimal z-assembly">81</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-support z-function z-directive z-assembly">DWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0xc</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x0</span> <span class="z-comment z-assembly">; Zero out buf</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">01</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">01</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">esi</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span> <span class="z-comment z-assembly">; Put min(sv.size(), 4) into $esi</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">03</span> │ <span class="z-keyword z-control z-assembly">test</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
│ ↓ <span class="z-keyword z-control z-assembly">je</span> <span class="z-constant z-character z-decimal z-assembly">45</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">63</span> │ <span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-comment z-assembly">; GCC has inlined std::memcpy. This is a for loop that copies the bytes</span>
<span class="z-comment z-assembly">; from the input string one by one into `buf`</span>
<span class="z-comment z-assembly">; $eax is the counter, $esi is min(sv.size(), 4), $rbp is a pointer to the input string</span>
<span class="z-comment z-assembly">; and $rsp+0xc is a pointer to buf</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">35</span> │<span class="z-constant z-character z-decimal z-assembly">34</span>: <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">80</span> │ <span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rbp</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x0</span><span class="z-source z-assembly">]</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">72</span> │ <span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">62</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0xc</span><span class="z-source z-assembly">]</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">cl</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">42</span> │ <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">esi</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">05</span> │ ↑ <span class="z-keyword z-control z-assembly">jb</span> <span class="z-constant z-character z-decimal z-assembly">34</span>
<span class="z-comment z-assembly">; After we have finished copying the input string into buf, copy buf back into a register</span>
<span class="z-constant z-character z-decimal z-assembly">38</span>.<span class="z-constant z-character z-decimal z-assembly">93</span> │<span class="z-constant z-character z-decimal z-assembly">45</span>: <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">DWORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsp</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0xc</span><span class="z-source z-assembly">]</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">08</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x2020661</span>
<span class="z-constant z-character z-decimal z-assembly">8</span>.<span class="z-constant z-character z-decimal z-assembly">38</span> │ <span class="z-keyword z-control z-assembly">pext</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">96</span> │ <span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ebx</span>
<span class="z-constant z-character z-decimal z-assembly">14</span>.<span class="z-constant z-character z-decimal z-assembly">18</span> │ <span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">r12d</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x424620</span><span class="z-source z-assembly">]</span>
<span class="z-comment z-assembly">; ...</span>
</span></code></pre>
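<p>We see that <code>gcc</code> is loading data from memory in a suboptimal way. It has inlined <code>memcpy</code> as a loop that copies one byte at a time,
most likely because we are passing a variable length to <code>memcpy</code>. The load that reads <code>buf</code> back into a register accounts for almost 39% of the samples, plausibly because a 4-byte load cannot be store-forwarded from several single-byte stores.</p>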
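<p>For reference, the source pattern behind this disassembly can be sketched roughly as follows. This is a reconstruction from the annotations in the listing, not the exact code used in the benchmark; the name <code>load_sv_memcpy</code> is made up for illustration.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical reconstruction: copy at most four bytes of the input into a
// zeroed buffer. The length is only known at run time, so the compiler cannot
// turn the memcpy into a single fixed-width move.
inline std::uint32_t load_sv_memcpy(std::string_view sv) {
    std::uint32_t buf = 0;                                      // "Zero out buf"
    const std::size_t n = std::min<std::size_t>(sv.size(), 4);  // min(sv.size(), 4)
    if (n != 0) {
        std::memcpy(&buf, sv.data(), n);                        // variable-length copy
    }
    return buf;
}
```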
<!-- TODO Rewrite this? Also, note that this is undefined behaviour in C++, but explicitely supported by GCC and Clang -->
<p>We can get <code>gcc</code> to generate better assembly by giving it a hint.
We can use a union to load the first three characters, and load the fourth character conditionally on the input length.
Reading a union member other than the one last written is undefined behaviour in C++, but is explicitly supported by <code>gcc</code>.</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"><span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">load_sv</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view <span class="z-variable z-parameter z-c++">sv</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++"> </span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-meta z-union z-c++"><span class="z-keyword z-declaration z-union z-type z-c++">union</span> </span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++">
<span class="z-storage z-type z-c">char</span> as_char<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">4</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-support z-type z-stdint z-c">uint32_t</span> res<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-union z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span><span class="z-punctuation z-terminator z-c++">;</span>
as_char<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> sv<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">0</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
as_char<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> sv<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">1</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
as_char<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">2</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> sv<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">2</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
as_char<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>sv<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">></span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span><span class="z-constant z-numeric z-suffix z-c++">ul</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-keyword z-operator z-ternary z-c">?</span> sv<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span> <span class="z-keyword z-operator z-ternary z-c">:</span> <span class="z-string z-quoted z-single z-c"><span class="z-punctuation z-definition z-string z-begin z-c">'</span><span class="z-constant z-character z-escape z-c">\0</span><span class="z-punctuation z-definition z-string z-end z-c">'</span></span><span class="z-punctuation z-terminator 
z-c++">;</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> res<span class="z-punctuation z-terminator z-c++">;</span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
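<p>If relying on union type punning is a concern, a similar load can be sketched without it. The variant below is a minimal sketch, assuming the compiler folds the fixed-size <code>memcpy</code> into a plain register move (worth confirming in the disassembly). <code>load_sv_portable</code> is a hypothetical name, and like <code>load_sv</code> it assumes the input has at least three characters.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical variant of load_sv that avoids reading an inactive union member.
inline std::uint32_t load_sv_portable(std::string_view sv) {
    assert(sv.size() >= 3);  // same precondition as the union version
    // Gather the bytes first; the fourth one is zero for inputs of length three.
    const char bytes[4] = {sv[0], sv[1], sv[2], sv.size() > 3 ? sv[3] : '\0'};
    std::uint32_t res;
    // The size is a compile-time constant, so no byte-by-byte loop is needed.
    std::memcpy(&res, bytes, sizeof(res));
    return res;
}
```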
<p>With this change, our implementation is competitive with <code>gperf</code> on both benchmarks.</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">all verbs</th></tr></thead><tbody>
<tr><td style="text-align: right">29.22</td><td style="text-align: right">0.4%</td><td style="text-align: right">57.66</td><td style="text-align: right">0.6%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">44.21</td><td style="text-align: right">0.2%</td><td style="text-align: right">87.50</td><td style="text-align: right">14.9%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">39.74</td><td style="text-align: right">0.2%</td><td style="text-align: right">78.70</td><td style="text-align: right">0.4%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">29.75</td><td style="text-align: right">1.4%</td><td style="text-align: right">58.71</td><td style="text-align: right">0.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
</tbody></table>
<p>It is still behind the per-length <code>pext</code> solution on the common verbs benchmark though:</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">GET/PUT/POST</th></tr></thead><tbody>
<tr><td style="text-align: right">38.07</td><td style="text-align: right">0.9%</td><td style="text-align: right">75.15</td><td style="text-align: right">2.7%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">31.65</td><td style="text-align: right">0.2%</td><td style="text-align: right">62.65</td><td style="text-align: right">7.2%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">43.15</td><td style="text-align: right">0.2%</td><td style="text-align: right">85.44</td><td style="text-align: right">1.8%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">38.67</td><td style="text-align: right">0.4%</td><td style="text-align: right">76.47</td><td style="text-align: right">2.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
</tbody></table>
<h3 id="unsafe-memcpy-optimizations">Unsafe memcpy optimizations</h3>
<p>A look at the assembly shows that we are still using three separate loads to bring the beginning of our string into a register.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"><span class="z-comment z-assembly">; Samples</span>
<span class="z-constant z-character z-decimal z-assembly">8</span>.<span class="z-constant z-character z-decimal z-assembly">11</span> │ <span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x2</span><span class="z-source z-assembly">]</span> <span class="z-comment z-assembly">; Load the third byte</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">38</span> │ <span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">WORD</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">]</span> <span class="z-comment z-assembly">; Load the first two bytes</span>
<span class="z-constant z-character z-decimal z-assembly">1</span>.<span class="z-constant z-character z-decimal z-assembly">02</span> │ <span class="z-keyword z-control z-assembly">shl</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x10</span>
<span class="z-constant z-character z-decimal z-assembly">1</span>.<span class="z-constant z-character z-decimal z-assembly">28</span> │ <span class="z-keyword z-control z-assembly">or</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span> <span class="z-comment z-assembly">; Combine the first three bytes</span>
<span class="z-constant z-character z-decimal z-assembly">5</span>.<span class="z-constant z-character z-decimal z-assembly">70</span> │ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span>
│ <span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">rbp</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rsi</span>
│ <span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span>
<span class="z-constant z-character z-decimal z-assembly">2</span>.<span class="z-constant z-character z-decimal z-assembly">11</span> │ <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x3</span> <span class="z-comment z-assembly">; Check if v.size() is bigger than 3</span>
<span class="z-constant z-character z-decimal z-assembly">0</span>.<span class="z-constant z-character z-decimal z-assembly">10</span> │ ↓ <span class="z-keyword z-control z-assembly">je</span> 2c <span class="z-comment z-assembly">; If v.size() is bigger than 3, load the fourth byte. Otherwise $edx will be zero</span>
<span class="z-constant z-character z-decimal z-assembly">6</span>.<span class="z-constant z-character z-decimal z-assembly">14</span> │ <span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x3</span><span class="z-source z-assembly">]</span>
<span class="z-constant z-character z-decimal z-assembly">6</span>.<span class="z-constant z-character z-decimal z-assembly">49</span> │2c: <span class="z-keyword z-control z-assembly">shl</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x18</span>
<span class="z-constant z-character z-decimal z-assembly">1</span>.<span class="z-constant z-character z-decimal z-assembly">05</span> │ <span class="z-keyword z-control z-assembly">or</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span> <span class="z-comment z-assembly">; Combine with the fourth byte</span>
</span></code></pre>
<p>Ideally, we would like to load the four bytes in a single <code>mov $edx, DWORD PTR [rsi]</code>, but we can't.
The issue is the following: a <code>string_view</code> is not guaranteed to be null-terminated. Therefore, for inputs of length 3 a DWORD read goes one byte past the end of the string, into memory that may be uninitialized or not even mapped.</p>
<!-- FIXME Warning: unsafety ahead -->
<p>That said, we might be able to do better by borrowing some tricks from <code>glibc</code>.</p>
<!-- FIXME Link -->
<p>We've seen that the <code>memcmp</code> implementation reads beyond the end of the array when the input pointers are far from a page boundary.
Since we are loading only 4 bytes from one pointer, the page boundary check simplifies to:</p>
<script type="math/tex" is_fleqn="true"is_display="true">\newcommand\PageSize[0]{\textnormal{PageSize}}
\newcommand\BitAnd[0]{\mathbin{\&}}
\newcommand\Ptr[1]{{p}_{#1}}
\Ptr{1} \BitAnd \left( \PageSize - 1 \right) \leq \left( \PageSize - 4\right)</script>
<p>But we can simplify even further. Notice that, by the point we are loading our input string into a register, we have already checked that the length of the string is at least three.
Therefore the only case where we could cause a page fault is when the input pointer is <em>exactly</em> 3 bytes before a page boundary.</p>
<ul>
<li>If the input pointer is at least 4 bytes from a page boundary, a DWORD read doesn't cross the page boundary.</li>
<li>If the input pointer is 2 or fewer bytes from a page boundary, a read <em>does</em> cross a page boundary, but we know that the next page is mapped, since at least 3 bytes starting at the input pointer belong to the string.</li>
</ul>
<p>This further refines the page boundary check to:</p>
<script type="math/tex" is_fleqn="true"is_display="true">\Ptr{1} \BitAnd \left( \PageSize - 1 \right) \neq \left( \PageSize - 3\right)</script>
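<p>In C++, the refined check could look roughly like the sketch below. It assumes 4 KiB pages and obtains the offset inside the page by masking the address with the page size minus one. The names <code>kPageSize</code>, <code>dword_read_may_fault</code> and <code>load_sv_unsafe</code> are hypothetical, and reading past the end of the string is outside what the C++ standard guarantees, so treat this as an illustration of the trick rather than portable code.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Assumption: 4 KiB pages. Real code should use the page size of the target.
constexpr std::uintptr_t kPageSize = 4096;

// True when a 4-byte read at `addr` could touch a page that is not known to be
// mapped, given that at least 3 bytes starting at `addr` are readable.
constexpr bool dword_read_may_fault(std::uintptr_t addr) {
    return (addr & (kPageSize - 1)) == kPageSize - 3;
}

// Loads four bytes starting at sv.data(). For inputs of length three, the top
// byte holds whatever follows the string in memory (it is NOT zeroed here).
inline std::uint32_t load_sv_unsafe(std::string_view sv) {
    assert(sv.size() >= 3);
    std::uint32_t res = 0;
    const auto addr = reinterpret_cast<std::uintptr_t>(sv.data());
    if (dword_read_may_fault(addr)) {
        // Rare slow path: only read the fourth byte if it belongs to the string.
        std::memcpy(&res, sv.data(), sv.size() > 3 ? 4 : 3);
        return res;
    }
    // Fast path: fixed-size copy, may read one byte past a length-three input.
    std::memcpy(&res, sv.data(), 4);
    return res;
}
```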
<p>At first glance, it would appear that inputs of length three require additional handling:
for those inputs we load a byte beyond the end of the string, which needs to be zeroed out.
We could do so branchlessly using the <code>bzhi</code> instruction:</p>
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-support z-type z-stdint z-c">uint32_t</span> data <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">_bzhi_u32</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">load</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c">v<span class="z-punctuation z-accessor z-c">.</span><span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">data</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call z-c"></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-separator z-c">,</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">8</span><span class="z-constant z-numeric z-suffix z-c">ul</span></span> <span class="z-keyword z-operator z-c">*</span> v<span class="z-punctuation z-accessor z-c">.</span><span class="z-meta z-function-call z-c"><span class="z-variable z-function z-c">size</span><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function-call 
z-c"></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span></span></span><span class="z-meta z-function-call z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></code></pre>
<p>However, an even better approach is to do <em>nothing</em>. In other words, not masking the extra byte.</p>
<p>To understand why this works, observe that <code>pext</code> <em>discards</em> input bits, but <em>does not mix them</em>.
As a result, we can distinguish the output bits of <code>pext</code> into two groups:</p>
<ul>
<li>Bits extracted from the first three bytes of the input string (the <em>prefix</em>).</li>
<li>Bits extracted from the fourth byte of the input string (the <em>suffix</em>).</li>
</ul>
<p>We can also split the inputs into two kinds:</p>
<ul>
<li>Inputs of length three.</li>
<li>All other inputs.</li>
</ul>
<!-- TODO rewrite? -->
<p>By construction, the bits in the prefix are sufficient to distinguish strings of length three from one another.
The fourth byte, however, only affects bits in the suffix.
Therefore it is enough to ensure that the bits extracted from the prefix also distinguish strings of length three
from keys of length four or more.</p>
<p>That would guarantee that altering the bits from the suffix cannot cause a collision.
We can simply add multiple entries in our lookup table for strings of length three,
one for each combination of bits in the suffix.</p>
<p>This is true for our mask, and can be enforced by exhaustive checking in our <code>find_mask</code> function.</p>
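<p>To make the argument concrete, here is a minimal, portable sketch. It emulates <code>pext</code> in software and uses a made-up mask: the constants <code>PREFIX_MASK</code> and <code>SUFFIX_MASK</code> and the helpers <code>pack</code> and <code>insert_three_letter_key</code> are assumptions for illustration, not the mask or the code produced by <code>find_mask</code>. It only shows the duplication of entries for a three-letter key; checking that other keys do not collide is left out.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Software emulation of the BMI2 `pext` instruction: gathers the bits of
// `src` selected by `mask` into the low bits of the result, in order.
constexpr std::uint32_t pext32(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t out = 0;
    std::uint32_t out_bit = 1;
    while (mask != 0) {
        const std::uint32_t lowest = mask & (~mask + 1u);  // lowest set bit
        if (src & lowest) out |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;  // clear the lowest set bit
    }
    return out;
}

constexpr int popcount32(std::uint32_t x) {
    int n = 0;
    for (; x != 0; x &= x - 1u) ++n;
    return n;
}

// Hypothetical mask, chosen only for illustration.
constexpr std::uint32_t PREFIX_MASK = 0x001f0103u;  // bits taken from bytes 0..2
constexpr std::uint32_t SUFFIX_MASK = 0x03000000u;  // bits taken from byte 3
constexpr std::uint32_t MASK = PREFIX_MASK | SUFFIX_MASK;
constexpr int PREFIX_BITS = popcount32(PREFIX_MASK);
constexpr int SUFFIX_BITS = popcount32(SUFFIX_MASK);

using Table = std::array<std::uint8_t, (1u << (PREFIX_BITS + SUFFIX_BITS))>;

// Pack four bytes the way a little-endian 32-bit load would.
constexpr std::uint32_t pack(char b0, char b1, char b2, unsigned char b3) {
    return std::uint32_t(static_cast<unsigned char>(b0)) |
           std::uint32_t(static_cast<unsigned char>(b1)) << 8 |
           std::uint32_t(static_cast<unsigned char>(b2)) << 16 |
           std::uint32_t(b3) << 24;
}

// Register a three-letter key once per combination of suffix bits, so the
// byte that follows the key in memory cannot change which value is found.
inline void insert_three_letter_key(Table& table, const char* key, std::uint8_t value) {
    const std::uint32_t prefix_idx = pext32(pack(key[0], key[1], key[2], 0), MASK);
    for (std::uint32_t suffix = 0; suffix < (1u << SUFFIX_BITS); ++suffix) {
        table[prefix_idx | (suffix << PREFIX_BITS)] = value;
    }
}
```

<p>Because the suffix bits of the mask sit above the prefix bits, <code>pext32</code> places them in the top bits of the index. After inserting <code>"GET"</code>, looking up <code>pack('G', 'E', 'T', b)</code> finds the same value for every possible fourth byte <code>b</code>.</p>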
<!-- Aside: we are also xor-ing with the length, but the length is always smaller than 12, and due to endianess it
only touches the first four bits of the index, while only the 7th depends on the "extra" bytes.
-->
<p>With this improvement, we beat <code>gperf</code> on the full benchmark!</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">all verbs</th></tr></thead><tbody>
<tr><td style="text-align: right">29.22</td><td style="text-align: right">0.4%</td><td style="text-align: right">57.66</td><td style="text-align: right">0.6%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">44.21</td><td style="text-align: right">0.2%</td><td style="text-align: right">87.50</td><td style="text-align: right">14.9%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">39.74</td><td style="text-align: right">0.2%</td><td style="text-align: right">78.70</td><td style="text-align: right">0.4%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">29.75</td><td style="text-align: right">1.4%</td><td style="text-align: right">58.71</td><td style="text-align: right">0.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
<tr><td style="text-align: right">26.25</td><td style="text-align: right">0.7%</td><td style="text-align: right">51.71</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v3</code></td></tr>
</tbody></table>
<p>We also beat <code>pext_by_len</code> on the restricted benchmark.</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">GET/PUT/POST</th></tr></thead><tbody>
<tr><td style="text-align: right">38.07</td><td style="text-align: right">0.9%</td><td style="text-align: right">75.15</td><td style="text-align: right">2.7%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">31.65</td><td style="text-align: right">0.2%</td><td style="text-align: right">62.65</td><td style="text-align: right">7.2%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">43.15</td><td style="text-align: right">0.2%</td><td style="text-align: right">85.44</td><td style="text-align: right">1.8%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">38.67</td><td style="text-align: right">0.4%</td><td style="text-align: right">76.47</td><td style="text-align: right">2.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
<tr><td style="text-align: right">26.06</td><td style="text-align: right">0.2%</td><td style="text-align: right">51.51</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v3</code></td></tr>
</tbody></table>
<h3 id="optimizing-the-final-check">Optimizing the final check</h3>
<p>Can we do even better?
Running the latest version of our benchmark under perf shows that the bottleneck is now <code>memcmp</code>:</p>
<pre data-lang="sh" class="language-sh z-code"><code class="language-sh" data-lang="sh"><span class="z-source z-shell z-bash"><span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">$</span></span><span class="z-meta z-function-call z-arguments z-shell"> perf report<span class="z-variable z-parameter z-option z-shell"><span class="z-punctuation z-definition z-parameter z-shell"> -</span>M</span> intel<span class="z-variable z-parameter z-option z-shell"><span class="z-punctuation z-definition z-parameter z-shell"> --</span>input</span><span class="z-keyword z-operator z-assignment z-option z-shell">=</span>pext-v3-cycles-gcc.data</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">Samples:</span></span><span class="z-meta z-function-call z-arguments z-shell"> 118K of event <span class="z-string z-quoted z-single z-shell"><span class="z-punctuation z-definition z-string z-begin z-shell">'</span>cycles:pppu<span class="z-punctuation z-definition z-string z-end z-shell">'</span></span>, Event count (approx.</span><span class="z-meta z-function-call z-shell"></span>)<span class="z-meta z-function-call z-shell"><span class="z-support z-function z-colon z-shell">:</span></span><span class="z-meta z-function-call z-arguments z-shell"> 51634360904</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">Overhead</span></span><span class="z-meta z-function-call z-arguments z-shell"> Command Shared Object Symbol</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">29.87<span class="z-meta z-group z-expansion z-job z-shell"><span class="z-punctuation z-definition z-variable z-job z-shell">%</span></span></span></span><span class="z-meta z-function-call z-arguments z-shell"> benchmark_nb benchmark_nb <span class="z-keyword z-control z-regexp z-set z-begin z-shell">[</span>.<span class="z-keyword z-control z-regexp z-set z-end z-shell">]</span> ankerl::nanobench::Bench::run<span class="z-keyword z-operator z-assignment z-redirection z-shell"><</span>run_benchmark<span class="z-keyword z-operator z-assignment z-redirection z-shell"><</span>boost::beast::http:</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">29.50<span class="z-meta z-group z-expansion z-job z-shell"><span class="z-punctuation z-definition z-variable z-job z-shell">%</span></span></span></span><span class="z-meta z-function-call z-arguments z-shell"> benchmark_nb libc.so.6 <span class="z-keyword z-control z-regexp z-set z-begin z-shell">[</span>.<span class="z-keyword z-control z-regexp z-set z-end z-shell">]</span> __memcmp_avx2_movbe</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">23.83<span class="z-meta z-group z-expansion z-job z-shell"><span class="z-punctuation z-definition z-variable z-job z-shell">%</span></span></span></span><span class="z-meta z-function-call z-arguments z-shell"> benchmark_nb benchmark_nb <span class="z-keyword z-control z-regexp z-set z-begin z-shell">[</span>.<span class="z-keyword z-control z-regexp z-set z-end z-shell">]</span> pext::string_to_verb_v3</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">12.06<span class="z-meta z-group z-expansion z-job z-shell"><span class="z-punctuation z-definition z-variable z-job z-shell">%</span></span></span></span><span class="z-meta z-function-call z-arguments z-shell"> benchmark_nb benchmark_nb <span class="z-keyword z-control z-regexp z-set z-begin z-shell">[</span>.<span class="z-keyword z-control z-regexp z-set z-end z-shell">]</span> boost::beast::http::as_string</span>
<span class="z-meta z-function-call z-shell"><span class="z-variable z-function z-shell">2.02<span class="z-meta z-group z-expansion z-job z-shell"><span class="z-punctuation z-definition z-variable z-job z-shell">%</span></span></span></span><span class="z-meta z-function-call z-arguments z-shell"> benchmark_nb benchmark_nb <span class="z-keyword z-control z-regexp z-set z-begin z-shell">[</span>.<span class="z-keyword z-control z-regexp z-set z-end z-shell">]</span> memcmp@plt</span>
</span></code></pre>
<p>That is, the bottleneck is not hashing our input string view, but checking that
our hash has not given us a false positive:</p>
<pre data-lang="cpp" class="language-cpp z-code"><code class="language-cpp" data-lang="cpp"><span class="z-source z-c++"> verb
<span class="z-meta z-function z-c++"><span class="z-meta z-toc-list z-full-identifier z-c++"><span class="z-entity z-name z-function z-c++">string_to_verb_v3</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function z-parameters z-c++"><span class="z-meta z-group z-c++">std<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>string_view <span class="z-variable z-parameter z-c++">v</span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-meta z-function z-c++">
</span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span></span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++">
<span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c"><</span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">3</span></span> <span class="z-keyword z-operator z-word z-c++">or</span> v<span class="z-punctuation z-accessor z-dot z-c++">.</span><span class="z-meta z-method-call z-c++"><span class="z-variable z-function z-member z-c++">size</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-method-call z-c++"></span><span class="z-meta z-method-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span> <span class="z-keyword z-operator z-comparison z-c">></span> <span class="z-meta z-number z-integer z-decimal z-c++"><span class="z-constant z-numeric z-value z-c++">13</span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
verb res <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>verb<span class="z-punctuation z-section z-group z-end z-c++">)</span></span> LOOKUP_TABLE<span class="z-meta z-brackets z-c++"><span class="z-punctuation z-section z-brackets z-begin z-c++">[</span><span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">lut_idx_v3</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">v</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-brackets z-end z-c++">]</span></span><span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> string_view::operator!= calls `memcmp` under the hood
</span><mark> <span class="z-keyword z-control z-c++">if</span> <span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span>v <span class="z-keyword z-operator z-comparison z-c">!=</span> <span class="z-meta z-function-call z-c++"><span class="z-variable z-function z-c++">as_string</span><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-begin z-c++">(</span></span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++">res</span></span><span class="z-meta z-function-call z-c++"><span class="z-meta z-group z-c++"><span class="z-punctuation z-section z-group z-end z-c++">)</span></span></span><span class="z-punctuation z-section z-group z-end z-c++">)</span></span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> BOTTLENECK!
</span></mark> <span class="z-keyword z-control z-flow z-return z-c++">return</span> verb<span class="z-punctuation z-accessor z-double-colon z-c++">::</span>unknown<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span> <span class="z-keyword z-control z-c++">else</span> <span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-begin z-c++">{</span>
<span class="z-keyword z-control z-flow z-return z-c++">return</span> res<span class="z-punctuation z-terminator z-c++">;</span>
<span class="z-punctuation z-section z-block z-end z-c++">}</span></span>
</span></span><span class="z-meta z-function z-c++"><span class="z-meta z-block z-c++"><span class="z-punctuation z-section z-block z-end z-c++">}</span></span></span>
</span></code></pre>
<p>We notice several inefficiencies in our false positive check:</p>
<!-- TODO Alignment check probabilities -->
<!-- TODO show assembly? -->
<ol>
<li>
<p>Perf shows that <code>memcmp</code> is not getting inlined. Our page alignment check logic is inspired by <code>memcmp</code>'s implementation, so it is likely that, with <code>memcmp</code> inlined, the compiler would be able to elide redundant computations.</p>
</li>
<li>
<p><code>memcmp</code> has to branch on the size of the input arrays. But we know that our input string size is between 3 and 13 characters. Therefore we can use 16-byte loads and omit all branching on size.</p>
</li>
</ol>
<!-- TODO explain this better -->
<ol start="3">
<li>
<p><code>memcmp</code> needs to check that it is safe to read beyond the end of the array for both inputs. But we control one of the input arrays (the lookup table), so we can just pad its entries with nulls.</p>
</li>
<li>
<p><code>memcmp</code> has to find the first differing byte; we don't. So we can use <code>vptest</code> instead of <code>vpmovmskb</code> and save one instruction.</p>
</li>
<li>
<p>Our lookup table implementation has an inefficient layout. We are using a lookup table of <code>string_view</code>s, but a <code>string_view</code> is itself a pair of <code>(pointer, length)</code>. This leads to a double indirection on lookup.
A better implementation would keep the length and the data side by side, in the same cache line.</p>
</li>
</ol>
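<p>The points above can be sketched in portable C++. This is not the SIMD code from the benchmark: it stands in for the 16-byte load with two 64-bit words, and the names <code>PaddedEntry</code>, <code>make_entry</code>, <code>MASK_SOURCE</code> and <code>equals_entry</code> are hypothetical. It assumes the caller has already made sure that 16 bytes are readable at the input pointer.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical table entry: length and bytes live side by side, and the
// bytes are null-padded so a 16-byte read is always in bounds.
struct PaddedEntry {
    std::uint8_t length;
    char data[16];
};

inline PaddedEntry make_entry(std::string_view s) {
    PaddedEntry e{};  // zero-initialized, which provides the null padding
    e.length = static_cast<std::uint8_t>(s.size());
    std::memcpy(e.data, s.data(), s.size());  // assumes s.size() <= 16
    return e;
}

// 16 bytes of 0xff followed by 16 bytes of 0x00. Reading 16 bytes starting
// at offset (16 - n) yields n bytes of 0xff followed by (16 - n) zero bytes.
constexpr unsigned char MASK_SOURCE[32] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

// Equality-only check: no search for the first differing byte and no
// branching on the length. `input16` must point to 16 readable bytes;
// only the first `entry.length` of them take part in the comparison.
inline bool equals_entry(const char* input16, std::size_t input_len,
                         const PaddedEntry& entry) {
    std::uint64_t a[2], b[2], m[2];
    std::memcpy(a, input16, 16);
    std::memcpy(b, entry.data, 16);
    std::memcpy(m, MASK_SOURCE + (16 - entry.length), 16);
    const std::uint64_t diff = ((a[0] ^ b[0]) & m[0]) | ((a[1] ^ b[1]) & m[1]);
    return (diff == 0) & (input_len == entry.length);
}
```

<p>With <code>make_entry("DELETE")</code>, a 16-byte buffer that starts with <code>DELETE</code> compares equal for length 6 regardless of the bytes that follow, while a differing byte inside the first six, or a different length, makes the check fail.</p>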
<!-- TODO mention shift mask load -->
<p>After implementing the changes above, we see a substantial speedup on both benchmarks:</p>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">all verbs</th></tr></thead><tbody>
<tr><td style="text-align: right">29.22</td><td style="text-align: right">0.4%</td><td style="text-align: right">57.66</td><td style="text-align: right">0.6%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">44.21</td><td style="text-align: right">0.2%</td><td style="text-align: right">87.50</td><td style="text-align: right">14.9%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">39.74</td><td style="text-align: right">0.2%</td><td style="text-align: right">78.70</td><td style="text-align: right">0.4%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">29.75</td><td style="text-align: right">1.4%</td><td style="text-align: right">58.71</td><td style="text-align: right">0.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
<tr><td style="text-align: right">26.25</td><td style="text-align: right">0.7%</td><td style="text-align: right">51.71</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v3</code></td></tr>
<tr><td style="text-align: right">22.04</td><td style="text-align: right">1.1%</td><td style="text-align: right">43.40</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v4</code></td></tr>
</tbody></table>
<table><thead><tr><th style="text-align: right">ns/op</th><th style="text-align: right">err%</th><th style="text-align: right">cyc/op</th><th style="text-align: right">miss%</th><th style="text-align: left">GET/PUT/POST</th></tr></thead><tbody>
<tr><td style="text-align: right">38.07</td><td style="text-align: right">0.9%</td><td style="text-align: right">75.15</td><td style="text-align: right">2.7%</td><td style="text-align: left"><code>perfect_hash</code></td></tr>
<tr><td style="text-align: right">31.65</td><td style="text-align: right">0.2%</td><td style="text-align: right">62.65</td><td style="text-align: right">7.2%</td><td style="text-align: left"><code>pext_by_len</code></td></tr>
<tr><td style="text-align: right">43.15</td><td style="text-align: right">0.2%</td><td style="text-align: right">85.44</td><td style="text-align: right">1.8%</td><td style="text-align: left"><code>pext</code></td></tr>
<tr><td style="text-align: right">38.67</td><td style="text-align: right">0.4%</td><td style="text-align: right">76.47</td><td style="text-align: right">2.5%</td><td style="text-align: left"><code>pext v2</code></td></tr>
<tr><td style="text-align: right">26.06</td><td style="text-align: right">0.2%</td><td style="text-align: right">51.51</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v3</code></td></tr>
<tr><td style="text-align: right">21.98</td><td style="text-align: right">1.0%</td><td style="text-align: right">43.28</td><td style="text-align: right">0.0%</td><td style="text-align: left"><code>pext v4</code></td></tr>
</tbody></table>
<h3 id="interpreting-the-results">Interpreting the results</h3>
<p>When we began optimizing, we first examined the full benchmark and concentrated our efforts on reducing branch misses.
This ultimately improved performance on the full benchmark.</p>
<p>However, when it came to the restricted benchmark, branch misses were not as significant an issue.
We'd expect the restricted benchmark to be friendlier to a branchy implementation, as fewer cases mean easier-to-predict branches.</p>
<p>Then why did our branchless implementation improve the restricted benchmark, even surpassing <code>pext_by_len</code>?</p>
<p>One way to interpret the results is the following. When using a well-performing perfect hash,
hashing is no longer the limiting factor. Rather memory loads and <code>memcmp</code> become bottlenecks.</p>
<p>To address this, the <code>pext_by_len</code> implementation branches on input size, and optimizes <code>memcmp</code> for each size.</p>
<p>Our implementation takes a different approach. We specialize <code>memcmp</code> to always compare 16 bytes, and branch
to a slow path only when the 16-byte load would cross a page boundary.</p>
<p>But a branch on pointer alignment is more predictable than one on input size. Assuming the input pointer is a random number, the slow path will be hit approximately 0.37% of the time:</p>
<script type="math/tex" is_fleqn="true" is_display="true">\frac{15}{4096} = 0.003662109375 \approx 0.37\%</script>
<p>Therefore, the trade-off between the two options is a net win.</p>
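<p>The 15 in the numerator can be checked by brute force. The sketch below assumes 4 KiB pages and a 16-byte load; <code>load_crosses_page</code> and <code>count_slow_offsets</code> are illustrative names, not functions from the benchmark code.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageBytes = 4096;  // assumption: 4 KiB pages
constexpr std::size_t kLoadBytes = 16;    // width of the unconditional load

// True when a kLoadBytes-wide load starting at `address` would touch the
// following page.
constexpr bool load_crosses_page(std::uintptr_t address) {
    return (address & (kPageBytes - 1)) > kPageBytes - kLoadBytes;
}

// Count how many of the 4096 possible offsets within a page take the slow path.
constexpr std::size_t count_slow_offsets() {
    std::size_t n = 0;
    for (std::uintptr_t offset = 0; offset < kPageBytes; ++offset) {
        n += load_crosses_page(offset) ? 1 : 0;
    }
    return n;
}

static_assert(count_slow_offsets() == 15, "offsets 4081..4095 cross the page");
```

<p>A load at offset 4080 ends exactly on the last byte of the page, so only offsets 4081 through 4095 need the slow path.</p>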
<h3 id="acknowledgments">Acknowledgments</h3>
<p>I want to thank A.Y., N.S., V.P. and in particular B.X.V. for their help in writing and editing this post.
I also want to thank Wojciech Muła for his analysis of the problem.</p>
<p>All my code is available <a href="https://github.com/kryggird/verb_parse_phf">here</a>.</p>
<!-- TODO Note that this is all string_view fault's. Array with no null, and 16 bytes of null padding would not have these problems. -->
<!--
Start by analyzing SWAR32
- Does increasingly badly on higher number of iterations.
- `perf` shows lots of branch mispredictions => remove branch by LUT?
- Use `pext` to create a LUT, describe how to search for a key.
- `perf` shows lots of time taken by memmove.
- Look at assembly, we call `memcpy`, neither `gcc` nor `clang` specializes
- Replace with union/type punning
- Observe that now last check takes most of the time.
- Last check calls `memcmp`.
- Look at `memcmp_eq`.
- Specialize `memcmp`
- Backport to initial load? Manually inline?
- Backport to PHF?
- Compress table further? (Three ideas: 64bit load/search, use `select` (PTSelect to be specific) to compress the LUT, 8bit values).
- Remove shift mask by `_mm_cmpgt_epi8(LOOKUP_TABLE[idx], 0)`;
- Binary search on SWAR values?
- AVX2 hyper specialization for common values.
- Mention computer specs
TODO "the compiler" vs "GCC"
TODO Reference macro-fusion?
https://stackoverflow.com/questions/33721204/test-whether-a-register-is-zero-with-cmp-reg-0-vs-or-reg-reg/33724806#33724806
***
The reference implementation is a trie with nested comparisons.
But paradoxically GCC is able to turn the topmost switch into a lookup table
since the range of cases is smaller (`A` to `U` instead of a bunch of sparse 32 bit numbers).
```asm
Percent│ mov rcx,rdi
1.59 │ mov rax,rsi
0.02 │ cmp rdi,0x2
│ ↓ jbe 30
3.52 │ movzx edx,BYTE PTR [rax] ; Load the first character into $edx
│ lea rdi,[rsi+0x1]
1.58 │ sub edx,0x41 ; 0x41 is 'A'. This helps reduce the size of the lookup table
0.00 │ lea rsi,[rcx-0x1]
│ ; 0x16 is 'U'. Bail out if the first letter comes after 'U', as the last
│ ; verb in lexicographic order is 'UNSUBSCRIBE'
2.38 │ cmp dl,0x14
│ ↓ ja 30
│ movzx edx,dl
│ ; Lookup table jump. Notice how this is an hotspot according to perf
9.20 │ → jmp QWORD PTR [rdx*8+0x41d068]
│ nop
│ 30: mov eax,0x21
│ ← ret
│ cs nop WORD PTR [rax+rax*1+0x0]
4.28 │ cmp BYTE PTR [rax+0x1],0x4e
0.24 │ ↑ jne 30
1.81 │ lea rdx,[rcx-0x2]
0.04 │ cmp rcx,0x6
0.08 │ ↓ je 4a8
```
-->
A look inside `memcmp` on Intel AVX2 hardware.
2023-04-27T00:00:00+00:00
2023-04-27T00:00:00+00:00
Unknown
https://xoranth.net/memcmp-avx2/
<p><a href="https://en.cppreference.com/w/cpp/string/byte/memcmp"><code>memcmp</code></a> is a C standard library function that compares two arrays lexicographically.
It will be familiar to C programmers, and it is often used as a building block for string operations in other languages. For instance, C++ <code>std::string_view</code> comparison uses <code>memcmp</code> under the hood in some implementations.</p>
<p>While it can be implemented in C using a simple for loop, Glibc provides an optimized implementation written in assembly.
By digging into this implementation, we can learn a lot about low level optimization.</p>
<p>This post will proceed in four steps. First I'll provide a quick summary of what <code>memcmp</code> does.
Then we will delve into the assembly specialization for x86-64 AVX2, dividing it into three logical sections:</p>
<ul>
<li>Handling of small arrays, where "small" denotes arrays smaller than 32 bytes.</li>
<li>Handling of arrays ranging from 32 to 256 bytes.</li>
<li>Handling of large arrays.</li>
</ul>
<p>The section dedicated to small arrays is the longest, as the logic for small arrays is the least straightforward.
An array shorter than 32 bytes does not fill an AVX2 SIMD register, and handling such arrays usually requires either:</p>
<ul>
<li>Writing scalar code.</li>
<li>Some masking support from the hardware, that AVX2 doesn't provide.</li>
</ul>
<p>We will see how the implementation avoids the need for either.</p>
<h3 id="summary-of-memcmp">Summary of <code>memcmp</code></h3>
<p>According to cppreference, <code>memcmp</code>:</p>
<blockquote>
<p>Reinterprets the objects pointed to by lhs and rhs as arrays of unsigned char and compares the first <code>count</code> bytes of these arrays. The comparison is done lexicographically.</p>
<p>The sign of the result is the sign of the difference between the values of the first pair of bytes (both interpreted as unsigned char) that differ in the objects being compared.</p>
</blockquote>
<p>A pure C implementation of such a function is very straightforward: we just loop over the arrays until we find a difference.
If we do find one, we return the signed difference of the two bytes:</p>
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-storage z-type z-c">int</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">simple_memcmp</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-storage z-modifier z-c">const</span> <span class="z-storage z-type z-c">void</span><span class="z-keyword z-operator z-c">*</span> <span class="z-variable z-parameter z-c">p1</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-storage z-modifier z-c">const</span> <span class="z-storage z-type z-c">void</span><span class="z-keyword z-operator z-c">*</span> <span class="z-variable z-parameter z-c">p2</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-support z-type z-sys-types z-c">size_t</span> <span class="z-variable z-parameter z-c">count</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-storage z-modifier z-c">const</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-storage z-type z-c">char</span><span class="z-keyword z-operator z-c">*</span> lhs <span class="z-keyword z-operator z-assignment z-c">=</span> p1<span class="z-punctuation z-terminator z-c">;</span>
<span class="z-storage z-modifier z-c">const</span> <span class="z-storage z-type z-c">unsigned</span> <span class="z-storage z-type z-c">char</span><span class="z-keyword z-operator z-c">*</span> rhs <span class="z-keyword z-operator z-assignment z-c">=</span> p2<span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-c">for</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span><span class="z-support z-type z-sys-types z-c">size_t</span> i <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span> i <span class="z-keyword z-operator z-comparison z-c"><</span> count<span class="z-punctuation z-terminator z-c">;</span> <span class="z-keyword z-operator z-arithmetic z-c">++</span>i<span class="z-punctuation z-section z-group z-end z-c">)</span></span>
<span class="z-keyword z-control z-c">if</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span>lhs<span class="z-meta z-brackets z-c"><span class="z-punctuation z-section z-brackets z-begin z-c">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c">]</span></span> <span class="z-keyword z-operator z-comparison z-c">!=</span> rhs<span class="z-meta z-brackets z-c"><span class="z-punctuation z-section z-brackets z-begin z-c">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c">]</span></span><span class="z-punctuation z-section z-group z-end z-c">)</span></span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span><span class="z-storage z-type z-c">int</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span> lhs<span class="z-meta z-brackets z-c"><span class="z-punctuation z-section z-brackets z-begin z-c">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c">]</span></span> <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span><span class="z-storage z-type z-c">int</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span> rhs<span class="z-meta z-brackets z-c"><span class="z-punctuation z-section z-brackets z-begin z-c">[</span>i<span class="z-punctuation z-section z-brackets z-end z-c">]</span></span><span class="z-punctuation z-terminator z-c">;</span>
<span class="z-keyword z-control z-flow z-return z-c">return</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">0</span></span><span class="z-punctuation z-terminator z-c">;</span>
</span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
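<p>The "interpreted as unsigned char" part of the definition matters for bytes above <code>0x7f</code>. As a small illustration, here is the same loop restated in C++ (the casts are needed because C++ does not convert <code>const void*</code> implicitly), to be exercised on a pair of arrays whose first differing bytes are <code>0x80</code> and <code>0x7f</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Same logic as the C version above, with the explicit casts C++ requires.
inline int simple_memcmp(const void* p1, const void* p2, std::size_t count) {
    const unsigned char* lhs = static_cast<const unsigned char*>(p1);
    const unsigned char* rhs = static_cast<const unsigned char*>(p2);
    for (std::size_t i = 0; i < count; ++i) {
        if (lhs[i] != rhs[i]) {
            // Bytes are compared as unsigned values, so 0x80 sorts after 0x7f.
            return static_cast<int>(lhs[i]) - static_cast<int>(rhs[i]);
        }
    }
    return 0;
}
```

<p>Comparing <code>{0x61, 0x80}</code> with <code>{0x61, 0x7f}</code> over two bytes gives a positive result, which agrees in sign with <code>std::memcmp</code>. Had the bytes been read as signed <code>char</code>, <code>0x80</code> would be negative and the order would flip.</p>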
<p>In the System V x86-64 calling convention used on Linux, the first arguments are passed in registers.
So <code>p1</code> and <code>p2</code> will be passed respectively in <code>$rdi</code> and <code>$rsi</code>, while the <code>count</code> will be passed in <code>$rdx</code>.</p>
<p>Let's now dig into the assembly.</p>
<h3 id="efficient-handling-of-short-arrays">Efficient handling of short arrays</h3>
<p>Immediately after the entry point, we branch to a specialized routine that handles arrays smaller than 32 bytes.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span> <span class="z-comment z-assembly">; 0x20 is 32 in hexadecimal</span>
<span class="z-keyword z-control z-assembly">jb</span> 2e0
</span></code></pre>
<p>Following the jump, we see some pretty unusual code.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-comment z-assembly">; If count is smaller than or equal to 1, go to a specialized routine.</span>
2e0: <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x1</span>
<span class="z-keyword z-control z-assembly">jbe</span> <span class="z-constant z-character z-decimal z-assembly">360</span>
<span class="z-keyword z-control z-assembly">mov</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edi</span>
<span class="z-keyword z-control z-assembly">or</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">esi</span> <span class="z-comment z-assembly">; $rdi, $rsi are p1, p2</span>
<span class="z-keyword z-control z-assembly">and</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xfff</span> <span class="z-comment z-assembly">; 0xfff is 4095 in hexadecimal</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xfe0</span> <span class="z-comment z-assembly">; 0xfe0 is 4096 - 32 in hexadecimal</span>
<span class="z-keyword z-control z-assembly">jg</span> <span class="z-constant z-character z-decimal z-assembly">320</span>
</span></code></pre>
<p>To unravel this code, let's consider a simple case.
To compare two arrays that are 32 bytes long, we can load each into a YMM register with an unaligned load (<code>vmovdqu</code>, exposed in C as the <code>loadu</code> intrinsics).
With AVX2, we can then compare the two YMM registers in a single instruction.</p>
<p>However, if <code>count</code> is less than 32, we still load 32 bytes of data.
We'll then have to "discard" the comparison results for the excess bytes.
Since we are reading beyond the end of the arrays, we need to be careful not to touch an unmapped page and cause a page fault.</p>
<p>To avoid this, we would have to know whether the next page is mapped. To be safe, we instead require that bytes <script type="math/tex" >[p_i, p_i+32)</script>
don't cross a page boundary. Pages are always aligned to the page size, so we just need to check that the pointer modulo the page size isn't too close to the end of the page.
In math terms:</p>
<script type="math/tex" is_fleqn="true"is_display="true">\newcommand\PageSize[0]{\textnormal{PageSize}}
\newcommand\BitAnd[0]{\mathbin{\&}}
\newcommand\BitOr[0]{\mathbin{\|}}
\newcommand\BitMod[0]{\mathbin{\%}}
\newcommand\Ptr[1]{{p}_{#1}}
\begin{align*}
\Ptr{i}' &< \PageSize - 32
\end{align*}</script>
<p>for <script type="math/tex" >\Ptr{i}' = \Ptr{i} \pmod \PageSize</script>
and <script type="math/tex" >i \in \Set{1, 2}</script>
.</p>
<p>We can express both conditions as a single condition by taking the maximum of <script type="math/tex" >\Ptr{1}'</script>
, <script type="math/tex" >\Ptr{2}'</script>
:</p>
<script type="math/tex" is_fleqn="true"is_display="true">\begin{align*}
\PageSize - 32 > \max(\Ptr{1}', \Ptr{2}')
\end{align*}</script>
<p>This has the advantage that <script type="math/tex" >\max(\Ptr{1}', \Ptr{2}')</script>
can be bounded from above using only bitwise operations.</p>
<script type="math/tex" is_fleqn="true"is_display="true">\def\arraystretch{1.4em}
\newcommand\htmlRef[1]{\href{###1}{\text{\footnotesize #1}}}
\begin{array}{rcl|l}
\max(\Ptr{1}', \Ptr{2}')
&\le& \Ptr{1}' \BitOr \Ptr{2}' & \text{\htmlRef{binorineq}} \\
&=& (\Ptr{1} \BitAnd (\PageSize - 1)) \BitOr ( \Ptr{2} \BitAnd (\PageSize - 1)) & \text{\htmlRef{pow2eq}} \\
&=& (\Ptr{1} \BitOr \Ptr{2}) \BitAnd (\PageSize - 1) & \text{\htmlRef{disteq}}
\end{array}</script>
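<p>The three steps above are easy to sanity-check by brute force. The following is a minimal sketch written for this post, not code taken from any library: the names <code>bound_holds</code> and <code>count_violations</code> and the constant <code>ASSUMED_PAGE_SIZE</code> are made up for illustration, and a page size of 4096 bytes is assumed.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: pages are 4096 bytes (a power of two). */
#define ASSUMED_PAGE_SIZE 4096u

/* Returns 1 if every step of the derivation holds for the pair (a, b). */
int bound_holds(uint32_t a, uint32_t b) {
    const uint32_t m = ASSUMED_PAGE_SIZE - 1;
    /* The masking trick only works for power-of-two page sizes. */
    assert((ASSUMED_PAGE_SIZE & m) == 0);

    uint32_t ra = a & m; /* a modulo the page size */
    uint32_t rb = b & m; /* b modulo the page size */
    uint32_t mx = ra > rb ? ra : rb;

    if (ra != a % ASSUMED_PAGE_SIZE) return 0; /* x mod 2^k == x & (2^k - 1) */
    if (mx > (ra | rb)) return 0;              /* max(a', b') <= a' | b'     */
    if ((ra | rb) != ((a | b) & m)) return 0;  /* AND distributes over OR    */
    return 1;
}

/* Checks all pairs (a, b) with a, b < limit and counts the failures. */
unsigned long count_violations(uint32_t limit) {
    unsigned long bad = 0;
    for (uint32_t a = 0; a < limit; a++)
        for (uint32_t b = 0; b < limit; b++)
            if (!bound_holds(a, b)) bad++;
    return bad;
}
```

<p>Since each step is an identity (or an inequality that always holds), <code>count_violations</code> returns 0 for any limit. Note that the first step is only an upper bound: for remainders <code>0x800</code> and <code>0x7ff</code> the maximum is <code>0x800</code>, but the bitwise or is <code>0xfff</code>.</p>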
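<p>On x86-64 hardware, only page sizes of 4 KiB, 2 MiB and 1 GiB are allowed. To be safe, we'll assume the smallest one, 4096 bytes: the larger sizes are multiples of it, so a load that stays inside a 4096-byte block also stays inside any larger page. With that in mind, we can establish the following condition, which is enough to prevent a page fault:</p>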
<script type="math/tex" is_fleqn="true"is_display="true">(\Ptr{1} \BitOr \Ptr{2}) \BitAnd 4095 < 4064</script>
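<p>This condition corresponds to the assembly above: <code>or</code> and <code>and</code> compute the left-hand side, and <code>cmp</code> compares it with 4064 (<code>0xfe0</code>). The <code>jg</code> branch is taken only when the value exceeds 4064, so the assembly also accepts the boundary value 4064 itself, for which a 32-byte load still ends no later than the page boundary.</p>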
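<p>Written in C, the check could look roughly like the sketch below. The function name <code>may_cross_page</code> and the two constants are hypothetical names chosen for this example; the comparison mirrors the <code>cmp</code>/<code>jg</code> pair, under the same 4096-byte page size assumption.</p>

```c
#include <assert.h>
#include <stdint.h>

#define ASSUMED_PAGE_SIZE 4096u /* assumption: smallest x86-64 page size */
#define VEC_SIZE 32u            /* bytes read by one YMM load */

/*
 * Returns 1 if a VEC_SIZE-byte load from p1 or p2 might cross a page
 * boundary, 0 if both loads are guaranteed to stay inside their page.
 * The test is conservative: it can return 1 even when neither load
 * actually crosses a boundary.
 */
int may_cross_page(const void *p1, const void *p2) {
    /* The masking below requires a power-of-two page size. */
    assert((ASSUMED_PAGE_SIZE & (ASSUMED_PAGE_SIZE - 1)) == 0);

    /* mov eax, edi ; or eax, esi ; and eax, 0xfff */
    uint32_t low = (uint32_t)(((uintptr_t)p1 | (uintptr_t)p2)
                              & (ASSUMED_PAGE_SIZE - 1));
    /* cmp eax, 0xfe0 ; jg <slow path> */
    return low > ASSUMED_PAGE_SIZE - VEC_SIZE;
}
```

<p>The price of using a single comparison is the occasional false positive: two pointers with page offsets <code>0x800</code> and <code>0x7ff</code> are both far from the end of their page, yet their bitwise or is <code>0xfff</code> and the slow path is taken.</p>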
<h3 id="fast-comparison-using-avx2-and-bmi">Fast comparison using AVX2 and BMI</h3>
<p>After loading the two arrays into YMM registers, we need to:</p>
<ol>
<li>Compare them for equality.</li>
<li>Mask or discard the comparisons of characters beyond the end of the array.</li>
</ol>
<p>AVX2 doesn't have byte-level masked loads, and loading a mask likely requires a lookup table.
Therefore using AVX2 instructions to mask the excess characters is inefficient.</p>
<p>A better approach is to use <code>vpcmpeqb</code> for the comparison, followed by <code>vpmovmskb</code> to move the comparison result to a general purpose register, and finally to clear the excess bits using bit manipulation tricks.</p>
<p><code>vpcmpeqb</code> compares its two inputs byte by byte, and sets each byte of the result to <code>0xff</code> (<code>-1</code>) if the corresponding bytes are equal and to <code>0</code> if they differ.
<code>vpmovmskb</code> sets the n-th bit of its destination register to the sign bit of the n-th byte in its input YMM register.</p>
<p>Combining <code>vpcmpeqb</code> and <code>vpmovmskb</code> will return an integer where the n-th bit is one if the n-th bytes of the input arrays are equal, and zero otherwise.</p>
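<p>To make the semantics concrete, here is a scalar model of the two instructions in portable C. It is only an illustration of what they compute, not how one would write the real thing (that would use the AVX2 intrinsics); the function names <code>model_vpcmpeqb</code>, <code>model_vpmovmskb</code> and <code>equality_mask</code> are invented for this sketch.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32 /* bytes in one YMM register */

/* Scalar model of `vpcmpeqb`: out[i] is 0xff where a[i] == b[i], else 0x00. */
void model_vpcmpeqb(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    assert(a && b && out);
    for (size_t i = 0; i < VEC_SIZE; i++)
        out[i] = (a[i] == b[i]) ? 0xff : 0x00;
}

/* Scalar model of `vpmovmskb`: bit i of the result is the top bit of v[i]. */
uint32_t model_vpmovmskb(const uint8_t *v) {
    assert(v);
    uint32_t mask = 0;
    for (size_t i = 0; i < VEC_SIZE; i++)
        mask |= (uint32_t)(v[i] >> 7) << i;
    return mask;
}

/* Bit i of the result is 1 when byte i of both 32-byte inputs is equal. */
uint32_t equality_mask(const uint8_t *a, const uint8_t *b) {
    uint8_t cmp[VEC_SIZE];
    model_vpcmpeqb(a, b, cmp);
    return model_vpmovmskb(cmp);
}
```

<p>For two identical 32-byte buffers <code>equality_mask</code> returns <code>0xffffffff</code>; if they differ only in byte 6, bit 6 is cleared and the result is <code>0xffffffbf</code>.</p>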
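<p>However, since <code>memcmp</code> returns 0 if the inputs are equal, we need a value that is zero when every byte matches. Bitwise negating the output of <code>vpmovmskb</code> would do; the assembly below uses <code>inc</code> instead. Adding one to an all-ones mask wraps around to zero, and for any other mask the lowest set bit of the result is at the position of the first differing byte.</p>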
<p>Finally we need to mask the excess bits. The most straightforward way is to use a shift to create a bitmask, like in this C example:</p>
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c"><span class="z-source z-c"><span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-meta z-function z-c"><span class="z-entity z-name z-function z-c">zero_high_bits</span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span></span></span><span class="z-meta z-function z-parameters z-c"><span class="z-meta z-group z-c"><span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-variable z-parameter z-c">input</span><span class="z-punctuation z-separator z-c">,</span> <span class="z-support z-type z-stdint z-c">uint32_t</span> <span class="z-variable z-parameter z-c">count</span><span class="z-punctuation z-section z-group z-end z-c">)</span></span></span><span class="z-meta z-function z-c"> </span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-begin z-c">{</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c">
<span class="z-support z-type z-stdint z-c">uint32_t</span> mask <span class="z-keyword z-operator z-assignment z-c">=</span> <span class="z-meta z-group z-c"><span class="z-punctuation z-section z-group z-begin z-c">(</span><span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">1</span>u</span> <span class="z-keyword z-operator z-arithmetic z-c"><<</span> count<span class="z-punctuation z-section z-group z-end z-c">)</span></span> <span class="z-keyword z-operator z-arithmetic z-c">-</span> <span class="z-meta z-number z-integer z-decimal z-c"><span class="z-constant z-numeric z-value z-c">1</span></span><span class="z-punctuation z-terminator z-c">;</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> E.g. count = 3 gives 0b000...0111 (count must be below 32)
</span> <span class="z-keyword z-control z-flow z-return z-c">return</span> input <span class="z-keyword z-operator z-c">&</span> mask<span class="z-punctuation z-terminator z-c">;</span> <span class="z-comment z-line z-double-slash z-c"><span class="z-punctuation z-definition z-comment z-c">//</span> Keep only the count lowest bits
</span></span></span><span class="z-meta z-function z-c"><span class="z-meta z-block z-c"><span class="z-punctuation z-section z-block z-end z-c">}</span></span></span>
</span></code></pre>
<p>Fortunately, most Intel and AMD CPUs provide an instruction that does all of the above.
The instruction is <code>bzhi</code> from the BMI2 extension. It has a throughput of at least one instruction per cycle on both Skylake and Zen, which is extremely good.
Also, while BMI2 and AVX2 are separate extensions, every CPU produced that supports AVX2 also supports BMI2.</p>
<p>This leads to the following assembly:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">bzhi</span> <span class="z-variable z-parameter z-register z-assembly">edx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span>
<span class="z-comment z-assembly">; `bzhi` sets ZF when its result is zero, so `jne` is taken if any of the first `count` bytes differ. </span>
<span class="z-keyword z-control z-assembly">jne</span> e0
<span class="z-keyword z-control z-assembly">xor</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">vzeroupper</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
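<p>The <code>inc</code> and <code>bzhi</code> steps can be modelled in portable C as well. This is a sketch under stated assumptions rather than a definitive implementation: <code>model_bzhi</code> and <code>first_difference_bits</code> are hypothetical names, and the input is assumed to be the equality mask produced by <code>vpmovmskb</code>.</p>

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable model of the 32-bit BMI2 `bzhi`: clears bits n..31 of x.
 * The instruction reads only the low 8 bits of the index, and an index
 * of 32 or more leaves x unchanged.
 */
uint32_t model_bzhi(uint32_t x, uint32_t n) {
    n &= 0xff;
    return n >= 32 ? x : x & ((UINT32_C(1) << n) - 1);
}

/*
 * eq_mask: bit i is 1 when byte i of both inputs is equal.
 * Returns 0 if the first `count` bytes are equal. Otherwise returns a
 * nonzero value whose lowest set bit is the index of the first
 * differing byte.
 */
uint32_t first_difference_bits(uint32_t eq_mask, uint32_t count) {
    assert(count < 32);          /* this path only handles short inputs */
    uint32_t t = eq_mask + 1;    /* `inc eax`: an all-ones mask wraps to 0 */
    return model_bzhi(t, count); /* drop the bits of bytes past the end */
}
```

<p>With a mismatch at byte 6 (<code>eq_mask == 0xffffffbf</code>), the result is 0 for any <code>count</code> up to 6, because the differing byte lies past the end of the arrays, and <code>0x40</code> for a <code>count</code> of 7.</p>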
<p>This shows the entire process for two example strings:</p>
<!-- Title: Simd ops Pages: 1 -->
<svg width="642pt" height="419pt"
viewBox="0.00 0.00 642.00 419.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 415)">
<title>Simd ops</title>
<polygon fill="white" stroke="none" points="-4,4 -4,-415 638,-415 638,4 -4,4"/>
<!-- xmm0 -->
<g id="node1" class="node">
<title>xmm0</title>
<polygon fill="none" stroke="black" points="258,-367 258,-388 281,-388 281,-367 258,-367"/>
<text text-anchor="start" x="261" y="-373.8" font-family="Fira Mono" font-size="14.00">G </text>
<polygon fill="none" stroke="black" points="281,-367 281,-388 304,-388 304,-367 281,-367"/>
<text text-anchor="start" x="284" y="-373.8" font-family="Fira Mono" font-size="14.00">E </text>
<polygon fill="none" stroke="black" points="304,-367 304,-388 327,-388 327,-367 304,-367"/>
<text text-anchor="start" x="307" y="-373.8" font-family="Fira Mono" font-size="14.00">T </text>
<polygon fill="none" stroke="black" points="327,-367 327,-388 350,-388 350,-367 327,-367"/>
<text text-anchor="start" x="330" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">  </text>
<polygon fill="none" stroke="black" points="350,-367 350,-388 373,-388 373,-367 350,-367"/>
<text text-anchor="start" x="353" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">/ </text>
<polygon fill="none" stroke="black" points="373,-367 373,-388 396,-388 396,-367 373,-367"/>
<text text-anchor="start" x="376" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">i </text>
<polygon fill="none" stroke="black" points="396,-367 396,-388 419,-388 419,-367 396,-367"/>
<text text-anchor="start" x="399" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">m </text>
<polygon fill="none" stroke="black" points="419,-367 419,-388 442,-388 442,-367 419,-367"/>
<text text-anchor="start" x="422" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">a </text>
<polygon fill="none" stroke="black" points="442,-367 442,-388 465,-388 465,-367 442,-367"/>
<text text-anchor="start" x="445" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">g </text>
<polygon fill="none" stroke="black" points="465,-367 465,-388 488,-388 488,-367 465,-367"/>
<text text-anchor="start" x="468" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">e </text>
<polygon fill="none" stroke="black" points="488,-367 488,-388 511,-388 511,-367 488,-367"/>
<text text-anchor="start" x="491" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">s </text>
<polygon fill="none" stroke="black" points="511,-367 511,-388 534,-388 534,-367 511,-367"/>
<text text-anchor="start" x="514" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">/ </text>
<polygon fill="none" stroke="black" points="534,-367 534,-388 557,-388 557,-367 534,-367"/>
<text text-anchor="start" x="537" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">l </text>
<polygon fill="none" stroke="black" points="557,-367 557,-388 580,-388 580,-367 557,-367"/>
<text text-anchor="start" x="560" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">o </text>
<polygon fill="none" stroke="black" points="580,-367 580,-388 603,-388 603,-367 580,-367"/>
<text text-anchor="start" x="583" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">g </text>
<polygon fill="none" stroke="black" points="603,-367 603,-388 626,-388 626,-367 603,-367"/>
<text text-anchor="start" x="606" y="-373.8" font-family="Fira Mono" font-size="14.00" fill="gray">o </text>
<text text-anchor="start" x="0" y="-400.8" font-family="Fira Mono" font-size="14.00">vmovdqu </text>
<text text-anchor="start" x="67" y="-400.8" font-family="Fira Mono" font-size="14.00" fill="purple">ymm1</text>
<text text-anchor="start" x="101" y="-400.8" font-family="Fira Mono" font-size="14.00">, YMMWORD PTR [p1]</text>
</g>
<!-- xmm1 -->
<g id="node2" class="node">
<title>xmm1</title>
<polygon fill="none" stroke="black" points="258,-295 258,-316 281,-316 281,-295 258,-295"/>
<text text-anchor="start" x="261" y="-301.8" font-family="Fira Mono" font-size="14.00">G </text>
<polygon fill="none" stroke="black" points="281,-295 281,-316 304,-316 304,-295 281,-295"/>
<text text-anchor="start" x="284" y="-301.8" font-family="Fira Mono" font-size="14.00">E </text>
<polygon fill="none" stroke="black" points="304,-295 304,-316 327,-316 327,-295 304,-295"/>
<text text-anchor="start" x="307" y="-301.8" font-family="Fira Mono" font-size="14.00">T </text>
<polygon fill="none" stroke="black" points="327,-295 327,-316 350,-316 350,-295 327,-295"/>
<text text-anchor="start" x="330" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">  </text>
<polygon fill="none" stroke="black" points="350,-295 350,-316 373,-316 373,-295 350,-295"/>
<text text-anchor="start" x="353" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">/ </text>
<polygon fill="none" stroke="black" points="373,-295 373,-316 396,-316 396,-295 373,-295"/>
<text text-anchor="start" x="376" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">i </text>
<polygon fill="none" stroke="black" points="396,-295 396,-316 419,-316 419,-295 396,-295"/>
<text text-anchor="start" x="399" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">n </text>
<polygon fill="none" stroke="black" points="419,-295 419,-316 442,-316 442,-295 419,-295"/>
<text text-anchor="start" x="422" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">d </text>
<polygon fill="none" stroke="black" points="442,-295 442,-316 465,-316 465,-295 442,-295"/>
<text text-anchor="start" x="445" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">e </text>
<polygon fill="none" stroke="black" points="465,-295 465,-316 488,-316 488,-295 465,-295"/>
<text text-anchor="start" x="468" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">x </text>
<polygon fill="none" stroke="black" points="488,-295 488,-316 511,-316 511,-295 488,-295"/>
<text text-anchor="start" x="491" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">. </text>
<polygon fill="none" stroke="black" points="511,-295 511,-316 534,-316 534,-295 511,-295"/>
<text text-anchor="start" x="514" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">h </text>
<polygon fill="none" stroke="black" points="534,-295 534,-316 557,-316 557,-295 534,-295"/>
<text text-anchor="start" x="537" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">t </text>
<polygon fill="none" stroke="black" points="557,-295 557,-316 580,-316 580,-295 557,-295"/>
<text text-anchor="start" x="560" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">m </text>
<polygon fill="none" stroke="black" points="580,-295 580,-316 603,-316 603,-295 580,-295"/>
<text text-anchor="start" x="583" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">l </text>
<polygon fill="none" stroke="black" points="603,-295 603,-316 626,-316 626,-295 603,-295"/>
<text text-anchor="start" x="606" y="-301.8" font-family="Fira Mono" font-size="14.00" fill="gray">  </text>
<text text-anchor="start" x="0" y="-328.8" font-family="Fira Mono" font-size="14.00">vmovdqu </text>
<text text-anchor="start" x="67" y="-328.8" font-family="Fira Mono" font-size="14.00" fill="#663399">ymm2</text>
<text text-anchor="start" x="101" y="-328.8" font-family="Fira Mono" font-size="14.00">, YMMWORD PTR [p2]</text>
</g>
<!-- xmm0--xmm1 -->
<g id="edge1" class="edge">
<title>xmm0:p0--xmm1:p0</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M269,-366C269,-344.22 269,-338.78 269,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge4" class="edge">
<title>xmm0:p1--xmm1:p1</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M292,-366C292,-344.22 292,-338.78 292,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge7" class="edge">
<title>xmm0:p2--xmm1:p2</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M315,-366C315,-344.22 315,-338.78 315,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge10" class="edge">
<title>xmm0:p3--xmm1:p3</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M338,-366C338,-344.22 338,-338.78 338,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge13" class="edge">
<title>xmm0:p4--xmm1:p4</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M361,-366C361,-344.22 361,-338.78 361,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge16" class="edge">
<title>xmm0:p5--xmm1:p5</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M384,-366C384,-344.22 384,-338.78 384,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge19" class="edge">
<title>xmm0:p6--xmm1:p6</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M407,-366C407,-344.22 407,-338.78 407,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge22" class="edge">
<title>xmm0:p7--xmm1:p7</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M430,-366C430,-344.22 430,-338.78 430,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge25" class="edge">
<title>xmm0:p8--xmm1:p8</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M454,-366C454,-344.22 454,-338.78 454,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge28" class="edge">
<title>xmm0:p9--xmm1:p9</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M477,-366C477,-344.22 477,-338.78 477,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge31" class="edge">
<title>xmm0:p10--xmm1:p10</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M500,-366C500,-344.22 500,-338.78 500,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge34" class="edge">
<title>xmm0:p11--xmm1:p11</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M523,-366C523,-344.22 523,-338.78 523,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge37" class="edge">
<title>xmm0:p12--xmm1:p12</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M546,-366C546,-344.22 546,-338.78 546,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge40" class="edge">
<title>xmm0:p13--xmm1:p13</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M569,-366C569,-344.22 569,-338.78 569,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge43" class="edge">
<title>xmm0:p14--xmm1:p14</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M592,-366C592,-344.22 592,-338.78 592,-317"/>
</g>
<!-- xmm0--xmm1 -->
<g id="edge46" class="edge">
<title>xmm0:p15--xmm1:p15</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M615,-366C615,-344.22 615,-338.78 615,-317"/>
</g>
<!-- xmm2 -->
<g id="node3" class="node">
<title>xmm2</title>
<polygon fill="none" stroke="black" points="258,-223 258,-244 281,-244 281,-223 258,-223"/>
<text text-anchor="start" x="261" y="-229.8" font-family="Fira Mono" font-size="14.00">ff</text>
<polygon fill="none" stroke="black" points="281,-223 281,-244 304,-244 304,-223 281,-223"/>
<text text-anchor="start" x="284" y="-229.8" font-family="Fira Mono" font-size="14.00">ff</text>
<polygon fill="none" stroke="black" points="304,-223 304,-244 327,-244 327,-223 304,-223"/>
<text text-anchor="start" x="307" y="-229.8" font-family="Fira Mono" font-size="14.00">ff</text>
<polygon fill="none" stroke="black" points="327,-223 327,-244 350,-244 350,-223 327,-223"/>
<text text-anchor="start" x="330" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">ff</text>
<polygon fill="none" stroke="black" points="350,-223 350,-244 373,-244 373,-223 350,-223"/>
<text text-anchor="start" x="353" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">ff</text>
<polygon fill="none" stroke="black" points="373,-223 373,-244 396,-244 396,-223 373,-223"/>
<text text-anchor="start" x="376" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">ff</text>
<polygon fill="none" stroke="black" points="396,-223 396,-244 419,-244 419,-223 396,-223"/>
<text text-anchor="start" x="399" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="419,-223 419,-244 442,-244 442,-223 419,-223"/>
<text text-anchor="start" x="422" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="442,-223 442,-244 465,-244 465,-223 442,-223"/>
<text text-anchor="start" x="445" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="465,-223 465,-244 488,-244 488,-223 465,-223"/>
<text text-anchor="start" x="468" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="488,-223 488,-244 511,-244 511,-223 488,-223"/>
<text text-anchor="start" x="491" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="511,-223 511,-244 534,-244 534,-223 511,-223"/>
<text text-anchor="start" x="514" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="534,-223 534,-244 557,-244 557,-223 534,-223"/>
<text text-anchor="start" x="537" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="557,-223 557,-244 580,-244 580,-223 557,-223"/>
<text text-anchor="start" x="560" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="580,-223 580,-244 603,-244 603,-223 580,-223"/>
<text text-anchor="start" x="583" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<polygon fill="none" stroke="black" points="603,-223 603,-244 626,-244 626,-223 603,-223"/>
<text text-anchor="start" x="606" y="-229.8" font-family="Fira Mono" font-size="14.00" fill="gray">00</text>
<text text-anchor="start" x="55" y="-256.8" font-family="Fira Mono" font-size="14.00">vpcmpeqb </text>
<text text-anchor="start" x="130" y="-256.8" font-family="Fira Mono" font-size="14.00" fill="slateblue">ymm3</text>
<text text-anchor="start" x="164" y="-256.8" font-family="Fira Mono" font-size="14.00">,</text>
<text text-anchor="start" x="173" y="-256.8" font-family="Fira Mono" font-size="14.00" fill="purple">ymm1</text>
<text text-anchor="start" x="207" y="-256.8" font-family="Fira Mono" font-size="14.00">,</text>
<text text-anchor="start" x="216" y="-256.8" font-family="Fira Mono" font-size="14.00" fill="#663399">ymm2</text>
</g>
<!-- xmm1--xmm2 -->
<g id="edge2" class="edge">
<title>xmm1:p0--xmm2:p0</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M269,-294C269,-272.22 269,-266.78 269,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge5" class="edge">
<title>xmm1:p1--xmm2:p1</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M292,-294C292,-272.22 292,-266.78 292,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge8" class="edge">
<title>xmm1:p2--xmm2:p2</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M315,-294C315,-272.22 315,-266.78 315,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge11" class="edge">
<title>xmm1:p3--xmm2:p3</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M338,-294C338,-272.22 338,-266.78 338,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge14" class="edge">
<title>xmm1:p4--xmm2:p4</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M361,-294C361,-272.22 361,-266.78 361,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge17" class="edge">
<title>xmm1:p5--xmm2:p5</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M384,-294C384,-272.22 384,-266.78 384,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge20" class="edge">
<title>xmm1:p6--xmm2:p6</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M407,-294C407,-272.22 407,-266.78 407,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge23" class="edge">
<title>xmm1:p7--xmm2:p7</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M430,-294C430,-272.22 430,-266.78 430,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge26" class="edge">
<title>xmm1:p8--xmm2:p8</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M454,-294C454,-272.22 454,-266.78 454,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge29" class="edge">
<title>xmm1:p9--xmm2:p9</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M477,-294C477,-272.22 477,-266.78 477,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge32" class="edge">
<title>xmm1:p10--xmm2:p10</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M500,-294C500,-272.22 500,-266.78 500,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge35" class="edge">
<title>xmm1:p11--xmm2:p11</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M523,-294C523,-272.22 523,-266.78 523,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge38" class="edge">
<title>xmm1:p12--xmm2:p12</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M546,-294C546,-272.22 546,-266.78 546,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge41" class="edge">
<title>xmm1:p13--xmm2:p13</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M569,-294C569,-272.22 569,-266.78 569,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge44" class="edge">
<title>xmm1:p14--xmm2:p14</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M592,-294C592,-272.22 592,-266.78 592,-245"/>
</g>
<!-- xmm1--xmm2 -->
<g id="edge47" class="edge">
<title>xmm1:p15--xmm2:p15</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M615,-294C615,-272.22 615,-266.78 615,-245"/>
</g>
<!-- eax -->
<g id="node4" class="node">
<title>eax</title>
<text text-anchor="start" x="340" y="-157.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="353" y="-157.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="366" y="-157.8" font-family="Fira Mono" font-size="14.00">1</text>
<text text-anchor="start" x="379" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">1</text>
<text text-anchor="start" x="392" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">1</text>
<text text-anchor="start" x="405" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">1</text>
<text text-anchor="start" x="418" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="431" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="444" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="457" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="470" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="483" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="496" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="509" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="522" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="535" y="-157.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<polygon fill="none" stroke="black" points="337,-151.5 337,-172.5 547,-172.5 547,-151.5 337,-151.5"/>
<text text-anchor="start" x="169" y="-184.8" font-family="Fira Mono" font-size="14.00">vpmovmskb </text>
<text text-anchor="start" x="252" y="-184.8" font-family="Fira Mono" font-size="14.00" fill="steelblue">eax</text>
<text text-anchor="start" x="277" y="-184.8" font-family="Fira Mono" font-size="14.00">,</text>
<text text-anchor="start" x="286" y="-184.8" font-family="Fira Mono" font-size="14.00" fill="slateblue">ymm3</text>
<text text-anchor="start" x="320" y="-184.8" font-family="Fira Mono" font-size="14.00"> </text>
</g>
<!-- xmm2--eax -->
<g id="edge3" class="edge">
<title>xmm2:p0--eax:p0</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M269,-222C269,-181.94 344,-212.06 344,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge6" class="edge">
<title>xmm2:p1--eax:p1</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M292,-222C292,-185.55 357,-208.45 357,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge9" class="edge">
<title>xmm2:p2--eax:p2</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M315,-222C315,-188.96 370,-205.04 370,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge12" class="edge">
<title>xmm2:p3--eax:p3</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M338,-222C338,-192.1 383,-201.9 383,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge15" class="edge">
<title>xmm2:p4--eax:p4</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M361,-222C361,-194.87 396,-199.13 396,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge18" class="edge">
<title>xmm2:p5--eax:p5</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M384,-222C384,-197.15 409,-196.85 409,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge21" class="edge">
<title>xmm2:p6--eax:p6</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M407,-222C407,-198.8 422,-195.2 422,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge24" class="edge">
<title>xmm2:p7--eax:p7</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M430,-222C430,-199.67 435,-194.33 435,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge27" class="edge">
<title>xmm2:p8--eax:p8</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M454,-222C454,-199.67 449,-194.33 449,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge30" class="edge">
<title>xmm2:p9--eax:p9</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M477,-222C477,-198.8 462,-195.2 462,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge33" class="edge">
<title>xmm2:p10--eax:p10</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M500,-222C500,-197.15 475,-196.85 475,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge36" class="edge">
<title>xmm2:p11--eax:p11</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M523,-222C523,-194.87 488,-199.13 488,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge39" class="edge">
<title>xmm2:p12--eax:p12</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M546,-222C546,-192.1 501,-201.9 501,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge42" class="edge">
<title>xmm2:p13--eax:p13</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M569,-222C569,-188.96 514,-205.04 514,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge45" class="edge">
<title>xmm2:p14--eax:p14</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M592,-222C592,-185.55 527,-208.45 527,-172"/>
</g>
<!-- xmm2--eax -->
<g id="edge48" class="edge">
<title>xmm2:p15--eax:p15</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M615,-222C615,-181.94 540,-212.06 540,-172"/>
</g>
<!-- r9 -->
<g id="node5" class="node">
<title>r9</title>
<text text-anchor="start" x="340" y="-85.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="353" y="-85.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="366" y="-85.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="379" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="392" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="405" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="418" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">1</text>
<text text-anchor="start" x="431" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="444" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="457" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="470" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="483" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="496" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="509" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="522" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<text text-anchor="start" x="535" y="-85.8" font-family="Fira Mono" font-size="14.00" fill="gray">0</text>
<polygon fill="none" stroke="black" points="337,-79.5 337,-100.5 547,-100.5 547,-79.5 337,-79.5"/>
<text text-anchor="start" x="253" y="-112.8" font-family="Fira Mono" font-size="14.00">incr </text>
<text text-anchor="start" x="295" y="-112.8" font-family="Fira Mono" font-size="14.00" fill="steelblue">eax</text>
<text text-anchor="start" x="320" y="-112.8" font-family="Fira Mono" font-size="14.00"> </text>
</g>
<!-- eax--r9 -->
<g id="edge49" class="edge">
<title>eax:p0--r9:p6</title>
<path fill="none" stroke="black" stroke-dasharray="5,2" d="M344,-151C344,-109.58 422,-141.42 422,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge50" class="edge">
<title>eax:p7--r9:p7</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M435,-151C435,-128.33 435,-122.67 435,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge51" class="edge">
<title>eax:p8--r9:p8</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M449,-151C449,-128.33 449,-122.67 449,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge52" class="edge">
<title>eax:p9--r9:p9</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M462,-151C462,-128.33 462,-122.67 462,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge53" class="edge">
<title>eax:p10--r9:p10</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M475,-151C475,-128.33 475,-122.67 475,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge54" class="edge">
<title>eax:p11--r9:p11</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M488,-151C488,-128.33 488,-122.67 488,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge55" class="edge">
<title>eax:p12--r9:p12</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M501,-151C501,-128.33 501,-122.67 501,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge56" class="edge">
<title>eax:p13--r9:p13</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M514,-151C514,-128.33 514,-122.67 514,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge57" class="edge">
<title>eax:p14--r9:p14</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M527,-151C527,-128.33 527,-122.67 527,-100"/>
</g>
<!-- eax--r9 -->
<g id="edge58" class="edge">
<title>eax:p15--r9:p15</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M540,-151C540,-128.33 540,-122.67 540,-100"/>
</g>
<!-- edx -->
<g id="node6" class="node">
<title>edx</title>
<text text-anchor="start" x="340" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="353" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="366" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="379" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="392" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="405" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="418" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="431" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="444" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="457" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="470" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="483" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="496" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="509" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="522" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<text text-anchor="start" x="535" y="-13.8" font-family="Fira Mono" font-size="14.00">0</text>
<polygon fill="none" stroke="black" points="337,-7.5 337,-28.5 547,-28.5 547,-7.5 337,-7.5"/>
<text text-anchor="start" x="194" y="-40.8" font-family="Fira Mono" font-size="14.00">bzhi </text>
<text text-anchor="start" x="236" y="-40.8" font-family="Fira Mono" font-size="14.00" fill="steelblue">eax</text>
<text text-anchor="start" x="261" y="-40.8" font-family="Fira Mono" font-size="14.00">,</text>
<text text-anchor="start" x="270" y="-40.8" font-family="Fira Mono" font-size="14.00" fill="steelblue">eax</text>
<text text-anchor="start" x="295" y="-40.8" font-family="Fira Mono" font-size="14.00">,16 </text>
</g>
<!-- r9--edx -->
<g id="edge59" class="edge">
<title>r9:p0--edx:p0</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M344,-79C344,-56.33 344,-50.67 344,-28"/>
</g>
<!-- r9--edx -->
<g id="edge60" class="edge">
<title>r9:p1--edx:p1</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M357,-79C357,-56.33 357,-50.67 357,-28"/>
</g>
<!-- r9--edx -->
<g id="edge61" class="edge">
<title>r9:p2--edx:p2</title>
<path fill="none" stroke="silver" stroke-dasharray="5,2" d="M370,-79C370,-56.33 370,-50.67 370,-28"/>
</g>
</g>
</svg>
<h3 id="handling-the-unequal-case">Handling the unequal case</h3>
<p>If, after clearing out the extra bits, <code>edx</code> is not zero, we have found a difference.
In particular, if the first <script type="math/tex" >m</script>
bytes are identical, the lowest <script type="math/tex" >m</script>
bits of <code>edx</code> will be zero and the next bit will be 1.</p>
<p>Fortunately, BMI contains another useful instruction that helps us.
<code>tzcnt</code> counts how many trailing zeros a register has.</p>
<script type="math/tex" is_fleqn="true" is_display="true">\newcommand\tzcnt[0]{\textnormal{tzcnt}}
\tzcnt ( 0\textnormal{b}\textnormal{xxxx} 1 \underbrace{00000}_{m} ) = m</script>
<p>Now that we have the index of the first differing byte, we load that byte from each array, zero-extended, and subtract one from the other to obtain the result.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> e0: <span class="z-keyword z-control z-assembly">tzcnt</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ecx</span>
<span class="z-keyword z-control z-assembly">vzeroupper</span>
<span class="z-keyword z-control z-assembly">ret</span>
</span></code></pre>
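<p>For readers who prefer C, here is a minimal sketch of the same two steps. It is an illustration rather than the code behind the assembly above: <code>first_diff_index</code> and <code>byte_diff</code> are made-up names, the 32-bit mask is assumed to have bit <code>i</code> set when byte <code>i</code> compared equal, and <code>__builtin_ctz</code> is the GCC/Clang builtin that typically compiles to <code>tzcnt</code> or <code>bsf</code>.</p>
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c">#include &lt;stdint.h&gt;

/* Hypothetical helper: eq_mask has bit i set when byte i of the two
   blocks compared equal (the shape vpcmpeqb + vpmovmskb produce).
   Returns the index of the first differing byte, or -1 if none. */
static int first_diff_index(uint32_t eq_mask)
{
    /* Adding 1 wraps an all-ones mask to 0. Otherwise the carry stops
       at the lowest clear bit, which becomes the lowest set bit. */
    uint32_t t = eq_mask + 1u;
    if (t == 0)
        return -1;
    return __builtin_ctz(t); /* count trailing zeros */
}

/* Hypothetical helper: memcmp-style result for a known index.
   Bytes are compared as unsigned char, like the movzx loads. */
static int byte_diff(const unsigned char *a, const unsigned char *b, int idx)
{
    return (int)a[idx] - (int)b[idx];
}
</code></pre>
<p>The increment is what lets a single <code>tzcnt</code> find the first <em>clear</em> bit of the equality mask without inverting it first.</p>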
<h3 id="avoiding-the-loop-epilogue">Avoiding the loop epilogue</h3>
<p>Now that we know that the input arrays are longer than 32 bytes, there's no need to worry about page faults or clearing "extra" bits.</p>
<p>We can load, compare and <code>vpmovmskb</code> in increments of 32 bytes until fewer than 32 bytes are left to compare. The first four steps of the logic are inlined.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x40</span>
<span class="z-keyword z-control z-assembly">jbe</span> <span class="z-constant z-character z-decimal z-assembly">264</span> <span class="z-comment z-assembly">; Go to EPILOGUE 1</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">jne</span> <span class="z-constant z-character z-decimal z-assembly">100</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x80</span>
<span class="z-keyword z-control z-assembly">jbe</span> <span class="z-constant z-character z-decimal z-assembly">250</span> <span class="z-comment z-assembly">; Go to EPILOGUE 2</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x40</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x40</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm3</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">jne</span> <span class="z-constant z-character z-decimal z-assembly">120</span> <span class="z-comment z-assembly">; Go to EPILOGUE 3</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x60</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x60</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm4</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span>
<span class="z-keyword z-control z-assembly">jne</span> 15b <span class="z-comment z-assembly">; Go to EPILOGUE 4</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0x100</span>
<span class="z-keyword z-control z-assembly">ja</span> <span class="z-constant z-character z-decimal z-assembly">170</span> <span class="z-comment z-assembly">; Go to main loop PROLOGUE</span>
</span></code></pre>
<p>The assembly above can only handle arrays whose length is a multiple of 32. What can we do when that's not the case?</p>
<p>Two options come to mind. First, we can write a scalar loop to handle the remaining bytes. Alternatively, we can reuse the logic that is used for small arrays.</p>
<p>But there's an even better solution: read and test the <em>last</em> 32 bytes of the array.
This way we never read beyond the end of the array.
Unlike the small-array logic, we only read memory we know is allocated, so we don't need a page alignment check.</p>
<p>It's important to note that although there is overlap between this final iteration and the previous ones, it doesn't affect the end result.
Our objective is to locate the position of the first difference in the entire array (if any). But we know that the overlapping bytes contain no difference, and therefore don't affect the outcome.</p>
<p>The only place where we need to be careful is when a difference is found.
In that case, <code>tzcnt</code> will tell us the offset from <code>end_of_array - 32</code>, rather than from <code>begin_of_array + num_iter * 32</code>.
We can adjust the final stanza to account for that:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-comment z-assembly">; EPILOGUE 1</span>
<span class="z-constant z-character z-decimal z-assembly">264</span>: <span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm1</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-keyword z-control z-assembly">jne</span> 2c0 <span class="z-comment z-assembly">; If a difference found, go to tzcnt stanza</span>
<span class="z-keyword z-control z-assembly">vzeroupper</span>
<span class="z-keyword z-control z-assembly">ret</span> <span class="z-comment z-assembly">; No difference, return 0</span>
<span class="z-comment z-assembly">; ...</span>
2c0: <span class="z-keyword z-control z-assembly">tzcnt</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">eax</span>
<span class="z-comment z-assembly">; $eax now contains the offset of the first difference from `end_of_array - 32`</span>
<span class="z-keyword z-control z-assembly">add</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">edx</span> <span class="z-comment z-assembly">; $edx is the size of the array.</span>
<span class="z-comment z-assembly">; $rax now contains `size_of_array + offset`</span>
<span class="z-comment z-assembly">; $rsi is the beginning of the second array, so $rsi + $rax is</span>
<span class="z-comment z-assembly">; `begin_of_array + size_of_array + offset`, i.e. `end_of_array + offset`</span>
<span class="z-comment z-assembly">; Subtracting 0x20 (32) gives `end_of_array - 32 + offset`,</span>
<span class="z-comment z-assembly">; which is the position of the first different byte</span>
<span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-comment z-assembly">; Ditto for the first array, which starts at $rdi</span>
<span class="z-keyword z-control z-assembly">movzx</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-support z-function z-directive z-assembly">BYTE</span> <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rax</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">eax</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ecx</span>
<span class="z-keyword z-control z-assembly">vzeroupper</span>
<span class="z-keyword z-control z-assembly">ret</span> <span class="z-comment z-assembly">; Return difference of first unequal bytes</span>
</span></code></pre>
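<p>The overlapping final read can be sketched in portable C as follows. This is only an illustration under a few assumptions: <code>memcmp_overlap</code> and <code>eq_mask32</code> are hypothetical names, <code>eq_mask32</code> is a scalar stand-in for the <code>vpcmpeqb</code> + <code>vpmovmskb</code> pair, the caller guarantees at least 32 bytes, and <code>__builtin_ctz</code> is the GCC/Clang builtin.</p>
<pre data-lang="c" class="language-c z-code"><code class="language-c" data-lang="c">#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define BLOCK 32

/* Scalar stand-in for vpcmpeqb + vpmovmskb:
   bit i of the result is set when a[i] == b[i]. */
static uint32_t eq_mask32(const unsigned char *a, const unsigned char *b)
{
    uint32_t m = 0;
    for (int i = 0; i &lt; BLOCK; i++)
        m |= (uint32_t)(a[i] == b[i]) &lt;&lt; i;
    return m;
}

/* Hypothetical memcmp-like routine. Requires n &gt;= 32. */
static int memcmp_overlap(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t off = 0;

    /* Whole 32-byte blocks from the front. */
    for (; off + BLOCK &lt;= n; off += BLOCK) {
        uint32_t t = eq_mask32(a + off, b + off) + 1u;
        if (t != 0) {
            size_t i = off + (size_t)__builtin_ctz(t);
            return (int)a[i] - (int)b[i];
        }
    }

    /* Leftover bytes: re-read the last 32 bytes of the arrays.
       The overlap with checked bytes is harmless, they are equal. */
    if (off &lt; n) {
        uint32_t t = eq_mask32(a + n - BLOCK, b + n - BLOCK) + 1u;
        if (t != 0) {
            /* The bit index is relative to end_of_array - 32. */
            size_t i = n - BLOCK + (size_t)__builtin_ctz(t);
            return (int)a[i] - (int)b[i];
        }
    }
    return 0;
}
</code></pre>
<p>Note how the index in the tail case is computed from <code>n - BLOCK</code>, mirroring the <code>add eax,edx</code> and <code>-0x20</code> adjustment in the stanza above.</p>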
<h3 id="handling-big-arrays">Handling big arrays</h3>
<p>Finally we reach the main logic. This path is taken only for arrays longer than 256 bytes.
It is optimized with the idea that there will be many iterations before we find the difference.
In particular, it has a prologue and a main loop.</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-comment z-assembly">; PROLOGUE</span>
<span class="z-constant z-character z-decimal z-assembly">170</span>: <span class="z-keyword z-control z-assembly">lea</span> <span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">,</span><span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdx</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">-</span><span class="z-constant z-character z-hexadecimal z-assembly">0x80</span><span class="z-source z-assembly">]</span> <span class="z-comment z-assembly">; Turn count into ptr to 128 bytes before end of array</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdi</span> <span class="z-comment z-assembly">; Difference of two input ptrs</span>
<span class="z-keyword z-control z-assembly">and</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xffffffffffffffe0</span> <span class="z-comment z-assembly">; Align to 32 bytes</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xffffffffffffff80</span> <span class="z-comment z-assembly">; Add 128 (subtracting -128 fits a 1-byte immediate)</span>
<span class="z-comment z-assembly">; LOOP</span>
<span class="z-constant z-character z-decimal z-assembly">180</span>: <span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x20</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x40</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm3</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x40</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rsi</span><span class="z-source z-assembly">+</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">*</span><span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x60</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpcmpeqb</span> <span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span><span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">+</span><span class="z-constant z-character z-hexadecimal z-assembly">0x60</span><span class="z-source z-assembly">]</span>
<span class="z-keyword z-control z-assembly">vpand</span> <span class="z-variable z-parameter z-register z-assembly">ymm5</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm2</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm1</span>
<span class="z-keyword z-control z-assembly">vpand</span> <span class="z-variable z-parameter z-register z-assembly">ymm6</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm4</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm3</span>
<span class="z-keyword z-control z-assembly">vpand</span> <span class="z-variable z-parameter z-register z-assembly">ymm7</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm6</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm5</span> <span class="z-comment z-assembly">; Tree reduction</span>
<span class="z-keyword z-control z-assembly">vpmovmskb</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">ymm7</span>
<span class="z-keyword z-control z-assembly">inc</span> <span class="z-variable z-parameter z-register z-assembly">ecx</span>
<span class="z-keyword z-control z-assembly">jne</span> <span class="z-constant z-character z-decimal z-assembly">140</span> <span class="z-comment z-assembly">; Found difference</span>
<span class="z-keyword z-control z-assembly">sub</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-constant z-character z-hexadecimal z-assembly">0xffffffffffffff80</span> <span class="z-comment z-assembly">; Add 32 * 4 to first array ptr</span>
<span class="z-keyword z-control z-assembly">cmp</span> <span class="z-variable z-parameter z-register z-assembly">rdi</span><span class="z-source z-assembly">,</span><span class="z-variable z-parameter z-register z-assembly">rdx</span>
<span class="z-keyword z-control z-assembly">jb</span> <span class="z-constant z-character z-decimal z-assembly">180</span> <span class="z-comment z-assembly">; Go to LOOP</span>
</span></code></pre>
<p>The loop optimizations are fairly standard. The loop is unrolled to perform four 32-byte comparisons per iteration, which amortizes the overhead of pointer manipulation and exposes more instruction-level parallelism.
Also, instead of checking the result of each comparison individually, we bitwise AND the results together first and check only the aggregated result.
This exploits the fact that <code>vpand</code> can run on three CPU ports, while <code>vpmovmskb</code> can only run on one.</p>
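<p>The shape of that loop can be sketched in portable C. This is a scalar stand-in rather than the routine discussed in this post: it compares four 8-byte words per iteration instead of four 32-byte vectors, and it merges XOR differences with OR, which is the same idea as AND-ing equality masks with the polarity flipped. The names <code>load64</code> and <code>equal_unrolled</code> are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes without assuming any alignment. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: returns 1 if the first n bytes of a and b match.
 * Four comparisons per iteration, merged and tested with one branch. */
static int equal_unrolled(const unsigned char *a, const unsigned char *b,
                          size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        /* A nonzero XOR means the two words differ somewhere. */
        uint64_t d0 = load64(a + i)      ^ load64(b + i);
        uint64_t d1 = load64(a + i + 8)  ^ load64(b + i + 8);
        uint64_t d2 = load64(a + i + 16) ^ load64(b + i + 16);
        uint64_t d3 = load64(a + i + 24) ^ load64(b + i + 24);
        /* Tree reduction: (d0 | d1) and (d2 | d3) do not depend on
         * each other, so they can execute in parallel. */
        if (((d0 | d1) | (d2 | d3)) != 0)
            return 0;
    }
    /* Plain byte loop for the remaining 0..31 bytes. */
    for (; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

<p>Unlike the assembly, this sketch only reports whether a difference exists. Locating the differing byte would need a second step once the merged check fails.</p>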
<p>More interesting is the prologue. There are two optimizations of note here:</p>
<ol>
<li>
<p>The pointer to the second array is replaced by the difference between the second pointer and the first.
Loads from the second array can still be expressed using x86-64 base+index addressing. For example:</p>
<pre data-lang="asm" class="language-asm z-code"><code class="language-asm" data-lang="asm"><span class="z-source z-assembly"> <span class="z-keyword z-control z-assembly">vmovdqu</span> <span class="z-variable z-parameter z-register z-assembly">ymm1</span><span class="z-source z-assembly">,</span>YMMWORD <span class="z-support z-function z-directive z-assembly">PTR</span> <span class="z-source z-assembly">[</span>p1 <span class="z-source z-assembly">+</span> (p2 <span class="z-source z-assembly">-</span> p1) <span class="z-source z-assembly">*</span> <span class="z-constant z-character z-decimal z-assembly">1</span><span class="z-source z-assembly">]</span>
</span></code></pre>
<p>This saves us from having to increment the second pointer inside the loop.</p>
</li>
<li>
<p>The pointer to the first array is rounded down to a multiple of 32 bytes.
This ensures that at least half of the memory loads will be aligned, resulting in higher throughput.</p>
</li>
</ol>
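<p>Both prologue tricks can be written out in C as a rough sketch. The helper names <code>align_down_32</code> and <code>first_diff_one_cursor</code> are hypothetical, and the sketch assumes both regions live in the same array so that subtracting the two pointers is well-defined C (the assembly has no such restriction).</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Round an address down to the previous multiple of 32.
 * This works because 32 is a power of two. */
static uintptr_t align_down_32(uintptr_t addr)
{
    return addr & ~(uintptr_t)31;
}

/* Hypothetical helper: index of the first differing byte between the
 * n-byte regions at p1 and p2, or n if they are equal.
 * Assumption: p1 and p2 point into the same array. */
static size_t first_diff_one_cursor(const unsigned char *p1,
                                    const unsigned char *p2, size_t n)
{
    ptrdiff_t delta = p2 - p1;        /* computed once, in the prologue */
    const unsigned char *cur = p1;
    const unsigned char *end = p1 + n;

    for (; cur < end; cur++) {        /* only one pointer is advanced */
        /* cur[delta] reads from the second region: [cur + delta]. */
        if (cur[0] != cur[delta])
            return (size_t)(cur - p1);
    }
    return n;
}
```

<p>The second region is always addressed as <code>cur + delta</code>, which maps onto a single base+index memory operand, so the loop body never has to update a second pointer.</p>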
<h3 id="wrapup">Wrapup</h3>
<p>We have investigated an optimized implementation of <code>memcmp</code> which uses SIMD instructions extensively.
We have seen several techniques to:</p>
<ul>
<li>Quickly find differences in an array using SIMD and BMI2 instructions.</li>
<li>Handle data smaller than the SIMD register size.</li>
<li>Eliminate the need for a separate scalar epilogue to handle arrays of uneven sizes.</li>
<li>Extract instruction level parallelism from loops.</li>
</ul>
<p>For people that want to dig even deeper, I'd recommend looking at the assembly of <a href="https://sourceware.org/git/?p=glibc.git;a=tree;f=sysdeps/x86_64;h=0f31d18abc49cc5cc7a37e5a2f21917e911a276e;hb=HEAD">various <code>glibc</code> routines</a>.
For a reference on Intel SIMD instructions and their latencies and throughput, I would recommend the excellent <a href="https://uops.info/">uops.info</a>.</p>
<p><em>Finally, I want to thank A.Y., B.X.V., C.P., N.S. and V.P. for their invaluable feedback on this post.</em></p>
<!-- Link https://www.realworldtech.com/forum/?threadid=168200&curpostid=168784 -->
<hr />
<div class="footnote-definition" id="binorineq"><sup class="footnote-definition-label">1</sup>
<p>We used the inequality <script type="math/tex" >\max(a, b) \le a \BitOr b</script>
.</p>
</div>
<div class="footnote-definition" id="pow2eq"><sup class="footnote-definition-label">2</sup>
<p>We used the following equality <script type="math/tex" >a \bmod 2^k = a \BitAnd (2^k - 1)</script>
and the fact that the page size is a power of two.</p>
</div>
<div class="footnote-definition" id="disteq"><sup class="footnote-definition-label">3</sup>
<p>We used the distributivity property <script type="math/tex" >(a \BitAnd c) \BitOr (b \BitAnd c) = (a \BitOr b) \BitAnd c</script>
.</p>
</div>
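<p>The three bitwise facts from the footnotes are easy to sanity-check by brute force. The sketch below tests them for every 8-bit value. The function name <code>check_bit_identities</code> is hypothetical, and the first fact is checked in its non-strict form, since equality occurs whenever <code>a == b</code>.</p>

```c
/* Returns 1 if all three facts hold for every 8-bit a, b, c
 * and every power of two 2^k with k in 0..8. */
static int check_bit_identities(void)
{
    for (unsigned a = 0; a < 256; a++) {
        /* a mod 2^k == a & (2^k - 1) */
        for (unsigned k = 0; k <= 8; k++) {
            unsigned pow2 = 1u << k;
            if (a % pow2 != (a & (pow2 - 1)))
                return 0;
        }
        for (unsigned b = 0; b < 256; b++) {
            /* max(a, b) <= a | b */
            unsigned max = a > b ? a : b;
            if (max > (a | b))
                return 0;
            /* (a & c) | (b & c) == (a | b) & c */
            for (unsigned c = 0; c < 256; c++)
                if (((a & c) | (b & c)) != ((a | b) & c))
                    return 0;
        }
    }
    return 1;
}
```

<p>An exhaustive 8-bit check is not a proof for wider integers, but every one of these facts holds bit by bit, so the same argument carries over to any width.</p>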